
Use the YAKE API to extract relevant keywords from a text.

Configuration

Example configuration in a processor:

{
  "servers": [
    {
      "host": "website",
      "port": 5000,
      "path": "yake"
    }
  ],
  "name": "Keyword Extraction Processor",
  "active": true,
  "type": "keyword-extraction-processor",
  "sourceField": "source",
  "language": "en",
  "maxNgramSize": 3,
  "minNgramSize": 2,
  "maxNumberOfKeywords": 20,
  "deduplication_algo": "seqm",
  "outputField": "output",
  "id": "efe35dc7-fa16-4787-9362-db23395c96e8"
}
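As a rough illustration of how a definition like the one above might be checked before use, the following sketch parses the JSON, verifies the fields documented as required, fills in the documented defaults, and assembles an endpoint URL from a servers entry. The helper names (normalize_config, endpoint_url), the "http" scheme, and the host:port/path URL layout are assumptions made for this example; the actual processor may combine these values differently.

```python
import json

# Fields documented as required, and defaults documented for optional fields.
REQUIRED_FIELDS = ("servers", "sourceField", "outputField")
DEFAULTS = {
    "language": "en",
    "maxNgramSize": 3,
    "minNgramSize": 2,
    "maxNumberOfKeywords": 20,
    "deduplication_algo": "seqm",
}
DEDUP_OPTIONS = {"leve", "jaro", "seqm"}


def normalize_config(raw: str) -> dict:
    """Parse a processor definition, check required fields, apply defaults.

    Hypothetical helper: it is not part of the processor itself.
    """
    config = json.loads(raw)
    missing = [name for name in REQUIRED_FIELDS if name not in config]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    for server in config["servers"]:
        for key in ("host", "port", "path"):
            if key not in server:
                raise ValueError(f"server entry is missing '{key}'")
    merged = {**DEFAULTS, **config}  # explicit values override defaults
    if merged["deduplication_algo"] not in DEDUP_OPTIONS:
        raise ValueError("deduplication_algo must be one of leve, jaro, seqm")
    return merged


def endpoint_url(server: dict, scheme: str = "http") -> str:
    """Build a URL from one servers entry (scheme and layout are assumed)."""
    path = str(server["path"]).strip("/")
    return f"{scheme}://{server['host']}:{server['port']}/{path}"


if __name__ == "__main__":
    raw = """
    {
      "servers": [{"host": "website", "port": 5000, "path": "yake"}],
      "type": "keyword-extraction-processor",
      "sourceField": "source",
      "outputField": "output"
    }
    """
    config = normalize_config(raw)
    print(config["language"], config["maxNumberOfKeywords"])
    print(endpoint_url(config["servers"][0]))
```

Optional fields left out of the definition pick up their documented defaults, so a minimal definition only needs servers, sourceField, and outputField.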

Configuration parameters:

servers.host

(Required, String) The host where the YAKE API is located.

servers.port

(Required, int) The port on the host where the YAKE API is located.

servers.path

(Required, String) The path on the host used to call the YAKE API.

sourceField

(Required, String) The field from which the keywords are extracted.

language

(Optional, String) The language of the text to be processed. Default: "en"

maxNgramSize

(Optional, int) Maximum n-gram size, that is, the longest contiguous sequence of words that can form a keyword. Default: 3

minNgramSize

(Optional, int) Minimum n-gram size, that is, the shortest contiguous sequence of words that can form a keyword. Default: 2

maxNumberOfKeywords

(Optional, int) Maximum number of keywords to extract. Default: 20

deduplication_algo

(Optional, String) Function used to detect and discard near-duplicate keywords. Options: "leve", "jaro" and "seqm". Default: "seqm"

outputField

(Required, String) Name of the field in which the array of keywords is stored.
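To make the effect of minNgramSize, maxNgramSize, deduplication_algo and maxNumberOfKeywords more concrete, here is a simplified sketch. It is not YAKE's algorithm: YAKE scores and ranks candidates, which is omitted here. The sketch also assumes that "seqm" behaves like a sequence-matcher similarity comparable to Python's difflib.SequenceMatcher, and the 0.9 threshold and all function names are hypothetical.

```python
from difflib import SequenceMatcher


def candidate_ngrams(text: str, min_size: int = 2, max_size: int = 3) -> list:
    """List every run of min_size..max_size contiguous words in the text."""
    words = text.lower().split()
    return [
        " ".join(words[i:i + n])
        for n in range(min_size, max_size + 1)
        for i in range(len(words) - n + 1)
    ]


def deduplicate(candidates: list, threshold: float = 0.9) -> list:
    """Keep a candidate only if it is not too similar to one already kept.

    The similarity measure stands in for "seqm"; the threshold is assumed.
    """
    kept = []
    for candidate in candidates:
        if all(SequenceMatcher(None, candidate, k).ratio() < threshold for k in kept):
            kept.append(candidate)
    return kept


def select_keywords(text: str, min_size: int = 2, max_size: int = 3,
                    max_keywords: int = 20, threshold: float = 0.9) -> list:
    """Generate candidates, drop near-duplicates, cap the result length."""
    candidates = candidate_ngrams(text, min_size, max_size)
    return deduplicate(candidates, threshold)[:max_keywords]


if __name__ == "__main__":
    print(candidate_ngrams("open source keyword extraction"))
    print(deduplicate(["keyword extraction", "keyword extractions", "search engine"]))
    print(select_keywords("a b c d e f", max_keywords=3))
```

In this sketch the n-gram bounds control which phrases are considered at all, deduplication removes phrases that differ only slightly from one already kept, and maxNumberOfKeywords caps the length of the array written to outputField.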

Additional Reference

YAKE reference
