Uses the YAKE API to extract relevant keywords from a text.

Configuration

Example configuration in a processor:

{
  "servers": [
    {
      "host": "website",
      "port": 5000,
      "path": "yake"
    }
  ],
  "name": "Keyword Extraction Processor",
  "active": true,
  "type": "keyword-extraction-processor",
  "sourceField": "source",
  "language": "en",
  "maxNgramSize": 3,
  "minNgramSize": 2,
  "maxNumberOfKeywords": 20,
  "deduplication_algo": "seqm",
  "outputField": "output",
  "id": "efe35dc7-fa16-4787-9362-db23395c96e8"
}
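The sketch below shows how the documented defaults combine with a configuration that sets only the required fields. It is a minimal illustration, not the processor's own code: `resolve_config` is a hypothetical helper, and the check that `minNgramSize` does not exceed `maxNgramSize` is an assumption rather than documented behavior.

```python
import json

# Documented defaults for the optional parameters.
DEFAULTS = {
    "language": "en",
    "maxNgramSize": 3,
    "minNgramSize": 2,
    "maxNumberOfKeywords": 20,
    "deduplication_algo": "seqm",
}
REQUIRED = ("sourceField", "outputField")          # documented as required
SERVER_REQUIRED = ("host", "port", "path")         # servers.* documented as required
DEDUP_OPTIONS = {"leve", "jaro", "seqm"}


def resolve_config(raw: dict) -> dict:
    """Hypothetical helper: validate required fields and fill documented defaults."""
    missing = [key for key in REQUIRED if key not in raw]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    servers = raw.get("servers") or []
    if not servers:
        raise ValueError("at least one server is required")
    for server in servers:
        absent = [key for key in SERVER_REQUIRED if key not in server]
        if absent:
            raise ValueError(f"server missing fields: {absent}")
    config = {**DEFAULTS, **raw}  # explicit values override defaults
    if config["deduplication_algo"] not in DEDUP_OPTIONS:
        raise ValueError(f"unknown deduplication_algo: {config['deduplication_algo']}")
    # Assumption: a minimum larger than the maximum is treated as an error.
    if config["minNgramSize"] > config["maxNgramSize"]:
        raise ValueError("minNgramSize must not exceed maxNgramSize")
    return config


example = json.loads("""
{
  "servers": [{"host": "website", "port": 5000, "path": "yake"}],
  "type": "keyword-extraction-processor",
  "sourceField": "source",
  "outputField": "output"
}
""")
resolved = resolve_config(example)
print(resolved["maxNumberOfKeywords"], resolved["deduplication_algo"])  # 20 seqm
```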

Configuration parameters:

servers.host

(Required, String) The host where the YAKE API is located.

servers.port

(Required, Int) The port on which the YAKE API is listening.

servers.path

(Required, String) The path used to call the YAKE API on that host.
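To make the relationship between the three `servers` fields concrete, here is one plausible way they could combine into a request URL. The `http` scheme and the `http://host:port/path` layout are assumptions; the processor may build its URL differently, and `yake_url` is a hypothetical name.

```python
from urllib.parse import urlunsplit


def yake_url(server: dict, scheme: str = "http") -> str:
    """Hypothetical helper: join servers.host, servers.port and servers.path.

    The scheme and URL layout are assumptions, not documented behavior.
    """
    path = "/" + str(server["path"]).lstrip("/")      # ensure a single leading slash
    netloc = f"{server['host']}:{int(server['port'])}"
    return urlunsplit((scheme, netloc, path, "", ""))


print(yake_url({"host": "website", "port": 5000, "path": "yake"}))
# http://website:5000/yake
```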

sourceField

(Required, String) The field from which to extract the keywords.

language

(Optional, String) The language of the text to be processed. Default: "en"

maxNgramSize

(Optional, Int) Maximum n-gram size, that is, the maximum number of contiguous words in an extracted keyword. Default: 3

minNgramSize

(Optional, Int) Minimum n-gram size, that is, the minimum number of contiguous words in an extracted keyword. Default: 2
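As an illustration of what the two n-gram bounds mean, the sketch below lists every run of `minNgramSize` to `maxNgramSize` contiguous words in a short text. This only shows the n-gram window; YAKE itself also filters and scores candidates (for example, discarding ones that start or end with stopwords), which is not modeled here. `candidate_ngrams` is a hypothetical name.

```python
def candidate_ngrams(tokens: list[str], min_n: int = 2, max_n: int = 3) -> list[str]:
    """List every contiguous run of min_n..max_n tokens (illustration only)."""
    if min_n < 1 or min_n > max_n:
        raise ValueError("need 1 <= min_n <= max_n")
    grams = []
    for n in range(min_n, max_n + 1):
        for start in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[start:start + n]))
    return grams


tokens = "keyword extraction with yake".split()
print(candidate_ngrams(tokens))
# ['keyword extraction', 'extraction with', 'with yake',
#  'keyword extraction with', 'extraction with yake']
```

With the defaults (2 and 3), single words are never returned as keywords.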

maxNumberOfKeywords

(Optional, Int) Maximum number of keywords to extract. Default: 20
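In YAKE, a lower score means a more relevant keyword, so limiting the result to `maxNumberOfKeywords` amounts to keeping the lowest-scored candidates. The sketch below shows that selection step; the scores are invented for illustration, and `top_keywords` is a hypothetical name.

```python
def top_keywords(scored: list[tuple[str, float]], max_keywords: int = 20) -> list[str]:
    """Keep the max_keywords best candidates; in YAKE a lower score is better."""
    ranked = sorted(scored, key=lambda pair: pair[1])  # ascending score = best first
    return [keyword for keyword, _score in ranked[:max_keywords]]


# Invented scores, for illustration only.
scores = [("keyword extraction", 0.02), ("text", 0.31), ("yake api", 0.05)]
print(top_keywords(scores, max_keywords=2))  # ['keyword extraction', 'yake api']
```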

deduplication_algo

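(Optional, String) Similarity function used to detect duplicate keyword candidates, so that near-identical keywords are not returned more than once. Options: "leve" (Levenshtein distance), "jaro" (Jaro-Winkler), and "seqm" (sequence matcher). Default: "seqm"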
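The general idea behind deduplication can be sketched as follows: walk the candidates from best to worst and drop any candidate that is too similar to one already kept. This is a minimal sketch, not YAKE's implementation: it uses Python's `difflib.SequenceMatcher` ratio as a stand-in similarity for all options, and the 0.9 threshold is an assumption.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1] from difflib (not YAKE's own function)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def deduplicate(ranked: list[str], threshold: float = 0.9) -> list[str]:
    """Keep a candidate only if it is not too similar to any already kept one.

    `ranked` is assumed to be ordered best-first; the threshold is an assumption.
    """
    kept: list[str] = []
    for candidate in ranked:
        if all(similarity(candidate, other) < threshold for other in kept):
            kept.append(candidate)
    return kept


print(deduplicate(["keyword extraction", "keyword extractions", "yake api"]))
# ['keyword extraction', 'yake api']
```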

outputField

(Required, String) Name of the field in which to store the array of extracted keywords.
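The way `sourceField` and `outputField` relate to a record can be sketched as: read the text from the source field, call the extractor, and write the keyword array to the output field. In this minimal sketch, `process_record` and `fake_extract` are hypothetical stand-ins for the processor and the YAKE API call, and passing records without usable text through unchanged is an assumption.

```python
from typing import Callable


def process_record(
    record: dict,
    source_field: str,
    output_field: str,
    extract: Callable[[str], list[str]],
) -> dict:
    """Hypothetical flow: read sourceField, write the keyword array to outputField."""
    text = record.get(source_field)
    if not isinstance(text, str) or not text.strip():
        # Assumption: records without usable text are passed through unchanged.
        return record
    return {**record, output_field: extract(text)}


def fake_extract(text: str) -> list[str]:
    """Placeholder for the YAKE API call; returns the first two words."""
    return text.split()[:2]


print(process_record({"source": "keyword extraction with yake"}, "source", "output", fake_extract))
# {'source': 'keyword extraction with yake', 'output': ['keyword', 'extraction']}
```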

Additional Reference

YAKE reference
