NLP Service Processor

Calls a spaCy API server to support information extraction and natural language understanding.

Configuration

Example configuration in a processor:

{
  "servers":[
    {
      "host": "website",
      "port": 80
    }
  ],
  "connectTimeout": 1000,
  "readTimeout": 1000,
  "name":"NLP Service Processor",
  "active":true,
  "type":"nlp-service-processor",
  "mode": "deps",
  "sourceField":"source",
  "collapse_punctuation": false,
  "collapse_phrases": true,
  "outputField": "output",
  "id": "efe35dc7-fa16-4787-9362-db23395c96e8"
}

Configuration parameters:

servers.host

(Required, String) The host where the spaCy API is located.

servers.port

(Required, Int) The port on which the spaCy API listens.

mode

(Required, String) The spaCy API call to make. Options: ent (named entities), deps (dependency parse).

sourceField

(Required, String) The specific field to be processed.

connectTimeout

(Optional, Int) Timeout to connect to the server, in milliseconds. Default: 60000 (1 minute)

readTimeout

(Optional, Int) Timeout to read from the server, in milliseconds. Default: 60000 (1 minute)

model

(Optional, String) The model installed on the server. Default: "en"

collapse_punctuation

(Optional, Boolean) Whether to merge punctuation onto the preceding token. Default: false

collapse_phrases

(Optional, Boolean) Whether to merge noun chunks and named entities into single tokens. Default: false

outputField

(Optional, String) Name of the field where the result is stored. Default: nlpOutput
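
The required fields and defaults listed above can be checked before a configuration is submitted. The following is a minimal Python sketch under stated assumptions, not part of the processor itself: the helper name `apply_defaults` and the validation logic are hypothetical, while the parameter names, allowed modes, and default values are taken from the list above.

```python
# Hypothetical helper (not part of the product): validates a processor
# configuration against the documented parameters and fills in the
# documented defaults for any optional parameter that is missing.

DEFAULTS = {
    "connectTimeout": 60000,        # milliseconds (1 minute)
    "readTimeout": 60000,           # milliseconds (1 minute)
    "model": "en",
    "collapse_punctuation": False,
    "collapse_phrases": False,
    "outputField": "nlpOutput",
}
VALID_MODES = {"ent", "deps"}


def apply_defaults(config):
    """Return a copy of config with defaults applied; raise ValueError if invalid."""
    servers = config.get("servers")
    if not servers:
        raise ValueError("servers is required")
    for server in servers:
        if not isinstance(server.get("host"), str):
            raise ValueError("servers.host is required and must be a string")
        if not isinstance(server.get("port"), int):
            raise ValueError("servers.port is required and must be an integer")
    if config.get("mode") not in VALID_MODES:
        raise ValueError("mode is required and must be one of: deps, ent")
    if not isinstance(config.get("sourceField"), str):
        raise ValueError("sourceField is required and must be a string")
    # Values supplied by the caller override the defaults.
    return {**DEFAULTS, **config}


config = apply_defaults({
    "servers": [{"host": "website", "port": 80}],
    "mode": "deps",
    "sourceField": "source",
})
print(config["outputField"])  # nlpOutput
```

Merging with `{**DEFAULTS, **config}` keeps every value the caller supplied and only fills in what is missing, which mirrors how the optional parameters are described above.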

Input/Output examples

deps

  • Processor
{
  "servers":[
    {
      "host": "website",
      "port": 80
    }
  ],
  "connectTimeout": 1000,
  "readTimeout": 1000,
  "name":"NLP Service Processor",
  "active":true,
  "type":"nlp-service-processor",
  "mode": "deps",
  "sourceField":"source",
  "collapse_punctuation": false,
  "collapse_phrases": true,
  "outputField": "output",
  "id": "efe35dc7-fa16-4787-9362-db23395c96e8"
}
  • Input
{
  "source": "They ate the pizza with anchovies"
}
  • Output
{
  "source": "They ate the pizza with anchovies",
  "output": {
    "arcs": [
      {
        "dir": "left",
        "start": 0,
        "end": 1,
        "label": "nsubj"
      },
      {
        "dir": "right",
        "start": 1,
        "end": 2,
        "label": "dobj"
      },
      {
        "dir": "right",
        "start": 1,
        "end": 3,
        "label": "prep"
      },
      {
        "dir": "right",
        "start": 3,
        "end": 4,
        "label": "pobj"
      },
      {
        "dir": "left",
        "start": 2,
        "end": 3,
        "label": "prep"
      }
    ],
    "words": [
      {
        "tag": "PRP",
        "text": "They"
      },
      {
        "tag": "VBD",
        "text": "ate"
      },
      {
        "tag": "NN",
        "text": "the pizza"
      },
      {
        "tag": "IN",
        "text": "with"
      },
      {
        "tag": "NNS",
        "text": "anchovies"
      }
    ]
  }
}
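
To read the deps result, each arc can be paired with the words it connects. The sketch below is illustrative only and assumes that `start` and `end` are indices into the `words` array, as the example output suggests; the function name `describe_arcs` is hypothetical, and the sample record is trimmed to two arcs for brevity.

```python
# Illustrative only: "record" mirrors the shape of the deps output above,
# trimmed to the first two arcs.
record = {
    "source": "They ate the pizza with anchovies",
    "output": {
        "arcs": [
            {"dir": "left", "start": 0, "end": 1, "label": "nsubj"},
            {"dir": "right", "start": 1, "end": 2, "label": "dobj"},
        ],
        "words": [
            {"tag": "PRP", "text": "They"},
            {"tag": "VBD", "text": "ate"},
            {"tag": "NN", "text": "the pizza"},
            {"tag": "IN", "text": "with"},
            {"tag": "NNS", "text": "anchovies"},
        ],
    },
}


def describe_arcs(output):
    """Pair each arc's start and end indices with the corresponding word text.

    Assumption: "start" and "end" index into output["words"].
    """
    words = output["words"]
    return [
        (words[arc["start"]]["text"], arc["label"], words[arc["end"]]["text"], arc["dir"])
        for arc in output["arcs"]
    ]


for start_word, label, end_word, direction in describe_arcs(record["output"]):
    print(f"{label}: '{start_word}' / '{end_word}' (dir={direction})")
```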

ent

  • Processor
{
  "servers":[
    {
      "host": "website",
      "port": 80,
      "path": "ent"
    }
  ],
  "name":"NLP Service Processor",
  "active":true,
  "type":"nlp-service-processor",
  "mode": "ent",
  "sourceField":"source",
  "outputField": "output",
  "id": "efe35dc7-fa16-4787-9362-db23395c96e8"
}
  • Input
{
  "source": "When Sebastian Thrun started working on self-driving cars at Google in 2007, few people outside of the company took him seriously."
}
  • Output
{
  "source": "When Sebastian Thrun started working on self-driving cars at Google in 2007, few people outside of the company took him seriously.",
  "output": [
    {
      "end": 20,
      "start": 5,
      "type": "PERSON"
    },
    {
      "end": 67,
      "start": 61,
      "type": "ORG"
    },
    {
      "end": 75,
      "start": 71,
      "type": "DATE"
    }
  ]
}
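
The ent result contains only offsets and types, so the entity text has to be recovered from the source field. The following sketch shows one way to do that, assuming `start` and `end` are character offsets into the source string with `end` exclusive, which is consistent with the example above; the function name `entity_texts` is hypothetical.

```python
# Illustrative only: "record" reproduces the ent output example above.
record = {
    "source": "When Sebastian Thrun started working on self-driving cars "
              "at Google in 2007, few people outside of the company took "
              "him seriously.",
    "output": [
        {"end": 20, "start": 5, "type": "PERSON"},
        {"end": 67, "start": 61, "type": "ORG"},
        {"end": 75, "start": 71, "type": "DATE"},
    ],
}


def entity_texts(record, source_field="source", output_field="output"):
    """Slice the source text with each entity's offsets.

    Assumption: "start" and "end" are character offsets, "end" exclusive.
    """
    text = record[source_field]
    return [(text[ent["start"]:ent["end"]], ent["type"]) for ent in record[output_field]]


print(entity_texts(record))
# [('Sebastian Thrun', 'PERSON'), ('Google', 'ORG'), ('2007', 'DATE')]
```

The field names are passed as arguments because `sourceField` and `outputField` are configurable in the processor.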

Additional Reference

spaCy reference

©2024 Pureinsights Technology Corporation