Hugging Face Processor

Processor for all Hugging Face functionality. It can be configured to perform the following actions:

Summarize

Use the Hugging Face Service API to summarize texts.

Note: Because hugging-face-service, the torch library, and their dependencies are not fully deterministic, results may differ between executions even when the input is the same. See the PyTorch docs.

Configuration

Example configuration in a processor:

{
  "type": "hugging-face-processor",
  "name": "Summarization Processor",
  "description": "my summarization processor",
  "labels": {},
  "active": true,
  "config": {
    "servers": [
      {
        "host": "localhost",
        "port": 8089
      }
    ],
    "connectTimeout": 60000,
    "readTimeout": 60000,
    "summarizationTextField": "content",
    "summarizationOutputField": "summary",
    "summarizationModel": "sshleifer/distilbart-cnn-12-6",
    "maxLength": 200,
    "minLength": 0,
    "cleanUp": true
  }
}

Configuration parameters:

servers.host

(Required, String) The host where the Hugging Face Service is located.

servers.port

(Required, Integer) The host port where the Hugging Face Service is located.

connectTimeout

(Optional, Integer, Defaults to 60000) The timeout to connect to the server in milliseconds.

readTimeout

(Optional, Integer, Defaults to 60000) The timeout to read from the server in milliseconds.

summarizationTextField

(Required, String) The name of the field with the text to summarize.

summarizationOutputField

(Optional, String, Defaults to 'summary') The name of the field for the generated summary.

summarizationModel

(Optional, String) The Hugging Face model to use. Available models can be checked here. If no model is specified, the first available model in the service is used.

maxLength

(Optional, Integer, Defaults to 200) The maximum length, in tokens, that the generated summary can have.

minLength

(Optional, Integer, Defaults to 0) The minimum length, in tokens, that the generated summary can have.

cleanUp

(Optional, Boolean, Defaults to true) Whether to remove potential extra spaces from the result.
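The cleanUp post-processing step can be sketched as follows. This is an illustrative assumption about what the option does (collapsing runs of whitespace and trimming the result), not the service's actual implementation; the function name is hypothetical.

```python
import re

def clean_up_summary(text: str, clean_up: bool = True) -> str:
    # Hypothetical sketch of the cleanUp option: when enabled,
    # collapse runs of whitespace into single spaces and trim the ends.
    if not clean_up:
        return text
    return re.sub(r"\s+", " ", text).strip()

print(clean_up_summary("A  summary  with   extra spaces. "))
# With clean_up=False the text is returned unchanged.
```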

Analyze Sentiment

Use the Hugging Face Service API to classify texts as positive, negative, or neutral, according to their perceived sentiment.

Note: Because hugging-face-service, the torch library, and their dependencies are not fully deterministic, results may differ between executions even when the input is the same. See the PyTorch docs.

Configuration

Example configuration in a processor:

{
  "type": "hugging-face-processor",
  "name": "Sentiment Analysis Processor",
  "description": "my sentiment analysis processor",
  "labels": {},
  "active": true,
  "config": {
    "servers": [
      {
        "host": "localhost",
        "port": 8089
      }
    ],
    "connectTimeout": 60000,
    "readTimeout": 60000,
    "sentimentTextField": "content",
    "sentimentOutputField": "sentiment",
    "sentimentModel": "cardiffnlp/twitter-roberta-base-sentiment-latest",
    "sentimentMinScore": 0.5
  }
}

Configuration parameters:

servers.host

(Required, String) The host where the Hugging Face Service is located.

servers.port

(Required, Integer) The host port where the Hugging Face Service is located.

connectTimeout

(Optional, Integer, Defaults to 60000) The timeout to connect to the server in milliseconds.

readTimeout

(Optional, Integer, Defaults to 60000) The timeout to read from the server in milliseconds.

sentimentTextField

(Required, String) The name of the field with the text whose sentiment will be analyzed.

sentimentOutputField

(Optional, String, Defaults to 'sentiment') The name of the field for the analyzed sentiment.

sentimentModel

(Optional, String) The Hugging Face model to use. Available models can be checked here. If no model is specified, the first available model in the service is used.

sentimentMinScore

(Optional, Float, Defaults to 0.5) The minimum score for a result to be considered.

outputFunction

(Optional, String, Defaults to 'default') The function to apply to the model outputs in order to retrieve the scores. Accepted values: 'sigmoid', 'softmax', and 'none'.
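The interplay of outputFunction and sentimentMinScore can be sketched as follows. The function names and the exact selection logic are illustrative assumptions about how raw model outputs are converted to scores and filtered; they are not the service's actual implementation.

```python
import math

def softmax(scores):
    # Normalize raw model outputs into a probability distribution.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(scores):
    # Map each raw output independently into (0, 1).
    return [1 / (1 + math.exp(-s)) for s in scores]

def pick_sentiment(raw_by_label, output_function="softmax", min_score=0.5):
    # Hypothetical sketch: apply the output function, then keep the
    # highest-scoring label only if it reaches sentimentMinScore.
    labels = list(raw_by_label)
    raw = list(raw_by_label.values())
    if output_function == "softmax":
        scores = softmax(raw)
    elif output_function == "sigmoid":
        scores = sigmoid(raw)
    else:  # 'none': use the raw outputs as scores
        scores = raw
    best_label, best_score = max(zip(labels, scores), key=lambda p: p[1])
    return best_label if best_score >= min_score else None

print(pick_sentiment({"negative": 0.1, "neutral": 0.2, "positive": 2.5}))
# → positive (its softmax score clears the 0.5 threshold)
```

Raising min_score makes the processor more conservative: if no label's score reaches it, no sentiment is returned.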

Classify

Use the Hugging Face Service API to classify texts into the best fitting label out of a set of candidate labels.

Note: Because hugging-face-service, the torch library, and their dependencies are not fully deterministic, results may differ between executions even when the input is the same. See the PyTorch docs.

Configuration

Example configuration in a processor:

{
  "type": "hugging-face-processor",
  "name": "Classification Processor",
  "description": "my classification processor",
  "labels": {},
  "active": true,
  "config": {
    "servers": [
      {
        "host": "localhost",
        "port": 8089
      }
    ],
    "connectTimeout": 60000,
    "readTimeout": 60000,
    "classificationTextField": "content",
    "classificationOutputField": "classification",
    "classificationModel": "facebook/bart-large-mnli",
    "classificationMinScore": 0.75,
    "labels": [],
    "multiLabel": false
  }
}

Configuration parameters:

servers.host

(Required, String) The host where the Hugging Face Service is located.

servers.port

(Required, Integer) The host port where the Hugging Face Service is located.

connectTimeout

(Optional, Integer, Defaults to 60000) The timeout to connect to the server in milliseconds.

readTimeout

(Optional, Integer, Defaults to 60000) The timeout to read from the server in milliseconds.

classificationTextField

(Required, String) The name of the field with the text to classify.

classificationOutputField

(Optional, String, Defaults to 'classification') The name of the field for the classification result.

classificationModel

(Optional, String) The Hugging Face model to use. Available models can be checked here. If no model is specified, the first available model in the service is used.

classificationMinScore

(Optional, Float, Defaults to 0.75) The minimum score for a result to be considered.

labels

(Required, List[String]) The candidate labels to classify the text into.

multiLabel

(Optional, Boolean, Defaults to false) Whether multiple candidate labels can be true.

hypothesisTemplate

(Optional, String, Defaults to 'This example is [].') The template used to turn each candidate label into an NLI-style hypothesis. The template must contain the substring '[]', which is replaced by each label.
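How hypothesisTemplate, multiLabel, and classificationMinScore fit together can be sketched as follows. Both functions are illustrative assumptions about the processor's behavior, not its actual implementation; the scoring step that produces per-label scores is omitted.

```python
def build_hypotheses(labels, template="This example is []."):
    # Each candidate label is substituted into the '[]' placeholder
    # to form an NLI-style hypothesis for the zero-shot model.
    if "[]" not in template:
        raise ValueError("hypothesisTemplate must contain the substring '[]'")
    return [template.replace("[]", label) for label in labels]

def select_labels(scores_by_label, multi_label=False, min_score=0.75):
    # Hypothetical sketch of result selection:
    # multiLabel=true  -> keep every label at or above classificationMinScore;
    # multiLabel=false -> keep only the single best label, if it qualifies.
    if multi_label:
        return [l for l, s in scores_by_label.items() if s >= min_score]
    best = max(scores_by_label, key=scores_by_label.get)
    return [best] if scores_by_label[best] >= min_score else []

print(build_hypotheses(["sports", "politics"]))
print(select_labels({"sports": 0.9, "politics": 0.8}, multi_label=True))
```

With multiLabel false, the candidate scores compete against each other and at most one label is returned; with multiLabel true, each label is judged independently against the threshold.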

©2024 Pureinsights Technology Corporation