Discovery API

Pureinsights Discovery Platform exposes a REST API that can be used by the UI components and can be called directly to configure and access OpenSearch features.

Likewise, the Admin API, the Discovery API can be configured to follow steps, described as processors, within an endpoint defined by the user.

Processor API

Create a processor

POST /admin/processor

Allows you to create a processor for different tasks related to search.

{
    "type": <String>,
    "description": <String>,
    ... Component-based configuration ...
}

Retrieve processors

GET /admin/processor

Returns a list of configured processors within the Discovery API.

Retrieve a processor

GET /admin/processor/<processor ID>

Retrieve processors configuration

Returns configuration for a given processor.

PUT /admin/processor/<processor ID>

Update a processor

Allows you to update a processor for different tasks related to search.

{
    "type": <String>,
    "description": <String>,
    ... Component-based configuration ...
}

Remove a processor

DELETE /admin/processor/<processor ID>

Removes a processor from the Discovery API.

Pre-defined processors

Logger

Log requests running through the Discovery API.

Configuration:

{
    "type": "logger",
    "description": "logger processor",
    <configuration>
}

Configuration parameters:

logMissingHeaders

(Optional, Boolean) If true, headers without values that were specifically asked to be logged are logged. Otherwise, headers without values are ignored. Defaults to true.

logMissingParameters

(Optional, Boolean) If true, query parameters without values that were specifically asked to be logged are logged. Otherwise, query parameters without values are ignored. Defaults to true.

singleValueArray

(Optional, Boolean) If true, single element arrays are converted to a value. Defaults to false.

group

(Optional, Boolean) If true, request body, headers and query parameters are grouped into a single object. Defaults to false.

noHeaders

(Optional, Boolean) If true, request headers are not logged. Defaults to false.

noParameters

(Optional, Boolean) If true, request query parameters are not logged. Defaults to false.

noBody

(Optional, Boolean) If true, request body is not logged. Defaults to false.

headers

(List<String>) Header names to explicitly log when queries are run. Defaults to all.

query

(List<String>) Query parameter names to explicitly log when queries are run. Defaults to all.

body

(List<JsonPointer >) Body fields to explicitly log when queries are run. Defaults to all.

Post

Post requests to a given instance.

Configuration:

{
    "type": "post",
    "description": "post processor",
    <configuration>
}

Configuration parameters:

method

(HTTP Method) HTTP method to use. If not set, it'll default to the incoming method.

defaultUrl

(String) Default URL to send to if nothing is found in the query parameters.

urlParameter

(Optional, String) Query parameter the URL to send to is found in.

passParameters

(Optional, List<String>) Parameters that will be passed though to the URL.

merge

(Optional, Boolean) If true, responses are merged before sending back to the caller. Defaults to true.

credentialsID

(Optional, UUID) The credentials ID for the search engine

connectTimeout

(Optional, Duration) The maximum time in Duration notation it takes to establish a connection to the server. Defaults to 30 seconds.

readTimeout

(Optional, Duration) The maximum time in Duration notation to wait for the server to send a response. Defaults to 10 seconds.

writeTimeout

(Optional, Duration) The maximum time in Duration notation to wait for the client to write the request to the server. Defaults to 10 seconds.

Elastic

Post requests to a given Elasticsearch instance using the official Java API (High level REST client).

Configuration:

{
    "component": "elasticsearch",
    "description": "Elasticsearch processor",
    <configuration>
}

Configuration parameters:

method

(HTTP Method) HTTP method to use when calling Elasticsearch. If not set, it'll default to the incoming method.

endpoint

(String) The endpoint to call Elasticsearch on.

contentType

(Optional, String) The content type to use for the request to ElasticSearch. For instance, on _msearch requests this is application/x-ndjson. Defaults to application/json

credentialsId

(Optional, UUID) The credentials ID for the search engine

connection

(Json) The Elasticsearch connection properties (credentials, scroll, bulk etc).

{
     "servers": List<ConnectionInfo>,
     "auth": AuthenticationCredentials,
     "scroll": {
         "size": Integer,
         "timeout": Duration
     }
}

servers (ConnectionInfo array) The search engine servers

auth (AuthenticationCredentials) The search engine authentication

scroll (Scroll) The configuration for scrolls when querying

Opensearch

Post requests to a given Opensearch instance.

Configuration:

{
    "type": "opensearch",
    "description": "opensearch processor",
    <configuration>
}

Configuration parameters:

method

(HTTP Method) HTTP method to use when calling Opensearch. If not set, it'll default to the incoming method.

endpoint

(String) The endpoint to call Opensearch on.

contentType

(Optional, String) The content type to use for the request to Opensearch. For instance, on _msearch requests this is application/x-ndjson. Defaults to application/json

credentialsId

(Optional, UUID) The credentials ID for the search engine

connection

(Json) The Opensearch connection properties (credentials, scroll, bulk etc).

{
     "servers": List<ConnectionInfo>,
     "auth": AuthenticationCredentials,
     "scroll": {
         "size": Integer,
         "timeout": Duration
     }
}

servers (ConnectionInfo array) The search engine servers

auth (AuthenticationCredentials) The search engine authentication

scroll (Scroll) The configuration for scrolls when querying

OpenSearch Manual Pre-Requisites

On the resources folder of this repository, the opensearch folder contains the necessary templates to properly use OpenSearch with other discovery components.

Featured Snippets

Execute mappings found in fs-mappings.json

 curl -H 'Content-Type: application/json' -X PUT 'http://<host>:<port>/_index_template/featured_snippets' --data-binary "@fs-mappings.json"

Frequently Asked Questions

Execute mappings found in faq-mappings.json

 curl -H 'Content-Type: application/json' -X PUT 'http://<host>:<port>/_index_template/frequently_asked_questions' --data-binary "@faq-mappings.json"

Mongo

Post database commands to a given MongoDB database instance. If array is set to true, all service responses are set in the field "/responses" as an array.

Configuration:

{
    "type": "mongo",
    "description": "MongoDB processor",
    <configuration>
}

Configuration parameters:

database

(Optional, String) The name of the MongoDB database from where the commands will be run

credentialsId

(Optional, UUID) The credentials ID for the search engine

array

(Optional, Boolean) Whether the incoming body contains a field "/requests" with an array of queries. Defaults to false.

fields

(Optional, Json) Defines the values used for interpolating the query.

{
     "collection": String,
     "index": String
}

collection

(Optional, String) The collection name when replacing the value ${mongoCollection} in the query.

index

(Optional, String) The index to use when replacing the value ${mongoIndex} in the query.

connection

(Json) The MongoDB connection properties (auth, url, servers etc). Either servers or url is required. If both are given, the url takes precedence.

{
     "servers": List<ConnectionInfo>,
     "auth": MongoCredential,
     "url": String,
     "tls": Boolean,
     "connection": HttpConnectionProperties
}

servers

(ConnectionInfo array) The search engine servers

url

(String) A MongoDB connection url, optionally containing the database name.

tls

(Boolean) Whether tls should be enabled. Defaults to false unless a Mongo Atlas url (mongo+srv://) is used in which case the default is true

connection

(Optional, HttpConnectionProperties) connectTimeout and readTimeout. Defaults to 10 seconds and 0 respectively.

Facets are a combination of aggregations and filters. This component exists to abstract away the search engine complexities to the end user. Supports disjunctive facets (i.e. selecting a filter value doesn't alter the counts for other existing filters)

Configuration:

{
    "component": "facets",
    "description": "Facets processor",
    <configuration>
}

Configuration parameters:

engine

(Optional, String) The main engine to use (Options are OPENSEARCH, ELASTICSEARCH, or MONGODB_ATLAS). If none given, defaults to OpenSearch.

aggregationsPath

(Optional, String) The path to return the aggregations.

disjunctiveFiltersPath

(Optional, String) The path to return the disjunctive filters.

facetFiltersPath

(Optional, String) The path to return the facet filters of the query.

filtersPath

(Optional, String) The path to return the filters query.

fields

(Map<String, Json>) Map of fields with the data for the query creation. Each field is named at the key.

{
    ...,
    "fields": {
        String: {
            "filter": Json,
            "disjunctive": Boolean
            "type": String,
            "facet": Json,
          }, 
        ...
    }
}

filter

(Optional, Json) The path to return the filters query.

disjunctive

(Optional, Boolean) Whether to create a disjunctive filter of this configuration. Defaults to false.

type

(String) The type of facet (Options are term or range).

facet

(Optional, Json) The facet json. These include the addition of configuration to a field depending on the type chosen.

type=term

{
     ...,
     "filterPathsLow": String[],
     "filterPathsHigh": String[],
}

filterPathsLow

(String[]) The path to use for setting the lt/lte field.

filterPathsHigh

(String[]) The path to use for setting the gt/gte field.

type=range

*For mongo engine, the boundaries range is described as a list on numbers divided by comma in string format, (i.e. in case of [1,2,3] should be "1,2,3")

{
    ...,
    "filterPaths": String[],
}

filterPaths

(String[]) The path to use for setting the query.

Script

Run scripts as part of the search query sent to the Discovery API.

Configuration:

{
    "type": "script",
    "description": "script processor",
    <configuration>
}

Configuration parameters:

language

(String) Language used to write the script. Available languages are groovy, python and javascript. Defaults to groovy.

script

(String) Script code to be executed as part of this processor.

Query Snap

Snaps to facets and their values from the query in the request, by examining the query for values from those facets. Where a match is found, the facet and value will be applied as a filter to the query and the value optionally removed from the main query text.

Configuration:

{
    "type": "snap",
    "description": "snap processor",
    <configuration>
}

Configuration parameters:

queryParameter

(Optional, String) The name of the query parameter that holds the query to snapped to. One of queryParameter or queryBody should be specified.

matcherLoadTimeout

(Optional, Duration) Milliseconds quantity to wait the matchers loading

queryBody

(Optional, JsonPointer) The path in the Json body of the query to snap to. One of queryParameter or queryBody should be specified.

snapTo

(Json) The list of facets that will be downloaded from the search engine and whose values will be snapped to snap to. The minDocCount is optional and it defaults to 1. It filters the minimum document count that match the facets.

{
     "index": String,
     "facets": List<String>,
     "minDocCount": Integer
}

For engine type mongodb_atlas: This engine limits the individual facet results to a configured amount. It is recommended to consider response size limit for Atlas support and performance.

{
     "database": String, 
     "collection": String,
     "facets": List<String>,
     "minDocCount": Integer
}

index (String) The index to load the facets from

facets (String array) The field names of the facets that should be loaded and snapped to

database (Required*, String) The name of the MongoDB database from where the data will be scanned

collection (Required*, String) The name of the MongoDB collection to retrieve data from

* The database name (database) and collection name (collection) need not be explicitly defined if they are included in the url parameter in the mongodb_atlas engine.

replaceQuery

(Optional, Boolean) If true, update the query in the query parameter or body, removing the tokens whose values matched facet values. Defaults to true.

reload

(Optional, Long) The number of milliseconds between reloads of the facet values. Defaults to 10 minutes.

engine

(Json) The configuration of the search engine holding the facets and values to snap to

{
     "servers": List<ConnectionInfo>,
     "credentials": BasicCredentials,
     "postFilter": boolean,
     "type": String,
     "aclField": String,
     "maxBuckets": Integer
}

servers (ConnectionInfo array) The search engine servers, required for ElasticSearch and OpenSearch.

url (String) A MongoDB connection url, optionally containing the database and collection names, used only for the MongoDB engine.

Example
- mongodb+srv://atlascluster.igqrdsg.mongodb.net/myDatabase.myCollection

* Only one connection method for the MongoDB engine (either servers or url is required. If both are given, the url takes precedence)

tls (Optional, Boolean) Whether tls should be enabled. Defaults to false.

credentialsId (Optional, UUID) The credentials ID for the search engine

postFilter (Optional, Boolean) If true the facet value query will be added as a post filter Defaults to true.

type

(String) The search engine type (Elastic/Open/Solr for instance). Currently, only elasticsearch,opensearch and mongodb_atlas are supported

aclField (String) The name of the search engine field to use for filtering facet values snapped to based on user permissions

maxBuckets (Integer) The amount of variations to be used in the facets query (results are ordered by repetition and field value). Defaults to 1000.

maskQueriesWithEntities

(Json, Optional) The list of entities and their masks to be replaced. Note if this field doesn't exist the processor will continue with the normal flow otherwise the mask query flow will start. Example:

{
     "[facet.keyword]": "[mask value]",
      ...
}

facet.keyword (String) This is the facet name value

mask value (String) The value to mask the facet name.

matchAllValues (boolean, Optional) When enabled, the filter created with the snapped facet values will add a terms clause per facet value instead of a terms clause per facet.

Template

The template component allows for the loading of Json objects into the request body, optionally based on the values of query parameters. This allows for loading of (say) a complex structure used for querying, filtering or sorting that can then have minor modifications made to it by a scripting component.

Configuration:

{
    "type": "template",
    "description": "template processor",
    <configuration>
}

Configuration parameters:

templates

(Optional, Json) Used in conjunction with the parameter option below, this holds a Json object containing a set of named templates. The top level field name is the template name. Its (object) value is the template. When selected the (object) value will replace or be merged into the request body.

defaultTemplate

(Optional, Json) The json describing the default template. This template will be chosen if there are no named templates or if the value in the query parameter (if specified) does not match one of the values in the parameter map.

replace

(Boolean) If true the template will completely replace the request body. Otherwise, the template will be merged into the existing request body. Defaults to false.

parameter

(Optional, Json) A Json object containing the name of a query parameter to examine, and a map of parameter values to template names. When used in conjunction with the templates parameter above, the value from the given parameter will be looked up in the template map to find the name of the template to use. This allows multiple parameter values to reference the same name template. Once the template name has been discovered, the template with that name will be found with in the named templates (from templates) and will replace or be merged with the request body. If the parameter value does not appear in the value map, or the named template does not exist, the defaultTemplate (if it exists) will be used instead.

{
     "name": String,
     "templates": Map<String,String>
}

name (String) The parameter whose value will be examined to decide which named template (from templates) will be used.

templates (Json) A simple Json object holding the mapping between the parameter value, and the named template to use. The field name will be matched against the parameter value, and the field value gives the name of the template to use.

Language Detector

This component detects the language from the input query. It expects a list of possible language to detect. The output language is added as part as a header.

Configuration:

{
   "type": "languageDetector",
   "description": "Language detector description",
    <configuration>
}

Configuration parameters:

languagesDetector

(Optional, Json) Languages to detect. The detection process will narrow down to only detect these languages. Full language names are expected in lower case, for example: ['spanish', 'english'] The default value is an empty list.

defaultLanguage

(Optional, String) If a language cannot be detected, then default to this value. Defaults to English.

minDistance

(Optional, Double) As per Lingua's documentation:

By default, Lingua returns the most likely language for a given input text. However, there are certain words that are spelled the same in more than one language. The word prologue, for instance, is both a valid English and French word. Lingua would output either English or French which might be wrong in the given context. For cases like that, it is possible to specify a minimum relative distance that the logarithmized and summed up probabilities for each possible language have to satisfy.

stripPunctuation

(Optional, Boolean) Removes punctuation from the query. Defaults to true

queryParameter

(Optional, String) Which query parameter to analyze. Defaults to q

Question Detector

This component detects whether a query is in question format or not. It analyzes a query to detect if it starts with a word that denotates a question or if it ends with a question mark (?). If any of those are met. Then a question is detected. If a question is not detected, then the flow terminates.

If a language has been previously detected (i.e. a custom response header exists), then the questions prefixes associated to that language will be used. If not, a default language will be selected.

Configuration:

{
  "type": "questionDetector",
  "description": "Question detector description",
  <configuration>
}

Configuration parameters:

questionPrefixes

(Optional, Json) Map of lists that contains the words that help to detect questions. For example:

"questionPrefixes": {
    "english": ["what", "who", "why", "where", "when", "how", "in", "from", "is", "to", "for", "does", "do"],
    "french": ["quel", "quels","quelle", "quelles", "qui", "ou", "où", "a", "à", "qui", "en", "combien", "quand", "comment", "pour", "pourquoi", "dans", "est", "quoi", "que", "qu" ]
  }

If defaults to a structure that contains the following English words: ["what", "who", "why", "where", "when", "how"]

includeResultInResponse

(Optional, boolean) Adds a boolean to the response to indicate the result of this question detection. Defaults to false.

queryParameter

(Optional, String) Which query parameter to analyze. Defaults to q

NLP Service

This component takes a query as input and send it to the spacy NLP service for analysis. Both entities and dependency responses from Spacy are added to the API request so they can be used by a downstream component.

Configuration:

{
  "type" : "NLPClient",
  "description" : "Performs NLP analysis using Spacy service.",
  <configuration>
}

Configuration parameters:

model

(Optional, String) Name of the spacy model to use. Can be overridden with the discovery-api-nlp-model response header.

Defaults to en

collapsePunctuation

(Optional, Boolean) Boolean value to tell Spacy whether to use the collapse punctuation feature during dependency parsing.

Defaults to false

collapsePhrases

(Optional, Boolean) Boolean value to tell Spacy whether to use the collapse phrases feature during dependency parsing.

Defaults to false

queryParameter

(Optional, String) Which query parameter to analyze.

Defaults to q

servers

(Json) The configuration of the spacy endpoint(s).

{
  "servers": List<ConnectionInfo>,
}

Chat GPT

This component takes a query and/or a series of contexts as various prompts that are send to the Open AI API's chat completion endpoint (see here). The prompts built by the component are sent as one or two messages, one with the system role and another with the user role. The assitant role message returned by the API is added to the response body.

Configuration:

{
  "type" : "chatGpt",
  "description" : "Uses Open AI's chat completion models to generate a response to the query",
  <configuration>
}

Configuration parameters:

systemPrompt

(Optional, String) The first part of the system message sent in the chat completion request. This text will be followed by the contents of the context sources if present, separated by a newline. If blank, or not specified, then no text will be added to the system message, unless context sources are present

messageField

(Optional, String) The field in the response's body where the chat completion response message will be put. Defaults to 'chatGptMessage'

contextSources

(Optional, String) The field in the request's body with the text or array of texts that will be included as context in the system message. The contexts will be added to the message after the systemPrompt value, separated by a newline. If the field specified by contextSources is an array, each context will be added and also separated by a newline. If this parameter isn't configured, no contexts will be added to the system message, and if the systemPrompt param is also not configured, then no system message will be sent at all.

credentialsId

(UUID) The id of the Credentials with the request cooldown config and authentication settings. This processor follows the same behaviour regarding credentials, request cooldown, and backoff policy as the Embeddings Search processor, but with a chat completion request instead of an embeddings one. Refer to said processors' credentials and request cooldown section for details.

Chat GPT processors and embeddings search processors won't share a cooldown, even if they have the same credential and same timeout config.

model

(Optional, String) The Open AI chat completion model to use in requests, defaults to gpt-3.5-turbo. The model to use can be overriden if there is a query parameter chatCompletionModel in the request, in which case the query parameter's value will be used.

temperature

(Optional, Double) What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. Defaults to 0.

topP

(Optional, Double) An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. Defaults to 1.

It is recommended to alter the temperature or topP but not both.

timeout

(Optional, String) The timeout on embeddings requests. Expressed in Duration notation, defaults to PT120S

user

(Optional, String) The user to be included in embeddings requests. If specified as null, no user will be included in requests. If not specified at all, defaults to pdp-content-processor.

backoffType

(Optional, String) The type of backoff to apply to the retries of the check for a passed cooldown. Options are none, constant, and exponential. Defaults to constant.

A backoff policy of type none can be useful if one wishes that requests never wait for the backoff to return a result, or in other words, if an immediate response is wanted.

backoffInitialDelay

(Optional, String) The initial delay between backoff checks for a passed cooldown. Expressed in Duration notation, defaults to PT10S

backoffMaxRetries

(Optional, Integer) Maximum amount of times a processor tries to check for a passed cooldown before failing the requests's processing. Defaults to 25

queryParameter

(Optional, String) The query parameter to take as user message in the chat completion request. Defaults to q. If the query parameter isn't included, only a system role message will be sent as part of the chat completion request. If included, the user role message comes after the system role one.

Replace Entities

This component detects in an array of cypher queries the entity to be changed.

Configuration:

{
  "type": "replaceEntities",
  "description": "Replace Entity description",
  <configuration>
}

Configuration parameters:

numberOfQueriesToUse

(Optional, Integer) Number of queries which are going to be used in this component. Defaults to 1.

entityDelimiter

(Optional, String) The delimiter used to detect entities. Defaults to "__".

replacementDelimiter

(Optional, String) if 'useQuotes' is true, this value will be used to replace the 'entityDelimiter'. Defaults to "'".

useQuotes

(Optional, Boolean) if the flag is true single quotes will be added between the entity replaced, otherwise single quotes won't be added. Defaults to true.

toReplaceWithinEntity (Optional, String/regex) This field contains the string to be replaced by the 'replacementWithinEntity' in the entity. Defaults to "'"

replacementWithinEntity (Optional, String) the new entity value. Defaults to "\'"

Neo4J

This component runs all cypher queries which are stored in the context Object as "cypherQuery".

Note: If the value is a list for example:

[
  {
    "entity": "x",
    "attribute": "y",
    "value": ["a", "b", "c"]
  },
  {
    "entity": "x",
    "attribute": "y",
    "value": ["d","b","f"]
  }
]

The value will be dedupe and flatten:

{
  "entity": "x",
  "attribute": "y",
  "value": ["a","b","c","d","f"]
}

Configuration:

{
  "type": "neo4j",
  "description": "neo4j description",
  <configuration>
}

Configuration parameters:

emptyResponseMap (Optional, Json) This text will be used in cases where an empty value comes from neo4j. Defaults to:

{
  "english": "Empty source data",
  "french" : "Données source vides"
}

The language will be taken from the header, if there's no language in the header, "english" will be the default text.

url (Optional, String) A Neo4j connection url

credentialsId (Optional, UUID) The credentials ID for Neo4j connection

database (Optional, String) An specific neo4j db to be pointed. Defaults to "neo4j" db.

Vector Query

This component creates a vector query for the given query string. The vector query is based on the template given or a default template if none is given.

The query may use a vector only, or may include a minimum should match element depending on configuration.

The default templates are tied to a specific engine type (ie opensearch, elasticsearch, or mongodb_atlas) which can be specified in the configuration

Configuration:

{
  "type": "replaceEntities",
  "description": "Replace Entity description",
  <configuration>
}

Configuration parameters:

engineType

(String, Optional) The type of search engine to produce queries for when using default templates. Defaults to opensearch. Supported engines are: opensearch, elasticsearch, mongodb_atlas

merge

(Optional, boolean) Replace or merge the query into the body. Defaults to false.

stripPunctuation

(Optional, boolean) Strip punctuation. Defaults to true.

queryParameter

(Optional, String) The (query) parameter. Defaults to 'q'.

minimumLength

(Optional, int) The minimum length of a query to process. Defaults to 0.

vectorField

(String) The field to form the vector query for.

minScore

(Optional, float) Minimum score for results. Defaults to 0.92f.

maxResults

(Optional, int) Maximum number of results. Defaults to 20.

servers

(Json) Credentials to be able to connect in to Bert API.

minimumShouldMatch (Json) Add MinimumShouldMatch template into the query.

{
  "minimumShouldMatch": {
    "total": "1<-1",
    "indexField": "vectors.text",
    "matchesField": {
      "should": ["a", "b"],
      "must": "c"
    }
  }
}

total (String) Total matches.

indexField (String) indexField.

matchesClause (String, Optional) The name of the clause to use (i.e "match", "term"). Default: "match". If mongodb_atlas is set as the engine, then "text" or "regex" should be used as the matchesClause. Others are not supported.

matchesField (HashMap) These are the fields which are going to be added in the should match template. The key is the field where the value (String or Array) will be added into the template.

clauses (HashMap) A map of clauses and its indexField and targetField configuration. This parameter makes it easy to add multiple clauses and different indexFields for each of them. The indexField refers to the field from the ElasticSearch/OpenSearch index where the clause should try to check for. The targetField configures the name of variable to search from in the response.

{
  "minimumShouldMatch": {
    "total": "1<-1",
    "indexField": "vectors.text",
    "matchesField": {
      "should": ["a", "b"],
      "must": "c"
    },
    "clauses": {
      "term": {
        "indexField": "all_filters",
        "targetField": {
          "should": "all_filters"
        }
      }
    }
  }
}

It can work as an extension of the default clause or to replace completely the default configuration as:

{
  "minimumShouldMatch": {
    "total": "1<-1",
    "clauses": {
      "match": {
        "indexField": "vectors.text",
        "targetField": {
          "should": "a"
        },
        "term": {
          "indexField": "all_filters",
          "targetField": {
            "should": "b"
          }
        }
      }
    }
  }
}

Embeddings search

This component creates a vector query for the given query string. The vector query is based on the template given or a default template if none is given.

The query may use a vector only, or may include a minimum should match element depending on configuration.

Open AI's API embeddings feature (see here) is used to get a vector representation of the query.

The templates given and the component in general only supports the creation of queries intended for Elasticsearch.

Configuration:

{
  "type": "embeddingsSearch",
  "description": "Embeddings search description",
  <configuration>
}

Configuration parameters:

merge

(Optional, boolean) Replace or merge the query into the body. Defaults to false.

stripPunctuation

(Optional, boolean) Strip punctuation. Defaults to true.

persistVectorsInContext

(Optional, boolean) Whether to persist query vectors in the Discovery API context. Defaults to false

credentialsId

(UUID) The id of the Credentials with the request cooldown config and authentication settings. This processor requires a special type of credentials to work, please see the next section for details.

queryParameter

(Optional, String) The (query) parameter. Defaults to 'q'.

minimumLength

(Optional, int) The minimum length of a query to process. Defaults to 0.

vectorField

(String) The field to form the vector query for.

minScore

(Optional, float) Minimum score for results. Defaults to 0.92f.

maxResults

(Optional, int) Maximum number of results. Defaults to 20.

embeddingsConfig

(Optional, Json) The configuration of the embeddings requests done to Open AI's API. Configuration parameter details are given at the end of this section

{
  "embeddingsConfig": {
    "model": "text-embedding-ada-002",
    "user": "pdp-embeddings",
    "timeout": "PT120S",
    "backoffType": "exponential",
    "backoffInitialDelay": "PT10S",
    "backoffMaxRetries": 7,
  }
}

minimumShouldMatch

(Json) Add MinimumShouldMatch template into the query.

{
  "minimumShouldMatch": {
    "total": "1<-1",
    "indexField": "vectors.text",
    "matchesField": {
      "should": ["a", "b"],
      "must": "c"
    }
  }
}

total (String) Total matches.

indexField (String) indexField.

matchesClause (String, Optional) The name of the clause to use (i.e "match", "term"). Default: "match". If mongodb_atlas is set as the engine, then "text" or "regex" should be used as the matchesClause. Others are not supported.

matchesField (HashMap) These are the fields which are going to be added in the should match template. The key is the field where the value (String or Array) will be added into the template.

clauses (HashMap) A map of clauses and its indexField and targetField configuration. This parameter makes it easy to add multiple clauses and different indexFields for each of them. The indexField refers to the field from the ElasticSearch/OpenSearch index where the clause should try to check for. The targetField configures the name of variable to search from in the response.

{
  "minimumShouldMatch": {
    "total": "1<-1",
    "indexField": "vectors.text",
    "matchesField": {
      "should": ["a", "b"],
      "must": "c"
    },
    "clauses": {
      "term": {
        "indexField": "all_filters",
        "targetField": {
          "should": "all_filters"
        }
      }
    }
  }
}

It can work as an extension of the default clause or to replace completely the default configuration as:

{
  "minimumShouldMatch": {
    "total": "1<-1",
    "clauses": {
      "match": {
        "indexField": "vectors.text",
        "targetField": {
          "should": "a"
        },
        "term": {
          "indexField": "all_filters",
          "targetField": {
            "should": "b"
          }
        }
      }
    }
  }
}

Configuration parameters of an embeddingsConfig object

model

(Optional, String) The Open AI embeddings model to use in requests, defaults to text-embedding-ada-002

timeout

(Optional, String) The timeout on embeddings requests. Expressed in Duration notation, defaults to PT120S

user

(Optional, String) The user to be included in embeddings requests. If specified as null, no user will be included in requests. If not specified at all, defaults to pdp-content-processor.

backoffType

(Optional, String) The type of backoff to apply to the retries of the check for a passed cooldown. Options are none, constant, and exponential. Defaults to constant.

A backoff policy of type none can be useful if one wishes that requests never wait for the backoff to return a result, or in other words, if an immediate response is wanted.

backoffInitialDelay

(Optional, String) The initial delay between backoff checks for a passed cooldown. Expressed in Duration notation, defaults to PT10S

backoffMaxRetries

(Optional, Integer) Maximum amount of times a processor tries to check for a passed cooldown before failing the request's processing. Defaults to 25

Credentials

This processor requires a special type of credentials due to the authentication to the Open AI API being done through an API token. The credentials also allow some configuration of the request cooldown functionality, which will be explained latter. The processor expects the presence of the following values in the config of the credential:

{
  "config": {
    "token": "<AnAPIToken>",
    "organizationId": "<AnOrgId>",
    "requestCooldown": "PT60S"
  }
}

Configuration parameters:

token

(Required, String) The API token used to the authenticate to Open AI's API.

organizationId

(Optional, String) An organization id to be included with requests to Open AI's API. If null, or not present, no organization id will be included in requests.

requestCooldown

(Required, String) When a request made by a processor with these credentials returns a rate limit error (see here), requests using this Credential (even in other processors and / or requests) will be put in cooldown and thus rejected for the length of this Duration. The shared request cooldown is, nonetheless, limited to processors with the same timeout configuration value. Defaults to PT65S

And so a full credential that can be used with this processor looks like this:

  {
  "type": "open-ai-component",
  "name": "chat-gpt personal credentials",
  "description": "Credentials for ChatGPT",
  "config": {
    "token": "<AnAPIToken>",
    "organizationId": "<AnOrgId>",
    "requestCooldown": "PT60S"
  }
}

The credential must be of type open-ai-component, otherwise the processors using them will fail.

Request cooldown

Embeddings Search processors come with a built-in request cooldown functionality to try to handle and minimize rate limit errors returned by Open AI API. How it works is that once a request made by a processor returns a rate limit error, the requests of all processors that share the same credential config values and timeout config value will be put in cooldown for the length of time specified in the credential. This cooldown can be applied again if, once the first cooldown has passed, a new request receives a rate limit error.

Each individual processor can be configured to check for the passing of this cooldown in a different manner through its backoff policy related config values. Note that this backoff affects only how many times and how often the check for the passed cooldown is made, not how for how long the requests will be rejected because of the cooldown.

It is recommended to have the backoff configured so that the processors only try to check for the ending of the cooldown for a reasonable amount of time, before failing due to the maximum amount of retries. This is because if a rate limit error is received because of a daily token limit or insufficient credit, then the cooldown will just be reapplied again and again and the processor could get stuck.

It is important to note that the request cooldown mechanism isn't perfect, and requests made too closely together in time by two or more processors sharing a cooldown can all receive a rate limit error, thus possibly increasing the time Open AI API's will be returning the same error. That being said, the requests have to be made at almost the same time for this happen, so it shouldn't be common. No matter how many requests receive the rate limit error, only one cooldown will be applied at a time.

Given that the rate limiting on Open AI API's side applies at the organization level first, and API token second, it is recommended to keep as many processors as possible sharing the same cooldown if they share the same API token and organization id in their credentials. Or in other words, it is recommended, if two or more processors share the same API token and organization id, to have them share the same requestCooldown value in the credential and timeout value in their configs, because then they will have a shared request cooldown and so less rate limit errors will be returned.

Featured Snippets

This component calls the HuggingFace service with a question and a list of text chunks.

Note: Due to hugging-face-service, torch library and dependencies not being fully deterministic scores or results may provide different responses in separated executions even if the input is the same. See Pytorch docs.

Configuration:

{
    "type": "featuredSnippet",
    "description": "Replace featured snippets description",
    <configuration>
}

Configuration parameters:

modelParameter

(Optional, String) The name of the query parameter to specify the Hugging Face model. Defaults to 'model'.

This query parameter receives: (Optional, String) The Hugging Face model to use. Available models can be checked here. If no model is specified the first available model in the service will be used.

queryParameter

(Optional, String) The name of the query parameter to specify the question. Defaults to 'q'.

This query parameter receives: (Required, String) The query to evaluate in the text chunks.

servers

(Json) Credentials to be able to connect in to HuggingFace API.

connection

(Json) connectTimeout and readTimeout.

minScore

(Optional, float) Minimum score for results. Defaults to 0.75f.

idField

(String) Path where the id is located in the ES/OS hit.

textField

(String) Path where the text is located in the ES/OS hit.

resultsField

(Optional, String) Path where the results are located. For ES/OS the field is 'hits.hits', for MongoAtlas is 'cursor.firstBatch'. When not configuring this parameter the default value is 'hits.hits'.

metadataField (Json, optional) Paths of extra data to extract in the ES/OS hit. for example

{
  "metadataField": {
    "uri": "_source.uri",
    ...
  }
}

Summarization

This component calls the Hugging Face service with a list of text chunks to summarize.

Note: Due to hugging-face-service, torch library and dependencies not being fully deterministic, results may differ in separate executions even if the input is the same. See Pytorch docs.

Configuration:

{
    "type": "summarization",
    "description": "Replace summarization description",
    "config": {
      "servers": {
          "host": "localhost",
          "port": 8089
      },
      "idField": "_id",
      "textField": "_source.content",
      ...
    }
}

Configuration parameters:

modelParameter

(Optional, String, Defaults to 'model') The name of the query parameter to specify the Hugging Face model.

This query parameter receives: (Optional, String) The Hugging Face model to use. Available models can be checked here. If no model is specified the first available model in the service will be used.

maxLengthParameter

(Optional, String, Defaults to 'maxLength') The name of the query parameter to specify the max token length of the result.

This query parameter receives: (Optional, Integer, Defaults to 200) The maximum length the generated tokens can have.

minLengthParameter

(Optional, String, Defaults to 'minLength') The name of the query parameter to specify the min token length of the result.

This query parameter receives: (Optional, Integer, Defaults to 0) The minimum length the generated tokens can have.

cleanUpParameter

(Optional, String, Defaults to 'cleanUp') The name of the query parameter to specify whether to remove extra spaces in the result.

This query parameter receives: (Optional, Boolean, Defaults to true) Whether to clean up the potential extra spaces in the result.

servers.host

(Required, String) The host where the Hugging Face Service is located.

servers.port

(Required, Integer) The host port where the Hugging Face Service is located.

connection.connectTimeout

(Optional, Integer, Defaults to 60000) The timeout to connect to the server in milliseconds.

connection.readTimeout

(Optional, Integer, Defaults to 60000) The timeout to read from the server in milliseconds.

idField

(Required, String) Path where the id is located in the ES/OS hit.

textField

(Required, String) Path where the text is located in the ES/OS hit.

resultsField

(Optional, String, Defaults to 'hits.hits') Path where the results are located. For ES/OS the field is 'hits.hits', for Mongo Atlas it's 'cursor.firstBatch'.

metadataField

(Optional, Json) Paths of extra data to extract in the ES/OS hit. for example:

{
  "metadataField": {
    "uri": "_source.uri",
    ...
  }
}

Sentiment Analysis

This component calls the Hugging Face service with a list of text chunks to classify them into positive, negative or neutral according to their perceived emotion.

Note: Due to hugging-face-service, torch library and dependencies not being fully deterministic, results may differ in separate executions even if the input is the same. See Pytorch docs.

Configuration:

{
  "type": "sentimentAnalysis",
  "description": "Replace sentiment analysis description",
  "config": {
    "servers": {
      "host": "localhost",
      "port": 8089
    },
    "idField": "_id",
    "textField": "_source.content",
    ...
  }
}

Configuration parameters:

modelParameter

(Optional, String, Defaults to 'model') The name of the query parameter to specify the Hugging Face model.

This query parameter receives: (Optional, String) The Hugging Face model to use. Available models can be checked here. If no model is specified the first available model in the service will be used.

functionParameter

(Optional, String, Defaults to 'outputFunction') The name of the query parameter to specify the function to apply to the model outputs.

This query parameter receives: (Optional, String, Defaults to 'default') The function to apply to the model outputs in order to retrieve the scores. Accepts values: 'sigmoid' 'softmax' 'none'.

minScore

(Optional, Float, Defaults to 0.5) The minimum score for a result to be considered.

servers.host

(Required, String) The host where the Hugging Face Service is located.

servers.port

(Required, Integer) The host port where the Hugging Face Service is located.

connection.connectTimeout

(Optional, Integer, Defaults to 60000) The timeout to connect to the server in milliseconds.

connection.readTimeout

(Optional, Integer, Defaults to 60000) The timeout to read from the server in milliseconds.

idField

(Required, String) Path where the id is located in the ES/OS hit.

textField

(Required, String) Path where the text is located in the ES/OS hit.

resultsField

(Optional, String, Defaults to 'hits.hits') Path where the results are located. For ES/OS the field is 'hits.hits', for Mongo Atlas it's 'cursor.firstBatch'.

metadataField

(Optional, Json) Paths of extra data to extract in the ES/OS hit. for example:

{
  "metadataField": {
    "uri": "_source.uri",
    ...
  }
}

Zero Shot Classification

This component calls the Hugging Face service with a list of text chunks and a list of labels, to classify them into the best fitting label.

Note: Due to hugging-face-service, torch library and dependencies not being fully deterministic, results may differ in separate executions even if the input is the same. See Pytorch docs.

Configuration:

{
  "type": "classification",
  "description": "Replace zero shot classification description",
  "config": {
    "servers": {
      "host": "localhost",
      "port": 8089
    },
    "idField": "_id",
    "textField": "_source.content",
    ...
  }
}

Configuration parameters:

modelParameter

(Optional, String, Defaults to 'model') The name of the query parameter to specify the Hugging Face model.

This query parameter receives: (Optional, String) The Hugging Face model to use. Available models can be checked here. If no model is specified the first available model in the service will be used.

labelsParameter

(Optional, String, Defaults to 'label') The name of the query parameter to specify the candidate labels.

This query parameter receives: (Required, String) One candidate label to classify the text into. To use more labels just repeat the parameter with each one individually.

multiLabelParameter

(Optional, String, Defaults to 'multiLabel') The name of the query parameter to specify whether more than one label can be true.

This query parameter receives: (Optional, Boolean, Defaults to false) Whether multiple candidate labels can be true.

hypothesisTemplateParameter

(Optional, String, Defaults to 'hypothesisTemplate') The name of the query parameter to specify the template to turn labels into a hypothesis.

This query parameter receives: (Optional, String, Defaults to 'This example is [].') The template used to turn each label into an NLI-style hypothesis. The template is required to contain the substring '[]', as it is used to place each label.

minScore

(Optional, Float, Defaults to 0.75) The minimum score for a result to be considered.

servers.host

(Required, String) The host where the Hugging Face Service is located.

servers.port

(Required, Integer) The host port where the Hugging Face Service is located.

connection.connectTimeout

(Optional, Integer, Defaults to 60000) The timeout to connect to the server in milliseconds.

connection.readTimeout

(Optional, Integer, Defaults to 60000) The timeout to read from the server in milliseconds.

idField

(Required, String) Path where the id is located in the ES/OS hit.

textField

(Required, String) Path where the text is located in the ES/OS hit.

resultsField

(Optional, String, Defaults to 'hits.hits') Path where the results are located. For ES/OS the field is 'hits.hits', for Mongo Atlas it's 'cursor.firstBatch'.

metadataField

(Optional, Json) Paths of extra data to extract in the ES/OS hit. for example:

{
  "metadataField": {
    "uri": "_source.uri",
    ...
  }
}

Feedback Capture

This component captures various values as a feedback, and puts the feedback data into the request body within the Discovery API.

Prerequisites:

Before capture feedback from users make sure to set up the proper index mapping for the feedback index. You can look at reference index mappings at the feedbackMappings.json file under the resources/feedback/<engine type>/ folder.

Configuration:

{
    "type": "feedbackCapture",
    "description": "This processor captures feedback data",
    <configuration>
}

Configuration parameters:

hashAlgorithm

(String) Algorithm to be used for hashing the answer returned by search. Defaults to MD5.

{
  "hashAlgorithm": "MD5"
}

API parameters:

endpoint

(String, Required) Endpoint for which feedback is given.

value

(Integer, Required) Value given as feedback. Value must be -1 for negative feedback, 1 for positive feedback.

type

(String, Required) Query type for which feedback is given. Valid values: SEARCH, KG, FS and FAQ.

isBanned

(Boolean) Flag to indicate if result should be banned from returned results. Defaults to false.

queryVector

(List<Double>, Required) Vector containing the query embeddings sent by the user.

question

(String) Question asked by the user.

answer

(String, Required) Suggested answer returned to the user.

comments

(String) Any additional comments to persist with the feedback.

Apply Feedback

This component applies the captured feedback into a set of results.

Results can be either boost positively or negatively, as well as banned from the set of results.

It is recommended to put this component as the final step in the search pipeline.

Assumptions:

Query vector for the question set should be present at the Discovery API context, this is set automatically by the VectorQuery component.
Answer type (KG, FS) must be present as part of the Discovery API response, this has to be set in previous steps in the pipeline.

Configuration:

{
    "component": "applyFeedback",
    "description": "This processor applies captured feedback to the set of results",
    <configuration>
}

Configuration parameters:

hashAlgorithm

(String) Algorithm to be used for hashing the answer returned by search. Defaults to MD5.

{
  "hashAlgorithm": "MD5"
}

operation

(String) Operation to be applied to the results score and the set weight. Defaults to MULTIPLIER.

{
  "operation": "MULTIPLIER|ADDITION"
}

weight

(Float) Weight to be applied to the results scores.

{
  "weight": 0.05
}

numberOfReviews

(Optional, Integer) Number of feedback reviews to retrieve for each answer. Defaults to 200.

{
  "numberOfReviews": 100
}

questionSimilarityMinScore

(Float) Minimum score for questions similarity. Question similarity ranges from 1-2. Defaults to 1.3.

{
  "questionSimilarityMinScore": 1.3
}

answerMinScore

(Float) Minimum score for considering an answer correct. Value can be from 0 to 1. Defaults to 0.25.

{
  "answerMinScore": 0.25
}

thresholds

(List<Integer>) Weights can be escalated based on given thresholds by 2 powered to the number of reviews that fit on threshold. Defaults to [5, 10, 20].

{
  "thresholds": [10, 100, 1000]
}

feedback

(Json) Configuration related to retrieving feedback reviews.

{
  "feedback": {
    "fields": {...},
    "reviews": {...},
    "results": {...},
    "storage": {...}
  }
}

fields (Json) Field names associated to captured feedback.

{
  "fields": {
    "vector": "queryVector",
    "answer": "answerHash",
    "banned": "banned",
    "value": "value"
  }
}

vector (String) Field where query vectors are persisted on captured feedback. Defaults to queryVector.

answer (String) Field where hashed answer is persisted on captured feedback. Defaults to answerHash.

banned (String) Field where flag to ban an answer is persisted on captured feedback. Defaults to banned.

value (String) Field where feedback value is persisted on captured feedback. Defaults to value.

reviews (Json) Pointers to extract feedback reviews from aggregated feedback fetching.

{
  "reviews": {
    "pointer": "/aggregations/reviews/buckets",
    "keyPointer": "/key",
    "valuesPointer": "/values/buckets",
    "valuesKeyPointer": "/key",
    "valuesCountPointer": "/doc_count",
    "bannedPointer": "/banned/doc_count"
  }
}

pointer (JsonPointer) Pointer to look up for the aggregated feedback reviews.

keyPointer (JsonPointer) Pointer to look up for the hashed answer value of the aggregated feedback reviews.

valuesPointer (JsonPointer) Pointer to look up for each feedback review of a given answer.

valuesKeyPointer (JsonPointer) Pointer to look up for the feedback type of each feedback review.

valuesCountPointer (JsonPointer) Pointer to look up for the feedback count of each feedback review.

bannedPointer (JsonPointer) Pointer to look up if the given answer should be banned.

results (Json) Pointers to extract computed answers returned by the Discovery API.

{
  "results": {
    "originalResults": "/results_original",
    "acceptedResults": "/results",
    "answer": "/answer",
    "score": "/score"
  }
}

originalResults (Optional, JsonPointer) Pointer to original results returned by the Discovery API. Defaults to /results_original

acceptedResults (Optional, JsonPointer) Pointer to accepted result returned by the Discovery API. Defaults to /results.

answer (Optional, JsonPointer) Pointer to each answer value returned by the Discovery API. Defaults to /answer.

score (Optional, JsonPointer) Pointer to each answer score returned by the Discovery API. Defaults to /score.

engine (Json) Captured feedback persistence storage.

{
  "engine": {
    "type": "ELASTICSEARCH",
    "queryTemplate": {...},
    "endpoint": "feedback/_search",
    "credentials": {...},
    "connection": {...}
  }
}

Specific fields for engine type mongodb_atlas:

{
  "database": String,     
  "collection": String,
}

Specific fields for engine type elasticsearch/opensearch:

{
  "endpoint": String,
  "servers": [...]
}

type(Optional, String) Engine type to used for search. Defaults to OPENSEARCH.

queryTemplate (Optional, Json) Query template to be used for fetching feedback reviews.

endpoint (String) Endpoint to call for retrieving the persisted feedback.

credentials (Optional, AuthenticationCredentials) The search engine authentication.

servers (List<ConnectionInfo>) The search engine servers.

connection (Optional, HttpConnectionProperties) Timeout properties for connect and read operations. Both default to 5 seconds.

Staging

This component performs operations in the staging repository (such as storage and fetch)

Configuration:

{
    "type": "staging",
    "description": "This component executes operations in the staging repository",
    <configuration>
}

Configuration parameters:

bucket

(String) The name of the bucket to use in the staging repository.

operation

(String) The staging operation to perform. Options are:

getAll Gets all documents from bucket

fetch Fetches a document with ID specified in idField from bucket

store Stores in bucket

staging

(Json) The staging connection properties (servers, connection, credentials).

{
     "servers": List<ConnectionInfo>,
     "connection": HttpConnectionProperties,
     "credentialsId": String
}

Staging

This component performs operations in the staging repository (such as storage and fetch)

Configuration:

{
    "type": "staging",
    "description": "This component executes operations in the staging repository",
    <configuration>
}

Configuration parameters:

bucket

(String) The name of the bucket to use in the staging repository.

operation

(String) The staging operation to perform. Options are:

getAll Gets all documents from bucket

fetch Fetches a document with ID specified in idField from bucket

store Stores in bucket

staging

(Json) The staging connection properties (servers, connection, credentials).

{
     "servers": List<ConnectionInfo>,
     "connection": HttpConnectionProperties,
     "credentialsId": String
}

servers (ConnectionInfo array) The staging servers

connection (Optional, HttpConnectionProperties) connectTimeout and readTimeout. Both defaults to 5 seconds.

credentialsId (Optional, UUID) The credentials ID for the staging connection

idField

(Optional, String) If fetch required, path to the ID of the doc to get. If store optional, the new ID of the doc.

multiDocumentPath

(Optional, String) If store, The path in the body that splits it into sub-documents. This value must be an array.

useResponse

(Optional, Boolean) If true, component fetches paths from previous component response. Defaults to false.

Engine score query

This component performs queries against a search engine and puts a simple result set in to the response for use in calculating engine scores.

Configuration:

{
    "type": "engineScoreQuery",
    "description": "This component executes executes queries for engine scoring",
    <configuration>
}

Configuration parameters:

engineType

(String, Required) The search engine to connect to. Currently only Solr is supported

servers/scheme

(String) The scheme for the engine (http/https).

servers/hostname

(String) The hostname for the engine.

servers/port

(Integer) The port for the engine.

servers/path

(String) The path for the engine.

queryParameterPath

(String, Required) The path - in the body of the request - to the query to be executed against the search engine. Currently in the Solr engine, the entire url is parsed and all parameters sent to the Solr endpoint specified by the 'servers' configuration

index

(String) The name of the index/collection to search

idField

(String) The name of the field at the search engine which holds the document id of hits (defaults to _id)

resultFields

(String list) The name(s) of any other fields from a hit that should be transferred back to the response

Find Entity

This component identifies an entity based on configuration parameters.

Configuration:

{
    "type": "findEntity",
    "description": "This processor finds an entity",
    <configuration>
}

Configuration parameters:

acceptedPOS

(String array) The array list of POS to identify the entity, valid POS: NOUN, ADJ, JJ, JJR, JJS, AFX, NN, PROPN, NNP, NNPS, NNS.

minTokens

(Integer) The amount of tokens of the entity to be extracted. If entity tokens is less than minTokens param the entity will be ignored.

Security Filter

This component adds a security filter for a search engine query (Boolean query with filter for Opensearch/Elasticsearch, compound filter for MongoDB Atlas Search).

Elasticsearch and Openasearch engines may not require a full request body but in the case of MongoDB, since the mongo component supports database commands, the request body requires an aggregation with a compound search stage to correctly add the filter.

Configuration:

{
    "type": "security_filter",
    "description": "A security filter component",
    <configuration>
}

Configuration parameters:

header

(String) The name of the header the token should be in.

field

(String) The name for the permissions filter field in the request.

engine

(Json) The engine connection properties (type, servers, credentials, etc). This structure will vary depending on the engine defined in the type field. (Engines supported: opensearch, elasticsearch, mongo).

issuerUrl

(Optional, String) The URL for validating that the token is still valid against a JWK Provider. Defaults to null.

publicAcl

(Optional, String) The public acl. Defaults to "PUBLIC::ALL".

hashParameter

(Optional, String) The user hash from the parameter. Defaults to "hash".

addFilter

(Optional, Boolean) Whether to create the filter query. Defaults to true.

addToRequest

(Optional, Boolean) Whether to add permissions to the request and build a filter with them. Defaults to false.

Engine configuration

Opensearch and Elasticsearch

{
     "engineType": String,
     "servers": List<ConnectionInfo>,
     "credentials": AuthenticationCredentials,
     "index": String
}

type

(Optional, String) The main engine to use (Options are opensearch or elasticsearch). If none given, defaults to OpenSearch.

servers

(Optional, ConnectionInfo array) The search engine servers

credentialsId

(Optional, UUID) The credentials ID for the search engine

index

(String) The document index for user information in the engine

MongoDB

*Either servers or url is required. If both are given, the url takes precedence.

{
     "type": String,
     "servers": List<ConnectionInfo>,
     "index": String,
     "credentials": MongoCredential,
     "url": String,
     "tls": String,
     "database": String,
     "collection": String,
}

type

(String) The main engine to use (mongodb_atlas).

servers

(Optional, ConnectionInfo array) The MongoDB clients

index

(String) The document index for user information in the engine

credentialsId

(Optional, UUID) The credentials ID for the search engine

url

(Optional, String) A MongoDB connection url, optionally containing the database and collection names.

tls

(Optional, Boolean) Whether tls should be enabled. Defaults to false unless a Mongo Atlas url (mongo+srv://) is used in which case the default is true

database

(Optional, String) The name of the MongoDB database to use when looking up the hash

collection

(Optional, String) The name of the MongoDB collection to use when looking up the hash

Endpoint API

The endpoint API allows to define and manage endpoints which can then be used in the API controller.

POST /admin/endpoint - Create an endpoint
GET /admin/endpoint - Get all the available endpoints
GET /admin/endpoint/<endpoint ID> - Get a single endpoint
PUT /admin/endpoint/<endpoint ID> - Update an endpoint
DELETE /admin/endpoint/<endpoint ID> - Delete an endpoint

The endpoint ID is generated by using a combination between the endpoint's HTTP method and its path. This ID is required for CRUD operations. However, the HTTP Method and the configured path are to be used when executing the endpoint through the API controller.

Different types of endpoints are available:

Simple Endpoint

Allows you to create an endpoint containing a chain of processors to run when hitting the endpoint. This is the default type.

{
  "name": <String>,
  "description": <String>
  "uri": <String>,
  "httpMethod": HTTP Method,
  "processors": List<Processors IDs>,
  "continueOnError": Boolean = false,
  "fallbacksCriteria": {
    "field": "{JsonPath}",
    "relation": "EQ|LT|LTE|GT|GTE",
    "value": "{Number}",
    "maxCount": "{Number}"
  }
}

Query Fallbacks

The fallbacksCriteria allows you to set up query fallbacks based on the given criteria set in the endpoint. When these criteria are satisfied every single processor on this endpoint will be rerun. It is recommended to use a script processor as the first step in the endpoint pipeline to programmatically change the request that is going to be re-run.

In order to run query fallbacks you need to set the fallbacks parameter to true in the request sent to the Discovery API.

Configuration

field

(JsonPath) The path for fetching a numeric field containing the number of results coming as the response from search.

relation

(String) The operator used to compare the value from field with the value value.

value

(Integer) The value to compare against the number of records coming from search. Defaults to 0.

maxCount

(Integer) The maximum number of fallback runs allowed per endpoint. Defaults to 3.

Finite-State Machine Endpoint

Allows you to create a Finite-State Machine that represents the path to follow while executing an endpoint.

{
  "uri": <String>,
  "httpMethod": HTTP Method,
  "type": "states",
  "initialState": "stateA",
  "states": {
    "stateA": {
      "type": "processor",
      "processors": <Processor ID>,
      "next": "stateB"
    },
    
    "stateB": {
      "type": "switch",
      "options": [
        {
          "condition": {
            "equals": {
              "field": "lang",
              "source": "requestHeaders",
              "value": "english"
            }
          },
          "next": "EnglishState"
        },
        {
          "condition": {
            "equals": {
              "field": "$.lang",
              "source": "responseBody",
              "value": "french"
            }
          },
          "next": "FrenchState"
        }
      ],
      "default": "EnglishState"
    },
    
    "EnglishState": {
      "type": "processor",
      "processors": [
        <Processor ID>,
        <Processor ID>
      ]
    },
    
    "FrenchState": {
      "type": "processor",
      "processors": [
        <Processor ID>,
        <Processor ID>
      ]
    }
  },
  "timeout": "{Duration}"
}

Note that the configuration requires an initial state, and all states are defined by name (which means there can't be duplicates).

If a state has no next state, it is assumed as the final state. An endpoint can potentially have multiple final states.

All endpoints have an implicit defaultError state, of type Error that shows all exceptions with a 500 status code. This default state can be overridden by a state with the same name in the configuration of the endpoint.

State Types

Processor

Executes a single processor, or multiple processors in sequence:

"EnglishState": {
  "type": "processor",
  "processors": <Processor ID/List of Processor IDs>,
  "next": <State Name>,
  "onError": <State Name> = "defaultError",
  "onTerminate": <State Name>
}

If the next state is undefined, this will be considered as a terminal state.
If the onError field is explicitly undefined (i.e. set to null), the state will execute all processors despite any error. Otherwise, the execution of the state stops and the FSM will continue with the given state.
If one of the processors sends a terminate signal and onTerminate is defined, it will be selected as the next state. Otherwise, the default next will be used.

Switch

Controls the flow of the execution given the first matching condition:

{
  "type": "switch"
  "options": [
    {
      "condition": {
        "equals": {
          "field": "lang",
          "source": "requestHeaders",
          "value": "english"
        }
      },
      "next": "EnglishState"
    },
    {
      "condition": {
        "equals": {
          "field": "lang",
          "source": "requestHeaders",
          "value": "french"
        }
      },
      "next": "FrenchState"
    }
  ],
  "default": "EnglishState"
}

Note that the condition is expressed using the Core DSL for filters. The available sources are:

requestBody - JSON Path to search in the payload of the request.
requestHeaders - Key to search in the headers of the request.
requestParams - Key to search in the request parameters.
responseBody - JSON Path to search in the payload of the response.
responseHeaders - Key to search in the headers of the response.
context - Key to search in the context of the response.

The default state is optional. If not provided and no condition is met, it is assumed as the final state.

Error

Terminal state that returns an error message.

"ErrorState": {
  "type": "error",
  "statusCode": <HTTP Status Code> = 500,
  "message": <String>
}

If no message is given, the response will contain a list of all the exception messages that were thrown during the execution of the endpoint (if any).

Composite Endpoint

Allows you to create an endpoint that combines a list of other endpoints and executes them in parallel.

{
  "uri": <String>,
  "httpMethod": HTTP Method,
  "type": "composite",
  "endpoints": [
    {
      "id": <Endpoint ID>,
      "tag": <String>
    }
  ],
  "timeout": "{Duration}"
}

API

Endpoint

<HttpMethod> /api/<uri>

Allows you to run queries through the Discovery API depending on the configured endpoints set up.