Open AI Processor
This processor uses OpenAI's API for its embedding and chat completion features.
Action: Process (Embeddings)
Uses OpenAI's embeddings API to get a vector representation of a given input.
Configuration
Example configuration in a processor:
{
  "type": "open-ai-processor",
  "sourceField": "description",
  "output": "embeddings",
  "multiSourceFieldSeparator": " ",
  "model": "text-embedding-ada-002",
  "timeout": "PT120S",
  "user": "pdp-embeddings",
  "backoffType": "exponential",
  "backoffInitialDelay": "PT10S",
  "backoffMaxRetries": 7,
  "credentialId": "b0fa9c0f-912f-4345-88cf-1d78bdf462f1"
}
Configuration parameters:
sourceField
(Optional, String/List) The field with the input text. Defaults to cleanContent. If multiple fields are provided, they are concatenated with the value specified in multiSourceFieldSeparator (see the example after this list).
multiSourceFieldSeparator
(Optional, String) The string used to concatenate multiple source fields when needed. Defaults to a single whitespace.
output
(Optional, String) The record field where the embeddings result is stored. Defaults to openAiEmbedding.

model
(Optional, String) The OpenAI embeddings model to use in requests. Defaults to text-embedding-ada-002.

timeout
(Optional, String) The timeout on embeddings requests, expressed in Duration notation. Defaults to PT120S.

user
(Optional, String) The user to be included in embeddings requests. If specified as null, no user is included in requests. If not specified at all, defaults to pdp-content-processor.

backoffType
(Optional, String) The type of backoff applied to the retries that check whether a cooldown has passed. Options are none, constant, and exponential. Defaults to constant.

backoffInitialDelay
(Optional, String) The initial delay between backoff checks for a passed cooldown, expressed in Duration notation. Defaults to PT10S.

backoffMaxRetries
(Optional, Integer) The maximum number of times a processor tries to check for a passed cooldown before failing a job. Defaults to 25.

ignoreExceedingInput
(Optional, Boolean) Defines whether record text that exceeds the token limit is ignored or truncated. If true, the record is ignored; otherwise, the text is truncated. Defaults to false.
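For example, here is a minimal sketch of a processor that embeds the concatenation of several fields (the title and description field names are illustrative, not prescribed):

{
  "type": "open-ai-processor",
  "sourceField": ["title", "description"],
  "multiSourceFieldSeparator": " ",
  "output": "embeddings",
  "credentialId": "b0fa9c0f-912f-4345-88cf-1d78bdf462f1"
}

With this configuration, a record with title "Guest Blog" and description "What is OpenSearch?" is embedded as the single string "Guest Blog What is OpenSearch?".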
Supported model generations are V1 and V2, denoted by the -001 and -002 suffixes in the model ID, respectively.
When truncating text, V1 models limit input to 2046 tokens, while V2 models allow 8192 tokens. A token is treated as equivalent to 3 characters, so a V2 model truncates text longer than roughly 8192 × 3 = 24,576 characters.
If the configured model is from an unsupported generation, it is treated as a V2 model by default.
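As an illustrative sketch, a processor configured to skip oversized records instead of truncating them (field names are illustrative):

{
  "type": "open-ai-processor",
  "sourceField": "description",
  "output": "embeddings",
  "model": "text-embedding-ada-002",
  "ignoreExceedingInput": true,
  "credentialId": "b0fa9c0f-912f-4345-88cf-1d78bdf462f1"
}

With this configuration, a description exceeding the 8192-token limit causes the record to be ignored rather than embedded from truncated text.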
Input/Output examples
- Processor
{
  "name": "Embeddings processor",
  "type": "open-ai-processor",
  "sourceField": "description",
  "output": "embeddings",
  "user": "pdp-embeddings",
  "backoffType": "constant",
  "backoffInitialDelay": "PT10S",
  "backoffMaxRetries": 25,
  "ignoreExceedingInput": false,
  "credentialId": "b0fa9c0f-912f-4345-88cf-1d78bdf462f1",
  "id": "fa16215c-1ff1-4233-8519-bb763144b7e9"
}
- Input
{
  "description": "JavaScript lies at the heart of almost every modern web application, from social apps like Twitter to browser-based game frameworks like Phaser and Babylon. Though simple for beginners to pick up and play with, JavaScript is a flexible, complex language that you can use to build full-scale applications."
}
- Output
{
  "description": "JavaScript lies at the heart of almost every modern web application, from social apps like Twitter to browser-based game frameworks like Phaser and Babylon. Though simple for beginners to pick up and play with, JavaScript is a flexible, complex language that you can use to build full-scale applications.",
  "embeddings": [
    -0.0070322175,
    0.031458195,
    0.02097213,
    ...,
    0.019307466,
    -0.037907124
  ]
}
Empty/Null text to encode
The text provided in the source fields must not be null or empty. If the text provided is invalid, the processor ignores it and the output will be an empty array.
- Processor
{
  "name": "Embeddings processor",
  "type": "open-ai-processor",
  "sourceField": "description",
  "output": "embeddings",
  "user": "pdp-embeddings",
  "backoffType": "constant",
  "backoffInitialDelay": "PT10S",
  "backoffMaxRetries": 25,
  "credentialId": "b0fa9c0f-912f-4345-88cf-1d78bdf462f1",
  "id": "fa16215c-1ff1-4233-8519-bb763144b7e9"
}
- Input
{
  "description": ""
}
- Output
{
  "description": "",
  "embeddings": []
}
Action: Completion
Uses OpenAI's chat completion API to get a response from a given prompt.
Important Note: Regardless of the number of records configured per job, a separate request to the OpenAI API is made for each record. The number of API calls can therefore equal the number of records, which may impact your API usage and, if you are using a paid API, your billing. Consider this when configuring and running your jobs.
Configuration
Example configuration in a processor:
{
  "type": "open-ai-processor",
  "name": "Chat Completion",
  "model": "gpt-3.5-turbo",
  "promptField": "prompt",
  "output": "response",
  "temperature": "0.8",
  "maxPromptTokens": 8192,
  "user": "pdp-chat-completion",
  "timeout": "PT120S",
  "backoffType": "exponential",
  "backoffInitialDelay": "PT10S",
  "backoffMaxRetries": 7,
  "credentialId": "b0fa9c0f-912f-4345-88cf-1d78bdf462f1"
}
Configuration parameters:
promptField
(String) The field with the prepared prompt for chat completion. The prompt must already contain any needed context. To add the contents of other fields into the prompt, it is recommended to use a field mapper template or a script processor to insert the values as needed (see the example after this list).

output
(Optional, String) The record field where the response is stored. Defaults to chatGptMessage.

model
(Optional, String) The OpenAI chat completion model to use in requests. Defaults to gpt-3.5-turbo. See OpenAI's model endpoint compatibility documentation.

temperature
(Optional, String of a Double) The sampling temperature to use, between 0 and 2. Higher values like 0.8 make the output more random, while lower values like 0.2 make it more focused and deterministic. Defaults to "0.8".

maxPromptTokens
(Optional, Integer) The maximum number of tokens the prompt field can have. This limit exists because if the prompt exceeds 8192 tokens, the API does not return an answer even though the request is made. Defaults to 8192.

timeout
(Optional, String) The timeout on chat completion requests, expressed in Duration notation. Defaults to PT120S.

user
(Optional, String) The user to be included in chat completion requests. If specified as null, no user is included in requests. If not specified at all, defaults to pdp-content-processor.

backoffType
(Optional, String) The type of backoff applied to the retries that check whether a cooldown has passed. Options are none, constant, and exponential. Defaults to constant.

backoffInitialDelay
(Optional, String) The initial delay between backoff checks for a passed cooldown, expressed in Duration notation. Defaults to PT10S.

backoffMaxRetries
(Optional, Integer) The maximum number of times a processor tries to check for a passed cooldown before failing a job. Defaults to 25.
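For example, a record whose prompt was prepared upstream to embed the contents of a title field might look like this (the title field and prompt wording are illustrative; how the prompt is built, e.g. with a field mapper template or a script processor, is up to your pipeline):

{
  "title": "Guest Blog: What is OpenSearch? - Pureinsights",
  "prompt": "For the title given 'Guest Blog: What is OpenSearch? - Pureinsights', generate questions that can be asked."
}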
Input/Output examples
- Processor
{
  "name": "Chat Completion Action",
  "type": "open-ai-processor",
  "credentialId": "b0fa9c0f-912f-4345-88cf-1d78bdf462f1",
  "model": "gpt-3.5-turbo-16k",
  "promptField": "prompt",
  "output": "response",
  "temperature": "0.3",
  "user": "pdp-chat-completion-all",
  "timeout": "PT120S",
  "backoffType": "exponential",
  "backoffInitialDelay": "PT10S",
  "backoffMaxRetries": 7
}
- Input
{
  "prompt": "For the title given 'Guest Blog: What is OpenSearch? - Pureinsights', generate questions that can be asked."
}
- Output
{
  "prompt": "For the title given 'Guest Blog: What is OpenSearch? - Pureinsights', generate questions that can be asked.",
  "response": "1. What is the main purpose of OpenSearch?\n2. How does OpenSearch differ from other search engines?\n3. What are the key features of OpenSearch?\n4. How can businesses benefit from using OpenSearch?\n5. What are the potential challenges or limitations of implementing OpenSearch?\n6. How does OpenSearch ensure data privacy and security?\n7. Can OpenSearch be integrated with existing systems and applications?\n8. What is the process for setting up and configuring OpenSearch?\n9. How does OpenSearch handle scalability and performance issues?\n10. Are there any case studies or success stories of organizations using OpenSearch effectively?"
}
Credentials
This processor requires a special type of credential because authentication to the OpenAI API is done through an API token. The credential also allows some configuration of the request cooldown functionality, which is explained later. The processor expects the presence of the following values in the config of the credential:
{
  "config": {
    "token": "<AnAPIToken>",
    "organizationId": "<AnOrgId>",
    "requestCooldown": "PT60S"
  }
}
Configuration parameters:
token
(Required, String) The API token used to authenticate to OpenAI's API.

organizationId
(Optional, String) An organization id to be included with requests to OpenAI's API. If null, or not present, no organization id is included in requests.

requestCooldown
(Required, String) When a request made by a processor with these credentials returns a rate limit error, requests using this credential (even in other processors) are put in cooldown and thus rejected for the length of this Duration. The shared request cooldown is, nonetheless, limited to processors with the same timeout configuration value. Defaults to PT65S.
A full credential that can be used with this processor therefore looks like this:
{
  "type": "open-ai-component",
  "name": "chat-gpt personal credentials",
  "description": "Credentials for ChatGPT",
  "config": {
    "token": "<AnAPIToken>",
    "organizationId": "<AnOrgId>",
    "requestCooldown": "PT60S"
  }
}
The credential must be of type open-ai-component; otherwise, the processors using it will fail.
Request cooldown
Open AI processors come with a built-in request cooldown feature that tries to handle and minimize rate limit errors returned by the OpenAI API. Once a request made by a processor returns a rate limit error, the requests of all processors that share the same credential config values and the same timeout config value are put in cooldown for the length of time specified in the credential. The cooldown can be applied again if, after the first cooldown has passed, a new request receives a rate limit error.
Each individual processor can be configured to check for the passing of this cooldown in a different manner through its backoff-related config values. Note that this backoff only affects how many times and how often the check for a passed cooldown is made, not how long requests are rejected because of the cooldown.
It is recommended to configure the backoff so that a processor only checks for the end of the cooldown for a reasonable amount of time before failing due to the maximum number of retries, as sketched below. If a rate limit error is caused by a daily token limit or insufficient credit, the cooldown will simply be reapplied over and over and the processor could get stuck.
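For example, assuming the exponential backoff roughly doubles the delay on each retry (the exact multiplier is not documented here), the following fragment bounds the checks to about 21 minutes in total (10 s + 20 s + 40 s + ... across 7 attempts) before the job fails rather than waiting indefinitely:

{
  "backoffType": "exponential",
  "backoffInitialDelay": "PT10S",
  "backoffMaxRetries": 7
}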
It is important to note that the request cooldown mechanism isn't perfect: requests made very close together in time by two or more processors sharing a cooldown can all receive a rate limit error, possibly increasing the time during which the OpenAI API keeps returning that error. That said, the requests have to be made at almost the same time for this to happen, so it should not be common. No matter how many requests receive the rate limit error, only one cooldown is applied at a time.
Given that rate limiting on the OpenAI side applies at the organization level first and the API token second, it is recommended to have as many processors as possible share the same cooldown if they share the same API token and organization id in their credentials. In other words, if two or more processors share the same API token and organization id, have them share the same requestCooldown value in the credential and the same timeout value in their configs; they will then share a request cooldown, and fewer rate limit errors will be returned.
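As an illustrative sketch, the two processors below share the same credential (and therefore the same API token, organization id, and requestCooldown) as well as the same timeout value, so they share one request cooldown (names and source fields are invented for the example):

{
  "name": "Embeddings A",
  "type": "open-ai-processor",
  "sourceField": "title",
  "timeout": "PT120S",
  "credentialId": "b0fa9c0f-912f-4345-88cf-1d78bdf462f1"
}

{
  "name": "Embeddings B",
  "type": "open-ai-processor",
  "sourceField": "description",
  "timeout": "PT120S",
  "credentialId": "b0fa9c0f-912f-4345-88cf-1d78bdf462f1"
}

If a request from either processor receives a rate limit error, requests from both are rejected until the shared cooldown passes.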