...
{
  "servers": [
    {
      "host": "localhost",
      "port": 8125
    }
  ],
  "connectTimeout": 1000,
  "readTimeout": 1000,
  "sourceField": [
    "tika",
    "other"
  ],
  "model": "sentence-transformers/multi-qa-MiniLM-L6-cos-v1",
  "multiSourceFieldSeparator": " ",
  "output": "bertService",
  "chunkExpansion": {
    "append": "fieldA",
    "prepend": "fieldB",
    "separator": " - "
  },
  "chunkerType": "SIMPLE",
  "single": true,
  "maxChunks": 10,
  "minChunkSize": 25,
  "maxChunkSize": 100,
  "removePunctuation": true,
  "breakOnBlankLine": true,
  "lineLengthThreshold": 100,
  "htmlTags": [
    "p"
  ],
  "name": "BERT Service Processor",
  "active": true,
  "id": "b25f9a02-a8ca-471c-858e-51853c9e76a6",
  "type": "tika-processor"
}
...
(Optional, String/List) field or fields containing the text. Default is "cleanContent". If multiple fields are provided, their values are joined with a single space before chunking.
model
(Optional, String) name of the Hugging Face model used to encode chunks. If not provided, the Hugging Face Service uses its default model. If bert-service is the underlying model, this parameter can be ignored.
multiSourceFieldSeparator
...
(Optional, Boolean) if only the first chunk should be processed. Default is false.
chunkerEnabled
(Optional, Boolean) Default is true. If set to false, all the input text is processed at once as a single chunk.
maxChunks
(Optional, Int) the maximum number of chunks sent in a single call to the service. Default is 512.
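
The interaction between single, chunkerEnabled and maxChunks described above can be pictured with a minimal Python sketch. This is an illustration only, not the processor's actual code: the function names select_chunks and batch_chunks are hypothetical, and reading maxChunks as a per-call batch size is an assumption based on the wording of its description.

```python
from typing import Iterator, List


def select_chunks(text: str, chunks: List[str], single: bool = False,
                  chunker_enabled: bool = True) -> List[str]:
    """Pick which chunks get encoded (hypothetical helper)."""
    if not chunker_enabled:
        # chunkerEnabled = false: the whole input is treated as one chunk.
        return [text]
    if single:
        # single = true: only the first chunk is processed.
        return chunks[:1]
    return chunks


def batch_chunks(chunks: List[str], max_chunks: int = 512) -> Iterator[List[str]]:
    """Group chunks so that no call carries more than max_chunks (assumed meaning)."""
    for start in range(0, len(chunks), max_chunks):
        yield chunks[start:start + max_chunks]


if __name__ == "__main__":
    text = "first part. second part. third part."
    chunks = ["first part.", "second part.", "third part."]
    selected = select_chunks(text, chunks)
    for call_number, batch in enumerate(batch_chunks(selected, max_chunks=2), start=1):
        print(call_number, batch)
```

Under these assumptions, three chunks with maxChunks set to 2 would be sent in two calls, the first with two chunks and the second with one.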
...