...
{
  "servers": [
    {
      "host": "localhost",
      "port": 8125
    }
  ],
  "connectTimeout": 1000,
  "readTimeout": 1000,
  "sourceField": [
    "tika",
    "other"
  ],
  "model": "sentence-transformers/multi-qa-MiniLM-L6-cos-v1",
  "multiSourceFieldSeparator": " ",
  "output": "bertService",
  "chunkExpansion": {
    "append": "fieldA",
    "prepend": "fieldB",
    "separator": " - "
  },
  "chunkerType": "SIMPLE",
  "single": true,
  "maxChunks": 10,
  "minChunkSize": 25,
  "maxChunkSize": 100,
  "removePunctuation": true,
  "breakOnBlankLine": true,
  "lineLengthThreshold": 100,
  "htmlTags": [
    "p"
  ],
  "name": "BERT Service Processor",
  "active": true,
  "id": "b25f9a02-a8ca-471c-858e-51853c9e76a6",
  "type": "tika-processor"
}
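A quick sanity check of a configuration like the one above can catch mistakes before it is deployed. The following is a minimal Python sketch, not part of the product: check_processor_config is a hypothetical helper, and the rules it enforces (ordered chunk-size bounds, positive timeouts and maxChunks, an integer port per server) are assumptions about what valid values look like rather than documented constraints.

```python
import json
from typing import Any, Dict, List


def check_processor_config(config: Dict[str, Any]) -> List[str]:
    """Return the problems found in a processor config (an empty list means none)."""
    problems: List[str] = []

    # Each server entry needs a host and an integer port.
    for i, server in enumerate(config.get("servers", [])):
        if "host" not in server or not isinstance(server.get("port"), int):
            problems.append(f"servers[{i}] needs a host and an integer port")

    # Assumption: the lower chunk-size bound must not exceed the upper one.
    min_size = config.get("minChunkSize")
    max_size = config.get("maxChunkSize")
    if min_size is not None and max_size is not None and min_size > max_size:
        problems.append("minChunkSize must not exceed maxChunkSize")

    # Assumption: timeouts and maxChunks must be positive when present.
    for key in ("connectTimeout", "readTimeout", "maxChunks"):
        if key in config and config[key] <= 0:
            problems.append(f"{key} must be positive")

    return problems


if __name__ == "__main__":
    # A subset of the example configuration above.
    example = json.loads(
        '{"servers": [{"host": "localhost", "port": 8125}],'
        ' "connectTimeout": 1000, "readTimeout": 1000,'
        ' "maxChunks": 10, "minChunkSize": 25, "maxChunkSize": 100}'
    )
    print(check_processor_config(example))  # prints []
```

Returning a list of messages instead of raising on the first error lets every problem in the file be reported in one pass.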
...
(Optional, String/List) field or fields containing the text. Default is "cleanContent". If multiple fields are provided, they are concatenated, separated by a blank space, before chunking.
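The concatenation step can be sketched as follows. This is an illustration in Python, not the processor's actual code: build_source_text is a hypothetical helper, skipping fields that are absent from the document is an assumption, and the separator parameter is assumed to play the role of multiSourceFieldSeparator (a single space in the example configuration).

```python
from typing import Mapping, Sequence, Union


def build_source_text(
    document: Mapping[str, str],
    source_field: Union[str, Sequence[str]] = "cleanContent",
    separator: str = " ",
) -> str:
    """Join the configured source field(s) into the single text that is chunked."""
    # sourceField accepts either one field name or a list of names.
    fields = [source_field] if isinstance(source_field, str) else list(source_field)
    # Assumption: fields missing from the document are skipped.
    values = [document[name] for name in fields if name in document]
    return separator.join(values)


print(build_source_text({"tika": "Extracted text.", "other": "More text."}, ["tika", "other"]))
# prints: Extracted text. More text.
```

With a single field name (or the default "cleanContent"), the helper simply returns that field's text unchanged.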
model
(Optional, String) name of the Hugging Face model used to encode the chunks. If not provided, the Hugging Face Service uses its default model. If bert-service is the underlying model, this parameter can be ignored.
multiSourceFieldSeparator
...