BERT Train Processor
This processor uses the text it receives to add new vocabulary to the designated model and train it. To do this, the processor relies on the HuggingFace Service.
Training a model takes a considerable amount of time, so expect this processor to run much longer than most other processors.
Configuration
Example configuration in a processor:
{
"trainingBatch": 2,
"trainingBatchSize": 1000,
"model": "C:\\dev\\model",
"active": true,
"type": "bert-train-processor",
"sourceField": "text",
"servers": [
{
"port": 8888,
"host": "localhost"
}
],
"trainConnectTimeout": "PT30s",
"trainReadTimeout": "PT5m",
"timeInterval": "PT10s",
"name": "BERT trainer",
"id": "50dcc5e2-1fcd-40bc-82b7-a257b0ec38ed"
}
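Before deploying a configuration like the one above, it can help to check that the required parameters documented in this section are present and have the expected types. The following is a minimal sketch in Python, not part of the processor itself: the function name `validate_config` and the error messages are hypothetical, and the check that `servers` must contain at least one entry is an assumption.

```python
# Required top-level parameters and their types, as documented in this section.
REQUIRED_FIELDS = {
    "trainingBatch": int,
    "trainingBatchSize": int,
    "model": str,
    "sourceField": str,
    "trainReadTimeout": str,
    "timeInterval": str,
}


def _has_type(value, expected) -> bool:
    # bool is a subclass of int in Python, so reject it explicitly for Int fields.
    return isinstance(value, expected) and not isinstance(value, bool)


def validate_config(config: dict) -> list:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in config:
            problems.append(f"missing required field: {name}")
        elif not _has_type(config[name], expected):
            problems.append(f"wrong type for field: {name}")

    servers = config.get("servers")
    if not isinstance(servers, list) or not servers:
        # Assumption: at least one server entry is needed.
        problems.append("servers must be a non-empty list")
    else:
        for index, server in enumerate(servers):
            if not isinstance(server, dict):
                problems.append(f"servers[{index}] must be an object")
                continue
            if not _has_type(server.get("host"), str):
                problems.append(f"servers[{index}].host must be a string")
            if not _has_type(server.get("port"), int):
                problems.append(f"servers[{index}].port must be an integer")
    return problems


example = {
    "trainingBatch": 2,
    "trainingBatchSize": 1000,
    "model": "C:\\dev\\model",
    "sourceField": "text",
    "servers": [{"port": 8888, "host": "localhost"}],
    "trainReadTimeout": "PT5m",
    "timeInterval": "PT10s",
}
print(validate_config(example))
```

The sketch only checks presence and type. It does not verify that the model path exists or that the server is reachable.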
Configuration parameters:
trainingBatch
- Required, Int
Batch to be used while training the model.
trainingBatchSize
- Required, Int
Maximum amount of data to send to the HuggingFace Service.
model
- Required, String
Path to the model to train. Note that the model at the specified path will be overwritten by the trained model.
sourceField
- Required, String
Field containing the text that will be used to add vocabulary and train.
servers.host
- Required, String
Host where the HuggingFace Service is located.
servers.port
- Required, Int
Port on which the HuggingFace Service listens.
trainConnectTimeout
- Optional, String
Timeout for connecting to the server, expressed as an ISO-8601 duration (for example, PT30s).
trainReadTimeout
- Required, String
Timeout for reading from the server while waiting for training to finish, expressed as an ISO-8601 duration (for example, PT5m).
timeInterval
- Required, String
Time the component waits between checks for whether training has finished, expressed as an ISO-8601 duration (for example, PT10s).
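To illustrate how timeInterval and trainReadTimeout relate, here is a rough Python sketch of a polling loop. It is not the processor's actual implementation. The names `parse_duration` and `wait_for_training` are hypothetical, the parser handles only the hours, minutes, and seconds parts of a duration such as PT30s or PT5m, and treating trainReadTimeout as an upper bound on the total wait is an assumption based on the descriptions above.

```python
import re
import time
from datetime import timedelta
from typing import Callable

# Simplified pattern for time-only durations such as "PT30s", "PT5m" or "PT1H30M".
# Assumption: letters are accepted in upper or lower case.
_DURATION = re.compile(
    r"^PT(?:(?P<hours>\d+(?:\.\d+)?)H)?"
    r"(?:(?P<minutes>\d+(?:\.\d+)?)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?$",
    re.IGNORECASE,
)


def parse_duration(text: str) -> timedelta:
    """Convert a time-only duration string into a timedelta."""
    match = _DURATION.match(text)
    if match is None or not any(match.groupdict().values()):
        raise ValueError(f"unsupported duration: {text!r}")
    parts = {key: float(value) for key, value in match.groupdict().items() if value}
    return timedelta(**parts)


def wait_for_training(
    is_done: Callable[[], bool],
    interval: timedelta,
    timeout: timedelta,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_done every `interval` until it returns True or `timeout` elapses.

    Returns True if training finished in time, False otherwise.
    """
    deadline = time.monotonic() + timeout.total_seconds()
    while True:
        if is_done():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval.total_seconds())
```

A caller would pass a function that asks the service whether training has finished, for example `wait_for_training(check, parse_duration("PT10s"), parse_duration("PT5m"))`, where `check` is a placeholder for that status request. A shorter timeInterval detects completion sooner at the cost of more status checks.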
Example Output
After training, the following files in the specified model path should be updated:
- added_tokens.json
- config.json
- pytorch_model.bin
- special_tokens_map.json
- tokenizer.json
- tokenizer_config.json
- vocab.txt
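A quick way to confirm that a training run produced the files listed above is to check the model directory after the processor finishes. The helper below is a small Python sketch. The function name `missing_model_files` is hypothetical, and it only checks that each file exists, not that it was modified by the latest run.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED_FILES = [
    "added_tokens.json",
    "config.json",
    "pytorch_model.bin",
    "special_tokens_map.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "vocab.txt",
]


def missing_model_files(model_dir) -> list:
    """Return the expected file names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

For the example configuration, `missing_model_files("C:\\dev\\model")` would return an empty list when all seven files are present.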
©2024 Pureinsights Technology Corporation