Chemical Tagger Processor
This processor reads text from a field and extracts chemical entities using Oscar4 (Open Source Chemistry Analysis Routines), an open-source, extensible system for the automated annotation of chemistry in scientific articles. Oscar4 can identify chemical names, reaction names, ontology terms, enzymes, and chemical prefixes and adjectives, as well as chemical data such as state, yield, IR, NMR and mass spectra, and elemental analyses.
For more information about Oscar4: https://github.com/BlueObelisk/oscar4
Configuration
Example configuration in a processor:
{
  "parser": {
    "key": "text",
    "structureType": "SMILES",
    "output": {
      "field": "output_field"
    }
  }
}
Configuration parameters:
parser.key
(Required, String) Name of the field that contains the input text.
parser.structureType
(Optional, String) Format used to describe the structure of chemicals. Defaults to SMILES.
Available values: SMILES, STD_INCHI, STD_INCHI_KEY, CML.
parser.output.field
(Optional, String) Name of the field where the extracted chemicals are written.
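The parameter rules above can be checked before a configuration is submitted. The following is a minimal sketch in Python, not part of the processor itself: the helper name normalize_parser_config is hypothetical, and the only rules it encodes are the ones listed above (parser.key is required, parser.structureType defaults to SMILES and must be one of the four listed values, parser.output.field is optional). Because no default is documented for parser.output.field, the sketch leaves it unset when absent.

```python
# Hypothetical helper for illustration only; it is not part of the processor.
ALLOWED_STRUCTURE_TYPES = {"SMILES", "STD_INCHI", "STD_INCHI_KEY", "CML"}


def normalize_parser_config(config: dict) -> dict:
    """Validate a processor configuration and fill in the documented default."""
    parser = config.get("parser")
    if not isinstance(parser, dict):
        raise ValueError("configuration must contain a 'parser' object")

    # parser.key is required and names the field holding the input text.
    key = parser.get("key")
    if not isinstance(key, str) or not key:
        raise ValueError("parser.key is required and must be a non-empty string")

    # parser.structureType is optional and defaults to SMILES.
    structure_type = parser.get("structureType", "SMILES")
    if structure_type not in ALLOWED_STRUCTURE_TYPES:
        raise ValueError(
            f"parser.structureType must be one of {sorted(ALLOWED_STRUCTURE_TYPES)}"
        )

    normalized = {"key": key, "structureType": structure_type}

    # parser.output.field is optional; no default is assumed here.
    output = parser.get("output")
    if isinstance(output, dict) and "field" in output:
        normalized["output"] = {"field": output["field"]}

    return {"parser": normalized}
```

For example, normalize_parser_config({"parser": {"key": "text"}}) returns the same configuration with structureType set to SMILES, while an unknown structureType raises ValueError.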
Input/Output examples
Input
{
  "text": "The quick brown ethyl acetate jumps over the lazy bromine"
}
Output
{
  "chemicals": [
    {
      "name": "ethyl acetate",
      "type": "COMPOUND",
      "structure": "CCOC(C)=O"
    },
    {
      "name": "bromine",
      "type": "COMPOUND",
      "structure": "[Br]"
    }
  ]
}
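A downstream consumer might read the output like this. The sketch below is an illustration in Python under stated assumptions: the helper name structures_by_name is hypothetical, the entry keys (name, type, structure) are taken from the example output above, and the field name is passed as a parameter because it depends on parser.output.field.

```python
# Hypothetical consumer for illustration; field and key names follow the example output.
def structures_by_name(document: dict, field: str = "chemicals") -> dict:
    """Map each extracted chemical name to its structure string."""
    return {
        chemical["name"]: chemical["structure"]
        for chemical in document.get(field, [])
    }


# The example output shown above, as a Python dictionary.
example_output = {
    "chemicals": [
        {"name": "ethyl acetate", "type": "COMPOUND", "structure": "CCOC(C)=O"},
        {"name": "bromine", "type": "COMPOUND", "structure": "[Br]"},
    ]
}

for name, structure in structures_by_name(example_output).items():
    print(f"{name}: {structure}")
```

If the processor is configured with a different output field, pass that name as the second argument. A document without the field yields an empty dictionary.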
©2024 Pureinsights Technology Corporation