Chemical Tagger Processor

This processor reads text from a configured field and extracts chemical entities using Oscar4 (Open Source Chemistry Analysis Routines), an open-source, extensible system for the automated annotation of chemistry in scientific articles. Oscar4 can identify chemical names, reaction names, ontology terms, enzymes, and chemical prefixes and adjectives, as well as chemical data such as state, yield, IR, NMR and mass spectra, and elemental analyses.

For more information about Oscar4: https://github.com/BlueObelisk/oscar4

Configuration

Example configuration in a processor:

{
  "parser": {
    "key": "text",
    "structureType": "SMILES",
    "output": {
      "field": "output_field"
    }
  }
}

Configuration parameters:

parser.key

(Required, String) field that contains the input text.

parser.structureType

(Optional, String) format used to describe the structure of chemicals. Defaults to SMILES.

Available values: SMILES, STD_INCHI, STD_INCHI_KEY, CML.

parser.output.field

(Optional, String) field where the extracted chemicals are written.
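
The parameter rules above can be checked before a configuration is submitted. The following is a minimal Python sketch, not part of the processor itself: the helper name normalize_parser_config and the shape of its return value are hypothetical, and only the parameter names, the required/optional flags, the allowed structureType values and the SMILES default come from this page.

```python
# Hypothetical helper: validates a Chemical Tagger configuration against the
# parameter rules documented on this page. It does not call the processor.
ALLOWED_STRUCTURE_TYPES = {"SMILES", "STD_INCHI", "STD_INCHI_KEY", "CML"}


def normalize_parser_config(config: dict) -> dict:
    """Return the parser settings with defaults applied, or raise ValueError."""
    parser = config.get("parser")
    if not isinstance(parser, dict):
        raise ValueError("configuration must contain a 'parser' object")

    # parser.key is required and must name the field holding the text.
    key = parser.get("key")
    if not isinstance(key, str) or not key:
        raise ValueError("parser.key is required and must be a non-empty string")

    # parser.structureType is optional and defaults to SMILES.
    structure_type = parser.get("structureType", "SMILES")
    if structure_type not in ALLOWED_STRUCTURE_TYPES:
        raise ValueError(
            f"parser.structureType must be one of {sorted(ALLOWED_STRUCTURE_TYPES)}"
        )

    # parser.output.field is optional; None means "not set" in this sketch.
    output = parser.get("output") or {}
    output_field = output.get("field")
    if output_field is not None and not isinstance(output_field, str):
        raise ValueError("parser.output.field must be a string when present")

    return {
        "key": key,
        "structureType": structure_type,
        "outputField": output_field,
    }


# Usage with the example configuration from this page.
example = {
    "parser": {
        "key": "text",
        "structureType": "SMILES",
        "output": {"field": "output_field"},
    }
}
print(normalize_parser_config(example))
```

Running the check on the client side is a design choice for this sketch only; the processor may apply its own validation and report errors differently.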

Input/Output examples

Input

{
  "text": "The quick brown ethyl acetate jumps over the lazy bromine"
}

Output

{
  "chemicals": [
    {
      "name": "ethyl acetate",
      "type": "COMPOUND",
      "structure": "CCOC(C)=O"
    },
    {
      "name": "bromine",
      "type": "COMPOUND",
      "structure": "[Br]"
    }
  ]
}
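
A downstream step will usually want to look up structures by chemical name. The sketch below shows one way to read the output above in Python. The function structures_by_name is hypothetical, and the default field name "chemicals" is taken from this example only; it is an assumption that the name may differ when parser.output.field is set.

```python
import json


def structures_by_name(record: dict, field: str = "chemicals") -> dict:
    """Map each extracted chemical name to its structure string.

    `field` is the record field holding the extracted chemicals; the default
    mirrors the example output on this page and is an assumption.
    """
    result = {}
    for entry in record.get(field, []):
        # Each entry carries "name", "type" and "structure" in the example.
        result[entry["name"]] = entry.get("structure")
    return result


# The example output from this page, parsed from JSON.
sample = json.loads("""
{
  "chemicals": [
    {"name": "ethyl acetate", "type": "COMPOUND", "structure": "CCOC(C)=O"},
    {"name": "bromine", "type": "COMPOUND", "structure": "[Br]"}
  ]
}
""")

print(structures_by_name(sample))
# prints {'ethyl acetate': 'CCOC(C)=O', 'bromine': '[Br]'}
```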

©2024 Pureinsights Technology Corporation