...

{
  "parser": {
    "metadata": true,
    "key": "/input",
    "contentTypeField": "/metadata/content-type",
    "defaultEncoding": "UTF-8",
    "timeout": "PT1M",
    "output" : {
      "field": "outputFieldName",
      "toStorage": true
    },
    "extraction" : {
      "type" : "xpath",
      "xpathQuery" : "/xhtml:html/xhtml:body//node()"
    }
  },
  "name": "Tika Processor",
  "active": true,
  "id": "b25f9a02-a8ca-471c-858e-51853c9e76a6",
  "type": "tika-processor"
}

...

(Optional, String) Record field that contains the content type to use during parsing.

parser.timeout

(Optional, String) Timeout for parsing each record, expressed as an ISO 8601 duration. Defaults to "PT1M" (1 minute).

Warning: When a record times out, aborting the parsing operation can take up to 15 additional seconds. Take this into account when choosing a value for this parameter.
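
To make the effect of the warning concrete, the following Python sketch estimates the worst-case time a single record can occupy the parser: the configured timeout plus the 15-second abort window. It is an illustration only, not part of the processor. The helper names (`parse_duration`, `worst_case_per_record`) are hypothetical, and the regular expression handles only a subset of ISO 8601 durations (days, hours, minutes, seconds).

```python
import re
from datetime import timedelta

# Subset of ISO 8601 durations: PnDTnHnMnS (for example "PT1M", "PT90S", "P1DT2H").
# Years, months, and weeks are intentionally not supported in this sketch.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)

# Upper bound on the extra time needed to abort a timed-out parse,
# taken from the warning above.
ABORT_GRACE = timedelta(seconds=15)


def parse_duration(text: str) -> timedelta:
    """Convert a duration such as "PT1M" into a timedelta (hypothetical helper)."""
    match = _DURATION.match(text)
    parts = match.groupdict() if match else {}
    # Reject strings that do not match, carry no component ("P", "PT"),
    # or end with a dangling "T" ("P1DT").
    if not any(parts.values()) or text.endswith("T"):
        raise ValueError(f"unsupported ISO 8601 duration: {text!r}")
    return timedelta(
        days=int(parts["days"] or 0),
        hours=int(parts["hours"] or 0),
        minutes=int(parts["minutes"] or 0),
        seconds=float(parts["seconds"] or 0),
    )


def worst_case_per_record(timeout: str) -> timedelta:
    """Configured timeout plus the abort window from the warning."""
    return parse_duration(timeout) + ABORT_GRACE


print(worst_case_per_record("PT1M"))
```

With the default "PT1M", a record that times out may therefore hold the parser for up to 75 seconds, which is worth factoring into any overall throughput estimate.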

parser.defaultEncoding

(Optional, String) Encoding to use for the extracted text. Defaults to "UTF-8".

...