CSV Processor

CSV Connector

This connector splits a single record into multiple child records.

The incoming record must contain a reference to a CSV (comma-separated values) file.

Each line of the CSV file is read, and a new child record is created and enqueued on the current processing pipeline.

All child items will have the same action as the parent item.
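
As a rough illustration of the splitting behavior described above, the following Python sketch reads CSV content from a record field and emits one child per line. The record layout (the `action`, `csv`, and `values` fields) and the `split_record` helper are hypothetical and are not part of the processor's actual API.

```python
import csv
import io


def split_record(parent, key):
    """Yield one child record per CSV line found in parent[key]."""
    reader = csv.reader(io.StringIO(parent[key]))
    for row in reader:
        # Assumption: the action lives in an "action" field; each child
        # simply copies it from the parent.
        yield {"action": parent["action"], "values": row}


parent = {"action": "store", "csv": "1,apple\n2,banana\n"}
children = list(split_record(parent, "csv"))
```

Here `children` holds two records, one per CSV line, and both carry the parent's `store` action.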

API

OpenCSV. A simple library for reading and writing CSV in Java - Version 5.5.2

Configuration

Sample configuration in a processor:

{
  "csv": {
    "key": "fieldWithCSVContent",
    "columns": [
      {
        "name": "columnName1",
        "index": 0,
        "idColumn": true
      },
      {
        "name": "columnName2",
        "index": 1
      },
      {
        "name": "columnName3",
        "index": 2
      }
    ],
    "skipLines": 1,
    "encoding": "UTF-8",
    "csvParser": "RFC4180"
  },
  "name": "{some name}",
  "type": "csv-processor",
  "pipelineId": "{Some Id}"
}

Configuration parameters:

key - Required, string

Name of the record's field containing the CSV file content.

skipLines - Optional, integer

Number of lines to skip from the beginning of the CSV file. Useful for skipping a header row that contains the column names.
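
The effect of skipping leading lines can be sketched in Python as follows. The `read_rows` helper and its `skip_lines` argument are hypothetical stand-ins for the configuration value; the sketch drops physical lines before parsing, which is an assumption about how the setting is applied.

```python
import csv


def read_rows(content, skip_lines=0):
    """Parse CSV text after dropping the first `skip_lines` physical lines."""
    lines = content.splitlines(keepends=True)
    return list(csv.reader(lines[skip_lines:]))


content = "id,name\n1,apple\n2,banana\n"
rows_with_header = read_rows(content)
rows_without_header = read_rows(content, skip_lines=1)
```

With `skip_lines=1`, the header row `id,name` is not turned into a child record.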

encoding - Optional, string

Encoding to use when reading the CSV file. Defaults to the JVM's default charset.
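
Because the default depends on the runtime environment, setting the encoding explicitly is usually safer. The Python sketch below shows how the same bytes yield different values under different encodings; the `decode_rows` helper is hypothetical, and the platform-default fallback only mimics the idea of a JVM default charset.

```python
import csv
import io
import locale


def decode_rows(raw, encoding=None):
    """Decode raw CSV bytes and parse them into rows."""
    # Assumption: with no encoding configured, fall back to the platform
    # default, analogous to the JVM default charset.
    effective = encoding or locale.getpreferredencoding(False)
    text = raw.decode(effective)
    return list(csv.reader(io.StringIO(text, newline="")))


raw = "1,café\n".encode("utf-8")
rows_utf8 = decode_rows(raw, "UTF-8")
rows_latin1 = decode_rows(raw, "ISO-8859-1")
```

Reading UTF-8 bytes as ISO-8859-1 does not fail; it silently produces a corrupted value (`cafÃ©` instead of `café`), which is why a mismatched default can go unnoticed.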

parser - Optional, string

Name of the parser strategy to use. Supports the following values:

  • RFC4180: Use the RFC 4180 standard, which stipulates CRLF pairs to denote line breaks. This avoids breaking lines on \n or other characters, even when they are surrounded by double quotes.

If this value is not set, then a default parser will be used.

columns - Required, JSON array

List of objects describing the file's columns and how to map them to the newly created record. Each object has:

  • name - Required, string. The column's name. It is used as the field name on the child document's record.
  • index - Required, integer. Zero-based index of the column's position in the file.
  • idColumn - Optional, boolean. Whether the column should be treated as the child document's id. Only one column must be set to true.
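
The column mapping can be pictured with the short Python sketch below, which applies a `columns` list like the one in the sample configuration to a single parsed row. The `map_row` helper is hypothetical, and keeping the id column's value as a regular field as well is an assumption, not documented behavior.

```python
def map_row(row, columns):
    """Build (id, fields) for a child record from one parsed CSV row."""
    fields = {}
    record_id = None
    for column in columns:
        value = row[column["index"]]  # index is zero-based
        fields[column["name"]] = value
        if column.get("idColumn", False):
            record_id = value
    return record_id, fields


columns = [
    {"name": "columnName1", "index": 0, "idColumn": True},
    {"name": "columnName2", "index": 1},
    {"name": "columnName3", "index": 2},
]
child_id, child_fields = map_row(["42", "apple", "red"], columns)
```

In this sketch the first column supplies the child's id (`42`), and each configured column becomes a field named after its `name` entry.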

©2024 Pureinsights Technology Corporation