Connects to the Staging API to perform either full or incremental crawls based on a change token. Please refer to the documentation of the Staging API for more details.

Pre-Requisites

The bucket that will be looked up must exist on the Staging API.

Configuration

The connector can be configured as a seed, or as a processor for the lookup action.

Example configuration as the lookup processor:

{
  "type": "staging-connector",
  "action": "lookup",
  "name": "Extract additional data",  
  "sourcePath": "/a/field/pointer",
  "nestedSourceField": "/another/field/pointer",
  "targetBucket": "a_bucket_name",
  "connection": {
    "servers": [
      {
        "host": "localhost",
        "port": 8081
      }
    ],
    "connectTimeout": "PT5m",
    "readTimeout": "PT5m"
  }
}

Lookup action

This is configured as a processor and has the following fields:

sourcePath

(Required, String) A JSON pointer into the source document. The value used for the lookup must be a String, but it can be nested within the objects of an array.

For example:

{
  "id": "xyz",
  "nested": [
    {
      "id": "1"
    },
    {
      "id": "2"
    }
  ]
}

Here sourcePath could be "/id", in which case the processor searches for "xyz" in the target bucket. Alternatively, it could be "/nested" combined with nestedSourceField set to "/id", in which case it searches for "1" and "2" in the target bucket.

If sourcePath refers to a string, the target contents are merged into the current document. If sourcePath refers to an array, the array itself is replaced by its elements merged with the contents of the target data.
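
The merge behaviour described above can be illustrated with a small Python sketch. This is not the connector's implementation: the bucket is modelled as a plain in-memory dictionary, the helper names (resolve_pointer, lookup) are hypothetical, and it assumes that target fields win on a key conflict, which the documentation does not state.

```python
from copy import deepcopy


def resolve_pointer(document, pointer):
    """Resolve a simple JSON pointer such as "/a/b" (no escape handling)."""
    node = document
    for token in pointer.strip("/").split("/"):
        if token == "":
            continue  # an empty pointer refers to the document itself
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def lookup(document, bucket, source_path, nested_source_field=None):
    """Simulate the lookup action against an in-memory bucket (a dict)."""
    result = deepcopy(document)
    value = resolve_pointer(result, source_path)

    if isinstance(value, str):
        # String case: merge the target record into the current document.
        # Assumption: target fields override existing ones on conflict.
        result.update(bucket.get(value, {}))
        return result

    # Array case: merge each element with the record found for its nested id.
    merged = []
    for item in value:
        key = resolve_pointer(item, nested_source_field)
        merged.append({**item, **bucket.get(key, {})})

    # Replace the original array with the merged elements.
    parent_pointer, _, last = source_path.rpartition("/")
    parent = resolve_pointer(result, parent_pointer)
    parent[int(last) if isinstance(parent, list) else last] = merged
    return result


# Hypothetical bucket contents, keyed by record id.
bucket = {
    "xyz": {"title": "Root record"},
    "1": {"name": "first"},
    "2": {"name": "second"},
}
doc = {"id": "xyz", "nested": [{"id": "1"}, {"id": "2"}]}

by_id = lookup(doc, bucket, "/id")
by_nested = lookup(doc, bucket, "/nested", "/id")
```

With the sample document above, by_id gains the "title" field at the top level, while by_nested keeps the top level unchanged and each element of "nested" gains its "name" field.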

nestedSourceField

(Required, String) A JSON pointer within the nested object when sourcePath refers to an array.

targetBucket

(Required, String) The name of the target bucket to read data from.

Query Scan

This is configured on the seed and allows queries and filters on the target repository. To use it, you MUST set "scanAction" on the seed to "query" and provide a query in the /seed/scan configuration.

Important: The query scan action doesn't support projections or aggregations just yet. See PDP-425 and PDP-427 for details.

Example seed that pulls documents with no "Content-Type" header:

{
  "processAction": "process",
  "seed": {
    "servers": [
      {
        "host": "localhost",
        "port": 8081
      }
    ],
    "connectTimeout": "PT5m",
    "readTimeout": "PT5m",
    "scan": {
      "bucket": "source_bucket_name",
      "scroll": {
        "size": 100
      },
      "query": {
        "filter": {
          "or": [
            {
              "exists": {
                "fieldName": "content.metadata.Content-Type",
                "present": false
              }
            },
            {
              "equals": {
                "fieldName": "content.metadata.Content-Type",
                "value": ""
              }
            }
          ]
        }
      }
    }
  },
  "name": "Records missing content type",
  "active": true,
  "scanAction": "query",
  "type": "staging-connector",
  "batchSize": 25,
  "properties": {
    "index": "target_data_source_index_name"
  },
  "pipelineId": "{{ fromName('Persist to Target Data Source') }}",
  "id": "db122b46-9104-48e7-8847-c1f2d720d596"
}

/seed/scan/query

(Required, JSON) A JSON object with the query specification. This must adhere to the Query Language convention defined in the Staging API.

/seed/scan/scroll

(Optional, JSON) A JSON object with the scroll properties.

Known limitations

Lookup action

Changes in the looked-up bucket don't trigger an update of the original document.
