Getting Started

Follow this guide to get a simple end-to-end configuration of the PDP up and running.

Ingest Some Content

Before you can search and see anything of interest, you need to ingest content so that it is available to the Discovery API.

Configure a Seed

A seed is the starting point for any content. Seeds declare a connector to a data source. Let's configure a very simple seed that creates random documents:

[
  {
    "seed": {
      "records": 1000,
      "recordSize": 8096
    },
    "name": "Random Generator",
    "type": "random-generator-connector",
    "pipelineId": "{{ fromName('My Test Pipeline') }}",
    "properties": {
      "index": "random_generated_docs"
    },
    "batchSize": "100"
  }
]

Notice that the seed above has the type "random-generator-connector". This tells the framework which component should work on this seed, and it must match one of the predefined connectors.

The "seed" element configures specific settings of the "random-generator-connector".

The "pipelineId" references a pipeline that you'll create in the next step.

Finally, the "properties" element defines global properties that any processor working on this seed can access. Here it sets an "index" property, which is used later to decide where to store the records produced by this seed.
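The "pipelineId" value is not a literal ID but a template expression that looks the pipeline up by its name. The sketch below assumes the expression simply maps a name to the ID of an already-configured entity; the regex, the resolve_from_name helper and the "pipeline-1" ID are hypothetical illustrations, not the platform's own resolver. It also shows why names must match exactly: looking up a name that no entity has fails.

```python
import re

# Matches expressions like {{ fromName('My Test Pipeline') }}. The syntax comes from the
# seed example; how the platform really evaluates it is an assumption of this sketch.
FROM_NAME = re.compile(r"^\{\{\s*fromName\('([^']*)'\)\s*\}\}$")


def resolve_from_name(value, entities_by_name):
    """Return the id of the entity named inside a fromName() expression.

    Values that are not fromName() expressions are returned unchanged.
    Raises KeyError if no entity with that name exists.
    """
    match = FROM_NAME.match(value)
    if match is None:
        return value
    name = match.group(1)
    if name not in entities_by_name:
        raise KeyError(f"no entity named {name!r}")
    return entities_by_name[name]["id"]


# Hypothetical id: the real platform assigns its own identifiers when entities are created.
pipelines = {"My Test Pipeline": {"id": "pipeline-1", "name": "My Test Pipeline"}}

seed = {
    "name": "Random Generator",
    "type": "random-generator-connector",
    "pipelineId": "{{ fromName('My Test Pipeline') }}",
}

print(resolve_from_name(seed["pipelineId"], pipelines))  # → pipeline-1
```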

Configure a Pipeline

A pipeline is a finite state machine that lets you define the sequence in which changes are applied to records.

[
  {
    "name": "My Test Pipeline",
    "active": true,
    "steps": [
      {
        "processorId": "{{ fromName('Persist to Search Engine') }}",
        "action": "hydrate"
      }
    ]
  }
]

The above declares a simple pipeline with one step, which references a processor called "Persist to Search Engine". You could add steps before or after this one to transform the content. A wide range of pre-existing processors is available to customize how content is transformed before or after it is persisted to the search engine.
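Since steps run in the order they are declared, a pipeline can be pictured as a list of functions, each receiving the record produced by the previous one. The sketch below is a hypothetical model, not the platform's engine: rename_field, upper_case and persist are made-up stand-ins for real processors, and persist only collects records in memory instead of writing to a search engine.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Step = Callable[[Record], Record]


def rename_field(old: str, new: str) -> Step:
    """Hypothetical processor: move the value stored under `old` to `new`."""
    def step(record: Record) -> Record:
        result = dict(record)  # copy so the input record is not mutated
        if old in result:
            result[new] = result.pop(old)
        return result
    return step


def upper_case(field: str) -> Step:
    """Hypothetical processor: upper-case a string field if it is present."""
    def step(record: Record) -> Record:
        result = dict(record)
        value = result.get(field)
        if isinstance(value, str):
            result[field] = value.upper()
        return result
    return step


def run_pipeline(steps: List[Step], record: Record) -> Record:
    """Apply each step in declaration order, feeding each output into the next step."""
    for step in steps:
        record = step(record)
    return record


persisted: List[Record] = []


def persist(record: Record) -> Record:
    """Stand-in for the "Persist to Search Engine" step: it only collects records."""
    persisted.append(record)
    return record


steps = [rename_field("fullName", "name"), upper_case("name"), persist]
print(run_pipeline(steps, {"fullName": "ada lovelace", "id": 1}))
# → {'id': 1, 'name': 'ADA LOVELACE'}
```

The sketch only captures step ordering; it does not model the "action" field or any other behaviour of the real state machine.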

Configure a Processor

A processor is an encapsulated piece of work to be performed on content. Maybe you need to rename a field, make sure all names are upper case, or write your own script to add custom business logic. In any case, you can configure processors as follows:

[
  {
    "bulkSize": 300,
    "servers": [
      {
        "hostname": "localhost",
        "port": 9200
      }
    ],
    "name": "Persist to Search Engine",
    "index": "${index}",
    "type": "elasticsearch-hydrator"
  }
]

The "type" field above references the "elasticsearch-hydrator" processor, which takes content and sends it to Elasticsearch efficiently.

Several fields, such as "bulkSize", "index" (resolved from the seed's global properties) and "servers", are specific to this type of processor. Other processors take different configuration fields; see their respective User Guides for details on how to configure each one.
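To illustrate what "efficiently" typically means here, the sketch below resolves "${index}" from the seed properties and groups records into batches of "bulkSize" documents, each serialized as one Elasticsearch _bulk request body (newline-delimited JSON). It is an assumption about how such a hydrator could work, not the elasticsearch-hydrator source: Python's string.Template stands in for the platform's placeholder substitution, and the sample records and their "id" field are made up.

```python
import json
from string import Template
from typing import Dict, Iterable, Iterator, List

# Seed-level properties, copied from the seed example.
seed_properties = {"index": "random_generated_docs"}

# Processor settings from the example. Resolving "${index}" with string.Template is an
# assumption about how the platform substitutes seed properties.
processor = {"bulkSize": 300, "index": "${index}"}
index_name = Template(processor["index"]).substitute(seed_properties)


def chunks(records: List[Dict], size: int) -> Iterator[List[Dict]]:
    """Split records into consecutive batches of at most `size` items."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def bulk_body(batch: Iterable[Dict], index: str) -> str:
    """Build an Elasticsearch _bulk request body (NDJSON) that indexes each record."""
    lines = []
    for record in batch:
        # Action line, then the document source on the following line.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(record["id"])}}))
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"  # the _bulk API requires a trailing newline


# Hypothetical sample records, one per record the example seed generates.
records = [{"id": i, "text": f"document {i}"} for i in range(1000)]
batches = list(chunks(records, processor["bulkSize"]))
print(index_name, [len(b) for b in batches])
# → random_generated_docs [300, 300, 300, 100]
```

Each body would be sent with POST to http://localhost:9200/_bulk using the Content-Type application/x-ndjson. With the seed's 1000 records and a bulkSize of 300, this yields four requests instead of 1000 individual ones.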

Start an Execution

Now that we have a sample seed, pipeline and processor configured, we can start an execution.

curl -X POST "http://ingestion-api/seed/${id}?scanType=FULL"

Assuming your Ingestion Admin API is reachable at the DNS name "ingestion-api" and listens on port 80, the cURL request above starts an execution of the seed, where ${id} is the ID of the seed configured earlier.
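If you prefer to script the call, the sketch below builds the same POST request with Python's standard library. The base URL comes from the example above; the INGESTION_API constant and the helper names are hypothetical, and only the endpoint path and the scanType=FULL parameter are taken from the cURL command.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

# Base URL from the guide; adjust host and port to match your deployment.
INGESTION_API = "http://ingestion-api"


def start_execution_request(seed_id: str, scan_type: str = "FULL") -> Request:
    """Build the POST request that starts an execution of the given seed."""
    query = urlencode({"scanType": scan_type})
    url = f"{INGESTION_API}/seed/{quote(seed_id)}?{query}"
    return Request(url, method="POST")


def start_execution(seed_id: str) -> int:
    """Send the request and return the HTTP status code of the response."""
    with urlopen(start_execution_request(seed_id)) as response:
        return response.status
```

Keeping request construction separate from sending makes it easy to inspect the URL before calling start_execution against a live environment.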

Query Content

Once the seed has finished successfully, you can go to the Discovery API and query for content.

The easiest way to do this is to use the Demo Search UI, available at http://demoui/.
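If the Demo Search UI shows no results, one way to check whether the hydrator wrote anything is to ask Elasticsearch directly through its _count API, bypassing the Discovery API. The sketch assumes Elasticsearch is reachable at localhost:9200 and uses the random_generated_docs index, both taken from the processor and seed examples; the COUNT_URL constant and helper names are hypothetical.

```python
import json
from urllib.request import urlopen

# Host, port and index taken from the processor and seed examples.
COUNT_URL = "http://localhost:9200/random_generated_docs/_count"


def parse_count(payload: bytes) -> int:
    """Extract the document count from an Elasticsearch _count response body."""
    return int(json.loads(payload)["count"])


def indexed_documents(url: str = COUNT_URL) -> int:
    """Ask Elasticsearch how many documents the index currently holds."""
    with urlopen(url) as response:
        return parse_count(response.read())
```

Assuming the index held nothing beforehand, a successful run of the example seed should eventually report 1000 documents, matching its "records" setting.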
