Ingestion Framework

The Ingestion Framework is a custom ETL (Extract, Transform, Load) implementation that ingests and processes content from different data sources so that the content can be made available through the Discovery API.

Terms

Record

The basic unit of content. A record usually represents a single item, such as one document, one user, or one row in a table.

Seed

The origin of records. A seed can draw records from a relational database, a website, or Amazon S3, among other sources.

Pipeline

A finite state machine that defines the order in which processing steps are executed on records.

Processor

A unit of work performed on a record.

Cronjob

Defines a schedule to run a given list of seeds.

Job

A batch of records, and the unit of work in the PDP.
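
To make the relationship between these terms more concrete, the following is a minimal Python sketch, not the framework's actual API. Every name in it (Record, State, Pipeline, seed, lowercase_title, add_length) is hypothetical and chosen only for illustration. It assumes a processor can be modeled as a function from record to record, and that a pipeline is a simple state machine in which each state runs one processor and names the next state.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Optional


@dataclass
class Record:
    """Basic unit of content, e.g. one document or one table row."""
    id: str
    fields: Dict[str, str] = field(default_factory=dict)


# Assumption: a processor is a function that takes a record and returns it.
Processor = Callable[[Record], Record]


@dataclass
class State:
    """One pipeline state: the processor to run and the state that follows."""
    processor: Processor
    next_state: Optional[str]  # None marks the final state


class Pipeline:
    """Finite state machine that fixes the order of processing steps."""

    def __init__(self, states: Dict[str, State], initial: str) -> None:
        self.states = states
        self.initial = initial

    def run(self, record: Record) -> Record:
        current: Optional[str] = self.initial
        while current is not None:
            state = self.states[current]
            record = state.processor(record)
            current = state.next_state
        return record


def lowercase_title(record: Record) -> Record:
    """Hypothetical processor: normalize the title to lowercase."""
    record.fields["title"] = record.fields.get("title", "").lower()
    return record


def add_length(record: Record) -> Record:
    """Hypothetical processor: store the title length as a new field."""
    record.fields["title_length"] = str(len(record.fields.get("title", "")))
    return record


def seed() -> Iterator[Record]:
    """Hypothetical seed: an in-memory list stands in for a database or S3."""
    rows = [("1", "Hello World"), ("2", "ETL Basics")]
    for record_id, title in rows:
        yield Record(id=record_id, fields={"title": title})


pipeline = Pipeline(
    states={
        "normalize": State(lowercase_title, next_state="measure"),
        "measure": State(add_length, next_state=None),
    },
    initial="normalize",
)

# A job, in this sketch, is simply the batch of records processed together.
job: List[Record] = [pipeline.run(record) for record in seed()]
```

The state machine here is strictly linear, which is the simplest case. A fuller model could pick the next state from the record's contents, and a cronjob would trigger the seed on a schedule rather than running it inline.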