...
...
...
...
Ingestion Framework
Discovery was designed with the cloud in mind: easy to scale and manage, and easy to integrate with other cloud services. Discovery’s content processing architecture includes connectors that scan and process content from diverse sources during ingestion. Discovery also includes the messaging, pipeline management, orchestration, traceability, and publishing services needed to manage large volumes of data. Containerized Kubernetes pods build on the latest cloud platform technologies and enable Discovery to scale up or down easily as processing workloads change.
[Image here]
The Ingestion Framework is a custom ETL (Extract, Transform, Load) implementation that facilitates the ingestion, cleansing, normalization, augmentation and digital stitching of content from different data sources so this content can be made available through the Discovery API for use in search applications.
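The transform portion of such an ETL flow can be pictured as a chain of small stages applied to each record. The following is a minimal sketch only, not Discovery's implementation: the stage functions (`cleanse`, `normalize`, `augment`), the `run_pipeline` helper, and the record fields are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical record shape: a flat mapping of field names to string values.
Record = Dict[str, str]
Stage = Callable[[Record], Record]


def cleanse(record: Record) -> Record:
    # Strip surrounding whitespace from every value.
    return {key: value.strip() for key, value in record.items()}


def normalize(record: Record) -> Record:
    # Lower-case field names so records from different sources line up.
    return {key.lower(): value for key, value in record.items()}


def augment(record: Record) -> Record:
    # Add a derived field (assumed example: the length of the title).
    enriched = dict(record)
    enriched["title_length"] = str(len(record.get("title", "")))
    return enriched


def run_pipeline(records: Iterable[Record], stages: List[Stage]) -> List[Record]:
    # Apply every stage, in order, to every record.
    output = []
    for record in records:
        for stage in stages:
            record = stage(record)
        output.append(record)
    return output


raw = [{"Title": "  Quarterly Report ", "Source": "file-share"}]
processed = run_pipeline(raw, [cleanse, normalize, augment])
print(processed)
```

Keeping each stage as a plain function that takes and returns a record makes it easy to reorder or swap stages per data source, which is the general idea behind configurable pipelines.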
...
Core Components
Admin API
Workflow Manager and Orchestrator
Pipeline Manager
Core Components
The core components of Discovery provide the foundational functions for communication, configuration, and administration of the platform. These include:
Binary data server – intermediate storage used to manage the processing of large files before they are uploaded to the staging repository.
Asynchronous message delivery – handled by RabbitMQ, an open-source, high-performance message broker.
Distributed configuration store – pipeline configurations are kept in a distributed data store for performance, failover, and fault tolerance.
Distributed traceability store – tracks and stores all actions in Discovery for detailed visibility and analytics.
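To illustrate how asynchronous delivery and traceability fit together, here is a small in-process sketch. It is an assumption-laden stand-in: Python's `queue.Queue` plays the role of the message broker (it is not RabbitMQ), a plain list plays the role of the traceability store, and the message fields and the `consume` function are hypothetical.

```python
import queue
import threading
from typing import Dict, List


def consume(inbox: queue.Queue, trace: List[Dict[str, str]]) -> None:
    # Pull messages until the sentinel (None) arrives, recording each action.
    while True:
        message = inbox.get()
        if message is None:
            inbox.task_done()
            break
        # Stand-in for writing to a traceability store.
        trace.append({"action": "processed", "document": message["document"]})
        inbox.task_done()


inbox: queue.Queue = queue.Queue()  # stand-in for a broker queue
trace: List[Dict[str, str]] = []    # stand-in for the traceability store

consumer = threading.Thread(target=consume, args=(inbox, trace))
consumer.start()

# The producer returns immediately after each put; processing happens
# on the consumer thread, decoupled from the producer.
for name in ["a.pdf", "b.docx"]:
    inbox.put({"document": name})
inbox.put(None)  # tell the consumer to stop

inbox.join()     # wait until every message has been handled
consumer.join()
print(trace)
```

With a real broker the producer and consumer would run as separate services, but the decoupling shown here is the same: the sender does not wait for processing, and every handled message leaves a trace entry.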
Admin API
A RESTful JSON API that provides configuration of, and complete control over, all Discovery features. Through this API, you can create ingestion entities (pipelines, processors, seeds, etc.) and start, stop, and schedule ingestion processes.
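As a rough sketch of what calling a RESTful JSON API of this kind looks like, the example below builds (but does not send) a request using Python's standard library. The base URL, the `/pipeline` path, and the payload fields are all hypothetical placeholders, not the documented Admin API; consult the API reference for the real endpoints and schemas.

```python
import json
import urllib.request

# Hypothetical values for illustration only; not the real Admin API.
ADMIN_BASE_URL = "http://localhost:8080"
PIPELINE_PATH = "/pipeline"


def build_create_request(path: str, payload: dict) -> urllib.request.Request:
    # Serialize the payload as JSON and prepare a POST request.
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=ADMIN_BASE_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed payload shape: a named pipeline with no further settings.
request = build_create_request(PIPELINE_PATH, {"name": "example-pipeline"})

print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# Sending is left out on purpose; against a live server it would be:
# with urllib.request.urlopen(request) as response:
#     created = json.load(response)
```

Separating request construction from sending keeps the example runnable without a server and makes the JSON body easy to inspect before it goes out.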
...