4. Ingestion Connectors

Connectors allow Discovery to retrieve data from a given source before content processing. A connector first runs in scan mode to detect changes to the known set of records. This lets the framework track every added, updated, and deleted document while using resources efficiently. Once the scan completes, the records are processed: documents are handled in batches by each “ingestion processor” in the pipeline associated with the content source. Failed documents or batches are retried automatically to maximize completeness.
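The scan-then-process flow described above can be illustrated with a minimal Python sketch. It is not Discovery's actual implementation or API: the function names (`fingerprint`, `scan`, `batches`, `process_with_retry`), the use of content hashes to detect updates, and the fixed retry count are all assumptions made for illustration.

```python
import hashlib
from typing import Callable, Dict, Iterator, List


def fingerprint(content: str) -> str:
    """Hash a document body so changes can be detected cheaply (assumed strategy)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def scan(known: Dict[str, str], source: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare the source against known fingerprints and classify each record.

    `known` maps document id -> fingerprint from the previous scan.
    `source` maps document id -> current document body.
    """
    current = {doc_id: fingerprint(body) for doc_id, body in source.items()}
    added = sorted(i for i in current if i not in known)
    updated = sorted(i for i in current if i in known and current[i] != known[i])
    deleted = sorted(i for i in known if i not in current)
    return {"added": added, "updated": updated, "deleted": deleted}


def batches(items: List[str], size: int) -> Iterator[List[str]]:
    """Split the records to process into fixed-size batches."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_with_retry(
    batch: List[str],
    processor: Callable[[List[str]], None],
    max_attempts: int = 3,
) -> bool:
    """Run a hypothetical ingestion processor on a batch, retrying on failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            processor(batch)
            return True
        except Exception:
            if attempt == max_attempts:
                return False
    return False


if __name__ == "__main__":
    known = {"a": fingerprint("v1"), "b": fingerprint("v1"), "c": fingerprint("v1")}
    source = {"a": "v1", "b": "v2", "d": "v1"}
    changes = scan(known, source)
    to_process = changes["added"] + changes["updated"]
    for batch in batches(to_process, size=2):
        ok = process_with_retry(batch, processor=lambda b: None)
        print(batch, "ok" if ok else "failed")
```

In this sketch the scan only computes fingerprints and compares them, so unchanged records never reach the processors; that is one plausible way to keep scan mode inexpensive.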

Discovery currently provides connectors for the content sources listed below. The list will grow with future versions of the software, and we expect to support ingestion from the most popular data sources in the most common formats. If a connector is not yet available for a given project, a custom connector can be developed as a service engagement using Discovery’s connector framework.

The current connectors for Discovery represent a wide variety of sources, including: 

In addition, Discovery has special connectors for development purposes: 

  • Random Generator Connector – to create random data for scalability and performance testing purposes 

  • Apache Solr Connector – available on customer request

Connectors for ingesting data from existing search engine indices are useful when migrating from one search engine to another, or for enriching an existing index without recreating it from scratch.
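To illustrate the idea behind the Random Generator Connector mentioned above, the following Python sketch produces random records suitable for load testing. It does not reflect how Discovery's connector is built: the `random_records` function, its parameters, and the record fields (`id`, `body`) are hypothetical.

```python
import random
import string
from typing import Dict, Iterator


def random_records(
    count: int, seed: int = 0, body_words: int = 20
) -> Iterator[Dict[str, str]]:
    """Yield `count` synthetic records with random lowercase words as the body.

    A fixed seed makes a test run reproducible (an assumption of this sketch).
    """
    rng = random.Random(seed)
    for index in range(count):
        words = [
            "".join(rng.choices(string.ascii_lowercase, k=rng.randint(3, 10)))
            for _ in range(body_words)
        ]
        yield {"id": f"doc-{index}", "body": " ".join(words)}


if __name__ == "__main__":
    for record in random_records(3, seed=42, body_words=5):
        print(record["id"], record["body"])
```

Because the generator is lazy, a large `count` can be streamed into a pipeline without holding every record in memory.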

©2024 Pureinsights Technology Corporation