Connectors allow Discovery to retrieve data from a given source prior to content processing. A connector first runs in scan mode to detect changes to the known set of records. Scanning lets the framework track every added, updated and deleted document that needs processing, and it uses resources efficiently while doing so. Once scanned, records are processed: documents or records are examined in batches by any “ingestion processor” in the pipeline associated with the content source. Failed documents or batches of documents are retried automatically to maximize completeness.
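
The scan-then-process flow described above can be illustrated with a minimal Python sketch. It is not Discovery's connector framework API: the function names (`scan`, `batches`, `process`), the use of a per-record fingerprint to detect updates, and the retry limit are all assumptions made for illustration only.

```python
from typing import Callable, Dict, Iterator, List, Tuple


def scan(known: Dict[str, str], source: Dict[str, str]) -> Dict[str, List[str]]:
    """Hypothetical scan step: compare record fingerprints (ID -> hash or
    timestamp) seen on the previous run with those in the source now."""
    added = sorted(rid for rid in source if rid not in known)
    updated = sorted(
        rid for rid in source if rid in known and source[rid] != known[rid]
    )
    deleted = sorted(rid for rid in known if rid not in source)
    return {"added": added, "updated": updated, "deleted": deleted}


def batches(ids: List[str], size: int) -> Iterator[List[str]]:
    """Split the list of record IDs into fixed-size batches."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def process(
    ids: List[str],
    processor: Callable[[List[str]], None],
    batch_size: int = 2,
    max_attempts: int = 3,  # assumed retry limit, not a documented setting
) -> Tuple[List[str], List[str]]:
    """Run a processor over each batch, retrying a failed batch until it
    succeeds or the attempt limit is reached. Returns (done, failed)."""
    done: List[str] = []
    failed: List[str] = []
    for batch in batches(ids, batch_size):
        for attempt in range(1, max_attempts + 1):
            try:
                processor(batch)
                done.extend(batch)
                break
            except Exception:
                if attempt == max_attempts:
                    failed.extend(batch)
    return done, failed
```

In this sketch, only the IDs reported by `scan` as added or updated would be handed to `process`, which is one way a scan phase can avoid reprocessing unchanged records.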

Discovery currently has connectors to the content sources listed below. The list will continue to grow with future versions of the software, and we expect to be able to support ingestion from all of the most popular data sources in all the most common formats. If a connector is not yet available for a given project, a custom connector can be easily developed as a service engagement using Discovery’s connector framework. 

The current connectors for Discovery represent a wide variety of sources, including: 

  • MongoDB Atlas Connector – data from an existing MongoDB Atlas database 

  • URL Connector – to download content from specific URLs 

  • Website Connector – to crawl a website or group of websites based on a starting URL 

  • Azure Blob Connector – data from Microsoft Azure Blob Storage

  • RDBMS Connector – data from relational databases (via JDBC) 

  • S3 Connector – data from Amazon S3 

  • Udemy Connector – data from the Udemy online course directory

  • Elasticsearch Connector – data from any index or indexes stored in Elasticsearch 

  • OpenSearch Connector – data from an existing OpenSearch index 

  • LinkedIn Learning Connector – via a third party

In addition, Discovery has special connectors for development purposes: 

  • Random Generator Connector – to create random data for scalability and performance testing purposes 

  • Apache Solr Connector – on customer request 

Connectors that ingest data from existing search engine indexes are useful when migrating from one search engine to another, or when enriching an existing index without recreating it from scratch.
