Version 1.16.0
Release Notes
Release Date: December 4th, 2024
What’s New
This version introduces two new components. The Parquet File component extracts and processes Parquet files, outputting each row as a new record so that rows can be processed individually through the pipeline. The Thumbnail Generator processor creates thumbnail images for websites and PDF files. Improvements to the Website Connector, S3 Connector, and OpenAI Ingestion processor give users more processing options for their pipelines. Finally, all Ingestion Content Processors can now set an output batch size, which gives greater flexibility when handling large numbers of records across a pipeline.
Feel free to contact our development team with any questions, suggestions, or feedback. We value your input and are committed to continuously improving our services.
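The row-per-record behavior and the output batch size option described above can be pictured with a minimal Python sketch. It is not Discovery's implementation: the function names, the record shape (`id` and `fields`), and the file name `sample.parquet` are hypothetical, and plain dictionaries stand in for rows that would normally be decoded from a Parquet file.

```python
from itertools import islice
from typing import Iterable, Iterator


def rows_to_records(rows: Iterable[dict], source: str) -> Iterator[dict]:
    """Wrap each row in its own record so it can be processed individually.

    The record shape used here is an assumption made for illustration.
    """
    for index, row in enumerate(rows):
        yield {"id": f"{source}#{index}", "fields": row}


def in_batches(records: Iterable[dict], batch_size: int) -> Iterator[list]:
    """Group a stream of records into lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # Stand-in for rows read from a Parquet file.
    rows = [{"n": i} for i in range(5)]
    for batch in in_batches(rows_to_records(rows, "sample.parquet"), 2):
        print([record["id"] for record in batch])
```

A smaller batch size keeps each downstream step's working set small, while a larger one reduces per-batch overhead; which trade-off is right depends on the pipeline.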
New Features
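Components | New Features |
---|---|
S3 Connector | |
Ingestion Processors | Option to set an output batch size. |
Website Connector | |
Parquet File Processor | Extracts and processes Parquet files, outputting each row as a new record. |
Thumbnail Generator | Creates thumbnail images for websites and PDF files. |
OpenAI Processor | |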
Bugs Fixed
"States" Endpoint deep clone alters original endpoint processors.
Improvements
URL Connector downloads dynamically created content (JavaScript) from HTML pages.
Staging Repository support for MongoDB Atlas as provider.
RDB Connectors set Seed batch size as driver fetch size.
Elasticsearch hydrator supports partial updates.
Discovery API OpenAI component now has temperature and top_p options available.
MongoDB Hydrator now supports multi-document path.
Published metrics include new tags (environment and host name) for better identification.
Azure Blobs connector connection improvements.
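As background for the temperature and top_p options listed above, the following Python sketch shows how these two sampling parameters are commonly defined for language models. It is a toy illustration over a hand-written list of scores, not the Discovery component or the OpenAI implementation, and the function names are hypothetical.

```python
import math


def apply_temperature(logits: list, temperature: float) -> list:
    """Turn raw scores into probabilities, scaled by temperature.

    Lower temperatures sharpen the distribution; higher ones flatten it.
    This sketch requires temperature > 0.
    """
    if temperature <= 0:
        raise ValueError("temperature must be greater than 0 in this sketch")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def top_p_filter(probs: list, top_p: float) -> dict:
    """Keep the smallest set of most likely tokens whose mass reaches top_p.

    Returns a mapping from token index to renormalized probability.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = []
    cumulative = 0.0
    for index in order:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


if __name__ == "__main__":
    probs = apply_temperature([2.0, 1.0, 0.5], temperature=0.7)
    print(top_p_filter(probs, top_p=0.9))
```

In practice, temperature controls how strongly the most likely tokens are favored, while top_p limits sampling to the most probable tokens; adjusting one of the two at a time is a common recommendation.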
Breaking changes
None.
Others
Discovery 1.16.0 was tested in EKS v1.31 with AWS.
Supported versions: 1.29, 1.30 and 1.31.
Supported Versions
Discovery provides support and bug fixing for the following versions:
Version 1.15.0
Website Connector
Option to add custom Norconex Importer configurations.
Support for Forms authentication.
Option to set maximum documents per crawl URL.
HTML Processor
Option to extract multiple selector matches into a single string or an array.
Staging Hydrator
Hydrate record id is now configurable.
Version 1.14.0
Elasticsearch connector option to crawl an aggregation.
Language Detector option to have multiple source fields.
Version 1.13.0
Discovery 1.12.0 was tested in AKS v1.28 with Azure CNI.
Discovery 1.11.0 was tested in GKE v1.28.4 with GCP.
Deprecated Versions
Releases that are no longer recommended for use and their deprecation dates are listed below.
Version 1.12.0 (Dec 2024)
If upgrading from a version prior to 1.7.0, export the existing configuration, delete the configuration indices (ingestion: seed, processor, and pipeline / discovery: endpoint and processor), and re-import the configuration after the upgrade.
Discovery 1.12.0 was tested in AKS v1.28 with Azure CNI.
Version 1.11.0 (May 2024)
If upgrading from a version prior to 1.7.0, export the existing configuration, delete the configuration indices (ingestion: seed, processor, and pipeline / discovery: endpoint and processor), and re-import the configuration after the upgrade.
Trim UUIDs on API
Discovery 1.11.0 was tested in GKE v1.28.4 with GCP.
Version 1.10.0 (February 2024)
If upgrading from a version prior to 1.7.0, export the existing configuration, delete the configuration indices (ingestion: seed, processor, and pipeline / discovery: endpoint and processor), and re-import the configuration after the upgrade.
JSON exception when script processor compilation fails
BatchId is stored as null
OCR component records don't continue in the pipeline
Internal server error importing a zip file
[Search UI] Queries with special regex characters cause the page to fail
Scheduler entity is ignoring some properties from the configuration
Website Connector requires a mountpoint to work in k8s
Hugging Face service creates outdated file structure
©2024 Pureinsights Technology Corporation