Version 1.16.0

Release Notes

Release Date: December 4th, 2024

What’s New

This version introduces two new components. The Parquet File processor extracts Parquet files and outputs each row as a new record, so that rows can be processed individually through the pipeline. The Thumbnail Generator processor creates thumbnail images for websites and PDF files. The Website Connector, S3 Connector, and OpenAI Ingestion processor have also been improved to give users more processing options for their pipelines. Finally, all Ingestion content processors now have an option to set an output batch size, which gives greater flexibility when handling large numbers of records across a pipeline.

Feel free to contact our development team with any questions, suggestions, or feedback. We value your input and are committed to continuously improving our services.

New Features

Components

S3 Connector

  • New object upload action.

Ingestion Processors

  • All Ingestion content processors now have an option to change the batch size of the output jobs.
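As a rough illustration of what an output batch size means for downstream jobs, the sketch below groups a stream of records into fixed-size batches. It is only a minimal sketch: the `batched` helper and the `batch_size` parameter name are hypothetical and do not reflect Discovery's actual implementation or configuration keys.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `batch_size` records.

    Hypothetical helper for illustration only; not part of Discovery.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        # Pull up to batch_size items without loading the whole stream.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # Five made-up records grouped with a batch size of 2.
    for job in batched(["r1", "r2", "r3", "r4", "r5"], batch_size=2):
        print(job)
```

With a batch size of 2, five records are grouped into two full batches and one final batch holding the remaining record. A smaller batch size generally means more, lighter jobs; a larger one means fewer, heavier jobs.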

Website Connector

Parquet File Processor

  • New Parquet File Ingestion processor that extracts rows from a parquet file and creates a record for each one.

Thumbnail Generator

  • New Thumbnail Generator for Ingestion that is able to create thumbnails for websites and PDF files.

OpenAI Processor

  • New completion action for the OpenAI Ingestion Processor.

Bugs Fixed

  • "States" Endpoint deep clone alters original endpoint processors.

Improvements

  • URL Connector downloads dynamically created (JavaScript) content from HTML pages.

  • Staging Repository support for MongoDB Atlas as a provider.

  • RDB Connectors set Seed batch size as driver fetch size.

  • Elasticsearch hydrator supports partial updates.

  • Discovery API OpenAI component now has temperature and top_p options available.

  • MongoDB Hydrator now supports multi-document path.

  • Published metrics include new tags (environment and host name) for better identification.

  • Azure Blobs connector connection improvements.
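For context on the partial-update item above: in Elasticsearch itself, a partial update is typically sent to the Update API (`POST /<index>/_update/<id>`) with the changed fields wrapped in a `doc` object, and those fields are merged into the existing document. The sketch below only builds such a request without sending it. How the Discovery hydrator issues its updates internally is not described in these notes, so the helper name, base URL, index, and field names are all assumptions made for illustration.

```python
import json
from urllib.request import Request


def build_partial_update(base_url: str, index: str, doc_id: str, fields: dict) -> Request:
    """Build (but do not send) an Elasticsearch Update API request.

    Hypothetical helper for illustration; not Discovery's own code.
    Only the keys in `fields` are changed; other document fields are kept.
    """
    body = json.dumps({"doc": fields}).encode("utf-8")
    return Request(
        url=f"{base_url.rstrip('/')}/{index}/_update/{doc_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Assumed local cluster URL, index name, and field; adjust as needed.
    req = build_partial_update(
        "http://localhost:9200", "articles", "42", {"language": "en"}
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

The practical difference from a full index operation is that a partial update leaves fields that are not mentioned in the request untouched, rather than replacing the whole document.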

Breaking changes

  • None.

Others

  • Discovery 1.16.0 was tested in EKS v1.31 with AWS.

    • Supported versions: 1.29, 1.30 and 1.31.

Supported Versions

Discovery provides support and bug fixing for the following versions:

  • Version 1.15.0

    • Website Connector

      • Option to add custom Norconex Importer configurations

      • Support for Forms authentication.

      • Option to set maximum documents per crawl URL.

    • HTML Processor

      • Option to extract multiple selector matches into a single string or an array.

    • Staging Hydrator

      • Hydrate record id is now configurable.

  • Version 1.14.0

    • Elasticsearch connector option to crawl an aggregation.

    • Language Detector option to have multiple source fields.

  • Version 1.13.0

    • Discovery 1.12.0 was tested in AKS v1.28 with Azure CNI.

    • Discovery 1.11.0 was tested in GKE v1.28.4 with GCP.

Deprecated Versions

Releases that are no longer recommended for use and their deprecation dates are listed below.

  • Version 1.12.0 (Dec 2024)

    • If upgrading from a version prior to 1.7.0, export the existing configuration, delete the configuration indices (ingestion: seed, processor, and pipeline / discovery: endpoint and processor), and re-import it after the upgrade has been performed.

    • Discovery 1.12.0 was tested in AKS v1.28 with Azure CNI.

  • Version 1.11.0 (May 2024)

    • If upgrading from a version prior to 1.7.0, export the existing configuration, delete the configuration indices (ingestion: seed, processor, and pipeline / discovery: endpoint and processor), and re-import it after the upgrade has been performed.

    • Trim UUIDs on API

    • Discovery 1.11.0 was tested in GKE v1.28.4 with GCP.

  • Version 1.10.0 (February 2024)

    • If upgrading from a version prior to 1.7.0, export the existing configuration, delete the configuration indices (ingestion: seed, processor, and pipeline / discovery: endpoint and processor), and re-import it after the upgrade has been performed.

    • JSON exception when script processor compilation fails

    • BatchId is stored as null

    • OCR component records don't continue in the pipeline

    • Internal server error importing a zip file

    • [Search UI] Queries with special regex characters cause the page to fail

    • Scheduler entity is ignoring some properties from the configuration

    • Website Connector requires a mountpoint to work in k8s

    • Hugging Face service creates outdated file structure

©2024 Pureinsights Technology Corporation