Version 1.8.0

Release Notes

Release Date: June 6th, 2023

What’s New

Our Binary Data Storage Service is now compatible with Google Cloud Storage. We have also added a new feature to the Discovery API called Semantic Title Search, which retrieves results whose titles are semantically similar to the search query. It can be configured to highlight either the exact phrase or the individual matching words. If you have any questions or suggestions, please reach out to our development team. Read on for all new features, bug fixes, and improvements.

New Features

Binary Data Storage

  • The Binary Data Storage service now supports Google Cloud Storage (GCP).

Semantic Title Search

  • When enabled, documents whose titles are semantically similar to the search query are displayed in a carousel in the Search UI.

  • Additional possible configurations:

    • Changing the title for the section.

    • The number of results to be displayed in the carousel.

    • Highlighting matching words or phrases.
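The difference between highlighting the exact phrase and highlighting individual matching words can be illustrated with a short sketch. This is not the Discovery implementation or its configuration format; the function name `highlight`, the `mode` values, and the `<em>` tag are all hypothetical and chosen only for illustration.

```python
import re


def highlight(title, query, mode="words", tag="em"):
    """Wrap matches of `query` found in `title` with <tag>...</tag>.

    mode="phrase": highlight the query only where it appears as one
                   contiguous phrase.
    mode="words":  highlight each query word wherever it appears.
    Both modes are hypothetical names used for this sketch.
    """
    if mode == "phrase":
        terms = [query.strip()]
    elif mode == "words":
        terms = query.split()
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    terms = [t for t in terms if t]
    if not terms:
        return title

    # Try longer terms first so they take precedence over shorter ones.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = "|".join(re.escape(t) for t in ordered)
    regex = re.compile(rf"\b(?:{pattern})\b", re.IGNORECASE)
    return regex.sub(lambda m: f"<{tag}>{m.group(0)}</{tag}>", title)


if __name__ == "__main__":
    print(highlight("Cloud Storage Setup Guide", "storage guide", mode="words"))
    print(highlight("Cloud Storage Setup Guide", "storage guide", mode="phrase"))
```

Because the search is semantic, a returned title may contain none of the query words, in which case either mode leaves the title unchanged.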

Bugs Fixed

  • Remove duplicate log message on progress from the UDEMY connector.

  • Staging Repository renders content type incorrectly in Swagger UI.

  • Documents that are too large for Mongo cause the job to fail.

  • Fixed memory leak in the Tika component.

  • Website Connector can't handle URLs with foreign characters.

  • Request "Get all" is not sorting the entities correctly.

  • The Discovery API does not return a “No Content” status when no entities exist.

  • Core Admin API k8s pod restarts due to Micronaut and Logback after v1.6.0.

Improvements

  • The website connector uses a binary data server.

  • Ensure generated checksums are consistent when an array of strings is passed.

  • Validate and update the usage of "workDir" and MongoDB in the WebsiteConnector.

  • Update the Elasticsearch hydrator example configuration in the README.

  • Update the Docker base image for all components.

  • Remove K8s dependency.

Breaking changes

  • If upgrading from a version prior to 1.7.0, export the existing configuration, delete the configuration indices (ingestion: seed, processor, and pipeline; discovery: endpoint and processor), and re-import the configuration after the upgrade has been performed.
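As a rough aid for upgrade planning, the condition in the breaking change above can be expressed as a version comparison. The sketch below is an assumption-laden illustration: the function names are hypothetical, and the entries in `CONFIG_INDICES` simply repeat the labels from the note above rather than actual index names, which are not given here.

```python
def parse_version(version):
    """Turn a string such as '1.6.0' or 'v1.7' into a comparable tuple."""
    parts = [int(p) for p in version.strip().lstrip("vV").split(".")]
    # Pad to three components so that '1.7' compares equal to '1.7.0'.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def needs_config_reimport(current_version, threshold="1.7.0"):
    """Return True when upgrading from a version prior to `threshold`."""
    return parse_version(current_version) < parse_version(threshold)


# Labels copied from the release note; NOT the literal index names.
CONFIG_INDICES = {
    "ingestion": ["seed", "processor", "pipeline"],
    "discovery": ["endpoint", "processor"],
}


if __name__ == "__main__":
    for installed in ("1.5.0", "1.6.0", "1.7.0"):
        if needs_config_reimport(installed):
            print(f"{installed}: export, delete indices, upgrade, re-import")
        else:
            print(f"{installed}: no configuration re-import required")
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating "1.10.0" as older than "1.7.0".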

Others

  • None

Supported Versions

Discovery provides support and bug fixes for the following versions:

  • Version 1.7.0

    • Close record collector in AbstractRequestActionExecutor for errored jobs

    • Incorrect log message from Staging Hydrator

    • Error importing zip

    • [Discovery API] Feedback Atlas default template

    • [Ingestion] Possible memory leakage in Elasticsearch connector

    • ES Connector: Checksum is not hashed before adding a record.

    • Search UI showing more search result pages than required

    • Job in the wrong state should be retried

  • Version 1.6.0

    • Pipeline configuration should not allow null action

    • Ingestion Admin should allow deep cloning multiple times

    • Single seed schedules should not be enqueued if the same seed is already running

    • Discovery API - Endpoints should handle an empty body request

    • Discovery API - Mongo Component should store the response body as JsonNode

    • Discovery API - Snap Component not casting error message when facets field points to an array

    • Breaking changes:

      • When configuring the S3 connector, the region must be in the format that AWS requires.

        • e.g. “us-east-1” instead of “US_EAST_1”

  • Version 1.5.0 (If you are on this version, plan to upgrade soon)

    • Internal server should not error when adding item with no "body" to Staging repo

    • JsonUtils cannot substitute value properties of intNode objects when they are inside an array
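For the Version 1.6.0 breaking change on the S3 connector region, existing configurations that still use the enum-style spelling can be converted mechanically. The helper below is a minimal sketch, not part of the product: `normalize_region` is a hypothetical name, and the validation pattern is an assumption about what typical AWS region identifiers look like rather than an official rule.

```python
import re

# Assumed shape of a region identifier, e.g. "us-east-1" or "us-gov-west-1".
_REGION_PATTERN = re.compile(r"^[a-z]{2,}(-[a-z]+)+-\d+$")


def normalize_region(region):
    """Convert a value such as 'US_EAST_1' to the 'us-east-1' form."""
    candidate = region.strip().strip('"').lower().replace("_", "-")
    if not _REGION_PATTERN.match(candidate):
        raise ValueError(f"not a recognizable region identifier: {region!r}")
    return candidate


if __name__ == "__main__":
    print(normalize_region("US_EAST_1"))
    print(normalize_region("us-east-1"))
```

Values already in the required format pass through unchanged, so the helper can be applied to every stored region before re-importing a configuration.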

Deprecated Versions

Releases that are no longer recommended for use and their deprecation dates are listed below.

  • Version 1.4.0 (June 2023)

    • Environment variables are not substituted in processors if there are no seed properties

    • Credentials Service cannot deserialize value '_id' from cache

    • Distilbert by HuggingFace Service causes node to reset

  • Version 1.3.0 (April 2023)

    • Discovery API’s Post Component was failing to parse the response from Elasticsearch

    • Prevent creation of seeds with duplicate name

    • Mongo credential source should default to “admin”

    • Failed to encode 'Credential'. Encoding '_id' errored with: Can't find a codec for class java.lang.Object

    • Import fails with unhelpful error message

    • Aggregation merge should support empty values and fields other than just content

  • Version 1.2.0 (March 2023)

    • SSL error when using UDEMY connector has been fixed.

    • Disk cache was not getting cleared when a crawl fails.

©2024 Pureinsights Technology Corporation