6. AI Services
Large Language Models
An embedding is the representation of a word or passage of text as a vector of floating-point numbers. The techniques for generating embeddings have evolved rapidly in the two decades since they were introduced, with Word2Vec and GloVe coming to the fore around ten years ago, at roughly the same time that pre-trained embedding models became widely available. But the real potential became clear when Google open-sourced its BERT model in 2018, arguably the first Large Language Model (LLM). In the six years since, there has been an explosion of activity, with around 50,000 NLP models now available on the popular Hugging Face repository. Many of these are trained or fine-tuned for a specific language domain, such as finance, pharmacology or toxic-language detection, so that they perform better on the task to which they are applied.
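To make the idea concrete, the sketch below shows how a piece of text might be turned into an embedding. It assumes the open-source sentence-transformers library and the all-MiniLM-L6-v2 model; these are illustrative choices rather than a recommendation, and any embedding model could be substituted.

```python
# A minimal sketch of generating embeddings (assumes the sentence-transformers
# library and the "all-MiniLM-L6-v2" model, both illustrative choices).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # downloads the model on first use

texts = [
    "Vector search complements traditional keyword search.",
    "Embeddings represent text as floating-point vectors.",
]
embeddings = model.encode(texts)  # one vector per input text

print(embeddings.shape)   # e.g. (2, 384): 384 floating-point numbers per text
print(embeddings[0][:5])  # first few components of the first vector
```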
Now that extremely powerful computing is widely accessible, it is possible to build language models with ever more data and parameters, and so the models are getting larger and larger, as illustrated below:
In fact, in the last 12 months, with the launch of ChatGPT and, more recently, GPT-3.5 Turbo and GPT-4, the power of these models has reached the mainstream and the popular imagination, and now every business we speak to is interested in how they might benefit. Any consideration of a new search system will have to consider how it will use Large Language Models. This can be a difficult proposition, because the dust has not yet settled and it is not yet clear which capabilities will realise the most productivity gains, or which will be most popular and see the highest uptake with users. As a general rule of thumb, if Google or Microsoft are doing it, then it will have been A/B tested heavily and the majority of users will be familiar with it.
Vector Search
Alongside the rise of LLMs there has been an effort to make use of vectors in search, with Elasticsearch, OpenSearch, MongoDB and now Solr all providing a vector field type. This allows a similarity search, typically an (approximate) k-nearest-neighbours search using a distance measure such as cosine similarity or Euclidean distance. This is all still relatively new, however, and so there are some shortcomings that are still being ironed out:
Size limitations: There is a limit to the number of input tokens a model can accept. This varies per model but is usually somewhere between 500 and 8,000 tokens, although GPT-4 can optionally accept a 32k-token prompt.
Pricing: Using an API-based model such as GPT and its relatives can become expensive at search-engine scale, as usage is metered per token.
Combination with keywords: Vector search optimises for recall rather than precision, which is not always preferable, so ways of combining it with keyword search are still being rolled out (a sketch of one common combination technique follows this list). Note: Pureinsights currently recommends that any vector search system deployed be based on a search engine that supports keyword search and vector search together. Several dedicated vector search systems are now appearing on the market. While these may have compelling demonstrations and features, their keyword search capabilities have not received the same level of diligence, which leaves them sub-par when it comes to blending vector and keyword search operations. We strongly believe that the search engine teams will ultimately win this battle technically. It is also suboptimal, in terms of synchronisation and storage costs, to store the same content twice: once in a search engine for keyword searches and again in a separate vector database for vector operations.
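As an illustration of how keyword and vector results can be blended, the sketch below applies reciprocal rank fusion (RRF) to two ranked result lists. It is a self-contained example of the technique only; the document IDs and rankings are invented, and it is not drawn from any particular search engine's implementation.

```python
# A minimal sketch of reciprocal rank fusion (RRF), one common way to blend a
# keyword-ranked result list with a vector-ranked result list.
def reciprocal_rank_fusion(result_lists, k=60):
    """Combine several ranked lists of document IDs into one fused ranking."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            # Documents ranked highly in either list accumulate a larger score.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_results = ["doc3", "doc1", "doc7", "doc2"]   # e.g. from a BM25 keyword query
vector_results  = ["doc1", "doc9", "doc3", "doc5"]   # e.g. from a kNN vector query

print(reciprocal_rank_fusion([keyword_results, vector_results]))
# doc1 and doc3 rise to the top because both searches agree on them
```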
Using Vector Search to answer questions
To use vector search to answer questions, it is first necessary to chunk the data and ingest it into a vector store, which could be a search engine or a dedicated vector database.
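A minimal sketch of this chunk-and-ingest step is shown below. The fixed-size word window, the source file name and the in-memory list standing in for the vector store are all illustrative assumptions; a real pipeline would write the records to a search engine or vector database.

```python
# A minimal sketch of chunking a document and preparing it for a vector store.
# The chunking scheme, file name and in-memory "store" are illustrative assumptions.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping word-based chunks."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]

document = open("handbook.txt").read()   # hypothetical source document
chunks = chunk_text(document)

# In practice these records would be indexed into a search engine or vector database.
store = [
    {"chunk_id": i, "text": chunk, "vector": model.encode(chunk).tolist()}
    for i, chunk in enumerate(chunks)
]
```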
There are then three main ways to answer questions driven by vector search:
Extractive Answers:
Perform a vector-based similarity search to find candidate text chunks
Use a model such as DistilBERT to identify the best answer to the question within the candidate chunks
Evaluate confidence that the question has been answered correctly
Show the snippet to the end user
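The sketch below illustrates the middle steps of the extractive approach. It assumes the Hugging Face transformers library and the distilbert-base-cased-distilled-squad model (illustrative choices), and the candidate chunk and confidence threshold are invented; in practice the chunk would come from the preceding vector search.

```python
# A minimal sketch of extractive question answering over a retrieved chunk.
# The pipeline, model name, chunk text and threshold are assumptions for illustration.
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

question = "When was BERT released?"
candidate_chunk = (
    "Google open-sourced its BERT model in 2018, and it is arguably "
    "the first Large Language Model."
)

result = qa(question=question, context=candidate_chunk)

# Only show the snippet if the model is sufficiently confident in the answer.
if result["score"] > 0.5:
    print(result["answer"])   # e.g. "2018"
else:
    print("No confident answer found.")
```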
Knowledge Graph questions and answers:
Identify entities in the question
Vectorize the query and match it against known ways of querying a database or knowledge graph
Insert the entities as parameters into the matched query
Return the answer
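A minimal sketch of the template-matching step is shown below. The query templates, the Cypher-style queries, the extracted entities and the embedding model are all illustrative assumptions; a real system would also need proper entity recognition and a graph database against which to execute the chosen query.

```python
# A minimal sketch of matching a question to a known query template.
# Templates, entities and the embedding model are illustrative assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Known ways of querying the knowledge graph, each paired with an example question.
templates = [
    ("Who wrote {title}?", "MATCH (b:Book {title: $title})<-[:WROTE]-(a) RETURN a.name"),
    ("When was {title} published?", "MATCH (b:Book {title: $title}) RETURN b.published"),
]

question = "Who is the author of Moby-Dick?"
entities = {"title": "Moby-Dick"}  # in practice, produced by an entity recogniser

# Vectorise the question and every template example, then pick the closest template.
q_vec = model.encode(question)
t_vecs = model.encode([example for example, _ in templates])
sims = t_vecs @ q_vec / (np.linalg.norm(t_vecs, axis=1) * np.linalg.norm(q_vec))
best_example, best_query = templates[int(np.argmax(sims))]

print(best_query)  # graph query to run, with `entities` bound as its parameters
print(entities)
```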
Retrieval Augmented Generation:
Perform a vector-based similarity search to find candidate text chunks
Prompt a model such as GPT to generate the best answer to the question from the candidate chunks
Evaluate confidence that the question has been answered correctly
Show the model response to the end user.
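The sketch below illustrates the generation step of this approach. It assumes the OpenAI Python client and the gpt-4 model (illustrative choices), and the retrieved chunks are invented; in practice they would come from the preceding vector search, and any hosted or local LLM could be substituted.

```python
# A minimal sketch of Retrieval Augmented Generation. The OpenAI client, model
# name and retrieved chunks are assumptions for illustration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = "What field types do search engines offer for vector search?"
retrieved_chunks = [
    "Elasticsearch, OpenSearch, MongoDB and Solr all provide a vector field type.",
    "Vector fields allow k-nearest-neighbour similarity search over embeddings.",
]

prompt = (
    "Answer the question using only the context below. "
    "If the context does not contain the answer, say you do not know.\n\n"
    "Context:\n" + "\n".join(retrieved_chunks) + f"\n\nQuestion: {question}"
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)  # model response shown to the end user
```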
Discovery AI Processors
BERT Service Processor – uses BERT to vectorize chunks of text. BERT is Google's open-source, transformer-based machine learning technique for natural language processing (NLP).
Hugging Face Model Runner – exposes a variety of models to perform AI and NLP tasks such as question answering (summarization and sentiment analysis are also possible).
OpenAI – uses OpenAI to vectorize chunks of text.
Google Vertex AI – On customer demand
AWS AI Services – On customer demand
Discovery can easily incorporate more basic and advanced AI-driven content processors as the state of the art improves.