Basics of (Vector) Search Using AI



How the modern AI boom has completely transformed search applications…

Dr. Cameron R. Wolf
Towards Data Science

32 minute read

March 18, 2024

(Photo by Tamanna Rumi on Unsplash)

The recent boom in generative AI and the emergence of large language models (LLMs) has led many to wonder about the evolution of search engines. Will conversational LLMs replace traditional search engines, or will the hallucinatory tendencies of these models make them unreliable sources of information? Although the answers to these questions are not yet known, the rapid adoption of AI-centric search systems such as you.com and perplexity.ai shows widespread interest in enhancing search engines with the latest advances in language models. Ironically, however, we have been making heavy use of language models within search engines for years! The proposal of BERT [1] led to a step-function improvement in our ability to assess semantic textual similarity, and these language models were quickly adopted by a variety of popular search engines (including Google!). This overview breaks down the components of such an AI-powered search system.
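To make the idea of assessing semantic textual similarity with a BERT-style model concrete, here is a minimal sketch that scores a query against a few documents with a bi-encoder. It assumes the sentence-transformers library and the pretrained all-MiniLM-L6-v2 checkpoint, neither of which is named in this article; they simply stand in for any BERT-style embedding model.

```python
# Minimal sketch: semantic textual similarity with a BERT-style bi-encoder.
# The sentence-transformers library and the "all-MiniLM-L6-v2" checkpoint are
# illustrative assumptions, not choices made in the article.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

query = "how do I return a damaged item?"
documents = [
    "Our return policy covers damaged or defective products.",
    "Sign up for our newsletter to get weekly deals.",
]

# Encode text into dense vectors, then compare the vectors with cosine similarity.
query_emb = model.encode(query, convert_to_tensor=True)
doc_embs = model.encode(documents, convert_to_tensor=True)
scores = util.cos_sim(query_emb, doc_embs)  # shape: (1, num_documents)

for doc, score in zip(documents, scores[0]):
    print(f"{score.item():.3f}  {doc}")
```

In practice, the semantically related document should receive a noticeably higher cosine score than the unrelated one, which is exactly the signal a search engine needs when deciding which documents match a query.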

Search and ranking within search engines (created by the author)

Search engines are one of the oldest and most widely used applications of machine learning and AI. At their core, most search engines have two basic components (pictured above):

  • Retrieval: From the set of all possible documents, identify a much smaller set of candidate documents that are potentially relevant to the user's query.
  • Ranking: Use more detailed analysis to order the set of candidate documents so that the most relevant documents appear first.

Depending on the use case, the total number of documents to search over can be very large (for example, all products on Amazon or all web pages on Google). Therefore, the retrieval component of search must be efficient, quickly identifying a small subset of documents that are relevant to the user's query. Once this small set of candidates has been identified, we can apply more expensive techniques (e.g., larger neural networks, more data, and so on) to order the candidates so that the most relevant documents appear first.
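To show how these two stages fit together, the sketch below retrieves candidates with cheap vector similarity and then reorders them with a heavier cross-encoder. It is only an illustration under assumed tooling: the sentence-transformers library, the all-MiniLM-L6-v2 bi-encoder, and the ms-marco-MiniLM-L-6-v2 cross-encoder are stand-ins rather than models mentioned in the article, and a real system would replace the brute-force similarity computation with an approximate nearest neighbor index.

```python
# Sketch of a two-stage search pipeline: fast vector retrieval followed by a
# more expensive reranking step. Library and model choices are assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer, CrossEncoder

bi_encoder = SentenceTransformer("all-MiniLM-L6-v2")  # cheap retrieval model
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # expensive ranker

corpus = [
    "Free two-day shipping on orders over $25.",
    "Our return policy covers damaged or defective products.",
    "Wireless noise-cancelling headphones with 30-hour battery life.",
    "How to set up a VPN on your home router.",
]

# Offline step: embed the whole corpus once and normalize for cosine similarity.
corpus_embs = bi_encoder.encode(corpus, normalize_embeddings=True)

def search(query: str, k: int = 3) -> list[str]:
    # Stage 1 (retrieval): dot product over normalized embeddings = cosine similarity.
    query_emb = bi_encoder.encode(query, normalize_embeddings=True)
    sims = corpus_embs @ query_emb
    candidate_ids = np.argsort(-sims)[:k]

    # Stage 2 (ranking): score each (query, candidate) pair with the cross-encoder.
    pairs = [(query, corpus[i]) for i in candidate_ids]
    scores = cross_encoder.predict(pairs)
    reranked = [corpus[i] for i, _ in sorted(zip(candidate_ids, scores),
                                             key=lambda x: -x[1])]
    return reranked

print(search("return a broken product"))
```

Note how the cost is split: the corpus is embedded once offline, retrieval is a single matrix-vector product, and the expensive cross-encoder only sees the handful of candidates that survive the first stage.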



