LinkedIn is overhauling its search infrastructure with large language models (LLMs) to deliver a more intuitive and personalized experience. The change, detailed on the LinkedIn Engineering blog, aims to move beyond simple keyword matching and understand user intent through natural language processing.
Keyword search limitations: Traditional keyword matching struggles to capture user intent.
LLM semantic search: Large language models interpret queries by meaning rather than exact wording.
Embedding-based retrieval: Queries and content are represented in a shared vector space and matched by similarity.
LLM relevance judges: LLMs assess and score the quality of search results.
Improved query understanding: Natural-language queries are interpreted to infer user goals and preferences.
Intuitive job and talent search: Users can find jobs and people more efficiently.
Personalized results: Search results align more closely with users’ career ambitions.
The company introduced AI Job Search and AI-powered People Search, features that interpret queries semantically. Rather than relying on exact word matches, these tools infer a user’s goals and preferences and bridge vocabulary gaps, better matching results to how professionals describe their career ambitions.
This major upgrade to LinkedIn’s search stack uses LLMs to build a semantic search experience. By interpreting natural language and inferring user intent and preferences, search becomes more flexible and accurate.
Large-scale semantic search infrastructure
At the core of LinkedIn’s semantic search is a multi-step process. User queries are first processed by a query understanding module to generate embeddings. These embeddings are used for embedding-based retrieval (EBR) on GPUs to identify a broad set of candidate documents.
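The retrieval step described above is, at its core, a nearest-neighbor search over embeddings. The sketch below is a brute-force, CPU-only stand-in written in plain Python under the assumption that similarity is cosine similarity; LinkedIn runs EBR on GPUs, and the toy vectors and the names `ebr_top_k` and `l2_normalize` are hypothetical.

```python
import heapq
import math


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def ebr_top_k(
    query_emb: list[float], doc_embs: dict[str, list[float]], k: int
) -> list[tuple[str, float]]:
    """Exact (brute-force) retrieval: the k documents most cosine-similar to the query."""
    q = l2_normalize(query_emb)
    scored = (
        (doc_id, sum(a * b for a, b in zip(q, l2_normalize(emb))))
        for doc_id, emb in doc_embs.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy 3-dimensional embeddings; real embedding models produce far more dimensions.
docs = {
    "job-a": [0.9, 0.1, 0.0],
    "job-b": [0.1, 0.9, 0.0],
    "job-c": [0.7, 0.7, 0.0],
}
print(ebr_top_k([1.0, 0.2, 0.0], docs, k=2))
```

An exact scan like this is easy to reason about but scales linearly with the corpus, which is one reason production systems move the similarity computation onto accelerators.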
A subsequent ranking stage uses a cross-encoder small language model (SLM) to narrow down these candidates. The model runs on SGLang and combines query, job, and member features to score relevance and engagement.
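Unlike the dual-tower retrieval model, a cross-encoder reads the query and a candidate together, so it can only be applied to the short list that retrieval returns. The sketch below assumes a model with two output heads (relevance and engagement) blended by a weight `alpha`; the source says the SLM scores both but does not say how they are combined, so the `CrossEncoder` interface, `alpha`, and the word-overlap `toy_model` are all hypothetical.

```python
import math
from typing import Callable

# Hypothetical cross-encoder interface: it reads the query and one candidate
# together and returns (relevance_logit, engagement_logit).
CrossEncoder = Callable[[str, str], tuple[float, float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def rerank(
    query: str,
    candidates: list[str],
    model: CrossEncoder,
    keep: int,
    alpha: float = 0.7,  # hypothetical relevance-vs-engagement weight
) -> list[tuple[str, float]]:
    """Score each candidate jointly with the query, blend both heads, keep the top `keep`."""
    scored = []
    for cand in candidates:
        rel_logit, eng_logit = model(query, cand)
        score = alpha * sigmoid(rel_logit) + (1 - alpha) * sigmoid(eng_logit)
        scored.append((cand, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:keep]


def toy_model(query: str, cand: str) -> tuple[float, float]:
    """Stand-in for the SLM: word overlap as relevance, neutral engagement."""
    overlap = len(set(query.lower().split()) & set(cand.lower().split()))
    return float(overlap), 0.0


print(rerank("python engineer", ["python engineer remote", "sales manager", "senior engineer"], toy_model, keep=2))
```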
To maintain efficiency at scale, the ranking pipeline incorporates a score cache, ranking depth controller, and traffic shaping. These optimizations aim to improve latency and result quality for millions of real-time queries.
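Two of the optimizations named above can be sketched in a few lines: a score cache that skips model calls for (query, document) pairs already scored, and a ranking depth controller that ranks fewer candidates when the system is under load. This is a minimal sketch; the LRU eviction policy, the linear depth schedule, the 500/50 bounds, and all function names are assumptions, and traffic shaping is not shown.

```python
from collections import OrderedDict
from typing import Callable, Optional


class ScoreCache:
    """LRU cache keyed by (query, doc_id) so repeated pairs skip model inference."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._data: "OrderedDict[tuple[str, str], float]" = OrderedDict()

    def get(self, key: tuple[str, str]) -> Optional[float]:
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key: tuple[str, str], score: float) -> None:
        self._data[key] = score
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


def ranking_depth(load: float, max_depth: int = 500, min_depth: int = 50) -> int:
    """Hypothetical depth controller: rank fewer candidates as load (0..1) rises."""
    load = min(max(load, 0.0), 1.0)
    return int(max_depth - load * (max_depth - min_depth))


def score_candidates(
    query: str,
    doc_ids: list[str],
    cache: ScoreCache,
    model_score: Callable[[str, str], float],
    load: float,
) -> list[tuple[str, float]]:
    """Score at most ranking_depth(load) candidates, reusing cached scores."""
    results = []
    for doc_id in doc_ids[: ranking_depth(load)]:
        score = cache.get((query, doc_id))
        if score is None:
            score = model_score(query, doc_id)  # the expensive SLM call
            cache.put((query, doc_id), score)
        results.append((doc_id, score))
    return results
```

Because popular queries repeat, even a modest cache can absorb a share of the ranking traffic, and the depth controller trades a little recall at the tail of the list for bounded latency.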
The features and job representations fed to the SLM are generated by a hybrid inference pipeline that combines extensive offline processing with low-latency nearline systems. Embeddings and summaries are stored for on-demand retrieval.
An auction layer then balances user relevance, engagement, and business metrics to determine the final results.
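One way to picture the hybrid pipeline described above is two stores of precomputed job representations: an offline store refreshed in bulk and a nearline store updated shortly after a job changes. The sketch assumes the serving path simply takes the fresher of the two records; the source does not describe how the stores are reconciled, so `FeatureRecord`, `lookup_features`, and the timestamp rule are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeatureRecord:
    """Precomputed representation of one job (hypothetical schema)."""
    embedding: list[float]
    summary: str
    updated_at: float  # Unix timestamp when the record was computed


def lookup_features(
    job_id: str,
    nearline_store: dict[str, FeatureRecord],
    offline_store: dict[str, FeatureRecord],
) -> Optional[FeatureRecord]:
    """Return the freshest available record for a job, whichever pipeline wrote it."""
    records = [store[job_id] for store in (nearline_store, offline_store) if job_id in store]
    if not records:
        return None
    return max(records, key=lambda rec: rec.updated_at)
```

`max` returns the first of several equal maxima, so on a timestamp tie the nearline record wins because it is listed first.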
Measuring Relevance with LLM Judges
Ensuring search quality is paramount. LinkedIn uses LLM judges to measure relevance at a scale far beyond what manual assessment can achieve.
Refined iteratively with feedback from product managers, these judges score millions of query-document pairs every day. They also generate labeled data, which is essential for training the search and ranking systems.
Building these LLM judges starts with clear product policies and high-quality “golden” grades from product managers. These grades serve as precise standards and are refined through regular reconciliation sessions among product managers to ensure consistency.
To build a comprehensive dataset, queries are categorized and a stratified sample of query-document pairs is graded by product managers. This meticulous process ensures that the LLM judges accurately reflect the intended relevance standards.
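Stratified sampling, as mentioned above, draws from each query category separately so that rare query types still appear in the graded set instead of being crowded out by common ones. The sketch below assumes an equal per-category quota; proportional or otherwise weighted allocation is equally plausible, and the function name and tuple layout are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(
    pairs: list[tuple[str, str, str]], per_stratum: int, seed: int = 0
) -> list[tuple[str, str, str]]:
    """Draw up to `per_stratum` (query_class, query, doc_id) pairs from each class."""
    rng = random.Random(seed)
    by_class: dict[str, list[tuple[str, str, str]]] = defaultdict(list)
    for pair in pairs:
        by_class[pair[0]].append(pair)
    sample = []
    for query_class in sorted(by_class):  # sorted so a fixed seed gives reproducible output
        group = by_class[query_class]
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample
```

With 100 "title" queries and only 3 "skill" queries, a uniform random sample of 8 would usually contain no skill queries at all; the stratified version always includes all three.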
Although state-of-the-art LLMs produce high-quality judgments, their throughput is insufficient for LinkedIn’s needs. To scale, these large models are distilled into smaller models of around 8B parameters. Through supervised fine-tuning, the distilled models achieve significant efficiency gains while maintaining high agreement with human judgments, as measured by kappa scores.
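Kappa measures agreement between two raters after subtracting the agreement expected by chance, which raw accuracy does not do. The source does not say which kappa variant is used; the sketch below shows Cohen's kappa for two raters (for example, a product manager and the distilled judge) as a common choice. For ordinal relevance grades, a weighted kappa that gives partial credit for near misses would also be reasonable.

```python
from collections import Counter


def cohens_kappa(labels_a: list, labels_b: list) -> float:
    """Cohen's kappa: (observed agreement - chance agreement) / (1 - chance agreement)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both raters pick the same label independently.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout; kappa is undefined
    return (observed - expected) / (1 - expected)


human = [2, 1, 0, 2, 1, 0]
model = [2, 1, 0, 2, 0, 0]
print(cohens_kappa(human, model))
```

Here the raters agree on 5 of 6 items, but because one third of that agreement is expected by chance, kappa comes out noticeably lower than the raw 83% agreement.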
These scalable LLM judges enable continuous relevance measurement of the search systems, evaluation of experiments, and distillation of student ranking and search models. The same workflow supports evaluating A/B tests of the ranking and retrieval subsystems, which is essential for optimizing relevance in LLM-powered search.
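Once an LLM judge has graded the results each experiment arm returns, the grades can be rolled up into a ranking-quality metric and compared across arms. The source does not name a metric; the sketch below assumes NDCG@k with the common (2^g - 1) gain, and computes the ideal ordering from the judged list itself, a simplification that ignores relevant items neither arm retrieved. The grade lists are made up for illustration.

```python
import math


def dcg(grades: list[int]) -> float:
    """Discounted cumulative gain: (2^g - 1) / log2(position + 1), positions starting at 1."""
    return sum((2 ** g - 1) / math.log2(rank + 2) for rank, g in enumerate(grades))


def ndcg_at_k(grades: list[int], k: int) -> float:
    """NDCG@k for one ranked list of judge grades (higher grade = more relevant)."""
    ideal = dcg(sorted(grades, reverse=True)[:k])
    return dcg(grades[:k]) / ideal if ideal > 0 else 0.0


def mean_ndcg(judged_lists: list[list[int]], k: int) -> float:
    """Average NDCG@k over queries for one experiment arm."""
    return sum(ndcg_at_k(g, k) for g in judged_lists) / len(judged_lists)


# Illustrative judge grades (0-2) for the same two queries under each arm.
control = [[2, 0, 1], [1, 1, 0]]
treatment = [[2, 1, 0], [1, 1, 0]]
print(mean_ndcg(control, 3), mean_ndcg(treatment, 3))
```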
Embedding-based retrieval
The retrieval stage efficiently identifies a broad set of potential results. LinkedIn’s system is built on GPU-accelerated embedding-based retrieval (EBR).
An open-source LLM embedding model was fine-tuned to encode queries and jobs into dense vectors. Training used millions of real query-job pairs with relevance labels provided by the LLM judges.
This EBR model offers a practical path to deploying LLM components in large-scale, real-time search systems, enabling more intuitive AI-powered search.
The model uses a dual-tower architecture to project queries and jobs into a shared semantic space. Training combines a contrastive InfoNCE loss with a margin-based ranking loss, augmented with hard positives and hard negatives mined from LLM-judged data.
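The two objectives named above can be written for a single query as functions of similarity scores: InfoNCE is the negative log-softmax of the positive's similarity among the positive and its negatives, and the margin loss is a hinge that penalizes any negative scoring within `margin` of the positive. This is a minimal sketch; the temperature, margin, and mixing weight are hypothetical, and real training would compute these over batches of tower outputs in a deep learning framework rather than on plain floats.

```python
import math


def info_nce(pos_sim: float, neg_sims: list[float], temperature: float = 0.05) -> float:
    """-log softmax of the positive among the positive and the negatives."""
    logits = [pos_sim / temperature] + [s / temperature for s in neg_sims]
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


def margin_ranking(pos_sim: float, neg_sims: list[float], margin: float = 0.2) -> float:
    """Hinge loss: each negative should score at least `margin` below the positive."""
    return sum(max(0.0, margin - (pos_sim - s)) for s in neg_sims)


def combined_loss(
    pos_sim: float,
    neg_sims: list[float],
    weight: float = 0.5,  # hypothetical mixing weight
    temperature: float = 0.05,
    margin: float = 0.2,
) -> float:
    """Hypothetical weighted sum of the contrastive and margin objectives."""
    return info_nce(pos_sim, neg_sims, temperature) + weight * margin_ranking(pos_sim, neg_sims, margin)
```

Hard negatives matter because easy ones contribute almost nothing: a negative with similarity 0.1 against a positive at 0.9 yields near-zero loss, while a mined negative at 0.55 against a positive at 0.6 produces a substantial gradient signal from both terms.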
Before integration into the live serving stack, model performance is evaluated through multiple pipelines, including counterfactual log analysis and offline KNN simulation.
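An offline KNN simulation typically runs each logged query against an index of candidate embeddings and checks how many known-relevant items land in the top k. The source does not name the metric; recall@k is shown here as an assumption, with the relevant sets standing in for LLM-judge or log-derived labels.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant items found in the top-k retrieved list."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def mean_recall_at_k(runs: list[tuple[list[str], set[str]]], k: int) -> float:
    """Average recall@k over (retrieved, relevant) pairs, one pair per query."""
    return sum(recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs) / len(runs)
```

Recall is the natural focus at this stage because anything retrieval misses can never be recovered by the ranker; precision is left to the later stages.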
Query understanding and ranking
A unified LLM-based query understanding layer interprets user intent from free-text queries and converts it into structured signals for both job and people search.
Fine-tuned models ranging from 1.5B to 4B parameters meet LinkedIn’s latency requirements while producing highly accurate output. This layer replaces several previous components with a single, robust model.
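The "structured signals" mentioned above need a fixed schema that downstream retrieval and ranking can rely on, plus defensive parsing in case the model emits malformed output. The sketch assumes the model returns JSON; the `ParsedQuery` fields are hypothetical examples, not LinkedIn's actual schema, and the model call itself is not shown.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ParsedQuery:
    """Hypothetical structured signals extracted from a free-text job query."""
    titles: list[str] = field(default_factory=list)
    locations: list[str] = field(default_factory=list)
    remote_only: bool = False
    seniority: Optional[str] = None


def _str_list(value: object) -> list[str]:
    """Coerce a JSON value to a list of strings, ignoring anything that is not a list."""
    return [str(v) for v in value] if isinstance(value, list) else []


def parse_model_output(raw: str) -> ParsedQuery:
    """Validate the model's JSON output; fall back to an empty parse if it is malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ParsedQuery()
    if not isinstance(data, dict):
        return ParsedQuery()
    seniority = data.get("seniority")
    return ParsedQuery(
        titles=_str_list(data.get("titles")),
        locations=_str_list(data.get("locations")),
        remote_only=data.get("remote_only") is True,
        seniority=seniority if isinstance(seniority, str) else None,
    )


# Illustrative model output for a query such as "senior data scientist roles in Berlin".
print(parse_model_output('{"titles": ["data scientist"], "locations": ["Berlin"], "seniority": "senior"}'))
```

Falling back to an empty parse instead of raising keeps a single bad generation from failing the whole request; in that case the query can still be served by keyword matching.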
An intelligent routing layer classifies query types, performs safety checks, and sends each query either to LLM-based semantic interpretation or to efficient keyword search.
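The routing decision described above can be pictured as a small function that returns one of three outcomes. The source describes a classifier but not its rules, so the sketch below uses hand-written heuristics purely as a stand-in: the blocklist is a placeholder for whatever safety check is actually used, and the "two tokens or fewer means keyword search" threshold is an assumption.

```python
import re
from enum import Enum


class Route(Enum):
    BLOCKED = "blocked"
    KEYWORD = "keyword"
    SEMANTIC = "semantic"


# Placeholder safety list (hypothetical); stands in for a real safety check.
BLOCKED_TERMS = {"blockedterm"}


def route_query(query: str) -> Route:
    """Rule-based stand-in for the learned query router."""
    tokens = re.findall(r"\w+", query.lower())
    if any(tok in BLOCKED_TERMS for tok in tokens):
        return Route.BLOCKED
    # Very short queries (a name, an exact title) are cheap to serve with keywords;
    # longer natural-language queries benefit from LLM interpretation.
    if len(tokens) <= 2:
        return Route.KEYWORD
    return Route.SEMANTIC
```

Routing exists largely for cost: sending only the queries that need interpretation to the LLM keeps GPU load and latency in check.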
The ranking module uses a small language model (SLM) to estimate how relevant each retrieved result is to the user’s query. For job search, the input includes structured job attributes; for people search, it includes member profile information.
Given a structured prompt, the SLM judges the relevance of each match and outputs a logit, which is further processed to produce the final ranking.
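The prompt-to-logit step above can be sketched as rendering structured job attributes into a fixed template and mapping the returned logit to a score in (0, 1). The template text, the attribute names, and the choice of a sigmoid as the "further processing" are assumptions; the source does not publish the prompt or the post-processing, and the SLM call itself is omitted.

```python
import math

# Hypothetical prompt layout; LinkedIn's actual prompt is not published.
PROMPT_TEMPLATE = (
    "Query: {query}\n"
    "Job title: {title}\n"
    "Company: {company}\n"
    "Location: {location}\n"
    "Is this job a good match for the query?"
)


def build_prompt(query: str, job: dict[str, str]) -> str:
    """Render the query and structured job attributes into one prompt string."""
    return PROMPT_TEMPLATE.format(query=query, **job)


def logit_to_score(logit: float) -> float:
    """Map a relevance logit to (0, 1) with a numerically stable sigmoid."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


def final_ranking(scored_jobs: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Convert (job_id, logit) pairs to scores and sort them, best first."""
    ranked = [(job_id, logit_to_score(logit)) for job_id, logit in scored_jobs]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```

Because the sigmoid is monotonic, sorting by raw logits would give the same order; converting to a bounded score matters when the relevance estimate is later blended with other signals.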