Shivam Verma, a staff machine learning engineer at Spotify, recently shared how the music and podcast streaming giant is adapting its personalization strategy in the era of large language models (LLMs). Speaking at the AI Engineer Europe event, Verma traced Spotify’s journey from traditional recommendation systems to using LLMs for a more nuanced and personalized user experience.
Spotify’s Shivam Verma talks LLMs and personalization (AI Engineer)
Visual TL;DR: traditional recommenders evolve in the LLM era, which enables semantic IDs; semantic IDs enable content and user understanding; understanding content and users leads to actionable recommendations; and steerable recommendations enable personalized generation.
Traditional Recs: Multi-stage pipeline for candidate generation, ranking, and scoring
The LLM era: The arrival of large language models opens new avenues for personalization
Semantic ID: Representing content with semantic IDs and vector embeddings
Content/User Understanding: LLMs help interpret nuanced content and user preferences
Actionable Recommendation: Move to actionable, context-aware content discovery
From traditional personalization to LLM-based personalization
Verma explained that Spotify’s existing recommendation system, which he refers to as “TradRecs,” has long relied on a multi-stage pipeline of candidate generation, ranking, and scoring. These systems power personalized playlists, search results, and content feeds across music, podcasts, and audiobooks. However, the advent of LLMs has opened new avenues for personalization, enabling a more fluid, context-aware approach.
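To make the three stages concrete, here is a minimal sketch of a generic multi-stage recommender. It assumes dot-product retrieval over toy embeddings for candidate generation, a hypothetical weighted score (affinity plus a popularity feature) for scoring, and a simple sort for ranking. The names `Item`, `generate_candidates`, `score`, `rank`, and `recommend`, the weights, and the catalog are all illustrative placeholders, not Spotify’s pipeline.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    item_id: str
    embedding: List[float]
    popularity: float  # hypothetical feature in [0, 1]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def generate_candidates(user_vec: List[float], catalog: List[Item], k: int) -> List[Item]:
    """Stage 1: cheap retrieval that keeps the k items closest to the user vector."""
    return sorted(catalog, key=lambda it: dot(user_vec, it.embedding), reverse=True)[:k]


def score(user_vec: List[float], item: Item, w_affinity: float = 0.8, w_pop: float = 0.2) -> float:
    """Stage 2: a richer (hypothetical) scoring function combining several features."""
    return w_affinity * dot(user_vec, item.embedding) + w_pop * item.popularity


def rank(user_vec: List[float], candidates: List[Item], n: int) -> List[str]:
    """Stage 3: order candidates by score and keep the top n."""
    scored = sorted(((score(user_vec, it), it.item_id) for it in candidates), reverse=True)
    return [item_id for _, item_id in scored[:n]]


def recommend(user_vec: List[float], catalog: List[Item], k: int = 3, n: int = 2) -> List[str]:
    return rank(user_vec, generate_candidates(user_vec, catalog, k), n)


catalog = [
    Item("track_a", [0.9, 0.1], 0.2),
    Item("track_b", [0.8, 0.2], 0.9),
    Item("podcast_c", [0.1, 0.9], 0.8),
    Item("audiobook_d", [0.2, 0.2], 0.1),
]
print(recommend([1.0, 0.0], catalog))  # → ['track_b', 'track_a']
```

Retrieval alone would put `track_a` first. The scoring stage reorders the shortlist because `track_b` is more popular. That split is the usual reason for a multi-stage design: a cheap first pass narrows the catalog so a more expensive scorer only has to see a few candidates.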
At the heart of this evolution is how Spotify represents its users and its vast content catalog. Verma highlighted user embeddings: vectors of numbers that encode a user’s likes and dislikes. These embeddings underpin many of Spotify’s personalized products. To bridge the gap between these numeric user representations and an LLM’s language understanding, Spotify uses techniques such as semantic IDs and vector embeddings.
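The talk does not say how Spotify builds its user embeddings. One simple, common approach is sketched below, assuming a play-count-weighted average of item vectors from the user’s history, with cosine similarity used to score unseen items. The item vectors and play counts are made-up toy values.

```python
import math
from typing import Dict, List, Sequence


def weighted_mean(vectors: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Average item vectors, weighting each by how often the user played it."""
    total = sum(weights)
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total
            for i in range(len(vectors[0]))]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical 3-D item vectors; real embeddings are learned and much larger.
ITEM_VECS: Dict[str, List[float]] = {
    "indie_rock_track": [0.9, 0.1, 0.0],
    "lofi_playlist": [0.6, 0.6, 0.1],
    "synth_pop_track": [0.8, 0.3, 0.0],
    "true_crime_podcast": [0.0, 0.2, 0.9],
}
play_counts = {"indie_rock_track": 5, "lofi_playlist": 2}  # hypothetical history

user_vec = weighted_mean([ITEM_VECS[i] for i in play_counts], list(play_counts.values()))
unseen = [i for i in ITEM_VECS if i not in play_counts]
ranked = sorted(unseen, key=lambda i: cosine(user_vec, ITEM_VECS[i]), reverse=True)
print(ranked)  # → ['synth_pop_track', 'true_crime_podcast']
```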
Leveraging semantic IDs and vector representations
This process involves creating vector representations of content, which let the LLM capture not just the words but the underlying meaning and context. Likewise, a user’s history is converted into semantic IDs and fed to the LLM. This lets the model process rich user context, including listening history, explicit prompts, and other implicit signals, to generate more relevant and actionable recommendations.
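The talk does not explain how Spotify derives semantic IDs from content vectors. One common technique in the recommendation literature is residual quantization, the idea behind RQ-VAE-style semantic IDs, and the sketch below swaps it in as an illustration. Each level picks the nearest codeword and passes the leftover residual to the next level, so similar items tend to share leading codes. The two-level codebooks are tiny hand-written placeholders rather than learned ones, and the `<L{level}_{code}>` token format is hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_quantize(vec: Vector, codebooks: List[List[List[float]]]) -> Tuple[int, ...]:
    """Map a dense vector to one discrete code per codebook level.

    At each level, choose the nearest codeword, then quantize what is left over.
    """
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = min(range(len(codebook)), key=lambda i: sq_dist(residual, codebook[i]))
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tuple(codes)


def to_tokens(codes: Tuple[int, ...]) -> str:
    """Render codes as (hypothetical) LLM vocabulary tokens, e.g. (2, 0) -> '<L0_2><L1_0>'."""
    return "".join(f"<L{level}_{code}>" for level, code in enumerate(codes))


# Hypothetical 2-level codebooks for 2-D embeddings (real codebooks are learned).
CODEBOOKS = [
    [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]],    # coarse level
    [[0.1, 0.0], [0.0, 0.1], [-0.1, 0.0]],   # fine level
]

episode_a = residual_quantize([0.8, 0.6], CODEBOOKS)
episode_b = residual_quantize([0.7, 0.78], CODEBOOKS)
print(to_tokens(episode_a), to_tokens(episode_b))  # → <L0_2><L1_0> <L0_2><L1_1>
```

The two episodes share the coarse code `<L0_2>` and differ only at the fine level. That shared prefix is what lets a language model treat semantic IDs as a meaningful vocabulary instead of arbitrary item numbers.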
Verma illustrated this with an example in which an LLM, given user context such as country, age, and listening history, handles a prompt like “What episode can I listen to next?” and returns personalized recommendations. Unlike traditional systems, this enables a more conversational, interactive way to discover content.
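The exact prompt format was not shown. The sketch below assumes a plain-text template that combines user context, history rendered as semantic-ID tokens, and the user’s request. It also includes a parser that maps any semantic IDs in the model’s reply back to catalog items. `build_prompt`, `parse_recommendations`, the token format, and the model reply are all hypothetical stand-ins, and no real LLM is called.

```python
import re
from typing import Dict, List


def build_prompt(country: str, age_band: str, history_sids: List[str], question: str) -> str:
    """Assemble user context and listening history (as semantic-ID tokens) into one prompt.

    The template is a hypothetical illustration, not Spotify's format.
    """
    return (
        f"User country: {country}\n"
        f"User age band: {age_band}\n"
        f"Recent listening (semantic IDs): {' '.join(history_sids)}\n"
        f"Request: {question}\n"
        "Recommend episodes as semantic IDs:"
    )


# Matches one whitespace-separated semantic ID such as "<L0_1><L1_2>".
SID_PATTERN = re.compile(r"(?:<L\d+_\d+>)+")


def parse_recommendations(model_output: str, sid_to_item: Dict[str, str]) -> List[str]:
    """Map semantic IDs emitted by the model back to catalog items, skipping unknown IDs."""
    return [sid_to_item[s] for s in SID_PATTERN.findall(model_output) if s in sid_to_item]


prompt = build_prompt("SE", "25-34", ["<L0_2><L1_0>", "<L0_2><L1_1>"],
                      "What episode can I listen to next?")
fake_reply = "<L0_1><L1_2> <L0_9><L1_9>"  # stand-in for a fine-tuned model's output
print(parse_recommendations(fake_reply, {"<L0_1><L1_2>": "episode_123"}))  # → ['episode_123']
```

Dropping IDs that are not in the catalog matters in practice, because a generative model can emit token sequences that do not correspond to any real item.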
The role of the LLM in understanding content and users
Verma emphasized that the LLM has been fine-tuned to understand Spotify’s specific catalog and user data. This involves training the model on Spotify’s extensive internal data, including content vectors and user interaction logs. The goal is for the LLM not only to understand what content means, but also to interpret user preferences and context more effectively.
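The talk does not describe the training-data format. One common way to turn interaction logs into fine-tuning data is sketched below: each chronological listening sequence, expressed as semantic IDs, becomes a set of next-item prediction examples (history prefix as the prompt, following item as the completion). The prompt/completion layout and `make_examples` are illustrative assumptions.

```python
import json
from typing import Dict, List


def make_examples(sids: List[str], min_history: int = 2) -> List[Dict[str, str]]:
    """Turn one chronological listening log (as semantic IDs) into
    next-item prediction examples: history prefix -> following item."""
    examples = []
    for t in range(min_history, len(sids)):
        examples.append({
            "prompt": f"Listened to: {' '.join(sids[:t])}\nNext:",
            "completion": f" {sids[t]}",
        })
    return examples


log = ["<L0_2><L1_0>", "<L0_2><L1_1>", "<L0_1><L1_2>", "<L0_1><L1_0>"]  # hypothetical log
examples = make_examples(log)
print(len(examples))  # → 2
for ex in examples:
    print(json.dumps(ex))  # one JSON object per line, a common fine-tuning file layout
```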
This marks a shift from a purely analytical approach to one that incorporates generative capabilities. By mapping user behavior and content metadata into a shared semantic space, the LLM can generate more creative and personalized recommendations. This includes features such as ‘taste profiles,’ where users can give explicit feedback to further refine the model’s understanding of their preferences.
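How taste-profile feedback is applied was not described. As an illustration only, the sketch below uses Rocchio-style relevance feedback, a classic information-retrieval technique, to show how explicit likes and dislikes could move a profile vector within a shared embedding space. The weights `alpha`, `beta`, and `gamma` are arbitrary example values.

```python
from typing import List, Sequence


def mean(vectors: Sequence[Sequence[float]]) -> List[float]:
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def apply_feedback(profile: Sequence[float],
                   liked: Sequence[Sequence[float]],
                   disliked: Sequence[Sequence[float]],
                   alpha: float = 1.0, beta: float = 0.5, gamma: float = 0.5) -> List[float]:
    """Rocchio-style update: pull the profile toward liked items, push it away from disliked ones."""
    updated = [alpha * p for p in profile]
    if liked:
        updated = [u + beta * m for u, m in zip(updated, mean(liked))]
    if disliked:
        updated = [u - gamma * m for u, m in zip(updated, mean(disliked))]
    return updated


profile = [0.5, 0.5]
new_profile = apply_feedback(profile, liked=[[1.0, 0.0]], disliked=[[0.0, 1.0]])
print(new_profile)  # → [1.0, 0.0]
```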
From “TradRecs” to actionable, personalized generative recommendations
Verma concluded by summarizing the transition from traditional recommendation systems to a new era of generative recommendations, highlighting three key takeaways:
Embeddings and semantic IDs are key building blocks for generative, LLM-native recommender systems.
Soft-token approaches show strong potential for personalizing LLMs.
Traditional recommenders and sequential models remain important for real-world, real-time ranking, complementing what LLMs can do.
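Soft-token personalization is named in the talk but not detailed. The sketch below shows the general idea under one assumption: a user embedding is linearly projected into a few vectors in the LLM’s input-embedding space and prepended to the embedded text prompt, so the model treats them as extra tokens. `SoftTokenAdapter` and `prepend_soft_tokens` are hypothetical names. The dimensions are tiny for readability, and the projection is randomly initialized here, whereas a real system would train it, typically with a tensor library.

```python
import random
from typing import List

Matrix = List[List[float]]


class SoftTokenAdapter:
    """Maps one user embedding to `num_tokens` vectors in the LLM's input-embedding space."""

    def __init__(self, user_dim: int, model_dim: int, num_tokens: int, seed: int = 0):
        rng = random.Random(seed)
        self.model_dim = model_dim
        self.num_tokens = num_tokens
        # Linear projection with (num_tokens * model_dim) rows of length user_dim.
        # Randomly initialized here; in practice these weights are learned.
        self.weights = [[rng.gauss(0.0, 0.02) for _ in range(user_dim)]
                        for _ in range(num_tokens * model_dim)]

    def __call__(self, user_vec: List[float]) -> Matrix:
        flat = [sum(w * u for w, u in zip(row, user_vec)) for row in self.weights]
        # Reshape the flat projection into num_tokens vectors of length model_dim.
        return [flat[i * self.model_dim:(i + 1) * self.model_dim]
                for i in range(self.num_tokens)]


def prepend_soft_tokens(soft_tokens: Matrix, text_embeddings: Matrix) -> Matrix:
    """Place the soft tokens before the embedded text prompt in the input sequence."""
    return soft_tokens + text_embeddings


adapter = SoftTokenAdapter(user_dim=4, model_dim=8, num_tokens=2)
soft = adapter([0.1, -0.3, 0.7, 0.2])
text = [[0.0] * 8 for _ in range(5)]  # stand-in for 5 embedded prompt tokens
inputs = prepend_soft_tokens(soft, text)
print(len(inputs), len(inputs[0]))  # → 7 8
```

The appeal of this design is that the user signal enters the model as dense vectors rather than as text, so rich embeddings from an existing recommender can condition the LLM without first being verbalized.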
This evolution aims to give users more control and transparency in their content discovery journey, making the Spotify experience more engaging and personalized than ever before.