At the QCon London 2024 conference, Félix GV from LinkedIn discussed the AI/ML platform that powers the company's products. He specifically took a closer look at Venice DB, the NoSQL data store used for feature persistence. The presenter shared lessons learned from evolving and operating the platform, including cluster management and client library versioning.
LinkedIn offers many AI/ML capabilities, such as People You May Know, all of which are powered by the company's AI/ML platform, which provides support for feature ingestion, generation, and storage, as well as model training, validation, and inference. The platform focuses on improving the productivity of data scientists and engineers by offering unified end-to-end capabilities to support the development, experimentation, and operation of AI/ML workloads.
Félix GV, principal staff engineer at LinkedIn, provided an overview of the architecture of the AI/ML platform and the key technologies used in its various subsystems. Frame is a virtual feature store that supports multiple storage backends: offline (Iceberg tables), streaming (Kafka topics), and online (Venice stores, Pinot tables). LinkedIn has open-sourced many of Frame's capabilities as the Feathr project and recently published a 1.0 release.
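While the talk did not go into Frame's API, the "virtual" aspect can be understood as a single logical feature lookup routed to whichever physical backend hosts the data. The following sketch is purely illustrative; all names are hypothetical and do not reflect Frame/Feathr's actual interfaces:

```java
import java.util.Map;

// Purely illustrative sketch of a "virtual" feature store: one logical
// lookup API delegating to different physical backends. All names here
// are hypothetical; Frame/Feathr's real API differs.
interface FeatureBackend {
    float[] fetch(String featureName, String entityId);
}

final class VirtualFeatureStore {
    // e.g. "offline" -> Iceberg-backed, "streaming" -> Kafka-backed,
    // "online" -> Venice/Pinot-backed implementations
    private final Map<String, FeatureBackend> backends;
    private final Map<String, String> featureToBackend;

    VirtualFeatureStore(Map<String, FeatureBackend> backends,
                        Map<String, String> featureToBackend) {
        this.backends = backends;
        this.featureToBackend = featureToBackend;
    }

    float[] getFeature(String featureName, String entityId) {
        // Resolve the logical feature to whichever backend hosts it,
        // so callers never deal with storage-specific clients directly.
        FeatureBackend backend = backends.get(featureToBackend.get(featureName));
        return backend.fetch(featureName, entityId);
    }
}
```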
LinkedIn's AI/ML Platform Architecture (Source: QCon London website)
The AI/ML platform uses the FedEx subsystem for "productionizing" features, which includes preparing/transforming features for serving and pushing feature data into VeniceDB. King Kong is used for model training and Model Cloud for model serving. Model Cloud provides observability and monitoring, benchmarking and analytics, GPU support, and self-service onboarding.
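A batch push of feature data into VeniceDB can be sketched using the open-source VenicePushJob; note that the property keys below follow the Venice quickstart documentation and may differ across versions, and the store name, paths, and URLs are placeholders:

```java
import com.linkedin.venice.hadoop.VenicePushJob;
import java.util.Properties;

public class FeaturePushExample {
    public static void main(String[] args) {
        // Property keys follow the open-source Venice quickstart docs and
        // may vary across Venice versions; all values are placeholders.
        Properties props = new Properties();
        props.setProperty("venice.discover.urls", "http://venice-controller:5555");
        props.setProperty("venice.store.name", "member-features");
        props.setProperty("input.path", "/jobs/feature-snapshots/latest");
        props.setProperty("key.field", "memberId");
        props.setProperty("value.field", "features");

        // Runs a full (batch) push: Venice builds a new store version in the
        // background and switches readers over once ingestion completes.
        VenicePushJob pushJob = new VenicePushJob("feature-push-example", props);
        pushJob.run();
    }
}
```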
GV discussed the role and evolution of VeniceDB, a derived data platform created specifically to support online storage for AI/ML use cases. The project was open-sourced in September 2022 and has received 800 new commits since then. Venice supports dataset versioning, which allows users to push large datasets from offline sources and seamlessly switch between dataset versions once an ingestion job completes.
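Conceptually, the version switch can be modeled as an atomic pointer swap that only happens after the new version has been fully materialized. The following is a minimal, hypothetical sketch rather than Venice's actual implementation:

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Minimal, hypothetical model of Venice-style dataset versioning.
// Real Venice versions are persisted and replicated; this sketch only
// shows the atomic reader switch after a push completes.
public class VersionedDataset<K, V> {
    private final AtomicReference<Map<K, V>> currentVersion =
        new AtomicReference<>(Map.of());

    public V get(K key) {
        // Reads always hit the fully-ingested current version.
        return currentVersion.get().get(key);
    }

    public void push(Map<K, V> newData) {
        // Materialize the new version completely first...
        Map<K, V> nextVersion = Map.copyOf(newData);
        // ...then swap readers over atomically, with no partial state visible.
        currentVersion.set(nextVersion);
    }
}
```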
Ingesting data into VeniceDB (Source: QCon London website)
VeniceDB can reduce read latency from below 10 milliseconds (p99) with the default thin client down to 10 microseconds (p99) with the RAM-backed Da Vinci client, which consumes updates directly from the data-ingestion Kafka topics. The VeniceDB team ensures strong backward compatibility across the three clients it supports (the thin client, the fast client, and the Da Vinci client), making it easy for users to migrate between them if they want to benefit from lower-latency reads.
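As an illustration, a read through the Venice thin client might look like the sketch below, based on the open-source client API; the store name and router URL are placeholders, and signatures may differ between releases:

```java
import com.linkedin.venice.client.store.AvroGenericStoreClient;
import com.linkedin.venice.client.store.ClientConfig;
import com.linkedin.venice.client.store.ClientFactory;
import org.apache.avro.generic.GenericRecord;
import java.util.concurrent.CompletableFuture;

public class ThinClientExample {
    public static void main(String[] args) throws Exception {
        // Store name and router URL are placeholders; the client API is
        // taken from the open-source Venice project and may vary by version.
        AvroGenericStoreClient<String, GenericRecord> client =
            ClientFactory.getAndStartGenericAvroClient(
                ClientConfig.defaultGenericClientConfig("member-features")
                    .setVeniceURL("http://venice-router:7777"));

        // Reads are asynchronous; the thin client makes a remote call, which
        // is why its latency sits below 10 ms (p99) rather than in the
        // microsecond range of the in-process Da Vinci client.
        CompletableFuture<GenericRecord> value = client.get("member:12345");
        System.out.println(value.get());

        client.close();
    }
}
```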
GV shared the challenges and lessons learned by the team running VeniceDB at LinkedIn. The data platform team focuses on retaining control over infrastructure layout and allocation so that it can manage clusters without disrupting client teams. This is especially important given the distinct profiles of different workloads, where either storage or traffic can be the limiting factor.
The choice of compression algorithm plays a critical role for storage-constrained workloads, and the team observed that using ZSTD produced significantly better results in most cases. Similarly, when storing embeddings, the serialization protocol significantly affects memory and compute usage, and the team achieved substantial improvements by using custom-optimized Avro utilities.
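To make the compression point concrete, the sketch below uses the zstd-jni bindings (an assumption, since the talk recap does not name a specific ZSTD implementation); the custom Avro utilities mentioned are presumably related to LinkedIn's open-source avro-util project, though the recap does not name them:

```java
import com.github.luben.zstd.Zstd;
import java.nio.charset.StandardCharsets;

public class ZstdExample {
    public static void main(String[] args) {
        // Repetitive payloads (like serialized feature records) compress well.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("memberId=").append(i).append(";embedding=0.12,0.34,0.56;");
        }
        byte[] original = sb.toString().getBytes(StandardCharsets.UTF_8);

        // Level 3 is zstd's default speed/ratio trade-off.
        byte[] compressed = Zstd.compress(original, 3);
        byte[] restored = Zstd.decompress(compressed, original.length);

        System.out.printf("original=%d bytes, compressed=%d bytes, ratio=%.1fx, roundtrip ok=%b%n",
            original.length, compressed.length,
            (double) original.length / compressed.length,
            new String(restored, StandardCharsets.UTF_8).contentEquals(sb));
    }
}
```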
Finally, GV mentioned the importance of effective client library versioning. The team adopted an aggressive deprecation policy for old versions, combined with automated creation and application of dependency upgrades.