Databricks Invests in Lakehouse, Announces New Generative AI Tools

Databricks, a data and AI company, has announced Lakehouse AI, a set of innovations that enables customers to easily and efficiently develop generative AI applications, including large language models (LLMs), directly within the Databricks Lakehouse platform.

Lakehouse AI offers a unique data-centric approach to AI, with built-in capabilities for the entire AI lifecycle and underlying oversight and governance. New features that make it easier for customers to implement generative AI use cases include Vector Search, a curated collection of open source models, LLM-optimized Model Serving, MLflow 2.5 with LLM features including AI Gateway and Prompt Tools, Lakehouse Monitoring, and more.

The demand for generative AI is driving disruption across industries, making it imperative for technical teams to build data-driven generative AI models and LLMs to differentiate their services.

However, AI success is determined by data, and it is difficult to supply and maintain clean, high-quality data when the data platform is separated from the AI platform. Moreover, the process of taking models from experimentation to production, and the associated model tuning, operationalization, and monitoring, is complex and unreliable.

Databricks unifies the data and AI platforms with Lakehouse AI, so customers can successfully develop generative AI solutions faster and better, from using basic SaaS models to safely training custom models with enterprise data. Organizations can accelerate their generative AI efforts by unifying data, AI models, LLM operations (LLMOps), monitoring, and governance on the Databricks Lakehouse platform.

“At JetBlue, we inspire humanity through our products, culture and customer service. We embarked on an AI transformation years ago,” said Sai Ravuru, senior manager of data science and analytics at JetBlue.

“Databricks has contributed to our AI and ML transformation and helped us build our own LLM. FAA data feeds and other sources can now be used to make decisions, and this implementation has significantly reduced the onboarding time for new users. We are excited about all of the core AI innovations that will enable customers like us to build an LLM on the Lakehouse and manage it from there.”

Lakehouse AI unifies the AI lifecycle, from data collection and preparation, to model development and LLMOps, to serving and monitoring.

Newly announced features include Vector Search, fine-tuning in AutoML, and curated open source models backed by optimized serving for high performance.

Databricks Vector Search allows developers to improve the accuracy of generative AI responses through embedding-based search. It is fully managed, automatically creating vector embeddings from files in Unity Catalog and keeping them up to date through seamless integration with Databricks Model Serving. Additionally, developers can add query filters to deliver more relevant results to users.
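Conceptually, embedding-based search with query filters works like the minimal sketch below. The toy character-frequency embedding, the document set, and the `source` filter field are illustrative stand-ins, not the Databricks Vector Search API:

```python
import math

def embed(text):
    # Toy embedding: a normalized character-frequency vector.
    # Real systems use a learned embedding model instead.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    # Both vectors are unit-normalized, so the dot product is cosine similarity.
    return sum(x * y for x, y in zip(a, b))

# Hypothetical document store with metadata used for query filters.
docs = [
    {"id": 1, "text": "flight delay due to weather", "source": "ops"},
    {"id": 2, "text": "aircraft maintenance schedule", "source": "ops"},
    {"id": 3, "text": "quarterly revenue report", "source": "finance"},
]
index = [(d, embed(d["text"])) for d in docs]

def search(query, top_k=2, source=None):
    # Rank documents by similarity, optionally filtering on metadata first.
    q = embed(query)
    hits = [(cosine(q, e), d) for d, e in index
            if source is None or d["source"] == source]
    hits.sort(key=lambda h: h[0], reverse=True)
    return [d["id"] for _, d in hits[:top_k]]

print(search("weather delays", source="ops"))  # doc 1 ranks first; finance docs excluded
```

In a managed service, the indexing and embedding refresh happen automatically as the underlying files change; the sketch only shows the retrieval-with-filter idea.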

Databricks AutoML introduces a low-code approach to fine-tuning LLMs. Customers can safely fine-tune an LLM using their own enterprise data and own the resulting model AutoML produces, without sending data to a third party. Additionally, through MLflow, Unity Catalog, and Model Serving integrations, models can easily be shared within an organization, governed for appropriate use, served for production inference, and monitored.

Databricks has published a curated list of open source models available within the Databricks Marketplace, including instruction-following and summarization models such as MPT-7B and Falcon-7B, and Stable Diffusion for image generation, making it easier to get started with generative AI across a variety of use cases. Lakehouse AI features such as Databricks Model Serving are optimized for these models to ensure the best performance and cost efficiency.


Databricks also showcased its LLMOps innovations with the announcement of MLflow 2.5, the latest release of MLflow, a flagship Databricks-originated Linux Foundation open source project. MLflow is an open source platform for the machine learning lifecycle with nearly 11 million monthly downloads.

The MLflow 2.5 update includes MLflow AI Gateway and MLflow Prompt Tools.

MLflow AI Gateway allows organizations to centrally manage credentials for SaaS models or model APIs and provide access-controlled routes for queries. Organizations can expose these routes to various teams, which integrate them into their workflows and projects. Developers can swap backend models at any time to improve cost and quality, and can switch LLM providers. MLflow AI Gateway also supports prediction caching to avoid repeated calls for recurring prompts, and rate limiting to manage costs.
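The behavior described above — named routes, swappable backends, prediction caching, rate limiting — can be sketched in plain Python. The `Route` and `Gateway` classes here are illustrative stand-ins, not the MLflow AI Gateway API:

```python
import time

class Route:
    """One named gateway route with its own backend, cache, and rate limit."""
    def __init__(self, backend, max_per_minute=60):
        self.backend = backend              # any callable: prompt -> completion
        self.max_per_minute = max_per_minute
        self.cache = {}                     # prompt -> cached completion
        self.window_start = time.time()
        self.count = 0

    def query(self, prompt):
        # Rate limiting: reset the counter each minute, then enforce the cap.
        now = time.time()
        if now - self.window_start >= 60:
            self.window_start, self.count = now, 0
        if self.count >= self.max_per_minute:
            raise RuntimeError("rate limit exceeded")
        self.count += 1
        # Prediction caching: recurring prompts skip the backend call.
        if prompt not in self.cache:
            self.cache[prompt] = self.backend(prompt)
        return self.cache[prompt]

class Gateway:
    """Central registry of routes; callers only ever see route names."""
    def __init__(self):
        self.routes = {}

    def add_route(self, name, backend, max_per_minute=60):
        self.routes[name] = Route(backend, max_per_minute)

    def swap_backend(self, name, backend):
        # Switch LLM providers without changing any caller's code.
        self.routes[name].backend = backend
        self.routes[name].cache.clear()

    def query(self, name, prompt):
        return self.routes[name].query(prompt)
```

Because teams call routes by name rather than holding provider credentials, swapping the backend behind a route is invisible to them — which is the cost/quality flexibility the gateway pattern is meant to provide.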

A new no-code visual tool allows users to compare the output of different models across a set of prompts, with results automatically tracked within MLflow. Integration with Databricks Model Serving then allows customers to deploy the chosen model to production.
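Under the hood, such a comparison amounts to fanning a prompt set out across candidate models and collecting outputs side by side. A minimal sketch, with stub callables standing in for real models:

```python
# Stub "models": any callable mapping prompt -> completion stands in here.
models = {
    "model_a": lambda p: p.upper(),
    "model_b": lambda p: p[::-1],
}

def compare(models, prompts):
    """Run each prompt through each candidate model and collect the
    outputs as rows, one per prompt, for side-by-side review."""
    rows = []
    for prompt in prompts:
        row = {"prompt": prompt}
        for name, fn in models.items():
            row[name] = fn(prompt)
        rows.append(row)
    return rows

print(compare(models, ["hello"]))
# [{'prompt': 'hello', 'model_a': 'HELLO', 'model_b': 'olleh'}]
```

The visual tool adds tracking of each run and a UI on top; the table of prompt-by-model outputs is the core artifact being compared.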

Additionally, following its release earlier this year, Databricks Model Serving has been optimized for LLM inference, with up to 10x lower latency and lower costs. Model Serving now supports GPU-based inference, fully managed by Databricks with seamless infrastructure management. All requests and responses are automatically logged to Delta tables, ensuring end-to-end lineage tracking through Unity Catalog. Finally, Model Serving scales up quickly from zero and scales back down as demand changes, reducing operating costs and ensuring customers only pay for the compute they use.

Databricks is also introducing Databricks Lakehouse Monitoring to expand its data and AI monitoring capabilities to better monitor and manage all data and AI assets within Lakehouse. Databricks Lakehouse Monitoring provides end-to-end visibility into your data pipelines to continuously monitor, tune, and improve performance without the need for additional tools or complexity. Leveraging the Unity Catalog, Lakehouse Monitoring provides users with deep insight into the lineage of their data and AI assets, ensuring high quality, accuracy, and reliability. Proactive detection and reporting make it easy to identify and diagnose pipeline errors, automatically perform root cause analysis, and quickly find recommended resolutions across the data lifecycle.
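One core check this kind of monitoring performs is drift detection: comparing a live window of data against a baseline and flagging columns that have shifted. A minimal sketch, with made-up column names and an illustrative two-standard-deviation threshold (not the Lakehouse Monitoring API):

```python
import statistics

def summarize(values):
    """Baseline summary statistics for one numeric column."""
    return {"mean": statistics.mean(values), "stdev": statistics.pstdev(values)}

def drift_alerts(baseline, current, threshold=2.0):
    """Flag columns whose current mean has drifted more than `threshold`
    baseline standard deviations away from the baseline mean."""
    alerts = []
    for col, stats in baseline.items():
        shift = abs(statistics.mean(current[col]) - stats["mean"])
        if stats["stdev"] and shift / stats["stdev"] > threshold:
            alerts.append(col)
    return alerts

# Hypothetical pipeline metric: serving latency in milliseconds.
baseline = {"latency_ms": summarize([100, 110, 95, 105, 102])}
healthy = {"latency_ms": [103, 99, 108]}
degraded = {"latency_ms": [190, 205, 181]}

print(drift_alerts(baseline, healthy))   # []
print(drift_alerts(baseline, degraded))  # ['latency_ms']
```

A production monitor runs this continuously over tables governed in Unity Catalog and pairs each alert with lineage information, so an alert on one table can be traced back to the upstream pipeline step that caused it.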

“We have reached a tipping point for organizations. AI is no longer an aspiration; it is imperative for organizations to remain competitive,” said Ali Ghodsi, co-founder and CEO of Databricks. “Databricks has been on a mission to democratize data and AI for over a decade and continues to innovate to make the Lakehouse the best place to build, own and secure generative AI models.”

Databricks continues to expand its Lakehouse platform, recently announcing Lakehouse Apps, the general availability of Databricks Marketplace, LakehouseIQ, new governance capabilities, and Delta Lake 3.0.


