
It’s been a busy few weeks for Databricks. After releasing a new iteration of its Data Lakehouse with a universal table format and introducing Lakehouse Apps, the company on Wednesday announced a new set of tools aimed at helping data professionals develop generative AI capabilities.
New capabilities will be added to the company’s Delta Lake lakehouse, including a proprietary enterprise knowledge engine called LakehouseIQ, new vector search capabilities, a low-code large language model (LLM) tuning tool called AutoML, and open source foundation models.
The new features leverage technology from the company’s recent acquisitions of MosaicML (this week) and Okera (in May).
LakehouseIQ enables enterprise search with NLP
The new LakehouseIQ engine aims to enable enterprise users to search for data and insights from Delta Lake without seeking technical assistance from data experts. To simplify data retrieval for non-technical users, the LakehouseIQ engine uses natural language processing (NLP).
To enable NLP-based enterprise search, LakehouseIQ uses generative AI to understand concepts like terminology, data usage patterns, and organizational structure.
This is a different approach than the common technique of building knowledge graphs used by companies like Glean and Salesforce. A knowledge graph is a representation of structured and unstructured data in the form of nodes and edges. Nodes represent entities (people, places, concepts, etc.) and edges represent relationships between these entities.
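The nodes-and-edges structure described above can be sketched in a few lines. This is a minimal illustration of the knowledge-graph concept, not anything from Glean or Salesforce; all entity and relation names are made up for the example.

```python
# Minimal knowledge graph sketch: nodes are entities, edges are
# labeled relationships between them. Names are illustrative only.
from collections import defaultdict

class KnowledgeGraph:
    def __init__(self):
        # edges[subject] -> list of (relation, object) pairs
        self.edges = defaultdict(list)

    def add_edge(self, subject, relation, obj):
        self.edges[subject].append((relation, obj))

    def neighbors(self, subject, relation=None):
        """Return entities linked from `subject`, optionally filtered by relation."""
        return [o for r, o in self.edges[subject] if relation is None or r == relation]

kg = KnowledgeGraph()
kg.add_edge("Alice", "works_in", "Finance")
kg.add_edge("Finance", "owns_dataset", "quarterly_revenue")

print(kg.neighbors("Alice", "works_in"))  # ['Finance']
```

Traversing such edges (people to departments to datasets) is how knowledge-graph-based search tools answer questions about enterprise data.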
In contrast, according to Sanjeev Mohan, principal analyst at SanjMo, the LakehouseIQ engine consists of machine learning models that infer the context of data sources and make it available for search with natural language queries.
Enterprise users will be able to access LakehouseIQ’s search capabilities via the Assistant in Notebooks and the SQL Editor, the company said. The Assistant can perform a variety of tasks, such as building queries and answering data-related questions.
Databricks said it is adding LakehouseIQ to many management functions within the lakehouse to provide automated suggestions, such as notifying users of incomplete data sets and offering help with debugging jobs and SQL queries.
Additionally, the company is exposing LakehouseIQ’s APIs so that custom applications that companies develop can leverage its capabilities, said Joel Minnick, vice president of marketing at Databricks.
Assistant powered by LakehouseIQ is currently in preview.
Delta Lake Gets AI Toolbox for Developing Generative AI Use Cases
Databricks said the addition of the Lakehouse AI Toolbox to Lakehouse is intended to support the development of enterprise generative AI applications, such as intelligent assistants. The toolbox consists of features such as vector search, low-code AutoML, a collection of open source models, MLflow 2.5, Lakehouse Monitoring, and more.
“Vector search helps developers improve the accuracy of generative AI responses through embeddings that are automatically created and managed in Unity Catalog, as well as the ability to add query filters to searches,” said Minnick, adding that the embeddings will be kept up to date using Databricks Model Serving.
An embedding is a vector, or numerical array, used to give context to an AI model, a process known as grounding. Grounding eliminates the need for companies to fully train or fine-tune their AI models on corpora of enterprise information.
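The grounding process described above can be illustrated with a toy retrieval step: embed the documents, find the one most similar to the query vector, and feed it to the model as context. This is a conceptual sketch only; the hand-written vectors and function names stand in for a real embedding model and are not a Databricks API.

```python
# Conceptual sketch of embedding-based grounding: retrieve the most
# similar document vector for a query, then use it as prompt context.
# Toy 3-dimensional vectors stand in for real model embeddings.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Pretend embeddings of two enterprise documents (hypothetical data).
docs = {
    "refund_policy": [0.9, 0.1, 0.0],
    "shipping_faq": [0.1, 0.8, 0.3],
}

def retrieve(query_vec):
    """Return the name of the document whose embedding is closest to the query."""
    return max(docs, key=lambda name: cosine(query_vec, docs[name]))

query_vec = [0.85, 0.2, 0.05]  # pretend embedding of "how do I get a refund?"
context = retrieve(query_vec)
print(context)  # refund_policy
```

In a real pipeline, the retrieved document's text would be prepended to the user's prompt, letting a general-purpose LLM answer from enterprise data it was never trained on.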
Lakehouse AI also comes with a low-code interface to help companies fine-tune foundation models.
“AutoML enables both technically savvy developers and non-technical users to fine-tune LLMs in a low-code manner using their own enterprise data. The resulting model is unique to the organization, built from its own data rather than a third party’s,” said Minnick, emphasizing the company’s open source foundation model policy.
As part of Lakehouse AI, Databricks also offers several foundation models accessible via the Databricks Marketplace, including Stable Diffusion, MosaicML’s MPT-7B, Falcon-7B, and models from Hugging Face.
MLflow 2.5’s additions, including new features such as Prompt Tools and AI Gateway, are aimed at helping enterprises manage their operations around LLMs.
While AI Gateway allows enterprises to centrally manage credentials for SaaS models or model APIs and provides access-controlled routes for queries, Prompt Tools gives data scientists a no-code interface for comparing the outputs of different models on a set of prompts before deploying them to production via Model Serving.
“With AI Gateway, developers can easily swap backend models to improve cost and quality and switch LLM providers at any time,” said Minnick.
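The backend-swapping idea Minnick describes can be sketched as a simple routing layer: clients query a named route, and the provider behind that route can be replaced without any change to client code. The classes below are hypothetical stand-ins, not the actual AI Gateway API.

```python
# Hedged sketch of an AI-gateway routing layer: callers address a
# named route; the backend model provider behind it can be swapped
# at any time. Provider classes are illustrative stand-ins only.
class EchoProviderA:
    def complete(self, prompt):
        return f"[provider-a] {prompt}"

class EchoProviderB:
    def complete(self, prompt):
        return f"[provider-b] {prompt}"

class Gateway:
    def __init__(self):
        self.routes = {}  # route name -> provider object

    def set_route(self, name, provider):
        """Point a route at a backend provider (credentials would live here)."""
        self.routes[name] = provider

    def query(self, name, prompt):
        """Clients only ever see the route name, never the provider."""
        return self.routes[name].complete(prompt)

gw = Gateway()
gw.set_route("chat", EchoProviderA())
print(gw.query("chat", "hi"))          # [provider-a] hi
gw.set_route("chat", EchoProviderB())  # swap backend; callers unchanged
print(gw.query("chat", "hi"))          # [provider-b] hi
```

Because credentials and provider choice live in the gateway, an operator can switch LLM vendors for cost or quality reasons without touching the applications that consume the route.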
Databricks said enterprises can now continuously monitor and manage all their data and AI assets in Lakehouse with the new Lakehouse Monitoring feature, which provides end-to-end visibility into their data pipelines.
Databricks already offers an AI Governance Kit in the form of the Unity Catalog.
Will Snowflake catch up with Databricks updates?
According to Doug Henschen, principal analyst at Constellation Research, Databricks’ new update, specifically targeted at developing generative AI applications within the enterprise, could leave Snowflake behind.
“Both Databricks and Snowflake want their customers to be able to handle all their workloads on their respective platforms, but my guess is that Databricks is already ready to help build custom ML [machine learning], AI, and generative AI models and applications,” Henschen said, adding that Snowflake’s generative AI capabilities, such as the recently announced Snowpark Container Services, are still in private preview.
According to Amalgam Insights principal analyst Hyoun Park, Snowflake is just beginning to build language and generative AI capabilities through its Nvidia NeMo partnership and acquisition of Neeva.
In contrast, most of Databricks’ features are either generally available or in public preview, analysts say.
According to Gartner analyst Aaron Rosenbaum, Databricks’ new updates may also lead to improved query performance across generative AI use cases, which could be a differentiator from rival Snowflake.
“Snowflake and Databricks have many mutual customers, and the goal for all of them is to run a variety of SQL queries cheaply, quickly, and easily,” said Rosenbaum.
