The generative AI boom has given enterprises powerful language models that can write, summarize and reason over vast amounts of text and other data. But when it comes to high-value predictive tasks, such as forecasting customer churn or detecting fraud from structured, relational data, businesses remain stuck in the world of traditional machine learning.
Stanford professor and Kumo co-founder Jure Leskovec argues this is a critical missing piece. His company's tool, the Relational Foundation Model (RFM), is a new kind of pre-trained AI that brings the “zero-shot” capabilities of large language models (LLMs) to structured databases.
“It's about predicting things you don't know, things that haven't happened yet,” Leskovec told VentureBeat. “And that is a fundamentally new capability that is missing from the current scope of what we consider to be gen AI.”
Why predictive ML is a “technology from 30 years ago”
LLMs and retrieval-augmented generation (RAG) systems can answer questions about existing knowledge, but they are fundamentally retrospective: they find and reason over information that is already there. For predictive business tasks, companies still rely on classic machine learning.
For example, to build a model that predicts customer churn, a business must hire a team of data scientists who can spend weeks or months on “feature engineering,” the process of manually creating predictive signals from the data. That means wrangling complex data, joining information from tables such as customer purchase history and website clicks, into a single, massive training table.
“If you want to do machine learning (ML), sorry, you are stuck in the past,” Leskovec said. This expensive and time-consuming bottleneck keeps most organizations from putting their data to work.
How Kumo generalizes transformers to databases
Kumo's approach, called “relational deep learning,” sidesteps this manual process with two key insights. First, it automatically represents any relational database as a single interconnected graph. For example, if a database has a “users” table that records customer information and an “orders” table that records customer purchases, every row in the users table becomes a user node, every row in the orders table becomes an order node, and so on. These nodes are then connected automatically using the database's existing relationships, such as foreign keys, creating a rich map of the entire dataset without manual effort.
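To make that construction concrete, here is a minimal sketch using two toy tables and the networkx library. The schema, column names and values are invented for illustration; this is not Kumo's code.

```python
import networkx as nx

# Toy stand-ins for two relational tables. In practice these rows
# would come from a real database; this schema is invented.
users = [
    {"user_id": 1, "signup_date": "2024-01-05"},
    {"user_id": 2, "signup_date": "2024-03-12"},
]
orders = [
    {"order_id": 10, "user_id": 1, "amount": 42.0},  # user_id is a foreign key
    {"order_id": 11, "user_id": 1, "amount": 13.5},
    {"order_id": 12, "user_id": 2, "amount": 99.9},
]

G = nx.Graph()

# Every row becomes a node, typed by its source table.
for row in users:
    G.add_node(("user", row["user_id"]), **row)
for row in orders:
    G.add_node(("order", row["order_id"]), **row)

# Foreign-key relationships become edges, linking each order
# to the user who placed it.
for row in orders:
    G.add_edge(("order", row["order_id"]), ("user", row["user_id"]))

print(G.number_of_nodes(), "nodes,", G.number_of_edges(), "edges")
```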

Second, Kumo generalized the transformer architecture, the engine behind LLMs, to learn directly from this graph representation. Transformers excel at understanding sequences of tokens by using an “attention mechanism” to weigh the importance of different tokens relative to one another.
Kumo's RFM applies that same attention mechanism to graphs, allowing it to learn complex patterns and relationships across multiple tables simultaneously. Leskovec compares this leap to the evolution of computer vision. In the early 2000s, ML engineers had to hand-design features such as edges and shapes to detect objects. Newer architectures such as convolutional neural networks (CNNs), however, take in raw pixels and learn the relevant features automatically.
Similarly, the RFM ingests raw database tables and lets the network discover the most predictive signals on its own, with no manual effort required.
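As a rough illustration of the underlying operation, the sketch below applies dot-product attention to one node and its graph neighbors. It is a simplified picture of the general idea, not Kumo's architecture, and all shapes and values are made up.

```python
import numpy as np

def neighbor_attention(node_vec, neighbor_vecs):
    """Weigh a node's neighbors by dot-product attention and return
    their weighted average: the core move a graph transformer
    applies across a relational graph. Illustrative only."""
    scores = neighbor_vecs @ node_vec / np.sqrt(node_vec.size)
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights @ neighbor_vecs            # attention-weighted summary

rng = np.random.default_rng(0)
user_embedding = rng.normal(size=8)           # embedding of a "user" node
order_embeddings = rng.normal(size=(3, 8))    # embeddings of its "order" neighbors
print(neighbor_attention(user_embedding, order_embeddings))
```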
The result is a pre-trained foundation model that can perform predictive tasks on a new database instantly, a capability known as “zero-shot” prediction. During a demo, Leskovec showed how a user can type a simple query to predict whether a particular customer will place an order in the next 30 days. Within seconds, the system returned a probability score and an explanation of the data points that led to its conclusion, such as the user's recent activity, or lack thereof. The model had never been trained on that database; it adapted in real time through in-context learning.
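The article does not reproduce the demo's query syntax, so the mock below only sketches the interaction pattern: one call that returns a probability plus the signals behind it. `RFMClient` and everything inside it are invented for illustration, not Kumo's actual API.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    probability: float       # e.g. 0.82 = likely to order
    explanation: list[str]   # data points behind the score

class RFMClient:
    """Hypothetical stand-in for a zero-shot prediction service; a
    real client would forward the query to a pre-trained RFM."""

    def predict(self, query: str) -> Prediction:
        # Mocked response mirroring the demo described above.
        return Prediction(
            probability=0.17,
            explanation=["no orders in the last 60 days",
                         "declining site visits"],
        )

rfm = RFMClient()
result = rfm.predict("Will user 123 place an order in the next 30 days?")
print(f"p={result.probability:.2f}; because: {', '.join(result.explanation)}")
```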

“We have a pre-trained model that you simply point at your data, and it gives you an accurate prediction in 200 milliseconds,” Leskovec said, adding that the results are “as accurate as a few weeks of work by a data scientist.”
The interface is designed to be familiar to machine learning experts while remaining accessible to data analysts, democratizing access to predictive analytics.
Powering the future of agents
The technology has major implications for the development of AI agents. To perform meaningful tasks within an enterprise, agents need to do more than process language; they need to make intelligent decisions based on the company's private data. The RFM can serve as a prediction engine for these agents. For example, a customer service agent could query the RFM to determine a customer's likely future value, then use an LLM to steer the conversation accordingly.
“If you believe in the agentic future, agents will need to make decisions that are rooted in private data. This is how agents will make decisions,” Leskovec explained.
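A minimal sketch of that division of labor follows; both functions are invented stubs rather than a real Kumo or LLM API.

```python
def rfm_predict_lifetime_value(customer_id: int) -> float:
    """Hypothetical stub: a relational foundation model scoring a
    customer's likely future spend from structured data."""
    return 1250.0  # mocked prediction

def llm_compose_reply(context: str) -> str:
    """Hypothetical stub: an LLM turning structured context into a
    customer-facing message."""
    return f"Drafted reply using context: {context}"

# Agent loop: the prediction engine informs the language engine.
customer_id = 123
ltv = rfm_predict_lifetime_value(customer_id)
tone = "offer a retention discount" if ltv > 1000 else "standard support"
print(llm_compose_reply(f"customer {customer_id}, predicted LTV ${ltv:.0f}, {tone}"))
```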
Kumo's work points to a future in which enterprise AI splits into two complementary domains: LLMs that handle retrospective knowledge in unstructured text, and RFMs that make forward-looking predictions on structured data. By eliminating the feature-engineering bottleneck, the RFM promises to put powerful ML tools in the hands of more businesses, dramatically reducing the time and cost from data to decision.
The company is releasing a public demo of the RFM and, in the coming weeks, will launch a version that lets users connect their own data. For organizations that need maximum accuracy, Kumo also offers a fine-tuning service to further improve performance on private datasets.