Kumo launches KumoRFM-2, a foundation model built to replace traditional enterprise machine learning – Unite.AI



Kumo has announced KumoRFM-2, a next-generation foundation model designed specifically for structured enterprise data. It represents a fundamental change in how organizations generate predictions from their data warehouses. Unlike traditional machine learning pipelines, which require months of feature engineering and custom model development, KumoRFM-2 lets teams generate predictions instantly using natural language, without task-specific training or specialized expertise.

At its core is a new category of AI: a relational foundation model that operates directly on enterprise data structures rather than flattening them into simplified tables. This distinction addresses one of the most persistent limitations in enterprise AI: valuable relationships between datasets are often lost before modeling begins.

From static pipelines to real-time prediction systems

Predictive analytics for businesses has traditionally been time-consuming and resource-intensive. Each new use case, such as churn prediction, fraud detection, or demand forecasting, typically requires a separate pipeline that includes data cleaning, feature engineering, model training, and tuning.

KumoRFM-2 replaces that entire workflow with a single pre-trained system.

Rather than building a model, users define what they want to predict. The model interprets the request, builds the necessary context from the underlying database, and generates predictions in a single pass. This is made possible by combining in-context learning with a declarative interface called Predictive Query Language (PQL), in which users express the results they are interested in rather than the steps required to compute them.

The result is a shift from “building a model” to “asking a question,” significantly lowering the barrier to using predictive AI across an organization.
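The difference between declaring a result and scripting the steps can be sketched in plain Python. This is only an illustration: the `PredictiveQuery` class, its fields, and the rendered query text are hypothetical and are not Kumo's actual PQL syntax or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictiveQuery:
    """Hypothetical declarative prediction request (not Kumo's actual PQL)."""

    target_table: str   # table whose future rows define the outcome
    aggregation: str    # how those future rows are summarized, e.g. "count"
    horizon_days: int   # how far ahead the prediction looks
    entity_table: str   # table whose rows each receive a prediction
    entity_key: str     # primary key identifying each entity

    def describe(self) -> str:
        # State *what* is wanted; nothing here says how features are
        # built or how a model is run.
        return (
            f"PREDICT {self.aggregation}({self.target_table}) "
            f"OVER NEXT {self.horizon_days} DAYS "
            f"FOR EACH {self.entity_table}.{self.entity_key}"
        )


# A churn-style question: how many orders will each customer place next month?
churn = PredictiveQuery(
    target_table="orders",
    aggregation="count",
    horizon_days=30,
    entity_table="customers",
    entity_key="customer_id",
)
print(churn.describe())
```

The request carries only the target, the time horizon, and the entity. Everything procedural (joins, features, model choice) is left to the system that receives it.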

Why relational data is so difficult

Most existing AI systems struggle with structured enterprise data for a simple reason: the data is handled in the wrong form.

Traditional models, including many tabular AI systems and large language models, rely on flattening data into a single table. But in the real world, enterprise data exists as interconnected systems, where customers are linked to transactions, transactions are linked to products, and products are linked to inventory, all evolving over time.

Flattening this structure removes relationships that often contain the most valuable predictive signals. Teams then have to recreate these signals manually through feature engineering, a time-consuming and error-prone process.
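A toy example makes the loss concrete. The sketch below uses made-up data and a deliberately simple flattening (total spend and order count per customer). It is an illustration of the general problem, not a description of any particular pipeline.

```python
from collections import defaultdict

# Toy transactions table: (customer_id, day, amount). All values are made up.
transactions = [
    ("alice", 5, 30.0), ("alice", 45, 30.0), ("alice", 85, 30.0),  # steady buyer
    ("bob", 1, 30.0), ("bob", 2, 30.0), ("bob", 3, 30.0),          # burst, then silence
]


def flatten(rows):
    """Collapse transactions into one row per customer: (total_spend, n_orders)."""
    totals = defaultdict(lambda: [0.0, 0])
    for customer, _day, amount in rows:
        totals[customer][0] += amount
        totals[customer][1] += 1
    return {customer: tuple(values) for customer, values in totals.items()}


flat = flatten(transactions)

# The flattened rows are identical, so a model trained on them
# cannot tell the two customers apart.
print(flat["alice"] == flat["bob"])

# The original rows still carry the timing signal that flattening discarded.
last_purchase = {}
for customer, day, _amount in transactions:
    last_purchase[customer] = max(day, last_purchase.get(customer, day))
print(last_purchase)
```

Recovering the difference requires someone to think of a "days since last purchase" feature and add it by hand, which is exactly the manual feature engineering step described above.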

KumoRFM-2 avoids this completely by operating directly on a relational database and maintaining connections between tables, timestamps, and entities.

Inside the architecture: How KumoRFM-2 works

The key innovation behind KumoRFM-2 is a hierarchical relational graph transformer architecture that processes data at multiple levels simultaneously.

At the first level, the model analyzes individual tables by combining row and column attention. This lets it learn how features within a table relate to one another while filtering out irrelevant or noisy data early in the process. Importantly, prediction targets are introduced at this stage, so the model is conditioned on the task from the beginning.

At the second level, the model performs graph-based inference across tables. It uses foreign key relationships to connect data from different parts of the database, such as linking customer profiles to purchase history and behavioral patterns, and identifies cross-table signals that would otherwise be lost.

At the third level, the model applies cross-sample attention, allowing it to learn from multiple samples simultaneously. This lets it generalize from a relatively small number of context examples rather than requiring a complete training dataset.

This staged design matters. It avoids the computational explosion caused by processing all data points simultaneously, and it improves accuracy by filtering out noise before deeper inference takes place.
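The way foreign keys turn a database into a graph can be sketched with plain dictionaries. The tables, column names, and helper functions below are hypothetical, and the traversal is ordinary breadth-first expansion rather than the model's actual graph transformer.

```python
from collections import defaultdict

# Toy relational database; table and column names are made up.
tables = {
    "customers": [{"customer_id": 1}, {"customer_id": 2}],
    "orders": [
        {"order_id": 10, "customer_id": 1, "product_id": 100},
        {"order_id": 11, "customer_id": 1, "product_id": 101},
        {"order_id": 12, "customer_id": 2, "product_id": 100},
    ],
    "products": [{"product_id": 100}, {"product_id": 101}],
}
primary_keys = {"customers": "customer_id", "orders": "order_id", "products": "product_id"}
# Each foreign key: (child table, foreign key column, parent table).
foreign_keys = [("orders", "customer_id", "customers"), ("orders", "product_id", "products")]


def build_graph(tables, primary_keys, foreign_keys):
    """Return an undirected adjacency map: one node per row, one edge per foreign key value."""
    graph = defaultdict(set)
    for child, column, parent in foreign_keys:
        for row in tables[child]:
            child_node = (child, row[primary_keys[child]])
            parent_node = (parent, row[column])
            graph[child_node].add(parent_node)
            graph[parent_node].add(child_node)
    return graph


def neighbors_within(graph, start, hops):
    """Collect every node reachable from `start` in at most `hops` edges."""
    seen, frontier = {start}, {start}
    for _ in range(hops):
        frontier = {n for node in frontier for n in graph[node]} - seen
        seen |= frontier
    return seen - {start}


graph = build_graph(tables, primary_keys, foreign_keys)
# Two hops from a customer reach their orders and the products in those orders.
print(sorted(neighbors_within(graph, ("customers", 1), 2)))
```

No table is joined or aggregated here. The rows stay as they are, and the foreign keys alone decide which rows can exchange information.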

In-context learning replaces training

KumoRFM-2 is unusual in that it relies on in-context learning rather than traditional training.

Rather than training a model for each task, KumoRFM-2 is pretrained once on a large-scale combination of synthetic and real-world relational data. When a user submits a prediction request, the system automatically generates a set of context examples (small subgraphs of the database combined with known results).
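One plausible way to build such context examples is to replay history: pick past anchor times, summarize what was known before each one, and record what happened afterwards. The sketch below assumes this approach. The function names, the single `history_count` feature, and the data are hypothetical and far simpler than the subgraphs the article describes.

```python
def make_example(events, entity, anchor, lookback, horizon):
    """Build one labeled example: history before `anchor`, outcome after it."""
    days = [day for who, day in events if who == entity]
    history = sum(1 for day in days if anchor - lookback <= day < anchor)
    label = any(anchor <= day < anchor + horizon for day in days)
    return {"entity": entity, "anchor": anchor, "history_count": history, "label": label}


def context_examples(events, entities, anchors, now, lookback, horizon):
    """Keep only anchors whose outcome window has fully elapsed by `now`."""
    return [
        make_example(events, entity, anchor, lookback, horizon)
        for anchor in anchors
        if anchor + horizon <= now  # the label must already be known
        for entity in entities
    ]


# Toy purchase log: (customer, day). All values are made up.
events = [("a", 3), ("a", 12), ("a", 25), ("b", 2)]
examples = context_examples(
    events, entities=["a", "b"], anchors=[10, 20, 30], now=30, lookback=10, horizon=10
)
for example in examples:
    print(example)
```

The anchor at day 30 is skipped because its outcome window lies in the future. Filtering this way keeps every label a known result and prevents future information from leaking into the context.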

These examples serve as guidance for the model, allowing it to infer patterns and generate predictions without updating weights. In practice, this means:

  • No task-specific training
  • No feature engineering
  • No model tuning

The model can achieve state-of-the-art performance with just 0.2% of the data typically required for supervised learning.
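Predicting from context examples without updating any weights can be illustrated with a small attention-style estimator. This is a stand-in technique (a softmax-weighted average of context labels, similar to kernel regression), not KumoRFM-2's architecture. The feature vectors and labels are made up.

```python
import math


def in_context_predict(query, context, temperature=1.0):
    """Return a softmax-weighted average of context labels.

    `context` is a list of (feature_vector, label) pairs. Nothing is
    trained: the prediction depends only on the examples passed in.
    """
    # Similarity score: negative squared distance to each context example.
    scores = [
        -sum((q - x) ** 2 for q, x in zip(query, features)) / temperature
        for features, _label in context
    ]
    peak = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(score - peak) for score in scores]
    total = sum(weights)
    return sum(w * label for w, (_features, label) in zip(weights, context)) / total


# Context examples: (days_since_last_order, orders_last_month) -> stayed active (1.0) or not (0.0).
context = [
    ((0.0, 5.0), 1.0),
    ((1.0, 4.0), 1.0),
    ((9.0, 0.0), 0.0),
    ((8.0, 1.0), 0.0),
]

active_like = in_context_predict((0.5, 4.5), context)
lapsed_like = in_context_predict((8.5, 0.5), context)
print(active_like > 0.9, lapsed_like < 0.1)
```

Swapping in a different `context` list changes the task without touching the function, which is the property that lets one pretrained model serve many prediction problems.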

Performance across real-world benchmarks

KumoRFM-2 has been evaluated on 41 predictive tasks spanning industries such as e-commerce, healthcare, social platforms, and enterprise systems.

The model consistently outperforms traditional supervised machine learning approaches, including feature-engineered ensembles and relational deep learning systems. It also significantly outperforms widely used solutions on enterprise benchmarks, and it improves further with fine-tuning.

Beyond raw accuracy, the model shows strong robustness:

  • Maintains performance even when large portions of relational links are missing
  • Handles noisy or incomplete data with minimal degradation
  • Performs well even in cold-start scenarios with limited historical data

This resiliency is especially important in enterprise environments where data quality is often inconsistent.

Built for scale: up to 500 billion rows

KumoRFM-2 is designed to operate at the scale of modern data infrastructures.

The system can process datasets of more than 500 billion rows by combining database-native execution with a custom graph engine for high-throughput data access. Rather than moving data to another ML system, computations are pushed directly to where the data resides, whether it’s a SQL database or a cloud data warehouse.
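The idea of pushing computation to the data can be shown with Python's built-in `sqlite3` module. This is a generic illustration of query pushdown on a made-up `orders` table, not Kumo's execution engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 20.0), (1, 30.0), (2, 5.0)],
)

# Pull-based: every row leaves the database and is aggregated in Python.
rows = conn.execute("SELECT customer_id, amount FROM orders").fetchall()
pulled = {}
for customer_id, amount in rows:
    pulled[customer_id] = pulled.get(customer_id, 0.0) + amount

# Pushdown: the database aggregates, and only one row per customer leaves it.
pushed = dict(
    conn.execute(
        "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
    ).fetchall()
)
conn.close()

print(pulled == pushed, len(rows), len(pushed))
```

Both paths give the same totals, but the pull-based path transfers every order row while the pushdown path transfers one row per customer. At warehouse scale that difference dominates latency and cost.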

This approach reduces latency, simplifies deployment, and allows organizations to integrate predictive capabilities directly into existing workflows.

Natural language as an interface

Another distinctive feature is the model’s natural language interface.

Users can ask questions such as:

  • Which customers are likely to churn in the next 30 days?
  • Which leads are most likely to convert?
  • Which products will be in high demand?

The system converts these queries into structured prediction logic, runs them against the underlying data, and returns both predictions and explanations.

This not only makes predictive analytics more accessible, but also lets AI agents incorporate predictions into automated decision-making workflows.

Towards agent-driven enterprise intelligence

KumoRFM-2 is designed with agents in mind.

Its predictive capabilities can be exposed as modular “skills” that AI agents can invoke as part of larger workflows. This turns predictive modeling into a composable building block that can be combined with search, reasoning, and execution in autonomous systems.

In this context, the model is not just a tool for analysts but a foundational layer for the next generation of enterprise automation.
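A minimal sketch of the "skill" pattern, assuming a simple in-process registry: the skill names, the `agent_step` function, and the stubbed churn score are all hypothetical. A real skill would call a prediction service rather than return a hard-coded value.

```python
from typing import Callable, Dict

# Registry mapping skill names to callables an agent may invoke.
SKILLS: Dict[str, Callable[..., dict]] = {}


def skill(name: str):
    """Decorator that registers a function under a skill name."""
    def register(fn):
        SKILLS[name] = fn
        return fn
    return register


@skill("predict_churn")
def predict_churn(customer_id: int) -> dict:
    # Stub for illustration only: odd ids are treated as high risk.
    score = 0.8 if customer_id % 2 else 0.2
    return {"customer_id": customer_id, "churn_score": score}


@skill("send_offer")
def send_offer(customer_id: int) -> dict:
    return {"customer_id": customer_id, "action": "offer_sent"}


def agent_step(customer_id: int, threshold: float = 0.5) -> dict:
    """Chain two skills: predict first, then act only if the risk is high."""
    prediction = SKILLS["predict_churn"](customer_id)
    if prediction["churn_score"] >= threshold:
        return SKILLS["send_offer"](customer_id)
    return {"customer_id": customer_id, "action": "none"}


print(agent_step(1))
print(agent_step(2))
```

Because the agent looks skills up by name, the prediction step can be replaced or upgraded without changing the workflow that consumes it.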

Redefining the role of data science

KumoRFM-2 signals a broader shift in the way organizations approach data science.

Instead of building and maintaining dozens of task-specific models, teams can rely on a single, general-purpose system that instantly adapts to new problems. This reduces the need for expertise in feature engineering and model tuning, allowing for faster experimentation and iteration.

For many organizations, this may mean moving from a centralized data science function to a more distributed model in which multiple departments have direct access to predictive insights.

A new category of foundation models

While foundation models are already transforming areas such as language and vision, structured enterprise data remains one of the final frontiers.

KumoRFM-2 is an early example of what a foundation model built specifically for structured data can accomplish. It introduces a new paradigm for predictive AI by combining relational inference, in-context learning, and natural language interaction.

If widely adopted, this approach could redefine how businesses work with data and transform predictive analytics from a complex, delayed process to a real-time, organization-wide function.


