How Precog adds business context to make enterprise data AI-enabled



Extracting data from enterprise tools like Salesforce, SAP Ariba, and NetSuite is relatively easy. Enabling AI models to reason about that data is much more difficult. A large number of tables and columns, or a huge multidimensional JSON file, won’t by itself help a model reason about that data. What’s missing is the business context in which the data was generated.

Precog is focused on helping enterprises extract data from Software as a Service (SaaS) API sources and prepare it for use in analytics and AI applications, and today the company is releasing new capabilities that bring this business context back into the extraction process.

The challenge of preparing enterprise data for AI

Precog CEO John Feingold said in a pre-launch interview that the manual process of preparing data for AI analysis can take months.

“When you go into an enterprise and start analyzing mission-critical business data, the data tends to be siloed into different applications, sometimes over 100 applications within the enterprise,” Feingold said. “And the process of getting that out of those applications is a very manual process of not just extracting and loading it, but giving enough context so that the model can actually understand it.”

Additionally, while large language models (LLMs) are becoming increasingly capable, they are not always reliable when asked to reason over large amounts of data.

“If you ask someone to send all their data to Gemini, not only will it cost a ton of money to chunk it and tokenize it and things like that, but the answer will change every time they call it,” Feingold pointed out.

Precog’s data ingestion platform. (Credit: Precog)

How Precog adds business context to your data

To address this, Precog takes a different approach to helping customers get more value from their data. When a Precog user wants to configure a new source for use in an AI application, they can now describe their use case (e.g., “I want to understand which customers are making the most money and which customers are losing money”). Precog then uses its existing ETL capabilities to examine the data available in the SaaS application, extract only the fields needed for that specific use case, and add the context the model needs to understand the meaning of each field.
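As a rough illustration of that use-case-driven extraction, the sketch below selects only the fields relevant to a stated question. The field names, descriptions, and keyword-overlap heuristic are all invented for this example; Precog's actual selection is model-driven, not keyword matching.

```python
# Hypothetical sketch: keep only the source fields relevant to a stated
# use case. Keyword overlap stands in for Precog's model-driven selection.

USE_CASE = "which customers are making the most money and which customers are losing money"

# Fields discovered in a fictional SaaS schema, with short descriptions.
AVAILABLE_FIELDS = {
    "customer_id": "unique identifier for the customer",
    "customer_name": "display name of the customer",
    "revenue": "total money billed to the customer",
    "cost_to_serve": "money spent servicing the customer",
    "last_login_ip": "ip address recorded at last login",
    "avatar_url": "url of the profile image",
}

STOPWORDS = {"which", "are", "the", "and", "most"}

def tokenize(text: str) -> set[str]:
    """Lowercase, split, and crudely singularize by stripping trailing 's'."""
    return {word.rstrip("s") for word in text.lower().split()}

def select_fields(use_case: str, fields: dict[str, str]) -> list[str]:
    """Keep fields whose description shares a keyword with the use case."""
    keywords = tokenize(use_case) - STOPWORDS
    return [name for name, desc in fields.items() if keywords & tokenize(desc)]

print(select_fields(USE_CASE, AVAILABLE_FIELDS))
# ['customer_id', 'customer_name', 'revenue', 'cost_to_serve']
```

Note how the IP address and avatar fields are dropped: they exist in the source but contribute nothing to the profitability question.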

The important thing to note here is that Precog never actually passes company data to an LLM. Instead, it loads the actual data into the data warehouse and passes only the metadata to the semantic engine.
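The metadata-only idea can be sketched as follows: walk a nested SaaS record and emit only field paths and types, never values. This is an illustrative stand-in, not Precog's code, and the invoice record is invented.

```python
# Illustrative sketch (not Precog's actual code): derive a metadata-only
# schema summary from a nested record, so only field paths and types,
# never the values themselves, would be shared with a semantic engine.

def summarize_schema(record: dict, prefix: str = "") -> dict:
    """Return {field_path: type_name} for a nested dict, dropping all values."""
    schema = {}
    for key, value in record.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            schema.update(summarize_schema(value, prefix=path + "."))
        else:
            schema[path] = type(value).__name__
    return schema

# A fictional record as it might arrive from a SaaS API.
invoice = {
    "id": 1042,
    "amount": 199.99,
    "customer": {"name": "Acme Corp", "tier": "enterprise"},
}

print(summarize_schema(invoice))
# {'id': 'int', 'amount': 'float', 'customer.name': 'str', 'customer.tier': 'str'}
```

The actual amount and customer name stay in the warehouse; only the shape of the data leaves it.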

Generate synthetic questions and build semantic models

A notable aspect of the way Precog has built this system is that it uses a separate model to automatically create hundreds of potential questions about the data. You can think of this as synthetic question generation.

As Becky Conning, Precog’s chief product officer, puts it, the idea here is to generate “a matrix of questions that will allow the LLM to generate a semantic model that can answer all of these questions.”

Conning argues that all of this is necessary because building a huge semantic model tied to a single normalized table only answers a very limited set of questions.
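The “matrix of questions” Conning describes can be sketched by crossing question templates with metrics and dimensions. The metric and dimension names below are invented; a real system would presumably generate the questions with a model rather than templates.

```python
# Minimal sketch of synthetic question generation: cross templates with
# hypothetical metrics and dimensions to build a question matrix that a
# semantic model should be able to answer.
from itertools import product

metrics = ["revenue", "cost to serve", "margin"]
dimensions = ["customer", "region", "product line"]
templates = [
    "What is the total {metric} by {dimension}?",
    "Which {dimension} has the highest {metric}?",
]

questions = [
    template.format(metric=m, dimension=d)
    for template, (m, d) in product(templates, product(metrics, dimensions))
]

print(len(questions))  # 2 templates x 3 metrics x 3 dimensions = 18
print(questions[0])    # What is the total revenue by customer?
```

Scaling this combinatorial idea to hundreds of questions is what lets the resulting semantic model cover more than one narrow slice of the data.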

Leverage LLMs for natural language to SQL queries

On the other hand, including all the data also doesn’t work. “If you include all the data, and these applications contain hundreds of thousands of datasets, where each dataset, because of its JSON structure, represents not just one table but can also contain a kind of dimensional information when decomposed, Cortex won’t work in that case. In fact, none of these NLQ LLMs will work,” Conning said.

Where modern LLMs excel is in converting natural-language queries to SQL. To query the data, Precog therefore does not feed data directly to a model, but instead uses Snowflake’s Cortex NLQ LLM. The service could use another LLM, but the team said they like Cortex NLQ for this use case.
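A hedged sketch of what such an NLQ request could look like: only schema metadata (no row data) accompanies the user's question. The table and column names are invented, and the Snowflake call shown in the comment is illustrative; consult Snowflake's Cortex documentation for the exact interface.

```python
# Hedged sketch: assemble a natural-language-to-SQL prompt in which only
# schema metadata, never row data, accompanies the user's question.

SCHEMA = {
    "customers": ["customer_id", "customer_name", "revenue", "cost_to_serve"],
}

def build_nlq_prompt(question: str, schema: dict[str, list[str]]) -> str:
    """Combine schema metadata and a question into a single NLQ prompt."""
    lines = ["Generate SQL for the schema below.", ""]
    for table, columns in schema.items():
        lines.append(f"TABLE {table} ({', '.join(columns)})")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)

prompt = build_nlq_prompt("Which customers have the highest margin?", SCHEMA)
# The prompt would then go to an NLQ model, e.g. something like:
#   SELECT SNOWFLAKE.CORTEX.COMPLETE('mistral-large', <prompt>);
print(prompt)
```

Because the prompt carries only table and column names, the cost and nondeterminism concerns Feingold raised about sending raw data to a model do not apply here.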

All in all, this looks like a smart way to leverage the strengths of LLMs without shoehorning them into use cases where they are far more likely to fail than existing technologies.

