ORLANDO, Fla. — As companies increasingly prioritize their AI efforts, the data needed for AI success is more important than ever.
“To get started with AI, your data, your people, and your technology need to be ready,” said Melody Chien, senior research director for data management at Gartner, in an interview with TechTarget Editorial. “There are a lot of things that need to come together, but the most important thing is data. Without data, there is no AI.”
To address the pressing data needs of AI, many chief data and analytics officers and data teams are expanding their responsibilities to include data management specifically for AI processes. However, it is not as simple as applying traditional data preparation practices to the data that AI systems require. Preparing AI-ready data requires its own set of rules, tools, and best practices, many of which were on display this week at Gartner’s Data & Analytics Summit.
What makes AI data preparation unique?
Because AI systems are complex and autonomous, they require data that depends on context and use case, unlike other technology applications.
“AI-ready data means the data is ready to support specific AI cases,” Chien explained. “A ‘specific’ AI case means that readiness can be highly situational. Data may be ready in one AI case but not in another. You can’t just label it as ‘data is AI-ready.’ You need to consider the larger context, including what exactly you are trying to implement.”
Roxane Edjlali, senior director of data management strategy at Gartner, said in an interview with TechTarget Editorial that companies and AI vendors alike often don’t carefully consider the data needs of their AI systems. In fact, Gartner’s recent AI-ready data study found that only 32% of organizations with AI initiatives have an AI-ready data process in place.
“This means that 68% are not doing it systematically, which is very worrying,” Edjlali added. “If we apply the same ratio [to AI agent use cases specifically], that would be even more of a concern.”
In the age of agentic AI, the need for context becomes even more pronounced. Because AI agents often operate autonomously or with limited human involvement, understanding context is essential for them to function properly.
Without AI-enabled data, agents can struggle to discern whether a prompt is being asked in the right context for the right use case, Edjlali said at the Gartner session, “AI-enabled Data: Lessons Learned Become Practices to Follow.”
“Context must be able to provide [an agent] enough information to validate whether it’s working as planned,” Edjlali said in an interview, “and if it’s not working as planned, we can’t expect to get the same level of precision and accuracy that the AI use case was designed for.”
How to make your data AI-enabled
Gartner has identified three key pillars for making data AI-ready: data quality, data integration, and data lineage and classification, Vice President Analyst Arun Chandrasekaran said in an interview with TechTarget Editorial.
Because AI systems are fed so much data, much of it unstructured, ensuring data quality means going beyond traditional methods, Chandrasekaran explained. Creativity is essential here, including filling data gaps using tools that support data labeling and synthetic data.
Data integration means getting data into the pipeline, he said. This can include many new techniques, such as using AI models to chunk and retrieve data and to append metadata as new data comes in. It may also include the use of tools such as the Model Context Protocol.
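As a rough illustration of the chunk-and-tag step described above, the following Python sketch splits incoming text into overlapping windows and attaches metadata to each piece. It is an assumption-laden example, not a method presented at the summit: the `Chunk` class, the `chunk_document` function, the metadata field names and the fixed-size character windows are all hypothetical, and a simple character splitter stands in for the AI models Chandrasekaran mentioned.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Chunk:
    """One piece of a document plus the metadata attached at ingestion."""
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_document(text, source, chunk_size=200, overlap=50):
    """Split text into overlapping character windows and tag each window.

    Hypothetical helper: fixed-size windows stand in for model-driven chunking.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")

    ingested_at = datetime.now(timezone.utc).isoformat()
    step = chunk_size - overlap
    chunks = []
    for index, start in enumerate(range(0, len(text), step)):
        chunks.append(
            Chunk(
                text=text[start:start + chunk_size],
                metadata={
                    "source": source,            # where the data came from
                    "chunk_index": index,        # position within the document
                    "char_start": start,         # offset into the original text
                    "ingested_at": ingested_at,  # when it entered the pipeline
                },
            )
        )
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Calling `chunk_document(report_text, source="quarterly_report.pdf")` would return a list of `Chunk` objects whose metadata can later support retrieval and lineage questions, such as which source a retrieved passage came from.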
Data lineage and data classification specify the origin, lifecycle, and characteristics of data. This includes citations, verified sources, and often a context layer. “The context layer includes everything from active metadata management, to the semantic layer, which is the business definition of the data, to the ontologies that understand and represent the relationships that exist between the data, and perhaps even memory, which will be particularly needed as AI agents become more prevalent,” Chandrasekaran said.
Due to the various processes required to ensure AI-ready data, companies can validate the readiness of their data through AI-readiness assessments, such as the 26-point checklist presented in Edjlali’s session.
Edjlali said in an interview that not all companies need to proceed with readiness assessments in the same way. Importantly, whenever embarking on a new AI project, experts and stakeholders must collaborate to define the data needed to execute the use case. From there, the team can move on to a proof of concept and complete the readiness assessments appropriate to that use case.
Consider the value of unstructured data
Because many current generative and agentic AI applications are multimodal, unstructured data makes up a large portion of the data needed to train and maintain AI systems.
Unstructured data lacks the predefined structure required for easy storage and analysis in traditional databases. It can take the form of images, audio, PDFs, social media posts, emails and more. Although unstructured data can be more difficult for organizations to store and analyze, it can generate business value when used in AI applications.
“Unstructured data can be viewed as a liability or as an asset,” Chien said in an interview.
In her session, “How to unlock the value of unstructured data for AI: Start by managing your data,” Chien pointed out that 70% to 90% of enterprise data is unstructured. As a result, Chien said in an interview, she has spent a lot of time educating customers about unstructured data and how to analyze it. But now, with increased literacy and access to tools, organizations can more frequently work with the unstructured data that is essential to many generative and agentic AI applications.
Understanding the importance of unstructured data in AI use cases is the first step, but leaders also need to govern unstructured data properly if they want to make that data AI-ready, Chien said. Governance involves multiple steps, including tagging and categorizing unstructured data, along with extensive metadata management.
“The type of processing and metadata required for unstructured data is very different,” Edjlali said in an interview. “If you don’t have enough labels to differentiate everything, it can become more difficult to get accuracy for your use case.”
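One simple way to check whether tagging is far enough along is to measure how many items carry each label a use case depends on. The Python sketch below is illustrative only: the `tag_coverage` function, the `REQUIRED_TAGS` names and the record layout are assumptions made for this example, not part of any Gartner checklist.

```python
# Hypothetical tags a use case might require on every unstructured item.
REQUIRED_TAGS = ("category", "owner", "sensitivity")


def tag_coverage(records, required=REQUIRED_TAGS):
    """Return, for each required tag, the share of records with a non-empty value.

    Each record is assumed to be a dict with an optional "tags" dict.
    """
    if not records:
        return {tag: 0.0 for tag in required}

    total = len(records)
    return {
        tag: sum(1 for record in records if record.get("tags", {}).get(tag)) / total
        for tag in required
    }
```

A low share for any tag suggests that the data is not yet labeled well enough to distinguish items for that use case, which is the situation Edjlali warned about.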
Prioritize metadata
Chien said in the session that the metadata used to describe unstructured data can be inconsistent. Often, there is no predefined standard or clear ownership of the data, which can pose significant obstacles to AI data preparation.
“Metadata is everything that helps you answer the question, ‘Is my data AI-ready?’” Edjlali said in an interview. “That includes questions about what the model is designed to do, who should use it, what kind of agent it is, what use cases are outside of its scope, etc. Metadata tells you where the data is, where the data comes from, and the statistical distribution of the data.”
Importantly, metadata, typically managed through AI model cards, also helps teams identify data drift. When data drifts, the AI system can no longer perform its use case properly, Edjlali added. This is the key difference between AI metadata and traditional metadata, which is considered static. “‘Has anything changed?’ That’s the central question you should try to answer,” she said in an interview. “Metadata answers that question.”
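To make the “Has anything changed?” question concrete, the Python sketch below stores a statistical summary of a numeric field as metadata and compares a new batch of values against it. It is a minimal example under stated assumptions: the `summarize` and `has_drifted` functions and the threshold of two baseline standard deviations are hypothetical choices for illustration, and production drift detection typically relies on more robust statistical tests.

```python
import statistics


def summarize(values):
    """Capture the statistical distribution of a numeric field as metadata."""
    if not values:
        raise ValueError("values must not be empty")
    return {
        "mean": statistics.fmean(values),
        "stdev": statistics.pstdev(values),
        "count": len(values),
    }


def has_drifted(baseline, new_values, threshold=2.0):
    """Flag drift when the new batch mean moves too far from the baseline mean.

    The shift is measured in baseline standard deviations; the default
    threshold of 2.0 is an arbitrary illustrative choice.
    """
    if not new_values:
        raise ValueError("new_values must not be empty")

    shift = abs(statistics.fmean(new_values) - baseline["mean"])
    if baseline["stdev"] == 0:
        # A constant baseline: any change in the mean counts as drift.
        return shift > 0
    return shift / baseline["stdev"] > threshold
```

A team might call `summarize` when a data set is first approved for a use case, keep the result alongside the model card, and run `has_drifted` on each new batch to decide whether readiness needs to be reassessed.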
Automate as needed
From metadata collection to operationalization, AI-ready data practices are too time-consuming to carry out entirely by hand. Organizations can use tools to automate some of these processes.
Edjlali said in an interview that there are three main categories of tools to consider. Enterprises should focus on metadata management, data observability, and data governance tools to improve data readiness. When choosing the right AI tool, it’s important to understand your project’s use case and data and metadata needs.
“Customers want simple solutions,” Edjlali said. “They’re looking at technology that solves a problem, so they’d rather have a single vendor solve it for them, but that might not deliver on their promise. This is where your knowledge and context come into play.”
Olivia Wisbey is a site editor in Informa TechTarget’s AI & Emerging Tech group. She has experience covering topics in AI, machine learning, and software quality.