ChatGPT Disrupted: Evolving Vision of AI Renews Need for Trusted and Governed Data

Access to and adoption of artificial intelligence (AI) by organizations is more prevalent than ever, but many companies struggle with how to manage their data and overall processes. As companies open the Pandora’s Box of this new capability, they must be prepared to manage data inputs and outputs in a secure manner or risk private data being used in public AI models.

Through this evolution, it is important for companies to think of ChatGPT as a public model built to expand and grow its use through advanced learning models. Private instances will soon be available, in which answers to prompts are generated only from selected internal data. Therefore, companies need to determine where public use cases are appropriate (such as for non-confidential information) and where they need private instances (e.g., for internal and/or confidential corporate financial information and other data sets).
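One way to operationalize that split is a routing rule keyed on data classification. The sketch below is illustrative only: the `Classification` levels, the destination names `public_model` and `private_instance`, and the sensitivity ceiling assigned to each destination are hypothetical assumptions, not part of any vendor’s API.

```python
from enum import IntEnum


class Classification(IntEnum):
    """Hypothetical sensitivity levels, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3  # assumed never to be sent to any AI model in this sketch


# Hypothetical ceiling: the most sensitive level each destination may receive.
MAX_ALLOWED = {
    "public_model": Classification.PUBLIC,
    "private_instance": Classification.CONFIDENTIAL,
}


def choose_destination(datasets: dict[str, Classification]) -> str:
    """Pick the least restrictive destination cleared for every dataset in a prompt."""
    most_sensitive = max(datasets.values(), default=Classification.PUBLIC)
    # Try the public model first, then fall back to the private instance.
    for destination in ("public_model", "private_instance"):
        if most_sensitive <= MAX_ALLOWED[destination]:
            return destination
    raise ValueError(f"{most_sensitive.name} data has no approved AI destination")
```

A prompt that draws only on press releases would go to the public model, while one that also references a confidential ledger would be routed to the private instance; restricted data raises an error instead of being sent anywhere.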

It’s all in. . . But what about your data?

The popularity of recently launched AI platforms, such as OpenAI’s ChatGPT and Google Bard, is fueling a proliferation of AI use cases. Organizations envision a future in which AI platforms make enterprise-specific data available in a closed environment, rather than using the global ecosystem that is common today. AI relies on large datasets as input to help create output, but it is limited by the quality of the data the model consumes. This limitation surfaced during the initial test release of Google Bard, which gave an inaccurate answer to a question about the James Webb Space Telescope based on the reference data it had captured. Individuals often want to go straight to the end goal (implementing automated data practices) without first taking the necessary steps of discovering, ingesting, transforming, sanitizing, labeling, annotating, and combining key data sets. Without these critical steps, AI can generate inconsistent or inaccurate data, and organizations may make risky bets by leveraging unvetted insights.
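Several of the preparation steps named above (transforming, sanitizing, labeling, and combining) can be sketched as a small pipeline. This is a minimal illustration, assuming records arrive as Python dictionaries with hypothetical `department` and `earnings` fields; discovery, ingestion, and annotation tooling are out of scope and would look very different in practice.

```python
from dataclasses import dataclass, field

# Hypothetical fields every record must carry before it is considered usable.
REQUIRED_FIELDS = {"department", "earnings"}


@dataclass
class Record:
    source: str                                # system the record was ingested from
    values: dict                               # raw field values
    labels: set = field(default_factory=set)   # labels applied during preparation


def transform(rec: Record) -> Record:
    """Normalize department names so 'Sales ' and 'sales' become the same key."""
    if "department" in rec.values:
        rec.values["department"] = str(rec.values["department"]).strip().lower()
    return rec


def sanitize(records: list[Record]) -> tuple[list[Record], list[Record]]:
    """Split records into clean ones and rejects (missing fields or non-numeric earnings)."""
    clean, rejected = [], []
    for rec in records:
        ok = REQUIRED_FIELDS.issubset(rec.values) and isinstance(
            rec.values.get("earnings"), (int, float)
        )
        (clean if ok else rejected).append(rec)
    return clean, rejected


def prepare(raw: list[Record]) -> tuple[dict[str, float], list[Record]]:
    """Transform, sanitize, label, and combine records into earnings per department."""
    clean, rejected = sanitize([transform(r) for r in raw])
    combined: dict[str, float] = {}
    for rec in clean:
        rec.labels.add(f"source:{rec.source}")  # labeling: record provenance
        dept = rec.values["department"]
        combined[dept] = combined.get(dept, 0.0) + rec.values["earnings"]
    return combined, rejected
```

Rejected records are returned rather than silently dropped, so a data steward can review them before anything reaches a model.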

Through data governance practices such as accurately labeled metadata and authoritative parameters for ownership, definition, calculation, and use, organizations can organize and maintain data in ways that can be used for AI initiatives. Understanding this challenge, many organizations are now focused on how to properly collect the most useful data in a way that is easily obtained, interpreted and used to support their business operations.

Storage and retrieval of managed data

Influential technologies such as natural language processing (NLP) enable you to get responses to conversationally phrased questions and standard business requests. NLP parses a request into meaningful components and ensures that the appropriate context is applied within the response. As the technology evolves, this capability will allow enterprise-specific vocabularies to be considered and processed by the AI platform. One application might be defining company-specific attributes for certain words so that their meaning matches the nomenclature agreed upon by the organization (e.g., how your organization defines “customer” versus the broader, general definition of “customer”), with that meaning enforced in the AI’s responses. For example, an individual might ask the platform to “produce a report highlighting the most recent earnings by department for the last two years,” and all relevant metadata would be applied.
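A simple way to picture an enterprise vocabulary being applied is to look up organization-specific definitions for the terms in a request and prepend them as context. This is a hedged sketch: the `GLOSSARY` entries and their definitions are hypothetical, and `build_prompt` is an illustrative helper, not a feature of any particular AI platform.

```python
import re

# Hypothetical enterprise glossary: organization-specific meanings that
# override the general-purpose sense of a word.
GLOSSARY = {
    "customer": "An entity with at least one paid, active contract in the last 12 months.",
    "earnings": "Net income after allocated overhead, reported per department.",
}


def glossary_context(request: str, glossary: dict[str, str] = GLOSSARY) -> list[str]:
    """Return the definition of every glossary term that appears as a word in the request."""
    words = set(re.findall(r"[a-z]+", request.lower()))
    return [f"{term}: {definition}" for term, definition in glossary.items() if term in words]


def build_prompt(request: str) -> str:
    """Prepend enterprise definitions so the model answers in the organization's vocabulary."""
    context = glossary_context(request)
    header = "Use these organization-specific definitions:\n" + "\n".join(context) if context else ""
    return f"{header}\n\n{request}".strip()
```

Matching here is exact and word-level, so “customers” would not match “customer”; a production approach would need stemming or the platform’s own term handling.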

Historically, making this request required an individual to translate the question into a query that could be run against a standard database. AI and NLP technologies can now process both the request and its underlying results, interpreting data and applying it to business needs. The main challenge, however, is that many organizations do not have their data in a form or format that can be stored, retrieved, and consumed by AI. This is usually because individuals take non-standard approaches to obtaining data and make assumptions about how to use a dataset.
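To make the manual translation concrete, here is roughly what a hand-written query for the earnings request might look like. The `earnings` table, its columns, and the sample rows are hypothetical. Note the assumptions the query silently encodes: that “last two years” means the two most recent fiscal years in the table, and that “earnings” is the `amount` column. Those are exactly the assumptions governed metadata should pin down.

```python
import sqlite3

# Hypothetical in-memory table standing in for a governed earnings dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE earnings (department TEXT, fiscal_year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO earnings VALUES (?, ?, ?)",
    [("Sales", 2021, 120.0), ("Sales", 2022, 150.0), ("Sales", 2023, 175.0),
     ("Ops", 2022, 80.0), ("Ops", 2023, 95.0)],
)


def earnings_by_department(conn: sqlite3.Connection, years: int = 2) -> list[tuple]:
    """Hand translation of 'most recent earnings by department for the last two years'."""
    query = """
        SELECT department, fiscal_year, SUM(amount)
        FROM earnings
        WHERE fiscal_year > (SELECT MAX(fiscal_year) FROM earnings) - ?
        GROUP BY department, fiscal_year
        ORDER BY department, fiscal_year
    """
    return conn.execute(query, (years,)).fetchall()
```

With the sample rows above, the 2021 Sales figure is excluded because only fiscal years 2022 and 2023 fall within the window.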

Setting and defining key terms

A key step in achieving high-quality output is organizing your data in a way that can be properly interpreted by an AI model. The first step in this process is to ensure that the proper technical and business metadata are in place. The following aspects of data must be recorded and made available:

Definition of terms
Calculation basis (if applicable)
Lineage of the underlying data source (upstream/downstream)
Quality parameters
Use/affinity references within the business
Ownership
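The aspects listed above map naturally onto a data dictionary record. The schema below is a hypothetical sketch whose field names mirror that list; treating only the calculation basis and downstream lineage as optional is an assumption made for illustration.

```python
from dataclasses import dataclass, fields

# Assumed optional: a calculation basis applies only to derived fields, and
# newly published data may not have downstream consumers yet.
OPTIONAL_FIELDS = {"calculation_basis", "downstream_consumers"}


@dataclass
class MetadataEntry:
    """One data dictionary entry (hypothetical schema)."""
    term: str
    definition: str = ""
    calculation_basis: str = ""
    upstream_sources: tuple[str, ...] = ()
    downstream_consumers: tuple[str, ...] = ()
    quality_rules: tuple[str, ...] = ()
    business_usage: tuple[str, ...] = ()
    owner: str = ""

    def missing_fields(self) -> list[str]:
        """Names of required attributes that are still empty."""
        return [f.name for f in fields(self)
                if f.name not in OPTIONAL_FIELDS and not getattr(self, f.name)]
```

An entry with a definition, upstream source, quality rule, and business usage but no named owner would report `["owner"]`, flagging it as incomplete before it is relied on.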

The above criteria should be used as a starting point for how captured fields and tables can be enhanced to enable proper business use and application. Accurate metadata is important so that private AI models can be trained to prioritize the datasets that contain relevant and reliable information.

A metadata dictionary, backed by sound processes for updating and validating data, helps drive consistent data usage and maintains a clean, usable dataset for transformation efforts.
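Validation practices for the dictionary itself can be partly automated. The sketch below flags two common problems: entries overdue for review and terms defined more than once with different wording. The entry shape (`term`, `definition`, `last_reviewed`) and the 365-day review window are hypothetical assumptions.

```python
from datetime import date, timedelta


def validate_dictionary(entries: list[dict], today: date, max_age_days: int = 365) -> list[str]:
    """Flag entries overdue for review and terms with conflicting definitions."""
    issues = []
    seen: dict[str, str] = {}
    for entry in entries:
        term = entry["term"].strip().lower()
        # Review cadence check (the window is an assumed policy parameter).
        if today - entry["last_reviewed"] > timedelta(days=max_age_days):
            issues.append(f"{term}: review overdue")
        # The first definition seen for a term becomes the reference version.
        previous = seen.setdefault(term, entry["definition"])
        if previous != entry["definition"]:
            issues.append(f"{term}: conflicting definitions")
    return issues
```

Comparing terms case-insensitively is a deliberate choice: “Customer” and “customer” defined differently in two systems is exactly the inconsistency a governed dictionary should surface.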

Understand use cases and applications

Once the appropriate information about the underlying dataset foundation has been recorded, it is important to understand how the data will ultimately be used and applied to business needs. Key considerations for data use cases include: documenting the sensitivity of recorded information (data classification); organizing data sets into categories associated with a logical data domain structure (data labeling); enforcing boundaries on how data is shared and how long it is stored (data retention); and defining a protocol for destroying stored data that is no longer needed, or whose deletion has been requested and is legally mandated (data deletion).
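Retention and deletion rules can be expressed as a small decision function over classified, labeled records. This is a minimal sketch under stated assumptions: the retention periods in `RETENTION_DAYS` are placeholders rather than recommendations, and deletion requests are assumed to have already been confirmed as legally mandated before they reach this function.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention periods (in days) keyed by data classification.
RETENTION_DAYS = {"public": 365, "internal": 3 * 365, "confidential": 7 * 365}


@dataclass(frozen=True)
class StoredRecord:
    record_id: str
    classification: str   # result of data classification
    domain: str           # result of data labeling, e.g. "finance"
    created: date


def disposition(rec: StoredRecord, today: date, deletion_requests: set[str]) -> str:
    """Return 'delete' or 'keep' for one record under this sketched policy."""
    if rec.record_id in deletion_requests:
        # Assumes requests in this set were already confirmed as legally mandated.
        return "delete"
    if today - rec.created > timedelta(days=RETENTION_DAYS[rec.classification]):
        return "delete"   # retention period has expired
    return "keep"
```

Because the retention period is looked up from the record’s classification, an unclassified or misclassified record either fails loudly (a `KeyError`) or is retained for the wrong period, which is one practical reason classification comes first.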

Understanding the correct use and application of the underlying datasets makes it possible to make good decisions about other uses of the data, as well as areas the organization wishes to stay out of based on strategic direction or legal and/or regulatory guidance. Additionally, storing and maintaining business and technical metadata allows the AI platform to tailor the content and responses it generates, so your organization can receive both customized question processing and relevant response analysis. This will ultimately enable enterprise-specific language processing capabilities.
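Recorded usage boundaries can then decide which datasets a given purpose, such as training an internal model, may draw on. The policy structure and dataset names below are hypothetical; the sketch simply shows a default-deny check against each dataset’s allowed and prohibited uses.

```python
# Hypothetical usage policy recorded alongside each dataset's metadata.
USAGE_POLICY = {
    "customer_contacts": {"allowed": {"support", "billing"}, "prohibited": {"model_training"}},
    "product_catalog": {"allowed": {"support", "marketing", "model_training"}, "prohibited": set()},
}


def datasets_for(purpose: str, policy: dict = USAGE_POLICY) -> list[str]:
    """Datasets whose recorded policy explicitly allows the purpose and does not prohibit it."""
    return sorted(
        name for name, rules in policy.items()
        if purpose in rules["allowed"] and purpose not in rules["prohibited"]
    )
```

The check is default-deny: a purpose that no dataset explicitly allows returns an empty list, so areas the organization has not approved stay off-limits until someone records a decision.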

Prepare Now for What Happens Next

It has become more important than ever to put the right parameters in place for how and where data is stored, ensuring that human users retrieve the right datasets while enabling the growth and realization of future AI use cases. AI model training relies on clean data, which governance of the underlying datasets helps provide. This places even greater demands on proper data governance to ensure valuable datasets are leveraged.

This change has greatly accelerated the need for data governance. What some may have considered a “nice-to-have” or an afterthought is now a “must-have” capability that allows organizations to stay competitive and be seen as truly transformative in how they use their most valuable asset, their data, both within their own operations and with customers in highly data-rich environments. AI reinforces the old adage “garbage in, garbage out”: imperfections in the data flowing into a model can become part of its output, which further emphasizes the importance of strong data governance controls.



Will Schumann
Director of Technology Consulting Division
