The cow is out of the barn – steps to reduce bias in generative AI



“See what you want” is not just a one-size-fits-all warning; it describes the current situation, in which organizations of all kinds are grappling with the ripple effects of generative artificial intelligence (AI) applications such as ChatGPT, Bing, Bard, and Claude. In the race for market share, companies such as Microsoft and Google have made AI ethics and controls a distant second priority (paywall), exposing users and society as a whole to a new world of risk. Examples include:

  • Samsung employees accidentally exposed confidential meeting notes and proprietary source code by entering them into ChatGPT, which leaked them externally.
  • ChatGPT placed a George Washington University law professor on a fabricated list of legal scholars who had sexually harassed someone, falsely accusing him.
  • By the very nature of their design, ChatGPT and other large language model (LLM) AIs are free to hallucinate, producing text that is semantically or syntactically plausible but factually inaccurate or nonsensical.

These phenomena can occur individually or in combination, giving rise to bias and disinformation. More than 1,000 tech industry leaders have called for a six-month moratorium on further generative AI development, but that is too little, too late. The proverbial cow is out of the barn. Businesses must act quickly and forcefully to curb the harms that ChatGPT and other LLMs are perpetrating at alarming speed.

LLMs are “Black Box” AI Systems

But first, let’s understand how generative AI like ChatGPT inherently creates bias and disinformation. Most LLM AIs operate on highly opaque assumptions, associating facts probabilistically. Those probabilities are based on what the AI learned from the data it was trained on and how it learned to correlate data elements. Yet none of the following details are revealed when using ChatGPT:

  • There is no explanation of how the AI learned, and no interpretability of the model.
  • We have no access to the specific data used, or to the probabilities derived from it, on which to decide whether to trust the generative AI.
  • We are given no avenue to reason about or challenge the results.

Because the hidden probabilities and relationships inside an LLM are neither surfaced nor shared, generative AI is simply another form of “black box” AI, cloaked in clever and engaging conversation. It is impossible to know whether we should trust its output enough to make decisions on it, and it would be a mistake to treat its answers as absolute truth.
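
To make the point concrete, here is a minimal, purely illustrative sketch (not ChatGPT’s actual implementation; the vocabulary and scores are invented) of how a language model turns hidden scores into a probability distribution and samples an answer, while the user sees only the final text:

```python
import numpy as np

def sample_next_token(logits: np.ndarray, vocab: list[str], temperature: float = 0.8) -> str:
    """Convert raw model scores (logits) into probabilities and sample one token.

    The distribution computed here drives every answer the model gives,
    but a chat interface returns only the sampled text, never these probabilities.
    """
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return str(np.random.choice(vocab, p=probs))

# Hypothetical scores for candidate next words after "The capital of Australia is"
vocab = ["Canberra", "Sydney", "Melbourne"]
logits = np.array([2.1, 1.8, 0.4])          # made-up numbers for illustration
print(sample_next_token(logits, vocab))     # occasionally "Sydney": plausible, but wrong
```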

Correlation is not causation

When we question generative AI, we implicitly want a causal explanation for the outcome. But machine learning models and generative AI find correlations and probabilities, not causation. This is why we humans must insist on model interpretability, that is, an explanation of why the model gave a particular answer, rather than taking its results at face value and risking spurious correlations.
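
A toy example makes the danger concrete. In the sketch below (synthetic data, invented purely for illustration), two variables with no causal link correlate strongly because both are driven by a hidden confounder:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hidden confounder: daily temperature
temperature = rng.normal(25, 5, size=1000)

# Two effects of temperature that have no causal link to each other
ice_cream_sales = 10 * temperature + rng.normal(0, 20, size=1000)
swimming_accidents = 0.5 * temperature + rng.normal(0, 2, size=1000)

corr = np.corrcoef(ice_cream_sales, swimming_accidents)[0, 1]
print(f"correlation: {corr:.2f}")  # strongly positive, yet neither causes the other
```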

Until generative AI can meet this demand and stand up to scrutiny of its ethical, responsible, and regulatory underpinnings, it should not be trusted to provide answers that can significantly impact human well-being and economic outcomes.

ChatGPT was not trained on reliable data

When data scientists build machine learning models, we study and understand the data used to train them, knowing that data is inherently riddled with bias and representation issues. The accuracy of an LLM-based generative AI system is determined by the corpus of data it is trained on and the origin of that corpus.

ChatGPT mines the internet for its training data: an enormous corpus, much of it of unknown or questionable origin. In addition, this data may be uncurated, unchecked for bias, or used without consent. This reality inherently fosters bias and makes it impossible to assess the accuracy of the LLM’s responses to questions whose answers are not already known.

Worse, these inaccuracies can be further amplified by the AI itself, or by adversarial data attacks that recreate or inject data to force misrepresentation. All of these issues lead to inaccuracies, harms, ethical concerns and, ultimately, “response gaps”: the semantic equivalent of 1+2=3 today, but 1+2=7 tomorrow. Collectively, they constitute unknown potential liabilities and risks.
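
What a response gap looks like in practice can be sketched in a few lines (the numbers below are invented for illustration): a model whose learned distribution has been skewed by bad or poisoned data will occasionally emit a wrong answer, even to a question it usually gets right:

```python
import random

# Hypothetical answer probabilities after the model's training data has been corrupted
answer_distribution = {"3": 0.90, "7": 0.06, "12": 0.04}

def ask_model() -> str:
    """Sample an answer for "1 + 2 = ?" from the model's skewed distribution."""
    answers, weights = zip(*answer_distribution.items())
    return random.choices(answers, weights=weights, k=1)[0]

# The same question, asked repeatedly, does not always yield the same answer.
print([ask_model() for _ in range(10)])  # mostly "3", occasionally "7" or "12"
```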

How to Reduce Bias in Generative AI Systems

In my data science organization, we have used generative AI for more than a decade to simulate specific kinds of effects, creating types of data not seen today. Fraud detection, for example, uses collaborative profiling technology, applied to relevant statistics, probabilities, and carefully filtered data, to learn what normal customer behavior looks like. We then apply a data generation specification to produce the simulation data. A data scientist might say, “Here are the 5, 10, or 20 behaviors I want to see occur in a robustness study; for instance, several very large transactions occurring within 1–2 minutes.” This is a rule-based generation specification in which every step is auditable and subject to scrutiny; a sketch of such a specification follows below.
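
As a rough illustration of the idea (this is not FICO’s actual tooling; the rule names, fields, and thresholds are invented), a rule-based generation specification might look like the following, with each rule explicit and therefore auditable:

```python
import random
from dataclasses import dataclass

@dataclass
class GenerationRule:
    """One auditable rule in the generation specification."""
    name: str
    count: int                       # how many transactions this rule emits
    amount_range: tuple[float, float]  # (min, max) dollar amount
    max_gap_seconds: int             # maximum time between consecutive transactions

# Hypothetical robustness-study behavior: very large transactions in quick succession
SPEC = [
    GenerationRule("rapid_large_transactions", count=5,
                   amount_range=(9_000, 50_000), max_gap_seconds=120),
]

def generate(spec: list[GenerationRule]) -> list[dict]:
    records, t = [], 0
    for rule in spec:
        for _ in range(rule.count):
            t += random.randint(1, rule.max_gap_seconds)
            records.append({
                "timestamp_s": t,
                "amount": round(random.uniform(*rule.amount_range), 2),
                "source_rule": rule.name,  # provenance: which rule produced this row
                "is_synthetic": True,      # always labeled as synthetic
            })
    return records

for row in generate(SPEC):
    print(row)
```

Because every record carries the name of the rule that produced it, any simulated behavior can be traced back to an explicit, human-written specification.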

The generated data is always labeled as synthetic, so it can never be confused with real data. That label makes clear where the data is and is not allowed to be used in models and processes. We treat it as walled-off data, for testing and simulation purposes only; synthetic data produced by generative AI never informs future models. We contain this generated asset and do not allow it to “run wild,” as sketched below.
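
Continuing the hypothetical sketch above, the wall can be enforced mechanically: any pipeline that feeds model training filters on the synthetic label first.

```python
def training_view(records: list[dict]) -> list[dict]:
    """Return only real records; synthetic data never enters model training."""
    return [r for r in records if not r.get("is_synthetic", False)]

def simulation_view(records: list[dict]) -> list[dict]:
    """Synthetic records are available only for testing and simulation."""
    return [r for r in records if r.get("is_synthetic", False)]
```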

Responsible AI practices are essential

The AI industry has worked very hard to build trust in AI through responsible AI efforts, including AI governance and model development standards. Responsible AI encompasses the tenets of Robust AI, Explainable AI, Ethical AI, and Auditable AI, all of which highlight the fact that AI models are just tools, not gospel. As the statistician George Box said, “All models are wrong, but some are useful.” We may have long wished for a generative AI that appears magically intelligent, but assuming that the answers ChatGPT provides are anything more than “potentially useful” is not merely foolish; it is downright dangerous.

About the author

Scott Zoldi is FICO’s Chief Analytics Officer, responsible for the analytic development of FICO’s product and technology solutions. During his tenure at FICO, Scott has authored more than 110 analytics patents, with 71 granted and 46 pending. Scott is actively involved in the development of new analytic products and big data analytics applications, many leveraging new streaming analytic innovations such as adaptive analytics, collaborative profiling, and self-tuning analytics. Most recently, Scott has focused on applying streaming self-learning analytics to detect cyber security attacks in real time. Scott serves on two boards of directors: Software San Diego and the Cyber Center of Excellence. Scott received his Ph.D. in theoretical and computational physics from Duke University.



