Research: AI models cannot reproduce human judgments about rule violations | Massachusetts Institute of Technology News

To improve fairness or reduce backlogs, machine learning models may be designed to mimic human decisions, such as determining whether a social media post violates a policy against harmful content.

However, researchers at MIT and others have found that these models often do not replicate human judgments about rule violations. If a model isn’t trained on the right data, it can make different and often harsher decisions than humans.

In this case, “correct” data is data labeled by humans who have been explicitly asked if the item violates certain rules. In training, the machine learning model must be presented with millions of examples of this “normative data” so that it can learn the task.

However, the data used to train machine learning models are usually labeled descriptively. That is, humans are asked to identify factual features, for example, whether there is fried food present in a photograph. When using “descriptive data” to train a model to determine rule violations, such as whether a meal violates a school policy against fried foods, the model tends to over-predict rule violations.

This loss of accuracy can have serious implications in the real world. For example, if a descriptive model were used to decide whether an individual is likely to reoffend, the researchers’ findings suggest it could make stricter judgments than a human would, which could lead to higher bail amounts or longer prison sentences.

“I think most artificial intelligence/machine learning researchers assume that the human judgments in data and labels are biased, but this result is saying something worse. These models are not even reproducing already-biased human judgments, because the data they are being trained on has a flaw: Humans would label the features of images and text differently if they knew those features would be used for a judgment. This has huge ramifications for machine learning systems in human processes,” says Marzyeh Ghassemi, an assistant professor and head of the Healthy ML Group in the Computer Science and Artificial Intelligence Laboratory (CSAIL).

Ghassemi is the senior author of a new paper detailing these findings, which was published today in Science Advances. Joining her on the paper are lead author Aparna Balagopalan, a graduate student in electrical engineering and computer science; David Madras, a graduate student at the University of Toronto; David H. Yang, a former graduate student who is now co-founder of ML Estimation; Dylan Hadfield-Menell, an assistant professor at MIT; and Gillian K. Hadfield, Schwartz Reisman Chair in Technology and Society and professor of law at the University of Toronto.

Label mismatch

This research grew out of another project investigating how machine learning models can justify their predictions. While collecting data for that study, the researchers noticed that humans sometimes give different answers depending on whether they are asked to provide descriptive or normative labels for the same data.

To collect descriptive labels, researchers ask labelers to identify factual features: does this text contain obscene language? To collect normative labels, researchers give labelers a rule and ask whether the data violates that rule: does this text violate the platform’s explicit language policy?

Surprised by this finding, the researchers launched a user study to dig deeper. They gathered four datasets to mimic different policies, including a dataset of dog images that might violate an apartment’s rule against aggressive breeds. They then asked groups of participants to provide descriptive or normative labels.

In each case, descriptive labelers were asked to indicate whether three factual features were present in the image or text, such as whether the dog appeared aggressive. Their responses were then used to derive judgments. (If a labeler said a photo contained an aggressive dog, the policy was considered violated.) The descriptive labelers did not know the pet policy. Normative labelers, on the other hand, were given the policy banning aggressive dogs and asked whether each image violated it, and why.

The researchers found that humans were significantly more likely to label an object as a violation in the descriptive setting. The disparity, which they computed as the absolute difference in mean labels, ranged from 8 percent on a dataset of images used to judge dress code violations to 20 percent on the dog images.
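To make the comparison concrete, here is a minimal sketch of how a judgment can be derived from descriptive feature labels and how an absolute difference in mean labels can be computed against normative (rule-based) labels. The data are made up, the function names are hypothetical, and the "any flagged feature counts as a violation" rule is an assumption for illustration; the paper's exact procedure may differ.

```python
from statistics import mean

def descriptive_judgment(feature_flags):
    """Assumed rule: any flagged factual feature implies a violation."""
    return int(any(feature_flags))

def label_disparity(descriptive_labels, normative_labels):
    """Absolute difference between the mean violation labels of the two settings."""
    return abs(mean(descriptive_labels) - mean(normative_labels))

# Made-up example: five images, three yes/no features each (descriptive setting).
descriptive_features = [
    (1, 0, 0),
    (0, 0, 0),
    (0, 1, 0),
    (0, 0, 0),
    (1, 1, 0),
]
descriptive_labels = [descriptive_judgment(f) for f in descriptive_features]

# Made-up normative labels: labelers saw the policy and judged it directly.
normative_labels = [1, 0, 0, 0, 1]

disparity = label_disparity(descriptive_labels, normative_labels)
print(f"descriptive violation rate: {mean(descriptive_labels):.2f}")
print(f"normative violation rate:   {mean(normative_labels):.2f}")
print(f"disparity:                  {disparity:.2f}")
```

In this toy example the descriptive setting flags three of five images and the normative setting flags two, giving a disparity of roughly 0.2.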

“We haven’t explicitly tested why this happens, but one hypothesis is that people think differently about rule violations than they do about descriptive data. In general, normative decisions are more lenient,” says Balagopalan.

Yet data are typically gathered with descriptive labels to train a model for a specific machine learning task. These data are often reused later to train different models that make normative judgments, such as whether a rule has been violated.

Training trouble

To study the potential impact of reusing descriptive data, the researchers trained two models to judge rule violations using one of their four data settings. They trained one model on descriptive data and the other on normative data, then compared their performance.

They found that a model trained on descriptive data underperforms a model trained on normative data to make the same judgments. Specifically, the descriptive model is more likely to misclassify inputs by falsely predicting a rule violation. The descriptive model was even less accurate when classifying objects that human labelers disagreed about.
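As a rough illustration of this kind of comparison, the sketch below scores two sets of predictions against normative reference labels, using accuracy and the false positive rate (how often a non-violation is flagged as a violation). All labels and predictions are invented for the example, and the function names are hypothetical; this is not the evaluation code from the paper.

```python
def false_positive_rate(predictions, reference):
    """Share of reference non-violations that the model flags as violations."""
    flagged = [p for p, r in zip(predictions, reference) if r == 0]
    if not flagged:
        raise ValueError("reference contains no non-violations")
    return sum(flagged) / len(flagged)

def accuracy(predictions, reference):
    """Share of items where the prediction matches the reference label."""
    return sum(p == r for p, r in zip(predictions, reference)) / len(reference)

# Invented normative reference labels (1 = violation, 0 = no violation).
normative_reference = [0, 0, 0, 0, 1, 1, 0, 1]

# Invented predictions from a model trained on descriptive labels
# and from a model trained on normative labels.
descriptive_model_preds = [1, 0, 1, 0, 1, 1, 0, 1]
normative_model_preds = [0, 0, 1, 0, 1, 1, 0, 1]

for name, preds in [("descriptive", descriptive_model_preds),
                    ("normative", normative_model_preds)]:
    fpr = false_positive_rate(preds, normative_reference)
    acc = accuracy(preds, normative_reference)
    print(f"{name:12s} accuracy={acc:.3f} false_positive_rate={fpr:.3f}")
```

Here the invented descriptive model over-predicts violations, so its false positive rate is higher and its accuracy lower than the normative model's, which mirrors the pattern the researchers describe.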

“This shows that data really matters. When training a model to detect rule violations, it is important to match the training context to the deployment context,” says Balagopalan.

It can be very difficult for users to determine how data were gathered. This information may be buried in the appendix of a research paper or not disclosed by a private company, Ghassemi says.

Improving dataset transparency is one way to mitigate this problem. If researchers know how data were gathered, they know how those data should be used. Another possible strategy is to fine-tune a descriptively trained model on a small amount of normative data. Known as transfer learning, this idea is something the researchers hope to explore in future work.
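The fine-tuning idea can be sketched with a toy model. The example below is purely illustrative and is not the researchers' method, which they describe only as future work: it trains a one-feature logistic regression on made-up descriptive labels, then continues gradient descent on a handful of made-up normative labels. The feature (an "aggressiveness score" between 0 and 1), the thresholds, and all function names are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean_log_loss(w, b, xs, ys):
    """Average logistic loss of the model p = sigmoid(w*x + b) on (xs, ys)."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(xs)

def gradient_descent(w, b, xs, ys, steps, lr=0.5):
    """Full-batch gradient descent on the logistic loss, starting from (w, b)."""
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

# Made-up descriptive data: many items, flagged whenever the score is >= 0.4.
xs_desc = [i / 20 for i in range(21)]
ys_desc = [1 if x >= 0.4 else 0 for x in xs_desc]

# Made-up normative data: few items, judged more leniently (violation only if >= 0.7).
xs_norm = [0.1, 0.3, 0.5, 0.6, 0.8, 0.9]
ys_norm = [0, 0, 0, 0, 1, 1]

# Step 1: train on the large descriptive set.
w_desc, b_desc = gradient_descent(0.0, 0.0, xs_desc, ys_desc, steps=2000)

# Step 2: fine-tune the same parameters on the small normative set.
w_tuned, b_tuned = gradient_descent(w_desc, b_desc, xs_norm, ys_norm, steps=200)

loss_before = mean_log_loss(w_desc, b_desc, xs_norm, ys_norm)
loss_after = mean_log_loss(w_tuned, b_tuned, xs_norm, ys_norm)
print(f"loss on normative data before fine-tuning: {loss_before:.3f}")
print(f"loss on normative data after fine-tuning:  {loss_after:.3f}")
```

In this toy setup, fine-tuning lowers the loss on the normative examples. Whether a small amount of normative data is enough to correct a real descriptively trained model is the open question the researchers want to study.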

They also want to conduct a similar study with expert labelers, such as doctors or lawyers, to see whether it leads to the same label disparity.

“The way to fix this is to transparently acknowledge that if we want to reproduce human judgment, we must only use data that were collected in that setting. Otherwise, we are going to end up with systems that have extremely harsh moderation, much harsher than what humans would do. Humans would see nuance or make another distinction, whereas these models don’t,” Ghassemi says.

This research was funded in part by the Schwartz Reisman Institute for Technology and Society, Microsoft Research, the Vector Institute, and a Canada Research Council Chain.
