Experiment shows the people paid to train your bot are using bots themselves • The Register

Machine Learning


Workers hired through crowdsourcing services like Amazon Mechanical Turk are using large language models to complete their tasks, which could have a negative knock-on effect on future AI models.

Data is vital to AI. Developers need clean, high-quality datasets to build accurate and reliable machine learning systems, but compiling that data can be tedious. Companies often turn to third-party platforms such as Amazon Mechanical Turk, which farm out repetitive chores like labeling objects, describing scenes, transcribing passages, and annotating text to low-paid workers.

That output can be cleaned up and fed into a model, which is then trained to reproduce the workers' labor at a larger, more automated scale.

AI models are thus built on human labor: people toil away producing mountains of training examples for AI systems that corporations can use to make billions of dollars.

But an experiment conducted by researchers at Switzerland's École Polytechnique Fédérale de Lausanne (EPFL) concluded that crowd workers are themselves using AI systems, such as OpenAI's chatbot ChatGPT, to do these online chores.

Training a model on its own output is not recommended. It could mean an AI model ends up trained not on human-generated data but on text produced by other AI models, possibly even the same model. That can lead to disastrous output quality, further bias, and other undesirable effects.

The experiment

The researchers hired 44 Mechanical Turk serfs to summarize the abstracts of 16 medical research papers, and estimated that between 33 and 46 percent of the text passages submitted by the workers had been generated by large language models. Crowd workers are often poorly paid, and by using AI to automatically generate responses they can work faster, take on more jobs, and earn more.

The Swiss team trained a classifier to predict whether submissions from the Turkers were generated by humans or by AI. The academics also recorded the workers' keystrokes to detect whether the serfs copied and pasted text onto the platform or typed it in themselves. It's always possible someone could use a chatbot and manually retype its output, though that seems unlikely.
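To make the idea concrete, here is a minimal sketch of what a scenario-specific "human versus synthetic" detector could look like. It assumes a scikit-learn setup; the example summaries, labels, and model choice are placeholders for illustration, not the EPFL team's actual pipeline.

```python
# Minimal sketch of a scenario-specific synthetic-text detector.
# NOT the EPFL team's code: the training texts, labels, features, and
# model below are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder training data: summaries known to be written by humans (0)
# or generated by a large language model (1) for the same kind of task.
train_texts = [
    "Study of 120 patients, small drop in pain scores, results uncertain.",
    "This study highlights the importance of further research into the topic.",
]
train_labels = [0, 1]

# TF-IDF n-grams plus logistic regression: a deliberately simple detector
# tuned to one scenario (medical-abstract summaries) rather than all text.
detector = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(),
)
detector.fit(train_texts, train_labels)

# Probability that a newly submitted summary was machine-generated.
new_submission = ["The trial found modest improvements in patient outcomes."]
p_synthetic = detector.predict_proba(new_submission)[0, 1]
print(f"P(synthetic) = {p_synthetic:.2f}")
```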

“We developed a very specific methodology that works very well for detecting synthetic text in our scenario,” Manoel Ribeiro, co-author of the study and a PhD student at EPFL, told The Register this week.

“While traditional methods try to detect synthetic text in any context, our approach focuses on detecting synthetic text in a specific scenario.”

The classifier alone isn't perfect at identifying whether someone used an AI system or produced their own work, so the academics combined its output with the keystroke data to be more certain whether a worker had copied and pasted from a bot or written their own material.
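As a toy illustration of that combination step, the sketch below merges a classifier score with a pasted-versus-typed flag from a keystroke log. The threshold and the rule itself are assumptions made for the example, not the researchers' actual decision procedure.

```python
# Hypothetical rule combining the two signals described in the article:
# the classifier's probability that a summary is synthetic, and whether
# the keystroke log shows the text was pasted rather than typed.
def likely_llm_generated(p_synthetic: float, was_pasted: bool,
                         threshold: float = 0.5) -> bool:
    """Flag a submission only when both signals point the same way."""
    return was_pasted and p_synthetic >= threshold

# A pasted summary the classifier scores at 0.83 gets flagged...
print(likely_llm_generated(0.83, was_pasted=True))   # True
# ...while the same score on hand-typed text does not.
print(likely_llm_generated(0.83, was_pasted=False))  # False
```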

Human data is the gold standard because we care about humans

“We were able to validate the results using the keystroke data we also collected from MTurk,” said Ribeiro. “For example, we found that all text that wasn’t copied and pasted was classified as ‘genuine,’ suggesting very few false positives.”

The code and data used to run the tests can be found on GitHub.

There's another reason the experiment is likely not a completely fair representation of how many workers actually use AI to automate crowdsourced tasks. The authors point out that text summarization is better suited to large language models than other types of jobs, which means the proportion of workers seen using tools such as ChatGPT may be skewed upward in the results.

The dataset, at 46 responses from 44 workers, is also small. And the workers were paid $1 per summary, a rate that may itself have encouraged them to use AI.

The researchers argue that large language models will only get worse if they are increasingly trained on AI-generated content harvested from crowdsourcing platforms. Companies like OpenAI keep exactly how they train their latest models a closely guarded secret, and may not rely heavily on the likes of Mechanical Turk. That said, plenty of other models may depend on human workers, and if those workers use bots to generate training data, that's a problem.

For example, Mechanical Turk is marketed as a provider of “data labeling solutions that enhance machine learning models.”

“Human data is the gold standard because we are interested in humans, not large language models,” Ribeiro said. “I wouldn’t take a drug that had only been tested on a Drosophila biological model,” he offered as an example.

The researchers argued that responses generated by today's AI models are typically bland or trivial, and fail to capture the complexity and diversity of human creativity.

“Sometimes what we want to study with crowdsourced data is exactly how imperfect humans are,” said Robert West, co-author of the paper and an assistant professor in EPFL's School of Computer and Communication Sciences.

As AI continues to evolve, crowdsourcing work may change with it. Ribeiro speculated that large language models could replace some workers at certain tasks. “But paradoxically, human data may be more valuable than ever, and these platforms might implement ways to prevent the use of large language models so that they remain a source of human data,” he said.

Who knows, perhaps humans might end up working with large language models to generate responses, he added. ®


