In the last few months, we have been flooded with headlines about new AI tools and how they will change society.
Some reporters have done a great job holding the companies that develop AI accountable, but many are struggling to report on this new technology fairly and accurately.
We (investigative reporters, data journalists, and computer scientists) have first-hand experience investigating AI. We have seen not only the great potential of these tools, but also their great risks.
As the adoption of AI tools continues, we believe many more reporters will encounter them on their beats in the near future. So we thought we’d put together a short guide summarizing what we’ve learned.
Let’s start by briefly explaining what these tools are.
Until now, computers have essentially been rule-based systems: if a certain condition A is met, operation B is executed. But machine learning (a subset of AI) is different. Instead of following a fixed set of rules, the computer learns to recognize patterns in data.
For example, given enough labeled photos of cats and dogs (hundreds of thousands, or even millions), you can teach a computer system to distinguish between the two types of images.
This process, known as supervised learning, can be done in a number of ways. One of the most common techniques used these days is the neural network. Although the details differ, all supervised learning tools are essentially computers learning patterns from labeled data.
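To make that concrete, here is a minimal sketch in Python. The “features” and numbers are invented for illustration (a real image classifier learns from pixels, not two hand-picked measurements); the point is that the decision rule is derived from labeled examples rather than written by hand.

```python
# A minimal sketch of supervised learning: a nearest-centroid classifier.
# The feature values below are invented for illustration; a real image
# classifier would learn from pixels, not two hand-picked numbers.

def centroid(points):
    """Average the (x, y) feature vectors in a list."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

# Labeled training data: (ear_pointiness, snout_length) -> label
cats = [(0.90, 0.20), (0.80, 0.30), (0.95, 0.25)]
dogs = [(0.40, 0.80), (0.30, 0.90), (0.50, 0.70)]

cat_center, dog_center = centroid(cats), centroid(dogs)

def classify(point):
    """Assign the label of the nearest class centroid."""
    def sqdist(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return "cat" if sqdist(point, cat_center) < sqdist(point, dog_center) else "dog"

# The decision rule was never written by hand; it fell out of the labels.
print(classify((0.85, 0.30)))  # -> cat
```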
Similarly, one of the techniques used to build modern models like ChatGPT is called self-supervised learning, where labels are generated automatically.
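As a rough sketch of where those automatic labels come from (an illustration of the principle, not any particular company’s pipeline), every position in a piece of raw text can be turned into a context-plus-next-word training pair:

```python
# A sketch of self-supervised label generation: every position in raw
# text yields a (context, next-word) pair, with no human labeling.
# Real language models work on tokens and vastly more data, but the
# principle of deriving labels from the data itself is the same.

text = "the cat sat on the mat"
words = text.split()

training_pairs = [
    (words[:i], words[i])  # context seen so far -> the word that follows
    for i in range(1, len(words))
]

for context, label in training_pairs:
    print(f"context: {' '.join(context):<18} -> label: {label}")
```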
Be skeptical of the PR hype
People in the tech industry often claim that only they can understand and explain AI models and their impact. But reporters should be skeptical of such claims, especially when they come from company insiders and publicists.
“Reporters tend to take what the authors and modelers say at face value,” said Abeba Birhane, an AI researcher and senior fellow at the Mozilla Foundation. “In the end, they just become PR machines for those tools themselves.”
Analyses of AI news coverage have found that this is a common problem. Birhane and Emily Bender, a computational linguist at the University of Washington, suggest that rather than simply providing a platform for AI vendors to promote their technology, reporters should also talk to experts outside the tech industry. For example, Bender recalled reading an article about an AI tool whose makers claimed it would revolutionize mental health care. “Obviously, the people who have expertise in it are the ones who have some knowledge of how the treatment works,” she said.
In a series of stories on the company Social Sentinel, The Dallas Morning News found that it had made outlandish claims about its model’s performance, repeatedly asserting that the model could detect students at risk of harming themselves or others from their posts on popular social media platforms. But when reporters spoke to experts, they found that it is impossible to reliably predict suicidal ideation from a single social media post.
Editors could also do a better job choosing headlines and images, said Margaret Mitchell, chief ethics scientist at the AI company Hugging Face. Inaccurate headlines about AI often reach legislators and regulators, and Mitchell and others must then try to correct the record.
“Just seeing headline after headline with these exaggerated or inaccurate claims shapes your sense of what the truth is,” Mitchell said. “It ends up creating the very issues journalists are trying to cover.”
Question the training data
After a model is “trained” on labeled data, it is evaluated on an unseen dataset, called the test or validation set, and scored using some metric.
The first step in evaluating an AI model is to look at how much, and what kind of, data the model was trained on. A model can only perform well in the real world if its training data are representative of the population it is being tested on. For example, if a developer used 10,000 pictures of puppies and fried chicken to train a model and then evaluated it on pictures of salmon, it probably wouldn’t work well. Reporters should be wary when a model trained for one purpose is used for a completely different one.
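A toy simulation (all numbers synthetic) shows why this matters: a one-feature classifier scores impressively on a holdout set drawn from the same distribution as its training data, then falls apart when the “real world” drifts away from that distribution.

```python
import random

random.seed(0)

def sample(mean, n):
    """Draw n synthetic one-dimensional feature values around a mean."""
    return [random.gauss(mean, 1.0) for _ in range(n)]

# Training data: class A clusters near 0, class B near 4.
train_a, train_b = sample(0, 500), sample(4, 500)

# "Training" here learns a single number: the midpoint between class means.
threshold = (sum(train_a) / len(train_a) + sum(train_b) / len(train_b)) / 2

def accuracy(a_examples, b_examples):
    correct = sum(x <= threshold for x in a_examples)
    correct += sum(x > threshold for x in b_examples)
    return correct / (len(a_examples) + len(b_examples))

# Holdout test set from the SAME distribution as training: looks great.
print(f"holdout accuracy: {accuracy(sample(0, 200), sample(4, 200)):.0%}")

# "Real world" where class B has drifted toward class A: the score
# collapses, even though nothing about the model changed.
print(f"drifted accuracy: {accuracy(sample(0, 200), sample(1.5, 200)):.0%}")
```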
In 2017, Amazon scrapped a machine learning model it had been using to filter resumes after finding that the model discriminated against women. The culprit? The training data consisted of the resumes of the company’s past hires, who were predominantly male.
Data privacy is another concern. In 2019, IBM released a dataset containing one million faces. The following year, plaintiffs sued the company for including their pictures without their consent.
Nicholas Diakopoulos, a professor of communication studies and computer science at Northwestern University, recommends that journalists ask AI companies about their data collection practices and whether subjects gave consent.
Reporters also need to consider companies’ labor practices. Earlier this year, Time magazine reported that OpenAI paid Kenyan workers less than $2 an hour to label offensive content used to train ChatGPT. Bender said such harms should not be ignored.
“There is a tendency in narratives like this to basically believe all of the possible upsides and ignore the actually documented downsides,” she said.
Evaluate the model
In the final step of the machine learning process, the model outputs guesses on the test data and those outputs are scored. Generally, a model is deployed if it achieves a sufficiently high score.
Companies trying to advertise their models often cite figures like “95 percent accuracy.” Reporters should dig deeper here and ask whether that high score comes only from a holdout sample of the original data, or whether the model was also checked against realistic examples. These scores are meaningful only if the test data match the real world. Mitchell suggests reporters ask specific questions, such as “How does this generalize in context?” and “Was the model tested in the real world or outside its domain?”
It is also important for journalists to ask what metrics companies use to evaluate their models, and whether those are the right ones. A useful question to consider is whether false positives or false negatives are worse. For example, in a cancer screening tool, false positives can lead to unnecessary testing, while false negatives can cause treatable early-stage tumors to be missed.
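A quick tally, with invented counts, shows why a single accuracy figure can’t answer that question: two hypothetical screening models can share the same accuracy while making very different kinds of errors.

```python
# Two hypothetical screening models with identical accuracy but very
# different error types. All counts are invented for illustration:
# 1,000 patients screened, 50 of whom actually have the disease.

def report(name, tp, fp, fn, tn):
    total = tp + fp + fn + tn
    print(f"{name}: accuracy {(tp + tn) / total:.0%}, "
          f"false positives {fp}, false negatives {fn}")

# Model 1 misses 40 of the 50 real cases: dangerous false negatives.
report("Model 1", tp=10, fp=10, fn=40, tn=940)   # accuracy 95%

# Model 2 catches every case but flags 50 healthy patients for more tests.
report("Model 2", tp=50, fp=50, fn=0, tn=900)    # accuracy 95%
```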
Differences in metrics can be critical when determining whether a model is fair. In May 2016, ProPublica published an investigation of an algorithm called COMPAS, which aims to predict the risk that a criminal defendant will commit a crime within two years. The reporters found that the algorithm generated roughly twice as many false positives for Black defendants as for white defendants, despite having similar accuracy for both groups.
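The stylized counts below (not ProPublica’s actual data) show how this is arithmetically possible: two groups can see the same overall accuracy while one group’s false positive rate is more than double the other’s.

```python
# Stylized confusion counts (NOT ProPublica's actual data) showing how
# two groups can have equal accuracy while one suffers more than double
# the false positives, i.e., people wrongly flagged as high risk.

def rates(name, tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    false_positive_rate = fp / (fp + tn)
    print(f"{name}: accuracy {accuracy:.0%}, "
          f"false positive rate {false_positive_rate:.0%}")

rates("Group 1", tp=300, fp=200, fn=100, tn=400)  # accuracy 70%, FPR 33%
rates("Group 2", tp=100, fp=100, fn=200, tn=600)  # accuracy 70%, FPR 14%
```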
The article sparked a heated debate in academia over competing definitions of fairness. Journalists should identify which version of fairness was used to evaluate a model.
Recently, AI developers have been claiming that their models perform well on a wide variety of tasks, not just a single one. “One of the things that’s happening with AI right now is that the companies making it are claiming that these are basically everything machines,” Bender said. “That claim cannot be verified.”
Journalists shouldn’t believe a company’s claims without real-world verification.
Consider downstream harms
Knowing how these tools work is important, but the most important thing journalists should consider is how the technology affects people today. Companies like to brag about the positive effects of their tools, so journalists shouldn’t forget to investigate the real-world harm those tools can cause.
AI models failing to perform as advertised is a common problem, and some tools have been abandoned as a result; by then, however, the damage is often already done. Epic, one of the largest healthcare technology companies in the US, released an AI tool to predict sepsis in 2016. The tool was adopted by hundreds of hospitals in the United States without independent external validation. In 2021, researchers at the University of Michigan tested the tool and found that it performed far worse than advertised. After a series of follow-up investigations by STAT News, Epic stopped selling its one-size-fits-all sepsis tool a year later.
Even when a tool works well, ethical issues arise. Facial recognition can be used to unlock mobile phones, but it is already being used by businesses and governments to surveil people at scale. It has been used to bar people from entering concert venues, to identify ethnic minorities, and to monitor workers and people living in public housing, often without their knowledge.
In March, Wired and reporters from Lighthouse Reports published an investigation of a welfare-fraud detection model used by authorities in Rotterdam. The investigation found that the tool frequently discriminated against women and non-Dutch speakers, sometimes leading to highly intrusive raids on innocent people’s homes by fraud investigators. Examining the model and its training data, the reporters also found that it performed only slightly better than random guessing.
“It takes more work to go looking for the exploited workers, the artists whose data has been stolen, or skeptical academics like me,” Bender said.
Jonathan Stray, a senior scientist at the Berkeley Center for Human-Compatible AI and a former AP editor, says it is almost always worthwhile to talk to the people who use, or are affected by, these tools.
“Find people who are using it, or are going to use it, to do real work and pick up the story there, because there are real people trying to get real things done,” he said.
“That’s where you find out what reality is.”
Sayash Kapoor, Hilke Schellmann, and Ari Sen are, respectively, a Ph.D. candidate in computer science at Princeton University, a professor of journalism at New York University, and a computational journalist at The Dallas Morning News. Hilke and Ari are AI Accountability Fellows at the Pulitzer Center.
