Updated July 12, 2024: The Tony Blair Institute for Global Change contacted us to say that our story oversimplified the nature of its research into the value of AI in the public-sector workforce. A TBI spokesperson said, “We didn't just ask ChatGPT for results,” and that its approach was built “on previous academic papers and empirical research.”
We would also like to note that former Prime Minister Tony Blair himself was fully aware of the methods used to produce the data, stating at the Future of Britain Conference that “there is no definitive accuracy to these kinds of predictive figures.” His comments echo the TBI report itself, which stresses that the LLM “may or may not produce reliable results,” but that anything attempting to predict the future, whether created by generative AI or by fallible humans, will never be 100% accurate.
TBI says it used the GPT-4 model behind ChatGPT in a more sophisticated way than our original story suggested to explore whether AI can save time in the workplace.
“We trained a version of ChatGPT by presenting it with a rubric of rules to help classify tasks that the AI can (and cannot) perform. We refined this rubric on a test dataset of about 200 tasks and cross-checked the output against expert assessments of the AI's current capabilities, including comparing it with the results of recent academic and empirical studies of its impact on specific tasks (e.g., Noy and Zhang, 2023), to ensure that the model is robust and produces reliable results grounded in real-world data.
“Once training was complete, we used the LLM to apply the rubric to roughly 20,000 tasks in the O*NET dataset, automating our approach and scaling up our results. We then ran multiple robustness checks on the model's output to ensure that the time-savings figures generated were consistent with real-world applications.”
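TBI has not published the code behind this rubric approach, so the following is only a minimal sketch of what such a pipeline could look like, assuming the OpenAI Python client; the rubric text, prompt wording, and JSON output schema here are invented for illustration, not TBI's.

```python
# Minimal sketch of rubric-based task classification with an LLM.
# The rubric, prompt, and output schema are invented for illustration;
# TBI has not published its actual code or prompts.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A stand-in rubric; TBI says its real rubric was refined on ~200 test tasks.
RUBRIC = """You classify O*NET work tasks. Apply these rules:
1. Can the task be completed entirely by reading and producing text or data?
2. Does it require physical presence, manipulation, or in-person judgment?
3. Is a human legally or ethically required to make the final decision?
Reply with JSON only: {"automatable": true|false, "reasoning": "<one sentence>"}"""

def classify_task(task_description: str) -> dict:
    """Apply the rubric to a single task description and parse the verdict."""
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0,  # keep output as repeatable as possible
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": task_description},
        ],
    )
    return json.loads(response.choices[0].message.content)

# Scaling up: in TBI's case a loop like this would cover ~20,000 O*NET tasks.
tasks = [
    "Schedule and confirm appointments for clients or supervisors.",
    "Repair and maintain hydraulic equipment in the field.",
]
results = {t: classify_task(t) for t in tasks}
print(results)
```

Note that any “robustness check” run downstream of a loop like this is still comparing model output against more model output unless the comparison set comes from human experts, which is the crux of the criticism that follows.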
But the essential point stands: we still have a research institute producing a paper on the benefits of AI that relied heavily on AI to tell us whether AI will be a good thing.
Now, with properly trained models used appropriately, AI could indeed deliver real time savings in the public sector. But given the current lack of public trust in the output of LLMs, it's natural that this latest report has drawn a level of skepticism it might have avoided had it been produced by more traditional means.
Original story, July 11, 2024: The Tony Blair Institute for Global Change, a nonprofit founded by the former British Prime Minister, published a paper (PDF) predicting that AI automation of public-sector jobs could save workers one-fifth of their time and significantly reduce labor and government costs. The paper's findings were presented by Tony Blair himself at the opening of the Future of Britain Conference 2024.
There's just one small problem: the prediction was made by a version of ChatGPT, and as the experts 404 Media interviewed for its story about this strange ouroboros point out, AI may not be the most reliable source of information about how trustworthy, useful, or beneficial AI is.
Researchers at the Tony Blair Institute collected O*NET data on job-specific task descriptors for about 1,000 U.S. occupations, with the goal of assessing which of those tasks could be performed by AI. Consulting human experts to determine which roles are suitable for AI automation was, however, deemed too difficult a problem to solve, so the researchers instead fed the data into ChatGPT and had it make the predictions.
The problem, as the researchers themselves note, is that the LLM “may or may not produce reliable results.” The solution? Ask it again, but in a different way.
“We first use GPT-4 to classify each of the 19,281 tasks in the O*NET database along several dimensions that we believe are important determinants of whether a task can be performed by an AI. These were selected after initial analysis of unguided GPT-4 evaluations of the automatability of several example tasks, some of which we struggled to assess.”
“This classification then allows GPT-4 to be prompted for an initial assessment of whether the task is likely to be performable by an AI.”
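The paper does not include its prompts, but the two-step process it describes amounts to chaining one GPT-4 call into another, roughly as in this hypothetical sketch (the dimension names and prompt wording are ours, not TBI's):

```python
# Hypothetical sketch of the paper's two-step prompting: score a task along
# several dimensions first, then feed those scores back to GPT-4 for an
# automatability verdict. Dimensions and prompts are illustrative only.
from openai import OpenAI

client = OpenAI()

DIMENSIONS_PROMPT = (
    "Rate the following O*NET task from 1 to 5 on each dimension: "
    "(a) how far it can be done purely with text or data, "
    "(b) how much physical action it requires, "
    "(c) how routine and repeatable it is."
)
VERDICT_PROMPT = (
    "Given a task and its dimension ratings, give an initial assessment of "
    "whether current AI could perform the task, with a one-line rationale."
)

def ask(system_prompt: str, user_content: str) -> str:
    """One GPT-4 chat call; returns the raw text of the reply."""
    reply = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_content},
        ],
    )
    return reply.choices[0].message.content

task = "Prepare and maintain employment records."
ratings = ask(DIMENSIONS_PROMPT, task)                              # step 1: classify
verdict = ask(VERDICT_PROMPT, f"Task: {task}\nRatings: {ratings}")  # step 2: assess
print(verdict)
```

Notice that step two's only “evidence” is the output of step one, generated by the same model: the circularity the critics describe below.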
In other words, the researchers asked the AI to determine which jobs AI could improve, let it conclude that AI would be beneficial, and then had an international statesman extol the genius of that conclusion to the rest of the world.
Unsurprisingly, those who have looked into the details of the report are unconvinced by its findings. Speaking to 404 Media, Professor Emily Bender of the University of Washington's Computational Linguistics Laboratory said:
“This is ridiculous. It's like shaking a Magic 8 Ball and writing down the answer that appears.”
“They're suggesting that if you manipulate GPT-4 in two different ways, the results will somehow become more reliable. But it doesn't matter how you mix and remix the synthetic text pushed out of one of these machines; it's never going to rest on sound empirical grounds.”
The findings were reported by multiple news outlets without any mention of ChatGPT's role in producing the paper's predictions. It is unclear whether Blair knew that the figures he presented rested on such questionable methodology, or indeed whether he read the paper in detail himself.
While the researchers here have at least documented the flaws in their methodology, it makes you wonder how much seemingly accurate information based on AI predictions is being created and presented as verifiable fact.
AI-generated content is also convincing enough that proving something wasn't created by an AI takes real effort. So, to prove that this article is no such thing, here is one deliberate spelling mitsake. You're welcome.