Researchers are beginning to uncover the strengths and weaknesses of chatbots.
At Oxford University’s Reasoning with Machines Laboratory, researchers asked a team of doctors to create detailed, realistic scenarios ranging from mild health problems that could be dealt with at home, to those needing a routine GP appointment, a trip to A&E, or a call for an ambulance.
When the chatbot was given the complete picture, its accuracy was 95%. “They were amazing, in fact almost perfect,” researcher Professor Adam Mahdi told me.
But when 1,300 people were given the same scenarios and asked to consult a chatbot for a diagnosis and advice, the story was very different.
Accuracy dropped to 35%, and it was the human-AI interaction that unravelled things: two-thirds of the time, people came away with the wrong diagnosis or the wrong advice about the care they needed.
“When people talk, they share information gradually, leave things out, and get distracted,” Mahdi told me.
One scenario described a type of stroke caused by a brain bleed, known as a subarachnoid haemorrhage. It is a life-threatening emergency that requires urgent hospital treatment.
But subtle differences in how people described those symptoms to ChatGPT led to vastly different advice.
