What questions did you set out to answer in this study?
ChatGPT, a new language processing tool driven by artificial intelligence (AI), can provide conversational text responses to questions and generate valuable information for people seeking answers. For medical questions, however, the quality of the answers ChatGPT generates is currently unknown.
What method/approach did you use?
We randomly selected three hospitals from the US News & World Report list of the top 20 best hospitals for gastroenterology and GI surgery. From their public web pages, we extracted eight common questions about colonoscopy together with the corresponding answers.
We entered these questions as ChatGPT prompts twice on the same day and recorded the answers generated by ChatGPT.
Plagiarism detection software was then used to compare text similarity between all responses. Finally, to objectively interpret the quality of the responses ChatGPT generated, four gastroenterologists rated 36 question-answer pairs, presented in random order, on a 7-point scale for the following quality indicators:
- (1) Clarity
- (2) Scientific validity
- (3) Satisfaction with responses
Raters were also asked to interpret whether the responses were AI-generated.
What did you find?
The ChatGPT responses had very low text similarity compared to the responses on the hospitals' web pages, while the text similarity between the two ChatGPT responses to the same question ranged from 28% to 77%.
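The summary does not name the plagiarism detection software or the metric it uses. As a rough illustration of how a percentage similarity between two answers can be computed, the sketch below uses `SequenceMatcher` from Python's standard `difflib` module on word sequences. This is an assumed stand-in, not the study's method, and the function name `similarity_percent` and both example answers are invented for illustration.

```python
from difflib import SequenceMatcher


def similarity_percent(text_a: str, text_b: str) -> float:
    """Return a 0-100 similarity score between two texts, compared word by word.

    Assumption: this is an illustrative metric, not the one used in the study.
    """
    words_a = text_a.lower().split()
    words_b = text_b.lower().split()
    # ratio() is 2*M/T, where M is the number of matched words and
    # T is the total number of words in both texts.
    return 100.0 * SequenceMatcher(None, words_a, words_b).ratio()


# Invented placeholder answers, not taken from the study.
first_answer = "A colonoscopy lets a doctor examine the lining of the colon."
second_answer = "A colonoscopy lets a physician examine the inside of the colon."

print(f"{similarity_percent(first_answer, second_answer):.0f}%")
```

Under this metric, identical texts score 100% and texts that share no words score 0%, so two differently worded answers to the same question fall somewhere in between.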
Gastroenterologists rated ChatGPT responses similarly to non-AI responses for comprehensibility, although average scores for AI responses were higher than those for non-AI responses. Scores for scientific validity and satisfaction with responses were similar. Raters were only 48% accurate in judging which answers had been generated by ChatGPT.
This is the first study of its kind to demonstrate that a conversational AI program derived from a modern large language model can provide comprehensible, scientifically sound, and generally satisfactory answers to common questions about gastrointestinal endoscopy.
Such programs could help optimize clinical communication with patients, especially for high-volume procedures such as colonoscopy. Conversational AI powered by large language models like ChatGPT has the potential to transform and benefit shared decision-making between patients and physicians.
What is the impact?
Future studies should examine patient questions and answers across a broader sample of clinical presentations and include both patients and physicians as raters.
