Eye-popping AI: ChatGPT takes on eye exams with promising results

The application of artificial intelligence (AI) and deep learning (DL) has expanded significantly since 2015, especially in ophthalmology. DL uses ophthalmic data such as optical coherence tomography and fundus photography for image recognition. More recently, AI has been combined with natural language processing (NLP) in ophthalmology, allowing it to interact with human language.

Scientists have developed large language models (LLMs) that generate human-like text. For example, OpenAI has developed ChatGPT, a general-purpose LLM based on the Generative Pre-trained Transformer 3 (GPT-3) series. Several experiments have shown ChatGPT’s overall accuracy to exceed 50%.

A recent Ophthalmology Science study evaluated the performance of ChatGPT in ophthalmology.

Study: Evaluating the performance of ChatGPT in ophthalmology: an analysis of its successes and shortcomings. Image credit: metamorworks / Shutterstock.com

Background

NLP has gained prominence with the recent release of foundation models that can be tuned for specific applications through a process called transfer learning. Foundation models can include billions of parameters thanks to advances in computer hardware, the availability of large amounts of training data, and transformer model architectures.

The LLM GPT-3 was trained on a large dataset of over 400 billion words of text from the Internet, including articles, books, and websites. Recently, LLMs were evaluated for their ability to understand and generate natural language in medicine. However, in the medical field, the high demand for clinical reasoning, which requires years of training and experience, poses a challenge to the performance of LLMs.

In 2022, the performance of PaLM, an LLM with 540 billion parameters, was evaluated on United States Medical Licensing Examination (USMLE) multiple-choice questions, revealing an accuracy of 67.6%. Interestingly, ChatGPT was also able to provide insightful explanations to support its responses.

About the study

Few studies have evaluated the performance of LLMs in the ophthalmic question-answering space. Given this gap in research, the current study used two popular question banks, the OphthoQuestions online question bank and the American Academy of Ophthalmology Basic and Clinical Science Course (BCSC) Self-Assessment Program, to assess the performance of ChatGPT.

ChatGPT was trained using human feedback, so it does more than just predict the next word. Two versions of ChatGPT were evaluated. The first, released on January 9, 2023, is known as the legacy model. The second, an upgraded model, was released on January 30, 2023. The updated model was described as offering enhanced factual and mathematical capabilities.

OpenAI also launched ChatGPT Plus, which offers faster responses. The authors used ChatGPT Plus for the analysis because earlier versions were no longer accessible.

Multiple experiments were performed using ChatGPT Plus to establish the reproducibility of the results. A total of 260 test questions were drawn from the BCSC Self-Assessment Program and another 260 from OphthoQuestions.

Twenty random questions were selected from each of the 13 sections of the standardized Ophthalmic Knowledge Assessment Program (OKAP) exam, giving 260 questions per test set. ChatGPT’s performance was analyzed based on subject matter, question type, and difficulty level.
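The sampling design described above can be illustrated with a minimal Python sketch. It is not the authors' code: the question bank layout, the placeholder section names, and the helper functions sample_questions and accuracy_by_section are all assumptions made for illustration. Only the figures of 13 sections and 20 questions per section come from the study description.

```python
import random

SECTIONS = 13      # number of exam sections (from the text)
PER_SECTION = 20   # questions sampled per section (from the text)


def sample_questions(bank, per_section=PER_SECTION, seed=0):
    """Draw `per_section` random questions from every section of `bank`.

    `bank` maps a section name to a list of question IDs (hypothetical layout).
    """
    rng = random.Random(seed)
    return {
        section: rng.sample(questions, per_section)
        for section, questions in bank.items()
    }


def accuracy_by_section(sampled, is_correct):
    """Return the fraction of correctly answered questions per section.

    `is_correct` maps a question ID to True/False (hypothetical grading result).
    """
    return {
        section: sum(is_correct[q] for q in questions) / len(questions)
        for section, questions in sampled.items()
    }


# Placeholder bank: 13 sections with 50 synthetic question IDs each.
bank = {
    f"section_{s}": [f"s{s}_q{i}" for i in range(50)]
    for s in range(1, SECTIONS + 1)
}
test_set = sample_questions(bank)
total_questions = sum(len(qs) for qs in test_set.values())
print(total_questions)  # 260
```

Sampling the same number of questions from every section keeps each subject area equally represented, which is what makes a per-section accuracy comparison meaningful.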

Study findings

The current study demonstrates ChatGPT’s ability to answer OKAP exam questions. Over the course of the experiments, significant improvements in ChatGPT performance were observed. ChatGPT Plus showed 59.4% accuracy on the simulated OKAP exam based on the BCSC test set and 49.2% accuracy using the OphthoQuestions test set.

Based on aggregated historical human performance data, humans score 74% on the BCSC question bank. Additionally, a group of ophthalmology residents who completed their training in 2022 scored 63% on OphthoQuestions.

ChatGPT’s performance in ophthalmology is noteworthy and promising because it matches the accuracy of advanced LLMs in general medical question answering, which was typically between 40% and 50% according to publications from 2022.

The accuracy of the legacy model depended on the exam section, regardless of question difficulty or cognitive level. However, this effect was less noticeable in the updated ChatGPT version.

Importantly, ChatGPT performed consistently well in the fundamentals, general medicine, and cornea sections. This may be due to the vast amount of training data and resources available on the Internet for these topics.

ChatGPT performed poorly in ophthalmic pathology, neuro-ophthalmology, and intraocular tumors. These are highly specialized areas and can be challenging even within the ophthalmic community. It should be noted that approximately 40% of patients referred to neuro-ophthalmology and ophthalmic oncology are misdiagnosed.

The updated ChatGPT Plus model showed improved performance in intraocular tumors and pathology compared to the previous version, but unchanged performance in neuro-ophthalmology. Additionally, ChatGPT was more likely to answer a question correctly when a higher percentage of human test takers had answered it correctly. This finding indicates that ChatGPT responses are consistent with the collective understanding of ophthalmology trainees.
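One simple way to look at this kind of relationship is to group questions by the share of humans who answered them correctly and compare model accuracy across the groups. The Python sketch below does this with made-up records. The function accuracy_by_difficulty, the bin edges, and the data are all hypothetical, and this binning approach is an illustration rather than the statistical method used in the study.

```python
def accuracy_by_difficulty(records, edges=(0.0, 0.5, 0.75, 1.0)):
    """Group questions by the share of humans who answered correctly and
    return the model's accuracy within each group.

    `records` is a list of (human_fraction_correct, model_was_correct) pairs.
    """
    bins = {}
    for low, high in zip(edges, edges[1:]):
        # Include the upper edge only in the last bin.
        is_last = high == edges[-1]
        hits = [
            ok for frac, ok in records
            if low <= frac < high or (is_last and frac == high)
        ]
        bins[(low, high)] = sum(hits) / len(hits) if hits else None
    return bins


# Synthetic, made-up records for illustration only (not study data).
records = [
    (0.30, False), (0.40, False), (0.45, True),
    (0.60, True), (0.70, False),
    (0.80, True), (0.90, True), (1.00, True),
]
result = accuracy_by_difficulty(records)
for bounds, accuracy in result.items():
    print(bounds, accuracy)
```

In this toy data, model accuracy rises from the hardest group to the easiest one, which is the pattern the article describes. Real data would need a proper statistical test before drawing any conclusion.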

Future outlook

In the future, the authors plan to conduct qualitative analyses to identify areas in need of improvement in the ophthalmic space. ChatGPT’s accuracy could be improved by incorporating other specialized foundation models trained on domain-specific sources such as EyeWiki.

Currently, ChatGPT cannot process images, which prevents its implementation in ophthalmology. A new application programming interface (API) for ChatGPT could help validate this technology and make the process less tedious.

Journal reference:

Antaki, F., Touma, S., Milad, D., et al. (2023). Evaluating the performance of ChatGPT in ophthalmology: an analysis of its successes and shortcomings. Ophthalmology Science. doi:10.1016/j.xops.2023.100324

Written by

Dr. Priyom Bose

Priyom holds a Ph.D. in Plant Biology and Biotechnology from the University of Madras, India. She is an active researcher and an experienced science writer. Priyom has also co-authored several original research papers published in reputable peer-reviewed journals. She is an avid reader and amateur photographer.