Deep learning segmentation of non-perfused regions from color fundus images and AI-generated fluorescein angiography



In this study, we compared the FA model and two non-FA models in terms of accuracy and uncertainty, and investigated the clinical utility of synthetic FA generated from color fundus images. The FA model achieved the highest accuracy, and the two non-FA models achieved comparable accuracy. In terms of uncertainty, the FA model showed the most stable predictions, whereas the color fundus model showed the highest Monte Carlo uncertainty.
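The Monte Carlo uncertainty referred to here is typically obtained by keeping dropout active at inference time, running several stochastic forward passes, and taking the per-pixel standard deviation as the uncertainty map. The following is a minimal NumPy sketch of that procedure; the toy model, pass count, and dropout rate are illustrative, not the paper's actual architecture:

```python
import numpy as np

def mc_dropout_predict(model, x, rng, n_passes=20, drop_rate=0.5):
    """Run n_passes stochastic forward passes with dropout enabled and
    return the mean prediction and the per-pixel uncertainty (std. dev.)."""
    preds = np.stack([model(x, rng, drop_rate) for _ in range(n_passes)])
    return preds.mean(axis=0), preds.std(axis=0)

def toy_segmenter(x, rng, drop_rate):
    """Stand-in for a segmentation network: a sigmoid over the input with
    multiplicative dropout noise, so repeated calls disagree slightly."""
    mask = rng.random(x.shape) >= drop_rate   # Bernoulli dropout mask
    h = x * mask / (1.0 - drop_rate)          # inverted-dropout scaling
    return 1.0 / (1.0 + np.exp(-h))           # per-pixel probability

rng = np.random.default_rng(0)
image = rng.normal(size=(64, 64))             # fake fundus-derived features
mean_pred, uncertainty = mc_dropout_predict(toy_segmenter, image, rng)
```

Regions where the stochastic passes disagree get a high standard deviation, which is the "Monte Carlo uncertainty" compared across models above.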

Despite comparable accuracy, predictions from the non-FA models were unstable compared to the FA model. This may be because some color fundus images lacked visible abnormalities such as hemorrhage; in other words, the accuracy of NPA predictions from color fundus images was inconsistent. If a lesion is visible in both the color fundus and FA images, the color fundus model can perform as well as the FA model; however, if the lesion is visible only in FA, the color fundus model performs worse. Furthermore, the variable quality of the color fundus images may also have destabilized the color fundus model. Compared to FA, color fundus images are more vulnerable to artifacts arising from camera angle and illumination direction (e.g., shadows). These artifacts can reduce image quality and lower Dice scores.

Visual inspection of these error samples revealed two common error scenarios. The first is false positives caused by hemorrhage or obscured areas (Samples A and B in Supplementary Figure 1). The root cause may be a conservative annotation policy that encourages erring on the safe side; including more edge cases in the dataset and adjusting the annotation criteria to accommodate them would alleviate this problem. The second scenario is under-confident prediction (sample CF in Supplementary Figure 1). The Dice score is calculated on the output binarized at a threshold of 0.5, so even if the model locates the NPA, weak predictions vanish and the Dice score is low. Adding more samples to the training set may allow the model to make bolder predictions, especially in typical cases. Heuristic calibration of the threshold to balance the risks of false positives and false negatives may also be useful when applying these models in clinical practice. Nevertheless, even with these problems, none of the error samples deviated from the true NPA so much as to confuse an ophthalmologist.
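The thresholding effect described above is easy to reproduce: a prediction that correctly locates the NPA but with low confidence disappears entirely once binarized at 0.5, while a lower, calibrated threshold recovers it. A minimal sketch (the probability values and thresholds are illustrative):

```python
import numpy as np

def dice_score(prob_map, ground_truth, threshold=0.5):
    """Dice coefficient between a binarized probability map and a binary mask."""
    pred = prob_map >= threshold
    gt = ground_truth.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 2.0 * intersection / denom if denom else 1.0

gt = np.zeros((8, 8))
gt[2:6, 2:6] = 1                                   # true NPA region
prob = np.where(gt == 1, 0.4, 0.1)                 # correct location, weak confidence
weak = dice_score(prob, gt, threshold=0.5)         # -> 0.0 (prediction vanishes)
calibrated = dice_score(prob, gt, threshold=0.3)   # -> 1.0 (recovered)
```

This is the sense in which a model can "locate the NPA" yet still score a Dice of zero, and why threshold calibration matters before clinical use.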

The impact of synthetic FA on accuracy was variable: depending on the sample, it led to both increases and decreases in Dice scores. On the other hand, synthetic FA consistently reduced the Monte Carlo uncertainty, probably owing to the image enhancement capability of the GAN. As mentioned above, some color fundus images have quality issues that increase uncertainty, but image quality can be improved using a GAN; indeed, GANs are often used in image enhancement tasks such as noise reduction and super-resolution.15,16,17,18,19 In this study, the image quality of the target images (FA) was better than that of the source images (color fundus), so the GAN model likely acquired this image enhancement ability.
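Image-to-image translation GANs of the kind described here are commonly trained with a conditional adversarial loss plus an L1 reconstruction term (the pix2pix formulation); it is this L1 pull toward the cleaner FA domain that plausibly yields the enhancement effect. A sketch of that generator objective, assuming the standard formulation; all array shapes, the λ weight, and the use of patch-level discriminator scores are illustrative:

```python
import numpy as np

def generator_loss(disc_on_fake, fake_fa, real_fa, lam=100.0):
    """pix2pix-style generator objective: fool the discriminator (BCE
    against the 'real' label) plus an L1 term pulling the synthetic FA
    toward the paired real FA image."""
    eps = 1e-7
    adversarial = -np.mean(np.log(disc_on_fake + eps))  # want D(G(x)) -> 1
    l1 = np.mean(np.abs(fake_fa - real_fa))             # pixel-wise fidelity
    return adversarial + lam * l1

rng = np.random.default_rng(0)
fake_fa = rng.random((32, 32))     # generator output for one color fundus input
real_fa = rng.random((32, 32))     # paired real FA image
d_out = rng.random(4) * 0.5        # discriminator scores on fake patches
loss = generator_loss(d_out, fake_fa, real_fa)
```

Because the target domain (real FA) is cleaner than the source, minimizing the L1 term implicitly denoises and normalizes brightness, which is consistent with the reduced uncertainty observed downstream.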

By using synthetic FA, misleading uncertainty estimates can be reduced. This improvement is clinically beneficial. Uncertainty helps clinicians identify areas that require further testing for abnormalities. However, “false alarms” in uncertainty estimates can mislead clinicians into investigating completely normal areas, wasting time. Using synthetic FA for NPA prediction can reduce such false alarms, making uncertainty estimates more reliable and useful.

The integration of generative AI into medical practice offers promising advances, but it is not without risks. One major concern is the phenomenon of “hallucination,” in which the AI generates information that does not exist, potentially leading to misdiagnosis.20,21 For example, there is a risk that AI models could mask critical abnormalities, preventing patients from receiving timely and appropriate care; this raises serious ethical, legal, and safety issues. We observed an example of this in our study (Figure 5Bc), where the model was sometimes unable to highlight anomalies. This likely occurred because the GAN model, although trained on images containing NPA, was primarily exposed to the normal parts of the FA and was therefore biased toward producing normal-looking output. Before introducing AI into clinical practice, it is important to rigorously evaluate its benefits and potential harms. Healthcare professionals need to be thoroughly educated about the capabilities and limitations of AI technology, and diagnoses and recommendations generated by AI should undergo rigorous review by clinicians to reduce the risk of misdiagnosis. The ethical, legal, and social implications of adopting generative AI in healthcare remain an important and underexplored area that requires a deeper understanding of AI capabilities and limitations. We hope that our study brings valuable insights to this ongoing debate.

Although our results suggest that the impact of synthetic data on accuracy is limited, some studies report that GAN-generated images can improve predictive performance in other medical imaging tasks, such as contrast-enhanced CT synthesis and pathology.22,23 Taken together, these results suggest that the effectiveness of GANs is task dependent. In general, a GAN cannot obtain additional information from the patient; it can only enhance features already present in the image. A practical limitation is that it cannot detect what is not there. Therefore, in theory, using GAN-generated images for downstream tasks is beneficial only when the downstream model cannot extract the existing features effectively.

The analysis was limited by the small size of the dataset, which may have affected the accuracy of the segmentation and GAN models. Collecting additional data would not only increase the quantity of the dataset but also improve its quality: a larger dataset could be stratified, yielding more homogeneous subsets across different stages of the RVO journey (e.g., initial visit and follow-up). Additionally, computational resource constraints required us to evaluate the models using a holdout method, which is less robust than cross-validation and generalizes poorly on small datasets. Therefore, more extensive research is needed to determine the usefulness of synthetic data in medical imaging AI. Future research should aim to develop more stable NPA segmentation models that perform well even when abnormalities are subtle or not immediately visible on color fundus images.
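Cross-validation addresses the holdout limitation by rotating which portion of a small dataset is held out, so every sample is evaluated exactly once. A minimal index-splitting sketch (the fold count and dataset size are illustrative):

```python
import numpy as np

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle indices and yield (train_idx, val_idx) pairs for k folds,
    so each sample appears in exactly one validation set."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, val

splits = list(k_fold_indices(25, k=5))
# Across all folds, every sample is validated exactly once.
all_val = np.sort(np.concatenate([v for _, v in splits]))
```

Averaging a metric such as the Dice score over the k validation folds gives a more robust estimate than a single holdout split, at roughly k times the compute cost, which is the trade-off that constrained this study.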

Furthermore, in this study we investigated the potential utility of synthetic FA images only in the limited context of segmenting NPA in RVO patients. FA has wide clinical applications beyond NPA detection and is frequently used in the diagnosis of various retinal diseases, such as diabetic retinopathy and age-related macular degeneration. We have shed light on only one of these applications, and further studies are needed to examine the utility of synthetic FA images in a broader clinical context.

In conclusion, deep learning models can predict NPA with acceptable accuracy from color fundus images alone. These results are a promising step toward the goal of providing safe and accessible testing for BRVO patients. At present, however, NPA prediction that relies only on color fundus images is unstable and may miss lesions, and further research is needed to overcome this challenge. There are two possible reasons for the unstable performance. First, the color fundus model performs equally well only when a visible NPA lesion is present in the color fundus image; if the input image completely lacks indicative features, performance degrades. Second, the quality of color fundus images is more easily compromised than FA by image artifacts such as shadows. The main contribution of GAN-generated FA is image enhancement, such as noise reduction and brightness adjustment. Although the improvement in accuracy is small, GAN-generated FA reduces “false alarms” in Monte Carlo dropout uncertainty estimation, making it a useful indicator for prompting the physician to examine a particular portion of the fundus image further and thereby increasing its clinical utility.


