Automated evaluation of the Fazekas scale on fluid-attenuated inversion recovery (FLAIR) MRI in patients with ischemic stroke or transient ischemic attack using machine learning

This pioneering study aimed to evaluate the Fazekas scale using FLAIR images by developing a two-stage pipeline. In particular, our approach included a spatial probabilistic methodology for Fazekas assessment, employing a three-dimensional deep learning model with predicted probability maps. This differs from previous research²⁸, which depended on the scalar value of the WMH volume. The probability map retained location information corresponding to the FLAIR image, along with a continuous probability of WMH at each voxel. In Fazekas scale evaluation, WMH is assessed according to its location and categorized into periventricular and deep subcortical regions, indicating that the location of WMH can affect the Fazekas rating^9,10. Therefore, our approach is expected to harness the spatial relationships of WMH and incorporate detailed voxel-level probability information, thereby facilitating more accurate grading of the Fazekas scale.
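
The difference between a scalar volume and a probability map can be illustrated with a minimal sketch. It is not the implementation used in this study: the sparse dictionary representation, the function name `expected_volume_ml`, the voxel coordinates, and the 1 mm³ voxel size are all assumptions made for illustration. Two toy maps with lesions in different places collapse to the same scalar volume, so a volume-only model cannot tell them apart.

```python
from typing import Dict, Tuple

Voxel = Tuple[int, int, int]  # hypothetical (x, y, z) voxel index


def expected_volume_ml(prob_map: Dict[Voxel, float], voxel_volume_mm3: float) -> float:
    """Expected WMH volume in mL: sum of voxel probabilities times voxel volume."""
    return sum(prob_map.values()) * voxel_volume_mm3 / 1000.0


# Two toy probability maps (illustrative values only): the same probabilities
# placed at different locations in the volume.
map_near_ventricle = {(10, 10, 5): 0.9, (10, 11, 5): 0.6}
map_deep_white_matter = {(30, 40, 12): 0.9, (31, 40, 12): 0.6}

vol_a = expected_volume_ml(map_near_ventricle, voxel_volume_mm3=1.0)
vol_b = expected_volume_ml(map_deep_white_matter, voxel_volume_mm3=1.0)

# The scalar summaries are identical, while the maps themselves still differ.
same_volume = abs(vol_a - vol_b) < 1e-12
same_location = set(map_near_ventricle) == set(map_deep_white_matter)
```

Here `same_volume` is true while `same_location` is false, which is the information a model operating on the full probability map can still use.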

In patients with ischemic stroke and small vessel disease, WMH is associated with an increased risk of recurrent vascular events, including ischemic and hemorrhagic stroke³⁰. Although it is clinically important to assess the burden of WMH in these patients, distinguishing between WMH and ischemic stroke lesions can be challenging. The combination of FLAIR and diffusion-weighted imaging (DWI) can help distinguish acute ischemic lesions from WMH. However, its usefulness is limited for subacute to chronic lesions¹⁵.

One study that segmented WMH in patients with acute ischemic lesions using a U-Net model found no significant differences in lesion identification or segmentation between models using FLAIR alone and models incorporating both FLAIR and DWI³¹. The authors therefore speculated that CNN-based models can use the various features of FLAIR images to distinguish between WMH and acute ischemic lesions³¹.

In this study, we successfully segmented WMH with a simplified method that used only FLAIR images and the UResNet architecture. The Dice score in our study was 0.73, showing good performance comparable to previous studies in stroke patients, which reported 0.61³², 0.76³³, and 0.78³⁴, and similar to studies in other populations, which reported 0.71 and 0.80²⁸. The recall and accuracy values, 0.73 and 0.74, respectively, were comparable to those of a previous study (0.74 and 0.56)³¹. We extended our analysis by benchmarking five additional deep learning models for WMH segmentation. Performance varied across metrics, and the trade-offs between architectures highlight the importance of choosing a method tailored to specific clinical requirements.
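
For readers less familiar with the overlap metrics quoted above, the following minimal sketch shows how the Dice score and recall are conventionally computed from binary masks. It is an illustration under stated assumptions, not the evaluation code of this study: masks are assumed to be flattened to sequences of 0/1 voxel labels, and the function name `segmentation_metrics` and the toy masks are hypothetical.

```python
from typing import Dict, Sequence


def segmentation_metrics(pred: Sequence[int], truth: Sequence[int]) -> Dict[str, float]:
    """Dice score and recall for two flattened binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of voxels")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)      # true positives
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)  # false positives
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)  # false negatives
    # Convention assumed here: two empty masks count as perfect agreement.
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return {"dice": dice, "recall": recall}


# Toy example (illustrative values only): six voxels, 1 = WMH, 0 = background.
pred_mask = [1, 1, 0, 0, 1, 0]
truth_mask = [1, 0, 0, 1, 1, 0]
metrics = segmentation_metrics(pred_mask, truth_mask)
```

Because true negatives do not enter the Dice formula, the score is driven by lesion voxels only, which is relevant when most of the volume is background.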

The model showed better segmentation performance at higher Fazekas scale scores. This trend may reflect class imbalance, as higher grades contain a larger number of true-positive voxels. It may also arise because larger WMHs are easier to detect, so that the result is less affected by inconsistencies along lesion boundaries. Several previous studies have used automated WMH segmentation for automated scoring of the Fazekas scale^28,34,35. However, studies in stroke patients have not yet used FLAIR images for automated scoring^28,34,35.

In one study assessing the prediction of Fazekas scores from automated segmentation in a general population-based cohort, ROC analysis was performed to separate low WMH burden (Fazekas score 0, 1) from high WMH burden (Fazekas score 2, 3). The area under the curve (AUC) was 0.93 for the Lesion Segmentation Tool and 0.94 for FreeSurfer³⁴. A study using VUNO Med-DeepBrain, a U-Net-based architecture, in patients with memory complaints and excluding those with stroke, reported AUC values of 0.921 (normal vs. mild/moderate/severe), 0.956 (normal/mild vs. moderate/severe), and 0.960 (normal/mild/moderate vs. severe)²⁸. Another study involving patients with dementia, and excluding patients with stroke, showed a mean AUROC of 0.80³⁵.
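
The dichotomized ROC analyses described here can be sketched as follows. This is a minimal illustration, not code from any of the cited studies: it assumes a continuous model output per patient (for example, a predicted volume or regression score), computes the AUROC with the pairwise (Mann-Whitney) formulation, and uses hypothetical function names and toy values.

```python
from typing import Dict, Sequence


def auroc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """AUROC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for s, is_pos in zip(scores, labels) if is_pos]
    neg = [s for s, is_pos in zip(scores, labels) if not is_pos]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def dichotomized_aurocs(scores: Sequence[float], grades: Sequence[int]) -> Dict[int, float]:
    """One AUROC per cut point: grade >= 1, grade >= 2, and grade >= 3."""
    return {cut: auroc(scores, [g >= cut for g in grades]) for cut in (1, 2, 3)}


# Toy data (illustrative values only): reference Fazekas grades and model scores.
reference_grades = [0, 0, 1, 1, 2, 2, 3, 3]
model_scores = [0.1, 0.4, 0.3, 0.9, 1.6, 2.1, 2.0, 2.9]
aurocs = dichotomized_aurocs(model_scores, reference_grades)
```

Each cut point turns the four-level ordinal scale into a separate binary problem, which is why three AUROC values are reported for a single model.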

In contrast, in our study, the AUROC values were 0.957 (Fazekas score 0 vs. 1, 2, 3), 0.982 (Fazekas score 0, 1 vs. 2, 3), and 1.000 (Fazekas score 0, 1, 2 vs. 3). For the classification task, the AUROC values were 0.972, 0.992, and 1.000. Additionally, the baseline rule-based and logistic regression-based methods yielded quadratic weighted kappa values of 0.897 and 0.916, respectively. For the 3D CNN regression and 3D CNN classification models, the quadratic weighted kappa values were 0.898 and 0.956, respectively. Our results also demonstrated very similar Fazekas evaluation performance in the model development dataset and the external validation dataset.

Unlike other studies that evaluated model performance using metrics such as AUC and AUROC, our study evaluated automated Fazekas score ratings using quadratic weighted kappa values. This method allows for a more nuanced assessment of agreement, which is particularly important in automated Fazekas score assessment because it improves the reliability and interpretability of the findings³⁶. Additionally, quadratic weighted kappa is well suited to ordinal variables, allowing for a more accurate representation of agreement³⁶. Therefore, our model offers a higher level of accuracy in Fazekas scale grading.
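
To make the metric concrete, the following is a minimal sketch of the standard quadratic weighted kappa for two sets of ordinal ratings. It is an illustration rather than the evaluation code of this study; the function name, the four-class assumption (Fazekas grades 0 to 3), and the toy ratings are hypothetical.

```python
from typing import Sequence


def quadratic_weighted_kappa(rater_a: Sequence[int], rater_b: Sequence[int],
                             n_classes: int = 4) -> float:
    """Cohen's kappa with quadratic disagreement weights for grades 0..n_classes-1."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("ratings must be non-empty and of equal length")
    # Observed confusion matrix between the two raters.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Penalty grows with the squared distance between the two grades.
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * hist_a[i] * hist_b[j] / n
    if weighted_expected == 0.0:
        return 1.0  # both raters assigned every case to the same single grade
    return 1.0 - weighted_observed / weighted_expected


# Toy ratings (illustrative values only): one case differs by a single grade.
reference = [0, 1, 2, 3, 1, 2]
predicted = [0, 1, 2, 3, 2, 2]
kappa = quadratic_weighted_kappa(reference, predicted)
```

Because the weights are quadratic, confusing grade 0 with grade 3 is penalized nine times as heavily as confusing adjacent grades, which matches the ordinal nature of the scale. In practice, scikit-learn's `cohen_kappa_score` with `weights="quadratic"` computes the same quantity.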

In the Fazekas grading model, the shallow convolutional layers were activated consistently with the intensity of the predicted WMH probability map. In contrast, the deep convolutional layers close to the output layer showed spatially variant activation, even in regions with similar intensity levels. Performance differences may not be significant because of a ceiling effect: accurate WMH volume estimation alone already allows very accurate Fazekas scale predictions. Nevertheless, our results suggest that spatial information contributes to predictions of the Fazekas scale. This may explain the higher predictive performance of the model compared with scalar volume measurements in the external test set. Subgroup analysis further showed that a large burden of ischemic lesions impairs WMH segmentation performance, as evidenced by increased volume prediction errors. This observation highlights the importance of developing or adapting a segmentation approach specifically for patients with a significant stroke lesion burden.

Despite the strengths mentioned above, our study had some limitations. First, there is currently no definitive reference standard for WMH segmentation and Fazekas evaluation. This is a general limitation of studies using these methods, in which segmentation performed by experienced neurologists and visual assessment of the Fazekas scale serve as the gold standard, despite inherent limitations in subjectivity and inter-rater reliability. Second, our model used only FLAIR images. Incorporating combined information from T1 sequences, DWI, and FLAIR images could potentially improve the performance of automated segmentation and Fazekas scoring. Nonetheless, our model exhibited higher or equivalent Dice, recall, and accuracy values compared with models using DWI and T1 sequences, suggesting that FLAIR images alone, even without other sequences, can help distinguish WMH from stroke lesions. Third, we did not directly compare our automated segmentation and Fazekas scoring models with previously developed models. Finally, when assessing the Fazekas scale, we did not distinguish between deep subcortical white matter and periventricular white matter. However, by using a spatial probabilistic model that incorporates both WMH location and voxel-wise probability information, we were able to overcome this limitation and achieve high quadratic weighted kappa values.

In conclusion, our deep learning pipeline demonstrated accurate automated WMH segmentation and Fazekas scale grading in stroke patients. This approach therefore provides a convenient way to assess the burden of WMH using only FLAIR images of stroke patients, and may assist in predicting future vascular events.
