Metaheuristic hyperparameter optimization of deep neural networks for demographic-aware autism spectrum disorder classification


This section evaluates the performance of the proposed demographic-aware CNN framework and analyzes the impact of age and gender stratification on ASD classification using structural MRI data. All experiments were conducted using five-fold cross-validation to ensure robustness and reduce partition bias.
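The fold construction can be sketched as follows. This is a minimal illustration, assuming class-stratified folds and a fixed random seed so that every model sees identical partitions; the paper does not state whether its folds were stratified, and the function names and toy labels are hypothetical.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, k=5, seed=0):
    """Assign each sample index to one of k folds, balancing classes across folds."""
    rng = random.Random(seed)          # fixed seed -> identical partitions on every run
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets a near-equal share of the class.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds

def train_val_splits(folds):
    """Yield (train_indices, val_indices) pairs, one per fold."""
    for i, val in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(val)

# Hypothetical toy labels, not the ABIDE class distribution.
labels = ["ASD"] * 30 + ["TD"] * 20
folds = stratified_kfold(labels, k=5, seed=42)
```

Reusing the same `folds` for the proposed models and for every baseline is what makes fold-wise accuracies directly comparable.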

Quantitative performance evaluation

The classification accuracies obtained for the three proposed models—gender-based (Method 1), age-based (Method 2), and joint age–gender classification (Method 3)—are summarized in Table 3, alongside the results achieved by several widely used pre-trained transfer learning models trained and evaluated on the same dataset.

As shown in Table 3, the proposed CNN models consistently outperform the compared pre-trained networks across all classification tasks. This demonstrates that models designed and optimized specifically for structural MRI data are better suited for ASD classification than generic architectures originally trained on natural image datasets.

To provide a clearer visual comparison, Fig. 5 illustrates the classification accuracy trends for all methods. The figure confirms the numerical findings reported in Table 3 and highlights the relative performance gaps between the proposed models and baseline approaches.

Table 3 compares the performance of the proposed models with popular state-of-the-art CNNs.
Fig. 5

Comparative classification accuracy of the proposed CNN models and widely used pre-trained transfer learning networks for Method 1 (gender-based classification), Method 2 (age-based classification), and Method 3 (joint age–gender classification). The figure summarizes model performance under identical experimental protocols and datasets, highlighting differences in accuracy across demographic-aware classification tasks. Numerical results corresponding to this figure are reported in Table 3.

In addition to the reported average accuracies, the proposed framework demonstrated stable convergence and consistent generalization behavior across training epochs, as illustrated in Fig. 8. The close alignment between training and validation loss curves suggests that the optimized CNN models achieved stable learning without evidence of severe overfitting. Furthermore, the use of five-fold cross-validation under identical experimental conditions reduces sensitivity to individual data partitions and provides a more reliable estimate of robustness across heterogeneous multi-site neuroimaging data.

Per-class performance analysis

While overall classification accuracy provides a general indication of model effectiveness, it may not fully reflect classification behavior under demographic imbalance and multi-class learning settings. Therefore, additional class-wise evaluation metrics, including precision, recall, and F1-score, were computed to provide a more comprehensive assessment of the proposed framework across all demographic-aware classification tasks.
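For reference, the class-wise metrics named above follow directly from true-positive, false-positive, and false-negative counts. The sketch below uses hypothetical toy labels and a hypothetical function name; it is not the evaluation code used in the study.

```python
def per_class_metrics(y_true, y_pred):
    """Return precision, recall, and F1 for each class (one-vs-rest counts)."""
    classes = sorted(set(y_true) | set(y_pred))
    report = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    return report

# Hypothetical toy predictions with an imbalanced class distribution.
y_true = ["ASD", "ASD", "ASD", "TD", "TD", "TD", "TD", "TD"]
y_pred = ["ASD", "ASD", "TD", "TD", "TD", "TD", "TD", "ASD"]
report = per_class_metrics(y_true, y_pred)
```

In this toy case overall accuracy is 0.75, while the minority class (ASD) reaches only 2/3 precision and recall against 0.8 for the majority class. This is the kind of gap that accuracy alone hides.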

Table 4 summarizes the per-class performance results for the proposed framework under gender-based, age-based, and joint age–gender classification settings. These metrics provide further insight into the robustness of the model when handling classes with varying sample distributions, particularly in the presence of demographic imbalance across the ABIDE cohort.

The obtained results demonstrate relatively consistent performance across most demographic categories despite variations in class frequency. In the gender-based classification task, the proposed framework achieved balanced precision and recall values across ASD and typically developing (TD) groups for both male and female subjects, indicating stable discrimination capability without pronounced bias toward majority classes.

For the age-based classification task, higher precision and recall values were observed overall, which is consistent with the superior classification accuracy reported previously for this setting. These findings suggest that age-related neuroanatomical variations provide stronger discriminative information for ASD classification compared with gender-based stratification alone.

As expected, the joint age–gender classification task exhibited comparatively lower class-wise performance due to increased task complexity, finer class granularity, and more pronounced class imbalance. Nevertheless, the proposed framework maintained relatively stable precision and recall trends across minority and majority demographic subgroups, indicating reasonable robustness under challenging multi-class conditions.

Overall, the per-class analysis confirms that the proposed optimization-driven framework not only achieves strong aggregate accuracy but also maintains balanced classification behavior across diverse demographic categories. This analysis further supports the suitability of the proposed approach for demographic-aware neuroimaging classification tasks involving heterogeneous and imbalanced datasets.

For direct comparison with conventional binary ASD-versus-TD studies summarized in Table 6, the corresponding binary classification performance of the proposed framework is additionally reported within the same comparative analysis.

Table 4 Comparison between the proposed framework and previously reported state-of-the-art ASD classification methods. Since many existing studies primarily report conventional binary ASD-versus-TD classification performance, the corresponding binary classification result of the proposed framework is additionally provided in Table 6 for direct comparison.

Statistical analysis of model performance

To further evaluate the reliability and stability of the proposed framework, an additional statistical analysis was conducted using fold-wise accuracies obtained from the five-fold cross-validation experiments. Since all models were evaluated under identical dataset partitions and experimental settings, fold-wise performance distributions were analyzed to assess variability and robustness across validation folds.

Statistical measures, including paired significance testing and 95% confidence intervals (CI), were used to provide additional insight into the consistency and stability of the observed performance differences between the proposed framework and the strongest baseline models. The corresponding statistical results are summarized in Table 5.

The analysis indicates that the proposed framework consistently achieved improved mean classification performance across gender-based, age-based, and joint age–gender classification tasks. The most noticeable improvement trend was observed in the age-based classification setting, whereas the joint age–gender classification task exhibited comparatively smaller statistical margins due to increased task complexity and finer class granularity.

Overall, the statistical analysis supports the stability and robustness of the proposed optimization-driven framework and indicates consistent performance improvements across cross-validation folds.

In addition to mean accuracy values, fold-wise performance distributions were visually analyzed using boxplot representations to further assess model stability and variability. The corresponding visualization is presented in Fig. 6.

Statistical significance analysis was conducted using paired two-tailed t-tests on fold-wise classification accuracies obtained from the five-fold cross-validation experiments. Prior to significance testing, the normality of fold-wise performance differences was assessed using the Shapiro–Wilk test to verify the suitability of parametric analysis. Since the normality assumption was not violated, paired t-tests were considered appropriate for comparing the proposed framework against baseline methods. To reduce the risk of inflated Type I error associated with multiple pairwise comparisons across classification tasks and competing models, Bonferroni correction was additionally applied to the reported significance analysis. Statistical significance was evaluated at a corrected significance threshold of p < 0.05.
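The core of this procedure can be sketched with the standard library alone. The fold-wise accuracies below are hypothetical, not values from Table 5, and the function names are illustrative. The critical value 2.776 is the two-tailed 95% point of Student's t distribution with 4 degrees of freedom, so the sketch is only valid for five folds. Exact p-values and the Shapiro–Wilk test need a statistics library (for example `scipy.stats.ttest_rel` and `scipy.stats.shapiro`); the paper does not state which software was used.

```python
import math
import statistics

# Two-tailed 95% critical value of Student's t with 4 degrees of freedom (five folds).
T_CRIT_DF4_95 = 2.776

def paired_t_summary(acc_a, acc_b, t_crit=T_CRIT_DF4_95):
    """Paired t statistic and confidence interval for the mean fold-wise difference."""
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)   # sample std. dev. (n - 1)
    t_stat = mean_diff / std_err
    ci = (mean_diff - t_crit * std_err, mean_diff + t_crit * std_err)
    return mean_diff, t_stat, ci

def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_comparisons

# Hypothetical fold-wise accuracies for one task.
proposed = [0.90, 0.92, 0.91, 0.93, 0.89]
baseline = [0.86, 0.87, 0.88, 0.88, 0.86]
mean_diff, t_stat, ci = paired_t_summary(proposed, baseline)
```

Because the same folds are used for both models, the test operates on the five per-fold differences rather than on two independent samples, which removes fold-to-fold difficulty from the comparison.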

Table 5 Statistical comparison between the proposed framework and the strongest baseline models across demographic-aware classification tasks. Paired statistical analysis and confidence interval estimation were performed using fold-wise cross-validation performance distributions to assess the stability and consistency of the observed improvements.
Fig. 6

Boxplot representation of fold-wise classification accuracies obtained from five-fold cross-validation for the proposed framework and the strongest baseline models across demographic-aware classification tasks. The figure illustrates the distribution, variability, and stability of classification performance for gender-based, age-based, and joint age–gender classification settings.

Impact of demographic stratification

An important observation from Table 3 and Fig. 5 is that age-based classification (Method 2) achieves the highest overall performance. This suggests that age-related neurodevelopmental changes captured by sMRI provide stronger discriminative cues for ASD detection than gender-related structural differences alone. Neurodevelopmental processes introduce measurable anatomical variations over time, which appear to amplify ASD-specific patterns when age stratification is explicitly considered.

Gender-based classification (Method 1) also yields strong performance, indicating the presence of structural differences between male and female ASD populations. However, these differences are less pronounced than age-dependent variations, which explains the comparatively lower accuracy observed for Method 1.

The joint age–gender classification task (Method 3) exhibits reduced performance relative to Methods 1 and 2. This outcome is expected due to the increased number of classes and the resulting reduction in samples per class, which increases inter-class overlap and classification complexity. Such behavior is consistent with prior multi-class neuroimaging studies and does not indicate a weakness of the proposed framework.

Comparison with pre-trained CNN models

Among the baseline models presented in Table 3, ResNet50 demonstrates the closest performance to the proposed CNN models for Methods 1 and 2, while MobileNet performs comparatively well for Method 3. Nevertheless, Fig. 5 clearly shows that the proposed models maintain a consistent performance advantage.

This superiority can be attributed to two key factors. First, the proposed CNN architectures were specifically designed and trained for ASD-related structural MRI data, rather than repurposed from natural image recognition tasks. Second, the use of the OptABC algorithm enabled task-specific hyperparameter optimization, allowing each model to better adapt to the underlying demographic characteristics of the data.

Robustness and generalization

The use of five-fold cross-validation across all experiments demonstrates that the proposed framework maintains stable performance across different data splits. This stability suggests that the models are not overfitted to a particular subset of the data and can generalize effectively across heterogeneous imaging sites within the ABIDE dataset.

Furthermore, the integration of demographic stratification contributes to improved robustness by reducing intra-class variability and enabling the models to focus on population-specific neuroanatomical patterns. This characteristic is particularly important for ASD, where clinical and biological heterogeneity remains a central challenge.

Comparison with state-of-the-art (SOTA) methods

Recent advances in ASD classification using neuroimaging have largely focused on improving classification accuracy through increasingly complex deep learning architectures. Most State-of-the-Art (SOTA) approaches address binary ASD versus typically developing (TD) classification and frequently employ computationally intensive strategies such as volumetric 3D convolutional neural networks, ensemble learning, or multimodal fusion of structural and functional MRI. While these methods have demonstrated promising results, their applicability to demographic-aware analysis and scalable clinical deployment remains limited.

To position the proposed framework within the current SOTA landscape, we compare our results with representative and recent high-quality studies from the literature. This comparison considers not only classification accuracy, but also classification scope, model complexity, and practical deployability, which are essential for assessing the broader impact of ASD diagnostic frameworks.

Quantitative comparison with recent SOTA methods

As shown in Table 6, most SOTA methods report performance for binary classification, often under controlled experimental settings. In contrast, the proposed framework addresses multiple demographic-aware classification tasks, including gender-based, age-based, and joint age–gender stratification. Although direct numerical comparison across studies is inherently limited by differences in dataset composition, preprocessing strategies, and validation protocols, the proposed method demonstrates competitive performance, particularly for age-based classification, while offering broader analytical scope.

Table 6 Quantitative comparison between the proposed framework and recent State-of-the-Art (SOTA) ASD classification methods. Reported accuracies are shown alongside imaging modality, dataset, and classification scope. Differences in experimental settings and classification objectives should be considered when interpreting numerical comparisons.

For completeness and fair comparison with conventional ASD versus TD classification studies, Table 6 also includes the best performance obtained by the proposed framework under a non-stratified binary classification configuration, as reported in the ablation analysis.

Visual comparison with SOTA performance

Figure 7 presents a visual comparison between recent SOTA methods and the proposed framework. Panel (a) summarizes reported accuracies from SOTA studies focusing on binary ASD versus TD classification, whereas panel (b) illustrates the performance of the proposed framework under demographic-aware classification settings. The strong performance observed for age-based classification highlights the benefit of incorporating demographic information, while the reduced accuracy in joint age–gender classification reflects the increased complexity associated with finer class granularity. Overall, the figure emphasizes that the proposed framework achieves competitive performance while addressing more challenging and clinically relevant classification tasks.

Fig. 7

Performance comparison between recent SOTA ASD classification methods and the proposed demographic-aware framework. Panel (a) presents reported accuracies for binary classification tasks, while Panel (b) shows performance for gender-based, age-based, and joint age–gender classification tasks addressed in this study.

Discussion of model scope, computational complexity, and practicality

Table 7 summarizes key differences between the proposed framework and recent State-of-the-Art (SOTA) methods in terms of classification scope, computational complexity, interpretability, and practical scalability. While many SOTA approaches achieve strong performance, they frequently rely on computationally intensive designs such as volumetric 3D convolutional networks, ensemble learning strategies, or multimodal fusion pipelines. These architectures typically demand substantial computational resources and specialized data acquisition, which can limit reproducibility and scalability in large multi-site studies and routine clinical environments.

In contrast, the proposed framework employs optimized 2D CNN architectures trained on structural MRI data, resulting in moderate computational complexity while maintaining competitive classification performance (see Table 7). This design choice reduces memory and processing requirements relative to 3D and multimodal models, enabling more efficient training and inference without compromising robustness. Such efficiency is particularly relevant for datasets such as ABIDE, where heterogeneity in acquisition protocols and site distribution necessitates scalable modeling solutions.

Another important distinction highlighted in Table 7 is the classification scope addressed by each method. Most existing SOTA studies focus on binary ASD versus typically developing classification, whereas the proposed framework explicitly evaluates gender-based, age-based, and joint age–gender classification. Although this demographic stratification introduces increased task complexity, it provides a more realistic representation of ASD heterogeneity and aligns more closely with clinical observations. Consequently, the proposed approach extends the analytical scope of existing SOTA methods rather than competing solely on binary accuracy metrics.

The use of the Optimized Artificial Bee Colony (OptABC) algorithm further enhances practicality by enabling automated, task-specific hyperparameter optimization. As reflected in Table 7, this strategy allows the model to adapt its capacity to different demographic classification settings without relying on excessively deep or ensemble-based architectures. This balance between adaptability and architectural simplicity contributes to improved scalability and facilitates deployment in resource-constrained environments.
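To make the search procedure concrete, the sketch below implements the plain Artificial Bee Colony scheme (employed, onlooker, and scout phases) on a toy objective. It is not the OptABC variant used in this work, whose modifications are not reproduced here. The objective is a hypothetical stand-in for validation loss, and the two hyperparameters (log learning rate and dropout) and their bounds are assumptions chosen for illustration.

```python
import random

def abc_minimize(objective, bounds, n_sources=10, n_iters=50, limit=10, seed=0):
    """Plain ABC search over a box-bounded space; objective must be non-negative."""
    rng = random.Random(seed)
    dim = len(bounds)

    def random_source():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    sources = [random_source() for _ in range(n_sources)]
    costs = [objective(s) for s in sources]
    trials = [0] * n_sources          # consecutive failed improvements per source

    def try_improve(i):
        # Perturb one coordinate of source i relative to a different random source.
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.randrange(dim)
        cand = sources[i][:]
        cand[d] += rng.uniform(-1.0, 1.0) * (sources[i][d] - sources[k][d])
        lo, hi = bounds[d]
        cand[d] = min(max(cand[d], lo), hi)
        cost = objective(cand)
        if cost < costs[i]:           # greedy selection
            sources[i], costs[i], trials[i] = cand, cost, 0
        else:
            trials[i] += 1

    best_i = min(range(n_sources), key=costs.__getitem__)
    best, best_cost = sources[best_i][:], costs[best_i]

    for _ in range(n_iters):
        for i in range(n_sources):    # employed bees: one trial per source
            try_improve(i)
        # Onlooker bees: sample sources in proportion to fitness 1 / (1 + cost).
        weights = [1.0 / (1.0 + c) for c in costs]
        for i in rng.choices(range(n_sources), weights=weights, k=n_sources):
            try_improve(i)
        for i in range(n_sources):    # scout bees: abandon stagnant sources
            if trials[i] > limit:
                sources[i] = random_source()
                costs[i] = objective(sources[i])
                trials[i] = 0
        best_i = min(range(n_sources), key=costs.__getitem__)
        if costs[best_i] < best_cost:
            best, best_cost = sources[best_i][:], costs[best_i]
    return best, best_cost

def toy_validation_loss(h):
    """Hypothetical surrogate with its minimum at log10(lr) = -3, dropout = 0.3."""
    log_lr, dropout = h
    return (log_lr + 3.0) ** 2 + (dropout - 0.3) ** 2

BOUNDS = [(-5.0, -1.0), (0.0, 0.8)]   # assumed search ranges
best, best_cost = abc_minimize(toy_validation_loss, BOUNDS, seed=1)
```

In a real setting each call to `objective` would train and validate a CNN, so the product of `n_sources` and `n_iters` sets the computational budget of the search.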

Overall, the comparison presented in Table 7 demonstrates that the contribution of the proposed framework lies in its balanced trade-off between performance, complexity, and scope. By combining demographic-aware modeling with moderate computational demands and automated optimization, the proposed method complements existing SOTA approaches and offers a practical pathway toward scalable ASD classification using structural MRI.

While Table 6 focuses on quantitative performance relative to recent SOTA studies, Table 7 provides a complementary comparison emphasizing model scope, computational complexity, and practical applicability.

Table 7 Comparative analysis of the proposed framework and selected SOTA methods in terms of classification scope, model architecture, computational complexity, interpretability, and clinical scalability. The table emphasizes trade-offs between performance and practical deployability.

While several SOTA methods achieve strong accuracy, many rely on computationally expensive architectures such as volumetric 3D CNNs, ensemble models, or multimodal pipelines that limit scalability and clinical feasibility. In contrast, the proposed framework achieves competitive performance using optimized 2D CNN architectures, offering a favorable balance between accuracy, computational cost, and deployability.

Extended performance analysis and model behavior

Beyond aggregate accuracy metrics, a deeper analysis is required to understand model learning behavior, robustness, and error characteristics, particularly for demographic-aware and multi-class ASD classification tasks. To this end, we extend the evaluation by examining training dynamics and class-level prediction behavior, providing insights that cannot be inferred from bar-chart-based performance summaries alone.

Learning dynamics and convergence behavior

To analyze optimization stability and generalization behavior, Fig. 8 illustrates the training and validation loss curves for a representative fold of the demographic-aware CNN model. The smooth and monotonic decrease in training loss, coupled with the close alignment between training and validation curves, indicates stable convergence and effective regularization throughout the learning process. Importantly, the absence of divergence or oscillatory behavior suggests that the OptABC-based hyperparameter optimization contributes to balanced learning and mitigates overfitting.

The limited gap between training and validation loss further reflects consistent generalization across unseen samples, which is particularly significant given the heterogeneous, multi-site nature of the ABIDE dataset. Such convergence analysis has been emphasized in recent medical imaging studies as an essential component of rigorous deep learning evaluation, especially when demographic stratification increases task complexity.
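The qualitative reading of the loss curves can be backed by a few simple diagnostics. The following sketch uses hypothetical per-epoch losses, not the curves in Fig. 8, and the function and key names are illustrative.

```python
def convergence_summary(train_loss, val_loss):
    """Summarize monotonicity and the train/validation gap of two loss curves."""
    gaps = [v - t for t, v in zip(train_loss, val_loss)]
    monotone = all(b <= a for a, b in zip(train_loss, train_loss[1:]))
    best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)
    return {
        "train_monotone": monotone,          # training loss never increases
        "final_gap": gaps[-1],               # generalization gap at the last epoch
        "max_gap": max(gaps),
        "best_val_epoch": best_epoch,        # 0-indexed epoch of lowest validation loss
        # A positive value means validation loss rose after its minimum (overfitting sign).
        "val_rise_after_best": val_loss[-1] - val_loss[best_epoch],
    }

# Hypothetical loss values for six epochs.
train = [1.20, 0.80, 0.55, 0.40, 0.32, 0.28]
val = [1.25, 0.88, 0.63, 0.49, 0.42, 0.39]
summary = convergence_summary(train, val)
```

Reporting such numbers for every fold, rather than a single representative one, would show whether the behavior in the plotted fold is typical.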

Fig. 8

Training and validation loss curves for the proposed CNN model, illustrating convergence behavior and generalization performance across training epochs.

Class-level performance and error characteristics

While learning curves provide insight into optimization behavior, class-level analysis is essential for understanding how errors are distributed across demographic groups. Figure 9 presents the confusion matrix for the joint age–gender multi-class classification task, reporting raw sample counts for each class.

As shown in Fig. 9, the majority of predictions lie along the diagonal, indicating reliable discrimination across most demographic categories. Misclassifications are primarily concentrated between neighboring age groups, particularly within adjacent developmental stages. This behavior reflects gradual neurodevelopmental transitions rather than random classification errors and highlights the intrinsic difficulty of fine-grained demographic stratification. Notably, cross-gender confusion within the same age group is less frequent, suggesting that age-related structural variation exerts a stronger influence on classification than gender alone.

This structured error pattern is consistent with findings reported in recent demographic-aware ASD neuroimaging studies and supports the interpretability of the proposed framework. Importantly, the confusion matrix reports raw sample counts, and therefore row and column sums are not constrained to equal 100, ensuring transparency in class distribution and prediction behavior.
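The relation between raw counts and normalized rates can be illustrated as follows. The class names and predictions are hypothetical and do not correspond to the demographic groups or counts in Fig. 9.

```python
def confusion_matrix(y_true, y_pred, classes):
    """Raw-count confusion matrix: rows are true classes, columns are predictions."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def row_normalize(matrix):
    """Divide each row by its sum, so the diagonal holds per-class recall."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized

# Hypothetical ordered age groups and predictions.
classes = ["young", "mid", "old"]
y_true = ["young"] * 4 + ["mid"] * 3 + ["old"] * 2
y_pred = ["young", "young", "young", "mid", "mid", "mid", "old", "old", "old"]
counts = confusion_matrix(y_true, y_pred, classes)
rates = row_normalize(counts)
```

With raw counts, each row sum equals the number of test samples in that true class, so class imbalance stays visible. Row normalization trades that information for directly comparable per-class recall. In this toy example both errors fall into the adjacent age group, mirroring the neighboring-class confusion described above.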

Fig. 9

Confusion matrix for the joint age–gender multi-class classification task. Rows represent true class labels and columns represent predicted labels, providing insight into class-wise performance and misclassification patterns.

Interpretation of demographic stratification effects

Taken together, the analyses in Figs. 8 and 9 highlight a fundamental trade-off between classification granularity and predictive difficulty. While demographic-aware stratification increases task complexity, it enables a more realistic and clinically meaningful characterization of ASD heterogeneity. The observed performance trends and error structures indicate that reduced accuracy in the joint age–gender task should be interpreted as a consequence of finer class partitioning rather than model instability or overfitting.

Overall, this extended analysis demonstrates that the proposed framework exhibits stable learning dynamics, predictable error behavior, and interpretable performance trends. These findings strengthen the validity of the proposed approach and confirm that its evaluation extends beyond simple accuracy reporting toward a more comprehensive and scientifically rigorous analysis.

Ablation study and component contribution analysis

To systematically evaluate the contribution of individual components within the proposed framework, we conducted an extensive ablation study in which key modules were selectively removed or simplified while keeping all other experimental conditions unchanged. Unlike limited ablation analyses restricted to a single binary task, this study examines the impact of component removal across gender-based, age-based, and joint age–gender classification, thereby providing a comprehensive assessment under increasing task complexity.

Ablation design and experimental protocol

Starting from the full proposed framework, four core components were ablated independently:

  (i) demographic-aware task stratification,
  (ii) OptABC-based hyperparameter optimization,
  (iii) CED-based structural preprocessing, and
  (iv) data augmentation.

Each ablated configuration modifies only one component at a time, ensuring that observed performance differences can be directly attributed to the removed or altered module. All experiments were conducted using the same five-fold cross-validation protocol and identical data splits to guarantee a fair and controlled comparison.
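The one-component-at-a-time design can be written down as a small configuration generator. The component keys and configuration names below are hypothetical labels for the four modules listed above, not identifiers from the authors' code.

```python
# Hypothetical keys for the four ablated components.
COMPONENTS = ("stratification", "optabc", "ced_preprocessing", "augmentation")

def ablation_configs(components=COMPONENTS):
    """Return the full configuration plus one variant per disabled component."""
    full = {name: True for name in components}
    configs = {"full": full}
    for name in components:
        ablated = dict(full)
        ablated[name] = False        # switch off exactly one component
        configs[f"without_{name}"] = ablated
    return configs

configs = ablation_configs()
```

This yields five configurations (the full framework and four ablations). Because each variant differs from the full framework in a single switch, an accuracy drop can be attributed to that component, although interactions between components are not measured by this design.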

Quantitative impact of component removal

The quantitative results of the ablation study are summarized in Table 8, which reports classification accuracy across gender-based, age-based, and joint age–gender tasks.

Table 8 Ablation study evaluating the contribution of key components of the proposed framework across demographic-aware classification tasks, including gender-based, age-based, and joint age–gender classification. Each row reports classification accuracy after systematically removing or modifying individual components, such as preprocessing, data augmentation, and OptABC-based hyperparameter optimization. The results illustrate the relative impact of each component on overall performance and demonstrate their complementary roles in achieving robust classification.

As shown in Table 8, the full proposed framework consistently achieves the highest performance across all classification settings. The reported values for the non-stratified configuration correspond to baseline binary ASD versus TD classification settings evaluated under the gender-oriented and age-oriented experimental protocols, respectively. This confirms that demographic-aware decomposition enables the model to learn more homogeneous and discriminative representations, which is critical for capturing ASD heterogeneity.

The removal of OptABC-based hyperparameter optimization results in the largest performance degradation, especially for the joint age–gender task. This observation highlights the importance of adaptive hyperparameter tuning when dealing with fine-grained multi-class problems and heterogeneous neuroimaging data. In contrast, fixed hyperparameter configurations limit the model’s ability to balance capacity and generalization across demographic subgroups.

Excluding CED-based preprocessing also leads to a measurable reduction in accuracy, indicating that explicit structural localization supports the learning of discriminative anatomical features from sMRI data. Similarly, removing data augmentation degrades performance and increases sensitivity to class imbalance, reflecting reduced robustness to inter-site and inter-subject variability.

Interpretation and methodological implications

The progressive performance degradation observed across ablated configurations demonstrates that the proposed framework is not a collection of independent heuristics, but rather a tightly integrated system in which each component plays a complementary role. Notably, the relative impact of each component becomes more pronounced as task complexity increases, with the joint age–gender classification representing the most challenging scenario.

These findings are consistent with the theoretical interpretation discussed earlier, where demographic-aware task decomposition reduces intra-class variance, and OptABC-based bi-level optimization enhances generalization under complex decision boundaries. The ablation study therefore provides empirical validation for both the methodological design and the theoretical motivation of the proposed framework.

Overall, the ablation analysis confirms that demographic stratification, adaptive optimization, structural preprocessing, and data augmentation jointly contribute to stable learning, improved robustness, and enhanced classification performance, thereby reinforcing the validity and necessity of the proposed design choices.

Mathematical perspective and theoretical interpretation

Although the primary contribution of this work lies in system design and empirical evaluation, the proposed framework can be formally interpreted within a mathematical modeling perspective. Specifically, the ASD classification problem addressed in this study can be viewed as a multi-class probabilistic learning problem defined over high-dimensional anatomical feature spaces derived from structural MRI data.

Let \(\mathcal{X}\subset\mathbb{R}^{d}\) denote the space of structural brain representations extracted implicitly by the convolutional neural networks, and let \(\mathcal{Y}=\{1,\dots,K\}\) represent the set of demographic-aware class labels. The learning objective of each CNN model is to approximate a conditional probability mapping \(f_{\theta}:\mathcal{X}\to\mathcal{P}(\mathcal{Y})\), parameterized by \(\theta\), that minimizes empirical risk under cross-entropy loss. While this formulation is standard in supervised learning, its application to demographic-stratified neuroimaging classification introduces additional structural complexity due to heterogeneous class distributions and overlapping anatomical patterns.
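Written out, the empirical risk minimization described above takes the following standard form. The number of training samples \(N\), the sample notation \((x_i, y_i)\), and the notation \([\cdot]_{y}\) for the predicted probability of class \(y\) are introduced here for illustration and do not appear in the original formulation.

\[
\hat{\theta} = \arg\min_{\theta}\; \frac{1}{N}\sum_{i=1}^{N} \ell\big(f_{\theta}(x_i), y_i\big),
\qquad
\ell\big(f_{\theta}(x), y\big) = -\log\,\big[f_{\theta}(x)\big]_{y}
\]

Under class imbalance, the average is dominated by the majority classes, which is one reason the per-class metrics are reported alongside accuracy.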

From an optimization standpoint, the use of the Optimized Artificial Bee Colony (OptABC) algorithm introduces a metaheuristic search layer over the hyperparameter space. Unlike gradient-based optimization, OptABC performs population-based exploration over discrete and continuous hyperparameter domains, enabling adaptive control of model capacity. This can be interpreted as a bi-level optimization process, where the outer loop searches for optimal hyperparameters, and the inner loop performs gradient-based learning of network weights. Such bi-level formulations have been increasingly adopted in complex deep learning systems to balance expressiveness and generalization [54,55].
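One common way to write this bi-level structure is given below. The symbols are assumptions introduced for illustration: \(\lambda\) denotes a hyperparameter configuration, \(\Lambda\) the search domain, and \(\mathcal{L}_{\text{train}}\) and \(\mathcal{L}_{\text{val}}\) the training and validation losses. The use of validation loss as the outer objective is also an assumption, since the fitness function of OptABC is not specified in this section.

\[
\lambda^{*} = \arg\min_{\lambda \in \Lambda}\; \mathcal{L}_{\text{val}}\big(\theta^{*}(\lambda), \lambda\big)
\quad \text{subject to} \quad
\theta^{*}(\lambda) = \arg\min_{\theta}\; \mathcal{L}_{\text{train}}(\theta, \lambda)
\]

The inner problem is solved approximately by gradient-based training for each candidate \(\lambda\), so every outer evaluation costs one full training run.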

Importantly, the demographic-aware modeling strategy implicitly introduces a structured partitioning of the label space, transforming a single binary decision boundary into multiple demographic-specific decision manifolds. This decomposition reduces intra-class variance within each subtask while increasing inter-class separability, particularly for age-based stratification. The observed performance differences across gender-based, age-based, and joint age–gender tasks can therefore be interpreted as a function of the geometric complexity of class manifolds in the learned feature space. Similar interpretations have been reported in recent neuroimaging studies addressing multi-task and stratified learning problems [32,33].
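The variance-reduction argument can be related to the law of total variance. For a scalar feature \(Z\) and a demographic grouping variable \(G\) (both symbols are introduced here for illustration):

\[
\operatorname{Var}(Z) = \mathbb{E}\big[\operatorname{Var}(Z \mid G)\big] + \operatorname{Var}\big(\mathbb{E}[Z \mid G]\big)
\]

Since the second term is non-negative, the average within-group variance cannot exceed the pooled variance, and it is strictly smaller whenever group means differ. This holds for the average over groups and for a single feature; it does not by itself guarantee improved separability between ASD and TD within every subgroup.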

The extended performance analysis presented in Figs. 8 and 9 further supports this interpretation. Smooth convergence behavior reflects stable optimization dynamics, while structured misclassification patterns indicate gradual transitions between neighboring demographic classes rather than random errors. These findings suggest that the learned representations preserve meaningful anatomical continuity, which is consistent with known neurodevelopmental processes.

Overall, while the proposed framework does not introduce a new mathematical theory, it contributes a mathematically grounded formulation of demographic-aware ASD classification, combining probabilistic learning, bi-level optimization, and structured label decomposition. This perspective elevates the discussion beyond empirical observation and provides a theoretical basis for interpreting the observed performance trends, limitations, and future extensions.

Benefits and research contribution of the proposed framework

The primary contribution of the proposed framework is its demographic-aware design, which moves beyond conventional binary ASD classification to explicitly model age- and gender-related heterogeneity. This approach reflects known clinical variability in ASD presentation and provides a richer analytical perspective on subgroup-specific neuroanatomical patterns. Combined with automated optimization via OptABC and moderate computational demands, the proposed framework emphasizes balanced performance, scalability, and clinical relevance rather than accuracy alone.

Discussion on fairness and limitations of SOTA comparison

Direct quantitative comparison with SOTA studies is inherently constrained by differences in imaging modality, cohort selection, preprocessing pipelines, and validation protocols. Accordingly, the presented comparison aims to contextualize the proposed framework within the broader research landscape rather than claim absolute superiority. Future work will focus on standardized benchmarking and evaluation across independent cohorts to further strengthen comparative conclusions.

Limitations

Although the results presented in Table 3 and Fig. 5 demonstrate strong performance, several limitations remain. First, the experiments were conducted using a single public dataset, and although ABIDE is multi-site and diverse, external validation on independent cohorts is necessary to confirm generalizability. Accordingly, the quantitative performance results reported in the present study should be interpreted as evidence of within-ABIDE generalization under the adopted experimental protocol rather than as confirmation of cross-cohort or cross-protocol generalizability. Second, the study focuses exclusively on structural MRI data; incorporating additional modalities such as functional MRI or diffusion imaging may further enhance diagnostic performance. Finally, while the proposed framework demonstrates strong classification accuracy, it has not yet been evaluated in real clinical workflows, where factors such as acquisition time, interpretability, and clinician interaction play critical roles.

Although the current study uses conservative geometric and noise-based augmentation suitable for sMRI classification, future work will explore more advanced augmentation strategies (e.g., intensity-based normalization augmentation or anatomically constrained transformations) and will evaluate their impact across sites and demographic strata. In addition, the proposed preprocessing strategy relies on Canny Edge Detection (CED) as a lightweight structural enhancement approach, which differs from widely adopted anatomical preprocessing frameworks such as FreeSurfer, SPM, and FSL. Although the ablation analysis suggests that CED contributes positively within the proposed framework, the absence of direct comparison with standardized sMRI preprocessing pipelines may limit cross-study comparability, and the reported contribution of CED should therefore be interpreted cautiously. Future studies will also investigate integrating standardized anatomical preprocessing tools with task-specific structural enhancement to further assess potential gains in robustness and reproducibility.


