This section presents the experimental results of evaluating HCTN-LC for lymphoma classification. The results are organized systematically and accompanied by interpretations, visualizations, and a comparative analysis that benchmarks HCTN-LC against current state-of-the-art methods. The section also explores the implications of these findings for future research and for practical use in diagnostic procedures.
Experimental setup
The proposed HCTN-LC model is implemented on a workstation with an Intel Core i9-10900K processor, 64 GB of DDR4 RAM, and an NVIDIA GeForce RTX 3080 GPU with 10 GB of dedicated memory. The software environment is MATLAB 2024a running on Windows, which provides the deep learning and image processing tools used to develop and test HCTN-LC. This configuration supplies the compute and memory needed to train and evaluate the model for lymphoma classification from WSIs.
The hyperparameters of HCTN-LC are selected through iterative validation and empirical assessment. The selection considers the model’s learning dynamics on whole slide images (WSIs), its computational efficiency, and its resistance to overfitting. Initial configurations are informed by performance trends observed on representative data subsets and are then refined through a grid search within practical resource limits. Cross-validation is used to check the stability and generalizability of the selected parameters. The aim is a configuration that balances diagnostic accuracy and computational efficiency. The finalized hyperparameters are summarized in Table 2.
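The grid search with cross-validation described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors’ MATLAB implementation: the grid keys (`learning_rate`, `dropout`) and their values are hypothetical placeholders rather than the settings in Table 2, and `toy_evaluate` is a hypothetical stand-in for training HCTN-LC on the training folds and scoring it on the held-out fold.

```python
import itertools
import math
import random
from statistics import mean


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k roughly equal, disjoint folds
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def grid_search_cv(grid, evaluate, n_samples, k=5):
    """Return the configuration with the highest mean validation score over k folds."""
    keys = sorted(grid)
    best_cfg, best_score = None, -math.inf
    for values in itertools.product(*(grid[key] for key in keys)):
        cfg = dict(zip(keys, values))
        score = mean(evaluate(cfg, tr, va) for tr, va in k_fold_indices(n_samples, k))
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


# Hypothetical stand-in for "train on train_idx, score on val_idx": it ignores
# the data and simply peaks at learning_rate = 1e-4, dropout = 0.3.
def toy_evaluate(cfg, train_idx, val_idx):
    return -abs(math.log10(cfg["learning_rate"]) + 4) - abs(cfg["dropout"] - 0.3)


grid = {"learning_rate": [1e-3, 1e-4, 1e-5], "dropout": [0.1, 0.3, 0.5]}
best_cfg, best_score = grid_search_cv(grid, toy_evaluate, n_samples=100, k=5)
print(best_cfg)
```

Averaging over folds rather than trusting a single split is what gives the stability check mentioned above; the cost grows with the product of the grid sizes times k, which is why the grid is kept within practical resource limits.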
Performance evaluation
The HCTN-LC model, designed for three-class classification of lymphoma from WSIs, is evaluated with a set of quantitative metrics: accuracy, specificity, sensitivity, precision, F1 score, and the Area Under the ROC Curve (AUC). Each metric is derived from the True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN) obtained on the test dataset. For the three-class task, these counts are computed per class in a one-vs-rest manner, treating the class of interest as positive and the other two classes as negative.
Accuracy (Acc), defined in (9), is the proportion of samples correctly classified out of all samples and serves as a primary indicator of the model’s effectiveness in distinguishing between lymphoma types. Specificity (Sp), defined in (10), is the proportion of samples not belonging to a given lymphoma type that the model correctly rejects for that type; it is crucial when the consequences of false positives are significant.
Sensitivity (Sn), also called recall and defined in (11), is the proportion of samples of a given lymphoma type that the model correctly identifies, reflecting how effectively it detects that type. Precision (Pre), defined in (12), is the proportion of samples predicted as a given lymphoma type that truly belong to it; high precision indicates a low rate of false positives.
The F1 score, defined in (13), is the harmonic mean of precision and sensitivity, combining both into a single indicator; values closer to 1 indicate better performance. The AUC measures the model’s ability to distinguish each lymphoma type from the others across all decision thresholds. It ranges from 0 to 1, where 0.5 corresponds to a random classifier and 1 to a perfect classifier.
$$\text{Acc}=\frac{TP+TN}{TP+FP+TN+FN} \tag{9}$$
$$\text{Sp}=\frac{TN}{FP+TN} \tag{10}$$
$$\text{Sn}=\frac{TP}{FN+TP} \tag{11}$$
$$\text{Pre}=\frac{TP}{TP+FP} \tag{12}$$
$$\text{F1}=2\times\frac{\text{Pre}\times\text{Sn}}{\text{Pre}+\text{Sn}} \tag{13}$$
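To make the one-vs-rest use of (9) through (13) concrete, the following Python sketch derives TP, FP, TN, and FN for each class from a three-class confusion matrix and computes per-class and averaged metrics. The confusion matrix is a small hypothetical example, not the model’s results. Two aggregation choices are assumptions: overall Sp, Sn, Pre, and F1 are taken as unweighted (macro) means over classes, and overall accuracy as the fraction of all samples on the diagonal.

```python
def one_vs_rest_counts(cm, k):
    """Return (TP, FP, TN, FN) for class k, treating the other classes as negative.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                          # class-k samples predicted as another class
    fp = sum(cm[i][k] for i in range(n)) - tp     # other-class samples predicted as class k
    tn = total - tp - fn - fp
    return tp, fp, tn, fn


def class_metrics(cm, k):
    tp, fp, tn, fn = one_vs_rest_counts(cm, k)
    sn = tp / (fn + tp)                           # Eq. (11)
    pre = tp / (tp + fp)                          # Eq. (12)
    return {
        "Acc": (tp + tn) / (tp + fp + tn + fn),   # Eq. (9), one-vs-rest
        "Sp": tn / (fp + tn),                     # Eq. (10)
        "Sn": sn,
        "Pre": pre,
        "F1": 2 * pre * sn / (pre + sn),          # Eq. (13)
    }


def summarize(cm, labels):
    per_class = {lab: class_metrics(cm, k) for k, lab in enumerate(labels)}
    overall = {m: sum(c[m] for c in per_class.values()) / len(labels)
               for m in ("Sp", "Sn", "Pre", "F1")}
    # Overall accuracy: fraction of all samples on the diagonal.
    overall["Acc"] = sum(cm[k][k] for k in range(len(cm))) / sum(map(sum, cm))
    return per_class, overall


labels = ["CLL", "FL", "MCL"]
cm = [[48, 1, 1],   # hypothetical counts; rows = true class, columns = predicted
      [2, 47, 1],
      [0, 1, 49]]
per_class, overall = summarize(cm, labels)
for lab, m in per_class.items():
    print(lab, {name: round(v, 4) for name, v in m.items()})
print("overall", {name: round(v, 4) for name, v in overall.items()})
```

Note that with one-vs-rest counting, each misclassification is an FN for its true class and an FP for its predicted class, so a single error affects the metrics of two classes.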
Table 3a presents the quantitative metrics evaluating the overall and class-wise performances of HCTN-LC.
The metrics are consistently high, indicating the model’s robustness and accuracy in lymphoma classification, as detailed below.
- Overall performance: The model achieves an overall accuracy, sensitivity, precision, and F1 score of approximately 0.9987, with specificity slightly higher at 0.9993. These results indicate a near-perfect ability to assign each case to the correct lymphoma type, with minimal misclassification.
- Class-wise performance:
  - CLL: Accuracy and sensitivity are both 0.9990, precision is slightly lower at 0.9980, and the F1 score is 0.9987. The high sensitivity shows that almost all CLL cases are identified, while the slightly lower precision indicates that a very small number of samples from other lymphoma types were incorrectly predicted as CLL.
  - FL: Accuracy and sensitivity are slightly lower (0.9980), but specificity (0.9995) and precision (0.9990) are higher, with an F1 score of 0.9985. The high specificity indicates that non-FL samples are rarely predicted as FL, and the high precision indicates that FL predictions are almost always correct.
  - MCL: Accuracy, sensitivity, specificity, precision, and F1 score are all equal at 0.9990, indicating consistently strong performance in classifying MCL cases.
These results show that HCTN-LC classifies lymphomas with near-perfect accuracy, specificity, and precision, suggesting potential for significant clinical impact. Its strong performance in distinguishing CLL, FL, and MCL, with a notable edge in specificity, indicates reliability in minimizing false positives, a critical attribute for clinical diagnostics. The uniformly high performance across lymphoma types suggests that the model could aid precise diagnoses, facilitate tailored treatment plans, and potentially reduce the need for invasive procedures. This precision, particularly in differentiating between similar lymphoma types, positions HCTN-LC as a valuable diagnostic tool that could improve patient outcomes through more accurate diagnosis and personalized treatment.
In addition, to validate the robustness of the reported classification metrics and account for variability in the test set, a statistical confidence analysis was performed using bootstrap resampling (n = 1000 iterations). Table 4b presents the 95% confidence intervals (CIs) for the key performance metrics corresponding to the results in Table 3a. The intervals for accuracy, sensitivity, specificity, precision, and F1 score are all narrow and centered close to their respective mean values, indicating minimal variance and confirming the stability and consistency of the model’s performance. These findings indicate that the high accuracy is unlikely to be an artifact of sampling variability in the test set, which further supports the model’s clinical applicability for lymphoma classification from WSIs.
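Percentile bootstrap CIs of the kind reported in Table 4b can be obtained by resampling the test set with replacement and recomputing the metric on each replicate. The sketch below does this for accuracy with 1000 replicates. The percentile method, the fixed random seed, and the toy labels are assumptions for illustration, since the exact bootstrap variant used for Table 4b is not specified here; any other metric function, such as a macro-averaged F1, can be passed in place of `accuracy`.

```python
import random


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_ci(y_true, y_pred, metric=accuracy, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for metric(y_true, y_pred)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample test samples with replacement
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical example: 300 test samples, 6 of them misclassified.
y_true = [k % 3 for k in range(300)]
y_pred = list(y_true)
for i in range(0, 60, 10):
    y_pred[i] = (y_pred[i] + 1) % 3
print(accuracy(y_true, y_pred), bootstrap_ci(y_true, y_pred))
```

The interval width shrinks roughly with the square root of the test-set size, so narrow CIs mainly reflect a large test set and a low error rate; they quantify sampling variability rather than overfitting.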
Further, the confusion matrix illustrated in Fig. 5 gives a visual interpretation of the model’s ability to discern the lymphoma cases. The confusion matrix demonstrates the classification performance of the proposed HCTN-LC model across three lymphoma subtypes: Chronic Lymphocytic Leukemia (CLL), Follicular Lymphoma (FL), and Mantle Cell Lymphoma (MCL). Out of 1,000 samples per class, the model correctly classifies 999 CLL, 998 FL, and 999 MCL samples, showing an exceptionally high degree of accuracy.
Only four misclassifications are observed: 1 CLL sample is misclassified as MCL, 2 FL samples are misclassified as CLL, and 1 MCL sample is misclassified as FL. This low error rate is reflected in the class-wise accuracies (the fraction of each class’s samples classified correctly): 99.9% for CLL, 99.8% for FL, and 99.9% for MCL. Per-class misclassification rates therefore range from only 0.1% to 0.2%, demonstrating the model’s robust performance even when distinguishing between morphologically similar subtypes.
The matrix confirms that HCTN-LC achieves high sensitivity and specificity across all classes, with very few false positives or false negatives. This performance supports the model’s reliability for clinical decision support, where minimizing diagnostic errors and consistently identifying lymphoma subtypes in whole slide image analysis are essential.

Fig. 5. Confusion matrix of HCTN-LC for lymphoma subtype classification.
To assess the generalization capability of the model and check for overfitting, the training and validation loss curves over 100 epochs are plotted in Fig. 6. Both curves converge smoothly and stably, with the validation loss closely following the training loss throughout training. No significant divergence between the two curves is observed, indicating that overfitting was effectively mitigated. This behavior is consistent with the applied regularization strategies, including dropout, an L2 weight penalty, and early stopping. In addition, the use of adaptive learning rates for the convolutional and Transformer pathways contributed to stable training dynamics and improved generalization.

Fig. 6. Training and validation loss curves of the HCTN-LC model.
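Of the regularization strategies mentioned in the loss-curve analysis, early stopping is the one that acts directly on the validation loss. A minimal patience-based rule is sketched below, together with an L2 penalty added to the data loss. The `patience`, `min_delta`, and `weight_decay` values are hypothetical; the settings actually used for HCTN-LC are those in Table 2, and some frameworks scale the L2 term by one half.

```python
class EarlyStopping:
    """Stop training when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.epochs_without_improvement = 0

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch  # checkpoint to restore after stopping
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


def l2_penalized_loss(data_loss, weights, weight_decay=1e-4):
    """Data loss plus an L2 penalty: weight_decay * sum of squared weights."""
    return data_loss + weight_decay * sum(w * w for w in weights)


stopper = EarlyStopping(patience=3)
val_losses = [1.00, 0.80, 0.70, 0.71, 0.72, 0.73, 0.60]
for epoch, loss in enumerate(val_losses):
    if stopper.step(epoch, loss):
        print(f"stop at epoch {epoch}; best epoch {stopper.best_epoch}")
        break
```

Here training stops at epoch 5 after three epochs without improvement, and the weights from epoch 2 would be restored; the later drop to 0.60 is never seen, which is the usual trade-off of a short patience.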
Comparative analysis
This section compares the proposed HCTN-LC model with several state-of-the-art methods for lymphoma classification. The compared models include a traditional hybrid architecture combining handcrafted and deep features (Hybrid DL + Hand-Crafted + XGBoost33), a compact CNN-based model (Reduced FireNet40), and recent hybrid deep learning frameworks, namely BCCHI-HCNN46, EffNetV2-ViT47, and CNN-LSTM48. Table 5 reports class-wise and overall results for each model in terms of accuracy (Acc), sensitivity (Sn), specificity (Sp), precision (Pre), and AUC. The proposed HCTN-LC outperforms the other methods on nearly every metric and subtype.
HCTN-LC achieves the highest overall accuracy of 0.999, with near-perfect class-wise accuracies of 0.999 for CLL and MCL and 0.998 for FL. This performance can be attributed to its hybrid dual-pathway architecture: the SqueezeNet-based Convolutional Pathway (CP) captures fine-grained spatial features, while the Vision Transformer Pathway (TP) models global contextual dependencies. The FFEM merges the two sets of feature maps, allowing the model to learn discriminative representations that are both spatially and contextually rich.
In comparison, the Hybrid DL + Hand-Crafted + XGBoost33 model performs well in most categories, with an overall accuracy of 0.998 and strong class-wise precision and specificity. It benefits from engineered features and ensemble learning, but lacks the adaptive feature integration of HCTN-LC. Similarly, Reduced FireNet40 and BCCHI-HCNN46 perform competitively but show slightly lower sensitivity and AUC, especially for the FL and MCL classes. These models rely solely on convolutional representations, which limits their capacity to model long-range dependencies.
Notably, HCTN-LC delivers the highest overall sensitivity (0.999) and specificity (0.999), highlighting its robustness in both identifying true positives and avoiding false positives. This matters in medical diagnostics, where both underdiagnosis and overdiagnosis can have serious consequences.
In terms of AUC, a strong indicator of a model’s discriminative power, HCTN-LC achieves 0.9991 overall and the top score for every class, including 0.9994 for CLL and 0.9993 for MCL. This supports the model’s ability to capture subtle inter-class differences. Although CNN-LSTM48 and EffNetV2-ViT47 also achieve reasonably high AUC, their lower precision and specificity suggest less consistent predictions and weaker generalization.
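The class-wise AUC values follow the one-vs-rest convention. As a sketch of how such values can be computed from predicted class probabilities, the code below uses the rank (Mann-Whitney) interpretation of the AUC: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The pairwise loop is quadratic and meant for illustration only, and the probability rows are hypothetical.

```python
def binary_auc(scores, is_positive):
    """AUC as P(score of a positive > score of a negative), ties counted as 0.5."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(prob_rows, y_true, n_classes):
    """Per-class one-vs-rest AUCs and their unweighted (macro) mean."""
    per_class = [
        binary_auc([row[k] for row in prob_rows], [y == k for y in y_true])
        for k in range(n_classes)
    ]
    return per_class, sum(per_class) / n_classes


# Hypothetical softmax outputs for six samples (columns: CLL, FL, MCL).
probs = [[0.90, 0.05, 0.05], [0.35, 0.30, 0.35], [0.20, 0.70, 0.10],
         [0.40, 0.50, 0.10], [0.10, 0.20, 0.70], [0.30, 0.10, 0.60]]
labels = [0, 0, 1, 1, 2, 2]
per_class, macro = one_vs_rest_auc(probs, labels, 3)
print(per_class, macro)
```

In this toy example the second CLL sample scores 0.35 for CLL, below one non-CLL sample at 0.40, so the CLL AUC drops to 7/8 while the other two classes remain perfectly separated.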
The relatively lower performance of BCCHI-HCNN is likely due to its exclusive reliance on convolutional layers, which limits its ability to capture the long-range dependencies essential for analyzing whole slide images. Although effective at extracting local features, it lacks a global attention mechanism and dynamic feature fusion, which restricts its ability to distinguish subtle morphological differences among lymphoma subtypes. This architectural limitation leads to reduced generalization and slightly more misclassifications, particularly in challenging FL and MCL cases.
In conclusion, the performance gains of HCTN-LC appear to stem from its integration of complementary features from CNN and Transformer architectures. The results indicate that the model handles the complex and nuanced patterns present in histopathological images of lymphoma subtypes, and its consistently high scores across key metrics reflect an architecture balanced for both accuracy and clinical reliability.
Explainable analysis
In this research, Gradient-weighted Class Activation Mapping (Grad-CAM)53 is used to provide an explainable analysis of HCTN-LC. Grad-CAM heatmaps offer insight into the model’s decision-making process, allowing researchers and clinicians to see which regions of the WSIs most influence the classification of each lymphoma type. They identify the discriminative regions that the model treats as indicative of CLL, FL, and MCL, adding a layer of interpretability. This transparency is crucial for establishing trust in the model and for its potential adoption in clinical practice, where understanding the rationale behind a diagnosis is as important as its accuracy. Table 6 shows Grad-CAM heatmaps that visually validate the classifications of HCTN-LC by revealing the key areas within the WSIs that guide its decisions. These heatmaps link the model’s focus to recognizable histological patterns of each lymphoma type, as described below.
- CLL: The original image appears to show a uniform distribution of small lymphocytes with clumped chromatin, a typical feature of CLL. The heatmap highlights these regions intensely, indicating that the model focuses on patterns characteristic of CLL to make its classification.
- FL: The WSI for FL appears to display a disrupted nodular architecture interspersed with normal tissue, which is suggestive of FL. The heatmap spotlights the nodular areas, supporting the classification and implying that the model relies on these architectural disruptions to distinguish FL.
- MCL: The image for MCL shows a diffuse pattern of medium-sized lymphocytes, which is typical of MCL. The heatmap suggests a concentrated focus on regions where these characteristic cell patterns are most prominent, supporting the MCL classification.
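The heatmaps in Table 6 follow the Grad-CAM recipe: the gradients of the class score with respect to a convolutional feature map are global-average-pooled to give one weight per channel, the channels are combined with those weights, and a ReLU keeps only regions with a positive influence on the class. The framework-free sketch below applies that recipe to activations and gradients supplied as nested lists. Obtaining them from HCTN-LC (which layer to use, and how the map is upsampled and overlaid on the WSI) is outside its scope, and the tiny arrays are hypothetical.

```python
def grad_cam(activations, gradients):
    """Grad-CAM map from activations A[k][i][j] and gradients dy_c/dA[k][i][j].

    Returns an H x W map normalized to [0, 1].
    """
    n_channels = len(activations)
    h, w = len(activations[0]), len(activations[0][0])
    # Channel weights: global average of the gradients over spatial positions.
    alphas = [sum(map(sum, gradients[k])) / (h * w) for k in range(n_channels)]
    cam = [[max(0.0, sum(alphas[k] * activations[k][i][j] for k in range(n_channels)))
            for j in range(w)] for i in range(h)]  # weighted sum followed by ReLU
    peak = max(map(max, cam))
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


# Two 2x2 channels; channel 0 has positive gradients, channel 1 negative ones.
A = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [0.0, 0.0]]]
dY = [[[0.5, 0.5], [0.5, 0.5]], [[-1.0, -1.0], [-1.0, -1.0]]]
print(grad_cam(A, dY))
```

The strongly activated position in channel 1 is suppressed because that channel’s gradients are negative, which is how Grad-CAM separates evidence for a class from evidence against it.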
Along the same lines, Table 7 presents the misclassifications made by HCTN-LC, with heatmaps pinpointing the regions that drove each decision. In each case, the heatmap reveals a misdirected focus on certain tissue characteristics, shedding light on the likely reasons for the error.
- CLL misclassified as MCL (confidence score 0.7852): the heatmap shows concentrated activation, likely because the model emphasized areas typically associated with MCL features.
- FL misclassified as CLL (confidence score 0.7734): the heatmap highlights extensive areas outside the characteristic nodules, suggesting a misinterpretation of FL’s distinctive nodular structure.
- MCL misclassified as FL (confidence score 0.7916): the heatmap focuses erroneously on cluster-like formations, indicating confusion between the diffuse pattern of MCL and the nodular pattern of FL.
Together, Tables 6 and 7 show both the strengths and the weaknesses of HCTN-LC as revealed by the heatmaps. The heatmaps in Table 6 reflect a precise focus on disease-defining features of CLL, FL, and MCL, whereas those in Table 7 show misplaced emphasis on non-definitive areas despite relatively high confidence scores, revealing the model’s vulnerability to histologically similar patterns. This contrast underlines the need to strengthen the model’s discriminative learning, particularly for complex cases, to improve its interpretability and reliability in clinical diagnostics.
Ablation study
The ablation study on HCTN-LC systematically evaluates how static and dynamic (adaptive) learning rate strategies affect key performance indicators such as accuracy, training efficiency, and convergence speed. A baseline configuration with adaptive learning rates for both the Convolutional Pathway (CP) and the Transformer Pathway (TP) is compared against variants in which one or both pathways use static learning rates, with the aim of identifying the best learning rate strategy for this application.
The study aims to improve HCTN-LC by identifying learning rate strategies that adapt to training progress and thereby yield higher accuracy and faster convergence. It also probes the learning dynamics of hybrid architectures, clarifying how learning rates should be applied to convolutional and Transformer components, which is useful for developing more refined adaptive mechanisms. Assessing training efficiency and speed helps identify configurations that reduce computational demands and shorten training, a practical advantage when processing large-scale medical images. Although focused on lymphoma classification, the findings may extend to other medical imaging and neural network applications. The results of the ablation study are presented in Table 8.
The results demonstrate the effectiveness of using dynamic learning rates for both the CP and TP, with clear improvements in accuracy, training efficiency, and convergence speed. With dynamic learning rates for both pathways, the model reaches an accuracy of 99.0% and converges within 100 epochs in 12 min. In contrast, configurations with static learning rates, or a mix of static and dynamic rates for CP and TP, perform worse. This indicates that, for HCTN-LC, adaptive learning rates optimize training more effectively, improving both model performance and computational efficiency on this complex medical image analysis task.
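The ablation contrasts static learning rates with learning rates that adapt to training progress, set separately for the CP and the TP. The specific adaptation rule is not restated in this section, so the sketch below assumes a common choice, reduce-on-plateau, in which each pathway’s rate is multiplied by a factor when the monitored loss stops improving. The class name `PlateauLR`, the initial rates, the factor, and the patience are all hypothetical. A static configuration corresponds to never calling `step` for that pathway.

```python
class PlateauLR:
    """Reduce a learning rate by `factor` after `patience` epochs without improvement."""

    def __init__(self, lr, factor=0.5, patience=3, min_lr=1e-6):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = float("inf")
        self.wait = 0

    def step(self, loss):
        if loss < self.best:
            self.best, self.wait = loss, 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0  # restart the patience window after each reduction
        return self.lr


# Hypothetical per-pathway schedules with separate initial rates for CP and TP.
schedulers = {"CP": PlateauLR(1e-3), "TP": PlateauLR(1e-4)}
for loss in [1.0, 0.9, 0.95, 0.96, 0.97]:
    rates = {name: s.step(loss) for name, s in schedulers.items()}
print(rates)
```

Keeping one scheduler per pathway lets the convolutional and Transformer parameters start from different rates and, if they were fed pathway-specific signals, decay on different timetables; here both monitor the same loss for simplicity.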
Computational cost and efficiency analysis
Beyond diagnostic performance, computational efficiency is a critical factor in deploying deep learning models in real-world clinical workflows, especially in resource-limited settings. HCTN-LC is designed not only for accuracy but also for low-latency, low-resource inference, making it suitable for real-time diagnostics and edge computing scenarios. Table 9 compares HCTN-LC with several benchmark and recent hybrid deep learning models. The Hybrid DL + XGBoost model33, while effective, has a high parameter count and computational overhead because it combines handcrafted and deep features, making it better suited to desktop-level systems. The Reduced FireNet model40, a streamlined variant of conventional CNNs, has a moderate computational load with 20.1 million parameters, 7.1 GFLOPs, and a 62 ms inference time, placing it between the Hybrid DL + XGBoost33 and BCCHI-HCNN46 models in terms of efficiency. The BCCHI-HCNN model46, which uses a more complex multi-stream CNN architecture, incurs a greater computational cost and inference latency.
The proposed HCTN-LC achieves a balanced trade-off between architectural depth and efficiency by integrating SqueezeNet for lightweight convolutional processing and a ViT with a tuned configuration for global contextual analysis. It maintains a low parameter count of 15.8 million, with 4.2 GFLOPs, and an average inference time of 42 ms per 512 × 512 WSI tile, as measured on an NVIDIA GTX 1060 GPU.
This comparison confirms that HCTN-LC maintains a competitive edge not only in classification performance but also in its computational profile, making it highly suitable for real-time clinical use, including settings with limited hardware capabilities.
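Per-tile latency figures such as the 42 ms reported above are typically obtained by running a few warm-up passes and then timing repeated forward passes. The Python sketch below shows that protocol with `time.perf_counter`. The function `dummy_infer` is a hypothetical placeholder for a forward pass on one 512 × 512 tile, and the warm-up and repeat counts are assumptions. On a GPU, the timed call must also wait for the device to finish, otherwise asynchronous execution makes the measurement too low.

```python
import statistics
import time


def measure_latency_ms(infer, tile, warmup=10, repeats=100):
    """Return (mean, median) wall-clock latency of infer(tile) in milliseconds."""
    for _ in range(warmup):          # warm-up: exclude one-time setup costs
        infer(tile)
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        infer(tile)                  # on a GPU, this call must block until the device finishes
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples), statistics.median(samples)


def dummy_infer(tile):
    """Hypothetical stand-in for a forward pass: a cheap reduction over the tile."""
    return sum(map(sum, tile))


tile = [[0.0] * 512 for _ in range(512)]  # one 512 x 512 single-channel tile
mean_ms, median_ms = measure_latency_ms(dummy_infer, tile, warmup=2, repeats=5)
print(f"mean {mean_ms:.3f} ms, median {median_ms:.3f} ms")
```

At 42 ms per tile, a single GPU processes roughly 24 tiles per second (1000 / 42 ≈ 23.8), so whole-slide latency scales with the number of tiles extracted from each WSI.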
Discussion
Lymphoma, a cancer affecting lymphocytes, requires accurate biopsy analysis for diagnosis. Conventional diagnostic methods have limitations and are prone to inaccuracies, driving the need for AI-driven diagnostics. Research on WSI-based lymphoma diagnosis highlights the difficulty of differentiating lymphoma subtypes because of their similar features, which affects diagnostic precision.
The HCTN-LC model proposed in this research advances lymphoma diagnostic methodology and outperforms existing approaches in the comparative analyses. With an overall accuracy of 99.87%, sensitivity of 99.87%, and specificity of 99.93%, HCTN-LC exceeds contemporary models. For instance, it surpasses the hybrid deep learning approach combining handcrafted features with XGBoost classifiers33, which, despite effectively integrating varied image features, is constrained by manual feature extraction and reliance on traditional classification mechanisms. HCTN-LC also achieves better results than the Reduced FireNet model40, which was designed for efficiency in resource-limited IoMT settings but does not adequately capture the global contextual patterns crucial for accurate lymphoma subtyping. Recent methods such as the multimodal feature fusion network of Huang et al.45, which employs Vision Transformers for PET/CT image analysis, highlight the relevance of multimodal data integration but remain limited by reliance on single-modality analysis for precise tasks. Similarly, the self-attention Transformer pipeline of Zhou et al.44 for refining diffuse large B-cell lymphoma subtyping focuses on a single lymphoma subtype, restricting its broader diagnostic applicability. Furthermore, the AI-driven diagnostic system of Naser et al.42, though effective in diagnosing primary central nervous system lymphoma, lacks validation across multiple lymphoma variants.
In contrast, HCTN-LC integrates a CNN for detailed feature extraction with a ViT for global contextual understanding of the intricate patterns of lymphoma subtypes in WSIs, improving precision in diagnosing and classifying three forms of lymphoma. The use of SqueezeNet as the backbone of the CP further improves efficiency. SqueezeNet’s compact architecture requires far fewer parameters than deeper networks while maintaining high accuracy, making it well suited to detailed feature extraction without extensive computational resources. This choice allows HCTN-LC to process WSIs efficiently and classify lymphoma subtypes accurately, benefiting both the performance and the scalability of the diagnostic system.
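SqueezeNet’s parameter savings come from its Fire module, which first squeezes the input channels with 1 × 1 convolutions and then expands them with a mix of 1 × 1 and 3 × 3 convolutions. The sketch below counts weights and biases for one Fire module and for a plain 3 × 3 convolution with the same input and output channels. The example sizes (96 input channels, 16 squeeze filters, 64 + 64 expand filters) follow the fire2 layer of the original SqueezeNet v1.0 and are used only for illustration; they are not taken from the HCTN-LC configuration.

```python
def conv_params(in_ch, out_ch, kernel):
    """Weights plus biases of a 2-D convolution with a square kernel."""
    return in_ch * out_ch * kernel * kernel + out_ch


def fire_params(in_ch, squeeze, expand1x1, expand3x3):
    """Parameters of a SqueezeNet Fire module (squeeze 1x1, then expand 1x1 and 3x3)."""
    return (conv_params(in_ch, squeeze, 1)
            + conv_params(squeeze, expand1x1, 1)
            + conv_params(squeeze, expand3x3, 3))


fire = fire_params(96, 16, 64, 64)      # fire2 sizes from SqueezeNet v1.0
plain = conv_params(96, 64 + 64, 3)     # plain 3x3 conv with the same in/out channels
print(fire, plain, round(plain / fire, 1))
```

For these sizes the Fire module needs 11,920 parameters versus 110,720 for the plain convolution, roughly 9.3 times fewer, while producing the same 128 output channels. Most of the saving comes from the 3 × 3 filters seeing only 16 squeezed channels instead of 96.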
The FFEM is a critical component of HCTN-LC, merging and amplifying the distinct features identified by the CNN and ViT pathways. Beyond fusing the features, it adjusts the emphasis placed on specific characteristics so that the information most relevant for classification is prioritized. This dynamic fusion is intended to improve on models that may miss subtle diagnostic cues or overemphasize commonalities among subtypes, leading to misdiagnosis.
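The internal design of the FFEM is not restated in this section. As one plausible way to merge pathway features while re-weighting their emphasis, the sketch below concatenates pooled CP and TP feature vectors and rescales each channel with a sigmoid gate computed through a small bottleneck, in the style of squeeze-and-excitation channel attention. This is an assumed stand-in, not the authors’ FFEM; the functions `matvec` and `gated_fusion` are hypothetical names, and the weight matrices, which would be learned in practice, are fixed by hand here.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def gated_fusion(cp_feat, tp_feat, W1, W2):
    """Concatenate pathway features and rescale each channel by a sigmoid gate.

    gate = sigmoid(W2 @ relu(W1 @ z)), output = z * gate (element-wise),
    where z is the concatenation of the CP and TP feature vectors.
    """
    z = list(cp_feat) + list(tp_feat)
    hidden = [max(0.0, h) for h in matvec(W1, z)]          # bottleneck + ReLU
    gate = [1.0 / (1.0 + math.exp(-g)) for g in matvec(W2, hidden)]
    return [zi * gi for zi, gi in zip(z, gate)]


# Hypothetical 2-D CP features and 2-D TP features; bottleneck of size 1.
cp = [1.0, 2.0]
tp = [0.5, -1.0]
W1 = [[0.1, 0.1, 0.1, 0.1]]               # 4 -> 1
W2 = [[2.0], [-2.0], [0.0], [1.0]]        # 1 -> 4 (one gate per channel)
print(gated_fusion(cp, tp, W1, W2))
```

Because every gate depends on the full concatenated vector, evidence from one pathway can raise or lower the weight given to channels from the other, which is the kind of adaptive emphasis described above.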
Empirical evaluation shows not only the high diagnostic precision of HCTN-LC but also its ability to consistently distinguish between lymphoma subtypes with minimal error. Compared with the representative models in33,40,46,47,48, HCTN-LC shows improvements on nearly every key performance indicator. These results highlight the potential of HCTN-LC to improve lymphoma diagnosis. With such high accuracy, the model could play a meaningful role in supporting personalized patient care and better-informed treatment planning, and it represents progress in applying AI to medical diagnostics.
The heatmap-based interpretability analysis shows how HCTN-LC discerns lymphoma subtypes, enhancing its credibility. The heatmaps visualize the model’s focus on key histopathological features, showing that it attends to diagnostically relevant patterns within WSIs. By revealing where the gradient-based activation concentrates, they make the decision process more transparent, which supports confidence in the model’s predictions, promotes trust in AI-driven diagnostics, and can facilitate their adoption in clinical settings.
Despite the advanced diagnostic capabilities of HCTN-LC, two limitations should be noted. First, the study is limited to CLL, FL, and MCL, so the dataset should be expanded to cover a broader array of lymphoma subtypes, such as Burkitt Lymphoma (BL) and Diffuse Large B-cell Lymphoma (DLBCL), along with T-cell lymphomas including Peripheral T-cell Lymphoma (PTCL), Anaplastic Large Cell Lymphoma (ALCL), and Cutaneous T-cell Lymphoma (CTCL). Incorporating these categories would broaden the model’s diagnostic scope and align it more closely with the range of lymphoma presentations encountered in medical practice. Second, the computational requirements of processing WSIs with sophisticated algorithms may hinder deployment in resource-constrained settings. Addressing these limitations involves both diversifying the dataset with a wide spectrum of lymphomas and refining the model to balance computational demand and diagnostic accuracy, thereby broadening its clinical applicability.
The combination of CNNs and ViTs in HCTN-LC offers a foundation for extending its diagnostic capabilities to a broader range of pathologies, including additional lymphomas such as BL and CTCL and other diseases such as melanoma and breast cancer. Retrained on datasets from these conditions, the model could adapt its feature analysis to recognize their distinct histological signatures. Tailoring the FFEM would let it focus on pathology-specific characteristics, while transfer learning would enable quick adaptation. Such extensions could broaden the utility of HCTN-LC across medical domains and contribute to precision medicine.
