Validation of the Toronto Recurrence Inference Using Machine Learning for Post-transplant Hepatocellular Carcinoma (TRIUMPH) model

We validated the TRIUMPH model, which was developed using a machine learning approach, in an international multicenter external cohort. The model showed numerically superior discrimination compared to the HALT-HCC, MORAL, and AFP models, and greater clinical utility in net benefit analysis. These findings support the use of machine learning to create more accurate risk prediction models.

The strength of the TRIUMPH model comes from two important factors. First, it was developed using a large LT cohort from the University Health Network¹⁵. In contrast, the HALT-HCC, MORAL, and AFP models were based solely on deceased donor LT cohorts. Incorporating a substantial LDLT population into model development supports a more balanced evaluation of LT recipients and improves generalizability. This is reflected in the numerically high performance of TRIUMPH in both the LDLT subgroup and the Kaohsiung Chang Gung Memorial Hospital subgroup. Second, the machine learning approach used in the TRIUMPH model identified more associations and incorporated a wider range of risk factors. These included not only morphological factors such as lesion size and number, but also biomarkers such as AFP level and neutrophil count. The model also accounted for the dynamic changes that occur during bridging therapy on the waitlist. Incorporating multiple aspects of HCC is important for establishing effective selection criteria¹⁶. Compared to the TRIUMPH model, only HALT-HCC⁷ achieved statistically non-inferior performance, consistent with the findings in the original development cohort. Although this difference was not statistically significant, subgroup analyses showed numerical advantages for TRIUMPH, particularly among LDLT recipients and in non-US cohorts. HALT-HCC, developed at a US center (Cleveland Clinic), performed best at the US validation centers (UCSF, UCLA). This highlights the impact of local practices and patient characteristics on model performance. Although both models originated from North American centers, the better performance of TRIUMPH outside North America may derive from its more diverse development cohort and its machine learning approach.

The TRIUMPH model also showed better clinical utility in net benefit decision curve analysis, which balances true positives (withholding liver transplantation from patients who would experience recurrence) against false positives (denying transplantation to potentially curable patients). Across a range of risk thresholds (reflecting the probability of LT based on waitlist status and organ availability), the TRIUMPH model achieved higher net benefit than the other models, particularly at thresholds from 0.0 to 0.6. This range is realistic given the 1-year and overall probabilities of LT reported for HCC patients on the UNOS waitlist (54% is the figure cited)¹⁸,¹⁹,²⁰.
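The trade-off described above can be made concrete with the standard decision curve formula, net benefit = TP/n − (FP/n) × p_t/(1 − p_t), where p_t is the risk threshold. The following is a minimal sketch for a binary outcome, not the analysis code used in the study: the function name, the toy outcomes, and the toy risks are all hypothetical, and a time-to-event analysis would additionally need to handle censoring.

```python
def net_benefit(outcomes, risks, threshold):
    """Net benefit of flagging patients whose predicted risk >= threshold.

    outcomes: 1 if recurrence occurred, 0 otherwise (binary simplification).
    risks: predicted recurrence probabilities, same length as outcomes.
    threshold: risk threshold p_t, with 0 <= p_t < 1.
    """
    if len(outcomes) != len(risks) or not outcomes:
        raise ValueError("outcomes and risks must be non-empty and equal length")
    if not 0.0 <= threshold < 1.0:
        raise ValueError("threshold must be in [0, 1)")
    n = len(outcomes)
    # True positives: flagged patients who did recur.
    tp = sum(1 for y, r in zip(outcomes, risks) if r >= threshold and y == 1)
    # False positives: flagged patients who did not recur.
    fp = sum(1 for y, r in zip(outcomes, risks) if r >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold.
    return tp / n - (fp / n) * (threshold / (1.0 - threshold))


# Hypothetical toy data, for illustration only.
toy_outcomes = [1, 0, 1, 0, 0, 1, 0, 0]
toy_risks = [0.9, 0.2, 0.7, 0.4, 0.1, 0.3, 0.6, 0.05]

model_nb = net_benefit(toy_outcomes, toy_risks, 0.5)
# "Flag everyone" reference strategy: every patient gets risk 1.0.
flag_all_nb = net_benefit(toy_outcomes, [1.0] * len(toy_outcomes), 0.5)
print(model_nb, flag_all_nb)  # 0.125 -0.25
```

Evaluating the function over a grid of thresholds and plotting each model's curve against the "flag everyone" and "flag no one" (net benefit 0) references gives the usual decision curve.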

One example of machine learning implemented in organ allocation is the Optimized Prediction of Mortality (OPOM), which has been evaluated by OPTN/UNOS²¹. OPOM is designed to improve risk stratification in HCC patients with exception points²²,²³, but it predicts only waitlist dropout and ignores important prognostic factors for post-transplant survival. The TRIUMPH model, by contrast, shows strong performance and utility in predicting post-transplant recurrence. TRIUMPH could therefore serve as a complementary tool, allowing this important aspect of post-transplant survival to be integrated into the organ allocation system. Nevertheless, incorporating new models like TRIUMPH into allocation policy remains a goal for the future, requiring extensive validation, logistical planning, and consensus within the transplant community.

Apart from the TRIUMPH model, three other machine learning-based models have been developed to predict HCC recurrence after transplantation: MoRAL-AI²⁴, RELAPSE²⁵, and TRAIN-AI²⁶. MoRAL-AI, which uses a deep neural network, incorporates variables such as tumor diameter, AFP, and PIVKA-II. It demonstrated improved discrimination compared to the conventional MoRAL model, but its generalizability is limited because it focuses on Korean LDLT recipients. The RELAPSE model, which employs random survival forests and classification techniques, achieved a higher C-index than the TRIUMPH model, but this advantage may be attributable to its inclusion of post-transplant variables. Pre-transplant variables are what matter for organ allocation decisions, as they are the only information available before surgery. TRAIN-AI, which was developed in a large international cohort and validated in a small North American cohort, followed the opposite approach to the TRIUMPH model. Whereas TRIUMPH used objective data to capture dynamic changes in tumor lesions during bridging therapy, TRAIN-AI relied on the modified Response Evaluation Criteria in Solid Tumors (mRECIST). This assessment may vary between institutions and radiologists²⁷,²⁸, which may introduce bias. Nevertheless, TRAIN-AI achieved a high C-index of 0.77 in both internal and external validation, outperforming other existing models. The DeepSurv methodology used by TRAIN-AI was also examined in the development cohort. However, the TRIUMPH approach, which augments the traditional Cox model with elastic net regularization, was preferred because of its superior performance. DeepSurv is suitable for large development cohorts such as the one used for TRAIN-AI, but its complexity poses a risk of overfitting in smaller datasets. Given the relatively small sample size, we chose the TRIUMPH approach as the more appropriate way to mitigate this risk.
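The C-index used to compare these models measures how often, among comparable pairs of patients, the patient who recurred earlier was assigned the higher predicted risk. The sketch below implements Harrell's C for right-censored data in plain Python to illustrate the definition. It is not the evaluation code from any of the cited studies, and the function name and toy values are hypothetical.

```python
def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored data.

    times: follow-up time for each patient.
    events: 1 if recurrence was observed, 0 if censored.
    risks: predicted risk scores (higher means earlier expected recurrence).
    """
    if not (len(times) == len(events) == len(risks)):
        raise ValueError("times, events and risks must have equal length")
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is anchored on a patient with an observed event
        for j in range(n):
            # Comparable pair: patient i recurred before patient j's last follow-up.
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data, for illustration only.
toy_times = [5, 10, 12, 20]
toy_events = [1, 0, 1, 0]
toy_scores = [0.8, 0.3, 0.2, 0.6]

c_index = harrell_c_index(toy_times, toy_events, toy_scores)
print(c_index)  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported 0.77 means roughly three of every four comparable pairs were ordered correctly.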

This study has several limitations that affect its findings. First, its retrospective, multicenter design introduces the possibility of selection bias arising from differences in management between centers. Furthermore, the development cohort from Toronto differed significantly from the validation cohort, with a higher prevalence of HBV and a lower proportion of locally advanced HCC. This discrepancy could have negatively affected validation performance and highlights potential regional biases in model performance. Future work incorporating data from diverse international centers during model development will help refine the model and improve its generalizability. Including both living donor and deceased donor grafts in the TRIUMPH model could be regarded as mixing data, but this approach was intentional. Donor-specific models may better capture differences in transplant settings or graft characteristics, whereas the TRIUMPH model is designed to reflect the reality of centers that offer both types of transplantation. This provides a unified tool to guide decision-making among patients eligible for either pathway. Notably, there is no evidence so far that graft quality directly affects oncological outcomes in adjusted analyses. Given that models specific to LDLT or DDLT already exist, the strength of TRIUMPH lies in performance validated across a mixed donor cohort. This reflects real-world practice and enhances generalizability across diverse transplant programs. Finally, other frequently used models such as Metroticket 2.0²⁹ (based on competing risk analysis) and RETREAT¹⁴ (incorporating pathological predictors) could have been included as comparators, but their inclusion was not feasible because the models' requirements could not be met.

In conclusion, the TRIUMPH model outperforms other commonly used scores in predicting HCC recurrence after transplantation, providing both higher accuracy and greater clinical utility. This suggests that integrating the TRIUMPH model into future organ allocation strategies for HCC patients could enhance the overall benefit of liver transplantation. Our research highlights the potential of machine learning approaches to advance organ allocation in transplant medicine. However, despite technological advances, it is essential to develop robust machine learning models with large, diverse cohorts to ensure generalizability and avoid overfitting. This requires continuous collaboration within the international transplant community and a commitment to incorporating machine learning innovations into transplant practice.
