Predicting early mortality and severe intraventricular hemorrhage in very low birth weight preterm infants: a nationwide multicenter study using machine learning

This study used a national retrospective database of VLBW preterm infants and their associated variables collected immediately after initial management in the delivery room. Our objective was to develop a predictive model for early death, severe IVH, and early poor outcomes using a machine learning (ML) approach. With this approach, gestational age (GA), birth body weight (BBW), 5-minute Apgar score, and delivery room intubation were identified as the four most important factors for building a predictive model. In particular, the logistic regression model and the neural network model showed superior performance, as indicated by higher values of the area under the receiver operating characteristic curve (AUROC), which reflects good ability to discriminate between outcomes. These models were also well calibrated, meaning that the predicted probabilities closely matched the observed frequencies of outcomes. Furthermore, they were validated across different cohorts within this study, which supports their robustness. Taken together, high AUROC values, good calibration, and successful validation across cohorts make the logistic regression and neural network models reliable predictors of outcome in this study.
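The two performance properties discussed above, discrimination (AUROC) and calibration, can be illustrated with a minimal sketch. It is not the study's evaluation code: the function names `auroc` and `calibration_table`, the choice of five equal-width bins, and the toy labels and probabilities are all assumptions made for illustration.

```python
def auroc(labels, scores):
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def calibration_table(labels, scores, n_bins=5):
    """Group predictions into equal-width probability bins and return one
    (mean predicted probability, observed event rate, count) row per
    non-empty bin. A well-calibrated model has the first two values close."""
    bins = [[] for _ in range(n_bins)]
    for y, s in zip(labels, scores):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        bins[idx].append((y, s))
    rows = []
    for members in bins:
        if members:
            mean_pred = sum(s for _, s in members) / len(members)
            obs_rate = sum(y for y, _ in members) / len(members)
            rows.append((mean_pred, obs_rate, len(members)))
    return rows


# Toy data invented for illustration only (not taken from the study).
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.05, 0.10, 0.15, 0.30, 0.35, 0.40, 0.60, 0.70, 0.75, 0.90]

print("AUROC:", auroc(y_true, y_prob))
for mean_pred, obs_rate, count in calibration_table(y_true, y_prob):
    print(f"predicted {mean_pred:.2f}  observed {obs_rate:.2f}  n={count}")
```

The pairwise definition of AUROC is quadratic in the number of cases, which is acceptable for a sketch; a rank-based formulation would be preferable for a cohort-sized dataset.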

Scoring systems currently available to predict early neonatal mortality include the Clinical Risk Index for Babies (CRIB) II,²³ the Score for Neonatal Acute Physiology with Perinatal Extension II (SNAPPE-II),²⁴ and the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD) calculator,²⁵ all of which relate the condition of newborns to their outcomes. These predictive models have been widely adopted and have received external validation in multiple studies.²⁶

Our study identified GA and BBW as important risk factors, as do CRIB II and the NICHD calculator. A systematic review highlighted the importance of these risk factors for mortality in neonatal intensive care units, with GA and BBW emerging as the most frequently cited risk factors for neonatal mortality.²⁷ Additionally, a study of the Taiwanese population using birth certificate and death registration data established a strong correlation between GA, BBW, and early mortality.²⁸

In 1952, Dr. Virginia Apgar pioneered the development of a scoring system designed to assess the health of newborn infants and determine the need for resuscitative interventions. Her groundbreaking research revealed a significant correlation between neonatal survival from birth to 28 days and the infant's condition at birth.²⁹ Modern research has demonstrated the enduring relevance of the Apgar score, reaffirming its importance even after nearly 50 years.³⁰

Although the Apgar score was originally devised to evaluate term infants in an era of high neonatal mortality among preterm infants, recent studies have shown that, across all GA categories, the relative risk of neonatal mortality rises consistently as Apgar scores decline.³¹ Accordingly, our study included the Apgar score as an important variable for predicting outcome.

In our study, intubation emerged as the most important variable among all initial management steps performed in the delivery room. Notably, empirical studies conducted in South Korea,³² Iran,³³ Thailand,³⁴ and Brazil³⁵ have similarly identified intubation as a critically important risk factor for neonatal outcomes.

In our study, antenatal steroid administration and multiple births did not reach statistical significance as outcome-predicting variables, although both are included in the NICHD calculator. This discrepancy may be due to the high prevalence of antenatal steroid administration in Taiwan. Approximately 70% of the population used to develop the NICHD calculator received antenatal steroids, whereas 85% of the patients in this study received this treatment. These differences between the study populations may have weakened the influence of these variables on our results.

In addition, Boghossian et al.³⁶ reported that the beneficial effect of antenatal steroids on mortality was statistically significant primarily in infants born between 24 and 25 weeks of gestation. This observation suggests that the effectiveness of antenatal steroids in reducing mortality may depend on GA.

As shown in previous studies, multiple births were associated with a significantly increased risk of death, especially for extremely premature infants born before 26 weeks of gestation.³⁷ In our study cohort, the mean GA was 28.7 weeks, which may explain why antenatal steroid administration and multiple births were not significant factors in our analysis.

ML is a subset of artificial intelligence that is widely used in the medical field.³⁸ According to a recent systematic review of ML models deployed to predict neonatal mortality,³⁹ prominent ML algorithms include neural networks, logistic regression, and random forests. Overall, the reviewed papers reported AUC values ranging from 58.3% to 97.0%, with an average above 70%. These findings indicate that ML models can predict neonatal mortality. Compared side by side with other ML-based models, our ML-based predictive model showed comparable AUC values.

In the context of predicting IVH, it is noteworthy that all four variables in our predictor have previously shown strong predictive ability for IVH, with particular emphasis on GA. Additionally, the importance of endotracheal tube ventilation has been highlighted in the literature. Moreover, compared with previous models (AUC 0.67–0.85 for severe IVH prediction), our IVH predictor shows superior performance.⁴⁰

Of note, although the CRIB II, SNAPPE-II, and NICHD predictive models were externally validated in diverse study populations, none of these models incorporated data from the Taiwanese population into their evaluations. Prediction methods rely heavily on epidemiological population data to predict specific outcomes.⁴¹ In addition, the usefulness of a predictive model can be compromised if the data on which it was built are already outdated by the time the model undergoes validation.

To our knowledge, this is the first effort to build an outcome prediction model based on the most recent and comprehensive dataset available in Taiwan. Our model is able to predict early death, severe IVH, and early poor outcomes in VLBW preterm infants immediately after initial management in the delivery room. Notably, this predictive ability was achieved using only four factors, eliminating the need for time-consuming blood sampling. These advantages may facilitate widespread application in Taiwanese populations.

Limitations

This study had several limitations. First, constraints of the available database prevented the collection of detailed clinical data such as blood pressure, oxygen demand, and comprehensive laboratory data, including hemograms, biochemical markers, and blood gas analyses. Including these clinical parameters may improve the predictive performance of the model.²⁶,³⁹ Second, to protect privacy, the Taiwan Neonatal Network database records anonymized information: gestational age was truncated, and birth weight was recorded within a range. These unavoidable limitations may affect collinearity between variables. Third, although our predictive model showed a high degree of accuracy in predicting outcomes, it lacks adaptability over time. As clinical practice evolves, these models may become less predictive. Fourth, differences in practices and procedures between institutions may introduce biases that could not be avoided in our study. Fifth, it is important to recognize that ML models can inadvertently exhibit bias or discriminatory tendencies. Therefore, additional external validation across diverse population groups is needed. This validation should assess whether the model can be applied with equal validity to populations other than the Taiwanese cohort to ensure broader applicability.
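The second limitation, that truncating gestational age and recording birth weight within a range may affect collinearity, can be examined with a minimal sketch. Everything in it is an assumption for illustration: the toy values, the truncation to completed weeks, the 250 g bin width (the bin width used by the database is not stated here), and the helper names `pearson_r` and `vif_two_predictors`. For exactly two predictors, the variance inflation factor reduces to 1 / (1 - r²), where r is their Pearson correlation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(r):
    """Variance inflation factor when a model has exactly two predictors
    whose Pearson correlation is r."""
    return 1.0 / (1.0 - r * r)


# Toy values invented for illustration only (not taken from the study).
ga_weeks = [24.3, 25.6, 26.1, 27.4, 28.9, 29.2, 30.5, 31.8]
bw_grams = [610, 720, 790, 905, 1080, 1150, 1290, 1420]

# Hypothetical anonymization: GA truncated to completed weeks,
# birth weight replaced by the lower bound of an assumed 250 g bin.
ga_truncated = [math.floor(g) for g in ga_weeks]
bw_binned = [250 * (w // 250) for w in bw_grams]

r_raw = pearson_r(ga_weeks, bw_grams)
r_coarse = pearson_r(ga_truncated, bw_binned)

print("raw:       r =", r_raw, " VIF =", vif_two_predictors(r_raw))
print("coarsened: r =", r_coarse, " VIF =", vif_two_predictors(r_coarse))
```

Comparing the two printed rows shows how coarsening shifts the correlation, and therefore the variance inflation, between the two predictors in this toy example; the direction and size of the shift in real data would depend on the actual distributions and bin boundaries.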
