Study design
Two datasets were used in this study. First, a prospective observational study was conducted using ICU patients aged greater than 18 years who were diagnosed with AKI stage 3 during their stay in AZ Groeninge Hospital in Kortrijk, Belgium, between September 2018 and October 2020. AZ Groeninge Hospital’s institutional review board approved the use of de-identified data in this study (AZGS2018070). All patients have been de-identified and the analyses were conducted in accordance with relevant guidelines and regulations. Participants and/or their legal guardians provided informed consent. Exclusion criteria were patients with a baseline estimated Glomerular Filtration Rate (eGFR) < 30 ml/min/1.73 m\(^{2}\) estimated by CKD-EPI18, patients with RRT initiated before admission to the ICU, patients with a kidney transplant, patients with therapy restrictions with the shift to palliative care, and patients who received extracorporeal blood purification techniques for reasons other than AKI. During the period of the patient’s stay in the intensive care unit, data were collected using EHR. SCr and cystatin C (CysC) measurements were taken at the time of admission to the ICU and at the time of diagnosis of AKI (in most cases with a short time lag). The patients were also followed up by the nephrology department three, six, nine, and twelve months following diagnosis of AKI stage 3 in the intensive care unit. The eGFR was measured again during these follow-up visits.
In addition to the observational study, we have used an additional dataset containing patients who were not enrolled in this study and as a result, have not been followed up and have no information regarding outcomes. This dataset includes adult patients (> 18 years) who were diagnosed with severe AKI during their ICU stay at AZ Groeninge Hospital in Kortrijk, Belgium, between January 2016 and September 2018 and between October 2020 and September 2021. The exclusion criteria for this dataset were identical to those used in the observational study.
For external validation of our mortality prediction model, we used data from patients who were admitted to the ICU and suffered severe AKI after the observational study ended (between October 2021 and June 2022).
Acute kidney injury classification
Patients with AKI stage 3, as defined by the KDIGO criteria, have been included in the study. KDIGO defines stage 3 as an increase in SCr up to 3 times from baseline within a 7-day period or urine output (UO) \(< 0.3\) ml/kg/h for \(\ge 24\) hours19. In this study, true baseline SCr was available for patients who had an SCr measurement from an earlier visit (previously to their hospital or ICU admission). When such records were unavailable, baseline SCr was considered the first record of the patient’s hospitalization before ICU admission. All SCr measurements were performed with an Enzymatic method that is traceable to the isotope dilution mass spectrometric method (IDMS), which is the internationally approved reference method for measuring creatinine. In addition, CysC concentrations were measured by Liège University Hospital using a particle-enhanced nephelometric immunoassay on the BNII nephelometer (Siemens Healthcare Diagnostics, Marburg, Germany). The assay was calibrated against the internationally certified reference material ERM-DA471/IFCC for CysC.
The definition of CKD
We defined CKD as eGFR < 60 ml/min/1.73 m\(^{2}\), using the chronic kidney disease epidemiology collaboration (CKD-EPI) corresponding to CKD stage 3 or more according to the KDIGO classification.
Data description and preprocessing
During the ICU stay, various data points were collected, including demographic information (age, gender, weight, height, and BMI), comorbidity data, details of ICU interventions, the severity of illness scores (APACHE 2), disease classification (SAPS II), sequential organ failure assessment score (SOFA), admission diagnosis, events during the ICU stay (respiratory status, fluid balance, etc.), laboratory data, and kidney function-related features (serum creatinine (SCr), cystatin C (CysC), estimated glomerular filtration rate (eGFR), and albumin measurements), the latter of which was specifically collected for CKD prediction purposes. For certain features like SOFA, SAPS II, and fluid balance, which were measured multiple times during the ICU stay, we calculated the average, maximum value, and change over time (represented as the slope of the fitted line). As patients were transferred to different types of ICUs based on their admission diagnosis (including Medical ICUs (MICUs), Surgical ICUs (SICUs), and Trauma Units), this information was also recorded. The recorded comorbidities encompassed various conditions such as arterial hypertension, chronic liver failure, and diabetes. ICU interventions included the length of invasive ventilation (in days), usage of vasopressors, sedatives, antibiotics, and blood transfusions. In total 52 features (full feature set) have been used for the CKD prediction task. For the mortality prediction task, considering that informed consent was not obtained from the patients for the unlabeled data and their kidney function was not assessed in the same way, this dataset only consists of the features collected from the EHR excluding demographics and kidney-related features, resulting in 30 features (reduced feature set) in total (as the kidney-related features were considered crucial for CKD prediction, we did not consider the unlabeled dataset in that prediction task).
Proper timing of measurements is essential for the accurate prediction of CKD, as different features may have varying levels of relevance at different stages of a patient’s hospitalization. Table 1 lists the features used in our analysis and the corresponding timing of measurement. For example, age, gender, and body weight are measured on the first day of hospital admission, while SOFA and SAPS II are measured on the first day of ICU admission and with one measurement per day. Comorbidities are measured on the first day of hospital admission, while ICU interventions are measured on the first day of ICU admission and with three measurements per day. Our careful consideration of feature timing helps ensure the accuracy and usefulness of our predictive model.
Those features that were missing as a result of incomplete records were imputed by using the Multiple Imputation by Chained Equations (MICE) method20 for a maximum of 10 iterations.
Prediction methods
A random forest (RF)21 classifier has been employed to predict CKD after three and six months among patients who developed AKI stage 3 in the intensive care unit. RF is an ensemble learning method that constructs a multitude of decision trees (100 trees in our model) at the time of training and is used for classification, regression, and other tasks. Moreover, feature importance is calculated by weighting the mean decrease in impurity in splits with the given feature by the number of samples in the split22.
For our second task (mortality prediction or survival analysis), we assessed two survival methods, random survival forest (RSF)23 and survival gradient boosting (XGBoost)24. In the XGBoost implementation, there are two different methods for conducting survival analysis, Cox and AFT. In this work, we have used the Cox method. In order to interpret the XGBoost survival model and to show the relative importance of each feature and its effect on the predicting ability, a SHAP (SHapley Additive exPlanations)25 summary plot was performed. SHAP is a game theoretic approach that can intuitively and accurately explain the output of a machine learning model. Based on this survival model, the higher the SHAP value, the greater the likelihood of the progression to mortality.
The survival methods were trained on two scenarios of time-to-event data for mortality prediction. The first scenario consists of training the models on the resulting time-to-event data from the observational study (referred to as labeled data, or Ldata). This Ldata consists of subjects who are either deceased (observed) or alive at the study end or at drop-out (censored). The second scenario consists of adding the unlabeled dataset (referred to as Udata) to the first scenario. Due to the absence of information concerning the status of the mortality event (censored or observed) in Udata, we set the corresponding status and time equal to zero26.
We conducted a 5-fold cross-validation on the CKD prediction datasets in order to estimate the generalization capacity of the models. Results for the mortality prediction task have been evaluated on the external dataset described in “Study design” section .
To better characterize the overall performance of the models, we contrasted the performance of ML algorithms against the conventional statistical methods logistic regression (LR) and Cox proportional hazards model (COXPH)27 for CKD and mortality prediction, respectively. We chose logistic regression as a baseline model for classification due to its simplicity, interpretability, and widespread use in the literature. Logistic regression has been extensively applied in medical research and has been shown to perform well in predicting clinical outcomes. Moreover, it has a simple and interpretable model structure, allowing easy clinical interpretation of the results. On the other hand, Cox regression is a commonly used method for survival analysis, which is a type of modeling that accounts for the time until an event of interest occurs. We chose Cox regression over AFT (accelerated failure time) regression in our study because it does not require specifying the baseline hazard function and can handle time-varying variables. In contrast, AFT regression requires specifying the distribution of the survival times, which can be challenging in practice. Additionally, Cox is more widely used in the literature, particularly in medical research.
The outline of the ML and statistical method workflow is shown for the CKD prediction and survival analysis in Figs. 1 and 2, respectively. Figure 2 represents the two scenarios described earlier.

Study workflow for CKD prediction task. We utilized a population of patients from the observational follow-up data to train ML and statistical models to predict CKD after 3 and 6 months of developing AKI stage 3 in the ICU. 5-fold cross-validation was used to train and test models. Prediction performance was assessed with the AUROC and AUPR.

Study workflow for mortality prediction task (survival analysis). (a) In the first scenario, we utilized a population of patients from the observational follow-up data (Ldata) to train ML and statistical models to predict mortality in patients who developed AKI stage 3 in the ICU. In this scenario, the censoring rate is 57.42%. Prediction performance was assessed using C-index and has been tested on an external test set for each model separately. (b) In the second scenario, we utilized a population of patients from the observational follow-up data plus the unlabeled data (Udata) to have a bigger training set and train ML and statistical models to predict mortality in patients who developed AKI stage 3 in the ICU. In this scenario, the censoring rate is 80.8%. Prediction performance was assessed using C-index and has been tested on an external test set for each model separately.
Statistical analysis
Categorical features were expressed as numbers (proportions) and continuous features as medians with interquartile ranges (IQR). For the CKD prediction task, the predictive performance of the models was compared using the area under the receiver-operating characteristic curve (AUCROC), the area under the precision-recall curve (AUPR), and the net benefit using the decision curve analysis. AUCROC or ROC curves are constructed by plotting the true positive rate (TPR) against the false positive rate (FPR) at a variety of threshold settings. The true-positive rate is also known as sensitivity or recall, and the false-positive rate is known as (1 – specificity). The AUPR or PR curve illustrates the trade-off between Precision and Recall at various thresholds. In the context of decision curve analysis28,29,30, the net benefit metric serves as a means of comparing the costs and benefits of various treatment approaches. Based on a model’s sensitivity and specificity and the prevalence of the outcome in the population, the net benefit is calculated. According to decision curve analysis, the optimal strategy is the model with the highest utility. Through the cross-validation process, we calculated the predictive performance for each test fold and returned the average over the five test folds. For the mortality prediction task, Harrell’s concordance index (C-index)31 which is a common way to evaluate a model in survival analysis, has been used for comparing the predictive performance of the models. All analyses were conducted using Python version 3.9. The XGBoost algorithm was employed for the implementation of the XGBoost model, utilizing the xgboost library32. The Random Forest (RF) algorithm from the widely-used scikit-learn library was utilized33. Additionally, the Random Survival Forest (RSF) and Cox proportional hazards (COXPH) models were implemented using the scikit-survival library, which provides comprehensive survival analysis capabilities34.
Ethics approval and consent to participate
The study was approved by the institutional review board of AZ Groeninge Hospital (AZGS2018070). Informed consent was obtained by the investigators from the patients or their surrogates before they were enrolled in the study.
