Machine learning to predict in-hospital mortality in patients with spontaneous intracerebral hemorrhage in the intensive care unit



Databases and Ethics

The Medical Information Mart for Intensive Care-IV (MIMIC-IV) is an open and freely accessible intensive care database containing comprehensive clinical data of patients admitted to a tertiary care hospital in Boston, MA, USA from 2008 to 2019. The database includes essential patient information, vital signs, laboratory values, treatment details, and survival data. The use of MIMIC-IV data has been approved by the Institutional Review Boards of Beth Israel Deaconess Medical Center (Boston, MA) and the Massachusetts Institute of Technology (MIT, Cambridge, MA). All personal data in the database are de-identified, so the requirement for informed consent was waived. One of the authors (Mao, Baojie) accessed the database and was responsible for data extraction (Certification No. 46148427). In addition, we recruited patients with intracerebral hemorrhage admitted to the ICU at Zhejiang Hospital from December 2018 to February 2023. The study protocol was approved by the Zhejiang Hospital Ethics Review Committee (Review No. 2023 (58K)). All methods and procedures were performed in accordance with the Declaration of Helsinki. All patient data were anonymized, and no patient-identifiable data were recorded throughout the study. Written consent from patients was not required, as the study was purely observational.

Data Extraction and Outcomes

Clinical and laboratory variables were collected within 24 hours after admission to the intensive care unit (ICU). If a variable was measured multiple times, the average value was calculated and used for analysis. A total of 46 variables were included in the data collection process. These included patient characteristics (age, sex), vital signs (respiratory rate, blood pressure, heart rate, oxygen saturation, temperature), laboratory data (routine blood analysis, renal function, coagulation, blood gases), and comorbidities identified from the recorded International Classification of Diseases ICD-9 and ICD-10 codes. The comorbidities considered were hypertension, diabetes, chronic obstructive pulmonary disease (COPD), congestive heart failure, renal disease, liver disease, and malignancy. In addition, information on the use of anticoagulants and vasoactive drugs, surgical status, Glasgow Coma Scale (GCS), Sequential Organ Failure Assessment (SOFA) score, mechanical ventilation, and renal replacement therapy (RRT) was also collected. Because only a small number of patients had missing data, we excluded them from the analysis rather than impute missing values. The primary outcome was all-cause in-hospital mortality.
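The averaging of repeated measurements and the complete-case handling of missing data described above can be sketched as follows. This is a minimal illustration on toy data; the column and ID names are hypothetical and do not reflect the actual MIMIC-IV schema.

```python
import pandas as pd

# Toy stand-in: repeated vital-sign measurements within the first 24 ICU hours
raw = pd.DataFrame({
    "stay_id": [1, 1, 2, 2, 3],
    "heart_rate": [88.0, 92.0, 110.0, 112.0, 75.0],
    "sbp": [120.0, 130.0, 90.0, 95.0, None],   # stay 3 has no SBP recorded
})

# If a variable was measured multiple times, use the average per ICU stay
per_stay = raw.groupby("stay_id").mean()

# Complete-case analysis: drop stays with any missing variable rather than impute
complete = per_stay.dropna()
```

After averaging, stay 1 has a mean heart rate of 90, and stay 3 is dropped because it has no blood-pressure value at all.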

Cohort Selection

  1. Patients must be admitted to the ICU for the first time.

  2. Patients must have a confirmed diagnosis of sICH.

  3. Patients must be aged between 18 and 90 years.

  4. Patients' ICU stay must be at least one day.

  5. Patients must have complete clinical data.

The flow chart of patient recruitment is shown in Figure 1.

Figure 1

Flowchart of patient recruitment and the model development process.

Feature Selection

We applied Lasso regression, a regularization technique, to the preprocessed dataset. Lasso performs feature selection by shrinking the coefficients of less important features to zero, effectively removing them from the model. The Lasso solutions were computed with a coordinate descent algorithm, from which the optimal regularization parameter (λ) was selected. After Lasso regression, variables were ranked by their non-zero coefficients, and the final predictive model contained the top 14 variables with the highest absolute coefficient values.
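A sketch of this selection step, assuming scikit-learn (whose Lasso solver uses coordinate descent) and synthetic data in place of the 46 clinical variables:

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))   # stand-in for the candidate predictors
# Only the first three features drive the outcome in this toy example
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=300)

# Standardize so coefficient magnitudes are comparable across features
Xs = StandardScaler().fit_transform(X)

# LassoCV tunes lambda over a path of values; each fit uses coordinate descent
lasso = LassoCV(cv=5, random_state=0).fit(Xs, y)

# Rank features by |coefficient| and keep the non-zero ones (here, the top 3;
# the paper keeps the top 14)
order = np.argsort(-np.abs(lasso.coef_))
selected = [int(i) for i in order if lasso.coef_[i] != 0][:3]
```

With a binary outcome such as in-hospital mortality, the same idea applies with an L1-penalized logistic regression instead of the linear Lasso shown here.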

Statistical analysis

Normality of distribution was assessed using the Kolmogorov-Smirnov test. Continuous variables were expressed as mean with standard deviation if they followed a normal distribution, or as median with 25th–75th percentiles otherwise. To compare continuous variables, Student's t test or the Mann-Whitney U test was applied as appropriate. Categorical variables were expressed as counts and percentages, and the chi-square test was used to compare distributions.
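For illustration, this test-selection logic can be written with SciPy. The data below are hypothetical; the normality check standardizes each sample before a one-sample Kolmogorov-Smirnov test against the standard normal.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
survivors = rng.normal(70, 10, 120)       # hypothetical vital sign in survivors
non_survivors = rng.normal(80, 10, 40)    # and in non-survivors

def is_normal(x, alpha=0.05):
    # One-sample Kolmogorov-Smirnov test against the standard normal
    z = (x - x.mean()) / x.std(ddof=1)
    return stats.kstest(z, "norm").pvalue > alpha

if is_normal(survivors) and is_normal(non_survivors):
    stat, p = stats.ttest_ind(survivors, non_survivors)     # normal: Student's t
else:
    stat, p = stats.mannwhitneyu(survivors, non_survivors)  # otherwise: Mann-Whitney U
```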

In this study, five different ML algorithms were employed for model development: logistic regression (LR), K-nearest neighbors (KNN), adaptive boosting (AdaBoost), random forest (RF), and eXtreme Gradient Boosting (XGBoost). The MIMIC-IV dataset was first divided into a training set (70%) and an internal validation set (30%), and the Zhejiang Hospital dataset was used as the external validation set. In the validation process, a bootstrap resampling technique with 1000 iterations was adopted to evaluate model performance, and the area under the curve (AUC) with its 95% confidence interval (CI) was calculated. Several additional evaluation metrics were computed, including accuracy, sensitivity, specificity, Youden index, and F1 score. Model performance was also evaluated by 10-fold cross-validation, averaging the results across folds. A grid search was used for hyperparameter selection.
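The split-and-bootstrap evaluation can be sketched as follows. Synthetic data stands in for the 14 selected predictors and the mortality label, and only two of the five models are shown.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 14 selected predictors and in-hospital mortality
X, y = make_classification(n_samples=600, n_features=14, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)  # 70/30 split

models = {"LR": LogisticRegression(max_iter=1000),
          "RF": RandomForestClassifier(n_estimators=200, random_state=0)}

rng = np.random.default_rng(0)
results = {}
for name, model in models.items():
    prob = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
    boot_aucs = []
    for _ in range(1000):  # bootstrap resampling of the validation set
        idx = rng.integers(0, len(y_te), len(y_te))
        if y_te[idx].min() == y_te[idx].max():
            continue  # skip resamples containing a single class
        boot_aucs.append(roc_auc_score(y_te[idx], prob[idx]))
    lo, hi = np.percentile(boot_aucs, [2.5, 97.5])
    results[name] = (roc_auc_score(y_te, prob), lo, hi)  # AUC with 95% CI
```

Hyperparameter tuning would wrap each model in `sklearn.model_selection.GridSearchCV` with 10-fold cross-validation, which is omitted here for brevity.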

To evaluate the performance and clinical applicability of the predictive models, we generated calibration curves and clinical decision curves. The calibration curves assess how well the predicted probabilities agree with actual observations, while the decision curves estimate the net clinical benefit of the models across a range of decision thresholds to support clinical decision-making. After selecting the optimal model, we used the SHAP package in Python to quantify the importance of each feature. We then developed a web-based visual interface using Streamlit to demonstrate the capabilities of the selected machine learning models. Users can input relevant data parameters or upload datasets for real-time model evaluation; the models process the input data and generate predictions based on the learned patterns.
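As one concrete piece of this evaluation, the quantity plotted on a clinical decision curve is the net benefit at each threshold. The sketch below uses toy labels and predictions; the formula is the standard decision-curve-analysis definition, not taken from the paper.

```python
import numpy as np

def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening on patients with predicted risk >= threshold."""
    y_true = np.asarray(y_true)
    pred = np.asarray(y_prob) >= threshold
    n = len(y_true)
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    # True positives gain; false positives cost, weighted by the threshold odds
    return tp / n - (fp / n) * threshold / (1 - threshold)

y = np.array([0, 0, 1, 1])
prob = np.array([0.1, 0.2, 0.8, 0.9])
nb = net_benefit(y, prob, 0.5)  # perfect separation at this threshold
```

Plotting `net_benefit` over a grid of thresholds, alongside the "treat all" and "treat none" reference lines, produces the decision curve.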

Statistical significance was defined as P < 0.05, and all tests were two-sided. Statistical analyses were performed using R software (version 4.3.1) or Python software (version 3.11).


