Ethics
This study was conducted in accordance with the Declaration of Helsinki and is reported following the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines. Because the registry is maintained in an anonymized form, the Institutional Review Board of Samsung Medical Center waived the need for approval (Samsung Medical Center, 81 Irwon-ro, Gangnam-gu, Seoul, Korea; 2021-06-078; Chairperson Prof. SW Park) on 26 June 2021, and the requirement for written informed consent from participants was also waived. The use of the dataset for external validation was approved by the Institutional Review Board of Ajou University Hospital (World cup-ro, Yeongtong-gu, Suwon, Korea; AJIRB-MED-MDB-21-662; Chairperson Prof. SU Han).
Data curation and research populations
This study used the Samsung Medical Center-Non-Cardiac Operation (SMC-NoCop) registry (cris.nih.go.kr; registration number KCT 0006363; registration date 21/07/2021). The registry is a single-center, anonymized cohort of 203,787 consecutive patients aged 18 years or older who underwent noncardiac surgery at Samsung Medical Center in Seoul, South Korea, between January 2011 and June 2019. It is built from raw data extracted through the clinical data warehouse Darwin-C, an electronic system that allows researchers to search and retrieve anonymized medical records from the institution's electronic archiving system. The system contains electronic hospital records for over 4 million patients, comprising over 900 million laboratory findings and over 200 million prescriptions. For out-of-facility deaths, the system uses data from the Korea Statistics Office's National Census Register.
Data from Ajou University Medical Center were used for external validation. Applying the same enrollment criteria to data curated from January 2011 to October 2021, we included 101,582 patients in the external validation dataset.
Predictors
A total of 54 predictor variables obtained from preoperative evaluation sheets were provided as inputs to each model (Additional File 1: Table S1). Investigators independent of this study organized the relevant preoperative variables, including demographic data, underlying medical conditions, and blood test results. In addition, International Classification of Diseases, 10th Revision (ICD-10) codes were used to organize preoperative diagnoses and to estimate the Charlson Comorbidity Index [12]. Surgical procedure risk was stratified according to the European Society of Cardiology (ESC)/European Society of Anaesthesiology (ESA) guidelines on non-cardiac surgery [13]. The American Society of Anesthesiologists (ASA) physical status classification was assigned by the attending anesthesiologist and extracted from the preoperative evaluation sheets [14].
Study endpoints and definitions
The primary endpoint was postoperative delirium diagnosed by a psychiatrist using the Diagnostic and Statistical Manual of Mental Disorders (DSM) criteria during the first 30 postoperative days. Patients with acute confusion or behavioral changes were assessed using the Confusion Assessment Method (CAM) and referred to psychiatry at the discretion of the attending physician. The CAM is based on four features of delirium: (1) acute onset and fluctuating course, (2) inattention, (3) disorganized thinking, and (4) altered level of consciousness. The CAM classifies a patient as delirious when acute onset with a fluctuating course and inattention are both present, together with either disorganized thinking or an altered level of consciousness. For referred patients, the attending physician used the DSM criteria to assess the patient for delirium. To restrict the outcome to newly diagnosed delirium, we excluded patients with a preoperative history of delirium or dementia.
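The CAM decision rule described above can be written as a short boolean function. The following is a minimal Python sketch for illustration only; the class and field names are hypothetical, and it encodes only the screening rule, not the psychiatric DSM-based diagnosis that defined the study endpoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CamAssessment:
    """The four CAM features, each recorded as present (True) or absent (False).

    Field names are illustrative, not taken from the registry.
    """
    acute_onset_fluctuating: bool  # feature 1
    inattention: bool              # feature 2
    disorganized_thinking: bool    # feature 3
    altered_consciousness: bool    # feature 4


def cam_positive(a: CamAssessment) -> bool:
    """CAM rule: features 1 and 2 are required, plus at least one of 3 or 4."""
    return (
        a.acute_onset_fluctuating
        and a.inattention
        and (a.disorganized_thinking or a.altered_consciousness)
    )


# Features 1, 2 and 4 present: screens positive.
screen_a = cam_positive(CamAssessment(True, True, False, True))
# Inattention absent: screens negative even with features 3 and 4 present.
screen_b = cam_positive(CamAssessment(True, False, True, True))
```

In the study, a positive screen led only to a possible referral; the endpoint itself required the DSM-based assessment.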
Model development
We compared the performance of predictive models built with four machine learning algorithms: extreme gradient boosting (XGB), random forest (RF), logistic regression (LR), and naive Bayes (NB). See Additional File 1: Table S2 for details of the machine learning algorithms.
Model evaluation
To evaluate the predictive models, we calculated four indices: accuracy, F1 score, area under the precision-recall curve (AUPRC), and area under the receiver operating characteristic curve (AUROC). Hyperparameters were optimized during model development by grid search with 5-fold cross-validation, using AUROC as the selection criterion. The data were divided into training and test sets by a stratified random split that kept the event rate constant across the two sets. Postoperative delirium was the event in this study; 80% of the data were reserved for building the machine learning models and the remaining 20% for testing them. In addition, we assessed calibration using calibration plots, the calibration slope and intercept, the Spiegelhalter z-statistic, and the Brier score. For the Spiegelhalter z-statistic, P > 0.05 indicates a well-calibrated model [15]. We used the maximum Youden index to select the optimal cutoff value for each predictive model and calculated the corresponding accuracy [16]. We also generated a case-sensitive dataset for internal validation.
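To make the cutoff selection and two of the calibration metrics concrete, the sketch below computes the Youden index J = sensitivity + specificity - 1 over candidate cutoffs, the Brier score (mean squared difference between predicted probability and outcome), and the Spiegelhalter z-statistic with a two-sided normal P-value. It is a plain-Python illustration, not the authors' R code; the function names and the eight toy predictions are invented for the example.

```python
import math


def youden_cutoff(y_true, y_prob):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A patient is classified as positive when probability >= cutoff.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both outcome classes are required")
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(y_prob)):
        tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= cutoff)
        tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < cutoff)
        j = tp / pos + tn / neg - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def spiegelhalter_z(y_true, y_prob):
    """Spiegelhalter z-statistic and its two-sided P-value."""
    num = sum((y - p) * (1 - 2 * p) for y, p in zip(y_true, y_prob))
    var = sum((1 - 2 * p) ** 2 * p * (1 - p) for p in y_prob)
    z = num / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # 2 * (1 - Phi(|z|))
    return z, p_value


# Toy data (hypothetical): 1 = postoperative delirium, 0 = no delirium.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.70, 0.90]

cutoff, j = youden_cutoff(y_true, y_prob)
brier = brier_score(y_true, y_prob)
z, p_value = spiegelhalter_z(y_true, y_prob)
```

On these toy values the cutoff with the largest J is 0.35. Scanning every distinct predicted probability is quadratic in the number of patients, which is acceptable for a sketch but would normally be replaced by a sorted single pass or a library ROC routine on a cohort of this size.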
SHapley Additive exPlanations (SHAP) summary plots were used to demonstrate feature importance. The effect of each feature on postoperative delirium was presented as a SHAP value, which represents variable importance and is derived from marginal distributions and weighted averages with all variables except the one of interest held fixed [17]. The Shapley value is defined as the average marginal contribution of a feature value over all possible coalitions of features. With this definition, the Shapley value for a given feature value can be interpreted as that feature's contribution to the difference between the actual prediction and the average prediction for the entire dataset. The SHAP summary plot sorts features in descending order of their impact on postoperative delirium. Each dot on a feature's row represents one patient, and its horizontal position indicates the strength and direction of the association between that feature and the outcome. Dots to the right of zero (SHAP value > 0) indicate that the patient's value for that variable increases the predicted risk of the outcome.
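The definition of the Shapley value as a weighted average of marginal contributions over feature coalitions can be checked by hand on a small example. The sketch below computes exact Shapley values for a hypothetical three-feature risk model; the model, its coefficients, and the feature names are invented for illustration. As a simplifying assumption, features outside a coalition are set to a fixed baseline, whereas SHAP implementations estimate an expectation over the data.

```python
from itertools import combinations
from math import factorial


def shapley_values(value, n_features):
    """Exact Shapley values for a coalition value function.

    value(S) takes a frozenset of feature indices and returns the model
    output when only the features in S are known.
    """
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = (
                factorial(size) * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def risk_model(age_over_70, low_albumin, emergency):
    """Hypothetical risk score with one interaction term."""
    return (0.05 + 0.20 * age_over_70 + 0.10 * low_albumin
            + 0.10 * age_over_70 * low_albumin + 0.05 * emergency)


patient = (1, 1, 1)   # hypothetical patient with all three risk factors
baseline = (0, 0, 0)  # assumed reference values for unknown features


def value(known):
    x = [patient[j] if j in known else baseline[j] for j in range(3)]
    return risk_model(*x)


phi = shapley_values(value, 3)
gap = risk_model(*patient) - risk_model(*baseline)
```

Here `phi` is approximately `[0.25, 0.15, 0.05]`: the interaction term is shared equally between the two interacting features, and the three values sum to `gap`, the difference between this patient's prediction and the baseline prediction.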
A sub-analysis using the internal validation dataset was conducted to validate the predicted delirium outcome. Patients in this sub-analysis were divided into high-risk and low-risk groups according to the final predictive model. Kaplan-Meier and Cox regression analyses were used to compare the incidence of delirium between the high-risk and low-risk groups.
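For readers unfamiliar with the Kaplan-Meier estimator, the following sketch shows the product-limit calculation for a single risk group. It is a plain-Python illustration rather than the R survival package used in the study, and the six follow-up records are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times: follow-up in days; events: 1 = delirium observed, 0 = censored.
    S(t) is the estimated probability of remaining delirium-free beyond t.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0   # events observed exactly at time t
        n_leaving = 0  # everyone (event or censored) leaving the risk set at t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


# Hypothetical high-risk group followed for up to 30 postoperative days.
times = [2, 3, 3, 5, 30, 30]
events = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
```

With these records the delirium-free probability steps down at days 2, 3, and 5, ending at 4/9. Patients censored at the same time as an event are counted as still at risk at that time, which is the usual convention.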
External validation
To confirm the validity of the model performance, we conducted external validation using a separate dataset from Ajou University Medical Center. The best-performing model, restricted to five selected variables, was validated.
Statistical analysis
Differences between patients with and without postoperative delirium were assessed. Continuous features were presented as mean ± standard deviation or median (interquartile range), and comparisons were made using the t-test or the Mann-Whitney test as appropriate. Categorical features were expressed as numbers and percentages, and differences were assessed using the chi-square test or Fisher's exact test. Survival analyses were performed using the survival package, and P-values for comparing survival rates were obtained using the log-rank test. Analyses were performed using R 4.1.0 (Vienna, Austria; http://www.R-project.org/).
