CarbaDetector: a machine learning model for detecting carbapenemase-producing Enterobacterales from disk diffusion tests



Bacterial strain collection

This study comprised 385 non-duplicate clinical Enterobacterales isolates collected during routine diagnostics at Cologne University Hospital and Oldenburg Clinic from 2012 to 2021. Species identity was determined by MALDI-TOF mass spectrometry and confirmed by whole genome sequencing (WGS). Of the isolates, 238 (61.8%) were carbapenemase producers and 147 (38.2%) were carbapenemase-negative. All isolates were characterized molecularly by WGS on the Illumina platform, as previously described [21]. Briefly, DNA was extracted from pure bacterial cultures using the DNeasy UltraClean Microbial Kit (Qiagen, Hilden, Germany). Whole genome sequencing was performed by Novogene (Beijing, China). Genomic DNA libraries were prepared using the Novogene NGS DNA Library Prep Set with an average insert size of 350 bp, followed by paired-end 150 bp sequencing on the Illumina NovaSeq platform (Illumina, San Diego, CA, USA). The presence or absence of carbapenemase genes was confirmed using ResFinder v4.7.2 [22,23]. The molecular characterization results served as the reference standard for evaluating the performance of the algorithm. Six species accounted for 88.8% of the isolates: Klebsiella pneumoniae, Escherichia coli, Citrobacter freundii, Enterobacter cloacae, Proteus mirabilis and Serratia marcescens. The most frequent carbapenemase group was blaOXA-48-like (46.6%). Detailed characteristics of the isolates and datasets are provided in the Supplementary Information.

Susceptibility test

Susceptibility testing was performed at the Institute of Medical Microbiology and Virology at the University of Oldenburg according to EUCAST standards [20]. Disks containing meropenem, ertapenem, imipenem, meropenem-vaborbactam, ceftazidime-avibactam, ceftolozane-tazobactam, temocillin (Oxoid, Basingstoke, UK), and imipenem-relebactam (Mast Group, Merseyside, UK) were placed on Mueller-Hinton agar (Oxoid, Basingstoke, UK). Zones of inhibition were measured manually.

Evaluation of the performance of the new CA-SFM algorithm and the EUCAST screening process

To baseline our models, we evaluated the CA-SFM and EUCAST screening algorithms for carbapenemase detection on all three datasets, using WGS results as ground truth. To develop a general-purpose algorithm, we used the R packages rpart (4.1.24) and randomForest (4.7-1.2) [24,25] to build a decision tree and a random forest model (i) from species and scaled inhibition zone diameters and (ii) additionally from scaled differences in inhibition zone diameters. To compensate for laboratory-specific differences between measurements, the difference in inhibition zone diameter (rather than only the raw diameter) was included once for each pair of antibiotics. To increase sensitivity, several cutoffs (0.5, 0.6, 0.7, 0.75) for the random forest classification were evaluated, and a final cutoff of 0.6 was chosen. A sample is therefore predicted "negative" only if its predicted probability of being negative exceeds 60%, rather than the default 50%.
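The pairwise-difference features and the shifted cutoff can be illustrated with a minimal Python sketch. It is not the authors' R code: the function names, the feature naming scheme and the example diameters are hypothetical, and the invariance check assumes that a laboratory-specific bias acts as a constant additive offset on all disks.

```python
from itertools import combinations

# The eight antibiotic disks named in the susceptibility testing section.
ANTIBIOTICS = [
    "meropenem", "ertapenem", "imipenem", "meropenem-vaborbactam",
    "ceftazidime-avibactam", "ceftolozane-tazobactam", "temocillin",
    "imipenem-relebactam",
]

def pairwise_differences(diameters):
    """Return one zone-diameter difference (mm) per unordered antibiotic pair."""
    return {
        f"{a}_minus_{b}": diameters[a] - diameters[b]
        for a, b in combinations(ANTIBIOTICS, 2)
    }

def classify(prob_negative, cutoff=0.6):
    """Call 'negative' only if the negative-class probability exceeds the cutoff."""
    return "negative" if prob_negative > cutoff else "positive"

# Hypothetical isolate: these diameters (mm) are made up for illustration.
example = dict(zip(ANTIBIOTICS, [18, 14, 20, 19, 24, 12, 9, 22]))
features = pairwise_differences(example)

# Assumed additive lab bias of +2 mm on every disk: differences are unchanged.
shifted = {name: mm + 2 for name, mm in example.items()}

print(len(features))                                # 28 pairs for 8 disks
print(features["meropenem_minus_ertapenem"])        # 18 - 14 = 4
print(pairwise_differences(shifted) == features)    # True
print(classify(0.55), classify(0.55, cutoff=0.5))   # positive negative
```

With a negative-class probability of 0.55, the isolate is called positive at the 0.6 cutoff but negative at the default 0.5, which is how raising the cutoff trades specificity for sensitivity.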

To estimate model performance, we used nested cross-validation with 10 outer folds and 10 inner folds, implemented with the nestedcv R package [26]. Where possible, sampling was stratified with respect to species and presence of carbapenemase genes. Class weighting was applied to address the imbalanced distribution between carbapenemase-negative and carbapenemase-positive samples.
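The structure of a 10 x 10 nested cross-validation with stratified folds can be sketched in plain Python as follows. This is only an illustration of the index bookkeeping, not the nestedcv implementation: the helper names are hypothetical, stratification here uses carbapenemase status alone (a tuple of species and status could be passed as the label instead), and the inverse-frequency class weights are an assumed scheme, since the text does not state which weighting was used.

```python
import random
from collections import Counter, defaultdict

def stratified_folds(labels, k, seed=0):
    """Assign sample positions to k folds, spreading each label evenly."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0  # continue the round-robin across labels to balance fold sizes
    for label in sorted(by_label):
        members = by_label[label]
        rng.shuffle(members)
        for i, idx in enumerate(members):
            folds[(offset + i) % k].append(idx)
        offset += len(members)
    return folds

def nested_cv_splits(labels, outer_k=10, inner_k=10, seed=0):
    """Yield (outer_train, outer_test, inner_splits) lists of sample indices."""
    outer = stratified_folds(labels, outer_k, seed)
    for i, test_idx in enumerate(outer):
        train_idx = [j for f, fold in enumerate(outer) if f != i for j in fold]
        inner_labels = [labels[j] for j in train_idx]
        inner = stratified_folds(inner_labels, inner_k, seed + 1 + i)
        inner_splits = []
        for v, val_pos in enumerate(inner):
            # Inner folds hold positions within train_idx; map back to indices.
            val = [train_idx[p] for p in val_pos]
            fit = [train_idx[p]
                   for w, fold in enumerate(inner) if w != v for p in fold]
            inner_splits.append((fit, val))
        yield train_idx, test_idx, inner_splits

def class_weights(labels):
    """Inverse-frequency weights (an assumed scheme, not taken from the paper)."""
    counts = Counter(labels)
    return {c: len(labels) / (len(counts) * n) for c, n in counts.items()}

# Class sizes from the strain collection: 238 positive, 147 negative isolates.
labels = ["positive"] * 238 + ["negative"] * 147
splits = list(nested_cv_splits(labels))
weights = class_weights(labels)
print(len(splits), [len(test) for _, test, _ in splits])
print(weights)
```

Hyperparameters would be tuned on the inner splits only, so each outer test fold stays untouched until the tuned model is scored on it.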

After estimating performance on our own dataset (Supplementary Data 1), we trained the final model on the entire dataset using hyperparameter tuning with 10-fold cross-validation and applied it to an external dataset for additional validation.

Validating the algorithm using external datasets

To further validate the trained model and its correct CPE prediction, the resulting model (CarbaDetector) was first used to predict carbapenemase production in a set of 282 Enterobacterales isolates from Switzerland (University of Zurich) with and without carbapenemase production (external dataset A, included in Supplementary Data 2). In this dataset, inhibition zone diameters had been determined for all eight antibiotics used in the algorithm.
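Validation against a reference standard comes down to comparing predicted labels with the WGS result for each isolate. A minimal sketch of that comparison is shown below; the function name and the toy labels are hypothetical and do not reproduce any result from the study.

```python
def sensitivity_specificity(reference, predicted, positive="positive"):
    """Compare predictions with the reference standard (here: WGS result)."""
    tp = fn = tn = fp = 0
    for ref, pred in zip(reference, predicted):
        if ref == positive:
            if pred == positive:
                tp += 1   # carbapenemase producer correctly flagged
            else:
                fn += 1   # carbapenemase producer missed
        elif pred == positive:
            fp += 1       # non-producer wrongly flagged
        else:
            tn += 1       # non-producer correctly cleared
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity

# Toy labels, made up for illustration only.
wgs_reference = ["positive"] * 4 + ["negative"] * 5
model_output = ["positive", "positive", "positive", "negative",
                "negative", "negative", "negative", "negative", "positive"]
sens, spec = sensitivity_specificity(wgs_reference, model_output)
print(sens, spec)  # 0.75 0.8
```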

Prediction of carbapenemase production from an incomplete dataset (one in which not all eight recommended antibiotic disks were used) was then tested on another, previously published dataset containing disk diffusion diameters for 518 Enterobacterales isolates submitted to the French multidrug-resistant Gram-negative reference laboratory for carbapenemase testing (external dataset B, included in Supplementary Data 3 and originally used to evaluate the CA-SFM algorithm [16]). Here, diameters were measured using SIRscan and verified manually. Inhibition zone diameters for ertapenem, meropenem, imipenem, temocillin, and ceftazidime-avibactam were used to impute the missing values for imipenem-relebactam, meropenem-vaborbactam, and ceftolozane-tazobactam based on the dataset, using the missRanger R package [27]. The constructed model was then used to predict the presence or absence of carbapenemase production. Information regarding statistical analysis and app development is provided in the Supplementary Information.
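The idea of filling in unmeasured disks from the five measured ones can be illustrated with a deliberately simpler stand-in. The sketch below uses k-nearest-neighbour averaging, which is not the random-forest-based method of missRanger and is chosen here only because it fits in a few lines; the function name, the reference rows and the query isolate are all hypothetical.

```python
import math

MEASURED = ["ertapenem", "meropenem", "imipenem", "temocillin",
            "ceftazidime-avibactam"]
MISSING = ["imipenem-relebactam", "meropenem-vaborbactam",
           "ceftolozane-tazobactam"]
ALL_DISKS = MEASURED + MISSING

def knn_impute(incomplete, complete_rows, k=3):
    """Fill MISSING disks with the mean of the k most similar complete rows.

    Stand-in technique: similarity is Euclidean distance over MEASURED disks.
    """
    def distance(row):
        return math.sqrt(sum((row[a] - incomplete[a]) ** 2 for a in MEASURED))

    nearest = sorted(complete_rows, key=distance)[:k]
    filled = dict(incomplete)
    for disk in MISSING:
        filled[disk] = sum(row[disk] for row in nearest) / len(nearest)
    return filled

def make_row(*diameters):
    """Build a complete isolate record from eight diameters (mm)."""
    return dict(zip(ALL_DISKS, diameters))

# Made-up complete isolates, standing in for a fully measured dataset.
reference_rows = [
    make_row(6, 10, 12, 8, 22, 14, 20, 10),
    make_row(8, 12, 14, 9, 24, 16, 22, 12),
    make_row(30, 32, 28, 22, 27, 30, 33, 26),
    make_row(31, 30, 29, 21, 26, 29, 31, 25),
]

# Made-up isolate with only the five measured disks.
query = dict(zip(MEASURED, (7, 11, 13, 8, 23)))
filled = knn_impute(query, reference_rows, k=2)
print({disk: filled[disk] for disk in MISSING})
```

The two nearest reference rows are the first two, so each missing diameter is the mean of their values. Any imputed value is an estimate, which is why the source treats prediction on dataset B as a separate test of the incomplete-data scenario.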

Ethics approval

Bacterial strains were isolated during routine diagnostics and anonymized. As no patient data were analyzed, ethical approval was not required for this type of research according to Article 15 of the Professional Code of Physicians.

Reporting summary

For more information on the study design, please see the Nature Portfolio Reporting Summary linked to this article.


