Study design and participants
ALD was diagnosed on the results of alcohol history, liver biopsy, blood chemistry, or imaging study (ultrasound or computed tomography scan). The ALD group was subgrouped by control, ELE, cirrhosis, and liver cancer. Alcoholic ELE patients were defined if they had abnormal liver enzymes [aspartate aminotransferase (AST) ≥ 50 IU/L, AST/alanine aminotransferase (ALT) > 1.5, and AST and ALT < 400 IU/L] and excessive alcohol consumption (male > 60 g/day and female > 40 g/day) with last alcohol drink within 8 weeks of jaundice onset (bilirubin > 3 mg/dL). The control group was recruited from a medical check-up center. Patients meeting the following criteria were excluded: age > 70 years, hepatitis A, B, C, and E virus-related cirrhosis, HIV infection, Wilson’s disease, biliary obstruction, sepsis, drug-induced liver injury, autoimmune LD, history of high-dose steroid or antibiotics, presence of liver tumor or history of other cancer, or pregnancy.
MASLD was diagnosed on the definition as described in the consensus statement on new fatty liver disease nomenclature36,37. Cardiometabolic criteria including body mass index, fasting glucose, medication history, blood pressure, or cholesterol level were used for the diagnosis of MASLD. Normal control group did not have cardiometabolic criteria. Patients with elevated liver enzymes [AST or ALT ≥ 50 IU/L] or hepatitis on liver pathology were included in the ELE group. They did not drink excessive alcohol (male > 210 g/week and female > 140 g/week). Patients with autoimmune LD, alcohol use disorder, pancreatitis, hemochromatosis, viral LD, pregnancy, Wilson’s disease, drug-induced liver injury, and other cancers were excluded.
Cirrhosis was diagnosed based on the presence of complications (varix, ascites, and encephalopathy), blood tests, imaging findings, fibroscan, or pathological liver results. Liver cancer was diagnosed by two or more imaging tests, such as computer tomography, magnetic resonance imaging, angiography or contrast ultrasound. In addition, subjects taking drugs that affect the gut microbiota were excluded at enrollment. For the control group, we included healthy subjects who visited the center for a health check-up.
Baseline studies included family history, diet pattern, alcohol history, abdominal ultrasound and computed tomography scan, X-ray, electrocardiography, complete blood count, electrolytes, liver function test, viral markers, and Child‒Pugh score. Blood analysis was performed using standard methodologies. Serum biochemical parameters included AST, ALT, albumin, bilirubin, alkaline phosphatase (ALP), gamma glutamyl transpeptidase (GGT), blood urea nitrogen, creatinine, international normalized ratio, α-fetoprotein, carcinoembryonic antigen, prothrombin time, blood glucose, and total cholesterol. The levels of hepatitis A, B, and C and other virus markers were evaluated. Antinuclear antibody, antimitochondrial antibody, and antismooth muscle antibody tests were also performed.
This project followed the ethics of the 1975 Helsinki Declaration, as reflected by a prior approval by the Chuncheon Sacred Heart Hospital institutional review board for human research (2016–134). Informed consent was obtained from all participants. All authors had access to the study data and reviewed and approved the final manuscript.
Validation cohort and data collection
Machine learning with internal validation was performed in 464 subjects, and 210 subjects were divided into an external validation group. For external validation, patient data collected from hospitals in different regions were used.
Stool sample and processing
Sequencing was carried out according to the manufacturer’s instructions at Chunlab, Inc. (Seoul, Republic of Korea) with the Illumina MiSeq platform using reagent kit V3 in PE 250 bp mode. Microbiome taxonomic profiling was conducted with the EZBioCloud platform (ChunLab Inc., Republic of Korea) using the database version PKSSU4.0.
Human feces were stored at − 20 °C as soon as the patient received 2–3 g of feces using the kit (stool paper and stool box) and moved to − 80 °C within 1 day. Genomic DNA for metagenomic sequencing was extracted with a QIAamp stool kit (Qiagen, Hilden, Germany), and the library was prepared with a NEBNext Ultra II FS DNA Library Prep Kit for Illumina (New England BioLabs, Ipswish, MA, USA) according to the manufacturer’s directions. The quantification of libraries was checked using a Qubit dsDNA HS assay kit (Thermo Fisher Scientific, Waltham, MA, USA) and confirmed by quantitative polymerase chain reaction (qPCR) with a KAPA SYBR FAST qPCR Master Mix kit (Kapa Biosystems, Wilmington, MA, USA). The quality of the libraries was assessed on a Bioanalyzer 2100 (Agilent, Santa Clara, CA, USA) using a DNA 12,000 chip. All libraries were sequenced on the NovaSeq 6000 platform (Illumina, USA) with paired-end 150 bp reads.
The analysis was performed following our previous reference38. In brief, DNA was extracted with a QIAamp stool kit, and amplification of the V3-V4 region of the bacterial 16S rRNA gene was conducted using barcoded fusion primers. The forward fusion primer contained the p5 adapter, i5 index, and gene-specific primer 341F (5′-AATGATACGGCGACCACCGAGATCTACAC-XXXXXXXX-TCGTCGGCAGCGTCAGATGTG TATAAGAGACAG-CCTACGGGNGGCWGCAG3′; underlining indicates the target region primer and X indicates the barcode region), and the reverse fusion primer contained the p7 adapter, i7 index, and gene-specific primer 805R (5′-CAAGCAGAAGACGGCATACGAGATXXXXXXXXGTCTCGTGGGCTCGGAGATGTG TATAAGAGACAG-GACTACHVGGGTATCTAATCC-3′), which included sequencing adapters and dual-index barcodes of the Nextera XT kit (Illumina, San Diego, CA, USA). The amplification was performed in the C1000 touch thermal cycler PCR system (Bio-Rad Laboratories, Inc., Hercules, CA, USA) with the following conditions: initial denaturation of 3 min at 95 °C; followed by 25 cycles of denaturation at 95 °C for 30 s, annealing at 55 °C for 30 s, extension at 72 °C for 30 s and final extension at 72 °C for 5 min. Each amplified PCR product was confirmed with 1% agarose gel electrophoresis and visualized on a Gel Doc XR + imaging system (Bio-Rad Laboratories, Inc., USA). The amplified products were purified and size-selected by Agencourt AMPure XP beads (Beckman Coulter, Chaska, MN, USA). The library was constructed with pooled PCR products, and the quality of the library was assessed on a Bioanalyzer 2100 (Agilent, USA) using a DNA 12,000 chip and quantified by qPCR with a KAPA SYBR FAST qPCR Master Mix kit (Kapa Biosystems, USA).
Sequence and statistical analysis
The sequencing data were processed using the Quantitative Insights Into Microbial Ecology (QIIME version 2). The low-quality sequence reads were removed following the criteria: (1) reads with a length of < 150 bp, (2) reads with an average Phred score of < 20, (3) reads containing ambiguous bases, and (4) reads containing mononucleotide repeats of > 8 bp. High-quality reads were clustered into 16S rRNA operational taxonomic units (OTUs) with ≥ 97% sequence homology39. The taxonomic classification of each OTU was performed with VSEARCH by comparing the representative sequence set against the SILVA reference database40.
After filtering samples with read counts greater than 500, the reads were grouped at the phylum level (using phyloseq), and the relative abundance was estimated at the phylum level by groups. There was a total of seven phyla. The taxa were sorted by abundance to improve the visualization and then plotted on box plots according to group and faceted by phylum using the raw counts. Many samples had a high number of Bacteroidetes, followed by Firmicutes and Proteobacteria. Most samples had low read counts for other phyla, with some outlying samples. To formally test for a difference in the phylum-level abundance, a multivariate test for differences in the overall composition between groups of samples was conducted by using the HMP package; herein, a Dirichlet-multinomial distribution is assumed for the data, and null hypothesis testing is conducted by testing for a difference in the location (mean distribution of each of the taxa) across groups accounting for the overdispersion in the count data.
Taxon abundance at the phylum, class, order, family, genus, and species levels was calculated and statistically compared among groups using the R stats package. Based on the tables generated in QIIME, alpha diversities, including Chao1, Simpson, and Shannon, were calculated. The significant differences in alpha diversity metrics were determined using the R package “Vegan”. To investigate the structural variation in microbial communities, beta diversity analysis was performed using UniFrac distance metrics 24 and was visualized via principal component analysis (PCA), principal coordinate analysis (PCOA), and nonmetric multidimensional scaling (NMDS). We checked the separation between other disease groups and normal group samples, suggesting some differences in the communities according to sample type. We tested whether the samples clustered beyond that expected by sampling variability using ADONIS.
The linear discriminant analysis (LDA) effect size (LEfSe) method based on the Kruskal‒Wallis test was conducted to identify significantly abundant taxa between different groups, where a linear discriminant analysis score > 2.0 was defined as the threshold for selecting the discriminative features. Quantitative data are expressed as the mean ± standard deviation (SD) unless otherwise stated. Comparisons were made utilizing analysis of variance (ANOVA), Kruskal‒Wallis test, general linear model analysis (repeated regression), paired sample t test, or independent-sample t test for continuous variables. The chi-square test or Fisher’s exact test was used for the comparison of groups. Data were analyzed with statistical software (SPSS, version 22.0, SPSS, Inc., Chicago, IL, USA), R stats package (www.r-project.org, Austria), and GraphPad Prism version 8.0 for Windows (GraphPad Software, San Diego, CA, USA). For all tests, p values < 0.05 were considered significant.
Machine learning modeling for OTU feature reduction and classification
The abundance of microorganisms at the genus level was used as a feature. The abundance was normalized by applying a centered log ratio (clr) transformation. Three different nonsupervised ML algorithms for feature reduction were trained with the features of the normalized OTUs on the platform: independent component analysis (ICA), principal component analysis (PCA), and random projection (RP), considering 20, 40, 60, and 80 reduced features. For each of these numbers on the platform, four different supervised ML algorithms for classification were trained using the reduced features: support vector machine (SVM), random forest (RF), multilevel perceptron (MLP), and convolutional neural network (CNN) (Table 2 and Table S4). To evaluate the performance concerning model architectures for predictive classification and diagnostics of LDs, SVM, RF, MP, and CNN algorithms were constructed using various numbers of features reduced from the three nonsupervised ML algorithms. Additionally, four different supervised ML algorithms for classification were also trained with features having LDA score greater than 2.0. Data were assigned into training (70%) and testing internal validation (30%) datasets. The training performance of the different ML models was evaluated using Stratified ShuffleSplit tenfold cross-validation41, and the process was repeated 10 times. Hyperparameter tuning was automatically executed by caret, testing 60 different values for each hyperparameter.
A total of 52 combinations of MLs (ML for feature reduction with an ML for classification) for each of four different numbers (20, 40, 60 and 80) of reduced features were performed. The process was optimized using 60 hyperparameter variations and evaluated by Stratified ShuffleSplit tenfold cross validation. In the testing internal validation phase, the prediction performance of each combination of ML models was assessed using parameters including the area under the receiver operating characteristic curve (AUC), accuracy, recall, precision, and F1 score. Box plot representations of the AUC, accuracy, recall, precision, and F1 score values were generated using the ggplot2 package in R. The entire process was repeated for each pair of subgroups (7 ALD and 1 MASLD) within each group.
The computing machine we used for timestamped runs is on Ubuntu 18.04 and is equipped with an Intel Core i9-9820X CPU (10 cores), 64 GB memory, and an NVIDIA GTX 1080 Ti GPU. The scikit-learn Python library was utilized for ICA, PCA, RP, SVM, and RF. Additionally, the Keras Python library was employed for MLP and CNN.