The data used in this work were created by Srinivasan et al.33 and consist of 220 women with or without bacterial vaginosis (BV). BV was diagnosed by Nugent scoring, based on Gram staining of vaginal smears. Patients with a Nugent score of 7 or higher were identified as BV positive, while patients with a Nugent score below 7 were identified as BV negative. To predict BV, we used four machine learning (ML) models: random forest (RF), logistic regression (LR), support vector machine (SVM), and multi-layer perceptron (MLP). The hyperparameters tuned to optimize each classifier are listed in Supplementary Table 1. Four metrics were used to evaluate the performance of the ML models in predicting BV: balanced accuracy (BACC), area under the precision-recall curve (AUPRC), false positive rate (FPR), and false negative rate (FNR).
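As an illustration of the labeling rule and three of the four metrics, the following is a minimal sketch rather than the pipeline used in this work. The function names and the toy scores and predictions are hypothetical, and AUPRC is omitted because it requires ranked prediction scores instead of hard labels.

```python
def nugent_label(score):
    """Return 1 (BV positive) for a Nugent score of 7 or higher, else 0."""
    return 1 if score >= 7 else 0


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def bv_metrics(y_true, y_pred):
    """Compute BACC, FPR, and FNR; assumes both classes occur in y_true."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "BACC": (sensitivity + specificity) / 2,
        "FPR": fp / (fp + tn),
        "FNR": fn / (fn + tp),
    }


# Toy Nugent scores and model predictions (hypothetical, not cohort data).
scores = [8, 3, 7, 0, 9, 5, 10, 2]
y_true = [nugent_label(s) for s in scores]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
metrics = bv_metrics(y_true, y_pred)
```

Balanced accuracy averages sensitivity and specificity, so it is not inflated when one diagnosis class is more common than the other.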
Descriptive statistics
The dataset comprised 220 women, of whom 97 (44%) were white, 75 (34%) were Black, and 48 (22%) belonged to other ethnic groups (i.e., Asian, Hawaiian/Pacific Islander, American Indian/Alaskan Native, mixed, or did not disclose ethnicity or race). All ethnic categories were self-reported. Figure 1 shows the percentage of BV diagnoses based on Nugent scoring, overall and by ethnicity. 53% of women had a positive BV diagnosis, with Black women and women of other ethnic groups having a higher prevalence of BV than white women (Figure 1). A chi-square test showed a significant association between ethnicity and BV outcome (p = 0.0001 < 0.05). This work examines the impact of this association between ethnicity and BV outcome on machine learning performance across ethnic groups.
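A chi-square test of independence on an ethnicity-by-diagnosis table can be sketched as follows. This is a minimal illustration, not the analysis code of this work: the counts are toy values chosen for the example and are not the cohort's per-group BV counts, and the function names are hypothetical. For a 3 x 2 table there are 2 degrees of freedom, for which the chi-square survival function has the closed form exp(-x/2).

```python
import math


def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


def p_value_two_dof(stat):
    """Upper-tail p-value of a chi-square statistic; valid only for 2 dof."""
    return math.exp(-stat / 2)


# Rows: three ethnic groups; columns: BV positive, BV negative.
# Toy counts for illustration only (NOT the cohort's data).
toy_table = [
    [30, 70],
    [60, 40],
    [50, 50],
]
stat, dof = chi_square_independence(toy_table)
p_value = p_value_two_dof(stat)
significant = p_value < 0.05
```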

BV diagnosis was based on the Nugent score. Patients with a Nugent score of 7 or above were diagnosed as BV positive. Patients with a Nugent score below 7 were diagnosed as BV negative.
Figure 2a shows a two-dimensional t-distributed stochastic neighbor embedding (t-SNE) projection of the operational taxonomic unit (OTU) variables, mapped to BV diagnosis based on Nugent scoring. In the t-SNE projection, most samples can be separated by BV diagnosis. However, some samples are not well separated, which adds to the challenge of diagnosis using AI/ML models. To further explore the effect of the dominant bacterial species on BV diagnosis, the t-SNE projection is also shown mapped to community state type (CST) classification (Figure 2b). The plot is well separated by CST, with most of CST I in the BV-negative cluster and most of CST IV in the BV-positive cluster. The cluster with mixed BV diagnoses is largely composed of CST III, the L. iners-dominated microbiota.

t-SNE plot of 16S rRNA bacterial variables mapped to (a) BV diagnosis based on Nugent scoring and (b) community state type.
Figure 3 shows the percentage and count of women in each CST across ethnic groups. CST IV is the most common CST for Black women (56%) and women of other ethnic groups (50%). CST III, the L. iners-dominated type, is the second most common state type for women in these two groups (34.7% of Black women and 25% of other women). CST I, the L. crispatus-dominated microbiota, is the third most common CST among Black women (8%) and among women labeled as other (22.9%). In contrast, CST III is the most common state type among white women in this cohort, followed by CST IV (33%) and CST I (26.8%). Each of the three ethnic groups had only one patient with CST V (L. jensenii). No group had patients classified as CST II (L. gasseri).

Community state type (CST) distribution for (a) white, (b) Black, and (c) other ethnic groups. CST I is dominated by L. crispatus, CST II by L. gasseri, CST III by L. iners, and CST V by L. jensenii. CST IV consists of a variety of bacteria and is not dominated by Lactobacillus.
Model performance in BV diagnosis varies by ethnicity
Table 1 shows the average balanced accuracy (BACC), area under the precision-recall curve (AUPRC), false positive rate (FPR), and false negative rate (FNR) of the four ML models in predicting BV. Overall, the ML models performed well (BACC: 0.90–0.92; AUPRC: 0.93–0.96; FPR: 0.07–0.10; FNR: 0.10–0.10). Random forest (RF) and logistic regression (LR) predicted BV better than the other models, depending on the metric. However, there were no statistically significant differences in the performance metrics (Table 1).
When the performance of the ML models was examined by ethnic group, differences in predictive results were found (Figure 4, Supplementary Table 2). Overall, Black women had the lowest balanced accuracy (Figure 4a) and the highest FPR (Figure 4c) in all models. In contrast, FNR tended to be lower in white women, except when using the multi-layer perceptron (MLP) model (Figure 4d).

Box plots showing the median, upper quartile, lower quartile, and outliers of (a) balanced accuracy, (b) area under the precision-recall curve (AUPRC), (c) false positive rate (FPR), and (d) false negative rate (FNR). Asterisks indicate group pairs with statistically significant differences in model performance.
In summary, most models except the MLP tended to perform worse for Black women than for white women and women of other ethnic groups. The MLP, however, tended to perform most evenly across all ethnic groups.
Using paired ethnicity training to improve model performance
This subsequent analysis sought to determine whether training and testing on data from the same ethnicity (i.e., paired ethnicity training) reduces ethnic disparities in model performance. Only logistic regression (LR) results are shown, because LR had the highest overall balanced accuracy (Table 1). For white and Black women, paired ethnicity training (Figure 5, Supplementary Table 3) resulted in comparable or improved performance relative to training on a sample of all ethnic groups. However, these improvements did not reach statistical significance. In contrast, for women of other ethnic groups, all performance measures except FNR decreased, and these changes were statistically significant (balanced accuracy: p = 0.002; AUPRC: p = 0.037; FPR: p
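The difference between training on all ethnic groups and paired ethnicity training can be sketched as a choice of training indices evaluated on the same held-out samples of one group. This is a minimal illustration under stated assumptions, not the procedure used in this work: the helper names, the 20% test fraction, the seed, and the toy group and label lists are all hypothetical, and the nested grid-search cross-validation is not reproduced.

```python
import random


def stratified_split(keys, test_fraction, rng):
    """Split sample indices so each stratum key contributes to the test set."""
    train, test = [], []
    for key in sorted(set(keys)):
        members = [i for i, k in enumerate(keys) if k == key]
        rng.shuffle(members)
        n_test = max(1, round(len(members) * test_fraction))
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)


def make_splits(groups, labels, target, test_fraction=0.2, seed=0):
    """Build 'all groups' and 'paired' training sets for one target group.

    Both settings are evaluated on the same held-out samples of `target`.
    """
    rng = random.Random(seed)
    strata = list(zip(groups, labels))  # stratify by ethnicity and diagnosis
    train_all, test_all = stratified_split(strata, test_fraction, rng)
    test_target = [i for i in test_all if groups[i] == target]
    train_paired = [i for i in train_all if groups[i] == target]
    return {
        "all": (train_all, test_target),
        "paired": (train_paired, test_target),
    }


# Toy cohort: 10 samples per group with alternating labels (hypothetical).
groups = ["white"] * 10 + ["black"] * 10 + ["other"] * 10
labels = [i % 2 for i in range(30)]
splits = make_splits(groups, labels, target="black")
```

Holding the test samples fixed across the two settings means that any difference in the metrics reflects the training data alone.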

Figure 5. Ten stratified train-test runs were performed (using nested grid-search cross-validation in each run). Box plots showing the median, upper quartile, lower quartile, and outliers of (a) balanced accuracy, (b) area under the precision-recall curve (AUPRC), (c) false positive rate (FPR), and (d) false negative rate (FNR). Asterisks indicate group pairs with statistically significant differences in model performance.
We also examined whether these models generalize to ethnic groups not used in the training process (i.e., cross-training). Overall, cross-training tended to improve predictive performance among women from other ethnic groups (Figure 5, Supplementary Table 3), particularly for balanced accuracy (white: p = 0.048), FPR (white: p = 0.005; Black: p = 0.012), and FNR (white: p = 0.046; Black: p = 0.039). In contrast, paired ethnicity training tended to improve predictive outcomes for Black women compared with cross-training on data from women of other ethnic groups (BACC: p = 0.003; FPR: p = 0.004; FNR: p = 0.01). Similarly, paired ethnicity training tended to yield higher predictive performance for white women than cross-training on data from women of other ethnic groups (balanced accuracy: p = 0.006; AUPRC: p = 0.006; FPR: p = 0.006).
Bacterial taxa highlighted as important for predicting BV
Feature selection methods were used to identify bacterial taxa that contributed to accurate BV diagnosis. Important bacterial taxa were extracted using the following feature selection methods: Gini index, t-test, F-test, and point-biserial (PB) correlation. Both p
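Of the feature selection methods listed, point-biserial correlation is the simplest to write out. The sketch below ranks features by the absolute correlation between each feature and the binary BV label. It is an illustration only: the function names, the taxon names, and the abundance values are hypothetical, and the selection thresholds used in this work are not reproduced.

```python
import math


def point_biserial(values, labels):
    """Point-biserial correlation between a numeric feature and 0/1 labels."""
    positives = [v for v, y in zip(values, labels) if y == 1]
    negatives = [v for v, y in zip(values, labels) if y == 0]
    n, n1, n0 = len(values), len(positives), len(negatives)
    mean_all = sum(values) / n
    # Population standard deviation of the feature over all samples.
    std = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    if std == 0 or n1 == 0 or n0 == 0:
        return 0.0  # a constant feature or a single class carries no signal
    mean_diff = sum(positives) / n1 - sum(negatives) / n0
    return mean_diff / std * math.sqrt(n1 * n0 / n ** 2)


def rank_features(features, labels):
    """Return (name, r) pairs sorted by absolute correlation, strongest first."""
    scored = [(name, point_biserial(vals, labels)) for name, vals in features.items()]
    return sorted(scored, key=lambda item: abs(item[1]), reverse=True)


# Toy abundances for three hypothetical taxa (not cohort data).
labels = [1, 1, 1, 0, 0, 0]
features = {
    "taxon_A": [9, 8, 10, 1, 2, 0],  # higher in BV-positive samples
    "taxon_B": [0, 1, 0, 7, 9, 8],   # higher in BV-negative samples
    "taxon_C": [3, 3, 3, 3, 3, 3],   # constant, uninformative
}
ranked = rank_features(features, labels)
```

Ranking by absolute value keeps taxa that are depleted in BV alongside taxa that are enriched, since both can help separate the two diagnoses.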

Figure 6: Model performance of the LR classifier with and without feature selection. Ten stratified train-test runs were performed (using nested grid-search cross-validation in each run). (a–d) Box plots showing the median, upper quartile, lower quartile, and outliers of balanced accuracy, area under the precision-recall curve (AUPRC), false positive rate (FPR), and false negative rate (FNR).
Table 2. Overall performance of the LR model in terms of balanced accuracy (BACC), AUPRC, false positive rate (FPR), and false negative rate (FNR), with 95% confidence intervals.
To further explore ways to improve the equity of model performance, the features identified by the Gini index method as important for BV diagnosis in each ethnic group were used to train ML models independently. Unique bacterial taxa for BV diagnosis were found in each ethnic group-specific subset (Fig. 7). Eggerthella sp. type 1 and Atopobium vaginae (Fannyhessea vaginae) were identified as the most important bacterial taxa for BV diagnosis among white women in this cohort and were also identified across the whole cohort. In contrast, Gardnerella vaginalis and L. crispatus were found to be important predictors of BV for women of other ethnic groups. Dialister sp. type 2 and

Figure 7: Identification of important bacterial taxa.
Shared top bacterial taxa indicative of BV, identified using the Gini index, are shown overall and for each ethnic group.
