Machine learning framework for predicting susceptibility to obesity



In this part, we assess our framework. The framework is implemented in three stages: a preprocessing stage (PS), a feature stage (FS), and obesity risk prediction (ORP). In PS, several steps preprocess the input data, including filling missing values, feature encoding, outlier removal, and normalization. The preprocessed features are then passed to FS, which identifies the most important features using the proposed entropy controlled-quantum bat algorithm (EC-QBA). EC-QBA is a new feature selection methodology based on BA with two variations: the first updates the BA parameters (i.e., frequency, loudness, and pulse rate) using normalized entropy, and the second updates the new position through a quantum mechanism. Finally, the selected features are fed to several ML algorithms, and the decision is made by majority vote. Our framework is a viable way to predict obesity risk because it combines intelligent feature selection, ensemble learning, and interpretability analysis.

The suggested system operates on detailed information about individuals, including essential characteristics such as gender, age, height, weight, familial predisposition to obesity, dietary practices, physical activity, mode of transportation, and the associated obesity level50. The prediction model is evaluated using ten-fold cross-validation: the dataset is divided into ten equal parts, with one part serving as the test set and the other nine as the training set. Consequently, 18,682 patients (90%) are involved in the training phase and 2076 patients (10%) in the testing phase. All experiments were conducted on a system with an Intel Core i7 CPU, 16 GB RAM, and Python 3.10. Table 4 presents the parameters utilized and their values.
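The ten-fold split described above can be sketched as follows. The fold logic and resulting counts follow the paper (20,758 cases, 90%/10% split); the helper function itself is illustrative, not the authors' implementation.

```python
def kfold_indices(n_samples, k=10):
    """Partition sample indices into k near-equal folds; each fold in turn
    serves as the test set while the remaining k-1 folds form the training set."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, f in enumerate(folds) if j != i for idx in f]
        splits.append((train, test))
    return splits

splits = kfold_indices(20758, k=10)
train_idx, test_idx = splits[0]
print(len(train_idx), len(test_idx))  # 18682 training / 2076 test samples
```

In practice the indices would be shuffled (or stratified by obesity level) before partitioning; the sketch keeps them ordered for clarity.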

Table 4 The parameters used along with their corresponding values.

Dataset description

In this study, we used a dataset retrieved from Kaggle50 consisting of 20,758 cases, each with 16 features related to demographic, behavioral, and lifestyle factors. These features include gender, age, height, weight, family_history_with_overweight, frequent consumption of high-caloric food (FAVC), frequency of consumption of vegetables (FCVC), number of main meals (NCP), consumption of food between meals (CAEC), smoking (SMOKE), daily water consumption (CH2O), calorie consumption monitoring (SCC), physical activity frequency (FAF), time spent using technological devices (TUE), consumption of alcohol (CALC), and mode of transportation (MTRANS). Table 5 provides an overview of the structure of the dataset. Figure 15 shows a sample from the used dataset, while Figs. 16 and 17 show the distribution of obesity levels and gender representation, respectively.

Table 5 The distribution of the used dataset according to gender.
Fig. 15
figure 15

A sample of the used dataset.

Fig. 16
figure 16

The distribution of obesity levels.

Fig. 17
figure 17

The distribution of the used dataset according to gender.

Table 6 also reports the range of values for each feature, including the minimum, maximum, mean, and standard deviation. The minimum is the smallest observed value and the maximum the largest. The mean (or average) summarizes the typical value as the arithmetic mean of all data points, while the standard deviation measures the dispersion of data points around the mean and thus captures data variability. Clarifying the extreme values, variability, and mean of each variable supports the evaluation and interpretation of the experimental results. Figure 18 displays the dataset's features as boxplots. To evaluate the association between each characteristic and the obesity type, a correlation analysis was also performed. Figure 19 shows the results of this analysis, indicating whether the relationship between each pair of variables is mostly positive or negative.
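The four summary statistics in Table 6 can be computed with the standard library alone. The sample values below are hypothetical stand-ins for one numeric feature (age), not values taken from the dataset.

```python
import statistics

# Hypothetical age values for illustration only
ages = [21.0, 23.0, 27.0, 18.0, 45.0, 33.0, 26.0]

summary = {
    "min": min(ages),                 # smallest observed value
    "max": max(ages),                 # largest observed value
    "mean": statistics.mean(ages),    # arithmetic mean
    "std": statistics.stdev(ages),    # sample standard deviation
}
print(summary)
```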

Table 6 The statistical analysis of the used dataset.
Fig. 18
figure 18

The boxplot of features in the used dataset.

Fig. 19
figure 19

A heat map showing the overall correlation between all variables and the correlation between features and obesity type.

Additionally, Fig. 20 displays histograms of all features within their respective value ranges. For each histogram, the x-axis denotes the feature's value and the y-axis the frequency of occurrence. Moreover, Fig. 21 shows the obesity level with respect to age.

Fig. 20
figure 20

Histogram of features of the used dataset.

Fig. 21
figure 21

The obesity level with respect to age.

Evaluation metrics

In the upcoming tests, evaluation metrics such as precision, sensitivity, accuracy, and F-measure are calculated using the following equations51:

$$\:Precision=\frac{{T}_{P}}{({T}_{P}\:+\:{F}_{P})}$$

(25)

$$\:Sensitivity=\frac{{T}_{P}}{({T}_{P}\:+\:{F}_{N})}$$

(26)

$$\:Accuracy=\frac{({T}_{P}\:+\:{T}_{N})\:}{\:({T}_{P}\:+\:{T}_{N}+\:{F}_{P}\:+\:{F}_{N})}\:$$

(27)

$$\:F-measure=\frac{2*Recall*Precision}{\left(Recall+Precision\right)}$$

(28)

Where \(\:{T}_{P}\) (true positive) is the number of positive samples correctly predicted as positive, \(\:{T}_{N}\) (true negative) is the number of negative samples correctly predicted as negative, \(\:{F}_{P}\) (false positive) is the number of negative samples incorrectly predicted as positive, and \(\:{F}_{N}\) (false negative) is the number of positive samples incorrectly predicted as negative.
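Eqs. (25)-(28) translate directly into code from the confusion-matrix counts. The counts in the usage example are hypothetical, chosen only to exercise the formulas.

```python
def metrics(tp, tn, fp, fn):
    """Compute the four evaluation metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)                        # Eq. (25)
    sensitivity = tp / (tp + fn)                      # Eq. (26), also called recall
    accuracy = (tp + tn) / (tp + tn + fp + fn)        # Eq. (27)
    f_measure = 2 * sensitivity * precision / (sensitivity + precision)  # Eq. (28)
    return precision, sensitivity, accuracy, f_measure

# Hypothetical counts for illustration only
p, s, a, f = metrics(tp=90, tn=85, fp=10, fn=15)
print(round(p, 3), round(s, 3), round(a, 3), round(f, 3))
```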

Testing the proposed entropy controlled-quantum Bat algorithm (EC-QBA)

In this subsection, the proposed entropy controlled-quantum bat algorithm (EC-QBA) is evaluated. To prove its effectiveness, we compared it with the most recent feature selection methods, using NB as the base classifier. These methods are the new hybrid feature selection method (NHFSM)52, modified grey wolf optimization (MGWO)53, the bat algorithm with the residue number system (BA-RNS)54, sheep-tuna halcyon integrated optimization (STHIO)55, and quantum particle swarm optimization (QPSO)56. Results are shown in Fig. 22.

Fig. 22
figure 22

Comparison between the most recent feature selection methods and EC-QBA.

As shown in Fig. 22, the proposed EC-QBA outperforms the other feature selection methodologies with regard to accuracy, precision, sensitivity, and F-measure, offering 96% accuracy, 96% precision, 96.5% sensitivity, and 96.25% F-measure. The worst performance was achieved by NHFSM, which reported 89% accuracy, 87% precision, 88.5% sensitivity, and 87.74% F-measure; MGWO followed with 90% accuracy, 90% precision, 90% sensitivity, and 90% F-measure. In conclusion, the proposed EC-QBA obtains the best performance because it adjusts the BA parameters through a very effective Shannon-entropy-based strategy and uses a quantum mechanism to update the BA solution during local search, thereby avoiding BA's drawbacks.
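The two EC-QBA variations can be sketched as follows. The exact update equations are not given in this excerpt, so the entropy-to-parameter couplings and the quantum jump below are assumptions modeled on quantum-behaved swarm variants, not the authors' implementation.

```python
import math
import random

def normalized_entropy(probs):
    """Shannon entropy of a distribution, scaled to [0, 1]."""
    h = -sum(p * math.log2(p) for p in probs if p > 0)
    return h / math.log2(len(probs))

def quantum_jump(position, best, scale):
    """Quantum-inspired local-search update: contract toward the best
    solution with a logarithmic perturbation (QPSO-style, assumed form)."""
    u = random.random() or 1e-12
    step = scale * abs(best - position) * math.log(1.0 / u)
    return best + step if random.random() < 0.5 else best - step

h = normalized_entropy([0.25, 0.25, 0.25, 0.25])
print(h)  # 1.0 for a uniform distribution

# Assumed couplings: entropy drives the BA parameters each iteration
loudness = 0.9 * h
pulse_rate = 1.0 - h
new_pos = quantum_jump(position=0.3, best=0.8, scale=loudness)
```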

In addition, to further evaluate the efficiency of the proposed EC-QBA, we fed the selected features into logistic regression (LR) as a baseline model to highlight EC-QBA's added value. Without EC-QBA, LR achieves an accuracy of 78%; with EC-QBA, it achieves 86%, an improvement of 8 percentage points. This demonstrates the value EC-QBA adds even to a single simple ML model.

Furthermore, we assessed the computational cost of the EC-QBA algorithm in terms of execution time and memory utilization, comparing it with conventional feature selection algorithms such as recursive feature elimination (RFE) and mutual information (MI). Table 7 shows that EC-QBA exhibits a longer execution time (13.85 s on average) and higher memory consumption (136.7 MB), attributable to its entropy-based parameter control mechanism and its quantum update mechanism. Notwithstanding the elevated computational cost, the superior feature selection accuracy achieved by EC-QBA markedly improves the predictive model's performance, rendering the cost justifiable in scenarios that demand high precision.
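Measurements like those in Table 7 can be reproduced with the standard library: wall-clock time via `time.perf_counter` and peak memory via `tracemalloc`. The workload below is a placeholder, not the actual EC-QBA run.

```python
import time
import tracemalloc

def measure(fn, *args):
    """Return (result, elapsed seconds, peak memory in MB) for one call."""
    tracemalloc.start()
    t0 = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak / (1024 * 1024)

def dummy_selection(n):
    # stand-in workload; a real measurement would wrap the EC-QBA call here
    return sorted(range(n), reverse=True)

result, seconds, peak_mb = measure(dummy_selection, 100_000)
print(f"{seconds:.3f} s, {peak_mb:.1f} MB peak")
```

Note that `tracemalloc` tracks only Python-level allocations; native-extension memory (e.g., NumPy buffers) would need a process-level tool instead.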

Table 7 Comparison between EC-QBA and other competitors in terms of computational cost.

Comparison between EC-QBA and standard approaches

In this section, the efficacy of the proposed EC-QBA is demonstrated by comparing it with several standard feature selection methods using the LR classifier as the base classifier. These approaches are recursive feature elimination (RFE), principal component analysis (PCA), the genetic algorithm (GA), grey wolf optimization (GWO), and particle swarm optimization (PSO). Results are shown in Table 8.

Table 8 Comparison between EC-QBA and standard approaches based on LR.

As shown in Table 8, the proposed EC-QBA outperformed the other standard algorithms in terms of accuracy, precision, sensitivity, and F-measure, introducing 86.0% ± 1.0 accuracy, 84.1% ± 0.9 precision, 83.8% ± 1.0 sensitivity, and 83.9% ± 0.9 F-measure. The lowest performance was obtained by RFE, which introduced 75% ± 1.5 accuracy, 72% ± 1.2 precision, 70% ± 1.0 sensitivity, and 70.98% ± 1.3 F-measure. According to Table 8, the proposed EC-QBA is more effective than the other traditional methods at selecting the most important features when LR is used as the base classifier.

Ablation studies: evaluating the impact of key components

In this section, we test the effect of each of the proposed EC-QBA components to demonstrate their effectiveness. To achieve this, we first exclude both entropy control and the quantum jump, using only the binary bat algorithm (FS_modeL1). Next, we exclude the quantum jump and keep only entropy control (FS_modeL2). Finally, we exclude entropy control, updating the bat frequency, loudness, and pulse rate in the normal manner and keeping only the quantum jump (FS_modeL3), as presented in Table 9. Results are shown in Table 10.
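The three ablation variants can be encoded as feature flags, which makes the configurations in Table 9 explicit. The variant names mirror the text; the flag structure itself is an illustrative assumption.

```python
from dataclasses import dataclass

@dataclass
class AblationConfig:
    entropy_control: bool  # entropy-driven frequency/loudness/pulse-rate updates
    quantum_jump: bool     # quantum position update in local search

FS_model1 = AblationConfig(entropy_control=False, quantum_jump=False)  # plain BBA
FS_model2 = AblationConfig(entropy_control=True, quantum_jump=False)   # entropy only
FS_model3 = AblationConfig(entropy_control=False, quantum_jump=True)   # quantum only
EC_QBA = AblationConfig(entropy_control=True, quantum_jump=True)       # full method
```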

Table 9 The structural components within the EC-QBA.
Table 10 Evaluating the impact of key components of EC-QBA.

Table 9 lists the ablation configurations used to evaluate the components that substantially influence accuracy and reliability in selecting the most important features; each experiment isolates a fundamental element of our proposed method. As shown in Table 10, using the traditional BBA alone for feature selection (FS_modeL1) yields 89.0% ± 1.3 accuracy, 75.0% ± 1.5 precision, 74.6% ± 1.2 sensitivity, and 74.8% ± 1.1 F-measure, the worst results. Excluding the quantum jump from the bat's position update in local search (FS_modeL2) yields 94.3% ± 0.8 accuracy, 87.0% ± 1.0 precision, 87.2% ± 0.9 sensitivity, and 87.1% ± 0.9 F-measure. Excluding entropy control (FS_modeL3) yields 92.0% ± 1.0 accuracy, 86.0% ± 1.1 precision, 86.4% ± 1.0 sensitivity, and 86.2% ± 1.1 F-measure. Overall, the results illustrate the ability of the full EC-QBA to identify the most critical features.

Testing the proposed ObeRisk

In this subsection, our framework is tested and evaluated to select the most efficient algorithms to complete the proposed model. ObeRisk consists of three main parts: the first part (i.e., PS) is responsible for cleaning the used dataset and putting it into a suitable form for the next part (i.e., FS). In FS, the proposed EC-QBA is employed to select the most significant and beneficial features. Finally, these effective features are sent to several ML algorithms, and the decision is made by majority vote. These ML algorithms are LR, KNN, SVM, LGBM, XGB, MLP, and AdaBoost. Each model was trained using 10-fold cross-validation, and results are shown in Table 11.
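The majority-vote decision over the seven base classifiers reduces to counting predicted labels. The class labels and per-model votes below are hypothetical, for illustration only.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the most base models; ties resolve to
    the label encountered first among the most common."""
    return Counter(predictions).most_common(1)[0][0]

# Seven base models (LR, KNN, SVM, LGBM, XGB, MLP, AdaBoost) voting on one case
votes = ["Obesity_Type_I", "Obesity_Type_I", "Normal_Weight",
         "Obesity_Type_I", "Overweight", "Obesity_Type_I", "Obesity_Type_I"]
print(majority_vote(votes))  # Obesity_Type_I
```

With an odd number of voters and a multi-class problem, ties are still possible; a production ensemble would typically break them by average predicted probability.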

Table 11 Comparison between using single classifier and ObeRisk.

Among all compared models, ObeRisk achieved the best overall performance with an accuracy of 97.1% ± 0.4, significantly outperforming traditional models such as LR with an accuracy of 86.0% ± 1.2 and KNN with an accuracy of 73.8% ± 2.4. Other algorithms that performed strongly included XGB with an accuracy of 91.2% ± 0.7 and LGBM with an accuracy of 90.1% ± 0.9, although they still trailed ObeRisk by a significant margin. Consequently, combining several ML algorithms is more effective than relying on a single model. Our majority-vote-based framework enhances classification results by integrating the base models, although it also incurs higher computational complexity.

Finally, we used SHapley Additive exPlanations (SHAP) values to learn more about how the models make their decisions. SHAP analysis sheds light on the relative importance of features for each classifier's predictions. The SHAP summary plots in Figs. 23, 24, 25, 26, 27, 28 and 29 illustrate the average effect of each feature on each model's predictions. According to these figures, the SHAP value analysis illuminates the main factors impacting the models' predictions of obesity risk; notably, "weight" turns out to be the most important feature. The SHAP analysis offers critical insights into each model's reasoning and underscores the necessity of incorporating both recognized risk factors and potentially novel indicators, such as "time", for a thorough obesity risk evaluation.

Fig. 23
figure 23

SHAP summary plot of feature importance of LR classifier’s output.

Fig. 24
figure 24

SHAP summary plot of feature importance of KNN classifier’s output.

Fig. 25
figure 25

SHAP summary plot of feature importance of SVM classifier’s output.

Fig. 26
figure 26

SHAP summary plot of feature importance of LGBM classifier’s output.

Fig. 27
figure 27

SHAP summary plot of feature importance of XGB classifier’s output.

Fig. 28
figure 28

SHAP summary plot of feature importance of MLP classifier’s output.

Fig. 29
figure 29

SHAP summary plot of feature importance of AdaBoost classifier’s output.

Comparing ObeRisk with the state of the art

In this section, our framework is evaluated, keeping EC-QBA for feature selection and ORP for prediction; results are shown in Table 12. Also, to prove the effectiveness of the proposed ObeRisk, it is compared against the latest methodologies discussed in the literature review: ML19, CIM20, CDSS21, DeepHealthNet23, and ML-XAI26. All models, including ObeRisk and the comparison models, were evaluated on the same dataset under consistent experimental conditions. Results are shown in Fig. 30.

Table 12 Performance of ObeRisk in terms of accuracy, precision, sensitivity, and F-measure.

Table 12 presents the results for each fold along with the average values, displaying our framework's accuracy, precision, sensitivity, and F-measure for predicting individuals with obesity. According to Table 12, the performance of our framework is highest for folds 1, 2, 4, 5, 6, 7, 8, 9, and 10, and lowest for fold 3. The lowest precision values appear in folds 3 and 7, at 93.5% and 93.6% respectively, while the best appear in folds 4 and 9, at 97.9% and 96.8% respectively. The lowest sensitivity value, 93.71%, occurs in fold 6. Likewise, the lowest F-measure values appear in folds 3 and 7, at 93.8% and 93.9%, while the best, 97.6%, occurs in fold 4. According to Table 12, our framework is effective in predicting obese individuals.

Fig. 30
figure 30

Comparison between the most recent prediction models and ObeRisk.

As shown in Fig. 30, ObeRisk offers an accuracy of 97.13%, meaning it correctly classifies almost all cases. Furthermore, it achieves a precision of 95.7%, indicating few false positives. Additionally, ObeRisk achieves 95.5% sensitivity and a 95.6% F-measure, indicating that it is highly effective at identifying actual obesity cases, reducing false negatives, and maintaining a balance between precision and sensitivity. ML-XAI, DeepHealthNet, CDSS, CIM, and ML introduced accuracies of 95.4, 94, 93.6, 93.2, and 93%, respectively; precisions of 93.5, 93.5, 93, 92.4, and 92.2%; sensitivities of 93, 93, 92.8, 92.6, and 92.5%; and F-measures of 93.25, 93.25, 92.89, 92.5, and 92.35%. In conclusion, ObeRisk demonstrates the highest performance in classifying obese individuals, outperforming the other recent methodologies thanks to its efficient feature selection methodology, EC-QBA.

Statistical tests

Statistical validation is crucial for evaluating the model's efficacy. Hence, the Friedman test and the Wilcoxon signed-rank test (WSRT) are used to assess the predictive capability of the proposed technique57. We used WSRT with a 5% significance level and 95% confidence intervals; the results are shown in Table 13. The null hypothesis of this analysis is that the means of the two compared techniques are not significantly different. The statistical analysis was carried out using Minitab. A p-value lower than 0.05 (the 5% significance level) indicates substantial evidence for rejecting the null hypothesis, suggesting that the proposed model is statistically different from the competing strategies. Therefore, the proposed system outperforms the more traditional approaches to obesity prediction.
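The WSRT logic can be sketched with the standard library using the normal approximation (no zero/tie correction, unlike Minitab's exact handling). The paired fold scores below are hypothetical, chosen only to illustrate a rejection of the null hypothesis.

```python
import math

def wilcoxon_p(x, y):
    """Two-sided Wilcoxon signed-rank p-value via the normal approximation."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    ranked = sorted(range(n), key=lambda i: abs(diffs[i]))  # ordinal ranks of |d|
    w_plus = sum(rank + 1 for rank, i in enumerate(ranked) if diffs[i] > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sigma
    # two-sided p-value from the standard normal distribution
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Hypothetical per-fold accuracies for the proposed model vs. a baseline
oberisk = [97.2, 96.8, 94.1, 97.9, 97.0, 93.7, 93.6, 97.1, 96.8, 97.3]
baseline = [93.1, 92.8, 92.5, 93.4, 93.0, 92.1, 92.0, 93.2, 92.9, 93.3]
p = wilcoxon_p(oberisk, baseline)
print(p < 0.05)  # True: reject the null hypothesis of equal performance
```

For real analyses, `scipy.stats.wilcoxon` (or Minitab, as used here) applies proper tie corrections and exact distributions for small samples.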

In addition, the nonparametric Friedman test is used to rank the performance of each model, determining the difference between the proposed ObeRisk and ML, CIM, CDSS, DeepHealthNet, and ML-XAI at a significance level of α = 0.05. Results are shown in Table 14, which indicates that the proposed ObeRisk outperforms the most recent models in predicting obesity in individuals.

Comparison with traditional clinical tools

Traditional clinical instruments like BMI are among the most prevalent methods for evaluating obesity risk in routine medical practice. They depend exclusively on height and weight to determine an individual’s health classification. Notwithstanding their user-friendliness and rapid computation, they exhibit considerable deficiencies in accuracy and predictive capability, particularly in scenarios where total body mass is inadequately represented, as seen in athletes or the elderly.

Conversely, the AI-driven ObeRisk model provides a more advanced solution, utilizing a comprehensive array of factors, such as diet, physical activity, age, gender, and overall health condition. It utilizes the EC-QBA feature selection algorithm to identify the most significant factors and subsequently employs a range of ML models, attaining a predictive accuracy of up to 97%, significantly surpassing the accuracy of tools such as BMI. Table 15 provides a comparison between BMI and our framework.

Table 14 Friedman mean ranking.
Table 15 Comparison between BMI and our framework.

Computational efficiency, scalability, and model interpretability

It is crucial to consider the computational cost of any AI-based predictive model, particularly one intended for real-world application. The EC-QBA algorithm incurs higher costs than alternative feature selection techniques, averaging 13.85 s in execution time and 136.7 MB of memory, as illustrated in Table 7. The entropy-based parameter control and the quantum update mechanism make the algorithm more computationally demanding; however, they improve the feature selection and thereby the efficacy of the predictive model. Ultimately, the EC-QBA implementation can be optimized further or accelerated with hardware techniques such as GPU computing.

Scalability remains crucial, particularly as healthcare datasets continue to expand in size and complexity. EC-QBA employs a population-based architecture that inherently facilitates parallel processing, meaning the algorithm can manage increased data volumes by distributing its search across concurrent workers. The modular architecture of the ObeRisk model facilitates its application in distributed or cloud computing environments, which is essential for handling the extensive datasets prevalent in clinical and epidemiological research. However, experimental evaluations on larger and more diverse datasets are essential to confirm these scalability claims and identify possible bottlenecks.

Additionally, interpretability is essential for cultivating trust and the implementation of models in clinical settings, as understanding the reasoning behind predictions impacts medical decision-making. The ObeRisk model employs ensemble learning techniques, which may obscure clarity; however, we enhanced interpretability by analyzing the significance of features derived from specific models and utilizing EC-QBA’s capacity to identify the most critical features. This dual approach assists physicians in comprehending the primary factors that increase the likelihood of obesity, thereby facilitating the understanding of intricate models. Interpretable AI tools, such as SHAP, enhance clarity by providing both local and global interpretations of model predictions.

Limitations and potential biases

A crucial element in predictive modeling is the potential biases inherent in the dataset, which may negatively impact the model’s equity and generalizability. This study uncovered a gender disparity in the dataset employed for predicting obesity risk, potentially biasing predictions towards the most representative group. This disproportionate representation may impair the model’s efficacy for the underrepresented gender, resulting in diminished accuracy and reliability of predictions for those groups.

To mitigate this risk, the EC-QBA algorithm selects features with substantial predictive power across the entire dataset, reducing bias induced by irrelevant or excessively noisy features. In future work, additional methodologies, such as resampling techniques or fairness-aware algorithms, will be used to ensure equitable and balanced model performance across demographic groups. Additionally, examining the model outputs for each gender category individually can help identify disparities and suggest improvements. To deploy reliable AI systems in healthcare, it is essential to identify and rectify biases in the data, since accurate predictions significantly influence patient outcomes.
