Integration of machine learning and experimental validation reveals new lipid-lowering drug candidates


Machine learning-based identification of lipid-lowering drug candidates

Figure 1 illustrates the overall workflow of this study. Using a dataset of 3430 drugs (176 positive drugs with established lipid-lowering effects and 3254 negative drugs), together with their drug characteristics and lipid-lowering evidence levels, we evaluated the predictive performance of a range of machine learning models. Each model used 68 continuous variables, or combinations thereof, to assess the lipid-lowering potential of a drug (Fig. 2a, b). Among these models, the Lasso + Ridge and Lasso + Elastic Net (Enet) models performed best across several parameter configurations. With α set to 0.7, the Lasso + Elastic Net model achieved the highest AUC (0.886), accuracy (0.888), F1 score (0.820), recall (0.820), and specificity (0.888), ranking first among all models. The Lasso + Partial Least Squares Regression (plsRglm), SVM, Lasso, and Lasso + GBM models also performed consistently well across these five metrics (Fig. 2a, b). We then selected the 10 models with the highest AUC values for further analysis (Fig. 2c) and examined their predictions for the candidate drugs. Across these 10 models, 29 FDA-approved drugs without lipid-lowering indications were identified as having lipid-lowering potential by at least 8 models (Fig. 2d, Table S7).
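
The five evaluation metrics reported above can be illustrated with a minimal sketch. The labels and scores below are toy values, not data from this study, and the function names are hypothetical; the AUC is computed with the rank-based (Mann-Whitney) formulation, which is one standard way to obtain it and is not necessarily the implementation used here.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, recall, specificity and F1 from binary labels (1 = lipid-lowering).

    Assumes both classes are present and at least one positive is predicted.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    recall = tp / (tp + fn)            # sensitivity: positives correctly flagged
    specificity = tn / (tn + fp)       # negatives correctly rejected
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / len(pairs)
    return {"accuracy": accuracy, "recall": recall,
            "specificity": specificity, "f1": f1}


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 3 positive and 5 negative drugs with hypothetical model scores.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]   # 0.5 threshold is an assumption

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

With a dataset as imbalanced as the one described (176 positives among 3430 drugs), accuracy and specificity are dominated by the negative class, which is why recall and F1 are reported alongside them.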

Fig. 1: Schematic overview of the development and validation of the machine learning model for predicting the lipid-lowering effect of non-lipid-lowering drugs.

This figure was created based on the tools provided by Biorender.com (accessed on 8/2/2024).

Fig. 2: Identification of potential lipid-lowering agents from repurposed drugs using machine learning models.

a Detailed AUC of the continuous variable machine learning models presented as a heatmap. b Evaluation metrics—accuracy, F1 score, recall, and specificity—of the top 10 machine learning models. c The ROC curve illustrates the performance of the top ten continuous variable machine learning models. d The Venn diagram summarizes the results of repurposed drugs across the continuous variable machine learning model, clinical retrospective data analysis, and animal experiments. In the heatmap, the intensity of the red color represents the magnitude of the corresponding evaluation metrics. Darker shades indicate higher values.

In summary, our comprehensive machine learning approach effectively identified 29 FDA-approved drugs with potential lipid-lowering effects, thereby providing a reliable foundation for drug repurposing and subsequent experimental validation.
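
The consensus rule used to shortlist candidates (flagged by at least 8 of the top 10 models) can be sketched as a simple vote count. The model names, drug names, and the `consensus_candidates` helper below are hypothetical placeholders, not identifiers from this study.

```python
from collections import Counter


def consensus_candidates(model_calls, min_votes=8):
    """Return drugs flagged as lipid-lowering by at least `min_votes` models.

    model_calls maps a model name to the set of drugs that model flagged.
    """
    votes = Counter()
    for flagged in model_calls.values():
        votes.update(set(flagged))     # each model votes at most once per drug
    return sorted(drug for drug, n in votes.items() if n >= min_votes)


# Toy example with 10 hypothetical models and 3 hypothetical drugs.
model_calls = {f"model_{i}": {"drug_A"} for i in range(1, 11)}   # drug_A: 10 votes
for i in range(1, 9):
    model_calls[f"model_{i}"].add("drug_B")                      # drug_B: 8 votes
for i in range(1, 8):
    model_calls[f"model_{i}"].add("drug_C")                      # drug_C: 7 votes

selected = consensus_candidates(model_calls)
```

In this toy case `drug_C` falls one vote short of the threshold and is excluded.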

Validation of potential lipid-lowering drugs through retrospective clinical data analysis

Comparison of patients’ average blood lipid profiles before and after medication showed that four drugs identified through machine learning screening (Argatroban, Levoxyl, Oseltamivir, and Thiamine) significantly altered patients’ blood lipid parameters (Fig. 3a–d). Argatroban had the most pronounced effects, lowering low-density lipoprotein (LDL), total cholesterol (TC), and triglycerides (TG) (Fig. 3a). In 63 patients treated with Argatroban, LDL decreased by 33.1%, from a pre-treatment average of 2.96 mmol/L to 1.98 mmol/L post-treatment (P = 1.4 × 10⁻⁸). TC and TG also fell significantly: TC decreased by 25.1%, from 4.68 mmol/L to 3.51 mmol/L (P = 1.4 × 10⁻⁹), and TG declined from 1.47 mmol/L to 1.37 mmol/L (P = 0.017). Levoxyl also showed clear lipid-lowering effects (Fig. 3b): following Levoxyl treatment, 87 patients exhibited significant reductions in LDL and TC of 16.2% (P = 3.7 × 10⁻⁷) and 11.9% (P = 8.4 × 10⁻⁷), respectively. Oseltamivir treatment reduced LDL levels and, although the magnitude of change was modest, had a statistically significant effect on TC in a larger sample (Fig. 3c). Finally, Thiamine treatment significantly reduced patients’ LDL and TC levels (Fig. 3d).
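
The percentage changes quoted above follow directly from the pre- and post-treatment means, and the paired comparison can be illustrated with the Wilcoxon signed-rank statistic. The sketch below assumes the paired (signed-rank) form of the Wilcoxon test, which the text does not state explicitly; the per-patient values are invented for illustration and the function names are hypothetical.

```python
def percent_change(before, after):
    """Relative change of `after` with respect to `before`, in percent."""
    return (after - before) / before * 100.0


def signed_rank_statistic(before, after):
    """Wilcoxon signed-rank sums (W+, W-) for paired samples.

    Zero differences are dropped and tied |differences| share an average rank.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1          # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Reported Argatroban group means (mmol/L).
ldl_change = percent_change(2.96, 1.98)   # about -33.1
tc_change = percent_change(4.68, 3.51)    # -25.0 from the rounded means

# Hypothetical per-patient LDL values (mmol/L), for illustration only.
ldl_before = [3.1, 2.8, 3.4, 2.6, 3.0]
ldl_after = [2.0, 2.1, 2.2, 2.7, 2.2]
w_plus, w_minus = signed_rank_statistic(ldl_before, ldl_after)
```

Because the means in the text are rounded to two decimals, recomputed percentages can differ slightly from the reported ones (for example, 25.0% versus the reported 25.1% for TC). A small W+ relative to W− indicates that most patients' values decreased after treatment.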

Fig. 3: Analysis of differences in TG, TC, HDL, and LDL levels before and after treatment with candidate lipid-lowering drugs based on retrospective clinical data.

a Box plots depicting the changes in LDL, TC, and TG levels in patients before and after treatment with Argatroban. b Box plots depicting the changes in LDL and TC levels in patients before and after treatment with Levoxyl. c Box plots depicting the changes in LDL and TC levels in patients before and after treatment with Oseltamivir. d Box plots depicting the changes in LDL and TC levels in patients before and after treatment with Thiamine. The corresponding sample sizes are provided. Statistical significance was assessed using the Wilcoxon test.

In conclusion, the four potential lipid-lowering agents identified in this study (Argatroban, Levoxyl, Oseltamivir, and Thiamine) showed significant lipid-modulating effects in preliminary validation with clinical data. Argatroban in particular had pronounced effects, reducing LDL-C, TC, and TG levels while elevating HDL-C levels. This observation indicates a high degree of concordance between the machine learning predictions and the observed clinical data. However, these agents differ in potency and target specificity, which should be taken into account when developing personalized therapeutic strategies.

Comprehensive mouse studies validated potential lipid-lowering drug effects

In vivo experiments in mouse models showed that multiple drugs significantly modulated four key lipid-related blood indicators: TG, TC, high-density lipoprotein (HDL), and LDL (Fig. 4a–e). Both Levoxyl and Sulfaphenazole significantly lowered TG (P < 0.05): compared with the control group, TG levels were 28.96% lower in the Levoxyl group and 27.09% lower in the Sulfaphenazole group (Fig. 4a). Argatroban and Promega significantly reduced blood TC levels, by 10.55% (P < 0.05) and 9.87% (P < 0.05), respectively (Fig. 4b). Six drugs (Sorafenib, Prasterone, Alpha-Tocopherol Acetate, Cedazuridine, Regorafenib, and Promega) significantly affected blood HDL levels (Fig. 4c). Prasterone had the most pronounced HDL-elevating effect of all candidate drugs: relative to the control group, mice in the Prasterone group showed a 24.08% increase in HDL levels (P < 0.001). Alpha-Tocopherol Acetate also raised HDL substantially, by 17.81% (P = 0.02), followed by Sorafenib (14.36%, P = 0.03) and Cedazuridine (9.33%, P = 0.03). Mice treated with Regorafenib and Promega both had HDL levels of 1.769 mmol/L, significantly higher than the control group’s 1.593 mmol/L (P < 0.05). Contrary to expectations, mice receiving certain candidate drugs had higher LDL levels than the control group. LDL levels in the Procarbazine Hydrochloride and Dimenhydrinate groups were both 18.73% higher than in the control group (P = 0.01), and the Promega group had an average LDL of 0.292 mmol/L, a 15.19% increase over the control group (P = 0.04).
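
As a worked check of the arithmetic, and assuming the percentages are computed relative to the control-group mean (the text does not give the formula), the relative change and two values implied by the reported numbers are:

```latex
\[
\Delta_{\%} = \frac{\bar{x}_{\text{treated}} - \bar{x}_{\text{control}}}{\bar{x}_{\text{control}}} \times 100
\]
\[
\Delta_{\%}^{\text{HDL}} = \frac{1.769 - 1.593}{1.593} \times 100 \approx 11.0\%
\]
\[
\bar{x}_{\text{control}}^{\text{LDL}} \approx \frac{0.292}{1 + 0.1519} \approx 0.253\ \text{mmol/L}
\]
```

Under this assumption, an HDL level of 1.769 mmol/L corresponds to an increase of roughly 11% over the control, and the Promega LDL result implies a control mean of about 0.253 mmol/L.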

Fig. 4: Differences in TG, TC, HDL, and LDL levels before and after treatment with candidate lipid-lowering drugs in an in vivo mouse model.

a Box plots illustrating TG levels in the experimental group treated with candidate lipid-lowering drugs compared to the PBS control group. b Box plots illustrating TC levels in the experimental group versus the PBS control group. c Box plots illustrating HDL levels in the experimental group compared to the PBS control group. d Box plots illustrating LDL levels in the experimental group versus the PBS control group. e The heatmap summarizes the effects of all candidate drugs on mouse blood levels of TG, TC, HDL, and LDL. Each group has a sample size of at least three, with specific sample sizes indicated by points on the box plots. Bold font indicates drugs that resulted in statistically significant changes in lipid levels. Statistical significance was assessed using the Wilcoxon test. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001.

In conclusion, this study identified a series of drugs with significant regulatory effects on lipid metabolism in mice through in vivo pharmacological evaluation, with Argatroban, Prasterone, Promega, Sorafenib, and Sulfaphenazole demonstrating particularly pronounced improvements in lipid profile indicators.

Molecular docking analysis reveals potential targets for lipid-lowering drug action

In this study, we selected seven drugs (Argatroban, Promega, Sulfaphenazole, Sorafenib, Prasterone, Levoxyl, and Alpha-Tocopherol Acetate) that have previously demonstrated lipid-lowering effects in animal experiments and clinical retrospective studies. These drugs were subjected to molecular docking analysis with 12 key target proteins involved in lipid metabolism. Through the evaluation of binding affinities between drugs and target molecules, we investigated the potential lipid-lowering mechanisms of drugs not primarily designed for lipid reduction. The results, as illustrated in Fig. 5a, demonstrate that Argatroban and Apixaban exhibited strong binding affinities to FX, with binding energies of −7.60 and −9.30 kcal/mol, respectively. Promega and Implitapide displayed comparable binding affinities to the MTP, with binding energies of −7.10 and −6.70 kcal/mol, respectively. Sulfaphenazole and Tegaserod exhibited potent binding affinities to serotonin receptors HTR2A, HTR2B, HTR2C, and 5-HT4R, with binding energies consistently below −7.00 kcal/mol. Notably, Sulfadiazine demonstrated the highest binding affinity to HTR2A and HTR2C receptor subtypes, with binding energies ranging from −8.70 to −8.80 kcal/mol. Sorafenib and cerivastatin exhibited robust binding affinities to HMGR, with binding energies of −7.50 and −7.20 kcal/mol, respectively. Prasterone displayed a notable binding affinity to COX-2, with a binding energy of −8.00 kcal/mol, whereas Etodolac exhibited a binding energy of −6.80 kcal/mol to the same target. Additionally, Prasterone and Etodolac showed strong binding affinities to RXRA, with binding energies of −9.70 and −8.90 kcal/mol, respectively. Levoxyl and D-Thyroxine demonstrated strong binding affinities to TRα, with binding energies ranging from −7.90 to −8.00 kcal/mol, while exhibiting weaker affinities to TRβ, as shown in Fig. 5a. 
Employing molecular docking techniques, we comprehensively evaluated the binding affinities of 12 drug molecules with potential lipid-lowering effects on various lipid metabolism-related targets. Our analysis revealed that multiple drug-target pairs exhibited significant binding affinities, suggesting potential mechanisms for their lipid-lowering actions.
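
To give a rough sense of scale for the binding energies above, a binding free energy can be converted to a dissociation constant through ΔG = RT ln(Kd). The sketch below is illustrative only: docking scores are approximate and are not calibrated free energies, the temperature of 298.15 K is an assumption, and `kd_from_binding_energy` is a hypothetical helper.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal/(mol*K)


def kd_from_binding_energy(delta_g_kcal, temperature_k=298.15):
    """Dissociation constant (mol/L) implied by dG = RT ln(Kd)."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


# Reported docking energies against FX (kcal/mol).
kd_argatroban = kd_from_binding_energy(-7.60)
kd_apixaban = kd_from_binding_energy(-9.30)

# How many times tighter the lower-energy pose would bind, if scores were true dG.
fold_difference = kd_argatroban / kd_apixaban
```

Read this way, a difference of 1.7 kcal/mol corresponds to a roughly 18-fold difference in the implied Kd, which is why differences of a few tenths of a kcal/mol between a candidate and its reference compound are best treated as comparable rather than ranked.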

Fig. 5: Prediction of binding interactions between candidate lipid-lowering drugs and key lipid metabolism-related target proteins based on molecular docking experiments.

a The dumbbell chart summarizes the binding affinity of six candidate lipid-lowering drugs with twelve common lipid metabolism-related target proteins. b Molecular docking visualization predicting the interaction between Argatroban and Coagulation Factor X (FX). c Molecular docking visualization predicting the interaction between Prasterone and Retinoic Acid Receptor RXR-alpha (RXRA). d Molecular docking visualization predicting the interaction between Promega and Microsomal Triglyceride Transfer Protein Large Subunit (MTP). e Molecular docking visualization predicting the interaction between Sorafenib and 3-Hydroxy-3-Methylglutaryl-Coenzyme A Reductase (HMGR). f Molecular docking visualization predicting the interaction between Sulfaphenazole and 5-Hydroxytryptamine Receptor 2C (HTR2C). A lower binding energy indicates a stronger binding affinity between the drug molecules and target proteins. Bold font indicates that the binding energy of the drug with the corresponding protein is less than −5 kcal/mol, signifying a strong interaction.

To further elucidate the molecular mechanisms, we conducted comprehensive molecular docking analyses for multiple drugs exhibiting high binding affinity to lipid metabolism-associated target proteins to demonstrate the binding patterns and interactions between drug molecules and target protein sites, including Argatroban with FX, Prasterone with RXRA, Promega with MTP, Sorafenib with HMGR, and Sulfaphenazole with HTR2C (Fig. 5b–f). Molecular docking analyses revealed that Argatroban exhibits high-affinity binding to the active site of FX, establishing multiple crucial interactions with key amino acid residues. In particular, Argatroban establishes hydrophobic interactions with Tyr99A, Phe174A, and Trp215A of FX, simultaneously forming hydrogen bonds with Gln192A and Gly219A of FX, providing additional binding capacity, thereby tightly filling multiple sub-pockets of the thrombin active site and firmly anchoring in the thrombin active center (Fig. 5b). Structural analysis demonstrated that Prasterone establishes extensive hydrophobic interactions with RXRA’s Ile268A, Ala272A, Leu309A, Ile310A, Phe313A, Leu326A, Ile345A, and Leu436A. These comprehensive interactions facilitate Prasterone’s stable accommodation within the RXRA ligand-binding domain. Promega demonstrates dual binding mechanisms, comprising hydrophobic interactions with MTP’s Ile666H, Leu671H, Ala694H, Leu696H, Phe706H, Val727H, Ile761H, Thr776H, and Val778H and hydrogen bonds with MTP’s Gln663H, which synergistically enhance its binding affinity. The binding mode analysis revealed that Sorafenib displays hydrophobic interactions with HMGR’s Leu853B while forming hydrogen bonds with Cys561B, Ser565B, Arg590A, and Ser684A, resulting in tight binding between Sorafenib and HMGR. 
Sulfaphenazole establishes a network of hydrophobic interactions with HTR2C’s Val135A, Ala222A, Phe223A, Phe327A, and Phe328A, while forming hydrogen bonds with Asp134A and Ser138A, contributing to its strong binding affinity to HTR2C.

Enhanced exploration of drug-protein binding patterns through molecular dynamics simulations

Based on the aforementioned research findings, Sorafenib, Sulfaphenazole, Prasterone, Promega, and Argatroban exhibited significant lipid-lowering effects. To elucidate their mechanisms of action, we conducted an in-depth investigation of these five drugs. Initially, we analyzed the root mean square deviation (RMSD) changes of sulfaphenazole-HTR2C, Sorafenib-HMGR, Prasterone-RXRA, Promega-MTP, and Argatroban-FX complexes over a 100-nanosecond molecular dynamics simulation period. The molecular dynamics simulation results revealed that the sulfaphenazole-HTR2C complex exhibited the highest RMSD value, escalating from 0.3 nm to approximately 1.0 nm, suggesting substantial conformational changes during the ligand-receptor binding process (Fig. 6a). In contrast, the remaining four complexes (Sorafenib-HMGR, Prasterone-RXRA, Promega-MTP, and Argatroban-FX) displayed lower RMSD values, predominantly oscillating between 0.1 and 0.3 nm, indicative of high structural stability (Fig. 6a). Collectively, with the exception of sulfaphenazole-HTR2C, all other complexes demonstrated remarkable structural stability. Root mean square fluctuation (RMSF) analysis of the five ligand-protein complexes indicated that the sulfaphenazole-HTR2C complex displayed the most pronounced fluctuation in the vicinity of 5000 atoms, reaching a peak value of approximately 0.8 nm. Conversely, the RMSF values for the remaining complexes were substantially lower, with the majority of fluctuations not exceeding 0.2 nm. These findings suggest that the sulfaphenazole-HTR2C complex exhibits enhanced flexibility in specific regions, whereas the other complexes maintain relative structural rigidity (Fig. 6b). Analysis of the radius of gyration (Rg) changes for the five ligand-protein complexes during molecular dynamics simulations revealed that the Promega-MTP complex exhibited the highest Rg value of approximately 3.5 nm, followed by Sorafenib-HMGR at 2.8 nm, and sulfaphenazole-HTR2C at 2.5 nm. 
Prasterone-RXRA and Argatroban-FX displayed the lowest Rg values, both ~1.7 nm (Fig. 6c). The Rg values for all complexes remained relatively constant throughout the simulation period, suggesting that their global conformations did not undergo substantial alterations (Fig. 6c).
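
The three trajectory descriptors used above (RMSD, RMSF, and Rg) have simple definitions, sketched here in plain Python on toy coordinates. This is not the analysis pipeline used in the study: the functions are hypothetical, the frames are assumed to be already superposed on the reference (no fitting is performed), and the coordinates are invented.

```python
import math


def rmsd(reference, frame):
    """Root mean square deviation between two conformations (same atom order)."""
    squared = sum((a - b) ** 2 for p, q in zip(reference, frame) for a, b in zip(p, q))
    return math.sqrt(squared / len(reference))


def rmsf(trajectory):
    """Per-atom root mean square fluctuation about each atom's mean position."""
    n_frames = len(trajectory)
    result = []
    for i in range(len(trajectory[0])):
        mean = [sum(f[i][k] for f in trajectory) / n_frames for k in range(3)]
        msd = sum(sum((f[i][k] - mean[k]) ** 2 for k in range(3))
                  for f in trajectory) / n_frames
        result.append(math.sqrt(msd))
    return result


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration about the centre of mass."""
    total = sum(masses)
    com = [sum(m * p[k] for m, p in zip(masses, coords)) / total for k in range(3)]
    squared = sum(m * sum((p[k] - com[k]) ** 2 for k in range(3))
                  for m, p in zip(masses, coords))
    return math.sqrt(squared / total)


# Toy coordinates in nm.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
displaced = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.3, 0.0)]
rmsd_value = rmsd(reference, displaced)

# Atom 0 oscillates along x while atom 1 stays fixed.
trajectory = [[(-0.2, 0.0, 0.0), (1.0, 0.0, 0.0)],
              [(0.2, 0.0, 0.0), (1.0, 0.0, 0.0)]]
rmsf_values = rmsf(trajectory)

rg_value = radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [1.0, 1.0])
```

RMSD tracks drift of the whole structure over time, RMSF localizes flexibility to individual atoms or residues, and Rg reflects overall compactness, which is why a larger protein such as MTP shows a larger Rg independent of its stability.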

Fig. 6: Molecular dynamics simulation and analysis of protein-ligand complexes.

a Root Mean Square Deviation (RMSD) of the complex. b Root Mean Square Fluctuation (RMSF) of the complex. c Radius of Gyration (Rg) of the complex. d Solvent Accessible Surface Area (SASA) of the complex. e MMPBSA analysis of the Argatroban-FX complex. f Total Decomposition Contribution (TDC) plot of the Argatroban-FX complex. g Sidechain Decomposition Contribution (SDC) plot of the Argatroban-FX complex.

The solvent-accessible surface area (SASA) analysis of the five ligand-protein complexes shows that the Promega-MTP complex has the largest solvent-exposed area, consistently maintained at approximately 400 nm² (Fig. 6d). The Sorafenib-HMGR and sulfaphenazole-HTR2C complexes have the next highest SASA values, at ~330 nm² and 200 nm², respectively (Fig. 6d). Prasterone-RXRA and Argatroban-FX show smaller solvent contact areas, between 110 and 120 nm² (Fig. 6d). The SASA curves for all complexes are relatively stable, indicating consistent solvent exposure throughout the simulation (Fig. 6d). Furthermore, we performed free energy decomposition analyses for the five protein-ligand systems: Sorafenib-HMGR, sulfaphenazole-HTR2C, Prasterone-RXRA, Promega-MTP, and Argatroban-FX. The decomposition comprised van der Waals energy (VDWAALS), electrostatic energy (EEL), polar solvation energy from the Poisson-Boltzmann calculation (EPB), gas-phase free energy (GGAS), solvation free energy (GSOLV), and total free energy (TOTAL) (Table 1). All systems had negative total free energies, suggesting thermodynamically favorable interactions (Table 1). Notably, the Promega-MTP system had the lowest total free energy (−43.61 kcal/mol), indicating the strongest binding among the complexes (Table 1). VDWAALS contributed substantially to binding in all systems, whereas the contribution of EEL varied across systems (Table 1).

Table 1 Free energy (kcal/mol) decomposition of the system.

Molecular dynamics simulations reveal that Argatroban-FX makes substantial contributions to GGAS, GSOLV, and TOTAL. Within the GGAS component, VDWAALS exhibits a negative value of approximately −50 kcal/mol, while EEL demonstrates a larger negative magnitude of around −100 kcal/mol (Fig. 6e). The GSOLV component comprises EPB, which displays a positive value of approximately 125 kcal/mol, and non-polar solvation-free energy (ENPOLAR), which is marginally positive, approaching zero (Fig. 6e). The TOTAL component analysis indicates that the sum of GGAS exhibits a large negative value of approximately −150 kcal/mol, while the sum of GSOLV is positive, about 120 kcal/mol. Consequently, the final TOTAL is negative, approximately −25 kcal/mol (Fig. 6e). These results suggest that while solvation effects, particularly polar solvation, are detrimental to system stability, gas-phase interactions, notably EEL, contribute more substantially to the system’s stability. The observed negative total energy implies that the molecular system maintains thermodynamic stability under the simulated conditions (Fig. 6e).
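
The relationship between the energy terms described above can be written out explicitly: GGAS is the sum of VDWAALS and EEL, GSOLV is the sum of EPB and ENPOLAR, and TOTAL is the sum of GGAS and GSOLV. The sketch below uses the approximate Argatroban-FX values read from the text, with ENPOLAR taken as 0 because it is described only as close to zero; `mmpbsa_totals` is a hypothetical helper.

```python
def mmpbsa_totals(vdwaals, eel, epb, enpolar):
    """Combine MM/PBSA components (kcal/mol) into gas-phase, solvation and total terms."""
    ggas = vdwaals + eel        # gas-phase interaction energy
    gsolv = epb + enpolar       # polar + non-polar solvation free energy
    return {"GGAS": ggas, "GSOLV": gsolv, "TOTAL": ggas + gsolv}


# Approximate Argatroban-FX values from the text (kcal/mol); ENPOLAR ~ 0 is an assumption.
terms = mmpbsa_totals(vdwaals=-50.0, eel=-100.0, epb=125.0, enpolar=0.0)
```

These rounded inputs reproduce the reported totals of about −150 kcal/mol for GGAS and −25 kcal/mol for TOTAL; the solvation sum comes out at 125 kcal/mol, close to the value of about 120 kcal/mol quoted in the text, with the gap attributable to rounding of the individual terms.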

We further calculated the contribution of individual amino acid residues to the total energy in the Total Decomposition Contribution system for the Argatroban-FX complex. The results revealed that the energy contributions of the majority of residues were relatively small, ranging between −1 and 2 kcal/mol (Fig. 6f). Notably, A:GLY:219 and B:LYS:245 exhibited significant positive energy contributions of ~2.5 kcal/mol and 16 kcal/mol, respectively, indicating their potential to generate unfavorable interactions within the system (Fig. 6f). Conversely, A:ASP:189 displayed a notable negative energy contribution of ~ −3 kcal/mol, suggesting its potential crucial role in stabilizing the system structure or promoting favorable interactions (Fig. 6f). In a similar vein, the Sidechain Decomposition Contribution analysis of the Argatroban-FX complex demonstrated that the energy contributions of most amino acid residues to the total energy were relatively small, ranging from −1 to 2 kcal/mol (Fig. 6g). A:ASP:189 exhibited the most significant negative energy contribution of ~ −3 kcal/mol, strongly suggesting its crucial role in stabilizing the system (Fig. 6g). In contrast, LYS245 presented the largest positive energy contribution of ~16 kcal/mol, indicating its potential to generate unfavorable interactions (Fig. 6g). The GMX-Hbonds analysis of the Argatroban-FX complex primarily revealed hydrogen bonds between residues 215 and 245, elucidating key hydrogen bond interactions in the protein structure. These interactions provide valuable insights into protein stability and function (Fig. 7a). Furthermore, the GMX-HBOND time series analysis of the Argatroban-FX complex demonstrated that the hydrogen bonds formed between LIG245 and multiple residues, including G219 and A190, were highly stable, persisting for more than 50% of the entire simulation process. The hydrogen bond between Y99 and LIG245 exhibited relative stability, albeit with intermittent occurrences (Fig. 7b). 
The interaction between G216 and LIG245 occurred frequently but discontinuously (Fig. 7b). The interaction between K96 and LIG245 showed lower frequency and was predominantly observed in the latter stages of the simulation (Fig. 7b). In the Gibbs free energy landscape of the Argatroban-FX complex, the blue regions denote low-energy states (Fig. 7c), representing the most stable conformations of the complex. We subsequently visualized the molecular interactions within this stable state. The hydrophobic interactions between Argatroban and FX encompassed residues GLN61, TYR99, PHE174, and TRP215, with distances ranging from 3.49 to 3.85 Å, whereas hydrogen bonding interactions involved TYR99 and GLN192, with distances ranging from 2.08 to 2.46 Å. Notably, TYR99 functioned as both a hydrogen bond donor and acceptor, establishing bidirectional interactions with the ligand. GLN192 functioned as a hydrogen bond acceptor. These multiple interactions between Argatroban and FX are likely to contribute substantially to the tight binding observed between the two molecules (Fig. 7c).
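
The stability criterion used for hydrogen bonds above (presence in more than 50% of the simulation) amounts to an occupancy calculation over a per-frame presence series. The sketch below uses invented presence series for two of the residue pairs named in the text; the frame counts, values, and function name are hypothetical.

```python
def hbond_occupancy(present):
    """Fraction of frames in which a hydrogen bond is detected."""
    return sum(1 for flag in present if flag) / len(present)


# Hypothetical per-frame presence flags (10 frames), for illustration only.
series = {
    "G219-LIG245": [True] * 8 + [False] * 2,    # present in most frames
    "K96-LIG245": [False] * 7 + [True] * 3,     # appears only late in the run
}

occupancy = {pair: hbond_occupancy(flags) for pair, flags in series.items()}
stable_pairs = sorted(pair for pair, value in occupancy.items() if value > 0.5)
```

Occupancy alone does not distinguish a bond that is continuously present for part of the run from one that forms and breaks frequently, which is why the timeline view in Fig. 7b is read alongside it.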

Fig. 7: Hydrogen bond analysis and interaction patterns in the Argatroban-FX complex.

a Hydrogen bond occurrence between donor and acceptor residues in the Argatroban-FX complex. b Timeline representation of hydrogen bond formation between different residue pairs in the Argatroban-FX complex. c Gibbs Free Energy Landscape of the Argatroban-FX complex obtained from Principal Component Analysis (PCA). d Interaction plot of the frame corresponding to the lowest energy in the free energy landscape.
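
A free energy landscape of the kind shown in panel c is commonly built by histogramming the trajectory along the first two principal components and applying G = −kT ln(P/Pmax) to each bin. The sketch below follows that common recipe under stated assumptions: the projections are toy values, the bin width and the temperature of 300 K are arbitrary choices, and the function is hypothetical rather than the tool used in the study.

```python
import math
from collections import Counter

KB_KJ = 0.0083144626   # Boltzmann constant times Avogadro's number, kJ/(mol*K)


def free_energy_landscape(pc1, pc2, bin_width=0.5, temperature_k=300.0):
    """Map each occupied (PC1, PC2) bin to G = -kT ln(P / P_max) in kJ/mol."""
    counts = Counter((math.floor(x / bin_width), math.floor(y / bin_width))
                     for x, y in zip(pc1, pc2))
    max_count = max(counts.values())
    kt = KB_KJ * temperature_k
    return {bin_id: -kt * math.log(count / max_count)
            for bin_id, count in counts.items()}


# Toy projections: four frames in one basin, two frames in another.
pc1 = [0.1, 0.2, 0.3, 0.1, 1.2, 1.3]
pc2 = [0.1, 0.1, 0.2, 0.3, 1.1, 1.4]

landscape = free_energy_landscape(pc1, pc2)
lowest_bin = min(landscape, key=landscape.get)
```

The most populated bin is assigned zero free energy and corresponds to the low-energy (blue) region from which a representative frame is taken for the interaction plot.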


