Multiscale topology-enabled structure-to-sequence transformer for protein–ligand interaction predictions





