MISATO: machine learning dataset of protein–ligand complexes for structure-based drug discovery





