Machine learning global atomic representations with Euclidean fast attention

Machine Learning


  • Rupp, M., Tkatchenko, A., Müller, K.-R. & Von Lilienfeld, O. A. Fast and accurate modeling of molecular atomization energies with machine learning. Phys. Rev. Lett. 108, 058301 (2012).

    Article 

    Google Scholar 

  • von Lilienfeld, O. A., Müller, K.-R. & Tkatchenko, A. Exploring chemical compound space with quantum-based machine learning. Nat. Rev. Chem. 4, 347–358 (2020).

    Article 

    Google Scholar 

  • Unke, O. T. et al. Machine learning force fields. Chem. Rev. 121, 10142–10186 (2021).

    Article 

    Google Scholar 

  • Keith, J. A. et al. Combining machine learning and computational chemistry for predictive insights into chemical systems. Chem. Rev. 121, 9816–9872 (2021).

    Article 

    Google Scholar 

  • Fu, X. et al. Forces are not enough: benchmark and critical evaluation for machine learning force fields with molecular simulations. TMLR 2835–8856 (2023).

  • Karplus, M. & McCammon, J. A. Molecular dynamics simulations of biomolecules. Nat. Struct. Mol. Biol. 9, 646–652 (2002).

    Article 

    Google Scholar 

  • Behler, J. & Parrinello, M. Generalized neural-network representation of high-dimensional potential-energy surfaces. Phys. Rev. Lett. 98, 146401 (2007).

    Article 

    Google Scholar 

  • Bartók, A. P., Payne, M. C., Kondor, R. & Csányi, G. Gaussian Approximation Potentials: the accuracy of quantum mechanics, without the electrons. Phys. Rev. Lett. 104, 136403 (2010).

    Article 

    Google Scholar 

  • Schütt, K. T., Sauceda, H. E., Kindermans, P.-J., Tkatchenko, A. & Müller, K.-R. SchNet—a deep learning architecture for molecules and materials. J. Chem. Phys. 148, 241722 (2018).

    Article 

    Google Scholar 

  • Schütt, K. T. et al. Machine Learning Meets Quantum Physics (Springer, 2020).

  • Unke, O. T. et al. Biomolecular dynamics with machine-learned quantum-mechanical force fields trained on diverse chemical fragments. Sci. Adv. 10, eadn4397 (2024).

    Article 

    Google Scholar 

  • Kabylda, A. et al. Molecular simulations with a pretrained neural network and universal pairwise force fields. J. Am. Chem. Soc. 147, 33723–33734 (2025).

    Article 

    Google Scholar 

  • Merchant, A. et al. Scaling deep learning for materials discovery. Nature 624, 80–85 (2023).

    Article 

    Google Scholar 

  • Woods, L. et al. Materials perspective on Casimir and van der Waals interactions. Rev. Mod. Phys. 88, 045003 (2016).

    Article 
    MathSciNet 

    Google Scholar 

  • Hermann, J., DiStasio, R. A. Jr & Tkatchenko, A. First-principles models for van der Waals interactions in molecules and materials: concepts, theory, and applications. Chem. Rev. 117, 4714–4758 (2017).

    Article 

    Google Scholar 

  • Stöhr, M., Van Voorhis, T. & Tkatchenko, A. Theory and practice of modeling van der Waals interactions in electronic-structure calculations. Chem. Soc. Rev. 48, 4118–4154 (2019).

    Article 

    Google Scholar 

  • Vaswani, A. et al. Attention is all you need. In Proc. Advances in Neural Information Processing Systems (Curran Associates, Inc., 2017); https://proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf

  • Dosovitskiy, A. et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. In International Conference on Learning Representations (ICLR, 2021).

  • Veličković, P. et al. Graph attention networks. In International Conference on Learning Representations (ICLR, 2018).

  • von Glehn, I., Spencer, J. S. & Pfau, D. A self-attention ansatz for ab-initio quantum chemistry. In International Conference on Learning Representations (ICLR, 2023).

  • Dao, T., Fu, D., Ermon, S., Rudra, A. & Ré, C. FlashAttention: fast and memory-efficient exact attention with IO-awareness. Adv. Neural Inf. Process. Syst. 35, 16344–16359 (2022).

    Article 

    Google Scholar 

  • Dao, T. Flashattention-2: faster attention with better parallelism and work partitioning. In International Conference on Learning Representations (ICLR, 2024).

  • Tay, Y., Dehghani, M., Bahri, D. & Metzler, D. Efficient transformers: a survey. ACM Comput. Surveys 55, 1–28 (2022).

    Article 

    Google Scholar 

  • Yuan, J. et al. Native sparse attention: Hardware-aligned and natively trainable sparse attention. In Proc. 63rd Annual Meeting of the Association for Computational Linguistics (eds Che, W. et al.) 23078–23097 (Association for Computational Linguistics, 2025).

  • Katharopoulos, A., Vyas, A., Pappas, N. & Fleuret, F. Transformers are RNNs: fast autoregressive transformers with linear attention. In Proc. 37th International Conference on Machine Learning (eds Daumé, H. III & Singh, Aarti) 5156–5165 (PMLR, 2020).

  • Choromanski, K. M. et al. Rethinking attention with performers. In International Conference on Learning Representations (ICLR, 2021).

  • Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O. & Dahl, G. E. Neural message passing for quantum chemistry. In Proc. 34th International Conference on Machine Learning (eds Precup, D. & Teh, Y. W.) 1263–1272 (PMLR, 2017).

  • Zhang, L. et al. End-to-end symmetry preserving inter-atomic potential energy model for finite and extended systems. In Proc. Advances in Neural Information Processing Systems (Curran Associates, Inc., 2018); https://proceedings.neurips.cc/paper_files/paper/2018/file/e2ad76f2326fbc6b56a45a56c59fafdb-Paper.pdf

  • Unke, O. T. et al. SpookyNet: learning force fields with electronic degrees of freedom and nonlocal effects. Nat. Commun. 12, 7273 (2021).

    Article 

    Google Scholar 

  • Batzner, S. et al. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nat. Commun. 13, 2453 (2022).

    Article 

    Google Scholar 

  • Frank, J. T., Unke, O. T., Müller, K.-R. & Chmiela, S. A Euclidean transformer for fast and stable machine learned force fields. Nat. Commun. 15, 6539 (2024).

    Article 

    Google Scholar 

  • Poltavsky, I. et al. Crash testing machine learning force fields for molecules, materials, and interfaces: model analysis in the tea challenge 2023. Chem. Sci. 16, 3720–3737 (2025).

    Article 

    Google Scholar 

  • Poltavsky, I. et al. Crash testing machine learning force fields for molecules, materials, and interfaces: molecular dynamics in the tea challenge 2023. Chem. Sci. 16, 3738–3754 (2025).

    Article 

    Google Scholar 

  • Smith, J. S., Isayev, O. & Roitberg, A. E. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chem. Sci. 8, 3192–3203 (2017).

    Article 

    Google Scholar 

  • Christensen, A. S., Bratholm, L. A., Faber, F. A. & Anatole von Lilienfeld, O. FCHL revisited: faster and more accurate quantum machine learning. J. Chem. Phys. 152, 044107 (2020).

    Article 

    Google Scholar 

  • Musaelian, A. et al. Learning local equivariant representations for large-scale atomistic dynamics. Nat. Commun. 14, 579 (2023).

    Article 

    Google Scholar 

  • Su, J. et al. Roformer: Enhanced transformer with rotary position embedding. Neurocomputing 568, 127063 (2024).

    Article 

    Google Scholar 

  • Ewald, P. P. Die Berechnung optischer und elektrostatischer Gitterpotentiale. Ann. Phys. 369, 253–287 (1921).

    Article 

    Google Scholar 

  • Kosmala, A., Gasteiger, J., Gao, N. & Günnemann, S. Ewald-based long-range message passing for molecular graphs. In Proc. 40th International Conference on Machine Learning 17544–17563 (JMLR.org, 2023).

  • Frank, T., Unke, O. & Müller, K.-R. So3krates: equivariant attention for interactions on arbitrary length-scales in molecular systems. Adv. Neural Inf. Process. Syst. 35, 29400–29413 (2022).

    Article 

    Google Scholar 

  • Unke, O. T. & Maennel, H. E3x: E(3)-equivariant deep learning made easy. Preprint at https://arxiv.org/abs/2401.07595 (2024).

  • Liao, Y.-L. & Smidt, T. Equiformer: equivariant graph attention transformer for 3D atomistic graphs. Preprint at https://arxiv.org/abs/2206.11990 (2022).

  • Pozdnyakov, S. N. et al. Incompleteness of atomic structure representations. Phys. Rev. Lett. 125, 166001 (2020).

    Article 
    MathSciNet 

    Google Scholar 

  • Leman, A. & Weisfeiler, B. A reduction of a graph to a canonical form and an algebra arising during this reduction. Nauchno-Technicheskaya Informatsiya 2, 12–16 (1968).

    Google Scholar 

  • Joshi, C. K., Bodnar, C., Mathis, S. V., Cohen, T. & Lio, P. On the expressive power of geometric graph neural networks. In Proc. 40th International Conference on Machine Learning 15330–15355 (PMLR, 2023).

  • Ko, T. W., Finkler, J. A., Goedecker, S. & Behler, J. A fourth-generation high-dimensional neural network potential with accurate electrostatics including non-local charge transfer. Nat. Commun. 12, 398 (2021).

    Article 

    Google Scholar 

  • Unke, O. T. & Meuwly, M. PhysNet: a neural network for predicting energies, forces, dipole moments, and partial charges. J. Chem. Theory Comput. 15, 3678–3693 (2019).

    Article 

    Google Scholar 

  • Donchev, A. G. et al. Quantum chemical benchmark databases of gold-standard dimer interaction energies. Sci. Data 8, 55 (2021).

    Article 

    Google Scholar 

  • Alon, U. & Yahav, E. On the bottleneck of graph neural networks and its practical implications. In International Conference on Learning Representations (ICLR, 2021).

  • Sauceda, H. E. et al. BIGDML—Towards accurate quantum machine learning force fields for materials. Nat. Commun. 13, 3733 (2022).

    Article 

    Google Scholar 

  • Muhli, H. et al. Machine learning force fields based on local parametrization of dispersion interactions: Application to the phase diagram of C60. Phys. Rev. B 104, 054106 (2021).

    Article 

    Google Scholar 

  • Westermayr, J., Chaudhuri, S., Jeindl, A., Hofmann, O. T. & Maurer, R. J. Long-range dispersion-inclusive machine learning potentials for structure search and optimization of hybrid organic–inorganic interfaces. Digit. Discov. 1, 463–475 (2022).

    Article 

    Google Scholar 

  • Artrith, N., Morawietz, T. & Behler, J. High-dimensional neural-network potentials for multicomponent systems: applications to zinc oxide. Phys. Rev. B 83, 153101 (2011).

    Article 

    Google Scholar 

  • Morawietz, T., Sharma, V. & Behler, J. A neural network potential-energy surface for the water dimer based on environment-dependent atomic energies and charges. J. Chem. Phys. 136, 064103 (2012).

    Article 

    Google Scholar 

  • Grisafi, A. & Ceriotti, M. Incorporating long-range physics in atomic-scale machine learning. J. Chem. Phys. 151, 204105 (2019).

    Article 

    Google Scholar 

  • Pagotto, J., Zhang, J. & Duignan, T. Predicting the properties of salt water using neural network potentials and continuum solvent theory. Preprint at chemRxiv https://doi.org/10.26434/chemrxiv-2022-jndlx (2022).

  • Li, Y. et al. Long-short-range message-passing: a physics-informed framework to capture non-local interaction for scalable molecular dynamics simulation. In International Conference on Learning Representations (ICLR, 2024).

  • Loche, P. et al. Fast and flexible long-range models for atomistic machine learning. J. Chem. Phys. 162, 142501 (2025).

    Article 

    Google Scholar 

  • Rokhlin, V. Rapid solution of integral equations of classical potential theory. J. Comput. Phys. 60, 187–207 (1985).

    Article 
    MathSciNet 

    Google Scholar 

  • King, D. S., Kim, D., Zhong, P. & Cheng, B. Machine learning of charges and long-range interactions from energies and forces. Nat. Commun. 16, 8763 (2025).

    Article 

    Google Scholar 

  • Cheng, B. Latent Ewald summation for machine learning of long-range interactions. npj Comput. Mater. 11, 80 (2025).

    Article 

    Google Scholar 

  • Gori, M., Kurian, P. & Tkatchenko, A. Second quantization of many-body dispersion interactions for chemical and biological systems. Nat. Commun. 14, 8218 (2023).

    Article 

    Google Scholar 

  • Batatia, I., Schaaf, L. L., Csanyi, G., Ortner, C. & Faber, F. A. Equivariant matrix function neural networks. In International Conference on Learning Representations (ICLR, 2024).

  • Wang, Y. et al. Enhancing geometric representations for molecules with equivariant vector-scalar interactive message passing. Nat. Commun. 15, 313 (2024).

    Article 

    Google Scholar 

  • Lebedev, V. I. Quadratures on a sphere. USSR Comput. Math. Math. Phys. 16, 10–24 (1976).

    Article 
    MathSciNet 

    Google Scholar 

  • Unke, O. et al. Se(3)-equivariant prediction of molecular wavefunctions and electronic densities. In Proc. 35th Conference on Neural Information Processing Systems (NeurIPS 2021) (Curran Associates, Inc., 2021); https://proceedings.neurips.cc/paper_files/paper/2021/file/78f1893678afbeaa90b1fa01b9cfb860-Paper.pdf

  • Weiler, M., Geiger, M., Welling, M., Boomsma, W. & Cohen, T. S. 3D steerable CNNs: learning rotationally equivariant features in volumetric data. In Proc. 32nd Conference on Neural Information Processing Systems (NeurIPS 2018) (Curran Associates, Inc., 2018); https://proceedings.neurips.cc/paper_files/paper/2018/file/488e4104520c6aab692863cc1dba45af-Paper.pdf

  • Hendrycks, D. & Gimpel, K. Gaussian error linear units (GELUs). Preprint at https://arxiv.org/abs/1606.08415 (2016).

  • Thomas, N. et al. Tensor field networks: rotation- and translation-equivariant neural networks for 3D point clouds. Preprint at https://arxiv.org/abs/1802.08219 (2018).

  • Bradbury, J. et al. JAX: composable transformations of Python+NumPy programs. GitHub http://github.com/google/jax (2018).

  • Heek, J. et al. Flax: a neural network library and ecosystem for JAX. GitHub http://github.com/google/flax (2020).

  • Harris, C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020).

    Article 

    Google Scholar 

  • Larsen, A. H. et al. The atomic simulation environment—a Python library for working with atoms. J. Phys. Condens. Matter 29, 273002 (2017).

    Article 

    Google Scholar 

  • Elfwing, S., Uchibe, E. & Doya, K. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural networks 107, 3–11 (2018).

    Article 

    Google Scholar 

  • Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1802.08219 (2014).

  • Hessel, M. et al. Optax: composable gradient transformation and optimisation, in jax!. GitHub http://github.com/deepmind/optax (2020).

  • Tancik, M. et al. Fourier features let networks learn high frequency functions in low dimensional domains. Adv. Neural Inf. Process. Syst. 33, 7537–7547 (2020).

    Google Scholar 

  • Eastman, P. et al. Spice, a dataset of drug-like molecules and peptides for training machine learning potentials. Sci. Data 10, 11 (2023).

    Article 

    Google Scholar 

  • Frank, J. T., Chmiela, S., Müller, K.-R. & Unke, O. T. Machine learning global atomic representations with euclidean fast attention. Zenodo https://doi.org/10.5281/zenodo.14750285 (2026).

  • Frank, T. & Engelberger, F. thorben-frank/euclidean_fast_attention: publication. Zenodo https://doi.org/10.5281/zenodo.18171624 (2026).



  • Source link