SINDy-RL for interpretable and efficient model-based reinforcement learning


  • Szeliski, R. Computer vision: algorithms and applications. Springer Nature, (2022).

  • Khurana, D., Koli, A., Khatter, K. & Singh, S. Natural language processing: State of the art, current trends and challenges. Multimed. Tools Appl. 82, 3713–3744 (2023).

  • Kober, J. & Peters, J. Reinforcement learning in robotics: A survey. In Reinforcement Learning, pages 579–610. Springer, (2012).

  • Dutta, S. Reinforcement Learning with TensorFlow: A beginner’s guide to designing self-learning systems with TensorFlow and OpenAI Gym. Packt Publishing Ltd, (2018).

  • Recht, B. A tour of reinforcement learning: The view from continuous control. Annu. Rev. Control, Robot., Auton. Syst. 2, 253–279 (2019).

  • Agarwal, A., Jiang, N., Kakade, S. M. & Sun, W. Reinforcement learning: Theory and algorithms. CS Dept., UW Seattle, Seattle, WA, USA, Tech. Rep, (2019).

  • Van Hasselt, H., Guez, A., & Silver, D. Deep reinforcement learning with double Q-learning. In Proceedings of the AAAI conference on artificial intelligence, volume 30, (2016).

  • Wang, Z. et al. Dueling network architectures for deep reinforcement learning. In International Conference on Machine Learning, pages 1995–2003. PMLR (2016).

  • Qureshi, A. H., Boots, B., & Yip, M. C. Adversarial imitation via variational inverse reinforcement learning. In International Conference on Learning Representations. https://openreview.net/forum?id=HJlmHoR5tQ (2019).

  • Cheng, C.-A., Yan, X., Wagener, N. & Boots, B. Fast policy learning through imitation and reinforcement. In Proceedings of the 34th Conference on Uncertainty in Artificial Intelligence, (2018).

  • Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529 (2015).

  • Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017).

  • Silver, D. et al. A general reinforcement learning algorithm that masters chess, Shogi, and Go through self-play. Science 362, 1140–1144 (2018).

  • Berner, C. et al. Dota 2 with large scale deep reinforcement learning. arXiv preprint arXiv:1912.06680, (2019).

  • Vinyals, O. et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019).

  • Kaufmann, E. et al. Champion-level drone racing using deep reinforcement learning. Nature 620, 982–987 (2023).

  • Degrave, J. et al. Magnetic control of tokamak plasmas through deep reinforcement learning. Nature 602, 414–419 (2022).

  • Popova, M., Isayev, O. & Tropsha, A. Deep reinforcement learning for de novo drug design. Sci. Adv. 4, eaap7885 (2018).

  • Gazzola, M., Hejazialhosseini, B. & Koumoutsakos, P. Reinforcement learning and wavelet-adapted vortex methods for simulations of self-propelled swimmers. SIAM J. Sci. Comput. 36, B622–B639 (2014).

  • Colabrese, S., Gustavsson, K., Celani, A. & Biferale, L. Flow navigation by smart microswimmers via reinforcement learning. Phys. Rev. Lett. 118, 158004 (2017).

  • Pivot, C., Mathelin, L., Cordier, L., Guéniat, F. & Noack, B. R. A continuous reinforcement learning strategy for closed-loop control in fluid dynamics. In 35th AIAA Applied Aerodynamics Conference, page 3566, (2017).

  • Verma, S., Novati, G. & Koumoutsakos, P. Efficient collective swimming by harnessing vortices through deep reinforcement learning. Proc. Natl Acad. Sci. 115, 5849–5854 (2018).

  • Biferale, L., Bonaccorso, F., Buzzicotti, M., Clark Di Leoni, P. & Gustavsson, K. Zermelo’s problem: Optimal point-to-point navigation in 2D turbulent flows using reinforcement learning. Chaos: Interdiscip. J. Nonlinear Sci. 29, 103138 (2019).

  • Novati, G., Mahadevan, L. & Koumoutsakos, P. Controlled gliding and perching through deep-reinforcement-learning. Phys. Rev. Fluids 4, 093902 (2019).

  • Fan, D., Yang, L., Wang, Z., Triantafyllou, M. S. & Karniadakis, G. E. Reinforcement learning for bluff body active flow control in experiments and simulations. Proc. Natl Acad. Sci. 117, 26091–26098 (2020).

  • Rabault, J. & Kuhnle, A. Deep Reinforcement Learning Applied to Active Flow Control, pages 368–390. Cambridge University Press, (2023).

  • Beintema, G., Corbetta, A., Biferale, L., & Toschi, F. Controlling Rayleigh-Bénard convection via reinforcement learning. J. Turbul. 21, 585–605 (2020).

  • Novati, G., de Laroussilhe, H. L. & Koumoutsakos, P. Automating turbulence modelling by multi-agent reinforcement learning. Nat. Mach. Intell. 3, 87–96 (2021).

  • Bae, H. J. & Koumoutsakos, P. Scientific multi-agent reinforcement learning for wall-models of turbulent flows. Nat. Commun. 13, 1443 (2022).

  • Levine, S., Kumar, A., Tucker, G., & Fu, J. Offline reinforcement learning: Tutorial, review, and perspectives on open problems. arXiv preprint arXiv:2005.01643, (2020).

  • Schaul, T., Quan, J., Antonoglou, I., & Silver, D. Prioritized experience replay. Proceedings of the International Conference on Learning Representations, (2016).

  • Rolnick, D., Ahuja, A., Schwarz, J., Lillicrap, T., & Wayne, G. Experience replay for continual learning. Advances in Neural Information Processing Systems, 32, (2019).

  • Andrychowicz, M. et al. Hindsight experience replay. Adv. Neural Inf. Process. Syst. 30, (2017).

  • Zhu, Z., Lin, K., Jain, A. K., & Zhou, J. Transfer learning in deep reinforcement learning: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, (2023).

  • Wang, J. X. et al. Learning to reinforcement learn. In Proceedings of the Annual Meeting of the Cognitive Science Society, vol 39, https://escholarship.org/uc/item/1tn6q2t7 (2016).

  • Finn, C., Abbeel, P., & Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In International Conference on Machine Learning, pages 1126–1135. PMLR, (2017).

  • Sutton, R. S. Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Machine Learning Proceedings 1990, pages 216–224. Elsevier, (1990).

  • Wang, T. et al. Benchmarking model-based reinforcement learning. arXiv preprint arXiv:1907.02057, (2019).

  • Brunton, S. L., Proctor, J. L. & Kutz, J. N. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc. Natl Acad. Sci. 113, 3932–3937 (2016).

  • Kaiser, E., Kutz, J. N. & Brunton, S. L. Sparse identification of nonlinear dynamics for model predictive control in the low-data limit. Proc. R. Soc. A 474, 20180335 (2018).

  • Lore, J. D. et al. Time-dependent SOLPS-ITER simulations of the tokamak plasma boundary for model predictive control using SINDy. Nucl. Fusion 63, 046015 (2023).

  • Farsi, M. & Liu, J. Structured online learning-based control of continuous-time nonlinear systems. IFAC-PapersOnLine 53, 8142–8149 (2020).

  • Arora, R., da Silva, B. C., & Moss, E. Model-based reinforcement learning with SINDy. In Decision Awareness in Reinforcement Learning Workshop at ICML, https://openreview.net/forum?id=3xBZY7LGorK (2022).

  • Fasel, U., Kutz, J. N., Brunton, B. W. & Brunton, S. L. Ensemble-SINDy: Robust sparse model discovery in the low-data, high-noise limit, with active learning and control. Proc. R. Soc. A 478, 20210904 (2022).

  • Tunyasuvunakool, S. et al. dm_control: Software and tasks for continuous control. Softw. Impacts 6, 100022 (2020).

  • Brockman, G. et al. OpenAI Gym. arXiv preprint arXiv:1606.01540, (2016).

  • Lagemann, C. et al. Hydrogym: A reinforcement learning platform for fluid dynamics. In 7th Annual Learning for Dynamics & Control Conference, pages 497–512. PMLR, (2025).

  • Lagemann, C. et al. Hydrogym-GPU: From 2D to 3D benchmark environments for reinforcement learning in fluid flows. In Proceedings of the 35th International Conference on Parallel Computational Fluid Dynamics (ParCFD2024), Bonn, Germany, September (2024).

  • Rudy, S. H., Brunton, S. L., Proctor, J. L., & Kutz, J. N. Data-driven discovery of partial differential equations. Sci. Adv. 3, e1602614 (2017).

  • Schaeffer, H. & McCalla, S. G. Sparse model selection via integral terms. Phys. Rev. E 96, 023302 (2017).

  • Reinbold, P. A. K., Gurevich, D. R. & Grigoriev, R. O. Using noisy or incomplete data to discover models of spatiotemporal dynamics. Phys. Rev. E 101, 010203 (2020).

  • Messenger, D. A. & Bortz, D. M. Weak SINDy for partial differential equations. J. Comput. Phys. 443, 110525 (2021).

  • Messenger, D. A. & Bortz, D. M. Weak SINDy: Galerkin-based data-driven model selection. Multiscale Model. Simul. 19, 1474–1497 (2021).

  • Kaptanoglu, A. A., Callaham, J. L., Hansen, C. J., Aravkin, A., & Brunton, S. L. Promoting global stability in data-driven models of quadratic nonlinear dynamics. Phys. Rev. Fluids 6, 094401 (2021).

  • Forootani, A., Goyal, P., & Benner, P. A robust SINDy approach by combining neural networks and an integral form. arXiv preprint arXiv:2309.07193, (2023).

  • Schroeder, M. Synthesis of low-peak-factor signals and binary sequences with low autocorrelation (corresp.). IEEE Trans. Inf. Theory 16, 85–89 (1970).

  • Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, (2017).

  • Ng, A. Y., Harada, D., & Russell, S. Policy invariance under reward transformations: Theory and application to reward shaping. In ICML, volume 99, pages 278–287. Citeseer, (1999).

  • Arora, S. & Doshi, P. A survey of inverse reinforcement learning: Challenges, methods and progress. Artif. Intell. 297, 103500 (2021).

  • Hussein, A., Gaber, M. M., Elyan, E. & Jayne, C. Imitation learning: A survey of learning methods. ACM Comput. Surv. (CSUR) 50, 1–35 (2017).

  • Mania, H., Guy, A., & Recht, B. Simple random search of static linear policies is competitive for reinforcement learning. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc. (2018).

  • Rajeswaran, A., Lowrey, K., Todorov, E. V., & Kakade, S. M. Towards generalization and simplicity in continuous control. Advances in Neural Information Processing Systems, 30, (2017).

  • Zhu, F., Jing, D., Leve, F. & Ferrari, S. Nn-poly: Approximating common neural networks with Taylor polynomials to imbue dynamical system constraints. Front. Robot. AI 9, 968305 (2022).

  • Clavera, I. et al. Model-based reinforcement learning via meta-policy optimization. In Conference on Robot Learning, pages 617–629. PMLR, (2018).

  • Harris, C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020).

  • Choi, S., Choi, H. & Kang, S. Characteristics of flow over a rotationally oscillating cylinder at low Reynolds number. Phys. Fluids 14, 2767–2777 (2002).

  • Rabault, J., Kuchta, M., Jensen, A., Réglade, U. & Cerardi, N. Artificial neural networks trained through deep reinforcement learning discover control strategies for active flow control. J. Fluid Mech. 865, 281–302 (2019).

  • Liaw, R. et al. Tune: A research platform for distributed model selection and training. arXiv preprint arXiv:1807.05118, https://openreview.net/forum?id=KHlWtBm8BJ; https://docs.ray.io/en/latest/tune/index.html#citing-tune (2018).

  • Weng, J. et al. Tianshou: A highly modularized deep reinforcement learning library. J. Mach. Learn. Res. 23, 12275–12280 (2022).

  • Huang, S. et al. CleanRL: High-quality single-file implementations of deep reinforcement learning algorithms. J. Mach. Learn. Res. 23, 1–18 (2022).

  • Franceschetti, M., Lacoux, C., Ohouens, R., Raffin, A., & Sigaud, O. Making reinforcement learning work on Swimmer. arXiv preprint arXiv:2208.07587, (2022).

  • Jaderberg, M. et al. Population based training of neural networks. arXiv preprint arXiv:1711.09846, (2017).

  • Deng, N., Noack, B. R., Morzyński, M. & Pastur, L. R. Low-order model for successive bifurcations of the fluidic pinball. J. Fluid Mech. 884, A37 (2020).

  • Champion, K., Lusch, B., Kutz, J. N. & Brunton, S. L. Data-driven discovery of coordinates and governing equations. Proc. Natl Acad. Sci. 116, 22445–22451 (2019).

  • Cranmer, M. et al. Discovering symbolic models from deep learning with inductive biases. Adv. Neural Inf. Process. Syst. 33, 17429–17442 (2020).

  • Udrescu, S.-M. & Tegmark, M. AI Feynman: A physics-inspired method for symbolic regression. Sci. Adv. 6, eaay2631 (2020).

  • Kim, S. et al. Integration of neural network-based symbolic regression in deep learning for scientific discovery. IEEE Trans. Neural Netw. Learn. Syst. 32, 4166–4177 (2020).

  • Sahoo, S., Lampert, C., & Martius, G. Learning equations for extrapolation and control. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 4442–4450. PMLR, 10–15 Jul (2018).

  • Both, G.-J., Choudhury, S., Sens, P. & Kusters, R. Deepmod: Deep learning for model discovery in noisy data. J. Comput. Phys. 428, 109985 (2021).

  • Pathak, D., Agrawal, P., Efros, A. A., & Darrell, T. Curiosity-driven exploration by self-supervised prediction. In International conference on machine learning, pages 2778–2787. PMLR, (2017).

  • Loiseau, J.-C. & Brunton, S. L. Constrained sparse Galerkin regression. J. Fluid Mech. 838, 42–67 (2018).

  • Otto, S. E., Zolman, N., Kutz, J. N., & Brunton, S. L. A unified framework to enforce, discover, and promote symmetry in machine learning. arXiv preprint arXiv:2311.00212, (2023).

  • Ahmadi, A. A. & El Khadir, B. Learning dynamical systems with side information. In Learning for Dynamics and Control, pages 718–727. PMLR, (2020).

  • Bramburger, J. J., Dahdah, S., & Forbes, J. R. Synthesizing control laws from data using sum-of-squares optimization. In 2024 IEEE Conference on Control Technology and Applications (CCTA), pages 505–510. IEEE, (2024).

  • Wolf, F., Botteghi, N., Fasel, U., & Manzoni, A. Interpretable and efficient data-driven discovery and control of distributed systems. arXiv preprint arXiv:2411.04098, (2024).

  • Bakarji, J., Champion, K., Kutz, J. N. & Brunton, S. L. Discovering governing equations from partial measurements with deep delay autoencoders. Proc. R. Soc. A 479, 20230422 (2023).

  • Salimans, T., Ho, J., Chen, X., Sidor, S., & Sutskever, I. Evolution strategies as a scalable alternative to reinforcement learning. arXiv preprint arXiv:1703.03864, (2017).

  • Lopez, B. T., Slotine, J.-J. E., & How, J. P. Dynamic tube MPC for nonlinear systems. In 2019 American Control Conference (ACC), pages 1655–1662. IEEE (2019).

  • Zolman, N., Lagemann, C., Fasel, U., Kutz, J. N. & Brunton, S. L. sindy-rl_data (revision d295c18), (2025).

  • Zolman, N., Lagemann, C., Fasel, U., Kutz, J. N. & Brunton, S. L. nzolman/sindy-rl: nat-comms-v1, September (2025).

  • Zolman, N., Lagemann, C., Fasel, U., Kutz, J. N. & Brunton, S. L. nzolman/sindy-rl_3dairfoil: nat-comms-v1, September (2025).
