SINDy-RL for interpretable and efficient model-based reinforcement learning


  • Szeliski, R. Computer vision: algorithms and applications. Springer Nature, (2022).

  • Khurana, D., Koli, A., Khatter, K. & Singh, S. Natural language processing: State of the art, current trends and challenges. Multimed. Tools Appl. 82, 3713–3744 (2023).

  • Kober, J. & Peters, J. Reinforcement learning in robotics: A survey. In Reinforcement Learning, pages 579–610. Springer, (2012).

  • Dutta, S. Reinforcement Learning with TensorFlow: A beginner’s guide to designing self-learning systems with TensorFlow and OpenAI Gym. Packt Publishing Ltd, (2018).

  • Recht, B. A tour of reinforcement learning: The view from continuous control. Annu. Rev. Control, Robot., Auton. Syst. 2, 253–279 (2019).

  • Agarwal, A., Jiang, N., Kakade, S. M. & Sun, W. Reinforcement learning: Theory and algorithms. CS Dept., UW Seattle, Seattle, WA, USA, Tech. Rep, (2019).

  • Van Hasselt, H., Guez, A., & Silver, D. Deep reinforcement learning with double Q-learning. In Proceedings of the AAAI conference on artificial intelligence, volume 30, (2016).

  • Wang, Z. et al. Dueling network architectures for deep reinforcement learning. In International Conference on Machine Learning, pages 1995–2003. PMLR (2016).

  • Qureshi, A. H., Boots, B., & Yip, M. C. Adversarial imitation via variational inverse reinforcement learning. In International Conference on Learning Representations. https://openreview.net/forum?id=HJlmHoR5tQ (2019).

  • Cheng, C.-A., Yan, X., Wagener, N. & Boots, B. Fast policy learning through imitation and reinforcement. In Proceedings of the 34th Conference on Uncertainty in Artificial Intelligence, (2018).

  • Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529 (2015).

  • Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017).

  • Silver, D. et al. A general reinforcement learning algorithm that masters chess, Shogi, and Go through self-play. Science 362, 1140–1144 (2018).

  • Berner, C. et al. Dota 2 with large scale deep reinforcement learning. arXiv preprint arXiv:1912.06680, (2019).

  • Vinyals, O. et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019).

  • Kaufmann, E. et al. Champion-level drone racing using deep reinforcement learning. Nature 620, 982–987 (2023).

  • Degrave, J. et al. Magnetic control of tokamak plasmas through deep reinforcement learning. Nature 602, 414–419 (2022).

  • Popova, M., Isayev, O. & Tropsha, A. Deep reinforcement learning for de novo drug design. Sci. Adv. 4, eaap7885 (2018).

  • Gazzola, M., Hejazialhosseini, B. & Koumoutsakos, P. Reinforcement learning and wavelet-adapted vortex methods for simulations of self-propelled swimmers. SIAM J. Sci. Comput. 36, B622–B639 (2014).

  • Colabrese, S., Gustavsson, K., Celani, A. & Biferale, L. Flow navigation by smart microswimmers via reinforcement learning. Phys. Rev. Lett. 118, 158004 (2017).

  • Pivot, C., Mathelin, L., Cordier, L., Guéniat, F. & Noack, B. R. A continuous reinforcement learning strategy for closed-loop control in fluid dynamics. In 35th AIAA Applied Aerodynamics Conference, page 3566, (2017).

  • Verma, S., Novati, G. & Koumoutsakos, P. Efficient collective swimming by harnessing vortices through deep reinforcement learning. Proc. Natl Acad. Sci. 115, 5849–5854 (2018).

  • Biferale, L., Bonaccorso, F., Buzzicotti, M., Clark Di Leoni, P. & Gustavsson, K. Zermelo’s problem: Optimal point-to-point navigation in 2D turbulent flows using reinforcement learning. Chaos: Interdiscip. J. Nonlinear Sci. 29, 103138 (2019).

  • Novati, G., Mahadevan, L. & Koumoutsakos, P. Controlled gliding and perching through deep-reinforcement-learning. Phys. Rev. Fluids 4, 093902 (2019).

  • Fan, D., Yang, L., Wang, Z., Triantafyllou, M. S. & Karniadakis, G. E. Reinforcement learning for bluff body active flow control in experiments and simulations. Proc. Natl Acad. Sci. 117, 26091–26098 (2020).

  • Rabault, J. & Kuhnle, A. Deep Reinforcement Learning Applied to Active Flow Control, pages 368–390. Cambridge University Press, (2023).

  • Beintema, G., Corbetta, A., Biferale, L., & Toschi, F. Controlling Rayleigh-Bénard convection via reinforcement learning. J. Turbul. 21, 585–605 (2020).

  • Novati, G., de Laroussilhe, H. L. & Koumoutsakos, P. Automating turbulence modelling by multi-agent reinforcement learning. Nat. Mach. Intell. 3, 87–96 (2021).

  • Bae, H. J. & Koumoutsakos, P. Scientific multi-agent reinforcement learning for wall-models of turbulent flows. Nat. Commun. 13, 1443 (2022).

  • Levine, S., Kumar, A., Tucker, G., & Fu, J. Offline reinforcement learning: Tutorial, review, and perspectives on open problems. arXiv preprint arXiv:2005.01643, (2020).

  • Schaul, T., Quan, J., Antonoglou, I., & Silver, D. Prioritized experience replay. Proceedings of the International Conference on Learning Representations, (2016).

  • Rolnick, D., Ahuja, A., Schwarz, J., Lillicrap, T., & Wayne, G. Experience replay for continual learning. Advances in Neural Information Processing Systems, 32, (2019).

  • Andrychowicz, M. et al. Hindsight experience replay. Adv. Neural Inf. Process. Syst. 30, (2017).

  • Zhu, Z., Lin, K., Jain, A. K., & Zhou, J. Transfer learning in deep reinforcement learning: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, (2023).

  • Wang, J. X. et al. Learning to reinforcement learn. In Proceedings of the Annual Meeting of the Cognitive Science Society, vol 39, https://escholarship.org/uc/item/1tn6q2t7 (2016).

  • Finn, C., Abbeel, P., & Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In International Conference on Machine Learning, pages 1126–1135. PMLR, (2017).

  • Sutton, R. S. Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Machine Learning Proceedings 1990, pages 216–224. Elsevier, (1990).

  • Wang, T. et al. Benchmarking model-based reinforcement learning. arXiv preprint arXiv:1907.02057, (2019).

  • Brunton, S. L., Proctor, J. L. & Kutz, J. N. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc. Natl Acad. Sci. 113, 3932–3937 (2016).

  • Kaiser, E., Kutz, J. N. & Brunton, S. L. Sparse identification of nonlinear dynamics for model predictive control in the low-data limit. Proc. R. Soc. A 474, 20180335 (2018).

  • Lore, J. D. et al. Time-dependent SOLPS-ITER simulations of the tokamak plasma boundary for model predictive control using SINDy. Nucl. Fusion 63, 046015 (2023).

  • Farsi, M. & Liu, J. Structured online learning-based control of continuous-time nonlinear systems. IFAC-PapersOnLine 53, 8142–8149 (2020).

  • Arora, R., da Silva, B. C., & Moss, E. Model-based reinforcement learning with SINDy. In Decision Awareness in Reinforcement Learning Workshop at ICML, https://openreview.net/forum?id=3xBZY7LGorK (2022).

  • Fasel, U., Kutz, J. N., Brunton, B. W. & Brunton, S. L. Ensemble-SINDy: Robust sparse model discovery in the low-data, high-noise limit, with active learning and control. Proc. R. Soc. A 478, 20210904 (2022).

  • Tunyasuvunakool, S. et al. dm_control: Software and tasks for continuous control. Softw. Impacts 6, 100022 (2020).

  • Brockman, G. et al. OpenAI Gym. arXiv preprint arXiv:1606.01540, (2016).

  • Lagemann, C. et al. Hydrogym: A reinforcement learning platform for fluid dynamics. In 7th Annual Learning for Dynamics & Control Conference, pages 497–512. PMLR, (2025).

  • Lagemann, C. et al. Hydrogym-GPU: From 2D to 3D benchmark environments for reinforcement learning in fluid flows. In Proceedings of the 35th International Conference on Parallel Computational Fluid Dynamics (ParCFD2024), Bonn, Germany, September (2024).

  • Rudy, S. H., Brunton, S. L., Proctor, J. L., & Kutz, J. N. Data-driven discovery of partial differential equations. Sci. Adv. 3, e1602614 (2017).

  • Schaeffer, H. & McCalla, S. G. Sparse model selection via integral terms. Phys. Rev. E 96, 023302 (2017).

  • Reinbold, P. A. K., Gurevich, D. R. & Grigoriev, R. O. Using noisy or incomplete data to discover models of spatiotemporal dynamics. Phys. Rev. E 101, 010203 (2020).

  • Messenger, D. A. & Bortz, D. M. Weak SINDy for partial differential equations. J. Comput. Phys. 443, 110525 (2021).

  • Messenger, D. A. & Bortz, D. M. Weak SINDy: Galerkin-based data-driven model selection. Multiscale Model. Simul. 19, 1474–1497 (2021).

  • Kaptanoglu, A. A., Callaham, J. L., Hansen, C. J., Aravkin, A., & Brunton, S. L. Promoting global stability in data-driven models of quadratic nonlinear dynamics. Phys. Rev. Fluids 6, 094401 (2021).

  • Forootani, A., Goyal, P., & Benner, P. A robust SINDy approach by combining neural networks and an integral form. arXiv preprint arXiv:2309.07193, (2023).

  • Schroeder, M. Synthesis of low-peak-factor signals and binary sequences with low autocorrelation (corresp.). IEEE Trans. Inf. Theory 16, 85–89 (1970).

  • Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, (2017).

  • Ng, A. Y., Harada, D., & Russell, S. Policy invariance under reward transformations: Theory and application to reward shaping. In ICML, volume 99, pages 278–287. Citeseer, (1999).

  • Arora, S. & Doshi, P. A survey of inverse reinforcement learning: Challenges, methods and progress. Artif. Intell. 297, 103500 (2021).

  • Hussein, A., Gaber, M. M., Elyan, E. & Jayne, C. Imitation learning: A survey of learning methods. ACM Comput. Surv. (CSUR) 50, 1–35 (2017).

  • Mania, H., Guy, A., & Recht, B. Simple random search of static linear policies is competitive for reinforcement learning. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc. (2018).

  • Rajeswaran, A., Lowrey, K., Todorov, E. V., & Kakade, S. M. Towards generalization and simplicity in continuous control. Advances in Neural Information Processing Systems, 30, (2017).

  • Zhu, F., Jing, D., Leve, F. & Ferrari, S. Nn-poly: Approximating common neural networks with Taylor polynomials to imbue dynamical system constraints. Front. Robot. AI 9, 968305 (2022).

  • Clavera, I. et al. Model-based reinforcement learning via meta-policy optimization. In Conference on Robot Learning, pages 617–629. PMLR, (2018).

  • Harris, C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020).

  • Choi, S., Choi, H. & Kang, S. Characteristics of flow over a rotationally oscillating cylinder at low Reynolds number. Phys. Fluids 14, 2767–2777 (2002).

  • Rabault, J., Kuchta, M., Jensen, A., Réglade, U. & Cerardi, N. Artificial neural networks trained through deep reinforcement learning discover control strategies for active flow control. J. Fluid Mech. 865, 281–302 (2019).

  • Liaw, R. et al. Tune: A research platform for distributed model selection and training. arXiv preprint arXiv:1807.05118, https://openreview.net/forum?id=KHlWtBm8BJ; https://docs.ray.io/en/latest/tune/index.html#citing-tune (2018).

  • Weng, J. et al. Tianshou: A highly modularized deep reinforcement learning library. J. Mach. Learn. Res. 23, 12275–12280 (2022).

  • Huang, S. et al. CleanRL: High-quality single-file implementations of deep reinforcement learning algorithms. J. Mach. Learn. Res. 23, 1–18 (2022).

  • Franceschetti, M., Lacoux, C., Ohouens, R., Raffin, A., & Sigaud, O. Making reinforcement learning work on Swimmer. arXiv preprint arXiv:2208.07587, (2022).

  • Jaderberg, M. et al. Population based training of neural networks. arXiv preprint arXiv:1711.09846, (2017).

  • Deng, N., Noack, B. R., Morzyński, M. & Pastur, L. R. Low-order model for successive bifurcations of the fluidic pinball. J. Fluid Mech. 884, A37 (2020).

  • Champion, K., Lusch, B., Kutz, J. N. & Brunton, S. L. Data-driven discovery of coordinates and governing equations. Proc. Natl Acad. Sci. 116, 22445–22451 (2019).

  • Cranmer, M. et al. Discovering symbolic models from deep learning with inductive biases. Adv. Neural Inf. Process. Syst. 33, 17429–17442 (2020).

  • Udrescu, S.-M. & Tegmark, M. AI Feynman: A physics-inspired method for symbolic regression. Sci. Adv. 6, eaay2631 (2020).

  • Kim, S. et al. Integration of neural network-based symbolic regression in deep learning for scientific discovery. IEEE Trans. Neural Netw. Learn. Syst. 32, 4166–4177 (2020).

  • Sahoo, S., Lampert, C., & Martius, G. Learning equations for extrapolation and control. In Jennifer Dy and Andreas Krause, editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pages 4442–4450. PMLR, 10–15 Jul (2018).

  • Both, G.-J., Choudhury, S., Sens, P. & Kusters, R. Deepmod: Deep learning for model discovery in noisy data. J. Comput. Phys. 428, 109985 (2021).

  • Pathak, D., Agrawal, P., Efros, A. A., & Darrell, T. Curiosity-driven exploration by self-supervised prediction. In International conference on machine learning, pages 2778–2787. PMLR, (2017).

  • Loiseau, J.-C. & Brunton, S. L. Constrained sparse Galerkin regression. J. Fluid Mech. 838, 42–67 (2018).

  • Otto, S. E., Zolman, N., Kutz, J. N., & Brunton, S. L. A unified framework to enforce, discover, and promote symmetry in machine learning. arXiv preprint arXiv:2311.00212, (2023).

  • Ahmadi, A. A. & El Khadir, B. Learning dynamical systems with side information. In Learning for Dynamics and Control, pages 718–727. PMLR, (2020).

  • Bramburger, J. J., Dahdah, S., & Forbes, J. R. Synthesizing control laws from data using sum-of-squares optimization. In 2024 IEEE Conference on Control Technology and Applications (CCTA), pages 505–510. IEEE, (2024).

  • Wolf, F., Botteghi, N., Fasel, U., & Manzoni, A. Interpretable and efficient data-driven discovery and control of distributed systems. arXiv preprint arXiv:2411.04098, (2024).

  • Bakarji, J., Champion, K., Kutz, J. N. & Brunton, S. L. Discovering governing equations from partial measurements with deep delay autoencoders. Proc. R. Soc. A 479, 20230422 (2023).

  • Salimans, T., Ho, J., Chen, X., Sidor, S., & Sutskever, I. Evolution strategies as a scalable alternative to reinforcement learning. arXiv preprint arXiv:1703.03864, (2017).

  • Lopez, B. T., Slotine, J.-J. E., & How, J. P. Dynamic tube MPC for nonlinear systems. In 2019 American Control Conference (ACC), pages 1655–1662. IEEE (2019).

  • Zolman, N., Lagemann, C., Fasel, U., Kutz, J. N. & Brunton, S. L. sindy-rl_data (revision d295c18), (2025).

  • Zolman, N., Lagemann, C., Fasel, U., Kutz, J. N. & Brunton, S. L. nzolman/sindy-rl: nat-comms-v1, September (2025).

  • Zolman, N., Lagemann, C., Fasel, U., Kutz, J. N. & Brunton, S. L. nzolman/sindy-rl_3dairfoil: nat-comms-v1, September (2025).
