Discovering state-of-the-art reinforcement learning algorithms

Kirsch, L., van Steenkiste, S. & Schmidhuber, J. Improving generalization in meta reinforcement learning using learned objectives. In Proc. International Conference on Learning Representations (ICLR, 2020).

Kirsch, L. et al. Introducing symmetries to black box meta reinforcement learning. In Proc. AAAI Conference on Artificial Intelligence 36, 7202–7210 (Association for the Advancement of Artificial Intelligence, 2022).

Oh, J. et al. Discovering reinforcement learning algorithms. In Proc. Adv. Neural Inf. Process. Syst. 33, 1060–1070 (NeurIPS, 2020).

Xu, Z. et al. Meta-gradient reinforcement learning with an objective discovered online. In Proc. Adv. Neural Inf. Process. Syst. 33, 15254–15264 (NeurIPS, 2020).

Houthooft, R. et al. Evolved policy gradients. In Proc. Adv. Neural Inf. Process. Syst. 31, 5405–5414 (NeurIPS, 2018).

Lu, C. et al. Discovered policy optimisation. In Proc. Adv. Neural Inf. Process. Syst. 35, 16455–16468 (NeurIPS, 2022).

Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).

Schrittwieser, J. et al. Mastering Atari, Go, chess and shogi by planning with a learned model. Nature 588, 604–609 (2020).

Vinyals, O. et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019).

Hafner, D., Pasukonis, J., Ba, J. & Lillicrap, T. Mastering diverse control tasks through world models. Nature 640, 647–653 (2025).

Fawzi, A. et al. Discovering faster matrix multiplication algorithms with reinforcement learning. Nature 610, 47–53 (2022).

Degrave, J. et al. Magnetic control of tokamak plasmas through deep reinforcement learning. Nature 602, 414–419 (2022).

Xu, Z., van Hasselt, H. P. & Silver, D. Meta-gradient reinforcement learning. In Proc. Adv. Neural Inf. Process. Syst. 31, 2402–2413 (NeurIPS, 2018).

Zahavy, T. et al. A self-tuning actor-critic algorithm. In Proc. Adv. Neural Inf. Process. Syst. 33, 20913–20924 (NeurIPS, 2020).

Jackson, M. T. et al. Discovering general reinforcement learning algorithms with adversarial environment design. In Proc. Adv. Neural Inf. Process. Syst. 36, 79980–79998 (NeurIPS, 2023).

Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction (MIT Press, 2018).

Watkins, C. J. & Dayan, P. Q-learning. Mach. Learn. 8, 279–292 (1992).

Schulman, J., Wolski, F., Dhariwal, P., Radford, A. & Klimov, O. Proximal policy optimization algorithms. Preprint at https://arxiv.org/abs/1707.06347 (2017).

Jaderberg, M. et al. Reinforcement learning with unsupervised auxiliary tasks. In Proc. International Conference on Learning Representations (ICLR, 2017).

Barreto, A. et al. Successor features for transfer in reinforcement learning. In Proc. Adv. Neural Inf. Process. Syst. 30, 4055–4065 (NeurIPS, 2017).

Bellemare, M. G., Dabney, W. & Munos, R. A distributional perspective on reinforcement learning. In Proc. International Conference on Machine Learning 449–458 (PMLR, 2017).

Bellemare, M. G., Naddaf, Y., Veness, J. & Bowling, M. The Arcade Learning Environment: an evaluation platform for general agents. J. Artif. Intell. Res. 47, 253–279 (2013).

Cobbe, K., Hesse, C., Hilton, J. & Schulman, J. Leveraging procedural generation to benchmark reinforcement learning. In Proc. International Conference on Machine Learning 2048–2056 (PMLR, 2020).

Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).

Veeriah, V. et al. Discovery of useful questions as auxiliary tasks. In Proc. Adv. Neural Inf. Process. Syst. 32, 9306–9317 (NeurIPS, 2019).

Munos, R., Stepleton, T., Harutyunyan, A. & Bellemare, M. Safe and efficient off-policy reinforcement learning. In Proc. Adv. Neural Inf. Process. Syst. 29, 1054–1062 (NeurIPS, 2016).

Finn, C., Abbeel, P. & Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proc. International Conference on Machine Learning 70, 1126–1135 (PMLR, 2017).

Mnih, V. et al. Asynchronous methods for deep reinforcement learning. In Proc. International Conference on Machine Learning 48, 1928–1937 (PMLR, 2016).

Agarwal, R., Schwarzer, M., Castro, P. S., Courville, A. C. & Bellemare, M. Deep reinforcement learning at the edge of the statistical precipice. In Proc. Adv. Neural Inf. Process. Syst. 34, 29304–29320 (NeurIPS, 2021).

Kapturowski, S. et al. Human-level Atari 200x faster. In Proc. International Conference on Learning Representations (ICLR, 2023).

Hafner, D. Benchmarking the spectrum of agent capabilities. In Proc. International Conference on Learning Representations (ICLR, 2022).

Küttler, H. et al. The NetHack learning environment. In Proc. Adv. Neural Inf. Process. Syst. 33, 7671–7684 (NeurIPS, 2020).

Hambro, E. et al. Insights from the NeurIPS 2021 NetHack Challenge. In Proc. NeurIPS 2021 Competitions and Demonstrations Track 41–52 (PMLR, 2022).

Espeholt, L. et al. IMPALA: scalable distributed deep-RL with importance weighted actor-learner architectures. In Proc. International Conference on Learning Representations (ICLR, 2018).

Beattie, C. et al. DeepMind Lab. Preprint at https://arxiv.org/abs/1612.03801 (2016).

Racanière, S. et al. Imagination-augmented agents for deep reinforcement learning. In Proc. Adv. Neural Inf. Process. Syst. 30, 5690–5701 (NeurIPS, 2017).

Schmidhuber, J. Evolutionary Principles in Self-referential Learning, or on Learning How to Learn: the Meta-meta-… Hook. PhD thesis, Technische Univ. München (1987).

Schmidhuber, J. A possibility for implementing curiosity and boredom in model-building neural controllers. In Proc. International Conference on Simulation of Adaptive Behavior: From Animals to Animats 222–227 (MIT Press, 1991).

Schmidhuber, J., Zhao, J. & Wiering, M. Simple Principles of Metalearning. Report No. IDSIA-69-96 (Istituto Dalle Molle di Studi sull'Intelligenza Artificiale, 1996).

Thrun, S. & Pratt, L. Learning to Learn: Introduction and Overview 3–17 (Springer, 1998).

Pan, S. J. & Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22, 1345–1359 (2009).

Parisi, G. I., Kemker, R., Part, J. L., Kanan, C. & Wermter, S. Continual lifelong learning with neural networks: a review. Neural Netw. 113, 54–71 (2019).

Caruana, R. Multitask learning. Mach. Learn. 28, 41–75 (1997).

Feurer, M. & Hutter, F. Hyperparameter Optimization 3–33 (Springer, 2019).

Yao, Q. et al. Taking human out of learning applications: a survey on automated machine learning. Preprint at https://www.arxiv.org/abs/1810.13306v3 (2018).

Storck, J. et al. Reinforcement driven information acquisition in non-deterministic environments. In Proc. International Conference on Artificial Neural Networks 2, 159–164 (ICANN, 1995).

Duan, Y. et al. RL²: fast reinforcement learning via slow reinforcement learning. Preprint at https://arxiv.org/abs/1611.02779 (2016).

Niv, Y., Joel, D., Meilijson, I. & Ruppin, E. Evolution of reinforcement learning in uncertain environments: a simple explanation for complex foraging behaviors. Adapt. Behav. 10, 5–24 (2002).

Xiong, Z., Zintgraf, L., Beck, J., Vuorio, R. & Whiteson, S. On the practical consistency of meta-reinforcement learning algorithms. Preprint at https://arxiv.org/abs/2112.00478 (2021).

Sutton, R. S. & Tanner, B. Temporal-difference networks. In Proc. Adv. Neural Inf. Process. Syst. 17, 1377–1384 (NeurIPS, 2004).

Mnih, V., Kavukcuoglu, K., Silver, D. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).

Cobbe, K., Hilton, J., Klimov, O. & Schulman, J. Phasic policy gradient. In Proc. International Conference on Machine Learning 139, 2020–2027 (PMLR, 2021).

Hessel, M. et al. Rainbow: combining improvements in deep reinforcement learning. In Proc. AAAI Conference on Artificial Intelligence 32, 3215–3222 (Association for the Advancement of Artificial Intelligence, 2018).

Sutton, R. S. Learning to predict by the methods of temporal differences. Mach. Learn. 3, 9–44 (1988).

Bradbury, J. et al. JAX: composable transformations of Python+NumPy programs. http://github.com/jax-ml/jax (2018).

DeepMind et al. The DeepMind JAX Ecosystem. GitHub http://github.com/google-deepmind (2020).

Jouppi, N. P. et al. In-datacenter performance analysis of a tensor processing unit. In Proc. Annual International Symposium on Computer Architecture 1–12 (ISCA, 2017).

Hessel, M. et al. Podracer architectures for scalable reinforcement learning. Preprint at https://arxiv.org/abs/2104.06272 (2021).

Kemaev, I., Calian, D. A., Zintgraf, L. M., Farquhar, G. & van Hasselt, H. Scalable meta-learning via mixed-mode differentiation. In Proc. International Conference on Machine Learning 267, 29687–19605 (PMLR, 2025).
