Discovering state-of-the-art reinforcement learning algorithms

Machine Learning


  • Kirsch, L., van Steenkiste, S. & Schmidhuber, J. Improving generalization in meta reinforcement learning using learned objectives. In Proc. International Conference on Learning Representations (ICLR, 2020).

  • Kirsch, L. et al. Introducing symmetries to black box meta reinforcement learning. In Proc. AAAI Conference on Artificial Intelligence 36, 7202–7210 (Association for the Advancement of Artificial Intelligence, 2022).

  • Oh, J. et al. Discovering reinforcement learning algorithms. In Proc. Advances in Neural Information Processing Systems 33, 1060–1070 (NeurIPS, 2020).

  • Xu, Z. et al. Meta-gradient reinforcement learning with an objective discovered online. In Proc. Advances in Neural Information Processing Systems 33, 15254–15264 (NeurIPS, 2020).

  • Houthooft, R. et al. Evolved policy gradients. In Proc. Advances in Neural Information Processing Systems 31, 5405–5414 (NeurIPS, 2018).

  • Lu, C. et al. Discovered policy optimisation. In Proc. Advances in Neural Information Processing Systems 35, 16455–16468 (NeurIPS, 2022).

  • Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484–489 (2016).

  • Schrittwieser, J. et al. Mastering Atari, Go, chess and shogi by planning with a learned model. Nature 588, 604–609 (2020).

  • Vinyals, O. et al. Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature 575, 350–354 (2019).

  • Hafner, D., Pasukonis, J., Ba, J. & Lillicrap, T. Mastering diverse control tasks through world models. Nature 640, 647–653 (2025).

  • Fawzi, A. et al. Discovering faster matrix multiplication algorithms with reinforcement learning. Nature 610, 47–53 (2022).

  • Degrave, J. et al. Magnetic control of tokamak plasmas through deep reinforcement learning. Nature 602, 414–419 (2022).

  • Xu, Z., van Hasselt, H. P. & Silver, D. Meta-gradient reinforcement learning. In Proc. Advances in Neural Information Processing Systems 31, 2402–2413 (NeurIPS, 2018).

  • Zahavy, T. et al. A self-tuning actor-critic algorithm. In Proc. Advances in Neural Information Processing Systems 33, 20913–20924 (NeurIPS, 2020).

  • Jackson, M. T. et al. Discovering general reinforcement learning algorithms with adversarial environment design. In Proc. Advances in Neural Information Processing Systems 36, 79980–79998 (NeurIPS, 2023).

  • Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction (MIT Press, 2018).

  • Watkins, C. J. & Dayan, P. Q-learning. Mach. Learn. 8, 279–292 (1992).

  • Schulman, J., Wolski, F., Dhariwal, P., Radford, A. & Klimov, O. Proximal policy optimization algorithms. Preprint at https://arxiv.org/abs/1707.06347 (2017).

  • Jaderberg, M. et al. Reinforcement learning with unsupervised auxiliary tasks. In Proc. International Conference on Learning Representations (ICLR, 2017).

  • Barreto, A. et al. Successor features for transfer in reinforcement learning. In Proc. Advances in Neural Information Processing Systems 30, 4055–4065 (NeurIPS, 2017).

  • Bellemare, M. G., Dabney, W. & Munos, R. A distributional perspective on reinforcement learning. In Proc. International Conference on Machine Learning 449–458 (PMLR, 2017).

  • Bellemare, M. G., Naddaf, Y., Veness, J. & Bowling, M. The Arcade Learning Environment: an evaluation platform for general agents. J. Artif. Intell. Res. 47, 253–279 (2013).

  • Cobbe, K., Hesse, C., Hilton, J. & Schulman, J. Leveraging procedural generation to benchmark reinforcement learning. In Proc. International Conference on Machine Learning 2048–2056 (PMLR, 2020).

  • Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).

  • Veeriah, V. et al. Discovery of useful questions as auxiliary tasks. In Proc. Advances in Neural Information Processing Systems 32, 9306–9317 (NeurIPS, 2019).

  • Munos, R., Stepleton, T., Harutyunyan, A. & Bellemare, M. Safe and efficient off-policy reinforcement learning. In Proc. Advances in Neural Information Processing Systems 29, 1054–1062 (NeurIPS, 2016).

  • Finn, C., Abbeel, P. & Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proc. International Conference on Machine Learning 70, 1126–1135 (PMLR, 2017).

  • Mnih, V. et al. Asynchronous methods for deep reinforcement learning. In Proc. International Conference on Machine Learning 48, 1928–1937 (PMLR, 2016).

  • Agarwal, R., Schwarzer, M., Castro, P. S., Courville, A. C. & Bellemare, M. Deep reinforcement learning at the edge of the statistical precipice. In Proc. Advances in Neural Information Processing Systems 34, 29304–29320 (NeurIPS, 2021).

  • Kapturowski, S. et al. Human-level Atari 200x faster. In Proc. International Conference on Learning Representations (ICLR, 2023).

  • Hafner, D. Benchmarking the spectrum of agent capabilities. In Proc. International Conference on Learning Representations (ICLR, 2022).

  • Küttler, H. et al. The NetHack Learning Environment. In Proc. Advances in Neural Information Processing Systems 33, 7671–7684 (NeurIPS, 2020).

  • Hambro, E. et al. Insights from the NeurIPS 2021 NetHack Challenge. In Proc. NeurIPS 2021 Competitions and Demonstrations Track 41–52 (PMLR, 2022).

  • Espeholt, L. et al. IMPALA: scalable distributed deep-RL with importance weighted actor-learner architectures. In Proc. International Conference on Learning Representations (ICLR, 2018).

  • Beattie, C. et al. DeepMind Lab. Preprint at https://arxiv.org/abs/1612.03801 (2016).

  • Racanière, S. et al. Imagination-augmented agents for deep reinforcement learning. In Proc. Advances in Neural Information Processing Systems 30, 5690–5701 (NeurIPS, 2017).

  • Schmidhuber, J. Evolutionary Principles in Self-referential Learning, or on Learning How to Learn: The Meta-meta-… Hook. PhD thesis, Technical University of Munich (1987).

  • Schmidhuber, J. A possibility for implementing curiosity and boredom in model-building neural controllers. In Proc. International Conference on Simulation of Adaptive Behavior: From Animals to Animats 222–227 (MIT Press, 1991).

  • Schmidhuber, J., Zhao, J. & Wiering, M. Simple Principles of Metalearning. Report No. IDSIA-69-96 (Istituto Dalle Molle Di Studi Sull Intelligenza Artificiale, 1996).

  • Thrun, S. & Pratt, L. Learning to Learn: Introduction and Overview 3–17 (Springer, 1998).

  • Pan, S. J. & Yang, Q. A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22, 1345–1359 (2009).

  • Parisi, G. I., Kemker, R., Part, J. L., Kanan, C. & Wermter, S. Continual lifelong learning with neural networks: a review. Neural Netw. 113, 54–71 (2019).

  • Caruana, R. Multitask learning. Mach. Learn. 28, 41–75 (1997).

  • Feurer, M. & Hutter, F. Hyperparameter Optimization 3–33 (Springer, 2019).

  • Yao, Q. et al. Taking human out of learning applications: a survey on automated machine learning. Preprint at https://www.arxiv.org/abs/1810.13306v3 (2018).

  • Storck, J. et al. Reinforcement driven information acquisition in non-deterministic environments. In Proc. International Conference on Artificial Neural Networks 2, 159–164 (ICANN, 1995).

  • Duan, Y. et al. RL²: fast reinforcement learning via slow reinforcement learning. Preprint at https://arxiv.org/abs/1611.02779 (2016).

  • Niv, Y., Joel, D., Meilijson, I. & Ruppin, E. Evolution of reinforcement learning in uncertain environments: a simple explanation for complex foraging behaviors. Adapt. Behav. 10, 5–24 (2002).

  • Xiong, Z., Zintgraf, L., Beck, J., Vuorio, R. & Whiteson, S. On the practical consistency of meta-reinforcement learning algorithms. Preprint at https://arxiv.org/abs/2112.00478 (2021).

  • Sutton, R. S. & Tanner, B. Temporal-difference networks. In Proc. Advances in Neural Information Processing Systems 17, 1377–1384 (NeurIPS, 2004).

  • Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).

  • Cobbe, K., Hilton, J., Klimov, O. & Schulman, J. Phasic policy gradient. In Proc. International Conference on Machine Learning 139, 2020–2027 (PMLR, 2021).

  • Hessel, M. et al. Rainbow: combining improvements in deep reinforcement learning. In Proc. AAAI Conference on Artificial Intelligence 32, 3215–3222 (Association for the Advancement of Artificial Intelligence, 2018).

  • Sutton, R. S. Learning to predict by the methods of temporal differences. Mach. Learn. 3, 9–44 (1988).

  • Bradbury, J. et al. JAX: composable transformations of Python+NumPy programs. GitHub http://github.com/jax-ml/jax (2018).

  • DeepMind et al. The DeepMind JAX Ecosystem. GitHub http://github.com/google-deepmind (2020).

  • Jouppi, N. P. et al. In-datacenter performance analysis of a tensor processing unit. In Proc. Annual International Symposium on Computer Architecture 1–12 (ISCA, 2017).

  • Hessel, M. et al. Podracer architectures for scalable reinforcement learning. Preprint at https://arxiv.org/abs/2104.06272 (2021).

  • Kemaev, I., Calian, D. A., Zintgraf, L. M., Farquhar, G. & van Hasselt, H. Scalable meta-learning via mixed-mode differentiation. In Proc. International Conference on Machine Learning 267, 29687–19605 (PMLR, 2025).


