Maximum diffusion reinforcement learning | Nature Machine Intelligence

  • Degrave, J. et al. Magnetic control of tokamak plasmas through deep reinforcement learning. Nature 602, 414–419 (2022).

  • Won, D.-O., Müller, K.-R. & Lee, S.-W. An adaptive deep reinforcement learning framework enables curling robots with human-like performance in real-world conditions. Sci. Robot. 5, eabb9764 (2020).

  • Irpan, A. Deep reinforcement learning doesn’t work yet. Sorta Insightful www.alexirpan.com/2018/02/14/rl-hard.html (2018).

  • Henderson, P. et al. Deep reinforcement learning that matters. In Proc. 32nd AAAI Conference on Artificial Intelligence (eds McIlraith, S. & Weinberger, K.) 3207–3214 (AAAI, 2018).

  • Ibarz, J. et al. How to train your robot with deep reinforcement learning: lessons we have learned. Int. J. Rob. Res. 40, 698–721 (2021).

  • Lillicrap, T. P. et al. Continuous control with deep reinforcement learning. In Proc. 4th International Conference on Learning Representations (ICLR, 2016).

  • Haarnoja, T., Zhou, A., Abbeel, P. & Levine, S. Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proc. 35th International Conference on Machine Learning (eds Dy, J. & Krause, A.) 1861–1870 (PMLR, 2018).

  • Plappert, M. et al. Parameter space noise for exploration. In Proc. 6th International Conference on Learning Representations (ICLR, 2018).

  • Lin, L.-J. Self-improving reactive agents based on reinforcement learning, planning and teaching. Mach. Learn. 8, 293–321 (1992).

  • Schaul, T., Quan, J., Antonoglou, I. & Silver, D. Prioritized experience replay. In Proc. 4th International Conference on Learning Representations (ICLR, 2016).

  • Andrychowicz, M. et al. Hindsight experience replay. In Proc. Advances in Neural Information Processing Systems 30 (eds Guyon, I. et al.) 5049–5059 (Curran Associates, 2017).

  • Zhang, S. & Sutton, R. S. A deeper look at experience replay. Preprint at https://arxiv.org/abs/1712.01275 (2017).

  • Wang, Z. et al. Sample efficient actor-critic with experience replay. In Proc. 5th International Conference on Learning Representations (ICLR, 2017).

  • Hessel, M. et al. Rainbow: combining improvements in deep reinforcement learning. In Proc. 32nd AAAI Conference on Artificial Intelligence (eds McIlraith, S. & Weinberger, K.) 3215–3222 (AAAI, 2018).

  • Fedus, W. et al. Revisiting fundamentals of experience replay. In Proc. 37th International Conference on Machine Learning (eds Daumé III, H. & Singh, A.) 3061–3071 (JMLR.org, 2020).

  • Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).

  • Ziebart, B. D., Maas, A. L., Bagnell, J. A. & Dey, A. K. Maximum entropy inverse reinforcement learning. In Proc. 23rd AAAI Conference on Artificial Intelligence (ed. Cohn, A.) 1433–1438 (AAAI, 2008).

  • Ziebart, B. D., Bagnell, J. A. & Dey, A. K. Modeling interaction via the principle of maximum causal entropy. In Proc. 27th International Conference on Machine Learning (eds Fürnkranz, J. & Joachims, T.) 1255–1262 (Omnipress, 2010).

  • Ziebart, B. D. Modeling Purposeful Adaptive Behavior with the Principle of Maximum Causal Entropy. PhD thesis, Carnegie Mellon Univ. (2010).

  • Todorov, E. Efficient computation of optimal actions. Proc. Natl Acad. Sci. USA 106, 11478–11483 (2009).

  • Toussaint, M. Robot trajectory optimization using approximate inference. In Proc. 26th International Conference on Machine Learning (eds Bottou, L. & Littman, M.) 1049–1056 (ACM, 2009).

  • Rawlik, K., Toussaint, M. & Vijayakumar, S. On stochastic optimal control and reinforcement learning by approximate inference. In Proc. Robotics: Science and Systems VIII (eds Roy, N. et al.) 353–361 (MIT, 2012).

  • Levine, S. & Koltun, V. Guided policy search. In Proc. 30th International Conference on Machine Learning (eds Dasgupta, S. & McAllester, D.) 1–9 (JMLR.org, 2013).

  • Haarnoja, T., Tang, H., Abbeel, P. & Levine, S. Reinforcement learning with deep energy-based policies. In Proc. 34th International Conference on Machine Learning (eds Precup, D. & Teh, Y. W.) 1352–1361 (JMLR.org, 2017).

  • Haarnoja, T. et al. Learning to walk via deep reinforcement learning. In Proc. Robotics: Science and Systems XV (eds Bicchi, A. et al.) (RSS, 2019).

  • Eysenbach, B. & Levine, S. Maximum entropy RL (provably) solves some robust RL problems. In Proc. 10th International Conference on Learning Representations (ICLR, 2022).

  • Chen, M. et al. Top-K off-policy correction for a REINFORCE recommender system. In Proc. 12th ACM International Conference on Web Search and Data Mining (eds Bennett, P. N. & Lerman, K.) 456–464 (ACM, 2019).

  • Afsar, M. M., Crump, T. & Far, B. Reinforcement learning based recommender systems: a survey. ACM Comput. Surv. 55, 1–38 (2022).

  • Chen, X., Yao, L., McAuley, J., Zhou, G. & Wang, X. Deep reinforcement learning in recommender systems: a survey and new perspectives. Knowl. Based Syst. 264, 110335 (2023).

  • Sontag, E. D. Mathematical Control Theory: Deterministic Finite Dimensional Systems (Springer, 2013).

  • Hespanha, J. P. Linear Systems Theory 2nd edn (Princeton Univ. Press, 2018).

  • Mitra, D. W matrix and the geometry of model equivalence and reduction. Proc. Inst. Electr. Eng. 116, 1101–1106 (1969).

  • Dean, S., Mania, H., Matni, N., Recht, B. & Tu, S. On the sample complexity of the linear quadratic regulator. Found. Comput. Math. 20, 633–679 (2020).

  • Tsiamis, A. & Pappas, G. J. Linear systems can be hard to learn. In Proc. 60th IEEE Conference on Decision and Control (ed. Prandini, M.) 2903–2910 (IEEE, 2021).

  • Tsiamis, A., Ziemann, I. M., Morari, M., Matni, N. & Pappas, G. J. Learning to control linear systems can be hard. In Proc. 35th Conference on Learning Theory (eds Loh, P.-L. & Raginsky, M.) 3820–3857 (PMLR, 2022).

  • Williams, G. et al. Information theoretic MPC for model-based reinforcement learning. In Proc. IEEE International Conference on Robotics and Automation (ed. Nakamura, Y.) 1714–1721 (IEEE, 2017).

  • So, O., Wang, Z. & Theodorou, E. A. Maximum entropy differential dynamic programming. In Proc. IEEE International Conference on Robotics and Automation (ed. Kress-Gazit, H.) 3422–3428 (IEEE, 2022).

  • Thrun, S. B. Efficient Exploration in Reinforcement Learning. Technical report (Carnegie Mellon Univ., 1992).

  • Amin, S., Gomrokchi, M., Satija, H., van Hoof, H. & Precup, D. A survey of exploration methods in reinforcement learning. Preprint at https://arxiv.org/abs/2109.00157 (2021).

  • Jaynes, E. T. Information theory and statistical mechanics. Phys. Rev. 106, 620–630 (1957).

  • Dixit, P. D. et al. Perspective: maximum caliber is a general variational principle for dynamical systems. J. Chem. Phys. 148, 010901 (2018).

  • Chvykov, P. et al. Low rattling: a predictive principle for self-organization in active collectives. Science 371, 90–95 (2021).

  • Kapur, J. N. Maximum Entropy Models in Science and Engineering (Wiley, 1989).

  • Moore, C. C. Ergodic theorem, ergodic theory, and statistical mechanics. Proc. Natl Acad. Sci. USA 112, 1907–1911 (2015).

  • Taylor, A. T., Berrueta, T. A. & Murphey, T. D. Active learning in robotics: a review of control principles. Mechatronics 77, 102576 (2021).

  • Seo, Y. et al. State entropy maximization with random encoders for efficient exploration. In Proc. 38th International Conference on Machine Learning (eds Meila, M. & Zhang, T.) 9443–9454 (PMLR, 2021).

  • Prabhakar, A. & Murphey, T. Mechanical intelligence for learning embodied sensor-object relationships. Nat. Commun. 13, 4108 (2022).

  • Chentanez, N., Barto, A. & Singh, S. Intrinsically motivated reinforcement learning. In Proc. Advances in Neural Information Processing Systems 17 (eds Saul, L. et al.) 1281–1288 (MIT, 2004).

  • Pathak, D., Agrawal, P., Efros, A. A. & Darrell, T. Curiosity-driven exploration by self-supervised prediction. In Proc. 34th International Conference on Machine Learning (eds Precup, D. & Teh, Y. W.) 2778–2787 (JMLR.org, 2017).

  • Taiga, A. A., Fedus, W., Machado, M. C., Courville, A. & Bellemare, M. G. On bonus-based exploration methods in the arcade learning environment. In Proc. 8th International Conference on Learning Representations (ICLR, 2020).

  • Wang, X., Deng, W. & Chen, Y. Ergodic properties of heterogeneous diffusion processes in a potential well. J. Chem. Phys. 150, 164121 (2019).

  • Palmer, R. G. Broken ergodicity. Adv. Phys. 31, 669–735 (1982).

  • Islam, R., Henderson, P., Gomrokchi, M. & Precup, D. Reproducibility of benchmarked deep reinforcement learning tasks for continuous control. Preprint at https://arxiv.org/abs/1708.04133 (2017).

  • Moos, J. et al. Robust reinforcement learning: a review of foundations and recent advances. Mach. Learn. Knowl. Extr. 4, 276–315 (2022).

  • Strehl, A. L., Li, L., Wiewiora, E., Langford, J. & Littman, M. L. PAC model-free reinforcement learning. In Proc. 23rd International Conference on Machine Learning (eds Cohen, W. W. & Moore, A.) 881–888 (ICML, 2006).

  • Strehl, A. L., Li, L. & Littman, M. L. Reinforcement learning in finite MDPs: PAC analysis. J. Mach. Learn. Res. 10, 2413–2444 (2009).

  • Kirk, R., Zhang, A., Grefenstette, E. & Rocktäschel, T. A survey of zero-shot generalisation in deep reinforcement learning. J. Artif. Intell. Res. 76, 201–264 (2023).

  • Oh, J., Singh, S., Lee, H. & Kohli, P. Zero-shot task generalization with multi-task deep reinforcement learning. In Proc. 34th International Conference on Machine Learning (eds Precup, D. & Teh, Y. W.) 2661–2670 (JMLR.org, 2017).

  • Krakauer, J. W., Hadjiosif, A. M., Xu, J., Wong, A. L. & Haith, A. M. Motor learning. Compr. Physiol. 9, 613–663 (2019).

  • Lu, K., Grover, A., Abbeel, P. & Mordatch, I. Reset-free lifelong learning with skill-space planning. In Proc. 9th International Conference on Learning Representations (ICLR, 2021).

  • Chen, A., Sharma, A., Levine, S. & Finn, C. You only live once: single-life reinforcement learning. In Proc. Advances in Neural Information Processing Systems 35 (eds Koyejo, S. et al.) 14784–14797 (NeurIPS, 2022).

  • Ames, A., Grizzle, J. & Tabuada, P. Control barrier function based quadratic programs with application to adaptive cruise control. In Proc. 53rd IEEE Conference on Decision and Control 6271–6278 (IEEE, 2014).

  • Taylor, A., Singletary, A., Yue, Y. & Ames, A. Learning for safety-critical control with control barrier functions. In Proc. 2nd Conference on Learning for Dynamics and Control (eds Bayen, A. et al.) 708–717 (PMLR, 2020).

  • Xiao, W. et al. BarrierNet: differentiable control barrier functions for learning of safe robot control. IEEE Trans. Robot. 39, 2289–2307 (2023).

  • Seung, H. S., Sompolinsky, H. & Tishby, N. Statistical mechanics of learning from examples. Phys. Rev. A 45, 6056–6091 (1992).

  • Chen, C., Murphey, T. D. & MacIver, M. A. Tuning movement for sensing in an uncertain world. eLife 9, e52371 (2020).

  • Song, S. et al. Deep reinforcement learning for modeling human locomotion control in neuromechanical simulation. J. Neuroeng. Rehabil. 18, 126 (2021).

  • Berrueta, T. A., Murphey, T. D. & Truby, R. L. Materializing autonomy in soft robots across scales. Adv. Intell. Syst. 6, 2300111 (2024).

  • Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction (MIT, 2018).

  • Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).

  • Berrueta, T. A., Pinosky, A. & Murphey, T. D. Maximum diffusion reinforcement learning repository. Zenodo https://doi.org/10.5281/zenodo.10723320 (2024).


