
Article Information

  • Title: Bridging the gap between QP-based and MPC-based Reinforcement Learning
  • Authors: Shambhuraj Sawant; Sebastien Gros
  • Journal: IFAC PapersOnLine
  • Print ISSN: 2405-8963
  • Year: 2022
  • Volume: 55
  • Issue: 15
  • Pages: 7-12
  • DOI: 10.1016/j.ifacol.2022.07.600
  • Language: English
  • Publisher: Elsevier
  • Abstract: Reinforcement learning methods typically use Deep Neural Networks (DNNs) to approximate the value functions and policies underlying a Markov Decision Process. Unfortunately, DNN-based RL suffers from a lack of explainability of the resulting policy. In this paper, we instead approximate the policy and value functions using an optimization problem taking the form of a Quadratic Program (QP). We propose simple tools to promote structure in the QP, pushing it to resemble a linear MPC scheme. A generic unstructured QP offers high flexibility for learning, while a QP having the structure of an MPC scheme promotes the explainability of the resulting policy and additionally provides ways for its analysis. The tools we propose allow the trade-off between the former and the latter to be adjusted continuously during learning. We illustrate the workings of our proposed method and the resulting structure using a point-mass task.
  • Keywords: Quadratic Programming; Reinforcement Learning; Model Predictive Control
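The abstract describes approximating the value function and policy by a QP rather than a DNN. As a rough illustration of the idea (not the paper's exact formulation), the sketch below assumes an unconstrained QP action-value Q(s, u) = ½ [s; u]ᵀ H [s; u] with a learned symmetric matrix H; the greedy policy then has the closed form u*(s) = -H_uu⁻¹ H_suᵀ s, i.e. a linear state feedback. The function name `qp_policy`, the block partitioning, and the point-mass numbers are all illustrative assumptions.

```python
import numpy as np

def qp_policy(H, n_s, s):
    """Greedy action of an unconstrained QP action-value function.

    Assumes Q(s, u) = 0.5 * [s; u]^T H [s; u] with H partitioned into
    state/input blocks; H_uu must be positive definite so the minimizer
    over u exists and is unique: u*(s) = -H_uu^{-1} H_su^T s.
    (Illustrative sketch, not the paper's formulation.)
    """
    H_su = H[:n_s, n_s:]          # state-input coupling block
    H_uu = H[n_s:, n_s:]          # input quadratic block
    return -np.linalg.solve(H_uu, H_su.T @ s)

# Hypothetical point-mass setting: state = (position, velocity), scalar input.
n_s = 2
H = np.array([[2.0, 0.5, 1.0],
              [0.5, 2.0, 1.0],
              [1.0, 1.0, 2.0]])   # symmetric, with H_uu = [[2.0]] > 0

s = np.array([1.0, -0.5])
u = qp_policy(H, n_s, s)          # linear feedback in the state
```

A constrained QP (e.g. with input bounds) would instead be solved numerically at each state; the paper's structural tools then push such a QP toward the sparsity pattern of a linear MPC scheme, which is what makes the learned policy amenable to analysis.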