
Article Information

  • Title: Bayesian Inverse Reinforcement Learning Using Expert Trajectories in Multiple Environments
  • Authors: 中田 勇介; 荒井 幸代
  • Journal: 人工知能学会論文誌 (Transactions of the Japanese Society for Artificial Intelligence)
  • Print ISSN: 1346-0714
  • Online ISSN: 1346-8030
  • Year: 2020
  • Volume: 35
  • Issue: 1
  • Pages: 1-10
  • DOI: 10.1527/tjsai.G-J73
  • Publisher: The Japanese Society for Artificial Intelligence
  • Abstract:

    Although the reinforcement learning framework has achieved numerous successes, it requires careful shaping of a reward function that represents the objective of a task. There is a class of tasks for which an expert can demonstrate the optimal behavior but for which it is difficult to design a proper reward function. For such tasks, an inverse reinforcement learning approach is useful because it makes it possible to estimate a reward function from the expert's demonstrations. Most existing inverse reinforcement learning algorithms assume that the expert gives demonstrations in a single environment. However, an expert could also provide demonstrations of the task in other environments that share the same objective. For example, although it is hard to represent the objective of a driving task explicitly, a driver can give demonstrations in multiple situations. In such cases, it is natural to use these demonstrations from multiple environments to estimate the expert's reward function. We formulate this problem as a Bayesian inverse reinforcement learning problem and propose a Markov chain Monte Carlo method for it. Experimental results show that the proposed method quantitatively outperforms existing methods.

    (A minimal code sketch of the multi-environment sampling idea follows the keyword list below.)

  • Keywords: inverse reinforcement learning; reinforcement learning; Bayesian inference; Markov decision processes
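
The abstract's central idea is that demonstrations gathered in several environments can all constrain one shared reward function, whose posterior is then sampled with Markov chain Monte Carlo. The sketch below illustrates that idea only; it is not the authors' implementation. It assumes a tabular MDP, a linear reward over state features, a Boltzmann-rational expert likelihood (as in standard Bayesian IRL), and a standard-normal prior; all function names, shapes, and hyper-parameters are illustrative.

    import numpy as np

    def value_iteration(P, r, gamma=0.95, iters=200):
        # P: transitions with shape (A, S, S); r: state rewards with shape (S,).
        A, S, _ = P.shape
        Q = np.zeros((A, S))
        for _ in range(iters):
            V = Q.max(axis=0)                 # greedy state values
            Q = r[None, :] + gamma * (P @ V)  # Bellman optimality backup
        return Q

    def log_likelihood(w, envs, demos, beta=2.0):
        # Boltzmann-rational expert: pi(a|s) proportional to exp(beta * Q(s, a)).
        # envs: list of (P, Phi) pairs, where Phi (S, d) maps states to features.
        # demos: per-environment lists of trajectories of (state, action) pairs.
        ll = 0.0
        for (P, Phi), trajs in zip(envs, demos):
            r = Phi @ w                       # one linear reward shared by all envs
            Q = beta * value_iteration(P, r)
            Q -= Q.max(axis=0, keepdims=True)  # stabilize the softmax
            logpi = Q - np.log(np.exp(Q).sum(axis=0, keepdims=True))
            ll += sum(logpi[a, s] for traj in trajs for (s, a) in traj)
        return ll

    def mh_sampler(envs, demos, d, n_samples=1000, step=0.1, seed=0):
        # Random-walk Metropolis-Hastings over reward weights w,
        # with a standard-normal prior (an assumption of this sketch).
        rng = np.random.default_rng(seed)
        w = np.zeros(d)
        logp = log_likelihood(w, envs, demos) - 0.5 * (w @ w)
        samples = []
        for _ in range(n_samples):
            w_prop = w + step * rng.standard_normal(d)
            logp_prop = log_likelihood(w_prop, envs, demos) - 0.5 * (w_prop @ w_prop)
            if np.log(rng.random()) < logp_prop - logp:  # MH accept/reject
                w, logp = w_prop, logp_prop
            samples.append(w.copy())
        return np.array(samples)              # posterior samples of reward weights

Pooling the per-environment log-likelihoods inside log_likelihood is the essential multi-environment step: every environment constrains the same weight vector w, which is why demonstrations from multiple situations (as in the driving example) can be more informative than demonstrations from a single environment.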