Article Information

  • Title: TD Algorithm for the Variance of Return and Mean-Variance Reinforcement Learning
  • Authors: Makoto Sato ; Hajime Kimura ; Shigenobu Kobayashi
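  • Journal: 人工知能学会論文誌 (Transactions of the Japanese Society for Artificial Intelligence)
  • Print ISSN: 1346-0714
  • Online ISSN: 1346-8030
  • Year: 2001
  • Volume: 16
  • Issue: 3
  • Pages: 353-362
  • DOI: 10.1527/tjsai.16.353
  • Publisher: The Japanese Society for Artificial Intelligence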
  • Abstract: Estimating the probability distribution of returns enables various sophisticated decision-making schemes for control problems in Markov environments, including risk-sensitive control and efficient exploration of environments. Many reinforcement learning algorithms, however, have relied solely on the expected return. This paper provides a decision-making scheme that uses the mean and variance of the return distribution. It presents a TD algorithm for estimating the variance of return in Markov decision process (MDP) environments and a gradient-based reinforcement learning algorithm for the variance-penalized criterion, which is a typical criterion in risk-avoiding control. Empirical results demonstrate the behavior of the algorithms and the validity of the criterion for risk-avoiding sequential decision tasks.
  • Keywords: reinforcement learning ; Markov decision processes ; variance penalized criteria ; gradient-based learning ; machine maintenance problem ; TD-method
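
The abstract's idea of learning both the mean and the variance of the return by temporal-difference updates can be illustrated with a minimal sketch. This is not necessarily the algorithm from the paper: it uses the standard second-moment identity M(s) = E[r^2 + 2*gamma*r*V(s') + gamma^2*M(s')] and recovers the variance as M(s) - V(s)^2. The two-step toy chain, the function names, the 1/n step size, and the risk weight `kappa` are all assumptions made for illustration.

```python
import random


def td_mean_and_variance(episodes=20000, gamma=0.9, seed=0):
    """Tabular TD(0) estimates of the mean and variance of the return.

    Toy chain (hypothetical): state 0 -> state 1 -> terminal (index 2).
    Reward is -1 or +1 in state 0 and 0 or 2 in state 1, each with
    probability 0.5.
    """
    rng = random.Random(seed)
    V = [0.0, 0.0, 0.0]   # estimated expected return; terminal stays 0
    M = [0.0, 0.0, 0.0]   # estimated second moment of the return
    visits = [0, 0, 0]
    rewards = {0: (-1.0, 1.0), 1: (0.0, 2.0)}

    for _ in range(episodes):
        s = 0
        while s != 2:
            r = rng.choice(rewards[s])
            s_next = s + 1
            visits[s] += 1
            alpha = 1.0 / visits[s]  # decaying step size (sample average)
            # Both targets bootstrap from the current estimates at s_next.
            v_target = r + gamma * V[s_next]
            m_target = r * r + 2 * gamma * r * V[s_next] + gamma ** 2 * M[s_next]
            V[s] += alpha * (v_target - V[s])
            M[s] += alpha * (m_target - M[s])
            s = s_next

    variance = [M[i] - V[i] ** 2 for i in range(2)]
    return V[:2], variance


def variance_penalized(mean, variance, kappa=0.5):
    """Variance-penalized score: mean return minus kappa times variance."""
    return mean - kappa * variance


if __name__ == "__main__":
    means, variances = td_mean_and_variance()
    print("mean:", means, "variance:", variances)
    print("penalized score at state 0:", variance_penalized(means[0], variances[0]))
```

For this toy chain the exact values are V(0) = 0.9 and Var(0) = 1 + 0.81 = 1.81, so the estimates can be checked by hand. A larger `kappa` makes a risk-avoiding learner prefer lower-variance behavior at the cost of expected return.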