Article Information

Title: TD Algorithm for the Variance of Return and Mean-Variance Reinforcement Learning
Authors: Makoto Sato ; Hajime Kimura ; Shigenobu Kobayashi et al.
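Journal: 人工知能学会論文誌 (Transactions of the Japanese Society for Artificial Intelligence)
Print ISSN: 1346-0714
Online ISSN: 1346-8030
Publication year: 2001
Volume: 16
Issue: 3
Pages: 353-362
DOI: 10.1527/tjsai.16.353
Publisher: The Japanese Society for Artificial Intelligence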
Abstract: Estimating the probability distribution of returns enables various sophisticated decision-making schemes for control problems in Markov environments, including risk-sensitive control and efficient exploration of environments. Many reinforcement learning algorithms, however, rely solely on the expected return. This paper provides a decision-making scheme that uses the mean and variance of return distributions. It presents a TD algorithm for estimating the variance of return in MDP (Markov decision process) environments and a gradient-based reinforcement learning algorithm for the variance-penalized criterion, a typical criterion in risk-avoiding control. Empirical results demonstrate the behavior of the algorithms and the validity of the criterion for risk-avoiding sequential decision tasks.
Keywords: reinforcement learning ; Markov decision processes ; variance penalized criteria ; gradient-based learning ; machine maintenance problem ; TD-method