Publisher: The Japanese Society for Artificial Intelligence
Abstract: This paper describes a reinforcement learning framework based on compound returns, called compound reinforcement learning, which maximizes the compound return in returns-based MDPs. We also describe the compound Q-learning algorithm and present experimental results on an illustrative example, a 2-armed bandit.
Keywords: reinforcement learning; value functions; compound returns; Q-learning
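The abstract does not give the update rule, so the following is only a minimal sketch under an assumption: the compound return is taken to be a product of gross returns (1 + R), so maximizing its logarithm reduces to ordinary Q-learning with reward log(1 + R). The bandit payoffs, the function names `pull` and `learn`, and every parameter are hypothetical illustrations, not the paper's experiment. Each pull is treated as a one-step episode, so no bootstrapped term appears.

```python
import math
import random


# Hypothetical 2-armed bandit (payoffs are illustrative, not from the paper).
# Each pull yields a rate of return R, so wealth is multiplied by (1 + R).
def pull(arm: int, rng: random.Random) -> float:
    if arm == 0:
        # Risky arm: +50% or -40% with equal probability.
        # Mean of R is +0.05, but mean of log(1 + R) is about -0.053.
        return 0.5 if rng.random() < 0.5 else -0.4
    # Safe arm: a fixed +1% per pull.
    return 0.01


def learn(compound: bool, steps: int = 50000, epsilon: float = 0.1,
          seed: int = 0) -> list[float]:
    """Epsilon-greedy Q-learning on a one-state bandit.

    compound=True uses reward log(1 + R) (the assumed compound variant);
    compound=False uses the raw rate of return R (ordinary Q-learning).
    A sample-average step size 1/n is used so estimates converge.
    """
    rng = random.Random(seed)
    q = [0.0, 0.0]
    counts = [0, 0]
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(2)           # explore
        else:
            arm = 0 if q[0] >= q[1] else 1   # exploit (ties go to arm 0)
        r = pull(arm, rng)
        reward = math.log1p(r) if compound else r
        counts[arm] += 1
        q[arm] += (reward - q[arm]) / counts[arm]
    return q


q_raw = learn(compound=False)
q_cmp = learn(compound=True)
print("raw Q-learning prefers arm", 0 if q_raw[0] > q_raw[1] else 1)
print("compound variant prefers arm", 0 if q_cmp[0] > q_cmp[1] else 1)
```

Under these made-up payoffs, ordinary Q-learning should favor the risky arm because its arithmetic mean return is higher, while the log-reward variant should favor the safe arm because repeatedly playing the risky one shrinks wealth over time. This contrast is the kind of difference a compound-return objective is meant to capture.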