Journal: Journal of Data Analysis and Information Processing
Print ISSN: 2327-7211
Online ISSN: 2327-7203
Year: 2016
Volume: 04
Issue: 04
Pages: 159-176
DOI: 10.4236/jdaip.2016.44014
Language: English
Publisher: Scientific Research Publishing
Abstract: Double Q-learning has been shown to be effective in reinforcement learning scenarios when the reward system is stochastic. We apply the double-learning idea underlying this algorithm to Sarsa and Expected Sarsa, producing two new algorithms called Double Sarsa and Double Expected Sarsa that are shown to be more robust than their single counterparts when rewards are stochastic. We find that these algorithms add significant stability to the learning process at only a minor computational cost, which leads to higher returns when using an on-policy algorithm. We then use shallow and deep neural networks to approximate the action-value function, and show that Double Sarsa and Double Expected Sarsa are much more stable after convergence and can collect larger rewards than the single versions.
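To illustrate the double-learning idea mentioned in the abstract, below is a minimal sketch of a tabular Double Sarsa episode. It assumes details not given in this record: a Gym-style environment interface, epsilon-greedy action selection over the average of the two value tables, and illustrative hyperparameters (alpha, gamma, epsilon); the actual algorithm in the paper may differ in these particulars.

```python
# Hypothetical sketch of tabular Double Sarsa (assumptions noted above).
import random
from collections import defaultdict

def double_sarsa_episode(env, Q_A, Q_B, n_actions,
                         alpha=0.1, gamma=0.99, epsilon=0.1):
    """Run one episode, updating two action-value tables with double learning."""

    def select_action(state):
        # Explore with probability epsilon; otherwise act greedily with
        # respect to the average of the two tables.
        if random.random() < epsilon:
            return random.randrange(n_actions)
        avg = [(Q_A[state][a] + Q_B[state][a]) / 2.0 for a in range(n_actions)]
        return max(range(n_actions), key=lambda a: avg[a])

    state = env.reset()
    action = select_action(state)
    done = False
    while not done:
        next_state, reward, done = env.step(action)
        next_action = select_action(next_state)
        # Randomly pick which table to update; the *other* table supplies the
        # bootstrapped value, which decouples the two estimates and reduces
        # the bias that a single noisy estimate would introduce.
        if random.random() < 0.5:
            target = reward + (0.0 if done else gamma * Q_B[next_state][next_action])
            Q_A[state][action] += alpha * (target - Q_A[state][action])
        else:
            target = reward + (0.0 if done else gamma * Q_A[next_state][next_action])
            Q_B[state][action] += alpha * (target - Q_B[state][action])
        state, action = next_state, next_action
    return Q_A, Q_B

# Example setup for a small discrete environment (also an assumption):
# n_actions = 4
# Q_A = defaultdict(lambda: [0.0] * n_actions)
# Q_B = defaultdict(lambda: [0.0] * n_actions)
# Q_A, Q_B = double_sarsa_episode(env, Q_A, Q_B, n_actions)
```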