Basic Article Information

Title: An Extention of Profit Sharing to Partially Observable Markov Decision Processes : Proposition of PS-r* and its Evaluation
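Authors: Kazuteru Miyazaki ; Shigenobu Kobayashi
Journal: 人工知能学会論文誌 (Transactions of the Japanese Society for Artificial Intelligence)
Print ISSN: 1346-0714
Online ISSN: 1346-8030
Year: 2003
Volume: 18
Issue: 5
Pages: 286-296
DOI: 10.1527/tjsai.18.286
Publisher: The Japanese Society for Artificial Intelligence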
Abstract: The rationality theorem of Profit Sharing (PS) [Miyazaki 94, Miyazaki 99b] and the Rational Policy Making algorithm (RPM) [Miyazaki 99a] are known to guarantee rationality in a typical class of Partially Observable Markov Decision Processes (POMDPs). In this paper, we focus on the whole class of POMDPs and propose PS-r, an algorithm that combines PS and RPM with random selection. First, we analyze the behavior of PS-r. We derive that the maximum number of steps needed to obtain a reward by PS-r, divided by that of random selection, is $\frac{r\left(1+\frac{M-1}{r}\right)^n}{M^n}$, where $n$ is the maximum number of states that are sensed as the same state due to the agent's sensory limitation and $M$ is the number of actions. Furthermore, we propose PS-r*, which improves the behavior of PS-r. Through numerical examples, we confirm the effectiveness of PS-r*.
Keywords: reinforcement learning ; profit sharing ; rational policy making algorithm ; POMDPs ; theorem