期刊名称:Journal of Automation, Mobile Robotics & Intelligent Systems (JAMRIS)
印刷版ISSN:1897-8649
电子版ISSN:2080-2145
出版年度:2006
卷号:25
页码:17-74
出版社:Industrial Research Inst. for Automation and Measurements, Warsaw
摘要:A decision process in which rewards depend on history rather than merely on the
current state is called a decision process with non-Markovian rewards (NMRDP).
In decision-theoretic planning, where many desirable behaviours are more
naturally expressed as properties of execution sequences rather than as
properties of states, NMRDPs form a more natural model than the commonly adopted
fully Markovian decision process (MDP) model. While the more tractable solution
methods developed for MDPs do not directly apply in the presence of
non-Markovian rewards, a number of solution methods for NMRDPs have been
proposed in the literature. These all exploit a compact specification of the
non-Markovian reward function in temporal logic, to automatically translate the
NMRDP into an equivalent MDP which is solved using efficient MDP solution
methods. This paper presents NMRDPP (Non-Markovian Reward Decision Process
Planner), a software platform for the development and experimentation of methods
for decision-theoretic planning with non-Markovian rewards. The current version
of NMRDPP implements, under a single interface, a family of methods based on
existing as well as new approaches which we describe in detail. These include
dynamic programming, heuristic search, and structured methods. Using NMRDPP, we
compare the methods and identify certain problem features that affect their
performance. NMRDPP's treatment of non-Markovian rewards is inspired by the
treatment of domain-specific search control knowledge in the TLPlan planner,
which it incorporates as a special case. In the First International
Probabilistic Planning Competition, NMRDPP was able to compete and perform well
in both the domain-independent and hand-coded tracks, using search control
knowledge in the latter.