Article information

Title: The Cross-Entropy Method for Policy Search in Decentralized POMDPs
Authors: Frans A. Oliehoek; Julian F.P. Kooij; Nikos Vlassis et al.
Journal: Informatica
Print ISSN: 1514-8327
Electronic ISSN: 1854-3871
Publication year: 2008
Volume: 32
Issue: 4
Publisher: The Slovene Society Informatika, Ljubljana
Abstract: Decentralized POMDPs (Dec-POMDPs) are becoming increasingly popular as models for multiagent planning under uncertainty, but solving a Dec-POMDP exactly is known to be an intractable combinatorial optimization problem. In this paper we apply the Cross-Entropy (CE) method, a recently introduced method for combinatorial optimization, to Dec-POMDPs, resulting in a randomized (sampling-based) algorithm for approximately solving Dec-POMDPs. This algorithm operates by sampling pure policies from an appropriately parametrized stochastic policy, and then evaluates these policies either exactly or approximately in order to define the next stochastic policy to sample from, and so on until convergence. Experimental results demonstrate that the CE method can search huge spaces efficiently, supporting our claim that combinatorial optimization methods can bring leverage to the approximate solution of Dec-POMDPs.
Keywords: multiagent planning; decentralized POMDPs; combinatorial optimization
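
The loop described in the abstract (sample pure policies from a parametrized stochastic policy, evaluate them, then refit the stochastic policy) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a pure policy is a tuple with one action per decision slot (for example, per observation history), that the stochastic policy is an independent categorical distribution per slot, and that the update refits those distributions to the best-scoring samples with smoothing. The names `cross_entropy_search`, `toy_value`, `TARGET`, and all parameter values are hypothetical, and `toy_value` is only a stand-in for evaluating a Dec-POMDP joint policy.

```python
import random


def cross_entropy_search(evaluate, n_slots, n_actions, n_samples=50,
                         n_elite=5, alpha=0.7, n_iters=30, seed=0):
    """Generic cross-entropy search over discrete policies (illustrative sketch)."""
    rng = random.Random(seed)
    # Stochastic policy: one categorical distribution over actions per slot,
    # initialised to uniform.
    probs = [[1.0 / n_actions] * n_actions for _ in range(n_slots)]
    best_policy, best_value = None, float("-inf")

    for _ in range(n_iters):
        # 1. Sample pure policies from the current stochastic policy.
        samples = []
        for _ in range(n_samples):
            policy = tuple(
                rng.choices(range(n_actions), weights=probs[s])[0]
                for s in range(n_slots)
            )
            # 2. Evaluate each sampled policy (exactly or approximately).
            samples.append((evaluate(policy), policy))

        # 3. Keep the highest-valued samples (the "elite" set).
        samples.sort(key=lambda vp: vp[0], reverse=True)
        elite = samples[:n_elite]
        if elite[0][0] > best_value:
            best_value, best_policy = elite[0]

        # 4. Refit each slot's distribution to the elite action frequencies,
        #    smoothed with the previous distribution (alpha is an assumption).
        for s in range(n_slots):
            for a in range(n_actions):
                freq = sum(1 for _, p in elite if p[s] == a) / n_elite
                probs[s][a] = alpha * freq + (1 - alpha) * probs[s][a]

    return best_policy, best_value


# Hypothetical toy objective: count slots that agree with a hidden target.
TARGET = (2, 0, 1, 3, 1, 0, 2, 3)


def toy_value(policy):
    return sum(1 for a, t in zip(policy, TARGET) if a == t)


policy, value = cross_entropy_search(toy_value, n_slots=len(TARGET), n_actions=4)
print(policy, value)
```

In this sketch the loop runs for a fixed number of iterations rather than testing for convergence, and the smoothing factor keeps every action's probability above zero so that later iterations can still explore.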