Journal: Electronic Colloquium on Computational Complexity
Print ISSN: 1433-8092
Year: 2021
Volume: 21
Language: English
Publisher: Universität Trier, Lehrstuhl für Theoretische Computer-Forschung
Abstract: Similar to the role of Markov decision processes in reinforcement learning, Markov Games (also called Stochastic Games) lay down the foundation for the study of multi-agent reinforcement learning (MARL) and sequential agent interactions. In this paper, we introduce the solution concept of approximate Markov Perfect Equilibrium (MPE) for finite-state Stochastic Games played over an infinite horizon, and prove that computing it is PPAD-complete. Technically, we adopt a function with a polynomially bounded description in the strategy space to convert the MPE computation into a fixed-point problem, even though the stochastic game may demand an exponential number of pure strategies, in the number of states, for each agent. The completeness result follows from a reduction of the fixed-point problem to \textsc{End of the Line}. Past work on stochastic games has mostly produced zero-sum MARL algorithms. A $\mathrm{P}^{\mathrm{PPAD}}$ algorithm for general-sum stochastic games in the finite horizon can be derived to establish an approximation algorithm for general-sum stochastic games, which in turn implies an approximate NE solution in the infinite-horizon setting. Such an extension, however, suffers from three weaknesses: 1. the solution is time-dependent and hence not a perfect equilibrium; 2. the time-dependent solution is vulnerable to non-credible threats; 3. the complexity characterization is not tight (PPAD lower bound vs. $\mathrm{P}^{\mathrm{PPAD}}$ upper bound). Our result improves on such a solution in all three respects.
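To make the solution concept concrete, the following is a minimal formal sketch of the $\varepsilon$-approximate MPE condition referred to above; the notation ($S$ for states, $A_i$ for actions, $r_i$ for rewards, $\gamma \in (0,1)$ for the discount factor) is the standard discounted-stochastic-game setup and is our own gloss rather than taken from the paper:
\[
V_i^{\pi}(s) \;=\; \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r_i(s_t, a_t) \,\Big|\, s_0 = s,\ a_t \sim \pi(\cdot \mid s_t)\Big],
\]
\[
\pi \text{ is an } \varepsilon\text{-MPE} \quad\Longleftrightarrow\quad V_i^{(\pi_i',\, \pi_{-i})}(s) \;\le\; V_i^{\pi}(s) + \varepsilon \quad \text{for every agent } i,\ \text{every state } s,\ \text{and every stationary deviation } \pi_i'.
\]
Because the condition is imposed at every state for stationary (time-independent) strategy profiles $\pi$, such a solution is a perfect equilibrium by construction and avoids the non-credible threats that the time-dependent finite-horizon solutions discussed above exhibit.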
Keywords: Markov Perfect Equilibrium; PPAD-Completeness; Stochastic Games