Basic Article Information

Title: A Multi-Agent Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning
Authors: Wesley Suttle; Zhuoran Yang; Kaiqing Zhang; et al.
Journal: IFAC PapersOnLine
Print ISSN: 2405-8963
Publication year: 2020
Volume: 53
Issue: 2
Pages: 1549-1554
DOI: 10.1016/j.ifacol.2020.12.2021
Language: English
Publisher: Elsevier
Abstract: This paper extends off-policy reinforcement learning to the multi-agent case in which a set of networked agents communicating with their neighbors according to a time-varying graph collaboratively evaluates and improves a target policy while following a distinct behavior policy. To this end, the paper develops a multi-agent version of emphatic temporal difference learning for off-policy policy evaluation, and proves convergence under linear function approximation. The paper then leverages this result, in conjunction with a novel multi-agent off-policy policy gradient theorem and recent work in both multi-agent on-policy and single-agent off-policy actor-critic methods, to develop and give convergence guarantees for a new multi-agent off-policy actor-critic algorithm. An empirical validation of these theoretical results is given.
Keywords: consensus; reinforcement learning control; adaptive control of multi-agent systems