Basic article information

Title: Distributed Gradient Temporal Difference Off-policy Learning With Eligibility Traces: Weak Convergence
Authors: Miloš S. Stanković; Marko Beko; Srdjan S. Stanković; et al.
Journal: IFAC PapersOnLine
Print ISSN: 2405-8963
Publication year: 2020
Volume: 53
Issue: 2
Pages: 1563-1568
DOI: 10.1016/j.ifacol.2020.12.2184
Language: English
Publisher: Elsevier
Abstract: In this paper we propose two novel distributed algorithms for multi-agent off-policy learning of a linear approximation of the value function in Markov decision processes. The algorithms differ in how distributed consensus iterations are incorporated into a basic, recently proposed single-agent scheme. The proposed completely decentralized off-policy learning schemes subsume local eligibility traces and allow applications in which all the agents may have different behavior policies while evaluating a single target policy. Under nonrestrictive assumptions on the time-varying network topology and the individual state-visiting distributions of the agents, we prove that the parameter estimates of the algorithms weakly converge to a consensus. The variance reduction properties of the proposed algorithms are demonstrated. We also formulate specific guidelines on how to design the network weights and topology. The results are illustrated using simulations.
Keywords: Reinforcement learning; Distributed consensus; Value function approximation; Convergence; Eligibility traces; Off-policy learning; Weak convergence; Multi-agent systems