
Article Information

  • Title: Distributed Gradient Temporal Difference Off-policy Learning With Eligibility Traces: Weak Convergence
  • Authors: Miloš S. Stanković; Marko Beko; Srdjan S. Stanković
  • Journal: IFAC PapersOnLine
  • Print ISSN: 2405-8963
  • Year: 2020
  • Volume: 53
  • Issue: 2
  • Pages: 1563-1568
  • DOI: 10.1016/j.ifacol.2020.12.2184
  • Language: English
  • Publisher: Elsevier
  • Abstract: In this paper we propose two novel distributed algorithms for multi-agent off-policy learning of a linear approximation of the value function in Markov decision processes. The algorithms differ in how distributed consensus iterations are incorporated into a basic, recently proposed, single-agent scheme. The proposed completely decentralized off-policy learning schemes subsume local eligibility traces and allow applications in which the agents may all have different behavior policies while evaluating a single target policy. Under nonrestrictive assumptions on the time-varying network topology and the individual state-visiting distributions of the agents, we prove that the parameter estimates of the algorithms weakly converge to a consensus. The variance-reduction properties of the proposed algorithms are demonstrated. We also formulate specific guidelines on how to design the network weights and topology. The results are illustrated using simulations.
  • Keywords: Reinforcement learning; Distributed consensus; Value function approximation; Convergence; Eligibility traces; Off-policy learning; Weak convergence; Multi-agent systems
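The abstract combines two ingredients: a local temporal-difference update with eligibility traces run by each agent, and a consensus step that mixes neighbors' parameter estimates through network weights. As a rough illustration only (a simplified TD(λ)-style sketch, not the paper's actual gradient temporal-difference algorithm), the following code assumes a fixed ring topology with doubly stochastic weights, synthetic features and rewards, and an on-policy importance ratio of 1; all of these choices are stand-ins, since the paper itself allows time-varying topologies and differing behavior policies:

```python
import numpy as np

rng = np.random.default_rng(0)

n_agents, dim = 4, 3   # illustrative network and feature sizes (assumptions)
gamma, lam = 0.9, 0.7  # discount and eligibility-trace factors

# Doubly stochastic consensus weights on a fixed ring graph (an assumption;
# the paper only requires mild conditions on the time-varying topology).
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i + 1) % n_agents] = 0.25
    W[i, (i - 1) % n_agents] = 0.25

theta = rng.normal(size=(n_agents, dim))  # each agent's parameter estimate
trace = np.zeros((n_agents, dim))         # each agent's eligibility trace

for t in range(2000):
    alpha = 0.05 / (1.0 + 0.01 * t)  # decaying step size (an assumption)

    # Synthetic stand-ins for each agent's locally observed transition.
    phi = rng.normal(size=(n_agents, dim))       # current-state features
    phi_next = rng.normal(size=(n_agents, dim))  # next-state features
    reward = phi.sum(axis=1)                     # synthetic reward signal
    rho = 1.0  # importance-sampling ratio (on-policy stand-in)

    # Local TD(lambda)-style update with eligibility traces.
    trace = rho * (gamma * lam * trace + phi)
    delta = reward + gamma * (phi_next * theta).sum(1) - (phi * theta).sum(1)
    local = theta + alpha * delta[:, None] * trace

    # Consensus step: each agent averages its neighbors' updated estimates.
    theta = W @ local

# With decaying steps and repeated mixing, the local estimates agree closely.
spread = float(np.abs(theta - theta.mean(axis=0)).max())
```

The consensus step contracts disagreement between agents at every iteration while the local TD updates inject new information, which is the mechanism behind the weak convergence to a consensus claimed in the abstract; the paper's design guidelines concern how to pick the weights `W` and the topology.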