
Article Information

  • Title: Improving the Learning Speed of Multi-Agent Inverse Reinforcement Learning by Parallel Coordinate Descent (並列座標降下法によるマルチエージェント逆強化学習の学習速度改善)
  • Authors: 浪越 圭一 (Keiichi Namikoshi); 荒井 幸代 (Sachiyo Arai)
  • Journal: 人工知能学会論文誌 (Transactions of the Japanese Society for Artificial Intelligence)
  • Print ISSN: 1346-0714
  • Electronic ISSN: 1346-8030
  • Publication year: 2021
  • Volume: 36
  • Issue: 5
  • Pages: 1-9
  • DOI: 10.1527/tjsai.36-5_AG21-B
  • Language: Japanese
  • Publisher: The Japanese Society for Artificial Intelligence
  • Abstract: Multi-agent inverse reinforcement learning (MAIRL) is a framework for inferring expert agents' reward functions from observed trajectories in a Markov game. MAIRL consists of two steps: computing the optimal policy for the current reward, and updating the reward based on the difference between the computed policy and the expert trajectories. The former step is a bottleneck because it is a multi-agent reinforcement learning (MARL) problem, which suffers from non-stationarity. To avoid this problem, we propose a MAIRL method based on parallel coordinate descent, which extends maximum discounted causal entropy inverse reinforcement learning to the Markov game. A previous method based on coordinate descent updates one agent's reward and policy at a time while the other agents' policies are fixed. In contrast, the proposed method updates the reward and policy of every agent in parallel and exchanges the other agents' policies synchronously to improve learning speed. In computer experiments, we compare the learning speeds of the previous and proposed methods when inferring the reward of one equilibrium solution in a two-agent grid navigation task. The results show that parallelization does not always improve convergence speed, that the other agents' policies significantly affect the learning speed, and that parallelization does improve learning speed when the other agents' policies are a pseudo-policy overwritten by the distribution of the expert trajectories. (A schematic sketch of the two update schedules appears after the keyword list below.)
  • Keywords: inverse reinforcement learning; multi-agent system; coordinate descent
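
The record contains no code; the following Python sketch only illustrates the two update schedules contrasted in the abstract, under the assumption that each agent's update can be split into a best-response step and a reward step. All names here (`best_response`, `reward_step`, `sequential_cd`, `parallel_cd`) are hypothetical placeholders, not the authors' implementation.

```python
from typing import Callable, List, Tuple

# Illustrative type aliases; the real objects would be reward parameters
# and stochastic policies for a Markov game.
Reward = object
Policy = object
BestResponse = Callable[[int, Reward, List[Policy]], Policy]  # hypothetical
RewardStep = Callable[[int, Policy], Reward]                  # hypothetical


def sequential_cd(rewards: List[Reward], policies: List[Policy],
                  best_response: BestResponse, reward_step: RewardStep,
                  n_iters: int) -> Tuple[List[Reward], List[Policy]]:
    """Previous method (coordinate descent): update one agent per
    iteration while the other agents' policies stay fixed."""
    n = len(rewards)
    for t in range(n_iters):
        i = t % n                                    # cycle through agents
        policies[i] = best_response(i, rewards[i], policies)
        rewards[i] = reward_step(i, policies[i])     # move toward expert stats
    return rewards, policies


def parallel_cd(rewards: List[Reward], policies: List[Policy],
                best_response: BestResponse, reward_step: RewardStep,
                n_iters: int) -> Tuple[List[Reward], List[Policy]]:
    """Proposed schedule: every agent updates against a frozen snapshot
    of the others, then all policies are exchanged synchronously."""
    n = len(rewards)
    for _ in range(n_iters):
        snapshot = list(policies)  # all agents see the same stale policies
        policies = [best_response(i, rewards[i], snapshot) for i in range(n)]
        rewards = [reward_step(i, policies[i]) for i in range(n)]
    return rewards, policies
```

The synchronous snapshot is what lets the per-agent updates run concurrently; as the abstract reports, this alone does not guarantee faster convergence, and the speedup appears when the frozen other-agent policies are replaced by a pseudo-policy built from the expert trajectory distribution.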