Abstract

In contrast to its great empirical success, theoretical understanding of multi-agent reinforcement learning (MARL) remains largely underdeveloped. As an initial attempt, we provide a finite-sample analysis for decentralized cooperative MARL with networked agents. In particular, we consider a team of cooperative agents connected by a time-varying communication network, with no central controller coordinating them. The goal for each agent is to maximize the long-term return associated with the team-average reward, by communicating only with its neighbors over the network. A batch MARL algorithm is developed for this setting, which can be implemented in a decentralized fashion. We then quantify the estimation errors of the action-value functions obtained from our algorithm, establishing their dependence on the function class, the number of samples in each iteration, and the number of iterations. This work appears to be the first finite-sample analysis for decentralized cooperative MARL from batch data.
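To make the setting concrete, the sketch below illustrates the two ingredients the abstract describes: each agent fits a local value estimate from its own batch of samples, and then exchanges information only with its current neighbors over a time-varying network, with no central coordinator. This is not the paper's algorithm; the linear features, the ring topology in `neighbors`, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS, DIM = 4, 8      # hypothetical team size and feature dimension
BATCH, ITERS = 200, 10    # samples per iteration and number of iterations

# Each agent maintains its own parameter vector for a linear value estimate.
thetas = [np.zeros(DIM) for _ in range(N_AGENTS)]

def neighbors(t):
    """Hypothetical time-varying topology: here a fixed ring, agent i talks to i-1 and i+1."""
    return {i: [(i - 1) % N_AGENTS, (i + 1) % N_AGENTS] for i in range(N_AGENTS)}

for t in range(ITERS):
    # Local batch step: each agent regresses on its OWN reward samples only.
    for i in range(N_AGENTS):
        phi = rng.normal(size=(BATCH, DIM))      # stand-in state-action features
        local_reward = rng.normal(size=BATCH)    # stand-in per-agent reward samples
        thetas[i], *_ = np.linalg.lstsq(phi, local_reward, rcond=None)

    # Consensus step: average with current neighbors, no central controller.
    nbrs = neighbors(t)
    thetas = [
        np.mean([thetas[i]] + [thetas[j] for j in nbrs[i]], axis=0)
        for i in range(N_AGENTS)
    ]
```

Repeated neighbor averaging is one standard way such decentralized schemes approximate the team-average quantity that each agent cannot observe directly.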