Abstract: We investigate multi-agent reinforcement learning in a setting where each agent has access only to its own local reward and can communicate only with its nearby neighbors. A distributed algorithm based on the actor-critic method is developed that enables all agents to cooperatively learn a control policy maximizing the global objective function. Simulation results are provided to validate the proposed algorithm.
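
The abstract does not state how neighbors exchange information, so the following is only an illustrative sketch and not the proposed algorithm. It assumes a hypothetical ring topology and equal-weight neighbor averaging (a standard consensus step) to show how agents that each see only a local reward could still agree on a global quantity, here the average reward. The names `ring_neighbors`, `consensus_step`, and `run_consensus` are invented for this example.

```python
def ring_neighbors(n):
    """Hypothetical topology (assumption): agent i talks to i-1 and i+1 on a ring.

    Intended for n >= 3 so that each agent has two distinct neighbors.
    """
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def consensus_step(values, neighbors):
    """One communication round: each agent averages its own value
    with the values received from its neighbors (equal weights)."""
    new_values = []
    for i, own in enumerate(values):
        group = [own] + [values[j] for j in neighbors[i]]
        new_values.append(sum(group) / len(group))
    return new_values


def run_consensus(local_rewards, rounds=200):
    """Repeat neighbor averaging, starting from each agent's local reward."""
    neighbors = ring_neighbors(len(local_rewards))
    estimates = list(local_rewards)
    for _ in range(rounds):
        estimates = consensus_step(estimates, neighbors)
    return estimates


if __name__ == "__main__":
    # Five agents, each observing only its own reward (made-up values).
    local_rewards = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(run_consensus(local_rewards))
```

On a ring every agent has the same number of neighbors, so the averaging weights are symmetric and the network-wide mean is preserved at every round; the local estimates therefore approach the average of all local rewards without any agent seeing the others' rewards directly. In a distributed actor-critic method, a step of this kind would typically be applied to critic parameters or value estimates rather than raw rewards, which is again an assumption about the general technique and not a claim about this paper.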