Journal: International Journal of Applied Mathematics and Computer Science
E-ISSN: 2083-8492
Year: 2019
Volume: 29
Issue: 2
Pages: 1-12
DOI: 10.2478/amcs-2019-0026
Publisher: De Gruyter Open
Abstract: Reinforcement learning (RL) is an effective method for controlling dynamic systems without prior knowledge.
One of the most important and difficult problems in RL is improving data efficiency. Probabilistic inference
for learning control (PILCO) is a state-of-the-art data-efficient framework that uses a Gaussian process to model dynamic
systems. However, it focuses only on optimizing cumulative rewards and does not consider the accuracy of the dynamic
model, which is an important factor in controller learning. To further improve the data efficiency of PILCO, we propose
an active exploration version (AEPILCO) that uses information entropy to quantify how informative samples are. In the
policy evaluation stage, we incorporate an information entropy criterion into long-term sample prediction. Through this
informative policy evaluation function, the policy improvement stage yields informative policy parameters. Executing the
resulting policy on the real system produces an informative sample set, which helps in learning an accurate dynamic
model. Thus, the AEPILCO algorithm improves data efficiency by actively selecting informative samples according to the
information entropy criterion, thereby learning a more accurate dynamic model. We demonstrate the validity and efficiency
of the proposed algorithm on several challenging control problems: the cart pole, the pendubot, the double pendulum, and
the cart double pendulum. AEPILCO learns a controller in fewer trials than PILCO, as verified through theoretical
analysis and experimental results.
Keywords: reinforcement learning; information entropy; PILCO; data efficiency
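
As a concrete illustration of the entropy criterion described in the abstract, the following is a minimal sketch of an informative policy evaluation objective. It assumes the entropy term enters as an additive bonus weighted by a hypothetical trade-off parameter lam; the means/covs inputs stand in for PILCO's long-term Gaussian state predictions, and cost_fn for the expected immediate cost. These names and the exact weighting are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def gaussian_entropy(cov):
    # Differential entropy of a d-dimensional Gaussian N(m, cov):
    # H = 0.5 * log((2*pi*e)^d * det(cov))
    d = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2.0 * np.pi * np.e) + logdet)

def informative_policy_value(means, covs, cost_fn, lam=0.1):
    # Hypothetical informative policy evaluation (sketch):
    # expected long-term cost minus a weighted entropy bonus, so that
    # minimizing this value prefers rollouts whose predicted states
    # are both low-cost and informative for the GP dynamic model.
    # means[t], covs[t]: Gaussian state prediction at step t
    # cost_fn(m, S):     expected immediate cost under N(m, S)
    # lam:               trade-off weight (assumption, not from the paper)
    total_cost = sum(cost_fn(m, S) for m, S in zip(means, covs))
    info_bonus = sum(gaussian_entropy(S) for S in covs)
    return total_cost - lam * info_bonus
```

Under this reading, policy improvement would minimize informative_policy_value over the policy parameters, so executed trajectories visit regions where the GP's predictive covariance (and hence entropy) is large, yielding the informative sample set the abstract describes.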