Abstract: Model-free reinforcement learning methods have achieved significant success in a variety of decision-making problems. However, their success traditionally relies on large amounts of data generated by simulators that can produce samples cheaply. In contrast, simulating many process control systems involves complex and costly computations, which limits the applicability of model-free reinforcement learning. In addition, extrinsic rewards are naturally sparse in the real world, further increasing the number of environment interactions required. This paper presents a sample-efficient model-free algorithm for process control that substantially accelerates learning even when rewards are extremely sparse. To achieve this, we leverage existing controllers to guide the agent's learning: controller guidance drives exploration towards key regions of the state space. To further mitigate these challenges, we propose a self-supervised learning strategy that improves the agent's policy using its own successful experience. Notably, the proposed method can leverage guidance that does not include actions and remains effective when the existing controllers are suboptimal. We present an empirical evaluation on a vinyl acetate monomer (VAM) chemical plant under disturbances, where the proposed method outperforms baseline approaches and achieves higher sample efficiency. Moreover, our method outperforms the existing controllers at controlling the plant and rejecting disturbances, mitigating the drop in the production load.
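To make the two ingredients described above concrete, the following is a minimal, hypothetical sketch rather than the paper's actual algorithm: an off-policy agent whose exploration actions are occasionally taken from an existing controller, plus a self-imitation buffer that stores transitions from successful episodes for additional policy updates. All names (Agent, ExistingController-style controller, env, guidance_prob, update_from) are illustrative assumptions, and the sketch uses controller actions directly, whereas the paper's method can also work from guidance that contains no actions.

```python
import random
from collections import deque

# Hypothetical sketch of controller-guided exploration with self-imitation.
# The agent, controller, and env interfaces are assumed, not taken from the paper.

def train(agent, controller, env, episodes=100, guidance_prob=0.3,
          success_threshold=0.0):
    success_buffer = deque(maxlen=10_000)  # transitions from successful episodes

    for _ in range(episodes):
        state = env.reset()
        episode, episode_return, done = [], 0.0, False

        while not done:
            # Controller guidance: with some probability, explore by following
            # the existing (possibly suboptimal) controller instead of the policy.
            if random.random() < guidance_prob:
                action = controller.act(state)
            else:
                action = agent.act(state)

            next_state, reward, done = env.step(action)
            episode.append((state, action, reward, next_state, done))
            episode_return += reward
            state = next_state

            agent.store(*episode[-1])   # ordinary off-policy replay
            agent.update()              # standard RL update

        # Self-imitation: keep transitions from successful episodes and perform
        # extra updates on them to reinforce the agent's own good behavior.
        if episode_return > success_threshold:
            success_buffer.extend(episode)
            agent.update_from(success_buffer)
```

The key design choice this sketch illustrates is that guidance only shapes exploration, so the learned policy is free to surpass a suboptimal controller, while the self-imitation buffer helps when extrinsic rewards are sparse by replaying the rare successful episodes more often.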