Abstract: Reinforcement learning aims to compute optimal control policies from data collected along closed-loop trajectories. Traditional model-free approaches require a huge number of data points to achieve acceptable performance, rendering them inapplicable in most real-world situations, even when the data can be obtained from a detailed simulator. Model-based reinforcement learning approaches leverage model knowledge to drastically reduce the amount of data needed and to enforce important constraints on the closed-loop operation, the inability to do so being another important drawback of model-free approaches. This paper proposes a novel model-based reinforcement learning approach. Its main novelty is that we exploit all the information generated in a model predictive control (MPC) computation step, and not only the first input that is actually applied to the plant, to efficiently learn a good approximation of the state value function. This approximation can then be included as a terminal cost in an MPC formulation with a short prediction horizon, achieving performance similar to that of an MPC with a very long prediction horizon. Simulation results for a discretized batch bioreactor illustrate the potential of the proposed methodology.
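To make the central idea concrete, the following is a minimal, hedged sketch (not the paper's implementation): it assumes a user-supplied `solve_mpc` routine that returns the predicted state trajectory, the stage costs along the horizon, and the terminal cost, and it uses a generic ridge regressor as a stand-in for the value-function approximator that would later serve as the short-horizon MPC terminal cost.

```python
# Hedged sketch only: `solve_mpc` and the regressor choice are assumptions,
# not the authors' method. The point is that one MPC solve yields training
# pairs (x_k, cost-to-go from x_k) for every predicted state, not just x_0.
import numpy as np
from sklearn.linear_model import Ridge


def collect_value_targets(x0, solve_mpc, horizon):
    """Extract (state, cost-to-go) pairs from a single MPC solve.

    Assumes solve_mpc(x0, horizon) returns:
      states      -- predicted states x_0, ..., x_N (length N+1)
      stage_costs -- stage costs l_0, ..., l_{N-1} (length N)
      terminal_cost -- cost assigned to the final predicted state x_N
    """
    states, stage_costs, terminal_cost = solve_mpc(x0, horizon)
    stage_costs = np.asarray(stage_costs, dtype=float)
    # Cost-to-go from step k: sum of remaining stage costs plus terminal cost.
    cost_to_go = np.cumsum(stage_costs[::-1])[::-1] + terminal_cost
    # zip pairs x_k with its cost-to-go for k = 0, ..., N-1.
    return list(zip(states, cost_to_go))


def fit_value_function(dataset):
    """Fit an approximate state value function V_hat from collected pairs."""
    X = np.array([x for x, _ in dataset])
    y = np.array([v for _, v in dataset])
    model = Ridge(alpha=1e-3).fit(X, y)
    # V_hat(x) ~ model.predict(x); this approximation would be plugged in
    # as the terminal cost of a short-horizon MPC formulation.
    return model
```

In this reading, each closed-loop step contributes a whole horizon of value-function samples rather than a single one, which is where the claimed data efficiency over model-free learning would come from.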