Abstract: Given a parametrized stabilizing controller, the approach presented in this work seeks parameters that are optimal with respect to an infinite-horizon cost. Since this cost is in general not computable, an adaptive actor-critic structure is proposed to approximate the corresponding value function. The actor is realized explicitly using the projected subgradient method. A particular challenge is that the approximated value function is time-varying: it depends both on the evolution of the dynamical system and on the critic's current approximation of the value function. Provided that a certain stability constraint is convex, and under persistence of excitation conditions, it is shown that the actor and critic parameters converge to prescribed vicinities of their optimal values. The entire setup is formulated in continuous time. A computational study is presented.