文章基本信息

标题：Task-Aware Verifiable RNN-Based Policies for Partially Observable Markov Decision Processes
本地全文：下载
作者：Steven Carr ; Nils Jansen ; Ufuk Topcu 等
期刊名称：Journal of Artificial Intelligence Research
印刷版ISSN：1076-9757
出版年度：2021
卷号：72
页码：1-29
DOI：10.1613/jair.1.12963
语种：English
出版社：American Association of Artificial
摘要：Partially observable Markov decision processes (POMDPs) are models for sequential decision-making under uncertainty and incomplete information. Machine learning methods typically train recurrent neural networks (RNN) as effective representations of POMDP policies that can efficiently process sequential data. However it is hard to verify whether the POMDP driven by such RNN-based policies satisfies safety constraints for instance given by temporal logic specifications. We propose a novel method that combines techniques from machine learning with the field of formal methods: training an RNN-based policy and then automatically extracting a so-called finite-state controller (FSC) from the RNN. Such FSCs offer a convenient way to verify temporal logic constraints. Implemented on a POMDP they induce a Markov chain and probabilistic verification methods can efficiently check whether this induced Markov chain satisfies a temporal logic specification. Using such methods if the Markov chain does not satisfy the specification a byproduct of verification is diagnostic information about the states in the POMDP that are critical for the specification. The method exploits this diagnostic information to either adjust the complexity of the extracted FSC or improve the policy by performing focused retraining of the RNN. The method synthesizes policies that satisfy temporal logic specifications for POMDPs with up to millions of states which are three orders of magnitude larger than comparable approaches.
关键词：markov decision processes;neural networks;uncertainty