Basic Article Information

Title: General Value Function Networks
Authors: Matthew Schlegel; Andrew Jacobsen; Zaheer Abbas; et al.
Journal: Journal of Artificial Intelligence Research
Print ISSN: 1076-9757
Publication year: 2021
Volume: 70
Pages: 497-543
Publisher: American Association of Artificial
Abstract: State construction is important for learning in partially observable environments. A general-purpose strategy for state construction is to learn the state update using a Recurrent Neural Network (RNN), which updates the internal state using the current internal state and the most recent observation. This internal state provides a summary of the observed sequence, to facilitate accurate predictions and decision-making. At the same time, specifying and training RNNs is notoriously tricky, particularly as the common strategy to approximate gradients back in time, called truncated Back-prop Through Time (BPTT), can be sensitive to the truncation window. Further, domain expertise, which can usually help constrain the function class and so improve trainability, can be difficult to incorporate into complex recurrent units used within RNNs. In this work, we explore how to use multi-step predictions to constrain the RNN and incorporate prior knowledge. In particular, we revisit the idea of using predictions to construct state and ask: does constraining (parts of) the state to consist of predictions about the future improve RNN trainability? We formulate a novel RNN architecture, called a General Value Function Network (GVFN), where each internal state component corresponds to a prediction about the future represented as a value function. We first provide an objective for optimizing GVFNs and derive several algorithms to optimize this objective. We then show that GVFNs are more robust to the truncation level, in many cases only requiring one-step gradient updates.
Keywords: reinforcement learning; neural networks
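The abstract's central idea, that each recurrent state component is itself a prediction about the future expressed as a value function and can be trained with one-step gradients, can be illustrated with a small sketch. This is not the paper's objective or any of its algorithms. It is a minimal example under assumptions chosen here: a sigmoid-linear recurrent cell, one constant discount per state unit, a cumulant scaled by (1 - gamma) so every target stays in [0, 1], a toy cyclic observation stream, and a plain one-step semi-gradient TD update that holds the previous state fixed (a truncation of one step). The names `TinyGVFN` and `cycle_observations` are hypothetical.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class TinyGVFN:
    """Hypothetical GVFN-style recurrent cell (illustration only, not the paper's method).

    State unit j is trained toward the one-step TD target of a GVF with
    discount gammas[j] and a (1 - gamma)-scaled cumulant (an assumption made here):
        s_t[j] ~ (1 - gamma_j) * c_{t+1} + gamma_j * s_{t+1}[j]
    """

    def __init__(self, gammas, obs_dim, lr=0.1, seed=0):
        rng = random.Random(seed)
        self.gammas = list(gammas)
        self.n = len(self.gammas)
        self.in_dim = self.n + obs_dim + 1  # previous state, observation, bias
        self.W = [[rng.uniform(-0.1, 0.1) for _ in range(self.in_dim)]
                  for _ in range(self.n)]
        self.lr = lr
        self.state = [0.0] * self.n  # s_t
        self._prev_x = None          # input x_t = [s_{t-1}, o_t, 1] that produced s_t

    def step(self, obs, cumulant):
        """Consume o_{t+1}, compute s_{t+1}, then update the weights that produced s_t."""
        x = self.state + list(obs) + [1.0]
        new_state = [sigmoid(sum(w * xi for w, xi in zip(self.W[j], x)))
                     for j in range(self.n)]
        td_errors = [0.0] * self.n
        if self._prev_x is not None:
            for j, g in enumerate(self.gammas):
                target = (1.0 - g) * cumulant + g * new_state[j]  # treated as fixed
                delta = target - self.state[j]
                td_errors[j] = delta
                # d s_t[j] / d W[j][k] with s_{t-1} held constant: no gradient through time.
                grad_scale = self.state[j] * (1.0 - self.state[j])
                for k in range(self.in_dim):
                    self.W[j][k] += self.lr * delta * grad_scale * self._prev_x[k]
        self._prev_x = x
        self.state = new_state
        return td_errors


def cycle_observations(length=6):
    """Toy cyclic stream (hypothetical): observation is 1.0 at one position, else 0.0."""
    t = 0
    while True:
        yield [1.0 if t % length == 0 else 0.0]
        t += 1


if __name__ == "__main__":
    net = TinyGVFN(gammas=[0.0, 0.5, 0.9], obs_dim=1, lr=0.5)
    stream = cycle_observations()
    for _ in range(5000):
        o = next(stream)
        net.step(o, cumulant=o[0])
    print([round(v, 3) for v in net.state])
```

Because the previous state is treated as a constant, each update needs only the current input and prediction, with no stored history. The abstract reports that GVFNs often remain trainable with such one-step updates, whereas an RNN without predictive constraints typically relies on truncated BPTT, whose performance can depend on the truncation window.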