
Article Information

  • Title: General Value Function Networks
  • Authors: Matthew Schlegel; Andrew Jacobsen; Zaheer Abbas
  • Journal: Journal of Artificial Intelligence Research
  • Print ISSN: 1076-9757
  • Year: 2021
  • Volume: 70
  • Pages: 497-543
  • Publisher: American Association of Artificial Intelligence
  • Abstract: State construction is important for learning in partially observable environments. A general purpose strategy for state construction is to learn the state update using a Recurrent Neural Network (RNN), which updates the internal state using the current internal state and the most recent observation. This internal state provides a summary of the observed sequence, to facilitate accurate predictions and decision-making. At the same time, specifying and training RNNs is notoriously tricky, particularly as the common strategy to approximate gradients back in time, called truncated Back-prop Through Time (BPTT), can be sensitive to the truncation window. Further, domain expertise (which can usually help constrain the function class and so improve trainability) can be difficult to incorporate into complex recurrent units used within RNNs. In this work, we explore how to use multi-step predictions to constrain the RNN and incorporate prior knowledge. In particular, we revisit the idea of using predictions to construct state and ask: does constraining (parts of) the state to consist of predictions about the future improve RNN trainability? We formulate a novel RNN architecture, called a General Value Function Network (GVFN), where each internal state component corresponds to a prediction about the future represented as a value function. We first provide an objective for optimizing GVFNs, and derive several algorithms to optimize this objective. We then show that GVFNs are more robust to the truncation level, in many cases only requiring one-step gradient updates.
  • Keywords: reinforcement learning; neural networks
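
To make the architecture described in the abstract concrete, below is a minimal illustrative sketch of a GVFN-style state update in Python/NumPy. It is not the paper's implementation: the names (cumulant_w, gammas, gvfn_step), the linear cumulants, the per-unit discounts, and the random stand-in observations are all assumptions made for illustration. Each component of the internal state is treated as a value-function prediction, and the recurrent weights are trained with a one-step semi-gradient TD update (truncation window of one).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes; not taken from the paper.
n_obs, n_state = 3, 8

# Each state unit i is a GVF prediction with its own linear cumulant
# c_i(o) = cumulant_w[i] @ o and continuation (discount) factor gammas[i].
# Both are hypothetical choices for this sketch.
cumulant_w = rng.normal(size=(n_state, n_obs))
gammas = rng.uniform(0.0, 0.9, size=n_state)

# Recurrent parameters: s_t = sigmoid(W @ [s_{t-1}; o_t]).
W = rng.normal(scale=0.1, size=(n_state, n_state + n_obs))
alpha = 0.05  # TD step size

def gvfn_step(s_prev, obs):
    """One state update; each component of the new state is a value estimate."""
    x = np.concatenate([s_prev, obs])
    s = 1.0 / (1.0 + np.exp(-(W @ x)))
    return s, x

# Random observations stand in for an actual partially observable environment.
obs = rng.normal(size=n_obs)
s, x = gvfn_step(np.zeros(n_state), obs)

for t in range(1000):
    next_obs = rng.normal(size=n_obs)
    s_next, x_next = gvfn_step(s, next_obs)

    # Per-component TD(0) errors: delta_i = c_i(o_{t+1}) + gamma_i * s'_i - s_i.
    delta = cumulant_w @ next_obs + gammas * s_next - s

    # One-step semi-gradient update (truncation window of one): only the
    # sigmoid at time t is differentiated, d s_i / d W[i,:] = s_i*(1 - s_i)*x.
    W += alpha * (delta * s * (1.0 - s))[:, None] * x[None, :]

    s, x = s_next, x_next
```

Because each state component has its own temporal-difference target, the recurrent weights can be updated without backing the gradient up through time, which is the property the abstract refers to as robustness to the truncation level.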