Article Information

  • Title: 認知的満足化 (Cognitive Satisficing)
  • Authors: 高橋 達二; 甲野 佑; 浦上 大輔
  • Journal: 人工知能学会論文誌 (Transactions of the Japanese Society for Artificial Intelligence)
  • Print ISSN: 1346-0714
  • Online ISSN: 1346-8030
  • Year: 2016
  • Volume: 31
  • Issue: 6
  • Pages: AI30-M_1-11
  • DOI: 10.1527/tjsai.AI30-M
  • Publisher: The Japanese Society for Artificial Intelligence
  • Abstract:

    As the scope of reinforcement learning broadens, the numbers of possible states and of executable actions, and hence the size of their product, explode. Because of the physical and computational constraints imposed on agents, there are often more feasible options than allowed trials. In such cases, optimization procedures that require first trying every option once do not work. This is the situation that the theory of bounded rationality was proposed to deal with. We formalize the central heuristic of bounded rationality theory, satisficing. Instead of the traditional formulation of satisficing at the policy level of reinforcement learning, we introduce a value function that implements the asymmetric risk attitudes characteristic of human cognition. Operated under a simple greedy policy, the RS (reference satisficing) value function enables efficient satisficing in K-armed bandit problems, and when the reference level for satisficing is set to an appropriate value, it leads to effective optimization. RS is also tested in a robotic motion-learning task in which a robot learns to perform giant swings (acrobot). While the standard algorithms fail because of the coarse-grained state space, RS shows stable performance and autonomous exploration, without the randomized exploration and gradual annealing that the standard methods require.

  • Keywords: bandit problems; giant-swing; acrobot; POMDP; cognitive biases
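
The abstract describes the RS (reference satisficing) value function operated under a plain greedy policy in K-armed bandit problems. The Python sketch below is a minimal illustration only: it assumes that the RS value of an arm takes the form n * (Q - aleph), with Q the arm's empirical mean reward, n its pull count, and aleph the reference level. This specific form, the optimistic initialization, the Bernoulli reward model, and the name rs_bandit are assumptions made for the example, not the authors' published formulation or code.

```python
import random

def rs_bandit(arm_probs, aleph, horizon, seed=0):
    """Greedy bandit agent over an assumed RS value n * (Q - aleph)."""
    rng = random.Random(seed)
    k = len(arm_probs)
    counts = [1] * k    # assumed single optimistic pseudo-pull per arm
    means = [1.0] * k   # optimistic initial estimates so every arm gets considered
    total_reward = 0.0
    for _ in range(horizon):
        # Arms estimated above the reference level gain RS value with experience
        # (exploitation); arms below it lose value, pushing the greedy choice
        # toward other arms without any randomized exploration.
        rs_values = [n * (q - aleph) for n, q in zip(counts, means)]
        arm = max(range(k), key=lambda i: rs_values[i])
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0  # Bernoulli reward
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]       # incremental mean
        total_reward += reward
    return total_reward

# Example: a reference level between the best and second-best arm tends to
# concentrate pulls on the best arm, matching the abstract's claim that an
# appropriate reference level leads to effective optimization.
if __name__ == "__main__":
    print(rs_bandit([0.4, 0.5, 0.6], aleph=0.55, horizon=10000))
```

Under this assumed form, satisfactory arms become more attractive the more they are tried, while unsatisfactory arms become less attractive, which is one way the asymmetric risk attitude mentioned in the abstract can yield autonomous exploration without annealing.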