Article Information

  • Title: Reinforcement Learning for Penalty Avoiding Rational Policy Making
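  • Authors: Kazuteru Miyazaki ; Sougo Tsuboi ; Shigenobu Kobayashi
  • Journal: 人工知能学会論文誌 (Transactions of the Japanese Society for Artificial Intelligence)
  • Print ISSN: 1346-0714
  • Online ISSN: 1346-8030
  • Publication year: 2001
  • Volume: 16
  • Issue: 2
  • Pages: 185-192
  • DOI: 10.1527/tjsai.16.185
  • Publisher: The Japanese Society for Artificial Intelligence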
  • Abstract: Reinforcement learning is a kind of machine learning that aims to adapt an agent to a given environment using rewards as a clue. In general, the purpose of a reinforcement learning system is to acquire an optimal policy that maximizes the expected reward per action. However, this is not always the most important goal in every environment. In particular, when a reinforcement learning system is applied to engineering environments, we expect the agent to avoid all penalties. In Markov Decision Processes, a pair of a sensory input and an action is called a rule. We call a rule a penalty rule if and only if it receives a penalty or it can transit to a penalty state, in which it does not contribute to obtaining any reward. After suppressing all penalty rules, we aim to make a rational policy whose expected reward per action is larger than zero. In this paper, we propose a penalty-suppressing algorithm that can suppress any penalty and obtain a reward constantly. Its effectiveness is shown by applying the algorithm to tic-tac-toe.
  • Keywords: reinforcement learning ; reward and penalty ; penalty avoiding ; rational policy making
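
The penalty-rule definition in the abstract can be illustrated with a small fixed-point computation. The following is a minimal sketch, not the authors' algorithm: it assumes the transition structure is fully known, and it assumes a "penalty state" is a state in which every available rule is a penalty rule, which is one reading of the abstract. The function name `find_penalty_rules` and the toy states are hypothetical.

```python
def find_penalty_rules(transitions, direct_penalties):
    """Return the set of penalty rules under the assumptions stated above.

    transitions: dict mapping a rule (state, action) to the set of states
        it may transit to (assumed known in advance).
    direct_penalties: set of rules that receive a penalty directly.
    """
    penalty_rules = set(direct_penalties)
    # Only states that have at least one rule can become penalty states.
    states = {state for (state, _action) in transitions}
    changed = True
    while changed:
        changed = False
        # Assumption: a state is a penalty state when all of its rules
        # are already marked as penalty rules.
        penalty_states = {
            s for s in states
            if all(rule in penalty_rules for rule in transitions if rule[0] == s)
        }
        # A rule that may transit to a penalty state becomes a penalty rule.
        for rule, next_states in transitions.items():
            if rule not in penalty_rules and next_states & penalty_states:
                penalty_rules.add(rule)
                changed = True
    return penalty_rules


# Hypothetical toy environment: both rules in s1 are penalized directly,
# so s1 is a penalty state and the rule (s0, a) that leads to it is suppressed.
transitions = {
    ("s0", "a"): {"s1"},
    ("s0", "b"): {"s2"},
    ("s1", "a"): {"s3"},
    ("s1", "b"): {"s3"},
    ("s2", "a"): {"goal"},
}
direct_penalties = {("s1", "a"), ("s1", "b")}
result = find_penalty_rules(transitions, direct_penalties)
```

In this toy case the rule `("s0", "b")` is left unsuppressed, so a policy restricted to the remaining rules can still reach `goal`.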