Article Information

  • Title: Reinforcement Learning of Multi-Party Trading Dialog Policies
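  • Authors: Takuya Hiraoka; Kallirroi Georgila; Elnaz Nouri
  • Journal: 人工知能学会論文誌 (Transactions of the Japanese Society for Artificial Intelligence)
  • Print ISSN: 1346-0714
  • Online ISSN: 1346-8030
  • Publication year: 2016
  • Volume: 31
  • Issue: 4
  • Pages: B-FC1_1-14
  • DOI: 10.1527/tjsai.B-FC1
  • Publisher: The Japanese Society for Artificial Intelligence
  • Abstract: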

    Trading dialogs are a kind of negotiation in which an exchange of ownership of items is discussed, and these kinds of dialogs are pervasive in many situations. Recently, there has been an increasing amount of research on applying reinforcement learning (RL) to negotiation dialog domains. However, previous research has focused on negotiation dialog between two participants only, ignoring cases where negotiation takes place between more than two interlocutors. In this paper, as a first study on multi-party negotiation, we apply RL to a multi-party trading scenario where the dialog system (learner) trades with one, two, or three other agents. We experiment with different RL algorithms and reward functions. We use Q-learning with linear function approximation, least-squares policy iteration, and neural fitted Q iteration. In addition, to make the learning process more efficient, we introduce an incremental reward function. The negotiation strategy of the learner is learned through simulated dialog with trader simulators. In our experiments, we evaluate how the performance of the learner varies depending on the RL algorithm used and the number of traders. Furthermore, we compare the learned dialog policies with two strong hand-crafted baseline dialog policies. Our results show that (1) even in simple multi-party trading dialog tasks, learning an effective negotiation policy is not straightforward and requires a lot of experimentation; and (2) the use of neural fitted Q iteration combined with an incremental reward function produces negotiation policies that are as effective as, or even better than, the policies of the two strong hand-crafted baselines.

  • Keywords: dialog policy; trading dialog; negotiation dialog; multi-party dialog; reinforcement learning