Article Information

  • Title: HILDA: A Discourse Parser Using Support Vector Machine Classification
  • Authors: Hugo Hernault; Helmut Prendinger; David duVerle
  • Journal: Dialogue and Discourse
  • Electronic ISSN: 2152-9620
  • Publication year: 2010
  • Volume: 1
  • Issue: 3
  • Pages: 1-33
  • DOI: 10.5087/dad.2010.003
  • Publisher: Linguistic Society of America
  • Abstract: Discourse structures have a central role in several computational tasks, such as question answering or dialogue generation. In particular, the framework of Rhetorical Structure Theory (RST) offers a sound formalism for hierarchical text organization. In this article, we present HILDA, an implemented discourse parser based on RST and Support Vector Machine (SVM) classification. SVM classifiers are trained and applied to discourse segmentation and relation labeling. By combining labeling with a greedy bottom-up tree building approach, we are able to create accurate discourse trees in linear time complexity with respect to the length of the input text. Importantly, our parser can parse entire texts, whereas the publicly available parser SPADE (Soricut and Marcu, 2003) is limited to sentence-level analysis. HILDA outperforms other discourse parsers for tree structure construction and discourse relation labeling. For the discourse parsing task, our system reaches 78.3% of the performance level of human annotators. Compared to a state-of-the-art rule-based discourse parser, our system achieves a performance increase of 11.6%.
  • Keywords: Discourse Parser; Rhetorical Structure Theory; Support Vector Machines