Abstract: Discourse structures play a central role in several computational tasks, such as question answering
or dialogue generation. In particular, the framework of Rhetorical Structure Theory (RST) offers
a sound formalism for hierarchical text organization. In this article, we present HILDA, an implemented
discourse parser based on RST and Support Vector Machine (SVM) classification. SVM
classifiers are trained and applied to discourse segmentation and relation labeling. By combining
labeling with a greedy bottom-up tree-building approach, we are able to create accurate discourse
trees in time linear in the length of the input text. Importantly, our parser
can parse entire texts, whereas the publicly available parser SPADE (Soricut and Marcu, 2003) is
limited to sentence-level analysis. HILDA outperforms other discourse parsers for tree structure
construction and discourse relation labeling. For the discourse parsing task, our system reaches
78.3% of the performance level of human annotators. Compared to a state-of-the-art rule-based
discourse parser, our system achieves a performance increase of 11.6%.
Keywords: Discourse Parser; Rhetorical Structure Theory; Support Vector Machines
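
The greedy bottom-up tree building mentioned in the abstract can be illustrated with a minimal sketch. It is not the HILDA implementation: the `Node` class, the `build_tree` function, and the `toy_score` function are hypothetical names introduced here, and `toy_score` is only a stand-in for the trained SVM classifiers. The sketch also rescores every adjacent pair on each pass for simplicity, so it does not reproduce the linear-time behavior reported above; caching the scores of unchanged pairs would avoid most of that recomputation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Node:
    """A discourse tree node: a leaf segment or a merged span (hypothetical type)."""
    text: str
    relation: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


# A scorer returns (merge score, relation label) for two adjacent spans.
# In a real parser this role would be played by trained classifiers.
Scorer = Callable[[Node, Node], Tuple[float, str]]


def build_tree(segments: List[str], score: Scorer) -> Node:
    """Greedily merge the best-scoring adjacent pair until one tree remains."""
    if not segments:
        raise ValueError("need at least one segment")
    nodes = [Node(s) for s in segments]
    while len(nodes) > 1:
        best_i, best_score, best_rel = 0, float("-inf"), ""
        # Score every adjacent pair and remember the highest-scoring one.
        for i in range(len(nodes) - 1):
            s, rel = score(nodes[i], nodes[i + 1])
            if s > best_score:
                best_i, best_score, best_rel = i, s, rel
        left, right = nodes[best_i], nodes[best_i + 1]
        merged = Node(left.text + " " + right.text, best_rel, left, right)
        # Replace the two merged spans with their parent node.
        nodes[best_i:best_i + 2] = [merged]
    return nodes[0]


def toy_score(left: Node, right: Node) -> Tuple[float, str]:
    """Assumed stand-in scorer: prefers merging shorter spans, fixed label."""
    return -float(len(left.text) + len(right.text)), "Elaboration"


root = build_tree(["a", "bb", "cccc"], toy_score)
```

With the toy scorer, the two shortest adjacent segments are merged first, and the resulting span is then attached to the remaining segment.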