Basic Article Information

Title: Constructing Corpora for the Development and Evaluation of Paraphrase Systems
Authors: Trevor Cohn; Chris Callison-Burch; Mirella Lapata
Journal: Computational Linguistics
Print ISSN: 0891-2017
Electronic ISSN: 1530-9312
Publication year: 2008
Volume: 34
Issue: 4
Pages: 597-614
DOI: 10.1162/coli.08-003-R1-07-044
Language: English
Publisher: MIT Press
Abstract: Automatic paraphrasing is an important component in many natural language processing tasks. In this article we present a new parallel corpus with paraphrase annotations. We adopt a definition of paraphrase based on word alignments and show that it yields high inter-annotator agreement. As Kappa is suited to nominal data, we employ an alternative agreement statistic which is appropriate for structured alignment tasks. We discuss how the corpus can be usefully employed in evaluating paraphrase systems automatically (e.g., by measuring precision, recall, and F1) and also in developing linguistically rich paraphrase models based on syntactic structure.
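
Illustrative note: the abstract mentions evaluating paraphrase systems by precision, recall, and F1 against the annotated alignments. The following is a minimal sketch of that kind of scoring, not the authors' evaluation code. It assumes that an alignment is represented as a set of (source index, target index) word links; the function name `alignment_prf` and the sample data are hypothetical, and the article's own metric may treat different alignment types differently.

```python
from typing import Set, Tuple

# Assumption: one alignment link = (source token index, target token index).
Link = Tuple[int, int]


def alignment_prf(predicted: Set[Link], reference: Set[Link]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of predicted links against reference links."""
    overlap = len(predicted & reference)
    # Guard against empty sets so the ratios are always defined.
    precision = overlap / len(predicted) if predicted else 0.0
    recall = overlap / len(reference) if reference else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: four gold links, three system links, two in common.
reference = {(0, 0), (1, 2), (2, 1), (3, 3)}
predicted = {(0, 0), (1, 2), (2, 2)}
p, r, f1 = alignment_prf(predicted, reference)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Scoring over sets of links keeps the comparison independent of link order and makes partial overlap between a system output and the gold annotation easy to measure.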