Article information

Title: Identification of Multiword Expressions by Combining Multiple Linguistic Information Sources
Authors: Yulia Tsvetkov; Shuly Wintner
Journal: Computational Linguistics
Print ISSN: 0891-2017
Electronic ISSN: 1530-9312
Publication year: 2014
Volume: 40
Issue: 2
Pages: 449-468
DOI: 10.1162/COLI_a_00177
Language: English
Publisher: MIT Press
Abstract: We propose a framework for using multiple sources of linguistic information in the task of identifying multiword expressions in natural language texts. We define various linguistically motivated classification features and introduce novel ways for computing them. We then manually define interrelationships among the features, and express them in a Bayesian network. The result is a powerful classifier that can identify multiword expressions of various types and multiple syntactic constructions in text corpora. Our methodology is unsupervised and language-independent; it requires relatively few language resources and is thus suitable for a large number of languages. We report results on English, French, and Hebrew, and demonstrate a significant improvement in identification accuracy, compared with less sophisticated baselines.