Article Information

Title: Towards Identification of Nominal Multiword Expressions in Bengali Language
Author: Tanmoy Chakraborty
Journal: Open Access Library Journal
Print ISSN: 2333-9705
Electronic ISSN: 2333-9721
Publication year: 2014
Volume: 1
Issue: 3
Pages: 1-11
DOI: 10.4236/oalib.1100582
Language: English
Publisher: Scientific Research Pub
Abstract: Noun-Noun compounds, a subset of both Compound Nouns and Nominal Compounds, play an important role in NLP applications such as Machine Translation and Information Retrieval because of their token frequency, type frequency and occurrence across the world's languages. Recognition of Multi-Word Expressions (MWEs) requires deep or shallow syntactic preprocessing tools and large corpora. The problem is quite difficult in Bengali because such tools and large corpora are lacking. This paper investigates Noun-Noun bigram collocations in a medium-size untagged Bengali corpus of articles by Rabindranath Tagore. It uses a simple unsupervised approach that combines various kinds of statistical evidence for the affinity between the constituents of each bigram candidate, and builds a weighted measure to distinguish MWEs from non-MWEs. We describe different taxonomies of compound noun MWEs in Bengali based on morpho-syntactic flexibility. We also identify major Noun-Noun semantic collocations that are not MWEs. This initial approach for Bengali is promising in terms of Precision, Recall and F-score.
Keywords: Nominal Compounds; Multiword Expressions; Statistical Analysis; Bengali