Article Information

  • Title: Towards Identification of Nominal Multiword Expressions in Bengali Language
  • Author: Tanmoy Chakraborty
  • Journal: Open Access Library Journal
  • Print ISSN: 2333-9705
  • Electronic ISSN: 2333-9721
  • Publication year: 2014
  • Volume: 1
  • Issue: 3
  • Pages: 1-11
  • DOI: 10.4236/oalib.1100582
  • Language: English
  • Publisher: Scientific Research Pub
  • Abstract: Noun-noun compounds, a subset of compound nouns and of nominal compounds, play an important role in NLP applications such as machine translation and information retrieval because of their token frequency, their type frequency, and their occurrence across the world’s languages. Recognizing multiword expressions (MWEs) requires deep or shallow syntactic preprocessing tools and large corpora, and the problem is particularly difficult in Bengali because such tools and corpora are lacking. This paper investigates noun-noun bigram collocations in a medium-size untagged Bengali corpus of articles by Rabindranath Tagore. It uses a simple unsupervised approach that draws on several statistical measures of the affinity between the constituents of each bigram candidate as evidence of MWE status, and it builds a weighted measure to distinguish MWEs from non-MWEs. We describe different taxonomies of compound-noun MWEs in Bengali based on morpho-syntactic flexibility, and we identify major noun-noun semantic collocations that are not MWEs. This initial approach for Bengali is promising in terms of precision, recall, and F-score.
  • Keywords: Nominal Compounds; Multiword Expressions; Statistical Analysis; Bengali