Publisher: Academy & Industry Research Collaboration Center (AIRCC)
Abstract: Considerable interest has been given to the identification and treatment of multiword expressions (MWEs). The identification of MWEs affects the quality of results in heavily used natural language processing (NLP) tasks such as parsing and generation. Different approaches to MWE identification have been applied. Statistical methods are employed as an inexpensive and language-independent way of finding co-occurrence patterns. Linguistic methods rely on information such as part-of-speech (POS) filters; lexical alignment between languages is also used, and these methods produce more targeted candidate lists. This paper presents a framework for extracting Arabic bigram MWEs (nominal or verbal) using a hybrid approach. The proposed approach starts by applying a statistical method and then uses linguistic rules to enhance the results, extracting only the patterns that match a relevant language rule. The proposed hybrid approach outperforms the traditional approaches.
Keywords: Multiword expressions (MWEs); statistical measures; part-of-speech (POS) tagging; nominal MWEs; verbal MWEs.
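
The two-stage pipeline described in the abstract (a statistical pass followed by a linguistic filter) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not name the statistical measure or the language rules, so pointwise mutual information (PMI) stands in for the statistical step, and the POS patterns, tag labels, thresholds, and the function names `pmi_bigrams` and `hybrid_extract` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical POS patterns standing in for the paper's linguistic rules.
NOMINAL_PATTERNS = {("NOUN", "NOUN"), ("NOUN", "ADJ")}
VERBAL_PATTERNS = {("VERB", "NOUN"), ("VERB", "PREP")}


def pmi_bigrams(tagged_tokens, min_count=2):
    """Statistical step: score adjacent (word, tag) pairs by PMI.

    PMI is an assumed stand-in; the abstract does not say which
    association measure the authors use.
    """
    words = [word for word, _ in tagged_tokens]
    unigram_counts = Counter(words)
    bigram_counts = Counter(zip(tagged_tokens, tagged_tokens[1:]))
    n_unigrams = len(words)
    n_bigrams = max(len(words) - 1, 1)

    scores = {}
    for pair, count in bigram_counts.items():
        if count < min_count:
            continue  # drop rare pairs, whose PMI estimates are unreliable
        (w1, _), (w2, _) = pair
        p_xy = count / n_bigrams
        p_x = unigram_counts[w1] / n_unigrams
        p_y = unigram_counts[w2] / n_unigrams
        scores[pair] = math.log2(p_xy / (p_x * p_y))
    return scores


def hybrid_extract(tagged_tokens, threshold=0.0, min_count=2):
    """Linguistic step: keep only high-scoring bigrams that match a POS pattern."""
    candidates = []
    for ((w1, t1), (w2, t2)), score in pmi_bigrams(tagged_tokens, min_count).items():
        if score < threshold:
            continue
        if (t1, t2) in NOMINAL_PATTERNS:
            kind = "nominal"
        elif (t1, t2) in VERBAL_PATTERNS:
            kind = "verbal"
        else:
            continue  # statistically associated, but matches no language rule
        candidates.append((w1, w2, kind, score))
    # Strongest associations first.
    return sorted(candidates, key=lambda c: c[3], reverse=True)
```

The input is assumed to be a list of `(word, tag)` pairs from any POS tagger. Running the filter after the statistical pass means frequent but linguistically irrelevant pairs (for example, two function words) are discarded even when their association score is high.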