Article Information

  • Title: COMPARATIVE ANALYSIS OF ML POS ON ARABIC TWEETS
  • Authors: MUSTAFA ABDULKAREEM; SABRINA TIUN
  • Journal: Journal of Theoretical and Applied Information Technology
  • Print ISSN: 1992-8645
  • Electronic ISSN: 1817-3195
  • Publication year: 2017
  • Volume: 95
  • Issue: 2
  • Publisher: Journal of Theoretical and Applied
  • Abstract: Social media text such as tweets is one of the challenges of natural language processing. In contrast to highly edited genres (standard language), for which traditional NLP tools were developed, conversational text contains many non-standard lexical items and syntactic patterns. These result from dialectal variation, topic diversity, orthography, unintended errors, conversational errors, and creative language use. Because Twitter text is characterized by idiosyncratic style, noise, and linguistic errors, it is difficult to part-of-speech tag. The aim of this paper is to design and implement part-of-speech tagging models for Arabic tweets by investigating several machine learning models, such as K-Nearest Neighbour, Naive Bayes, and Decision Tree. The paper introduces a novel Arabic Twitter corpus and assesses various state-of-the-art POS taggers that were retrained on this corpus. A state-of-the-art accuracy of 87.97% is achieved when tagging Twitter text.
  • Keywords: Arabic part-of-speech tagging; Arabic tweet classification; feature extraction