Article Information

  • Title: AUTOMATIC MACHINE LEARNING TECHNIQUES (AMLT) FOR ARABIC TEXT CLASSIFICATION BASED ON TERM COLLOCATIONS
  • Authors: FEKRY OLAYAH ; WASEEM ALROMIMA
  • Journal: Journal of Theoretical and Applied Information Technology
  • Print ISSN: 1992-8645
  • Electronic ISSN: 1817-3195
  • Publication year: 2018
  • Volume: 96
  • Issue: 12
  • Publisher: Journal of Theoretical and Applied
  • Abstract: Due to the rapid growth in the availability of documents in digital format, retrieving information with the highest accuracy and the lowest error rate is becoming more difficult. Text Classification (TC) has become one of the key techniques for controlling and organizing documents based on their content. Keyword extraction is therefore one of the most important natural language processing applications; it extracts information from a document such as term collocations, which are two or more words that appear together and consistently appear to be associated. In Arabic, keyword extraction faces many problems because of the complexity of Arabic orthography. Moreover, accuracy is affected by the document content and the classification technique used. The need for automatic text classification arises from the large number of electronic documents on the web. This research proposes Automatic Machine Learning Techniques (AMLT) for classifying Arabic documents using term collocations. The collocations are mined from Arabic documents, scored using an association measure, and used as the terms for feature selection. For this study, we used Arabic documents divided into four categories (Economy/Business, Politics, Religion, and Science). The results of our approach were compared with the full-document approach and the summary-document approach using four techniques (SVM, NB, J48, and KNN) to determine which classifier is more accurate for Arabic text based on term collocations. The evaluation results show that the proposed approach outperforms the other approaches in accuracy.
  • Keywords: Arabic Language; Text classification; Term collocations; bi-gram; Machine Learning; Category
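
The abstract describes mining term collocations, scoring them with an association measure, and using the top-scoring ones as classification features. The record does not say which measure the authors used, so the following is only a minimal sketch under stated assumptions: it treats collocations as adjacent word pairs (bi-grams, as listed in the keywords), scores them with pointwise mutual information (PMI) as a stand-in measure, and assumes the documents are already tokenized. The function names `score_bigrams`, `top_collocations`, and `to_feature_vector` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter


def score_bigrams(docs, min_count=2):
    """Score adjacent word pairs by PMI (an assumed association measure).

    docs is a list of token lists; pairs seen fewer than min_count
    times are skipped because PMI is unreliable for rare pairs.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in docs:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


def top_collocations(docs, k=10, min_count=2):
    """Return the k highest-scoring bigrams to use as the feature vocabulary."""
    scores = score_bigrams(docs, min_count)
    return sorted(scores, key=scores.get, reverse=True)[:k]


def to_feature_vector(tokens, vocabulary):
    """Count how often each selected collocation occurs in one document."""
    counts = Counter(zip(tokens, tokens[1:]))
    return [counts[pair] for pair in vocabulary]
```

The resulting vectors could then be passed to any of the classifiers named in the abstract (SVM, NB, J48, KNN); that step is omitted here because the record gives no detail on how the classifiers were configured.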