Article Information

  • Title: A New Machine Learning Approach for Arabic-English Documents Classification
  • Authors: Walid Mohamed Aly; Wafaa Hanna Sharaby; Hany Atef Kelleny
  • Journal: International Journal of Computer Science Issues
  • Print ISSN: 1694-0784
  • Electronic ISSN: 1694-0814
  • Publication year: 2013
  • Volume: 10
  • Issue: 6
  • Publisher: IJCSI Press
  • Abstract: This paper aims to develop a system capable of classifying unstructured Arabic and English documents in two consecutive phases. In the first phase, an incremental Automated Domain-Meta-Document Construction (ADC) algorithm is applied as a new automated machine learning approach. ADC constructs updatable, summarized Domain-Meta-Documents, which correspond to the classified training documents. The output is stored in a knowledge base to support the classification process. In the second phase, an enhanced supervised classification algorithm, based on an automatically calculated threshold value, uses the previously generated Domain-Meta-Documents to classify the incoming dataset. To evaluate the proposed approach, two experiments were conducted on two standard datasets, namely the Corpus of Contemporary Arabic (CCA) and Newsgroup 20. The results showed that the proposed approach outperformed the compared classification algorithms (C4.5 and Back Propagation Neural Network) on different measures. The general accuracy of the proposed system was found to be about 95%.
  • Keywords: Unstructured Documents; Machine Learning; Classification; Threshold.
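
The two-phase idea in the abstract can be illustrated with a minimal Python sketch. This is not the paper's ADC algorithm, which the abstract does not specify. Everything here is an assumption made for illustration: the class name `MetaDocumentClassifier`, the use of per-domain term-frequency counts as a stand-in for a Domain-Meta-Document, cosine similarity as the matching score, and a threshold set halfway between the mean own-domain and mean other-domain similarity of the training documents.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # \w is Unicode-aware in Python 3, so Arabic and Latin-script letters both match.
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    # Cosine similarity between two term-count vectors stored as Counters.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean(values):
    return sum(values) / len(values) if values else 0.0


class MetaDocumentClassifier:
    """Hypothetical stand-in for the two phases; not the authors' method."""

    def __init__(self):
        self.meta = defaultdict(Counter)  # domain -> summed term counts
        self.training = []                # (domain, term counts) per document

    def add_document(self, domain, text):
        # Phase 1 (assumed): fold each labeled document into its domain summary,
        # so the summary can be updated incrementally as new documents arrive.
        counts = Counter(tokenize(text))
        self.meta[domain].update(counts)
        self.training.append((domain, counts))

    def threshold(self):
        # Assumed rule: midpoint between how similar training documents are to
        # their own domain summary and to the summaries of other domains.
        own, other = [], []
        for domain, counts in self.training:
            for name, summary in self.meta.items():
                score = cosine(counts, summary)
                (own if name == domain else other).append(score)
        return (mean(own) + mean(other)) / 2

    def classify(self, text):
        # Phase 2 (assumed): pick the best-matching domain, or None when the
        # best score falls below the automatically computed threshold.
        counts = Counter(tokenize(text))
        scored = [(cosine(counts, summary), name) for name, summary in self.meta.items()]
        if not scored:
            return None
        best_score, best_domain = max(scored)
        return best_domain if best_score >= self.threshold() else None
```

To try it, call `add_document(domain, text)` for each labeled training document, then `classify(text)` on new text. Returning `None` for low-scoring input is a design choice of this sketch; the abstract does not say how the paper treats documents that fall below the threshold.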