
Article Information

  • Title: Automatic Multi-Document Arabic Text Summarization Using Clustering and Keyphrase Extraction
  • Authors: Hamzah Noori Fejer; Nazlia Omar
  • Journal: Journal of Artificial Intelligence
  • Print ISSN: 1994-5450
  • Electronic ISSN: 2077-2173
  • Publication year: 2015
  • Volume: 8
  • Issue: 1
  • Pages: 1-9
  • DOI: 10.3923/jai.2015.1.9
  • Publisher: Asian Network for Scientific Information
  • Abstract: Automatic text summarization has become important due to the rapid growth of textual information, since it is very difficult for human beings to summarize large documents manually. A full understanding of the document is essential to form an ideal summary. However, achieving full understanding is either difficult or impossible for computers. Therefore, selecting important sentences from the original text and presenting them as a summary is the most common technique in automated text summarization. Arabic natural language processing lacks the tools and resources that are essential to advance research in Arabic text summarization. In addition to the limited resources, this field has received little attention and research. Arabic text summarization systems still suffer from low accuracy because they use simple summarization techniques. The aim of this research is to improve Arabic text summarization by using clustering and keyphrase extraction. This study proposes a combined clustering method to group Arabic documents into several clusters. A keyphrase extraction module is applied to extract important keyphrases from each cluster, which helps to identify the most important sentences and to find similar sentences based on several similarity algorithms. These algorithms are applied to extract one sentence from each group of similar sentences while ignoring the others. The Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics were used for the evaluation. The DUC2002 corpus was used as the summarization dataset. The model achieved an accuracy of 43.4%. The experiments show that the proposed model performs better than other work.
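
The redundancy step described in the abstract (keeping one sentence from a group of similar sentences) can be illustrated with a minimal sketch. The abstract does not name the similarity algorithms used, so the cosine similarity over bag-of-words counts, the whitespace tokenizer, the `threshold` value, and the function names below are all assumptions made for illustration and not the authors' method. Real Arabic text would also need normalization and stemming, which are omitted here.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_non_redundant(sentences, threshold=0.6):
    """Greedily keep a sentence only if it is not too similar to one already kept.

    `sentences` is assumed to be ordered by importance (most important first),
    so the first sentence of each group of near-duplicates is the one retained.
    The threshold of 0.6 is an arbitrary illustrative value.
    """
    kept = []
    kept_vectors = []
    for sentence in sentences:
        vector = Counter(sentence.lower().split())  # naive whitespace tokens
        if all(cosine_similarity(vector, kv) < threshold for kv in kept_vectors):
            kept.append(sentence)
            kept_vectors.append(vector)
    return kept


# Hypothetical ranked candidate sentences; the first two are near-duplicates.
ranked = [
    "the summit on climate policy opened in geneva today",
    "the climate policy summit opened today in geneva",
    "delegates from forty countries attended the opening session",
]
summary = select_non_redundant(ranked)
```

In this sketch the second sentence is dropped because it shares almost all of its words with the first, while the third is kept. A greedy pass was chosen for brevity; a clustering-based grouping, as the abstract suggests, would be a natural alternative.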