Article Information

  • Title: Analysis of K-means, DBSCAN and OPTICS Cluster Algorithms on Al-Quran Verses
  • Authors: Mohammed A. Ahmed; Hanif Baharin; Puteri N.E. Nohuddin
  • Journal: International Journal of Advanced Computer Science and Applications (IJACSA)
  • Print ISSN: 2158-107X
  • Electronic ISSN: 2156-5570
  • Publication year: 2020
  • Volume: 11
  • Issue: 8
  • DOI: 10.14569/IJACSA.2020.0110832
  • Publisher: Science and Information Society (SAI)
  • Abstract: Chapter Al-Baqarah is the longest chapter in the Holy Quran, and it covers various topics. Al-Quran is the primary text of Islamic faith and practice. Millions of Muslims worldwide use Al-Quran as their reference book, and it therefore serves Muslims and Islamic scholars as a guide to law and life. Text clustering (unsupervised learning) is the process of dividing text into groups of similar documents. Many text clustering algorithms and techniques exist, including partitioning and density-based methods. In this paper, k-means is chosen as the partitioning method, and DBSCAN and OPTICS as the density-based methods. This study aims to determine which algorithm produces the most accurate clustering of the English Tafseer of chapter Al-Baqarah. Data preprocessing and feature extraction using Term Frequency-Inverse Document Frequency (TF-IDF) were applied to the dataset. The results show that k-means performed best overall, even though it had the lowest Silhouette Coefficient (SC) score of the three, because it had the shortest running time and produced no noise for seven clusters of the Al-Baqarah chapter. OPTICS produced no noise and a mid-range SC score but had the longest running time due to its complexity.
  • Keywords: K-means; DBSCAN; OPTICS; Al-Baqarah clustering; Silhouette Coefficient; Tafseer; text clustering
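
The comparison described in the abstract (TF-IDF features, three clustering algorithms, then Silhouette Coefficient, noise count, and running time) could be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' code: the six documents are made-up placeholders rather than Tafseer text, and the cluster count, `eps`, `min_samples`, the cosine metric, and the choice to score only non-noise points are all assumptions chosen so the toy example runs.

```python
import time

from sklearn.cluster import DBSCAN, KMeans, OPTICS
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import silhouette_score

# Placeholder documents (assumption): NOT the dataset used in the paper.
docs = [
    "the cat chased the mouse across the garden",
    "a mouse hid from the cat in the garden",
    "the bank raised interest rates on loans",
    "loans cost more after the bank raised interest rates",
    "the team won the football match on sunday",
    "fans cheered as the team won the football match",
]

# TF-IDF feature extraction; dense array keeps all three estimators happy.
X = TfidfVectorizer(stop_words="english").fit_transform(docs).toarray()


def evaluate(name, model):
    """Fit one clusterer and report cluster count, noise, SC score, and time."""
    start = time.perf_counter()
    labels = model.fit_predict(X)
    elapsed = time.perf_counter() - start

    clustered = labels != -1  # DBSCAN and OPTICS mark noise points as -1
    n_clusters = len(set(labels[clustered]))
    score = None
    # The silhouette needs at least 2 clusters and fewer clusters than points.
    if 2 <= n_clusters < clustered.sum():
        score = silhouette_score(X[clustered], labels[clustered], metric="cosine")
    return {
        "name": name,
        "clusters": n_clusters,
        "noise": int((~clustered).sum()),
        "silhouette": score,
        "seconds": elapsed,
    }


# Hyperparameters below are illustrative assumptions sized for the toy data
# (the paper reports seven clusters on its own dataset).
results = [
    evaluate("k-means", KMeans(n_clusters=3, n_init=10, random_state=0)),
    evaluate("DBSCAN", DBSCAN(eps=0.7, min_samples=2, metric="cosine")),
    evaluate("OPTICS", OPTICS(min_samples=2, metric="cosine")),
]

for r in results:
    print(r)
```

On a corpus this small the timings and scores say nothing about the paper's findings; the sketch only shows how the three quantities named in the abstract can be collected side by side.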