
Article Information

  • Title: Scalable Varied Density Clustering Algorithm for Large Datasets
  • Authors: Ahmed Fahim; Abd-Elbadeeh Salem; Fawzy Torkey
  • Journal: Journal of Software Engineering and Applications
  • Print ISSN: 1945-3116
  • Electronic ISSN: 1945-3124
  • Publication year: 2010
  • Volume: 3
  • Issue: 6
  • Pages: 593-602
  • DOI: 10.4236/jsea.2010.36069
  • Publisher: Scientific Research Publishing
  • Abstract: Finding clusters in data is a challenging problem, especially when the clusters vary widely in shape, size, and density. Herein, a new scalable clustering technique that addresses all these issues is proposed. In data mining, the purpose of data clustering is to identify useful patterns in the underlying dataset. Within the last several years, many clustering algorithms have been proposed in this area of research. Among these methods, density-based clustering methods are the most important because of their ability to detect arbitrarily shaped clusters. Moreover, these methods often show good noise-handling capabilities, since clusters are defined as regions of typical density separated by regions of low or no density. In this paper, we aim to enhance the well-known DBSCAN algorithm to make it scalable and able to discover clusters in uneven datasets, in which clusters are regions of homogeneous density. We achieve scalability by using the k-means algorithm to obtain an initial partition of the dataset, applying the enhanced DBSCAN to each partition, and then using a merging process to recover the natural number of clusters in the underlying dataset. The proposed algorithm thus consists of three stages. Experimental results on synthetic datasets show that the proposed clustering algorithm is faster and more scalable than its enhanced DBSCAN counterpart.
  • Keywords: EDBSCAN; Data Clustering; Varied Density Clustering; Cluster Analysis
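
The three stages named in the abstract (k-means partitioning, density clustering per partition, merging) can be illustrated with a minimal Python sketch. It is not the authors' implementation: plain DBSCAN stands in for the paper's enhanced DBSCAN (EDBSCAN), the merge rule (join two clusters when any two of their points lie within `eps`) is an assumption made for illustration, and all function and parameter names (`kmeans`, `dbscan`, `merge_touching`, `three_stage_cluster`, `eps`, `min_pts`) are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Stage 1: plain Lloyd's k-means; returns a partition index per point."""
    centers = random.Random(seed).sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # an empty partition keeps its old center
                centers[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return assign


def dbscan(points, eps, min_pts):
    """Stage 2 stand-in: standard DBSCAN (not EDBSCAN); -1 marks noise."""
    labels = [None] * len(points)

    def neighbors(i):  # brute-force O(n) scan, includes the point itself
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nj = neighbors(j)
            if len(nj) >= min_pts:  # only core points keep expanding
                queue.extend(nj)
    return labels


def merge_touching(points, labels, eps):
    """Stage 3 (assumed rule): union clusters that have two points within eps."""
    parent = {l: l for l in labels if l != -1}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if (labels[i] != -1 and labels[j] != -1
                    and math.dist(points[i], points[j]) <= eps):
                parent[find(labels[i])] = find(labels[j])
    return [-1 if l == -1 else find(l) for l in labels]


def three_stage_cluster(points, k, eps, min_pts):
    part = kmeans(points, k)
    labels, next_id = [-1] * len(points), 0
    for c in range(k):  # cluster each partition on its own, with unique ids
        idx = [i for i, a in enumerate(part) if a == c]
        local = dbscan([points[i] for i in idx], eps, min_pts)
        for i, l in zip(idx, local):
            if l != -1:
                labels[i] = next_id + l
        next_id += max(local, default=-1) + 1
    return merge_touching(points, labels, eps)
```

A call such as `three_stage_cluster(points, k=4, eps=0.6, min_pts=3)` takes a list of coordinate tuples and returns one label per point. The neighbor search and the merge step here are quadratic, so the sketch shows only the structure of the pipeline, not the scalability the paper reports.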