Article Information

  • Title: Data Clustering Method for Very Large Databases using entropy-based algorithm
  • Authors: S.Karunakar ; K.Rajesh ; Ashraf Ali
  • Journal: International Journal of Computer Technology and Applications
  • Electronic ISSN: 2229-6093
  • Publication year: 2011
  • Volume: 2
  • Issue: 5
  • Pages: 1197-1200
  • Publisher: Technopark Publications
  • Abstract: Finding useful patterns in large datasets has attracted considerable interest recently, and one of the most widely studied problems in this area is the identification of clusters, or densely populated regions, in a multi-dimensional dataset. Prior work does not adequately address the problem of large datasets and the minimization of I/O costs. Clustering of categorical attributes is a difficult problem that has not received as much attention as its numerical counterpart. In this paper we explore the connection between clustering and entropy: clusters of similar points have lower entropy than those of dissimilar ones. We use this connection to design a heuristic algorithm capable of efficiently clustering large data sets of records with categorical attributes. In contrast with other categorical clustering algorithms published in the past, clustering results are very stable for different sample sizes and parameter settings. Also, the clustering criterion is very intuitive, since it is deeply rooted in the well-known notion of entropy.
  • Keywords: Data mining; categorical clustering; data labeling.
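
The abstract's central observation, that clusters of similar categorical records have lower entropy than clusters of dissimilar ones, can be illustrated with a small sketch. This is not the paper's heuristic algorithm, which this record does not describe. The function names (`attribute_entropy`, `cluster_entropy`, `expected_entropy`) and the sample records are hypothetical, and the sketch assumes attributes are independent so that a cluster's entropy can be taken as the sum of its per-attribute Shannon entropies.

```python
import math
from collections import Counter


def attribute_entropy(values):
    """Shannon entropy (in bits) of one categorical attribute."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def cluster_entropy(records):
    """Entropy of a cluster of equal-length categorical tuples.

    Assumption (not from the source): attributes are treated as
    independent, so the cluster entropy is the sum of the
    per-attribute entropies.
    """
    if not records:
        return 0.0
    # zip(*records) turns rows into columns, one column per attribute.
    return sum(attribute_entropy(column) for column in zip(*records))


def expected_entropy(clusters):
    """Size-weighted average entropy over all clusters of a partition."""
    total = sum(len(c) for c in clusters)
    return sum(len(c) / total * cluster_entropy(c) for c in clusters)


# Hypothetical records with two categorical attributes (colour, size).
similar = [("red", "small"), ("red", "small"), ("red", "large")]
dissimilar = [("red", "small"), ("blue", "large"), ("green", "medium")]

# Two ways of partitioning the same four records into two clusters.
good_partition = [
    [("red", "small"), ("red", "small")],
    [("blue", "large"), ("blue", "large")],
]
bad_partition = [
    [("red", "small"), ("blue", "large")],
    [("red", "small"), ("blue", "large")],
]

if __name__ == "__main__":
    print("similar cluster:   ", cluster_entropy(similar))
    print("dissimilar cluster:", cluster_entropy(dissimilar))
    print("good partition:    ", expected_entropy(good_partition))
    print("bad partition:     ", expected_entropy(bad_partition))
```

Under these assumptions, a partition that groups identical records together yields a lower expected entropy than one that mixes them, which is the kind of criterion an entropy-based clustering heuristic would try to minimize.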