
Article Information

  • Title: Clustering of Mixed Variety of Data using D&C Approach (Categorical, Numeric, Binary, Ordinal, and Nominal, Ratio-Scaled Datum)
  • Authors: Rohit Rastogi; Abhishek Jha; Poonam Maher
  • Journal: International Journal of Computer Science & Technology
  • Print ISSN: 2229-4333
  • Online ISSN: 0976-8491
  • Year: 2012
  • Volume: 3
  • Issue: 4
  • Pages: 335-339
  • Language: English
  • Publisher: Ayushmaan Technologies
  • Abstract: Many clustering algorithms focus on numerical data, whose inherent geometric properties can be exploited naturally to define distance functions or dissimilarities between data points. In many real-life applications, however, the data are categorical, and attribute values cannot be ordered naturally as numerical values can. Because two or more kinds of data have different characteristics, attempts to develop criterion functions for mixed data have not been very successful. In this research, we propose a novel divide, conquer, and combine technique with greedy characteristics for obtaining optimal solutions to this problem. The algorithm is recursive in nature. First, the original mixed dataset is divided into n sub-datasets (subproblems) that are similar to the original problem but smaller in size. These subproblems (n = 2, 3, or 4, depending on the number of sub-datasets available in the data tuples) are solved recursively, and their solutions are then combined to create a solution to the original problem. Each sub-dataset may be a truly categorical dataset, a purely numeric dataset, an exponentially distributed ratio-scaled dataset, or a binary dataset with yes/no values from a real-life application. Next, existing clustering algorithms designed for each type of dataset are employed to produce the corresponding clusters. Finally, the clustering results on the categorical and other available sub-datasets are combined into a categorical dataset, on which an unsupervised learning algorithm for categorical data is employed to obtain the final output. The main contribution of this research is an algorithmic framework for the mixed-attribute clustering problem into which existing clustering algorithms can easily be integrated.
  • Keywords: Clustering Algorithms; Divide-and-Conquer; Greedy Approach; Categorical Dataset; Numerical Dataset; Binary Dataset; Ordinal Dataset; Nominal Dataset; Ratio-Scaled Dataset
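The divide, conquer, and combine steps described in the abstract can be sketched roughly as follows. This is a minimal, hypothetical illustration under stated assumptions, not the authors' implementation: the function names (`split_by_type`, `k_means`, `k_modes`, `cluster_ratio`, `cluster_mixed`) are invented here, and the per-type algorithms are assumptions chosen for simplicity (k-means for numeric attributes, a log transform followed by k-means for ratio-scaled attributes, and k-modes with simple-matching distance for categorical, nominal, and binary attributes). The sketch divides the data once by attribute type rather than recursively, and seeds each clustering deterministically with the first k distinct points.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence

Row = List[object]


def split_by_type(rows: Sequence[Sequence[object]], schema: Sequence[str]) -> Dict[str, List[Row]]:
    """Divide: group columns by attribute type and project every row onto each group."""
    groups: Dict[str, List[int]] = {}
    for col, kind in enumerate(schema):
        groups.setdefault(kind, []).append(col)
    return {kind: [[row[c] for c in cols] for row in rows] for kind, cols in groups.items()}


def _seeds(points: List[Row], k: int) -> List[Row]:
    """Use the first k distinct points as initial centres (deterministic, for illustration)."""
    seeds: List[Row] = []
    for p in points:
        if list(p) not in seeds:
            seeds.append(list(p))
        if len(seeds) == k:
            break
    return seeds


def k_means(points: List[Row], k: int, iters: int = 20) -> List[int]:
    """Lloyd's k-means with squared Euclidean distance (assumed choice for numeric data)."""
    centres = _seeds(points, k)
    labels: List[int] = []
    for _ in range(iters):
        labels = [min(range(len(centres)),
                      key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centres[j])))
                  for p in points]
        for j in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def k_modes(points: List[Row], k: int, iters: int = 20) -> List[int]:
    """k-modes: count mismatching attributes, use the per-attribute mode as each centre."""
    modes = _seeds(points, k)
    labels: List[int] = []
    for _ in range(iters):
        labels = [min(range(len(modes)),
                      key=lambda j: sum(a != b for a, b in zip(p, modes[j])))
                  for p in points]
        for j in range(len(modes)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                modes[j] = [Counter(col).most_common(1)[0][0] for col in zip(*members)]
    return labels


def cluster_ratio(points: List[Row], k: int) -> List[int]:
    """Ratio-scaled values (assumed positive, roughly exponential): log-transform, then k-means."""
    return k_means([[math.log(x) for x in p] for p in points], k)


# Hypothetical mapping from attribute type to a type-appropriate clustering algorithm.
CLUSTERERS: Dict[str, Callable[[List[Row], int], List[int]]] = {
    "numeric": k_means,
    "ratio": cluster_ratio,
    "categorical": k_modes,
    "nominal": k_modes,
    "binary": k_modes,
}


def cluster_mixed(rows: Sequence[Sequence[object]], schema: Sequence[str], k: int) -> List[int]:
    parts = split_by_type(rows, schema)
    # Conquer: cluster each single-type sub-dataset with an algorithm suited to its type.
    label_columns = [CLUSTERERS[kind](sub, k) for kind, sub in parts.items()]
    # Combine: each row becomes a categorical tuple of its per-type cluster labels,
    # and a categorical clustering algorithm produces the final assignment.
    combined = [list(labels) for labels in zip(*label_columns)]
    return k_modes(combined, k)


rows = [
    (1.0, 1.2, "red", "yes", 2.0),
    (9.0, 9.1, "blue", "no", 800.0),
    (1.1, 0.9, "red", "yes", 2.5),
    (8.8, 9.3, "blue", "no", 1000.0),
    (0.9, 1.0, "red", "yes", 1.8),
    (9.2, 8.9, "blue", "no", 900.0),
]
schema = ["numeric", "numeric", "categorical", "binary", "ratio"]
print(cluster_mixed(rows, schema, k=2))  # → [0, 1, 0, 1, 0, 1]
```

Because the label IDs produced by each sub-clustering are arbitrary, the final k-modes step only compares them for equality, which is why the per-type label vectors can be treated as an ordinary categorical dataset. Ordinal attributes are not handled in this sketch; they would need their own entry in the type-to-algorithm mapping.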