Journal: International Journal on Computer Science and Engineering
Print ISSN: 2229-5631
Online ISSN: 0975-3397
Year: 2012
Volume: 4
Issue: 07
Pages: 1348-1353
Publisher: Engg Journals Publications
Abstract: Clustering is the problem of discovering "meaningful" groups in given data. The first, and common, step in partitional clustering is to choose the best value of K, the number of partitions, and the clustering solution varies with K. Instead of clustering the data by guessing a value of K, in this paper we propose to cluster the data based on their similarity to obtain more meaningful clusters. Other characteristics of our clustering approach are that (1) it handles outliers, (2) it addresses the problem of clustering heterogeneous data, (3) it reduces the high dimensionality of the term-document matrix, and (4) it outperforms the well-known K-Means clustering algorithm in accuracy.
Keywords: Clustering; Latent Semantic Indexing; Text Preprocessing; Term-document matrix.
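The abstract does not give the paper's similarity measure, threshold, or outlier rule, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It projects documents (columns of a term-document matrix) into a k-dimensional Latent Semantic Indexing space via a truncated SVD, computed here with power iteration and deflation on A^T A to avoid external dependencies. It then groups documents by cosine similarity to cluster centroids without fixing K in advance. The function names, the cosine measure, the threshold of 0.8, the `min_size` outlier rule, and the toy matrix are all hypothetical choices for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    if nu < 1e-12 or nv < 1e-12:
        return 0.0  # a (near-)zero vector is treated as similar to nothing
    return dot(u, v) / (nu * nv)

def matvec(M, v):
    return [dot(row, v) for row in M]

def lsi_document_vectors(A, k, iters=300):
    """Map each document (column of term-document matrix A) to k LSI coordinates.

    The right singular vectors of A are eigenvectors of the Gram matrix A^T A,
    found by power iteration with deflation; document j's coordinate on
    component i is sigma_i * v_i[j].
    """
    n = len(A[0])
    G = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    coords = [[0.0] * k for _ in range(n)]
    for comp in range(k):
        v = [float(j + 1) for j in range(n)]  # fixed, non-uniform start vector
        for _ in range(iters):
            w = matvec(G, v)
            norm_w = math.sqrt(dot(w, w))
            if norm_w == 0.0:
                break  # remaining spectrum is zero
            v = [x / norm_w for x in w]
        lam = dot(v, matvec(G, v))  # Rayleigh quotient, approximately sigma^2
        sigma = math.sqrt(max(lam, 0.0))
        for j in range(n):
            coords[j][comp] = sigma * v[j]
        # Deflate so the next pass converges to the next singular vector.
        G = [[G[a][b] - lam * v[a] * v[b] for b in range(n)] for a in range(n)]
    return coords

def similarity_clusters(vectors, threshold=0.8, min_size=2):
    """Group vectors without choosing K (hypothetical threshold rule).

    Each vector joins the most similar existing cluster centroid if the cosine
    similarity is at least `threshold`, otherwise it starts a new cluster.
    Members of clusters smaller than `min_size` are reported as outliers.
    """
    clusters = []
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for c, members in enumerate(clusters):
            centroid = [sum(vectors[m][d] for m in members) / len(members)
                        for d in range(len(vec))]
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([i])
        else:
            clusters[best].append(i)
    kept = [c for c in clusters if len(c) >= min_size]
    outliers = [i for c in clusters if len(c) < min_size for i in c]
    return kept, outliers

# Toy term-document matrix (rows = terms, columns = documents), made up for illustration.
A = [
    [2, 1, 0, 0, 0],
    [1, 2, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 2, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 2, 0],
    [0, 0, 0, 0, 1],
]
docs = lsi_document_vectors(A, k=2)
clusters, outliers = similarity_clusters(docs)
print(clusters, outliers)  # [[0, 1], [2, 3]] [4]
```

Here the number of clusters falls out of the similarity threshold rather than being set up front, and document 4, which shares no terms with the others and has almost no weight in the 2-dimensional LSI space, ends up alone and is flagged as an outlier. A real text pipeline would normally apply preprocessing and term weighting (for example TF-IDF) before the SVD.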