Journal: International Journal of Data Mining & Knowledge Management Process
Print ISSN: 2231-007X
Online ISSN: 2230-9608
Year: 2015
Volume: 5
Issue: 2
Pages: 23
DOI: 10.5121/ijdkp.2015.5203
Publisher: Academy & Industry Research Collaboration Center (AIRCC)
Abstract: With the ever-increasing number of documents on the web and in other repositories, organizing and categorizing these documents manually to meet the diverse needs of users is a complicated job, so a machine learning technique called clustering is very useful. Text documents are clustered by the pairwise similarity of documents, using similarity measures such as Cosine, Jaccard or Pearson. The best clustering results are obtained when the overlap of terms between documents is small, that is, when clusters are distinguishable. To address this problem, we measure document similarity using the link and neighbor concepts introduced in ROCK. Documents that are significantly similar are called neighbors, and the link of a pair of documents is the number of neighbors they share. This work applies links and neighbors to Bisecting K-means clustering in three ways: to identify seed documents in the dataset, as a heuristic for choosing the cluster to be partitioned, and as a means to find the number of partitions possible in the dataset. Our experiments on real-time datasets showed a significant improvement in accuracy with minimal time.
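To make the neighbor and link definitions concrete, here is a minimal sketch in Python. It is not the authors' implementation; it assumes whitespace-tokenized term-frequency vectors, Cosine similarity, and a hypothetical similarity threshold `theta` (the abstract does not state the value used). Two documents are neighbors when their similarity is at least `theta`, and their link is the number of other documents that are neighbors of both. Whether a document counts as its own neighbor varies between formulations; this sketch excludes it.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def neighbor_matrix(docs, theta):
    """nbr[i][j] is True when documents i != j have cosine similarity >= theta.

    theta is an assumed, user-chosen threshold; the diagonal stays False,
    so a document is not treated as its own neighbor.
    """
    n = len(docs)
    vecs = [Counter(d.lower().split()) for d in docs]  # naive tokenization (assumption)
    nbr = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vecs[i], vecs[j]) >= theta:
                nbr[i][j] = nbr[j][i] = True
    return nbr


def link_matrix(nbr):
    """link[i][j] = number of documents k that are neighbors of both i and j."""
    n = len(nbr)
    return [
        [sum(1 for k in range(n) if nbr[i][k] and nbr[j][k]) if i != j else 0
         for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    docs = [
        "apple banana fruit",
        "banana fruit smoothie",
        "apple fruit pie",
        "stock market price",
        "market price index",
    ]
    links = link_matrix(neighbor_matrix(docs, theta=0.3))
    print(links)
    # -> [[0, 1, 1, 0, 0], [1, 0, 1, 0, 0], [1, 1, 0, 0, 0], [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
```

In this toy corpus, documents 3 and 4 are neighbors but share no third neighbor, so their link is 0, while the three fruit documents are mutually linked. This illustrates how links reward overlapping neighborhoods rather than isolated pairwise similarity. How the paper turns these counts into seed choices, split heuristics and a partition count is not specified in the abstract and is not modeled here.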