Article Information

Title: Correlation Preserved Indexing Based Approach For Document Clustering
Authors: Meena.S.U; P.Parthasarathi
Journal: International Journal of Advanced Research in Computer Engineering & Technology (IJARCET)
Print ISSN: 2278-1323
Year: 2013
Volume: 2
Issue: 2
Pages: 462-470
Publisher: Shri Pannalal Research Institute of Technology
Abstract: Document clustering is the task of grouping similar documents into clusters, where similarity is computed by some function over the documents. The document clustering method 1) achieves high accuracy on documents, 2) allows document frequency to be calculated, and 3) calculates term weights from the term frequency vector. Document clustering is closely related to data clustering; it is a more specific technique for unsupervised document organization, automatic topic extraction, and fast information retrieval or filtering. Clustering methods can be used to automatically group retrieved documents into a list of meaningful categories. The correlation preserving indexing method is applied to find the correlation between documents. The Term Frequency-Inverse Document Frequency (TF-IDF) method is used to find the frequency of occurrence of words in each document. The disadvantage of this method is its computational complexity. This paper introduces a Significant Score Calculation method, in which the similarity between words is calculated using the WordNet tool so that related words are identified. An accuracy of 98% is reported when significant score calculation is used for correlation preserving indexing.
Keywords: Correlation Preserving Indexing; Document Clustering; Significant Score; Term Frequency-Inverse Document Frequency.
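
Illustrative note: the abstract mentions TF-IDF weighting and correlation between documents. The following is a minimal sketch of those two generic ideas only, assuming whitespace-tokenized text, the plain `tf * log(N / df)` weighting, and Pearson correlation as the similarity measure. It is not the paper's correlation preserving indexing or Significant Score Calculation method, and the function names `tf_idf_vectors` and `correlation`, as well as the sample sentences, are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for a list of tokenized documents."""
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Relative term frequency times log inverse document frequency.
        vectors.append(
            [(counts[t] / len(doc)) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors


def correlation(x, y):
    """Pearson correlation between two equal-length vectors (0.0 if undefined)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0


# Hypothetical sample corpus: two related sentences and two unrelated ones.
docs = [
    "document clustering groups similar documents".split(),
    "document clustering groups related documents".split(),
    "stock prices fell sharply today".split(),
    "heavy rain flooded several roads".split(),
]
vocab, vectors = tf_idf_vectors(docs)
sim_01 = correlation(vectors[0], vectors[1])  # related pair
sim_02 = correlation(vectors[0], vectors[2])  # unrelated pair
print(sim_01 > sim_02)
```

In this sketch the two sentences that share most of their terms correlate more strongly than the unrelated pair, which is the kind of signal a clustering step could then group on.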