Journal: International Journal of Computer Science and Information Technologies
Electronic ISSN: 0975-9646
Publication year: 2016
Volume: 7
Issue: 5
Pages: 2277-2280
Publisher: TechScience Publications
Abstract: This paper proposes a new and efficient methodology for classifying research documents (PDF documents), removing duplicate data, and generating clusters for the documents. Grouping documents into different clusters by topic makes searching for them efficient and easy. This technique can be used by search engines to return results relevant to the user's query, and by online journal domains that maintain large sets of documents. The paper also suggests a cluster generation and word matching technique that reduces the time needed to find the appropriate cluster for a document. Properly clustered documents can then be used by a multi-document summarization system, which produces a summary for documents that are related to each other.
Keywords: Cluster; clustering; word matching; classification of; PDF documents.