Basic Article Information

  • Title: Reorganization of Duplicate Data Cleaning and Cluster Generation for Documents
  • Authors: Ajay Kumar; Davesh Singh Som; Ramander Singh
  • Journal: International Journal of Computer Science and Information Technologies
  • Electronic ISSN: 0975-9646
  • Year of publication: 2016
  • Volume: 7
  • Issue: 5
  • Pages: 2277-2280
  • Publisher: TechScience Publications
  • Abstract: This paper proposes a new and efficient methodology for classifying research documents (PDF documents), cleaning duplicate data, and generating clusters for the documents. Organizing documents into different clusters by topic makes searching for them easier and more efficient. This technique can be used by search engines to provide results relevant to the user's query, and by online journal domains that maintain large sets of documents. The paper also suggests a cluster generation and word matching technique that reduces the time needed to find the appropriate cluster for a document. Properly clustered documents can then be used by a multi-document summarization system, which produces a summary for documents related to each other.
  • Keywords: Cluster; clustering; word matching; classification of PDF documents.