Article information

  • Title: Near Duplicate Document Detection using Document Image
  • Authors: Gaudence Uwamahoro; Zhang Zuping; Ambele Robert Mtafya
  • Journal: International Journal of Multimedia and Ubiquitous Engineering
  • Print ISSN: 1975-0080
  • Year of publication: 2016
  • Volume: 11
  • Issue: 7
  • Pages: 159-168
  • DOI: 10.14257/ijmue.2016.11.7.17
  • Publisher: SERSC
  • Abstract: With the growth of the Internet, huge numbers of documents are now stored and accessible. Identifying near-duplicate documents among them is a major problem in information retrieval, because their high dimensionality makes the search costly in time. We propose an algorithm based on the tf-idf method that uses the importance and discriminative power of a term within a single document to speed up the search for similar documents in a collection. Using only 26.6% of the original document size, our method performs well in efficiency and memory usage compared with working on the original documents, which reduces the time needed to search a collection for similar documents.
  • Keywords: near duplicate document; tf-idf; document image; document relevance; extraction
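
The abstract describes the approach only in outline, so the following is a minimal sketch of the general idea rather than the authors' algorithm: weight terms by tf-idf, keep only the highest-weighted terms of each document as a reduced representation, and compare those reduced representations. The function names, the whitespace tokenizer, the smoothed idf formula, the cosine similarity measure, and the reading of 26.6% as a fraction of distinct terms are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per document.

    Assumption: a smoothed idf, log((1 + N) / (1 + df)) + 1, so that terms
    shared by every document still receive a positive weight.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def top_terms(vector, keep_ratio=0.266):
    """Keep only the highest-weighted terms of one document.

    Assumption: keep_ratio is applied to the number of distinct terms;
    the abstract states 26.6% of the original document size.
    """
    k = max(1, math.ceil(keep_ratio * len(vector)))
    # Sort by descending weight, then by term, so ties break deterministically.
    ranked = sorted(vector.items(), key=lambda item: (-item[1], item[0]))
    return dict(ranked[:k])


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical three-document collection: the first two are near duplicates.
collection = [
    "near duplicate detection compares reduced document representations",
    "near duplicate detection compares reduced document representations quickly",
    "weather forecasts predict heavy rain across the northern coast",
]
reduced = [top_terms(v) for v in tfidf_vectors(collection)]
print("doc 0 vs doc 1:", cosine(reduced[0], reduced[1]))
print("doc 0 vs doc 2:", cosine(reduced[0], reduced[2]))
```

Comparing only the retained terms is what would reduce memory use and comparison time in a scheme like this; how well the reduced representations preserve similarity depends on the term-selection rule, which the abstract does not specify.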