Article Information

Title: Near Duplicate Document Detection using Document Image
Authors: Gaudence Uwamahoro; Zhang Zuping; Ambele Robert Mtafya et al.
Journal: International Journal of Multimedia and Ubiquitous Engineering
Print ISSN: 1975-0080
Publication year: 2016
Volume: 11
Issue: 7
Pages: 159-168
DOI: 10.14257/ijmue.2016.11.7.17
Publisher: SERSC
Abstract: The growth of Internet access has made it possible to store huge collections of documents. Identifying near-duplicate documents among them is a major problem in information retrieval, because their high dimensionality makes the search costly in time. We propose an algorithm based on the tf-idf method that uses the importance and discriminative power of a term within a single document to speed up the search for similar documents in a collection. Using only 26.6% of the original document size, our method performs well in efficiency and memory usage compared with using the original document, which reduces the time needed to search for similar documents in a collection.
Keywords: near duplicate document; tf-idf; document image; document relevance; extraction