Article Information

Title: A Novel Approach to Detect the Near Duplicate by Refining Provenance Matrix
Authors: Tanvi Gupta; Asst. Prof. Latha Banda
Journal: International Journal of Computer Technology and Applications
Electronic ISSN: 2229-6093
Publication year: 2012
Volume: 3
Issue: 1
Pages: 231-234
Publisher: Technopark Publications
Abstract: In this paper, the provenance matrix is refined to detect near-duplicates more accurately and efficiently by adding two more factors, ‘How’ and ‘Why’, since the performance of web search depends on results that are free of duplicate or redundant information. More redundancy consumes more time and storage, which is why search engines try to avoid indexing duplicate documents. The provenance model combines content-based and trust-based factors to classify documents as near-duplicates or originals, since nowadays many near-duplicates come from distrusted websites.
Keywords: near-duplicates; provenance; distrusted; provenance matrix; trustworthiness