Article Information

  • Title: A Novel Approach to Detect the Near Duplicate by Refining Provenance Matrix
  • Authors: Tanvi Gupta; Asst. Prof. Latha Banda
  • Journal: International Journal of Computer Technology and Applications
  • Electronic ISSN: 2229-6093
  • Year of publication: 2012
  • Volume: 3
  • Issue: 1
  • Pages: 231-234
  • Publisher: Technopark Publications
  • Abstract: In this paper, the provenance matrix is refined to improve the accuracy and efficiency of near-duplicate detection by adding two more factors, ‘How’ and ‘Why’, since the performance of web search depends on search results that are free of duplicate or redundant information. Greater redundancy consumes more time and more storage, which is why search engines try to avoid indexing duplicate documents. The provenance model combines content-based and trust-based factors to classify documents as near-duplicates or originals, because nowadays many near-duplicates come from distrusted websites.
  • Keywords: near-duplicates; provenance; distrusted; provenance matrix; trustworthiness