Article information

  • Title: Identification of level of resemblance between web based documents
  • Author: Surbhi Kakar
  • Journal: International Journal of Engineering and Computer Science
  • Print ISSN: 2319-7242
  • Year of publication: 2013
  • Volume: 2
  • Issue: 11
  • Pages: 3097-3100
  • Publisher: IJECS
  • Abstract: One of the biggest challenges on the web today is dealing with the “Big data” problem. Finding documents that are near duplicates of each other is another challenge, which is in turn brought about by Big data. In this paper the author focuses on finding near-duplicate documents using a technique called shingling. The paper also presents the different types of shingling that can be used. Further, a measure called the Jaccard coefficient is discussed, which can be used to judge the degree of similarity between the documents.
  • Keywords: Big data; shingling; Jaccard Coefficient
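
The two ideas named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `word_shingles` and `jaccard`, the choice of word-level shingles, and the shingle size `k=3` are all assumptions made for illustration. A shingle here is a run of k consecutive words, and the Jaccard coefficient of two shingle sets A and B is |A ∩ B| / |A ∪ B|.

```python
def word_shingles(text, k=3):
    """Return the set of k-word shingles of `text` (hypothetical helper).

    Assumes lowercase, whitespace-delimited tokens; the paper may
    tokenize differently or use character-level shingles instead.
    """
    tokens = text.lower().split()
    if len(tokens) < k:
        # Text shorter than k words: treat the whole text as one shingle.
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two sets."""
    if not a and not b:
        # Convention assumed here: two empty sets count as identical.
        return 1.0
    return len(a & b) / len(a | b)


doc_a = "the quick brown fox jumps"
doc_b = "the quick brown fox sleeps"
similarity = jaccard(word_shingles(doc_a), word_shingles(doc_b))
print(similarity)  # 2 shared shingles out of 4 distinct ones: 0.5
```

A coefficient near 1 suggests near-duplicate documents and a coefficient near 0 suggests unrelated ones; the threshold at which two documents count as near duplicates is application-specific and is not stated in this record.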