
Article information

  • Title: Development of the combined method of identification of near duplicates in electronic scientific works
  • Authors: Petro Lizunov; Andrii Biloshchytskyi; Alexander Kuchansky
  • Journal: Eastern-European Journal of Enterprise Technologies
  • Print ISSN: 1729-3774
  • Electronic ISSN: 1729-4061
  • Publication year: 2021
  • Volume: 4
  • Issue: 4
  • Pages: 57-63
  • DOI: 10.15587/1729-4061.2021.238318
  • Language: English
  • Publisher: PC Technology Center
  • Abstract: Methods are described for identifying near-duplicates in electronic scientific papers that contain content of a single type, for example text data, mathematical formulas, or numerical data. For text data, the method of locality-sensitive hashing is formalized, with the Hamming distance computed between the elements of the indices of electronic scientific papers. If the Hamming distance exceeds a fixed numerical threshold, the scientific paper contains a near-duplicate. For numerical data, sub-sequences are formed for each scientific work, and the proximity between papers is determined as the Euclidean distance between the vectors made up of the numbers in these sub-sequences. To compare mathematical formulas, a method that compares a sample of formulas is used, and the names of variables are compared. For near-duplicates in graphic information, two approaches are distinguished: finding key points in the image, and applying locality-sensitive hashing to individual pixels of the image. Since scientific papers often include objects such as schemes and diagrams, their captions are examined separately using the methods for comparing text information. A combined method for identifying near-duplicates in electronic scientific papers is proposed, which brings together the methods for the individual data types. To implement the combined method, an information-analytical system was devised that processes scientific materials according to content type. This makes it possible to identify near-duplicates with high quality and to detect as wide a range as possible of abuses and plagiarism in electronic scientific papers: scientific articles, dissertations, monographs, conference materials, etc.
  • Keywords: near-duplicate; electronic scientific paper; antiplagiarism system; locally sensitive hashing
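
The two distance measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not say which locality-sensitive hashing scheme the paper uses, so a simhash-style fingerprint is assumed here as one common variant, and the function names, the 64-bit fingerprint width, and the word-level tokenization are all hypothetical choices. How a distance is compared against a threshold is left to the caller.

```python
import hashlib
import math
import re

FINGERPRINT_BITS = 64  # assumed width; the abstract does not specify one


def simhash(text: str, bits: int = FINGERPRINT_BITS) -> int:
    """Simhash-style fingerprint of a text (assumed LSH variant)."""
    weights = [0] * bits
    for token in re.findall(r"\w+", text.lower()):
        # Deterministic per-token hash: first bits // 8 bytes of the MD5 digest.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            weights[i] += 1 if (token_hash >> i) & 1 else -1
    # A fingerprint bit is set where the tokens voted 1 more often than 0.
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def euclidean_distance(u, v) -> float:
    """Euclidean distance between two equal-length numeric vectors."""
    return math.dist(u, v)


if __name__ == "__main__":
    first = simhash("near duplicates in electronic scientific papers")
    second = simhash("near duplicates in electronic scientific works")
    print(hamming_distance(first, second))
    print(euclidean_distance([1.0, 2.0], [4.0, 6.0]))
```

Text fingerprints are compared with `hamming_distance`, while vectors built from numerical sub-sequences are compared with `euclidean_distance`; the sample strings and vectors in the usage section are made up for illustration.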