
Basic article information

  • Title: InChIKey collision resistance: an experimental testing
  • Authors: Igor Pletnev ; Andrey Erin ; Alan McNaught
  • Journal: Journal of Cheminformatics
  • Print ISSN: 1758-2946
  • Electronic ISSN: 1758-2946
  • Publication year: 2012
  • Volume: 4
  • Issue: 1
  • Pages: 39
  • DOI: 10.1186/1758-2946-4-39
  • Language: English
  • Publisher: BioMed Central
  • Abstract: InChIKey is a 27-character compacted (hashed) version of InChI which is intended for Internet and database searching/indexing and is based on an SHA-256 hash of the InChI character string. The first block of InChIKey encodes molecular skeleton while the second block represents various kinds of isomerism (stereo, tautomeric, etc.). InChIKey is designed to be a nearly unique substitute for the parent InChI. However, a single InChIKey may occasionally map to two or more InChI strings (collision). The appearance of collision itself does not compromise the signature as collision-free hashing is impossible; the only viable approach is to set and keep a reasonable level of collision resistance which is sufficient for typical applications. We tested, in computational experiments, how well the real-life InChIKey collision resistance corresponds to the theoretical estimates expected by design. For this purpose, we analyzed the statistical characteristics of InChIKey for datasets of variable size in comparison to the theoretical statistical frequencies. For the relatively short second block, an exhaustive direct testing was performed. We computed and compared to theory the numbers of collisions for the stereoisomers of Spongistatin I (using the whole set of 67,108,864 isomers and its subsets). For the longer first block, we generated, using custom-made software, InChIKeys for more than 3 × 10^10 chemical structures. The statistical behavior of this block was tested by comparison of experimental and theoretical frequencies for the various four-letter sequences which may appear in the first block body. From the results of our computational experiments we conclude that the observed characteristics of InChIKey collision resistance are in good agreement with theoretical expectations.
  • Keywords: Hash Function ; Letter Sequence ; Molecular Skeleton ; Uniform Random Distribution ; Collision Resistance
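
The "theoretical estimates" mentioned in the abstract can be illustrated with the standard birthday-style count of colliding pairs. The sketch below is not the authors' software. It assumes an ideal uniform hash and a 37-bit hash portion for the second block; that bit length comes from the InChI technical documentation, not from this abstract, and the function name is hypothetical.

```python
from fractions import Fraction


def expected_colliding_pairs(n_items: int, hash_bits: int) -> Fraction:
    """Expected number of colliding pairs when n_items distinct inputs
    are hashed uniformly at random into 2**hash_bits possible values.

    Each of the n*(n-1)/2 pairs collides with probability 1 / 2**hash_bits.
    """
    pairs = n_items * (n_items - 1) // 2
    return Fraction(pairs, 2 ** hash_bits)


# Assumption: the hashed part of the second block carries 37 bits.
SECOND_BLOCK_BITS = 37

# Number of stereoisomers quoted in the abstract (equal to 2**26).
N_ISOMERS = 67_108_864

if __name__ == "__main__":
    estimate = expected_colliding_pairs(N_ISOMERS, SECOND_BLOCK_BITS)
    print(f"Expected colliding pairs: {float(estimate):.1f}")
```

Under these assumptions the estimate is about 16,384 colliding pairs for the full isomer set. This is only the idealized prediction; the experimentally observed counts are reported in the article itself.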