
Article information

  • Title: DECIMER: towards deep learning for chemical image recognition
  • Authors: Kohulan Rajan; Achim Zielesny; Christoph Steinbeck
  • Journal: Journal of Cheminformatics
  • Print ISSN: 1758-2946
  • Electronic ISSN: 1758-2946
  • Publication year: 2020
  • Volume: 12
  • Issue: 1
  • Pages: 1-9
  • DOI: 10.1186/s13321-020-00469-w
  • Publisher: BioMed Central
  • Abstract: The automatic recognition of chemical structure diagrams from the literature is an indispensable component of workflows to re-discover information about chemicals and to make it available in open-access databases. Here we report preliminary findings in our development of Deep lEarning for Chemical ImagE Recognition (DECIMER), a deep learning method based on existing show-and-tell deep neural networks, which makes very few assumptions about the structure of the underlying problem. It translates a bitmap image of a molecule, as found in publications, into a SMILES. The training state reported here does not yet rival the performance of existing traditional approaches, but we present evidence that our method will reach a comparable detection power with sufficient training time. Training success of DECIMER depends on the input data representation: DeepSMILES are superior over SMILES and we have a preliminary indication that the recently reported SELFIES outperform DeepSMILES. An extrapolation of our results towards larger training data sizes suggests that we might be able to achieve near-accurate prediction with 50 to 100 million training structures. This work is entirely based on open-source software and open data and is available to the general public for any purpose.
  • Keywords: Optical chemical entity recognition; Chemical structure; Deep learning; Deep neural networks; Autoencoder/decoder