
Article Information

  • Title: Transformer-CNN: Swiss knife for QSAR modeling and interpretation
  • Authors: Pavel Karpov; Guillaume Godin; Igor V. Tetko
  • Journal: Journal of Cheminformatics
  • Print ISSN: 1758-2946
  • Electronic ISSN: 1758-2946
  • Year: 2020
  • Volume: 12
  • Issue: 1
  • Pages: 1-12
  • DOI: 10.1186/s13321-020-00423-w
  • Publisher: BioMed Central
  • Abstract: We present SMILES embeddings derived from the internal encoder state of a Transformer [1] model trained to canonize SMILES as a Seq2Seq problem. Using a CharNN [2] architecture on top of these embeddings yields higher-quality, interpretable QSAR/QSPR models on diverse benchmark datasets covering both regression and classification tasks. The proposed Transformer-CNN method uses SMILES augmentation for both training and inference, so each prognosis is an internal consensus over the augmented SMILES of a molecule. Because both augmentation and transfer learning are based on the embeddings, the method provides good results for small datasets. We discuss the reasons for this effectiveness and outline future directions for the method's development. The source code and the embeddings needed to train a QSAR model are available at https://github.com/bigchem/transformer-cnn. The repository also contains a standalone program for QSAR prognosis that calculates individual atom contributions, thus interpreting the model's result. The OCHEM [3] environment (https://ochem.eu) hosts an online implementation of the proposed method.
  • Keywords: Transformer model; Convolutional neural networks; Augmentation; QSAR; SMILES; Embeddings; Character-based models; Cheminformatics; Regression; Classification
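The abstract's idea that each prognosis is an internal consensus over augmented SMILES can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the Transformer-CNN repository: `consensus_predict` and `toy_model` are hypothetical names, the "model" is a stand-in that merely scores string length, and the three SMILES strings are hand-written equivalent representations of ethanol rather than the output of an augmentation routine.

```python
from statistics import mean, stdev
from typing import Callable, Sequence, Tuple


def consensus_predict(
    smiles_variants: Sequence[str],
    model: Callable[[str], float],
) -> Tuple[float, float]:
    """Average one model's predictions over equivalent SMILES of a molecule.

    Returns (mean, spread), where spread is the sample standard deviation
    of the per-variant predictions (0.0 when only one variant is given).
    """
    if not smiles_variants:
        raise ValueError("need at least one SMILES variant")
    preds = [model(s) for s in smiles_variants]
    spread = stdev(preds) if len(preds) > 1 else 0.0
    return mean(preds), spread


def toy_model(smiles: str) -> float:
    # Hypothetical stand-in for a trained QSAR model: it scores the raw
    # string length, so different spellings of one molecule disagree.
    return float(len(smiles))


# Three valid SMILES for ethanol, written by hand for illustration.
ethanol_variants = ["CCO", "OCC", "C(C)O"]
prediction, spread = consensus_predict(ethanol_variants, toy_model)
print(f"consensus={prediction:.3f} spread={spread:.3f}")
```

In practice the variants would come from a SMILES randomization step (for example, RDKit's `Chem.MolToSmiles(mol, canonical=False, doRandom=True)`), and treating the spread across variants as a reliability signal is an illustrative assumption here, not a claim about the published method.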