Article information

Title: COVER: conformational oversampling as data augmentation for molecules
Authors: Jennifer Hemmerich; Ece Asilar; Gerhard F. Ecker et al.
Journal: Journal of Cheminformatics
Print ISSN: 1758-2946
Electronic ISSN: 1758-2946
Year of publication: 2020
Volume: 12
Issue: 1
Pages: 1-12
DOI: 10.1186/s13321-020-00420-z
Publisher: BioMed Central
Abstract: Training neural networks with small and imbalanced datasets often leads to overfitting and disregard of the minority class. For predictive toxicology, however, models with a good balance between sensitivity and specificity are needed. In this paper we introduce conformational oversampling as a means to balance and oversample datasets for prediction of toxicity. Conformational oversampling enhances a dataset by generation of multiple conformations of a molecule. These conformations can be used to balance, as well as oversample a dataset, thereby increasing the dataset size without the need of artificial samples. We show that conformational oversampling facilitates training of neural networks and provides state-of-the-art results on the Tox21 dataset.
Keywords: Deep learning; Toxicity; Imbalanced learning; Upsampling
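The abstract states that generating several conformations per molecule can both balance and enlarge a dataset. The bookkeeping side of that idea can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that every molecule in a given class receives the same number of conformations, and that the majority class receives `oversample_factor` conformations per molecule. The helper names `conformers_per_molecule` and `expand` are hypothetical. The actual 3D conformer generation, which could be done with a toolkit such as RDKit's `EmbedMultipleConfs`, is not shown.

```python
import math
from collections import Counter


def conformers_per_molecule(labels, oversample_factor=1):
    """Hypothetical helper: conformations to generate per molecule, per class.

    Assumption: the majority class gets `oversample_factor` conformations per
    molecule, and every other class gets enough (rounded up) to at least
    match the majority class's total number of samples.
    """
    counts = Counter(labels)
    target = max(counts.values()) * oversample_factor
    return {label: math.ceil(target / n) for label, n in counts.items()}


def expand(samples, n_confs):
    """Replace each (molecule_id, label) pair with one entry per conformation index."""
    return [
        (mol_id, conf_idx, label)
        for mol_id, label in samples
        for conf_idx in range(n_confs[label])
    ]


if __name__ == "__main__":
    # Toy imbalanced dataset: 900 inactive (0) and 100 active (1) molecules.
    samples = [(f"mol{i}", 0) for i in range(900)] + [(f"mol{i}", 1) for i in range(900, 1000)]
    n_confs = conformers_per_molecule([label for _, label in samples], oversample_factor=2)
    augmented = expand(samples, n_confs)
    print(n_confs)                                    # {0: 2, 1: 18}
    print(Counter(label for _, _, label in augmented))  # Counter({0: 1800, 1: 1800})
```

In this toy case each inactive molecule contributes 2 conformations and each active molecule contributes 18, so both classes end up with 1800 samples. Because every sample is a real conformation of an existing molecule, no synthetic molecules are created.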