
Article information

  • Title: A new semi-automated workflow for chemical data retrieval and quality checking for modeling applications
  • Authors: Domenico Gadaleta; Anna Lombardo; Cosimo Toma
  • Journal: Journal of Cheminformatics
  • Print ISSN: 1758-2946
  • Electronic ISSN: 1758-2946
  • Publication year: 2018
  • Volume: 10
  • Issue: 1
  • Page: 60
  • DOI: 10.1186/s13321-018-0315-6
  • Language: English
  • Publisher: BioMed Central
  • Abstract: The quality of the data used for QSAR model derivation is extremely important, as it strongly affects the final robustness and predictive power of the model. Ambiguous or wrong structures need to be carefully checked, because they lead to errors in the calculation of descriptors and hence to meaningless results. The increasing amount of data, however, has often made it hard to check very large databases manually. In light of this, we designed and implemented a semi-automated workflow integrating structural data retrieval from several web-based databases, automated comparison of these data, chemical structure cleaning, and selection and standardization of data into a consistent, ready-to-use format that can be employed for modeling. The workflow integrates best practices for data curation that have been suggested in the recent literature. It has been implemented with the freely available KNIME software and is itself freely available to the cheminformatics community for improvement and application to a broad range of chemical datasets.
  • Keywords: QSAR; Data curation; Data cleaning; Semi-automated; Workflow