
Article information

  • Title: A document processing pipeline for annotating chemical entities in scientific documents
  • Authors: David Campos ; Sérgio Matos ; José L Oliveira
  • Journal: Journal of Cheminformatics
  • Print ISSN: 1758-2946
  • Electronic ISSN: 1758-2946
  • Year of publication: 2015
  • Volume: 7
  • Issue: 1
  • Pages: S7
  • DOI: 10.1186/1758-2946-7-S1-S7
  • Language: English
  • Publisher: BioMed Central
  • Abstract: The recognition of drugs and chemical entities in text is an important task within the field of biomedical information extraction, given the rapid growth in the amount of published texts (scientific papers, patents, patient records) and the relevance of these and other related concepts. If done effectively, this could allow exploiting such textual resources to automatically extract or infer relevant information, such as drug profiles, relations and similarities between drugs, or associations between drugs and potential drug targets. The objective of this work was to develop and validate a document processing and information extraction pipeline for the identification of chemical entity mentions in text. We used the BioCreative IV CHEMDNER task data to train and evaluate a machine learning-based entity recognition system. Using a combination of two conditional random field models, a selected set of features, and a post-processing stage, we achieved F-measure results of 87.48% in the chemical entity mention recognition task and 87.75% in the chemical document indexing task. We present a machine learning-based solution for automatic recognition of chemical and drug names in scientific documents. The proposed approach applies a rich feature set, including linguistic, orthographic, morphological, dictionary matching and local context features. Post-processing modules are also integrated, performing parentheses correction, abbreviation resolution and filtering of erroneous mentions using an exclusion list derived from the training data. The developed methods were implemented as a document annotation tool and web service, freely available at http://bioinformatics.ua.pt/becas-chemicals/.
  • Keywords: Chemicals ; Named Entity Recognition ; Conditional Random Fields
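
The abstract mentions a post-processing stage that handles parentheses and filters mentions against an exclusion list. The record does not describe how these modules are implemented, so the following is only a minimal sketch under stated assumptions: the function names `has_balanced_brackets` and `post_process` are hypothetical, unbalanced mentions are simply discarded rather than corrected, the exclusion match is a case-insensitive exact match, and abbreviation resolution is left out.

```python
from typing import Iterable, List


def has_balanced_brackets(mention: str) -> bool:
    """Return True if every bracket in the mention is opened and closed in order."""
    closing_to_opening = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in mention:
        if ch in "([{":
            stack.append(ch)
        elif ch in closing_to_opening:
            # A closing bracket must match the most recent unmatched opening one.
            if not stack or stack.pop() != closing_to_opening[ch]:
                return False
    return not stack


def post_process(mentions: Iterable[str], exclusion_list: Iterable[str]) -> List[str]:
    """Keep mentions that have balanced brackets and are not in the exclusion list.

    Assumption: dropping unbalanced mentions stands in for the "parentheses
    correction" step, whose actual behavior is not given in this record.
    """
    excluded = {term.lower() for term in exclusion_list}
    kept = []
    for mention in mentions:
        if not has_balanced_brackets(mention):
            continue
        if mention.lower() in excluded:
            continue
        kept.append(mention)
    return kept
```

For example, calling `post_process(["aspirin", "poly(ethylene", "Water", "Fe(II)"], ["water"])` keeps only `aspirin` and `Fe(II)`: the second mention has an unclosed parenthesis and the third matches the exclusion list.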