Article information

  • Title: LSTMVoter: chemical named entity recognition using a conglomerate of sequence labeling tools
  • Authors: Wahed Hemati ; Alexander Mehler
  • Journal: Journal of Cheminformatics
  • Print ISSN: 1758-2946
  • Electronic ISSN: 1758-2946
  • Publication year: 2019
  • Volume: 11
  • Issue: 1
  • Pages: 3
  • DOI: 10.1186/s13321-018-0327-2
  • Language: English
  • Publisher: BioMed Central
  • Abstract: Chemical and biomedical named entity recognition (NER) is an essential preprocessing task in natural language processing. The identification and extraction of named entities from scientific articles is also attracting increasing interest in many scientific disciplines. Locating chemical named entities in the literature is an essential step in chemical text mining pipelines for identifying chemical mentions, their properties, and relations as discussed in the literature. In this work, we describe an approach to the BioCreative V.5 challenge regarding the recognition and classification of chemical named entities. For this purpose, we transform the task of NER into a sequence labeling problem. We present a series of sequence labeling systems that we used, adapted and optimized in our experiments for solving this task. To this end, we experiment with hyperparameter optimization. Finally, we present LSTMVoter, a two-stage application of recurrent neural networks that integrates the optimized sequence labelers from our study into a single ensemble classifier. We introduce LSTMVoter, a bidirectional long short-term memory (LSTM) tagger that utilizes a conditional random field layer in conjunction with attention-based feature modeling. Our approach explores information about features that is modeled by means of an attention mechanism. LSTMVoter outperforms each extractor integrated by it in a series of experiments. On the BioCreative IV chemical compound and drug name recognition (CHEMDNER) corpus, LSTMVoter achieves an F1-score of 90.04%; on the BioCreative V.5 chemical entity mention in patents corpus, it achieves an F1-score of 89.01%. Data and code are available at https://github.com/texttechnologylab/LSTMVoter.
  • Keywords: BioCreative V.5 ; CEMP ; CHEMDNER ; BioNLP ; Named entity recognition ; Deep learning ; LSTM ; Attention mechanism