
Basic article information

  • Title: LeadMine: a grammar and dictionary driven approach to entity recognition
  • Authors: Daniel M Lowe ; Roger A Sayle
  • Journal: Journal of Cheminformatics
  • Print ISSN: 1758-2946
  • Electronic ISSN: 1758-2946
  • Publication year: 2015
  • Volume: 7
  • Issue: 1
  • Pages: S5
  • DOI: 10.1186/1758-2946-7-S1-S5
  • Language: English
  • Publisher: BioMed Central
  • Abstract: Chemical entity recognition has traditionally been performed by machine learning approaches. Here we describe an approach using grammars and dictionaries. This approach has the advantage that the entities found can be directly related to a given grammar or dictionary, which allows the type of an entity to be known and, if an entity is misannotated, indicates which resource should be corrected. As recognition is driven by what is expected, if spelling errors occur, they can be corrected. Correcting such errors is highly useful when attempting to look up an entity in a database or, in the case of chemical names, converting them to structures. Our system uses a mixture of expertly curated grammars and dictionaries, as well as dictionaries automatically derived from public resources. We show that the heuristics developed to filter our dictionary of trivial chemical names (from PubChem) yield a better performing dictionary than the previously published Jochem dictionary. Our final system performs post-processing steps to modify the boundaries of entities and to detect abbreviations. These steps are shown to significantly improve performance (2.6% and 4.0% F1-score, respectively). Our complete system, with incremental post-BioCreative workshop improvements, achieves 89.9% precision and 85.4% recall (87.6% F1-score) on the CHEMDNER test set. Grammar and dictionary approaches can produce results at least as good as the current state of the art in machine learning approaches. While machine learning approaches are commonly thought of as
  • Keywords: LeadMine ; grammars ; dictionaries ; chemical entity recognition ; CHEMDNER ; BioCreative IV