Basic article information

Title: OGER++: hybrid multi-type entity recognition
Authors: Lenz Furrer; Anna Jancso; et al.
Journal: Journal of Cheminformatics
Print ISSN: 1758-2946
Electronic ISSN: 1758-2946
Publication year: 2019
Volume: 11
Issue: 1
Page: 7
DOI: 10.1186/s13321-018-0326-3
Language: English
Publisher: BioMed Central
Abstract: We present a text-mining tool for recognizing biomedical entities in scientific literature. OGER++ is a hybrid system for named entity recognition and concept recognition (linking), which combines a dictionary-based annotator with a corpus-based disambiguation component. The annotator uses an efficient look-up strategy combined with a normalization method for matching spelling variants. The disambiguation classifier is implemented as a feed-forward neural network which acts as a postfilter to the previous step. We evaluated the system in terms of processing speed and annotation quality. In the speed benchmarks, the OGER++ web service processes 9.7 abstracts or 0.9 full-text documents per second. On the CRAFT corpus, we achieved 71.4% and 56.7% F1 for named entity recognition and concept recognition, respectively. Combining knowledge-based and data-driven components allows creating a system with competitive performance in biomedical text mining.
Keywords: Named entity recognition; Concept recognition; Natural language processing; Machine learning