Article Information

  • Title: A Comparative Study Of Word Representation Methods With Conditional Random Fields And Maximum Entropy Markov For Bio-Named Entity Recognition
  • Authors: Maan Tareq Abd; Masnizah Mohd
  • Journal: Malaysian Journal of Computer Science
  • Print ISSN: 0127-9084
  • Year: 2018
  • Volume: 31
  • Issue: 5
  • Publisher: University of Malaya, Faculty of Computer Science and Information Technology
  • Abstract: Bio-Named Entity Recognition (BioNER) is the process of identifying and semantically classifying biomedical technical terms and named entities in the biomedical literature, which makes it a major task in biomedical knowledge acquisition. Natural Language Processing (NLP) plays an important role in BioNER. Recognizing biomedical entities such as proteins, genes, and chemicals is the first and most essential task in biomedical literature mining. Most recent BioNER methods rely on predefined traditional features that attempt to capture specific surface properties of entity types. However, these empirically predefined feature sets differ between entity types and are manually constructed and complicated, which makes them costly to develop. In this paper, we present a systematic comparative evaluation of three word representation methods, namely the traditional feature representation method, the continuous bag-of-words (CBOW) model, and a new prototypical representation method, each combined with two popular sequence-labeling approaches: Conditional Random Fields (CRFs) and Maximum Entropy Markov Models (MEMMs). We evaluated these models on two major BioNER tasks, using the JNLPBA and GENETAG corpora. Our examination of the prototypical word representation method found that Word2Vec can be used successfully for BioNER. Our results show that the new prototypical representation method improved the performance of both machine learning models across datasets, and that it outperformed both the traditional feature representation method and the CBOW model on both datasets. Finally, our experiments showed that the CRF classifier with the new prototypical representation method achieved the best results when 90% of the data was used for training, yielding overall F-measure values of 0.79 and 0.85 for the JNLPBA and GENETAG corpora, respectively.
    In comparison, the ME (MEMM) classifier yielded overall F-measure values of 0.76 and 0.78 for the JNLPBA and GENETAG corpora, respectively.
  • Keywords: biomedical named entity; prototypical representation; data representation methods; Word2Vec
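The abstract does not spell out how the prototypical representation is built. As a rough illustration only, the sketch below assumes one common formulation of prototype features: each entity type is associated with a few prototype words, and a token receives a sparse feature for every prototype whose Word2Vec-style embedding has cosine similarity above a threshold with the token's own embedding. The toy vectors, the prototype lists, the `threshold` value, and the function names `cosine` and `prototype_features` are hypothetical and are not the authors' settings.

```python
import math

# Illustrative sketch only: the prototype lists, the threshold and the toy
# vectors are assumptions, not the settings used in the paper.

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def prototype_features(word, embeddings, prototypes, threshold=0.5):
    """Return sparse 'type:proto=word' features for prototypes similar to `word`.

    embeddings: dict mapping a word to its vector (e.g. trained with Word2Vec)
    prototypes: dict mapping an entity type to a list of prototype words
    """
    vec = embeddings.get(word)
    if vec is None:
        return []  # out-of-vocabulary token: no prototype features
    features = []
    for entity_type, protos in prototypes.items():
        for proto in protos:
            proto_vec = embeddings.get(proto)
            if proto_vec is not None and cosine(vec, proto_vec) >= threshold:
                features.append(f"{entity_type}:proto={proto}")
    return features


# Hypothetical 3-dimensional embeddings and prototypes for two JNLPBA-style types.
embeddings = {
    "IL-2": [0.9, 0.1, 0.0],
    "interleukin": [0.8, 0.2, 0.1],
    "cells": [0.1, 0.9, 0.2],
    "lymphocytes": [0.2, 0.8, 0.1],
}
prototypes = {"protein": ["interleukin"], "cell_type": ["lymphocytes"]}

print(prototype_features("IL-2", embeddings, prototypes))   # ['protein:proto=interleukin']
print(prototype_features("cells", embeddings, prototypes))  # ['cell_type:proto=lymphocytes']
```

In a setup like the one compared in the paper, features of this kind would be added to each token's feature set before training a CRF or MEMM tagger. Unlike hand-built surface features, they come from the embeddings, so the same recipe can be reused across entity types.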
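The reported scores are F-measures, the harmonic mean of precision P and recall R, F = 2PR / (P + R). This quantity lies between 0 and 1, so a score of 0.79 corresponds to 79%. The sketch below computes entity-level precision, recall, and F-measure from BIO tag sequences. It assumes exact matching of span boundaries and entity type, and the helpers `extract_spans` and `entity_prf` are hypothetical. The paper's own evaluation procedure may differ.

```python
def extract_spans(tags):
    """Collect (entity_type, start, end) spans from a BIO tag sequence; end is exclusive."""
    spans = set()
    start = etype = None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes a final entity
        prefix, _, label = tag.partition("-")
        # An open entity ends at "O", at a new "B-", or at an "I-" of another type.
        if etype is not None and (prefix != "I" or label != etype):
            spans.add((etype, start, i))
            start = etype = None
        # "B-" opens an entity; a stray "I-" with nothing open is treated like "B-".
        if prefix in ("B", "I") and etype is None:
            start, etype = i, label
    return spans


def entity_prf(gold_tags, pred_tags):
    """Entity-level precision, recall and F-measure (assumption: exact span-and-type match)."""
    gold = extract_spans(gold_tags)
    pred = extract_spans(pred_tags)
    true_pos = len(gold & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


gold = ["B-protein", "I-protein", "O", "B-DNA"]
pred = ["B-protein", "I-protein", "O", "O"]
print(entity_prf(gold, pred))  # precision 1.0, recall 0.5, F-measure about 0.667
```

Under exact matching, a prediction with the wrong boundary counts as both a false positive and a false negative. This is one reason that BioNER F-measures on corpora with long multi-word names tend to stay well below 1.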