Journal: TELKOMNIKA (Telecommunication Computing Electronics and Control)
Print ISSN: 2302-9293
Publication year: 2021
Volume: 19
Issue: 6
DOI: 10.12928/telkomnika.v19i6.21667
Language: English
Publisher: Universitas Ahmad Dahlan
Abstract: Drug named entity recognition (DNER) is a prerequisite for other medical relation extraction systems. Existing approaches to automatically recognizing drug names include rule-based, machine learning (ML), and deep learning (DL) techniques. DL techniques have been shown to be the state of the art because they do not depend on handcrafted features. Previous DL methods based on word-embedding input representations use the same vector for an entity irrespective of its context in different sentences and hence cannot capture the context properly. Identifying n-gram entities is also a challenge. In this paper, a novel architecture is proposed that includes a sentence embedding layer, which operates on the entire sentence to capture the context of an entity efficiently. A hybrid model comprising a stacked bidirectional long short-term memory (Bi-LSTM) network with a residual LSTM has been designed to overcome these limitations and improve the performance of the model. We compared the performance of the proposed approach with that of other DNER models; the percentage improvements of the proposed model over LSTM-conditional random field (CRF), LIU, and WBI with respect to micro-average F1-score are 11.17, 8.8, and 17.64, respectively. The proposed model has also shown promising results in recognizing 2- and 3-gram entities.
Keywords: drug named entity recognition; natural language processing; residual LSTM; sentence level embedding; stacked Bi-LSTM