Basic Article Information

  • Title: A Two-Stage Joint Model for Domain-Specific Entity Detection and Linking Leveraging an Unlabeled Corpus
  • Author: Hongzhi Zhang
  • Journal: Information
  • Electronic ISSN: 2078-2489
  • Year of publication: 2017
  • Volume: 8
  • Issue: 2
  • Page: 59
  • DOI: 10.3390/info8020059
  • Language: English
  • Publisher: MDPI Publishing
  • Abstract: The intensive construction of domain-specific knowledge bases (DSKB) has created an urgent demand for research on domain-specific entity detection and linking (DSEDL). Joint models are usually adopted for DSEDL tasks, but they suffer from data imbalance and high computational complexity. Moreover, traditional feature representation methods are insufficient for domain-specific tasks because of problems such as a lack of labeled data and link sparseness in DSKBs. In this paper, a two-stage joint (TSJ) model is proposed to solve the data imbalance problem by processing entity mentions differently according to their degree of ambiguity. In addition, three novel methods are put forward to generate effective features by incorporating an unlabeled corpus. One crucial feature for entity detection is the mention type, extracted by a long short-term memory (LSTM) model trained on automatically annotated data. The other two types of features mainly concern entity linking: the inner-document topical coherence, which is measured from entity co-occurrence relationships in the corpus, and the cross-document entity coherence, which is evaluated using similar documents. An overall F1 value of 74.26% is obtained on a dataset of real-world movie comments, demonstrating the effectiveness of the proposed approach and indicating its potential for use in real-world domain-specific applications.
  • Keywords: entity detection; entity linking; domain-specific knowledge base; LSTM; topical coherence