
Article Information

  • Title: Construction of a Domain Dictionary for Fundamental Vocabulary and its Application to Automatic Blog Categorization Using Dynamically Estimated Domains of Unknown Words
  • Authors: Chikara Hashimoto; Sadao Kurohashi
  • Journal: Information and Media Technologies
  • Electronic ISSN: 1881-0896
  • Publication year: 2014
  • Volume: 9
  • Issue: 4
  • Pages: 712-735
  • DOI: 10.11185/imt.9.712
  • Publisher: Information and Media Technologies Editorial Board
  • Abstract: The semantic relations between words are essential for natural language understanding. Toward deeper natural language understanding, we semi-automatically constructed a domain dictionary that represents the domain relations between fundamental Japanese words. Our method does not require a document collection. As a task-based evaluation of the domain dictionary, we categorized blogs by assigning a domain to each word in a blog article and categorizing the article under its most dominant domain. In doing so, we dynamically estimated the domains of unknown words (i.e., those not listed in the domain dictionary), and our blog categorization achieved an accuracy of 94.0% (564/600). Moreover, the domain estimation technique for unknown words achieved an accuracy of 76.6% (383/500).
  • Keywords: domain; lexicon; blog; text categorization; unknown words' domain
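
The categorization step described in the abstract (assign a domain to each word, then label the article with its most dominant domain) can be illustrated with a minimal sketch. This is not the authors' implementation: the dictionary entries, the domain names, and the function names `categorize` and `estimate_unknown_domain` are hypothetical, and because the abstract does not say how unknown-word domains are estimated, that step is left as a stub.

```python
from collections import Counter

# Hypothetical toy dictionary (word -> domain). The dictionary in the paper
# covers fundamental Japanese words; these entries are illustrative only.
DOMAIN_DICT = {
    "goal": "sports",
    "striker": "sports",
    "match": "sports",
    "recipe": "cooking",
    "oven": "cooking",
}


def estimate_unknown_domain(word):
    """Stub for dynamic domain estimation of unknown words.

    The abstract does not describe the technique, so this placeholder
    assigns no domain and returns None.
    """
    return None


def categorize(words):
    """Return the most frequent domain among the words, or None if none is found."""
    votes = Counter()
    for word in words:
        domain = DOMAIN_DICT.get(word)
        if domain is None:
            # Word is not in the dictionary: fall back to estimation.
            domain = estimate_unknown_domain(word)
        if domain is not None:
            votes[domain] += 1
    if not votes:
        return None
    # most_common(1) yields the (domain, count) pair with the highest count.
    return votes.most_common(1)[0][0]


article = ["striker", "goal", "oven", "match", "weather"]
print(categorize(article))  # prints "sports" (3 votes against 1 for "cooking")
```

In this sketch, "weather" is an unknown word and contributes no vote; a real estimator would return a domain for it and could change the outcome for articles dominated by unknown words.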