Article Information

  • Title: Splitting Katakana Noun Compounds by Paraphrasing and Back-transliteration
  • Authors: Nobuhiro Kaji; Masaru Kitsuregawa
  • Journal: Information and Media Technologies
  • Electronic ISSN: 1881-0896
  • Publication year: 2014
  • Volume: 9
  • Issue: 4
  • Pages: 790-813
  • DOI: 10.11185/imt.9.790
  • Publisher: Information and Media Technologies Editorial Board
  • Abstract: Word boundaries within noun compounds in a number of languages, including Japanese, are not marked by white spaces. Thus, it is beneficial for various NLP applications to split such noun compounds. In the case of Japanese, noun compounds composed of katakana words are particularly difficult to split because katakana words are highly productive and are often out of vocabulary. Therefore, we propose using paraphrasing and back-transliteration of katakana noun compounds to split them. Experiments in which paraphrases and back-transliterations from unlabeled textual data were extracted and used to construct splitting models improved splitting accuracy with statistical significance.
  • Keywords: paraphrasing; back-transliteration; katakana words; noun compound splitting; word segmentation