Article Information

  • Title: Text Segmentation by Language Using Minimum Description Length
  • Authors: Hiroshi Yamaguchi; Kumiko Tanaka-Ishii
  • Venue: Conference of the European Chapter of the Association for Computational Linguistics (EACL)
  • Publication year: 2012
  • Volume: 2012
  • Publisher: ACL Anthology
  • Abstract: The problem addressed in this paper is to segment a given multilingual document into segments for each language and then identify the language of each segment. The problem was motivated by an attempt to collect a large amount of linguistic data for non-major languages from the web. The problem is formulated in terms of obtaining the minimum description length of a text, and the proposed solution finds the segments and their languages through dynamic programming. Empirical results demonstrating the potential of this approach are presented for experiments using texts taken from the Universal Declaration of Human Rights and Wikipedia, covering more than 200 languages.
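
The abstract's idea of choosing segments and languages by minimizing description length with dynamic programming can be illustrated with a small sketch. This is not the authors' implementation. It assumes a smoothed character unigram model per language and a fixed per-segment `penalty` (in bits) standing in for the cost of encoding a boundary and a language label. The names `train_unigram`, `segment`, and `penalty` are hypothetical and chosen only for this illustration.

```python
import math
from collections import Counter

def train_unigram(sample, alphabet):
    """Add-one smoothed character unigram model: char -> code length in bits.
    (A simplifying assumption; the paper's language models may differ.)"""
    counts = Counter(sample)
    total = len(sample) + len(alphabet)
    return {c: -math.log2((counts[c] + 1) / total) for c in alphabet}

def segment(text, models, penalty):
    """Return [(start, end, language)] minimizing the total code length.

    models:  language -> {char: bits}, covering every character in text
    penalty: assumed fixed cost in bits charged once per segment
    """
    n = len(text)
    # prefix[lang][k] = bits needed to encode text[:k] under lang's model
    prefix = {}
    for lang, bits in models.items():
        acc = [0.0]
        for c in text:
            acc.append(acc[-1] + bits[c])
        prefix[lang] = acc

    # best[j] = minimum bits to encode text[:j]; back[j] = (segment start, language)
    best = [0.0] + [math.inf] * n
    back = [None] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            for lang, acc in prefix.items():
                cost = best[i] + penalty + acc[j] - acc[i]
                if cost < best[j]:
                    best[j] = cost
                    back[j] = (i, lang)

    # Walk the back-pointers from the end of the text to recover the segments.
    segments = []
    j = n
    while j > 0:
        i, lang = back[j]
        segments.append((i, j, lang))
        j = i
    return segments[::-1]

# Toy usage with two tiny training samples (illustrative data only).
samples = {
    "en": "the quick brown fox jumps over the lazy dog",
    "ja": "こんにちは、せかい。これはにほんごのぶんです",
}
text = "the lazy fox これはほんです"
alphabet = set(text).union(*samples.values())
models = {lang: train_unigram(s, alphabet) for lang, s in samples.items()}
result = segment(text, models, penalty=2.0)
```

The double loop over boundaries makes this sketch quadratic in the text length per language, which is fine for a toy example but would need a cheaper search for large collections. The penalty discourages spurious short segments: a new segment is opened only when the bits saved by switching models exceed the cost of declaring the switch.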