Article Information

  • Title: Language ID in the Context of Harvesting Language Data off the Web
  • Authors: Fei Xia; William Lewis; Hoifung Poon
  • Venue: Conference of the European Chapter of the Association for Computational Linguistics (EACL)
  • Publication year: 2009
  • Volume: 2009
  • Publisher: ACL Anthology
  • Abstract: As the arm of NLP technologies extends beyond a small core of languages, techniques for working with instances of language data across hundreds to thousands of languages may require revisiting and recalibrating the tried and true methods that are used. One of the NLP techniques that has been treated as “solved” is language identification (language ID) of written text. However, we argue that language ID is far from solved when one considers input spanning not dozens of languages, but rather hundreds to thousands, a number that one approaches when harvesting language data found on the Web. We formulate language ID as a coreference resolution problem, apply it to a Web harvesting task for a specific linguistic data type, and achieve much higher accuracy than long-accepted language ID approaches.