Article Information

  • Title: Creating and Weighting Hunspell Dictionaries as Finite-State Automata
  • Authors: Tommi Pirinen, Krister Lindén
  • Journal: Investigationes Linguisticae (Online)
  • Print ISSN: 1426-188X
  • Electronic ISSN: 1733-1757
  • Year of publication: 2010
  • Volume: XXI
  • Publisher: Adam Mickiewicz University
  • Abstract:

    There are numerous formats for writing spell-checkers for open-source systems, and there are many lexical descriptions for natural languages written in these formats. In this paper, we demonstrate a method for converting Hunspell and related spell-checking lexicons into finite-state automata. We also present a simple way to apply unigram corpus training in order to improve the spell-checking suggestion mechanism using weighted finite-state technology. The finite-state based spell-checking system seems to be an order of magnitude faster than the Hunspell approach. What we propose is a generic and efficient language-independent framework of weighted finite-state automata for spell-checking in typical open-source software, e.g. Mozilla Firefox, OpenOffice and the GNOME desktop.

  • Keywords: Dictionaries; Natural Languages; GNOME Desktops; Spell-checking