Article Information

Title: Creating and Weighting Hunspell Dictionaries as Finite-State Automata
Authors: Tommi Pirinen, Krister Lindén
Journal: Investigationes Linguisticae (Online)
Print ISSN: 1426-188X
Online ISSN: 1733-1757
Year: 2010
Volume: XXI
Publisher: Adam Mickiewicz University
Abstract:
There are numerous formats for writing spell-checkers for open-source systems and there are many lexical descriptions for natural languages written in these formats. In this paper, we demonstrate a method for converting Hunspell and related spell-checking lexicons into finite-state automata. We also present a simple way to apply unigram corpus training in order to improve the spell-checking suggestion mechanism using weighted finite-state technology. The performance of the finite-state based spell-checking system compared with the Hunspell approach seems to be an order of magnitude faster. What we propose is a generic and efficient language-independent framework of weighted finite-state automata for spell-checking in typical open-source software, e.g. Mozilla Firefox, OpenOffice and the GNOME desktop.
Keywords: Dictionaries; Natural Languages; GNOME Desktops; Spell-checking