首页    期刊浏览 2025年02月23日 星期日
登录注册

文章基本信息

  • 标题:Fast Approximate Search in Large Dictionaries
  • 本地全文:下载
  • 作者:Stoyan Mihov ; Klaus U. Schulz
  • 期刊名称:Computational Linguistics
  • 印刷版ISSN:0891-2017
  • 电子版ISSN:1530-9312
  • 出版年度:2004
  • 卷号:30
  • 期号:4
  • 页码:451-477
  • DOI:10.1162/0891201042544938
  • 语种:English
  • 出版社:MIT Press
  • 摘要:The need to correct garbled strings arises in many areas of natural language processing. If a dictionary is available that covers all possible input tokens, a natural set of candidates for correcting an erroneous input P is the set of all words in the dictionary for which the Levenshtein distance to P does not exceed a given (small) bound k . In this article we describe methods for efficiently selecting such candidate sets. After introducing as a starting point a basic correction method based on the concept of a “universal Levenshtein automaton,” we show how two filtering methods known from the field of approximate text search can be used to improve the basic procedure in a significant way. The first method, which uses standard dictionaries plus dictionaries with reversed words, leads to very short correction times for most classes of input strings. Our evaluation results demonstrate that correction times for fixed-distance bounds depend on the expected number of correction candidates, which decreases for longer input words. Similarly the choice of an optimal filtering method depends on the length of the input words.
国家哲学社会科学文献中心版权所有