Article information

  • Title: A Comparison of Four Character-Level String-to-String Translation Models for (OCR) Spelling Error Correction
  • Authors: Steffen Eger; Tim vor der Brück; Alexander Mehler
  • Journal: The Prague Bulletin of Mathematical Linguistics
  • Print ISSN: 0032-6585
  • Electronic ISSN: 1804-0462
  • Publication year: 2016
  • Volume: 105
  • Issue: 1
  • Pages: 77-99
  • DOI: 10.1515/pralin-2016-0004
  • Language: English
  • Publisher: Walter de Gruyter GmbH
  • Abstract: We consider the isolated spelling error correction problem as a specific subproblem of the more general string-to-string translation problem. In this context, we investigate four general string-to-string transformation models that have been suggested in recent years and apply them within the spelling error correction paradigm. In particular, we investigate how a simple ‘k-best decoding plus dictionary lookup’ strategy performs in this context and find that such an approach can significantly outdo baselines such as edit distance, weighted edit distance, and the noisy channel Brill and Moore model to spelling error correction. We also consider elementary combination techniques for our models such as language model weighted majority voting and center string combination. Finally, we consider real-world OCR post-correction for a dataset sampled from medieval Latin texts.
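
The ‘k-best decoding plus dictionary lookup’ strategy and the edit distance baseline named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `kbest_with_lookup` and `edit_distance_baseline` are hypothetical, the k-best list is assumed to come from some string-to-string model and to be sorted best-first, and falling back to the top candidate when nothing matches the dictionary is an assumption of this sketch.

```python
from typing import Collection, Iterable, Sequence


def levenshtein(a: str, b: str) -> int:
    """Plain (unweighted) edit distance via the standard dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def edit_distance_baseline(word: str, dictionary: Iterable[str]) -> str:
    """Baseline: return the dictionary entry closest to `word`.

    Ties are broken alphabetically so the result is deterministic.
    """
    return min(dictionary, key=lambda entry: (levenshtein(word, entry), entry))


def kbest_with_lookup(kbest: Sequence[str], dictionary: Collection[str]) -> str:
    """Return the highest-ranked candidate that occurs in the dictionary.

    `kbest` is assumed to be sorted best-first by a translation model.
    If no candidate is in the dictionary, the top candidate is returned
    (an assumption of this sketch, not a claim about the paper).
    """
    if not kbest:
        raise ValueError("kbest must contain at least one candidate")
    for candidate in kbest:
        if candidate in dictionary:
            return candidate
    return kbest[0]
```

With a toy dictionary `{"receive", "relieve"}` and the misspelling `recieve`, the unweighted baseline picks `relieve` (one substitution) over `receive` (two substitutions), whereas a model that ranks `receive` ahead of `relieve` in its k-best list yields `receive` after the lookup step.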