Article Information

  • Title: Querying and Serving N-gram Language Models with Python
  • Journal: The Python Papers
  • Print ISSN: 1834-3147
  • Year: 2009
  • Volume: 4
  • Issue: 2
  • Page: 5
  • Language: English
  • Publisher: The Python Papers
  • Abstract: Statistical n-gram language modeling is a very important technique in Natural Language Processing (NLP) and Computational Linguistics used to assess the fluency of an utterance in any given language. It is widely employed in several important NLP applications such as Machine Translation and Automatic Speech Recognition. However, the most commonly used toolkit (SRILM) to build such language models on a large scale is written entirely in C++, which presents a challenge to an NLP developer or researcher whose primary language of choice is Python. This article first provides a gentle introduction to statistical language modeling. It then describes how to build a native and efficient Python interface (using SWIG) to the SRILM toolkit such that language models can be queried and used directly in Python code. Finally, it also demonstrates an effective use case of this interface by showing how to leverage it to build a Python language model server. Such a server can prove to be extremely useful when the language model needs to be queried by multiple clients over a network: the language model needs to be loaded into memory only once by the server and can then satisfy multiple requests. This article includes only those listings of source code that are most salient. To conserve space, some are only presented in excerpted form. The complete set of full source code listings may be found in Volume 1 of The Python Papers Source Codes Journal.
  • Keywords: Computer Science; Natural Language Processing; Python Programming
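
The abstract's notion of using an n-gram model to assess the fluency of an utterance can be illustrated with a minimal sketch. This is not the article's code and does not use SRILM or its SWIG interface: the `BigramModel` class, its `logprob` method, and the choice of add-one smoothing are all assumptions made for illustration, written in pure Python over a toy corpus.

```python
import math
from collections import Counter


class BigramModel:
    """Toy bigram model with add-one smoothing (hypothetical; not SRILM)."""

    def __init__(self, sentences):
        self.history_counts = Counter()  # how often each word precedes another
        self.bigram_counts = Counter()   # how often each adjacent pair occurs
        self.vocab = {"</s>"}            # words the model can predict
        for sentence in sentences:
            tokens = ["<s>"] + sentence.split() + ["</s>"]
            self.vocab.update(tokens[1:])
            self.history_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens, tokens[1:]))

    def logprob(self, sentence):
        """Return the log10 probability of a whitespace-tokenized sentence."""
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        total = 0.0
        for prev, word in zip(tokens, tokens[1:]):
            # Add-one smoothing: unseen bigrams get a small nonzero probability.
            numerator = self.bigram_counts[(prev, word)] + 1
            denominator = self.history_counts[prev] + len(self.vocab)
            total += math.log10(numerator / denominator)
        return total


if __name__ == "__main__":
    model = BigramModel(["the cat sat on the mat", "the dog sat on the rug"])
    # A higher (less negative) score suggests a more fluent word order.
    print(model.logprob("the cat sat on the rug"))
    print(model.logprob("rug the on sat cat the"))
```

In this sketch the model is built once and then queried repeatedly, which is the same load-once, query-many pattern the abstract attributes to the language model server; a real system would replace the toy class with a model built by a toolkit such as SRILM.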