Journal: Conference of the European Chapter of the Association for Computational Linguistics (EACL)
Publication year: 2009
Volume: 2009
Publisher: ACL Anthology
Abstract: In this paper, we present an efficient query selection algorithm for the retrieval of web text data to augment a statistical language model (LM). The number of retrieved relevant documents is optimized with respect to the number of queries submitted. The querying scheme is applied in the domain of SMS text messages. Continuous speech recognition experiments are conducted on three languages: English, Spanish, and French. The web data is utilized for augmenting in-domain LMs in general and for adapting the LMs to a user-specific vocabulary. Word error rate reductions of up to 6.6% (in LM augmentation) and 26.0% (in LM adaptation) are obtained in setups where the size of the web mixture LM is limited to the size of the baseline in-domain LM.