Publisher: Vilnius University, University of Latvia, Latvia University of Agriculture, Institute of Mathematics and Informatics of University of Latvia
Abstract: We outline a proposal for doctoral research on a heterogeneous statistical language model. Conventional language models can be classified as homogeneous, because the type of structure used as the base language unit in the lexicon (words or morphemes) is fixed. We, on the contrary, propose not to constrain in advance which language structures may be used in the lexicon. Our model would allow morphemes, words, and multi-word expressions to be inserted into the lexicon. To find the optimal lexicon, we propose two criteria: the amount of language covered by the lexicon and the amount of language structure preserved. Such a structure-preserving model will hopefully lead to better results in Natural Language Processing applications such as Automatic Speech Recognition or Machine Translation.
Keywords: Statistical language modeling; Heterogeneous language model; Heterogeneous; structure of language