Article Information

  • Title: Using Finite-State Transducer Theory for Representation on Very Large Scale Lexicons
  • Authors: Matej Rojc; Zdravko Kacic
  • Journal: Informatica
  • Print ISSN: 1514-8327
  • Online ISSN: 1854-3871
  • Year: 2004
  • Volume: 28
  • Issue: 2
  • Pages: 159-165
  • Publisher: The Slovene Society Informatika, Ljubljana
  • Abstract: Multilingual text-to-speech (TTS) synthesis systems use many extensive external natural language resources, especially in the text-processing part. It is therefore very important that these resources be represented in a time- and space-efficient way. It is equally important that language resources for new languages can be incorporated into the system easily, without modifying the common algorithms developed for multiple languages. In this regard, large external language resources pose an important problem because of the space they require and their slow lookup time. The paper presents a method for compiling large lexicons into corresponding finite-state transducers (FSTs), together with results, using the German phonetic and morphology lexicons (CISLEX) as an example. Each lexicon consisted of about 300,000 words. Representing large lexicons as finite-state transducers is motivated mainly by space and time efficiency. For both lexicons, a large reduction in size and optimal access time were achieved. The German phonetic lexicon initially occupied 12.53 MB and the morphology lexicon 18.49 MB; the corresponding FSTs occupy only 2.78 MB and 6.33 MB, respectively, i.e. about 22% and 34% of the original sizes. At the same time, lookup time is optimal, since it depends only on the length of the input word and not on the size of the lexicon. With this representation, lexicons for new languages can be integrated into the multilingual TTS system easily, without any changes to the algorithms that use them.
  • Keywords: finite-state transducers; natural language resources; multilingual text-to-speech synthesis; morphology; lexicons
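The abstract makes two technical claims: sharing structure makes a compiled lexicon much smaller, and lookup time depends only on the length of the input word. The following Python sketch illustrates both on a toy lexicon. It is not the authors' compiler, which this record does not describe. The construction (a prefix tree, output prefixes pushed toward the root, then bottom-up merging of states with identical arcs), the function names `build_trie`, `push_outputs`, `minimize`, `compile_lexicon`, and `lookup`, and the German entries in a `stem+suffix` format are all hypothetical choices made for illustration, not CISLEX data.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from os.path import commonprefix  # character-wise longest common prefix of strings


@dataclass(eq=False)  # eq=False: states are compared and hashed by identity
class State:
    # input character -> (output emitted on the arc, target state)
    arcs: dict[str, tuple[str, State]] = field(default_factory=dict)
    final: bool = False
    final_out: str = ""  # output emitted when the word ends here


def build_trie(lexicon: dict[str, str]) -> State:
    """Hypothetical step 1: prefix tree; each word's full output sits on its final state."""
    root = State()
    for word, out in lexicon.items():
        s = root
        for ch in word:
            if ch not in s.arcs:
                s.arcs[ch] = ("", State())
            s = s.arcs[ch][1]
        s.final, s.final_out = True, out
    return root


def push_outputs(s: State) -> str:
    """Hypothetical step 2: strip the common prefix of all outputs below s and return it."""
    candidates = []
    for ch, (out, t) in list(s.arcs.items()):
        out += push_outputs(t)  # the child's shared prefix moves onto this arc
        s.arcs[ch] = (out, t)
        candidates.append(out)
    if s.final:
        candidates.append(s.final_out)
    prefix = commonprefix(candidates)  # "" when there are no candidates
    n = len(prefix)
    s.arcs = {ch: (out[n:], t) for ch, (out, t) in s.arcs.items()}
    if s.final:
        s.final_out = s.final_out[n:]
    return prefix


def minimize(s: State, registry: dict) -> State:
    """Hypothetical step 3: bottom-up, reuse an existing state with identical arcs and finality."""
    for ch, (out, t) in list(s.arcs.items()):
        s.arcs[ch] = (out, minimize(t, registry))
    key = (s.final, s.final_out,
           tuple(sorted((ch, out, id(t)) for ch, (out, t) in s.arcs.items())))
    return registry.setdefault(key, s)


def count_states(root: State) -> int:
    seen, stack = set(), [root]
    while stack:
        s = stack.pop()
        if id(s) not in seen:
            seen.add(id(s))
            stack.extend(t for _, t in s.arcs.values())
    return len(seen)


def compile_lexicon(lexicon: dict[str, str]) -> tuple[str, State, int]:
    root = build_trie(lexicon)
    trie_states = count_states(root)
    initial = push_outputs(root)  # output shared by every entry (often "")
    return initial, minimize(root, {}), trie_states


def lookup(initial: str, root: State, word: str) -> str | None:
    """Follows one arc per input character, so cost grows with len(word), not lexicon size."""
    parts, s = [initial], root
    for ch in word:
        if ch not in s.arcs:
            return None
        out, s = s.arcs[ch]
        parts.append(out)
    if not s.final:
        return None
    parts.append(s.final_out)
    return "".join(parts)


# Toy entries in a hypothetical "stem+suffix" format (not CISLEX data).
LEXICON = {"spielen": "spiel+en", "spielte": "spiel+te", "spielt": "spiel+t",
           "lernen": "lern+en", "lernte": "lern+te", "lernt": "lern+t"}

if __name__ == "__main__":
    initial, root, trie_states = compile_lexicon(LEXICON)
    print(trie_states, "->", count_states(root))  # 18 -> 12
    print(lookup(initial, root, "spielte"))  # spiel+te
```

In this sketch, prefixes are shared by the tree itself and suffixes such as `-en/-te/-t` are shared by merging equivalent states, which is the general reason FST encodings of large lexicons are compact. A six-word toy cannot indicate how closely this tracks the reductions reported for CISLEX.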