This work presents a new phonetic transcription system
based on a tree of hierarchical pronunciation rules expressed as
context-specific grapheme-phoneme correspondences. The tree is
automatically inferred from a phonetic dictionary by incrementally
analyzing deeper context levels, eventually representing a minimum
set of exhaustive rules that pronounce every word in the training
dictionary without error and that can also be applied to
out-of-vocabulary words.
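
As a rough illustration of how such a rule tree might be applied, the following C sketch walks from a general rule towards the most specific rule whose context matches. The node layout, the names `rule_node` and `apply_rules`, and the `"#"` word-boundary marker are assumptions made for this example, not the data structures of the system described here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule node: "in this context, the focus grapheme is
   pronounced `phoneme`". Children refine the rule by testing one
   additional context position. */
struct rule_node {
    const char *phoneme;                     /* used when no child matches */
    int offset;                              /* context position tested by the
                                                children, relative to the focus
                                                (+1 = next grapheme) */
    const char *const *keys;                 /* grapheme expected per child */
    const struct rule_node *const *children; /* refined rules, parallel to keys */
    size_t n_children;
};

/* Returns the phoneme of the deepest rule whose context matches the
   grapheme at index `focus` in a word of `n` graphemes. */
const char *apply_rules(const struct rule_node *root,
                        const char *const *graphemes, size_t n, size_t focus)
{
    assert(root != NULL && graphemes != NULL && focus < n);
    const struct rule_node *node = root;
    for (;;) {
        long pos = (long)focus + node->offset;
        /* Positions outside the word are represented by "#" (assumption). */
        const char *ctx = (pos >= 0 && (size_t)pos < n) ? graphemes[pos] : "#";
        const struct rule_node *next = NULL;
        for (size_t i = 0; i < node->n_children; i++) {
            if (strcmp(node->keys[i], ctx) == 0) {
                next = node->children[i];
                break;
            }
        }
        if (next == NULL)
            return node->phoneme;  /* no deeper rule applies */
        node = next;
    }
}
```

In this sketch, an out-of-vocabulary word falls back to the shallowest rule that still matches its context, which is one plausible reading of how a tree inferred from a dictionary generalizes to unseen words.
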
The proposed approach improves upon existing rule-tree-based
techniques in that it makes use of graphemes, rather than letters,
as elementary orthographic units. A new linear algorithm for
segmenting a word into graphemes is introduced to enable
grapheme-based phonetic transcription of out-of-vocabulary words.
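
To make the idea of grapheme segmentation concrete, the sketch below uses greedy longest-match against a small, fixed grapheme inventory. This is a stand-in technique chosen for illustration, not the algorithm introduced in this work. The inventory, `longest_grapheme`, and `segment_graphemes` are hypothetical. Because the inventory and the maximum grapheme length are fixed, the running time grows linearly with word length, assuming that is the sense of "linear" intended above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical multi-letter grapheme inventory (illustrative only).
   Any single letter is implicitly a grapheme of length 1. */
static const char *const GRAPHEMES[] = {
    "ough", "tch", "igh", "sh", "ch", "th", "ph", "ck", "ee", "oo", "ea"
};
#define N_GRAPHEMES (sizeof GRAPHEMES / sizeof GRAPHEMES[0])

/* Length of the longest inventory grapheme that is a prefix of s,
   or 1 when only the single-letter fallback applies. */
static size_t longest_grapheme(const char *s)
{
    size_t best = 1;
    for (size_t i = 0; i < N_GRAPHEMES; i++) {
        size_t len = strlen(GRAPHEMES[i]);
        if (len > best && strncmp(s, GRAPHEMES[i], len) == 0)
            best = len;
    }
    return best;
}

/* Writes the start offset of each grapheme of `word` into `starts`
   (at most `cap` entries) and returns the number of offsets written. */
size_t segment_graphemes(const char *word, size_t *starts, size_t cap)
{
    assert(word != NULL && starts != NULL);
    size_t n = strlen(word);
    size_t count = 0;
    size_t pos = 0;
    while (pos < n && count < cap) {
        starts[count++] = pos;
        pos += longest_grapheme(word + pos);  /* always advances by >= 1 */
    }
    return count;
}
```

With this inventory, `segment_graphemes("church", starts, 8)` splits the word as ch, u, r, ch. A greedy match can pick the wrong boundary when graphemes overlap, so a real segmenter would need a way to resolve such ambiguities.
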
Exhaustive rule trees provide a canonical representation of the
pronunciation rules of a language that can be used not only to
pronounce out-of-vocabulary words, but also to analyze and compare
the pronunciation rules inferred from different dictionaries. The
proposed approach has been implemented in C and tested on Oxford
British English and Basic English. Experimental results show that
grapheme-based rule trees represent phonetically sound rules and
provide better performance than letter-based rule trees.