Basic article information

Title: Prosodic Phrase Break Prediction: Problems in the Evaluation of Models against a Gold Standard
Authors: Claire Brierley, Eric Atwell
Journal: Traitement Automatique des Langues
Print ISSN: 1248-9433
Electronic ISSN: 1965-0906
Publication year: 2007
Volume: 48
Issue: 1
Publisher: ATALA - Assoc Traitement Automatique Langues
Abstract: The goal of automatic phrase break prediction is to identify prosodic-syntactic boundaries in text which correspond to the way a native speaker might process or chunk that same text as speech. This is treated as a classification task in machine learning and output predictions from language models are evaluated against a ‘gold standard’: human-labelled prosodic phrase break annotations in transcriptions of recorded speech - the speech corpus. Despite the introduction of rigorous metrics such as precision and recall, the evaluation of phrase break models is still problematic because prosody is inherently variable; morphosyntactic analysis and prosodic annotations for a given text are not representative of the range of parsing and phrasing strategies available to, and exhibited by, native speakers. This article recommends creating automatically-generated POS tagged and prosodically annotated variants of a text to enrich the gold standard and enable more robust ‘noise-tolerant’ evaluation of language models.
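
To make the evaluation setup in the abstract concrete, the following is a minimal sketch of scoring predicted phrase breaks against a gold standard with precision and recall. It assumes each text is encoded as a list with one entry per word juncture (1 for a break, 0 for no break). The function names `break_scores` and `best_variant_scores` are hypothetical, and taking the best score over several annotation variants is only one possible reading of 'noise-tolerant' evaluation, not the procedure described in the article.

```python
def break_scores(gold, predicted):
    """Return (precision, recall, F1) for the 'break' class.

    Both arguments are equal-length sequences with one entry per word
    juncture: 1 (or True) marks a phrase break, 0 (or False) no break.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same junctures")

    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)        # breaks correctly predicted
    fp = sum(1 for g, p in pairs if not g and p)    # breaks predicted but not in gold
    fn = sum(1 for g, p in pairs if g and not p)    # gold breaks that were missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


def best_variant_scores(gold_variants, predicted):
    """Score against every annotation variant and keep the best F1.

    Assumption: a prediction that matches any plausible phrasing of the
    text should not be penalised for differing from a single annotation.
    """
    return max((break_scores(g, predicted) for g in gold_variants),
               key=lambda scores: scores[2])


# Illustrative data only: six junctures, two alternative phrasings.
single_gold = [0, 1, 0, 0, 1, 0]
alternative = [0, 1, 0, 1, 0, 0]
prediction = [0, 1, 0, 1, 0, 0]

strict = break_scores(single_gold, prediction)
tolerant = best_variant_scores([single_gold, alternative], prediction)
print(strict)    # (0.5, 0.5, 0.5)
print(tolerant)  # (1.0, 1.0, 1.0)
```

In this toy case the prediction is penalised against the single annotation but matches the alternative phrasing exactly, which illustrates why a single gold-standard annotation can understate model quality when prosody is variable.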