Abstract: This paper describes a method for estimating the internal
state of a user of a spoken dialog system before
his/her first input utterance. When actually using a
dialog-based system, the user is often perplexed by
the prompt. A typical system provides more detailed
information to a user who is taking time to make an
input utterance, but such assistance is a nuisance if the
user is merely considering how to answer the prompt.
To respond appropriately, the spoken dialog system
should be able to consider the user’s internal state
before the user’s input. Conventional studies on user
modeling have focused on the linguistic information of
the utterance for estimating the user’s internal state,
but this approach cannot estimate the user’s state
until the end of the user’s first utterance. Therefore,
we focused on the user’s nonverbal behavior, such as
fillers, silence, and head movements, up to the beginning of
the input utterance. The experimental data were collected in a Wizard
of Oz setting, and the labels were assigned by five evaluators.
Finally, we conducted a discrimination experiment
with the trained user model using combined
features. In the three-class discrimination task, we
obtained about 85% accuracy in an open test.