摘要:To recognize stress and emotion, most of the existing methods only observe and analyze speech patterns from present-time features. However, an emotion (especially for stress) can change because it was triggered by an event while speaking. To address this issue, we propose a novel method for predicting stress and emotions by analyzing prior emotional states. We named this method the deep time-delay Markov network (DTMN). Structurally, the proposed DTMN contains a hidden Markov model (HMM) and a time-delay neural network (TDNN). We evaluated the effectiveness of the proposed DTMN by comparing it with several state transition methods in predicting an emotional state from time-series (sequences) speech data of the SUSAS dataset. The experimental results show that the proposed DTMN can accurately predict present emotional states by outperforming the baseline systems in terms of the prediction error rate (PER). We then modeled the emotional state transition using a finite Markov chain based on the prediction result. We also conducted an ablation experiment to observe the effect of different HMM values and TDNN parameters on the prediction result and the computational training time of the proposed DTMN.
其他摘要:Abstract To recognize stress and emotion, most of the existing methods only observe and analyze speech patterns from present-time features. However, an emotion (especially for stress) can change because it was triggered by an event while speaking. To address this issue, we propose a novel method for predicting stress and emotions by analyzing prior emotional states. We named this method the deep time-delay Markov network (DTMN). Structurally, the proposed DTMN contains a hidden Markov model (HMM) and a time-delay neural network (TDNN). We evaluated the effectiveness of the proposed DTMN by comparing it with several state transition methods in predicting an emotional state from time-series (sequences) speech data of the SUSAS dataset. The experimental results show that the proposed DTMN can accurately predict present emotional states by outperforming the baseline systems in terms of the prediction error rate (PER). We then modeled the emotional state transition using a finite Markov chain based on the prediction result. We also conducted an ablation experiment to observe the effect of different HMM values and TDNN parameters on the prediction result and the computational training time of the proposed DTMN.