
Article information

  • Title: Emotional voice conversion using neural networks with arbitrary scales F0 based on wavelet transform
  • Authors: Zhaojie Luo; Jinhui Chen; Tetsuya Takiguchi
  • Journal: EURASIP Journal on Audio, Speech, and Music Processing
  • Print ISSN: 1687-4714
  • Electronic ISSN: 1687-4722
  • Publication year: 2017
  • Volume: 2017
  • Issue: 1
  • Pages: 1-13
  • DOI: 10.1186/s13636-017-0116-2
  • Publisher: Hindawi Publishing Corporation
  • Abstract: An artificial neural network is an important model for training features of voice conversion (VC) tasks. Typically, neural networks (NNs) are very effective in processing nonlinear features, such as Mel Cepstral Coefficients (MCC), which represent the spectrum features. However, a simple representation of fundamental frequency (F0) is not enough for NNs to deal with emotional voice VC. This is because the time sequence of F0 for an emotional voice changes drastically. Therefore, in our previous method, we used the continuous wavelet transform (CWT) to decompose F0 into 30 discrete scales, each separated by one third of an octave, which can be trained by NNs for prosody modeling in emotional VC. In this study, we propose the arbitrary scales CWT (AS-CWT) method to systematically capture F0 features of different temporal scales, which can represent different prosodic levels ranging from micro-prosody to sentence levels. Meanwhile, the proposed method uses deep belief networks (DBNs) to pre-train the NNs that then convert spectral features. By utilizing these approaches, the proposed method can change the spectrum and the F0 for an emotional voice simultaneously as well as outperform other state-of-the-art methods in terms of emotional VC.
  • Keywords: F0 features; Continuous wavelet transform; Neural networks; Deep belief networks; Emotional voice conversion
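
The abstract mentions an earlier approach that decomposes F0 with the CWT into 30 discrete scales, each one third of an octave apart. The following is a minimal sketch of that kind of decomposition, not the authors' implementation. Several things here are assumptions that the record does not state: the Mexican hat mother wavelet, the base scale of 2 frames, the z-score normalization, and the expectation that the log-F0 contour is already continuous (unvoiced gaps interpolated). The function names are hypothetical. The AS-CWT method proposed in the paper is not reproduced here.

```python
import numpy as np


def mexican_hat(t):
    """Mexican hat (Ricker) mother wavelet; an assumed choice of wavelet."""
    norm = 2.0 / (np.sqrt(3.0) * np.pi ** 0.25)
    return norm * (1.0 - t ** 2) * np.exp(-0.5 * t ** 2)


def third_octave_scales(base_scale, num_scales=30):
    """Scales spaced one third of an octave apart: a_i = a_0 * 2**(i / 3)."""
    return base_scale * 2.0 ** (np.arange(num_scales) / 3.0)


def cwt_f0(log_f0, scales):
    """Decompose a continuous log-F0 contour into one component per scale.

    log_f0 is a 1-D array with one value per frame; scales are in frames.
    Returns an array of shape (len(scales), len(log_f0)).
    """
    # Zero-mean, unit-variance contour (assumed preprocessing step).
    x = (log_f0 - log_f0.mean()) / log_f0.std()
    n = len(x)
    coeffs = np.empty((len(scales), n))
    for i, a in enumerate(scales):
        # Truncate the wavelet support so the kernel never exceeds the
        # signal length; otherwise mode="same" would return a longer array.
        half = int(min(np.ceil(5 * a), (n - 1) // 2))
        t = np.arange(-half, half + 1) / a
        kernel = mexican_hat(t) / np.sqrt(a)
        # The wavelet is symmetric, so convolution equals correlation.
        coeffs[i] = np.convolve(x, kernel, mode="same")
    return coeffs


# Synthetic contour standing in for an interpolated log-F0 track.
frames = np.arange(200)
contour = np.log(120.0 + 20.0 * np.sin(2.0 * np.pi * frames / 50.0))
scales = third_octave_scales(base_scale=2.0)
components = cwt_f0(contour, scales)
```

Every third scale doubles the temporal extent of the wavelet, so low-index rows of `components` follow fast, local F0 movement and high-index rows follow slow, phrase-level movement. For long scales on short utterances the kernel is truncated, which makes those rows approximate.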