
Article information

  • Title: Enhancement of speech dynamics for voice activity detection using DNN
  • Authors: Suci Dwijayanti ; Kei Yamamori ; Masato Miyoshi
  • Journal: EURASIP Journal on Audio, Speech, and Music Processing
  • Print ISSN: 1687-4714
  • Electronic ISSN: 1687-4722
  • Year of publication: 2018
  • Volume: 2018
  • Issue: 1
  • Pages: 1-15
  • DOI: 10.1186/s13636-018-0135-7
  • Publisher: Hindawi Publishing Corporation
  • Abstract: Voice activity detection (VAD) is an important preprocessing step for various speech applications that identifies speech and non-speech periods in input signals. In this paper, we propose a deep neural network (DNN)-based VAD method for detecting such periods in noisy signals using speech dynamics, that is, the time variation of speech signals, which may be expressed as the first- and second-order derivatives of mel cepstra, also known as the delta and delta-delta features. Rather than using these derivatives directly, the proposed method highlights the dynamics through speech period candidates, which are calculated from heuristic rules for the patterns of the first and second derivatives of the input signals. These candidates, together with the log power spectra, are input into the DNN to obtain VAD decisions. Experiments compare the proposed method with a DNN-based method that uses log power spectra alone, on speech signals corrupted by five types of noise (white, babble, factory, car, and pink) at signal-to-noise ratios (SNRs) of 10, 5, 0, and −5 dB. The experimental results show that the proposed method is superior under all the considered noise conditions, indicating that the speech period candidates improve on the use of log power spectra alone.
  • Keywords: Voice activity detection ; Dynamics ; Speech period candidates ; Deep neural network
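
The abstract refers to the delta and delta-delta features, i.e. the first- and second-order time derivatives of a feature sequence. As a rough illustration only, the sketch below computes them with the commonly used regression formula d_t = sum_n n (c_{t+n} - c_{t-n}) / (2 sum_n n^2). It is not the paper's speech period candidate method, whose heuristic rules are not given in this record. The function name `delta`, the window parameter `width`, and the edge-padding choice are assumptions made for this example.

```python
import numpy as np


def delta(features: np.ndarray, width: int = 2) -> np.ndarray:
    """Regression-based time derivative of a (frames, coeffs) feature matrix.

    `width` is the number of frames used on each side of the current frame
    (an assumed default; the record does not specify one).
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    num_frames = features.shape[0]
    # Repeat the first and last frames so that every frame has neighbours.
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out = np.zeros_like(features, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[width + n : width + n + num_frames]   # c_{t+n}
        behind = padded[width - n : width - n + num_frames]  # c_{t-n}
        out += n * (ahead - behind)
    return out / denom


# Toy input: one coefficient that rises by 1 per frame.
static = np.arange(12, dtype=float).reshape(-1, 1)
d1 = delta(static)   # delta (first-order dynamics)
d2 = delta(d1)       # delta-delta (second-order dynamics)
stacked = np.hstack([static, d1, d2])  # static + dynamic features per frame
```

Because the first and last frames are repeated at the boundaries, the values within `width` frames of either end are only approximate; away from the edges a linear ramp yields a constant delta and a zero delta-delta.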