Article information

  • Title: Dynamically localizing multiple speakers based on the time-frequency domain
  • Authors: Hodaya Hammer ; Shlomo E. Chazan ; Jacob Goldberger
  • Journal: EURASIP Journal on Audio, Speech, and Music Processing
  • Print ISSN: 1687-4714
  • Electronic ISSN: 1687-4722
  • Publication year: 2021
  • Volume: 2021
  • Issue: 1
  • Pages: 1
  • DOI: 10.1186/s13636-021-00203-w
  • Publisher: Hindawi Publishing Corporation
  • Abstract: In this study, we present a deep neural network-based online multi-speaker localization algorithm for a multi-microphone array. Following the W-disjoint orthogonality principle in the spectral domain, each time-frequency (TF) bin is dominated by a single speaker and hence by a single direction of arrival (DOA). A fully convolutional network is trained with instantaneous spatial features to estimate the DOA for each TF bin. The high-resolution classification enables the network to accurately and simultaneously localize and track multiple speakers, both static and dynamic. An extensive experimental study using simulated and real-life recordings in static and dynamic scenarios demonstrates that the proposed algorithm significantly outperforms both classic and recent deep-learning-based algorithms. Finally, as a byproduct, we further show that the proposed method is also capable of separating moving speakers by applying the obtained TF masks.
  • Keywords: DOA ; UNET ; Tracking
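
The abstract's chain of reasoning (each TF bin belongs to one DOA, so per-bin DOA labels yield both speaker directions and separation masks) can be illustrated with a small sketch. This is not the authors' network or code: it assumes a per-bin DOA class map is already available (for example, as a classifier's output), picks speaker directions as the most frequent classes, which is an assumption not stated in the abstract, and uses a hypothetical function name `doa_masks` and an arbitrary number of DOA classes.

```python
import numpy as np

def doa_masks(doa_map, num_classes, num_speakers):
    """Hypothetical helper: derive speaker DOAs and binary TF masks.

    doa_map: integer array of shape (freq, time); each entry is the DOA
             class index assumed to dominate that TF bin.
    """
    # Histogram of DOA classes over all TF bins.
    counts = np.bincount(doa_map.ravel(), minlength=num_classes)
    # Assumption: the most frequent classes correspond to active speakers.
    speaker_doas = np.sort(np.argsort(counts)[::-1][:num_speakers])
    # One binary mask per speaker: bins whose DOA class matches that speaker.
    masks = np.stack([doa_map == d for d in speaker_doas])
    return speaker_doas, masks

# Toy 3x3 TF grid with 13 illustrative DOA classes (an arbitrary choice).
doa_map = np.array([[3, 3, 10],
                    [3, 10, 10],
                    [3, 3, 7]])
speaker_doas, masks = doa_masks(doa_map, num_classes=13, num_speakers=2)
```

Multiplying each mask element-wise with the mixture's STFT would give a rough per-speaker estimate, which is the sense in which separation falls out as a byproduct of per-bin localization.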