
Article information

  • Title: Dual supervised learning for non-native speech recognition
  • Authors: Kacper Radzikowski; Robert Nowak; Le Wang
  • Journal: EURASIP Journal on Audio, Speech, and Music Processing
  • Print ISSN: 1687-4714
  • Electronic ISSN: 1687-4722
  • Publication year: 2019
  • Volume: 2019
  • Issue: 1
  • Pages: 1-10
  • DOI: 10.1186/s13636-018-0146-4
  • Publisher: Hindawi Publishing Corporation
  • Abstract: Current automatic speech recognition (ASR) systems achieve over 90–95% accuracy, depending on the methodology applied and datasets used. However, the level of accuracy decreases significantly when the same ASR system is used by a non-native speaker of the language to be recognized. At the same time, the volume of labeled datasets of non-native speech samples is extremely limited both in size and in the number of existing languages. This problem makes it difficult to train or build sufficiently accurate ASR systems targeted at non-native speakers, which, consequently, calls for a different approach that would make use of vast amounts of large unlabeled datasets. In this paper, we address this issue by employing dual supervised learning (DSL) and reinforcement learning with policy gradient methodology. We tested DSL in a warm-start approach, with two models trained beforehand, and in a semi warm-start approach with only one of the two models pre-trained. The experiments were conducted on English language pronounced by Japanese and Polish speakers. The results of our experiments show that creating ASR systems with DSL can achieve an accuracy comparable to traditional methods, while simultaneously making use of unlabeled data, which obviously is much cheaper to obtain and comes in larger sizes.
  • Keywords: Speech recognition; Dual supervised learning; Reinforcement learning; Policy gradients; Non-native speaker; Machine learning; Deep learning; Artificial intelligence