Basic article information

  • Title: Multimodal Lip-Reading for Tracheostomy Patients in the Greek Language
  • Authors: Yorghos Voutos; Georgios Drakopoulos; Georgios Chrysovitsiotis
  • Journal: Computers
  • Electronic ISSN: 2073-431X
  • Publication year: 2022
  • Volume: 11
  • Issue: 3
  • Pages: 34
  • DOI: 10.3390/computers11030034
  • Language: English
  • Publisher: MDPI Publishing
  • Abstract: Voice loss is a serious disorder that is strongly associated with social isolation. Multimodal information sources, such as audiovisual information, are crucial because they can lead to the development of straightforward personalized word prediction models that can reproduce the patient’s original voice. In this work, we designed a multimodal approach based on audiovisual information recorded from patients before loss of voice to develop a system for automated lip-reading in the Greek language. Data pre-processing methods, such as lip segmentation and frame-level sampling, were used to enhance the quality of the imaging data. Audio information was incorporated into the model to automatically annotate sets of frames as words. Recurrent neural networks were trained on four different video recordings to develop a robust word prediction model. The model correctly identified test words in different time frames with 95% accuracy. To our knowledge, this is the first word prediction model trained to recognize words from video recordings in the Greek language.