Abstract: Voice loss is a serious disorder that is strongly associated with social isolation. Multimodal information sources, such as audiovisual recordings, are valuable because they enable straightforward, personalized word prediction models that can reproduce a patient's original voice. In this work we designed a multimodal approach based on audiovisual recordings of patients before voice loss to develop a system for automated lip reading in the Greek language. Data pre-processing methods, including lip segmentation and frame-level sampling, were applied to enhance the quality of the imaging data. Audio information was incorporated into the model to automatically annotate sets of frames as words. Recurrent neural networks were trained on four different video recordings to develop a robust word prediction model. The model correctly identified test words across different time frames with 95% accuracy. To our knowledge, this is the first word prediction model trained to recognize words from video recordings in the Greek language.
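To make the described pipeline concrete, the sketch below shows one plausible shape for such a system: sampled lip-region crops from a video clip are stacked into a frame sequence and fed to a recurrent network that outputs one word label per clip. This is a minimal illustration, not the authors' implementation; the crop size, hidden size, vocabulary size, and the choice of a GRU are all assumptions for the sake of the example.

```python
# Minimal sketch (illustrative only, not the authors' model): a GRU-based
# word classifier over sequences of grayscale lip crops. All dimensions
# and the vocabulary size are hypothetical.
import torch
import torch.nn as nn


class LipReadingRNN(nn.Module):
    """Recurrent word classifier over sequences of lip-crop frames."""

    def __init__(self, frame_height=32, frame_width=64,
                 hidden_size=256, num_words=50):
        super().__init__()
        # Each sampled frame is flattened into a feature vector.
        self.frame_features = frame_height * frame_width
        self.gru = nn.GRU(self.frame_features, hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_words)

    def forward(self, frames):
        # frames: (batch, time, height, width) grayscale lip crops
        batch, time = frames.shape[:2]
        x = frames.reshape(batch, time, self.frame_features)
        _, last_hidden = self.gru(x)             # (1, batch, hidden_size)
        return self.classifier(last_hidden[0])   # (batch, num_words) logits


# Toy usage: 4 clips, 25 sampled frames each, 32x64 lip crops.
model = LipReadingRNN()
clips = torch.rand(4, 25, 32, 64)
logits = model(clips)
predicted_words = logits.argmax(dim=1)  # indices into the word vocabulary
```

In this sketch the final hidden state summarizes the whole frame sequence, matching the abstract's framing of classifying a set of frames as a single word; the audio-derived annotations described above would supply the word labels used to train such a classifier.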