Abstract: Visuo-auditory sensory substitution devices transform a video stream into an audio stream to help visually impaired people in situations where spatial information is required, such as avoiding moving obstacles. In these situations, the latency between an event in the real world and its auditory transduction is of paramount importance. In this article, we describe an optimized software architecture for low-latency video-to-audio transduction on current mobile hardware. We explain the required computations step by step and report the corresponding measured latencies. The total latency is approximately 65 ms at a capture resolution of 160 × 120, 30 frames per second, and 1000 sonified pixels per frame.
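To illustrate the kind of video-to-audio transduction the abstract refers to, here is a minimal sketch of a column-sweep sonification in the style of classic visuo-auditory substitution systems. This is not the authors' implementation: the frequency range, sample rate, and brightness-to-amplitude mapping are assumptions chosen for illustration only.

```python
import numpy as np

# Assumed parameters (not from the paper): audio sample rate and the
# ~33 ms frame duration implied by 30 frames per second.
SAMPLE_RATE = 44100
FRAME_MS = 33

def sonify_frame(frame: np.ndarray) -> np.ndarray:
    """Convert one grayscale frame (H x W, values 0..255) to audio samples.

    Vertical pixel position is mapped to tone frequency, brightness to
    amplitude, and the image is swept left to right over the frame duration.
    """
    h, w = frame.shape
    n_samples = SAMPLE_RATE * FRAME_MS // 1000
    col_len = n_samples // w                    # audio samples per image column
    freqs = np.geomspace(400.0, 4000.0, h)      # one tone per image row (assumed range)
    audio = np.zeros(n_samples)
    for x in range(w):                          # sweep columns over time
        t = np.arange(col_len) / SAMPLE_RATE
        amp = frame[:, x].astype(float) / 255.0 # brightness -> amplitude
        # Sum the row tones, weighted by this column's pixel brightness,
        # and normalize by the row count to keep the signal in [-1, 1].
        chunk = (amp[:, None] * np.sin(2 * np.pi * freqs[:, None] * t)).sum(0)
        audio[x * col_len:(x + 1) * col_len] = chunk / max(h, 1)
    return audio

# Example: sonify one random 120 x 160 frame (the capture resolution
# mentioned in the abstract, height x width).
frame = np.random.randint(0, 256, (120, 160), dtype=np.uint8)
samples = sonify_frame(frame)
```

A real low-latency pipeline would additionally stream these buffers to the audio output with minimal buffering, which is where the architecture described in the article matters.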