A comparison of models for fusion of the auditory and visual sensors in speech perception
Authors:Jordi Robert-Ribes  Jean-Luc Schwartz  Pierre Escudier
Affiliation:(1) Institut de la Communication Parlée, CNRS UA 368, INPG/ENSERG, Université Stendhal, 46 Av. Félix Viallet, 38031 Grenoble Cedex 1, France
Abstract:Though a large amount of psychological and physiological evidence of audio-visual integration in speech has been collected in the last 20 years, there is no agreement about the nature of the fusion process. We present the main experimental data and describe the various models proposed in the literature, together with a number of studies in the field of automatic audio-visual speech recognition. We discuss these models in relation to general proposals from psychology on intersensory interaction and to work on sensor fusion in vision and robotics. We then examine the characteristics of four main models in the light of psychological data and formal properties, and we present the results of a modelling study on audio-visual recognition of French vowels in noise. We conclude in favor of the relative superiority of a model in which the auditory and visual inputs are projected and fused in a common representation space related to motor properties of speech objects, the fused representation being further classified for lexical access.
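The abstract contrasts fusion architectures that differ mainly in where the two sensory streams are combined. As a purely illustrative sketch (not the authors' implementation): the Python below compares, on synthetic data, two of the canonical strategies discussed in this literature: separate identification (late fusion, where each modality is classified on its own and the per-modality posteriors are combined) and direct identification (early fusion, where the concatenated audio-visual feature vector is classified jointly). The toy data, feature dimensions, and the naive Gaussian classifier are all assumptions for illustration, not the paper's features or models.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy stand-in for vowel categories; the paper's study used French vowels
    # in noise, but the data and dimensions here are purely illustrative.
    N_CLASSES, N_AUDIO, N_VIDEO = 3, 4, 2

    def sample(n_per_class):
        """Draw toy audio/visual feature vectors with class-dependent means."""
        X_a, X_v, y = [], [], []
        for c in range(N_CLASSES):
            X_a.append(rng.normal(loc=c, scale=1.0, size=(n_per_class, N_AUDIO)))
            X_v.append(rng.normal(loc=c, scale=1.0, size=(n_per_class, N_VIDEO)))
            y.append(np.full(n_per_class, c))
        return np.vstack(X_a), np.vstack(X_v), np.concatenate(y)

    def fit_means(X, y):
        """Per-class means for a naive Gaussian classifier (unit variance)."""
        return np.stack([X[y == c].mean(axis=0) for c in range(N_CLASSES)])

    def log_posterior(X, means):
        """Unnormalized log P(class | x) under equal priors and unit variance."""
        d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        return -0.5 * d2

    X_a, X_v, y = sample(200)
    T_a, T_v, t = sample(100)  # held-out test set

    # Separate identification (late fusion): classify each modality alone,
    # then combine posteriors (product rule = sum of log-posteriors).
    m_a, m_v = fit_means(X_a, y), fit_means(X_v, y)
    late = log_posterior(T_a, m_a) + log_posterior(T_v, m_v)

    # Direct identification (early fusion): classify the concatenated
    # audio-visual vector with a single joint model.
    m_av = fit_means(np.hstack([X_a, X_v]), y)
    early = log_posterior(np.hstack([T_a, T_v]), m_av)

    for name, scores in [("late (SI)", late), ("early (DI)", early)]:
        acc = (scores.argmax(axis=1) == t).mean()
        print(f"{name:10s} accuracy: {acc:.2f}")

The model the authors favor would add an intermediate step between these two extremes: both inputs would first be projected into a shared (motor-related) representation space, fused there, and only then classified; that projection is specific to the paper's model and is not reproduced in this sketch.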
Keywords:audio-visual speech perception  sensor fusion  noisy speech recognition  intersensory interactions  vowel processing
Indexed in SpringerLink and other databases.