A comparison of models for fusion of the auditory and visual sensors in speech perception
Authors: Jordi Robert-Ribes, Jean-Luc Schwartz, Pierre Escudier
Affiliation: Institut de la Communication Parlée, CNRS UA 368, INPG/ENSERG, Université Stendhal, 46 Av. Félix Viallet, 38031 Grenoble Cedex 1, France
Abstract: Although a large amount of psychological and physiological evidence of audio-visual integration in speech has been collected over the last 20 years, there is no agreement about the nature of the fusion process. We present the main experimental data and describe the various models proposed in the literature, together with a number of studies in the field of automatic audio-visual speech recognition. We discuss these models in relation to general proposals arising from psychology, in the field of intersensory interaction, and from vision and robotics, in the field of sensor fusion. We then examine the characteristics of four main models in the light of psychological data and formal properties, and present the results of a modelling study on audio-visual recognition of French vowels in noise. We conclude in favor of the relative superiority of a model in which the auditory and visual inputs are projected onto and fused in a common representation space related to the motor properties of speech objects, the fused representation then being classified for lexical access.
Keywords: audiovisual speech perception; sensor fusion; noisy speech recognition; intersensory interactions; vowel processing
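As a rough illustration of the data flow in the architecture the abstract favors (projection of both sensor inputs into a common motor-related space, fusion in that space, then classification of the fused representation), here is a minimal sketch in Python. Everything in it is a placeholder assumption: the feature dimensions, the linear projections, the reliability weighting, and the nearest-prototype classifier are illustrative choices, not the authors' actual model or parameters.

```python
# Minimal sketch of a "common motor space" fusion architecture: auditory and
# visual features are projected into a shared representation space, fused
# there, and the fused vector is classified. All dimensions, projection
# matrices, weights, and vowel prototypes are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(0)

N_AUDIO, N_VIDEO, N_MOTOR = 12, 6, 3    # assumed feature dimensions
VOWELS = ["i", "y", "u", "e", "a"]      # subset of French vowels, for illustration

# Hypothetical linear projections from each sensor space to the common space.
W_audio = rng.normal(size=(N_MOTOR, N_AUDIO))
W_video = rng.normal(size=(N_MOTOR, N_VIDEO))

# Hypothetical vowel prototypes located in the common motor-related space.
prototypes = {v: rng.normal(size=N_MOTOR) for v in VOWELS}

def fuse_and_classify(audio_feat, video_feat, audio_reliability=0.5):
    """Project both inputs into the common space, fuse them by a
    reliability-weighted average (the weight would drop as acoustic noise
    increases), and label the fused vector with the nearest vowel prototype."""
    m_audio = W_audio @ audio_feat
    m_video = W_video @ video_feat
    fused = audio_reliability * m_audio + (1.0 - audio_reliability) * m_video
    return min(prototypes, key=lambda v: np.linalg.norm(fused - prototypes[v]))

# Example call with random "observations"; in a real system the features would
# come from acoustic analysis and lip-shape measurements.
print(fuse_and_classify(rng.normal(size=N_AUDIO), rng.normal(size=N_VIDEO)))
```

In the study itself, the common space and the noise-dependent weighting would presumably be derived from articulatory properties of the vowels and from the measured reliability of the acoustic channel; the sketch only fixes the order of operations (project, fuse, classify).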
|