Multimodal information fusion application to human emotion recognition from face and speech

Authors: Muharram Mansoorizadeh, Nasrollah Moghaddam Charkari

Affiliation: (1) Parallel and Image Processing Lab, Faculty of Electrical and Computer Engineering, Tarbiat Modares University, Tehran, Iran

Abstract: Multimedia content is composed of several streams that carry information in audio, video, or textual channels. Classifying and clustering multimedia content requires extracting and combining information from these streams. The streams constituting a multimedia content naturally differ in scale, dynamics, and temporal patterns. These differences make it difficult to combine the information sources using classic combination techniques. We propose an asynchronous feature-level fusion approach that creates a unified hybrid feature space out of the individual signal measurements. The target space can be used for clustering or classification of the multimedia content. As a representative application, we used the proposed approach to recognize basic affective states from speech prosody and facial expressions. Experimental results on two audiovisual emotion databases, with 42 and 12 subjects, show that the proposed system significantly outperforms the unimodal face-based and speech-based systems, as well as synchronous feature-level and decision-level fusion approaches.
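The core idea in the abstract — mapping asynchronous per-modality measurements into one hybrid feature space — can be illustrated with a minimal sketch. This is not the authors' algorithm; the function name, the linear-interpolation alignment, and the toy dimensions are all assumptions made here for illustration. It simply resamples the slower-varying stream onto the other stream's timeline and concatenates the frames:

```python
import numpy as np

def fuse_features(face_feats, speech_feats):
    """Illustrative fusion sketch (not the paper's method):
    align two asynchronous feature streams by linear interpolation
    onto the face timeline, then concatenate frame by frame."""
    n_face, n_speech = len(face_feats), len(speech_feats)
    # place both streams on a common normalized time axis [0, 1]
    t_face = np.linspace(0.0, 1.0, n_face)
    t_speech = np.linspace(0.0, 1.0, n_speech)
    # interpolate each speech feature dimension at the face frame times
    aligned = np.stack(
        [np.interp(t_face, t_speech, speech_feats[:, d])
         for d in range(speech_feats.shape[1])],
        axis=1,
    )
    # hybrid feature space: face and resampled speech features side by side
    return np.hstack([face_feats, aligned])

# toy example: 10 face frames x 4 dims, 25 speech frames x 3 dims
face = np.random.rand(10, 4)
speech = np.random.rand(25, 3)
hybrid = fuse_features(face, speech)
print(hybrid.shape)  # (10, 7)
```

Each row of `hybrid` could then feed a standard classifier or clustering algorithm, which is the role the abstract assigns to the unified target space.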

Keywords:
Indexed in SpringerLink and other databases.