Audio-visual speaker diarization using Fisher linear semi-discriminant analysis

Authors:	Nikolaos Sarafianos, Theodoros Giannakopoulos, Sergios Petridis

Affiliation:	1. Computational Intelligence Laboratory (CIL), Institute of Informatics and Telecommunications, National Center for Scientific Research “Demokritos”, Athens, Greece

Abstract:	Speaker diarization aims to automatically answer the question “who spoke when” given a speech signal. In this work, we apply the FLsD approach, a semi-supervised version of Fisher Linear Discriminant analysis, to both the audio and the video signals to form a complete multimodal speaker diarization system. Extensive experiments show that the FLsD method boosts the performance of the face diarization task (i.e., the task of discovering faces over time given only the visual signal). In addition, the experiments show that applying the FLsD method to discriminate between faces is independent of the initial feature space and remains relatively unaffected as the number of faces increases. Finally, a fusion method is proposed that improves performance over the best individual modality, which is the audio signal.

Keywords:
This article is indexed in SpringerLink and other databases.