Audio-visual speaker diarization using Fisher linear semi-discriminant analysis
Authors: Nikolaos Sarafianos, Theodoros Giannakopoulos, Sergios Petridis
Affiliation: Computational Intelligence Laboratory (CIL), Institute of Informatics and Telecommunications, National Center for Scientific Research “Demokritos”, Athens, Greece
Abstract: Speaker diarization aims to automatically answer the question “who spoke when” given a speech signal. In this work, we apply the FLsD approach, a semi-supervised version of Fisher Linear Discriminant analysis, to both the audio and the video signals to form a complete multimodal speaker diarization system. Extensive experiments show that the FLsD method boosts the performance of the face diarization task (i.e., the task of discovering faces over time given only the visual signal). The experiments also show that FLsD-based face discrimination is independent of the initial feature space and remains relatively unaffected as the number of faces increases. Finally, we propose a fusion method that improves performance over the best individual modality, which is the audio signal.
This article is indexed in SpringerLink and other databases.