首页 | 本学科首页   官方微博 | 高级检索  
     


Prosody modification for speech recognition in emotionally mismatched conditions
Authors:Vishnu Vidyadhara Raju Vegesna  Krishna Gurugubelli  Anil kumar Vuppala
Affiliation:1.Speech Processing Lab , KCIS,International Institute of Information Technology, Hyderabad (IIIT-H),Hyderabad,India
Abstract:A degradation in the performance of automatic speech recognition systems (ASR) is observed in mismatched training and testing conditions. One of the reasons for this degradation is due to the presence of emotions in the speech. The main objective of this work is to improve the performance of ASR in the presence of emotional conditions using prosody modification. The influence of different emotions on the prosody parameters is exploited in this work. Emotion conversion methods are employed to generate the word level non-uniform prosody modified speech. Modification factors for prosodic components such as pitch, duration and energy are used. The prosody modification is done in two ways. Firstly, emotion conversion is done at the testing stage to generate the neutral speech from the emotional speech. Secondly, the ASR is trained with the generated emotional speech from the neutral speech. In this work, the presence of emotions in speech is studied for the Telugu ASR systems. A new database of IIIT-H Telugu speech corpus is collected to build the large vocabulary neutral Telugu speech ASR system. The emotional speech samples from IITKGP-SESC Telugu corpus are used for testing it. The emotions of anger, happiness and compassion are considered during the evaluation. An improvement in the performance of ASR systems is observed in the prosody modified speech.
Keywords:
本文献已被 SpringerLink 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号