Speech emotion recognition based on hierarchical attributes using feature nets

Authors:	Huijuan Zhao, Ning Ye, Ruchuan Wang

Affiliation:	1. College of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing, People's Republic of China; 2. College of Computer and Software, Nanjing Institute of Industry Technology, Nanjing, People's Republic of China; zhaohj86@126.com; 4. College of Computer, Nanjing University of Posts and Telecommunications, Nanjing, People's Republic of China; 5. Jiangsu High Technology Research Key Laboratory for Wireless Sensor Networks, Nanjing, People's Republic of China

Abstract:	Speech emotion recognition is a challenging topic with many important real-life applications, especially in human-computer interaction. Traditional methods follow a pipeline of pre-processing, feature extraction, dimensionality reduction and emotion classification. Previous studies have focussed on emotion recognition based on two different models: the discrete model and the continuous model. Both the speaker's age and gender affect speech emotion recognition in the two models. Moreover, investigations have shown that the dimensional attributes of emotion, such as arousal, valence and dominance, are related to each other. Based on these observations, we propose a new attribute recognition model using Feature Nets, which aims to improve emotion recognition performance and generalisation capability. The method uses the corpus to train an age and gender classification model, which is then transferred to the main model: a hierarchical deep learning model that uses age and gender as high-level attributes of the emotion. The public databases EMO-DB and IEMOCAP are used to evaluate performance on both the classification task and the regression task. Experimental results show that the proposed approach based on attribute transfer can improve recognition accuracy, whether age or gender is transferred.

Keywords:	Speech emotion recognition; multi-task learning; transfer learning; deep learning; feature nets
