Speech Emotion Recognition Based on Dual-Channel Convolutional Gated Recurrent Network
Citation: SUN Hanyu, HUANG Lixia, ZHANG Xueying, LI Juan. Speech Emotion Recognition Based on Dual-Channel Convolutional Gated Recurrent Network[J]. Computer Engineering and Applications, 2023, 59(2): 170-177. DOI: 10.3778/j.issn.1002-8331.2107-0249
Authors: SUN Hanyu, HUANG Lixia, ZHANG Xueying, LI Juan
Affiliation: College of Information and Computer, Taiyuan University of Technology, Taiyuan 030024, China
Funding: Research Project for Returned Overseas Scholars of Shanxi Province (HGKY2019025)
Abstract: To build an efficient speech emotion recognition model that makes full use of the information carried by different emotion features, spectrogram features and low-level descriptor (LLD) features are combined in a dual-channel convolutional gated recurrent unit (CGRU) network based on the self-attention mechanism. In addition, because the cross-entropy loss cannot increase the intra-class compactness and inter-class separability of speech emotion features, a new loss function built on the concordance correlation coefficient, CCC-Loss, is proposed. First, the spectrogram and LLD features are fed into separate CGRU channels to extract deep features, and self-attention assigns higher weights to the key moments. Second, the model is trained jointly with CCC-Loss and cross-entropy loss; CCC-Loss takes as its loss term the ratio of the sum of concordance correlation coefficients between samples of different emotion classes to the sum of those between samples of the same class, improving the intra-class and inter-class correlation of sample features and strengthening the model's discriminative ability. Finally, the classification results of the two networks are fused at the decision level. The proposed method achieves recognition rates of 92.90%, 88.54%, and 90.58% on the EMODB, RAVDESS, and CASIA databases respectively, outperforming baseline models such as ACRNN and DSCNN.
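The abstract describes CCC-Loss only as the ratio of summed inter-class concordance correlation coefficients to summed intra-class ones. Below is a minimal PyTorch sketch of that idea, assuming the sums run over all sample pairs within a mini-batch of deep features; the pairwise computation, the eps stabilizer, and the balancing weight lam for combining with cross-entropy are assumptions, not details taken from the paper.

```python
import torch

def ccc(x, y, eps=1e-8):
    # Concordance correlation coefficient between two feature vectors:
    # CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(unbiased=False), y.var(unbiased=False)
    cov = ((x - mx) * (y - my)).mean()
    return 2.0 * cov / (vx + vy + (mx - my) ** 2 + eps)

def ccc_loss(feats, labels, eps=1e-8):
    # Ratio of summed inter-class CCCs to summed intra-class CCCs over all
    # sample pairs in the mini-batch. Minimizing it pulls same-class features
    # together and pushes different-class features apart.
    intra = feats.new_zeros(())
    inter = feats.new_zeros(())
    for i in range(feats.size(0)):
        for j in range(i + 1, feats.size(0)):
            c = ccc(feats[i], feats[j], eps)
            if labels[i] == labels[j]:
                intra = intra + c
            else:
                inter = inter + c
    return inter / (intra + eps)

# Joint training objective (lam is an assumed balancing weight):
# loss = torch.nn.functional.cross_entropy(logits, labels) + lam * ccc_loss(feats, labels)
```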
Keywords: speech emotion recognition; convolutional neural network; gated recurrent unit; self-attention mechanism; loss function; deep learning; concordance correlation coefficient
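As a rough illustration of the dual-channel architecture, the sketch below shows one CGRU channel (CNN front-end, bidirectional GRU, attention pooling over time) and softmax-averaging decision fusion. All layer sizes, the additive form of the attention, and the averaging fusion rule are assumptions; the abstract does not specify them.

```python
import torch
import torch.nn as nn

class CGRUChannel(nn.Module):
    # One channel: CNN front-end -> bidirectional GRU -> attention pooling.
    # Layer sizes are illustrative, not the paper's configuration.
    def __init__(self, in_ch=1, hidden=128, num_classes=7):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.gru = nn.GRU(input_size=64, hidden_size=hidden,
                          batch_first=True, bidirectional=True)
        self.att = nn.Linear(2 * hidden, 1)    # scores each time step
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):                      # x: (B, 1, freq, time)
        h = self.conv(x)                       # (B, 64, F', T')
        h = h.mean(dim=2).transpose(1, 2)      # pool over freq -> (B, T', 64)
        h, _ = self.gru(h)                     # (B, T', 2*hidden)
        w = torch.softmax(self.att(h), dim=1)  # attention weights over time
        z = (w * h).sum(dim=1)                 # weighted pooling -> deep feature
        return self.fc(z), z                   # logits and deep feature

def fuse(logits_spec, logits_lld):
    # Decision-level fusion: average the two channels' softmax outputs.
    return (torch.softmax(logits_spec, -1) + torch.softmax(logits_lld, -1)) / 2
```

In a plausible reading of the abstract, one channel takes the spectrogram and the other the LLD features; each is trained with cross-entropy plus CCC-Loss on its deep feature z, and the two softmax outputs are fused at the decision level as above.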