
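Speech emotion recognition based on a fully convolutional recurrent neural network
Cite this article: ZHU Min, JIANG Pengxu, ZHAO Li. Speech emotion recognition based on full convolution recurrent neural network[J]. Technical Acoustics (声学技术), 2021, 40(5): 645-651
Authors: ZHU Min (朱敏)  JIANG Pengxu (姜芃旭)  ZHAO Li (赵力)
Affiliations: School of Electronic Engineering, Changzhou College of Information Technology, Changzhou 213164, Jiangsu, China; School of Information Science and Engineering, Southeast University, Nanjing 210096, Jiangsu, China
Funding: National Natural Science Foundation of China (61673108, 61571106); National Key Research and Development Program of China (2018YFB1305203).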
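Abstract: Speech emotion recognition is one of the most active research areas in human-computer interaction. However, because the time-frequency correlations in speech have been little studied, emotional information has not been mined deeply enough. To better mine these correlations, a fully convolutional recurrent neural network model is proposed that combines different models through parallel multiple inputs and extracts features with different functions from two modules at the same time. A fully convolutional network (FCN) learns the time-frequency correlations in speech spectrogram features, while a long short-term memory (LSTM) neural network learns the frame-level features of speech, supplementing the temporal information that the FCN misses during learning. Finally, the fused features are classified by a classifier, and tests on two public emotion data sets verify the superiority of the proposed algorithm.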

Keywords: neural network  speech emotion  feature extraction
Received: 2020-08-15
Revised: 2020-12-21

Speech emotion recognition based on full convolution recurrent neural network
ZHU Min,JIANG Pengxu,ZHAO Li. Speech emotion recognition based on full convolution recurrent neural network[J]. Technical Acoustics, 2021, 40(5): 645-651
Authors: ZHU Min  JIANG Pengxu  ZHAO Li
Affiliations: School of Electronic Engineering, Changzhou College of Information Technology, Changzhou 213164, Jiangsu, China; School of Information Science and Engineering, Southeast University, Nanjing 210096, Jiangsu, China
Abstract: Speech emotion recognition is an active research area in human-computer interaction. However, the time-frequency information in speech has received little study, so emotional information is not explored in sufficient depth. To better exploit the time-frequency information in speech, a fully convolutional recurrent neural network model is proposed, in which a parallel multi-input scheme combines different models so that features serving different functions are extracted from two modules. A fully convolutional network (FCN) learns the time-frequency information in speech spectrogram features, while a long short-term memory (LSTM) neural network learns the frame-level features of speech to supply the time-dependent information that the FCN misses. Finally, the features are fused and passed to a classifier. Experiments on two public emotional data sets demonstrate the superiority of the proposed algorithm.
Keywords: neural network  speech emotion  feature extraction
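
The two-branch idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no layer counts, kernel sizes, feature dimensions, fusion rule, or classifier type, so every size below, the use of concatenation for fusion, and the linear softmax classifier are assumptions chosen only for illustration. The weights are random and untrained, so the output shows the data flow rather than a meaningful prediction.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_relu(x, kernels):
    """Valid 2-D cross-correlation followed by ReLU.
    x: (C_in, F, T); kernels: (C_out, C_in, kh, kw)."""
    win = np.lib.stride_tricks.sliding_window_view(x, kernels.shape[2:], axis=(1, 2))
    out = np.einsum("cftij,ocij->oft", win, kernels)
    return np.maximum(out, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_last_state(frames, W, U, b):
    """Run a single-layer LSTM over frames (T, D) and return the final hidden state."""
    hidden = U.shape[1]
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x_t in frames:
        i, f, g, o = np.split(W @ x_t + U @ h + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def init_params(n_frame_feats=32, hidden=16, n_classes=4, scale=0.1):
    # All sizes are hypothetical; the abstract does not specify them.
    return {
        "k1": scale * rng.standard_normal((8, 1, 3, 3)),
        "k2": scale * rng.standard_normal((16, 8, 3, 3)),
        "W": scale * rng.standard_normal((4 * hidden, n_frame_feats)),
        "U": scale * rng.standard_normal((4 * hidden, hidden)),
        "b": np.zeros(4 * hidden),
        "Wc": scale * rng.standard_normal((n_classes, 16 + hidden)),
        "bc": np.zeros(n_classes),
    }

def predict(spectrogram, frames, p):
    """spectrogram: (F, T) for the FCN branch; frames: (T, D) for the LSTM branch."""
    # FCN branch: convolutions only, then global average pooling over
    # frequency and time, so utterances of any length are accepted.
    a = conv2d_relu(spectrogram[None], p["k1"])
    a = conv2d_relu(a, p["k2"])
    fcn_feat = a.mean(axis=(1, 2))
    # LSTM branch: summarises the frame-level feature sequence.
    lstm_feat = lstm_last_state(frames, p["W"], p["U"], p["b"])
    # Fusion by concatenation (an assumption), then a linear softmax classifier.
    fused = np.concatenate([fcn_feat, lstm_feat])
    return softmax(p["Wc"] @ fused + p["bc"])

params = init_params()
spec = rng.standard_normal((40, 120))     # hypothetical 40 bands x 120 frames
frame_feats = rng.standard_normal((120, 32))
print(predict(spec, frame_feats, params))
```

Because the convolutional branch ends in global pooling rather than a dense layer, the same parameters work for spectrograms of different durations, which is the usual motivation for a fully convolutional design.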