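Environmental Sound Recognition Method Combining Mel Spectral Coefficients and Convolutional Neural Network
Cite this article:Liu Yarong, Huang Xinzhe, Xie Xiaolan, Liu Xin. Environmental Sound Recognition Method Combining Mel Spectral Coefficients and Convolutional Neural Network[J]. 信号处理 (Signal Processing), 2020, 36(6): 1020-1028.
Authors:Liu Yarong (刘亚荣)  Huang Xinzhe (黄昕哲)  Xie Xiaolan (谢晓兰)  Liu Xin (刘鑫)
Affiliation:Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology
Funding:National Natural Science Foundation of China (61762031); National Natural Science Foundation of China (61961010); Guangxi Innovation-Driven Major Project (AA19046004); Guangxi Key Research and Development Project (桂科AB17195029, 桂科AB18126006); Open Fund of the Guangxi Key Laboratory of Embedded Technology and Intelligent System (No. 2019-02-03)
Abstract:Based on a study of sound recognition in complex environments, this paper proposes an environmental sound recognition method that combines Mel spectral coefficients (MFSC) with a convolutional neural network (CNN). MFSC features are extracted from each sound event, and the feature parameters are fed into the designed CNN model to classify the event. The experiments use the ESC-10 data set and compare the constructed CNN model with random forest, support vector machine (SVM), and deep neural network (DNN) classifiers, as well as with three recognition models commonly used in the DCASE competition. The results show that, on the same data set, the proposed method improves the recognition rate over the traditional sound recognition methods by 13.1%, 18.3%, and 15.7%, respectively. Compared with the three models commonly used in the DCASE competition, the proposed model shows a clear advantage in both recognition rate and recognition efficiency.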

Keywords:convolutional neural network  Mel spectral coefficients  environmental sound recognition
Received:2020-03-17

Environmental Sound Recognition Method Combining Mel Spectral Coefficients and Convolutional Neural Network
Liu Yarong,Huang Xinzhe,Xie Xiaolan,Liu Xin.Environmental Sound Recognition Method Combining Mel Spectral Coefficients and Convolutional Neural Network[J].Signal Processing,2020,36(6):1020-1028.
Authors:Liu Yarong  Huang Xinzhe  Xie Xiaolan  Liu Xin
Affiliation:Guangxi Key Laboratory of Embedded Technology and Intelligent System, Guilin University of Technology; College of Information Science and Engineering, Guilin University of Technology
Abstract:Based on a study of sound recognition in complex environments, this paper proposes an environmental sound recognition method that combines Mel spectral coefficients (MFSC) with a convolutional neural network (CNN). MFSC features are extracted from each sound event, and the feature parameters are used as input to the designed CNN model, which classifies the event. The experiments use the ESC-10 data set and compare the constructed CNN model with random forest, support vector machine (SVM), and deep neural network (DNN) classifiers, as well as with three recognition models commonly used in the DCASE competition. The results show that, on the same data set, the proposed method improves the recognition rate over the traditional sound recognition methods by 13.1%, 18.3%, and 15.7%, respectively. Compared with the three models commonly used in the DCASE competition, the proposed model has a clear advantage in both recognition rate and recognition efficiency.
Keywords:convolutional neural network  Mel spectral coefficients  environmental sound recognition
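
The feature-extraction step named in the abstract can be illustrated with a minimal sketch. It assumes that MFSC denotes log mel filter-bank energies (the MFCC pipeline without the final DCT). The abstract does not state the frame length, hop size, number of filters, or window type, so the values below (256-sample frames, 128-sample hop, 40 filters, Hamming window) and all function names are illustrative assumptions, not the authors' settings. The DFT is written naively in pure Python to keep the example self-contained; a real system would use an FFT from a numerical library.

```python
import cmath
import math


def hz_to_mel(f_hz):
    """Convert Hz to mel using the common 2595 * log10(1 + f / 700) form."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale from 0 Hz to Nyquist.

    Returns n_filters rows, each holding n_fft // 2 + 1 weights.
    """
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 points: left edge, the filter centres, right edge.
    mel_points = [mel_max * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    bank = [[0.0] * n_bins for _ in range(n_filters)]
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):      # rising slope
            bank[m - 1][k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling slope
            bank[m - 1][k] = (right - k) / (right - centre)
    return bank


def power_spectrum(frame):
    """Hamming-windowed power spectrum via a naive O(n^2) DFT."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mfsc(signal, sample_rate, frame_len=256, hop=128, n_filters=40):
    """Return one list of n_filters log mel energies per frame."""
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spec = power_spectrum(signal[start:start + frame_len])
        energies = [sum(w * p for w, p in zip(row, spec)) for row in bank]
        # Floor the energy so empty or silent filters do not give log(0).
        features.append([math.log(max(e, 1e-10)) for e in energies])
    return features


if __name__ == "__main__":
    sr = 8000  # assumed sample rate for the demo tone
    tone = [math.sin(2.0 * math.pi * 1000.0 * t / sr) for t in range(1024)]
    feats = mfsc(tone, sr)
    print(len(feats), "frames x", len(feats[0]), "mel bands")
```

The result is a two-dimensional frames-by-bands array, which is the kind of image-like input a CNN classifier can consume. How the authors shape, normalize, or segment this input for their network is not described in the abstract.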