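Acoustic scene classification based on multi-resolution time-frequency feature fusion
Cite this article: 姚琨, 杨吉斌, 张雄伟, 郑昌艳, 孙蒙. Acoustic scene classification based on multi-resolution time-frequency feature fusion [J]. 声学技术 (Technical Acoustics), 2020, 39(4): 494-500.
Authors: 姚琨  杨吉斌  张雄伟  郑昌艳  孙蒙
Affiliation: Army Engineering University, Nanjing 210007, Jiangsu, China
Funding: Supported by the National Natural Science Foundation of China (61471394) and the Jiangsu Province Excellent Youth Fund (BK20180080).
Abstract: Acoustic scene classification is one of the most difficult tasks in computer audition. With a single feature, a basic convolutional neural network already improves accuracy over traditional classification methods, but the results are still unsatisfactory. To address this, an acoustic scene classification scheme based on time-frequency feature fusion is proposed within the convolutional neural network framework. For the classification model, a multi-resolution convolution-pooling scheme is proposed to build a multi-resolution convolutional neural network that better fits the time-frequency structure of the extracted features. For feature selection, a low-level envelope feature (log Mel sub-band energies) and a high-level structural feature (the non-negative matrix factorization coefficient matrix) are fused: the two two-dimensional features are stacked into a three-dimensional feature and fed to the classification model. Training and testing were carried out on the development datasets of the 2017 and 2018 Detection and Classification of Acoustic Scenes and Events challenges. Experimental results show that the proposed scheme improves classification accuracy over the baseline system by 7.5% and 10.3%, respectively, and effectively improves classification performance.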

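Keywords: acoustic scene classification  multi-resolution convolutional neural network  time-frequency feature fusion  time-frequency structure  non-negative matrix factorization
Received: 2019/3/3
Revised: 2019/4/30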
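
The abstract names a multi-resolution pooling scheme but does not give the pooling shapes or where they sit in the network. As a rough illustration only, the sketch below pools one time-frequency map with several window shapes, so that some outputs keep finer time detail and others finer frequency detail. The window sizes in `POOL_SHAPES` and both function names are hypothetical and are not taken from the paper.

```python
import numpy as np


def max_pool2d(x, pool_f, pool_t):
    """Non-overlapping max pooling over a (freq, time) map.

    Rows/columns that do not fill a whole window are cropped.
    """
    n_f = (x.shape[0] // pool_f) * pool_f
    n_t = (x.shape[1] // pool_t) * pool_t
    cropped = x[:n_f, :n_t]
    # Split each axis into (number of windows, window size) and reduce per window.
    blocks = cropped.reshape(n_f // pool_f, pool_f, n_t // pool_t, pool_t)
    return blocks.max(axis=(1, 3))


# Hypothetical window shapes (freq, time); the paper's actual values are not stated here.
POOL_SHAPES = [(2, 2), (4, 1), (1, 4)]


def multi_resolution_pool(x, pool_shapes=POOL_SHAPES):
    """Return one pooled map per window shape (illustrative, not the authors' layer)."""
    return [max_pool2d(x, pf, pt) for pf, pt in pool_shapes]


if __name__ == "__main__":
    feature_map = np.arange(32, dtype=float).reshape(4, 8)  # toy 4 bands x 8 frames
    for shape, pooled in zip(POOL_SHAPES, multi_resolution_pool(feature_map)):
        print(shape, "->", pooled.shape)
```

A window of (4, 1) compresses frequency and keeps every frame, while (1, 4) does the opposite; how such branches are combined inside the proposed network is described in the full text, not in this abstract.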

Acoustic scene classification based on multi-resolution time-frequency feature fusion
YAO Kun, YANG Jibin, ZHANG Xiongwei, ZHENG Changyan, SUN Meng. Acoustic scene classification based on multi-resolution time-frequency feature fusion [J]. Technical Acoustics, 2020, 39(4): 494-500.
Authors:YAO Kun  YANG Jibin  ZHANG Xiongwei  ZHENG Changyan  SUN Meng
Affiliation:Army Engineering University, Nanjing 210007, Jiangsu, China
Abstract: Acoustic scene classification is one of the most difficult tasks in computer audition. With a single feature, a basic convolutional neural network improves on traditional classifiers but still falls short of good classification performance. To address this problem, this paper proposes an acoustic scene classification scheme based on time-frequency feature fusion and a multi-resolution convolutional neural network. In the model design, a multi-resolution pooling scheme is adopted to construct a multi-resolution convolutional neural network that better adapts to the time-frequency structure of the extracted features. In the feature extraction, log Mel-band energies (a low-level envelope feature) and the non-negative matrix factorization coefficient matrix (a high-level structural feature) are stacked into a three-dimensional feature that is fed to the classification model. Training and testing are carried out on the development datasets of the 2017 and 2018 Detection and Classification of Acoustic Scenes and Events (DCASE) challenges. The experimental results show that the classification accuracy of the proposed scheme is 7.5% and 10.3% higher, respectively, than that of the baseline system.
Keywords:acoustic scene classification  multi-resolution convolutional neural network  time-frequency feature fusion  time-frequency structure  non-negative matrix factorization
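
The feature fusion described in the abstract, stacking two two-dimensional features into one three-dimensional input, can be sketched as follows. This is a minimal NumPy illustration under several assumptions that the abstract does not confirm: the NMF is computed per clip on the Mel energies with multiplicative updates, the number of NMF components equals the number of Mel bands so that both maps share a shape, and the two maps are stacked along a trailing channel axis. The function names `nmf_activations` and `stack_features` are hypothetical.

```python
import numpy as np


def nmf_activations(V, n_components, n_iter=200, seed=0, eps=1e-10):
    """Multiplicative-update NMF (Euclidean cost), V ~= W @ H; returns H.

    Assumption: a per-clip factorization; the paper may instead learn the
    dictionary W on training data.
    """
    rng = np.random.default_rng(seed)
    n_bins, n_frames = V.shape
    W = rng.random((n_bins, n_components)) + eps
    H = rng.random((n_components, n_frames)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return H


def stack_features(mel_energy, n_iter=200):
    """Stack log Mel energies and NMF coefficients into (bands, frames, 2)."""
    log_mel = np.log(mel_energy + 1e-10)  # low-level envelope feature
    # Assumption: as many components as Mel bands, so both maps have equal shape.
    coeffs = nmf_activations(mel_energy, mel_energy.shape[0], n_iter=n_iter)
    return np.stack([log_mel, coeffs], axis=-1)


if __name__ == "__main__":
    # Random non-negative stand-in for a 40-band, 100-frame Mel energy map.
    mel = np.random.default_rng(1).random((40, 100))
    print(stack_features(mel, n_iter=50).shape)
```

The stacked array has the same layout as a two-channel image, which is what allows a convolutional network to treat both features as one input.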
This article is indexed in CNKI and other databases.