
Acoustic scene classification based on multi-resolution time-frequency feature fusion (基于多分辨率时频特征融合的声学场景分类)

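Cite this article:	姚琨, 杨吉斌, 张雄伟, 郑昌艳, 孙蒙. Acoustic scene classification based on multi-resolution time-frequency feature fusion [J]. 声学技术 (Technical Acoustics), 2020, 39(4): 494-500.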

Authors:	姚琨 (YAO Kun), 杨吉斌 (YANG Jibin), 张雄伟 (ZHANG Xiongwei), 郑昌艳 (ZHENG Changyan), 孙蒙 (SUN Meng)

Affiliation:	陆军工程大学 (Army Engineering University), Nanjing 210007, Jiangsu, China

Funding:	Supported by the National Natural Science Foundation of China (61471394) and the Jiangsu Province Outstanding Youth Fund (BK20180080).

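Abstract:	Acoustic scene classification is one of the most difficult tasks in computer audition. With a single feature, a basic convolutional neural network already improves accuracy over traditional classification methods, but the results remain unsatisfactory. To address this problem, an acoustic scene classification scheme based on time-frequency feature fusion is proposed within the convolutional neural network framework. For model construction, a multi-resolution convolution and pooling scheme is proposed to build a multi-resolution convolutional neural network that better fits the time-frequency structure of the extracted features. For feature selection, a low-level envelope feature (log Mel-band energies) is fused with a high-level structural feature (the non-negative matrix factorization coefficient matrix): the two two-dimensional features are stacked into a three-dimensional feature and fed into the classification model. Training and testing were carried out on the development datasets of the 2017 and 2018 Detection and Classification of Acoustic Scenes and Events (DCASE) challenges. The experimental results show that the proposed scheme improves classification accuracy over the baseline system by 7.5% and 10.3%, respectively, and effectively improves classification performance.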
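Keywords:	acoustic scene classification; multi-resolution convolutional neural network; time-frequency feature fusion; time-frequency structure; non-negative matrix factorization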
Received:	2019-03-03
Revised:	2019-04-30
Acoustic scene classification based on multi-resolution time-frequency feature fusion

YAO Kun, YANG Jibin, ZHANG Xiongwei, ZHENG Changyan, SUN Meng. Acoustic scene classification based on multi-resolution time-frequency feature fusion [J]. Technical Acoustics, 2020, 39(4): 494-500.

Authors:	YAO Kun, YANG Jibin, ZHANG Xiongwei, ZHENG Changyan, SUN Meng

Affiliation:	Army Engineering University, Nanjing 210007, Jiangsu, China

Abstract:	Acoustic scene classification is one of the most difficult tasks in computer audition. With a single feature and a basic convolutional neural network (CNN) structure, good classification performance is difficult to achieve. To address this problem, this paper proposes an acoustic scene classification scheme based on time-frequency feature fusion and a multi-resolution CNN. In the model design, a multi-resolution pooling scheme is adopted to construct a multi-resolution CNN that better adapts to the time-frequency structure of the extracted features. In the feature extraction, log Mel-band energies (a low-level envelope feature) and the non-negative matrix factorization (NMF) coefficient matrix (a high-level structural feature) are fused into a three-dimensional feature that is input to the classification model. Training and testing are carried out on the development datasets of the 2017 and 2018 acoustic scene classification and event detection challenges. The experimental results show that the classification accuracy of the proposed scheme is 7.5% and 10.3% higher, respectively, than that of the baseline system.

Keywords:	acoustic scene classification; multi-resolution convolutional neural network; time-frequency feature fusion; time-frequency structure; non-negative matrix factorization
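
The feature fusion step described in the abstract (stacking log Mel-band energies and an NMF coefficient matrix into one three-dimensional input) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract does not give the filterbank design, the NMF algorithm, the NMF rank, or the stacking axis, so everything below is an assumption made for illustration: the function names `mel_filterbank`, `nmf`, and `fused_features` are hypothetical, the Mel formula is the common 2595·log10(1 + f/700) variant, NMF uses Lee-Seung multiplicative updates, and the NMF rank is set equal to the number of Mel bands so that the two feature maps have the same shape.

```python
import numpy as np

def mel_filterbank(n_mels, n_fft_bins, sr):
    """Triangular Mel filters, shape (n_mels, n_fft_bins). Assumed design."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    # n_mels + 2 edge frequencies, equally spaced on the Mel scale
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_mels + 2))
    bin_hz = np.linspace(0.0, sr / 2.0, n_fft_bins)
    fb = np.zeros((n_mels, n_fft_bins))
    for i in range(n_mels):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (bin_hz - lo) / (mid - lo)
        falling = (hi - bin_hz) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def nmf(V, rank, n_iter=200, eps=1e-10, seed=0):
    """Factor a non-negative V (freq x frames) as W @ H.

    Uses multiplicative updates for the Euclidean cost; this algorithm
    choice is an assumption, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def fused_features(power_spec, sr, n_bands=40):
    """Stack two 2-D features into one 3-D array (n_bands, n_frames, 2)."""
    fb = mel_filterbank(n_bands, power_spec.shape[0], sr)
    log_mel = np.log(fb @ power_spec + 1e-10)   # low-level envelope feature
    # Assumption: rank == n_bands so both maps share the same shape.
    _, H = nmf(power_spec, rank=n_bands)        # high-level structural feature
    return np.stack([log_mel, H], axis=-1)

# Usage with a random stand-in for a power spectrogram (257 bins, 100 frames).
spec = np.random.default_rng(0).random((257, 100))
features = fused_features(spec, sr=44100, n_bands=40)
print(features.shape)  # (40, 100, 2)
```

The two channels have very different value ranges (log energies versus non-negative activations), so in practice each channel would probably be normalized before being fed to a CNN; that step is omitted here because the abstract does not describe it.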
This article is indexed in CNKI and other databases.