Deep Feature Fusion of Multi-Dimensional Neural Network for Bird Call Recognition


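Cite this article:	JI Xunsheng, JIANG Kun, XIE Jie. Deep Feature Fusion of Multi-Dimensional Neural Network for Bird Call Recognition[J]. Journal of Signal Processing (信号处理), 2022, 38(4): 844-853.

Authors:	Ji Xunsheng, Jiang Kun, Xie Jie

Affiliation:	1. School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122

Funding:	National Natural Science Foundation of China (61902154); Fundamental Research Funds for the Central Universities (JUSRP11924); Natural Science Foundation of Jiangsu Province (BK2019043526); Jiangsu Provincial Key Research and Development Program, Modern Agriculture (BE2018334)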

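Abstract:	To further improve the accuracy of bird call monitoring during nocturnal migration, this paper proposes a bird call recognition algorithm based on deep feature fusion of multi-dimensional neural networks. First, log-scaled Mel spectrograms of bird calls are extracted as the training features of a VGG Style model to enhance the energy distribution of the time-frequency spectrogram, and Mix up data mixing is used to generate virtual data and reduce overfitting. The pre-trained VGG Style model is then used as a feature extractor to obtain deep features for each bird call segment. Given the complementarity of models of different dimensions, the paper proposes using 1D CNN-LSTM, 2D VGG Style, and 3D DenseNet121 models as feature extractors to generate high-level features. For the 1D CNN-LSTM, wavelet decomposition serves as the pooling method: a 9-level wavelet decomposition is applied to the time domain and the frequency domain of each bird call separately, producing multi-level LBP features that capture richer time-frequency information. Finally, the fully connected layers of CNN-LSTM and DenseNet121 are optimized to reduce the number of model parameters and improve real-time performance. Experimental results show that, by fusing the deep features of the multi-dimensional neural networks and using a shallow classifier, the method achieves a balanced accuracy of 93.89% on the CLO-43SD dataset of 43 bird species, 7.58% higher than the recent Mel-VGG and Subnet-CNN fusion model.
Keywords:	bird call recognition; 1D CNN-LSTM; 2D VGG Style; 3D DenseNet121; deep feature fusion
Received:	2021-08-03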
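
The Mix up step named in the abstract can be illustrated with a minimal sketch. It assumes the standard mixup formulation (a convex combination of two training examples and their one-hot labels, weighted by a Beta-distributed coefficient). The function name `mixup_pair`, the flat-list feature layout, and the default `alpha=0.2` are hypothetical choices for illustration and are not taken from the paper.

```python
import random

def mixup_pair(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Blend two examples into one virtual example.

    x1, x2: flat feature lists of equal length (e.g. a flattened spectrogram).
    y1, y2: one-hot label lists of equal length.
    alpha:  Beta distribution parameter (assumed value, not from the paper).
    """
    # One mixing weight per pair, drawn from Beta(alpha, alpha).
    lam = rng.betavariate(alpha, alpha)
    # Apply the same weight to features and labels.
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam
```

Because features and labels share one weight, the mixed label remains a valid probability vector, which is what lets the virtual examples act as a regularizer during training.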
Deep Feature Fusion of Multi-Dimensional Neural Network for Bird Call Recognition

Affiliation:	1. School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China; 2. Key Laboratory of Advanced Process Control for Light Industry (Ministry of Education), Jiangnan University, Wuxi, Jiangsu 214122, China; 3. Jiangsu Key Laboratory of Advanced Food Manufacturing Equipment and Technology, Jiangnan University, Wuxi, Jiangsu 214122, China

Abstract:	To improve the accuracy of bird sound monitoring during night migration, this paper proposes a deep feature fusion system of multi-dimensional neural networks for bird sound classification. First, we propose an improved VGG Style model that uses log-scaled Mel spectrograms as training features to enhance the energy distribution of the spectrogram and generates virtual data by mix up to reduce model overfitting. The pre-trained VGG Style model is then used to generate deep features for each bird sound. In view of the complementarity of models of different dimensions, 1D CNN-LSTM, 2D VGG Style, and 3D DenseNet121 are employed as feature extractors to generate high-level features. For the 1D CNN-LSTM, wavelet decomposition is used as the pooling method to extract multi-level LBP features from the time domain and the frequency domain separately as training input, in order to obtain richer time-frequency information. In addition, the fully connected layers of CNN-LSTM and DenseNet121 are optimized to reduce model parameters and improve real-time performance. Finally, the deep features of the three models are fused and fed to a K-nearest neighbor classifier, which achieves a balanced accuracy of 93.89% on the public CLO-43SD dataset of 5428 flight calls spanning 43 species, exceeding the latest fusion of Mel-VGG and Subnet-CNN by 7.58%.
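
The final stage described above (fuse the deep features of the three models, classify with K-nearest neighbor, report balanced accuracy) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: fusion is assumed to be plain concatenation, distance is assumed Euclidean, and the helper names `fuse`, `knn_predict`, and `balanced_accuracy`, along with `k=3`, are hypothetical.

```python
import math
from collections import Counter, defaultdict

def fuse(*feature_vectors):
    """Concatenate per-model feature vectors (assumed fusion rule)."""
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training vectors nearest to the query."""
    # Rank training indices by Euclidean distance to the query vector.
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so rare classes weigh as much as common ones."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)
```

Balanced accuracy averages recall over classes instead of over samples, which matters on a dataset where species have very different numbers of recordings: a classifier that ignores a rare species is penalized as heavily as one that ignores a common species.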

