
Binaural localization algorithm based on deep learning

Cite this article:	SONG Hao, LIU Xuejie, YU Shengfeng, ZHONG Xiaoli. Binaural localization algorithm based on deep learning[J]. Technical Acoustics, 2022, 41(4): 602-607

Authors:	SONG Hao, LIU Xuejie, YU Shengfeng, ZHONG Xiaoli

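Affiliations:	School of Management, Guangdong University of Technology, Guangzhou 510000, Guangdong, China; School of Physics and Telecommunication Engineering, South China Normal University, Guangzhou 510006, Guangdong, China; School of Physics and Optoelectronics, South China University of Technology, Guangzhou 510640, Guangdong, China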

Funding:	Natural Science Foundation of Guangdong Province (2021A1515011871, 2021A1515012630)

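Abstract:	To address the problem that multiple localization cues are intricately correlated and difficult to extract accurately, a deep-learning-based binaural sound source localization algorithm that takes the complete binaural signals as input is proposed. First, a deep fully connected back-propagation neural network (D-BPNN) and a convolutional neural network (CNN) are each used to implement the deep learning framework. Next, the models are trained on binaural signals at horizontal-plane angular spacings of 15°, 30°, and 45°. Finally, the front-back confusion rate, localization accuracy, and training time are used to assess the effectiveness of the algorithm. The prediction results show that the front-back confusion rate of the CNN model is far lower than that of the D-BPNN model. The localization accuracy of the D-BPNN model exceeds 87%, while that of the CNN model reaches about 98%. Under the same experimental conditions, the CNN model takes longer to train than the D-BPNN model, and the gap in training time becomes more pronounced as the horizontal-plane angular spacing decreases.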
Keywords:	binaural sound source localization; deep learning; convolutional neural network
Received:	2021-03-01
Revised:	2021-05-04
Binaural localization algorithm based on deep learning

SONG Hao, LIU Xuejie, YU Shengfeng, ZHONG Xiaoli. Binaural localization algorithm based on deep learning[J]. Technical Acoustics, 2022, 41(4): 602-607

Authors:	SONG Hao, LIU Xuejie, YU Shengfeng, ZHONG Xiaoli

Affiliation:	School of Management, Guangdong University of Technology, Guangzhou 510000, Guangdong, China; School of Physics and Telecommunication Engineering, South China Normal University, Guangzhou 510006, Guangdong, China; School of Physics and Optoelectronics, South China University of Technology, Guangzhou 510640, Guangdong, China

Abstract:	Because the multiple localization cues are related in complicated ways and are therefore hard to extract accurately, a deep-learning-based binaural sound source localization algorithm that takes the complete binaural signals as input is proposed. First, a deep fully connected back-propagation neural network (D-BPNN) and a convolutional neural network (CNN) are each used to implement the deep learning framework. Then, binaural signals with uniform azimuthal spacings of 15°, 30°, and 45° in the horizontal plane are used for model training. Finally, the front-back confusion rate, localization accuracy, and training time are used to evaluate the effectiveness of the models. The prediction results show that the front-back confusion rate of the CNN model is much lower than that of the D-BPNN model. The localization accuracy of the D-BPNN model exceeds 87%, while that of the CNN model is about 98%. Under the same experimental conditions, the CNN model takes longer to train than the D-BPNN model, and this difference in training time becomes more pronounced as the azimuthal spacing in the horizontal plane decreases.

Keywords:	binaural localization algorithm; deep learning; convolutional neural network (CNN)
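
The abstract names two evaluation metrics, localization accuracy and front-back confusion rate, without giving their formulas. The sketch below shows one common way such metrics could be computed for azimuth classes on a uniform horizontal-plane grid. Everything in it is an assumption rather than the authors' definition: the angle convention (0° = front, 90° = right, 180° = back, 270° = left), the rule that a front-back confusion is a prediction landing exactly on the mirror image of the true azimuth about the interaural axis, the choice to divide both counts by the total number of trials, and the function names `azimuth_grid`, `mirror_front_back`, and `evaluate`.

```python
def azimuth_grid(spacing_deg):
    """Return the azimuth classes (degrees) of a uniform horizontal-plane grid."""
    if spacing_deg <= 0 or 360 % spacing_deg != 0:
        raise ValueError("spacing must be a positive divisor of 360")
    return list(range(0, 360, spacing_deg))


def mirror_front_back(azimuth_deg):
    """Reflect an azimuth about the interaural (left-right) axis.

    Assumed convention: 0 = front, 90 = right, 180 = back, 270 = left.
    The lateral directions 90 and 270 map onto themselves.
    """
    return (180 - azimuth_deg) % 360


def evaluate(true_az, pred_az):
    """Return (accuracy, front_back_confusion_rate) over paired azimuth labels.

    Assumption: both rates are fractions of all trials, and a confusion is
    a wrong prediction that equals the mirrored true azimuth.
    """
    if len(true_az) != len(pred_az) or not true_az:
        raise ValueError("need two non-empty sequences of equal length")
    correct = 0
    confused = 0
    for t, p in zip(true_az, pred_az):
        t, p = t % 360, p % 360
        if p == t:
            correct += 1
        elif p == mirror_front_back(t):
            confused += 1
    n = len(true_az)
    return correct / n, confused / n


if __name__ == "__main__":
    for spacing in (15, 30, 45):
        print(spacing, "deg spacing ->", len(azimuth_grid(spacing)), "classes")
    print(evaluate([0, 30, 90, 180], [0, 150, 90, 45]))
```

Under these assumptions, the three grid spacings mentioned in the abstract correspond to 24, 12, and 8 azimuth classes. Other definitions of front-back confusion exist (for example, counting any prediction in the wrong front or back hemisphere), and they would yield different rates.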
