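Sound Event Detection Based on a CRNN with Multi-Scale Attention Feature Fusion
Cite this article: LIU Yaling, GUO Min, MA Miao. Sound event detection based on CRNN with multi-scale attention feature fusion[J]. Journal of Optoelectronics·Laser, 2021, 32(12): 1271-1277.
Authors: LIU Yaling, GUO Min, MA Miao
Affiliation: School of Computer Science, Shaanxi Normal University, Xi'an, Shaanxi 710119, China
Funding: Supported by the National Natural Science Foundation of China (61877038) and the Fundamental Research Funds for the Central Universities of Shaanxi Normal University (GK202105006)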
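Abstract: In sound event detection, attention mechanisms are usually applied only along the time-frequency dimensions, and relying on a single convolutional layer structure leads to insufficient feature extraction. To address these limitations, this paper proposes a convolutional recurrent neural network (CRNN) model with multi-scale attention feature fusion to improve sound event detection performance. First, a multi-scale attention module is proposed that attends, at multiple scales, to local time-frequency units and to global channel features, improving the model's feature selection ability. Second, a multi-scale feature fusion method is proposed that fuses multi-scale attention features carrying rich contextual information, improving the model's feature representation ability. Finally, a bidirectional gated recurrent unit layer models temporal dependencies, and a fully connected layer classifies sound events frame by frame. In addition, data balancing is used to further improve the model's generalization. Experimental results on a subset of AudioSet show that, compared with the baseline CRNN, the proposed model reduces the error rate (ER) on the evaluation set by 11% and increases the F1-score (F1) by 8.3%, effectively improving sound event detection performance.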
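
The abstract does not say how the multi-scale attention module or the fusion step is built, so the following is only a toy, parameter-free sketch of the general idea, not the authors' method. It assumes (all hypothetical): local time-frequency attention is a sigmoid mask computed from box-filtered, channel-averaged maps at window sizes 1, 3 and 5; global channel attention is one sigmoid weight per channel derived from that channel's global mean (similar in spirit to squeeze-and-excitation, but without its learned layers); and fusion is an element-wise average of all branches. A trained model would learn these weights with convolutional and fully connected layers instead.

```python
import math
from typing import List, Sequence

# Feature map indexed as [channel][time][frequency].
FeatureMap = List[List[List[float]]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def box_mean(plane: List[List[float]], t: int, f: int, k: int) -> float:
    """Mean of a k x k window centred on (t, f), clipped at the map borders."""
    r = k // 2
    rows = range(max(0, t - r), min(len(plane), t + r + 1))
    cols = range(max(0, f - r), min(len(plane[0]), f + r + 1))
    window = [plane[i][j] for i in rows for j in cols]
    return sum(window) / len(window)


def local_tf_attention(x: FeatureMap, k: int) -> FeatureMap:
    """Hypothetical local attention at scale k: one T-F mask shared by all channels."""
    n_c, n_t, n_f = len(x), len(x[0]), len(x[0][0])
    # Average over channels to obtain a single time-frequency map.
    tf_map = [[sum(x[c][t][f] for c in range(n_c)) / n_c for f in range(n_f)]
              for t in range(n_t)]
    # The k x k context around each T-F unit decides how strongly it is kept.
    mask = [[sigmoid(box_mean(tf_map, t, f, k)) for f in range(n_f)]
            for t in range(n_t)]
    return [[[x[c][t][f] * mask[t][f] for f in range(n_f)] for t in range(n_t)]
            for c in range(n_c)]


def global_channel_attention(x: FeatureMap) -> FeatureMap:
    """Hypothetical global attention: one weight per channel from its global mean."""
    weights = [sigmoid(sum(sum(row) for row in ch) / (len(ch) * len(ch[0])))
               for ch in x]
    return [[[v * w for v in row] for row in ch] for ch, w in zip(x, weights)]


def multi_scale_attention(x: FeatureMap,
                          scales: Sequence[int] = (1, 3, 5)) -> FeatureMap:
    """Run every attention branch and fuse them by element-wise averaging."""
    branches = [local_tf_attention(x, k) for k in scales]
    branches.append(global_channel_attention(x))
    n_c, n_t, n_f = len(x), len(x[0]), len(x[0][0])
    return [[[sum(b[c][t][f] for b in branches) / len(branches)
              for f in range(n_f)] for t in range(n_t)] for c in range(n_c)]
```

The output keeps the input shape (channels x frames x frequency bins), so a module like this can sit between convolutional blocks before the recurrent layer. Averaging is the simplest possible fusion; concatenation followed by a learned projection is a common alternative, and which one the paper uses is not stated in the abstract.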

Keywords: sound event detection; multi-scale feature fusion; attention mechanism; data balancing
Received: 2021-04-13

CRNN with multi-scale attention feature fusion for sound event detection
LIU Yaling, GUO Min and MA Miao. CRNN with multi-scale attention feature fusion for sound event detection[J]. Journal of Optoelectronics·Laser, 2021, 32(12): 1271-1277.
Authors: LIU Yaling, GUO Min and MA Miao
Abstract: In sound event detection, attention mechanisms are usually applied only along the time-frequency dimensions, and relying on a single convolutional layer structure leads to insufficient feature extraction. To address these limitations, a convolutional recurrent neural network (CRNN) model with multi-scale attention feature fusion is proposed to improve sound event detection performance. First, a multi-scale attention module is proposed, which applies multi-scale attention to local time-frequency units and global channel features and improves the feature selection ability of the network. Second, a multi-scale feature fusion method is proposed, which fuses multi-scale attention features carrying rich contextual information and improves the feature representation ability of the network. Finally, a bidirectional gated recurrent unit layer models temporal dependencies, and a fully connected layer classifies sound events frame by frame. In addition, data balancing is used to improve the generalization of the model. Experimental results on a subset of AudioSet show that, compared with the baseline CRNN, the proposed model reduces the error rate (ER) on the evaluation set by 11% and increases the F1-score (F1) by 8.3%, effectively improving sound event detection performance.
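
ER and F1 are the standard sound event detection metrics. As a reference point, the sketch below computes them segment-based, as they are commonly defined for DCASE-style evaluation (for example in the sed_eval toolbox): in each segment, substitutions S = min(FN, FP), deletions D = max(0, FN - FP) and insertions I = max(0, FP - FN); then ER = (ΣS + ΣD + ΣI) / ΣN, where N is the number of active reference labels, and F1 = 2·ΣTP / (2·ΣTP + ΣFP + ΣFN). The abstract does not state whether the paper scores segment-based or event-based, nor the segment length, so this choice is an assumption; the function name `segment_er_f1` and its input format (one set of active labels per segment) are hypothetical.

```python
from typing import List, Set, Tuple


def segment_er_f1(reference: List[Set[str]],
                  estimated: List[Set[str]]) -> Tuple[float, float]:
    """Segment-based error rate and F1-score.

    reference[k] and estimated[k] hold the event labels active in segment k.
    """
    if len(reference) != len(estimated):
        raise ValueError("need one reference and one estimate per segment")
    tp = fp = fn = 0
    subs = dels = ins = n_ref = 0
    for ref, est in zip(reference, estimated):
        seg_tp = len(ref & est)  # correctly detected labels
        seg_fp = len(est - ref)  # labels output but not present
        seg_fn = len(ref - est)  # labels present but missed
        tp, fp, fn = tp + seg_tp, fp + seg_fp, fn + seg_fn
        # A miss paired with a false alarm in the same segment counts as one
        # substitution; whatever is left over is a deletion or an insertion.
        subs += min(seg_fn, seg_fp)
        dels += max(0, seg_fn - seg_fp)
        ins += max(0, seg_fp - seg_fn)
        n_ref += len(ref)
    if n_ref == 0:
        raise ValueError("ER is undefined when no reference events are active")
    er = (subs + dels + ins) / n_ref
    f1 = 2 * tp / (2 * tp + fp + fn)
    return er, f1


reference = [{"speech"}, {"speech", "dog"}, set(), {"car"}]
estimated = [{"speech"}, {"speech"}, {"dog"}, {"bus"}]
er, f1 = segment_er_f1(reference, estimated)
print(f"ER = {er:.2f}, F1 = {f1:.2f}")  # ER = 0.75, F1 = 0.50
```

Lower ER is better, and it can exceed 1.0 when a system produces many insertions; higher F1 is better. In the example, one deletion ("dog" missed), one insertion ("dog" falsely detected) and one substitution ("car" read as "bus") over four reference labels give ER = 3/4.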
Keywords: sound event detection; multi-scale feature fusion; attention mechanism; data balancing