
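Detection algorithm for audio scene sound replacement forgery based on ResNet
Cite this article: Mingyu DONG, Diqun YAN. Detection algorithm of audio scene sound replacement falsification based on ResNet[J]. Journal of Computer Applications (计算机应用), 2022, 42(6): 1724-1728. DOI: 10.11772/j.issn.1001-9081.2021061432
Authors: Mingyu DONG (董明宇)  Diqun YAN (严迪群)
Affiliation: School of Information Science and Engineering, Ningbo University, Ningbo, Zhejiang 315211
Southeast Digital Economic Development Institute, Quzhou, Zhejiang 324000
Funding: National Natural Science Foundation of China (U1736215, 61901237); Natural Science Foundation of Zhejiang Province (LY20F020010, LY17F020010); Natural Science Foundation of Ningbo (202003N4089)
Abstract: To address the detection of forged samples produced by audio scene sound replacement, a forgery that is cheap to carry out and hard to notice, a ResNet-based forged sample detection algorithm is proposed. The algorithm first extracts the Constant Q Cepstral Coefficient (CQCC) features of the audio. A Residual Network (ResNet) structure then learns the input features, combining the network's multi-layer residual blocks with feature normalization, and finally outputs the classification result. On the TIMIT and Voicebank databases, the proposed algorithm reaches a detection accuracy of up to 100% and a false acceptance rate as low as 1.37%. In a real-world setting, when detecting audio recorded by multiple different recording devices that carries the devices' noise floor and the original scene sound, the algorithm reaches a detection accuracy of up to 99.27%. The experimental results show that, with a suitable model, using the CQCC features of audio to detect traces of scene replacement is effective.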

Keywords: audio forgery  audio scene sound replacement  residual network  constant Q cepstral coefficient
Received: 2021-08-10
Revised: 2021-11-10
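
The abstract reports two metrics, detection accuracy and false acceptance rate. The sketch below shows one common way to compute them for a binary genuine/forged classifier. The label convention (1 = genuine, 0 = forged), the function name `accuracy_and_far`, and the reading of false acceptance rate as the share of forged samples accepted as genuine are assumptions made for illustration; the abstract does not give the paper's exact definitions.

```python
def accuracy_and_far(labels, predictions):
    """Return (accuracy, false_acceptance_rate) for binary decisions.

    Assumed convention (not taken from the paper): 1 = genuine, 0 = forged.
    A false acceptance is a forged sample that the detector labels genuine.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    forged_total = sum(1 for y in labels if y == 0)
    false_accepts = sum(
        1 for y, p in zip(labels, predictions) if y == 0 and p == 1
    )

    accuracy = correct / len(labels)
    # With no forged samples the rate is undefined; report 0.0 in that case.
    far = false_accepts / forged_total if forged_total else 0.0
    return accuracy, far


if __name__ == "__main__":
    # Toy data: two genuine samples followed by four forged ones.
    y_true = [1, 1, 0, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 1, 0]
    acc, far = accuracy_and_far(y_true, y_pred)
    print(f"accuracy={acc:.4f}, FAR={far:.4f}")
```

Reporting both numbers matters because a detector can score high accuracy on an imbalanced test set while still accepting a noticeable share of forgeries.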

Detection algorithm of audio scene sound replacement falsification based on ResNet
Mingyu DONG,Diqun YAN. Detection algorithm of audio scene sound replacement falsification based on ResNet[J]. Journal of Computer Applications, 2022, 42(6): 1724-1728. DOI: 10.11772/j.issn.1001-9081.2021061432
Authors: Mingyu DONG  Diqun YAN
Affiliation: Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo, Zhejiang 315211, China
Southeast Digital Economic Development Institute, Quzhou, Zhejiang 324000, China
Abstract: A ResNet-based detection algorithm was proposed for forged samples produced by audio scene sound replacement, a forgery that is cheap to perform and hard to perceive. The Constant Q Cepstral Coefficient (CQCC) features of the audio were extracted first. The input features were then learnt by a Residual Network (ResNet) structure that combines multi-layer residual blocks with feature normalization, and the classification results were output. On the TIMIT and Voicebank databases, the proposed algorithm reached a detection accuracy of up to 100% and a false acceptance rate as low as 1.37%. In realistic scenes, when detecting audio recorded by three different recording devices, with the devices' background noise and the original scene sound present, the algorithm reached a detection accuracy of up to 99.27%. Experimental results show that, with a suitable model, the CQCC features of audio are effective for detecting traces of scene replacement.
Keywords: audio falsification  audio scene sound replacement  Residual Network (ResNet)  Constant Q Cepstral Coefficient (CQCC)
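
The abstract describes residual blocks combined with feature normalization but does not give the layer types, sizes, or depth. The following is a minimal sketch of the general residual idea only, output = ReLU(x + F(norm(x))), written in plain Python on a 1-D feature vector. The dense (fully connected) layers, the per-vector normalization, and all function names here are illustrative assumptions, not the paper's architecture, which presumably operates on 2-D CQCC feature maps.

```python
import math


def normalize(x, eps=1e-5):
    """Scale a feature vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def linear(weights, x):
    """Matrix-vector product; `weights` is a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def residual_block(x, w1, w2):
    """Hypothetical residual block: ReLU(x + W2 * ReLU(W1 * norm(x))).

    The skip connection adds the unmodified input back to the learned
    transformation, so the block only has to learn a correction to x.
    """
    hidden = relu(linear(w1, normalize(x)))
    transformed = linear(w2, hidden)
    return relu([a + b for a, b in zip(x, transformed)])


if __name__ == "__main__":
    features = [1.0, -2.0, 3.0]  # stand-in for one frame of CQCC features
    zero = [[0.0] * 3 for _ in range(3)]
    # With all-zero weights the learned branch contributes nothing,
    # so the block reduces to ReLU applied to the input.
    print(residual_block(features, zero, zero))
```

The skip connection is what lets many such blocks be stacked: if a block's weights contribute nothing, the input still passes through, which eases the training of deep networks.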