Detection algorithm of audio scene sound replacement falsification based on ResNet

Funding:	National Natural Science Foundation of China (U1736215, 61901237); Natural Science Foundation of Zhejiang Province (LY20F020010, LY17F020010); Natural Science Foundation of Ningbo (202003N4089)

Received:	2021-08-10
Revised:	2021-11-10

Mingyu DONG,Diqun YAN. Detection algorithm of audio scene sound replacement falsification based on ResNet[J]. Journal of Computer Applications, 2022, 42(6): 1724-1728. DOI: 10.11772/j.issn.1001-9081.2021061432

Authors:	Mingyu DONG, Diqun YAN

Affiliation:	Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo Zhejiang 315211, China; Southeast Digital Economic Development Institute, Quzhou Zhejiang 324000, China

Abstract:	A ResNet-based algorithm was proposed to detect forged samples produced by audio scene sound replacement, a form of forgery that is cheap to carry out and hard to notice. The algorithm first extracts Constant Q Cepstral Coefficient (CQCC) features from the audio; a Residual Network (ResNet), built from multiple layers of residual blocks combined with feature normalization, then learns from these features and outputs the classification result. On the TIMIT and Voicebank databases, the proposed algorithm reaches a detection accuracy of up to 100% and a false acceptance rate as low as 1.37%. In realistic scenarios, on audio recorded by three different recording devices that contains both the devices' background noise and the original scene sound, its detection accuracy reaches up to 99.27%. The experimental results show that, with a suitable model, the CQCC features of audio are effective for detecting traces of scene replacement.
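The abstract reports detection accuracy and a false acceptance rate (FAR) without defining FAR. A minimal sketch of both metrics follows, assuming the common anti-spoofing definition of FAR (the fraction of forged, scene-replaced clips that the detector accepts as genuine) and a hypothetical label convention of 1 = genuine, 0 = forged. The function name `detection_metrics` is illustrative and does not come from the paper.

```python
from typing import Dict, Sequence

# Hypothetical label convention for this sketch (not taken from the paper).
GENUINE, FORGED = 1, 0


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Return accuracy and false acceptance rate (FAR) for binary labels.

    FAR is assumed to be the share of forged samples that the detector
    accepts as genuine (the usual anti-spoofing definition).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    correct = sum(t == p for t, p in zip(y_true, y_pred))
    forged_total = sum(t == FORGED for t in y_true)
    # Forged samples wrongly accepted as genuine.
    false_accepts = sum(t == FORGED and p == GENUINE for t, p in zip(y_true, y_pred))

    return {
        "accuracy": correct / len(y_true),
        "far": false_accepts / forged_total if forged_total else 0.0,
    }


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the paper.
    truth = [1, 1, 0, 0, 0, 1, 0, 1]
    pred = [1, 1, 0, 1, 0, 1, 0, 1]
    print(detection_metrics(truth, pred))
```

On the toy labels, one of the four forged clips is accepted as genuine, so the sketch reports an accuracy of 0.875 and a FAR of 0.25; these numbers only illustrate the formulas and are not results from the paper.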

Keywords:	audio falsification; audio scene sound replacement; Residual Network (ResNet); Constant Q Cepstral Coefficient (CQCC)
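The keywords name CQCC as the input feature. Because the abstract gives no front-end settings, the following is only a plain-Python sketch of the standard CQCC chain (constant-Q transform, log power spectrum, uniform resampling, DCT) applied to a single frame. The bin layout (12 bins per octave starting at 100 Hz), the naive Hamming-windowed CQT kernels, linear-interpolation resampling and the 20 output coefficients are illustrative assumptions, not the authors' configuration; practical CQCC front ends use many more bins per octave and a fast CQT.

```python
import cmath
import math


def cqt_frame(x, fs, f_min, bins_per_octave, n_bins):
    """Naive constant-Q transform of one analysis frame (Brown-style kernels)."""
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)  # constant quality factor
    freqs, coeffs = [], []
    for k in range(n_bins):
        f_k = f_min * 2.0 ** (k / bins_per_octave)  # geometrically spaced centres
        n_k = math.ceil(q * fs / f_k)  # window shrinks as frequency rises
        if n_k > len(x):
            raise ValueError("frame is too short for the lowest CQT bin")
        acc = 0j
        for n in range(n_k):
            hamming = 0.54 - 0.46 * math.cos(2 * math.pi * n / (n_k - 1))
            acc += x[n] * hamming * cmath.exp(-2j * math.pi * q * n / n_k)
        freqs.append(f_k)
        coeffs.append(acc / n_k)
    return freqs, coeffs


def uniform_resample(freqs, values, n_points):
    """Linearly interpolate values sampled at increasing freqs onto a uniform grid."""
    lo, hi = freqs[0], freqs[-1]
    out, j = [], 0
    for i in range(n_points):
        f = lo + (hi - lo) * i / (n_points - 1)
        while j < len(freqs) - 2 and freqs[j + 1] < f:
            j += 1
        t = (f - freqs[j]) / (freqs[j + 1] - freqs[j])
        out.append(values[j] + t * (values[j + 1] - values[j]))
    return out


def dct2(v, n_coeffs):
    """Unnormalised DCT-II: c_p = sum_idx v[idx] * cos(pi * p * (idx + 1/2) / L)."""
    size = len(v)
    return [
        sum(v[idx] * math.cos(math.pi * p * (idx + 0.5) / size) for idx in range(size))
        for p in range(n_coeffs)
    ]


def cqcc_frame(x, fs, f_min=100.0, bins_per_octave=12, n_bins=72,
               n_uniform=128, n_coeffs=20):
    """CQT -> log power -> uniform resampling -> DCT, for a single frame."""
    freqs, coeffs = cqt_frame(x, fs, f_min, bins_per_octave, n_bins)
    log_power = [math.log(abs(c) ** 2 + 1e-12) for c in coeffs]  # floor avoids log(0)
    return dct2(uniform_resample(freqs, log_power, n_uniform), n_coeffs)


if __name__ == "__main__":
    fs = 16000
    tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(4096)]  # 1 kHz test tone
    print([round(c, 2) for c in cqcc_frame(tone, fs)[:5]])
```

Computed frame by frame and stacked over time, vectors like these are the kind of CQCC input the abstract describes feeding to the ResNet; that framing step is omitted here.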
