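Based on an analysis of how audio features change under tampering, a passive speech forensics method is proposed. The model is built on Mel-cepstral-domain parameters of speech, their dynamic (delta) features, and wavelet-domain statistical moment features. A support vector machine (SVM) is chosen as the classifier to find the optimal separating hyperplane, which enables blind forensics of the authenticity of suspicious speech signals. Experimental results show that the method achieves high detection accuracy for tampering operations that alter the authenticity of speech content, such as deletion, splicing and substitution of speech segments.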
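To make the wavelet-domain statistical moment features concrete, here is a minimal pure-Python sketch. It assumes a Haar wavelet, a three-level decomposition and the first four moments (mean, variance, skewness, kurtosis) of each subband. The abstract does not state the wavelet, the decomposition depth or the exact moment set, so all of these are illustrative choices.

```python
import math

def haar_dwt(x):
    """One-level orthonormal Haar DWT; returns (approximation, detail)."""
    if len(x) % 2:
        x = list(x) + [x[-1]]  # pad odd-length input by repeating the last sample
    s = 1 / math.sqrt(2)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail

def moments(c):
    """Mean, variance, skewness and kurtosis of a coefficient sequence."""
    n = len(c)
    mean = sum(c) / n
    var = sum((v - mean) ** 2 for v in c) / n
    if var == 0:
        return [mean, 0.0, 0.0, 0.0]  # constant subband: higher moments undefined, use 0
    sd = math.sqrt(var)
    skew = sum(((v - mean) / sd) ** 3 for v in c) / n
    kurt = sum(((v - mean) / sd) ** 4 for v in c) / n
    return [mean, var, skew, kurt]

def wavelet_moment_features(x, levels=3):
    """Moments of each detail subband, then of the final approximation band."""
    feats = []
    approx = list(x)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        feats.extend(moments(detail))
    feats.extend(moments(approx))
    return feats
```

In a full system this vector would be concatenated with the MFCC and delta features before SVM training.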

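To detect singing voice in popular music, an SVM classifier is trained on MFCC features and used for classification. Because audio features are temporally continuous, the classification results are then low-pass filtered as a post-processing step. Experimental results show that the frame-level recognition rate reaches 85.76%. The experiments also revealed large statistical differences in pronunciation, particularly in MFCC features, between singers of different languages. The song classification results can serve as one basis for further work on music similarity measurement.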
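The low-pass post-processing step can be illustrated with a centered moving average over the per-frame SVM outputs, followed by re-thresholding. The window width of 5 frames and the 0.5 threshold are assumptions for illustration, not values taken from the paper.

```python
def smooth_decisions(scores, width=5, threshold=0.5):
    """Centered moving average over per-frame scores, then threshold.

    scores: per-frame voice probabilities or 0/1 SVM decisions.
    Frames near the edges average over the available neighbors only.
    """
    half = width // 2
    n = len(scores)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        window = scores[lo:hi]
        smoothed.append(sum(window) / len(window))
    return [1 if s >= threshold else 0 for s in smoothed]
```

For example, an isolated 0 inside a run of voiced frames is filled in, and an isolated 1 in a non-vocal region is removed, which is the kind of correction that temporal continuity justifies.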

Lin Xiaodan, Qiu Yingqiang. Journal of Computer Applications, 2019, 39(12): 3510-3514
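Pitch shifting is often used to disguise speaker identity, and the availability of voice-changing software has made speaker disguise even easier. Existing pitch-shift detection methods cannot determine which pitch-shift operation (raising or lowering) has been applied to speech. To address this, the traces that pitch shifting leaves in the signal spectrum, especially in the high-frequency region, are analyzed, and an electronic pitch-shifted speech detection method based on statistical moment features of Inverted Mel-Frequency Cepstral Coefficients (IMFCC) is proposed. First, the IMFCC and their first-order differences are extracted from each speech frame; then their statistical means are computed; finally, a multi-class support vector machine (SVM) is designed on these statistical features to distinguish original, pitch-raised and pitch-lowered speech. Experimental results on the TIMIT and NIST speech corpora show that the proposed method performs well on original, pitch-raised and pitch-lowered speech alike. Compared with a baseline system built on MFCC features, the designed features clearly improve the recognition rate of pitch-shift operations. With limited training resources, the proposed method also outperforms a convolutional neural network (CNN) based framework, and it generalizes well across different datasets and pitch-shifting methods.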
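The statistical feature described above (the mean of the IMFCCs and of their first-order differences) can be sketched as follows. The IMFCC matrix is assumed to be computed upstream. The delta uses the common regression formula with N = 2 and edge-frame replication, which is an assumption rather than a detail given in the abstract.

```python
def deltas(frames, N=2):
    """First-order regression deltas of a list of per-frame feature vectors:
    d_t = sum_n n * (c_{t+n} - c_{t-n}) / (2 * sum_n n^2).
    Edge frames are replicated so every frame has N neighbors on each side."""
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        d = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, N + 1):
                nxt = frames[min(T - 1, t + n)][k]
                prv = frames[max(0, t - n)][k]
                acc += n * (nxt - prv)
            d.append(acc / denom)
        out.append(d)
    return out

def mean_vector(frames):
    """Per-coefficient mean over all frames."""
    T = len(frames)
    return [sum(f[k] for f in frames) / T for k in range(len(frames[0]))]

def imfcc_mean_features(imfcc_frames):
    """Concatenate the per-coefficient means of the IMFCCs and of their deltas."""
    return mean_vector(imfcc_frames) + mean_vector(deltas(imfcc_frames))
```

The resulting fixed-length vector, one per utterance, is what a three-class SVM (original, raised, lowered) would be trained on.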

Lin Lang, Wang Rangding, Yan Diqun, Li Can. Journal of Computer Applications, 2018, 38(6): 1648-1652
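With the development of speech technology, spoofed speech, exemplified by replayed speech, poses a major challenge to voiceprint authentication systems and audio forensics. To counter replay attacks on voiceprint authentication systems, a detection algorithm based on modified cepstral features is proposed. First, the coefficient of variation is used to analyze the frequency-domain differences between original and replayed speech. Then, the Mel filter bank used in extracting Mel-Frequency Cepstral Coefficients (MFCC) is replaced with a new filter bank that combines linear filters and inverted Mel filters, yielding modified cepstral features based on the new filter bank. Finally, a Gaussian Mixture Model (GMM) is used as the classifier. Experimental results show that the modified cepstral features can effectively detect replayed speech, with an equal error rate of about 3.45%.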
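The Mel, inverted-Mel and linear filter banks differ mainly in where their center frequencies are placed. The sketch below assumes the common Mel formula 2595 * log10(1 + f / 700) and an inverted bank obtained by mirroring the Mel bank in frequency. It computes the three sets of centers and the coefficient of variation used in the first step. How the paper merges the linear and inverted-Mel filters is not specified, so no combination rule is shown.

```python
import math

def hz_to_mel(f):
    """Common Mel formula: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)

def mel_centers(num_filters, fmax):
    """Center frequencies (Hz) of Mel-spaced filters on [0, fmax]: dense at low frequencies."""
    top = hz_to_mel(fmax)
    return [mel_to_hz(top * i / (num_filters + 1)) for i in range(1, num_filters + 1)]

def inverted_mel_centers(num_filters, fmax):
    """Mirror the Mel bank (f -> fmax - f): dense at high frequencies."""
    return sorted(fmax - c for c in mel_centers(num_filters, fmax))

def linear_centers(num_filters, fmax):
    """Uniformly spaced centers."""
    return [fmax * i / (num_filters + 1) for i in range(1, num_filters + 1)]

def coefficient_of_variation(values):
    """Standard deviation over mean of one frequency bin's magnitudes across frames."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return math.sqrt(var) / mean
```

The mirrored bank places its narrowest filters near fmax, which is why inverted-Mel features are sensitive to the high-frequency distortions that replay devices introduce.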

Wang Zhongmin, Liu Ge, Song Hui. Computer Engineering, 2019, 45(8): 248-254
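In speech emotion recognition, extracting Mel-Frequency Cepstral Coefficients (MFCC) loses spectral information, which leads to low recognition accuracy. To address this, a speech emotion recognition method combining MFCC and spectrogram features is proposed. MFCC features are extracted from the audio signal, the signal is converted into a spectrogram, and a convolutional neural network extracts image features from the spectrogram. A multiple kernel learning algorithm then fuses these audio features, and the resulting kernel function is used in a support vector machine for emotion classification. Experimental results on two speech emotion datasets show that, compared with single-feature classifiers, the method reaches a speech emotion recognition accuracy of up to 96%.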
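A multiple kernel learning system combines one kernel per feature type. The sketch below shows only the combination step: a convex, fixed-weight sum of RBF kernels on the MFCC block and on the CNN spectrogram block. Real MKL algorithms learn the weights; the fixed weights, the RBF form, the gamma values and the `split` layout of the feature vector are assumptions for illustration.

```python
import math

def rbf(x, y, gamma):
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def combined_kernel(x, y, split, weights=(0.5, 0.5), gammas=(1.0, 1.0)):
    """Convex combination of per-modality RBF kernels.

    x, y  : concatenated [MFCC features | CNN spectrogram features]
    split : index where the MFCC block ends (hypothetical layout)
    """
    assert abs(sum(weights) - 1.0) < 1e-9 and min(weights) >= 0
    k_mfcc = rbf(x[:split], y[:split], gammas[0])
    k_cnn = rbf(x[split:], y[split:], gammas[1])
    return weights[0] * k_mfcc + weights[1] * k_cnn

def gram_matrix(samples, **kw):
    """Full kernel matrix over a list of samples."""
    return [[combined_kernel(a, b, **kw) for b in samples] for a in samples]
```

A Gram matrix built this way can be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. Because each base kernel is positive semidefinite and the weights are non-negative, the combination is a valid kernel.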

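To automatically grade hypernasality in cleft palate speech, a study of wavelet processing and feature extraction for speech signals leads to an automatic hypernasality grading algorithm based on wavelet decomposition coefficient cepstral features. Existing research on cleft palate speech mostly relies on features such as MFCC, Teager energy and Shannon energy, which give low recognition accuracy and require heavy computation. In this paper, wavelet decomposition coefficient cepstral features are extracted from 1789 recordings of the vowel /a/ covering four grades of hypernasality, and a KNN classifier is used to recognize the four grades automatically. The results are compared with those of five classic acoustic features: MFCC, LPCC, pitch period, formants and short-time energy. An SVM classifier is also used for automatic grading and compared with the KNN classifier. Experimental results show that the wavelet decomposition coefficient cepstral features outperform the classic acoustic features, and that the KNN classifier outperforms the SVM classifier. The wavelet decomposition coefficient cepstral features reach a highest recognition rate of 91.67% with KNN and 87.60% with SVM, whereas the classic acoustic features achieve 21.69%-84.54% with KNN and 30.61%-78.24% with SVM.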
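The KNN decision rule used for grading can be sketched in a few lines. Euclidean distance, k = 3 and the nearest-first tie-break are assumptions, and the wavelet cepstral features are assumed to have been extracted beforehand.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training vectors (Euclidean distance).
    Ties are broken in favor of the label whose nearest member is closest."""
    ranked = sorted(zip((math.dist(x, query) for x in train_x), train_y),
                    key=lambda pair: pair[0])
    nearest = ranked[:k]
    votes = Counter(label for _, label in nearest)
    best = max(votes.values())
    for _, label in nearest:  # scanning nearest-first resolves ties
        if votes[label] == best:
            return label
```

KNN has no training phase beyond storing the labeled feature vectors, which keeps the computational cost low for a corpus of this size.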

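Stress is an indispensable part of spoken communication and plays a very important role in it. Using the ASCCD read-discourse corpus, this paper applies the MFCC algorithm to extract, for each speech segment, short-time spectral information concatenated from context-fused sub-segments, and builds a context-aware short-time spectral feature set based on MFCC. A Naive Bayes classifier models these feature sets and assigns each object to the class with the maximum posterior probability; this classification approach makes full use of the relevant phonetic properties of the current speech segment. The context-fused MFCC short-time spectral feature set achieves a Mandarin stress detection accuracy of 83.6% on ASCCD. The experimental results show that feature normalization based on concatenating context-fused sub-segments can be used in Mandarin stress detection research.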
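The two ideas in this abstract, splicing each segment's features with those of its neighbors and assigning the class with the maximum posterior probability, can be sketched as below. The Gaussian likelihood per feature, the one-segment context and the hypothetical class `GaussianNaiveBayes` are assumptions for illustration; the abstract does not specify which Naive Bayes variant was used.

```python
import math

def splice(segments, context=1):
    """Concatenate each segment's feature vector with those of its `context`
    neighbors on each side (edge segments are replicated)."""
    n = len(segments)
    return [sum((list(segments[min(n - 1, max(0, i + o))])
                 for o in range(-context, context + 1)), [])
            for i in range(n)]

class GaussianNaiveBayes:
    """Naive Bayes with one Gaussian per feature and class (hypothetical variant)."""

    def fit(self, X, y):
        self.stats = {}
        for c in set(y):
            rows = [x for x, label in zip(X, y) if label == c]
            cols = list(zip(*rows))
            means = [sum(col) / len(rows) for col in cols]
            # small variance floor keeps the log-likelihood finite
            variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-9
                         for col, m in zip(cols, means)]
            self.stats[c] = (math.log(len(rows) / len(X)), means, variances)
        return self

    def log_posterior(self, x, c):
        """Unnormalized log posterior: log prior + sum of Gaussian log-likelihoods."""
        log_prior, means, variances = self.stats[c]
        return log_prior + sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                               for xi, m, v in zip(x, means, variances))

    def predict(self, x):
        # maximum a posteriori decision
        return max(self.stats, key=lambda c: self.log_posterior(x, c))
```

The evidence term is omitted because it is the same for every class and does not change which class has the largest posterior.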

Color image retrieval based on multi-neighborhood statistical moment histograms
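A novel method based on multi-neighborhood statistical moment histograms (MNSMH) is proposed. In a quantized HSV color model, the method computes the statistical moments of each pixel over neighborhoods of different sizes and builds a normalized histogram for each neighborhood moment. These histograms, together with the color histogram, serve as the feature index for color image retrieval. The moments over different neighborhoods capture the spatial distribution of image colors, while their histograms, being global statistics over the whole image, are invariant to image translation, rotation and scaling. Experimental results show that the method performs stably and clearly improves the retrieval rate compared with two color-histogram-based methods.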
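A minimal sketch of the multi-neighborhood idea follows. It assumes the image has already been quantized to integer color indices and uses only the first moment (the neighborhood mean) at radii 1 and 2 with 8 histogram bins. The paper's exact moments, neighborhood sizes and bin counts are not given in the abstract, so these are illustrative choices.

```python
def neighborhood_mean(img, r):
    """Mean of quantized color indices in a (2r+1)x(2r+1) window, clipped at borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [img[a][b]
                    for a in range(max(0, i - r), min(h, i + r + 1))
                    for b in range(max(0, j - r), min(w, j + r + 1))]
            out[i][j] = sum(vals) / len(vals)
    return out

def normalized_histogram(values, bins, vmax):
    """Histogram of values in [0, vmax] with `bins` equal-width bins, summing to 1."""
    hist = [0] * bins
    for v in values:
        hist[min(bins - 1, int(v / vmax * bins))] += 1
    return [c / len(values) for c in hist]

def mnsmh_feature(img, num_colors, radii=(1, 2), bins=8):
    """Color histogram followed by one moment histogram per neighborhood radius.
    Requires num_colors >= 2."""
    flat = [v for row in img for v in row]
    feat = [flat.count(c) / len(flat) for c in range(num_colors)]
    vmax = num_colors - 1
    for r in radii:
        means = [v for row in neighborhood_mean(img, r) for v in row]
        feat += normalized_histogram(means, bins, vmax)
    return feat
```

Because the histograms discard pixel positions, rotating the image by 90 degrees leaves the feature vector unchanged, which illustrates the invariance claimed above.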

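To improve the accuracy of speech endpoint detection in complex noise environments, a multi-feature endpoint detection algorithm based on Mel-Frequency Cepstral Coefficient (MFCC) distance is proposed. The algorithm computes the MFCC distance of the speech signal, corrects it using short-time energy and short-time zero-crossing rate, updates its threshold, and builds an adaptive noise model, enabling accurate detection of speech endpoints in complex noise. Experimental results show that, at the same computational efficiency, the algorithm achieves higher detection accuracy than two classic algorithms based on double-threshold energy and on cepstral distance.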
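The building blocks named here (MFCC distance, short-time energy, zero-crossing rate and an adaptive noise model) can be sketched as follows. The abstract does not say how energy and zero-crossing rate correct the MFCC distance or how the threshold is updated. The weighting in `frame_score` and the exponential noise update in `detect_speech` are therefore hypothetical rules chosen only to show the structure, and the first `init_noise` frames are assumed to be speech-free.

```python
import math

def short_time_energy(frame):
    return sum(s * s for s in frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)) / (len(frame) - 1)

def cepstral_distance(c, noise_c):
    """Euclidean distance between a frame's MFCC vector and the noise template."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c, noise_c)))

def frame_score(c, f, noise_c, noise_e):
    # hypothetical correction: louder frames and lower-ZCR frames weigh more
    weight = math.log1p(short_time_energy(f) / noise_e) * (1.0 - zero_crossing_rate(f))
    return cepstral_distance(c, noise_c) * weight

def detect_speech(mfcc_frames, raw_frames, init_noise=5, alpha=0.9, margin=2.0):
    """Label each frame True (speech) or False (noise)."""
    noise_c = [sum(col) / init_noise for col in zip(*mfcc_frames[:init_noise])]
    noise_e = sum(short_time_energy(f) for f in raw_frames[:init_noise]) / init_noise + 1e-12
    init_scores = [frame_score(c, f, noise_c, noise_e)
                   for c, f in zip(mfcc_frames[:init_noise], raw_frames[:init_noise])]
    noise_level = sum(init_scores) / init_noise + 1e-6
    labels = []
    for c, f in zip(mfcc_frames, raw_frames):
        score = frame_score(c, f, noise_c, noise_e)
        is_speech = score > margin * noise_level  # adaptive threshold
        labels.append(is_speech)
        if not is_speech:
            # adapt the noise template and noise score level on noise frames only
            noise_c = [alpha * n + (1 - alpha) * v for n, v in zip(noise_c, c)]
            noise_level = alpha * noise_level + (1 - alpha) * score
    return labels
```

Updating the noise model only on frames judged as noise lets the threshold follow slowly changing background noise without being pulled up by speech.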

Silence detection based on wavelet variable-resolution spectral features
Xue Wei, Du Sidan, Ye Yingxian. Computer Engineering, 2009, 35(13): 232-233
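A silence detection algorithm based on wavelet variable-resolution spectral features is proposed. The algorithm makes an initial silence decision using multi-threshold zero-crossing rates, then combines several perceptual speech features with Mel-Frequency Cepstral Coefficients (MFCC) computed from the wavelet variable-resolution spectrum to form the speech feature, which a binary support vector machine classifies to detect silence. Test results show that, at different signal-to-noise ratios, the algorithm yields higher speech recognition accuracy than the G.729b and MFCC-feature silence detection algorithms, and that a video conferencing server based on it has a lower computational load than a video system using G.729b silence detection.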

This article uses prolonged oral reading corpora in various experiments to analyze and detect vocal fatigue. Vocal fatigue particularly concerns voice professionals, including teachers, telemarketing operators, users of automatic speech recognition technology and actors. To analyze and detect vocal fatigue, we focused on three main experiments: a prosodic analysis that can be compared with the results of related work; a two-class Support Vector Machine (SVM) classifier that separates Fatigue from Non-Fatigue states using a large set of audio features; and a comparison function that estimates the difference in fatigue level between two speech segments by combining multiple phoneme-based comparison functions. The prosodic analysis showed that vocal fatigue was not associated with an increase in fundamental frequency and voice intensity. A two-class SVM classifier using the Paralinguistic Challenge 2010 audio feature set gave an unweighted accuracy of 94.1% on the training set (10-fold cross-validation) and 68.2% on the test set. These results show that the phenomenon of vocal fatigue can be modeled and detected. The comparison function was assessed by detecting increased fatigue levels between two speech segments. Its detection performance, measured as Equal Error Rate (EER), was 31% using all phonetic segments, 21% after filtering phonetic segments, and 19% after filtering both phonetic segments and cepstral features. These results show that some phonemes are more sensitive than others to vocal fatigue. Overall, the experiments show that the fatigued voice has specific characteristics during prolonged oral reading and suggest that vocal fatigue detection is feasible.
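The EER figures above are the operating point at which the false acceptance and false rejection rates are equal. Here is a minimal sketch of estimating such a figure from detector scores. It assumes higher scores mean "fatigue increased" and approximates the EER by the observed threshold where the two rates are closest.

```python
def equal_error_rate(positives, negatives):
    """Approximate EER from scores of positive pairs (fatigue increased) and
    negative pairs. Sweeps every observed score as a threshold and returns the
    mean of FAR and FRR where the two are closest."""
    best = None
    for t in sorted(set(positives) | set(negatives)):
        frr = sum(s < t for s in positives) / len(positives)    # positives rejected
        far = sum(s >= t for s in negatives) / len(negatives)   # negatives accepted
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]
```

With perfectly separated scores the EER is 0; the more the two score distributions overlap, the closer it moves toward 0.5.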

We address the problem of detecting the presence of hidden messages in audio. The detector is based on the characteristics of the denoised residuals of the audio file, which may contain a mixture of speech and music. A set of generalized moments of the audio signal is measured in terms of objective and perceptual quality measures. The detector discriminates between cover and stego files using a selected subset of features and an SVM classifier. The proposed scheme achieves on average 88% discrimination performance on individual steganographic algorithms and 98.5% on individual watermarking algorithms. In universal tests, discrimination performance is between 75% and 90%. When the detector may encounter any one of an ensemble of different embedding algorithms, correct detection performance for individual embedding algorithms is roughly 90%.

Median filtering is a popular nonlinear denoising operator: it can be used not only for image enhancement but also as an effective anti-forensic tool. Blind detection of median filtering is therefore a particularly active topic. Unlike existing median filtering forensic methods that rely on statistical features of image pixels, this paper proposes, based on theoretical analysis and experimental testing, a novel approach for detecting median filtering in digital images using the frequency-domain coefficients of image blocks. Extensive experimental results show that the proposed approach achieves high accuracy in median filtering detection and good robustness against JPEG compression, and that it can also locate median-filtered regions. The approach performs much better than existing state-of-the-art methods across different formats and sizes of image blocks, particularly when the blocks are tiny and the JPEG compression ratio is high.

Yuan Quan, Guo Jiangfan. Journal of Computer Applications, 2018, 38(6): 1591-1595
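To address concept drift and noise in data streams, a new incremental-learning ensemble classification algorithm for data streams is proposed. First, a noise filtering mechanism removes noise. Then, a hypothesis testing method detects concept drift, and a weighted ensemble model is built with incremental C4.5 decision trees as base classifiers. Finally, instances are learned incrementally and the classification model is updated dynamically. Experimental results show that the ensemble classifier detects concept drift with an accuracy of 95%-97% and keeps its noise robustness on data streams above 90%. The algorithm achieves high classification accuracy and performs well in both concept drift detection and noise robustness.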
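One common hypothesis test for concept drift compares the classifier's error rate on a reference window with its error rate on a recent window. The sketch below uses a one-sided two-proportion z-test. The abstract does not name the test, so this choice and the 1.645 critical value (about a 5% significance level) are assumptions.

```python
import math

def drift_detected(ref_errors, recent_errors, z_crit=1.645):
    """One-sided two-proportion z-test: is the recent error rate significantly
    higher than the reference error rate? Inputs are lists of 0/1 error indicators."""
    n1, n2 = len(ref_errors), len(recent_errors)
    p1, p2 = sum(ref_errors) / n1, sum(recent_errors) / n2
    pooled = (sum(ref_errors) + sum(recent_errors)) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return False  # no errors (or only errors) in both windows: nothing to test
    z = (p2 - p1) / se
    return z > z_crit
```

When drift is flagged, the ensemble would typically train a new base classifier on the recent window and down-weight or replace the stale ones.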

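Traditional binary-classification audio steganalysis methods adapt poorly to unknown steganographic methods. To address this, an audio steganalysis method based on fuzzy C-means (FCM) clustering and one-class support vector machines (OC-SVM) is proposed. During training, features are first extracted from the training audio, including statistical features of the short-time Fourier transform (STFT) spectrum and features based on audio quality measures. The extracted features are then clustered by FCM into C clusters, which are fed into multiple hypersphere OC-SVM classifiers for training. During detection, features are extracted from the test audio, which is classified according to the boundaries of the multiple hypersphere OC-SVM classifiers. Experimental results show that the method detects several typical audio steganographic methods fairly accurately. At full-capacity embedding, the overall detection rate on the test audio reaches 85.1%, and compared with K-means clustering, the proposed method improves detection accuracy by at least 2%. The method is more general than binary steganalysis methods and better suited to detecting stego audio when the steganographic method is not known in advance.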
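The FCM step assigns each feature vector a degree of membership in every cluster rather than a hard label. Here is a minimal pure-Python sketch of standard fuzzy C-means with fuzzifier m = 2. The deterministic initialization and the stopping rule are illustrative assumptions, and the STFT and audio-quality features are assumed to be extracted beforehand.

```python
import math

def fcm(points, c, m=2.0, iters=100, eps=1e-9):
    """Fuzzy C-means on a list of feature vectors (requires len(points) >= c).
    Returns (centers, memberships); memberships[i][j] is the degree to which
    point i belongs to cluster j."""
    # deterministic initialization: c evenly spaced points as starting centers
    step = max(1, len(points) // c)
    centers = [list(points[k * step]) for k in range(c)]
    for _ in range(iters):
        # membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        u = []
        for p in points:
            d = [max(math.dist(p, ctr), eps) for ctr in centers]
            u.append([1.0 / sum((d[j] / d[k]) ** (2 / (m - 1)) for k in range(c))
                      for j in range(c)])
        # center update: mean of the points weighted by u_ij^m
        new = []
        for j in range(c):
            w = [row[j] ** m for row in u]
            total = sum(w)
            new.append([sum(wi * p[dim] for wi, p in zip(w, points)) / total
                        for dim in range(len(points[0]))])
        shift = max(math.dist(a, b) for a, b in zip(centers, new))
        centers = new
        if shift < 1e-6:
            break
    return centers, u
```

Each resulting cluster would then get its own one-class SVM, so a test sample is judged against several compact boundaries instead of a single broad one.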

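Existing methods detect least significant bit (LSB) steganography in WAV speech carriers poorly. To address this, a speech steganalysis method based on a deep residual network is proposed. First, a fixed convolutional layer made of several groups of high-pass filters computes the residual signal of the input speech, and a truncated linear unit activation truncates the residual. Then, a deep network is built by stacking convolutional layers and purpose-designed residual blocks to extract deep steganographic features. Finally, a classifier consisting of a fully connected layer and a Softmax layer outputs the classification result. Experimental results show that, for the Hide4PGP and LSB matching steganographic methods at different embedding rates, the proposed model achieves higher detection accuracy than existing convolutional neural network (CNN) based steganalysis methods. For Hide4PGP at an embedding rate of 0.1 bps, the model improves detection accuracy by nearly 7 percentage points over LinNet.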
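The preprocessing stage (fixed high-pass filtering followed by a truncated linear unit) can be sketched without a deep learning framework. The three difference kernels and the threshold T = 3 are hypothetical; the abstract says only that several groups of high-pass filters are used.

```python
def high_pass_residual(x, kernel):
    """'Valid' 1-D correlation of the signal with a fixed high-pass kernel."""
    k = len(kernel)
    return [sum(kernel[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]

def tlu(values, T=3):
    """Truncated linear unit: identity on [-T, T], clipped to +/-T outside."""
    return [max(-T, min(T, v)) for v in values]

# hypothetical fixed filter bank: 1st-, 2nd- and 3rd-order difference operators
KERNELS = {
    "d1": [-1, 1],
    "d2": [1, -2, 1],
    "d3": [-1, 3, -3, 1],
}

def residual_channels(samples, T=3):
    """One truncated residual channel per fixed kernel."""
    return {name: tlu(high_pass_residual(samples, k), T) for name, k in KERNELS.items()}
```

In the network these kernels would form a non-trainable first convolutional layer. A single LSB change in an otherwise flat signal, such as `[100, 100, 101, 100, 100]`, turns into a clear pattern in the residual, while smooth signal content cancels out. The TLU keeps large residuals from loud passages from dominating.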

As multimedia becomes the dominant form of entertainment through an ever increasing range of digital formats, there has been a growing interest in obtaining information from entertainment media. Speech is one of the core resources in multimedia, providing a foundation for the extraction of semantic information. Thus, detecting speech is a critical first step for speech-based information retrieval systems. This work focuses on speech detection in one of the dominant forms of entertainment media: feature films. A novel approach for voice activity detection (VAD) in film audio is proposed. The approach uses correlation to analyze associations of Mel Frequency Cepstral Coefficient (MFCC) pairs in speech and non-speech data. This information then drives feature selection for the creation of MFCC cross-covariance feature vectors (MFCC-CCs) which are used to train a random forest classifier to solve a binary speech/non-speech classification problem on audio data from entertainment media. The classifier performance is evaluated on a number of test sets and achieves a classification accuracy of up to 94%. The approach is also compared with state of the art and contemporary VAD algorithms, and demonstrates competitive results.
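The MFCC-CC features can be sketched as cross-covariances between the trajectories of selected MFCC coefficient pairs. The pair-selection criterion below (largest difference in Pearson correlation between speech and non-speech data) and the lag range are assumptions; the abstract says only that correlation analysis drives the selection.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation; 0.0 if either trajectory is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def select_pairs(speech_frames, nonspeech_frames, top_k=3):
    """Hypothetical criterion: coefficient pairs whose correlation differs most
    between speech and non-speech frames."""
    s, ns = list(zip(*speech_frames)), list(zip(*nonspeech_frames))
    scored = [(abs(pearson(s[i], s[j]) - pearson(ns[i], ns[j])), (i, j))
              for i, j in combinations(range(len(s)), 2)]
    scored.sort(key=lambda p: p[0], reverse=True)
    return [pair for _, pair in scored[:top_k]]

def cross_covariance(a, b, lag):
    """Sample cross-covariance of a and b at a non-negative lag
    (b shifted forward by `lag` frames), normalized by the full length."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((a[t] - ma) * (b[t + lag] - mb) for t in range(n - lag)) / n

def mfcc_cc_features(mfcc_frames, pairs, max_lag=2):
    """For each selected pair (i, j), cross-covariances at lags 0..max_lag."""
    tracks = list(zip(*mfcc_frames))  # tracks[i] = trajectory of coefficient i
    return [cross_covariance(tracks[i], tracks[j], lag)
            for i, j in pairs for lag in range(max_lag + 1)]
```

The resulting per-window vectors are what a random forest would be trained on for the speech/non-speech decision.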

Intrusion data stream detection algorithm based on dynamic classifier ensembles
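Intrusion data streams update rapidly and exhibit concept drift, so a static classifier ensemble cannot reflect the data distribution of the whole space in time, and its intrusion detection accuracy is low. To address this, this paper proposes an intrusion detection method based on dynamic ensembles of single classifiers. The method dynamically assigns a weight to each classifier, uses interval estimation to check for concept drift, and updates the classifiers accordingly. Experimental results show that on data streams constructed from hyperplanes, its classification performance is better than two static methods, majority voting and weighted voting, and that it achieves a high detection rate on a real intrusion dataset.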
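The dynamic ensemble described here has two parts: weighting the base classifiers' votes and checking for drift with an interval estimate. Here is a minimal sketch, assuming a multiplicative weight penalty for wrong classifiers and a normal-approximation confidence interval on accuracy. Neither rule is specified in the abstract.

```python
import math
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Each base classifier's label counts with that classifier's weight."""
    score = defaultdict(float)
    for label, w in zip(predictions, weights):
        score[label] += w
    return max(score, key=score.get)

def update_weights(weights, predictions, truth, beta=0.5):
    """Hypothetical rule: scale down wrong classifiers by beta, then renormalize."""
    new = [w * (1.0 if p == truth else beta) for w, p in zip(weights, predictions)]
    total = sum(new)
    return [w / total for w in new]

def accuracy_interval(correct, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for an accuracy estimate."""
    p = correct / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)

def drift_by_interval(ref_correct, ref_n, recent_correct, recent_n, z=1.96):
    """Flag drift when recent accuracy falls below the reference interval."""
    low, _ = accuracy_interval(ref_correct, ref_n, z)
    return recent_correct / recent_n < low
```

Unlike a static weighted vote, the weights here keep changing as labeled stream data arrives, so classifiers that stop matching the current concept lose influence quickly.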

This paper proposes a new technique for face detection and lip feature extraction, and presents a real-time field-programmable gate array (FPGA) implementation of both techniques. Face detection is based on a naive Bayes classifier that classifies an edge-extracted representation of an image. Using the edge representation significantly reduces the model's size to only 5184 B, which is 2417 times smaller than a comparable statistical modeling technique, while achieving an 86.6% correct detection rate under various lighting conditions. Lip feature extraction uses the contrast around the lip contour to extract the height and width of the mouth, metrics that are useful for speech filtering. The proposed FPGA system occupies only 15050 logic cells, about one-sixth of the logic used by a current comparable FPGA face detection system.

