Similar literature
1.
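Hierarchical speaker identification in noisy environments   Total citations: 1, self-citations: 0, citations by others: 1
Speech is denoised by combining the wavelet transform with Wiener filtering. To improve the robustness and identification rate of the system, hierarchical speaker identification is adopted and the Gaussian probability density of the pitch period is used to weight the likelihood of the GMM classifier, forming a new likelihood for speaker identification. Experimental results show that both the robustness and the identification rate of the proposed system are improved.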
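
The pitch-weighted likelihood described in the abstract above can be sketched as follows. This is only an illustrative sketch: the abstract does not give the exact weighting formula, so the code assumes the GMM likelihood is multiplied by the pitch density (a sum in the log domain), and the function names and per-speaker `(mean, std)` pitch models are hypothetical.

```python
import math


def gaussian_log_pdf(x, mean, std):
    """Log of the univariate Gaussian density N(x; mean, std^2)."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)


def identify_with_pitch(gmm_log_likelihoods, pitch_period, pitch_models):
    """Pick the speaker with the highest pitch-weighted score.

    gmm_log_likelihoods: {speaker: log-likelihood from the GMM classifier}
    pitch_models: {speaker: (mean, std) of that speaker's pitch period}
    Assumption: weighting is a product of likelihoods, i.e. a sum of logs.
    """
    scores = {
        spk: ll + gaussian_log_pdf(pitch_period, *pitch_models[spk])
        for spk, ll in gmm_log_likelihoods.items()
    }
    return max(scores, key=scores.get), scores
```

With this form, a speaker whose GMM score is slightly worse can still win when the observed pitch period matches that speaker's pitch model much better.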

2.
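To reduce the effect of vocal effort on speaker recognition performance, and assuming the training speech contains a small amount of whispered and shouted data, a method is proposed that combines maximum a posteriori (MAP) adaptation with constrained maximum likelihood linear regression (CMLLR) to update the speaker models and to project the speaker features. MAP adaptation updates the speaker models trained on normal speech, while CMLLR feature-space projection transforms the features of whispered and shouted test speech, reducing the mismatch between training and test speech. Experimental results show that with MAP+CMLLR the equal error rate (EER) of the speaker recognition system drops markedly: compared with the baseline system, MAP adaptation, maximum likelihood linear regression (MLLR) model projection and CMLLR feature-space projection, the average EER of MAP+CMLLR is reduced by 75.3%, 3.5%, 72% and 70.9%, respectively. The results indicate that the proposed method weakens the effect of vocal effort on speaker discriminability and makes the speaker recognition system more robust to changes in vocal effort.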
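
As a rough illustration of the MAP step mentioned in the abstract above, the sketch below adapts the mean of a single Gaussian component using the widely used relevance-factor form. The abstract does not state which MAP variant the authors use, so the update rule, the `relevance` parameter and the function name are assumptions for illustration only.

```python
def map_adapt_mean(prior_mean, frames, posteriors, relevance=16.0):
    """Relevance-MAP update of one Gaussian mean (illustrative assumption).

    prior_mean: mean vector of the component before adaptation
    frames: adaptation feature vectors
    posteriors: responsibility of this component for each frame
    relevance: hypothetical relevance factor controlling trust in the prior
    """
    n = sum(posteriors)  # soft count of frames assigned to this component
    if n == 0:
        return list(prior_mean)  # no adaptation data: keep the prior
    dim = len(prior_mean)
    # Posterior-weighted average of the adaptation data.
    data_mean = [
        sum(p * f[d] for p, f in zip(posteriors, frames)) / n for d in range(dim)
    ]
    alpha = n / (n + relevance)  # more data moves the mean further from the prior
    return [alpha * data_mean[d] + (1 - alpha) * prior_mean[d] for d in range(dim)]
```

With little whispered or shouted data the soft count is small, so the adapted mean stays close to the model trained on normal speech.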

3.
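In text-independent speaker identification, to improve robustness under telephone speech conditions, the score normalization commonly used in speaker verification is applied to speaker identification: the scores of the test utterance against the different speaker models are normalized separately, and the closest speaker model is selected as the system output, which effectively improves performance. Speaker identification experiments on the NIST'03 1spk database demonstrate the effectiveness of score normalization for speaker identification.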
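
A minimal sketch of score normalization applied to identification is given below. The abstract does not say which normalization is used; zero normalization (Z-norm) is assumed here, and the impostor score lists and the function name are hypothetical.

```python
from statistics import mean, pstdev


def znorm_identify(raw_scores, impostor_scores):
    """Normalize each model's score, then pick the best speaker.

    raw_scores: {speaker: score of the test utterance against that model}
    impostor_scores: {speaker: scores of impostor utterances against that model}
    Assumption: Z-norm, i.e. (score - impostor mean) / impostor std per model.
    """
    normalized = {}
    for spk, score in raw_scores.items():
        mu = mean(impostor_scores[spk])
        sigma = pstdev(impostor_scores[spk]) or 1.0  # guard against zero spread
        normalized[spk] = (score - mu) / sigma
    best = max(normalized, key=normalized.get)
    return best, normalized
```

Normalizing per model puts the scores of different speaker models on a comparable scale, so a model that scores every utterance highly no longer dominates the decision.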

4.
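In traditional speaker recognition, the human ear is often assumed to be insensitive to phase, so the effect of phase information on recognition is ignored. To verify the effect of phase information on speaker recognition, a method for extracting phase feature parameters is proposed. Under both clean and noisy speech conditions, and using Gaussian mixture models, the phase features are combined with cochlear cepstral coefficients (CFCC) to study the effect of phase information on speaker identification performance. Experimental results show that phase information also plays an important role in speaker recognition, and that applying it to a speaker identification system markedly improves the recognition rate and robustness.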

5.
陈芬菲 《微处理机》2006,27(4):76-77,79
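A speaker identification system based on the Gaussian mixture model (GMM) is implemented. A GMM describes the distribution of feature vectors in probability space as a combination of several Gaussian probability density functions, and each speaker corresponds to a different GMM. The models are trained by maximum likelihood (ML) estimation using the EM algorithm. Experiments on different data sets gave good results.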
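
The scoring side of such a GMM system can be sketched as below: each speaker has one GMM, and the test frames are assigned to the speaker whose model gives the highest total log-likelihood. Diagonal covariances, the `(weights, means, variances)` model layout and the function names are assumptions; EM training is not shown.

```python
import math


def log_gmm(x, weights, means, variances):
    """Log-density of one feature vector x under a diagonal-covariance GMM."""
    comps = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        comps.append(log_p)
    m = max(comps)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(c - m) for c in comps))


def identify_gmm(frames, speaker_gmms):
    """Return the speaker whose GMM maximizes the summed frame log-likelihood.

    speaker_gmms: {speaker: (weights, means, variances)}  (hypothetical layout)
    """
    totals = {
        spk: sum(log_gmm(x, *gmm) for x in frames)
        for spk, gmm in speaker_gmms.items()
    }
    return max(totals, key=totals.get)
```

Summing log-likelihoods over frames treats the frames as independent, which is the usual simplifying assumption in GMM-based identification.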

6.
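To address the sensitivity of speaker recognition to environmental noise, a new voiceprint feature extraction method is proposed that borrows the spectro-temporal filtering mechanism of the spectro-temporal receptive fields (STRF) of neurons in the biological auditory cortex. In this method, secondary feature extraction is performed on the auditory scale-rate map obtained from the STRF, and the result is combined with conventional mel-frequency cepstral coefficients (MFCC) to obtain voiceprint features that are highly tolerant of environmental noise. Using a support vector machine (SVM) as the classifier, tests on speech data at different signal-to-noise ratios (SNR) show that STRF-based features are generally more robust to noise than MFCC but give lower recognition accuracy; the combined features raise recognition accuracy while remaining robust to environmental noise. The results indicate that the proposed method is effective for speaker recognition in strongly noisy environments.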

7.
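Multi-level speaker identification based on PCA and multi-reduced SVM   Total citations: 2, self-citations: 1, citations by others: 1
A multi-level speaker identification method based on principal component analysis (PCA) and a multi-reduced support vector machine (SVM) is proposed. PCA first makes a fast, coarse decision over the enrolled speakers, and the multi-reduced SVM then makes the final decision. The multi-reduced SVM has two reduction steps: PCA reduces the dimensionality of the training data, and a sample selection algorithm reduces the number of training samples. Theoretical analysis and experimental results show that the method greatly reduces storage and computation, shortens training and recognition time, and is reasonably robust.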

8.
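Building on linear prediction (LP) residual conversion with a sinusoidal excitation model, a voice conversion method that improves the conversion of speech features is proposed. Within a linear prediction analysis-synthesis framework, the method on the one hand extracts the source speaker's linear predictive coding (LPC) cepstral envelope with a spectral envelope estimation vocoder and converts it with a bilinear transformation function; on the other hand, it models and decomposes the LP residual signal with a harmonic sinusoidal model and uses a pitch frequency transformation to convert the source speaker's residual into an approximation of the target speaker's residual. Finally, the modified residual excites a time-varying filter to produce the converted speech, with the filter parameters updated in real time from the converted LPC cepstral envelope. Experimental results show that the method performs well in both subjective and objective tests, converts speaker voice characteristics effectively, and yields converted speech with high similarity.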

9.
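In noisy environments, improving the robustness of a speaker recognition system requires some form of noise compensation. Based on the statistical properties of speaker features and on how histogram equalization behaves when applied to speaker recognition, this paper proposes an adaptive histogram equalization method. Experimental results show that, compared with the ordinary histogram equalization transform, adaptive histogram equalization further improves the identification rate of the system; moreover, the algorithm achieves a good identification rate under both stationary and non-stationary noise, further enhancing the robustness of the system.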
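
For orientation, the sketch below shows ordinary (non-adaptive) histogram equalization of one feature dimension, the baseline the abstract above compares against; the adaptive variant proposed by the authors is not described in enough detail to reproduce. The standard normal reference distribution and the function name are assumptions.

```python
from statistics import NormalDist


def histogram_equalize(values):
    """Map one feature dimension onto a standard normal reference distribution.

    Each value is replaced by the reference quantile of its empirical CDF,
    which preserves the ordering of the values while fixing their histogram.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    reference = NormalDist(0.0, 1.0)  # assumed reference distribution
    out = [0.0] * n
    for rank, idx in enumerate(order):
        cdf = (rank + 0.5) / n  # mid-rank estimate keeps the CDF strictly in (0, 1)
        out[idx] = reference.inv_cdf(cdf)
    return out
```

Because only ranks matter, any monotonic distortion of the feature values, such as a gain or offset introduced by the channel, leaves the equalized output unchanged.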

10.
罗元  孙龙 《计算机科学》2016,43(8):297-299, 317
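To improve the robustness of speaker verification systems in noisy environments, the Mel frequency cepstral coefficients (MFCC) are improved using an auditory periphery model and then combined with perceptual linear predictive coefficients (PLPC). Using between-class discriminability as the criterion, the two voiceprint features are fused in the feature domain to give a new voiceprint feature extraction method, and the noise robustness of a speaker verification system based on this feature is studied. Comparative experiments between the fused and original features on speech signals at different signal-to-noise ratios show that the fused feature gives a lower error rate in simulated restaurant noise, 2.2% and 3.1% lower than MFCC and PLPC respectively, so the robustness of the speaker verification system in noise is improved.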

11.
Ke Chen  Huisheng Chi 《Neurocomputing》1998,20(1-3):227-252
A novel method is proposed for combining multiple probabilistic classifiers on different feature sets. To improve classification performance, a generalized finite mixture model is proposed as a linear combination scheme and implemented with radial basis function networks. In the linear combination scheme, soft competition on different feature sets is adopted as an automatic feature rank mechanism so that different feature sets can always be used simultaneously in an optimal way to determine the linear combination weights. For training the linear combination scheme, a learning algorithm is developed based on the Expectation–Maximization (EM) algorithm. The proposed method has been applied to a typical real-world problem, viz., speaker identification, in which different feature sets often need to be considered simultaneously for robustness. Simulation results show that the proposed method yields good performance in speaker identification.

12.
This paper presents an efficient approach for automatic speaker identification based on cepstral features and the Normalized Pitch Frequency (NPF). Most relevant speaker identification methods adopt a cepstral strategy. Including the pitch frequency as a new feature is expected to enhance speaker identification accuracy. In the proposed framework, a neural classifier with a single hidden layer is used. Different transform domains are investigated for reliable feature extraction from the speech signal. Moreover, a noise reduction pre-processing step is applied before feature extraction to enhance the performance of the speaker identification system. Simulation results show that using the NPF as a feature enhances the performance of the speaker identification system, especially with the Discrete Cosine Transform (DCT) and a wavelet denoising pre-processing step.

13.
A new speaker verification method   Total citations: 3, self-citations: 0, citations by others: 3
张怡颖  朱小燕  张钹 《软件学报》1999,10(4):372-376
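Based on a comparative study of speaker verification and speaker identification, this paper proposes a new speaker verification method. Unlike traditional methods, it builds a speaker-independent model that aggregates the speech characteristics of multiple speakers, so that a different decision threshold can be given for each utterance to be verified, which resolves the difficulty of setting the decision threshold in speaker verification. Experimental results show that the method significantly reduces the false acceptance rate and false rejection rate of the speaker verification system, providing an effective route for applying speaker verification in environments with high security requirements.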

14.
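To address the low accuracy and limited practicality of current speaker recognition models, a method is proposed that optimizes the original feature parameters using entropy correlation, and the separability measure for speaker recognition is improved by combining feature entropy correlation with the characteristic values of the original features. With the goal of maximizing the between-class difference in speaker clustering, a speaker recognition model is built around a parameter-adaptive reconstruction strategy based on feature classification correlation and a method for computing the separability measure. Simulation results show that the model is structurally stable, strikes a good balance between recognition accuracy and efficiency, and has strong practical applicability.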

15.
In this paper we propose a neural network based feature transformation framework for developing an emotion-independent speaker identification system. Most present speaker recognition systems may not perform well in emotional environments. In real life, humans express emotions extensively during conversations to convey their messages effectively. Therefore, in this work we propose a speaker recognition system that is robust to variations in the emotional moods of speakers. Neural network models are explored to transform the speaker-specific spectral features from any specific emotion to neutral. In this work, we consider eight emotions, namely Anger, Sad, Disgust, Fear, Happy, Neutral, Sarcastic and Surprise. Emotional databases developed in Hindi, Telugu and German are used to analyze the effect of the proposed feature transformation on the performance of the speaker identification system. Spectral features are represented by mel-frequency cepstral coefficients, and speaker models are developed using Gaussian mixture models. The performance of the speaker identification system is analyzed with various feature mapping techniques. Results demonstrate that the proposed neural network based feature transformation improves speaker identification performance by 20%. Feature transformation at the syllable level performs better than at the sentence level.

16.
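A new speaker identification method based on MRSVM is proposed. The speech feature vectors are first reduced in dimension with LDA to obtain discriminative feature vectors, which are then clustered with fuzzy kernel clustering. A sample selection algorithm picks the feature vectors on the cluster boundaries as support vectors to train the support vector machine, greatly reducing its storage and training cost without affecting the recognition rate. Experiments show that the method performs well overall.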

17.
This paper presents a study of speaker identification for security systems based on the energy of speaker utterances. The proposed system consists of signal pre-processing, feature extraction using the wavelet packet transform (WPT) and speaker identification using an artificial neural network. In signal pre-processing, the amplitudes of utterances of the same sentence are normalized to prevent estimation errors caused by changes in the speakers' volume. In feature extraction, three conventional methods are considered in the experiments and compared with the irregular decomposition method used in the proposed system. To verify the effect of the proposed system on identification, a general regressive neural network (GRNN) is used and compared in the experimental investigation. The experimental results demonstrate the effectiveness of the proposed speaker identification system in comparison with the discrete wavelet transform (DWT), the conventional WPT and the WPT in Mel scale.

18.
This paper presents the feature analysis and the design of compensators for speaker recognition under stressed speech conditions. Any condition that causes a speaker to vary his or her speech production from the normal or neutral condition is called a stressed speech condition. Stressed speech is induced by emotion, high workload, sleep deprivation, frustration and environmental noise. Under stress, the characteristics of the speech signal differ from those of the normal or neutral condition. Because of these changes, the performance of a speaker recognition system may degrade under stressed speech conditions. Firstly, six speech features that are widely used for speaker recognition (mel-frequency cepstral coefficients (MFCC), linear prediction (LP) coefficients, linear prediction cepstral coefficients (LPCC), reflection coefficients (RC), arc-sin reflection coefficients (ARC) and log-area ratios (LAR)) are analyzed to evaluate their characteristics under stress. Secondly, a Vector Quantization (VQ) classifier and a Gaussian Mixture Model (GMM) are used to evaluate speaker recognition results with the different speech features. This analysis helps select the best feature set for speaker recognition under stress. Finally, four novel VQ-based compensation techniques are proposed and evaluated for improving speaker recognition under stress. The compensation techniques are speaker and stressed information based compensation (SSIC), compensation by removal of stressed vectors (CRSV), cepstral mean normalization (CMN) and a combination of MFCC and sinusoidal amplitude (CMSA) features. Speech data from the SUSAS database corresponding to four different conditions, Angry, Lombard, Question and Neutral, are used to analyze speaker recognition under stress.
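
Of the compensation techniques listed in the abstract above, cepstral mean normalization (CMN) is simple enough to sketch: the per-utterance mean of each cepstral coefficient is subtracted from every frame. The function name and the list-of-lists frame layout are assumptions, and this is the textbook form rather than the authors' VQ-based variant.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-utterance mean of each cepstral coefficient.

    frames: list of equal-length cepstral vectors for one utterance.
    A constant offset in the cepstral domain (for example a fixed channel
    effect) is removed by this subtraction.
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - means[d] for d in range(dim)] for f in frames]
```

For example, `cepstral_mean_normalize([[1.0, 2.0], [3.0, 6.0]])` returns `[[-1.0, -2.0], [1.0, 2.0]]`.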

19.
Speaker recognition has been one of the interesting issues in signal and speech processing over the last few decades. Feature selection is one of the main parts of a speaker recognition system and can improve its performance. In this paper, we propose two methods for finding the MFCC feature vectors with the highest similarity, which are applied to a text-independent speaker identification system. These feature vectors reflect individual properties of each person's vocal tract that are mostly repeated. They are used to build the speaker's model and to specify the decision boundary. We take the MFCCs of each window of the signal as a feature vector and use clustering to obtain the feature vectors with the highest similarity. The speaker identification experiments are performed using the ELSDSR database, which consists of 22 speakers (12 male and 10 female), and a neural network is used as the classifier. The effect of three main parameters is considered in the two proposed methods. Experimental results indicate that the performance of the speaker identification system is improved in terms of accuracy and time consumption.
