首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 328 毫秒
1.
研究了基于美尔倒谱特征参数及高斯混合模型的文本无关的说话人识别系统,为了提高噪声环境下识别系统的识别率,从两个角度研究改善该系统抗噪性能的方法,即利用语音识别将文本无关的系统转化为文本有关的说话人识别方法和通过选择鲁棒性较强的帧进行说话人识别的方法,分析了以上方法对系统识别性能的改善作用,并通过实验验证上述方法确实可以提高系统在噪声环境下的识别率。  相似文献   

2.
在文本无关的说话人识别中,韵律特征由于其对信道环境噪声不敏感等特性而被应用于话者识别任务中.本文对韵律参数采用基于高斯混合模型超向量的支持向量机建模方法,并将类内协方差特征映射方法应用于模型超向量上,单系统的性能比传统方法的混合高斯-通用背景模型(Gaussian mixture model-universal background model,GMM-UBM)基线系统有了40.19%的提升.该方法与本文的基于声学倒谱参数的确认系统融合后,能使整体系统的识别性能有9.25%的提升.在NIST(National institute of standards and technology mixture)2006说话人测试数据库上,融合后的系统能够取得4.9%的等错误率.  相似文献   

3.
During our pronunciation process, the position and movement properties of articulators such as tongue, jaw, lips, etc are mainly captured by the articulatory movement features (AMFs). This paper investigates to use the AMFs for short-duration text-dependent speaker verification. The AMFs can characterize the relative motion trajectory of articulators of individual speakers directly, which is rarely affected by the external environment. Therefore, we expect that, the AMFs are superior to the traditional acoustic features, such as mel-frequency cepstral coefficients (MFCC), to characterize the speaker identity differences between speakers. The speaker similarity scores measured by the dynamic time warping (DTW) algorithm are used to make the speaker verification decisions. Experimental results show that the AMFs can bring significant performance gains over the traditional MFCC features for short-duration text-dependent speaker verification task.  相似文献   

4.

Speaker recognition revolution has lead to the inclusion of speaker recognition modules in several commercial products. Most published algorithms for speaker recognition focus on text-dependent speaker recognition. In contrast, text-independent speaker recognition is more advantageous as the client can talk freely to the system. In this paper, text-independent speaker recognition is considered in the presence of some degradation effects such as noise and reverberation. Mel-Frequency Cepstral Coefficients (MFCCs), spectrum and log-spectrum are used for feature extraction from the speech signals. These features are processed with the Long-Short Term Memory Recurrent Neural Network (LSTM-RNN) as a classification tool to complete the speaker recognition task. The network learns to recognize the speakers efficiently in a text-independent manner, when the recording circumstances are the same. The recognition rate reaches 95.33% using MFCCs, while it is increased to 98.7% when using spectrum or log-spectrum. However, the system has some challenges to recognize speakers from different recording environments. Hence, different speech enhancement techniques, such as spectral subtraction and wavelet denoising, are used to improve the recognition performance to some extent. The proposed approach shows superiority, when compared to the algorithm of R. Togneri and D. Pullella (2011).

  相似文献   

5.
针对说话人识别技术多基于语音的现状,文章提出了一种新颖的基于唇动的说话人识别技术。通过离散余弦变换,从说话人讲话时的图像序列提取那些既反映说话人嘴部生理特性也反映了说话人唇动的行为特性的视觉特征。基于这些特征,为说话人建立静态-动态混合模型,其中使用半连续隐马尔可夫模型为说话人建立动态模型。在一个小型的视觉语料库上,我们分别对说话人辨认系统和确认系统进行实现。对说话人辨认系统,其文本有关与文本无关模式的正确率分别达到了100%和99.7%;对说话人确认系统,文本有关与文本无关模式的等错误率分别为0.09%与0.33%。  相似文献   

6.
This paper proposes to incorporate full covariance matrices into the radial basis function (RBF) networks and to use the expectation-maximization (EM) algorithm to estimate the basis function parameters. The resulting networks, referred to as elliptical basis function (EBF) networks, are evaluated through a series of text-independent speaker verification experiments involving 258 speakers from a phonetically balanced, continuous speech corpus (TIMIT). We propose a verification procedure using RBF and EBF networks as speaker models and show that the networks are readily applicable to verifying speakers using LP-derived cepstral coefficients as features. Experimental results show that small EBF networks with basis function parameters estimated by the EM algorithm outperform the large RBF networks trained in the conventional approach. The results also show that the equal error rate achieved by the EBF networks is about two-third of that achieved by the vector quantization-based speaker models.  相似文献   

7.
We present a new modeling approach for speaker recognition that uses the maximum-likelihood linear regression (MLLR) adaptation transforms employed by a speech recognition system as features for support vector machine (SVM) speaker models. This approach is attractive because, unlike standard frame-based cepstral speaker recognition models, it normalizes for the choice of spoken words in text-independent speaker verification without data fragmentation. We discuss the basics of the MLLR-SVM approach, and show how it can be enhanced by combining transforms relative to multiple reference models, with excellent results on recent English NIST evaluation sets. We then show how the approach can be applied even if no full word-level recognition system is available, which allows its use on non-English data even without matching speech recognizers. Finally, we examine how two recently proposed algorithms for intersession variability compensation perform in conjunction with MLLR-SVM.  相似文献   

8.
邓蕾  高勇 《计算机系统应用》2017,26(12):227-232
针对噪声环境中说话人识别性能急剧下降的问题. 提出了一种用于说话人识别的鲁棒特征提取的方法. 采用弯折滤波器组(Warped filter banks,WFBS)来模拟人耳听觉特性,将立方根压缩算法、相对谱滤波技术(RASTA)、倒谱均值方差归一化算法(CMVN)引入到鲁棒特征的提取中. 在高斯混合模型(GMM)下进行仿真,实验结果表明该方法提取的特征参数在鲁棒性和识别性能上均优于MFCC特征参数和CFCC特征参数.  相似文献   

9.
In speaker recognition tasks, one of the reasons for reduced accuracy is due to closely resembling speakers in the acoustic space. In order to increase the discriminative power of the classifier, the system must be able to use only the unique features of a given speaker with respect to his/her acoustically resembling speaker. This paper proposes a technique to reduce the confusion errors, by finding speaker-specific phonemes and formulate a text using the subset of phonemes that are unique, for speaker verification task using i-vector based approach. In this paper spectral features such as linear prediction cepstral co-efficients (LPCC), perceptual linear prediction co-efficients (PLP) and phase feature such as modified group delay are experimented to analyse the importance of speaker-specific-text in speaker verification task. Experiments have been conducted on speaker verification task using speech data of 50 speakers collected in a laboratory environment. The experiments show that the equal error rate (EER) has been decreased significantly using i-vector approach with speaker-specific-text when compared to i-vector approach with random-text using different spectral and phase based features.  相似文献   

10.
在文本无关的说话人确认中,训练与测试语音中信道环境的不匹配是一种说话者话路变化问题.这种不匹配会严重降低说话人确认系统的性能.为了有效解决该问题,本文提出一种基于说话者话路变化的主成分分析方法,将其应用在说话者确认中,我们将这种方法称为面向话路变化的主成分分析方法.这种方法能够与类内协方差归一化结合,进一步提高识别效果.在NIST 2006年说话者识别数据库上进行实验,证明该方法不仅在系统识别等错误率上比基线系统有了24.2%的降低,而且在计算复杂度上相对于目前传统的方法也有很大的优势.  相似文献   

11.
The main objective of this paper is to develop the system of speaker identification. Speaker identification is a technology that allows a computer to automatically identify the person who is speaking, based on the information received from speech signal. One of the most difficult problems in speaker recognition is dealing with noises. The performance of speaker recognition using close speaking microphone (CSM) is affected in background noises. To overcome this problem throat microphone (TM) which has a transducer held at the throat resulting in a clean signal and unaffected by background noises is used. Acoustic features namely linear prediction coefficients, linear prediction cepstral coefficients, Mel frequency cepstral coefficients and relative spectral transform-perceptual linear prediction are extracted. These features are classified using RBFNN and AANN and their performance is analyzed. A new method was proposed for identification of speakers in clean and noisy using combined CSM and TM. The identification performance of the combined system is increased than individual system due to complementary nature of CSM and TM.  相似文献   

12.
The characterization of a speech signal using non-linear dynamical features has been the focus of intense research lately. In this work, the results obtained with time-dependent largest Lyapunov exponents (TDLEs) in a text-dependent speaker verification task are reported. The baseline system used Gaussian mixture models (GMMs), obtained from the adaptation of a universal background model (UBM), for the speaker voice models. Sixteen cepstral and 16 delta cepstral features were used in the experiments, and it is shown how the addition of TDLEs can improve the system’s accuracy. Cepstral mean subtraction was applied to all features in the tests for channel equalization, and silence frames were discarded. The corpus used, obtained from a subset of the Center for Spoken Language Understanding (CSLU) Speaker Recognition corpus, consisted of telephone speech from 91 different speakers.  相似文献   

13.
This paper presents the feature analysis and design of compensators for speaker recognition under stressed speech conditions. Any condition that causes a speaker to vary his or her speech production from normal or neutral condition is called stressed speech condition. Stressed speech is induced by emotion, high workload, sleep deprivation, frustration and environmental noise. In stressed condition, the characteristics of speech signal are different from that of normal or neutral condition. Due to changes in speech signal characteristics, performance of the speaker recognition system may degrade under stressed speech conditions. Firstly, six speech features (mel-frequency cepstral coefficients (MFCC), linear prediction (LP) coefficients, linear prediction cepstral coefficients (LPCC), reflection coefficients (RC), arc-sin reflection coefficients (ARC) and log-area ratios (LAR)), which are widely used for speaker recognition, are analyzed for evaluation of their characteristics under stressed condition. Secondly, Vector Quantization (VQ) classifier and Gaussian Mixture Model (GMM) are used to evaluate speaker recognition results with different speech features. This analysis help select the best feature set for speaker recognition under stressed condition. Finally, four VQ based novel compensation techniques are proposed and evaluated for improvement of speaker recognition under stressed condition. The compensation techniques are speaker and stressed information based compensation (SSIC), compensation by removal of stressed vectors (CRSV), cepstral mean normalization (CMN) and combination of MFCC and sinusoidal amplitude (CMSA) features. Speech data from SUSAS database corresponding to four different stressed conditions, Angry, Lombard, Question and Neutral, are used for analysis of speaker recognition under stressed condition.  相似文献   

14.
基于方差归一化失真测度的改进的LBG算法   总被引:2,自引:1,他引:2  
矢量量化(VQ)技术在话者识别系统中得到了广泛的应用。 VQ码本的产生通常采用 LBG算法,失真测度则为对矢量的各分量等权重的欧氏距离。在话者识别系统中特征矢量的各个分量的分布是有差别的,且对于不同的话者,这种差别的程度又是不一样的。由于不同分布的各维参数对话者识别的有效性各不相同,因此,文章提出了一种能反映这种有效性差别的失真测度,即:方差归一化失真测度。以该失真测度为基础,并结合时序相关的初始码本设计方法及有效的零胞腔处理技术,文章提出了改进的LBG算法,同时利用该算法训练出改进的VQ话者模型,并进行了话者识别实验。  相似文献   

15.
陈迪  龚卫国  杨利平 《计算机应用》2007,27(5):1217-1219
提出了一种可用于改善说话人识别效果的基于基音周期的可变窗长语音MFCC参数提取方法。基本原理是将原始的语音分解为当前基音周期整数倍长度以内部分及其以外部分,并保留前者舍去后者,以减小训练语音与测试语音的频谱失真。通过文本无关的说话人确认实验,验证了该方法能有效提高说话人确认的识别率,并能提高短时语音的稳定性。  相似文献   

16.
A vector quantization based talker recognition system is described and evaluated. The system is based on constructing highly efficient short-term spectral representations of individual talkers using vector quantization codebook construction techniques. Although the approach is intrinsically text-independent, the system can be easily extended to text-dependent operation for improved performance and security by encoding specified training word utterances to form word prototypes. The system was evaluated using a 100-talker database of 20 000 digits spoken in isolation. In a talker verification mode, average equal-error rate performance of 2·2% for text-independent operation and 0·3% for text-dependent operation was obtained for 7-digit long test utterances. Because the evaluation database was restricted to the vocabulary of spoken digits, the text-independent operation of the system has not been formally tested beyond the confines of that vocabulary.  相似文献   

17.
18.
Local features for any pattern recognition system are based on the information extracted locally. In this paper, a local feature extraction technique was developed. This feature was extracted in the time–frequency plain by taking the moving average on the diagonal directions of the time–frequency plane. This feature captured the time–frequency events producing a unique pattern for each speaker that can be viewed as a voice print of the speaker. Hence, we referred to this technique as voice print-based local feature. The proposed feature was compared to other features including mel-frequency cepstral coefficient (MFCC) for speaker recognition using two different databases. One of the databases used in the comparison is a subset of an LDC database that consisted of two short sentences uttered by 182 speakers. The proposed feature attained 98.35% recognition rate compared to 96.7% for MFCC using the LDC subset.  相似文献   

19.
基于模型距离和支持向量机的说话人确认   总被引:1,自引:0,他引:1  
针对采用支持向量机的说话人的确认问题,提出采用背景模型、说话人模型、测试语句模型间距离和夹角作为支持向量机的特征矢量,同时将组特征矢量与广义线性判别式序列核函数的参数相拼接,能够取得相对于基线的混合高斯模型算法更高的识别率.在2004年NIST评测数据库上,采用推荐算法的系统等错误率比基线的混合高斯-背景模型系统低16%.对说话人识别取得一定进展.  相似文献   

20.
以线性预测系数为特征通过高斯混合模型的迭代算法对训练样本的初始k均值聚类结果进行优化,得到语音组成单位的表示.以语音组成单位的模式匹配为基础,提出一种文本无关说话人确认的方法——均值法,以及一种文本无关说话人辨认方法.实验结果表明,即使在短时语音下本文方法都能取得较好效果.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号