Similar Documents
20 similar documents found.
1.
In speaker recognition tasks, one cause of reduced accuracy is the presence of closely resembling speakers in the acoustic space. To increase the discriminative power of the classifier, the system must rely on the features of a given speaker that are unique with respect to his/her acoustically resembling speakers. This paper proposes a technique to reduce such confusion errors in i-vector based speaker verification by finding speaker-specific phonemes and formulating a text from the subset of phonemes that are unique to each speaker. Spectral features such as linear prediction cepstral coefficients (LPCC) and perceptual linear prediction coefficients (PLP), as well as a phase feature, the modified group delay, are evaluated to analyse the importance of speaker-specific text in the verification task. Experiments were conducted on speech data from 50 speakers collected in a laboratory environment. They show that the equal error rate (EER) decreases significantly for the i-vector approach with speaker-specific text compared to the i-vector approach with random text, across the different spectral and phase based features.
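The abstract does not give scoring details; a common back end for i-vector verification is cosine similarity against a threshold. Below is a minimal sketch under that assumption, with random 400-dimensional vectors standing in for real i-vectors and a hypothetical threshold value.

```python
import numpy as np

def cosine_score(enroll_ivec: np.ndarray, test_ivec: np.ndarray) -> float:
    """Cosine similarity between an enrollment and a test i-vector."""
    num = float(np.dot(enroll_ivec, test_ivec))
    den = np.linalg.norm(enroll_ivec) * np.linalg.norm(test_ivec) + 1e-12
    return num / den

def verify(enroll_ivec, test_ivec, threshold=0.5):
    """Accept the claimed identity if the cosine score exceeds the threshold."""
    return cosine_score(enroll_ivec, test_ivec) >= threshold

# Toy usage with random i-vectors (placeholders for vectors extracted from speech).
rng = np.random.default_rng(0)
enroll, test = rng.standard_normal(400), rng.standard_normal(400)
print(cosine_score(enroll, test), verify(enroll, test))
```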

2.
Exploiting the capabilities offered by the plethora of existing wavelets, together with the powerful set of orthonormal bases provided by wavelet packets, we construct a novel wavelet packet-based set of speech features that is optimized for the task of speaker verification. Our approach differs from previous wavelet-based work, primarily in the wavelet-packet tree design that follows the concept of critical bands, as well as in the particular wavelet basis function that has been used. In comparative experiments, we investigate several alternative speech parameterizations with respect to their usefulness for differentiating among human voices. The experimental results confirm that the proposed speech features outperform Mel-Frequency Cepstral Coefficients (MFCC) and previously used wavelet features on the task of speaker verification. A relative reduction of the equal error rate by 15%, 15% and 8% was observed for the proposed speech features, when compared to the wavelet packet features introduced by Farooq and Datta, the MFCC of Slaney, and the subband based cepstral coefficients of Sarikaya et al., respectively.
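The paper's exact critical-band tree and wavelet choice are not given in the abstract. The sketch below uses a plain full wavelet packet decomposition (PyWavelets) as a stand-in: log subband energies per frame, decorrelated with a DCT into cepstral-like features. The wavelet name, depth and number of kept coefficients are assumed values.

```python
import numpy as np
import pywt
from scipy.fftpack import dct

def wavelet_packet_cepstra(frame, wavelet="db6", level=5, n_ceps=13):
    """Log subband energies from a full wavelet packet decomposition,
    decorrelated with a DCT to obtain cepstral-like features."""
    wp = pywt.WaveletPacket(data=frame, wavelet=wavelet, mode="symmetric", maxlevel=level)
    leaves = wp.get_level(level, order="freq")           # frequency-ordered subbands
    energies = np.array([np.sum(np.square(n.data)) for n in leaves])
    log_e = np.log(energies + 1e-10)
    return dct(log_e, type=2, norm="ortho")[:n_ceps]

# Toy usage on a 25 ms frame of 16 kHz speech (random placeholder samples).
frame = np.random.randn(400)
print(wavelet_packet_cepstra(frame).shape)
```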

3.
4.
A text-independent speaker verification system based on prosodic features and SVM is proposed. Wavelet analysis is used to extract suprasegmental prosodic features from the MFCC, F0 and energy trajectories of the speech signal, and experiments study the optimal complementary fusion of the three prosodic feature streams at the feature level, yielding the prosodic feature PMFCCFE. The GMM mean supervector of the prosodic features is then used to train an SVM model for the target speaker, so as to separate target speakers from impostors more effectively. Experiments on the NIST06 8side-1side database show that, relative to a baseline GMM-UBM system built on short-time cepstral parameters, the GMM-SVM system using suprasegmental prosodic features reduces the EER by 57.9% relative and the MinDCF by 41.4% relative.
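A minimal sketch of the GMM-mean-supervector plus SVM back end, assuming synthetic placeholder features: a UBM is trained on background data, each utterance's GMM means (refit for a few EM iterations from the UBM, standing in for MAP mean adaptation) are stacked into a supervector, and an SVM separates target from impostor supervectors.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

def gmm_mean_supervector(features, ubm: GaussianMixture):
    """Refit only a few EM iterations starting from the UBM (a stand-in for MAP
    mean adaptation) and stack the component means into one supervector."""
    gmm = GaussianMixture(n_components=ubm.n_components, covariance_type="diag",
                          weights_init=ubm.weights_, means_init=ubm.means_,
                          precisions_init=1.0 / ubm.covariances_, max_iter=5)
    gmm.fit(features)
    return gmm.means_.ravel()

rng = np.random.default_rng(1)
background = rng.standard_normal((2000, 13))           # placeholder prosodic features
ubm = GaussianMixture(n_components=8, covariance_type="diag").fit(background)

target_sv = [gmm_mean_supervector(rng.standard_normal((200, 13)), ubm) for _ in range(3)]
imp_sv = [gmm_mean_supervector(rng.standard_normal((200, 13)), ubm) for _ in range(10)]
X = np.vstack(target_sv + imp_sv)
y = np.array([1] * len(target_sv) + [0] * len(imp_sv))
svm = SVC(kernel="linear").fit(X, y)
print(svm.decision_function(X[:1]))                    # verification score for one utterance
```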

5.
This correspondence introduces a new text-independent speaker verification method, derived from the basic pattern recognition idea that the discriminating ability of a classifier can be improved by removing the information that is common between classes. By looking for the speech characteristics shared among a group of speakers, a global speaker model can be established. Subtracting the score obtained from this model normalizes the conventional likelihood score, which results in a more compact score distribution and lower equal error rates. Several experiments are carried out to demonstrate the effectiveness of the proposed method.
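The normalization described here amounts to a log-likelihood ratio against a global model. A small sketch, assuming GMMs for both models and synthetic placeholder features:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def normalized_score(X, target_gmm, global_gmm):
    """Average log-likelihood under the target model minus the score of the global
    model built from speech shared by a group of speakers:
    Lambda(X) = log p(X | target) - log p(X | global)."""
    return float(target_gmm.score(X) - global_gmm.score(X))

# Toy usage: the global model is trained on pooled data from many speakers.
rng = np.random.default_rng(7)
pooled = rng.standard_normal((3000, 13))
target_data = rng.standard_normal((400, 13)) + 0.2
global_gmm = GaussianMixture(n_components=16, covariance_type="diag").fit(pooled)
target_gmm = GaussianMixture(n_components=16, covariance_type="diag").fit(target_data)
print(normalized_score(rng.standard_normal((200, 13)) + 0.2, target_gmm, global_gmm))
```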

6.
In this paper, we propose a sub-vector based speaker characterization method for biometric speaker verification, in which speakers are represented by uniform segments of their maximum likelihood linear regression (MLLR) super-vectors, called m-vectors. The MLLR transformation is estimated with respect to a universal background model (UBM) without any speech/phonetic information. We introduce two strategies for segmenting the MLLR super-vector: a disjoint and an overlapped window technique. During the test phase, the m-vectors of the test utterance are scored against the claimant speaker. Before scoring, the m-vectors are post-processed to compensate for session variability. In addition, we propose a clustering algorithm for multiple class-wise MLLR transformations, where the Gaussian components of the UBM are clustered into different groups using expectation maximization (EM) and maximum likelihood (ML). In this case, an MLLR transformation is estimated for each class using the sufficient statistics accumulated from the Gaussian components belonging to that class, and the transformations are then used for the m-vector system. The proposed method needs only a single alignment of the data with respect to the UBM for multiple MLLR transformations. We first show that the proposed multi-class m-vector system gives promising speaker verification performance compared to the conventional i-vector based system. Secondly, the proposed EM based clustering technique is robust to random initialization, in contrast to the conventional K-means algorithm, and yields system performance better than or equal to the best obtained with K-means. Finally, we show that fusing the m-vector with the i-vector further improves speaker verification performance in both the score and the feature domain. Experimental results are reported on various tasks of the NIST 2008 speaker recognition evaluation (SRE) core condition.
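A minimal sketch of the two segmentation strategies on a toy super-vector. The half-width shift used for the overlapped windows is an assumption; the abstract does not specify the overlap.

```python
import numpy as np

def m_vectors(super_vector: np.ndarray, width: int, overlapped: bool = False):
    """Cut an MLLR super-vector into uniform m-vectors.
    Disjoint: consecutive non-overlapping windows.
    Overlapped: windows shifted by half the window width (assumed shift)."""
    shift = width // 2 if overlapped else width
    return [super_vector[i:i + width]
            for i in range(0, len(super_vector) - width + 1, shift)]

sv = np.arange(12, dtype=float)                                       # toy 12-dim super-vector
print([v.tolist() for v in m_vectors(sv, width=4)])                   # 3 disjoint m-vectors
print([v.tolist() for v in m_vectors(sv, width=4, overlapped=True)])  # 5 overlapped m-vectors
```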

7.
Speaker verification has been studied widely from different points of view, including accuracy, robustness and real-time operation. Recent studies have turned toward better feature stability and robustness. In this paper we study the effect of nonlinear manifold based dimensionality reduction on feature robustness. Manifold learning is a popular recent approach to nonlinear dimensionality reduction. Algorithms for this task are based on the idea that each data point may be described as a function of only a few parameters, and they attempt to uncover these parameters in order to find a low-dimensional representation of the data. Among the manifold based dimension reduction approaches, we applied the widely used Isometric mapping (Isomap) algorithm. Since in speaker verification the input utterance is compared with the model of the claiming client, a speaker-dependent feature transformation is beneficial when deciding on the identity of the speaker. Our first contribution is therefore to use the Isomap dimension reduction approach in a speaker-dependent context and to compare its performance with two other widely used approaches, namely principal component analysis and factor analysis. The other contribution of our work is to perform the nonlinear transformation in a speaker-dependent framework. We evaluated this approach in a GMM based speaker verification framework using the Tfarsdat telephone speech dataset for different noises and SNRs, and the evaluations have shown reliability and robustness even at low SNRs. The results also show better performance for the proposed Isomap approach compared to the other approaches.
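A sketch of the speaker-dependent idea with scikit-learn: one Isomap embedding is fit per client on that client's enrollment features, and a GMM then models the embedded features. The neighborhood size, target dimension and GMM size are assumed values, and random arrays stand in for real MFCCs.

```python
import numpy as np
from sklearn.manifold import Isomap
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
enroll_feats = rng.standard_normal((500, 39))        # placeholder MFCC+deltas for one client

embedder = Isomap(n_neighbors=10, n_components=12)   # assumed neighborhood size / target dim
embedded = embedder.fit_transform(enroll_feats)
client_gmm = GaussianMixture(n_components=16, covariance_type="diag").fit(embedded)

test_feats = rng.standard_normal((200, 39))
score = client_gmm.score(embedder.transform(test_feats))   # average log-likelihood score
print(score)
```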

8.
In speaker recognition systems, a model combining a deep neural network (DNN), identity vectors (i-vectors) and probabilistic linear discriminant analysis (PLDA) has proven very effective. To further improve the channel compensation performance of the PLDA model, a denoising autoencoder (DAE), a restricted Boltzmann machine (RBM) and their combination (DAE-RBM) are each applied on the channel-compensation side of the PLDA model to reduce the influence of channel information in the speaker i-vector space. Experiments show that, compared with the standard PLDA system, systems based on DAE-PLDA and RBM-PLDA significantly reduce both the equal error rate (EER) and the detection cost function (DCF), and the DAE-RBM-PLDA system that combines the advantages of both improves recognition performance further.
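The abstract gives no architecture details, so the following is only a rough stand-in: a small multilayer regressor (scikit-learn's MLPRegressor) trained in a denoising-autoencoder style to map channel-corrupted i-vectors back to clean ones, whose outputs would then feed a PLDA back end. The dimensions, noise level and layer sizes are all assumed values.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(6)
clean = rng.standard_normal((1000, 200))                # placeholder clean i-vectors
noisy = clean + 0.3 * rng.standard_normal(clean.shape)  # simulated channel corruption

dae = MLPRegressor(hidden_layer_sizes=(128, 64, 128), activation="tanh",
                   max_iter=200, random_state=0)
dae.fit(noisy, clean)                                    # learn to undo the corruption

test_ivec = rng.standard_normal((1, 200))
compensated = dae.predict(test_ivec)                     # channel-compensated i-vector
print(compensated.shape)
```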

9.
In this paper, an intelligent speaker identification system that uses the speech/voice signal is presented. The study combines adaptive feature extraction and classification using optimum wavelet entropy parameter values, which are obtained from Turkish speech/voice signal waveforms measured with an experimental speech set. A genetic wavelet adaptive network based fuzzy inference system (GWANFIS) model is developed. The model consists of three layers: a genetic algorithm, a wavelet layer and an adaptive network based fuzzy inference system (ANFIS). The genetic algorithm layer selects the feature extraction method and obtains the optimum wavelet entropy parameter values; one of eight candidate methods is selected, namely wavelet decomposition alone or wavelet decomposition combined with the short time Fourier transform or with the Born–Jordan, Choi–Williams, Margenau–Hill, Wigner–Ville, Page or Zhao–Atlas–Marks time–frequency representation. The wavelet layer performs optimum feature extraction in the time–frequency domain and is composed of wavelet decomposition and wavelet entropies. The ANFIS approach is used to evaluate the fitness function of the genetic algorithm and to classify speakers. The performance of the developed system has been evaluated using noisy Turkish speech/voice signals. The test results show that the system is effective in detecting real speech signals, with a correct speaker classification rate of about 91%.
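A small sketch of the wavelet entropy part only, assuming a Shannon-entropy definition over relative subband energies of a discrete wavelet decomposition (the paper's exact entropy variant and wavelet are not stated in the abstract).

```python
import numpy as np
import pywt

def wavelet_entropy_features(signal, wavelet="db4", level=5):
    """Relative energy per wavelet subband and the total (Shannon) wavelet entropy."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    energies = np.array([np.sum(np.square(c)) for c in coeffs])
    p = energies / (energies.sum() + 1e-12)          # relative subband energies
    total_entropy = -np.sum(p * np.log(p + 1e-12))   # total wavelet entropy
    return p, total_entropy

sig = np.random.randn(2048)                          # placeholder speech segment
rel_energy, H = wavelet_entropy_features(sig)
print(rel_energy.round(3), H)
```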

10.
《微型机与应用》2016,(11):51-55
In text-independent speaker verification, methods based on the total variability space have become mainstream, and probabilistic linear discriminant analysis (PLDA) has attracted wide attention for its excellent performance. However, the traditional PLDA model does not consider the variability introduced when the durations of the enrollment and test utterances are mismatched, and therefore cannot adequately counteract the performance degradation caused by duration mismatch. This paper proposes a method for estimating this duration variability and incorporates it into the PLDA model, thereby improving the robustness of PLDA to duration differences. Experiments on NIST databases show that the proposed method compensates for duration mismatch well and outperforms the standard PLDA approach.

11.
To address low recognition rates, a hierarchical speaker verification method based on PCS-PCA and support vector machines is proposed. Principal component analysis is first used to reduce the dimensionality of the speaker feature vectors and, at the same time, to obtain the principal component space of the speaker feature vectors; a PCS-PCA classifier is constructed in this space to screen the candidate target speakers, and a support vector machine then performs the final speaker verification. Simulation results show that the method achieves a high recognition rate and fast training.
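The PCS-PCA classifier itself is not specified in the abstract, so the first stage below substitutes a simple nearest-centroid screening in the PCA space, followed by an SVM for the final decision. All data, speaker counts and dimensions are synthetic placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

rng = np.random.default_rng(3)
n_speakers, dim = 20, 60
speaker_vectors = {s: rng.standard_normal((30, dim)) + s * 0.1 for s in range(n_speakers)}

# Stage 1: project into the principal component space and keep the closest candidates.
pca = PCA(n_components=10).fit(np.vstack(list(speaker_vectors.values())))
centroids = {s: pca.transform(v).mean(axis=0) for s, v in speaker_vectors.items()}
test_vec = pca.transform(rng.standard_normal((1, dim)))[0]
candidates = sorted(centroids, key=lambda s: np.linalg.norm(centroids[s] - test_vec))[:5]

# Stage 2: SVM trained for the claimed speaker (here: the top candidate) makes the decision.
claimed = candidates[0]
X = np.vstack([pca.transform(speaker_vectors[claimed])] +
              [pca.transform(speaker_vectors[s]) for s in range(n_speakers) if s != claimed])
y = np.array([1] * 30 + [0] * 30 * (n_speakers - 1))
svm = SVC(kernel="rbf").fit(X, y)
print(claimed, svm.decision_function(test_vec[None, :]))   # final verification score
```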

12.
王明  肖熙 《计算机应用》2007,27(8):2051-2052
A new MFCC extraction method is proposed from the perspective of variable frame length and variable frame rate. The method first constrains both the frame length and the frame shift to integer multiples of the pitch period, i.e. a pitch-synchronous analysis; then, following the principle of variable frame rate algorithms, it removes frames where the speech features change slowly in order to lower the frame rate. Speaker verification experiments on the NIST 1999 speaker recognition evaluation show that the method not only improves system performance but also reduces the frame rate and thus the storage required for feature files.
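A minimal sketch of the frame-dropping step only (the pitch-synchronous framing is omitted): a frame is kept only when it has moved far enough from the last kept frame, so slowly changing regions are thinned out. The distance measure and threshold are assumptions.

```python
import numpy as np

def drop_slow_frames(features: np.ndarray, threshold: float = 0.5):
    """Variable frame rate sketch: keep a frame only if its Euclidean distance
    from the last kept frame exceeds a threshold."""
    kept = [0]
    for t in range(1, len(features)):
        if np.linalg.norm(features[t] - features[kept[-1]]) > threshold:
            kept.append(t)
    return features[kept], kept

mfcc = np.cumsum(np.random.randn(100, 13) * 0.1, axis=0)   # placeholder MFCC trajectory
reduced, kept_idx = drop_slow_frames(mfcc, threshold=0.8)
print(len(mfcc), "->", len(reduced), "frames kept")
```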

13.
To reduce the high dimensionality required for training feature vectors in speaker identification, we propose an efficient GMM based on local PCA with fuzzy clustering. The proposed method first partitions the data space into several disjoint clusters by fuzzy clustering, and then performs PCA using the fuzzy covariance matrix of each cluster. Finally, the GMM for the speaker is obtained from the transformed feature vectors with reduced dimension in each cluster. Compared to the conventional GMM with diagonal covariance matrices, the proposed method is faster and needs less storage while maintaining the same performance.
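A compact sketch of the two stages, assuming the standard fuzzy c-means update rules and a fuzzy covariance matrix per cluster; cluster count, fuzzifier and target dimension are assumed values, and the final GMM training step is omitted.

```python
import numpy as np

def fuzzy_cmeans(X, c=4, m=2.0, iters=50, seed=0):
    """Plain fuzzy c-means: returns membership matrix U (n x c) and centers V (c x d)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        Um = U ** m
        V = (Um.T @ X) / Um.sum(axis=0)[:, None]                 # cluster centers
        D = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2) + 1e-12
        U = 1.0 / np.sum((D[:, :, None] / D[:, None, :]) ** (2.0 / (m - 1)), axis=2)
    return U, V

def local_pca(X, U, V, m=2.0, n_components=2):
    """Per-cluster PCA basis computed from the fuzzy covariance matrix of each cluster."""
    bases = []
    for k in range(V.shape[0]):
        w = (U[:, k] ** m)[:, None]
        diff = X - V[k]
        cov = (w * diff).T @ diff / w.sum()                      # fuzzy covariance
        eigvals, eigvecs = np.linalg.eigh(cov)
        bases.append(eigvecs[:, ::-1][:, :n_components])         # top eigenvectors
    return bases

X = np.random.randn(500, 6)                                      # placeholder features
U, V = fuzzy_cmeans(X)
bases = local_pca(X, U, V)
labels = U.argmax(axis=1)
# Project each point with the basis of its most likely cluster -> reduced features for the GMM.
reduced = np.vstack([(X[i] - V[labels[i]]) @ bases[labels[i]] for i in range(len(X))])
print(reduced.shape)
```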

14.
Multimedia Tools and Applications - Due to the mismatch between training and test conditions, speaker verification in real environments continues to be a challenging problem. An effective way of...

15.
Gaussian Mixture Models (GMM) have been the most popular approach in speaker recognition and verification for over two decades. The inefficiencies of this model for signals such as speech are well documented and include an inability to model temporal dependencies that result from nonlinearities in the speech signal. The resulting models are often complex and overdetermined, which leads to a lack of generalization. In this paper, we present a nonlinear mixture autoregressive model (MixAR) that attempts to directly model nonlinearities in the trajectories of the speech features. We apply this model to the problem of speaker verification. Experiments with synthetic data demonstrate the viability of the model. Evaluations on standard speech databases, including TIMIT, NTIMIT, and NIST-2001, demonstrate that MixAR, using only half the number of parameters and only static features, can achieve a lower equal error rate when compared to GMMs, particularly in the presence of previously unseen noise. Performance as a function of the duration of both the training and evaluation utterances is also analyzed.

16.
In the context of mobile devices, speaker recognition engines may suffer from ergonomic constraints and a limited amount of computing resources. Even though they have proven efficient in classical contexts, GMM/UBM systems show their limitations when the quantity of speech data is restricted. In contrast, the proposed GMM/UBM extension addresses situations characterised by limited enrolment data and only the computing power typically found on modern mobile devices. A key contribution comes from harnessing the temporal structure of speech using client-customised pass-phrases and new Markov model structures. The additional temporal information is then used to enhance discrimination with Viterbi decoding, increasing the gap between client and impostor scores. Experiments on the MyIdea database are presented with a standard GMM/UBM configuration acting as a benchmark. When impostors do not know the client pass-phrase, a relative gain of up to 65% in terms of EER is achieved over the GMM/UBM baseline configuration. The results clearly highlight the potential of this new approach, with a good balance between complexity and recognition accuracy.

17.
高新建  屈丹  李弼程 《计算机应用》2007,27(10):2602-2604
In speaker verification, the score distributions of target speakers and impostors are bimodal, and the score distributions of different target speaker models are inconsistent, which makes it difficult to set a single threshold for all speakers and degrades system performance. Score normalization adjusts the threshold by adjusting the impostor score distribution. This paper briefly reviews the two most commonly used normalization methods, zero normalization (Z-Norm) and test normalization (T-Norm), and then introduces a new D-Norm normalization method based on the KL distance. Combining the advantages of Z-Norm and D-Norm, a new method, ZD-Norm, is proposed, and the performance of the four normalization methods is compared. Experiments show that ZD-Norm improves the performance of the speaker verification system more effectively than Z-Norm and D-Norm.
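The KL-distance-based D-Norm and the combined ZD-Norm are not detailed in the abstract, so the sketch below covers only the two standard methods it reviews: Z-Norm uses impostor-score statistics estimated per target model offline, while T-Norm uses statistics of the test utterance scored against a cohort of impostor models at test time. The synthetic score distributions are placeholders.

```python
import numpy as np

def z_norm(raw_score: float, impostor_scores_for_model: np.ndarray) -> float:
    """Z-Norm: normalize with impostor-score statistics of the target model."""
    mu, sigma = impostor_scores_for_model.mean(), impostor_scores_for_model.std() + 1e-12
    return (raw_score - mu) / sigma

def t_norm(raw_score: float, cohort_scores_for_test: np.ndarray) -> float:
    """T-Norm: normalize with statistics of the test utterance against a cohort of models."""
    mu, sigma = cohort_scores_for_test.mean(), cohort_scores_for_test.std() + 1e-12
    return (raw_score - mu) / sigma

rng = np.random.default_rng(4)
print(z_norm(2.1, rng.normal(0.0, 1.0, 300)), t_norm(2.1, rng.normal(0.3, 0.8, 50)))
```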

18.
Speaker verification techniques neglect the short-time variation in the feature space even though it contains speaker-related attributes. We propose a simple method to capture and characterize this spectral variation through the eigenstructure of the sample covariance matrix. This covariance is computed over a sliding window of spectral features. The newly formulated feature vectors representing local spectral variations are used with classical and state-of-the-art speaker recognition systems. Results on multiple speaker recognition evaluation corpora reveal that eigenvectors weighted with their normalized singular values are useful in representing local covariance information. We also show that local variability features can be extracted using mel frequency cepstral coefficients (MFCCs) as well as three recently developed features: frequency domain linear prediction (FDLP), mean Hilbert envelope coefficients (MHECs) and power-normalized cepstral coefficients (PNCCs). Since the information conveyed by the proposed feature is complementary to standard short-term features, we apply different fusion techniques. We observe considerable relative improvements in speaker verification accuracy in the combined mode on text-independent (NIST SRE) and text-dependent (RSR2015) speech corpora: up to 12.28% relative improvement in speaker recognition accuracy on the text-independent corpora and up to 40% relative reduction in EER on the text-dependent corpora. To sum up, combining local covariance information with traditional cepstral features holds promise as an additional speaker cue in both text-independent and text-dependent recognition.
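A minimal sketch of the feature construction as described: per sliding window, compute the sample covariance, weight its leading eigenvectors by the normalized eigenvalues and stack them. The window length and number of retained eigenvectors are assumed values, and a random matrix stands in for real MFCCs.

```python
import numpy as np

def local_covariance_features(feats: np.ndarray, win: int = 15, n_eig: int = 3):
    """Weighted leading eigenvectors of the sample covariance over a sliding window."""
    out = []
    for t in range(0, len(feats) - win + 1):
        window = feats[t:t + win]
        cov = np.cov(window, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)
        order = np.argsort(eigvals)[::-1][:n_eig]
        weights = eigvals[order] / (eigvals.sum() + 1e-12)       # normalized weights
        out.append((eigvecs[:, order] * weights).T.ravel())      # weighted eigenvectors
    return np.array(out)

mfcc = np.random.randn(200, 13)                                  # placeholder MFCC sequence
lv = local_covariance_features(mfcc)
print(lv.shape)                                                  # (frames, n_eig * 13)
```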

19.
A novel pseudo-outer product based fuzzy neural network (POPFNN-TVR) driven signature verification system, called the antiforgery system, is presented in this paper. As Plamondon and Lorette have stated, the design of a signature verification system generally requires the solution of five types of problems: data acquisition, preprocessing, feature extraction, comparison process, and performance evaluation. Unlike most existing automatic signature verification systems, which employ traditional image processing techniques to solve these problems, the proposed system is constructed on the basis of a novel fuzzy neural network, the POPFNN-TVR. The characteristics of POPFNN-TVR, such as its learning ability, generalization ability and high computational ability, make the antiforgery system particularly powerful when verifying skilled forgeries. To demonstrate the efficacy of POPFNN-TVR and its application in the antiforgery system, several types of experiments have been designed and implemented in this work. The experimental results and analysis are presented at the end of the paper for discussion.

20.
In this paper, an online text-independent speaker verification system developed at IIT Guwahati for remote person authentication under multi-variability conditions is described. The system runs on a voice server accessible via the telephone network using an interactive voice response (IVR) system, so that both enrollment and testing can be done online. The speaker verification system uses Mel-Frequency Cepstral Coefficients (MFCC) for feature extraction and a Gaussian Mixture Model—Universal Background Model (GMM-UBM) for modeling. The performance of the system under multi-variability conditions is evaluated using online enrollments and tests from the subjects. This evaluation helps in understanding, in an online system scenario, the impact of several well known issues in speaker verification, such as the effect of environmental noise, the duration of the test speech and the robustness of the system against playback of recorded speech. These issues need to be taken into account for the development and deployment of speaker verification systems in real-life applications.
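A minimal GMM-UBM scoring sketch, assuming synthetic placeholder MFCCs: a UBM is trained on pooled background speech, the speaker model is a brief refit initialized from the UBM (standing in for MAP adaptation), and the decision uses the average log-likelihood ratio with an assumed threshold.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(5)
background = rng.standard_normal((5000, 13))             # placeholder MFCCs, many speakers
enroll = rng.standard_normal((600, 13)) + 0.3            # placeholder MFCCs, one client
test = rng.standard_normal((300, 13)) + 0.3

ubm = GaussianMixture(n_components=32, covariance_type="diag").fit(background)
spk = GaussianMixture(n_components=32, covariance_type="diag",
                      weights_init=ubm.weights_, means_init=ubm.means_,
                      precisions_init=1.0 / ubm.covariances_, max_iter=5).fit(enroll)

llr = spk.score(test) - ubm.score(test)                  # average log-likelihood ratio
decision = llr > 0.0                                     # threshold is an assumed value
print(llr, decision)
```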
