共查询到20条相似文献,搜索用时 15 毫秒
1.
This paper presents a principled SVM based speaker verification system. We propose a new framework and a new sequence kernel that can make use of any Mercer kernel at the frame level. An extension of the sequence kernel based on the Max operator is also proposed. The new system is compared to state-of-the-art GMM and other SVM based systems found in the literature on the Banca and Polyvar databases. The new system outperforms, most of the time, the other systems, statistically significantly. Finally, the new proposed framework clarifies previous SVM based systems and suggests interesting future research directions. 相似文献
2.
Gaussian Mixture Models (GMM) have been the most popular approach in speaker recognition and verification for over two decades. The inefficiencies of this model for signals such as speech are well documented and include an inability to model temporal dependencies that result from nonlinearities in the speech signal. The resulting models are often complex and overdetermined, which leads to a lack of generalization. In this paper, we present a nonlinear mixture autoregressive model (MixAR) that attempts to directly model nonlinearities in the trajectories of the speech features. We apply this model to the problem of speaker verification. Experiments with synthetic data demonstrate the viability of the model. Evaluations on standard speech databases, including TIMIT, NTIMIT, and NIST-2001, demonstrate that MixAR, using only half the number of parameters and only static features, can achieve a lower equal error rate when compared to GMMs, particularly in the presence of previously unseen noise. Performance as a function of the duration of both the training and evaluation utterances is also analyzed. 相似文献
3.
In this paper an online text-independent speaker verification system developed at IIT Guwahati under multivariability condition for remote person authentication is described. The system is developed on a voice server accessible via telephone network using an interactive voice response (IVR) system in which both enrollment and testing can be done online. The speaker verification system is developed using Mel-Frequency Cepstral Coefficients (MFCC) for feature extraction and Gaussian Mixture Model—Universal Background Model (GMM-UBM) for modeling. The performance of the system under multi-variable condition is evaluated using online enrollments and testing from the subjects. The evaluation of the system helps in understanding the impact of several well known issues related to speaker verification such as the effect of environment noise, duration of test speech, robustness of the system against playing recorded speech etc. in an online system scenario. These issues need to be taken care for the development and deployment of speaker verification system in real life applications. 相似文献
4.
In this paper Type-2 Information Set (T2IS) features and Hanman Transform (HT) features as Higher Order Information Set (HOIS) based features are proposed for the text independent speaker recognition. The speech signals of different speakers represented by Mel Frequency Cepstral Coefficients (MFCC) are converted into T2IS features and HT features by taking account of the cepstral and temporal possibilistic uncertainties. The features are classified by Improved Hanman Classifier (IHC), Support Vector Machine (SVM) and k-Nearest Neighbours (kNN). The performance of the proposed approaches is tested in terms of speed, computational complexity, memory requirement and accuracy on three datasets namely NIST-2003, VoxForge 2014 speech corpus and VCTK speech corpus and compared with that of the baseline features like MFCC, ?MFCC, ??MFCC and GFCC under white Gaussian noisy environment at different signal-to-noise ratios. The proposed features have the reduced feature size, computational time, and complexity and also their performance is not degraded under the noisy environment. 相似文献
5.
Speaker verification (SV) systems involve mainly two individual stages: feature extraction and classification. In this paper, we explore these two modules with the aim of improving the performance of a speaker verification system under noisy conditions. On the one hand, the choice of the most appropriate acoustic features is a crucial factor for performing robust speaker verification. The acoustic parameters used in the proposed system are: Mel Frequency Cepstral Coefficients, their first and second derivatives (Deltas and Delta–Deltas), Bark Frequency Cepstral Coefficients, Perceptual Linear Predictive, and Relative Spectral Transform Perceptual Linear Predictive. In this paper, a complete comparison of different combinations of the previous features is discussed. On the other hand, the major weakness of a conventional support vector machine (SVM) classifier is the use of generic traditional kernel functions to compute the distances among data points. However, the kernel function of an SVM has great influence on its performance. In this work, we propose the combination of two SVM-based classifiers with different kernel functions: linear kernel and Gaussian radial basis function kernel with a logistic regression classifier. The combination is carried out by means of a parallel structure approach, in which different voting rules to take the final decision are considered. Results show that significant improvement in the performance of the SV system is achieved by using the combined features with the combined classifiers either with clean speech or in the presence of noise. Finally, to enhance the system more in noisy environments, the inclusion of the multiband noise removal technique as a preprocessing stage is proposed. 相似文献
6.
To improve the speaker verification system in adverse conditions, a novel score fusion approach using adaptive method, based on a prior Equal Error Rate (EER), is presented in this paper. Currently, the most commonly used methods are the mean, product, minimum, maximum, or the weighted sum of scores. Our method introduces the MLP network which approximates the estimated scores under noisy conditions, to those of the ideal estimated in clean environments and gives the optimally weighted parameters, to be added in the adaptive weights used for weighting sum of scores. This method is assessed by using the NIST 2000 corpus and different feature extraction methods. Noisy conditions are created using NOISEX-92. In severely degraded conditions, the results show that the speaker verification process using our proposed score fusion approach applied to the GMM-UBM and GMM-SVM based systems, achieves better performances in terms of EER reduction than each system used alone. 相似文献
7.
A text-independent speaker recognition system based on multi-resolution singular value decomposition (MSVD) is proposed. The MSVD is applied to the speaker data compression and feature extraction not at the square matrix. Our results have shown that this MSVD introduced better performance than the other Karhunen-Loeve transform with respect to the percentages of recognition. 相似文献
8.
在说话人识别系统中,一种结合深度神经网路(DNN)、身份认证矢量(i-vector)和概率线性鉴别分析(PLDA)的模型被证明十分有效。为进一步提升PLDA模型信道补偿的性能,将降噪自动编码器(DAE)和受限玻尔兹曼机(RBM)以及它们的组合(DAE-RBM)分别应用到信道补偿PLDA模型端,降低说话人i-vector空间信道信息的影响。实验表明相比标准PLDA系统,基于DAE-PLDA和RBM-PLDA的识别系统的等错误率(EER)和检测代价函数(DCF)都显著降低,结合两者优势的DAE-RBMPLDA使系统识别性能得到了进一步提升。 相似文献
9.
An intelligent verification platform based on a structured analysis model is presented.Using an abstract model mechanism with specific signal interfaces for user callback,the unified structured analysis data,shared by the electronic system level design,functional verification,and performance evaluation,enables efficient management review,auto-generation of code,and modeling in the transaction level.We introduce the class tree,flow parameter diagram,structured flow chart,and event-driven finite state machine as structured analysis models.As a sand table to carry maps from different perspectives and levels via an engine,this highly reusable platform provides the mapping topology to search for unintended consequences and the graph theory for comprehensive coverage and smart test cases.Experimental results show that the engine generates efficient test sequences,with a sharp increase in coverage for the same vector count compared with a random test. 相似文献
10.
This work evaluates the performance of speaker verification system based on Wavelet based Fuzzy Learning Vector Quantization (WLVQ) algorithm. The parameters of Gaussian mixture model (GMM) are designed using this proposed algorithm. Mel Frequency Cepstral Coefficients (MFCC) are extracted from the speech data and vector quantized through Wavelet based FLVQ algorithm. This algorithm develops a multi resolution codebook by updating both winning and nonwinning prototypes through an unsupervised learning process. This codebook is used as mean vector of GMM. The other two parameters, weight and covariance are determined from the clusters formed by the WLVQ algorithm. The multi resolution property of wavelet transform and ability of FLVQ in regulating the competition between prototypes during learning are combined in this algorithm to develop an efficient codebook for GMM. Because of iterative nature of Expectation Maximization (EM) algorithm, the applicability of alternative training algorithms is worth investigation. In this work, the performance of speaker verification system using GMM trained by LVQ, FLVQ and WLVQ algorithms are evaluated and compared with EM algorithm. FLVQ and WLVQ based training algorithms for modeling speakers using GMM yields better performance than EM based GMM. 相似文献
11.
In this paper, we propose a sub-vector based speaker characterization method for biometric speaker verification, where speakers are represented by uniform segmentation of their maximum likelihood linear regression (MLLR) super-vectors called m-vectors. The MLLR transformation is estimated with respect to universal background model (UBM) without any speech/phonetic information. We introduce two strategies for segmentation of MLLR super-vector: one is called disjoint and other is an overlapped window technique. During test phase, m-vectors of the test utterance are scored against the claimant speaker. Before scoring, m-vectors are post-processed to compensate the session variability. In addition, we propose a clustering algorithm for multiple-class wise MLLR transformation, where Gaussian components of the UBM are clustered into different groups using the concept of expectation maximization (EM) and maximum likelihood (ML). In this case, MLLR transformations are estimated with respect to each class using the sufficient statistics accumulated from the Gaussian components belonging to the particular class, which are then used for m-vector system. The proposed method needs only once alignment of the data with respect to the UBM for multiple MLLR transformations. We first show that the proposed multi-class m-vector system shows promising speaker verification performance when compared to the conventional i-vector based speaker verification system. Secondly, the proposed EM based clustering technique is robust to the random initialization in-contrast to the conventional K-means algorithm and yields system performance better/equal which is best obtained by the K-means. Finally, we show that the fusion of the m-vector with the i-vector further improves the performance of the speaker verification in both score as well as feature domain. The experimental results are shown on various tasks of NIST 2008 speaker recognition evaluation (SRE) core condition. 相似文献
12.
Clustering is needed in various applications such as biometric person authentication, speech coding and recognition, image compression and information retrieval. Hundreds of clustering methods have been proposed for the task in various fields but, surprisingly, there are few extensive studies actually comparing them. An important question is how much the choice of a clustering method matters for the final pattern recognition application. Our goal is to provide a thorough experimental comparison of clustering methods for text-independent speaker verification. We consider parametric Gaussian mixture model (GMM) and non-parametric vector quantization (VQ) model using the best known clustering algorithms including iterative ( K-means, random swap, expectation-maximization), hierarchical (pairwise nearest neighbor, split, split-and-merge), evolutionary (genetic algorithm), neural (self-organizing map) and fuzzy (fuzzy C-means) approaches. We study recognition accuracy, processing time, clustering validity, and correlation of clustering quality and recognition accuracy. Experiments from these complementary observations indicate clustering is not a critical task in speaker recognition and the choice of the algorithm should be based on computational complexity and simplicity of the implementation. This is mainly because of three reasons: the data is not clustered, large models are used and only the best algorithms are considered. For low-order models, choice of the algorithm, however, can have a significant effect. 相似文献
13.
A new speaker feature extracted from wavelet eigenfunction estimation is described. The signal is decomposed through interpolating the scaling function. Wavelets can offer a significant computational advantage by reducing the dimensionality of the eigenvalue problem. Our results have shown that this wavelet feature introduced better performance than the other Karhunen-Loeve transform (KLT) with respect to the percentages of recognition. 相似文献
14.
Speaker verification has been studied widely from different points of view, including accuracy, robustness and being real-time. Recent studies have turned toward better feature stability and robustness. In this paper we study the effect of nonlinear manifold based dimensionality reduction for feature robustness. Manifold learning is a popular recent approach for nonlinear dimensionality reduction. Algorithms for this task are based on the idea that each data point may be described as a function of only a few parameters. Manifold learning algorithms attempt to uncover these parameters in order to find a low-dimensional representation of the data. From the manifold based dimension reduction approaches, we applied the widely used Isometric mapping (Isomap) algorithm. Since in the problem of speaker verification, the input utterance is compared with the model of the claiming client, a speaker dependent feature transformation would be beneficial for deciding on the identity of the speaker. Therefore, our first contribution is to use Isomap dimension reduction approach in the speaker dependent context and compare its performance with two other widely used approaches, namely principle component analysis and factor analysis. The other contribution of our work is to perform the nonlinear transformation in a speaker-dependent framework. We evaluated this approach in a GMM based speaker verification framework using Tfarsdat Telephone speech dataset for different noises and SNRs and the evaluations have shown reliability and robustness even in low SNRs. The results also show better performance for the proposed Isomap approach compared to the other approaches. 相似文献
15.
针对语音识别率不高的问题,提出一种基于PCS-PCA和支持向量机的分级说话人确认方法.首先采用主成分分析法对话者特征向量降维的同时,得到说话人特征向量的主成份空间,在此空间中构造PCS-PCA分类器,筛选可能的目标说话人,然后采用支持向量机进行最终的说话人确认.仿真实验结果表明该方法具有较高的识别率和较快的训练速度. 相似文献
16.
In speaker recognition tasks, one of the reasons for reduced accuracy is due to closely resembling speakers in the acoustic space. In order to increase the discriminative power of the classifier, the system must be able to use only the unique features of a given speaker with respect to his/her acoustically resembling speaker. This paper proposes a technique to reduce the confusion errors, by finding speaker-specific phonemes and formulate a text using the subset of phonemes that are unique, for speaker verification task using i-vector based approach. In this paper spectral features such as linear prediction cepstral co-efficients (LPCC), perceptual linear prediction co-efficients (PLP) and phase feature such as modified group delay are experimented to analyse the importance of speaker-specific-text in speaker verification task. Experiments have been conducted on speaker verification task using speech data of 50 speakers collected in a laboratory environment. The experiments show that the equal error rate (EER) has been decreased significantly using i-vector approach with speaker-specific-text when compared to i-vector approach with random-text using different spectral and phase based features. 相似文献
17.
Whispered speech speaker identification system is one of the most demanding efforts in automatic speaker recognition applications. Due to the profound variations between neutral and whispered speech in acoustic characteristics, the performance of conventional speaker identification systems applied on neutral speech degrades drastically when compared to whisper speech. This work presents a novel speaker identification system using whispered speech based on an innovative learning algorithm which is named as extreme learning machine (ELM). The features used in this proposed system are Instantaneous frequency with probability density models. Parametric and nonparametric probability density estimation with ELM was compared with the hybrid parametric and nonparametric probability density estimation with Extreme Learning Machine (HPNP-ELM) for instantaneous frequency modeling. The experimental result shows the significant performance improvement of the proposed whisper speech speaker identification system. 相似文献
18.
In this paper, the common vector approach (CVA) is newly used for text-independent speaker recognition. The performance of CVA is compared with those of Fisher’s linear discriminant analysis (FLDA) and Gaussian mixture models (GMM). The recognition rates obtained for the TIMIT database indicate that CVA and GMM are superior to FLDA. However, while the recognition rates obtained from CVA and GMM are identical, CVA enjoys advantages in terms of processing power and memory requirement. In order to obtain better results than those achieved with GMM, a new method which is a combination of CVA and GMM is proposed in this paper. 相似文献
19.
讨论了SRAM型FPGA信号完整性验证的必要性,提出了一种基于IBIS模型和HyperLynx软件的针对SRAM型FPGA器件信号完整性仿真验证方法,并以Stratix-2和Virtex-4两种类型的FPGA为例进行了实际仿真验证,分析了信号的有效数据宽度、电平幅值和传输速率等仿真验证结果,比较了这两种器件的信号完整性优劣,通过该仿真实例也验证了这种FPGA信号完整性仿真验证技术的可行性。随后对模型参数进行了仿真对比,得出造成器件信号完整性差异的内在机理,从而在设计上指导优化器件信号完整性性能。 相似文献
20.
为了提高说话人识别系统的识别效率,提出一种基于说话人模型聚类的说话人识别方法,通过近似KL距离将相似的说话人模型聚类,为每类确定类中心和类代表,构成分级说话人识别模型。测试时先通过计算测试矢量与类中心或类代表之间的距离选择类,再通过计算测试矢量与选中类中的说话人模型之间对数似然度确定目标说话人,这样可以大大减少计算量。实验结果显示,在相同条件下,基于说话人模型聚类的说话人识别的识别速度要比传统的GMM的识别速度快4倍,但是识别正确率只降低了0.95%。因此,与传统GMM相比,基于说话人模型聚类的说话人识别能在保证识别正确率的同时大大提高识别速度。 相似文献
|