Similar literature
20 similar documents found.
1.
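To address the large performance drop in speaker verification systems caused by a mismatch in speech type (Mandarin versus Sichuan dialect) between training and testing, a new method for building Gaussian mixture models (GMMs) is proposed: Mandarin and Sichuan dialect data are mixed in a fixed proportion to train a joint Mandarin-Sichuan GMM. A mixing proportion is found that brings the performance degradation rate caused by the Mandarin/Sichuan mismatch down to a very low level (2.79%). Experimental results show that the method effectively strengthens robustness to language variation at test time and reduces the performance degradation caused by the Mandarin/Sichuan mismatch between training and testing.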

2.
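This work studies the effect of speech coding on text-independent speaker verification under several low-bit-rate channel conditions. Two methods are proposed to improve system robustness, one for the case where training and test speech are matched and one for the mismatched case. For the matched case, the effect of speech coding on LPCC parameters is analyzed and a weighted LPCC parameter based on coding distortion is proposed. For the mismatched case, a speech-coding detector based on Gaussian mixture models (GMMs) identifies the coding type of the test speech so that the corresponding speaker verification model can be selected. Experimental results show that both methods improve the robustness of the speaker verification system under multi-channel conditions.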

3.
Speaker verification is a challenging problem in speaker recognition where the objective is to determine whether a segment of speech in fact comes from a specific individual. In supervised machine learning terms this is a challenging problem because, while examples belonging to the target class are easy to gather, the set of counter-examples is completely open. This makes it difficult to cast the task as a supervised classification problem, since it is difficult to construct a representative set of counter-examples. So we cast it as a one-class classification problem and evaluate a variety of state-of-the-art one-class classification techniques on a benchmark speech recognition dataset. We construct this as a two-level classification process whereby, at the lower level, speech segments of 20 ms in length are classified, and then a decision on a complete speech sample is made by aggregating these component classifications. We show that, of the one-class classification techniques we evaluate, Gaussian Mixture Models show the best performance on this task.
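
The two-level decision described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the frame-level classifications are aggregated, so the majority-style vote below, the function name `verify_utterance`, and both thresholds are assumptions made purely for illustration.

```python
def verify_utterance(frame_scores, frame_threshold=0.0, accept_fraction=0.5):
    """Aggregate per-frame one-class scores into one utterance-level decision.

    frame_scores: one score per 20 ms frame (higher = more target-like).
    frame_threshold and accept_fraction are hypothetical tuning values.
    """
    if not frame_scores:
        raise ValueError("need at least one frame score")
    # Lower level: classify each frame independently.
    votes = [score > frame_threshold for score in frame_scores]
    # Upper level: accept the utterance if enough frames were accepted.
    return sum(votes) / len(votes) >= accept_fraction
```

Other aggregation rules, such as averaging the raw frame scores before thresholding, would fit the same two-level structure.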

4.
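A new method for determining the parameters of the Gaussian kernel model (total citations: 1; self-citations: 0; citations by others: 1)
张翔 (Zhang Xiang), 肖小玲 (Xiao Xiaoling), 徐光祐 (Xu Guangyou). 《计算机工程》 (Computer Engineering), 2007, 33(12): 52-53,5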
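Choosing the kernel function and its parameters is very important for support vector machines. This paper proposes an effective method for computing the Gaussian kernel parameter from the distances between support vectors. The method exploits two facts: the optimal decision function of an SVM depends only on the support vectors, and the support vectors are the centers of the Gaussian kernels. Experimental results show that the method reflects the essential characteristics of image features well and solves the problem that the Gaussian kernel parameter is hard to determine in practice.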

5.
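A text-independent speaker verification system based on prosodic features and SVMs is proposed. Wavelet analysis is used to extract suprasegmental prosodic features from the MFCC, F0, and energy trajectories of the speech signal. Experiments examine the best complementary feature-level fusion of the three kinds of prosodic features, yielding the prosodic feature PMFCCFE. GMM mean supervectors of the prosodic features are then used to train the target speaker's SVM model, so as to distinguish target speakers from impostors more effectively. Experiments on the NIST06 8side-1side database show that, relative to a baseline GMM-UBM system using short-time cepstral parameters, the GMM-SVM system with suprasegmental prosodic features reduces EER by 57.9% and MinDCF by 41.4%.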

6.
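Starting from the background model used in speaker recognition, a speaker feature space is constructed from the model's Gaussian components. Utterances of different lengths are mapped to vectors of equal size in this space, normalized by an associated matrix, and then classified with a linear support vector machine. Drawing on several common feature normalization schemes and combining them with the mapped utterance vectors, four normalization methods are proposed: mean/variance normalization, weight normalization, WLOG normalization, and spherical normalization. These are compared with the probabilistic sequence kernel. Based on the transition relationship between adjacent feature vectors in the speech feature sequence, and combined with the proposed probabilistic sequence kernel, a transition probabilistic sequence kernel is constructed. Experiments on the NIST2001 corpus show that the probabilistic sequence kernel model performs close to the classical UBM-MAP model, that fusing the scores of the two models improves recognition performance markedly, and that further fusing the transition probabilistic sequence kernel improves performance by another 19.1%.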

7.
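Gaussian sequence kernel support vector machines for speaker recognition (total citations: 3; self-citations: 1; citations by others: 2)
Speaker recognition has important theoretical value and far-reaching practical significance. Building on the theory of kernel methods for support vector machines, this work combines them with the traditional Gaussian mixture model (GMM) to construct a support vector machine (SVM) based on a Gaussian sequence kernel. The flexibility and strong classification ability of SVMs come mainly from the freedom to choose a kernel function suited to the problem at hand. During recognition, the feature-space normalization technique NAP (Nuisance Attribute Projection) is introduced to compensate for feature differences of the same speaker caused by different channels and environments. Experiments on the 2004 evaluation data set of the US National Institute of Standards and Technology (NIST) show that the method substantially improves the recognition rate.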

8.
In this paper, a new text-independent speaker verification method, GSMSV, is proposed based on likelihood score normalization. In this method, a global speaker model is established to represent the universal features of speech and to normalize the likelihood score. Statistical analysis demonstrates that this normalization can remove factors common to all speech and bring the differences between speakers into prominence. As a result, the equal error rate is decreased significantly, the verification procedure is accelerated, and the system's adaptability to speaking rate is improved.
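
As a rough illustration of likelihood score normalization against a global model, the sketch below scores an utterance by the average per-frame log-likelihood ratio between a claimed-speaker model and a global model. The abstract does not give the exact GSMSV formula, so the ratio form is an assumption; single one-dimensional Gaussians stand in for the real speaker models, and all names and parameter values are hypothetical.

```python
import math

def gaussian_loglik(x, mean, var):
    """Log density of a 1-D Gaussian, used here as a stand-in for a full model."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def normalized_score(frames, speaker_model, global_model):
    """Average log-likelihood ratio: claimed speaker versus global model.

    Each model is a (mean, variance) pair; this form is an assumption,
    not the formula from the paper.
    """
    total = 0.0
    for x in frames:
        total += gaussian_loglik(x, *speaker_model) - gaussian_loglik(x, *global_model)
    return total / len(frames)

# Hypothetical models: the global model is broad, the speaker model is narrow.
speaker = (1.0, 0.1)
global_model = (0.0, 1.0)
target_score = normalized_score([1.0, 1.1, 0.9], speaker, global_model)
impostor_score = normalized_score([-1.0, -1.2], speaker, global_model)
```

Subtracting the global-model term removes whatever likelihood is shared by all speech, so the remaining score reflects speaker-specific evidence; dividing by the number of frames keeps scores comparable across utterance lengths.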

9.
The performance of automatic speaker verification (ASV) systems degrades as the amount of speech used for enrollment and verification is reduced. Combining multiple systems based on different features and classifiers considerably reduces the speaker verification error rate with short utterances. This work attempts to incorporate supplementary information during the system combination process. We use the quality of the estimated model parameters as supplementary information. We introduce a class of novel quality measures formulated using the zero-order sufficient statistics used during the i-vector extraction process. We have used the proposed quality measures as side information for combining ASV systems based on Gaussian mixture model–universal background model (GMM–UBM) and i-vector. The proposed methods demonstrate considerable improvement in speaker recognition performance on NIST SRE corpora, especially in short-duration conditions. We have also observed improvement over existing systems based on different duration-based quality measures.

10.
This work is about intra-sentence segmentation performed before syntactic analysis of long sentences composed of at least 20 words in an English–Korean machine translation system. A long sentence is known to consume enormous computational time and space when it is analyzed syntactically. It can also produce poor translation results. To resolve this problem, we partitioned a long sentence into a few segments and analyzed each segment separately. To partition the sentence, we first tried to find candidates for each segment position in the sentence. We then generated input vectors representing the lexical contexts of the corresponding candidates and used the support vector machine (SVM) algorithm to learn and recognize the appropriate segment positions. We used three kernel functions, the linear kernel, the polynomial kernel, and the Gaussian kernel, to find optimal hyperplanes classifying proper positions, and we compared the results obtained from each kernel function. In the experiments, we obtained f-measure values of 0.81, 0.83, and 0.79 from the linear, polynomial, and Gaussian kernels, respectively.

11.
We propose a new method for general Gaussian kernel hyperparameter optimization for support vector machine classification. The hyperparameters are constrained to lie on a differentiable manifold. The proposed optimization technique is based on a gradient-like descent algorithm adapted to the geometrical structure of the manifold of symmetric positive-definite matrices. We compare the performance of our approach with the classical support vector machine for classification and with other state-of-the-art methods on toy data and on real-world data sets.

12.
In the i-vector/probabilistic linear discriminant analysis (PLDA) technique, the PLDA backend classifier is modelled on i-vectors. PLDA defines an i-vector subspace that compensates for unwanted variability and helps to discriminate among speaker-phrase pairs. The channel or session variability manifested in i-vectors is known to be nonlinear in nature. PLDA training, however, assumes the variability to be linearly separable, thereby causing loss of important discriminating information. Besides, the i-vector estimation itself is known to be poor in the case of short utterances. This paper attempts to address these issues using a simple hierarchy-based system. A modified fuzzy-clustering technique is employed to divide the feature space into more characteristic feature subspaces using vocal source features. Thereafter, a separate i-vector/PLDA model is trained for each of the subspaces. The sparser alignment owing to the subspace-specific universal background model and the relatively reduced dimensions of variability in individual subspaces help to train more effective i-vector/PLDA models. Also, vocal source features are complementary to mel frequency cepstral coefficients, which are transformed into i-vectors using the mixture model technique. As a consequence, vocal source features and i-vectors tend to carry complementary information. Thus, using vocal source features for classification in a hierarchy tree may help to differentiate some of the speaker-phrase classes that are otherwise not easily discriminable based on i-vectors. The proposed technique has been validated on Part 1 of the RSR2015 database, and it shows a relative equal error rate reduction of up to 37.41% with respect to the baseline i-vector/PLDA system.

13.
Pattern Recognition, 2003, 36(2): 347-359
Speaker verification and utterance verification are examples of techniques that can be used for speaker authentication purposes. Speaker verification consists of accepting or rejecting the claimed identity of a speaker by processing samples of his/her voice. Usually, these systems are based on HMM models that try to represent the characteristics of the speakers' vocal tracts. Utterance verification systems make use of a set of speaker-independent speech models to recognize a certain utterance. If the utterances consist of passwords, this can be used for identity verification purposes. Up to now, both techniques have been used separately. This paper is focused on the problem of how to combine these two sources of information. New architectures are presented to join an utterance verification system and a speaker verification system in order to improve the performance in a speaker verification task.

14.
Clustering is needed in various applications such as biometric person authentication, speech coding and recognition, image compression and information retrieval. Hundreds of clustering methods have been proposed for the task in various fields but, surprisingly, there are few extensive studies actually comparing them. An important question is how much the choice of a clustering method matters for the final pattern recognition application. Our goal is to provide a thorough experimental comparison of clustering methods for text-independent speaker verification. We consider parametric Gaussian mixture model (GMM) and non-parametric vector quantization (VQ) model using the best known clustering algorithms including iterative (K-means, random swap, expectation-maximization), hierarchical (pairwise nearest neighbor, split, split-and-merge), evolutionary (genetic algorithm), neural (self-organizing map) and fuzzy (fuzzy C-means) approaches. We study recognition accuracy, processing time, clustering validity, and correlation of clustering quality and recognition accuracy. Experiments from these complementary observations indicate clustering is not a critical task in speaker recognition and the choice of the algorithm should be based on computational complexity and simplicity of the implementation. This is mainly because of three reasons: the data is not clustered, large models are used and only the best algorithms are considered. For low-order models, choice of the algorithm, however, can have a significant effect.

15.
This paper describes a speaker verification system based on multi-resolution classifiers, designed to cope with performance degradation due to natural variations of the excitation source and of the vocal tract. The different resolution representations of the speaker are obtained by considering multiple frame lengths in the feature extraction process, and from these representations a single Pseudo-Multi Parallel Branch (P-MPB) Hidden Markov Model is obtained. In the verification process, different resolution representations of the speech signal are classified by multiple P-MPB systems, and the final decision is obtained by means of different combination techniques. The system based on the Weighted Majority Vote technique considerably outperforms baseline systems: improvements are between 15% and 38%. The execution time of the verification process is also evaluated and proves to be very acceptable, allowing the approach to be used in real-time systems.

16.
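To address the problem of low speech recognition rates, a hierarchical speaker verification method based on PCS-PCA and support vector machines is proposed. Principal component analysis is first used to reduce the dimensionality of the speaker feature vectors, which also yields the principal component space of those vectors; a PCS-PCA classifier constructed in this space screens for likely target speakers. A support vector machine then performs the final speaker verification. Simulation results show that the method achieves a high recognition rate and fast training.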

17.
The cascading appearance-based (CAB) feature extraction technique has established itself as the state of the art in extracting dynamic visual speech features for speech recognition. In this paper, we focus on investigating the effectiveness of this technique for the related speaker verification application. By investigating the speaker verification ability of each stage of the cascade, we demonstrate that the same steps taken to reduce static speaker and environmental information for the visual speech recognition application also provide similar improvements for visual speaker recognition. A further study compares synchronous HMM (SHMM) based fusion of CAB visual features and traditional perceptual linear predictive (PLP) acoustic features, and shows that the higher complexity inherent in the SHMM approach does not appear to provide any improvement in the final audio–visual speaker verification system over simpler utterance-level score fusion.

18.
Estimating the length of surgical cases is an important research topic due to its significant effect on the accuracy of the surgical schedule and operating room (OR) efficiency. Several factors can be considered in the estimation, for example, surgeon, surgeon experience, case type, case start time, etc. Some of these factors are correlated, and this correlation needs to be considered in the prediction model in order to have an accurate estimation. Extensive research exists that identifies the preferred estimation methods for cases that occur frequently. However, in practice, there are many procedure types with limited historical data, which makes it hard to use common statistical methods (such as regression) that rely on a large number of data points. Moreover, only point estimates are typically provided. In this research, kernel density estimation (KDE) is implemented as an estimator for the probability distribution of surgery duration, and a comparison against lognormal and Gaussian mixture models is reported, showing the efficiency of the KDE. In addition, an improvement procedure for the KDE that further enables the algorithm to outperform other methods is proposed. Based on the analysis, KDE can be recommended as an alternative estimator of surgical duration for cases with low volume (or limited historical data).
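
For readers unfamiliar with KDE, the following is a minimal sketch of a plain Gaussian-kernel density estimate built from a handful of observations. It is not the improved procedure the abstract refers to, which is not described here; the sample durations and the fixed bandwidth are hypothetical values chosen only for illustration.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a density function estimated from samples with a Gaussian kernel."""
    if not samples or bandwidth <= 0:
        raise ValueError("need at least one sample and a positive bandwidth")
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum one Gaussian bump per observation, centered on that observation.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Hypothetical surgery durations in minutes for a low-volume procedure type.
durations = [60.0, 75.0, 90.0, 120.0]
density = gaussian_kde(durations, bandwidth=15.0)
```

Because the estimate is a full density rather than a single number, it can be integrated over an interval to give the probability that a case finishes within a given time window, which is what a point estimate cannot provide.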

19.
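Speaker verification based on GMM statistical characteristic parameters and SVM (total citations: 1; self-citations: 0; citations by others: 1)
For text-independent speaker verification with large amounts of training data, this paper proposes a text-independent speaker verification system based on GMM statistical characteristic parameters and support vector machines. The speaker's GMM statistical characteristic parameters serve as the feature parameters for training the target speaker's SVM model. This extracts speaker characteristics effectively, solves the problem of training SVMs on large data sets, and combines the robustness of statistical models with the discriminative power of discriminative models, improving both the verification performance and the robustness of the system. Experiments on a Microsoft microphone speech database and the NIST'01 cellular telephone speech database demonstrate the effectiveness of the method.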

20.
Wavelet theory has had a profound impact on signal processing, as it offers a rigorous mathematical framework for the treatment of multiresolution problems. The combination of soft computing and wavelet theory has led to a number of new techniques. On the other hand, as a new generation of learning algorithms, support vector regression (SVR) was developed by Vapnik et al., in which the ε-insensitive loss function was defined as a trade-off between the robust loss function of Huber and one that enables sparsity within the SVs. The use of support vector kernel expansion also provides a potential avenue to represent nonlinear dynamical systems and underpin advanced analysis. However, for support vector regression with the standard quadratic programming technique, the implementation is computationally expensive and sufficient model sparsity cannot be guaranteed. In this article, from the perspective of model sparsity, linear programming support vector regression (LP-SVR) with a wavelet kernel is proposed, and the connection between LP-SVR with a wavelet kernel and wavelet networks is analyzed. In particular, the potential of LP-SVR for nonlinear dynamical system identification is investigated.
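
The ε-insensitive loss mentioned in this abstract has a standard definition: errors smaller than ε cost nothing, and larger errors are penalized linearly beyond that margin. A minimal sketch follows; the function name and the default value of `epsilon` are illustrative choices, not taken from the article.

```python
def eps_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Standard epsilon-insensitive loss: max(0, |y_true - y_pred| - epsilon)."""
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    return max(0.0, abs(y_true - y_pred) - epsilon)
```

Training points whose residual falls inside the ε-tube contribute zero loss and therefore do not become support vectors, which is the source of the sparsity the abstract refers to.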
