Similar literature
1.
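Research on a GMM-based text-independent speaker recognition system   Total citations: 3 (self-citations: 2, citations by others: 1)
For training Gaussian mixture models (GMM), the traditional methods of initializing the model parameters (random initialization and K-means clustering) are improved, and a new method that combines the splitting method with K-means clustering is proposed. Experiments show that the improved method raises the system's average recognition rate by 15.47% and 7.5% over the traditional methods. The influence of the GMM order, the covariance threshold, and the pre-emphasis coefficient on the recognition rate is studied. The experimental results are analyzed in detail and, based on the experimental data, the best-performing value of each parameter is chosen so that the resulting speaker recognition system achieves a high recognition rate. Experiments show that, under the specified experimental conditions, the system achieves a recognition rate above 90%.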

2.
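To explore the role of the Gaussian mixture model in speaker recognition, a GMM-based speaker recognition system is designed. The system consists of four modules: audio signal preprocessing, voice activity detection, speaker model building, and audio signal recognition. The first three modules form the model-training part of the system, and the last module forms the speech recognition part. The innovation of the work is the GMM-based voice activity detector contained in the second module. Audio-visual meetings from the Augmented Multi-party Interaction (AMI) meeting corpus were used to test some of the system's tunable parameters and its recognition error rate. Simulation results show that, with the help of the voice activity detector and several filtering algorithms, the system's recognition accuracy on audio signals containing overlapping speech reaches 83.02%.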

3.
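Improved speaker clustering initialization and GMM-based multi-speaker recognition   Total citations: 2 (self-citations: 1, citations by others: 1)
To address the poor accuracy of the linear initialization method for multi-speaker clustering, an improved cluster initialization method is proposed. The method introduces the Bayesian information criterion (BIC) to check and split the initial clusters produced by linear initialization, which effectively improves the purity of the initial speaker clusters. The method is then applied to a Gaussian mixture model (GMM) multi-speaker recognition system. Experimental results show that the proposed method raises the average cluster purity (ACP) by 48.51% and lowers the system's recognition error rate by 12.09% on average.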

4.
Tensor decompositions are applied in several domains; one key application is revealing relational structure across multiple dimensions simultaneously, which enables the compression of relational data. In this paper, we propose the Discriminative Tensor Decomposition with Large Margin (in short, Large Margin Tensor Decomposition, LMTD), which can be viewed as a tensor-to-tensor projection operation. It is a novel method for calculating the mutual projection matrices that map the tensors into a lower dimensional space such that the nearest neighbor classification accuracy is improved. The LMTD aims to find the mutual discriminative projection matrices that minimize the misclassification rate by minimizing the Frobenius distance between instances of the same class (in-class neighbors) and maximizing the distance between instances of different classes (impostor neighbors). Two versions of LMTD are proposed, in which the nearest neighbor classification error is computed in the feature (latent) space or in the input (observation) space. We evaluate the proposed models on real data sets and compare them with alternative decomposition methods in the literature in terms of classification accuracy and mean average precision.

5.
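Mismatch between the types of speech data (Mandarin and Sichuan dialect) used in training and testing causes a large drop in the performance of speaker verification systems. To address this, a new method of building Gaussian mixture models (GMM) is proposed: Mandarin and Sichuan dialect data are mixed in proportion to build a joint Mandarin and Sichuan dialect GMM, and a mixing ratio is found that reduces the performance degradation caused by the mismatch to a very low level (2.79%). Experimental results show that the method effectively strengthens robustness to language changes in the testing stage and effectively reduces the performance degradation caused by the mismatch between Mandarin and Sichuan dialect in training and testing.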

6.
In this paper, the common vector approach (CVA) is applied to text-independent speaker recognition for the first time. The performance of CVA is compared with that of Fisher’s linear discriminant analysis (FLDA) and Gaussian mixture models (GMM). The recognition rates obtained for the TIMIT database indicate that CVA and GMM are superior to FLDA. However, while the recognition rates obtained from CVA and GMM are identical, CVA has advantages in processing power and memory requirements. To obtain better results than those achieved with GMM, a new method that combines CVA and GMM is proposed in this paper.

7.
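A method for recognizing natural environmental sounds based on the Gaussian mixture model (GMM) is proposed. Mel-frequency cepstral coefficients (MFCCs) are extracted to analyze the sound signal; for each kind of sound, a Gaussian mixture model is built on the MFCC feature set using the expectation-maximization algorithm; recognition is performed with the minimum-error-rate decision rule and voting. Using GMMs to recognize 36 kinds of natural environmental sounds gives an accuracy of up to 95.83%, and the recognition performance is better than that of K-nearest neighbors (KNN).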
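
The decision stage described in this abstract can be illustrated with a minimal sketch. It assumes diagonal covariances, equal class priors (so the minimum-error-rate rule reduces to picking the class with the highest likelihood), and one vote per feature frame; the abstract does not state these details, and the function names and toy model values are hypothetical.

```python
import math
from collections import Counter

def diag_gmm_loglik(x, weights, means, variances):
    """Log-likelihood of one feature vector x under a diagonal-covariance GMM."""
    comp_logs = []
    for w, mu, var in zip(weights, means, variances):
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        comp_logs.append(log_p)
    m = max(comp_logs)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(c - m) for c in comp_logs))

def classify_clip(frames, models):
    """Each frame votes for its most likely class; the majority label wins."""
    votes = Counter()
    for x in frames:
        best = max(models, key=lambda name: diag_gmm_loglik(x, *models[name]))
        votes[best] += 1
    return votes.most_common(1)[0][0]

# Toy models as (weights, means, variances); the values are made up for illustration.
models = {
    "rain": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "wind": ([1.0], [[5.0, 5.0]], [[1.0, 1.0]]),
}
frames = [[0.2, -0.1], [0.9, 1.2], [4.8, 5.1]]  # stand-ins for MFCC vectors
label = classify_clip(frames, models)
```

In practice the per-class models would be fitted with expectation-maximization on real MFCC features rather than written by hand.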

8.
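A language identification system based on discriminative training of GMMs   Total citations: 2 (self-citations: 2, citations by others: 0)
This paper presents a new language identification system based on a discriminative training algorithm for Gaussian mixture models. When estimating the model parameters, the algorithm uses generalized probabilistic descent (GPD) and the minimum classification error (MCE) criterion. The algorithm was tested on the OGI multi-language telephone speech corpus, and the experiments show that it is an effective method for language identification.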

9.
This paper evaluates the impact of three special forms of the Minkowski metric (Euclidean, City Block, and Chebychev distances) on the performance of conventional vector quantization (VQ) and Gaussian mixture model (GMM) based closed-set text-independent speaker recognition systems, in terms of recognition rate and confidence in decisions. For the VQ based system, evaluations are carried out using the two most common clustering algorithms, LBG and K-means, and it is shown which pairing of clustering algorithm and distance should be used to achieve the best recognition rate for a given codebook size. For the GMM based system, we introduce the metrics into the GMM by using a concatenation of the LBG and K-means algorithms to estimate the initial mean vectors, to which the system performance is sensitive, and explore their impact on system performance. We also compare results obtained on clean speech (TIMIT) and telephone speech databases (NTIMIT and NIST2001) with those of the modern classifiers VQ-UBM and GMM-UBM. It is found that there are cases where the conventional VQ based system outperforms the modern systems. Moreover, the impact of distance metrics on the performance of the conventional and modern systems depends on the recognition task imposed (verification/identification).

10.
Speech and speaker recognition systems are rapidly being deployed in real-world applications. In this paper, we discuss the details of a system and its components for indexing and retrieving multimedia content derived from broadcast news sources. The audio analysis component calls for real-time speech recognition for converting the audio to text and concurrent speaker analysis consisting of the segmentation of audio into acoustically homogeneous sections followed by speaker identification. The output of these two simultaneous processes is used to abstract statistics to automatically build indexes for text-based and speaker-based retrieval without user intervention. The real power of multimedia document processing is the possibility of Boolean queries in the form of combined text- and speaker-based user queries. Retrieval for such queries entails combining the results of individual text and speaker based searches. The underlying techniques discussed here can easily be extended to other speech-centric applications and transactions.

11.

The speaker recognition revolution has led to the inclusion of speaker recognition modules in several commercial products. Most published algorithms for speaker recognition focus on text-dependent speaker recognition. In contrast, text-independent speaker recognition is more advantageous, as the client can talk freely to the system. In this paper, text-independent speaker recognition is considered in the presence of degradation effects such as noise and reverberation. Mel-Frequency Cepstral Coefficients (MFCCs), spectrum and log-spectrum are used for feature extraction from the speech signals. These features are processed with a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) as a classification tool to complete the speaker recognition task. The network learns to recognize the speakers efficiently in a text-independent manner when the recording circumstances are the same. The recognition rate reaches 95.33% using MFCCs, and increases to 98.7% when using spectrum or log-spectrum. However, the system has difficulty recognizing speakers across different recording environments. Hence, different speech enhancement techniques, such as spectral subtraction and wavelet denoising, are used to improve the recognition performance to some extent. The proposed approach shows superiority when compared to the algorithm of R. Togneri and D. Pullella (2011).


12.
13.
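A text-dependent speaker recognition method based on MFCC and LPCC   Total citations: 1 (self-citations: 0, citations by others: 1)
Yu Ming (于明), Yuan Yuqian (袁玉倩), Dong Hao (董浩), Wang Zhe (王哲). 《计算机应用》 (Journal of Computer Applications), 2006, 26(4): 883-885
In the modeling process for speaker recognition, a variance component is added to the codewords of the traditional vector quantization model, forming a new vector quantization model with continuously distributed codewords. Mel cepstral coefficients and their differences are combined with linear prediction cepstral coefficients and their differences as the feature parameters for text-dependent speaker recognition. Comparison with the dynamic time warping algorithm and the traditional vector quantization method shows that the model improves the recognition rate to some extent without a noticeable increase in system response time.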

14.
This paper investigates advanced channel compensation techniques for improving i-vector speaker verification performance in the presence of high intersession variability, using the NIST 2008 and 2010 SRE corpora. The performance of four channel compensation techniques is investigated: (a) weighted maximum margin criterion (WMMC), (b) source-normalized WMMC (SN-WMMC), (c) weighted linear discriminant analysis (WLDA) and (d) source-normalized WLDA (SN-WLDA). We show that, by extracting the discriminatory information between pairs of speakers as well as capturing the source variation information in the development i-vector space, the SN-WLDA based cosine similarity scoring (CSS) i-vector system provides over 20% improvement in EER for NIST 2008 interview and microphone verification and over 10% improvement in EER for NIST 2008 telephone verification, when compared to the SN-LDA based CSS i-vector system. Further, score-level fusion techniques are analyzed to combine the best channel compensation approaches, providing over 8% improvement in DCF over the best single approach, SN-WLDA, for the NIST 2008 interview/telephone enrolment-verification condition. Finally, we demonstrate that the improvements found in the context of CSS also generalize to state-of-the-art GPLDA, with up to 14% relative improvement in EER for NIST SRE 2010 interview and microphone verification and over 7% relative improvement in EER for NIST SRE 2010 telephone verification.
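
Cosine similarity scoring (CSS), as named in this abstract, compares a target and a test i-vector after a channel compensation projection. The sketch below is only an illustration of that scoring step: the projection matrix is a hand-written placeholder rather than a trained WLDA or SN-WLDA transform, and the vectors, function names, and threshold are hypothetical.

```python
import math

def project(matrix, vec):
    """Apply a linear projection (list of rows) to a vector."""
    return [sum(a * v for a, v in zip(row, vec)) for row in matrix]

def cosine_score(w_target, w_test, projection=None):
    """Cosine similarity between two i-vectors, optionally after projection."""
    if projection is not None:
        w_target = project(projection, w_target)
        w_test = project(projection, w_test)
    dot = sum(a * b for a, b in zip(w_target, w_test))
    norm_t = math.sqrt(sum(a * a for a in w_target))
    norm_s = math.sqrt(sum(b * b for b in w_test))
    return dot / (norm_t * norm_s)

def accept(score, threshold):
    """Verification decision: accept the claimed identity if the score is high enough."""
    return score >= threshold

# Placeholder projection that discards the third dimension, standing in for
# a learned transform that suppresses channel/session variation.
A = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
w_target = [1.0, 2.0, 5.0]
w_test = [2.0, 4.0, -5.0]

raw_score = cosine_score(w_target, w_test)             # dominated by the nuisance dimension
compensated_score = cosine_score(w_target, w_test, A)  # close to 1.0 after projection
```

In this toy case the raw score is negative while the compensated score is near 1, which mirrors why the projection is applied before scoring.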

15.
16.
Robust large margin discriminant tangent analysis for face recognition   Total citations: 2 (self-citations: 2, citations by others: 0)
Fisher’s Linear Discriminant Analysis (LDA) has been recognized as a powerful technique for face recognition. However, it can fail in the non-Gaussian case. Nonparametric discriminant analysis (NDA) is a typical algorithm that extends LDA from the Gaussian case to the non-Gaussian case. However, NDA suffers from outlier and imbalance problems, which cause a biased estimation of the extra-class scatter information. To address these two problems, we propose a robust large margin discriminant tangent analysis method. A tangent subspace-based algorithm is first proposed to learn a subspace from a set of intra-class and extra-class samples that are distributed in a balanced way on the local manifold patch near each sample point, so that samples from the same class are clustered as closely as possible and samples from different classes are separated far from the tangent center. Then each subspace is aligned to a global coordinate system by tangent alignment. Finally, an outlier detection technique is proposed to learn a more accurate decision boundary. Extensive experiments on challenging face recognition data sets demonstrate the effectiveness and efficiency of the proposed method. Compared to other nonparametric methods, the proposed one is more robust to outliers.

17.
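An online handwritten whole-word recognition system for Uyghur based on a dual-engine recognition model of HMM and GMM is presented. In the GMM part, the system extracts 8-directional features: it generates 8-directional feature pattern images, locates the spatial sampling points, and extracts blurred directional features. After refined iterative training of the model, the GMM model file is obtained. In the HMM part, the system uses a stroke-segment feature method to obtain the feature sequence of stroke-segment segmentation points; after refined iterative training of the model, the HMM model file is obtained. The GMM model file and the HMM model file are each packaged separately and then jointly packaged into a dictionary. In the first phase of experiments the system's recognition rate reached 97%, and in the second phase it reached 99%.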

18.
19.
Multimedia Tools and Applications - Emotional speaker recognition under real life conditions becomes an urgent need for several applications. This paper proposes a novel approach using multiple...

20.
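As the number of speaker models grows, the recognition speed of a speaker recognition system drops and can no longer meet real-time requirements. To address this problem, a fast speaker recognition method based on a hierarchical recognition model is proposed. The approximation of the KL divergence obtained by the variational method is used as the similarity measure between models, and a method for clustering speaker models is designed. The results show that the method ensures the validity of the speaker model clustering results and greatly increases the recognition speed of the system with only a very small loss in recognition rate.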
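
The KL divergence between two GMMs has no closed form, which is why an approximation is needed as a model-to-model similarity measure. The sketch below uses one well-known variational approximation (the form given by Hershey and Olsen) with diagonal covariances; the abstract does not say which variational formula the authors use, so this choice and all names in the code are assumptions.

```python
import math

def kl_diag_gauss(mu0, var0, mu1, var1):
    """Closed-form KL(N0 || N1) for diagonal-covariance Gaussians."""
    return 0.5 * sum(
        math.log(v1 / v0) + (v0 + (m0 - m1) ** 2) / v1 - 1.0
        for m0, v0, m1, v1 in zip(mu0, var0, mu1, var1)
    )

def kl_gmm_variational(f, g):
    """Variational approximation of KL(f || g).

    f and g are lists of (weight, mean, variance) components.
    Very distant models can underflow exp(); that is not handled here.
    """
    total = 0.0
    for w_a, mu_a, var_a in f:
        # Similarity of component a to the components of its own model f ...
        num = sum(w * math.exp(-kl_diag_gauss(mu_a, var_a, mu, var))
                  for w, mu, var in f)
        # ... and to the components of the other model g.
        den = sum(w * math.exp(-kl_diag_gauss(mu_a, var_a, mu, var))
                  for w, mu, var in g)
        total += w_a * math.log(num / den)
    return total

# Two toy speaker models (hypothetical values).
speaker_a = [(0.6, [0.0, 0.0], [1.0, 1.0]), (0.4, [3.0, 3.0], [1.0, 2.0])]
speaker_b = [(0.5, [0.5, 0.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [2.0, 2.0])]

self_divergence = kl_gmm_variational(speaker_a, speaker_a)  # identical models
cross_divergence = kl_gmm_variational(speaker_a, speaker_b)
```

A clustering step could then group speaker models whose pairwise divergence is small, so that a test utterance is first matched against cluster representatives and only afterwards against the models inside the best cluster.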
