1.

蒋晔 唐振民 《计算机工程与应用》 (Computer Engineering and Applications) 2010,46(11):179-182

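For Gaussian mixture model (GMM) training, the conventional parameter initialization methods (random initialization and K-means clustering) are improved, and a new method that combines splitting with K-means clustering is proposed. Experiments show that, compared with the conventional methods, the improved method raises the system's average recognition rate by 15.47% and 7.5%. The effects of the GMM order, the covariance threshold, and the pre-emphasis coefficient on the recognition rate are also studied. The experimental results are analyzed in detail, and the best-performing value of each parameter is selected so that the resulting speaker recognition system achieves a high recognition rate. Under the specified experimental conditions, the system reaches a recognition rate above 90%.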
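
The abstract does not spell out how splitting and K-means are combined. A minimal sketch, assuming LBG-style binary splitting (each centroid is perturbed into two copies, then refined by K-means), might look like the following. The function names, the perturbation size `eps`, and the power-of-two restriction are illustrative assumptions, not details from the paper; the resulting centroids would serve as initial GMM means.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])))

def kmeans_refine(data, centroids, iters=20):
    """Plain K-means (Lloyd) iterations starting from the given centroids."""
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for x in data:
            groups[nearest(x, centroids)].append(x)
        new_centroids = []
        for g, c in zip(groups, centroids):
            if g:
                new_centroids.append([sum(col) / len(g) for col in zip(*g)])
            else:
                new_centroids.append(c)  # keep the centroid of an empty cluster
        centroids = new_centroids
    return centroids

def split_kmeans_init(data, n_components, eps=0.01):
    """Binary splitting followed by K-means (assumes n_components is a power of two)."""
    # Start from a single centroid: the global mean of the data.
    centroids = [[sum(col) / len(data) for col in zip(*data)]]
    while len(centroids) < n_components:
        split = []
        for c in centroids:
            # Perturb each centroid in two opposite directions (hypothetical eps).
            split.append([v * (1 + eps) + eps for v in c])
            split.append([v * (1 - eps) - eps for v in c])
        centroids = kmeans_refine(data, split)
    return centroids
```

Compared with random initialization, this kind of scheme is deterministic for a given data set, which is one plausible reason a splitting step could stabilize GMM training.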

2.

Research on GMM-based speaker recognition technology

曹洁 潘鹏 《计算机工程与应用》 (Computer Engineering and Applications) 2011,47(11):114-117

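To explore the role of the Gaussian mixture model in speaker recognition, a GMM-based speaker recognition system is designed. The system consists of four modules: audio signal preprocessing, voice activity detection, speaker model building, and audio signal recognition. The first three modules form the model training part of the system, and the last module forms the recognition part. The GMM-based voice activity detector in the second module is the novel contribution of the work. Some of the tunable parameters and the recognition error rate were tested on audiovisual meetings from the Augmented Multi-party Interaction (AMI) meeting corpus. Simulation results show that, with the help of the voice activity detector and several filtering algorithms, the system reaches a recognition accuracy of 83.02% on audio signals containing overlapping speech.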

3.

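Improved speaker clustering initialization and GMM-based multi-speaker recognition  Total citations: 2, self-citations: 1, citations by others: 1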

曹洁 余丽珍 《计算机应用研究》 (Application Research of Computers) 2012,29(2):590-593

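To address the poor accuracy of linear initialization in multi-speaker clustering, an improved clustering initialization method is proposed. The method uses the Bayesian information criterion (BIC) to check and split the initial clusters produced by linear initialization, which effectively raises the purity of the initial speaker clusters. The method is then applied to a Gaussian mixture model (GMM) multi-speaker recognition system. Experimental results show that the proposed method raises the speakers' average cluster purity (ACP) by 48.51% and lowers the system's recognition error rate by 12.09% on average.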

4.

Discriminative tensor decomposition with large margin

《Digital Signal Processing》2019

Tensor decompositions have many application areas in several domains where one key application is revealing relational structure between multiple dimensions simultaneously and thus enabling the compression of relational data. In this paper, we propose the Discriminative Tensor Decomposition with Large Margin (in short, Large Margin Tensor Decomposition, LMTD), which can be viewed as a tensor-to-tensor projection operation. It is a novel method for calculating the mutual projection matrices that map the tensors into a lower dimensional space such that the nearest neighbor classification accuracy is improved. LMTD aims to find the mutual discriminative projection matrices which minimize the misclassification rate by minimizing the Frobenius distance between the same class instances (in-class neighbors) and maximizing the distance between different class instances (impostor neighbors). Two versions of LMTD are proposed, where the nearest neighbor classification error is computed in the feature (latent) or input (observations) space. We evaluate the proposed models on real data sets and provide a comparison study with alternative decomposition methods in the literature in terms of their classification accuracy and mean average precision.

5.

GMM-based text-independent speaker verification for Mandarin and the Sichuan dialect

赵靖 龚卫国 杨利平 《计算机应用》 (Journal of Computer Applications) 2008,28(3):792-794

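A mismatch in speech type (Mandarin versus the Sichuan dialect) between the training and testing stages causes a large drop in speaker verification performance. To address this, a new way of building Gaussian mixture models (GMMs) is proposed: Mandarin and Sichuan-dialect speech are mixed in a fixed proportion to train a joint Mandarin and Sichuan-dialect GMM. A mixing proportion is found that reduces the performance degradation caused by the mismatch to a very low level (2.79%). Experimental results show that the method effectively strengthens robustness to language-variety changes at test time and reduces the performance degradation caused by the mismatch between Mandarin and the Sichuan dialect in training and testing.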

6.

Common vector approach and its combination with GMM for text-independent speaker recognition

Selami Sadıç M. Bilginer Gülmezoğlu 《Expert systems with applications》2011,38(9):11394-11400

In this paper, the common vector approach (CVA) is newly used for text-independent speaker recognition. The performance of CVA is compared with those of Fisher’s linear discriminant analysis (FLDA) and Gaussian mixture models (GMM). The recognition rates obtained for the TIMIT database indicate that CVA and GMM are superior to FLDA. However, while the recognition rates obtained from CVA and GMM are identical, CVA enjoys advantages in terms of processing power and memory requirement. In order to obtain better results than those achieved with GMM, a new method which is a combination of CVA and GMM is proposed in this paper.

7.

Recognition of natural environmental sounds based on Gaussian mixture models

余清清 李应 李勇 《计算机工程与应用》 (Computer Engineering and Applications) 2011,47(25):152-155

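A method for recognizing natural environmental sounds based on Gaussian mixture models (GMMs) is proposed. Mel-frequency cepstral coefficients (MFCCs) are extracted to analyze the sound signal; for each sound class, a GMM is built on the MFCC feature set with the expectation-maximization algorithm; recognition uses the minimum-error-rate decision rule together with voting. With GMMs, 36 kinds of natural environmental sounds are recognized with an accuracy of up to 95.83%, which is better than K-nearest neighbors (KNN).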
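
The abstract does not say exactly how the minimum-error-rate decision and the voting are combined. One plausible reading, sketched below under that assumption, is to pick the most likely class for each feature frame (which is the minimum-error-rate decision when class priors are equal) and then take a majority vote over the frames of a clip. The GMMs are assumed to be already trained and diagonal-covariance; EM training and MFCC extraction are omitted, and all names are hypothetical.

```python
import math
from collections import Counter

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gmm_loglik(x, gmm):
    """Log-likelihood of one frame under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))

def classify_by_vote(frames, models):
    """Choose the most likely class per frame, then return the majority label."""
    votes = Counter(
        max(models, key=lambda label: gmm_loglik(f, models[label]))
        for f in frames
    )
    return votes.most_common(1)[0][0]
```

Here `models` maps a class label to its GMM, and `frames` is a list of feature vectors (for example MFCC frames) from one recording.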

8.

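A language identification system based on discriminative training of GMMs  Total citations: 2, self-citations: 2, citations by others: 0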

屈丹 王炳锡 藏传辉 《计算机工程与应用》 (Computer Engineering and Applications) 2004,40(6):108-110

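This paper presents a new language identification system based on a discriminative training algorithm for Gaussian mixture models. When estimating the model parameters, the algorithm uses generalized probabilistic descent (GPD) and the minimum classification error (MCE) criterion. The algorithm was tested on the OGI multi-language telephone speech corpus, and the experiments show that it is an effective method for language identification.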

9.

Comparison of the impact of some Minkowski metrics on VQ/GMM based speaker recognition

Cemal Hanilçi Figen Ertaş 《Computers & Electrical Engineering》2011,37(1):41-56

This paper evaluates the impact of three special forms of the Minkowski metric (Euclidean, City Block, and Chebychev distances) on the performance of the conventional vector quantization (VQ) and Gaussian mixture model (GMM) based closed-set text-independent speaker recognition systems, in terms of recognition rate and confidence on decisions. For the VQ based system, evaluations are carried out using the two most common clustering algorithms, LBG and K-means, and it is revealed which clustering algorithm and distance pair should be used to exploit the best attribute of both to achieve the best recognition rate for a given codebook size. In the case of the GMM based system, we introduce the metrics into the GMM using a concatenation of the LBG and K-means algorithms in estimating the initial mean vectors, to which the system performance is sensitive, and explore their impact on system performance. We also compare results obtained from evaluations on clean speech (TIMIT) and telephone speech databases (NTIMIT and NIST2001) with the modern classifiers VQ-UBM and GMM-UBM. It is found that there are cases where the conventional VQ based system outperforms the modern systems. Moreover, the impact of distance metrics on the performance of the conventional and modern systems depends on the recognition task imposed (verification/identification).
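
For reference, the three distances named in this abstract are the Minkowski metric of order p = 2 (Euclidean), p = 1 (City Block), and the limit p → ∞ (Chebychev). A small sketch of the metric and of a nearest-codeword lookup, as a VQ system might use it, could be written as follows; the function names are illustrative and not taken from the paper.

```python
def minkowski(x, y, p):
    """Minkowski distance of order p; p=float('inf') gives the Chebychev distance."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if p == float("inf"):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)

def nearest_codeword(x, codebook, p=2):
    """Index of the codebook vector closest to x under the chosen metric."""
    return min(range(len(codebook)), key=lambda k: minkowski(x, codebook[k], p))
```

The choice of `p` can change which codeword is nearest, which is why the metric interacts with the clustering algorithm used to build the codebook.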

10.

Multimedia document retrieval using speech and speaker recognition

Mahesh Viswanathan Homayoon S.M. Beigi Satya Dharanipragada Fereydoun Maali Alain Tritschler 《International Journal on Document Analysis and Recognition》2000,2(4):147-162

Speech and speaker recognition systems are rapidly being deployed in real-world applications. In this paper, we discuss the details of a system and its components for indexing and retrieving multimedia content derived from broadcast news sources. The audio analysis component calls for real-time speech recognition for converting the audio to text and concurrent speaker analysis consisting of the segmentation of audio into acoustically homogeneous sections followed by speaker identification. The output of these two simultaneous processes is used to abstract statistics to automatically build indexes for text-based and speaker-based retrieval without user intervention. The real power of multimedia document processing is the possibility of Boolean queries in the form of combined text- and speaker-based user queries. Retrieval for such queries entails combining the results of individual text and speaker based searches. The underlying techniques discussed here can easily be extended to other speech-centric applications and transactions.

11.

Text-independent speaker recognition using LSTM-RNN and speech enhancement

Samia Abd El-Moneim M. A. Nassar Moawad I. Dessouky Nabil A. Ismail Adel S. El-Fishawy Fathi E. Abd El-Samie 《Multimedia Tools and Applications》2020,79(33-34):24013-24028

The speaker recognition revolution has led to the inclusion of speaker recognition modules in several commercial products. Most published algorithms for speaker recognition focus on text-dependent speaker recognition. In contrast, text-independent speaker recognition is more advantageous as the client can talk freely to the system. In this paper, text-independent speaker recognition is considered in the presence of some degradation effects such as noise and reverberation. Mel-Frequency Cepstral Coefficients (MFCCs), spectrum and log-spectrum are used for feature extraction from the speech signals. These features are processed with the Long-Short Term Memory Recurrent Neural Network (LSTM-RNN) as a classification tool to complete the speaker recognition task. The network learns to recognize the speakers efficiently in a text-independent manner, when the recording circumstances are the same. The recognition rate reaches 95.33% using MFCCs, while it is increased to 98.7% when using spectrum or log-spectrum. However, the system has some challenges to recognize speakers from different recording environments. Hence, different speech enhancement techniques, such as spectral subtraction and wavelet denoising, are used to improve the recognition performance to some extent. The proposed approach shows superiority, when compared to the algorithm of R. Togneri and D. Pullella (2011).


12.

Enhancing GMM speaker identification by incorporating SVM speaker verification for intelligent web-based speech applications

Ing-Jr Ding Chih-Ta Yen 《Multimedia Tools and Applications》2015,74(14):5131-5140


13.

A text-dependent speaker recognition method based on MFCC and LPCC  Total citations: 1, self-citations: 0, citations by others: 1

于明 袁玉倩 董浩 王哲 《计算机应用》 (Journal of Computer Applications) 2006,26(4):883-885

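In speaker modeling, a variance component is added to the codewords of the conventional vector quantization model, yielding a new vector quantization model with continuously distributed codewords. Mel cepstral coefficients and their deltas are combined with linear prediction cepstral coefficients and their deltas as the feature parameters for text-dependent speaker recognition. Comparison with the dynamic time warping algorithm and the conventional vector quantization method shows that the model improves the recognition rate to some extent without a noticeable increase in system response time.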

14.

I-vector based speaker recognition using advanced channel compensation techniques

《Computer Speech and Language》2014,28(1):121-140

This paper investigates advanced channel compensation techniques for the purpose of improving i-vector speaker verification performance in the presence of high intersession variability using the NIST 2008 and 2010 SRE corpora. The performance of four channel compensation techniques has been investigated: (a) weighted maximum margin criterion (WMMC), (b) source-normalized WMMC (SN-WMMC), (c) weighted linear discriminant analysis (WLDA) and (d) source-normalized WLDA (SN-WLDA). We show that, by extracting the discriminatory information between pairs of speakers as well as capturing the source variation information in the development i-vector space, the SN-WLDA based cosine similarity scoring (CSS) i-vector system provides over 20% improvement in EER for NIST 2008 interview and microphone verification and over 10% improvement in EER for NIST 2008 telephone verification, when compared to the SN-LDA based CSS i-vector system. Further, score-level fusion techniques are analyzed to combine the best channel compensation approaches, to provide over 8% improvement in DCF over the best single approach, SN-WLDA, for the NIST 2008 interview/telephone enrolment-verification condition. Finally, we demonstrate that the improvements found in the context of CSS also generalize to state-of-the-art GPLDA with up to 14% relative improvement in EER for NIST SRE 2010 interview and microphone verification and over 7% relative improvement in EER for NIST SRE 2010 telephone verification.
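
Cosine similarity scoring (CSS), as mentioned in this abstract, compares a target-speaker vector with a test vector by the cosine of the angle between them. The sketch below shows only that scoring step on plain lists; the channel compensation projection (for example SN-WLDA) that the paper applies beforehand is omitted, and the `threshold` value and function names are hypothetical.

```python
import math

def cosine_score(w_target, w_test):
    """Cosine of the angle between two vectors (assumes neither is all zeros)."""
    dot = sum(a * b for a, b in zip(w_target, w_test))
    norm = math.sqrt(sum(a * a for a in w_target)) * math.sqrt(sum(b * b for b in w_test))
    return dot / norm

def verify(w_target, w_test, threshold=0.5):
    """Accept the identity claim when the score reaches the (hypothetical) threshold."""
    return cosine_score(w_target, w_test) >= threshold
```

Because the score depends only on direction, it is insensitive to the overall scale of the vectors.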

15.

Emotion recognition using semi-supervised feature selection with speaker normalization

Yaxin Sun Guihua Wen 《International Journal of Speech Technology》2015,18(3):317-331


16.

Robust large margin discriminant tangent analysis for face recognition  Total citations: 2, self-citations: 2, citations by others: 0

Nanhai Yang Ran He Wei-Shi Zheng Xiukun Wang 《Neural computing & applications》2012,21(2):269-279

Fisher’s Linear Discriminant Analysis (LDA) has been recognized as a powerful technique for face recognition. However, it could be stranded in the non-Gaussian case. Nonparametric discriminant analysis (NDA) is a typical algorithm that extends LDA from Gaussian case to non-Gaussian case. However, NDA suffers from outliers and unbalance problems, which cause a biased estimation of the extra-class scatter information. To address these two problems, we propose a robust large margin discriminant tangent analysis method. A tangent subspace-based algorithm is first proposed to learn a subspace from a set of intra-class and extra-class samples which are distributed in a balanced way on the local manifold patch near each sample point, so that samples from the same class are clustered as close as possible and samples from different classes will be separated far away from the tangent center. Then each subspace is aligned to a global coordinate by tangent alignment. Finally, an outlier detection technique is further proposed to learn a more accurate decision boundary. Extensive experiments on challenging face recognition data set demonstrate the effectiveness and efficiency of the proposed method for face recognition. Compared to other nonparametric methods, the proposed one is more robust to outliers.

17.

Research on online Uyghur handwriting recognition based on HMM and GMM

许辉 热依曼·吐尔逊 吾守尔·斯拉木 《计算机工程与应用》 (Computer Engineering and Applications) 2014,(11):202-205,222

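An online whole-word handwriting recognition system for Uyghur is presented, built on a dual-engine recognition model that combines HMM and GMM. In the GMM part, the system extracts 8-directional features: it generates 8-directional feature pattern images, locates spatial sampling points, and extracts blurred directional features. After iterative refinement training of the model, a GMM model file is obtained. In the HMM part, the system uses stroke-segment features to obtain the feature sequence of stroke segmentation points; after iterative refinement training of the model, an HMM model file is obtained. The GMM and HMM model files are each packaged separately and then jointly packaged into a dictionary. The system's recognition rate reached 97% in the first round of experiments and 99% in the second round.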

18.

Discriminative learning of generative models: large margin multinomial mixture models for document classification

Hui Jiang Zhenyu Pan Pingzhao Hu 《Pattern Analysis & Applications》2015,18(3):535-551


19.

Emotional speaker recognition in real life conditions using multiple descriptors and i-vector speaker modeling technique

Asma Mansour Farah Chenchah Zied Lachiri 《Multimedia Tools and Applications》2019,78(6):6441-6458

Emotional speaker recognition under real life conditions becomes an urgent need for several applications. This paper proposes a novel approach using multiple...

20.

Research on fast speaker recognition based on hierarchical recognition

茅正冲 涂文辉 《计算机工程与科学》 (Computer Engineering & Science) 2018,40(7):1244-1249

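As the number of speaker models grows, the recognition speed of a speaker recognition system drops and can no longer meet real-time requirements. To address this problem, a fast speaker recognition method based on a hierarchical recognition model is proposed. An approximation of the KL divergence obtained by a variational method is used as the similarity measure between models, and a method for clustering speaker models is designed. The results show that the proposed method keeps the speaker-model clustering valid and greatly increases the system's recognition speed with only a small loss in recognition rate.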
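
The abstract does not give the formula it uses. One widely used variational approximation of the KL divergence between two GMMs f and g (due to Hershey and Olsen, assumed here to be the kind of measure meant) is

D(f‖g) ≈ Σ_a π_a log [ Σ_a' π_a' exp(−KL(f_a‖f_a')) / Σ_b ω_b exp(−KL(f_a‖g_b)) ],

where the component-wise KL terms have a closed form. A minimal sketch for diagonal-covariance GMMs follows; the data layout and function names are illustrative.

```python
import math

def kl_gauss_diag(m0, v0, m1, v1):
    """Closed-form KL(N0 || N1) for diagonal-covariance Gaussians."""
    return 0.5 * sum(math.log(b / a) + (a + (p - q) ** 2) / b - 1.0
                     for p, a, q, b in zip(m0, v0, m1, v1))

def kl_gmm_variational(f, g):
    """Variational approximation of KL(f || g); each GMM is a list of (weight, mean, var)."""
    total = 0.0
    for w_a, m_a, v_a in f:
        # Similarity of component a to the components of its own mixture f.
        num = sum(w * math.exp(-kl_gauss_diag(m_a, v_a, m, v)) for w, m, v in f)
        # Similarity of component a to the components of the other mixture g.
        den = sum(w * math.exp(-kl_gauss_diag(m_a, v_a, m, v)) for w, m, v in g)
        total += w_a * math.log(num / den)
    return total
```

The value is zero when both mixtures are identical and reduces to the exact Gaussian KL divergence when each mixture has a single component, so it can serve as a dissimilarity for grouping speaker models before a hierarchical search.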