1.

吴昊鲁周迅《计算机工程与应用》2011,47(5):141-145

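Given the variety of ways to choose a wavelet threshold, this paper studies how adaptive wavelet-threshold denoising combined with subspace enhancement improves the robustness of a speaker-dependent Mandarin isolated-word recognition system. Using Mel cepstral coefficients on two systems, one based on vector quantization (VQ) and one on Gaussian mixture models (GMM), it measures the recognition rate when Symlets wavelet multi-threshold denoising and the subspace enhancement algorithm are applied in either order, and proposes a speech enhancement method that applies Symlets wavelet threshold denoising first and subspace enhancement second. Listening tests and Matlab experiments confirm that the method combines the strengths of both: it balances speech distortion against noise suppression and improves the robustness of the VQ system, while its effect on the GMM system is limited.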

2.

Feature extraction for speaker recognition from short utterances in noisy environments

高会贤马全福郑晓势《计算机应用》2010,30(10):2712-2714

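To keep the recognition rate of a speaker recognition system high when utterances are short and noise is present, the feature parameters extracted for a vector-quantization-based recognizer are studied. The wavelet transform is combined with the extraction of Mel-frequency cepstral coefficients (MFCC), and the improved feature is then combined with the spectral centroid feature, yielding a new combined feature, Mel-frequency wavelet transform coefficients plus spectral centroid (MFWTC+SC). Experiments show that the combined feature effectively improves the performance of the speaker recognition system.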

3.

Statistical feature evaluation for classification of stressed speech

H. Patro G. Senthil Raja S. Dandapat 《International Journal of Speech Technology》2007,10(2-3):143-152

The variations in speech production due to stress have an adverse effect on the performance of speech and speaker recognition algorithms. In this work, different speech features, namely Sinusoidal Frequency Features (SFF), Sinusoidal Amplitude Features (SAF), Cepstral Coefficients (CC) and Mel Frequency Cepstral Coefficients (MFCC), are evaluated to find their relative effectiveness in representing stressed speech. Different statistical feature evaluation techniques, namely probability density characteristics, the F-ratio test, the Kolmogorov-Smirnov test (KS test) and a Vector Quantization (VQ) classifier, are used to assess the performance of the speech features. Four different stressed conditions, Neutral, Compassionate, Anger and Happy, are tested. The stressed speech database used in this work consists of 600 stressed speech files recorded from 30 speakers. With the VQ classifier, SAF gives the best recognition result, followed by SFF, MFCC and CC in that order. The relative classification results and the relative magnitudes of the F-ratio values for the SFF, MFCC and CC features follow the same order. The SFF and MFCC features show consistent relative performance across all three tests: F-ratio, KS test and VQ classifier.
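The abstract does not define its F-ratio, so the following is only a minimal sketch of one common definition: the variance of the class means divided by the average within-class variance, computed for a single scalar feature. The function name `f_ratio` and the toy values are assumptions made for illustration, not data from the paper.

```python
from statistics import mean, pvariance


def f_ratio(classes):
    """F-ratio of one scalar feature.

    classes: list of lists, one list of feature values per class
    (for example, one list per stress condition). Hypothetical helper.
    """
    class_means = [mean(values) for values in classes]
    # Spread of the class means: large when classes are well separated.
    between = pvariance(class_means)
    # Average spread inside each class: small when classes are compact.
    within = mean([pvariance(values) for values in classes])
    return between / within


# Toy feature values for three conditions (illustrative only).
neutral = [1.0, 1.2, 0.8, 1.0]
angry = [2.0, 2.2, 1.8, 2.0]
happy = [3.0, 3.2, 2.8, 3.0]
separated = f_ratio([neutral, angry, happy])

# The same spread inside each class, but class means that lie closer together.
overlapping = f_ratio([neutral, [1.1, 1.3, 0.9, 1.1], [1.2, 1.4, 1.0, 1.2]])
```

A larger value indicates that the feature separates the classes better relative to its within-class scatter, which is why the ratio can be used to rank features.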

4.

Robust speaker recognition in multi-source environments

张凤仪夏秀渝冉国敬何礼叶于林《计算机系统应用》2015,24(4):32-37

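To address the sharp drop in speaker recognition performance under interference from multiple sound sources, a front-end processing method for extracting the target speech is proposed. Relying on the approximate sparsity of independent speech signals in the time-frequency domain, the method uses the direction information of the target speech and nonlinear time-frequency masking to extract the target speech. A Gaussian mixture model (GMM) speaker recognition system based on Mel-frequency cepstral coefficients (MFCC) is built. Simulation experiments show that the method extracts the target speech effectively and improves the robustness of the speaker recognition system. Under the multi-source interference conditions simulated in the paper, the recognition rate rises by about 25% on average.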

5.

A hybrid VQ-GMM approach for identifying Indian languages

Pinki Roy Pradip K. Das 《International Journal of Speech Technology》2013,16(1):33-39

Language Identification is the task of identifying a language from a given spoken utterance. The main task of a language identifier is to design an efficient algorithm that helps a machine correctly identify a particular language from a given audio sample. We propose a hybrid approach for identifying a language that combines Vector Quantization (VQ) and Gaussian Mixture Models (GMM). A brief review of work carried out in the area of Speaker Identification using the VQ-GMM hybrid approach is discussed. We have carried out experiments on identifying four Indian languages: Assamese, Bengali, Hindi and Indian English. The experiments were carried out on our own recorded standard language database collected from 50 speakers. Speech features were extracted using MFCCs. Results show that with the hybrid approach, accuracy is best at the highest mixture order, and accuracy increases uniformly for all four languages as the mixture order increases. It is also concluded that the hybrid approach gives better results than the baseline GMM system.

6.

Speaker recognition under stressed condition

G. Senthil Raja S. Dandapat 《International Journal of Speech Technology》2010,13(3):141-161

This paper presents the feature analysis and design of compensators for speaker recognition under stressed speech conditions. Any condition that causes a speaker to vary his or her speech production from the normal or neutral condition is called a stressed speech condition. Stressed speech is induced by emotion, high workload, sleep deprivation, frustration and environmental noise. In a stressed condition, the characteristics of the speech signal differ from those of the normal or neutral condition. Because of these changes, the performance of a speaker recognition system may degrade under stressed speech conditions. Firstly, six speech features that are widely used for speaker recognition (mel-frequency cepstral coefficients (MFCC), linear prediction (LP) coefficients, linear prediction cepstral coefficients (LPCC), reflection coefficients (RC), arc-sin reflection coefficients (ARC) and log-area ratios (LAR)) are analyzed to evaluate their characteristics under stressed conditions. Secondly, a Vector Quantization (VQ) classifier and a Gaussian Mixture Model (GMM) are used to evaluate speaker recognition results with the different speech features. This analysis helps select the best feature set for speaker recognition under stressed conditions. Finally, four novel VQ-based compensation techniques are proposed and evaluated for improving speaker recognition under stressed conditions. The compensation techniques are speaker and stressed information based compensation (SSIC), compensation by removal of stressed vectors (CRSV), cepstral mean normalization (CMN) and combination of MFCC and sinusoidal amplitude (CMSA) features. Speech data from the SUSAS database corresponding to four different stressed conditions, Angry, Lombard, Question and Neutral, are used to analyze speaker recognition under stressed conditions.
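Of the compensation techniques listed, cepstral mean normalization has a simple standard form: subtract the per-coefficient mean over all frames of an utterance. The sketch below shows only that textbook form. How the paper couples CMN with its VQ classifier is not stated in the abstract, and the function name `cepstral_mean_normalize` is an assumption.

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-coefficient mean over all frames of one utterance.

    frames: list of equal-length lists, one cepstral vector per frame.
    Returns a new list of mean-removed vectors. Hypothetical helper.
    """
    n_frames = len(frames)
    # zip(*frames) iterates over coefficients (columns) instead of frames.
    means = [sum(column) / n_frames for column in zip(*frames)]
    return [[value - m for value, m in zip(frame, means)] for frame in frames]


# Two toy frames with two coefficients each (illustrative values only).
normalized = cepstral_mean_normalize([[1.0, 10.0], [3.0, 14.0]])
```

After normalization every coefficient has zero mean across the utterance, which removes any constant offset in the cepstral domain.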

7.

A heterogeneous speech feature vectors generation approach with hybrid HMM classifiers

Virender Kadyan Archana Mantri R. K. Aggarwal 《International Journal of Speech Technology》2017,20(4):761-769

Automatic speech recognition (ASR) systems play a vital role in human–machine interaction. ASR systems face the challenge of performance degradation due to inconsistency between the training and testing phases. This occurs because erroneous, redundant feature vectors are extracted and represented. This paper proposes three different combinations at the speech feature vector generation phase and two hybrid classifiers at the modeling phase. In the feature extraction phase, MFCC, RASTA-PLP, and PLP are combined in different ways. In the modeling phase, the mean and variance are calculated to generate the inter- and intra-class feature vectors. These feature vectors are then passed to an optimization algorithm that generates refined feature vectors with a traditional statistical technique. This approach uses GA + HMM and DE + HMM techniques to produce refined model parameters. The experiments are conducted on datasets of large-vocabulary isolated Punjabi lexicons. The simulation results show a performance improvement using MFCC and the DE + HMM technique when compared with RASTA-PLP and PLP using hybrid HMM classifiers.

8.

Automatic Speaker Recognition from Speech Signals Using Self Organizing Feature Map and Hybrid Neural Network

《Microprocessors and Microsystems》2020

This paper presents a new hybrid method for Automatic Speaker Recognition (ASR) from speech signals, based on an Artificial Neural Network (ANN). ASR performance is regarded as the foremost challenge and needs to be improved. This research work mainly focuses on resolving ASR problems and on improving the accuracy of speaker prediction. Mel Frequency Cepstral Coefficients (MFCC) are used for signal feature extraction. The input samples are created from these extracted features, and their dimensionality is reduced using a Self Organizing Feature Map (SOFM). Finally, recognition is performed on the reduced input samples using a Multilayer Perceptron (MLP) with Bayesian Regularization. The network is trained and verified on real speech datasets from the Multivariability speaker recognition database for 10 speakers. The proposed method is validated through performance estimation and classification accuracy in comparison with other models. It gives a better recognition rate and attains 93.33% accuracy.

9.

Filterbank optimization for robust ASR using GA and PSO

R. K. Aggarwal M. Dave 《International Journal of Speech Technology》2012,15(2):191-201

Automatic speech recognition (ASR) systems follow a well-established pattern recognition approach, that is, signal-processing-based feature extraction at the front-end and likelihood evaluation of feature vectors at the back-end. Mel-frequency cepstral coefficients (MFCCs) are the features widely used in state-of-the-art ASR systems; they are derived from logarithmic spectral energies of the speech signal using a Mel-scale filterbank. In the filterbank analysis of MFCC there is no consensus on the spacing and number of filters to use in various noise conditions and applications. In this paper, we propose a novel approach that uses particle swarm optimization (PSO) and a genetic algorithm (GA) to optimize the parameters of the MFCC filterbank, such as the central and side frequencies. The experimental results show that the new front-end outperforms the conventional MFCC technique. All the investigations are conducted using two separate classifiers, HMM and MLP, for Hindi vowel recognition in typical field conditions as well as in noisy environments.
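For orientation, the conventional filterbank that such an optimization would start from can be sketched as follows. This assumes the widely used conversion mel = 2595 · log10(1 + f/700) and filters spaced uniformly on the Mel scale. The function names and the choice of 10 filters between 0 and 4000 Hz are illustrative assumptions, and no PSO or GA step is shown.

```python
import math


def hz_to_mel(f_hz):
    """Common Hz-to-Mel conversion (one of several variants in use)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_edges(n_filters, f_low, f_high):
    """Return (left, center, right) frequencies in Hz for each triangular filter.

    Points are spaced uniformly on the Mel scale, and neighbouring filters
    share edges: the center of one filter is the side frequency of the next.
    """
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    points = [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]
    return [(points[i], points[i + 1], points[i + 2]) for i in range(n_filters)]


edges = mel_filter_edges(n_filters=10, f_low=0.0, f_high=4000.0)
centers = [center for _, center, _ in edges]
```

The center and side frequencies returned here are the kind of parameters the abstract describes tuning. In this fixed layout the spacing between centers widens as frequency increases.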

10.

The Supervised Learning Gaussian Mixture Model

Ma Jiyong Gao Wen 《计算机科学技术学报》1998,13(5):471-474

The traditional Gaussian Mixture Model (GMM) for pattern recognition is an unsupervised learning method. The parameters of the model are derived only from the training samples of one class, without taking into account the effect of the sample distributions of other classes; hence, its recognition accuracy is sometimes not ideal. This paper introduces an approach for estimating the parameters of a GMM in a supervised way. The Supervised Learning Gaussian Mixture Model (SLGMM) improves the recognition accuracy of the GMM. An experimental example has shown its effectiveness. The experimental results show that the recognition accuracy obtained with the approach is higher than that obtained with the Vector Quantization (VQ) approach, the Radial Basis Function (RBF) network model, the Learning Vector Quantization (LVQ) approach and the GMM. In addition, the training time of the approach is less than that of the Multilayer Perceptron (MLP).

11.

Application of novelty detection to speaker recognition

石艳王晓晔《现代计算机》2008,(7)

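A method for clustering speaker feature vectors in speaker recognition, novelty detection, is proposed. A system model is built from the extracted feature parameters (MFCC and LPCC). Experimental results show that combining novelty detection with VQ to classify feature vectors gives a higher recognition rate, stronger robustness, and more reliable verification than VQ classification alone.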

12.

Speaker recognition based on dynamic-threshold vector quantization (total citations: 2, self-citations: 0, citations by others: 2)

亢明汪成亮陈娟娟《计算机应用》2009,29(1):146-148

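In the LBG algorithm used by vector-quantization-based speaker recognition systems, the threshold applied when splitting the codebook is one of the main factors affecting the initial codebook. The threshold used in the conventional approach is hard to determine, and finding an empirical value takes many experiments. This paper proposes generating the threshold dynamically and at random within a given range to improve the initial codebook formation strategy, and builds a speaker recognition model that also uses delta cepstral parameters. Experimental results show that the method improves the recognition rate somewhat while markedly reducing training and recognition time.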
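The idea can be illustrated with a minimal LBG sketch in which the splitting perturbation is drawn at random from a range on every split instead of being fixed by hand. Treating the abstract's "threshold" as this perturbation value is an assumption, as are the function names, the range `(0.01, 0.05)`, and the additive form of the split (chosen here so that a centroid at the origin still splits). The paper's actual procedure may differ.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def lbg_random_threshold(vectors, size, eps_range=(0.01, 0.05), iters=20, seed=0):
    """Grow a codebook by repeated splitting; size should be a power of two."""
    rng = random.Random(seed)
    codebook = [centroid(vectors)]
    while len(codebook) < size:
        # Draw the splitting perturbation at random for this split (assumption).
        eps = rng.uniform(*eps_range)
        codebook = [[c + sign * eps for c in word]
                    for word in codebook for sign in (1, -1)]
        # Refine the doubled codebook with nearest-neighbour / centroid updates.
        for _ in range(iters):
            cells = [[] for _ in codebook]
            for v in vectors:
                nearest = min(range(len(codebook)),
                              key=lambda i: sq_dist(v, codebook[i]))
                cells[nearest].append(v)
            # Keep the old codeword if its cell received no vectors.
            codebook = [centroid(cell) if cell else word
                        for cell, word in zip(cells, codebook)]
    return codebook


# Two well-separated toy clusters (illustrative data only).
data = [[0, 0], [0, 1], [1, 0], [1, 1],
        [10, 10], [10, 11], [11, 10], [11, 11]]
codebook = lbg_random_threshold(data, size=2)
```

On this toy data the two codewords settle on the two cluster centers regardless of which perturbation is drawn from the range.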

13.

Robust regression fusion of GMM-UBM and GMM-SVM normalized scores using G729 bit-stream for speaker recognition over IP

Dalila Yessad Abderrahmane Amrouche 《International Journal of Speech Technology》2014,17(1):43-51

A novel approach based on robust regression with normalized score fusion (Normalized Scores following Robust Regression Fusion, NSRRF) is proposed for enhancing speaker recognition over IP networks. It can be used in both Network Speaker Recognition (NSR) and Distributed Speaker Recognition (DSR) systems. In this framework, it is assumed that the speech is encoded by a G729 coder on the client side and then transmitted to the server side, where the ASR systems are located. The Universal Background Gaussian Mixture Model (GMM-UBM) and Gaussian Supervector (GMM-SVM) with normalized scores are used for speaker recognition. In this work, the feature vectors consist of Mel Frequency Cepstral Coefficients (MFCC) and Linear Prediction Cepstral Coefficients (LPCC), both derived from Line Spectral Pairs (LSP) extracted from the G729 bit-stream over IP. Experimental results, obtained with the LIA SpkDet system based on the ALIZE platform using the ARADIGITS database, show first that the proposed method using features extracted directly from the G729 bit-stream significantly reduces the error rate and outperforms the baseline ASR-over-IP system based on the resynthesized (reconstructed) speech obtained from the G729 decoder. In addition, the results show that the proposed approach, based on score normalization following the robust regression fusion technique, achieves the best result and outperforms conventional ASR over IP networks.

14.

Application of DHMM to audio recognition of machinery and equipment

苏鹏程健《计算机工程与应用》2015,51(1):266-270

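To monitor, diagnose, and identify machinery or equipment on site using audio as the monitoring medium, the vector quantization (VQ) algorithm is introduced and a discrete hidden Markov model (DHMM) of machinery audio is built. MFCC is used as the feature parameter and the Linde-Buzo-Gray (LBG) algorithm for codebook design. The simplest scaled form of Baum-Welch parameter re-estimation for multiple observation sequences is derived. Several HMM types are analyzed and an HMM suited to machinery audio is proposed. Experiments on 22 kinds of audio give a recognition accuracy above 97%, demonstrating the effectiveness of the method.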

15.

A text-dependent speaker recognition method based on MFCC and LPCC (total citations: 1, self-citations: 0, citations by others: 1)

于明袁玉倩董浩王哲《计算机应用》2006,26(4):883-885

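In speaker modeling, a variance component is added to the codewords of the conventional vector quantization model, giving a new vector quantization model with continuously distributed codewords. Mel cepstral coefficients with their deltas, combined with linear prediction cepstral coefficients with their deltas, serve as the recognition features for text-dependent speaker recognition. Comparison with dynamic time warping and conventional vector quantization shows that the model improves the recognition rate to some extent without a noticeable increase in system response time.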

16.

Performance enhancement of text-independent speaker recognition in noisy and reverberation conditions using Radon transform with deep learning

El-Moneim Samia Abd El-Mordy Eman Abd Nassar M. A. Dessouky Moawad I. Ismail Nabil A. El-Fishawy Adel S. El-Dolil Sami El-Dokany Ibrahim M. El-Samie Fathi E. Abd 《International Journal of Speech Technology》2022,25(3):679-687

Automatic Speaker Recognition (ASR) in mismatched conditions is a challenging task, since robust feature extraction and classification...

17.

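Mel-cepstrum parameters based on pitch period and voiced/unvoiced information (total citations: 1, self-citations: 0, citations by others: 1)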

郭武王仁华戴礼荣《数据采集与处理》2007,22(2):229-233

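A method is proposed for extracting Mel-cepstrum parameters with a non-fixed frame length in voiced segments. Because voiced and unvoiced speech carry different amounts of information, voiced speech is given double weight, so that pitch and voiced/unvoiced information is fused into the Mel-cepstrum parameters. Applied to speaker verification with Gaussian mixture models (GMM), these dynamic Mel-cepstrum parameters achieve a higher recognition rate than the commonly used Mel-frequency cepstral coefficients (MFCC). On the NIST 2002 evaluation database, the equal error rate (EER) falls from 9.4% to 8.3% with 512 Gaussian mixtures and from 7.8% to 6.9% with 2048 Gaussian mixtures.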

18.

Wavelet fuzzy LVQ based speaker verification system

P. Shanmugapriya Y. Venkataramani 《International Journal of Speech Technology》2013,16(4):403-412

This work evaluates the performance of a speaker verification system based on the Wavelet based Fuzzy Learning Vector Quantization (WLVQ) algorithm. The parameters of a Gaussian mixture model (GMM) are designed using this proposed algorithm. Mel Frequency Cepstral Coefficients (MFCC) are extracted from the speech data and vector quantized with the Wavelet based FLVQ algorithm. This algorithm develops a multi-resolution codebook by updating both winning and non-winning prototypes through an unsupervised learning process. This codebook is used as the mean vectors of the GMM. The other two parameters, weight and covariance, are determined from the clusters formed by the WLVQ algorithm. The multi-resolution property of the wavelet transform and the ability of FLVQ to regulate the competition between prototypes during learning are combined in this algorithm to develop an efficient codebook for the GMM. Because of the iterative nature of the Expectation Maximization (EM) algorithm, the applicability of alternative training algorithms is worth investigating. In this work, the performance of speaker verification systems using GMMs trained by the LVQ, FLVQ and WLVQ algorithms is evaluated and compared with the EM algorithm. FLVQ and WLVQ based training algorithms for modeling speakers using GMM yield better performance than EM based GMM.
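The step of deriving GMM parameters from an existing clustering can be sketched as below. It assumes the feature vectors have already been assigned to clusters by some quantizer and that diagonal covariances are used. The WLVQ algorithm itself is not reproduced, and the function name `gmm_from_clusters` and the toy clusters are hypothetical.

```python
def gmm_from_clusters(clusters):
    """Derive mixture weights, means and diagonal variances from clusters.

    clusters: list of clusters, each a non-empty list of feature vectors.
    Returns one dict per mixture component. Hypothetical helper; a real
    system would usually also apply a variance floor.
    """
    total = sum(len(cluster) for cluster in clusters)
    components = []
    for cluster in clusters:
        n = len(cluster)
        columns = list(zip(*cluster))
        # The cluster centroid (codeword) becomes the component mean.
        mean = [sum(column) / n for column in columns]
        # Per-dimension variance of the cluster around its centroid.
        var = [sum((x - m) ** 2 for x in column) / n
               for column, m in zip(columns, mean)]
        # The weight is the fraction of all vectors that fell in this cluster.
        components.append({"weight": n / total, "mean": mean, "var": var})
    return components


# Two toy clusters of 2 and 6 vectors (illustrative values only).
cluster_a = [[0.0, 0.0], [2.0, 2.0]]
cluster_b = [[10.0, 10.0], [12.0, 10.0]] * 3
gmm = gmm_from_clusters([cluster_a, cluster_b])
```

This gives the model its parameters in a single pass over the clustered data, in contrast to the iterative EM re-estimation that the abstract compares against.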

19.

HMM-based gender recognition (total citations: 2, self-citations: 1, citations by others: 2)

邓英欧贵文《计算机工程与应用》2004,40(15):74-75

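There are many methods for distinguishing male from female voices, such as GMM and VQ. This paper proposes an HMM-based method for identifying speaker gender: it computes the Mel-frequency cepstral coefficients (MFCC) of the speech signal and performs gender identification with a hidden Markov model (HMM). In a laboratory environment, the method is compared with a VQ-based method on speech files from 50 different speakers (half male, half female). In terms of both experimental procedure and results, the HMM method is found to be simpler and to give a higher recognition rate; on the speech material tested, it reaches a recognition rate of 100%.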

20.

Automatic genre classification of Indian Tamil and western music using fractional MFCC

Betsy Rajesh D. G. Bhalke 《International Journal of Speech Technology》2016,19(3):551-563
