Found 20 similar documents (search time: 31 ms)
1.
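Given the variety of wavelet threshold choices, this work studies how adaptive wavelet-threshold denoising combined with subspace enhancement improves the robustness of a speaker-dependent Mandarin isolated-word recognition system. Using Mel cepstral coefficients on two systems, one based on vector quantization (VQ) and one on Gaussian mixture models (GMM), the recognition rate is tested with Symlets wavelet multi-threshold denoising and subspace enhancement applied in either order, and a speech enhancement method that applies Symlets wavelet threshold denoising first and subspace enhancement second is presented. Listening tests and Matlab experiments confirm that the method combines the advantages of both: it balances speech distortion against noise suppression and improves the robustness of the VQ system, while its effect on the GMM system is limited.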
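As an illustration of the thresholding step in wavelet denoising, the sketch below applies soft thresholding to a list of detail coefficients. The abstract does not state the threshold rule, so the helper names `soft_threshold` and `universal_threshold`, and the universal threshold itself, are assumptions rather than the authors' adaptive scheme; the wavelet transform (Symlets in the abstract) is not included.

```python
import math

def soft_threshold(coeffs, t):
    """Shrink each coefficient toward zero by t; values within [-t, t] become 0."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]

def universal_threshold(coeffs):
    """Assumed rule: sigma * sqrt(2 ln N), with sigma estimated as median(|c|) / 0.6745."""
    mags = sorted(abs(c) for c in coeffs)
    n = len(mags)
    median = mags[n // 2] if n % 2 else 0.5 * (mags[n // 2 - 1] + mags[n // 2])
    sigma = median / 0.6745
    return sigma * math.sqrt(2.0 * math.log(n))
```

In a full denoiser, the thresholded detail coefficients would be passed back through the inverse wavelet transform.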
2.
3.
H. Patro G. Senthil Raja S. Dandapat 《International Journal of Speech Technology》2007,10(2-3):143-152
The variations in speech production due to stress have an adverse effect on the performance of speech and speaker recognition algorithms. In this work, different speech features, such as Sinusoidal Frequency Features (SFF), Sinusoidal Amplitude Features (SAF), Cepstral Coefficients (CC) and Mel Frequency Cepstral Coefficients (MFCC), are evaluated to determine their relative effectiveness in representing stressed speech. Different statistical feature evaluation techniques, such as probability density characteristics, the F-ratio test, the Kolmogorov-Smirnov test (KS test) and a Vector Quantization (VQ) classifier, are used to assess the performance of the speech features. Four different stressed conditions, Neutral, Compassionate, Anger and Happy, are tested. The stressed speech database used in this work consists of 600 stressed speech files recorded from 30 speakers. With the VQ classifier, SAF gives the highest recognition result, followed by SFF, MFCC and CC, in that order. The relative classification results and the relative magnitudes of the F-ratio values for the SFF, MFCC and CC features follow the same order. The SFF and MFCC features show consistent relative performance across all three tests: F-ratio, KS test and VQ classifier.
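The F-ratio mentioned above can be sketched for a single scalar feature as the variance of the class means divided by the average within-class variance. The abstract does not give the exact definition used, so the population-variance form and the function name `f_ratio` below are assumptions.

```python
def f_ratio(classes):
    """classes: one list of scalar feature values per class (e.g. per stress condition)."""
    means = [sum(c) / len(c) for c in classes]
    grand = sum(means) / len(means)
    # Variance of the class means around the overall mean.
    between = sum((m - grand) ** 2 for m in means) / len(means)
    # Average of the per-class variances.
    within = sum(
        sum((x - m) ** 2 for x in c) / len(c) for c, m in zip(classes, means)
    ) / len(classes)
    return between / within
```

A larger value indicates that the feature separates the classes better relative to its spread within each class.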
4.
5.
Language identification is the task of identifying a language from a given spoken utterance. The main task in building a language identifier is to design an efficient algorithm that helps a machine correctly identify a particular language from a given audio sample. We propose a hybrid approach to language identification that combines Vector Quantization (VQ) and Gaussian Mixture Models (GMM). A brief review of work in the area of speaker identification using the VQ-GMM hybrid approach is also given. We carried out experiments on identifying four Indian languages: Assamese, Bengali, Hindi and Indian English. The experiments used our own recorded standard language database, collected from 50 speakers. Speech features were extracted as MFCCs. The results show that, with the hybrid approach, accuracy is best at the highest mixture order and increases uniformly with mixture order for all four languages. We also conclude that the hybrid approach gives better results than the baseline GMM system.
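The abstract does not describe how VQ and GMM are combined, so the sketch below covers only a baseline GMM scoring step: each language has a diagonal-covariance GMM, and the utterance is assigned to the model with the highest total log-likelihood. The function names and the model layout (weights, means, variances) are illustrative assumptions.

```python
import math

def log_gmm_density(x, weights, means, variances):
    """Log-density of vector x under a diagonal-covariance GMM (log-sum-exp for stability)."""
    logs = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xd, md, vd in zip(x, mu, var):
            ll += -0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
        logs.append(ll)
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))

def identify(frames, models):
    """models: dict name -> (weights, means, variances). Returns the best-scoring name."""
    def score(model):
        return sum(log_gmm_density(x, *model) for x in frames)
    return max(models, key=lambda name: score(models[name]))
```

Here `frames` would be the MFCC vectors of one utterance.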
6.
This paper presents the feature analysis and design of compensators for speaker recognition under stressed speech conditions.
Any condition that causes a speaker to vary his or her speech production from normal or neutral condition is called stressed
speech condition. Stressed speech is induced by emotion, high workload, sleep deprivation, frustration and environmental noise.
Under stressed conditions, the characteristics of the speech signal differ from those under normal or neutral conditions. Due to
changes in speech signal characteristics, performance of the speaker recognition system may degrade under stressed speech
conditions. Firstly, six speech features (mel-frequency cepstral coefficients (MFCC), linear prediction (LP) coefficients,
linear prediction cepstral coefficients (LPCC), reflection coefficients (RC), arc-sin reflection coefficients (ARC) and log-area
ratios (LAR)), which are widely used for speaker recognition, are analyzed for evaluation of their characteristics under stressed
condition. Secondly, Vector Quantization (VQ) classifier and Gaussian Mixture Model (GMM) are used to evaluate speaker recognition
results with different speech features. This analysis helps select the best feature set for speaker recognition under stressed
condition. Finally, four VQ based novel compensation techniques are proposed and evaluated for improvement of speaker recognition
under stressed condition. The compensation techniques are speaker and stressed information based compensation (SSIC), compensation
by removal of stressed vectors (CRSV), cepstral mean normalization (CMN) and combination of MFCC and sinusoidal amplitude
(CMSA) features. Speech data from SUSAS database corresponding to four different stressed conditions, Angry, Lombard, Question
and Neutral, are used for analysis of speaker recognition under stressed condition.
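Of the compensation techniques listed above, cepstral mean normalization (CMN) is the simplest to illustrate: the per-coefficient mean over all frames of an utterance is subtracted from every frame. The sketch below is a generic version under that assumption, not the authors' VQ-based variant; the function name is hypothetical.

```python
def cepstral_mean_normalize(frames):
    """frames: list of equal-length cepstral vectors from one utterance."""
    n = len(frames)
    dim = len(frames[0])
    # Mean of each cepstral coefficient across the utterance.
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]
```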
7.
Virender Kadyan Archana Mantri R. K. Aggarwal 《International Journal of Speech Technology》2017,20(4):761-769
Automatic speech recognition (ASR) systems play a vital role in human-machine interaction. An ASR system faces performance degradation due to inconsistency between the training and testing phases, which arises from the extraction and representation of erroneous, redundant feature vectors. This paper proposes three different combinations at the speech feature vector generation phase and two hybrid classifiers at the modeling phase. In the feature extraction phase, MFCC, RASTA-PLP, and PLP are combined in different ways. In the modeling phase, the mean and variance are calculated to generate the inter- and intra-class feature vectors. These feature vectors are then passed to an optimization algorithm to generate refined feature vectors with a traditional statistical technique. The approach uses GA + HMM and DE + HMM techniques to produce refined model parameters. The experiments are conducted on datasets of large-vocabulary isolated Punjabi lexicons. The simulation results show a performance improvement using MFCC with the DE + HMM technique compared with RASTA-PLP and PLP using hybrid HMM classifiers.
8.
This paper describes a new hybrid method for Automatic Speaker Recognition (ASR) from speech signals, based on an Artificial Neural Network (ANN). ASR performance is regarded as the foremost challenge and needs to be improved. This work focuses on resolving ASR problems and on improving the accuracy of speaker prediction. Mel Frequency Cepstral Coefficients (MFCC) are used for signal feature extraction. The input samples are created from these extracted features, and their dimensionality is reduced using a Self Organizing Feature Map (SOFM). Finally, recognition is performed on the reduced input samples using a Multilayer Perceptron (MLP) with Bayesian Regularization. The network is trained and verified on real speech datasets from the Multivariability speaker recognition database for 10 speakers. The proposed method is validated by performance estimation and by comparing classification accuracy against other models. It gives a better recognition rate, attaining 93.33% accuracy.
9.
Automatic speech recognition (ASR) systems follow a well-established pattern recognition approach: signal-processing-based
feature extraction at the front end and likelihood evaluation of feature vectors at the back end. Mel-frequency cepstral coefficients
(MFCCs), the features most widely used in state-of-the-art ASR systems, are derived from the logarithmic spectral energies of
the speech signal using a Mel-scale filterbank. In MFCC filterbank analysis there is no consensus on the spacing and number
of filters to use in various noise conditions and applications. In this paper, we propose a novel approach that uses particle swarm
optimization (PSO) and a genetic algorithm (GA) to optimize the parameters of the MFCC filterbank, such as the central and side frequencies.
The experimental results show that the new front end outperforms the conventional MFCC technique. All the investigations are
conducted using two separate classifiers, HMM and MLP, for Hindi vowel recognition in typical field conditions as well as
in noisy environments.
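For reference, the conventional filterbank that such an optimizer would start from places filter edges at equal intervals on the Mel scale. The sketch below uses the common formula mel = 2595 log10(1 + f/700); the abstract does not say which Mel formula or filter count the authors use, so both are assumptions, as are the function names.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def filter_edges(n_filters, f_low, f_high):
    """Return n_filters + 2 frequencies (Hz) equally spaced on the Mel scale.

    Filter k spans edges[k]..edges[k + 2] with its centre at edges[k + 1].
    """
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]
```

A PSO or GA search would then treat these centre and side frequencies as the parameters to adjust.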
10.
The traditional Gaussian Mixture Model (GMM) for pattern recognition is an unsupervised learning method. The parameters of the model are derived only from the training samples of one class, without taking into account the sample distributions of other classes; hence, its recognition accuracy is sometimes not ideal. This paper introduces an approach for estimating the GMM parameters in a supervised way. The Supervised Learning Gaussian Mixture Model (SLGMM) improves the recognition accuracy of the GMM. An experimental example shows its effectiveness: the recognition accuracy obtained by the approach is higher than that obtained by the Vector Quantization (VQ) approach, the Radial Basis Function (RBF) network model, the Learning Vector Quantization (LVQ) approach and the GMM. In addition, the training time of the approach is less than that of the Multilayer Perceptron (MLP).
11.
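A method for clustering speaker feature vectors in speaker recognition, novelty detection, is proposed. A system model is built from the extracted feature parameters (MFCC and LPCC). Experimental results show that combining novelty detection with VQ for feature-vector classification yields a higher recognition rate, stronger robustness, and more reliable verification than VQ classification alone.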
12.
13.
A novel approach based on robust regression with normalized score fusion (Normalized Scores following Robust Regression Fusion, NSRRF) is proposed for enhancing speaker recognition over IP networks; it can be used in both Network Speaker Recognition (NSR) and Distributed Speaker Recognition (DSR) systems. In this framework, it is assumed that the speech is encoded by the G729 coder on the client side and then transmitted to the server side, where the ASR systems are located. The Universal Background Gaussian Mixture Model (GMM-UBM) and the Gaussian Supervector (GMM-SVM) with normalized scores are used for speaker recognition. In this work, the feature vectors consist of Mel Frequency Cepstral Coefficients (MFCC) and Linear Prediction Cepstral Coefficients (LPCC), both derived from the Line Spectral Pairs (LSP) extracted from the G729 bit-stream over IP. Experimental results, obtained with the LIA SpkDet system based on the ALIZE platform using the ARADIGITS database, show first that the proposed method, using features extracted directly from the G729 bit-stream, significantly reduces the error rate and outperforms the baseline ASR-over-IP system based on the resynthesized (reconstructed) speech obtained from the G729 decoder. In addition, the results show that the proposed approach, based on score normalization following the robust regression fusion technique, achieves the best result and outperforms conventional ASR over IP networks.
14.
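To monitor, diagnose, and identify on-site machinery or equipment using audio as the monitoring means, the vector quantization (VQ) algorithm is introduced and a discrete hidden Markov model (DHMM) of machine audio is built. MFCCs are used as feature parameters, and the codebook is designed with the Linde-Buzo-Gray (LBG) algorithm. The simplest scaled form of Baum-Welch parameter re-estimation for multiple observation sequences is derived; several HMM types are analyzed, and an HMM suited to machine audio is proposed. Experiments on 22 kinds of audio give recognition accuracy above 97%, demonstrating the effectiveness of the method.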
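The LBG codebook design named above can be sketched as repeated codeword splitting followed by nearest-neighbour refinement. This is a minimal version under stated assumptions: an additive split offset `eps`, a fixed number of refinement passes instead of a distortion-based stopping rule, and a codebook size that is a power of two. None of these settings come from the abstract.

```python
def nearest(v, codebook):
    """Index of the codeword closest to v (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(v, codebook[i])))

def centroid(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def lbg(vectors, size, eps=0.01, iters=20):
    """Grow a codebook by splitting; size is assumed to be a power of two."""
    codebook = [centroid(vectors)]
    while len(codebook) < size:
        # Split every codeword into a slightly offset pair.
        codebook = [[c + s for c in cw] for cw in codebook for s in (eps, -eps)]
        for _ in range(iters):
            cells = [[] for _ in codebook]
            for v in vectors:
                cells[nearest(v, codebook)].append(v)
            # Move each codeword to its cell centroid; keep it if the cell is empty.
            codebook = [centroid(cell) if cell else cw
                        for cell, cw in zip(cells, codebook)]
    return codebook
```

Mapping each MFCC frame to `nearest(frame, codebook)` then yields the discrete observation symbols that a DHMM consumes.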
15.
16.
El-Moneim Samia Abd El-Mordy Eman Abd Nassar M. A. Dessouky Moawad I. Ismail Nabil A. El-Fishawy Adel S. El-Dolil Sami El-Dokany Ibrahim M. El-Samie Fathi E. Abd 《International Journal of Speech Technology》2022,25(3):679-687
Automatic Speaker Recognition (ASR) in mismatched conditions is a challenging task, since robust feature extraction and classification...
17.
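Mel-cepstrum parameters based on pitch period and voiced/unvoiced information (total citations: 1; self-citations: 0; citations by others: 1)
A method is proposed for extracting Mel-cepstrum parameters with a non-fixed frame length in voiced segments. Because voiced and unvoiced speech carry different amounts of information, voiced speech is given double weight, so that pitch and voiced/unvoiced information are fused into the Mel-cepstrum parameters. Applied to speaker verification with Gaussian mixture models (GMM), these dynamic Mel-cepstrum parameters achieve a higher recognition rate than the commonly used Mel-frequency cepstral coefficients (MFCC). On the NIST 2002 evaluation database, the equal error rate (EER) drops from 9.4% to 8.3% with 512 Gaussian mixtures and from 7.8% to 6.9% with 2,048 mixtures.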
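The equal error rate quoted above is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below approximates it by sweeping a threshold over the observed scores; the function name and the convention that a score at or above the threshold is accepted are assumptions, not details from the abstract.

```python
def equal_error_rate(genuine, impostor):
    """Approximate EER from genuine-trial and impostor-trial scores."""
    best_gap, eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        frr = sum(s < t for s in genuine) / len(genuine)     # true speakers rejected
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # Report the midpoint of FAR and FRR where they are closest.
            best_gap, eer = gap, (far + frr) / 2.0
    return eer
```

With few trials the two rates may never coincide exactly, which is why the midpoint at the closest threshold is returned.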
18.
This work evaluates the performance of a speaker verification system based on the Wavelet-based Fuzzy Learning Vector Quantization (WLVQ) algorithm, which is used to design the parameters of a Gaussian mixture model (GMM). Mel Frequency Cepstral Coefficients (MFCC) are extracted from the speech data and vector quantized with the WLVQ algorithm. The algorithm develops a multi-resolution codebook by updating both winning and non-winning prototypes through an unsupervised learning process. This codebook is used as the mean vectors of the GMM. The other two parameters, weight and covariance, are determined from the clusters formed by the WLVQ algorithm. The algorithm combines the multi-resolution property of the wavelet transform with the ability of FLVQ to regulate the competition between prototypes during learning, in order to develop an efficient codebook for the GMM. Because of the iterative nature of the Expectation Maximization (EM) algorithm, the applicability of alternative training algorithms is worth investigating. In this work, the performance of speaker verification systems using GMMs trained by the LVQ, FLVQ and WLVQ algorithms is evaluated and compared with that of the EM algorithm. FLVQ- and WLVQ-based training of GMM speaker models yields better performance than EM-based GMM training.
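The step of turning a codebook into GMM parameters can be sketched as follows: codewords become the means, each weight is the fraction of vectors in the corresponding cluster, and each variance is computed within the cluster. The hard nearest-codeword assignment, the diagonal covariance, and the `var_floor` fallback are assumptions; the WLVQ training itself is not shown.

```python
def gmm_from_codebook(vectors, codebook, var_floor=1e-3):
    """Derive diagonal-GMM weights and variances from nearest-codeword clusters."""
    dim = len(codebook[0])
    cells = [[] for _ in codebook]
    for v in vectors:
        idx = min(range(len(codebook)),
                  key=lambda i: sum((a - b) ** 2 for a, b in zip(v, codebook[i])))
        cells[idx].append(v)
    weights = [len(cell) / len(vectors) for cell in cells]
    variances = []
    for cell, cw in zip(cells, codebook):
        if cell:
            var = [sum((v[d] - cw[d]) ** 2 for v in cell) / len(cell)
                   for d in range(dim)]
        else:
            var = [var_floor] * dim  # assumed fallback for an empty cluster
        variances.append([max(x, var_floor) for x in var])
    return weights, codebook, variances
```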
19.
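HMM-based gender recognition (total citations: 2; self-citations: 1; citations by others: 2)
There are many methods for distinguishing male and female voices, such as GMM and VQ. This paper proposes an HMM-based method for speaker gender recognition, which computes the Mel-frequency cepstral coefficients (MFCC) of the speech signal and uses a hidden Markov model (HMM) for gender recognition. In a laboratory environment, the method was compared experimentally with a VQ-based method on speech files from 50 different speakers (half male, half female). From the experimental method and results it is concluded that the HMM method is simpler and easier to implement and achieves a higher recognition rate. For the speech material used in the experiments, the HMM method reaches a recognition rate of 100%.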
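One way to picture HMM-based classification is to score the utterance against one HMM per class and pick the higher likelihood. The sketch below uses the scaled forward algorithm for a discrete-observation HMM; the abstract does not say whether the authors' HMM is discrete or continuous, so the discrete form, the function names, and the model layout are all assumptions.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm for a discrete HMM; returns log P(obs | model)."""
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_like = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by the emission.
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        log_like += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_like

def classify(obs, models):
    """models: dict label -> (start, trans, emit). Returns the best-scoring label."""
    return max(models, key=lambda k: forward_log_likelihood(obs, *models[k]))
```

The observation symbols would come from quantizing the MFCC frames, for example with a VQ codebook.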