Similar Documents
A total of 20 similar documents were retrieved.
1.
In this work, an average framing linear prediction coding (AFLPC) technique for text-independent speaker identification systems is presented. Conventionally, linear prediction coding (LPC) has been applied in speech recognition applications. In this study, however, the combination of modified LPC with the wavelet transform (WT), termed AFLPC, is proposed for speaker identification. The investigation procedure is based on feature extraction and voice classification. In the feature extraction phase, each speaker's distinguishing vocal tract characteristics are extracted using the AFLPC technique. The size of a speaker's feature vector can be optimized in terms of an acceptable recognition rate by means of a genetic algorithm (GA); an LPC order of 30 was found to give the best system performance. In the classification phase, a probabilistic neural network (PNN) is applied because of its rapid response and ease of implementation. In the practical investigation, the performances of different wavelet transforms in conjunction with AFLPC were compared with one another. In addition, the capability of the proposed system was examined by comparing it with other systems proposed in the literature. The PNN classifier achieves its best recognition rate (97.36%) with the wavelet packet (WP) combined with AFLPC, a feature extraction method termed WPLPCF. The proposed system was also analyzed in additive white Gaussian noise (AWGN) and real noise environments, giving recognition rates of 58.56% at 0 dB and 70.52% at 5 dB. Over the whole database, the recognition rates of the Gaussian mixture model (GMM) were lowest when the number of training samples was small.
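The AFLPC idea of combining frame-level LPC analysis with a wavelet decomposition can be illustrated with a short sketch. The block below is a minimal, hypothetical reconstruction (not the authors' code): it decomposes an utterance with a wavelet packet, computes LPC coefficients per subband frame, and averages them into a fixed-length speaker vector; the wavelet, decomposition level, LPC order, and frame size are illustrative assumptions.

```python
# Hedged sketch of an AFLPC-style feature: wavelet-packet subbands -> framed LPC -> average.
# Parameter choices (db4 wavelet, level 2, order 12, 25 ms frames) are illustrative assumptions.
import numpy as np
import pywt
import librosa

def aflpc_features(y, sr, order=12, wavelet="db4", level=2, frame_ms=25):
    frame = int(sr * frame_ms / 1000)
    feats = []
    wp = pywt.WaveletPacket(data=y, wavelet=wavelet, maxlevel=level)
    for node in wp.get_level(level, order="natural"):    # one feature block per subband
        x = node.data
        lpcs = []
        for start in range(0, len(x) - frame, frame):
            seg = x[start:start + frame]
            if np.allclose(seg, 0):                       # skip silent/degenerate frames
                continue
            a = librosa.lpc(seg, order=order)             # [1, a1, ..., a_order]
            lpcs.append(a[1:])
        if lpcs:
            feats.append(np.mean(lpcs, axis=0))           # the "average framing" step
    return np.concatenate(feats)                          # fixed-length speaker vector
```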

2.
A speaker feature extraction method based on Duffing stochastic resonance
潘平  何朝霞 《计算机工程与应用》2012,48(35):123-125,142
The extraction of speaker feature parameters directly affects the construction of the recognition model; the MFCC and LPC extraction methods characterize, respectively, local low-frequency information and the global AR signal. This paper proposes a speaker spectral feature extraction method based on Duffing stochastic resonance. Simulation results show that the method can distinguish small spectral differences between speakers and effectively extract the basic spectral characteristics of a speaker, thereby providing a finer-grained basis for the speaker recognition model.
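To make the stochastic-resonance idea concrete, the sketch below drives a bistable Duffing oscillator with a noisy input frame and returns the oscillator's output, whose spectrum would then be examined for the enhanced weak component. This is only an illustration of the mechanism: the oscillator parameters, coupling, and test signal are assumptions and are not taken from the cited paper.

```python
# Minimal sketch of driving a bistable Duffing oscillator with a noisy input frame.
# The damping and coupling constants below are illustrative assumptions, not the paper's settings.
import numpy as np
from scipy.integrate import solve_ivp

def duffing_response(drive, fs, delta=0.5, k=1.0):
    t = np.arange(len(drive)) / fs
    def rhs(ti, state):
        x, v = state
        s = np.interp(ti, t, drive)                  # sample the input signal at time ti
        return [v, -delta * v + x - x**3 + k * s]    # double-well Duffing: V(x) = -x^2/2 + x^4/4
    sol = solve_ivp(rhs, (t[0], t[-1]), [0.0, 0.0], t_eval=t, max_step=1.0 / fs)
    return sol.y[0]                                  # oscillator output; its spectrum is analyzed

# Example: a weak 200 Hz tone buried in noise (hypothetical test input)
fs = 8000
t = np.arange(0, 0.5, 1 / fs)
drive = 0.1 * np.sin(2 * np.pi * 200 * t) + 0.5 * np.random.randn(len(t))
out = duffing_response(drive, fs)
```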

3.
This paper reports experimental results on automatic speaker identification for telephone speech using a Gaussian mixture model (GMM) that combines LPC-derived parameters with the fundamental frequency F0. In the baseline experiment, the GMM uses 16 mixture components with diagonal covariance matrices and LPC cepstral coefficients as features. In the development tests, the fundamental frequency is added as a feature, computed both over the entire utterance and over the voiced segments only, and comparative results are given. In an open-set experiment on automatically segmented speech streams from 50 telephone conversations, the baseline identification accuracy was 76.97%, while the proposed method achieved 80.29%, an improvement of 3.32 percentage points, approaching the 82.34% obtained with manually segmented speech.
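A minimal sketch of this kind of front end is given below: per-frame LPC cepstral coefficients with log-F0 appended, pooled into a 16-component diagonal-covariance GMM per speaker. It is an illustration only; the frame size, LPC/cepstrum order, and F0 search range are assumptions, not the paper's settings.

```python
# Hedged sketch: per-frame LPC cepstra plus F0, pooled into a 16-mixture diagonal GMM.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def lpcc(frame, order=12):
    a = librosa.lpc(frame, order=order)[1:]              # a_1..a_p of A(z) = 1 + sum a_k z^-k
    c = np.zeros(order)
    for n in range(1, order + 1):
        acc = sum(k * c[k - 1] * a[n - k - 1] for k in range(1, n))
        c[n - 1] = -a[n - 1] - acc / n                   # standard LPC-to-cepstrum recursion
    return c

def features(y, sr, frame=400, hop=160):
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr, frame_length=frame, hop_length=hop)
    rows = []
    for i, start in enumerate(range(0, len(y) - frame, hop)):
        seg = y[start:start + frame]
        if i < len(f0) and np.std(seg) > 1e-6:
            rows.append(np.append(lpcc(seg), np.log(f0[i])))   # append log-F0 to the cepstra
    return np.array(rows)

# One GMM per enrolled speaker; identification picks the model with the highest log-likelihood.
gmm = GaussianMixture(n_components=16, covariance_type="diag")
# gmm.fit(features(train_signal, 8000)); score = gmm.score(features(test_signal, 8000))
```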

4.
Recognition systems that train a speaker's codebook from speech collected in a single session are often unstable. To accommodate the time-varying nature of a speaker's voice, this paper proposes training speaker models from speech recorded in different sessions, so that each speaker has multiple codebooks. These codebooks are obtained by an optimization procedure that progressively reduces the misidentification rate. A channel compensation method is given to offset the effect of different channels on recognition performance. The paper also proposes replacing the feature of a voiced phoneme with the feature of a single high-energy voiced frame, enabling online voiced-feature extraction, and uses two-stage vector quantization with a codebook indexing strategy to cut the recognition computation by 44%. These methods greatly increase the recognition speed and robustness of the system. Identification results obtained with PLP analysis and LPC cepstral analysis are compared.
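The codebook-based identification scheme can be sketched as follows: one vector-quantization codebook per speaker (trained on frames pooled over several sessions), with the decision made by minimum average quantization distortion. Plain k-means stands in here for the error-rate-driven codebook optimization described above; the codebook size and feature type are assumptions.

```python
# Hedged sketch of VQ-codebook speaker identification: one codebook per speaker,
# decision by minimum average quantization distortion. k-means is a stand-in trainer.
import numpy as np
from sklearn.cluster import KMeans

def train_codebook(frames, size=64):
    # frames: (n_frames, n_dims) feature matrix pooled over one speaker's sessions
    return KMeans(n_clusters=size, n_init=4, random_state=0).fit(frames).cluster_centers_

def distortion(frames, codebook):
    # average distance from each frame to its nearest codeword
    d = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=-1)
    return d.min(axis=1).mean()

def identify(frames, codebooks):
    # codebooks: dict of speaker -> codebook; the smallest distortion wins
    return min(codebooks, key=lambda spk: distortion(frames, codebooks[spk]))
```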

5.
In conventional speaker recognition, the commonly used feature parameters are linear prediction coefficients (LPC) and Mel-frequency cepstral coefficients (MFCC), but a single type of feature cannot fully capture speaker characteristics. To address this, a method that introduces delta features and feature combination is proposed. Experimental results show that adding delta features and combining features clearly improves recognition performance; a GMM is used as the speaker recognition model in the experiments.
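A minimal sketch of the delta-and-combination idea is shown below: MFCC, their first-order deltas, and framed LPC coefficients are stacked into one feature matrix. The orders and frame settings are illustrative assumptions rather than the paper's configuration.

```python
# Hedged sketch: combine MFCC, delta-MFCC, and framed LPC into one feature matrix.
import numpy as np
import librosa

def combined_features(y, sr, n_mfcc=13, lpc_order=12, frame=512, hop=256):
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, n_fft=frame, hop_length=hop)
    delta = librosa.feature.delta(mfcc)                        # first-order dynamic features
    lpc = []
    for start in range(0, len(y) - frame, hop):
        lpc.append(librosa.lpc(y[start:start + frame], order=lpc_order)[1:])
    lpc = np.array(lpc).T
    n = min(mfcc.shape[1], lpc.shape[1])                       # align frame counts
    return np.vstack([mfcc[:, :n], delta[:, :n], lpc[:, :n]])  # (2*n_mfcc + lpc_order, n_frames)
```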

6.
Compared with recognizing speakers from decoded and reconstructed speech, extracting speech feature parameters directly from the VoIP stream is easier to implement. For G.729 coded-domain data, a fast speaker recognition method based on the DTW algorithm is studied. Experimental results show that, for this speaker recognition task, the DTW algorithm offers a large improvement in both recognition accuracy and efficiency over the GMM.
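The template-matching step can be illustrated with a plain dynamic-programming DTW distance between two feature sequences; which coded-domain parameters serve as the features is left open here, and the Euclidean local distance and length normalization are assumptions.

```python
# Hedged sketch of DTW matching for template-based speaker recognition:
# dynamic programming over a frame-wise Euclidean distance between two feature sequences.
import numpy as np

def dtw_distance(A, B):
    # A: (n, d), B: (m, d) feature sequences (e.g. coded-domain spectral parameters per frame)
    n, m = len(A), len(B)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(A[i - 1] - B[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m] / (n + m)      # length-normalized path cost; smaller = more similar

# Identification: compare the test sequence against each speaker's enrolled template(s)
# and pick the speaker with the smallest DTW distance.
```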

7.
The cascading appearance-based (CAB) feature extraction technique has established itself as the state-of-the-art in extracting dynamic visual speech features for speech recognition. In this paper, we focus on investigating the effectiveness of this technique for the related speaker verification application. By investigating the speaker verification ability of each stage of the cascade, we demonstrate that the same steps taken to reduce static speaker and environmental information for the visual speech recognition application also provide similar improvements for visual speaker recognition. A further study compares synchronous HMM (SHMM) based fusion of CAB visual features and traditional perceptual linear predictive (PLP) acoustic features, showing that the higher complexity inherent in the SHMM approach does not appear to provide any improvement in the final audio-visual speaker verification system over simpler utterance-level score fusion.

8.
Using linear prediction coefficients as features, the iterative algorithm of the Gaussian mixture model is applied to refine an initial k-means clustering of the training samples, yielding representations of the constituent units of speech. Based on pattern matching of these units, a text-independent speaker verification method (the mean method) and a text-independent speaker identification method are proposed. Experimental results show that the methods perform well even on short utterances.
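The clustering-refinement step can be sketched directly with scikit-learn: k-means supplies the initial centers, and EM iterations of a Gaussian mixture model refine them. The number of units and the diagonal covariance are assumptions for illustration.

```python
# Hedged sketch of refining an initial k-means clustering with GMM EM iterations,
# using per-frame LPC coefficients as features.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

def speech_units(lpc_frames, n_units=32):
    km = KMeans(n_clusters=n_units, n_init=4, random_state=0).fit(lpc_frames)
    gmm = GaussianMixture(n_components=n_units, covariance_type="diag",
                          means_init=km.cluster_centers_)   # EM starts from the k-means result
    gmm.fit(lpc_frames)
    return gmm   # each mixture component represents one constituent unit of speech
```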

9.
邓蕾  高勇 《计算机系统应用》2017,26(12):227-232
To address the sharp drop in speaker recognition performance in noisy environments, a robust feature extraction method for speaker recognition is proposed. Warped filter banks (WFBS) are used to model the auditory characteristics of the human ear, and cube-root compression, RelAtive SpecTrAl filtering (RASTA), and cepstral mean and variance normalization (CMVN) are incorporated into the feature extraction. Simulations with a Gaussian mixture model (GMM) show that the extracted feature parameters outperform both MFCC and CFCC features in robustness and recognition performance.
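Of the robustness steps listed above, cepstral mean and variance normalization and cube-root compression are simple enough to sketch directly; the snippet below is generic and not specific to the warped filter bank of the paper.

```python
# Hedged sketch of two of the robustness steps: per-utterance CMVN and cube-root compression.
import numpy as np

def cmvn(feats, eps=1e-8):
    # feats: (n_frames, n_coeffs) cepstral feature matrix for one utterance
    mu = feats.mean(axis=0)            # per-coefficient mean over the utterance
    sigma = feats.std(axis=0)          # per-coefficient standard deviation
    return (feats - mu) / (sigma + eps)

def cube_root_compress(filterbank_energies):
    # amplitude compression applied to the (non-negative) filter bank outputs
    return np.cbrt(filterbank_energies)
```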

10.
This paper presents a study of speaker identification for security systems based on the energy of speaker utterances. The proposed system consists of a combination of signal pre-processing, feature extraction using the wavelet packet transform (WPT), and speaker identification using an artificial neural network. In the signal pre-processing, the amplitudes of utterances of the same sentence are normalized to prevent estimation errors caused by changes in a speaker's volume. In the feature extraction, three conventional methods are considered in the experiments and compared with the irregular decomposition method in the proposed system. In order to verify the effectiveness of the proposed system for identification, a generalized regression neural network (GRNN) is used for comparison in the experimental investigation. The experimental results demonstrate the effectiveness of the proposed speaker identification system and are compared with the discrete wavelet transform (DWT), conventional WPT, and WPT on the Mel scale.

11.
Linear prediction cepstral coefficients (LPCC) are among the more effective feature parameters in speaker identification systems, but they are not robust to noise: when the speech contains noise, the recognition rate drops markedly. A GMM-based speaker identification system was built in MATLAB, and a method of applying a weighting window to the feature parameters is proposed. Comparing the identification accuracy of several weighting windows shows that boosting the low-order LPCC coefficients by windowing improves the noise robustness of the system. MATLAB simulation results show that the recognition rate is clearly improved after windowing.
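The weighting-window idea amounts to rescaling each cepstral dimension so that low-order coefficients carry relatively more weight. The sketch below (in Python rather than the paper's MATLAB) uses an exponential decay as one possible window; the decay factor is an illustrative assumption.

```python
# Hedged sketch of a weighting window over LPCC dimensions, emphasizing low-order coefficients.
import numpy as np

def weight_lpcc(lpcc_frames, decay=0.9):
    # lpcc_frames: (n_frames, order); weight w_k = decay**k gives more relative weight to low orders
    order = lpcc_frames.shape[1]
    w = decay ** np.arange(order)
    return lpcc_frames * w
```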

12.
This paper proposes a new method for speaker feature extraction based on formants, wavelet entropy, and neural networks, denoted FWENN. In the first stage, five formants and seven wavelet packet Shannon entropy coefficients are extracted from the speakers' signals as the speaker feature vector. In the second stage, these 12 feature coefficients are used as inputs to a feed-forward neural network; a probabilistic neural network is also used for comparison. In contrast to conventional speaker recognition methods that extract features from sentences (or words), the proposed method extracts the features from vowels. Advantages of using vowels include the ability to recognize speakers when only partially recorded words are available, which may be useful for deaf-mute persons or when the recordings are damaged. Experimental results show that the proposed method succeeds in the speaker verification and identification tasks with a high classification rate. This is accomplished with a minimum amount of information, using only 12 feature coefficients (i.e. vector length) and only one vowel signal, which is the major contribution of this work. The results are further compared to well-known classical algorithms for speaker recognition and are found to be superior.
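The wavelet-entropy part of such a feature vector can be sketched as the Shannon entropy of the normalized energy in each terminal wavelet-packet node of a vowel segment. The wavelet choice and depth (giving eight nodes here, of which seven could be kept) are assumptions for illustration.

```python
# Hedged sketch: Shannon entropy of the energy distribution in each terminal wavelet-packet node.
import numpy as np
import pywt

def wavelet_packet_entropy(x, wavelet="db4", level=3):
    wp = pywt.WaveletPacket(data=x, wavelet=wavelet, maxlevel=level)
    ent = []
    for node in wp.get_level(level, order="natural"):
        e = node.data ** 2
        p = e / (e.sum() + 1e-12)                  # normalized energy distribution in the node
        ent.append(-np.sum(p * np.log2(p + 1e-12)))
    return np.array(ent)                           # one Shannon entropy value per subband
```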

13.
A new feature extraction method based on the linear prediction (LP) residual signal is proposed; this feature is closely tied to an individual speaker's vocal tract. A new feature (HOCOR) is obtained by applying the Haar wavelet transform to the LP residual. To further improve robustness and the identification rate, a hierarchical speaker identification scheme is adopted in which the likelihood of the GMM classifier is weighted by the Gaussian probability density of the pitch period, and this new likelihood is used for identification. Experimental results show that both the robustness and the identification rate of the proposed system are improved.
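The residual computation itself is straightforward to sketch: inverse-filter each frame with its own LPC polynomial to obtain the prediction error, then apply a Haar wavelet decomposition to that residual. The LPC order and single decomposition level are illustrative assumptions and do not reproduce the paper's exact HOCOR feature.

```python
# Hedged sketch: LP residual by inverse filtering, followed by a one-level Haar wavelet transform.
import numpy as np
import librosa
import pywt
from scipy.signal import lfilter

def lp_residual_haar(frame, order=12):
    a = librosa.lpc(frame, order=order)        # A(z) coefficients, a[0] == 1
    residual = lfilter(a, [1.0], frame)        # e[n] = A(z) * s[n]  (prediction error)
    cA, cD = pywt.dwt(residual, "haar")        # one-level Haar decomposition of the residual
    return np.concatenate([cA, cD])
```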

14.
In the age of digital information, audio data has become an important part of many modern computer applications. Audio classification and indexing have become a focus of research in audio processing and pattern recognition. In this paper, we propose effective algorithms to automatically classify audio clips into one of six classes: music, news, sports, advertisement, cartoon, and movie. For these categories a number of acoustic features, including linear predictive coefficients, linear predictive cepstral coefficients, and mel-frequency cepstral coefficients, are extracted to characterize the audio content. The autoassociative neural network model (AANN) is used to capture the distribution of the acoustic feature vectors. The proposed method then uses a Gaussian mixture model (GMM)-based classifier in which the feature vectors from each class are used to train the GMM model for that class. During testing, the likelihood of a test sample belonging to each model is computed and the sample is assigned to the class whose model produces the highest likelihood. Audio clip extraction, feature extraction, creation of the index, and retrieval of the query clip are the major issues in automatic audio indexing and retrieval. A method for indexing the classified audio using LPCC features and the k-means clustering algorithm is proposed.

15.
In order to identify the faults of rotating machinery, the classification process can be divided into two stages: signal preprocessing and feature extraction, followed by the recognition process. In the preprocessing and feature extraction stage, higher-order statistics (HOS) are used to extract features from the vibration signals. In the recognition process, two kinds of neural network classifier are used to evaluate the classification results: a self-organizing feature mapping (SOM) network for collecting data at the initial stage and a learning vector quantization (LVQ) network at the identification stage. The fault features obtained using HOS as the preprocessor are clearer than those obtained from the power spectrum. In addition, the recognition rate using either SOM or LVQ as the classifier is 100%.

16.
This paper describes a robust feature extraction technique for continuous speech recognition. Central to the technique is the minimum variance distortionless response (MVDR) method of spectrum estimation. We consider incorporating perceptual information in two ways: 1) after the MVDR power spectrum is computed and 2) directly during the MVDR spectrum estimation. We show that incorporating perceptual information directly into the spectrum estimation improves both robustness and computational efficiency significantly. We analyze the class separability and speaker variability properties of the features using a Fisher linear discriminant measure and show that these features provide better class separability and better suppression of speaker-dependent information than the widely used mel frequency cepstral coefficient (MFCC) features. We evaluate the technique on four different tasks: an in-car speech recognition task, the Aurora-2 matched task, the Wall Street Journal (WSJ) task, and the Switchboard task. The new feature extraction technique gives lower word error rates than the MFCC and perceptual linear prediction (PLP) feature extraction techniques in most cases. Statistical significance tests reveal that the improvement is most significant in high noise conditions. The technique thus provides improved robustness to noise without sacrificing performance in clean conditions.
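For reference, the MVDR spectrum of a frame can be computed from its autocorrelation matrix as P(w) = 1 / (e(w)^H R^{-1} e(w)), with e(w) the complex sinusoid steering vector. The sketch below shows that basic computation only, without the perceptual warping discussed in the paper; the model order, frequency grid, and diagonal loading are assumptions.

```python
# Hedged sketch of the basic MVDR power spectrum of one frame (no perceptual warping).
import numpy as np
from scipy.linalg import toeplitz

def mvdr_spectrum(frame, order=12, n_freq=257):
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:][:order + 1]
    R = toeplitz(r)                                      # (order+1)x(order+1) autocorrelation matrix
    R_inv = np.linalg.inv(R + 1e-8 * np.eye(order + 1))  # small loading for numerical stability
    w = np.linspace(0, np.pi, n_freq)
    k = np.arange(order + 1)
    E = np.exp(-1j * np.outer(w, k))                     # steering vectors e(w)
    denom = np.einsum("fi,ij,fj->f", E.conj(), R_inv, E).real
    return 1.0 / np.maximum(denom, 1e-12)                # MVDR power spectrum on the grid w
```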

17.
Gaussian mixture model (GMM) based approaches have been commonly used for speaker recognition tasks. Methods for estimating the parameters of GMMs include the expectation-maximization method, which is a non-discriminative learning method. Discriminative classifier based approaches to speaker recognition include support vector machine (SVM) classifiers using dynamic kernels such as the generalized linear discriminant sequence kernel, the probabilistic sequence kernel, the GMM supervector kernel, the GMM-UBM mean interval kernel (GUMI), and the intermediate matching kernel. Recently, the pyramid match kernel (PMK), using grids in the feature space as histogram bins, and the vocabulary-guided PMK (VGPMK), using clusters in the feature space as histogram bins, have been proposed for recognition of objects in an image represented as a set of local feature vectors. In PMK, a set of feature vectors is mapped onto a multi-resolution histogram pyramid. The kernel is computed between a pair of examples by comparing the pyramids using a weighted histogram intersection function at each level of the pyramid. We propose to use the PMK-based SVM classifier for speaker identification and verification from the speech signal of an utterance represented as a set of local feature vectors. The main issue in building the PMK-based SVM classifier is the construction of a pyramid of histograms. We first propose to form hard clusters, using the k-means clustering method, with an increasing number of clusters at different levels of the pyramid to design the codebook-based PMK (CBPMK). Then we propose the GMM-based PMK (GMMPMK) that uses soft clustering. We compare the performance of the GMM-based approaches, and the PMK and other dynamic kernel SVM-based approaches, to speaker identification and verification. The 2002 and 2003 NIST speaker recognition corpora are used in the evaluation of the different approaches. Results of our studies show that the dynamic kernel SVM-based approaches give a significantly better performance than the state-of-the-art GMM-based approaches. For the speaker recognition task, the GMMPMK-based SVM gives a performance that is better than that of SVMs using many other dynamic kernels and comparable to that of SVMs using the state-of-the-art dynamic kernel, the GUMI kernel. The storage requirements of the GMMPMK-based SVMs are less than those of SVMs using any other dynamic kernel.

18.
A speaker verification system based on Gaussian mixture models
杨澄宇  赵文  杨鉴 《计算机应用》2001,21(4):7-8,11
Because the low-frequency and higher-frequency bands of the human speech spectrum carry more speaker-specific information, this paper proposes an improved LPC cepstrum algorithm for text-independent speaker recognition. The improved algorithm weights the individual bands of the speech spectrum to emphasize speaker-specific information, making speakers easier to distinguish.

19.
Multiple feature parameters can be used together to improve speaker recognition accuracy, but the individual dimensions of a combined feature vector may not contribute equally to the result, so treating them equally is not necessarily optimal. To address this, a Fisher-criterion-based mixed feature extraction method is proposed that combines Mel-frequency cepstral coefficients (MFCC), linear prediction Mel cepstral coefficients (LPMFCC), and Teager energy operator cepstral coefficients (TEOCC). First, the MFCC, LPMFCC, and TEOCC parameters of the speech signal are extracted. Then, the Fisher ratio of each dimension of the MFCC and LPMFCC parameters is computed, and the six dimensions with the highest Fisher ratios from each are selected and combined with the TEOCC parameters to form the mixed feature. Finally, speaker recognition experiments are carried out on the TIMIT speech corpus and the NOISEX-92 noise database. Simulations show that, compared with MFCC, LPMFCC, MFCC+LPMFCC, a Fisher-ratio-based mixed Mel cepstral feature, and a principal component analysis (PCA) based feature extraction method, the average recognition rate with a Gaussian mixture model (GMM) and a BP neural network improves by 21.65, 18.39, 15.61, 15.01, and 22.70 percentage points respectively in clean speech, and by 15.15, 10.81, 8.69, 7.64, and 17.76 percentage points respectively under 30 dB noise. The results show that the mixed feature effectively improves the speaker recognition rate and is more robust.
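The Fisher-ratio selection step can be sketched as scoring each feature dimension by its between-class over within-class variance across speakers and keeping the top-scoring dimensions. The snippet below is generic; it does not reproduce the paper's exact setup, and the number of retained dimensions (six) follows the description above.

```python
# Hedged sketch of Fisher-ratio scoring and top-k feature dimension selection.
import numpy as np

def fisher_ratio(X, labels):
    # X: (n_frames, n_dims) features; labels: (n_frames,) speaker ids
    mu = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(labels):
        Xc = X[labels == c]
        between += len(Xc) * (Xc.mean(axis=0) - mu) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return between / (within + 1e-12)

def select_top(X, labels, k=6):
    idx = np.argsort(fisher_ratio(X, labels))[::-1][:k]
    return idx   # indices of the k dimensions to keep before concatenating with TEOCC
```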

20.
A speaker identification method fusing GMM and RVM
The relevance vector machine (RVM) classifier uses probabilistic outputs to overcome the slow recognition speed of the support vector machine (SVM) and has better sparsity. In text-independent speaker identification, however, the large amount of training data exposes the RVM's drawback of excessive computation and memory requirements during model training. A text-independent speaker identification system that fuses GMM statistical feature parameters with an RVM is therefore proposed: it extracts speaker characteristics effectively, solves the problem of training an RVM on large data sets, and combines the robustness of the statistical model with the discriminative power of the classifier. Experimental results show that the system achieves a lower misidentification rate than the basic GMM system and higher sparsity than a GMM/SVM system.
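The "GMM statistics as input to a discriminative classifier" idea can be sketched by stacking the component means of a per-utterance GMM into one fixed-length vector. scikit-learn has no RVM implementation, so the classifier shown in the usage comment is an SVM stand-in for illustration only; the number of mixture components is also an assumption.

```python
# Hedged sketch: per-utterance GMM mean supervector feeding a discriminative classifier.
# The SVM below is a stand-in; the paper fuses the GMM statistics with an RVM.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

def gmm_supervector(frames, n_components=8):
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag",
                          reg_covar=1e-4, random_state=0).fit(frames)
    return gmm.means_.ravel()            # one fixed-length vector per utterance

# Train a discriminative model on supervectors (X: list of per-utterance frame matrices, y: labels).
# clf = SVC(kernel="linear", probability=True).fit(
#     np.vstack([gmm_supervector(f) for f in X]), y)
```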
