Similar literature
 20 similar articles found (search time: 9 ms)
1.
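Hu Fengsong (胡峰松), Zhang Xuan (张璇). Journal of Computer Applications (《计算机应用》), 2012, 32(9): 2542-2544
To improve the recognition rate of speaker recognition systems, a new feature extraction method is proposed that uses Mel-frequency cepstral coefficients (MFCC) and inverted Mel-frequency cepstral coefficients (IMFCC) as feature parameters. The method uses the Fisher criterion to combine MFCC and IMFCC into a hybrid feature parameter. Experimental results show that, compared with MFCC, the new hybrid feature parameter achieves better recognition performance both on a clean speech corpus and in noisy environments.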

2.
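Hybrid feature extraction method for Mel cepstral coefficients based on the Fisher ratio   Total citations: 1; self-citations: 0; citations by others: 1
Mel cepstral coefficients (MFCC) used in speech recognition have low recognition accuracy for mid- and high-frequency signals and do not account for the influence of each feature dimension on the recognition result. To address this, a feature extraction method is proposed that is based on MFCC, inverted Mel cepstral coefficients (IMFCC), and mid-frequency Mel cepstral coefficients (MidMFCC), combined with the Fisher criterion. First, the three feature parameters MFCC, IMFCC, and MidMFCC are extracted from the speech signal, and the Fisher ratio of each dimension of each feature parameter is computed. The Fisher ratios are then used to select among the three feature parameters, forming a hybrid feature parameter that improves recognition accuracy for mid- and high-frequency speech information. Experimental results show that, under the same conditions, the new feature improves the recognition rate to some extent compared with MFCC parameters.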
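The per-dimension Fisher-ratio selection described in this abstract can be illustrated with a minimal sketch. It assumes one common form of the Fisher ratio (between-class scatter of the class means divided by the summed within-class variance); the abstract does not give the exact formula, and the function names `fisher_ratios` and `select_top_dimensions` are hypothetical.

```python
from statistics import fmean


def fisher_ratios(features, labels):
    """Return one Fisher ratio per feature dimension.

    Assumed definition (not taken from the abstract): sum over classes of
    (class mean - overall mean)^2, divided by the sum of class variances.
    """
    classes = sorted(set(labels))
    ratios = []
    for d in range(len(features[0])):
        column = [row[d] for row in features]
        overall_mean = fmean(column)
        between = 0.0
        within = 0.0
        for c in classes:
            values = [row[d] for row, y in zip(features, labels) if y == c]
            class_mean = fmean(values)
            between += (class_mean - overall_mean) ** 2
            within += fmean([(v - class_mean) ** 2 for v in values])
        ratios.append(between / within if within > 0 else float("inf"))
    return ratios


def select_top_dimensions(ratios, k):
    """Indices of the k dimensions with the largest Fisher ratio, in index order."""
    ranked = sorted(range(len(ratios)), key=lambda d: ratios[d], reverse=True)
    return sorted(ranked[:k])


# Toy data: dimension 0 separates the two speakers, dimension 1 does not.
features = [[0.0, 1.0], [0.2, 5.0], [1.0, 1.2], [1.2, 5.2]]
labels = [0, 0, 1, 1]
ratios = fisher_ratios(features, labels)
kept = select_top_dimensions(ratios, 1)
```

In a hybrid-feature setting, the rows of `features` would be concatenated MFCC, IMFCC, and MidMFCC vectors, and the retained indices would define the mixed feature.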

3.
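Text-independent speaker recognition is convenient for users and applicable in a wide range of settings, and it is the current research focus in speaker recognition technology. Feature parameter extraction in text-independent speaker recognition systems is studied. By modifying the Mel subband coefficients, the frequency-band differences between speakers are enhanced and class separability in the feature space is improved, yielding Mel subband coefficients that better reflect each speaker's individual characteristics and thereby raising the system's average correct recognition rate.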

4.
This paper presents an efficient approach for automatic speaker identification based on cepstral features and the Normalized Pitch Frequency (NPF). Most relevant speaker identification methods adopt a cepstral strategy. Inclusion of the pitch frequency as a new feature in the speaker identification process is expected to enhance the speaker identification accuracy. In the proposed framework for speaker identification, a neural classifier with a single hidden layer is used. Different transform domains are investigated for reliable feature extraction from the speech signal. Moreover, a pre-processing noise reduction step is used prior to the feature extraction process to enhance the performance of the speaker identification system. Simulation results show that the NPF as a feature in speaker identification enhances the performance of the speaker identification system, especially with the Discrete Cosine Transform (DCT) and a wavelet denoising pre-processing step.

5.
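Fuzzy system identification based on batch fuzzy learning vector quantization   Total citations: 2; self-citations: 0; citations by others: 2
Yu Long (于龙), Xiao Jian (肖建), Bai Yifeng (白裔峰). Control and Decision (《控制与决策》), 2007, 22(8): 903-906
A fuzzy system identification method based on batch fuzzy learning vector quantization is proposed. First, the fuzzy exponent is adjusted automatically by an optimization method, so that the resulting antecedent membership functions of the fuzzy rules are more interpretable than the membership functions obtained from clustering rules. Then, to address the trade-off between interpretability and accuracy in fuzzy systems while keeping the parameters understandable, the consequent parameters are adjusted by constrained nonlinear optimization, and the bounds on the adjusted parameters are used to assess how far the optimization degrades them. Simulation experiments show that the fuzzy system model obtained with this method is highly transparent and achieves reasonable accuracy.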

6.
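To address the shortcomings of computing common vectors with preset parameters, a common-vector-based speaker recognition algorithm for speech in non-normal conditions is proposed. First, the parameters used to compute the common vector are adjusted adaptively according to the system recognition rate. Then the parameters that give the highest system recognition rate are taken as optimal, common vectors are extracted for the test speech, and an SVM classifier classifies the speakers. Experimental results show that, with the common vectors extracted by this algorithm, the speaker recognition rate for speech affected by a mild cold is 85.4%, which is 16.9%, 15.2%, and 3.2% higher than that of the GMM algorithm without feature processing, SVM, and the GMM algorithm combined with common vectors, respectively.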

7.
This paper proposes a multi-section vector quantization approach for on-line signature recognition. We have used a database of 330 users which includes 25 skilled forgeries performed by 5 different impostors. This database is larger than those typically used in the literature. Nevertheless, we also provide results from the SVC database. Our proposed system obtains results similar to those of the state-of-the-art online signature recognition algorithm, Dynamic Time Warping, with a computational requirement around 47 times lower. In addition, our system reduces the database storage requirements through vector compression, and is more privacy-friendly because the original signature cannot be recovered from the codebooks. Experimental results reveal that our proposed multi-section vector quantization achieves a 98% identification rate and a minimum Detection Cost Function value of 2.29% for random forgeries and 7.75% for skilled forgeries.

8.
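Speaker recognition based on support vector machines and wavelet analysis   Total citations: 2; self-citations: 0; citations by others: 2
To address the speaker recognition problem, a recognition method based on support vector machines and wavelet analysis, together with its framework model, is proposed. Wavelet analysis is applied to signal preprocessing; on that basis, its singularity detection principle is used to separate the speech signal from noise and achieve speech enhancement. Finally, training and testing are carried out on samples, and an SVM performs speaker classification.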

9.
Gaussian mixture model (GMM) based approaches have been commonly used for speaker recognition tasks. Methods for estimation of parameters of GMMs include the expectation-maximization method which is a non-discriminative learning based method. Discriminative classifier based approaches to speaker recognition include support vector machine (SVM) based classifiers using dynamic kernels such as generalized linear discriminant sequence kernel, probabilistic sequence kernel, GMM supervector kernel, GMM-UBM mean interval kernel (GUMI) and intermediate matching kernel. Recently, the pyramid match kernel (PMK) using grids in the feature space as histogram bins and vocabulary-guided PMK (VGPMK) using clusters in the feature space as histogram bins have been proposed for recognition of objects in an image represented as a set of local feature vectors. In PMK, a set of feature vectors is mapped onto a multi-resolution histogram pyramid. The kernel is computed between a pair of examples by comparing the pyramids using a weighted histogram intersection function at each level of pyramid. We propose to use the PMK-based SVM classifier for speaker identification and verification from the speech signal of an utterance represented as a set of local feature vectors. The main issue in building the PMK-based SVM classifier is construction of a pyramid of histograms. We first propose to form hard clusters, using k-means clustering method, with increasing number of clusters at different levels of pyramid to design the codebook-based PMK (CBPMK). Then we propose the GMM-based PMK (GMMPMK) that uses soft clustering. We compare the performance of the GMM-based approaches, and the PMK and other dynamic kernel SVM-based approaches to speaker identification and verification. The 2002 and 2003 NIST speaker recognition corpora are used in evaluation of different approaches to speaker identification and verification. 
Results of our studies show that the dynamic kernel SVM-based approaches give a significantly better performance than the state-of-the-art GMM-based approaches. For the speaker recognition task, the GMMPMK-based SVM gives a performance that is better than that of SVMs using many other dynamic kernels and comparable to that of SVMs using the state-of-the-art dynamic kernel, the GUMI kernel. The storage requirements of the GMMPMK-based SVMs are lower than those of SVMs using any other dynamic kernel.
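As a rough illustration of the grid-based pyramid match kernel (PMK) mentioned in this abstract, the sketch below maps two sets of feature vectors onto grid histograms of doubling cell size and weights the new matches found at each level by 1/2^level. It follows the commonly cited grid formulation, not the codebook-based or GMM-based variants the paper proposes; the function names and the `levels` and `base_side` parameters are assumptions made for the example.

```python
from collections import Counter


def grid_histogram(points, side):
    """Count how many points fall in each grid cell of the given side length."""
    return Counter(tuple(int(x // side) for x in p) for p in points)


def intersection(h1, h2):
    """Histogram intersection: sum of bin-wise minima."""
    return sum(min(count, h2[cell]) for cell, count in h1.items())


def pyramid_match_kernel(x, y, levels=4, base_side=1.0):
    """Weighted sum of the new matches found at each pyramid level.

    Level 0 is the finest grid; each coarser level doubles the cell side and
    halves the weight given to matches first found at that level.
    """
    kernel = 0.0
    previous = 0
    for level in range(levels):
        side = base_side * 2 ** level
        matched = intersection(grid_histogram(x, side), grid_histogram(y, side))
        kernel += (matched - previous) / 2 ** level
        previous = matched
    return kernel


# Two toy sets of 2-D local feature vectors (hypothetical values).
x = [(0.1, 0.2), (3.5, 3.5)]
y = [(0.3, 0.4), (6.5, 6.5)]
k_xy = pyramid_match_kernel(x, y)
k_xx = pyramid_match_kernel(x, x)
```

Replacing the grid cells with k-means clusters at each level would give a codebook-style pyramid in the spirit of CBPMK, though the details of that construction are not specified in the abstract.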

10.
Optimal representation of acoustic features is an ongoing challenge in automatic speech recognition research. As an initial step toward this purpose, optimization of filterbanks for the cepstral coefficient using evolutionary optimization methods is proposed in some approaches. However, the large number of optimization parameters required by a filterbank makes it difficult to guarantee that an individual optimized filterbank can provide the best representation for phoneme classification. Moreover, in many cases, a number of potential solutions are obtained. Each solution presents discrimination between specific groups of phonemes. In other words, each filterbank has its own particular advantage. Therefore, the aggregation of the discriminative information provided by filterbanks is a demanding and challenging task. In this study, the optimization of a number of complementary filterbanks is considered to provide a different representation of speech signals for phoneme classification using the hidden Markov model (HMM). Fuzzy information fusion is used to aggregate the decisions provided by HMMs. Fuzzy theory can effectively handle the uncertainties of classifiers trained with different representations of speech data. In this study, the output of the HMM classifiers of each expert is fused using a fuzzy decision fusion scheme. The decision fusion employs global and local confidence measurements to formulate the reliability of each classifier based on both the global and local context when making overall decisions. Experiments were conducted on clean and noisy phonetic samples. The proposed method outperformed conventional Mel frequency cepstral coefficients under both conditions in terms of overall phoneme classification accuracy. The fuzzy fusion scheme was shown to be capable of aggregating the complementary information provided by each filterbank.

11.
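Industry and Mine Automation (《工矿自动化》), 2015, (11): 1-6
To address the narrow applicability and low recognition accuracy of existing coal-rock recognition methods, a block-wise discrete cosine transform (DCT) is used to process coal-rock images. The DCT coefficients of each image block are arranged in zigzag ("Z"-shaped) order to form a vector representing the block. Coal-rock image features are extracted in two ways: one forms the image feature vector from the mean of each dimension of the block vectors and the overall variance of all block vectors; the other concatenates the block vectors in the order of the block DCT transforms. A learning vector quantization neural network is used for coal-rock recognition. Both feature extraction schemes achieve a recognition accuracy of 96.67%, which is 3.3% higher than the Haar wavelet method and 5.8% higher than the Daubechies wavelet method.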
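The block DCT, zigzag ordering, and first feature scheme described in this abstract can be sketched as follows. This is a pure-Python illustration under stated assumptions: an orthonormal DCT-II, non-overlapping square blocks, and "overall variance" read as the population variance of all coefficients of all block vectors. None of these choices is confirmed by the abstract, and the function names are hypothetical.

```python
import math
from statistics import fmean, pvariance


def dct2(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)
    basis = [[(math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
              * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
              for i in range(n)] for k in range(n)]
    # Transform along the first axis, then along the second (C * B * C^T).
    temp = [[sum(basis[u][i] * block[i][j] for i in range(n))
             for j in range(n)] for u in range(n)]
    return [[sum(temp[u][j] * basis[v][j] for j in range(n))
             for v in range(n)] for u in range(n)]


def zigzag(matrix):
    """Read a square matrix in zigzag order, starting at the top-left corner."""
    n = len(matrix)
    order = sorted(((i, j) for i in range(n) for j in range(n)),
                   key=lambda p: (p[0] + p[1],
                                  p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [matrix[i][j] for i, j in order]


def image_features(image, block_size=8):
    """Per-dimension mean of the block vectors plus one overall variance."""
    vectors = []
    for r in range(0, len(image) - block_size + 1, block_size):
        for c in range(0, len(image[0]) - block_size + 1, block_size):
            block = [row[c:c + block_size] for row in image[r:r + block_size]]
            vectors.append(zigzag(dct2(block)))
    means = [fmean([v[d] for v in vectors]) for d in range(block_size ** 2)]
    overall_variance = pvariance([x for v in vectors for x in v])
    return means + [overall_variance]


# A constant 4x4 toy image split into 2x2 blocks.
toy_image = [[3.0] * 4 for _ in range(4)]
toy_features = image_features(toy_image, block_size=2)
```

The second scheme in the abstract would instead concatenate the zigzag vectors of all blocks in scan order, which gives a much longer feature vector.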

12.
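Li Hongyan (李宏言), Sheng Liyuan (盛利元), Chen Ni (陈妮). Computer Engineering and Design (《计算机工程与设计》), 2007, 28(19): 4702-4704, 4737
To address the heavy computation and large storage requirements of traditional DTW speech recognition, an improved DTW method based on vector quantization and a lookup table is proposed. Vector quantization converts the continuous feature vector space into a discrete vector space, reducing the storage needed for patterns. A vector distortion measure table is then built on this basis, and hash-based table lookup gives exact addressing, which eliminates the large number of distance computations otherwise incurred by dynamic programming and greatly speeds up recognition matching. Theoretical analysis and experimental results confirm the effectiveness of the improved method. For research convenience, a real-time DTW speech recognition system was also designed and developed on the Matlab platform.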
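The idea of replacing per-frame distance computations with table lookups can be sketched roughly as below. The sketch assumes a given codebook, Euclidean distortion, and a plain nested list indexed by codeword pair in place of the hash lookup the abstract mentions; the function names are hypothetical and this is not the authors' implementation.

```python
import math


def quantize(frames, codebook):
    """Map each feature frame to the index of its nearest codeword."""
    return [min(range(len(codebook)),
                key=lambda k: math.dist(frame, codebook[k]))
            for frame in frames]


def build_distance_table(codebook):
    """Precompute the distortion between every pair of codewords once."""
    return [[math.dist(a, b) for b in codebook] for a in codebook]


def dtw_cost(a, b, table):
    """DTW over two codeword-index sequences, using lookups for local cost."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i, ca in enumerate(a, start=1):
        for j, cb in enumerate(b, start=1):
            best_previous = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = table[ca][cb] + best_previous
    return cost[-1][-1]


# Toy codebook and utterances (hypothetical values).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
table = build_distance_table(codebook)
template = quantize([(0.1, 0.0), (0.9, 0.1), (0.1, 0.8)], codebook)
stretched = [0, 1, 1, 2]
same_word_cost = dtw_cost(template, stretched, table)
```

Templates are stored as short integer sequences, and the table has only as many entries as there are codeword pairs, which is where the storage and speed savings come from.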

13.
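Huo Weigang (霍纬纲), Qu Feng (屈峰), Cheng Zhen (程震). Journal of Computer Applications (《计算机应用》), 2017, 37(11): 3075-3079
To improve the modeling efficiency of fuzzy associative classifiers (FAC) on dynamic datasets, an incremental fuzzy associative classification method based on evolving vector quantization (eVQ) clustering is proposed. First, the eVQ clustering algorithm incrementally updates the parameters of the Gaussian membership functions on quantitative attributes. Next, the Update With Early Pruning (UWEP) algorithm is extended so that it can incrementally mine fuzzy frequent itemsets. Finally, the fuzzy associative classification rule base is pruned and updated using fuzzy correlation (FCORR) and the antecedent length of the classification rules as measures. Experimental results on four UCI benchmark datasets show that, compared with batch fuzzy associative classification modeling, the proposed method reduces the training time of the fuzzy associative classifier while preserving classification accuracy and interpretability, and that the eVQ-based incremental update of Gaussian membership functions helps improve classification accuracy on dynamic datasets.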

14.
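A new Sammon nonlinear mapping algorithm based on reliable and stable fuzzy kernel learning vector quantization (FKLVQ) clustering is proposed. Using a Mercer kernel, the method maps the data space into a high-dimensional feature space and performs FKLVQ learning there to obtain effective and stable clustering weight vectors for the data space. Sammon nonlinear kernel mapping is then carried out in the feature space and the output space only between the data samples of each space and their respective clustering weight vectors. This reduces computational complexity and keeps the distances between data points and cluster centers similar in the data space and the output space. Simulation results verify the reliability and stability of the method.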

15.
16.
This paper describes speech intelligibility enhancement for Hidden Markov Model (HMM) generated synthetic speech in noise. We present a method for modifying the Mel cepstral coefficients generated by statistical parametric models that have been trained on plain speech. We update these coefficients such that the glimpse proportion – an objective measure of the intelligibility of speech in noise – increases, while keeping the speech energy fixed. An acoustic analysis reveals that the modified speech is boosted in the region 1–4 kHz, particularly for vowels, nasals and approximants. Results from listening tests employing speech-shaped noise show that the modified speech is as intelligible as a synthetic voice trained on plain speech whose duration, Mel cepstral coefficients and excitation signal parameters have been adapted to Lombard speech from the same speaker. Our proposed method does not require these additional recordings of Lombard speech. In the presence of a competing talker, both modification and adaptation of spectral coefficients give more modest gains.

17.
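Existing coal-gangue interface recognition relies on the γ-ray method, which is unsuitable for working faces whose roof contains no or few radioactive elements, or on radar detection, which has a small detection range and severe signal attenuation. To address these problems, a coal-gangue interface recognition method based on Mel-frequency cepstral coefficients and a genetic algorithm is proposed. The method identifies coal and gangue from differences in the characteristics of the acoustic signals produced as they fall during caving. Mel-frequency cepstral coefficients are used to transform the denoised coal-gangue acoustic signal into the frequency domain for processing, and 32-dimensional feature parameters of the signal are extracted. A genetic algorithm optimizes the 32-dimensional feature parameters to obtain the optimal parameter combination, and a support vector machine and a BP neural network are used to classify the optimal parameters. Experimental results show that the method can accurately identify the falling state of coal and gangue.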

18.
Using classical signal processing and filtering techniques for music note recognition faces various difficulties. This paper proposes a new scheme based on neural networks for music note recognition. The proposed scheme uses three types of neural networks: time delay neural networks, self-organizing maps, and linear vector quantization. Experimental results demonstrate that the proposed scheme achieves a 100% recognition rate in moderate noise environments. The basic design of two potential applications of the proposed scheme is briefly demonstrated.

19.
Clustering of stationary time series has become an important tool in many scientific applications, such as medicine and finance. Time series clustering methods are based on the calculation of suitable similarity measures which identify the distance between two or more time series. These measures are either computed in the time domain or in the spectral domain. Since the computation of time domain measures is rather cumbersome, we resort to spectral domain methods. A new measure of distance is proposed, based on the so-called cepstral coefficients, which carry information about the log spectrum of a stationary time series. These coefficients are estimated by means of a semiparametric model which assumes that the log-likelihood ratio of two or more unknown spectral densities has a linear parametric form. After estimation, the estimated cepstral distance measure is given as an input to a clustering method to produce the disjoint groups of data. Simulated examples show that the method yields good results, even when the processes are not necessarily linear. These cepstral-based clustering algorithms are applied to biological time series. In particular, the proposed methodology effectively identifies distinct and biologically relevant classes of amino acid sequences with the same physicochemical properties, such as hydrophobicity.

20.
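A new speech denoising method based on the adaptive Karhunen-Loève transform (KLT) is proposed. The adaptive-KLT denoising algorithm requires no whitening; it can adaptively track the KLT matrix and effectively balance the trade-off between sound quality and intelligibility of the denoised signal. An improved MCE is used in the speaker recognition stage. Experiments show that the hybrid system does improve the robustness and recognition rate of speaker identification.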
