1.
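To improve the recognition rate of speaker recognition systems, a new feature extraction method is proposed that uses Mel-frequency cepstral coefficients (MFCC) and inverted Mel-frequency cepstral coefficients (IMFCC) as feature parameters. The method uses the Fisher criterion to combine MFCC and IMFCC into a hybrid feature parameter. Experimental results show that the new hybrid feature achieves better recognition performance than MFCC alone, both on a clean speech corpus and in noisy environments.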
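As a rough illustration of how an inverted Mel filterbank differs from the ordinary one, the sketch below places filter centre frequencies on the Mel scale and then mirrors them across the band. It assumes the widely used `2595 * log10(1 + f / 700)` Mel formula and a simple mirror construction; the abstract does not state which variant the authors use, and the function names are hypothetical.

```python
import math

def hz_to_mel(f_hz):
    # Common Mel-scale formula (an assumption; other variants exist).
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_centers(n_filters, f_max_hz):
    """Centre frequencies (Hz) equally spaced on the Mel scale inside (0, f_max_hz)."""
    step = hz_to_mel(f_max_hz) / (n_filters + 1)
    return [mel_to_hz(step * (i + 1)) for i in range(n_filters)]

def inverted_mel_centers(n_filters, f_max_hz):
    """Mirror the Mel centres across the band so that filters are densest at high frequencies."""
    return sorted(f_max_hz - c for c in mel_centers(n_filters, f_max_hz))

if __name__ == "__main__":
    print([round(c) for c in mel_centers(5, 4000.0)])
    print([round(c) for c in inverted_mel_centers(5, 4000.0)])
```

The ordinary centres crowd toward low frequencies, while the mirrored ones crowd toward high frequencies, which is why the two feature sets can carry complementary information.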
2.
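Hybrid Mel cepstral coefficient feature extraction method based on the Fisher ratio (total citations: 1; self-citations: 0; citations by others: 1)
In speech recognition, Mel cepstral coefficients (MFCC) give low recognition accuracy for mid- and high-frequency signals, and the influence of each feature dimension on the recognition result is not taken into account. To address this, a feature extraction method is proposed that is based on MFCC, inverted Mel cepstral coefficients (IMFCC), and mid-frequency Mel cepstral coefficients (MidMFCC), combined with the Fisher criterion. MFCC, IMFCC, and MidMFCC features are first extracted from the speech signal, and the Fisher ratio of each dimension of each feature set is computed. The Fisher ratios are then used to select components from the three feature sets and form a hybrid feature parameter, improving recognition accuracy for the mid- and high-frequency information in speech. Experimental results show that, under the same conditions, the new feature yields a somewhat higher recognition rate than MFCC.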
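The per-dimension selection described in this abstract can be sketched as follows. This is a minimal illustration, assuming one common definition of the Fisher ratio (scatter of the class means divided by the summed within-class variances); the paper's exact formula and the number of dimensions kept are not given in the abstract, and the function names are hypothetical.

```python
def fisher_ratios(features, labels):
    """Per-dimension Fisher ratio: scatter of class means over summed within-class variance."""
    by_class = {}
    for vec, lab in zip(features, labels):
        by_class.setdefault(lab, []).append(vec)
    ratios = []
    for d in range(len(features[0])):
        class_means = []
        within = 0.0
        for vecs in by_class.values():
            vals = [v[d] for v in vecs]
            mu = sum(vals) / len(vals)
            class_means.append(mu)
            within += sum((x - mu) ** 2 for x in vals) / len(vals)
        grand = sum(class_means) / len(class_means)
        between = sum((m - grand) ** 2 for m in class_means)
        ratios.append(between / within if within > 0 else float("inf"))
    return ratios

def select_top_dims(ratios, k):
    """Indices of the k dimensions with the largest Fisher ratio, in ascending index order."""
    ranked = sorted(range(len(ratios)), key=lambda d: ratios[d], reverse=True)
    return sorted(ranked[:k])

if __name__ == "__main__":
    # Toy data: dimension 0 separates the two speakers, dimension 1 does not.
    feats = [[0.0, 1.0], [0.2, 5.0], [10.0, 1.1], [10.2, 4.9]]
    labs = ["a", "a", "b", "b"]
    r = fisher_ratios(feats, labs)
    print(r, select_top_dims(r, 1))
```

In a hybrid setting, `features` would hold the concatenated MFCC, IMFCC, and MidMFCC vectors for each frame, so the selected indices pick components from all three sets.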
3.
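Text-independent speaker recognition is convenient for users and widely applicable, and it is the current focus of speaker recognition research. This work studies feature parameter extraction in text-independent speaker recognition systems. By modifying the Mel sub-band coefficients, it strengthens the frequency-band differences between speakers, improves class separability in the feature space, and obtains Mel sub-band coefficients that better reflect speaker-specific characteristics, thereby raising the system's average correct recognition rate.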
4.
Marwa A. Nasr Mohammed Abd-Elnaby Adel S. El-Fishawy S. El-Rabaie Fathi E. Abd El-Samie 《International Journal of Speech Technology》2018,21(4):941-951
This paper presents an efficient approach for automatic speaker identification based on cepstral features and the Normalized Pitch Frequency (NPF). Most relevant speaker identification methods adopt a cepstral strategy. Inclusion of the pitch frequency as a new feature in the speaker identification process is expected to enhance the speaker identification accuracy. In the proposed framework for speaker identification, a neural classifier with a single hidden layer is used. Different transform domains are investigated for reliable feature extraction from the speech signal. Moreover, a pre-processing noise reduction step is used prior to the feature extraction process to enhance the performance of the speaker identification system. Simulation results show that the NPF as a feature in speaker identification enhances the performance of the speaker identification system, especially with the Discrete Cosine Transform (DCT) and a wavelet denoising pre-processing step.
5.
6.
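To address the shortcomings of computing the common vector with preset parameters, a common-vector-based speaker recognition algorithm for abnormal speech is proposed. First, the parameters used to compute the common vector are adapted according to the system recognition rate. Then the parameters that give the highest recognition rate are taken as optimal, common vectors are extracted for the test speech, and an SVM classifier performs speaker classification on the abnormal speech. Experimental results show that, with the common vectors extracted by this algorithm, the speaker recognition rate for speech affected by a slight cold is 85.4%, which is 16.9%, 15.2%, and 3.2% higher than a GMM algorithm without feature processing, SVM, and a GMM algorithm combined with common vectors, respectively.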
7.
This paper proposes a multi-section vector quantization approach for on-line signature recognition. We have used a database of 330 users which includes 25 skilled forgeries performed by 5 different impostors. This database is larger than those typically used in the literature. Nevertheless, we also provide results from the SVC database. Our proposed system obtains results similar to those of the state-of-the-art online signature recognition algorithm, Dynamic Time Warping, with a computational requirement around 47 times lower. In addition, our system improves the database storage requirements due to vector compression, and is more privacy-friendly because it is not possible to recover the original signature using the codebooks. Experimental results reveal that our proposed multi-section vector quantization achieves a 98% identification rate and a minimum Detection Cost Function value of 2.29% for random forgeries and 7.75% for skilled forgeries.
8.
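Speaker recognition based on support vector machines and wavelet analysis (total citations: 2; self-citations: 0; citations by others: 2)
To address the speaker recognition problem, a recognition method based on support vector machines and wavelet analysis is proposed, together with its framework model. Wavelet analysis is applied to signal preprocessing; on that basis, its singularity detection principle is used to separate the speech signal from noise and achieve speech enhancement. Finally, training and testing are carried out on samples, and an SVM is used to classify and recognize speakers.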
9.
Gaussian mixture model (GMM) based approaches have been commonly used for speaker recognition tasks. Methods for estimation of parameters of GMMs include the expectation-maximization method which is a non-discriminative learning based method. Discriminative classifier based approaches to speaker recognition include support vector machine (SVM) based classifiers using dynamic kernels such as generalized linear discriminant sequence kernel, probabilistic sequence kernel, GMM supervector kernel, GMM-UBM mean interval kernel (GUMI) and intermediate matching kernel. Recently, the pyramid match kernel (PMK) using grids in the feature space as histogram bins and vocabulary-guided PMK (VGPMK) using clusters in the feature space as histogram bins have been proposed for recognition of objects in an image represented as a set of local feature vectors. In PMK, a set of feature vectors is mapped onto a multi-resolution histogram pyramid. The kernel is computed between a pair of examples by comparing the pyramids using a weighted histogram intersection function at each level of pyramid. We propose to use the PMK-based SVM classifier for speaker identification and verification from the speech signal of an utterance represented as a set of local feature vectors. The main issue in building the PMK-based SVM classifier is construction of a pyramid of histograms. We first propose to form hard clusters, using k-means clustering method, with increasing number of clusters at different levels of pyramid to design the codebook-based PMK (CBPMK). Then we propose the GMM-based PMK (GMMPMK) that uses soft clustering. We compare the performance of the GMM-based approaches, and the PMK and other dynamic kernel SVM-based approaches to speaker identification and verification. The 2002 and 2003 NIST speaker recognition corpora are used in evaluation of different approaches to speaker identification and verification. 
Results of our studies show that the dynamic kernel SVM-based approaches give significantly better performance than the state-of-the-art GMM-based approaches. For the speaker recognition task, the GMMPMK-based SVM gives a performance that is better than that of SVMs using many other dynamic kernels and comparable to that of SVMs using the state-of-the-art dynamic kernel, the GUMI kernel. The storage requirements of the GMMPMK-based SVMs are lower than those of SVMs using any other dynamic kernel.
10.
Optimal representation of acoustic features is an ongoing challenge in automatic speech recognition research. As an initial step toward this purpose, optimization of filterbanks for the cepstral coefficient using evolutionary optimization methods is proposed in some approaches. However, the large number of optimization parameters required by a filterbank makes it difficult to guarantee that an individual optimized filterbank can provide the best representation for phoneme classification. Moreover, in many cases, a number of potential solutions are obtained. Each solution presents discrimination between specific groups of phonemes. In other words, each filterbank has its own particular advantage. Therefore, the aggregation of the discriminative information provided by filterbanks is a demanding and challenging task. In this study, the optimization of a number of complementary filterbanks is considered to provide a different representation of speech signals for phoneme classification using the hidden Markov model (HMM). Fuzzy information fusion is used to aggregate the decisions provided by HMMs. Fuzzy theory can effectively handle the uncertainties of classifiers trained with different representations of speech data. In this study, the output of the HMM classifiers of each expert is fused using a fuzzy decision fusion scheme. The decision fusion employed a global and local confidence measurement to formulate the reliability of each classifier based on both the global and local context when making overall decisions. Experiments were conducted based on clean and noisy phonetic samples. The proposed method outperformed conventional Mel frequency cepstral coefficients under both conditions in terms of overall phoneme classification accuracy. The fuzzy fusion scheme was shown to be capable of aggregating the complementary information provided by each filterbank.
11.
12.
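To address the heavy computation and storage requirements of the traditional DTW speech recognition method, an improved DTW method based on vector quantization and a lookup table is proposed. The method uses vector quantization to convert the continuous feature-vector space into a discrete vector space, which reduces template storage. On that basis it builds a vector distortion measure table and locates entries exactly in the address space through hash table lookup, which eliminates the large number of distance computations incurred by dynamic programming and greatly speeds up recognition matching. Theoretical analysis and experimental results demonstrate the effectiveness of the improved method. For research convenience, a real-time DTW speech recognition system was also designed and developed on the Matlab platform.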
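The idea in this abstract, replacing per-cell distance computations in DTW with table lookups over quantized frames, can be sketched roughly as below. This is an assumption-laden illustration rather than the paper's implementation: it uses a plain nested-list table instead of the hash lookup mentioned in the abstract, squared Euclidean distance as the distortion measure, and the standard three-way DTW recursion; all function names are hypothetical.

```python
def sq_dist(a, b):
    # Squared Euclidean distance (assumed distortion measure).
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(frames, codebook):
    """Map each feature frame to the index of its nearest codeword."""
    return [min(range(len(codebook)), key=lambda i: sq_dist(f, codebook[i]))
            for f in frames]

def build_distance_table(codebook):
    """Precompute codeword-to-codeword distortions once, before any matching."""
    n = len(codebook)
    return [[sq_dist(codebook[i], codebook[j]) for j in range(n)] for i in range(n)]

def dtw_cost(seq_a, seq_b, table):
    """Standard DTW over index sequences; each local cost is a table lookup."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = table[seq_a[i - 1]][seq_b[j - 1]]
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

if __name__ == "__main__":
    codebook = [[0.0], [1.0], [2.0]]           # toy one-dimensional codebook
    table = build_distance_table(codebook)
    template = quantize([[0.1], [0.9], [2.1]], codebook)
    test = quantize([[0.0], [0.1], [1.1], [1.9]], codebook)
    print(template, test, dtw_cost(template, test, table))
```

Storing templates as index sequences rather than full feature vectors is what reduces storage, and the table means the distortion between any two codewords is computed only once.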
13.
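To improve the modeling efficiency of fuzzy associative classifiers (FAC) on dynamic data sets, an incremental fuzzy associative classification method based on evolving vector quantization (eVQ) clustering is proposed. First, the eVQ clustering algorithm incrementally updates the parameters of the Gaussian membership functions on quantitative attributes. Next, the update with early pruning (UWEP) algorithm is extended so that it can incrementally mine fuzzy frequent itemsets. Finally, the fuzzy associative classification rule base is pruned and updated using fuzzy correlation (FCORR) and the length of the rule antecedent as measures. Experiments on four standard UCI data sets show that, compared with batch fuzzy associative classification modeling, the proposed method reduces the training time of the fuzzy associative classifier while preserving classification accuracy and interpretability. The eVQ-based incremental update of the Gaussian membership functions also helps improve the classification accuracy of fuzzy associative classifiers on dynamic data sets.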
14.
15.
16.
《Computer Speech and Language》2014,28(2):665-686
This paper describes speech intelligibility enhancement for Hidden Markov Model (HMM) generated synthetic speech in noise. We present a method for modifying the Mel cepstral coefficients generated by statistical parametric models that have been trained on plain speech. We update these coefficients such that the glimpse proportion (an objective measure of the intelligibility of speech in noise) increases, while keeping the speech energy fixed. An acoustic analysis reveals that the modified speech is boosted in the region 1–4 kHz, particularly for vowels, nasals and approximants. Results from listening tests employing speech-shaped noise show that the modified speech is as intelligible as a synthetic voice trained on plain speech whose duration, Mel cepstral coefficients and excitation signal parameters have been adapted to Lombard speech from the same speaker. Our proposed method does not require these additional recordings of Lombard speech. In the presence of a competing talker, both modification and adaptation of spectral coefficients give more modest gains.
17.
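Existing coal-gangue interface recognition techniques have limitations: the γ-ray method is unsuitable for working faces whose roof contains no radioactive elements or only low levels of them, and the radar detection method has a small detection range and severe signal attenuation. To address these problems, a coal-gangue interface recognition method based on Mel-frequency cepstral coefficients and a genetic algorithm is proposed. The method identifies coal and gangue from the differing characteristics of the acoustic signals produced as they fall during caving. Mel-frequency cepstral coefficients are used to transform the denoised acoustic signals into the frequency domain for processing and to extract 32-dimensional feature parameters; a genetic algorithm then optimizes the 32-dimensional features to obtain the optimal parameter combination; and a support vector machine and a BP neural network classify the optimal parameters. Experimental results show that the method can accurately identify the coal-gangue falling state.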
18.
Khalid Youssef Peng-Yung Woo 《Soft Computing - A Fusion of Foundations, Methodologies and Applications》2009,13(12):1187-1198
Using classical signal processing and filtering techniques for music note recognition faces various kinds of difficulties. This paper proposes a new scheme based on neural networks for music note recognition. The proposed scheme uses three types of neural networks: time delay neural networks, self-organizing maps, and linear vector quantization. Experimental results demonstrate that the proposed scheme achieves 100% recognition rate in moderate noise environments. The basic design of two potential applications of the proposed scheme is briefly demonstrated.
19.
Alexios Savvides Vasilis J. Promponas Konstantinos Fokianos 《Pattern recognition》2008,41(7):2398-2412
Clustering of stationary time series has become an important tool in many scientific applications, such as medicine and finance. Time series clustering methods are based on the calculation of suitable similarity measures which identify the distance between two or more time series. These measures are computed either in the time domain or in the spectral domain. Since the computation of time domain measures is rather cumbersome, we resort to spectral domain methods. A new measure of distance is proposed, based on the so-called cepstral coefficients, which carry information about the log spectrum of a stationary time series. These coefficients are estimated by means of a semiparametric model which assumes that the log-likelihood ratio of two or more unknown spectral densities has a linear parametric form. After estimation, the estimated cepstral distance measure is given as an input to a clustering method to produce the disjoint groups of data. Simulated examples show that the method yields good results, even when the processes are not necessarily linear. These cepstral-based clustering algorithms are applied to biological time series. In particular, the proposed methodology effectively identifies distinct and biologically relevant classes of amino acid sequences with the same physicochemical properties, such as hydrophobicity.
20.
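A new speech denoising method based on an adaptive KLT is proposed. The adaptive KLT denoising algorithm requires no whitening, can adaptively track the KLT matrix, and effectively balances the trade-off between the sound quality and the intelligibility of the denoised signal. An improved MCE is used in the speaker recognition stage. Experiments show that the hybrid system does improve the robustness and recognition rate of speaker identification.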