20 similar documents found; search time: 71 ms
1.
2.
3.
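Speaker verification based on speaker clustering and support vector machines  Total citations: 2, self-citations: 0, citations by others: 2
A speaker verification system must train its models on speech data from both the target speaker and the background-model speakers. Background speakers can be chosen at random or chosen to be similar to the target speaker; speaker clustering effectively solves this background-model selection problem. A support vector machine serves as the verification model and is trained on the speech data of the target and background speakers. Experiments show that the method is effective for text-independent speaker verification.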
4.
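To improve the recognition rate and robustness of speaker verification under channel variation, a sparse representation classification algorithm based on i-vectors and weighted linear discriminant analysis is proposed. First, the channel compensation and dimensionality reduction properties of weighted linear discriminant analysis are used to remove channel interference from the i-vectors and to reduce their dimension. Next, an overcomplete dictionary matrix of training speech samples is built on the i-vector set, and the MAP algorithm is used to solve for the sparse coefficient vector of the test speech over the dictionary. Finally, the test speech sample is reconstructed from the sparse coefficient vector, and the target speaker is determined from the reconstruction error. Simulation results confirm the effectiveness and feasibility of the algorithm.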
5.
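This paper presents a DSP-based speaker verification system. The verification algorithm is built on the Gaussian mixture model-universal background model (GMM-UBM) and applies a new feature fusion algorithm based on information entropy in the feature space. Experimental results show that, without affecting the recognition rate, the algorithm requires more than [figure missing] less computation than traditional feature-correlation fusion and [figure missing] less than normalization fusion. The hardware uses the high-speed DSP chip TMS320C6701, which ensures that the verification algorithm can run in real time.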
6.
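In a study of speaker recognition systems, the feasibility of weighted feature vectors and group models for improving system performance is analyzed from two angles, feature parameter selection and recognition training, and a text-independent speaker verification system is built by combining the group model with weighted feature vectors. Test results show that the group model with weighted feature vectors achieves a higher identification rate than traditional vector quantization, and that the false rejection rate at a given false acceptance rate is also significantly lower.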
7.
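Speaker verification system based on support vector machines  Total citations: 2, self-citations: 1, citations by others: 1
The support vector machine (SVM) is a new statistical learning method. Unlike classical learning methods, which follow the principle of minimizing empirical risk, SVM learning minimizes structural risk, which gives the SVM good overall performance. This paper proposes an SVM-based text-independent speaker verification system. Experiments show that, compared with the classical methods based on vector quantization (VQ) and the Gaussian mixture model (GMM), the SVM-based method has greater discriminative power and better overall performance.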
8.
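Speaker verification based on speaker clustering and support vector machines  Total citations: 3, self-citations: 1, citations by others: 3
A speaker verification system must train its models on speech data from both the target speaker and the background-model speakers. Background speakers can be selected at random or selected to be similar to the target speaker. Speaker clustering effectively solves this background-model selection problem. A support vector machine serves as the verification model and is trained on the speech data of the target and background speakers. Experiments show that the method is effective for text-independent speaker verification.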
9.
10.
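In text-independent speaker recognition, prosodic features are used because they are insensitive to channel and environmental noise. This paper models prosodic parameters with a support vector machine based on Gaussian mixture model supervectors and applies within-class covariance feature mapping to the model supervectors. The single system improves on the conventional Gaussian mixture model-universal background model (GMM-UBM) baseline by 40.19%. Fusing this method with the paper's verification system based on acoustic cepstral parameters improves overall recognition performance by 9.25%. On the NIST (National Institute of Standards and Technology) 2006 speaker evaluation database, the fused system achieves an equal error rate of 4.9%.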
11.
Multimedia Tools and Applications - Speech signals recorded in the far field or with a distant microphone typically comprise additive noise and reverberation, which cause degradation and...
12.
This work evaluates the performance of a speaker verification system based on the Wavelet-based Fuzzy Learning Vector Quantization (WLVQ) algorithm. The parameters of a Gaussian mixture model (GMM) are designed using this proposed algorithm. Mel Frequency Cepstral Coefficients (MFCC) are extracted from the speech data and vector quantized through the wavelet-based FLVQ algorithm. This algorithm develops a multiresolution codebook by updating both winning and non-winning prototypes through an unsupervised learning process. The codebook is used as the mean vectors of the GMM. The other two parameters, weight and covariance, are determined from the clusters formed by the WLVQ algorithm. The multiresolution property of the wavelet transform and the ability of FLVQ to regulate the competition between prototypes during learning are combined in this algorithm to develop an efficient codebook for the GMM. Because of the iterative nature of the Expectation Maximization (EM) algorithm, the applicability of alternative training algorithms is worth investigating. In this work, the performance of speaker verification systems using GMMs trained by the LVQ, FLVQ and WLVQ algorithms is evaluated and compared with that of the EM algorithm. FLVQ- and WLVQ-based training of GMM speaker models yields better performance than EM-based GMM training.
13.
Significance of duration modification for speaker verification under mismatch speech tempo condition
Rohan Kumar Das, Bidisha Sharma, S. R. Mahadeva Prasanna 《International Journal of Speech Technology》2018,21(3):401-408
This work explores the scope of duration modification for speaker verification (SV) under mismatched speech tempo conditions. SV performance is found to depend on the speaking rate of a speaker. A mismatch in speaking rate can degrade the performance of a system and is crucial from the perspective of deployable systems. In this work, SV performance is analyzed by varying the speaking rate of the train and test speech. Based on these studies, a framework is proposed to compensate for the mismatch in speech tempo. The framework changes the duration of the test speech, in terms of speaking rate, according to the mismatch factor derived between train and test speech. This in turn matches the tempo of the test speech to that of the claimed speaker model. The proposed approach is found to have a significant impact on SV performance under mismatch conditions. A set of practical data with mismatched speech tempo is also used to cross-validate the framework.
14.
B. Bharathi 《International Journal of Speech Technology》2017,20(3):465-474
In speaker recognition tasks, one reason for reduced accuracy is speakers who closely resemble each other in the acoustic space. To increase the discriminative power of the classifier, the system must be able to use only the features that are unique to a given speaker relative to his or her acoustically similar speakers. This paper proposes a technique to reduce confusion errors in an i-vector based speaker verification task by finding speaker-specific phonemes and formulating a text from the subset of phonemes that are unique. Spectral features such as linear prediction cepstral coefficients (LPCC) and perceptual linear prediction coefficients (PLP), and a phase feature, the modified group delay, are used to analyse the importance of speaker-specific text in speaker verification. Experiments were conducted on a speaker verification task using speech data from 50 speakers collected in a laboratory environment. The experiments show that the equal error rate (EER) decreases significantly for the i-vector approach with speaker-specific text compared with the i-vector approach with random text, across the different spectral and phase-based features.
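Several of the abstracts in this list report results as an equal error rate (EER). As a rough illustration only, the sketch below estimates an EER from two score lists by sweeping every observed score as a threshold and taking the point where the false acceptance and false rejection rates are closest. The function name `equal_error_rate`, the "accept if score >= threshold" convention, and the sample scores are assumptions made for this example; none of them comes from the cited papers.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER by sweeping each observed score as a threshold.

    A trial is accepted when its score is >= the threshold (an assumed
    convention). Returns (eer_estimate, threshold) at the point where
    the false acceptance rate (FAR) and false rejection rate (FRR)
    are closest to each other.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best = None  # (gap, eer_estimate, threshold)
    for t in thresholds:
        # FAR: fraction of impostor trials wrongly accepted.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        # FRR: fraction of target trials wrongly rejected.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical scores, for illustration only.
targets = [0.9, 0.8, 0.7, 0.4]
impostors = [0.6, 0.5, 0.3, 0.2]
eer, threshold = equal_error_rate(targets, impostors)
```

With only a handful of trials the FAR and FRR curves are step functions, so the value returned is the midpoint at the closest crossing rather than an interpolated EER.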
15.
This paper addresses issues concerning exponential stability and robustness of hybrid systems. Stability conditions using Lyapunov techniques are given. The search for the Lyapunov functions is formulated as a linear matrix inequality (LMI) problem for hybrid systems with affine as well as non-linear vector fields. It is shown how the Lyapunov approach can also be used advantageously to guarantee stability despite the presence of model uncertainties. Several examples are given to illustrate the theory.
16.
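A new MFCC extraction method is proposed from the perspective of variable frame length and variable frame rate. The method first constrains both the frame length and the frame rate to integer multiples of the pitch period (a pitch-synchronous algorithm); then, following the principle of variable frame rate algorithms, it removes some frames where the speech features change slowly in order to lower the frame rate. Speaker verification experiments on the NIST 99 speaker evaluation show that the method not only improves system performance but also lowers the frame rate and saves storage space for feature files.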
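The frame-dropping step in the entry above can be illustrated with a generic variable frame rate rule: keep a frame only when it has moved far enough from the last kept frame. This is a minimal sketch of that general idea, not the paper's algorithm. The Euclidean distance measure, the `threshold` parameter, and the function name `select_frames` are assumptions, and the pitch-synchronous framing is not modeled.

```python
import math


def select_frames(frames, threshold):
    """Return the indices of the frames to keep.

    frames: list of equal-length feature vectors (e.g. MFCC frames).
    A frame is kept when its Euclidean distance from the most recently
    kept frame is at least `threshold`; frames in slowly changing
    regions are therefore dropped. The first frame is always kept.
    """
    if not frames:
        return []
    kept = [0]
    for i in range(1, len(frames)):
        if math.dist(frames[i], frames[kept[-1]]) >= threshold:
            kept.append(i)
    return kept


# Hypothetical 2-dimensional feature frames, for illustration only.
frames = [[0.0, 0.0], [0.1, 0.0], [1.0, 0.0], [1.05, 0.0], [3.0, 0.0]]
kept_indices = select_frames(frames, threshold=0.5)
```

Comparing against the last kept frame, rather than the immediately preceding one, prevents a slow drift from being discarded entirely.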
17.
With the rapid growth of learning materials and learning objects in e-learning, the need for recommender systems has become increasingly pressing. Although traditional recommendation systems have achieved great success in many domains, they are not well suited to e-learning, because the approach in e-learning is hybrid and is obtained mainly through two mechanisms: the learners' learning processes and the analysis of social interaction. This study therefore proposes a flexible recommendation approach to meet this demand. The recommendation is designed on a multidimensional recommendation model. Furthermore, we use a Markov chain model to divide group learners into advanced learners and beginners according to their learning activities and learning processes, so that we can correctly estimate the rating, which also includes the learners' social interaction. The experimental results show that the proposed system gives more satisfying and better-qualified recommendations.
18.
Yi-Hsiang Chao, Wei-Ho Tsai, Hsin-Min Wang, Ruei-Chuan Chang 《Pattern recognition》2009,42(7):1351-1360
Speaker verification is usually formulated as a statistical hypothesis testing problem and solved by a log-likelihood ratio (LLR) test. A speaker verification system's performance is highly dependent on modeling the target speaker's voice (the null hypothesis) and characterizing non-target speakers’ voices (the alternative hypothesis). However, since the alternative hypothesis involves unknown impostors, it is usually difficult to characterize a priori. In this paper, we propose a framework to better characterize the alternative hypothesis with the goal of optimally distinguishing the target speaker from impostors. The proposed framework is built on a weighted arithmetic combination (WAC) or a weighted geometric combination (WGC) of useful information extracted from a set of pre-trained background models. The parameters associated with WAC or WGC are then optimized using two discriminative training methods, namely, the minimum verification error (MVE) training method and the proposed evolutionary MVE (EMVE) training method, such that both the false acceptance probability and the false rejection probability are minimized. Our experiment results show that the proposed framework outperforms conventional LLR-based approaches.
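To make the LLR test with a combined alternative hypothesis more concrete, here is a minimal sketch under stated assumptions. It assumes each background model has already produced a log-likelihood for the test utterance, takes WAC to be a weighted sum of likelihoods and WGC a weighted product, and uses fixed weights. The function names, weights, and threshold are hypothetical, and the MVE/EMVE training that the abstract uses to optimize the weights is not shown.

```python
import math


def log_wac(log_likelihoods, weights):
    """Log of the weighted arithmetic combination: log(sum_i w_i * p_i).

    Computed with the log-sum-exp trick so that very small likelihoods
    do not underflow.
    """
    terms = [math.log(w) + ll
             for w, ll in zip(weights, log_likelihoods) if w > 0]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def log_wgc(log_likelihoods, weights):
    """Log of the weighted geometric combination: sum_i w_i * log(p_i)."""
    return sum(w * ll for w, ll in zip(weights, log_likelihoods))


def verify(target_ll, background_lls, weights, threshold=0.0, combine=log_wac):
    """Accept the claimed identity when the LLR reaches the threshold.

    Returns (llr, accepted). `combine` selects how the background
    log-likelihoods form the alternative-hypothesis score.
    """
    llr = target_ll - combine(background_lls, weights)
    return llr, llr >= threshold


# Hypothetical log-likelihoods and weights, for illustration only.
llr_wac, accept_wac = verify(-10.0, [-12.0, -14.0], [0.5, 0.5])
llr_wgc, accept_wgc = verify(-10.0, [-12.0, -14.0], [0.5, 0.5],
                             combine=log_wgc)
```

When the weights sum to one, the arithmetic combination is never smaller than the geometric one, so the WAC-based LLR is never larger than the WGC-based LLR for the same inputs.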
19.
Debmalya Chakrabarty, S. R. Mahadeva Prasanna, Rohan Kumar Das 《International Journal of Speech Technology》2013,16(1):75-88
This paper describes an online text-independent speaker verification system developed at IIT Guwahati under multivariability conditions for remote person authentication. The system is built on a voice server accessible over the telephone network through an interactive voice response (IVR) system, in which both enrollment and testing can be done online. The speaker verification system uses Mel-Frequency Cepstral Coefficients (MFCC) for feature extraction and the Gaussian Mixture Model-Universal Background Model (GMM-UBM) for modeling. The performance of the system under multivariability conditions is evaluated using online enrollments and tests from the subjects. The evaluation helps in understanding the impact of several well-known issues in speaker verification, such as environmental noise, the duration of test speech, and robustness against playback of recorded speech, in an online system scenario. These issues need to be addressed when developing and deploying speaker verification systems in real-life applications.
20.
In this paper, we propose two modifications for speaker recognition. First, the correlations between frequency channels are of prime importance for speaker recognition, and some of these correlations are lost when the frequency domain is divided into sub-bands. We therefore propose a deliberately redundant parallel architecture in which most of the correlations are kept. Second, in the classical spectrum calculation, the log transformation that modifies the power spectrum is generally applied after the filter bank. We show that performing this transformation before the filter bank is more advantageous in our case. For recognition, a Gaussian mixture model (GMM) recognition algorithm is adopted. Experiments on speech corrupted by noise show that this approach adapts better to noisy environments than a conventional device, especially when some of the recognizers are pruned.