Similar literature
1.
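For speaker recognition based on factor analysis, a method for serially training the loading matrices is proposed. During loading-matrix training, the speaker factor matrix, the diagonal (residual) matrix, and the channel space matrix are trained one after another. During speaker enrollment, these three loading matrices are concatenated and each speaker's factors are obtained by joint estimation. This strategy effectively resolves the saturation problem in factor analysis. On the NIST SRE 2006 core test database, the equal error rate reaches 3.65%.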
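Several abstracts in this list report results as an equal error rate (EER) or as a reduction in EER. As a rough sketch of what that figure measures, the snippet below sweeps a decision threshold over a set of trial scores and returns the point where the false-acceptance and false-rejection rates are closest. The function names, the "accept when score >= threshold" convention, and the toy scores are assumptions made for illustration; none of them come from the listed papers.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping thresholds over the observed scores.

    A trial is accepted when its score is >= the threshold (an assumed
    convention). Returns (eer, threshold) at the point where the false
    acceptance rate and false rejection rate are closest.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best = None
    for t in thresholds:
        # False rejection: a genuine (target) trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: an impostor trial scored at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2.0, t)
    return best[1], best[2]


def relative_reduction(baseline, improved):
    """Relative reduction of an error rate, e.g. 0.18 for an 18% reduction."""
    return (baseline - improved) / baseline


# Toy scores, invented purely for illustration.
targets = [0.9, 0.8, 0.7, 0.4]
impostors = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(targets, impostors)
print(eer, threshold)
```

With only a handful of trials the sweep is coarse; evaluations on real data typically interpolate between operating points instead of picking the nearest one.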

2.
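Speaker verification based on speaker clustering and support vectors   Total citations: 2, self-citations: 0, citations by others: 2
Hou Fenglei (侯风雷), Journal of Computer Applications (《计算机应用》), 2002, 22(10): 33-35
A speaker verification system must train its model on speech data from both the target speaker and the background-model speakers. Background-model speakers can be chosen at random or chosen to be close to the target speaker; speaker clustering effectively solves the problem of selecting the speaker background model. A support vector machine serves as the speaker verification model and is trained on the speech data of the target and background speakers. Experiments show that the method is effective for text-independent speaker verification.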

3.
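Speaker verification based on speaker clustering and support vector machines   Total citations: 3, self-citations: 1, citations by others: 3
A speaker verification system must train its model on speech data from both the target speaker and the background-model speakers. Background-model speakers can be chosen at random or chosen to be close to the target speaker. Speaker clustering effectively solves the problem of selecting the speaker background model. A support vector machine serves as the speaker verification model and is trained on the speech data of the target and background speakers. Experiments show that the method is effective for text-independent speaker verification.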

4.
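A text-independent speaker recognition method based on eigenvoice factor analysis is proposed. It addresses the case where both training and test utterances are very short, in which the conventional Gaussian mixture model based on the maximum a posteriori criterion cannot build a stable speaker model. The eigenvoice loading matrix is first trained on a development set with the expectation-maximization algorithm; during speaker modeling, the model parameters are obtained by projecting the short-duration speech data down onto the eigenvoice space. Experiments show that on the 10 s training, 10 s test task of the NIST SRE 2006 database, eigenvoice factor analysis reduces the equal error rate by 18% relative to the conventional Gaussian mixture model baseline.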

5.
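Speaker verification is a natural and effective biometric authentication method whose performance depends largely on the quality of the extracted speaker features. The residual network (ResNet) has strong inference ability and can extract high-quality speaker features, so it is widely used in speaker verification. However, current residual networks still make insufficient use of the information in the audio data and extract features that are poorly suited to classifying speakers, which greatly limits their representational power. This paper focuses on the model structure of the residual network. It analyzes in detail how the distribution ratio of residual blocks, the activation layers, and the skip connections affect feature extraction, and how the distribution of the model's output features affects speaker classification. On that basis it redesigns the original residual block, the feature downsampling process, and the model output head, and builds a new speaker verification model, EIPFD-ResNet. The model combines residual blocks with fewer activation layers and separately designed downsampling layers to reduce the loss of audio signal and the introduction of noise, and uses a normalized output head to help the classification loss provide a clearer decision surface. The model is evaluated on three public datasets (VoxCeleb1, VoxCeleb2, Cn-Celeb2). Experiments show that with only 7.486M parameters, the proposed model lowers the equal error rate (EER) on the three datasets, relative to the conventional ResNet34 model, by 16.4%, 3...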

6.
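Zhong Shan (钟山), He Liang (何亮), Deng Yan (邓妍), Liu Jia (刘加), Acta Automatica Sinica (《自动化学报》), 2009, 35(5): 546-550
This paper studies a text-independent speaker recognition algorithm that uses maximum likelihood linear regression (MLLR) transform matrices, borrowed from the adaptation field, as features. It introduces an MLLRSV-SVM speaker recognition algorithm based on a universal background model and, building on it, applies high-level phoneme clustering to further improve recognition performance. With several channel compensation techniques applied, on the same-channel and cross-channel data of the NIST SRE 2006 one-training-segment, one-test-segment condition, the MLLR-feature system performs close to the best other systems and is strongly complementary to them; simple linear fusion greatly improves recognition performance.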

7.
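Speaker recognition systems now reach recognition rates above 90% in ideal environments, but the rate drops sharply in real communication environments. This paper studies robust speaker recognition under channel mismatch. A speaker recognition system based on the Gaussian mixture model (GMM) is built first; then, from tests and analysis of real communication channels, two improvements are proposed. The first builds a general channel model from measured data and filters clean speech through it before using that speech to train the speaker models. The second compares the characteristics of the measured channel, an ideal low-pass channel, and the Mel-frequency cepstral coefficients (MFCC) of speech, and proposes discarding the first and second feature dimensions. Experiments show that these measures raise the recognition rate in the communication environment by about 20%, and by 9% to 12% compared with the conventional cepstral mean subtraction (CMS) method.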
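The abstract above compares against cepstral mean subtraction and proposes dropping the first two cepstral dimensions. The following is only a sketch of those two operations on a list of feature frames, assuming each frame is a plain list of floats of equal length. The function names are hypothetical, and the paper's actual feature pipeline and channel model are not reproduced here.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-dimension mean over all frames of one utterance.

    A stationary channel adds a roughly constant offset in the cepstral
    domain, so removing the utterance-level mean cancels that offset.
    """
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    means = [sum(frame[d] for frame in frames) / n for d in range(dim)]
    return [[frame[d] - means[d] for d in range(dim)] for frame in frames]


def drop_leading_coefficients(frames, count=2):
    """Discard the first `count` dimensions of every frame (assumed default 2)."""
    return [frame[count:] for frame in frames]


# Toy frames, invented for illustration: 2 frames with 3 coefficients each.
frames = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
print(cepstral_mean_subtraction(frames))
print(drop_leading_coefficients(frames))
```

Shifting every frame by the same constant leaves the output of `cepstral_mean_subtraction` unchanged, which is the property that makes it a channel compensation step.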

8.
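To improve the recognition rate and robustness of speaker verification systems under channel variation, a sparse representation classification algorithm based on i-vectors and weighted linear discriminant analysis is proposed. First, the channel compensation and dimensionality reduction provided by weighted linear discriminant analysis are used to remove channel interference from the i-vectors and to reduce their dimension. Next, an overcomplete dictionary matrix of training speech samples is built on the i-vector set, and the MAP algorithm is used to solve for the sparse coefficient vector of the test speech over the dictionary. Finally, the test speech sample is reconstructed from the sparse coefficient vector and the target speaker is determined from the reconstruction error. Simulation results verify the effectiveness and feasibility of the algorithm.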

9.
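In text-independent speaker recognition, the mismatch between the channel environments of training and test speech is the most important factor affecting performance. In recent years, modeling the channel with factor analysis has become an important method in speaker recognition and has greatly reduced speaker verification error rates, but its computational complexity limits real-time use. This paper presents a simplified factor analysis method: the channel space is first trained in the model domain of the Gaussian mixture model, and channel compensation is then applied in the feature domain, so that the resulting features can be used in a variety of systems. On the NIST 2006 database, the method lowers the equal error rate by 31% relative to the baseline system.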

10.
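An x-vector system uses a neural network to map a variable-length speech segment to a fixed-dimensional vector that represents speaker information, and it has achieved excellent performance in text-independent speaker verification. This paper applies it to text-dependent speaker verification. For the x-vector model, a residual neural network is used to obtain more discriminative x-vectors. For utterances containing several characters, one residual neural network is trained per character. During extraction, an x-vector is extracted for each character separately and a separate speaker decision is made on it; the decision scores are then fused to give the final recognition result. Experiments on RSR2015 Part Ⅲ show relative equal error rate reductions of 15.34% and 19.7% on the male and female test sets, respectively.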
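The abstract above fuses several per-character decision scores into one final decision but does not say how. As an illustrative sketch only, the snippet below assumes a weighted linear combination (a plain average by default) followed by a threshold. The function names, the default weights, and the threshold are all assumptions made for this example, not the paper's method.

```python
def fuse_scores(scores, weights=None):
    """Linearly combine per-character verification scores into one score.

    With no weights given, every score contributes equally (a plain mean).
    """
    if not scores:
        raise ValueError("at least one score is required")
    if weights is None:
        weights = [1.0 / len(scores)] * len(scores)
    if len(weights) != len(scores):
        raise ValueError("weights and scores must have the same length")
    return sum(w * s for w, s in zip(weights, scores))


def verify(scores, threshold, weights=None):
    """Accept the claimed speaker when the fused score reaches the threshold."""
    return fuse_scores(scores, weights) >= threshold


# Toy per-character scores, invented for illustration.
print(fuse_scores([2.0, 4.0]))
print(verify([2.0, 4.0], threshold=2.5))
```

In practice the fusion weights are usually learned on held-out data, for example by logistic regression, as the STBU entry further down this list describes for subsystem fusion.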

11.
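Qu Dan (屈丹), Yang Xukui (杨绪魁), Zhang Wenlin (张文林), Acta Automatica Sinica (《自动化学报》), 2015, 41(7): 1244-1252
A feature-space eigenvoice speaker adaptation algorithm is proposed. Borrowing the idea of the RATZ algorithm, the method first models the speaker information in the feature space with a Gaussian mixture model. It then estimates the feature compensation term with a subspace method, which reduces the number of parameters to be estimated, so the feature space is modeled accurately while less adaptation data is required. Chinese continuous speech recognition experiments on a Microsoft corpus show that the algorithm still performs well with very little adaptation data, that combining it with speaker adaptive training further lowers the word error rate, and that it runs closer to real time than eigenvoice speaker adaptation.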

12.
Recently, we proposed an improvement to the conventional eigenvoice (EV) speaker adaptation using kernel methods. In our novel kernel eigenvoice (KEV) speaker adaptation, speaker supervectors are mapped to a kernel-induced high dimensional feature space, where eigenvoices are computed using kernel principal component analysis. A new speaker model is then constructed as a linear combination of the leading eigenvoices in the kernel-induced feature space. KEV adaptation was shown to outperform EV, MAP, and MLLR adaptation in a TIDIGITS task with less than 10 s of adaptation speech. Nonetheless, due to many kernel evaluations, both adaptation and subsequent recognition in KEV adaptation are considerably slower than conventional EV adaptation. In this paper, we solve the efficiency problem and eliminate all kernel evaluations involving adaptation or testing observations by finding an approximate pre-image of the implicit adapted model found by KEV adaptation in the feature space; we call our new method embedded kernel eigenvoice (eKEV) adaptation. eKEV adaptation is faster than KEV adaptation, and subsequent recognition runs as fast as normal HMM decoding. eKEV adaptation makes use of a multidimensional scaling technique so that the resulting adapted model lies in the span of a subset of carefully chosen training speakers. It is related to the reference speaker weighting (RSW) adaptation method that is based on speaker clustering. Our experimental results on Wall Street Journal show that eKEV adaptation continues to outperform EV, MAP, MLLR, and the original RSW method. However, by adopting the way we choose the subset of reference speakers for eKEV adaptation, we may also improve RSW adaptation so that it performs as well as our eKEV adaptation.

13.
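This paper reports experimental results on automatic speaker recognition for telephone speech using a Gaussian mixture model (GMM) that combines LPC parameters with the fundamental frequency F0. In the baseline experiment the GMM uses 16 mixtures with diagonal covariance matrices, and the features are LPC cepstral coefficients. In tests of the developed system, parameters from the entire utterance interval and from the voiced interval are each augmented with the fundamental frequency, and the results are compared. In an open-set experiment on automatically segmented speech streams from the telephone conversations of 50 speakers, the correct recognition rate is 76.97% for the baseline and 80.29% for the proposed method, an improvement of 3.32 percentage points, which approaches the 82.34% obtained with manually segmented speech streams.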

14.
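Eigenchannel space concatenation method in joint factor analysis   Total citations: 1, self-citations: 1, citations by others: 0
He Liang (何亮), Shi Yongzhe (史永哲), Liu Jia (刘加), Acta Automatica Sinica (《自动化学报》), 2011, 37(7): 849-856
To make joint factor analysis suitable for text-independent speaker recognition under multiple channel conditions, an orthogonal concatenation method for the eigenchannel space is proposed. Under multi-channel conditions, the eigenchannel space can be estimated either by pooling the data or by simple concatenation, but the former suffers from space masking, and the latter, while solving space masking, introduces space overlap. This paper first proves that the core operation in speaker modeling and testing is an oblique projection; based on that proof, the space overlap is removed by orthogonalizing the spaces to be concatenated. Experiments on the NIST SRE 2008 core evaluation database show that the proposed algorithm outperforms both the pooled-data method and simple concatenation.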

15.
This paper describes and discusses the "STBU" speaker recognition system, which performed well in the NIST Speaker Recognition Evaluation 2006 (SRE). STBU is a consortium of four partners: Spescom DataVoice (Stellenbosch, South Africa), TNO (Soesterberg, The Netherlands), BUT (Brno, Czech Republic), and the University of Stellenbosch (Stellenbosch, South Africa). The STBU system was a combination of three main kinds of subsystems: 1) GMM, with short-time Mel frequency cepstral coefficient (MFCC) or perceptual linear prediction (PLP) features, 2) Gaussian mixture model-support vector machine (GMM-SVM), using GMM mean supervectors as input to an SVM, and 3) maximum-likelihood linear regression-support vector machine (MLLR-SVM), using MLLR speaker adaptation coefficients derived from an English large vocabulary continuous speech recognition (LVCSR) system. All subsystems made use of supervector subspace channel compensation methods, either eigenchannel adaptation or nuisance attribute projection. We document the design and performance of all subsystems, as well as their fusion and calibration via logistic regression. Finally, we also present a cross-site fusion that was done with several additional systems from other NIST SRE-2006 participants.

16.
17.
In this paper, several feature extraction and channel compensation techniques found in state-of-the-art speaker verification systems are analyzed and discussed. For the NIST SRE 2006 submission, cepstral mean subtraction, feature warping, RelAtive SpecTrAl (RASTA) filtering, heteroscedastic linear discriminant analysis (HLDA), feature mapping, and eigenchannel adaptation were incrementally added to minimize the system's error rate. This paper deals with eigenchannel adaptation in more detail and includes its theoretical background and implementation issues. The key part of the paper is, however, the post-evaluation analysis, undermining a common myth that "the more boxes in the scheme, the better the system." All results are presented on NIST Speaker Recognition Evaluation (SRE) 2005 and 2006 data.

18.
Gaussian mixture models (GMMs) are commonly used as the output density function for large-vocabulary continuous speech recognition (LVCSR) systems. A standard problem when using multivariate GMMs to classify data is how to accurately represent the correlations in the feature vector. Full covariance matrices yield a good model, but dramatically increase the number of model parameters. Hence, diagonal covariance matrices are commonly used. Structured precision matrix approximations provide an alternative, flexible, and compact representation. Schemes in this category include the extended maximum likelihood linear transform and subspace for precision and mean models. This paper examines how these precision matrix models can be discriminatively trained and used on state-of-the-art speech recognition tasks. In particular, the use of the minimum phone error criterion is investigated. Implementation issues associated with building LVCSR systems are also addressed. These models are evaluated and compared using large vocabulary continuous telephone speech and broadcast news English tasks.

19.
Multiple-cluster schemes, such as cluster adaptive training (CAT) or eigenvoice systems, are a popular approach for rapid speaker and environment adaptation. Interpolation weights are used to transform a multiple-cluster, canonical, model to a standard hidden Markov model (HMM) set representative of an individual speaker or acoustic environment. Maximum likelihood training for CAT has previously been investigated. However, in state-of-the-art large vocabulary continuous speech recognition systems, discriminative training is commonly employed. This paper investigates applying discriminative training to multiple-cluster systems. In particular, minimum phone error (MPE) update formulae for CAT systems are derived. In order to use MPE in this case, modifications to the standard MPE smoothing function and the prior distribution associated with MPE training are required. A more complex adaptive training scheme combining both interpolation weights and linear transforms, a structured transform (ST), is also discussed within the MPE training framework. Discriminatively trained CAT and ST systems were evaluated on a state-of-the-art conversational telephone speech task. These multiple-cluster systems were found to outperform both standard and adaptively trained systems.
