1.

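A speaker recognition method based on universal background-joint estimation (UB-JE) Total citations: 2 (self-citations: 1, citations by others: 1)

Wang Haibin (汪海彬), Guo Jianyi (郭剑毅), Mao Cunli (毛存礼), Yu Zhengtao (余正涛) 《Acta Automatica Sinica》(自动化学报) 2018,44(10):1888-1895

An effective recognition method is the core of speaker recognition. In recent years, the total variability factor analysis (i-vector) approach has become the mainstream in the field, and estimating the total variability space is the key to the whole algorithm. Building on conventional factor analysis, this paper proposes a new algorithm for estimating the total variability space, the universal background-joint estimation (UB-JE) algorithm. First, following the idea of the Gaussian mixture model-universal background model (GMM-UBM), a universal background (UB) algorithm for the total variability matrix is proposed. Second, based on factor analysis theory and the related literature, a joint estimation (JE) algorithm for the total variability matrix is proposed. Finally, the two are combined into the UB-JE algorithm. Using the TIMIT and MDSVC speech databases, the proposed algorithm is compared with the conventional algorithm within the i-vector framework. The results show that the equal error rate (EER) and the minimum detection cost function (MinDCF) improve by 8.3% and 6.9%, respectively, indicating that the proposed method improves the performance of the i-vector approach.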
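The EER quoted in this abstract is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below is a minimal illustration and not the authors' evaluation code: the function names, the threshold sweep over observed scores, and the reading of "improve by" as a relative reduction are all assumptions.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by trying every observed score as a threshold."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: a target trial scoring below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: a non-target trial scoring at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Report the midpoint of the two rates where they are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_reduction(baseline, proposed):
    """Reduction of an error metric as a fraction of the baseline value.

    Assumption: the abstract does not say whether its percentages are
    relative or absolute; this helper computes the relative form.
    """
    return (baseline - proposed) / baseline
```

With hypothetical scores, `equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])` finds the threshold 0.6, where one of four target trials is rejected and one of four non-target trials is accepted.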

2.

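Robust speech recognition based on adaptive deep neural networks in complex environments

Zhang Kaisheng (张开生), Zhao Xiaofen (赵小芬) 《Computer Engineering & Science》(计算机工程与科学) 2022,44(6):1105-1113

In continuous speech recognition systems, complex environments (including variability in speakers and environmental noise) cause a mismatch between training and test data that lowers recognition accuracy. To address this, a speech recognition algorithm based on an adaptive deep neural network is proposed. An improved regularized adaptation criterion is combined with a feature-space adaptive deep neural network to improve data matching; speaker identity vectors (i-vectors) are fused with noise-aware training to handle speaker and noise variation; and the classification function of the conventional DNN output layer is modified to ensure intra-class compactness and inter-class separation. Tests with various background noises superimposed on the TIMIT English speech dataset and a Microsoft Chinese speech dataset show that, compared with the currently popular GMM-HMM and conventional DNN acoustic models, the word error rate of the proposed algorithm drops by 5.151% and 3.113%, respectively, improving the model's generalization and robustness to some extent.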

3.

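Speaker normalization based on i-vectors in speech recognition

Li Yaqi (李亚琦), Huang Hao (黄浩) 《Modern Computer》(现代计算机) 2014,(5):3-7

The i-vector is an important feature reflecting acoustic differences between speakers and has proven effective in speaker recognition and speaker verification. This work applies i-vectors to speaker acoustic feature normalization in speech recognition: i-vectors are extracted from the training data and clustered without supervision using the LBG algorithm. A maximum likelihood linear transform is then trained for each cluster, and speaker adaptive training is used to normalize speakers. The transformed features are used for training and recognition. Experiments show that the method improves speech recognition performance.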

4.

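Gaussian PLDA in speaker verification and its joint estimation

Xu Yunfei (许云飞), Yang Hai (杨海), Zhou Ruohua (周若华), Yan Yonghong (颜永红) 《Acta Automatica Sinica》(自动化学报) 2014,40(6):1068-1074

In recent years, methods based on total variability factors have become the mainstream in speaker recognition. Among them, probabilistic linear discriminant analysis (PLDA) has drawn wide attention for its excellent performance. However, when estimating the PLDA model, the conventional factor analysis method updates only the model space, so the model mean is not well coupled with the updated model space. A joint estimation method is proposed that estimates the model mean and the model space simultaneously, yielding more rigorous expectation-maximization update formulas. On the NIST Speaker Recognition Evaluation 2010 extended test set and the 2012 core test set, the equal error rate improves to some degree.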

5.

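Feature-space eigenvoice speaker adaptation

Qu Dan (屈丹), Yang Xukui (杨绪魁), Zhang Wenlin (张文林) 《Acta Automatica Sinica》(自动化学报) 2015,41(7):1244-1252

A feature-space eigenvoice speaker adaptation algorithm is proposed. Borrowing the idea of the RATZ algorithm, the method first models speaker information in the feature space with a Gaussian mixture model. It then uses a subspace method to estimate the feature compensation terms, which reduces the number of parameters to estimate. This models the feature space accurately while lowering the amount of adaptation data required. Chinese continuous speech recognition experiments on a Microsoft corpus show that the algorithm still performs well with very little adaptation data, that combining it with speaker adaptive training further reduces the word error rate, and that its real-time performance is better than that of the eigenvoice speaker adaptation algorithm.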

6.

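Speaker clustering for telephone speech based on factor analysis modeling

Wu Kui (吴奎), Song Yan (宋彦), Dai Lirong (戴礼荣) 《Pattern Recognition and Artificial Intelligence》(模式识别与人工智能) 2013,26(1):1-5

Existing speaker clustering methods based on Gaussian mixture models mainly rely on the maximum a posteriori (MAP) criterion to adapt each cluster's Gaussian mixture model from the universal background model. Adaptation data are scarce, however, so the resulting models are not accurate enough. To address this, the paper tries two factor analysis modeling approaches, based on eigenvoice (EV) space and total variability (TV) space analysis, which model the variability space and thereby reduce the number of parameters that must be estimated for each cluster's Gaussian mixture model. Results on the telephone speech data of the NIST 2008 Speaker Recognition Evaluation show that, relative to the MAP-based baseline system, both EV and TV space modeling substantially reduce the clustering error rate, and that TV space modeling achieves a lower clustering error rate than EV space modeling.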
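The MAP baseline mentioned in this abstract can be illustrated with the widely used relevance-MAP update of GMM means, which interpolates between the background mean and the data mean according to how many frames a component has seen. This is a sketch under assumptions: the function name, the default relevance factor of 16, and the choice to pass precomputed component posteriors are illustrative and not taken from the paper.

```python
def map_adapt_means(ubm_means, frames, posteriors, relevance=16.0):
    """Relevance-MAP adaptation of GMM mean vectors only.

    ubm_means:  K mean vectors of the universal background model
    frames:     T feature vectors from one cluster or speaker
    posteriors: T rows of K component responsibilities
    relevance:  hypothetical relevance factor; larger values trust the UBM more
    """
    adapted = []
    for k, mu in enumerate(ubm_means):
        dim = len(mu)
        # Zeroth-order statistic: soft count of frames assigned to component k.
        n_k = sum(post[k] for post in posteriors)
        # First-order statistic: responsibility-weighted sum of the frames.
        f_k = [sum(post[k] * x[d] for post, x in zip(posteriors, frames))
               for d in range(dim)]
        # Same as alpha * data_mean + (1 - alpha) * mu with
        # alpha = n_k / (n_k + relevance), written to stay defined when n_k = 0.
        adapted.append([(f_k[d] + relevance * mu[d]) / (n_k + relevance)
                        for d in range(dim)])
    return adapted
```

A component that receives no frames keeps its background mean, which is why scarce adaptation data leaves most of the adapted model close to the UBM, the weakness the abstract describes.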

7.

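A new subspace-based speaker adaptation method Total citations: 1 (self-citations: 0, citations by others: 1)

Zhang Wenlin (张文林), Zhang Weiqiang (张卫强), Liu Jia (刘加), Li Bicheng (李弼程), Qu Dan (屈丹) 《Acta Automatica Sinica》(自动化学报) 2011,37(12):1495-1502

A new subspace-based fast speaker adaptation method is proposed. Building on eigenvoice (EV) adaptation, the method further finds a low-dimensional subspace in the phone space, yielding a more compact joint "speaker-phone" subspace. This subspace not only contains information on the correlation of model parameters across speakers but also explicitly models the correlation of model parameters across phones, greatly reducing model storage while reflecting the prior information on the model parameters more fully. In unsupervised adaptation experiments on continuous speech recognition with a small amount of adaptation data, the new method outperforms maximum likelihood linear regression and the cluster maximum likelihood linear basis method.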

8.

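Serial factor analysis in speaker recognition

Guo Wu (郭武), Dai Lirong (戴礼荣), Wang Renhua (王仁华) 《Pattern Recognition and Artificial Intelligence》(模式识别与人工智能) 2009,22(4)

For speaker recognition based on factor analysis, a method for training the loading matrices serially is proposed. During loading matrix training, the speaker factor matrix, the diagonal (residual) matrix, and the channel space matrix are trained in sequence. During speaker enrollment, these three loading matrices are concatenated and each speaker's factors are obtained by joint estimation. This strategy effectively resolves the saturation problem in factor analysis. On the NIST SRE 2006 core test set, the equal error rate reaches 3.65%.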

9.

An improved modular 2DPCA face recognition method Total citations: 1 (self-citations: 0, citations by others: 1)

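Zhang Longxiang (张龙翔) 《Computer Engineering and Applications》(计算机工程与应用) 2010,46(13):147-150

A modular 2DPCA face recognition method based on within-class adaptive weighted averages is proposed. For each class of training samples, the algorithm computes the within-class adaptive weighted average of each sub-block and uses it to normalize the corresponding sub-blocks within that class. The total scatter matrix is then built from all normalized sub-blocks, giving the optimal projection matrix. The sub-blocks of the training and test samples are normalized by the weighted average of all sub-blocks in the training set and projected onto the optimal projection matrix to obtain recognition features, which are classified with a minimum-distance classifier. Experiments on the ORL face database show that the proposed method clearly outperforms 2DPCA and ordinary modular 2DPCA in recognition performance.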

10.

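An eigenchannel space concatenation method in joint factor analysis Total citations: 1 (self-citations: 1, citations by others: 0)

He Liang (何亮), Shi Yongzhe (史永哲), Liu Jia (刘加) 《Acta Automatica Sinica》(自动化学报) 2011,37(7):849-856

To make joint factor analysis applicable to text-independent speaker recognition under multiple channel conditions, an orthogonal concatenation method for eigenchannel spaces is proposed. Under multi-channel conditions, the eigenchannel space can be estimated by the pooled-data method or by simple concatenation, but the former suffers from space masking, while the latter resolves masking at the cost of introducing space overlap. The paper first proves that the core operation in speaker modeling and testing is an oblique projection. Based on this proof, space overlap is removed by orthogonalizing the spaces to be concatenated. Experiments on the NIST SRE 2008 core evaluation set show that the proposed algorithm outperforms both the pooled-data method and simple concatenation.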

11.

A study on the roles of total variability space and session variability modeling in speaker recognition

A. K. Sarkar, J. F. Bonastre, D. Matrouf 《International Journal of Speech Technology》 2016,19(1):111-120

Speaker verification (SV) using the i-vector concept has become the state of the art. In this technique, speakers are projected onto the total variability space and represented by vectors called i-vectors. During testing, the i-vectors of the test speech segment and the claimant are conditioned to compensate for session variability before scoring. An i-vector system can therefore be viewed as two processing blocks: the total variability space and a post-processing module. Several questions arise: (i) which part of the i-vector system plays the major role in speaker verification, the total variability space or the post-processing task; and (ii) is the post-processing module intrinsic to the total variability space? The motivation of this paper is to partially answer these questions by proposing several simpler speaker characterization systems for speaker verification, in which speakers are represented by speaker characterization vectors (SCVs). The SCVs are obtained by uniform segmentation of the speakers' Gaussian mixture model (GMM) and maximum likelihood linear regression (MLLR) super-vectors. Two adaptation approaches are considered for the GMM super-vector: maximum a posteriori and MLLR. As with i-vectors, the SCVs are post-processed for session variability compensation during testing. The proposed system shows promising performance compared with the classical i-vector system, which indicates that the post-processing task plays a major role in an i-vector based SV system and is not intrinsic to the total variability space. All experimental results are reported on the NIST 2008 SRE core condition.
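For orientation, the projection onto the total variability space that this abstract refers to is usually written in the standard form below. This is the common textbook formulation and not something quoted from the paper; the abstract itself does not define these symbols, so they are assumptions here.

```latex
M = m + T w, \qquad w \sim \mathcal{N}(0, I)
```

Here $M$ is the utterance-dependent GMM mean super-vector, $m$ is the mean super-vector of the universal background model, $T$ is the low-rank total variability matrix, and $w$ is the latent factor. The i-vector of an utterance is the posterior mean of $w$ given that utterance's statistics.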

12.

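Speaker verification based on T-matrix-normalized PLDA

Gou Xinke (缑新科), Wang Yue (王跃) 《Computer and Modernization》(计算机与现代化) 2017,(10):53

When the i-vector/PLDA model is used for speaker verification on utterances of arbitrary duration, passing length-normalized i-vectors to the PLDA model introduces unpredictable distortion and scaling, which hurts the recognition rate. This paper normalizes the column vectors of the total variability matrix T instead of length-normalizing the i-vectors for the PLDA model, avoiding the undesirable distortion that i-vector length normalization causes when the vectors are passed to PLDA. Experimental results show that the method achieves results similar to length normalization, and better in some cases.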
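The two normalizations contrasted in this abstract can be sketched as follows. This is an illustration under assumptions, not the paper's procedure: the abstract does not state which norm is applied to the columns of T, so the Euclidean norm is assumed, and the function names and the row-major list-of-lists layout are hypothetical.

```python
import math


def length_normalize(ivector):
    """Scale one i-vector to unit Euclidean length (the usual pre-PLDA step)."""
    norm = math.sqrt(sum(v * v for v in ivector))
    return [v / norm for v in ivector]


def normalize_columns(t_matrix):
    """Scale each column of a row-major matrix to unit Euclidean norm.

    Assumption: every column has a non-zero norm; a zero column would
    raise ZeroDivisionError and needs separate handling.
    """
    n_cols = len(t_matrix[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in t_matrix))
             for j in range(n_cols)]
    return [[row[j] / norms[j] for j in range(n_cols)] for row in t_matrix]
```

The design difference is where the scaling happens: `length_normalize` is applied to every extracted i-vector, while `normalize_columns` is applied once to T before any i-vector is extracted.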

13.

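Speaker verification based on WLDA and i-sparse representation classification

Xing Yujuan (邢玉娟), Cao Xiaoli (曹晓丽), Tan Ping (谭萍), Li Hengjie (李恒杰) 《Computer Engineering and Applications》(计算机工程与应用) 2016,52(13):173-176

To improve the recognition rate and robustness of speaker verification systems under channel variation, a sparse representation classification algorithm based on i-vectors and weighted linear discriminant analysis (WLDA) is proposed. First, the channel compensation and dimensionality reduction capabilities of WLDA are used to remove channel interference from the i-vectors and reduce their dimension. Next, an overcomplete dictionary matrix of training speech samples is built on the i-vector set, and the MAP algorithm is used to solve for the sparse coefficient vector of the test speech over the dictionary. Finally, the test speech sample is reconstructed from the sparse coefficient vector, and the target speaker is determined from the reconstruction error. Simulation results verify the effectiveness and feasibility of the algorithm.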

14.

A robust i-vector speaker recognition algorithm based on DNN processing

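Wang Xin (王昕), Zhang Hongran (张洪冉) 《Computer Engineering and Applications》(计算机工程与应用) 2018,54(22):167-172

A method is proposed that applies a regression model based on deep neural network (DNN) feature mapping to the identity vector (i-vector)/probabilistic linear discriminant analysis (PLDA) speaker recognition system. By fitting the nonlinear function that relates the i-vectors of noisy speech to those of clean speech, the DNN produces an approximate representation of the clean-speech i-vector, reducing the effect of noise on system performance. Experiments on the TIMIT dataset verify the feasibility and effectiveness of the method.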

15.

Two-space variability compensation technique for speaker verification in short length and reverberant environments

Flavio J. Reyes-Díaz, Gabriel Hernández-Sierra, José R. Calvo de Lara 《International Journal of Speech Technology》 2017,20(3):475-485

The performance of state-of-the-art speaker verification in uncontrolled environments is affected by several kinds of variability. Short-duration variability is very common in these scenarios and causes speaker verification performance to degrade quickly as the duration of the verification utterances decreases. Linear discriminant analysis (LDA) is the most common session variability compensation algorithm, but it has shortcomings when trained with insufficient data. In this paper we introduce two methods for session variability compensation to deal with short utterances in the i-vector space. The first method incorporates short-duration variability information into the within-class variance estimation process. The second compensates for session and short-duration variability in two different spaces with LDA algorithms (2S-LDA). First, we analyze the behavior of the within-class and between-class scatters in the first proposed method. Then, both proposed methods are evaluated on telephone sessions from NIST SRE-08 for different durations of the evaluation utterances: full (average 2.5 min), 20, 15, 10 and 5 s. The 2S-LDA method obtains good results across the short-utterance conditions in the evaluations, with an average relative EER improvement of 1.58% over the best baseline (WCCN[LDA]). Finally, we apply the 2S-LDA method to speaker verification in reverberant environments, using different reverberant conditions from the Reverb challenge 2013, and obtain improvements of 8.96% and 23% under matched and mismatched reverberant conditions, respectively.

16.

Supervector-based approaches in a discriminative framework for speaker verification in noisy environments

Sourjya Sarkar, K. Sreenivasa Rao 《International Journal of Speech Technology》 2017,20(2):387-416

This paper explores the robustness of supervector-based speaker modeling approaches for speaker verification (SV) in noisy environments. Speaker modeling is carried out in two different frameworks: (i) the combined Gaussian mixture model-support vector machine (GMM-SVM) method and (ii) the total variability modeling method. In the GMM-SVM method, supervectors obtained by concatenating the means of adapted speaker GMMs are used to train speaker-specific SVMs during the training/enrollment phase of SV. During the evaluation/testing phase, noisy test utterances transformed into supervectors are subjected to SVM-based pattern matching and classification. In the total variability modeling method, large supervectors are reduced to a low-dimensional, channel-robust vector (i-vector) prior to SVM training and subsequent evaluation. Special emphasis is placed on the significance of an utterance partitioning technique for mitigating data imbalance and utterance duration mismatches. An adaptive boosting algorithm is proposed in the total variability modeling framework to enhance the accuracy of the SVM classifiers. Experiments on the NIST-SRE-2003 database, with training and test utterances corrupted by additive noise, indicate that these modeling methods outperform the standard GMM-universal background model (GMM-UBM) framework for SV. The use of utterance partitioning and adaptive boosting in the speaker modeling frameworks results in substantial performance improvements under degraded conditions.

17.

Improved i-vector extraction technique for speaker verification with short utterances

Arnab Poddar, Md Sahidullah, Goutam Saha 《International Journal of Speech Technology》 2018,21(3):473-488

A major challenge in automatic speaker verification (ASV) is to improve performance with short speech segments for end-user convenience in real-world applications. In this paper, we present a detailed analysis of the effects of duration variability on state-of-the-art i-vector and classical Gaussian mixture model-universal background model (GMM-UBM) based ASV systems. We observe an increase in the uncertainty of model parameter estimation for i-vector based ASV as speech duration shortens. To compensate for the effect of duration variability in short utterances, we propose an adaptation technique for the Baum-Welch statistics estimation used in i-vector extraction. Information from pre-estimated background model parameters is used for the adaptation. ASV performance with the proposed approach is considerably superior to that of the conventional i-vector based system. Furthermore, fusing the proposed i-vector based system with GMM-UBM further improves ASV performance, especially for short speech segments. Experiments conducted on two speech corpora, NIST SRE 2008 and 2010, show a relative improvement in equal error rate (EER) in the range of 12–20%.
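The abstract does not say how the i-vector and GMM-UBM systems are fused. One common and simple option is a weighted sum of the per-trial scores, sketched below purely as an assumption: the function name, the default weight of 0.5, and the premise that both score lists are already on comparable scales are hypothetical.

```python
def fuse_scores(ivector_scores, gmm_ubm_scores, weight=0.5):
    """Linear score-level fusion of two systems over the same trial list.

    weight is the hypothetical share given to the i-vector system; in
    practice it would be tuned on held-out data, and both score sets
    would first be calibrated or normalized to a common range.
    """
    if len(ivector_scores) != len(gmm_ubm_scores):
        raise ValueError("both systems must score the same trials")
    return [weight * a + (1.0 - weight) * b
            for a, b in zip(ivector_scores, gmm_ubm_scores)]
```

For example, `fuse_scores([1.0, 0.0], [0.0, 1.0], weight=0.75)` gives the first trial a fused score of 0.75 and the second 0.25.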