首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
该文研究了似然得分归一化方法的原理,建立了基于自适应GMM模型的说话人确认系统,并将非特定人的背景模型与特定人的cohort模型相结合,提出了混合归一化的方法。在电话语音条件下,该文比较了不同得分归一化方法对确认系统性能的影响。实验表明,在自适应GMM模型似然比得分的基础上,T-cohort与通用背景模型混合归一化能获得最佳识别效果。当错误拒绝率为5%时,该方法可以获得0.5%的错误接受率,远远低于采用通用背景模型归一化方法的2%。  相似文献   

2.
噪声鲁棒性是影响话者确认系统实用化的关键问题之一,为了提高系统的噪声鲁棒性,本文设计了基于子带隐Markov模型(HMM)和多层感知机(MLP)的话者确认系统,系统由多个子带系统所构成,对每个子带分别建立基于背景模型的连续HMM话者确认模型,采用MLP对各个子带HMM的输出进行非线性拟合,并利用MLP直接做确认判决,在与文本有关的话者确认实验中,本文提出的模型较常规基于背景模型的HMM话者模型在确认性能和噪声鲁棒性上均有所提高,实验进一步表明,利用MLP进行拟合和判决在一定程度上解决了话者确认阈值设置的困难,有效地提高了确认系统的鲁棒性。  相似文献   

3.
随着网络与语音信号处理技术的快速发展,把说话人识别系统应用于Internet,使其作为身份识别的一种方法是势在必行。文中介绍了一个基于TCP/IP的实时说话人确认系统,它基于C/S(客户/服务器)模型,采用TCP/IP,以期能够实现Internet上的语音登录系统。介绍了该系统的框架及具体算法,给出了实验结果及其分析。  相似文献   

4.
张涛涛  陈丽萍  戴礼荣 《信号处理》2016,32(10):1213-1219
在说话人确认中,特征端因子分析(Acoustic Factor Analysis, AFA)利用MPPCA(Mixtures of Probabilistic Principal Component Analyzers, MPPCA)算法在通用背景模型(Universal Background Model, UBM)的每个高斯上分别对特征降维以去除语音特征中文本、信道和噪声等信息的干扰,获得增强的说话人信息并用于提升说话人确认的性能。但是通用背景模型属于无监督的聚类方法,其每个高斯成分物理意义不够明确,不能区分不同说话人发不同音素时的情况。为解决这一问题,本文利用语音识别中的声学模型深度神经网络(Deep Neural Network, DNN)取代传统的通用背景模型并结合特征端因子分析分别对不同音素上的语音特征进行降维提取出说话人信息,进而提取DNN i-vector用于说话人确认。在RSR2015数据库PartIII上的实验结果表明该方法相对于基于UBM的特征端因子分析方法在男女测试集上等错误率(Equal Error Rate, EER)分别下降13.49%和22.43%.   相似文献   

5.
Unconstrained face verification aims to verify whether two specify images contain the same person. In this paper, we propose a deep Bayesian convolutional neural network (DBCNN) framework to extract facial features and measure their similarity for face verification in unconstrained conditions. Specifically, we design a deep convolutional neural network and construct a Bayesian probabilistic model by transferring the Bayesian likelihood ratio function into linear decision function. By training a decision line rather than finding a suitable threshold, we further enlarge the distances between inter-class and intra-class in unconstrained environment. Finally, we comprehensively evaluate our method on LFW, CACD-VS and MegaFace datasets. The test results on LFW and CACD-VS datasets show that our method can shrink intra-class variations significantly. The performance of our DBCNN model on MegaFace dataset proves that our model can achieve comparable performance to state-of-the-art methods on face verification with relative small training data and only one single network.  相似文献   

6.
为了解决多任务观测条件下时域流信号动态重构面临的块效应问题,该文基于重叠正交变换(LOT)和稀疏贝叶斯学习的贪婪重构框架先后提出了一种流信号多任务稀疏贝叶斯学习算法及其鲁棒增强型的改进算法,前者将LOT时域滑窗推广到多任务条件下,通过贝叶斯概率建模将未知的噪声精度的估计任务从信号重构中解耦并省略,后者进一步引入了重构不确定性的度量,提高了算法的鲁棒性和抑制误差积累的能力。基于浮标实测数据的实验结果表明,相比多任务重构领域代表性较强的时间多稀疏贝叶斯学习(TMSBL)和多任务压缩感知(MT-CS)算法,本文算法在不同信噪比、观测数目和任务数目条件下具有显著更高的重构精度、成功率和效率。  相似文献   

7.
针对基于因子分析模型的说话人确认系统评分的复杂性以及需要较大运算量的问题,文章直接利用话者因子的余弦距离相似度来计算评分。首先在训练阶段和测试阶段分别用因子分析的方法从语音中估计出话者因子,然后直接利用话者因子评分。对比SVM和其它的JFA-GMM-UBM话者确认系统,本文中所采用的系统训练阶段和测试阶段的流程相同,并且目标话者模型只需要存储话者因子,存储量少。在NIST2008数据库上的实验结果表明,余弦距离评分对比其它因子分析模型的评分方法,更加简单,并且话者确认系统的性能也有提高。  相似文献   

8.
A new speaker verification method is presented which is based on likelihood score normalisation and employs the global speaker model. As a result of the normalisation, common speech and environmental factors are removed from input utterances so that differences between speakers are emphasised. Experimental results of text-independent verification tests show the effectiveness of this method  相似文献   

9.
针对信息安全风险评估过程中专家评价意见的多样性以及不确定信息难以量化处理的问题,提出了一种基于改进的DS证据理论与贝叶斯网络(BN)结合的风险评估方法.首先,在充分研究信息安全风险评估流程和要素的基础上,建立了风险评估模型,确定风险影响因素;其次,根据评估模型并结合专家知识构建相应的贝叶斯网络模型,确定贝叶斯网络模型中的条件概率表;再次,利用基于权值分配和矩阵分析的改进DS证据理论融合多位专家对风险影响因素的评价意见;最后,根据贝叶斯网络模型的推理算法,计算被测信息系统处于不同风险等级的概率值,并对结果进行有效性分析.分析表明,将改进后的DS证据理论与贝叶斯网络应用到风险评估过程中,在一定程度上能够提高评估结果的可信度和直观性.  相似文献   

10.
This paper concerns robust and reliable speaker model training for text‐independent speaker verification. The baseline speaker modeling approach is the Gaussian mixture model (GMM). In text‐independent speaker verification, the amount of speech data may be different for speakers. However, we still wish the modeling approach to perform equally well for all speakers. Besides, the modeling technique must be least vulnerable against unseen data. A traditional approach for GMM training is expectation maximization (EM) method, which is known for its overfitting problem and its weakness in handling insufficient training data. To tackle these problems, variational approximation is proposed. Variational approaches are known to be robust against overtraining and data insufficiency. We evaluated the proposed approach on two different databases, namely KING and TFarsdat. The experiments show that the proposed approach improves the performance on TFarsdat and KING databases by 0.56% and 4.81%, respectively. Also, the experiments show that the variationally optimized GMM is more robust against noise and the verification error rate in noisy environments for TFarsdat dataset decreases by 1.52%.  相似文献   

11.
Speaker adaptive test normalization (ATnorm) is the most effective approach of the widely used score normalization in text-flldependent speaker verification, which selects speaker adaptive impostor cohorts with an extra development corpus in order to enhance the recognition performance. In this paper, an improved implementation of ATnorm that can offer overall significant advantages over the original ATnorm is presented. This method adopts a novel cross similarity measurement in speaker adaptive cohort model selection without an extra development corpus. It can achieve a comparable performance with the original ATnorm and reduce the computation complexity moderately. With the full use of the saved extra development corpus, the overall system performance can be improved significantly. The results are presented on NIST 2006 Speaker Recognition Evaluation data corpora where it is shown that this method provides significant improvements in system performance, with relatively 14.4% gain on equal error rate (EER) and 14.6% gain on decision cost function (DCF) obtained as a whole.  相似文献   

12.
Currently, many speaker recognition applications must handle speech corrupted by environmental additive noise without having a priori knowledge about the characteristics of noise. Some previous works in speaker recognition have used the missing feature (MF) approach to compensate for noise. In most of those applications, the spectral reliability decision step is performed using the signal to noise ratio (SNR) criterion, which attempts to directly measure the relative signal to noise energy at each frequency. An alternative approach to spectral data reliability has been used with some success in the MF approach to speech recognition. Here, we compare the use of this new criterion with the SNR criterion for MF mask estimation in speaker recognition. The new reliability decision is based on the extraction and analysis of several spectro-temporal features from across the entire speech frame, but not across the time, which highlight the differences between spectral regions dominated by speech and by noise. We call it the feature classification (FC) criterion. It uses several spectral features to establish spectrogram reliability unlike SNR criterion that relies only in one feature: SNR. We evaluated our proposal through speaker verification experiments, in Ahumada speech database corrupted by different types of noise at various SNR levels. Experiments demonstrated that the FC criterion achieves considerably better recognition accuracy than the SNR criterion in the speaker verification tasks tested.  相似文献   

13.
Automatic speaker verification: A review   总被引:1,自引:0,他引:1  
The relation of speaker verification to other pattern-recognition problems in speech is discussed, especially the distinction between speaker verification and speaker identification. The prospects for automatic speaker verification, its settings and applications are outlined. The techniques, evaluations, and implementations of various proposed speaker recognition systems are reviewed with special emphasis on issues peculiar to speaker verification. Two large-scale operating systems using different analysis techniques and applied to different settings are described.  相似文献   

14.
This paper introduces an adaptive and integrated utterance verification (UV) framework using minimum verification error (MVE) training as a new set of solutions suitable for real applications. UV is traditionally considered an add‐on procedure to automatic speech recognition (ASR) and thus treated separately from the ASR system model design. This traditional two‐stage approach often fails to cope with a wide range of variations, such as a new speaker or a new environment which is not matched with the original speaker population or the original acoustic environment that the ASR system is trained on. In this paper, we propose an integrated solution to enhance the overall UV system performance in such real applications. The integration is accomplished by adapting and merging the target model for UV with the acoustic model for ASR based on the common MVE principle at each iteration in the recognition stage. The proposed iterative procedure for UV model adaptation also involves revision of the data segmentation and the decoded hypotheses. Under this new framework, remarkable enhancement in not only recognition performance, but also verification performance has been obtained.  相似文献   

15.
In this paper, we investigate the estimation of switching activity in VLSI circuits using a graphical probabilistic model based on cascaded Bayesian networks (CBNs). First, we develop a theoretical analysis for Bayesian inferencing of switching activity and then derive upper bounds for certain circuit parameters which, in turn, are useful in establishing the cascade structure of the CBN model. We formulate an elegant framework for maintaining probabilistic consistency in the interfacing boundaries across the CBNs during the inference process using a tree-dependent (TD) probability distribution function. A TD distribution is an approximation of the true joint probability function over the switching variables, with the constraint that the underlying BN representation is a tree. The tree approximation of the true joint probability function can be arrived at by using a maximum weight spanning tree (MWST) built using pairwise mutual information about the switching occurring at pairs of signal lines on the boundary. Further, we show that the proposed TD distribution function can be used to model correlations among the primary inputs which is critical for accuracy in modeling of switching activity. Experimental results for ISCAS circuits are presented to illustrate the efficacy of the proposed CBN models.  相似文献   

16.
毛鹏飞  刘加 《电声技术》2009,33(11):56-59
实现了一个高性能、低成本、低功耗的声纹确认片上系统(SOC)。系统核心算法采用基于高斯混合模型以及通用背景模型(GMM—UBM)建模的说话人确认算法,采用了Mel倒谱系数(MFCC)作为说话人特征。此SOC系统不仅可进行声纹确认,而且包含说话人模型的训练,可实时更新说话人的人数和模型。系统的平均EER达到了0.0342。  相似文献   

17.
介绍了一个基于GMM实时说话人识别系统的设计与实现,系统具有实时说话人辨认和实时说话人确认功能。在实验室条件下,对不同的高斯混合密度个数及采样率进行了测试,测试了模型的自适应性能。实验表明系统具有较好的识别准确率。  相似文献   

18.
基于说话人分类技术的分级说话人识别研究   总被引:3,自引:0,他引:3       下载免费PDF全文
刘文举  孙兵  钟秋海 《电子学报》2005,33(7):1230-1233
识别正确率和抗噪性能固然是说话人识别的研究重点,但识别响应速度也是决定系统实用化的关键所在.本文成功地提出了基于说话人分类技术的分级说话人辨识方法,极大地提高了系统运行速度,随着注册说话人数的增多,较之传统的说话人辨识方法,其优势更加明显.同时在说话人确认中,该方法的使用,进一步提高了确认的正确率,有效地降低了错误接受和错误拒绝率.本文提出的可信度打分方法,也一定程度上改进了系统的性能.实验表明:基于说话人分类技术的说话人辨识方法使系统的运行速度平均提高了3.5倍,对说话人确认等误识率和最小误识率平均下降了53.75%.  相似文献   

19.
丘敬云  李琳 《电子世界》2012,(9):136-138
本文提出了一种新的说话人特征分类方法,基于计算动词相似度理论,建立距离和趋势的评价模型,通过计算特征向量与k-means算法聚类所得的聚类中心的相似度矩阵,将说话人个性特征从MFCC特征域映射到说话人相似度属性空间中,形成新的特征向量集,这样,每个说话人的特征向量将被聚为在距离和变化趋势上最具相似性的k分类。之后,利用GMM模型在属性空间内进行联合概率分析、匹配,建立新的说话人识别系统。本文采用标准TIMIT语音库与NIST语音库在该识别系统中进行一系列实验,结果表明,该基于新的优化特征分类的识别系统,对比传统的说话人识别系统,在等错误率上有很好的提高。  相似文献   

20.
This paper presents a generalized i-vector representation framework with phonetic tokenization and tandem features for text independent as well as text dependent speaker verification. In the conventional i-vector framework, the tokens for calculating the zero-order and first-order Baum-Welch statistics are Gaussian Mixture Model (GMM) components trained from acoustic level MFCC features. Yet besides MFCC, we believe that phonetic information makes another direction that can benefit the system performance. Our contribution in this paper lies in integrating phonetic information into the i-vector representation by several extensions, forming a more generalized i-vector framework. First, the tokens for calculating the zero-order statistics is extended from the MFCC trained GMM components to phonetic phonemes, trigrams and tandem feature trained GMM components, using phoneme posterior probabilities. Second, given the zero-order statistics (posterior probabilities on tokens), the feature used to calculate the first-order statistics is also extended from MFCC to tandem feature, and is not necessarily the same feature employed by the tokenizer. Third, the zero-order and first-order statistics vectors are then concatenated and represented by the simplified supervised i-vector approach followed by the standard Probabilistic Linear Discriminant Analysis (PLDA) back-end. We study different token and feature combinations, and we show that the feature level fusion of acoustic level MFCC features and phonetic level tandem features with GMM based i-vector representation achieves the best performance for text independent speaker verification. Furthermore, we demonstrate that the phonetic level phoneme constraints introduced by the tandem features help the text dependent speaker verification system to reject wrong password trials and improve the performance dramatically. Experimental results are reported on the NIST SRE 2010 common condition 5 female part task and the RSR 2015 part 1 female part task for text independent and text dependent speaker verification, respectively. For the text independent speaker verification task, the proposed generalized i-vector representation outperforms the i-vector baseline by relatively 53 % in terms of equal error rate (EER) and norm minDCF values. For the text dependent speaker verification task, our proposed approach also reduced the EER significantly from 23 % to 90 % relatively for different types of trials.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号