期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

王波徐毅琼李弼程《微计算机信息》2006,22(29):55-56

本文在TMS320C6701EVM板的基础上实现一种快速的说话人识别系统。本文提出一种基于段级语音特征的说话人识别的快速算法,该算法在传统的GMM算法的基础上使用段级语音特征对测试语音进行数据量压缩,以减少计算时间。并基于车比雪夫和不等式提出了基于协方差模型的段级特征的失真测度描述。本文根据实验选择了段级特征语音段长度,实验表明该算法在不显著影响识别率的基础上有效地减少了算法延迟,提高了识别速度。相似文献

2.

基于段级特征主成分分析的说话人识别算法

储雯李银国徐洋孟祥涛《计算机应用》2013,33(7):1935-1937

为了提高说话人识别(SR)系统的运算速度,增强其鲁棒性,以现有的帧级语音特征为基础,提出了一种基于段级特征主成分分析的说话人识别算法。该算法在训练和识别阶段以段级特征代替帧级特征,然后用主成分分析方法对段级特征进行降维、去相关。实验结果表明,该算法的系统训练时间、测试时间分别为基线系统的47.8%、40.0%,同时识别率略有提高,抑制了噪声对说话人识别系统的影响。该结果验证了基于段级特征主成分分析的说话人识别算法在识别率有所提高的情况下取得了较快的识别速度,同时在不同噪声环境下的不同信噪比情况下均可以提高系统识别率。相似文献

3.

噪声环境下基于特征信息融合的说话人识别 总被引：1，自引：0，他引：1

叶寒生陶进绪张东文余斌《计算机仿真》2009,26(3)

针对在干净的语音环境下说话人识别率很高,但噪声环境下说话人识别率急剧下降的问题,提出了一种在噪声环境下,利用信噪比权重对说话人的特征信息MFCC系数和基音周期进行非线性融合,同时对MFCC特征参数进行基于帧信噪比权重得分,并同传统的高斯混合模型算法和基于FO-MFCC联合分布的特征融合方法,在噪声环境下分别进行了说话人识别的性能比较,同时对提出的融合算法进行了仿真实现.实验结果表明:在噪声的环境下方法相比上述传统说话人识别方法,性能有了明显的提高,在干净的语音环境下性能相当. 相似文献

4.

基于DTW的编码域说话人识别研究

李榕健于洪涛李邵梅《电子技术应用》2010,36(8)

相对解码重建后的语音进行说话人识别,从VoIP的语音流中直接提取语音特征参数进行说话人识别方法具有便于实现的优点,针对G.729编码域数据,研究基于DTW算法的快速说话人识别方法。实验结果表明,在相关的说话人识别中,DTW算法相比GMM在识别正确率和效率上有了很大提高。相似文献

5.

一种基于PCA的段级特征

张兴明王科人黄山奇《电子技术应用》2011,37(5)

提出了一种基于PCA的段级特征(PCAULF)。该特征以现有的帧级语音特征为基础,通过计算段级特征引入了语音的长时特性。对段级特征使用PCA降维,一方面去除由于引入段级特征带来的冗余,实现数据降维,提高识别速度;另一方面抑制了噪声对识别系统的影响,提高了段级特征的鲁棒性。在训练阶段,计算所有语音的段级特征,使用PCA方法得到变换矩阵;在测试阶段,先使用变换矩阵对段级特征进行降维,再进行判别。实验结果表明,采用该特征有效地提高了识别精度和速度,更加适用于实时说话人识别系统。相似文献

6.

基于共性特征选择的短时说话人识别方法

下载免费PDF全文

肖星星冯瑞《计算机工程》2012,38(24):171-174

现有说话人识别方法在短时语音条件下识别性能明显下降。为此,提出一种基于共性特征选择的短时说话人识别方法。利用说话人语音数据得到高斯混合模型,提取说话人之间的公共重叠部分,建立共性重叠模型和非重叠模型,根据这2个模型完成测试语音特征的选择,计算其在所有说话人非重叠模型中的相似度,并根据相似性最大化原则进行决策。实验结果表明,该方法具有较强的鲁棒性,且系统识别错误率较低。相似文献

7.

基于分布特征统计的说话人识别

下载免费PDF全文

李邵梅郭云飞卫红权《计算机工程与应用》2009,45(34):118-120

给出了基于公共码书的说话人分布特征的定义。提出了基于分布特征统计的说话人识别算法,根据所有参考说话人的训练语音建立公共码书,实现对语音特征空间的分类,统计各参考说话人训练语音的在公共码字上的分布特征进行建模。识别中引入双序列比对方法进行识别语音的分布特征统计与参考说话人模型间的相似度匹配,实现对说话人的辨认。实验表明,该方法保证识别率的情况下,进一步提高了基于VQ的说话人识别的速度。相似文献

8.

基于多特征i-vector的短语音说话人识别算法

孙念张毅林海波黄超《计算机应用》2018,38(10):2839-2843

当测试语音时长充足时,单一特征的信息量和区分性足够完成说话人识别任务,但是在测试语音很短的情况下,语音信号里缺乏充分的说话人信息,使得说话人识别性能急剧下降。针对短语音条件下的说话人信息不足的问题,提出一种基于多特征i-vector的短语音说话人识别算法。该算法首先提取不同的声学特征向量组合成一个高维特征向量,然后利用主成分分析（PCA）去除高维特征向量的相关性,使特征之间正交化,最后采用线性判别分析（LDA）挑选出最具区分性的特征,并且在一定程度上降低空间维度,从而实现更好的说话人识别性能。结合TIMIT语料库进行实验,同一时长的短语音（2 s）条件下,所提算法比基于i-vector的单一的梅尔频率倒谱系数（MFCC）、线性预测倒谱系数（LPCC）、感知对数面积比系数（PLAR）特征系统在等错误率（EER）上分别有相对72.16%、69.47%和73.62%的下降。不同时长的短语音条件下,所提算法比基于i-vector的单一特征系统在EER和检测代价函数（DCF）上大致都有50%的降低。基于以上两种实验的结果充分表明了所提算法在短语音说话人识别系统中可以充分提取说话人的个性信息,有利地提高说话人识别性能。相似文献

9.

一种基于K-SVD的说话人识别方法

马振张雄伟杨吉斌《计算机工程与应用》2012,48(34):112-115,135

为了充分提取语音中的个人特征信息,类比矢量量化,提出了一种基于K-均值奇异值分解(K-SVD)的说话人识别方法。利用K-SVD训练得到的字典可较好地保存语音信号中的个人特征信息。利用这一特性,通过K-SVD从训练数据中提取包含说话人个人特征信息的字典,利用该字典实现说话人识别。相对于传统方法,该方法能够更好地利用语音的稀疏性保存语音中的个人特征信息并减小重构误差。实验仿真结果表明,与基于矢量量化的说话人识别方法相比,该方法在多说话人的情况下具有更好的识别率,具有更高的实用价值。相似文献

10.

应用似然比框架的法庭说话人识别

王华朋杨军《数据采集与处理》2013,28(2):239

为了检验元音倒谱特征在法庭说话人识别中的性能,提出了使用元音稳定段美尔倒谱系数(Mel-frequeney eepstral coefficients,MFCC)作为识别特征的基于似然比的法庭说话人识别方法,并使用45人电话对话录音中元音/a/作为样本进行了测试.实验结果表明,该方法不仅能正确识别说话人,而且能根据当前嫌疑人样本和问题语音样本的差异,量化该语音样本作为证据的力度,为法庭提供科学合理的证据评估结果.与人工提取共振峰特征相比,自动特征提取的引入提高了工作效率,使识别系统的性能获得了大幅提升. 相似文献

11.

Dynamic visual features for audio–visual speaker verification

David Dean Sridha Sridharan 《Computer Speech and Language》2010,24(2):136-149

The cascading appearance-based (CAB) feature extraction technique has established itself as the state-of-the-art in extracting dynamic visual speech features for speech recognition. In this paper, we will focus on investigating the effectiveness of this technique for the related speaker verification application. By investigating the speaker verification ability of each stage of the cascade we will demonstrate that the same steps taken to reduce static speaker and environmental information for the visual speech recognition application also provide similar improvements for visual speaker recognition. A further study is conducted comparing synchronous HMM (SHMM) based fusion of CAB visual features and traditional perceptual linear predictive (PLP) acoustic features to show that higher complexity inherit in the SHMM approach does not appear to provide any improvement in the final audio–visual speaker verification system over simpler utterance level score fusion. 相似文献

12.

文本无关说话人识别中句级特征提取方法研究综述EI北大核心CSCD

陈晨韩纪庆陈德运何勇军《自动化学报》2022,48(3):664-688

句级(Utterance-level)特征提取是文本无关说话人识别领域中的重要研究方向之一.与只能刻画短时语音特性的帧级(Frame-level)特征相比,句级特征中包含了更丰富的说话人个性信息;且不同时长语音的句级特征均具有固定维度,更便于与大多数常用的模式识别方法相结合.近年来,句级特征提取的研究取得了很大的进展,鉴于其在说话人识别中的重要地位,本文对近期具有代表性的句级特征提取方法与技术进行整理与综述,并分别从前端处理、基于任务分段式与驱动式策略的特征提取方法,以及后端处理等方面进行论述,最后对未来的研究趋势展开探讨与分析. 相似文献

13.

基于GFCC与RLS的说话人识别抗噪系统研究

茅正冲王正创黄芳《计算机工程与应用》2015,51(10):215-218

为了提高说话人识别抗噪系统的性能,提出了将RLS自适应滤波器作为语音信号去噪的预处理器,进一步提高语音信号的信噪比,再通过Gammatone滤波器组,对去噪后的说话人语音信号进行处理,提取说话人语音信号的特征参数GFCC,进而将特征参数GFCC用于说话人识别系统中。仿真实验在高斯混合模型识别系统中进行。实验结果表明,采用这种方法应用于说话人识别抗噪系统,系统的识别率及鲁棒性都有明显的提高。相似文献

14.

Computationally Efficient and Robust BIC-Based Speaker Segmentation

Kotti M. Benetos E. Kotropoulos C. 《IEEE transactions on audio, speech, and language processing》2008,16(5):920-933

An algorithm for automatic speaker segmentation based on the Bayesian information criterion (BIC) is presented. BIC tests are not performed for every window shift, as previously, but when a speaker change is most probable to occur. This is done by estimating the next probable change point thanks to a model of utterance durations. It is found that the inverse Gaussian fits best the distribution of utterance durations. As a result, less BIC tests are needed, making the proposed system less computationally demanding in time and memory, and considerably more efficient with respect to missed speaker change points. A feature selection algorithm based on branch and bound search strategy is applied in order to identify the most efficient features for speaker segmentation. Furthermore, a new theoretical formulation of BIC is derived by applying centering and simultaneous diagonalization. This formulation is considerably more computationally efficient than the standard BIC, when the covariance matrices are estimated by other estimators than the usual maximum-likelihood ones. Two commonly used pairs of figures of merit are employed and their relationship is established. Computational efficiency is achieved through the speaker utterance modeling, whereas robustness is achieved by feature selection and application of BIC tests at appropriately selected time instants. Experimental results indicate that the proposed modifications yield a superior performance compared to existing approaches. 相似文献

15.

基于话者特征图案的BPNN话者模型 总被引：1，自引：0，他引：1

方绍武戴蓓倩《计算机学报》2002,25(5):556-560

该文提出了一种用于说话人识别的基于话者特征图案的BPNN话者模型，该话者模型解决了语音信号的时长变化与神经网络输入层结点数固定不变之间的矛盾。利用VQ技术对所有话者的语音样本训练出话者特征图案，再将语音样本对该特征图案进行映射，在映射域解决了语音样本的时间规正问题。同时，该方法还提高了映射域参数的模式分类能力。相似文献

16.

Emotional speech feature normalization and recognition based on speaker-sensitive feature clustering

Chengwei Huang Baolin Song Li Zhao 《International Journal of Speech Technology》2016,19(4):805-816

In this paper we propose a feature normalization method for speaker-independent speech emotion recognition. The performance of a speech emotion classifier largely depends on the training data, and a large number of unknown speakers may cause a great challenge. To address this problem, first, we extract and analyse 481 basic acoustic features. Second, we use principal component analysis and linear discriminant analysis jointly to construct the speaker-sensitive feature space. Third, we classify the emotional utterances into pseudo-speaker groups in the speaker-sensitive feature space by using fuzzy k-means clustering. Finally, we normalize the original basic acoustic features of each utterance based on its group information. To verify our normalization algorithm, we adopt a Gaussian mixture model based classifier for recognition test. The experimental results show that our normalization algorithm is effective on our locally collected database, as well as on the eNTERFACE’05 Audio-Visual Emotion Database. The emotional features achieved using our method are robust to the speaker change, and an improved recognition rate is observed. 相似文献

17.

Evaluating Plan Recognition Systems: Three Properties of a Good Explanation

James Mayfield 《Artificial Intelligence Review》2000,14(4-5):351-376

Plan recognition in a dialogue system is the process of explaining why an utterance was made, in terms of the plans and goals that its speaker was pursuing in making the utterance. I present a theory of how such an explanation of an utterance may be judged as to its merits as an explanation. I propose three criteria for making such judgments: applicability, grounding, and completeness. The first criterion is the applicability of the explanation to the needs of the system that will use it. The second criterion is the grounding of the explanation in what is already known of the speaker and of the dialogue. Finally, the third criterion is the completeness of the explanation's coverage of the goals that motivated the production of the utterance. An explanation of an utterance is a good explanation of that utterance to the extent that it meets these three criteria. In addition to forming the basis of a method for evaluating the merit of an explanation, these criteria are useful in designing and evaluating a plan recognition algorithm and its associated knowledge base. 相似文献