1.

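屈丹 张文林 《Journal of Electronics & Information Technology》2015,37(6):1350-1356

The eigenphone speaker adaptation algorithm performs well when adaptation data are sufficient, but it overfits severely when adaptation data are scarce. To overcome this problem, this paper proposes a speaker adaptation algorithm based on an eigenphone speaker subspace. First, the basic principle of eigenphone speaker adaptation in speech recognition systems based on the hidden Markov model-Gaussian mixture model (HMM-GMM) is presented. Second, a speaker subspace is introduced to model the correlation among the eigenphone matrices of different speakers. A new eigenphone speaker subspace adaptation algorithm is then obtained by estimating a speaker-dependent coordinate vector. Finally, the eigenphone speaker subspace adaptation algorithm is compared with the traditional speaker subspace adaptation algorithm. Mandarin continuous speech recognition experiments on the Microsoft corpus show that, compared with the eigenphone speaker adaptation algorithm, the proposed algorithm greatly improves performance when adaptation data are extremely limited and largely overcomes overfitting. Compared with the eigenvoice adaptation algorithm, it achieves lower space complexity at a small cost in performance, which makes it more practical.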

2.

Eigenvoices: A compact representation of speakers in model space

Patrick Nguyen Roland Kuhn Jean-Claude Junqua Nancy Niedzielski Christian Wellekens 《Annales des Télécommunications》2000,55(3-4):163-171

In this article, we present a new approach to modeling speaker-dependent systems. The approach was inspired by the eigenfaces technique used in face recognition. We build a linear vector space of low dimensionality, called the eigenspace, in which speakers are located. The basis vectors of this space are called eigenvoices. Each eigenvoice models a direction of inter-speaker variability. The eigenspace is built during the training phase. Then, any speaker model can be expressed as a linear combination of eigenvoices. The benefit of this technique, as set forth in this article, lies in reducing the number of parameters that describe a model. We are thereby able to reduce the number of parameters to estimate, as well as computation and/or storage costs. We apply the approach to speaker adaptation and speaker recognition. Some experimental results are supplied.
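The idea of locating a speaker in an eigenspace can be sketched in a few lines. The abstract does not say how a speaker's coordinates are estimated, so this sketch assumes the eigenvoices are orthonormal and uses plain orthogonal projection as a stand-in; the names `eigenvoice_weights` and `reconstruct`, and the toy four-dimensional vectors, are hypothetical.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def eigenvoice_weights(speaker: Vector, mean: Vector,
                       eigenvoices: List[Vector]) -> Vector:
    """Coordinates of a speaker in the eigenspace.

    Assumption: the eigenvoices are orthonormal, so each coordinate is
    the projection of the mean-centered speaker vector onto one basis vector.
    """
    centered = [s - m for s, m in zip(speaker, mean)]
    return [dot(centered, e) for e in eigenvoices]


def reconstruct(mean: Vector, eigenvoices: List[Vector],
                weights: Vector) -> Vector:
    """Speaker model = mean + linear combination of eigenvoices."""
    model = list(mean)
    for w, e in zip(weights, eigenvoices):
        for i, component in enumerate(e):
            model[i] += w * component
    return model


# Toy data: two orthonormal eigenvoices in a 4-dimensional model space.
mean = [1.0, 2.0, 3.0, 4.0]
eigenvoices = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]
speaker = [2.0, 2.0, 4.0, 4.0]

weights = eigenvoice_weights(speaker, mean, eigenvoices)
approx = reconstruct(mean, eigenvoices, weights)
print(weights, approx)
```

Here the speaker is described by two coordinates instead of four model parameters, which is the parameter reduction the abstract refers to.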

3.

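Fast speaker adaptation based on a maximum-likelihood variable subspace

张文林 牛铜 张连海 李弼程 《Journal of Electronics & Information Technology》2012,34(3):571-575

This paper proposes a speaker adaptation method based on a maximum-likelihood variable subspace. In the training stage, principal component analysis is applied to the speaker-dependent model parameters of the training set to obtain a set of speaker basis vectors. In the adaptation stage, the subset of basis vectors most correlated with the current speaker is selected by the maximum-likelihood criterion, the new speaker-dependent model is constrained to the speaker subspace spanned by these basis vectors, and adaptation is carried out by solving for the coefficient of each basis vector. Unlike classical subspace-based speaker adaptation methods, the speaker subspace here is selected dynamically during adaptation, so fewer parameters need to be estimated and more robust adaptation results can be obtained with small amounts of adaptation data. In continuous speech recognition adaptation experiments on the Microsoft corpus, given a very small amount of adaptation data (less than 5 s), the proposed method outperforms both the classical eigenvoice adaptation method and the maximum likelihood linear regression method under supervised and unsupervised conditions.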

4.

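A codebook-based speaker adaptation method  Cited by: 1 (self-citations: 0, citations by others: 1)

吕津 赵明生 王作英 《Acta Electronica Sinica》2001,29(4):456-460

This paper proposes a codebook-based speaker adaptation method. It combines the advantages of the two major classes of speaker adaptation methods, transformation-based methods and Bayesian estimation, achieving fast speaker adaptation while retaining good asymptotic consistency. The adaptation process has two stages. In the first stage, a linear combination of the speech codebooks of a large number of reference speakers is used to approximate the user's speech codebook; only a small amount of adaptation training data is needed to compute the optimal weight of each codebook in the linear combination, using an optimization algorithm based on Rosen's gradient projection method. In the second stage, the optimal linear combination of codebooks serves as the prior estimate of the user's codebook. As more adaptation training data become available, the system further refines the user codebook by Bayesian estimation, enabling incremental adaptation. The authors apply the method to a speaker-independent continuous Mandarin speech recognition system. A series of comparative experiments shows that the adaptation method is promising.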
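The two-stage structure described above can be illustrated with a minimal sketch. Two things are assumptions: the combination weights are taken as given (the weight optimization with Rosen's gradient projection method is not reproduced), and the second stage uses the textbook MAP update of a mean with prior weight `tau`, which may differ from the authors' exact estimator. All function names and numbers are hypothetical.

```python
from typing import List

Codeword = List[float]
Codebook = List[Codeword]


def combine_codebooks(references: List[Codebook],
                      weights: List[float]) -> Codebook:
    """Stage 1: prior user codebook as a weighted sum of reference codebooks.

    Assumption: `weights` were already optimized elsewhere.
    """
    n_words = len(references[0])
    dim = len(references[0][0])
    prior = [[0.0] * dim for _ in range(n_words)]
    for codebook, w in zip(references, weights):
        for k in range(n_words):
            for d in range(dim):
                prior[k][d] += w * codebook[k][d]
    return prior


def map_update(prior: Codeword, frames: List[Codeword],
               tau: float) -> Codeword:
    """Stage 2: standard MAP estimate of one codeword.

    With no frames the prior is returned unchanged; as the number of
    frames grows, the estimate moves toward the sample mean.
    """
    n = len(frames)
    return [
        (tau * prior[d] + sum(f[d] for f in frames)) / (tau + n)
        for d in range(len(prior))
    ]


# Toy data: two reference speakers, one 2-dimensional codeword each.
references = [[[0.0, 0.0]], [[2.0, 4.0]]]
prior = combine_codebooks(references, [0.5, 0.5])
adapted = map_update(prior[0], [[3.0, 4.0], [3.0, 4.0]], tau=2.0)
print(prior, adapted)
```

Calling `map_update` again as new frames arrive gives the incremental behavior: the prior dominates with little data, and the user's own data dominates later.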

5.

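Research on target speech detection in broadcast audio

吕兰兰 《Digital Technology and Application》2009,(11):125-126

Based on the characteristics of the audio in broadcast television news programs, and using speaker detection techniques, a detection and localization algorithm for target speech is proposed. It can quickly mine and locate information related to a specific speaker or host, with good results.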

6.

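A microphone-array speaker tracking method based on stratified-sampling particle filtering  Cited by: 2 (self-citations: 0, citations by others: 2)

金乃高 殷福亮 陈喆 《Acta Electronica Sinica》2008,36(1):194-198

To address speaker tracking in noisy and reverberant environments, this paper proposes a microphone-array sound source localization and tracking method based on particle filtering. Within the particle filtering framework, the method uses the speech onset signal, which is not affected by reverberation, as the observation, and constructs the likelihood function by computing the output energy of the microphone-array beamformer. It also takes into account the role of the different frequency components of the speech signal in source localization, and uses stratified sampling to improve particle sampling efficiency. Experimental results show that the method improves the robustness of the speaker tracking system to noise and reverberation.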

7.

Indexing Techniques for Power Management in Multi-Attribute Data Broadcast

Hu Qinglong Lee Wang-Chien Lee Dik Lun 《Mobile Networks and Applications》2001,6(2):185-197

In this paper, we discuss power-conserving indexing techniques for managing multi-attribute data broadcast on wireless channels. These indexing techniques, namely index tree, signature, and hybrid, aim at reducing the battery power consumption of mobile clients. By taking into account broadcast management factors such as clustering and scheduling, these three indexing schemes may significantly reduce tune-in time while maintaining a reasonable access time. Cost models for single- and multi-attribute query processing are developed. Our performance evaluation shows that the signature and hybrid methods are superior to the index tree method.

8.

Audio-Visual Speaker Recognition for Video Broadcast News

Benoît Maison Chalapathy Neti Andrew Senior 《The Journal of VLSI Signal Processing》2001,29(1-2):71-79

Audio-based speaker identification degrades severely when there is a mismatch between training and test conditions due either to channel or to noise. In this paper, we explore various techniques to combine video-based speaker identification with audio-based speaker identification to improve performance under mismatched conditions. Specifically, we explore techniques to optimally determine the relative weights of the independent decisions based on audio and video to achieve the best combination. Experiments on video broadcast news data show that significant improvements can be achieved by the fusion in acoustically degraded conditions.
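One simple way to picture the weighting of audio and video decisions is a convex combination of per-speaker scores, with the weight chosen on held-out data. The abstract does not describe the actual weighting techniques, so the grid search, the function names, and the toy development set below are all assumptions made for illustration.

```python
from typing import List, Tuple

Scores = List[float]
Example = Tuple[Scores, Scores, int]  # (audio scores, video scores, true index)


def fuse(audio: Scores, video: Scores, w: float) -> Scores:
    """Convex combination: w weights audio, (1 - w) weights video."""
    return [w * a + (1.0 - w) * v for a, v in zip(audio, video)]


def identify(audio: Scores, video: Scores, w: float) -> int:
    """Index of the speaker with the highest fused score."""
    fused = fuse(audio, video, w)
    return max(range(len(fused)), key=fused.__getitem__)


def best_weight(dev_set: List[Example], steps: int = 10) -> Tuple[float, float]:
    """Grid search over w in [0, 1]; returns (weight, accuracy).

    The first weight reaching the best accuracy is kept.
    """
    best_w, best_acc = 0.0, -1.0
    for i in range(steps + 1):
        w = i / steps
        correct = sum(identify(a, v, w) == t for a, v, t in dev_set)
        acc = correct / len(dev_set)
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc


# Hypothetical development set with two candidate speakers per trial.
dev_set = [
    ([0.2, 0.8], [0.9, 0.1], 0),    # audio misleading, video correct
    ([0.9, 0.1], [0.3, 0.7], 0),    # audio correct, video misleading
    ([0.1, 0.9], [0.45, 0.55], 1),  # both correct
]
weight, accuracy = best_weight(dev_set)
print(weight, accuracy)
```

On this toy set neither modality alone identifies every trial correctly, while an intermediate weight does, which is the kind of gain fusion is meant to provide under mismatched conditions.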

9.

Text‐Independent Speaker Verification Using Variational Gaussian Mixture Model

Mohammad Hossein Moattar Mohammad Mehdi Homayounpour 《ETRI Journal》2011,33(6):914-923

This paper concerns robust and reliable speaker model training for text‐independent speaker verification. The baseline speaker modeling approach is the Gaussian mixture model (GMM). In text‐independent speaker verification, the amount of speech data may differ across speakers. However, we still want the modeling approach to perform equally well for all speakers. In addition, the modeling technique must be as robust as possible to unseen data. A traditional approach to GMM training is the expectation maximization (EM) method, which is known for its overfitting problem and its weakness in handling insufficient training data. To tackle these problems, variational approximation is proposed. Variational approaches are known to be robust against overtraining and data insufficiency. We evaluated the proposed approach on two different databases, namely KING and TFarsdat. The experiments show that the proposed approach improves the performance on the TFarsdat and KING databases by 0.56% and 4.81%, respectively. The experiments also show that the variationally optimized GMM is more robust against noise, and that the verification error rate in noisy environments for the TFarsdat dataset decreases by 1.52%.

10.

Adaptation of hidden Markov model for telephone speech recognition and speaker adaptation

Chien J.-T. Wang H.-C. 《Vision, Image and Signal Processing, IEE Proceedings -》1997,144(3):129-135

The authors propose a channel compensation method for the hidden Markov model (HMM) parameters in automatic speech recognition. The proposed approach adapts the existing reference models to a new channel environment using a small amount of adaptation data. HMM parameter adaptation that incorporates the corresponding phone-dependent channel compensation (PDCC) vectors is applied to improve speech recognition performance. Two extended PDCC techniques are presented. One is based on the refinement of PDCC using vector quantisation. The other is based on the interpolation of compensation vectors. Both techniques are evaluated in experiments on telephone speech recognition and speaker adaptation. The experimental results show that the performance can be significantly improved.

11.

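SVM speaker verification based on classified feature mapping

贺庆玮 李辉 许敏强 《Communications Technology》2010,43(3):147-149

To address the difficulty of building support vector machine (SVM) speaker models caused by the large amount of training data in text-independent speaker verification, this paper proposes a speaker verification system based on pitch-classified feature mapping and SVMs. Speech cepstral parameters are first divided into classes in feature space according to pitch period. Feature mapping is then performed using the GMM-UBM structure to obtain the speaker feature parameters in each feature subspace, from which SVM speaker models are built. Pitch-classified feature mapping not only greatly compresses the sample data but also makes the SVM decision boundaries in the subspaces more discriminative, so the overall system obtained by fusing the scores of the classified subsystems achieves better speaker verification performance. Experiments on the NIST'06 database demonstrate the effectiveness of the method.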

12.

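An interpolation adaptation algorithm based on Gaussian similarity analysis

吕萍 王作英 陆大 《Acta Electronica Sinica》2001,29(Z1):1759-1761

Fast speaker adaptation algorithms are important for speaker-independent continuous speech recognition applications. Most currently popular adaptation algorithms adapt only the means. The algorithm proposed here can quickly adapt the covariance matrices. It uses Gaussian similarity to measure the distance between covariance matrices and, based on this measure, builds a binary decision tree that reflects the structural relationships among the covariance matrices. Each intermediate node of the tree contains a class centroid. On the basis of the decision tree, multiple class centroids associated with speaker-dependent models are trained. During adaptation, the adapted covariance matrices are obtained by linear interpolation of these class centroids. Experimental results show that, with only one sentence of adaptation data, the method reduces the system error rate from 29.49% to 27.55%.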
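To put the reported figures in perspective, the absolute and relative error reductions can be computed directly from the two error rates quoted in the abstract. The helper name `relative_error_reduction` is hypothetical; only the two percentages come from the source.

```python
def relative_error_reduction(before_pct: float, after_pct: float) -> float:
    """Fraction of the original error removed by adaptation."""
    if before_pct <= 0.0:
        raise ValueError("baseline error rate must be positive")
    return (before_pct - after_pct) / before_pct


before, after = 29.49, 27.55  # error rates (%) quoted in the abstract

absolute = before - after
relative = relative_error_reduction(before, after)

print(f"absolute reduction: {absolute:.2f} percentage points")
print(f"relative reduction: {relative * 100:.2f}%")
```

The drop is 1.94 percentage points, or roughly 6.6% of the baseline error, obtained from a single adaptation sentence.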

13.

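Design of cyclic correlation matched filters  Cited by: 10 (self-citations: 0, citations by others: 10)

李虎生 刘加 刘润生 《Acta Electronica Sinica》2003,31(1):103-108

Based on spectral correlation analysis, the problem of optimal filtering of cyclostationary signals is discussed, and an analytical expression for the cyclic correlation matched filter under the maximum output signal-to-noise ratio (SNR) criterion is derived. However, because the performance of this filter depends on the chosen cycle frequency, the single-cycle-frequency cyclic correlation matched filter has an inherent defect: it does not make full use of the signal energy. A design method for a multi-cycle-frequency cyclic correlation matched filter bank is therefore studied, and the optimal structure of the filter bank is determined under the maximum output SNR criterion. Simulations compare the spectral correlation analysis method with the cyclic correlation matched filtering method, and the simulation results for amplitude-modulated and BPSK signals confirm the theoretical analysis.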

14.

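A speaker adaptation method combining maximum a posteriori estimation and nearest-neighbor linear regression

何磊 武健 方棣棠 吴文虎 《Acta Electronica Sinica》2000,28(11):55-58

This paper proposes a new speaker adaptation method that combines maximum a posteriori (MAP) estimation with nearest-neighbor linear regression (NNLR). Using model neighborhood information and the MAP adaptation results, a linear regression model is built to adjust the models for which no adaptation data are available. Experiments show that NNLR outperforms vector field smoothing (VFS), another model interpolation method used within the MAP adaptation framework.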

15.

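A sound source localization and tracking method based on weighted subspace fitting

金乃高 殷福亮 陈喆 《Journal of Electronics & Information Technology》2008,30(9):2134-2137

Microphone-array sound source localization offers an effective solution to estimating a speaker's spatial position in complex environments. Within the particle filtering framework, this paper proposes a weighted subspace fitting method for sound source localization and tracking. The method extends the cost function of the narrowband subspace fitting algorithm to the wideband case, constructs a likelihood function suitable for wideband speech signals, and estimates the source position in combination with a motion model of the speaker. Computer simulations and real-data measurements verify the effectiveness of the method.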

16.

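Likelihood score normalization and its application to text-independent speaker verification

邓浩江 杜利民 万洪杰 《Journal of Electronics & Information Technology》2005,27(7):1025-1029

This paper studies the principle of likelihood score normalization, builds a speaker verification system based on adapted GMMs, and proposes a hybrid normalization method that combines the speaker-independent background model with speaker-specific cohort models. Under telephone speech conditions, the effects of different score normalization methods on verification performance are compared. Experiments show that, on top of adapted-GMM likelihood ratio scores, hybrid normalization with T-cohort and the universal background model gives the best recognition results. At a false rejection rate of 5%, the method achieves a false acceptance rate of 0.5%, far below the 2% obtained with universal background model normalization alone.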
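The two ingredients named above, background-model normalization and cohort normalization, can be sketched as follows. The abstract does not give the formula of the hybrid method, so this sketch assumes one common arrangement: log-likelihood ratios against the universal background model are standardized by the mean and standard deviation of cohort scores. The function names and all numeric values are hypothetical.

```python
from statistics import mean, pstdev
from typing import List


def llr(model_loglik: float, ubm_loglik: float) -> float:
    """Background-model normalization: log-likelihood ratio vs. the UBM."""
    return model_loglik - ubm_loglik


def cohort_normalize(score: float, cohort_scores: List[float]) -> float:
    """Standardize a score by the cohort score distribution.

    Assumption: zero-mean, unit-variance scaling over the cohort.
    """
    sigma = pstdev(cohort_scores)
    if sigma == 0.0:
        raise ValueError("cohort scores have zero variance")
    return (score - mean(cohort_scores)) / sigma


# Hypothetical average log-likelihoods for one test utterance.
ubm_loglik = -45.0
claimed_loglik = -40.0
cohort_logliks = [-46.0, -44.0, -48.0, -42.0]

raw_score = llr(claimed_loglik, ubm_loglik)
cohort_scores = [llr(c, ubm_loglik) for c in cohort_logliks]
normalized = cohort_normalize(raw_score, cohort_scores)

threshold = 1.5  # hypothetical decision threshold
print(raw_score, normalized, normalized > threshold)
```

Normalizing against the cohort makes a single threshold more comparable across speakers and utterances; the threshold then sets the trade-off between false acceptance and false rejection.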

17.

Spatially constrained level-set tracking and segmentation of non-rigid objects

《Journal of Visual Communication and Image Representation》2016

Level-set is a widely used technique in segmentation-based tracking due to its flexibility in handling 2D topological changes and its computational efficiency. Most existing level-set models aim at grouping pixels that have similar features into a region, without considering the spatial relationship of these pixels. In this paper, we present a novel level-set tracking method that incorporates spatial information to improve the robustness and accuracy of tracking non-rigid objects. Both tracking and segmentation are performed in a unified probabilistic framework, with additional spatial constraints from a part-based model, the Hough Forests. In the tracking stage, the rigid motion of the target object is estimated by rigid registration in both the color space and the Hough voting space. Then, in the segmentation stage, support points are obtained from back-projection and guide the level-set evolution to capture the shape deformation. We conduct a quantitative evaluation on two recently proposed public benchmarks: a non-rigid object tracking dataset and the CVPR2013 online tracking benchmark, involving 61 sequences in total. The experimental results demonstrate that our tracking method performs comparably to the state of the art on the CVPR2013 benchmark, while showing significantly improved performance in tracking non-rigid objects.

18.

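High-performance Mandarin digit string speech recognition  Cited by: 9 (self-citations: 0, citations by others: 9)

李虎生 刘加 刘润生 《Acta Electronica Sinica》2001,29(5):595-599

This paper presents a high-performance speaker-independent continuous speech recognition system for Mandarin digit strings. Its acoustic model is based on Mel cepstral coefficients and continuous HMMs. Recognition uses a multi-candidate frame-synchronous search algorithm, and the MCE algorithm is used in training to improve the system's discriminative ability. Experiments show that the system's recognition rate is 94.8% (variable-length digit strings) and 96.8% (fixed-length digit strings). To make the system more practical, a MAP-based speaker adaptation algorithm and a confidence-based rejection algorithm are also studied. After adaptation, the error rate drops by more than 40% in relative terms. When 5% of correct utterances are rejected, the recognition rate rises to 96.9% (variable-length digit strings) and 98.7% (fixed-length digit strings).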

19.

Fast speaker adaptation using extended diagonal linear transformation for deep neural networks

Donghyun Kim Sanghun Kim 《ETRI Journal》2019,41(1):109-116

This paper explores new techniques that are based on a hidden‐layer linear transformation for fast speaker adaptation used in deep neural networks (DNNs). Conventional methods using affine transformations are ineffective because they require a relatively large number of parameters. Meanwhile, methods that employ singular‐value decomposition (SVD) are utilized because they are effective at reducing adaptive parameters. However, a matrix decomposition is computationally expensive when used in online services. We propose the use of an extended diagonal linear transformation method to minimize adaptation parameters without SVD to increase the performance level for tasks that require smaller degrees of adaptation. In Korean large vocabulary continuous speech recognition (LVCSR) tasks, the proposed method shows significant improvements, with error‐reduction rates of 8.4% and 17.1% in five and 50 conversational sentence adaptations, respectively. Compared with the adaptation methods using SVD, recognition performance increases with fewer parameters.

20.

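Hierarchical speaker recognition based on speaker classification  Cited by: 3 (self-citations: 0, citations by others: 3)

刘文举 孙兵 钟秋海 《Acta Electronica Sinica》2005,33(7):1230-1233

Recognition accuracy and noise robustness are certainly the research focus in speaker recognition, but recognition response speed is also key to making systems practical. This paper proposes a hierarchical speaker identification method based on speaker classification, which greatly increases system speed. Its advantage over traditional speaker identification methods becomes more pronounced as the number of enrolled speakers grows. In speaker verification, the method further improves verification accuracy and effectively reduces the false acceptance and false rejection rates. The confidence scoring method proposed here also improves system performance to some extent. Experiments show that the speaker identification method based on speaker classification speeds up the system by 3.5 times on average, and that in speaker verification the equal error rate and the minimum error rate fall by 53.75% on average.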