共查询到20条相似文献,搜索用时 31 毫秒
1.
本征音子说话人自适应算法在自适应数据量充足时可以取得很好的自适应效果,但在自适应数据量不足时会出现严重的过拟合现象。为此该文提出一种基于本征音子说话人子空间的说话人自适应算法来克服这一问题。首先给出基于隐马尔可夫模型-高斯混合模型(HMM-GMM)的语音识别系统中本征音子说话人自适应的基本原理。其次通过引入说话人子空间对不同说话人的本征音子矩阵间的相关性信息进行建模;然后通过估计说话人相关坐标矢量得到一种新的本征音子说话人子空间自适应算法。最后将本征音子说话人子空间自适应算法与传统说话人子空间自适应算法进行了对比。基于微软语料库的汉语连续语音识别实验表明,与本征音子说话人自适应算法相比,该算法在自适应数据量极少时能大幅提升性能,较好地克服过拟合现象。与本征音自适应算法相比,该算法以较小的性能牺牲代价获得了更低的空间复杂度而更具实用性。 相似文献
2.
Patrick Nguyen Roland Kuhn Jean-Claude Junqua Nancy Niedzielski Christian Wellekens 《电信纪事》2000,55(3-4):163-171
In this article, we present a new approach to modeling speaker-dependent systems. The approach was inspired by the eigenfaces techniques used in face recognition. We build a linear vector space of low dimensionality, called eigenspace, in which speakers are located. The basis vectors of this space are called eigenvoices. Each eigenvoice models a direction of inter-speaker variability. The eigenspace is built during the training phase. Then, any speaker model can be expressed as a linear combination of eigenvoices. The benefits of this technique as set forth in this article reside in the reduction of the number of parameters that describe a model. Thereby we are able to reduce the number of parameters to estimate, as well as computation and/or storage costs. We apply the approach to speaker adaptation and speaker recognition. Some experimental results are supplied. 相似文献
3.
该文提出一种基于最大似然可变子空间的说话人自适应方法。在训练阶段,对训练集中的说话人相关模型参数进行主分量分析,得到一组说话人基矢量;在自适应阶段,通过最大似然准则选取与当前说话人相关性最大的基矢量子集,进而将新的说话人相关模型限制在这组基矢量所张成的说话人子空间中,通过求解每一个基矢量对应的系数从而进行说话人自适应。与经典的基于子空间的说话人自适应方法不同,该文中的说话人子空间是在自适应阶段动态选取的,所需要估计的参数更少,在少量自适应数据下可以得到更稳健的自适应结果。在基于微软语料库的连续语音识别自适应实验中,给定极少量自适应数据(小于5 s),在有监督和无监督条件下,该文方法均优于经典的本征音自适应方法和基于最大似然线性回归的方法。 相似文献
4.
基于码本的说话人自适应方法 总被引:1,自引:0,他引:1
本文提出了一种基于码本的说话人自适应方法.它可以将变换方法和Bayes估计法这两大类说话人自适应方法的优点有机的结合起来,既能实现快速的说话人自适应,还具有良好的一致渐进性.自适应过程可分为两个阶段:在第一阶段,用由大量参考说话人的语音码本构成的线性组合来逼近用户的语音码本.此时只需要很少的自适应训练数据就可以用基于Rosen梯度投影法的优化算法计算出线性组合中各码本的最佳权值.在第二阶段,码本的最佳线性组合被用作用户码本的先验估计值.随着更多自适应训练数据的获得,系统对用户码本进一步进行Bayes估计,从而可以实现累进的自适应.作者将该方法应用于说话人无关的连续汉语语音识别系统.一系列的对比实验表明该自适应方法很有前途. 相似文献
5.
针对广播电视新闻节目中的音频信息的特点,利用说话人检测技术,提出了目标语音的检测和定位算法,可以快速挖掘和定位特定发言人或主持人的相关信息,效果良好。 相似文献
6.
7.
In this paper, we discuss the power conservative indexing techniques for managing multi-attribute data broadcast on wireless channels. These indexing techniques, namely, index tree, signature and hybrid, aim at improving the battery power consumption of mobile clients. By taking into account the broadcast management factors such as clustering and scheduling, these three indexing schemes may significantly reduce tune-in time while maintaining a reasonable access time. Cost models for single and multi-attribute query processing are developed. Our performance evaluation shows that the signature and hybrid methods are superior to the index tree method. 相似文献
8.
Benoît Maison Chalapathy Neti Andrew Senior 《The Journal of VLSI Signal Processing》2001,29(1-2):71-79
Audio-based speaker identification degrades severely when there is a mismatch between training and test conditions due either to channel or to noise. In this paper, we explore various techniques to combine video based speaker identification with audio-based speaker identification to improve the performance under mismatched conditions. Specifically, we explore techniques to optimally determine the relative weights of the independent decisions based on audio and video to achieve the best combination. Experiments on video broadcast news data show that significant improvements can be achieved by the fusion in acoustically degraded conditions. 相似文献
9.
This paper concerns robust and reliable speaker model training for text‐independent speaker verification. The baseline speaker modeling approach is the Gaussian mixture model (GMM). In text‐independent speaker verification, the amount of speech data may be different for speakers. However, we still wish the modeling approach to perform equally well for all speakers. Besides, the modeling technique must be least vulnerable against unseen data. A traditional approach for GMM training is expectation maximization (EM) method, which is known for its overfitting problem and its weakness in handling insufficient training data. To tackle these problems, variational approximation is proposed. Variational approaches are known to be robust against overtraining and data insufficiency. We evaluated the proposed approach on two different databases, namely KING and TFarsdat. The experiments show that the proposed approach improves the performance on TFarsdat and KING databases by 0.56% and 4.81%, respectively. Also, the experiments show that the variationally optimized GMM is more robust against noise and the verification error rate in noisy environments for TFarsdat dataset decreases by 1.52%. 相似文献
10.
The authors propose a channel compensation method for the hidden Markov model (HMM) parameters in automatic speech recognition. The proposed approach is to adapt the existing reference models to a new channel environment by using a small amount of adaptation data. The concept of HMM parameter adaptation by incorporating the corresponding phone-dependent channel compensation (PDCC) vectors is applied to improve the performance of speech recognition. Two extended PDCC techniques are presented. One is based on the refinement of PDCC using vector quantisation. The other is based on the interpolation of compensation vectors. Both techniques are evaluated on the experiments on telephone speech recognition and speaker adaptation. The experimental results show that the performance can be significantly improved 相似文献
11.
为了解决与文中无关的话者确认,大量训练样本数据所导致的建立支持向量机SVM(SupportVectorMachine)话者模型困难,文中提出了一种基于基音分类特征映射和支持向量机的话者确认系统,首先根据基音周期将语音倒谱参数在特征空间上分类,再利用GMM-UBM结构进行特征映射,获得每个特征子空间中的话者特征参数并建立SVM话者模型。基音分类特征映射不仅使得样本数据极大地压缩,而且让子空间中SVM分类界面具有更好的区分性,因此,对各分类子系统评分融合之后的总系统具有更好话者确认性能。在NIST’06数据库上的实验证明了该方法的有效性。 相似文献
12.
快速说话人自适应算法在非特定人连续语音识别的应用中有重要意义.现在流行的自适应算法多数只考虑均值的自适应.本文提出的自适应算法可以快速的对协方差矩阵进行自适应.该算法是用高斯相似度度量协方差矩阵间的距离,并由此测度建立了反映协方差矩阵结构关系的二叉决策树.树的每个中间节点包含一个类质心.在决策树基础上,训练多个与特定人模型相关的类质心.自适应时,通过对这些类质心进行线性插值得到自适应的协方差矩阵.实验结果表明,该方法能够在仅有一句自适应数据的情况下,使系统误识率由29.49%下降到27.55%. 相似文献
13.
循环相关匹配滤波器设计 总被引:10,自引:0,他引:10
在谱相关分析的基础上,讨论了对循环平稳信号进行最佳滤波的问题,推导得到了基于最大输出信噪比准则的循环相关匹配滤波器的解析表式.然而,由于该滤波器性能与所选取的循环频率是相关的,单循环频率循环相关匹配滤波器存在固有的缺陷-信号能量利用不充分.为此,研究了多循环频率循环相关匹配滤波器组的设计方法,在最大输出信噪比准则约束下确定了滤波器组的优化结构.仿真实验比较了谱相关分析方法和循环相关匹配滤波方法,对调幅信号和BPSK信号的仿真实验结果证实了文章理论分析得到的结果. 相似文献
14.
15.
16.
17.
Level-set is a widely used technique in segmentation-based tracking due to its flexibility in handling 2D topological changes and computational efficiency. Most existing level-set models aim at grouping pixels that have similar features into a region, without consideration of the spatial relationship of these pixels. In this paper, we present a novel level-set tracking method that incorporates spatial information to improve the robustness and accuracy of tracking non-rigid objects. Both tracking and segmentation are performed in a unified probabilistic framework, with additional spatial constraints from a part-based model—the Hough Forests. In the stage of tracking, the rigid motion of the target object is estimated by rigid registration in both the color space and the Hough voting space. Then in the stage of segmentation, some support points are obtained from back-projection, and guide the level-set evolution to capture the shape deformation. We conduct quantitative evaluation on two recently proposed public benchmarks: a non-rigid object tracking dataset and the CVPR2013 online tracking benchmark, involving 61 sequences in total. The experimental results demonstrate that our tracking method performs comparably to the state-of-the-arts in the CVPR2013 benchmark, while shows significantly improved performance in tracking non-rigid objects. 相似文献
18.
本文给出了一个高性能汉语数码串非特定人连续语音识别系统,其声学模型基于Mel倒谱系数和连续HMM,识别时采用多候选帧同步搜索算法,并采用了MCE算法进行训练以提高系统的区分能力,实验证明该系统的识别率为94.8%(不定长数字串)和96.8%(定长数字串).为增强系统的实用性,本文还研究了基于MAP算法的说话人自适应算法和基于置信度的拒识算法.在进行自适应后,误识率可相对下降40%以上,在拒绝掉5%的正确语音时,系统识别率可以上升到96.9%(不定长数字串)和98.7%(定长数字串). 相似文献
19.
This paper explores new techniques that are based on a hidden‐layer linear transformation for fast speaker adaptation used in deep neural networks (DNNs). Conventional methods using affine transformations are ineffective because they require a relatively large number of parameters to perform. Meanwhile, methods that employ singular‐value decomposition (SVD) are utilized because they are effective at reducing adaptive parameters. However, a matrix decomposition is computationally expensive when using online services. We propose the use of an extended diagonal linear transformation method to minimize adaptation parameters without SVD to increase the performance level for tasks that require smaller degrees of adaptation. In Korean large vocabulary continuous speech recognition (LVCSR) tasks, the proposed method shows significant improvements with error‐reduction rates of 8.4% and 17.1% in five and 50 conversational sentence adaptations, respectively. Compared with the adaptation methods using SVD, there is an increased recognition performance with fewer parameters. 相似文献
20.
识别正确率和抗噪性能固然是说话人识别的研究重点,但识别响应速度也是决定系统实用化的关键所在.本文成功地提出了基于说话人分类技术的分级说话人辨识方法,极大地提高了系统运行速度,随着注册说话人数的增多,较之传统的说话人辨识方法,其优势更加明显.同时在说话人确认中,该方法的使用,进一步提高了确认的正确率,有效地降低了错误接受和错误拒绝率.本文提出的可信度打分方法,也一定程度上改进了系统的性能.实验表明:基于说话人分类技术的说话人辨识方法使系统的运行速度平均提高了3.5倍,对说话人确认等误识率和最小误识率平均下降了53.75%. 相似文献