Similar literature
19 similar documents found.
1.
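Building on an in-depth analysis of the maximum likelihood linear regression (MLLR) model adaptation algorithm based on adaptive regression trees in speech recognition, this paper proposes a target-driven multi-layer MLLR adaptation (TMLLR) algorithm. Following a target-driven principle, the algorithm introduces a feedback mechanism and dynamically determines the transformation classes of the MLLR transforms according to the increase in the likelihood of the objective function, which substantially improves the recognition rate of the system. In addition, the multi-layer structure of the algorithm eliminates many redundant intermediate computations, so it combines high adaptation accuracy with fast adaptation. In supervised adaptation experiments, the system adapted with this algorithm had an error rate 10% lower than the system adapted with regression-tree-based MLLR, and adaptation was nearly twice as fast.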

2.
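This paper first applies the ideas and principles of the latent semantic indexing (LSI) method from text information retrieval to handwritten digit recognition. Handwritten digit images are treated as vectors in a space, and recognition is achieved by ranking the correlation between an unknown digit and each training set; the computational cost is small and the error rate is low (5.5%). Second, the training samples of all digits 0-9 are arranged into a single matrix, a singular value decomposition of that matrix is computed, and each training sample is projected onto a suitable number of left singular vectors. This yields a correlation measure under a low-rank representation that preserves the low error rate while greatly compressing the original training data (more than 95% of the data is removed). Finally, by exploiting the idea of separating out irregular samples, an even lower error rate is obtained (down to 4.5%).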
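
The low-rank correlation step described in this abstract can be illustrated with a minimal sketch. It assumes NumPy, cosine similarity as the correlation measure, and nearest-sample classification; the function names `fit_lsi` and `classify`, and the choice of `k`, are hypothetical and are not taken from the paper.

```python
import numpy as np


def fit_lsi(train, k):
    """Build a rank-k LSI basis from training images.

    train: array of shape (n_pixels, n_samples), one flattened image per column.
    Returns the first k left singular vectors and the projected training samples.
    """
    u, _, _ = np.linalg.svd(train, full_matrices=False)
    basis = u[:, :k]                 # (n_pixels, k)
    projected = basis.T @ train      # (k, n_samples): compressed training data
    return basis, projected


def classify(image, basis, projected, labels):
    """Label an unknown image by its most correlated training sample.

    Correlation is measured as cosine similarity in the k-dimensional space
    (an assumption; the abstract does not specify the exact measure).
    """
    q = basis.T @ image
    norms = np.linalg.norm(projected, axis=0) * np.linalg.norm(q) + 1e-12
    sims = (projected.T @ q) / norms
    return labels[int(np.argmax(sims))]
```

Only `basis` and `projected` need to be stored after fitting, which is where the compression of the training data comes from when `k` is much smaller than the number of pixels.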

3.
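Traditional Mongolian and Cyrillic Mongolian are the Mongolian scripts used in China and in Mongolia, respectively. Their spoken forms are essentially the same, but their written forms are completely different. Drawing on the word-formation characteristics of both scripts, this paper proposes a method based on joint sequence models for converting between Traditional Mongolian and Cyrillic Mongolian, and reports extensive conversion experiments in both directions. In the experiments, the Traditional-to-Cyrillic conversion system reached a word error rate of 18.38% and a letter error rate of 6.75%, and the Cyrillic-to-Traditional conversion system reached a word error rate of 18.77% and a letter error rate of 7.14%, which essentially meets practical requirements.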
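
The two metrics quoted in this abstract can be illustrated with a short sketch. It assumes that the word error rate is the fraction of words whose converted form differs from the reference, and that the letter error rate is the total edit distance divided by the total number of reference letters; the abstract does not state the exact definitions, and the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: substitutions, insertions and deletions cost 1 each."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def word_error_rate(references, hypotheses):
    """Fraction of words whose converted form is not identical to the reference."""
    wrong = sum(r != h for r, h in zip(references, hypotheses))
    return wrong / len(references)


def letter_error_rate(references, hypotheses):
    """Total letter-level edit distance divided by total reference length."""
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return errors / sum(len(r) for r in references)
```

Under these definitions a single wrong letter makes the whole word count as an error, which is consistent with the word error rate being noticeably higher than the letter error rate.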

4.
张俊 (Zhang Jun), 关胜晓 (Guan Shengxiao). 《计算机应用》 (Journal of Computer Applications), 2015, 35(7): 2101-2104
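To address the efficiency of current speaker recognition systems, a new speaker recognition framework is proposed that follows an integrated-algorithm strategy. First, because the traditional vector quantization maximum a posteriori (VQ-MAP) algorithm considers only the mean vectors and ignores the weights, an improved VQ-MAP algorithm is proposed that uses weighted mean vectors in place of mean vectors. Next, because the support vector machine (SVM) algorithm is relatively time-consuming, the least squares support vector machine (LS-SVM) is used in its place. Finally, in the speaker recognition system, the parameter set obtained by the improved VQ-MAP algorithm serves as the training samples for the LS-SVM. Experimental results show that, with both using a radial basis function (RBF) kernel, the integrated algorithm based on improved VQ-MAP and LS-SVM reduces modeling time on a 40-speaker data set by nearly 40% compared with the traditional SVM algorithm. With a threshold of 1 and a test utterance length of 4 s, compared with traditional VQ-MAP combined with SVM, the false acceptance rate falls by 1.1%, the false rejection rate falls by 2.9%, and the recognition rate rises by 3.9%. Under the same conditions, compared with traditional VQ-MAP combined with LS-SVM, the false acceptance rate falls by 3.6%, the false rejection rate falls by 2.7%, and the recognition rate rises by 4.4%. These results indicate that the integrated algorithm effectively improves the recognition rate, markedly reduces computation time, and lowers both the false acceptance rate and the false rejection rate.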

5.
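For Mongolian grapheme-to-phoneme (G2P) conversion, this paper proposes a rule-based Mongolian G2P conversion method and a Mongolian G2P conversion method based on joint sequence models. Experimental results show that the joint-sequence-model method clearly outperforms the rule-based method. The resulting joint-sequence-model Mongolian G2P conversion system has a word error rate of 16.32% and a phoneme error rate of only 3.37%, which meets practical requirements.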

6.
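This paper describes how an integration of multiple techniques was used to solve the major problem of excessively high error rates in printed Chinese character recognition systems, and confirms the advantage of this integration through work on an integrated system. Because the recognition methods complement one another, the approach not only raises recognition accuracy but also greatly lowers the error rate. A system developed with this integrated approach was tested on real articles totaling one million characters: its recognition rate exceeds 98% and its error rate is below 0.2%, and for Chinese characters in particular the error rate is below 0.1%.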

7.
A speaker clustering algorithm for speech recognition   Total citations: 1, self-citations: 1, citations by others: 1
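This paper presents a speaker clustering algorithm for robust speech recognition, covering its role and concrete usage in speech recognition, the features and distance measures commonly used in clustering, and the concrete steps of the clustering procedure. The performance of the algorithm was tested in two ways: first, by directly computing the accuracy of sentence clustering; second, by its contribution to speaker adaptation, that is, by comparing system performance with and without the algorithm. Experiments show that, with the GLR distance as the distance measure, the algorithm reaches a sentence clustering accuracy of 85169 %. In recognition experiments, the clustering algorithm makes more data available for speaker adaptation and improves the adaptation effect, and the system error rate comes close to the error rate obtained when adaptation uses known speaker information.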

8.
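Because training data is often insufficient in practice, speakers are modeled with the Gaussian mixture model-universal background model (GMM-UBM). To address the heavy computational cost of the MCE training algorithm, an improved MCE algorithm is proposed that both reduces the amount of computation and gives better recognition performance. Experimental results show that, across different numbers of Gaussian mixture components and different numbers of speakers, the improved MCE algorithm always requires less training time than the traditional MCE algorithm, and the savings grow as the number of mixture components and the number of speakers increase. When the MCE algorithm and the improved MCE algorithm are applied to speaker models obtained by adaptation with MAP, MLLR, MAP/MLLR, and EigenVoice, the improved MCE algorithm achieves a higher recognition rate than the traditional MCE method.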

9.
A hand shape verification algorithm based on template matching   Total citations: 5, self-citations: 0, citations by others: 5
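Identity authentication is an important means of ensuring information and network security, and hand shape verification is one of its important methods. Traditional hand shape recognition methods fall roughly into two groups: feature vector matching and point pattern matching. The former matches and verifies hand shapes by computing feature vectors such as the lengths and widths of the hand; it is computationally cheap, but its false acceptance rate is rather high. The latter represents the hand contour image as a set of feature points and then matches the feature point sets of two hand shapes; its false acceptance rate is lower, but its computational cost and false rejection rate are relatively high. For these reasons, neither approach has been widely adopted. This paper proposes a template-based point matching algorithm that largely resolves the excessive computational cost of point pattern matching while also improving the verification recognition rate. The verification process additionally uses direction angles and dilation-erosion correction, which effectively improves the template matching speed and the false rejection rate and thereby greatly strengthens the robustness of the verification process.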

10.
An improved algorithm for contour-based three-dimensional face recognition   Total citations: 3, self-citations: 0, citations by others: 3
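An existing contour-based face recognition method is improved. With the face in an arbitrary position, PCA is used to determine the vertical direction of the face automatically, and a mesh registration method is used to extract the symmetry plane and the symmetry profile. Three further transverse contours are extracted by computing the curvature along the symmetry profile. The four extracted contours are resampled and normalized, and the informative parts of the contours are cropped and used as input to the ICP algorithm for face recognition. Experiments show that the algorithm raises the face recognition rate from the original 86.5% to 94% and lowers the error rate.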

11.
12.
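Because pronunciation differences among Uyghur speakers degrade the performance of Uyghur speech recognition systems to some extent, this paper studies speaker adaptation techniques. The commonly used MLLR and MAP methods, and a method combining MLLR and MAP, are applied to acoustic model training for Uyghur continuous speech recognition, and the acoustic models adapted with each of the three methods are evaluated in recognition experiments on a test set. The results show that MLLR, MAP, and MAP+MLLR adaptation reduce the word error rate of the baseline recognition system by 0.6%, 2.34%, and 2.57%, respectively.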

13.
We are interested in the problem of robust understanding from noisy spontaneous speech input. With the advances in automated speech recognition (ASR), there has been increasing interest in spoken language understanding (SLU). A challenge in large vocabulary spoken language understanding is robustness to ASR errors. State of the art spoken language understanding relies on the best ASR hypotheses (ASR 1-best). In this paper, we propose methods for a tighter integration of ASR and SLU using word confusion networks (WCNs). WCNs obtained from ASR word graphs (lattices) provide a compact representation of multiple aligned ASR hypotheses along with word confidence scores, without compromising recognition accuracy. We present our work on exploiting WCNs instead of simply using ASR one-best hypotheses. In this work, we focus on the tasks of named entity detection and extraction and call classification in a spoken dialog system, although the idea is more general and applicable to other spoken language processing tasks. For named entity detection, we have improved the F-measure by using both word lattices and WCNs, 6–10% absolute. The processing of WCNs was 25 times faster than lattices, which is very important for real-life applications. For call classification, we have shown between 5% and 10% relative reduction in error rate using WCNs compared to ASR 1-best output.
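
To make the contrast between a word confusion network and the 1-best hypothesis concrete, here is a minimal sketch. It assumes a WCN can be represented as a list of slots, each mapping candidate words to posterior probabilities, with a hypothetical `<eps>` token for "no word"; the representation, the function names, and the product-of-posteriors score are illustrative assumptions, not the authors' implementation.

```python
EPS = "<eps>"  # hypothetical marker for an empty (deleted) word in a slot


def one_best(wcn):
    """Pick the highest-posterior word in every slot and drop empty words."""
    words = [max(slot, key=slot.get) for slot in wcn]
    return [w for w in words if w != EPS]


def find_phrase(wcn, phrase):
    """Search consecutive slots for a phrase among all alternatives.

    Returns (start_slot, score) for the best-scoring match, where the score is
    the product of the word posteriors, or None if the phrase never occurs.
    """
    best = None
    for start in range(len(wcn) - len(phrase) + 1):
        score = 1.0
        for offset, word in enumerate(phrase):
            posterior = wcn[start + offset].get(word)
            if posterior is None:
                break
            score *= posterior
        else:  # every word of the phrase was found in its slot
            if best is None or score > best[1]:
                best = (start, score)
    return best


wcn = [
    {"fly": 0.6, "five": 0.4},
    {"to": 0.9, EPS: 0.1},
    {"boston": 0.55, "austin": 0.45},
]
```

In this toy network `one_best(wcn)` yields "fly to boston", so a lookup restricted to the 1-best output would miss "austin", while `find_phrase(wcn, ["to", "austin"])` still finds it together with a confidence score that can be thresholded.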

14.
One of the key issues for adaptation algorithms is to modify a large number of parameters with only a small amount of adaptation data. Speaker adaptation techniques try to obtain near speaker-dependent (SD) performance with only small amounts of speaker-specific data, and are often based on initial speaker-independent (SI) recognition systems. Some of these speaker adaptation techniques may also be applied to the task of adaptation to a new acoustic environment. In this case an SI recognition system trained in, typically, a clean acoustic environment is adapted to operate in a new, noise-corrupted, acoustic environment. This paper examines the maximum likelihood linear regression (MLLR) adaptation technique. MLLR estimates linear transformations for groups of model parameters to maximize the likelihood of the adaptation data. Previously, MLLR has been applied to the mean parameters in mixture-Gaussian HMM systems. In this paper MLLR is extended to also update the Gaussian variances and re-estimation formulae are derived for these variance transforms. MLLR with variance compensation is evaluated on several large vocabulary recognition tasks. The use of mean and variance MLLR adaptation was found to give an additional 2% to 7% decrease in word error rate over mean-only MLLR adaptation.
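
For orientation, the mean and variance transforms discussed in this abstract are commonly written as follows. This is a sketch of one standard formulation; the symbols ($W$, $A$, $b$, $\xi_m$, $H$, $B_m$, $C_m$) are assumed notation and are not taken from the abstract itself.

```latex
% Mean transform for Gaussian component m in a regression class:
\hat{\mu}_m = A\,\mu_m + b = W\,\xi_m,
\qquad
\xi_m = \begin{bmatrix} 1 \\ \mu_m \end{bmatrix},
\quad
W = \begin{bmatrix} b & A \end{bmatrix}

% Variance transform, with B_m the inverse of the Cholesky factor of the precision:
\hat{\Sigma}_m = B_m^{\top} H\, B_m,
\qquad
\Sigma_m^{-1} = C_m C_m^{\top},
\quad
B_m = C_m^{-1}
```

With $H = I$ the second expression reduces to $\hat{\Sigma}_m = \Sigma_m$, so the variance transform leaves the model unchanged until the adaptation data moves $H$ away from the identity. Both $W$ and $H$ are shared by all Gaussians in a regression class, which is how a small amount of data can update many parameters.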

15.
Many automatic speech recognition (ASR) systems rely solely on pronunciation dictionaries and language models to take into account information about language. Implicitly, morphology and syntax are to a certain extent embedded in the language models but the richness of such linguistic knowledge is not exploited. This paper studies the use of morpho-syntactic (MS) information in a post-processing stage of an ASR system, by reordering N-best lists. Each sentence hypothesis is first part-of-speech tagged. A morpho-syntactic score is computed over the tag sequence with a long-span language model and combined with the acoustic and word-level language model scores. This new sentence-level score is finally used to rescore N-best lists by reranking or consensus. Experiments on a French broadcast news task show that morpho-syntactic knowledge improves the word error rate and confidence measures. In particular, it was observed that the errors corrected are not only agreement errors and errors on short grammatical words but also other errors on lexical words where the hypothesized lemma was modified.
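
The reranking step can be sketched as a weighted sum of log scores. This is an illustrative assumption about the form of the combination: the field names (`acoustic`, `lm`, `ms`), the weights, and the toy scores are hypothetical, and the paper's actual combination and its consensus variant may differ.

```python
def rescore(nbest, lm_weight, ms_weight):
    """Rerank hypotheses by a weighted sum of log scores, best first.

    Each hypothesis is a dict with the word sequence and three log scores:
    'acoustic', 'lm' (word-level language model) and 'ms' (morpho-syntactic
    score computed over the part-of-speech tag sequence).
    """
    def total(hyp):
        return hyp["acoustic"] + lm_weight * hyp["lm"] + ms_weight * hyp["ms"]

    return sorted(nbest, key=total, reverse=True)


# Toy 2-best list with a French agreement error in the first hypothesis.
nbest = [
    {"words": "les chats dort", "acoustic": -100.0, "lm": -20.0, "ms": -15.0},
    {"words": "les chats dorment", "acoustic": -101.0, "lm": -20.5, "ms": -8.0},
]
```

With `ms_weight=0.0` the first hypothesis stays on top; with `ms_weight=1.0` the tag-sequence score promotes the hypothesis with correct agreement, which is the kind of correction the abstract reports.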

16.
In this paper, we focus on information extraction from optical character recognition (OCR) output. Since the content from OCR inherently has many errors, we present robust algorithms for information extraction from OCR lattices instead of merely looking them up in the top-choice (1-best) OCR output. Specifically, we address the challenge of named entity detection in noisy OCR output and show that searching for named entities in the recognition lattice significantly improves detection accuracy over 1-best search. While lattice-based named entity (NE) detection improves NE recall from OCR output, there are two problems with this approach: (1) the number of false alarms can be prohibitive for certain applications and (2) lattice-based search is computationally more expensive than 1-best NE lookup. To mitigate the above challenges, we present techniques for reducing false alarms using confidence measures and for reducing the amount of computation involved in performing the NE search. Furthermore, to demonstrate that our techniques are applicable across multiple domains and languages, we experiment with optical character recognition systems for videotext in English and scanned handwritten text in Arabic.

17.
In this paper we are concerned with the problem of the adaptation of non-native speech in a large-vocabulary speech recognition system for Modern Standard Arabic (MSA). A technique to adapt Hidden Markov Models (HMMs) to foreign accents by using Genetic Algorithms (GAs) in unsupervised mode is presented. The implementation requirements of GAs, such as genetic operators and objective function, have been selected to give more reliability to a global linear transformation matrix. The Minimum Phone Error (MPE) criterion is used as an objective function. The West Point Language Data Consortium (LDC) Modern Standard Arabic database is used throughout our experiments. Results show that a significant decrease in word error rate has been achieved by the evolutionary-based approach compared to conventional Maximum Likelihood Linear Regression (MLLR) and Maximum a posteriori (MAP) techniques and to the adaptation combining MLLR and MPE-based training.

18.
We introduce a strategy for modeling speaker variability in speaker adaptation based on maximum likelihood linear regression (MLLR). The approach uses a speaker-clustering procedure that models speaker variability by partitioning a large corpus of speakers in the eigenspace of their MLLR transformations and learning cluster-specific regression class tree structures. We present experiments showing that choosing the appropriate regression class tree structure for speakers leads to a significant reduction in overall word error rates in automatic speech recognition systems. To realize these gains in unsupervised adaptation, we describe an algorithm that produces a linear combination of MLLR transformations from cluster-specific trees using weights estimated by maximizing the likelihood of a speaker’s adaptation data. This algorithm produces small improvements in overall recognition performance across a range of tasks for both English and Mandarin. More significantly, distributional analysis shows that it reduces the number of speakers with performance loss due to adaptation across a range of adaptation data sizes and word error rates.

19.
This paper proposes the use of Maximum A Posteriori Linear Regression (MAPLR) transforms as features for language recognition. Rather than estimating the transforms using maximum likelihood linear regression (MLLR), MAPLR incorporates prior information about the transforms into the estimation process, using maximum a posteriori (MAP) as the estimation criterion to derive the transforms. Through multiple MAPLR adaptations, each spoken utterance is converted to one discriminative transform supervector consisting of one target-language transform vector and the other non-target transform vectors. SVM classifiers are employed to model the discriminative MAPLR transform supervectors. This system achieves performance comparable to that obtained with state-of-the-art approaches and better than MLLR. Experimental results on the 2007 NIST Language Recognition Evaluation (LRE) databases show relative reductions of 4% in EER and 9% in minimum cost when the language recognition system uses MAPLR instead of MLLR in the 30-s tasks, and further improvement is gained by combining it with state-of-the-art systems. The combination leads to gains of 6% in EER and 11% in minDCF compared with the combination of only the MMI system and the GMM-SVM system.
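
The EER figures and relative reductions quoted above can be made concrete with a small sketch. It assumes the equal error rate is approximated by sweeping thresholds over the observed scores and taking the point where the false acceptance and false rejection rates are closest; evaluation toolkits usually interpolate, and the function names are hypothetical.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping a decision threshold over all observed scores."""
    best_gap, eer = None, None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection: target trials scoring below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        # False acceptance: non-target trials scoring at or above the threshold.
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_reduction(baseline, improved):
    """Relative reduction of an error metric, e.g. 0.04 for a 4% relative gain."""
    return (baseline - improved) / baseline
```

A relative reduction is measured against the baseline value, so a 4% relative decline in EER from a baseline of 10% corresponds to an absolute change of only 0.4 percentage points.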


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号