首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
This paper addresses the problem of recognizing a vocabulary of over 50,000 city names in a telephone access spoken dialogue system. We adopt a two-stage framework in which only major cities are represented in the first stage lexicon. We rely on an unknown word model encoded as a phone loop to detect OOV city names (referred to as ‘rare city’ names). We use SpeM, a tool that can extract words and word-initial cohorts from phone graphs from a large fallback lexicon, to provide an N-best list of promising city name hypotheses on the basis of the phone graph corresponding to the OOV. This N-best list is then inserted into the second stage lexicon for a subsequent recognition pass.Experiments were conducted on a set of spontaneous telephone-quality utterances; each containing one rare city name. It appeared that SpeM was able to include nearly 75% of the correct city names in an N-best hypothesis list of 3000 city names. With the names found by SpeM to extend the lexicon of the second stage recognizer, a word accuracy of 77.3% could be obtained. The best one-stage system yielded a word accuracy of 72.6%. The absolute number of correctly recognized rare city names almost doubled, from 62 for the best one-stage system to 102 for the best two-stage system. However, even the best two-stage system recognized only about one-third of the rare city names retrieved by SpeM. The paper discusses ways for improving the overall performance in the context of an application.  相似文献   

2.
Recently, minimum perfect hashing (MPH)-based language model (LM) lookup methods have been proposed for fast access of N-gram LM scores in lexical-tree based LVCSR (large vocabulary continuous speech recognition) decoding. Methods of node-based LM cache and LM context pre-computing (LMCP) have also been proposed to combine with MPH for further reduction of LM lookup time. Although these methods are effective, LM lookup still takes a large share of overall decoding time when trigram LM lookahead (LMLA) is used for lower word error rate than unigram or bigram LMLAs. Besides computation time, memory cost is also an important performance aspect of decoding systems. Most speedup methods for LM lookup obtain higher speed at the cost of increased memory demand, which makes system performance unpredictable when running on computers with smaller memory capacities. In this paper, an order-preserving LM context pre-computing (OPCP) method is proposed to achieve both fast speed and small memory cost in LM lookup. By reducing hashing operations through order-preserving access of LM scores, OPCP cuts down LM lookup time effectively. In the meantime, OPCP significantly reduces memory cost because of reduced size of hashing keys and the need for only last word index of each N-gram in LM storage. Experimental results are reported on two LVCSR tasks (Wall Street Journal 20K and Switchboard 33K) with three sizes of trigram LMs (small, medium, large). In comparison with above-mentioned existing methods, OPCP reduced LM lookup time from about 30–80% of total decoding time to about 8–14%, without any increase of word error rate. Except for the small LM, the total memory cost of OPCP for LM lookup and storage was about the same or less than the original N-gram LM storage, much less than the compared methods. The time and memory savings in LM lookup by using OPCP became more pronounced with the increase of LM size.  相似文献   

3.
近年来大词汇量连续语音识别技术得到了迅速的发展,国内外研究机构加大了对汉语和英语语音识别技术的研究,然而,维吾尔语语音识别技术的研究工作最近才起步。建立了面向大词汇量的维吾尔语语音语料库,研究了维吾尔语声学模型和语言模型建模技术、解码技术,进行了面向大词汇量的维吾尔语连续语音识别实验。对维吾尔语大词汇量连续语音识别技术进一步发展中存在的问题进行了讨论。  相似文献   

4.
We describe the automatic determination of a large and complicated acoustic model for speech recognition by using variational Bayesian estimation and clustering (VBEC) for speech recognition. We propose an efficient method for decision tree clustering based on a Gaussian mixture model (GMM) and an efficient model search algorithm for finding an appropriate acoustic model topology within the VBEC framework. GMM-based decision tree clustering for triphone HMM states features a novel approach designed to reduce the overly large number of computations to a practical level by utilizing the statistics of monophone hidden Markov model states. The model search algorithm also reduces the search space by utilizing the characteristics of the acoustic model. The experimental results confirmed that VBEC automatically and rapidly yielded an optimum model topology with the highest performance.  相似文献   

5.
针对词袋模型易受到无关的背景视觉噪音干扰的问题,提出了一种结合显著性检测与词袋模型的目标识别方法。首先,联合基于图论的视觉显著性算法与一种全分辨率视觉显著性算法,自适应地从原始图像中获取感兴趣区域。两种视觉显著性算法的联合可以提高获取的前景目标的完整性。然后,使用尺度不变特征变换描述子从感兴趣区域中提取特征向量,并通过密度峰值聚类算法对特征向量进行聚类,生成视觉字典直方图。最后,利用支持向量机对目标进行识别。在PASCAL VOC 2007和MSRC-21数据库上的实验结果表明,该方法相比同类方法可以有效地提高目标识别性能。  相似文献   

6.
语音识别中DTW改进算法的研究   总被引:1,自引:0,他引:1  
动态时间规整DTW是语音识别中的一种经典算法。对此算法提出了一种改进的端点检测算法,特征提取采用了Mel频率倒谱系数MFCC,并采用计算量相对较小的改进的动态时间规整算法实现语音参数模板匹配,能够实现孤立词、特定人、小词汇量的语音识别,并用Matlab进行了算法仿真。试验结果表明,改进后的算法能够有效地提高系统对语音的识别率。  相似文献   

7.
为了更准确地估计状态聚类前有调三音子的模型参数,从而提高聚类后捆绑状态的精度及系统的识别性能,针对汉语连续语音识别中,有些有调三音子的训练样本数非常少,而其对应的无调三音子的训练样本数相对较多的情况,提出用其对应的无调三音子的模型参数进行初始化,并用最大后验概率准则训练模型。汉语大词汇量连续语音识别实验表明,该方法可以提高训练语料中稀疏三音子聚类前的模型精度,从而提高系统的识别性能。  相似文献   

8.
综合了语音识别中常用的高斯混合模型和人工神经网络框架优点的Tandem特征提取方法应用于维吾尔语声学模型训练中,经过一系列后续处理,将原始的MFCC特征转化为Tandem特征,以此作为基于隐马尔可夫统计模型的语音识别系统的输入,并使用最小音素错误区分性训练准则训练声学模型,进而完成在测试集上的识别实验。实验结果显示,Tandem区分性训练方法使识别系统的单词错误率比原先的基于最大似然估计准则的系统相对减少13%。  相似文献   

9.
South African English is currently considered an under-resourced variety of English. Extensive speech resources are, however, available for North American (US) English. In this paper we consider the use of these US resources in the development of a South African large vocabulary speech recognition system. Specifically we consider two research questions. Firstly, we determine the performance penalties that are incurred when using US instead of South African language models, pronunciation dictionaries and acoustic models. Secondly, we determine whether US acoustic and language modelling data can be used in addition to the much more limited South African resources to improve speech recognition performance. In the first case we find that using a US pronunciation dictionary or a US language model in a South African system results in fairly small penalties. However, a substantial penalty is incurred when using a US acoustic model. In the second investigation we find that small but consistent improvements over a baseline South African system can be obtained by the additional use of US acoustic data. Larger improvements are obtained when complementing the South African language modelling data with US and/or UK material. We conclude that, when developing resources for an under-resourced variety of English, the compilation of acoustic data should be prioritised, language modelling data has a weaker effect on performance and the pronunciation dictionary the smallest.  相似文献   

10.
11.
The use of the computing with words paradigm for the automatic text documents categorization problem is discussed. This specific problem of information retrieval (IR) becomes more and more important, notably in view of a fast proliferation of textual information available on the Internet. The main issues that have to be addressed here are: document representation and classification. The use of fuzzy logic for both problems has already been quite deeply studied though for the latter, i.e., classification, generally not in an IR context. Our approach is based mainly on the classical calculus of linguistically quantified propositions proposed by Zadeh. Moreover, we employ results related to fuzzy (linguistic) queries in IR, notably various interpretations of the weights of query terms. Some preliminary results on widely adopted text corpora are presented.  相似文献   

12.
Command and control (C&C) speech recognition allows users to interact with a system by speaking commands or asking questions restricted to a fixed grammar containing pre-defined phrases. Whereas C&C interaction has been commonplace in telephony and accessibility systems for many years, only recently have mobile devices had the memory and processing capacity to support client-side speech recognition. Given the personal nature of mobile devices, statistical models that can predict commands based in part on past user behavior hold promise for improving C&C recognition accuracy. For example, if a user calls a spouse at the end of every workday, the language model could be adapted to weight the spouse more than other contacts during that time. In this paper, we describe and assess statistical models learned from a large population of users for predicting the next user command of a commercial C&C application. We explain how these models were used for language modeling, and evaluate their performance in terms of task completion. The best performing model achieved a 26% relative reduction in error rate compared to the base system. Finally, we investigate the effects of personalization on performance at different learning rates via online updating of model parameters based on individual user data. Personalization significantly increased relative reduction in error rate by an additional 5%.  相似文献   

13.
An improved approach for constrained robust model predictive control   总被引:1,自引:0,他引:1  
In this paper, we present a new technique to address constrained robust model predictive control. The main advantage of this new approach with respect to other well-known techniques is the reduced conservativeness. Specifically, the technique described in this paper can be applied to polytopic uncertain systems and is based on the use of several Lyapunov functions each one corresponding to a different vertex of the uncertainty's polytope.  相似文献   

14.
针对传统边缘检测算法不能准确定位图像的边缘和检测出的图像边缘连续性较差的不足,在基于改进 的弹簧质点模型基础上提出一种新的、简单和有效的图像边缘检测算法。在该算法中,将被检测的像素点作为中心点,其周围八个方向的像素点对该中心点的弹簧拉力组成一个平面内的弹簧质点模型。根据胡克定律,可以计算出作用在中心点上的合力大小,如果作用在中心点上合力大小超过设定的阈值, 则认为该中心点为边缘上的点。实验结果表明,提出的算法能够有效地克服传统算法的不足,而且具有一定的噪声滤波效果。  相似文献   

15.
16.
Model trees are a particular case of decision trees employed to solve regression problems. They have the advantage of presenting an interpretable output, helping the end-user to get more confidence in the prediction and providing the basis for the end-user to have new insight about the data, confirming or rejecting hypotheses previously formed. Moreover, model trees present an acceptable level of predictive performance in comparison to most techniques used for solving regression problems. Since generating the optimal model tree is an NP-Complete problem, traditional model tree induction algorithms make use of a greedy top-down divide-and-conquer strategy, which may not converge to the global optimal solution. In this paper, we propose a novel algorithm based on the use of the evolutionary algorithms paradigm as an alternate heuristic to generate model trees in order to improve the convergence to globally near-optimal solutions. We call our new approach evolutionary model tree induction (E-Motion). We test its predictive performance using public UCI data sets, and we compare the results to traditional greedy regression/model trees induction algorithms, as well as to other evolutionary approaches. Results show that our method presents a good trade-off between predictive performance and model comprehensibility, which may be crucial in many machine learning applications.  相似文献   

17.
Speech prosody contains an important structural information for performing speech analysis and for extracting syntactic nuclei from spoken sentences. This paper describes a procedure based on a multichannel system of epoch filters for recognizing the pulses of glottal chord vibrations by an analysis of the speech waveform. Recognition is performed by a stochastic finite state automaton automatically inferred after experiments.  相似文献   

18.
Information graphics (bar charts, line graphs, grouped bar charts, etc) often appear in popular media such as newspapers and magazines. In most cases, the information graphic is intended to convey a high‐level message. This message plays a role in facilitating the discourse purpose of the document but is seldom repeated in the document's text, headlines, or captions. We present a methodology and an implemented system for recognizing the intended message of a grouped bar chart. The recognition system relies on the following components: (1) a linguistic classifier that processes text in the graphic and predicts the most linguistically salient entity from those that are mentioned in text, (2) a cognitive model that estimates the relative perceptual effort required for an individual to recognize some high‐level message in a graph, and (3) a Bayesian network that captures the probabilistic relationship between the high‐level intended message of a graphic and its communicative signals. This research contributes to three applications: accessibility of information graphics for sight‐impaired individuals, retrieval of information graphics from a digital library, and summarization of multimodal documents.  相似文献   

19.
This paper describes a novel knowledge-based methodology and toolset for helping business process designers and participants better manage exceptions (unexpected deviations from an ideal sequence of events caused by design errors, resource failures, requirement changes, etc.) that can occur during the enactment of a process. This approach is based on an on-line repository exploiting a generic and reusable body of knowledge, which describes what kinds of exceptions can occur in collaborative work processes, how these exceptions can be detected, and how they can be resolved. This work builds upon previous efforts from the MIT Process Handbook project and from research on conflict management in collaborative design. This revised version was published online in July 2006 with corrections to the Cover Date.  相似文献   

20.
一种改进的混合高斯模型背景估计方法   总被引:1,自引:0,他引:1  
蒋明  潘姣丽 《微型机与应用》2011,30(11):31-33,36
传统混合高斯模型一般为每个像素分配固定的高斯分布个数,从而造成背景形成速度的减慢和系统资源的浪费;同时也存在着高斯模型背景建模中的缓慢或滞留运动物体造成目标误判现象的问题(即空洞问题)。为此,提出了一种有效的两阶段视频图像处理方法。该方法在第一阶段根据像素点的优先级大小自动地调节高斯分布的数目,在第二阶段首先对像素点进行所属区域的划分,进而对目标区域和非目标区域采取不同的更新手段。实验表明,采用两阶段视频图像处理方法明显地改善了背景建模的速度,有效解决了提取目标出现的空洞问题。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号