Similar documents
2.
This paper presents an attempt at using the syntactic structure of natural language to improve language models for speech recognition. The structured language model merges techniques from automatic parsing and language modeling through an original probabilistic parameterization of a shift-reduce parser. A maximum-likelihood re-estimation procedure belonging to the class of expectation-maximization algorithms is employed to train the model. Experiments on the Wall Street Journal and Switchboard corpora show improvements in both perplexity and word error rate (via word-lattice rescoring) over the standard 3-gram language model.
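The perplexity metric used in the comparison above reduces to a simple computation over per-word log probabilities. A minimal sketch (the function name and the uniform-model example are illustrative, not from the paper):

```python
import math

def perplexity(log_probs):
    """Perplexity from a sequence of per-word natural-log probabilities:
    exp of the negative mean log probability."""
    return math.exp(-sum(log_probs) / len(log_probs))

# A uniform model over a 4-word vocabulary has perplexity equal to the
# vocabulary size, regardless of sequence length.
lp = [math.log(0.25)] * 10
print(round(perplexity(lp), 6))  # → 4.0
```

A lower perplexity means the model assigns higher probability to the held-out text, which is why the structured model's gain over the 3-gram baseline is reported in these terms.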

4.
This paper describes the use of a neural network language model for large-vocabulary continuous speech recognition. The underlying idea is to attack the data-sparseness problem by performing the language-model probability estimation in a continuous space. Highly efficient learning algorithms are described that enable the use of training corpora of several hundred million words. It is also shown that this approach can be incorporated into a large-vocabulary continuous speech recognizer through a lattice-rescoring framework at very low additional processing cost. The neural network language model was thoroughly evaluated in a state-of-the-art recognizer on several international benchmark tasks, in particular the NIST evaluations on broadcast news and conversational speech recognition. The new approach is compared with four-gram back-off language models trained with modified Kneser–Ney smoothing, often reported to be the best known smoothing method. The neural network language model is usually interpolated with the back-off language model; in this way, consistent word error rate reductions, ranging from 0.4% to almost 1% absolute, were achieved for all tasks and languages considered.
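The interpolation step mentioned above, combining the neural network LM with the back-off LM, is a per-word linear mixture of the two probability estimates. A minimal sketch; the weight `lam` is a hypothetical value and would in practice be tuned on held-out data:

```python
def interpolate(p_nn, p_backoff, lam=0.5):
    """Linear interpolation of two language-model probability estimates
    for the same word in the same context."""
    return lam * p_nn + (1.0 - lam) * p_backoff

# Hypothetical probabilities for one word under each model.
p = interpolate(0.02, 0.01, lam=0.6)
print(p)  # ≈ 0.016
```

Because the mixture is convex, the interpolated model can never be worse than the better component by more than the weight allows, which is one reason interpolation is the standard way to combine a neural LM with a back-off LM.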

6.
Pronunciation variation is a major obstacle to improving the performance of Arabic automatic continuous speech recognition systems. This phenomenon alters the pronunciation of words beyond their listed forms in the pronunciation dictionary, leading to a number of out-of-vocabulary word forms. This paper presents a direct data-driven approach to modeling within-word pronunciation variation, in which the pronunciation variants are distilled from the training speech corpus. The proposed method performs phoneme recognition, followed by a sequence alignment between the observed phonemes generated by the phoneme recognizer and the reference phonemes obtained from the pronunciation dictionary. The unique collected variants are then added to the dictionary as well as to the language model. We started from a baseline Arabic speech recognition system built on the Sphinx3 engine, trained on a 5.4-hour speech corpus of Modern Standard Arabic broadcast news with a pronunciation dictionary of 14,234 canonical pronunciations; the baseline achieves a word error rate of 13.39%. Our results show that while the expanded dictionary alone did not yield appreciable improvements, the word error rate is significantly reduced, by 2.22%, when the variants are also represented in the language model.
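The alignment step described above can be roughly sketched with Python's `difflib` standing in for the paper's aligner; the phoneme sequences below are invented for illustration and are not from the paper's corpus:

```python
import difflib

def extract_variant(reference, observed):
    """Align a recognized phoneme sequence against the dictionary
    reference and return the observed sequence as a candidate
    pronunciation variant, plus the non-matching edit operations."""
    sm = difflib.SequenceMatcher(a=reference, b=observed)
    ops = [(tag, reference[i1:i2], observed[j1:j2])
           for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]
    return observed, ops

# Hypothetical word: reference /k i t a a b/ recognized as /k i t a b/,
# i.e. one long-vowel phoneme deleted in the observed pronunciation.
variant, ops = extract_variant(["k", "i", "t", "a", "a", "b"],
                               ["k", "i", "t", "a", "b"])
print(ops)  # → [('delete', ['a'], [])]
```

In the paper's pipeline, variants collected this way are deduplicated and then entered into both the dictionary and the language model; the sketch only covers the per-utterance alignment.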

7.
张红, 黄泰翼, 徐波. 《自动化学报》 (Acta Automatica Sinica), 2001, 27(3): 338-345
Automatic broadcast-news transcription systems have emerged in the last two years as a new focus of international research on large-vocabulary continuous speech recognition, and represent an important transitional step toward practical speech recognition technology. This paper reviews the background and development history of automatic broadcast-news transcription systems, surveys and analyzes the current state of research from the perspectives of both system performance and theory, and concludes with a concrete development plan for building China's own automatic broadcast-news transcription system.

9.
This paper investigates a data-driven word-decompounding algorithm for use in automatic speech recognition. An existing algorithm, "Morfessor," has been enhanced to address the increased phonetic confusability arising from word decompounding, by incorporating phonetic properties and constraints on recognition units derived from forced-alignment experiments. Speech recognition experiments on a broadcast news task for the Amharic language validate the approach: out-of-vocabulary (OOV) word rates were reduced by 35% to 50%, and a small reduction in word error rate (WER) was achieved. The algorithm is relatively language-independent and requires minimal adaptation to be applied to other languages.
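Morfessor itself learns sub-word units in an unsupervised, model-selection-driven way; the toy sketch below illustrates only the general idea of decompounding, splitting an OOV word into known units by longest match, and is not the Morfessor algorithm. Vocabulary and words are invented for illustration:

```python
def decompound(word, vocab, min_len=2):
    """Greedy longest-match split of an out-of-vocabulary word into
    known sub-units. Returns None when no full split exists."""
    if word in vocab:
        return [word]
    for i in range(len(word) - min_len, min_len - 1, -1):
        head, tail = word[:i], word[i:]
        if head in vocab:
            rest = decompound(tail, vocab, min_len)
            if rest is not None:
                return [head] + rest
    return None

vocab = {"news", "paper", "broad", "cast"}
print(decompound("newspaper", vocab))  # → ['news', 'paper']
print(decompound("broadcast", vocab))  # → ['broad', 'cast']
```

Splitting OOV compounds into in-vocabulary units is what drives the OOV-rate reduction reported above; the paper's phonetic constraints then filter out splits that would be too confusable acoustically.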

12.
Research on a Sensitive-Word Retrieval System for Uyghur Broadcast News
The Uyghur broadcast-news sensitive-word retrieval system is based on HMMs and was designed and implemented on the MATLAB platform. Its characteristics include: (1) because the number of Uyghur sensitive words is small, the system's speech corpus is small; (2) because pronunciation in broadcast news is relatively standard, irregular speaker-dependent pronunciations are avoided during recognition, which benefits recognition performance; and (3) morphemes are chosen as the recognition units, which makes endpoint detection of the units easier.
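Endpoint detection of recognition units, mentioned in point (3), is classically approached by thresholding short-time frame energy. A toy sketch under that assumption; real systems combine energy with more robust cues such as zero-crossing rate and adaptive thresholds, and the frame energies below are invented:

```python
def endpoints(frame_energies, threshold):
    """Return (start, end) frame indices spanning all frames whose
    short-time energy exceeds the threshold, or None if silence."""
    above = [i for i, e in enumerate(frame_energies) if e > threshold]
    if not above:
        return None
    return above[0], above[-1]

energy = [0.1, 0.2, 3.5, 4.1, 3.9, 0.3, 0.1]
print(endpoints(energy, threshold=1.0))  # → (2, 4)
```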

15.
Statistical n-gram language modeling is popular in speech recognition and many other applications, but the conventional n-gram cannot adequately model long-distance language dependencies. This paper presents an approach that mines long-distance word associations and incorporates these features into language models via linear interpolation and the maximum entropy (ME) principle. We highlight the discovery of associations among multiple distant words in the training corpus: a mining algorithm recursively merges the frequent word subsets and efficiently constructs the set of association patterns. By combining the features of association patterns with n-gram models, association pattern n-grams are estimated, with the trigger-pair n-gram as a special case in which only associations of two distant words are considered. In experiments on Chinese language modeling, incorporating association patterns significantly reduces the perplexity of n-gram models; incorporation via ME outperforms linear interpolation, the association pattern n-gram is superior to the trigger-pair n-gram, and perplexity is further reduced with more association steps. The proposed association pattern n-grams not only raise document classification accuracy but also improve speech recognition rates.
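The trigger-pair special case can be illustrated by counting ordered word pairs that co-occur within a distance window and keeping the frequent ones. This toy sketch omits the paper's recursive merging of larger association patterns; the window, threshold, and corpus are invented for illustration:

```python
from collections import Counter

def mine_trigger_pairs(sentences, window=5, min_count=2):
    """Count ordered word pairs (w, v) where v follows w within the
    window, and keep pairs meeting the frequency threshold as
    candidate trigger pairs."""
    counts = Counter()
    for sent in sentences:
        for i, w in enumerate(sent):
            for v in sent[i + 1:i + 1 + window]:
                counts[(w, v)] += 1
    return {p: c for p, c in counts.items() if c >= min_count}

corpus = [["stock", "market", "rose"],
          ["stock", "prices", "market"]]
print(mine_trigger_pairs(corpus))  # → {('stock', 'market'): 2}
```

In the paper, the surviving associations become features of the language model, combined with the n-gram either by linear interpolation or under the ME framework.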

16.
In the era of artificial intelligence, technologies continue to evolve and their application scenarios keep expanding. Speech transcription, machine writing, AI news anchors, and intelligent short-video production combine multiple language technologies, such as speech recognition, speech synthesis, language generation, semantic understanding, and machine translation. The resulting products and applications have entered every stage of news production, including news gathering and editing, article drafting, and broadcasting, pushing news production toward an intelligent model.

19.
This paper aims to quantify the main error types made by the 2004 BBN speech recognition system in the broadcast news (BN) and conversational telephone speech (CTS) DARPA EARS evaluations. We show that many of the remaining errors occur in clusters rather than in isolation, have specific causes, and differ to some extent between the BN and CTS domains. The correctly recognized words are also clustered and are highly correlated with regions where the system produces a single hypothesized choice per word. A statistical analysis of some well-known error causes (out-of-vocabulary words, word fragments, hesitations, and unlikely language constructs) was performed to assess their contribution to the overall word error rate (WER). We conclude with a discussion of the lower bound on the WER introduced by human annotator disagreement.
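The WER analyzed throughout is the word-level Levenshtein distance between hypothesis and reference, normalized by reference length. A minimal sketch (the example sentences are illustrative):

```python
def wer(reference, hypothesis):
    """Word error rate: minimum edit distance over words divided by the
    number of reference words."""
    r, h = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between first i reference words and
    # first j hypothesis words.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(r)][len(h)] / len(r)

print(wer("the cat sat", "the cat sat down"))  # 1 insertion / 3 words ≈ 0.333
```

Because insertions are counted but do not grow the denominator, WER can exceed 100%, which matters when comparing systems on heavily disfluent CTS data.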
