2.
《Computer Speech and Language》2000,14(4):283-332
This paper presents an attempt at using the syntactic structure of natural language to improve language models for speech recognition. The structured language model merges techniques from automatic parsing and language modeling through an original probabilistic parameterization of a shift-reduce parser. A maximum-likelihood re-estimation procedure belonging to the class of expectation-maximization algorithms is employed to train the model. Experiments on the Wall Street Journal and Switchboard corpora show improvements in both perplexity and word error rate (via word-lattice rescoring) over the standard 3-gram language model.
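The difference between the structured language model and a plain 3-gram can be sketched in a few lines. This is an illustrative simplification, not the paper's parameterization: the structured LM conditions each word on the two most recent exposed headwords of the partial parse rather than on the two preceding surface words. The example sentence and headwords are hypothetical.

```python
def trigram_context(words, i):
    """Standard 3-gram: condition on the two immediately preceding words."""
    return tuple(words[max(0, i - 2):i])

def headword_context(stack):
    """Structured LM (sketch): condition on the two most recent exposed
    headwords of the partial parse, which may lie far back in the sentence."""
    return tuple(stack[-2:])

# Toy run: after parsing "the contract ended with a loss", the exposed
# headwords for predicting the next word could be ("contract", "ended")
# rather than the surface bigram ("a", "loss").
words = ["the", "contract", "ended", "with", "a", "loss"]
stack = ["contract", "ended"]          # exposed headwords of partial parse
print(trigram_context(words, 6))   # ('a', 'loss')
print(headword_context(stack))     # ('contract', 'ended')
```

The payoff is that syntactically related but distant words enter the conditioning context, which a fixed-window 3-gram cannot capture.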
4.
《Computer Speech and Language》2007,21(3):492-518
This paper describes the use of a neural network language model for large-vocabulary continuous speech recognition. The underlying idea is to attack the data-sparseness problem by estimating language-model probabilities in a continuous space. Highly efficient learning algorithms are described that enable training on corpora of several hundred million words. It is also shown that the approach can be incorporated into a large-vocabulary continuous speech recognizer using a lattice-rescoring framework with very little additional processing time. The neural network language model was thoroughly evaluated in a state-of-the-art large-vocabulary continuous speech recognizer on several international benchmark tasks, in particular the NIST evaluations on broadcast news and conversational speech recognition. The new approach is compared to four-gram back-off language models trained with modified Kneser–Ney smoothing, which is often reported to be the best known smoothing method. The neural network language model is usually interpolated with the back-off language model; in that way, consistent word error rate reductions were achieved for all considered tasks and languages, ranging from 0.4% to almost 1% absolute.
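The interpolation step mentioned above reduces to a one-line linear combination of the two models' probabilities. A minimal sketch follows; the weight `lam` would be tuned on held-out data, and the probability values here are invented for illustration.

```python
def interpolate(p_nn, p_backoff, lam=0.5):
    """Linear interpolation of a neural-network LM probability with a
    back-off n-gram LM probability for the same word in the same context."""
    return lam * p_nn + (1.0 - lam) * p_backoff

# Hypothetical probabilities for one word in one context:
p = interpolate(p_nn=0.012, p_backoff=0.008, lam=0.5)
print(round(p, 4))  # 0.01
```

In practice the combined model inherits the neural model's generalization in frequent contexts while the back-off model covers rare events, which is why the interpolated model consistently beats either component alone.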
6.
Dia AbuZeina Wasfi Al-Khatib Moustafa Elshafei Husni Al-Muhtaseb 《International Journal of Speech Technology》2012,15(2):65-75
Pronunciation variation is a major obstacle to improving the performance of Arabic automatic continuous speech recognition systems. This phenomenon alters the pronunciation of words beyond the forms listed in the pronunciation dictionary, leading to a number of out-of-vocabulary word forms. This paper presents a direct data-driven approach to modeling within-word pronunciation variation, in which the pronunciation variants are distilled from the training speech corpus. The proposed method performs phoneme recognition, followed by a sequence alignment between the observed phonemes generated by the phoneme recognizer and the reference phonemes obtained from the pronunciation dictionary. The unique collected variants are then added to the dictionary as well as to the language model. We started with a baseline Arabic speech recognition system based on the Sphinx3 engine. The baseline system uses a 5.4-hour speech corpus of Modern Standard Arabic broadcast news, with a pronunciation dictionary of 14,234 canonical pronunciations, and achieves a word error rate of 13.39%. Our results show that while the expanded dictionary alone did not add appreciable improvements, the word error rate is significantly reduced, by 2.22%, when the variants are also represented in the language model.
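The dictionary-expansion step described above can be sketched as maintaining a list of pronunciations per word and appending each unique distilled variant. The word and phoneme strings below are illustrative placeholders, not entries from the paper's dictionary.

```python
# Canonical pronunciation dictionary: word -> list of phoneme strings.
lexicon = {"kitab": ["k i t a: b"]}

def add_variant(lexicon, word, variant):
    """Add a distilled pronunciation variant unless it is already listed,
    mirroring the 'unique collected variants' step in the abstract."""
    prons = lexicon.setdefault(word, [])
    if variant not in prons:
        prons.append(variant)

add_variant(lexicon, "kitab", "k t a: b")   # short vowel dropped in fast speech
add_variant(lexicon, "kitab", "k t a: b")   # duplicates are ignored
print(lexicon["kitab"])  # ['k i t a: b', 'k t a: b']
```

Representing the variants in the language model as well (the step the paper found essential) would additionally require treating each variant as a distinct token when re-estimating n-gram counts.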
9.
《IEEE transactions on audio, speech, and language processing》2009,17(5):863-873
12.
Research on a Sensitive-Word Retrieval System for Uyghur Broadcast News
The Uyghur broadcast-news sensitive-word retrieval system is based on HMMs and was designed and implemented on the MATLAB platform. The system has the following characteristics: 1. Because the number of Uyghur sensitive words is small, the system's speech corpus is small. 2. Because pronunciation in broadcast news is relatively standard, recognition avoids nonstandard speaker pronunciations, which benefits the performance of the speech recognition system. 3. Because morphemes are chosen as the recognition units, endpoint detection of the units is easy.
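An HMM-based spotter like the one described above ultimately decodes the most likely state sequence with the Viterbi algorithm. The sketch below uses a toy two-state model with invented probabilities; a real system would use per-morpheme acoustic models over real feature vectors, not discrete "low"/"high" observations.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for an observation sequence."""
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    path = {s: [s] for s in states}
    for t in range(1, len(obs)):
        V.append({})
        new_path = {}
        for s in states:
            # Best predecessor for state s at time t.
            prob, prev = max((V[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p)
                             for p in states)
            V[t][s] = prob
            new_path[s] = path[prev] + [s]
        path = new_path
    best = max(states, key=lambda s: V[-1][s])
    return path[best]

# Toy model: is the signal silence or a keyword region?
states = ("silence", "keyword")
start_p = {"silence": 0.8, "keyword": 0.2}
trans_p = {"silence": {"silence": 0.7, "keyword": 0.3},
           "keyword": {"silence": 0.4, "keyword": 0.6}}
emit_p = {"silence": {"low": 0.9, "high": 0.1},
          "keyword": {"low": 0.2, "high": 0.8}}
print(viterbi(["low", "high", "high"], states, start_p, trans_p, emit_p))
# ['silence', 'keyword', 'keyword']
```

Choosing morphemes as recognition units, as the abstract notes, keeps each HMM short and makes the unit boundaries (endpoints) easier to locate in the decoded path.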
15.
Jen-Tzung Chien 《IEEE transactions on audio, speech, and language processing》2006,14(5):1719-1728
Statistical n-gram language modeling is popular for speech recognition and many other applications. The conventional n-gram cannot adequately model long-distance language dependencies. This paper presents a novel approach that mines long-distance word associations and incorporates these features into language models based on linear interpolation and maximum entropy (ME) principles. We highlight the discovery of associations among multiple distant words in the training corpus. A mining algorithm recursively merges the frequent word subsets and efficiently constructs the set of association patterns. By combining the features of association patterns into n-gram models, association pattern n-grams are estimated, with a special case, the trigger-pair n-gram, in which only associations between two distant words are considered. In experiments on Chinese language modeling, we find that incorporating association patterns significantly reduces the perplexities of n-gram models. Incorporation using ME outperforms incorporation using linear interpolation, the association pattern n-gram is superior to the trigger-pair n-gram, and perplexities are further reduced using more association steps. The proposed association pattern n-grams not only raise document classification accuracy but also improve speech recognition rates.
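The trigger-pair special case above can be sketched as interpolating the n-gram probability with a long-distance probability p(w | t) for a trigger word t seen earlier in the history. Everything below, including the trigger table and probability values, is invented for illustration; the paper's ME-based combination is more sophisticated than this linear form.

```python
# Hypothetical p(target | trigger) table for a few trigger pairs.
trigger_prob = {("stock", "market"): 0.08, ("doctor", "nurse"): 0.05}

def trigger_pair_prob(word, history, p_ngram, lam=0.7):
    """Interpolate an n-gram probability with the strongest trigger-pair
    probability found anywhere in the (long-distance) history."""
    p_trig = max((trigger_prob.get((t, word), 0.0) for t in history),
                 default=0.0)
    return lam * p_ngram + (1.0 - lam) * p_trig

# "stock" far back in the history boosts the probability of "market":
p = trigger_pair_prob("market", ["the", "stock", "rose"], p_ngram=0.02)
print(round(p, 3))  # 0.038
```

Association patterns generalize this by letting sets of several distant words, rather than a single trigger, contribute evidence, which is why the paper finds them stronger than trigger pairs.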
16.
In the era of artificial intelligence, technologies continue to evolve and application scenarios keep multiplying. Speech transcription, machine writing, AI news anchors, and intelligent short-video production combine multiple language technologies, such as speech recognition, speech synthesis, language generation, semantic understanding, and machine translation. The related products and applications have entered every stage of news production, including news gathering and editing, article writing, and broadcasting, pushing news production toward intelligent workflows.
19.
Duta N. Schwartz R. Makhoul J. 《IEEE transactions on audio, speech, and language processing》2006,14(5):1745-1753
This paper aims to quantify the main error types made by the 2004 BBN speech recognition system in the broadcast news (BN) and conversational telephone speech (CTS) DARPA EARS evaluations. We show that many of the remaining errors occur in clusters rather than in isolation, have specific causes, and differ to some extent between the BN and CTS domains. The correctly recognized words are also clustered and are highly correlated with regions where the system produces a single hypothesized choice per word. A statistical analysis of some well-known error causes (out-of-vocabulary words, word fragments, hesitations, and unlikely language constructs) was performed to assess their contribution to the overall word error rate (WER). We conclude with a discussion of the lower bound on the WER imposed by human annotator disagreement.
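The WER figure that the error analysis above decomposes is computed from a minimum edit-distance alignment: substitutions, insertions, and deletions against the reference transcript, divided by the number of reference words. A minimal sketch, with illustrative sentences:

```python
def wer(ref, hyp):
    """Word error rate = edit_distance(ref, hyp) / len(ref)."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                     # all deletions
    for j in range(n + 1):
        d[0][j] = j                     # all insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # match / substitution
    return d[m][n] / m

ref = "the news was read at nine".split()
hyp = "the news was red at nine pm".split()
print(round(wer(ref, hyp), 3))  # 0.333 (one substitution, one insertion)
```

Tracing back through the same table yields the alignment itself, which is how error occurrences can be localized and clustered in the way the paper analyzes.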