期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

李明琴李涓子王作英陆大《计算机学报》2004,27(12):1679-1687

该文提出了一个统计语义分析器，它能够发现中文句子中的语义依存关系．这些语义依存关系可以用于表示句子的意义和结构．语义分析器在1百万词的标有语义依存关系的语料库(语义依存网络语料库，SDN)上训练并测试，文中设计、实现了多个实验以分析语义分析器的性能．实验结果表明，分析器在非限定领域中表现出了较好的性能，分析正确率与中文句法分析器基本相当。相似文献

2.

Combining language models in the input interface of a spoken dialogue system

R. Lpez-Czar Z. Callejas 《Computer Speech and Language》2006,20(4):420-440

This paper presents a new technique to enhance the performance of the input interface of spoken dialogue systems based on a procedure that combines during speech recognition the advantages of using prompt-dependent language models with those of using a language model independent of the prompts generated by the dialogue system. The technique proposes to create a new speech recognizer, termed contextual speech recognizer, that uses a prompt-independent language model to allow recognizing any kind of sentence permitted in the application domain, and at the same time, uses contextual information (in the form of prompt-dependent language models) to take into account that some sentences are more likely to be uttered than others at a particular moment of the dialogue. The experiments show the technique allows enhancing clearly the performance of the input interface of a previously developed dialogue system based exclusively on prompt-dependent language models. But most important, in comparison with a standard speech recognizer that uses just one prompt-independent language model without contextual information, the proposed recognizer allows increasing the word accuracy and sentence understanding rates by 4.09% and 4.19% absolute, respectively. These scores are slightly better than those obtained using linear interpolation of the prompt-independent and prompt-dependent language models used in the experiments. 相似文献

3.

Thai spelling analysis for automatic spelling speech recognition

Chutima Pisarn Thanaruk Theeramunkong 《Information Sciences》2008,178(1):122-136

Spelling speech recognition can be applied for several purposes including enhancement of speech recognition systems and implementation of name retrieval systems. This paper presents a Thai spelling analysis to develop a Thai spelling speech recognizer. The Thai phonetic characteristics, alphabet system and spelling methods have been analyzed. As a training resource, two alternative corpora, a small spelling speech corpus and an existing large continuous speech corpus, are used to train hidden Markov models (HMMs). Then their recognition results are compared to each other. To solve the problem of utterance speed difference between spelling utterances and continuous speech utterances, the adjustment of utterance speed has been taken into account. Two alternative language models, bigram and trigram, are used for investigating performance of spelling speech recognition. Our approach achieves up to 98.0% letter correction rate, 97.9% letter accuracy and 82.8% utterance correction rate when the language model is trained based on trigram and the acoustic model is trained from the small spelling speech corpus with eight Gaussian mixtures. 相似文献

4.

潜在语义分析在连续语音识别中的应用

下载免费PDF全文

欧建林林茜史晓东《计算机工程与应用》2009,45(32):111-113

研究了潜在语义分析（LSA）理论及其在连续语音识别中应用的相关技术,在此基础上利用WSJ0文本语料库上构建LSA模型,并将其与3-gram模型进行插值组合,构建了包含语义信息的统计语言模型;同时为了进一步优化混合模型的性能,提出了基于密度函数初始化质心的k-means聚类算法对LSA模型的向量空间进行聚类。WSJ0语料库上的连续语音识别实验结果表明：LSA+3-gram混合模型能够使识别的词错误率相比较于标准的3-gram下降13.3%。相似文献

5.

Applications of Language Modeling in Speech-To-Speech Translation

Fu-Hua Liu Liang Gu Yuqing Gao Michael Picheny 《International Journal of Speech Technology》2004,7(2-3):221-229

相似文献

6.

A path-stack algorithm for optimizing dynamic regimes in a statistical hidden dynamic model of speech

《Computer Speech and Language》2000,14(2):101-114

In this paper we report our recent research whose goal is to improve the performance of a novel speech recognizer based on an underlying statistical hidden dynamic model of phonetic reduction in the production of conversational speech. We have developed a path-stack search algorithm which efficiently computes the likelihood of any observation utterance while optimizing the dynamic regimes in the speech model. The effectiveness of the algorithm is tested on the speech data in the Switchboard corpus, in which the optimized dynamic regimes computed from the algorithm are compared with those from exhaustive search. We also present speech recognition results on the Switchboard corpus that demonstrate improvements of the recognizer’s performance compared with the use of the dynamic regimes heuristically set from the phone segmentation by a state-of-the-art hidden Markov model (HMM) system. 相似文献

7.

Improving phrase-based statistical machine translation with morphosyntactic transformation

Thai Phuong Nguyen Akira Shimazu 《Machine Translation》2006,20(3):147-166

We present a phrase-based statistical machine translation approach which uses linguistic analysis in the preprocessing phase. The linguistic analysis includes morphological transformation and syntactic transformation. Since the word-order problem is solved using syntactic transformation, there is no reordering in the decoding phase. For morphological transformation, we use hand-crafted transformational rules. For syntactic transformation, we propose a transformational model based on a probabilistic context-free grammar. This model is trained using a bilingual corpus and a broad-coverage parser of the source language. This approach is applicable to language pairs in which the target language is poor in resources. We considered translation from English to Vietnamese and from English to French. Our experiments showed significant BLEU-score improvements in comparison with Pharaoh, a state-of-the-art phrase-based SMT system. 相似文献

8.

Exploring the <Emphasis Type="SmallCaps">sawa</Emphasis> corpus: collection and deployment of a parallel corpus English—Swahili

Guy De Pauw Peter Waiganjo Wagacha Gilles-Maurice de Schryver 《Language Resources and Evaluation》2011,45(3):331-344

Research in machine translation and corpus annotation has greatly benefited from the increasing availability of word-aligned parallel corpora. This paper presents ongoing research on the development and application of the sawa corpus, a two-million-word parallel corpus English—Swahili. We describe the data collection phase and zero in on the difficulties of finding appropriate and easily accessible data for this language pair. In the data annotation phase, the corpus was semi-automatically sentence and word-aligned and morphosyntactic information was added to both the English and Swahili portion of the corpus. The annotated parallel corpus allows us to investigate two possible uses. We describe experiments with the projection of part-of-speech tagging annotation from English onto Swahili, as well as the development of a basic statistical machine translation system for this language pair, using the parallel corpus and a consolidated database of existing English—Swahili translation dictionaries. We particularly focus on the difficulties of translating English into the morphologically more complex Bantu language of Swahili. 相似文献

9.

一种词义与词的混合语言模型及其应用

侯珺王作英《中文信息学报》2001,15(6):8-13

本文提出了一种基于词和词义混合的统计语言模型,研究了这个模型在词义标注和汉语普通话语音识别中的性能,并且与传统的词义模型和基于词的语言模型进行了对比。这个模型比传统词义模型更准确地描述了词义和词的关系,在词义标注中具有较小的混淆度;在汉语普通话连续音识别中,这个词义模型的性能优于基于词的三元文法模型,并且需要较小的存储空间。相似文献

10.

System Combination for Machine Translation of Spoken and Written Language 总被引：1，自引：0，他引：1

《IEEE transactions on audio, speech, and language processing》2008,16(7):1222-1237

This paper describes an approach for computing a consensus translation from the outputs of multiple machine translation (MT) systems. The consensus translation is computed by weighted majority voting on a confusion network, similarly to the well-established ROVER approach of Fiscus for combining speech recognition hypotheses. To create the confusion network, pairwise word alignments of the original MT hypotheses are learned using an enhanced statistical alignment algorithm that explicitly models word reordering. The context of a whole corpus of automatic translations rather than a single sentence is taken into account in order to achieve high alignment quality. The confusion network is rescored with a special language model, and the consensus translation is extracted as the best path. The proposed system combination approach was evaluated in the framework of the TC-STAR speech translation project. Up to six state-of-the-art statistical phrase-based translation systems from different project partners were combined in the experiments. Significant improvements in translation quality from Spanish to English and from English to Spanish in comparison with the best of the individual MT systems were achieved under official evaluation conditions. 相似文献

11.

Speech understanding and speech translation by maximum a-posteriori semantic decoding

《Artificial Intelligence in Engineering》1999,13(4):373-384

This paper describes a domain-limited system for speech understanding as well as for speech translation. An integrated semantic decoder directly converts the preprocessed speech signal into its semantic representation by a maximum a-posteriori classification. With the combination of probabilistic knowledge on acoustic, phonetic, syntactic, and semantic levels, the semantic decoder extracts the most probable meaning of the utterance. No separate speech recognition stage is needed because of the integration of the Viterbi-algorithm (calculating acoustic probabilities by the use of Hidden-Markov-Models) and a probabilistic chart parser (calculating semantic and syntactic probabilities by special models). The semantic structure is introduced as a representation of an utterance's meaning. It can be used as an intermediate level for a succeeding intention decoder (within a speech understanding system for the control of a running application by spoken inputs) as well as an interlingua-level for a succeeding language production unit (within an automatic speech translation system for the creation of spoken output in another language). Following the above principles and using the respective algorithms, speech understanding and speech translating front-ends for the domains ‘graphic editor’, ‘service robot’, ‘medical image visualisation’ and ‘scheduling dialogues’ could be successfully realised. 相似文献

12.

Increasing adaptability of a speech into sign language translation system

Verónica López-Ludeña Rubén San-Segundo Carlos González Morcillo Juan Carlos López José M. Pardo Muñoz 《Expert systems with applications》2013,40(4):1312-1322

This paper describes a new version of a speech into sign language translation system with new tools and characteristics for increasing its adaptability to a new task or a new semantic domain. This system is made up of a speech recognizer (for decoding the spoken utterance into a word sequence), a natural language translator (for converting a word sequence into a sequence of signs belonging to the sign language), and a 3D avatar animation module (for playing back the signs). In order to increase the system adaptability, this paper presents new improvements in all the three main modules for generating automatically the task dependent information from a parallel corpus: automatic generation of Spanish variants when generating the vocabulary and language model for the speech recogniser, an acoustic adaptation module for the speech recogniser, data-oriented language and translation models for the machine translator and a list of signs to design. The avatar animation module includes a new editor for rapidly design of the required signs. These developments have been necessary to reduce the effort when adapting a Spanish into Spanish sign language (LSE: Lengua de Signos Española) translation system to a new domain. The whole translation presents a SER (Sign Error Rate) lower than 10% and a BLEU higher than 90% while the effort for adapting the system to a new domain has been reduced more than 50%. 相似文献

13.

Continuous space language models

《Computer Speech and Language》2007,21(3):492-518

This paper describes the use of a neural network language model for large vocabulary continuous speech recognition. The underlying idea of this approach is to attack the data sparseness problem by performing the language model probability estimation in a continuous space. Highly efficient learning algorithms are described that enable the use of training corpora of several hundred million words. It is also shown that this approach can be incorporated into a large vocabulary continuous speech recognizer using a lattice rescoring framework at a very low additional processing time. The neural network language model was thoroughly evaluated in a state-of-the-art large vocabulary continuous speech recognizer for several international benchmark tasks, in particular the Nist evaluations on broadcast news and conversational speech recognition. The new approach is compared to four-gram back-off language models trained with modified Kneser–Ney smoothing which has often been reported to be the best known smoothing method. Usually the neural network language model is interpolated with the back-off language model. In that way, consistent word error rate reductions for all considered tasks and languages were achieved, ranging from 0.4% to almost 1% absolute. 相似文献

14.

基于Sentence-Rank的图像句子标注

徐守坤徐坚李宁周佳刘楚秋《计算机工程与应用》2019,55(2):121-127

传统的图像语义句子标注是利用句子模板完成对图像内容描述，但其标注句子很难做到符合语言逻辑。针对这一问题，提出基于统计思想从语料库中选出一条最优的句子来描述图像内容，设计以[N]-gram算法为主要思想的Sentence-Rank算法生成标注句子。首先执行机器视觉特征学习，选择标注性能最好的HSV-LBP-HOG融合特征完成图像分类，获得图像标注关键词。然后，利用字符串匹配算法从语料库中列出包含所有标注关键词的句子，并将得到的句子通过Sentence-Rank算法进行价值排序，选取评分最高的句子描述图像。实验结果表明，该方法得到的标注句子具有较低的困惑度，较好地解决了句子的语言逻辑问题。相似文献

15.

一种基于语义分析的汉语语音识别纠错方法

韦向峰张全熊亮《计算机科学》2006,33(10):152-155

汉语语音识别的研究越来越重视与语言处理的结合,语音识别已经不是单纯的语音信号处理。N-gram语言模型应用到语音识别系统中,大大增强了系统的正确率和稳定性,但它也有其自身的局限性,使得语音识别出现许多语法和语义的错误结果。本文分析了语音识别产生语音和文字方面的错误的原因和类型,在概念层次网络语言模型的基础上提出了一种基于语句语义分析和混淆音矩阵的语音识别纠错方法。通过三个发音人、5万字的声音语料和216句实验语句的纠错测试,本文的纠错系统在纠正语义搭配型错误方面有比较好的表现,可克服N-gram语言模型带来的一些缺陷。本文提出的纠错方法还可以融合到语音识别系统中,以便更好地为语音识别的纠错处理服务。相似文献

16.

A Dialectal Chinese Speech Recognition Framework 总被引：2，自引：0，他引：2

下载免费PDF全文

Jing Li Thomas Fang Zheng William Byrne and Dan Jurafsky 《计算机科学技术学报》2006,21(1):106-115

A framework for dialectal Chinese speech recognition is proposed and studied, in which a relatively small dialectal Chinese (or in other words Chinese influenced by the native dialect) speech corpus and dialect-related knowledge are adopted to transform a standard Chinese (or Putonghua, abbreviated as PTH) speech recognizer into a dialectal Chinese speech recognizer. Two kinds of knowledge sources are explored: one is expert knowledge and the other is a small dialectal Chinese corpus. These knowledge sources provide information at four levels: phonetic level, lexicon level, language level, and acoustic decoder level. This paper takes Wu dialectal Chinese (WDC) as an example target language. The goal is to establish a WDC speech recognizer from an existing PTH speech recognizer based on the Initial-Final structure of the Chinese language and a study of how dialectal Chinese speakers speak Putonghua. The authors propose to use context-independent PTH-IF mappings (where IF means either a Chinese Initial or a Chinese Final), context-independent WDC-IF mappings, and syllable-dependent WDC-IF mappings (obtained from either experts or data), and combine them with the supervised maximum likelihood linear regression (MLLR) acoustic model adaptation method. To reduce the size of the multi-pronunciation lexicon introduced by the IF mappings, which might also enlarge the lexicon confusion and hence lead to the performance degradation, a Multi-Pronunciation Expansion (MPE) method based on the accumulated uni-gram probability (AUP) is proposed. In addition, some commonly used WDC words are selected and added to the lexicon. Compared with the original PTH speech recognizer, the resulting WDC speech recognizer achieves 10-18% absolute Character Error Rate (CER) reduction when recognizing WDC, with only a 0.62% CER increase when recognizing PTH. The proposed framework and methods are expected to work not only for Wu dialectal Chinese but also for other dialectal Chinese languages and even other languages. 相似文献

17.

PATTERN RECOGNITION APPROACHES FOR SPEECH-TO-SPEECH TRANSLATION

FRANCISCO CASACUBERTA ENRIQUE VIDAL ALBERTO SANCHIS JUAN-MIGUEL VILAR 《控制论与系统》2013,44(1):3-17

We propose a statistical approach to speech-to-speech translation that uses finite-state models in all levels. Acoustic hidden Markov models (HMMs) model the pronunciation of the input-language phonemes and words, while the input–output word mapping, along with the syntax of the output language, are jointly modeled by means a large stochastic finite-state transducer. This allows for a complete integration of all the models so that the translation process can be performed by searching for an optimal path of states through the integrated network. As in speech recognition, HMMs can be trained from an input-language speech corpus, and the translation model is learned automatically from a parallel (text) training corpus. This approach has been assessed in the framework of the EuTrans project, funded by the European Union. Extensive experiments have been carried out with speech-input translations from Spanish to English and from Italian to English in applications involving the interaction (by telephone) of a customer with the front desk of a hotel. A summary of the most relevant results is presented. 相似文献

18.

MLN-based Bangla ASR using context sensitive triphone HMM

Foyzul Hassan Mohammed Rokibul Alam Kotwal Ghulam Muhammad Mohammad Nurul Huda 《International Journal of Speech Technology》2011,14(3):183-191

Building a continuous speech recognizer for the Bangla (widely used as Bengali) language is a challenging task due to the unique inherent features of the language like long and short vowels and many instances of allophones. Stress and accent vary in spoken Bangla language from region to region. But in formal read Bangla speech, stress and accents are ignored. There are three approaches to continuous speech recognition (CSR) based on the sub-word unit viz. word, phoneme and syllable. Pronunciation of words and sentences are strictly governed by set of linguistic rules. Many attempts have been made to build continuous speech recognizers for Bangla for small and restricted tasks. However, medium and large vocabulary CSR for Bangla is relatively new and not explored. In this paper, the authors have attempted for building automatic speech recognition (ASR) method based on context sensitive triphone acoustic models. The method comprises three stages, where the first stage extracts phoneme probabilities from acoustic features using a multilayer neural network (MLN), the second stage designs triphone models to catch context of both sides and the final stage generates word strings based on triphone hidden Markov models (HMMs). The objective of this research is to build a medium vocabulary triphone based continuous speech recognizer for Bangla language. In this experimentation using Bangla speech corpus prepared by us, the recognizer provides higher word accuracy as well as word correct rate for trained and tested sentences with fewer mixture components in HMMs. 相似文献

19.

基于补全信息的篇章级神经机器翻译

张培张旭熊德意《中文信息学报》1986,34(7):60-67

对于句子级别的神经机器翻译,由于不考虑句子所处的上下文信息,往往存在句子语义表示不完整的问题。该文通过依存句法分析,对篇章中的每句话提取有效信息,再将提取出的信息,补全到源端句子中,使得句子的语义表示更加完整。该文在汉语-英语语言对上进行了实验,并针对篇章语料稀少的问题,提出了在大规模句子级别的平行语料上的训练方法。相比于基准系统,该文提出的方法获得了1.47个BLEU值的提高。实验表明,基于补全信息的篇章级神经机器翻译,可以有效地解决句子级别神经机器翻译语义表示不完整的问题。相似文献

20.

融合词典与对抗迁移的越南语事件实体识别

薛振宇线岩团余正涛高盛祥普浏清《计算机工程》2022,48(3):107-114+145

针对越南语事件标注语料稀缺且标注语料中未登陆词过多导致实体识别精度降低的问题,提出一种融合词典与对抗迁移的实体识别模型。将越南语作为目标语言,英语和汉语作为源语言,通过源语言的实体标注信息和双语词典提升目标语言的实体识别效果。采用词级别对抗迁移实现源语言与目标语言的语义空间共享,融合双语词典进行多粒度特征嵌入以丰富目标语言词的语义表征,再使用句子级别对抗迁移提取与语言无关的序列特征,最终通过条件随机场推理模块标注实体识别结果。在越南语新闻数据集上的实验结果表明,在源语言为英语和汉语的情况下,该模型相比主流的单语实体识别模型和迁移学习模型的实体识别性能有明显提升,并且在加入目标语义标注数据后,相比单语实体识别模型的F1值分别增加了19.61和18.73个百分点。相似文献