Similar documents
20 similar documents found (search took 31 ms)
1.
Although some progress has been made on the quality of Machine Translation in recent years, there is still significant potential for quality improvement. There has also been a paradigm shift in machine translation, from “classical” rule-based systems like METAL or LMT towards example-based or statistical MT. It seems to be time now to evaluate the progress, compare the results of these efforts, and draw conclusions for further improvements of MT quality. The paper starts with a comparison between statistical MT (henceforth: SMT) and rule-based MT (henceforth: RMT) systems, and describes the set-up and the evaluation results; the second section analyses the strengths and weaknesses of the respective approaches, and the third one discusses models of an architecture for a hybrid system.

2.
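Since the 1980s, rapid progress in key technologies such as speech recognition and synthesis, speech coding and real-time transmission, and multilingual machine translation has spurred the research and development of automatic translation telephone systems.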

3.
Multiple translations can be computed by one machine translation (MT) system or by different MT systems. We may assume that different MT systems make different errors because they use different models, generation strategies, or tweaks. One technique that has been investigated, inherited from automatic speech recognition (ASR), is so-called system combination, which combines the outputs of multiple MT systems. We combine the outputs of a phrase-based and an N-gram-based statistical MT (SMT) system using statistical criteria and additional rescoring features.

4.
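System Combination Based on WordNet Word Sense Disambiguation   Cited by: 3 (self-citations: 3, citations by others: 0)
Liu Yupeng (刘宇鹏), Li Sheng (李生), Zhao Tiejun (赵铁军). Acta Automatica Sinica (《自动化学报》), 2010, 36(11): 1575-1580
Confusion networks have recently shown strong performance in combining the outputs of multiple machine translation systems. However, because different translation systems produce different word orders, hypothesis alignment remains a key problem in building a confusion network, and previous alignment methods have not taken semantic information into account. To improve system combination performance, this paper proposes using word sense disambiguation (WSD) to guide alignment in the confusion network. The skeleton translation is selected by computing the similarity between sentences, and sentence similarity is computed with a maximum bipartite matching algorithm. To integrate the WordNet-based WSD method into the system, the paper modifies the translation error rate (TER) algorithm; experimental results show that the method outperforms the classic TER algorithm.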
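For readers unfamiliar with TER, the following is a minimal sketch of the word-level edit distance that underlies it. It is a simplification under stated assumptions: TER as usually defined also allows block shifts, which are omitted here, and nothing below reproduces the WordNet-based modification described in the abstract. The function names `word_edit_distance` and `shift_free_ter` are hypothetical.

```python
def word_edit_distance(hyp: str, ref: str) -> int:
    """Minimum number of word insertions, deletions and substitutions
    needed to turn hyp into ref (no block shifts, unlike full TER)."""
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))  # distances from the empty hypothesis
    for i, hw in enumerate(h, 1):
        cur = [i]  # deleting the first i hypothesis words
        for j, rw in enumerate(r, 1):
            cost = 0 if hw == rw else 1
            cur.append(min(prev[j] + 1,          # delete hw
                           cur[j - 1] + 1,       # insert rw
                           prev[j - 1] + cost))  # match or substitute
        prev = cur
    return prev[-1]


def shift_free_ter(hyp: str, ref: str) -> float:
    """Edit distance normalised by reference length (an assumed simplification)."""
    ref_len = len(ref.split())
    if ref_len == 0:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(hyp, ref) / ref_len
```

In a system combination setting, a distance of this kind could be used to align each hypothesis against the skeleton translation; how the paper's semantic information enters that alignment is not specified in the abstract.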

5.
System Combination for Machine Translation of Spoken and Written Language   Cited by: 1 (self-citations: 0, citations by others: 1)
This paper describes an approach for computing a consensus translation from the outputs of multiple machine translation (MT) systems. The consensus translation is computed by weighted majority voting on a confusion network, similarly to the well-established ROVER approach of Fiscus for combining speech recognition hypotheses. To create the confusion network, pairwise word alignments of the original MT hypotheses are learned using an enhanced statistical alignment algorithm that explicitly models word reordering. The context of a whole corpus of automatic translations rather than a single sentence is taken into account in order to achieve high alignment quality. The confusion network is rescored with a special language model, and the consensus translation is extracted as the best path. The proposed system combination approach was evaluated in the framework of the TC-STAR speech translation project. Up to six state-of-the-art statistical phrase-based translation systems from different project partners were combined in the experiments. Significant improvements in translation quality from Spanish to English and from English to Spanish in comparison with the best of the individual MT systems were achieved under official evaluation conditions.
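The voting step can be illustrated with a small sketch. It assumes the confusion network has already been built, so the alignment and language-model rescoring stages mentioned in the abstract are left out. The data layout (a list of slots, each holding `(system_index, word)` pairs), the empty-string convention for the empty word, and the name `consensus` are all hypothetical choices made for illustration.

```python
from collections import defaultdict

EPS = ""  # assumed marker for the empty word (a system contributes nothing to a slot)


def consensus(network, weights):
    """Pick the highest-weighted word in each slot of a confusion network.

    network: list of slots; each slot is a list of (system_index, word) pairs.
    weights: per-system voting weights, indexed by system_index.
    """
    output = []
    for slot in network:
        votes = defaultdict(float)
        for system, word in slot:
            votes[word] += weights[system]
        # Iterate candidates in sorted order so ties are broken deterministically.
        best = max(sorted(votes), key=votes.get)
        if best != EPS:
            output.append(best)
    return output


toy_network = [
    [(0, "the"), (1, "the"), (2, "a")],
    [(0, "big"), (1, EPS), (2, EPS)],
    [(0, "house"), (1, "house"), (2, "home")],
]
```

With weights `[0.6, 0.2, 0.2]` the first system outvotes the other two in the middle slot and "big" is kept; with `[0.4, 0.35, 0.25]` the empty word wins there and the slot is dropped.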

6.
This article provides an overview of the BITS Hebrew-English bibliographic translation system developed at CCL, UMIST. This is an experimental machine translation system for translating bibliographic references in which first generation translation techniques are combined with more recent developments in computer science. The experiment explores whether simple techniques can be used to achieve usable translations in this context. The input material, the pre-editing system and the general architecture of the BITS system are described and evaluated in view of current MT research.

7.
Development of a robust two-way real-time speech translation system exposes researchers and system developers to various challenges of machine translation (MT) and spoken language dialogues. The need for communicating in at least two different languages poses problems not present for a monolingual spoken language dialogue system, where no MT engine is embedded within the process flow. Integration of various component modules for real-time operation poses challenges not present for text translation. In this paper, we present the CCLINC (Common Coalition Language System at Lincoln Laboratory) English–Korean two-way speech translation system prototype trained on doctor–patient dialogues, which integrates various techniques to tackle the challenges of automatic real-time speech translation. Key features of the system include (i) language-independent meaning representation which preserves the hierarchical predicate–argument structure of an input utterance, providing a powerful mechanism for discourse understanding of utterances originating from different languages, word-sense disambiguation and generation of various word orders of many languages, (ii) adoption of the DARPA Communicator architecture, a plug-and-play distributed system architecture which facilitates integration of component modules and system operation in real time, and (iii) automatic acquisition of grammar rules and lexicons for easy porting of the system to different languages and domains. We describe these features in detail and present experimental results.

8.
Parallel integration of automatic speech recognition (ASR) models and statistical machine translation (MT) models is a largely unexplored research area compared with the large body of work on integrating them in series, i.e., speech-to-speech translation. Parallel integration of these models is possible when we have access to the speech of a target language text and to its corresponding source language text, as in a computer-assisted translation system. To our knowledge, only a few methods for integrating ASR models with MT models in parallel have been studied. In this paper, we systematically study a number of different translation models in the context of $N$-best list rescoring. As an alternative to $N$-best list rescoring, we use ASR word graphs in order to arrive at a tighter integration of ASR and MT models. The experiments are carried out on two tasks: English-to-German with an ASR vocabulary size of 17 K words, and Spanish-to-English with an ASR vocabulary of 58 K words. For the best method, the MT models reduce the ASR word error rate by 18% and 29% relative on the 17 K and the 58 K tasks, respectively.
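As a rough illustration of $N$-best list rescoring, the sketch below re-ranks ASR hypotheses by a weighted sum of the ASR log-score and an MT model log-score. This is a generic log-linear combination assumed for illustration, not the specific translation models studied in the paper; `rescore_nbest`, `toy_mt_score`, the weights, and the toy scores are all hypothetical.

```python
def rescore_nbest(nbest, mt_score, asr_weight=1.0, mt_weight=1.0):
    """Re-rank an ASR N-best list with an additional MT model score.

    nbest: list of (hypothesis_text, asr_log_score) pairs.
    mt_score: callable returning a log-score for a hypothesis given the
              known source-language text (hypothetical interface).
    Returns (combined_score, hypothesis_text) pairs, best first.
    """
    scored = [
        (asr_weight * asr_log_score + mt_weight * mt_score(text), text)
        for text, asr_log_score in nbest
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Toy data: the ASR alone prefers the second hypothesis.
toy_nbest = [("recognize speech", -4.0), ("wreck a nice beach", -3.5)]
toy_mt_scores = {"recognize speech": -1.0, "wreck a nice beach": -6.0}


def toy_mt_score(text):
    """Stand-in for a real MT model score; looks up a fixed table."""
    return toy_mt_scores[text]
```

Calling `rescore_nbest(toy_nbest, toy_mt_score)` ranks "recognize speech" first, whereas setting `mt_weight=0.0` falls back to the ASR ranking. In practice the weights would be tuned on held-out data.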

9.
The last few years have witnessed an increasing interest in hybridizing surface-based statistical approaches and rule-based symbolic approaches to machine translation (MT). Much of that work is focused on extending statistical MT systems with symbolic knowledge and components. In the brand of hybridization discussed here, we go in the opposite direction: adding statistical bilingual components to a symbolic system. Our base system is Generation-heavy machine translation (GHMT), a primarily symbolic asymmetrical approach that addresses the issue of Interlingual MT resource poverty in source-poor/target-rich language pairs by exploiting symbolic and statistical target-language resources. GHMT’s statistical components are limited to target-language models, which arguably makes it a simple form of a hybrid system. We extend the hybrid nature of GHMT by adding statistical bilingual components. We also describe the details of retargeting it to Arabic–English MT. The morphological richness of Arabic brings several challenges to the hybridization task. We conduct an extensive evaluation of multiple system variants. Our evaluation shows that this new variant of GHMT (a primarily symbolic system extended with monolingual and bilingual statistical components) has a higher degree of grammaticality than a phrase-based statistical MT system, where grammaticality is measured in terms of correct verb-argument realization and long-distance dependency translation.

10.
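Chen Yao (陈遥), Zhu Yuelong (朱跃龙), Feng Jun (冯钧). Computer Engineering (《计算机工程》), 2008, 34(11): 200-202
This paper introduces a machine translation model oriented toward information extraction and analyses a hydrological telegram (水情电报) translation system based on syntactic analysis under that model, together with its problems. Semantic analysis techniques from machine translation are used to address these problems, exploiting semantic information to raise the translation rate. A logical-semantic model of hydrological telegrams is built using logical semantics, and semantic information is used to translate several types of erroneous telegrams, which leads to a semantics-based hydrological telegram translation model. Evaluation and analysis of the system show that the translation rate, i.e. the degree of automation, has improved: the proportion of untranslatable telegrams fell from 2%-5% to about 0.96%.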

11.
In this paper we present a speech-to-speech (S2S) translation system called the BBN TransTalk that enables two-way communication between speakers of English and speakers who do not understand or speak English. The BBN TransTalk has been configured for several languages including Iraqi Arabic, Pashto, Dari, Farsi, Malay, Indonesian, and Levantine Arabic. We describe the key components of our system: automatic speech recognition (ASR), machine translation (MT), text-to-speech (TTS), dialog manager, and the user interface (UI). In addition, we present novel techniques for overcoming specific challenges in developing high-performing S2S systems. For ASR, we present techniques for dealing with lack of pronunciation and linguistic resources and effective modeling of ambiguity in pronunciations of words in these languages. For MT, we describe techniques for dealing with data sparsity as well as modeling context. We also present and compare different user confirmation techniques for detecting errors that can cause the dialog to drift or stall.

12.
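The electronic dictionary is the component of a machine translation system that holds the most information; its quality and size directly constrain the quality and range of application of machine translation. Unlike an ordinary electronic dictionary, each entry in a machine translation dictionary additionally carries part-of-speech information, semantic category information, idioms, and so on. Taking frequency statistics and frequency-distribution statistics as the criteria for including entries in a Uyghur-Chinese machine translation dictionary, this paper counts the commonly used words in Uyghur text, discusses the design of the dictionary, and uses BNF notation and Jackson diagrams to describe the entry information the dictionary should contain. Finally, it presents the concrete construction method of the dictionary, the entry ordering principle, the data structures of the index table and the attribute base, and the method for looking up dictionary information. Experiments show that the dictionary is fairly effective in resolving Uyghur lexical and structural ambiguity and in improving the accuracy of the Chinese translation.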

13.
NABU is a large, multilingual Natural Language Processing (NLP) system being developed at MCC for Human Interface applications. Although the NABU project is not considering Machine Translation (MT) as an implementation domain, it is not unreasonable to suppose that, given our multilingual orientation, some MT problems could be ameliorated if not solved by our theoretical approach. This paper addresses the problem of MT via the cognitive interlingua method, focusing on the representation of the lexicon in such a system, and its accommodation of various sources of knowledge for use by both man and machine: notably, in the latter case, morphology, syntax, and semantics. We propose a new theoretical framework, the NABU Word Lattice, as a means of integrating multiple sources of knowledge in a parsimonious fashion conducive to formal interpretation within, and the construction of, an MT system.

14.
This paper describes a new version of a speech into sign language translation system with new tools and characteristics for increasing its adaptability to a new task or a new semantic domain. This system is made up of a speech recognizer (for decoding the spoken utterance into a word sequence), a natural language translator (for converting a word sequence into a sequence of signs belonging to the sign language), and a 3D avatar animation module (for playing back the signs). In order to increase the system adaptability, this paper presents new improvements in all three main modules for automatically generating the task-dependent information from a parallel corpus: automatic generation of Spanish variants when generating the vocabulary and language model for the speech recognizer, an acoustic adaptation module for the speech recognizer, data-oriented language and translation models for the machine translator, and a list of signs to design. The avatar animation module includes a new editor for rapid design of the required signs. These developments have been necessary to reduce the effort when adapting a Spanish into Spanish sign language (LSE: Lengua de Signos Española) translation system to a new domain. The overall translation achieves a SER (Sign Error Rate) lower than 10% and a BLEU higher than 90%, while the effort for adapting the system to a new domain has been reduced by more than 50%.

15.
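As statistical methods have become the mainstream of machine translation research, scores in machine translation evaluations have risen steadily, confidence in and expectations of machine translation have grown, and society's demand for machine translation applications keeps increasing. However, the room for improving system performance with existing machine translation theories and methods is shrinking, and there is still a long way to go before users' actual needs are met. Faced with these expectations and demands, what path should machine translation take? To address this question, the Eighth National Workshop on Machine Translation (第八届全国机器翻译研讨会) held an in-depth discussion of the challenges and opportunities facing current machine translation research. This paper describes in detail the discussions in the workshop's six special sessions and carefully analyses and summarizes the opportunities and challenges facing machine translation research.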

16.
The Cunei machine translation platform is an open-source system for data-driven machine translation. Our platform is a synthesis of the traditional example-based MT (EBMT) and statistical MT (SMT) paradigms. What makes Cunei unique is that it measures the relevance of each translation instance with a distance function. This distance function, represented as a log-linear model, operates over one translation instance at a time and enables us to score the translation instance relative to the specified input and/or the current target hypothesis. We describe how our system, Cunei, scores features individually for each translation instance and how it efficiently performs parameter tuning over the entire feature space. We also compare Cunei with three other open-source MT systems (Moses, CMU-EBMT, and Marclator). In our experiments involving Korean–English and Czech–English translation, Cunei clearly outperforms the traditional EBMT and SMT systems.

17.
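Research on Core Technologies in a Multi-Strategy Chinese-Japanese Machine Translation System   Cited by: 1 (self-citations: 1, citations by others: 0)
Multi-strategy machine translation is one direction in which current machine translation systems are developing. This paper discusses the core techniques and algorithms used by each core translation subsystem of a multi-strategy Chinese-Japanese machine translation system: a Chinese analysis subsystem that uses lexical analysis, syntactic analysis, and semantic role labelling; a translation-memory-based machine translation subsystem that uses a dual-index technique; an example-pattern-based machine translation subsystem that uses syntax tree fragments as templates; and a machine translation subsystem that combines valency patterns with segment analysis (断段分析). Tests of the translation memory subsystem show that it is efficient; the example-pattern subsystem reached 99% accuracy in a closed test on 1,559 sentences and 85% accuracy in an open test on 1,500 sentences; the valency-pattern subsystem reached 89% accuracy in a test on 3,059 sentences.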

18.
Although corpus-based approaches to machine translation (MT) are growing in interest, they are not applicable when the translation involves less-resourced language pairs for which there are no parallel corpora available; in those cases, the rule-based approach is the only applicable solution. Most rule-based MT systems make use of part-of-speech (PoS) taggers to solve the PoS ambiguities in the source-language texts to translate; those MT systems require accurate PoS taggers to produce reliable translations in the target language (TL). The standard statistical approach to PoS ambiguity resolution (or tagging) uses hidden Markov models (HMM) trained in a supervised way from hand-tagged corpora, an expensive resource not always available, or in an unsupervised way through the Baum-Welch expectation-maximization algorithm; both methods use information only from the language being tagged. However, when tagging is considered as an intermediate task for the translation procedure, that is, when the PoS tagger is to be embedded as a module within an MT system, information from the TL can be used (in an unsupervised way) in the training phase to increase the translation quality of the whole MT system. This paper presents a method to train HMM-based PoS taggers to be used in MT; the new method uses not only information from the source language (SL), as general-purpose methods do, but also information from the TL and from the remaining modules of the MT system in which the PoS tagger is to be embedded. We find that the translation quality of the MT system embedding a PoS tagger trained in an unsupervised manner through this new method is clearly better than that of the same MT system embedding a PoS tagger trained through the Baum-Welch algorithm, and comparable to that obtained by embedding a PoS tagger trained in a supervised way from hand-tagged corpora.

19.
Traditionally, human–machine interaction to reach an improved machine translation (MT) output takes place ex-post and consists of correcting this output. In this work, we investigate other modes of intervention in the MT process. We propose a Pre-Edition protocol that involves: (a) the detection of MT translation difficulties; (b) the resolution of those difficulties by a human translator, who provides their translations (pre-translation); and (c) the integration of the obtained information prior to the automatic translation. This approach can meet individual interaction preferences of certain translators and can be particularly useful for production environments, where more control over output quality is needed. Early resolution of translation difficulties can prevent downstream errors, thus improving the final translation quality “for free”. We show that translation difficulty can be reliably predicted for English for various source units. We demonstrate that the pre-translation information can be successfully exploited by an MT system and that the indirect effects are genuine, accounting for around 16% of the total improvement. We also provide a study of the human effort involved in the resolution process.

20.
Balashov, Yuri. Minds and Machines, 2020, 30(3): 349-383

The rapid development of natural language processing in the last three decades has drastically changed the way professional translators do their work. Nowadays most of them use computer-assisted translation (CAT) or translation memory (TM) tools whose evolution has been overshadowed by the much more sensational development of machine translation (MT) systems, with which TM tools are sometimes confused. These two language technologies now interact in mutually enhancing ways, and their increasing role in human translation has become a subject of behavioral studies. Philosophers and linguists, however, have been slow in coming to grips with these important developments. The present paper seeks to fill in this lacuna. I focus on the semantic aspects of the highly distributed human–computer interaction in the CAT process which presents an interesting case of an extended cognitive system involving a human translator, a TM tool, an MT engine, and sometimes other human translators or editors. Considered as a whole, such a system is engaged in representing the linguistic meaning of the source document in the target language. But the roles played by its various components, natural as well as artificial, are far from trivial, and the division of linguistic labor between them throws new light on the familiar notions that were initially inspired by rather different phenomena in the philosophy of language, mind, and cognitive science.

