首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
2.
3.
In this paper, we develop an approach called syntax-based reordering (SBR) to handling the fundamental problem of word ordering for statistical machine translation (SMT). We propose to alleviate the word order challenge including morpho-syntactical and statistical information in the context of a pre-translation reordering framework aimed at capturing short- and long-distance word distortion dependencies. We examine the proposed approach from the theoretical and experimental points of view discussing and analyzing its advantages and limitations in comparison with some of the state-of-the-art reordering methods.In the final part of the paper, we describe the results of applying the syntax-based model to translation tasks with a great need for reordering (Chinese-to-English and Arabic-to-English). The experiments are carried out on standard phrase-based and alternative N-gram-based SMT systems. We first investigate sparse training data scenarios, in which the translation and reordering models are trained on a sparse bilingual data, then scaling the method to a large training set and demonstrating that the improvement in terms of translation quality is maintained.  相似文献   

4.
Machine translation is traditionally formulated as the transduction of strings of words from the source to the target language. As a result, additional lexical processing steps such as morphological analysis, transliteration, and tokenization are required to process the internal structure of words to help cope with data-sparsity issues that occur when simply dividing words according to white spaces. In this paper, we take a different approach: not dividing lexical processing and translation into two steps, but simply viewing translation as a single transduction between character strings in the source and target languages. In particular, we demonstrate that the key to achieving accuracies on a par with word-based translation in the character-based framework is the use of a many-to-many alignment strategy that can accurately capture correspondences between arbitrary substrings. We build on the alignment method proposed in Neubig et al. (Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics. Portland, Oregon, pp. 632–641, 2011), improving its efficiency and accuracy with a focus on character-based translation. Using a many-to-many aligner imbued with these improvements, we demonstrate that the traditional framework of phrase-based machine translation sees large gains in accuracy over character-based translation with more naive alignment methods, and achieves comparable results to word-based translation for two distant language pairs.  相似文献   

5.
Learning finite-state models for machine translation   总被引:1,自引:0,他引:1  
In formal language theory, finite-state transducers are well-know models for simple “input-output” mappings between two languages. Even if more powerful, recursive models can be used to account for more complex mappings, it has been argued that the input-output relations underlying most usual natural language pairs can essentially be modeled by finite-state devices. Moreover, the relative simplicity of these mappings has recently led to the development of techniques for learning finite-state transducers from a training set of input-output sentence pairs of the languages considered. In the last years, these techniques have lead to the development of a number of machine translation systems. Under the statistical statement of machine translation, we overview here how modeling, learning and search problems can be solved by using stochastic finite-state transducers. We also review the results achieved by the systems we have developed under this paradigm. As a main conclusion of this review we argue that, as task complexity and training data scarcity increase, those systems which rely more on statistical techniques tend produce the best results. This work was partially supported by the European Union project TT2 (IST-2001-32091) and by the Spanish project ITEFTE (TIC 2003-08681-C02-02). Editor: Georgios Paliouras and Yasubumi Sakakibara  相似文献   

6.
7.
This paper provides an overview of the KBMT-89 project at Carmegie Mellon University's Center for Machine Translation, as well therefore of the special number of this journal, which reports on the project. The knowledge-based approach to machine translation is presented and defended in a historical context. Various components of the system, key parts of which are described in subsequent papers of the issue, are introduced and paired with their computational motivations.  相似文献   

8.
Ideally, we might hope to improve the performance of our MT systems by improving the system, but it might be even more important to improve performance by looking for a more appropriate application. A survey of the literature on evaluation of MT systems seems to suggest that the success of the evaluation often depends very strongly on the selection of an appropriate application. If the application is well-chosen, then it often becomes fairly clear how the system should be evaluated. Moreover, the evaluation is likely to make the system look good. Conversely, if the application is not clearly identified (or worse, if the application is poorly chosen), then it is often very difficult to find a satisfying evaluation paradigm. We begin our discussion with a brief review of some evaluation metrics that have been tried in the past and conclude that it is difficult to identify a satisfying evaluation paradigm that will make sense over all possible applications. It is probably wise to identify the application first, and then we will be in a much better position to address evaluation questions. The discussion will then turn to the main point, an essay on how to pick a good niche application for state-of-the-art (crummy) machine translation.  相似文献   

9.
Using virtual environments to train firefighters   总被引:3,自引:0,他引:3  
Using virtual environments for training and mission rehearsal gives US Navy firefighters an edge in fighting real fires. A test run on the ex-USS Shadwell measured the improvement. The results suggest that virtual environments serve effectively for training and mission rehearsal for shipboard firefighting. VE training provides a flexible environment where a firefighter can not only learn an unfamiliar part of the ship, but also practice tactics and procedures for fighting a fire by interacting with simulated smoke and fire without risking lives or property. These tests proved a successful first step in developing a new training technology for shipboard firefighting based on immersive virtual environments. The tests also indicated potential areas for improvement, requiring additional research. User interaction techniques for manipulating objects in VEs need further study, along with usability studies to determine their effectiveness or utility. Other areas that could enhance VE training systems include more natural and intuitive I/O devices such as 3D sound, speech and natural language input, integrated multimedia and hypermedia instruction, and multi user interaction  相似文献   

10.
11.
Assessing the quality of candidate translations involves diverse linguistic facets. However, most automatic evaluation methods in use today rely on limited quality assumptions, such as lexical similarity. This introduces a bias in the development cycle which in some cases has been reported to carry very negative consequences. In order to tackle this methodological problem, we explore a novel path towards heterogeneous automatic Machine Translation evaluation. We have compiled a rich set of specialized similarity measures operating at different linguistic dimensions and analyzed their individual and collective behaviour over a wide range of evaluation scenarios. Results show that measures based on syntactic and semantic information are able to provide more reliable system rankings than lexical measures, especially when the systems under evaluation are based on different paradigms. At the sentence level, while some linguistic measures perform better than most lexical measures, some others perform substantially worse, mainly due to parsing problems. Their scores are, however, suitable for combination, yielding a substantially improved evaluation quality.  相似文献   

12.
Machine Translation - Self-attention-based encoder-decoder frameworks have drawn increasing attention in recent years. The self-attention mechanism generates contextual representations by attending...  相似文献   

13.
We propose a novel approach to cross-lingual language model and translation lexicon adaptation for statistical machine translation (SMT) based on bilingual latent semantic analysis. Bilingual LSA enables latent topic distributions to be efficiently transferred across languages by enforcing a one-to-one topic correspondence during training. Using the proposed bilingual LSA framework, model adaptation can be performed by, first, inferring the topic posterior distribution of the source text and then applying the inferred distribution to an n-gram language model of the target language and translation lexicon via marginal adaptation. The background phrase table is enhanced with the additional phrase scores computed using the adapted translation lexicon. The proposed framework also features rapid bootstrapping of LSA models for new languages based on a source LSA model of another language. Our approach is evaluated on the Chinese–English MT06 test set using the medium-scale SMT system and the GALE SMT system measured in BLEU and NIST scores. Improvement in both scores is observed on both systems when the adapted language model and the adapted translation lexicon are applied individually. When the adapted language model and the adapted translation lexicon are applied simultaneously, the gain is additive. At the 95% confidence interval of the unadapted baseline system, the gain in both scores is statistically significant using the medium-scale SMT system, while the gain in the NIST score is statistically significant using the GALE SMT system.  相似文献   

14.
Statistical machine translation systems are usually trained on large amounts of bilingual text (used to learn a translation model), and also large amounts of monolingual text in the target language (used to train a language model). In this article we explore the use of semi-supervised model adaptation methods for the effective use of monolingual data from the source language in order to improve translation quality. We propose several algorithms with this aim, and present the strengths and weaknesses of each one. We present detailed experimental evaluations on the French–English EuroParl data set and on data from the NIST Chinese–English large-data track. We show a significant improvement in translation quality on both tasks.  相似文献   

15.
Explicit length modelling has been previously explored in statistical pattern recognition with successful results. In this paper, two length models along with two parameter estimation methods and two alternative parametrisations for statistical machine translation (SMT) are presented. More precisely, we incorporate explicit bilingual length modelling in a state-of-the-art log-linear SMT system as an additional feature function in order to prove the contribution of length information. Finally, a systematic evaluation on reference SMT tasks considering different language pairs proves the benefits of explicit length modelling.  相似文献   

16.
We present a collection of parallel treebanks that have been automatically aligned on both the terminal and the non-terminal constituent level for use in syntax-based machine translation. We describe how they were constructed and applied to a syntax- and example-based machine translation system called Parse and Corpus-Based Machine Translation (PaCo-MT). For the language pair Dutch to English, we present non-terminal alignment evaluation scores for a variety of tree alignment approaches. Finally, based on the parallel treebanks created by these approaches, we evaluate the MT system itself and compare the scores with those of Moses, a current state-of-the-art statistical MT system, when trained on the same data.  相似文献   

17.
This paper presents a new method for utilizing translator knowledge bases for machine translation systems. Translator knowledge to be stored and utilized consists of translationally equivalent pattern pairs: surface-level phrasal, clausal, and sentential correspondences between the source and target languages. This knowledge will be utilized to translate domain-specific idiomatic, nonstandard, or ungrammatical expressions. The proposed method has been implemented in an adaptive English to Japanese machine translation system, HICATS/EJ, as one of its customization facilities.  相似文献   

18.
刘颖  姜巍 《计算机工程与应用》2012,48(32):98-101,146
对齐短语是决定统计机器翻译系统质量的核心模块。提出基于短语结构树的层次短语模型,这是利用串-树模型的思想对层次短语模型的扩展。基于短语结构树的层次短语模型是在双语对齐短语的基础之上结合英语短语结构树抽取翻译规则,并利用启发式策略获得翻译规则的扩展句法标记。采用翻译规则的统计机器翻译系统在不同数据集上具有稳定的翻译结果,在训练集和测试集的平均BlEU评分高于短语模型和层次短语模型的BLEU评分。  相似文献   

19.
This paper summarizes ongoing efforts to provide software infrastructure (and methodology) for open-source machine translation that combines a deep semantic transfer approach with advanced stochastic models. The resulting infrastructure combines precise grammars for parsing and generation, a semantic-transfer based translation engine and stochastic controllers. We provide both a qualitative and quantitative experience report from instantiating our general architecture for Japanese–English MT using only open-source components, including HPSG-based grammars of English and Japanese.  相似文献   

20.
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号