首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
Statistical machine translation systems are usually trained on large amounts of bilingual text (used to learn a translation model), and also large amounts of monolingual text in the target language (used to train a language model). In this article we explore the use of semi-supervised model adaptation methods for the effective use of monolingual data from the source language in order to improve translation quality. We propose several algorithms with this aim, and present the strengths and weaknesses of each one. We present detailed experimental evaluations on the French–English EuroParl data set and on data from the NIST Chinese–English large-data track. We show a significant improvement in translation quality on both tasks.  相似文献   

2.
In this paper, we present the building of various language resources for a multi-engine bi-directional English-Filipino Machine Translation (MT) system. Since linguistics information on Philippine languages are available, but as of yet, the focus has been on theoretical linguistics and little is done on the computational aspects of these languages, attempts are reported here on the manual construction of these language resources such as the grammar, lexicon, morphological information, and the corpora which were literally built from almost non-existent digital forms. Due to the inherent difficulties of manual construction, we also discuss our experiments on various technologies for automatic extraction of these resources to handle the intricacies of the Filipino language, designed with the intention of using them for the MT system. To implement the different MT engines and to ensure the improvement of translation quality, other language tools (such as the morphological analyzer and generator, and the part of speech tagger) were developed.  相似文献   

3.
Neural Computing and Applications - Deep neural networks (DNN) have achieved great success in several research areas like information retrieval, image processing, and speech recognition. In the...  相似文献   

4.
依赖于大规模的平行语料库,神经机器翻译在某些语言对上已经取得了巨大的成功。无监督神经机器翻译UNMT又在一定程度上解决了高质量平行语料库难以获取的问题。最近的研究表明,跨语言模型预训练能够显著提高UNMT的翻译性能,其使用大规模的单语语料库在跨语言场景中对深层次上下文信息进行建模,获得了显著的效果。进一步探究基于跨语言预训练的UNMT,提出了几种改进模型训练的方法,针对在预训练之后UNMT模型参数初始化质量不平衡的问题,提出二次预训练语言模型和利用预训练模型的自注意力机制层优化UNMT模型的上下文注意力机制层2种方法。同时,针对UNMT中反向翻译方法缺乏指导的问题,尝试将Teacher-Student框架融入到UNMT的任务中。实验结果表明,在不同语言对上与基准系统相比,本文的方法最高取得了0.8~2.08个百分点的双语互译评估(BLEU)值的提升。  相似文献   

5.
针对汉语—维吾尔语的统计机器翻译系统中存在的语义无关性问题,提出基于神经网络机器翻译方法的双语关联度优化模型。该模型利用注意力机制捕获词对齐信息,引入双语短语间的语义相关性和内部词汇匹配度,预测双语短语的生成概率并将其作为双语关联度,以优化统计翻译模型中的短语翻译得分。在第十一届全国机器翻译研讨会(CWMT 2015)汉维公开机器翻译数据集上的实验结果表明,与基线系统相比,在使用较小规模的训练数据和词汇表的条件下,所提方法可以有效地同时提高短语级别和句子级别的机器翻译任务性能,分别获得最高2.49和0.59的BLEU值提升。  相似文献   

6.
针对词汇化调序模型在机器翻译中存在的上下文无关性及稀疏性问题,提出了基于语义内容进行调序方向及概率预测的调序表重构模型。首先,使用连续分布式表示方法获取调序规则的特征向量;然后,通过循环神经网络(RNN)对于向量化表示的调序规则进行调序方向及概率预测;最后,过滤并重构调序表,赋予原始调序规则更加合理的调序概率分布值,提高调序模型中调序信息的准确度,同时降低调序表规模,提高后续解码速率。实验结果表明,将调序表重构模型应用至汉维机器翻译任务中,BLEU值可以获得0.39的提升。  相似文献   

7.
无监督神经机器翻译仅利用大量单语数据,无需平行数据就可以训练模型,但是很难在2种语系遥远的语言间建立联系。针对此问题,提出一种新的不使用平行句对的神经机器翻译训练方法,使用一个双语词典对单语数据进行替换,在2种语言之间建立联系,同时使用词嵌入融合初始化和双编码器融合训练2种方法强化2种语言在同一语义空间的对齐效果,以提高机器翻译系统的性能。实验表明,所提方法在中-英与英-中实验中比基线无监督翻译系统的BLEU值分别提高2.39和1.29,在英-俄和英-阿等单语实验中机器翻译效果也显著提高了。  相似文献   

8.
The study of comparative stylistics attempts to catalogue and explain the differences in style between languages. Rules of comparative stylistics are commonly presented in textbooks of translation as simple rules of thumb, but if we hope to incorporate a knowledge of comparative stylistics into machine translation systems, we must take a more systematic approach. We develop a formal model of comparative syntactic stylistics to be used as a component of a general computational theory of style. We adapt textbook rules of human translation and study a small corpus of French-English translations to determine how these informal rules can be represented in our model as formal rules of translation. Our model of comparative stylistics could be implemented in a machine translation system, enabling the system to make a more informed decision about possible translation choices and their potential stylistic effects.  相似文献   

9.
Dekai Wu 《Machine Translation》2005,19(3-4):213-227
We offer a perspective on EBMT from a statistical MT standpoint, by developing a three-dimensional MT model space based on three pairs of definitions: (1) logical versus statistical MT, (2) schema-based versus example-based MT, and (3) lexical versus compositional MT. Within this space we consider the interplay of three key ideas in the evolution of transfer, example-based, and statistical approaches to MT. We depict how all translation models face these issues in one way or another, regardless of the school of thought, and suggest where the real questions for the future may lie.  相似文献   

10.
11.
The availability of machine-readable bilingual linguistic resources is crucial not only for rule-based machine translation but also for other applications such as cross-lingual information retrieval. However, the building of such resources (bilingual single-word and multi-word correspondences, translation rules) demands extensive manual work, and, as a consequence, bilingual resources are usually more difficult to find than “shallow” monolingual resources such as morphological dictionaries or part-of-speech taggers, especially when they involve a less-resourced language. This paper describes a methodology to build automatically both bilingual dictionaries and shallow-transfer rules by extracting knowledge from word-aligned parallel corpora processed with shallow monolingual resources (morphological analysers, and part-of-speech taggers). We present experiments for Brazilian Portuguese–Spanish and Brazilian Portuguese–English parallel texts. The results show that the proposed methodology can enable the rapid creation of valuable computational resources (bilingual dictionaries and shallow-transfer rules) for machine translation and other natural language processing tasks).  相似文献   

12.
This article describes a method that successfully exploits syntactic features for n-best translation candidate reranking using perceptrons. We motivate the utility of syntax by demonstrating the superior performance of parsers over n-gram language models in differentiating between Statistical Machine Translation output and human translations. Our approach uses discriminative language modelling to rerank the n-best translations generated by a statistical machine translation system. The performance is evaluated for Arabic-to-English translation using NIST’s MT-Eval benchmarks. While deep features extracted from parse trees do not consistently help, we show how features extracted from a shallow Part-of-Speech annotation layer outperform a competitive baseline and a state-of-the-art comparative reranking approach, leading to significant BLEU improvements on three different test sets.  相似文献   

13.
针对汉维统计机器翻译中维吾尔语具有长距离依赖问题和语言模型具有数据稀疏现象,提出了一种基于泛化的维吾尔语语言模型.该模型借助维吾尔语语言模型的训练过程中生成的文本,结合字符串相似度算法,取相似的维文字符串经过归一化处理抽取规则,计算规则的参数值,利用规则给测试集在解码过程中生成n-best译文重新评分,将评分最高的译文作为最佳译文.实验结果表明,泛化语言模型减少了存储空间,同时,规则的合理使用有效地提高了翻译译文的质量.  相似文献   

14.
Word reordering is one of the challengeable problems of machine translation. It is an important factor of quality and efficiency of machine translation systems. In this paper, we introduce a novel reordering model based on an innovative structure, named, phrasal dependency tree. The phrasal dependency tree is a modern syntactic structure which is based on dependency relationships between contiguous non-syntactic phrases. The proposed model integrates syntactical and statistical information in the context of log-linear model aimed at dealing with the reordering problems. It benefits from phrase dependencies, translation directions (orientations) and translation discontinuity between translated phrases. In comparison with well-known and popular reordering models such as distortion, lexicalised and hierarchical models, the experimental study demonstrates the superiority of our model in terms of translation quality. Performance is evaluated for Persian → English and English → German translation tasks using Tehran parallel corpus and WMT07 benchmarks, respectively. The results report 1.54/1.7 and 1.98/3.01 point improvements over the baseline in terms of BLEU/TER metrics on Persian → English and German → English translation tasks, respectively. On average our model retrieved a significant impact on precision with comparable recall value with respect to the lexicalised and distortion models.  相似文献   

15.
METIS-II was an EU-FET MT project running from October 2004 to September 2007, which aimed at translating free text input without resorting to parallel corpora. The idea was to use “basic” linguistic tools and representations and to link them with patterns and statistics from the monolingual target-language corpus. The METIS-II project has four partners, translating from their “home” languages Greek, Dutch, German, and Spanish into English. The paper outlines the basic ideas of the project, their implementation, the resources used, and the results obtained. It also gives examples of how METIS-II has continued beyond its lifetime and the original scope of the project. On the basis of the results and experiences obtained, we believe that the approach is promising and offers the potential for development in various directions.  相似文献   

16.
NEC has been developing a Japanese-English bi-directional machine translation system called VENUS (Vehicle for Natural language Understanding & Synthesis) in order to reduce the increasing cost of the manual translation of vast amounts of in-house technical documents. In addition, a translation support subsystem has been developed on the basis of VENUS, and extended to have the requisite facilities to prepare translated documents, such as document entry, editing, translation, printing, management, etc.This paper briefly introduces the current status of the VENUS translation system, and the basic idea for the system development.  相似文献   

17.
Machine translation is traditionally formulated as the transduction of strings of words from the source to the target language. As a result, additional lexical processing steps such as morphological analysis, transliteration, and tokenization are required to process the internal structure of words to help cope with data-sparsity issues that occur when simply dividing words according to white spaces. In this paper, we take a different approach: not dividing lexical processing and translation into two steps, but simply viewing translation as a single transduction between character strings in the source and target languages. In particular, we demonstrate that the key to achieving accuracies on a par with word-based translation in the character-based framework is the use of a many-to-many alignment strategy that can accurately capture correspondences between arbitrary substrings. We build on the alignment method proposed in Neubig et al. (Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics. Portland, Oregon, pp. 632–641, 2011), improving its efficiency and accuracy with a focus on character-based translation. Using a many-to-many aligner imbued with these improvements, we demonstrate that the traditional framework of phrase-based machine translation sees large gains in accuracy over character-based translation with more naive alignment methods, and achieves comparable results to word-based translation for two distant language pairs.  相似文献   

18.
Although there is no machine learning technique that fully meets human requirements, finding a quick and efficient translation mechanism has become an urgent necessity, due to the differences between the languages spoken in the world’s communities and the vast development that has occurred worldwide, as each technique demonstrates its own advantages and disadvantages. Thus, the purpose of this paper is to shed light on some of the techniques that employ machine translation available in literature, to encourage researchers to study these techniques. We discuss some of the linguistic characteristics of the Arabic language. Features of Arabic that are related to machine translation are discussed in detail, along with possible difficulties that they might present. This paper summarizes the major techniques used in machine translation from Arabic into English, and discusses their strengths and weaknesses.  相似文献   

19.
Due to the rapid advancement of both computer technology and linguistic theory, machine translation systems are now coming into practical use.Fujitsu has two machine translation systems, ATLAS-I is a syntax-based machine translation system which translates English into Japanese. ATLAS II is a semantic-based system which aims at high quality multilingual translation. In this paper, both the ATLAS-I and ATLAS II translation mechanisms are explained.  相似文献   

20.
This paper provides an overview of the KBMT-89 project at Carmegie Mellon University's Center for Machine Translation, as well therefore of the special number of this journal, which reports on the project. The knowledge-based approach to machine translation is presented and defended in a historical context. Various components of the system, key parts of which are described in subsequent papers of the issue, are introduced and paired with their computational motivations.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号