Found 20 similar documents (search time: 0 ms)
1.
Statistical machine translation systems are usually trained on large amounts of bilingual text (used to learn a translation
model), and also large amounts of monolingual text in the target language (used to train a language model). In this article
we explore the use of semi-supervised model adaptation methods for the effective use of monolingual data from the source language
in order to improve translation quality. We propose several algorithms with this aim, and present the strengths and weaknesses
of each one. We present detailed experimental evaluations on the French–English EuroParl data set and on data from the NIST
Chinese–English large-data track. We show a significant improvement in translation quality on both tasks.
2.
3.
This paper proposes a novel method for phrase-based statistical machine translation based on the use of a pivot language.
To translate between languages L_s and L_t with limited bilingual resources, we bring in a third language, L_p, called the
pivot language. For the language pairs L_s–L_p and L_p–L_t, there exist large bilingual corpora. Using only L_s–L_p and
L_p–L_t bilingual corpora, we can build a translation model for L_s–L_t. The advantage of this method lies in the fact that
we can perform translation between L_s and L_t even if there is no bilingual corpus available for this language pair. Using
BLEU as a metric, our pivot language approach significantly outperforms the standard model trained on a small bilingual
corpus. Moreover, with a small L_s–L_t bilingual corpus available, our method can further improve translation quality by
using the additional L_s–L_p and L_p–L_t bilingual corpora.
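The abstract above does not spell out how the L_s–L_t model is built from the two pivot-side corpora. One common way to realise the idea, shown here only as a minimal sketch and not necessarily the authors' exact formulation, is to marginalise over pivot phrases: P(t | s) ≈ Σ_p P(p | s) · P(t | p). The function name `triangulate`, the table layout, and the toy phrases and probabilities are all assumptions made for illustration.

```python
from collections import defaultdict


def triangulate(src_to_pivot, pivot_to_tgt):
    """Combine two phrase tables through a pivot language.

    src_to_pivot[s][p] is an estimate of P(p | s).
    pivot_to_tgt[p][t] is an estimate of P(t | p).
    Returns a table with src_to_tgt[s][t] = sum over p of P(p | s) * P(t | p).
    """
    src_to_tgt = defaultdict(lambda: defaultdict(float))
    for s, pivots in src_to_pivot.items():
        for p, prob_p_given_s in pivots.items():
            # Pivot phrases missing from the second table contribute nothing.
            for t, prob_t_given_p in pivot_to_tgt.get(p, {}).items():
                src_to_tgt[s][t] += prob_p_given_s * prob_t_given_p
    return {s: dict(targets) for s, targets in src_to_tgt.items()}


# Toy tables with made-up probabilities (hypothetical data).
src_to_pivot = {"maison": {"house": 0.8, "home": 0.2}}
pivot_to_tgt = {
    "house": {"Haus": 0.9, "Gebäude": 0.1},
    "home": {"Heim": 0.7, "Haus": 0.3},
}
table = triangulate(src_to_pivot, pivot_to_tgt)
```

In this toy case "Haus" is reachable through both pivot phrases, so its probability accumulates the two paths (0.8 × 0.9 + 0.2 × 0.3). A real system would also need to prune the resulting table, since the number of induced phrase pairs grows quickly.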
4.
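Phrase alignment is the core module that determines the quality of a statistical machine translation system. This paper proposes a hierarchical phrase model based on phrase-structure trees, which extends the hierarchical phrase model using ideas from the string-to-tree model. The model extracts translation rules by combining bilingual aligned phrases with English phrase-structure trees, and uses heuristic strategies to obtain extended syntactic labels for the translation rules. A statistical machine translation system using these translation rules gives stable translation results on different data sets, and its average BLEU scores on the training and test sets are higher than those of the phrase-based model and the hierarchical phrase model.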
5.
6.
7.
Gregor Thurmair 《Computers and the Humanities》1991,25(2-3):115-128
This paper describes developments in the area of machine translation (MT). First, the paper gives an overview of developments in Germany in general; then, special problems are discussed. The system taken as an example is METAL (Machine Translation and Analysis of Natural Language), where recent development work has centered around two main topics. (i) Efforts have been made to make the system really multilingual. The German-to-English prototype had to be expanded, some system components had to be readjusted, and additional problems had to be solved. Currently, analysis and synthesis components for German, English, French, Spanish, and Dutch are under development. All these languages use a common system kernel and a standard interface structure. (ii) The system had to be made user-friendly. This was an even more important task as, up to now, MT systems have not been well accepted by users. METAL tries to be more realistic, and also tries to support the main user interfaces in a much better way than has been done before. This is based on the conviction that there are several parameters which determine the real success of an MT system. It is not just translation quality which is decisive, it is also the integration of an MT system into the whole process of preparing and translating documents. Gregor Thurmair is head of the Linguistics Department at Siemens Nixdorf Information Systems and project leader of the machine translation group, METAL. He is involved in projects in information retrieval (morphological analysis), speech understanding (parsing, semantics) and machine translation (METAL system). He has presented papers on morphology, semantics in speech understanding, transfer problems in MT, and grammar checking.
9.
We describe a novel approach to MT that combines the strengths of the two leading corpus-based approaches: Phrasal SMT and
EBMT. We use a syntactically informed decoder and reordering model based on the source dependency tree, in combination with
conventional SMT models to incorporate the power of phrasal SMT with the linguistic generality available in a parser. We show
that this approach significantly outperforms a leading string-based Phrasal SMT decoder and an EBMT system. We present results
from two radically different language pairs, and investigate the sensitivity of this approach to parse quality by using two
distinct parsers and oracle experiments. We also validate our automated BLEU scores with a small human evaluation.
10.
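To address the problem of semantic irrelevance in Chinese-Uyghur statistical machine translation systems, a bilingual relatedness optimization model based on neural machine translation is proposed. The model uses an attention mechanism to capture word alignment information, introduces the semantic relatedness between bilingual phrases and the degree of internal lexical matching, and predicts the generation probability of bilingual phrases. This probability serves as the bilingual relatedness score and is used to optimize the phrase translation scores in the statistical translation model. Experiments on the public Chinese-Uyghur machine translation data set of the 11th China Workshop on Machine Translation (CWMT 2015) show that, compared with the baseline system and with relatively small training data and vocabulary, the proposed method effectively improves performance on both phrase-level and sentence-level machine translation tasks, with maximum BLEU gains of 2.49 and 0.59, respectively.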
11.
12.
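To address the insufficient reordering capability of phrase-based statistical machine translation, a syntactic parser is used to parse the input Chinese sentences, a converter then performs reordering operations and pre-translates certain types of phrases, and finally the phrase-based statistical machine translation decoder produces the translation. The evaluation focuses on the effect of reordering and pre-translation for complex attributive modifiers introduced by "的" (de), prepositional phrases, specific collocation phrases, and localizer phrases. Experimental results show that these reordering and pre-translation operations significantly improve the BLEU score of the English output of phrase-based statistical machine translation.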
13.
Word reordering is one of the challenging problems of machine translation. It is an important factor in the quality and efficiency of machine translation systems. In this paper, we introduce a novel reordering model based on an innovative structure named the phrasal dependency tree. The phrasal dependency tree is a modern syntactic structure which is based on dependency relationships between contiguous non-syntactic phrases. The proposed model integrates syntactic and statistical information in the context of a log-linear model aimed at dealing with reordering problems. It benefits from phrase dependencies, translation directions (orientations) and translation discontinuity between translated phrases. In comparison with well-known and popular reordering models such as distortion, lexicalised and hierarchical models, the experimental study demonstrates the superiority of our model in terms of translation quality. Performance is evaluated for Persian → English and English → German translation tasks using Tehran parallel corpus and WMT07 benchmarks, respectively. The results report 1.54/1.7 and 1.98/3.01 point improvements over the baseline in terms of BLEU/TER metrics on Persian → English and German → English translation tasks, respectively. On average our model had a significant impact on precision, with a comparable recall value, with respect to the lexicalised and distortion models.
14.
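The syntactic divergence between source and target languages has an important effect on statistical machine translation (SMT) performance. Building on phrase-based Chinese-English SMT, a source-language pre-reordering method enhanced with N-best syntactic knowledge is proposed. First, N-best parsing is performed on the source input sentence, and statistical probabilities are computed to obtain highly reliable subtree structures; an initial set of reordering rules is then extracted from these reliable subtree structures according to word alignment information. Two strategies are used to optimize the initial rule set: rule filtering by derivation from Chinese and English syntactic knowledge, and a rule-probability threshold mechanism. Next, to reduce reordering inside phrases and preserve local phrase fluency, the source-language phrase translation table is used as a constraint so that reordering takes place only between phrase chunks. Finally, the parse trees of source sentences are pre-reordered according to the optimized rule set and the phrase-table constraint. Chinese-English SMT experiments on the NIST 2005 and 2008 test sets show that the proposed method improves the BLEU score over the baseline system by 0.68 and 0.83, respectively.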
15.
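The decoder is a key part of statistical machine translation research. Building on phrase-based statistical machine translation, and incorporating multiple feature models following the idea of the log-linear model, this work studies a dynamic-programming beam search decoding algorithm. The implementation of the algorithm in the decoder is described in detail, and translation speed and accuracy are analyzed.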
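The combination named in the abstract above (a log-linear model over several features, searched with a beam) can be illustrated with a deliberately small sketch. It is not the paper's decoder: it assumes monotone translation (no reordering), no language model, and no hypothesis recombination, and the names `loglinear_score` and `beam_decode`, the feature names `tm` and `wp`, and all toy values are hypothetical.

```python
import heapq


def loglinear_score(features, weights):
    """Log-linear model score: weighted sum of feature values."""
    return sum(weights[name] * value for name, value in features.items())


def beam_decode(source, phrase_table, weights, beam_size=3, max_phrase_len=3):
    """Monotone phrase-based beam search (simplified sketch).

    stacks[i] holds hypotheses (score, target_phrases) that cover the
    first i source words. Each stack is pruned to beam_size entries
    before its hypotheses are expanded.
    """
    n = len(source)
    stacks = [[] for _ in range(n + 1)]
    stacks[0] = [(0.0, [])]
    for i in range(n):
        # Histogram pruning: keep only the best beam_size hypotheses.
        stacks[i] = heapq.nlargest(beam_size, stacks[i], key=lambda h: h[0])
        for score, output in stacks[i]:
            for j in range(i + 1, min(n, i + max_phrase_len) + 1):
                src_phrase = tuple(source[i:j])
                for tgt_phrase, features in phrase_table.get(src_phrase, []):
                    new_score = score + loglinear_score(features, weights)
                    stacks[j].append((new_score, output + [tgt_phrase]))
    if not stacks[n]:
        return None  # no full-coverage hypothesis was found
    best_score, best_output = max(stacks[n], key=lambda h: h[0])
    return best_score, " ".join(best_output)


# Toy phrase table: "tm" is a log translation score, "wp" a word penalty.
phrase_table = {
    ("das",): [("the", {"tm": -0.1, "wp": -1}), ("that", {"tm": -1.2, "wp": -1})],
    ("haus",): [("house", {"tm": -0.2, "wp": -1})],
    ("das", "haus"): [("the house", {"tm": -0.5, "wp": -2})],
}
weights = {"tm": 1.0, "wp": 0.1}
result = beam_decode(["das", "haus"], phrase_table, weights)
```

Because hypotheses covering the same number of source words compete in the same stack, the beam size trades speed against search errors, which is the speed and accuracy trade-off the abstract says it analyzes.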
16.
This paper describes an example-based machine translation (EBMT) method based on tree–string correspondence (TSC) and statistical
generation. In this method, the translation example is represented as a TSC, which is a triple consisting of a parse tree
in the source language, a string in the target language, and the correspondence between the leaf node of the source-language
tree and the substring of the target-language string. For an input sentence to be translated, it is first parsed into a tree.
Then the TSC forest which best matches the input tree is searched for. Finally the translation is generated using a statistical
generation model to combine the target-language strings of the TSCs. The generation model consists of three features: the
semantic similarity between the tree in the TSC and the input tree, the translation probability of translating the source
word into the target word, and the language-model probability for the target-language string. Based on the above method, we
build an English-to-Chinese MT system. Experimental results indicate that the performance of our system is comparable with
phrase-based statistical MT systems.
17.
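Jin Yaohong (晋耀红) 《计算机工程与应用》 (Computer Engineering and Applications) 2012,48(4):29-32
For complex sentences in patent text translation, a hybrid-strategy method is proposed that combines semantic analysis with rule-based translation to improve patent translation. Semantic analysis is used mainly to identify the main verb of a sentence and to analyze noun phrases with nested structure; the results of semantic analysis are fed into a rule-based translation system to improve translation. Tests show that the combined translation system raises the BLEU score by 9.8%. The method has been integrated into the online Chinese-English machine translation system of the State Intellectual Property Office (国家知识产权局), effectively improving both the quality and the efficiency of patent translation.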
18.
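A beam search decoder for phrase-based statistical machine translation is described. Search efficiency is the key to decoding. Building on the traditional beam search decoding algorithm, several improvements for search efficiency are proposed: a dynamic pruning strategy addresses the problem that fixed pruning responds poorly to the current state of the search, improving pruning accuracy; a pre-pruning strategy restricts poor expansions, reducing unnecessary expansions and increasing search speed; and, after a study of the main current reordering constraints, a fast reordering constraint strategy is proposed that speeds up decoding during reordering. In addition, a dedicated handling method is proposed for the problem of unique translation of domain terms, in order to improve translation accuracy. Comparative experimental results demonstrate the effectiveness of the algorithm.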
19.
J. Andrés-Ferrer, D. Ortiz-Martínez, I. García-Varea, F. Casacuberta 《Pattern recognition letters》2008,29(8):1072-PRintPerclntel
In pattern recognition, an elegant and powerful way to deal with classification problems is based on the minimisation of the classification risk. The risk function is defined in terms of loss functions that measure the penalty for wrong decisions. However, in practice a trivial loss function is usually adopted (the so-called 0–1 loss function) that does not make the most of this framework. This work is focused on the study of different loss functions, and especially on those loss functions that do not depend on the class proposed by the system. Loss functions of this kind have allowed us to theoretically explain heuristics that are successfully used with very complex pattern recognition problems, such as (statistical) machine translation. A comparative experimental study has also been carried out to compare different proposals of loss functions in the practical scenario of machine translation.
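The risk-minimisation framework mentioned in the abstract above can be illustrated with the textbook decision rule: choose the class k that minimises the expected loss Σ_c P(c | x) · loss(k, c). The sketch below shows only this general rule, not the specific loss functions studied in the paper; the function names, the class labels, and the posterior values are hypothetical.

```python
def min_risk_decision(posterior, loss):
    """Return the class with the lowest expected loss under the posterior.

    posterior maps each class c to P(c | x).
    loss(k, c) is the penalty for deciding k when the true class is c.
    """
    def risk(k):
        return sum(p * loss(k, c) for c, p in posterior.items())

    return min(posterior, key=risk)


def zero_one_loss(k, c):
    """Trivial 0-1 loss: every wrong decision costs the same."""
    return 0.0 if k == c else 1.0


def graded_loss(k, c):
    """Hypothetical loss in which confusing 'b' and 'c' is a minor error."""
    if k == c:
        return 0.0
    if {k, c} == {"b", "c"}:
        return 0.1
    return 1.0


posterior = {"a": 0.40, "b": 0.35, "c": 0.25}
decision_zero_one = min_risk_decision(posterior, zero_one_loss)
decision_graded = min_risk_decision(posterior, graded_loss)
```

Under the 0-1 loss the rule reduces to picking the most probable class ("a" here). With the graded loss the decision moves to "b", because most of the probability mass sits on the two classes that are cheap to confuse with each other, which is the kind of effect a non-trivial loss function is meant to capture.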
20.
For virtual machine based traffic simulation platforms, the paper proposes a software framework that performs trace-based dynamic translation. By monitoring the runtime execution status of bytecodes and translating frequently executed bytecodes, also known as hot spots, into equivalent native machine code, the framework improves the performance of virtual machine based traffic simulation platforms by up to ten times or more, as the experiments showed. For the first time, the presented work clearly shows that a seamless combination of the two technologies, dynamic translation and virtual machines, could lead to a new generation of applicable traffic simulation platforms. Such a platform not only offers high flexibility in terms of traffic model simulation, but also preserves the ability to conduct the computation-intensive numerical simulations generally found in real-life industrial projects.