1.

Semi-supervised model adaptation for statistical machine translation

Nicola Ueffing, Gholamreza Haffari, Anoop Sarkar 《Machine Translation》2007,21(2):77-94

Statistical machine translation systems are usually trained on large amounts of bilingual text (used to learn a translation model), and also large amounts of monolingual text in the target language (used to train a language model). In this article we explore the use of semi-supervised model adaptation methods for the effective use of monolingual data from the source language in order to improve translation quality. We propose several algorithms with this aim, and present the strengths and weaknesses of each one. We present detailed experimental evaluations on the French–English EuroParl data set and on data from the NIST Chinese–English large-data track. We show a significant improvement in translation quality on both tasks.
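The general self-training idea behind using source-language monolingual data (translate it, keep the most confident outputs, retrain on the enlarged data) can be sketched as below. This is a minimal illustration, not the authors' algorithms: `translate`, `confidence`, and `retrain` are hypothetical callables standing in for a real decoder, confidence estimator, and training pipeline, and top-k selection is just one simple filtering choice.

```python
from typing import Callable, List, Tuple, TypeVar

Model = TypeVar("Model")
Pair = Tuple[str, str]  # (source sentence, target sentence)

def self_train(
    model: Model,
    bitext: List[Pair],
    mono_src: List[str],
    translate: Callable[[Model, str], str],
    confidence: Callable[[Model, str, str], float],
    retrain: Callable[[List[Pair]], Model],
    iterations: int = 1,
    top_k: int = 100,
) -> Model:
    """Self-training sketch: translate monolingual source text, keep the most
    confident outputs, and retrain on the bitext plus these pseudo-parallel pairs."""
    for _ in range(iterations):
        scored = []
        for src in mono_src:
            hyp = translate(model, src)
            scored.append((confidence(model, src, hyp), src, hyp))
        # Highest-confidence translations first
        scored.sort(key=lambda item: item[0], reverse=True)
        pseudo_pairs = [(src, hyp) for _, src, hyp in scored[:top_k]]
        model = retrain(bitext + pseudo_pairs)
    return model
```

In a real system the selection step (how many sentences, which confidence score) is the main design decision; here it is reduced to a fixed `top_k`.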

2.

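A Chinese–Uyghur machine translation system based on phrase-based statistical translation  Total citations: 1, self-citations: 0, citations by others: 1

Yang Pan, Li Miao, Zhang Jian 《Journal of Computer Applications》2009,29(7):2022-2025

This paper describes a Chinese–Uyghur machine translation system based on phrase-based statistical translation. A language model and a translation model are first trained on a Chinese–Uyghur corpus; the trained models are then used to decode source sentences and obtain the best translation. The core decoding algorithm is beam search. The Uyghur corpus is written in Latin-script Uyghur. Experimental results show that phrase-based statistical machine translation can be used to build a Chinese–Uyghur machine translation platform quickly and effectively.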
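The beam search decoding described above can be sketched as a minimal monotone (no reordering) phrase-based decoder. Assumptions not in the abstract: the phrase table maps source-token tuples to (target phrase, log probability) pairs, the language model is a toy bigram table with a fixed floor for unseen bigrams, and hypotheses are grouped into stacks by the number of source words covered.

```python
from typing import Dict, List, Tuple

PhraseTable = Dict[Tuple[str, ...], List[Tuple[str, float]]]
Bigrams = Dict[Tuple[str, str], float]  # (previous word, word) -> log prob

def lm_logprob(prev: str, word: str, bigrams: Bigrams, floor: float = -10.0) -> float:
    return bigrams.get((prev, word), floor)

def beam_decode(src: List[str], table: PhraseTable, bigrams: Bigrams,
                beam_size: int = 5, max_phrase_len: int = 3) -> str:
    """Monotone beam search; stacks[i] holds hypotheses covering src[:i]."""
    # Each hypothesis: (log score, last target word, target words so far)
    stacks: List[List[Tuple[float, str, List[str]]]] = [[] for _ in range(len(src) + 1)]
    stacks[0].append((0.0, "<s>", []))
    for i in range(len(src)):
        # Beam pruning: keep only the best `beam_size` hypotheses in this stack
        stacks[i] = sorted(stacks[i], key=lambda h: h[0], reverse=True)[:beam_size]
        for score, last, words in stacks[i]:
            for j in range(i + 1, min(len(src), i + max_phrase_len) + 1):
                for tgt, tm in table.get(tuple(src[i:j]), []):
                    new_score, prev = score + tm, last
                    for w in tgt.split():  # add bigram LM cost word by word
                        new_score += lm_logprob(prev, w, bigrams)
                        prev = w
                    stacks[j].append((new_score, prev, words + tgt.split()))
    final = stacks[len(src)]
    if not final:
        return ""  # no phrase-table path covers the whole sentence
    best = max(final, key=lambda h: h[0] + lm_logprob(h[1], "</s>", bigrams))
    return " ".join(best[2])
```

For example, with a table containing `("wo",) -> "I"`, `("ai",) -> "love"`, `("ni",) -> "you"` and suitable bigram scores, `beam_decode(["wo", "ai", "ni"], table, bigrams)` yields `"I love you"`.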

3.

Pivot language approach for phrase-based statistical machine translation

Hua Wu, Haifeng Wang 《Machine Translation》2007,21(3):165-181

This paper proposes a novel method for phrase-based statistical machine translation based on the use of a pivot language. To translate between languages L_s and L_t with limited bilingual resources, we bring in a third language, L_p, called the pivot language. For the language pairs L_s–L_p and L_p–L_t, large bilingual corpora exist. Using only the L_s–L_p and L_p–L_t bilingual corpora, we can build a translation model for L_s–L_t. The advantage of this method is that we can translate between L_s and L_t even if no bilingual corpus is available for this language pair. Using BLEU as a metric, our pivot language approach significantly outperforms the standard model trained on a small bilingual corpus. Moreover, when a small L_s–L_t bilingual corpus is available, our method can further improve translation quality by using the additional L_s–L_p and L_p–L_t bilingual corpora.
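One common way to build an L_s–L_t phrase table from L_s–L_p and L_p–L_t tables is to marginalise over pivot phrases, p(t|s) = Σ_p p(t|p) · p(p|s). The sketch below shows only this triangulation step, under the assumption that both tables store conditional phrase probabilities; the paper's full model may also estimate lexical weights and other features not shown here.

```python
from collections import defaultdict
from typing import Dict

Table = Dict[str, Dict[str, float]]  # phrase -> {translation: p(translation | phrase)}

def triangulate(src_piv: Table, piv_tgt: Table) -> Table:
    """Estimate p(t|s) = sum_p p(t|p) * p(p|s) by summing over shared pivot phrases p."""
    src_tgt: Table = {}
    for s, pivots in src_piv.items():
        scores: Dict[str, float] = defaultdict(float)
        for p, p_given_s in pivots.items():
            for t, t_given_p in piv_tgt.get(p, {}).items():
                scores[t] += t_given_p * p_given_s
        src_tgt[s] = dict(scores)
    return src_tgt
```

With French as L_s, English as L_p, and German as L_t, `{"maison": {"house": 0.7, "home": 0.3}}` combined with `{"house": {"Haus": 0.9, "Heim": 0.1}, "home": {"Heim": 0.6, "Haus": 0.4}}` gives p(Haus | maison) = 0.75 and p(Heim | maison) = 0.25.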

4.

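Translation rule extraction in statistical machine translation

Liu Ying, Jiang Wei 《Computer Engineering and Applications》2012,48(32):98-101,146

Aligned phrases are the core module that determines the quality of a statistical machine translation system. We propose a hierarchical phrase model based on phrase-structure trees, which extends the hierarchical phrase model with ideas from the string-to-tree model. The model extracts translation rules from bilingual aligned phrases combined with English phrase-structure trees, and uses heuristic strategies to obtain extended syntactic labels for the rules. An SMT system using these translation rules gives stable results across different data sets, and its average BLEU scores on the training and test sets are higher than those of the phrase model and the hierarchical phrase model.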

5.

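Bilingual co-training relation extraction based on machine translation

Hu Yanan, Hui Haotian, Qian Longhua, Zhu Qiaoming 《Application Research of Computers》2015,32(3)

Traditional weakly supervised relation extraction research has focused mainly on a single language. To exploit the complementarity between languages and reduce the need for large-scale training data, this paper proposes a bilingual co-training method for relation classification. Given a small labeled corpus and a moderately sized unlabeled corpus, bilingual views of relation instances are produced through machine translation and entity alignment, and co-training is then used to obtain classification models for both languages. Experiments on the Chinese and English ACE RDC 2005 corpora show that bilingual co-training improves relation classification performance for Chinese and English simultaneously while reducing the amount of labeled training data required.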

6.

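A survey of parameter estimation methods in topic models

Du Hui, Chen Yunfang, Zhang Wei 《Computer Science》2017,44(Z6):29-32,47

Topic models use fast machine learning algorithms to extract low-dimensional topic representations from high-dimensional, sparse word data, thereby clustering the words of documents. Estimating the parameters of topic models is an important research problem in this field. This paper describes in detail probabilistic latent semantic analysis (PLSA), latent Dirichlet allocation (LDA), and the basic parameter estimation methods used in topic models, and compares the perplexity of the models experimentally.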
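The perplexity used to compare topic models is the exponentiated negative average per-token log-likelihood, with each word's probability mixed over topics as p(w|d) = Σ_z θ_dz · φ_zw. A minimal sketch, assuming already-estimated parameters stored as plain nested lists (the variable names `theta` and `phi` are illustrative):

```python
import math
from typing import List, Sequence

def perplexity(docs: List[Sequence[int]], theta: List[List[float]], phi: List[List[float]]) -> float:
    """docs[d]: word ids of document d; theta[d][z] = p(z|d); phi[z][w] = p(w|z).
    Returns exp(-(sum of log p(w|d) over all tokens) / number of tokens)."""
    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            p_w = sum(theta[d][z] * phi[z][w] for z in range(len(phi)))
            log_lik += math.log(p_w)
            n_tokens += 1
    if n_tokens == 0:
        raise ValueError("perplexity is undefined for an empty corpus")
    return math.exp(-log_lik / n_tokens)
```

Lower is better: a model that assigns every word in a 4-word vocabulary probability 0.25 has perplexity 4, i.e. it is as uncertain as a uniform choice among four words.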

7.

Recent developments in machine translation

Gregor Thurmair 《Computers and the Humanities》1991,25(2-3):115-128

This paper describes developments in the area of machine translation (MT). It first gives an overview of developments in Germany in general and then discusses special problems. The system taken as an example is METAL (Machine Translation and Analysis of Natural Language), where recent development work has centered on two main topics. (i) Efforts have been made to make the system truly multilingual. The German-to-English prototype had to be expanded, some system components had to be readjusted, and additional problems had to be solved. Currently, analysis and synthesis components for German, English, French, Spanish, and Dutch are under development. All these languages use a common system kernel and a standard interface structure. (ii) The system had to be made user-friendly. This was an even more important task because, up to now, MT systems have not been well accepted by users. METAL tries to be more realistic, and also tries to support the main user interfaces much better than has been done before. This is based on the conviction that several parameters determine the real success of an MT system: not just translation quality, but also the integration of the MT system into the whole process of preparing and translating documents.

Gregor Thurmair is head of the Linguistics Department at Siemens Nixdorf Information Systems and project leader of the machine translation group, METAL. He is involved in projects in information retrieval (morphological analysis), speech understanding (parsing, semantics) and machine translation (METAL system). He has presented papers on morphology, semantics in speech understanding, transfer problems in MT, and grammar checking.

8.

Domain adaptation for ontology localization

《Journal of Web Semantics》2016

9.

Dependency treelet translation: the convergence of statistical and example-based machine-translation?

Christopher Quirk, Arul Menezes 《Machine Translation》2006,20(1):43-65

We describe a novel approach to MT that combines the strengths of the two leading corpus-based approaches: phrasal SMT and EBMT. We use a syntactically informed decoder and reordering model based on the source dependency tree, in combination with conventional SMT models, to combine the power of phrasal SMT with the linguistic generality available in a parser. We show that this approach significantly outperforms a leading string-based phrasal SMT decoder and an EBMT system. We present results from two radically different language pairs and investigate the sensitivity of this approach to parse quality by using two distinct parsers and oracle experiments. We also validate our automated BLEU scores with a small human evaluation.

10.

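A bilingual relevance optimization model for Chinese–Uyghur machine translation

Pan Yirong, Li Xiao, Yang Yating 《Application Research of Computers》2020,37(3):726-730

To address the problem of semantic irrelevance in Chinese–Uyghur statistical machine translation systems, this paper proposes a bilingual relevance optimization model based on neural machine translation. The model uses an attention mechanism to capture word-alignment information, introduces the semantic relatedness between bilingual phrases and the degree of lexical matching within them, and predicts the generation probability of each bilingual phrase pair, which serves as its bilingual relevance score and is used to optimize the phrase translation scores of the statistical translation model. Experiments on the public Chinese–Uyghur machine translation dataset of the 11th China Workshop on Machine Translation (CWMT 2015) show that, compared with the baseline system and with relatively small training data and vocabulary, the proposed method effectively improves both phrase-level and sentence-level translation performance, with maximum BLEU gains of 2.49 and 0.59, respectively.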

11.

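Research on two-layer Chinese–Mongolian translation based on statistics and templates

Luo Kai, Li Miao, Qiang Jing, Wudabala 《Journal of Computer Applications》2009,29(7):2026-2028

To improve the accuracy of a Chinese–Mongolian translation system, this paper proposes a method that automatically extracts template structures by combining templates with phrases. During decoding, template matching is performed first and the matched template structures are used for translation; the remaining translation then proceeds with the beam search algorithm. The method effectively reduces the word-order errors of purely statistical translation. Experimental results on Chinese–Mongolian translation show that the method effectively improves translation quality. For agricultural-domain Chinese–Mongolian translation, adding templates for common agricultural phrases yields a considerable improvement in translation efficiency over Och's phrase-based statistical translation method.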

12.

Research on statistical machine translation with syntactic reordering

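Sun Guangfan, Song Jinping, Xiao Jian, Yuan Qi 《Computer Engineering and Applications》2009,45(36):142-144

To address the insufficient reordering capability of phrase-based statistical machine translation, this paper uses a syntactic parser to analyze the input Chinese sentences, then applies a converter to perform reordering and pre-translates certain types of phrases before the phrase-based SMT decoder translates the sentence. The study focuses on the effects of reordering and pre-translation for complex attributive modifiers introduced by the Chinese particle “的” (de), prepositional phrases, specific collocations, and locative phrases. Experimental results show that these reordering and pre-translation operations significantly improve the BLEU score of the English output of phrase-based SMT.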

13.

A syntactically informed reordering model for statistical machine translation

Saeed Farzi, Shahram Khadivi 《Journal of Experimental & Theoretical Artificial Intelligence》2013,25(4):449-469

Word reordering is one of the challenging problems of machine translation and an important factor in the quality and efficiency of machine translation systems. In this paper, we introduce a novel reordering model based on an innovative structure named the phrasal dependency tree, a new syntactic structure based on dependency relationships between contiguous non-syntactic phrases. The proposed model integrates syntactic and statistical information in a log-linear model to deal with reordering problems. It benefits from phrase dependencies, translation directions (orientations), and translation discontinuity between translated phrases. Compared with well-known reordering models such as the distortion, lexicalised, and hierarchical models, the experimental study demonstrates the superiority of our model in terms of translation quality. Performance is evaluated on Persian → English and English → German translation tasks using the Tehran parallel corpus and WMT07 benchmarks, respectively. The results report 1.54/1.7 and 1.98/3.01 point improvements over the baseline in terms of BLEU/TER on the Persian → English and German → English translation tasks, respectively. On average, our model has a significant impact on precision, with comparable recall, relative to the lexicalised and distortion models.
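The orientation (translation direction) information mentioned above is commonly defined as in lexicalised reordering models: each phrase is classified as monotone, swap, or discontinuous relative to the previously translated phrase. The sketch below shows only these standard classes, assuming inclusive source-position spans listed in target order; the phrasal-dependency-tree model additionally conditions on phrase dependencies, which is not shown, and its exact definitions may differ.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # inclusive (start, end) source word positions of a phrase

def orientation(prev: Span, cur: Span) -> str:
    """Orientation of the current phrase relative to the previously translated one."""
    if cur[0] == prev[1] + 1:
        return "monotone"       # starts right after the previous phrase
    if cur[1] == prev[0] - 1:
        return "swap"           # ends right before the previous phrase
    return "discontinuous"      # any other jump in the source

def orientations(spans: List[Span]) -> List[str]:
    """Orientation classes for phrases listed in target (translation) order."""
    prev: Span = (-1, -1)  # virtual phrase just before the sentence start
    result = []
    for span in spans:
        result.append(orientation(prev, span))
        prev = span
    return result
```

For the source spans `[(0, 1), (4, 5), (2, 3)]` translated in that order, the classes are monotone, discontinuous, and swap.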

14.

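A pre-reordering model for statistical machine translation enhanced with N-best syntactic knowledge

Guo Junbo, Zhang Xiyuan, Du Jinhua 《Computer Engineering and Applications》2016,52(17):160-165

The syntactic divergence between source and target languages strongly affects statistical machine translation (SMT) performance. Building on phrase-based Chinese–English SMT, this paper proposes a source-side pre-reordering method enhanced with N-best syntactic knowledge. First, N-best parsing is performed on each input source sentence and statistical probabilities are computed to obtain highly reliable subtree structures; an initial set of reordering rules is then extracted from these subtrees using word-alignment information. Two strategies are used to optimize the initial rule set: rule derivation and filtering based on Chinese and English syntactic knowledge, and a rule-probability threshold mechanism. Next, to reduce reordering inside phrases and preserve local fluency, the source-language phrase table is used as a constraint so that reordering takes place only between phrase blocks. Finally, the parse trees of the source sentences are pre-reordered according to the optimized rule set and the phrase-table constraint. Chinese–English SMT experiments on the NIST 2005 and 2008 test sets show that, compared with the baseline system, the proposed method improves BLEU by 0.68 and 0.83, respectively.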

15.

Research and implementation of a decoding algorithm for phrase-based statistical machine translation

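Luo Yi, Li Miao, Zhu Jian, Hu Guanlong 《Computer Engineering and Applications》2007,43(30):171-173

The decoder is a key component of statistical machine translation. Building on phrase-based SMT and incorporating multiple feature models following the log-linear model, this paper studies a dynamic-programming beam search decoding algorithm. It describes the implementation of the algorithm in the decoder in detail and analyzes translation speed and accuracy.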
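In a log-linear model, a candidate translation's score is a weighted sum of feature values, score(e, f) = Σ_i λ_i · h_i(e, f), and the decoder searches for the highest-scoring candidate. The sketch below scores complete candidates for brevity (a real beam search decoder accumulates these terms incrementally); the feature names `tm`, `lm`, and `word_penalty` and their weights are hypothetical.

```python
from typing import Dict, List

Features = Dict[str, float]  # feature name -> value h_i(e, f)

def loglinear_score(features: Features, weights: Dict[str, float]) -> float:
    """Weighted sum of feature values; features without a weight contribute nothing."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def best_candidate(candidates: List[Features], weights: Dict[str, float]) -> int:
    """Index of the candidate translation with the highest log-linear score."""
    return max(range(len(candidates)), key=lambda i: loglinear_score(candidates[i], weights))
```

With weights `{"tm": 1.0, "lm": 0.5, "word_penalty": -0.1}`, a candidate with features `{"tm": -2.0, "lm": -4.0, "word_penalty": 3}` scores -4.3, while one with `{"tm": -3.0, "lm": -1.0, "word_penalty": 4}` scores -3.9 and is preferred despite its weaker translation-model score.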

16.

Example-based machine translation based on tree–string correspondence and statistical generation

Zhanyi Liu, Haifeng Wang, Hua Wu 《Machine Translation》2006,20(1):25-41

This paper describes an example-based machine translation (EBMT) method based on tree–string correspondence (TSC) and statistical generation. In this method, a translation example is represented as a TSC, which is a triple consisting of a parse tree in the source language, a string in the target language, and the correspondence between the leaf nodes of the source-language tree and substrings of the target-language string. An input sentence is first parsed into a tree. Then the TSC forest that best matches the input tree is searched for. Finally, the translation is generated by a statistical generation model that combines the target-language strings of the TSCs. The generation model has three features: the semantic similarity between the tree in the TSC and the input tree, the probability of translating a source word into a target word, and the language-model probability of the target-language string. Based on this method, we build an English-to-Chinese MT system. Experimental results indicate that the performance of our system is comparable with that of phrase-based statistical MT systems.

17.

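Research on a patent machine translation system using a hybrid strategy  Total citations: 2, self-citations: 0, citations by others: 2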

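Jin Yaohong 《Computer Engineering and Applications》2012,48(4):29-32

For complex sentences in patent translation, this paper proposes a hybrid-strategy method that combines semantic analysis with rule-based translation to improve patent translation. Semantic analysis is used mainly to identify the main verb of a sentence and to analyze noun phrases that contain nested structures; its results are fed into a rule-based translation system to improve the output. Test results show that the combined system improves the BLEU score by 9.8%. The method has been integrated into the online Chinese–English machine translation system of the State Intellectual Property Office of China, effectively improving the quality and efficiency of patent translation.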

18.

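An efficient beam search decoder for phrase-based statistical machine translation

Luo Yi, Li Miao, Zhang Jian 《Journal of Computer Applications》2007,27(8):1973-1975

This paper describes a beam search decoder for phrase-based statistical machine translation. Search efficiency is the key to decoding. Building on the traditional beam search decoding algorithm, several improvements are proposed. A dynamic pruning strategy addresses the failure of conventional fixed pruning to respond to the current search state and improves pruning accuracy. A pre-pruning strategy restricts poor expansions, avoids unnecessary ones, and speeds up search. After a study of the main existing reordering constraints, a fast reordering-constraint strategy is proposed that speeds up decoding when reordering. In addition, a dedicated method handles the requirement that domain terms receive a unique translation, improving translation accuracy. Comparative experiments demonstrate the effectiveness of the algorithm.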
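The abstract does not specify its dynamic pruning or fast reordering constraint, so the sketch below shows the conventional baselines they improve on: threshold pruning (drop hypotheses scoring more than `threshold` below the best), histogram pruning (keep at most `max_size` hypotheses), and a distortion-limit check on the jump between consecutive source phrases. All parameter names are illustrative assumptions, not the paper's.

```python
from typing import List, Tuple

Hyp = Tuple[float, str]  # (log score, partial translation)

def prune(stack: List[Hyp], max_size: int, threshold: float) -> List[Hyp]:
    """Fixed threshold pruning followed by histogram pruning of one hypothesis stack."""
    if not stack:
        return []
    best = max(score for score, _ in stack)
    # Threshold pruning: keep hypotheses within `threshold` of the best log score
    kept = [h for h in stack if h[0] >= best - threshold]
    # Histogram pruning: keep only the `max_size` highest-scoring survivors
    kept.sort(key=lambda h: h[0], reverse=True)
    return kept[:max_size]

def within_distortion_limit(last_end: int, next_start: int, limit: int) -> bool:
    """True if jumping from the end of the last translated source phrase to the
    start of the next one skips or backtracks over at most `limit` words."""
    return abs(next_start - last_end - 1) <= limit
```

A dynamic scheme would adapt `threshold` or `max_size` to the current search state instead of fixing them in advance.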

19.

On the use of different loss functions in statistical pattern recognition applied to machine translation

J. Andrés-Ferrer, D. Ortiz-Martínez, I. García-Varea, F. Casacuberta 《Pattern recognition letters》2008,29(8):1072-

In pattern recognition, an elegant and powerful way to deal with classification problems is based on the minimisation of the classification risk. The risk function is defined in terms of loss functions that measure the penalty for wrong decisions. In practice, however, a trivial loss function is usually adopted (the so-called 0–1 loss function), which does not make the most of this framework. This work focuses on the study of different loss functions, especially those that do not depend on the class proposed by the system. Loss functions of this kind allow us to explain theoretically heuristics that are successfully used in very complex pattern recognition problems, such as (statistical) machine translation. A comparative experimental study has also been carried out to compare different loss functions in the practical scenario of machine translation.
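The Bayes decision rule picks the class y minimising the risk Σ_{y'} L(y, y') p(y'|x). With the 0–1 loss this is simply the most probable class. For a loss that depends only on the correct class y', say L(y, y') = l(y') when y ≠ y' and 0 otherwise, the risk equals Σ_{y'} l(y') p(y'|x) − l(y) p(y|x), so the optimal decision becomes argmax_y l(y) p(y|x), a weighted MAP rule. The sketch below checks this on a toy posterior over three hypotheses; the weights are hypothetical.

```python
from typing import Callable, Dict

Loss = Callable[[str, str], float]  # loss(proposed class, correct class)

def bayes_decision(posterior: Dict[str, float], loss: Loss) -> str:
    """Pick the hypothesis y minimising the risk sum_{y'} loss(y, y') * p(y' | x)."""
    def risk(y: str) -> float:
        return sum(loss(y, y_ref) * p for y_ref, p in posterior.items())
    return min(posterior, key=risk)

def zero_one(y: str, y_ref: str) -> float:
    return 0.0 if y == y_ref else 1.0

def correct_class_loss(weights: Dict[str, float]) -> Loss:
    """Loss that depends only on the correct class y_ref (zero for a correct decision)."""
    return lambda y, y_ref: 0.0 if y == y_ref else weights[y_ref]
```

With posterior `{"a": 0.5, "b": 0.3, "c": 0.2}`, the 0–1 loss selects `"a"`, while weights `{"a": 1.0, "b": 2.0, "c": 1.0}` select `"b"`, the same class that maximises l(y) p(y|x).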

20.

Dynamic translation for virtual machine based traffic simulation

《Simulation Modelling Practice and Theory》2014

For virtual machine based traffic simulation platforms, the paper proposes a software framework that performs trace-based dynamic translation. By monitoring the runtime execution status of bytecodes and translating frequently executed bytecodes, known as hot spots, into equivalent native machine code, the framework considerably improves the performance of virtual machine based traffic simulation platforms, by up to tenfold or more in the reported experiments. For the first time, the presented work clearly shows that a seamless combination of two technologies, dynamic translation and virtual machines, could lead to a new generation of practical traffic simulation platforms. Such a platform not only offers high flexibility in traffic model simulation, but also preserves the ability to run the numerically intensive simulations commonly found in real-life industrial projects.
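The hot-spot mechanism (count executions, translate a block once it becomes hot, then run the translated version) can be illustrated with a toy accumulator VM. This is a hypothetical sketch, not the paper's framework: a "block" is a list of `add`/`mul` instructions, and "native code" is stood in for by Python source generated and compiled once with `exec`.

```python
from typing import Callable, Dict, List, Tuple

Instr = Tuple[str, int]  # (opcode, integer operand)
_OPS = {"add": "+", "mul": "*"}

class ToyVM:
    """Interprets a named block until it has run `hot_threshold` times, then
    translates it once into a Python function and uses that from then on."""

    def __init__(self, hot_threshold: int = 3) -> None:
        self.hot_threshold = hot_threshold
        self.counts: Dict[str, int] = {}
        self.compiled: Dict[str, Callable[[int], int]] = {}

    def interpret(self, block: List[Instr], acc: int) -> int:
        for op, arg in block:  # decode and dispatch every instruction, every time
            if op == "add":
                acc += arg
            elif op == "mul":
                acc *= arg
            else:
                raise ValueError(f"unknown opcode: {op}")
        return acc

    def translate(self, block: List[Instr]) -> Callable[[int], int]:
        # Generate straight-line source for the whole block and compile it once
        lines = ["def native(acc):"]
        for op, arg in block:
            if op not in _OPS:
                raise ValueError(f"unknown opcode: {op}")
            lines.append(f"    acc = acc {_OPS[op]} {int(arg)}")
        lines.append("    return acc")
        namespace: Dict[str, object] = {}
        exec("\n".join(lines), namespace)
        return namespace["native"]  # type: ignore[return-value]

    def execute(self, name: str, block: List[Instr], acc: int) -> int:
        if name in self.compiled:  # hot path: run the translated code
            return self.compiled[name](acc)
        self.counts[name] = self.counts.get(name, 0) + 1
        if self.counts[name] >= self.hot_threshold:
            self.compiled[name] = self.translate(block)  # block became a hot spot
        return self.interpret(block, acc)
```

With `hot_threshold=2`, the block `[("add", 3), ("mul", 2)]` is interpreted on its first two calls and translated on the second; later calls go straight to the generated function and return the same results.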