20 similar documents found.
1.
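Example-Based Resolution of Ambiguous Structures in Chinese–English Machine Translation  Total citations: 1, self-citations: 0, other citations: 1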
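Ambiguity is a prominent and pervasive feature of natural language, and of Chinese in particular, and it is one of the main difficulties for current Chinese–English machine translation systems. Based on an analysis of several common Chinese ambiguous structures, this paper proposes an example-based method for resolving them. Because the examples corresponding to an ambiguous structure are highly representative in terms of “structure”, comparing the segment to be disambiguated against these examples makes it possible to determine its internal structure fairly accurately.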
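The abstract does not say how similarity to the examples is measured. A minimal sketch, assuming a Dice coefficient over character bigrams (so no word segmentation is needed) and a hypothetical example base whose structural annotations are invented for illustration, shows the general shape: pick the most similar annotated example and adopt its structure.

```python
from dataclasses import dataclass


@dataclass
class Example:
    text: str       # an example phrase exhibiting an ambiguous pattern
    structure: str  # the structural reading annotated for that example


def char_bigrams(s: str) -> set[str]:
    """Character bigrams of s; no word segmentation is performed."""
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice(a: str, b: str) -> float:
    """Dice coefficient over character bigrams (an assumed similarity measure)."""
    x, y = char_bigrams(a), char_bigrams(b)
    if not x or not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def disambiguate(segment: str, examples: list[Example]) -> str:
    """Adopt the structure of the example most similar to the ambiguous segment."""
    best = max(examples, key=lambda ex: dice(segment, ex.text))
    return best.structure


# Hypothetical example base; the annotations are invented for illustration.
EXAMPLES = [
    Example("批评学生的老师", "[V N1 的] N2 (relative clause modifying N2)"),
    Example("参观学校的图书馆", "V [N1 的 N2] (verb with possessive object)"),
]

print(disambiguate("批评学生的家长", EXAMPLES))
```

A real system would compare structural features rather than raw character overlap; bigram Dice is used here only because it is simple and segmentation-free.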
2.
Sergei Nirenburg 《Machine Translation》1989,4(1):5-24
This paper provides an overview of the KBMT-89 project at Carnegie Mellon University's Center for Machine Translation and, therefore, of the special issue of this journal that reports on the project. The knowledge-based approach to machine translation is presented and defended in a historical context. The system's components, key parts of which are described in subsequent papers in the issue, are introduced together with their computational motivations.
3.
Structural disambiguation is acknowledged as a very real and frequent problem for many semantic-aware applications. In this paper, we propose a unified answer to sense disambiguation on a large variety of structures at both the data and metadata levels, such as relational schemas, XML data and schemas, taxonomies, and ontologies. Our knowledge-based approach achieves general applicability by converting the input structures into a common format and by allowing users to tailor the extraction of context to the specific application needs and structure characteristics. Flexibility is ensured by supporting the combination of different disambiguation methods with different information extracted from different knowledge sources. Further, we support both assisted and completely automatic semantic annotation tasks, while several novel feedback techniques allow us to improve the initial disambiguation results without necessarily requiring user intervention. An extensive evaluation shows that the proposed solutions are effective on a large variety of structure-based information and disambiguation requirements.
4.
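Aligned phrases are the core module that determines the quality of a statistical machine translation system. This paper proposes a hierarchical phrase model based on phrase-structure trees, which extends the hierarchical phrase model with ideas from the string-to-tree model. The model extracts translation rules from bilingual aligned phrases combined with English phrase-structure trees, and uses heuristic strategies to obtain extended syntactic labels for the translation rules. A statistical machine translation system using these rules produces stable results across different data sets, and its average BLEU score over the training and test sets is higher than those of the phrase model and the hierarchical phrase model.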
5.
Constructive machine translation evaluation  Total citations: 1, self-citations: 0, other citations: 1
Stephen Minnis 《Machine Translation》1993,8(1-2):67-75
When surveying the many methods currently employed in MT evaluation, it is not immediately obvious that they serve to increase knowledge of the properties being measured. This report describes a constructive machine translation evaluation method aimed at addressing this issue.
(Edited version of a presentation given to the International Working Group on the Evaluation of Machine Translation Systems, Vaud, Switzerland, April 1991.)
6.
7.
An Example-Based Strategy for Chinese–English Machine Translation  Total citations: 3, self-citations: 0, other citations: 3
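This paper introduces an example-based strategy for Chinese–English machine translation, focusing on the design of a Chinese–English bilingual corpus and on an algorithm for matching Chinese sentences against that corpus. Given the characteristics of Chinese, matching is performed directly on Chinese characters, without first segmenting sentences into words. Determining the boundaries of matched fragments, another difficulty in example-based machine translation, is also addressed. The assembly of translated fragments into complete sentences is not studied in depth, because the strategy is intended for a multi-engine translation system, where it works together with other strategies to improve the accuracy of the output. Since example-based machine translation requires a large bilingual corpus as the basis for translation, and building a large corpus by hand is time-consuming and labor-intensive, the paper also attempts to build the Chinese–English bilingual corpus automatically, including document-level and word-level alignment.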
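The paper's matching algorithm is not given in the abstract. As a rough sketch of matching on characters without word segmentation, the code below (an assumption, not the authors' algorithm) scores each example by the longest common character substring it shares with the input and returns the best example; the two-sentence corpus is a hypothetical toy.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Longest common character substring via dynamic programming, O(len(a) * len(b))."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)  # prev[j]: length of common suffix of a[:i-1] and b[:j]
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def best_example(sentence: str, corpus: list[tuple[str, str]]) -> tuple[str, str, str]:
    """Return (matched fragment, source, target) for the example sharing the longest fragment."""
    scored = [(longest_common_substring(sentence, src), src, tgt) for src, tgt in corpus]
    return max(scored, key=lambda item: len(item[0]))


# Hypothetical toy bilingual corpus of (Chinese, English) pairs.
CORPUS = [("我喜欢喝茶", "I like drinking tea"), ("他明天去北京", "He goes to Beijing tomorrow")]
print(best_example("我喜欢喝咖啡", CORPUS))
```

Fragment boundary detection, which the abstract names as a separate difficulty, is not handled here: the sketch simply takes the longest shared substring.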
8.
John Hutchins 《Machine Translation》2005,19(3-4):197-211
In the last decade the dominant models of MT have been data-driven or corpus-based. Of the two main trends, statistical machine
translation and example-based machine translation (EBMT), the latter is much less clearly defined. In a review of the recently
published collection edited by Michael Carl and Andy Way, this essay surveys the basic processes, methods, main problems and
tasks of EBMT, and attempts to provide a definition of the essence of EBMT in comparison with statistical MT and traditional
rule-based MT.
Recent Advances in Example-based Machine Translation. Edited by Michael Carl and Andy Way. Dordrecht: Kluwer Academic Publishers, 2003. xxxi, 482pp. (Text, Speech and Language Technology, vol. 21) ISBN: 1-4020-1400-7 (hardback), 1-4020-1401-5 (paperback).
9.
For virtual-machine-based traffic simulation platforms, the paper proposes a software framework that performs trace-based dynamic translation. By monitoring the runtime execution of bytecodes and translating frequently executed bytecodes, known as hot spots, into equivalent native machine code, the framework considerably improves the performance of virtual-machine-based traffic simulation platforms, by a factor of ten or more in the reported experiments. For the first time, this work clearly shows that a seamless combination of the two technologies, dynamic translation and virtual machines, can lead to a new generation of practical traffic simulation platforms. Such a platform not only offers high flexibility in traffic model simulation but also retains the ability to run the computation-intensive numerical simulations commonly found in real-life industrial projects.
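To make the hot-spot idea concrete, here is a deliberately simplified sketch: execution counts are kept per basic block rather than per trace, and "translation" produces a list of pre-decoded Python closures standing in for native machine code. The toy stack bytecode (`push`, `add`, `mul`), the class name `TracingVM`, and the threshold value are all hypothetical.

```python
from collections import Counter

Block = list[tuple[str, object]]  # hypothetical bytecode: (opcode, argument) pairs


def interpret(block: Block, stack: list) -> None:
    """Slow path: decode and dispatch every instruction on each execution."""
    for op, arg in block:
        if op == "push":
            stack.append(arg)
        elif op == "add":
            stack.append(stack.pop() + stack.pop())
        elif op == "mul":
            stack.append(stack.pop() * stack.pop())
        else:
            raise ValueError(f"unknown opcode {op!r}")


def translate(block: Block):
    """Stand-in for native-code generation: decode once into a list of closures."""
    steps = []
    for op, arg in block:
        if op == "push":
            steps.append(lambda s, a=arg: s.append(a))
        elif op == "add":
            steps.append(lambda s: s.append(s.pop() + s.pop()))
        elif op == "mul":
            steps.append(lambda s: s.append(s.pop() * s.pop()))
        else:
            raise ValueError(f"unknown opcode {op!r}")

    def run(stack: list) -> None:
        for step in steps:
            step(stack)

    return run


class TracingVM:
    def __init__(self, threshold: int = 3):
        self.threshold = threshold  # executions before a block counts as a hot spot
        self.counts = Counter()     # execution counts for not-yet-translated blocks
        self.compiled = {}          # block id -> translated code

    def execute(self, block_id: str, block: Block, stack: list) -> None:
        if block_id in self.compiled:  # fast path: reuse translated code
            self.compiled[block_id](stack)
            return
        self.counts[block_id] += 1
        if self.counts[block_id] >= self.threshold:  # hot spot detected: translate it
            self.compiled[block_id] = translate(block)
        interpret(block, stack)
```

On the call that crosses the threshold the block is translated but still interpreted once; every later call takes the fast path.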
10.
This paper proposes a novel method for phrase-based statistical machine translation based on the use of a pivot language.
To translate between languages $L_s$ and $L_t$ with limited bilingual resources, we bring in a third language, $L_p$, called the pivot language. For the language pairs $L_s$–$L_p$ and $L_p$–$L_t$, there exist large bilingual corpora. Using only $L_s$–$L_p$ and $L_p$–$L_t$ bilingual corpora, we can build a translation model for $L_s$–$L_t$. The advantage of this method lies in the fact that we can perform translation between $L_s$ and $L_t$ even if there is no bilingual corpus available for this language pair. Using BLEU as a metric, our pivot language approach significantly outperforms the standard model trained on a small bilingual corpus. Moreover, with a small $L_s$–$L_t$ bilingual corpus available, our method can further improve translation quality by using the additional $L_s$–$L_p$ and $L_p$–$L_t$ bilingual corpora.
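A common way to build an $L_s$–$L_t$ phrase table from $L_s$–$L_p$ and $L_p$–$L_t$ tables is triangulation, marginalizing over pivot phrases: $\phi(t \mid s) = \sum_p \phi(t \mid p)\,\phi(p \mid s)$. The sketch below computes only this phrase translation probability; the paper may estimate further features (such as lexical weights) that are not shown, and the French–English–German toy tables are hypothetical.

```python
from collections import defaultdict

PhraseTable = dict[str, dict[str, float]]  # source phrase -> {target phrase: probability}


def triangulate(src_piv: PhraseTable, piv_tgt: PhraseTable) -> PhraseTable:
    """phi(t|s) = sum over pivot phrases p of phi(t|p) * phi(p|s)."""
    result: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for s, pivots in src_piv.items():
        for p, p_given_s in pivots.items():
            # Pivot phrases absent from the second table contribute nothing.
            for t, t_given_p in piv_tgt.get(p, {}).items():
                result[s][t] += t_given_p * p_given_s
    return {s: dict(ts) for s, ts in result.items()}


# Hypothetical toy tables: French -> English (pivot) -> German.
FR_EN = {"maison": {"house": 0.7, "home": 0.3}}
EN_DE = {"house": {"Haus": 0.9, "Heim": 0.1}, "home": {"Heim": 0.6, "Zuhause": 0.4}}
print(triangulate(FR_EN, EN_DE))
```

Here "Heim" is reachable through both pivot phrases, so its probability is the sum 0.7 × 0.1 + 0.3 × 0.6 = 0.25.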
11.
Statistical machine translation systems are usually trained on large amounts of bilingual text (used to learn a translation
model), and also large amounts of monolingual text in the target language (used to train a language model). In this article
we explore the use of semi-supervised model adaptation methods for the effective use of monolingual data from the source language
in order to improve translation quality. We propose several algorithms with this aim, and present the strengths and weaknesses
of each one. We present detailed experimental evaluations on the French–English EuroParl data set and on data from the NIST
Chinese–English large-data track. We show a significant improvement in translation quality on both tasks.
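The article proposes several algorithms, and the abstract does not detail them. One simple variant of the general idea is self-training: translate monolingual source text with the current system, then keep only confident translations as extra training pairs. In the sketch below the `translate` callable, its confidence score, and the threshold are all assumptions.

```python
from typing import Callable


def select_pseudo_parallel(
    source_sentences: list[str],
    translate: Callable[[str], tuple[str, float]],  # hypothetical: returns (hypothesis, confidence)
    threshold: float,
) -> list[tuple[str, str]]:
    """Translate monolingual source text and keep confident outputs as extra training pairs."""
    selected = []
    for src in source_sentences:
        hypothesis, confidence = translate(src)
        if confidence >= threshold:
            selected.append((src, hypothesis))
    return selected
```

The selected pairs would then be added to the bilingual training data before the model is retrained. How confidence is scored is the key design choice, and this sketch leaves it to the caller.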
12.
Word reordering is one of the challenging problems of machine translation and an important factor in the quality and efficiency of machine translation systems. In this paper, we introduce a novel reordering model based on an innovative structure named the phrasal dependency tree. The phrasal dependency tree is a modern syntactic structure based on dependency relationships between contiguous non-syntactic phrases. The proposed model integrates syntactic and statistical information in a log-linear model to deal with reordering problems. It exploits phrase dependencies, translation directions (orientations), and translation discontinuity between translated phrases. Compared with well-known and popular reordering models such as the distortion, lexicalised, and hierarchical models, the experimental study demonstrates the superiority of our model in terms of translation quality. Performance is evaluated on Persian → English and English → German translation tasks using the Tehran parallel corpus and WMT07 benchmarks, respectively. The results show improvements of 1.54/1.7 and 1.98/3.01 points over the baseline in terms of BLEU/TER on the Persian → English and German → English translation tasks, respectively. On average, our model has a significant impact on precision, with recall comparable to that of the lexicalised and distortion models.
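The orientations mentioned above are usually defined as in lexicalised reordering models: comparing the source span of each target-ordered phrase with that of the previous phrase gives monotone, swap, or discontinuous. The sketch shows only that standard classification, with inclusive source indices and a virtual start span of (-1, -1) as assumptions; it is not the paper's phrasal-dependency model.

```python
from enum import Enum


class Orientation(Enum):
    MONOTONE = "M"
    SWAP = "S"
    DISCONTINUOUS = "D"


def orientation(prev_span: tuple[int, int], cur_span: tuple[int, int]) -> Orientation:
    """Classify the current phrase's source span relative to the previous one (inclusive indices)."""
    prev_start, prev_end = prev_span
    cur_start, cur_end = cur_span
    if cur_start == prev_end + 1:   # continues directly after the previous phrase
        return Orientation.MONOTONE
    if cur_end == prev_start - 1:   # sits directly before the previous phrase
        return Orientation.SWAP
    return Orientation.DISCONTINUOUS


def orientations(spans: list[tuple[int, int]]) -> list[Orientation]:
    """Orientations for source spans listed in target order, starting from a virtual span (-1, -1)."""
    prev = (-1, -1)
    result = []
    for span in spans:
        result.append(orientation(prev, span))
        prev = span
    return result


print(orientations([(0, 1), (4, 5), (2, 3)]))
```

For the example, the first phrase is monotone, the jump to source words 4 to 5 is discontinuous, and stepping back to words 2 to 3 directly before it is a swap.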
13.
Gregor Thurmair 《Computers and the Humanities》1991,25(2-3):115-128
This paper describes developments in the area of machine translation (MT). First, the paper gives an overview of developments in Germany in general; then, special problems are discussed. The system taken as an example is METAL (Machine Translation and Analysis of Natural Language), where recent development work has centered on two main topics. (i) Efforts have been made to make the system truly multilingual. The German-to-English prototype had to be expanded, some system components had to be readjusted, and additional problems had to be solved. Currently, analysis and synthesis components for German, English, French, Spanish, and Dutch are under development. All these languages use a common system kernel and a standard interface structure. (ii) The system had to be made user-friendly. This was an even more important task because, until now, MT systems have not been well accepted by users. METAL tries to be more realistic and to support the main user interfaces much better than has been done before. This is based on the conviction that several parameters determine the real success of an MT system: not only translation quality, but also the integration of the MT system into the whole process of preparing and translating documents.
Gregor Thurmair is head of the Linguistics Department at Siemens Nixdorf Information Systems and project leader of the machine translation group, METAL. He is involved in projects in information retrieval (morphological analysis), speech understanding (parsing, semantics) and machine translation (METAL system). He has presented papers on morphology, semantics in speech understanding, transfer problems in MT, and grammar checking.
14.
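Design and Implementation of Syntactic Analysis in Machine Translation  Total citations: 2, self-citations: 0, other citations: 2
Fei Kun (费鲲) 《Computer Engineering and Design》2006,27(15):2832-2834,2838
This paper discusses the design and implementation of syntactic analysis in English–Chinese machine translation. It first sets out the relevant parsing theory from compiler construction, and on that basis proposes a concrete implementation of syntactic analysis for machine translation. The implementation adopts a partial-parsing approach: a sentence is divided into several grammatical constituents, each constituent is analyzed separately, and together these analyses complete the syntactic analysis of the sentence to be translated and yield its parse tree.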
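The paper's grammar and constituent inventory are not given in the abstract. A minimal sketch of the partial-parsing idea, assuming Penn-Treebank-style POS tags and a hypothetical tag-to-constituent table, groups adjacent tokens into flat constituents and attaches them under a sentence node.

```python
# Hypothetical mapping from POS tag to constituent label; unknown tags become "X".
CHUNK_OF_TAG = {
    "DT": "NP", "JJ": "NP", "NN": "NP", "NNS": "NP", "PRP": "NP",
    "MD": "VP", "VB": "VP", "VBD": "VP", "VBZ": "VP",
    "RB": "ADVP", "IN": "PP",
}


def partial_parse(tagged: list[tuple[str, str]]) -> tuple[str, list[tuple[str, tuple[str, ...]]]]:
    """Split a POS-tagged sentence into constituents, then attach them under S."""
    constituents: list[tuple[str, list[str]]] = []
    for word, tag in tagged:
        label = CHUNK_OF_TAG.get(tag, "X")
        if constituents and constituents[-1][0] == label:
            constituents[-1][1].append(word)  # extend the current constituent
        else:
            constituents.append((label, [word]))  # start a new constituent
    return ("S", [(label, tuple(words)) for label, words in constituents])


sentence = [("the", "DT"), ("old", "JJ"), ("man", "NN"), ("reads", "VBZ"), ("a", "DT"), ("book", "NN")]
print(partial_parse(sentence))
```

Grouping by adjacent labels is crude (two adjacent noun phrases would merge), but it shows the division of a sentence into constituents that can then be analyzed one at a time.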
15.
This paper presents a new method for utilizing translator knowledge bases for machine translation systems. Translator knowledge to be stored and utilized consists of translationally equivalent pattern pairs: surface-level phrasal, clausal, and sentential correspondences between the source and target languages. This knowledge will be utilized to translate domain-specific idiomatic, nonstandard, or ungrammatical expressions. The proposed method has been implemented in an adaptive English to Japanese machine translation system, HICATS/EJ, as one of its customization facilities.
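As an illustration of how stored pattern pairs might be applied, the sketch below assumes patterns are regular expressions with named slots, and that slot contents are handed to the ordinary translation engine through a hypothetical `translate_slot` callable. Neither the pattern syntax nor the example pairs come from HICATS/EJ.

```python
import re

# Hypothetical translator knowledge base: English surface pattern -> Japanese template.
PATTERN_PAIRS = [
    (r"take (?P<x>.+?) into account", "{x}を考慮する"),
    (r"as soon as possible", "できるだけ早く"),
]


def apply_pattern_pairs(sentence, translate_slot=lambda s: s, pairs=PATTERN_PAIRS):
    """Fill the target template of the first pattern that matches the whole input, else None."""
    for pattern, template in pairs:
        match = re.fullmatch(pattern, sentence)
        if match is not None:
            # Slot contents are translated separately (identity by default).
            slots = {name: translate_slot(text) for name, text in match.groupdict().items()}
            return template.format(**slots)
    return None  # no stored pair applies; fall back to normal translation
```

Returning `None` lets the caller fall back to the regular analysis-transfer pipeline when no stored pair applies.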
16.
17.
We propose a novel approach to cross-lingual language model and translation lexicon adaptation for statistical machine translation
(SMT) based on bilingual latent semantic analysis. Bilingual LSA enables latent topic distributions to be efficiently transferred
across languages by enforcing a one-to-one topic correspondence during training. Using the proposed bilingual LSA framework,
model adaptation can be performed by, first, inferring the topic posterior distribution of the source text and then applying
the inferred distribution to an n-gram language model of the target language and translation lexicon via marginal adaptation. The background phrase table is
enhanced with the additional phrase scores computed using the adapted translation lexicon. The proposed framework also features
rapid bootstrapping of LSA models for new languages based on a source LSA model of another language. Our approach is evaluated
on the Chinese–English MT06 test set using the medium-scale SMT system and the GALE SMT system measured in BLEU and NIST scores.
Improvement in both scores is observed on both systems when the adapted language model and the adapted translation lexicon
are applied individually. When the adapted language model and the adapted translation lexicon are applied simultaneously,
the gain is additive. At the 95% confidence interval of the unadapted baseline system, the gain in both scores is statistically
significant using the medium-scale SMT system, while the gain in the NIST score is statistically significant using the GALE
SMT system.
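Marginal adaptation is commonly written as $P_A(w \mid h) \propto \left(P_{\text{LSA}}(w) / P_{\text{bg}}(w)\right)^{\beta} P_{\text{bg}}(w \mid h)$, renormalized over the vocabulary for each history $h$. The sketch applies that standard form to a single history. The toy distributions and the value β = 0.5 are assumptions, and the paper's exact scaling and smoothing details may differ.

```python
def marginal_adapt(
    p_bg_given_h: dict[str, float],  # background n-gram distribution P_bg(w | h) for one history h
    p_bg_unigram: dict[str, float],  # background unigram P_bg(w)
    p_lsa_unigram: dict[str, float], # topic-adapted unigram P_LSA(w) inferred from the source text
    beta: float = 0.5,               # assumed tuning exponent
) -> dict[str, float]:
    """Scale each word by (P_LSA(w) / P_bg(w))**beta and renormalize."""
    scaled = {
        w: p * (p_lsa_unigram[w] / p_bg_unigram[w]) ** beta
        for w, p in p_bg_given_h.items()
    }
    z = sum(scaled.values())  # normalizer Z(h)
    return {w: v / z for w, v in scaled.items()}


print(marginal_adapt({"bank": 0.5, "river": 0.5}, {"bank": 0.1, "river": 0.1}, {"bank": 0.4, "river": 0.1}))
```

In the example the topic model makes "bank" four times more likely than the background, so with β = 0.5 its weight doubles and the adapted distribution becomes 2/3 versus 1/3. With β = 0 the background model is returned unchanged.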
18.
We have designed, implemented and assessed an EBMT system that can be dubbed the “purest ever built”: it strictly does not
make any use of variables, templates or patterns, does not have any explicit transfer component, and does not require any
preprocessing or training of the aligned examples. It uses only a specific operation, proportional analogy, that implicitly
neutralizes divergences between languages and captures lexical and syntactic variations along the paradigmatic and syntagmatic
axes without explicitly decomposing sentences into fragments. Exactly the same genuine implementation of such a core engine
was evaluated on different tasks and language pairs. To begin with, we compared our system on two tasks of a previous MT evaluation
campaign to rank it among other current state-of-the-art systems. Then, we illustrated the “universality” of our system by
participating in a recent MT evaluation campaign, with exactly the same core engine, for a wide variety of language pairs.
Finally, we studied the influence of extra data like dictionaries and paraphrases on the system performance.
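Proportional analogy solves equations of the form $A : B :: C : x$. The system described above uses a general solver, which is not reproduced here. This sketch handles only a restricted case: $B$ differs from $A$ in its ending, and that change is transferred to $C$. It works on character strings or on token tuples.

```python
from typing import Optional, TypeVar

S = TypeVar("S", str, tuple)  # characters (str) or tokens (tuple of str)


def common_prefix_len(a: S, b: S) -> int:
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def solve_suffix_analogy(a: S, b: S, c: S) -> Optional[S]:
    """Solve a : b :: c : x when b differs from a only in its ending; None if c lacks a's ending."""
    k = common_prefix_len(a, b)
    a_end, b_end = a[k:], b[k:]  # the ending that changes, and what it changes into
    if c[len(c) - len(a_end):] != a_end:
        return None
    return c[:len(c) - len(a_end)] + b_end


print(solve_suffix_analogy("walk", "walked", "jump"))
print(solve_suffix_analogy(("he", "walks"), ("he", "walked"), ("she", "walks")))
```

Because no fragments, templates, or transfer rules are stored, even this toy case shows how an analogy can carry a change from one example to another.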
19.
20.
Learning finite-state models for machine translation  Total citations: 1, self-citations: 0, other citations: 1
In formal language theory, finite-state transducers are well-known models for simple “input-output” mappings between two languages.
Although more powerful recursive models can be used to account for more complex mappings, it has been argued that the input-output
relations underlying most usual natural language pairs can essentially be modeled by finite-state devices. Moreover, the relative
simplicity of these mappings has recently led to the development of techniques for learning finite-state transducers from
a training set of input-output sentence pairs of the languages considered. In recent years, these techniques have led to
the development of a number of machine translation systems. Within the statistical formulation of machine translation, we review
here how modeling, learning and search problems can be solved by using stochastic finite-state transducers. We also review
the results achieved by the systems we have developed under this paradigm. As a main conclusion of this review, we argue that,
as task complexity and training data scarcity increase, systems that rely more on statistical techniques tend to produce
the best results.
This work was partially supported by the European Union project TT2 (IST-2001-32091) and by the Spanish project ITEFTE (TIC 2003-08681-C02-02).
Editors: Georgios Paliouras and Yasubumi Sakakibara
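The search problem for a stochastic finite-state transducer can be sketched as a Viterbi search. It keeps, for each state, the most probable path that has consumed the input so far, and it scores complete paths by their final-state probability. This finds the most probable path, a common approximation to the most probable output string. The toy Spanish-to-English transducer is hypothetical and uses delayed output for "casa" so that "la casa verde" can become "the green house".

```python
# transitions[(state, input_word)] -> list of (next_state, output_string, probability)
Transitions = dict[tuple[str, str], list[tuple[str, str, float]]]


def best_translation(transitions: Transitions, finals: dict[str, float], start: str, source: list[str]):
    """Return (output, probability) of the most probable complete path, or None if none exists."""
    best = {start: (1.0, [])}  # state -> (probability, output words) of the best path so far
    for word in source:
        nxt: dict[str, tuple[float, list[str]]] = {}
        for state, (prob, out) in best.items():
            for next_state, output, p in transitions.get((state, word), []):
                cand = (prob * p, out + ([output] if output else []))
                if next_state not in nxt or cand[0] > nxt[next_state][0]:
                    nxt[next_state] = cand  # keep only the best path into each state
        best = nxt
    complete = [(prob * finals[s], out) for s, (prob, out) in best.items() if s in finals]
    if not complete:
        return None
    prob, out = max(complete, key=lambda item: item[0])
    return " ".join(out), prob


# Hypothetical toy transducer; "casa" may delay its output until the adjective is seen.
TOY: Transitions = {
    ("q0", "la"): [("q1", "the", 1.0)],
    ("q1", "casa"): [("q2", "", 0.6), ("q4", "house", 0.4)],
    ("q2", "verde"): [("q3", "green house", 1.0)],
    ("q4", "verde"): [("q3", "green", 1.0)],
}
FINALS = {"q3": 1.0, "q4": 1.0}
print(best_translation(TOY, FINALS, "q0", "la casa verde".split()))
```

Learning the transition probabilities from sentence pairs, the other half of the reviewed work, is not covered by this sketch.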