首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
The string-to-tree model is one of the most successful syntax-based statistical machine translation (SMT) models. It models the grammaticality of the output via target-side syntax. However, it does not use any semantic information and tends to produce translations containing semantic role confusions and error chunk sequences. In this paper, we propose two methods to use semantic roles to improve the performance of the string-to-tree translation model: (1) adding role labels in the syntax tree; (2) constructing a semantic role tree, and then incorporating the syntax information into it. We then perform string-to-tree machine translation using the newly generated trees. Our methods enable the system to train and choose better translation rules using semantic information. Our experiments showed significant improvements over the state-of-the-art string-to-tree translation system on both spoken and news corpora, and the two proposed methods surpass the phrase-based system on large-scale training data.  相似文献   

2.
3.
蒋宏飞  李生  张民  赵铁军  杨沐昀 《自动化学报》2009,35(10):1317-1326
基于短语的模型是目前发展相对成熟的一种统计机器翻译(Statistical machine translation, SMT)模型. 但基于短语的模型不包含任何结构信息, 因而缺乏有效的全局调序能力, 同时不能对非连续短语进行建模. 基于句法的模型因具有结构信息而具有解决以上问题的潜力, 因而越来越受到研究者们的重视. 然而现有的大多数基于句法的模型都因严格的句法限制而制约了模型的描述能力. 为突破这种限制并将 基于短语的模型的优点融入到句法模型中, 本文提出一种基于同步树序列替换文法(Synchronous tree sequence substitution grammar, STSSG)的统计机器翻译模型. 在此模型中, 树序列被用作为基本的翻译单元. 在这种框架下, 不满足句法限制的翻译等价对和满足句法限制的翻译等价对都可以融入句法信息并被翻译模型所使用. 从而, 两种模型的优点均得到充分利用. 在2005年度美国国家标准与技术研究所(NIST)举办的机器翻译评比的中文翻译任务语料上的实验表明, 本文提出的模型显著地超过了两个基准系统: 基于短语的翻译系统Moses和一个基于严格树结构的句法翻译模型.  相似文献   

4.
该文总结了我们近几年来在基于句法的统计机器翻译方面所做的研究工作,特别是基于源语言句法的一系列统计机器翻译模型与方法,具体包括 基于最大熵括号转录语法的翻译模型,基于源语言短语结构树的树到串翻译模型及其相应的基于树的翻译方法,基于森林的翻译方法和句法分析与解码一体化翻译方法,基于源语言依存树的翻译模型。  相似文献   

5.
基于句法的统计机器翻译综述   总被引:1,自引:0,他引:1  
本文对基于句法的统计机器翻译进行了综述。按照模型所基于的语法不同,将基于句法的统计机器翻译分为两大类 基于形式化语法和基于语言学语法。对这两个不同类别,我们分别介绍它们代表性的工作,包括模型的构建、训练和解码器的设计等,并对比了各个模型的优点和缺点。最后我们对基于句法的统计机器翻译进行了总结,指出设计句法模型时要注意的问题,并对未来的发展趋势进行了预测。  相似文献   

6.
Unknown words are one of the key factors that greatly affect the translation quality.Traditionally, nearly all the related researches focus on obtaining the translation of the unknown words.However, these approaches have two disadvantages.On the one hand, they usually rely on many additional resources such as bilingual web data;on the other hand, they cannot guarantee good reordering and lexical selection of surrounding words.This paper gives a new perspective on handling unknown words in statistical machine translation (SMT).Instead of making great efforts to find the translation of unknown words, we focus on determining the semantic function of the unknown word in the test sentence and keeping the semantic function unchanged in the translation process.In this way, unknown words can help the phrase reordering and lexical selection of their surrounding words even though they still remain untranslated.In order to determine the semantic function of an unknown word, we employ the distributional semantic model and the bidirectional language model.Extensive experiments on both phrase-based and linguistically syntax-based SMT models in Chinese-to-English translation show that our method can substantially improve the translation quality.  相似文献   

7.
In this paper, we develop an approach called syntax-based reordering (SBR) to handling the fundamental problem of word ordering for statistical machine translation (SMT). We propose to alleviate the word order challenge including morpho-syntactical and statistical information in the context of a pre-translation reordering framework aimed at capturing short- and long-distance word distortion dependencies. We examine the proposed approach from the theoretical and experimental points of view discussing and analyzing its advantages and limitations in comparison with some of the state-of-the-art reordering methods.In the final part of the paper, we describe the results of applying the syntax-based model to translation tasks with a great need for reordering (Chinese-to-English and Arabic-to-English). The experiments are carried out on standard phrase-based and alternative N-gram-based SMT systems. We first investigate sparse training data scenarios, in which the translation and reordering models are trained on a sparse bilingual data, then scaling the method to a large training set and demonstrating that the improvement in terms of translation quality is maintained.  相似文献   

8.
组合原则表明句子的语义由其构成成分的语义按照一定规则组合而成,由此基于句法结构的语义组合计算一直是一个重要的探索方向,其中采用树结构的组合计算方法最具有代表性。但是该方法难以应用于大规模数据处理,主要问题是其语义组合的顺序依赖于具体树的结构,无法实现并行处理。该文提出一种基于图的依存句法分析和语义组合计算的联合框架,并借助复述识别任务训练语义组合模型和句法分析模型。一方面,图模型可以在训练和预测阶段采用并行处理,极大地缩短计算时间;另一方面,联合句法分析的语义组合框架不必依赖外部句法分析器,同时两个任务的联合学习可使语义表示同时学习句法结构和语义的上下文信息。我们在公开汉语复述识别数据集LCQMC上进行评测,实验结果显示准确率接近树结构组合方法,达到79.54%,预测速度最高可提升30倍以上。  相似文献   

9.
提出一种基于短语和依存句法结构的中文语义角色标注(SRL)方法。联合短语句法特征和依存句法特征,对句法树进行剪枝,过滤句法树上不可能担当语义角色的组块短语单元和关系结点,对担当语义角色的组块或节点进行角色类别标注。基于正确句法树和正确谓词的识别结果表明,该方法的SRL性能F1值为73.53%,优于目前国内外的同类系统。  相似文献   

10.
统计机器翻译一般采用启发式方法训练翻译模型。但启发式方法的理论基础不够完善,因此,会导致翻译模型规模庞大以及模型参数精确率不高。针对以上两个问题,该文提出一种基于变分贝叶斯推理的模型训练方法,形成更精确的精简翻译模型。该方法首先通过强制解码对齐语料,然后利用变分贝叶斯EM算法获得模型参数。该文的实验语料为NIST汉英翻译任务数据,实验结果显示,基于句法(基于短语)的统计机器翻译中,超过95%(76%)的规则被剪枝,且BLEU值显著提高。  相似文献   

11.
商品图像句子标注是图像标注中一项既有趣又富有挑战的研究任务.噪声单词干扰和句法结构错误是该项研究的制约因素,针对噪声单词干扰,提出关键词精化思想:用绝对排序特征强化关键词权重,完成第1次关键词精化;计算单词的语义相关度评分,进一步优选能准确刻画图像内容的单词,完成第2次关键词精化.设计词序列\  相似文献   

12.
This paper investigates the morphosyntactic symmetry between the elements in word combinations and word forms. Based on linguistic superoperators, the authors introduce the concepts of syntactic time and syntactic space. They develop a classification of the parts of speech and members of a sentence for the SL language, determine morphosyntactic cases, and propose a new semantic classification of predicates. The authors determine morphosyntactic and semantic-syntactic functions, introduce syntactic superoperators and use the latter to develop the rules to generate syntactic structures. They also propose rules of interpretation to match the syntactic structures with their semantic meanings.  相似文献   

13.
We present a phrase-based statistical machine translation approach which uses linguistic analysis in the preprocessing phase. The linguistic analysis includes morphological transformation and syntactic transformation. Since the word-order problem is solved using syntactic transformation, there is no reordering in the decoding phase. For morphological transformation, we use hand-crafted transformational rules. For syntactic transformation, we propose a transformational model based on a probabilistic context-free grammar. This model is trained using a bilingual corpus and a broad-coverage parser of the source language. This approach is applicable to language pairs in which the target language is poor in resources. We considered translation from English to Vietnamese and from English to French. Our experiments showed significant BLEU-score improvements in comparison with Pharaoh, a state-of-the-art phrase-based SMT system.  相似文献   

14.
该文探索了基于树核函数的中文语义角色分类,重点研究如何获取有效的结构化信息特征。在最小句法树结构的基础上,根据语义角色分类的特点,进一步定义了三种不同的句法结构,并使用复合核将基于树核和基于特征的方法结合。在中文PropBank语料上的结果表明,基于树核函数的方法在中文语义角色分类任务中能够取得较好的结果,精确率达到91.79%。同时,与基于特征方法的结合,基于树核函数的方法能够进一步提高前者性能,精确率达到94.28%,优于同类系统。  相似文献   

15.
传统蒙古文形态分析主要采用将蒙古文词缀和词干直接切分而仅保留词干的方法,该方法会丢掉蒙古文词缀所包含的大量语义信息。蒙古文词缀中包含大量格的附加成分,主要表征句子的结构特征,对其进行切分并不会影响词汇的语义特征,若不进行预处理则会造成严重的数据稀疏问题,从而影响翻译质量。因此,基于现有理论对语料预处理方法进行总结研究,重点研究了蒙古文格处理对翻译结果的影响,目的是从蒙古文形态分析的特殊性入手来提高蒙古文-汉文统计机器翻译的质量。通过优化预处理方法,使机器翻译结果的BLEU得分相比基线系统1提高了3.22个点。  相似文献   

16.
Phrase-based translation models, with sequences of words (phrases) as translation units, achieve state-of-the-art translation performance. However, phrase reordering is a major challenge for this model. Recently, researchers have focused on utilizing syntax to improve phrase reordering. In adding syntactic knowledge into phrase reordering model, using handcrafted or probabilistic syntactic rules to reorder the source-language approximating the target-language word order has been successful in improving translation quality. However, it suffers from propagating the pre-ordering errors to the later translation step (e.g. decoding). In this paper, we propose a novel framework to uniformly represent the handcrafted and probabilistic syntactic rules and integrate them more effectively into phrase-based translation. In the translation phase, for a source sentence to be translated, handcrafted or probabilistic syntactic rules are first acquired from the source parse tree prior to translation, and then instead of reordering the source sentence directly, we input these rules into the decoder and design a new algorithm to apply these rules during decoding. In order to attach more importance to the syntactic rules and distinguish reordering between syntactic and non-syntactic unit reordering, we propose to design respectively a syntactic reordering model and a non-syntactic reordering model. The syntactic rules will guide phrase reordering in decoding within the syntactic reordering model. Extensive experiments on Chinese-to-English translation show that our approach, whether incorporating handcrafted or probabilistic syntactic rules, significantly outperforms the previous methods.  相似文献   

17.
Extended multi bottom–up tree transducers are defined and investigated. They are an extension of multi bottom–up tree transducers by arbitrary, not just shallow, left-hand sides of rules; this includes rules that do not consume input. It is shown that such transducers, even linear ones, can compute all transformations that are computed by linear extended top–down tree transducers, which are a theoretical model for syntax-based machine translation. Moreover, the classical composition results for bottom–up tree transducers are generalized to extended multi bottom–up tree transducers. Finally, characterizations in terms of extended top–down tree transducers and tree bimorphisms are presented.  相似文献   

18.
当前的英文语法纠错模型往往忽略了有利于语法纠错的文本句法知识, 从而使得英语语法纠错模型的纠错能力受到影响. 针对上述问题, 提出一种基于差分融合句法特征的英语语法纠错模型. 首先, 本文提出的句法编码器不仅可以直接从文本中无监督地生成依存关系图和成分句法树信息, 而且还能将上述两种异构的句法结构进行特征融合, 编码成高维的句法表征. 其次, 为了同时利用文本中的语义和句法信息, 差分融合模块先使用差分正则化加强语义编码器捕获句法编码器未能生成的语义特征, 然后采用协同注意力将句法表征和语义表征进一步融合, 作为Transformer编码端的输出特征, 最终输入到解码端, 从而生成语法正确的文本. 在CoNLL-2014 英文纠错任务数据集上进行对比实验, 结果表明, 该方法的准确率和F0.5值优于基于Copy-Augmented Transformer的语法纠错模型, 其F0.5值提升了5.2个百分点, 并且句法知识避免了标注数据过少问题, 具有更优的文本纠错效果.  相似文献   

19.
This paper proposes a novel tree kernel-based method with rich syntactic and semantic information for the extraction of semantic relations between named entities. With a parse tree and an entity pair, we first construct a rich semantic relation tree structure to integrate both syntactic and semantic information. And then we propose a context-sensitive convolution tree kernel, which enumerates both context-free and context-sensitive sub-trees by considering the paths of their ancestor nodes as their contexts to capture structural information in the tree structure. An evaluation on the Automatic Content Extraction/Relation Detection and Characterization (ACE RDC) corpora shows that the proposed tree kernel-based method outperforms other state-of-the-art methods.  相似文献   

20.
刘颖  姜巍 《计算机工程与应用》2012,48(32):98-101,146
对齐短语是决定统计机器翻译系统质量的核心模块。提出基于短语结构树的层次短语模型,这是利用串-树模型的思想对层次短语模型的扩展。基于短语结构树的层次短语模型是在双语对齐短语的基础之上结合英语短语结构树抽取翻译规则,并利用启发式策略获得翻译规则的扩展句法标记。采用翻译规则的统计机器翻译系统在不同数据集上具有稳定的翻译结果,在训练集和测试集的平均BlEU评分高于短语模型和层次短语模型的BLEU评分。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号