1.
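This paper analyzes a special linguistic phenomenon, which HNC (Hierarchical Network of Concepts) theory calls packaged sentence degeneration (包装句蜕), in order to provide theoretical support for machine translation. It first argues for the need to study packaged sentence degeneration on the basis of problems exposed by current machine translation systems. It then gives a linguistic description and a Chinese-English contrastive analysis of the phenomenon, proposes a method for identifying it from a computability perspective, and finally presents machine processing strategies and rules for it.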
2.
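With Chinese-English machine translation as the target application, and with the semantic network and sentence category analysis method of Hierarchical Network of Concepts (HNC) theory as the theoretical basis, this paper discusses the theory and annotation practice of building a sentence category dependency treebank, and describes the concept category tag set and the sentence category relation tag set needed to build it. By comparison with existing Chinese treebanks, and taking the annotation of Chinese explicit light-verb sentences as an example, it analyzes the characteristics of the Chinese sentence category dependency treebank. At the application level, the paper defines a bilingual "sentence category dependency subtree-to-string" transfer template for Chinese-English machine translation that integrates syntactic and semantic information, and attempts to extract Chinese-English transfer templates from the Chinese sentence category dependency treebank.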
3.
4.
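This paper introduces an English-Chinese machine translation system that the author implemented on an IBM PC/XT using semantic analysis. It is a one-way, fully automatic translation system consisting of five modules: master control, dictionary maintenance, dictionary lookup, parse tree generation, and transfer generation. All programs are written in the Turbo Prolog logic programming language. The paper focuses on the author's work on the semantic analysis of prepositional phrases.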
5.
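Block-expansion sentences (块扩句) are a class of sentences whose conceptual expectation knowledge is very explicit. Based on Hierarchical Network of Concepts (HNC) theory, this paper introduces the block-expansion sentence categories that correspond to such sentences and summarizes the typical block-expansion verbs that can activate them. From the conceptual knowledge of a block-expansion verb, the feature semantic block and the block-expansion sentence category of a sentence are obtained; after the sentence is checked against the knowledge of that sentence category, the sentence category analysis result is produced. Block-expansion sentences in real corpora were analyzed automatically on top of an existing sentence category analysis system. Experiments show an accuracy of 71.29%, with errors coming mainly from identification of the verb in the feature semantic block, verbs with multiple sentence category codes, and similar causes. Correct analysis of block-expansion sentences will help resolve the difficulty of handling multiple verbs in Chinese sentences.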
6.
姚兰 《自动化技术与应用》2021,40(2):182-185
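Based on semantic selection and information features, this paper designs an automated English machine translation system. A machine translation workflow is defined through semantic information features. Translation is carried out with GIZA++, words are aligned with the Berkeley aligner, and, based on inversion transduction grammar (反向转换语法), the structural correlations between the Chinese language pattern and the English translation pattern are described in detail. Automated machine translation is achieved through the dynamic-static configuration of sentences. Finally, system tests show that, compared with traditional machine translation systems, accuracy is significantly...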
7.
晋耀红 《计算机工程与应用》2012,48(4):29-32
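For complex sentences in patent text translation, this paper proposes a hybrid-strategy method that combines semantic analysis with rule-based translation to improve patent translation. Semantic analysis is used mainly to identify the head verb of a sentence and to analyze noun phrases that contain nested structures. The semantic analysis results are then fed into a rule-based translation system to improve its output. Tests show that the combined translation system raises the BLEU score by 9.8%. The method has been integrated into the online Chinese-English machine translation system of the State Intellectual Property Office of China, where it effectively improves the quality and efficiency of patent translation.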
8.
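In recent years, deep learning has achieved major breakthroughs, and neural machine translation (NMT), which incorporates deep learning, has gradually replaced statistical machine translation as the mainstream machine translation approach in academia. However, conventional NMT treats the source sentence as a word sequence and ignores its implicit semantic information, so the translation can be inconsistent with the source semantics. To address this, linguistic knowledge such as syntax and semantics has been applied to NMT with good experimental results. Semantic roles can also express sentence semantics and therefore have some value for NMT. This paper proposes two NMT encoder models that incorporate sentence semantic role information. The first adds semantic role labels to the sentence's word sequence, marking the semantic role that each span of words plays in the sentence, so that the labels and the source words together form the input sequence. The second builds a semantic role tree for the source sentence, obtains each word's position in that tree, and concatenates this position information, as a feature vector, with the word embedding to form a word vector that carries semantic role information. Experiments on a large-scale Chinese-English translation task show that, compared with the baseline system, the two methods improve BLEU by an average of 0.9 and 0.72 points respectively across all test sets, and also yield varying degrees of improvement on other metrics such as TER (Translation Edit Rate) and RIBES (Rank-based Intuitive Bilingual Evaluation Score). Further analysis shows that, compared with the baseline, the proposed semantic-role-aware NMT encoder models translate long sentences better and achieve better translation adequacy.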
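The first method in this abstract (adding semantic role labels to the source word sequence) can be illustrated with a minimal sketch. The span format, the `<ROLE>` token syntax, and the function name `add_role_labels` are assumptions made for illustration; the abstract does not specify how the labels are encoded.

```python
from typing import List, Tuple

# Hypothetical span format (not from the paper): (start, end_exclusive, role_label).
Span = Tuple[int, int, str]


def add_role_labels(tokens: List[str], spans: List[Span]) -> List[str]:
    """Insert one role-label token before each labelled span of source words."""
    # Map the first token index of every span to its role label.
    # The end index is kept in the span only for readability; it is not needed here.
    starts = {start: role for start, _end, role in spans}
    out: List[str] = []
    for i, tok in enumerate(tokens):
        if i in starts:
            out.append(f"<{starts[i]}>")  # assumed label-token syntax
        out.append(tok)
    return out


tokens = ["the", "cat", "chased", "a", "mouse"]
spans = [(0, 2, "A0"), (2, 3, "V"), (3, 5, "A1")]
print(add_role_labels(tokens, spans))
# ['<A0>', 'the', 'cat', '<V>', 'chased', '<A1>', 'a', 'mouse']
```

In a sketch like this, the label tokens would simply be added to the source vocabulary so that the encoder sees them as ordinary input symbols.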
9.
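Semantic block segmentation is an important topic in HNC (Hierarchical Network of Concepts) theory. Unlike earlier approaches, this work addresses the problem with statistical modeling. Feature templates are built from word, part-of-speech, and concept information, features are selected with an incremental method, and a semantic block segmentation system based on the maximum entropy model is constructed. Tests on an HNC-annotated corpus gave good results: in the open test, precision and recall reached 83.78% and 91.17% respectively.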
10.
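The Role of the E Block in Sentence Categories and Strategies for Determining It (total citations: 1; self-citations: 0; citations by others: 1)
1 Introduction. In a sentence, the predicate plays an extremely important role. For many years, both natural language processing centered on syntactic analysis and natural language understanding centered on semantic analysis have treated determination of the predicate as the key factor in sentence analysis. First, in terms of sentence form, the type of predicate indicates whether the sentence takes no object, a single object, a double object, or even a clause as object. Second, in terms of meaning, the predicate yields the basic semantic information that the subject and object carry: whether they are persons or things, concrete or abstract concepts, or, in case terms, agent or patient. HNC theory likewise affirms the role of the predicate, but holds that, on the one hand, the notion of "predicate" is too broad and, on the other, it does not reveal the predicate's compound composition in depth. HNC therefore introduces the concept of the E(igen) semantic block primitive, which denotes features and constitutes the feature semantic block EK. It also introduces three classes of generalized...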
11.
A common practice in operational Machine Translation (MT) and Natural Language Processing (NLP) systems is to assume that a verb has a fixed number of senses and rely on a precompiled lexicon to achieve large coverage. This paper demonstrates that this assumption is too weak to cope with the similar problems of lexical divergences between languages and unexpected uses of words that give rise to cases outside of the precompiled lexicon coverage. We first examine the lexical divergences between English verbs and Chinese verbs. We then focus on a specific lexical selection problem: translating English change-of-state verbs into Chinese verb compounds. We show that an accurate translation depends not only on information about the participants, but also on contextual information. Therefore, selectional restrictions on verb arguments lack the necessary power for accurate lexical selection. Second, we examine verb representation theories and practices in MT systems and show that under the fixed sense assumption, the existing representation schemes are not adequate for handling these lexical divergences and extending existing verb senses to unexpected usages. We then propose a method of verb representation based on conceptual lattices which allows the similarities among different verbs in different languages to be quantitatively measured. A prototype system UNICON implements this theory and performs more accurate MT lexical selection for our chosen set of verbs. An additional lexical module for UNICON is also provided that handles sense extension.
12.
王永生 《计算机工程与应用》2010,46(20):99-102
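Part-of-speech tagging is a foundational research topic in English-Chinese machine translation systems. This paper proposes an unsupervised learning algorithm for part-of-speech tagging based on decision trees: under the limited condition that only a lexicon is available, it performs unsupervised learning of part-of-speech tagging with decision trees and generates tagging rules.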
13.
This paper concerns the treatment, in the context of machine translation, of English complex nominal groups which can be considered as nominalizations of verb phrases. We discuss the fact that many styles of English prose which are suitable for translation by machine typically favor the use of nominal rather than verbal syntagms. But such constructions when translated literally are often considered unnatural. The general problem is described in detail, with examples. The more specific problem of recognizing nominalizations and analyzing their structure is considered. How and where to achieve the required syntactic transformation is discussed, and exemplified. (Author's note: On leave of absence from the Centre for Computational Linguistics, University of Manchester Institute of Science and Technology, England.)
14.
《Future Generation Computer Systems》1986,2(2):83-94
Advances in hardware have made available micro-coded LISP and PROLOG workstations, supported by text editing and formatting software. Some of these have been augmented with linguistic technology including large bilingual dictionaries, parsers, generators, and translators to make them powerful tools for research and development of automated translation. Some techniques of linguistic engineering for accomplishing translation are described, and it is suggested that the present barely satisfactory approach involving sentence-by-sentence translation will eventually be improved by incorporating the results of research on analyzing discourse.
15.
Fuzzy matching is currently used for translating words, and neural machine translation (NMT) and statistical machine translation are the methods used in machine translation (MT). An MT tool must handle large datasets, which can affect its performance in retrieving the correct match. To improve the matching score of MT, more advanced techniques can be built by modifying the existing fuzzy-matching translator and neural machine translator. Modifying architectures and encoding schemes in the conventional way is tedious, and dataset preprocessing likewise costs time and memory. This article presents a new spider-web-based searching-enhanced translation scheme for use with a neural machine translator. The proposed scheme searches the available dataset deeply to find accurate matches. In addition, translation quality is improved by an optimal selection scheme for choosing the sentence matches used in source augmentation: the matches retrieved under various matching scores are passed to an optimization algorithm, and augmenting the source with the optimal retrieved matches increases translation quality. Selecting the optimal combination of matches also reduces the time required, since not all retrieved matches need to be tested to find the target sentence. Translation quality is measured with BLEU and METEOR scores, which reach about 92% and 86%, respectively, for the TA-EN language pair in different configurations. The results are compared with other available NMT methods to validate the work.
16.
17.
18.
A. V. Novikova, L. A. Mylnikov 《Automatic Documentation and Mathematical Linguistics》2017,51(3):159-169
This article draws on the example of business texts to consider practical aspects of the distortion of meaning in translation from one language to another in the available machine translation (MT) systems and their underlying approach based on word-by-word translation. An integrated functional approach to translating business texts is suggested on the basis of analyzing semantic and morphological features of actual text content and also on axiological and epistemic semantic features that bring to light subjective modality. The suggested technique is used to develop an algorithm of business text MT that makes it possible to resolve the word-by-word translation issue and conveys the meanings of short texts. Cases of testing the suggested technique and the derived algorithm are considered for the Russian–English language pair.
19.
We report experimental results on automatic extraction of an English-Chinese translation lexicon, by statistical analysis of a large parallel corpus, using limited amounts of linguistic knowledge. To our knowledge, these are the first empirical results of the kind between an Indo-European and non-Indo-European language for any significant vocabulary and corpus size. The learned vocabulary size is about 6,500 English words, achieving translation precision in the 86–96% range, with alignment proceeding at paragraph, sentence, and word levels. Specifically, we report (1) progress on the HKUST English-Chinese Parallel Bilingual Corpus, (2) experiments supporting the usefulness of restricted lexical cues for statistical paragraph and sentence alignment, and (3) experiments that question the role of hand-derived monolingual lexicons for automatic word translation acquisition. Using a hand-derived monolingual lexicon, the learned translation lexicon averages 2.33 Chinese translations per English entry, with a manually-filtered precision of 95.1%, and an automatically-filtered weighted precision of 86.0%. We then introduce a fully automatic two-stage statistical methodology that is able to learn translations for collocations. A statistically-learned monolingual Chinese lexicon is first used to segment the Chinese text, before applying bilingual training to produce 6,429 English entries with 2.25 Chinese translations per entry. This method improves the manually-filtered precision to 96.0% and the automatically-filtered weighted precision to 91.0%, an error rate reduction of 35.7% from using a hand-derived monolingual lexicon.
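The 35.7% error rate reduction quoted in this abstract appears to follow from the automatically-filtered weighted precision figures (86.0% rising to 91.0%), if error is taken to be one minus precision. The sketch below checks the arithmetic under that assumption; the function name `relative_error_reduction` is illustrative and not from the paper.

```python
def relative_error_reduction(old_precision: float, new_precision: float) -> float:
    """Relative drop in error rate, assuming error = 1 - precision."""
    old_error = 1.0 - old_precision
    new_error = 1.0 - new_precision
    return (old_error - new_error) / old_error


# Automatically-filtered weighted precision: 86.0% -> 91.0%.
auto = relative_error_reduction(0.860, 0.910)
print(f"{auto:.1%}")  # 35.7%
```

That is, the error falls from 14.0% to 9.0%, and 5.0 / 14.0 is approximately 0.357.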
20.
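In natural language processing, recurrent neural networks are increasingly used in machine translation. Like other languages, Chinese has a very large vocabulary, and improving the quality of English-to-Chinese machine translation is an important contribution to Chinese language processing. This paper designs a model for an English-Chinese machine translation system that uses knowledge-based context vectors to map English and Chinese words and is implemented with an encoder-decoder recurrent neural network. The model's performance was tested under different activation functions; the results show that a linear activation function in the encoder layer combined with a hyperbolic tangent activation function in the decoder layer performs best. Comparing GRU and LSTM layers, GRU outperforms LSTM. The attention layer is configured with softmax and sigmoid activation functions, and the proposed model outperforms existing systems in terms of cross-entropy loss.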