首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
This paper proposes a novel method for phrase-based statistical machine translation based on the use of a pivot language. To translate between languages L s and L t with limited bilingual resources, we bring in a third language, L p , called the pivot language. For the language pairs L s  − L p and L p  − L t , there exist large bilingual corpora. Using only L s  − L p and L p  − L t bilingual corpora, we can build a translation model for L s  − L t . The advantage of this method lies in the fact that we can perform translation between L s and L t even if there is no bilingual corpus available for this language pair. Using BLEU as a metric, our pivot language approach significantly outperforms the standard model trained on a small bilingual corpus. Moreover, with a small L s  − L t bilingual corpus available, our method can further improve translation quality by using the additional L s  − L p and L p  − L t bilingual corpora.  相似文献   

2.
In conventional algorithms, the lack of entity information, reference, and semantic relations in the current corpus leads to a low rate of precision and efficiency in constructing cross‐language bilingual mapping. According to natural language processing and machine translation technology, to solve the problem, this paper aims to establish a parallel corpus for information extraction based on the OntoNotes corpus by combining automatic extraction and manual adjustment. To verify the validity of the parallel corpus constructed in this paper, a comparative experiment was carried out on the corpus. The corpus entity alignment rate, anaphora absence, and syntactic structure were analysed in detail based on statistics. The data set is well performed in language processing and machine translation. The parallel corpus for information extraction constructed in this paper can produce highly precise, stable, and efficient information in the process of bilingual mapping, which provides an effective parallel corpus for the study in machine translation of bilingual mapping.  相似文献   

3.
以机器翻译技术为核心的多语信息处理研究   总被引:1,自引:0,他引:1  
该文介绍了哈尔滨工业大学教育部-微软语言语音重点实验室在多语信息处理方面的研究进展和成果.首先综述了国内外的研究现状,然后重点介绍在统计机器翻译、机器翻译应用、机器翻译评价、跨语言信息检索等方面的研究工作.  相似文献   

4.
针对汉维统计机器翻译中维吾尔语具有长距离依赖问题和语言模型具有数据稀疏现象,提出了一种基于泛化的维吾尔语语言模型.该模型借助维吾尔语语言模型的训练过程中生成的文本,结合字符串相似度算法,取相似的维文字符串经过归一化处理抽取规则,计算规则的参数值,利用规则给测试集在解码过程中生成n-best译文重新评分,将评分最高的译文作为最佳译文.实验结果表明,泛化语言模型减少了存储空间,同时,规则的合理使用有效地提高了翻译译文的质量.  相似文献   

5.
一种基于实例的汉英机器翻译策略   总被引:3,自引:0,他引:3  
介绍了一种基于实例的汉英机器翻译策略,重点讨论了汉英双语语料库的设计和基于该语料库的汉语句子的匹配算法。在进行汉语句子的匹配时,根据汉语的特点直接采用汉字的匹配,而没有进行汉语句子的分词。另外,匹配时确定匹配片断的边界也是基于实例机器翻译的难点之一,在这方面也采取了相应的解决方法。没有对翻译句子的连接装配进行更深入的研究,这是因为该翻译策略是用于多翻译引擎系统的,它要与其它翻译策略配合使用,以提高翻译结果的正确率。基于实例的机器翻译需要大量的双语语料库作为翻译时的依据,而人工建设大型语料库费时费力,所以尝试采用计算机进行汉英双语语料库的自动建立,包括篇章对齐和单词级的对齐。  相似文献   

6.
基于短语统计翻译的汉维机器翻译系统   总被引:1,自引:0,他引:1  
杨攀  李淼  张建 《计算机应用》2009,29(7):2022-2025
描述了一种基于短语统计翻译的汉维机器翻译系统。首先使用汉维语料进行训练,得到语言模型和翻译模型;再利用训练好的模型对源语句进行解码,以得到最佳的翻译语句。解码的核心算法是柱搜索(beam search)算法。其中维文语料使用的是拉丁维文。实验结果表明,基于短语的统计机器翻译方法可以快速有效地构建一个汉维机器翻译平台。  相似文献   

7.
8.
龚龙超  郭军军  余正涛 《计算机应用》2022,42(11):3386-3394
当前性能最优的机器翻译模型之一Transformer基于标准的端到端结构,仅依赖于平行句对,默认模型能够自动学习语料中的知识;但这种建模方式缺乏显式的引导,不能有效挖掘深层语言知识,特别是在语料规模和质量受限的低资源环境下,句子解码缺乏先验约束,从而造成译文质量下降。为了缓解上述问题,提出了基于源语言句法增强解码的神经机器翻译(SSED)方法,显式地引入源语句句法信息指导解码。所提方法首先利用源语句句法信息构造句法感知的遮挡机制,引导编码自注意力生成一个额外的句法相关表征;然后将句法相关表征作为原句表征的补充,通过注意力机制融入解码,共同指导目标语言的生成,实现对模型的先验句法增强。在多个IWSLT及WMT标准机器翻译评测任务测试集上的实验结果显示,与Transformer基线模型相比,所提方法的BLEU值提高了0.84~3.41,达到了句法相关研究的最先进水平。句法信息与自注意力机制融合是有效的,利用源语言句法可指导神经机器翻译系统的解码过程,显著提高译文质量。  相似文献   

9.
Computer animation and visualization can facilitate communication between the hearing impaired and those with normal speaking capabilities. This paper presents a model of a system that is capable of translating text from a natural language into animated sign language. Techniques have been developed to analyse language and transform it into sign language in a systematic way. A hand motion coding method as applied to the hand motion representation, and control has also been investigated. Two translation examples are also given to demonstrate the practicality of the system.  相似文献   

10.
刘颖  姜巍 《计算机工程与应用》2012,48(32):98-101,146
对齐短语是决定统计机器翻译系统质量的核心模块。提出基于短语结构树的层次短语模型,这是利用串-树模型的思想对层次短语模型的扩展。基于短语结构树的层次短语模型是在双语对齐短语的基础之上结合英语短语结构树抽取翻译规则,并利用启发式策略获得翻译规则的扩展句法标记。采用翻译规则的统计机器翻译系统在不同数据集上具有稳定的翻译结果,在训练集和测试集的平均BlEU评分高于短语模型和层次短语模型的BLEU评分。  相似文献   

11.
通过汉语到英语的翻译实验以及对结果译文的分析,对基于词的模型、基于短语的模型和基于句法的模型的翻译性能进行了比较。结果表明基于短语的模型性能优于其他两个模型,但是使用的参数较多;基于句法的模型虽然翻译性能不理想,但可以用较少的参数表达更丰富的信息,值得深入研究。  相似文献   

12.
通过汉语到英语的翻译实验以及对结果译文的分析,对基于词的模型、基于短语的模型和基于句法的模型的翻译性能进行了比较.结果表明基于短语的模型性能优于其他两个模型,但是使用的参数较多;基于句法的模型虽然翻译性能不理想,但可以用较少的参数表达更丰富的信息,值得深入研究.  相似文献   

13.
一种面向汉英口语翻译的双语语块处理方法   总被引:3,自引:2,他引:3  
基于语块的处理方法是近年来自然语言处理领域兴起的一条新思路。但是,要将其应用于口语翻译当中,还需按照口语特点对涉及双语的语块概念做出合理界定。本文在已有单语语块定义的基础上,根据中、英文差异和口语翻译特性,从句法和语义两个层次提出了一种汉英双语语块概念,并对其特点进行了分析。同时,针对中、英文并行语料库,建立了一套计算机自动划分与人工校对相结合的双语语块加工方法。应用该方法,对汉英句子级对齐的口语语料进行双语语块划分和对整,并以此为基础进行了基于双语语块的口语统计机器翻译实验。结果表明,本文提出的双语语块定义符合口语翻译的实际需要,使用基于双语语块的语料处理方法,能有效地提高口语系统的翻译性能。  相似文献   

14.
SC文法功能体系   总被引:18,自引:0,他引:18  
陈肇雄 《计算机学报》1992,15(11):801-808
文法体系的研究一直是自然语言处理研究的核心问题之一.但是,由于自然语言本身所固有的复杂性和非规范性,多义问题始终未能得到圆满的解决.本文提出了SC文法(A SubCategory grammar for integrating Se-mantic and Case analysis),它是一种基于传统的上下文无关文法、语义文法,以及超前与反馈分析技术和格框架约束分析等技术的上下文相关处理文法.该文法不仅能继承传统的上下文无关文法的表示简洁、处理方便的特点,而且能实现语法和语义一体化分析和处理上下文相关以及复杂多义问题.  相似文献   

15.
为解决基于短语统计机器翻译存在的调序能力不足的问题,尝试利用句法分析器对基于短语统计机器翻译的输入汉语句子进行句法分析,然后利用转换器进行调序操作,并对部分类型短语进行预先翻译,然后再利用基于短语统计机器翻译的解码器进行翻译。重点测试了汉语中“的”字引导的复杂定语调序、介词短语、特定搭配短语、方位词短语的调序及预翻译产生的效果。实验结果表明,这些调序及预翻译操作可以显著地提高基于短语的统计机器翻译的英文译文结果的BLEU值。  相似文献   

16.
17.
为了解决在构建统计机器翻译系统过程中所面临的双语平行数据缺乏的问题,该文提出了一种新的基于中介语的翻译方法,称为Transfer-Triangulation方法。该方法可以在基于中介语的翻译过程中,结合传统的Transfer方法和Triangulation方法的优点,利用解码中介语短语的方法改进短语表。该文方法是在使用英语作为中介语的德-汉翻译任务中进行评价的。实验结果表明,相比于传统的基于中介语方法的基线系统,该方法显著提高了翻译性能。  相似文献   

18.
Word reordering is one of the challengeable problems of machine translation. It is an important factor of quality and efficiency of machine translation systems. In this paper, we introduce a novel reordering model based on an innovative structure, named, phrasal dependency tree. The phrasal dependency tree is a modern syntactic structure which is based on dependency relationships between contiguous non-syntactic phrases. The proposed model integrates syntactical and statistical information in the context of log-linear model aimed at dealing with the reordering problems. It benefits from phrase dependencies, translation directions (orientations) and translation discontinuity between translated phrases. In comparison with well-known and popular reordering models such as distortion, lexicalised and hierarchical models, the experimental study demonstrates the superiority of our model in terms of translation quality. Performance is evaluated for Persian → English and English → German translation tasks using Tehran parallel corpus and WMT07 benchmarks, respectively. The results report 1.54/1.7 and 1.98/3.01 point improvements over the baseline in terms of BLEU/TER metrics on Persian → English and German → English translation tasks, respectively. On average our model retrieved a significant impact on precision with comparable recall value with respect to the lexicalised and distortion models.  相似文献   

19.
This paper addresses the problem of transforming business specifications written in natural language into formal models suitable for use in information systems development. It proposes a method for transforming controlled natural language specifications based on the Semantics of Business Vocabulary and Business Rules standard. This approach is unique in combining techniques from Model-Driven Engineering (MDE), Cognitive Linguistics, and Knowledge-based Configuration, which allows the reliable semantic processing of specifications and integration with existing MDE tools to improve productivity, quality, and time-to-market in software development. The method first learns the vocabulary of the specification from glossary-like definitions then parses the rules of the specification and outputs the resulting formal SBVR model. Both aspects of the method are tested separately, with the system correctly learning 98% of the vocabulary and correctly interpreting 98% of the rules of an SBVR SE based example. Finally, the proposed method is compared to state-of-the-art approaches for creating formal models from natural language specifications, arguing that it meets the criteria necessary to fulfil the three goals of (1) shifting control of specification to non-technical business experts, (2) reducing the manual effort involved in formalising specifications, and (3) supporting business experts in creating well-formed sets of business vocabularies and rules.  相似文献   

20.
基于词串粒度及权值的汉语句子相似度衡量   总被引:5,自引:0,他引:5  
提出了一种改进的汉语句子相似度衡量方法,用于基于实例的汉英机器翻译。该方法同时考虑了相同词串的数目及长度和对应的权值信息,克服了传统方法的显著不足,在理论上更有合理性。在小数据集上的实验也表明该方法是可行的。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号