The development of a machine translation system is one of the most difficult computational tasks. Without deep semantic analysis of both the source and target languages, a machine translation system cannot produce good results. This paper describes a machine translation system based on a new approach, the Integral Method, in which semantic analysis using an active dictionary plays a central role.

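Unsupervised neural machine translation trains models on large amounts of monolingual data alone, without parallel data, but it struggles to connect two languages from distant language families. To address this problem, we propose a new neural machine translation training method that uses no parallel sentence pairs. A bilingual dictionary is used to substitute words in the monolingual data, which establishes links between the two languages. Two techniques, fused word-embedding initialization and dual-encoder fused training, then strengthen the alignment of the two languages in a shared semantic space and improve translation performance. Experiments show that the proposed method raises BLEU over the baseline unsupervised system by 2.39 on Chinese-English and 1.29 on English-Chinese, and it also improves translation quality significantly in English-Russian and English-Arabic experiments that use only monolingual data.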
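
The dictionary-substitution step can be illustrated with a minimal sketch, assuming simple word-level replacement of every token the dictionary covers. The function name `substitute_with_dictionary`, the `rate` parameter, and the toy dictionary are hypothetical. The abstract does not say how tokens are chosen, and the embedding-fusion and dual-encoder steps are not shown.

```python
import random
from typing import Dict, List, Optional


def substitute_with_dictionary(
    tokens: List[str],
    bilingual_dict: Dict[str, str],
    rate: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Replace tokens covered by a bilingual dictionary with their translations.

    Tokens missing from the dictionary are kept unchanged. `rate` (hypothetical)
    is the probability that a covered token is replaced; 1.0 replaces all.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    out = []
    for tok in tokens:
        if tok in bilingual_dict and rng.random() < rate:
            out.append(bilingual_dict[tok])
        else:
            out.append(tok)
    return out


# Toy Chinese->English dictionary: the output mixes both vocabularies,
# giving the model shared anchors between the two languages.
toy_dict = {"喜欢": "like", "猫": "cat"}
print(substitute_with_dictionary(["我", "喜欢", "猫"], toy_dict))  # ['我', 'like', 'cat']
```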

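Machine-readable dictionaries contain rich lexical-semantic knowledge. This knowledge can be extracted automatically and used effectively in many areas of natural language processing research. This study proposes a method that takes the English WordNet as its backbone and combines two techniques, genus-term matching and definition-content matching, to map WordNet synsets to entries in the Longman Dictionary of Contemporary English (English-Chinese bilingual edition). Through this mapping, the WordNet synsets are labeled with Chinese translations. In the experiments, words were divided by degree of ambiguity into monosemous and polysemous groups. For monosemous words, the method achieves 97.7% precision at 100% coverage. For polysemous words, it achieves 85.4% precision at 63.4% coverage.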
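
The two matching signals can be combined as in the minimal sketch below. It assumes each synset is reduced to its hypernym lemmas and gloss, and each dictionary sense to a genus word, a definition, and a Chinese translation. The data classes, the stopword list, and the `genus_weight` scoring are hypothetical stand-ins, not the study's actual scoring function.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Small illustrative stopword list (an assumption, not from the study).
STOPWORDS = {"a", "an", "the", "of", "or", "and", "to", "in", "that", "which", "for", "with"}


def content_words(text: str) -> Set[str]:
    """Lower-case, strip punctuation, and drop stopwords."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


@dataclass
class Synset:
    hypernyms: List[str]  # lemmas of the synset's hypernyms (genus candidates)
    gloss: str


@dataclass
class DictSense:
    genus: str       # head word of the dictionary definition
    definition: str
    chinese: str     # Chinese translation attached to this sense


def map_synset(synset: Synset, senses: List[DictSense],
               genus_weight: float = 2.0) -> Optional[DictSense]:
    """Return the dictionary sense that best matches the synset, or None."""
    best, best_score = None, 0.0
    gloss_words = content_words(synset.gloss)
    for sense in senses:
        # Genus matching: does the definition's head word name a hypernym?
        score = genus_weight if sense.genus in synset.hypernyms else 0.0
        # Definition matching: shared content words between gloss and definition.
        score += len(gloss_words & content_words(sense.definition))
        if score > best_score:
            best, best_score = sense, score
    return best


bank = Synset(["institution", "financial_institution"], "a financial institution that accepts deposits")
senses = [
    DictSense("institution", "an institution where people keep money", "银行"),
    DictSense("land", "land along the side of a river", "河岸"),
]
print(map_synset(bank, senses).chinese)  # 银行
```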

Practical natural language processing (NLP) systems such as database front-ends, deductive databases, and object-oriented databases are at the forefront of research into next-generation intelligent database systems. The research described in this paper aims to integrate front-end paradigms and rule-based deduction into a single powerful framework for database systems in Arabic. The lexicon stores only verb roots and relies on a program capable of handling all derived forms automatically. This is significant because these forms alone account for 70% of the total dictionary. As part of the discussion of this system, its utility in NLP applications such as parsing and machine translation is examined.
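
Storing only roots works because Arabic derives many words by slotting a root's consonants into fixed patterns. The sketch below shows that root-and-pattern idea, assuming Latin transliteration, triliteral roots, and a small hand-picked pattern table in which the digits 1 to 3 mark root-consonant slots. The table and function names are hypothetical, and weak roots and other complications are ignored.

```python
from typing import Dict

# Illustrative pattern table (an assumption): digits mark root-consonant slots.
PATTERNS: Dict[str, str] = {
    "perfect_verb": "1a2a3a",         # e.g. kataba "he wrote"
    "active_participle": "1aa2i3",    # e.g. kaatib "writer"
    "passive_participle": "ma12uu3",  # e.g. maktuub "written"
    "noun_of_place": "ma12a3",        # e.g. maktab "office"
}


def apply_pattern(root: str, pattern: str) -> str:
    """Fill the numbered slots of `pattern` with the consonants of `root`."""
    if len(root) != 3:
        raise ValueError("this sketch handles triliteral roots only")
    return "".join(root[int(ch) - 1] if ch in "123" else ch for ch in pattern)


def derive_all(root: str) -> Dict[str, str]:
    """Generate every form in the pattern table from a single stored root."""
    return {name: apply_pattern(root, pat) for name, pat in PATTERNS.items()}


print(derive_all("ktb"))
# {'perfect_verb': 'kataba', 'active_participle': 'kaatib',
#  'passive_participle': 'maktuub', 'noun_of_place': 'maktab'}
```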

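This paper studies data generalization and phrase generation methods for neural machine translation. Subword-based methods already mitigate the out-of-vocabulary (OOV) and rare-word problems. Building on them, the paper proposes data generalization to improve the translation of OOV and rare words further and to reduce the mistranslations that subword methods introduce. It compares subword-based and data-generalization-based methods in detailed experiments and discusses the strengths and weaknesses of each. For data generalization, it proposes a consistency-checking method and a decoding optimization method. Because the standard neural machine translation model operates on individual words, the paper also proposes a phrase generation method whose scale can be controlled, and the source-language phrases it generates improve translation performance further. Overall, translation performance improves over the baseline system by 1.3 BLEU on Chinese-English and 1.2 BLEU on English-Chinese.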
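
Data generalization usually means replacing hard-to-translate items with class placeholders before translation and putting the originals back afterwards. The minimal sketch below does this for numbers only, assuming a hypothetical `$num_<i>` placeholder format. Its `is_consistent` check is one plausible form of consistency checking: it confirms that the output carries exactly the source's placeholders. The paper's own generalization classes and checks may differ.

```python
import re
from collections import Counter
from typing import List, Tuple

NUM_RE = re.compile(r"\d+(?:\.\d+)?")        # integers and decimals
PLACEHOLDER_RE = re.compile(r"\$num_(\d+)")  # hypothetical placeholder format


def generalize(sentence: str) -> Tuple[str, List[str]]:
    """Replace each number with an indexed placeholder; return the originals."""
    originals: List[str] = []

    def repl(match) -> str:
        originals.append(match.group(0))
        return f"$num_{len(originals) - 1}"

    return NUM_RE.sub(repl, sentence), originals


def is_consistent(generalized_source: str, translation: str) -> bool:
    """Check that the translation contains exactly the source's placeholders."""
    return Counter(PLACEHOLDER_RE.findall(generalized_source)) == Counter(
        PLACEHOLDER_RE.findall(translation)
    )


def restore(translation: str, originals: List[str]) -> str:
    """Substitute the original values back into the translated sentence."""
    return PLACEHOLDER_RE.sub(lambda m: originals[int(m.group(1))], translation)


src, values = generalize("他买了3本书,花了25.5元")
print(src, values)  # 他买了$num_0本书,花了$num_1元 ['3', '25.5']
# Pretend the NMT model produced this generalized English output:
hyp = "He bought $num_0 books for $num_1 yuan"
if is_consistent(src, hyp):
    print(restore(hyp, values))  # He bought 3 books for 25.5 yuan
```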

Information on subcategorization and selectional restrictions in a valency dictionary is important for natural language processing tasks such as monolingual parsing, accurate rule-based machine translation and automatic summarization. In this paper we present an efficient method of assigning valency information and selectional restrictions to entries in a bilingual dictionary, based on information in an existing valency dictionary. The method is based on two assumptions: words with similar meaning have similar subcategorization frames and selectional restrictions; and words with the same translations have similar meanings. Based on these assumptions, new valency entries are constructed for words in a plain bilingual dictionary, using entries with similar source-language meaning and the same target-language translations. We evaluate the effects of various measures of semantic similarity.
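
The two assumptions translate into a short candidate-and-rank procedure. Existing entries that share the new word's translation become candidates, and the most semantically similar candidate donates its frame. The sketch below assumes a toy valency dictionary keyed by (source word, translation), a hypothetical frame notation, and Jaccard overlap of thesaurus categories as a stand-in similarity measure. The paper compares several measures, and this is not necessarily one of them.

```python
from typing import Callable, Dict, Optional, Set, Tuple

# Existing valency dictionary: (source word, target translation) -> frame.
ValencyDict = Dict[Tuple[str, str], str]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two category sets (a stand-in similarity measure)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def assign_frame(word: str, translation: str, valency_dict: ValencyDict,
                 similarity: Callable[[str, str], float]) -> Optional[str]:
    """Copy the frame of the most similar existing word with the same translation."""
    # Assumption 2: words with the same translation are candidates.
    candidates = [(src, frame) for (src, tgt), frame in valency_dict.items()
                  if tgt == translation and src != word]
    if not candidates:
        return None
    # Assumption 1: similar meaning implies a similar frame.
    _, frame = max(candidates, key=lambda c: similarity(word, c[0]))
    return frame


# Toy data (romanized Japanese source words, hypothetical frames and categories).
valency = {
    ("toru", "take"): "N1(human)-ga N2(object)-wo",
    ("noru", "take"): "N1(human)-ga N2(vehicle)-ni",
}
categories = {"toru": {"acquire", "hold"}, "noru": {"ride", "move"},
              "tsukamu": {"acquire", "hold", "grip"}}
sim = lambda a, b: jaccard(categories.get(a, set()), categories.get(b, set()))
print(assign_frame("tsukamu", "take", valency, sim))  # N1(human)-ga N2(object)-wo
```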

This paper presents the design and implementation techniques used in a Japanese-to-Sinhalese machine translation system. The main result of this work is the successful use of Bunsetsu to generate meaningful translations for a flexible-grammar language. The system was developed by exploiting the similarities between Japanese Bunsetsu and Sinhalese units. This work focuses on determining the minimum grammatical knowledge reasonably required for machine translation. The principal characteristics of the system, the translation process, problems encountered during development, its present status, and future plans are discussed.

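A Polysemous Word Disambiguation Method Based on Bayesian Classification and a Machine-Readable Dictionary   Total citations: 3, self-citations: 0, citations by others: 3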
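Polysemy is pervasive in natural language, and the accuracy of word sense disambiguation is an important indicator of the performance of natural language processing software for machine translation, information retrieval, text classification, and similar tasks. This paper proposes a polysemous-word disambiguation method based on Bayesian classification and a machine-readable dictionary. It resolves ambiguity by training on a small corpus and using the sense definitions that the machine-readable dictionary gives for each ambiguous word. Experiments show that the algorithm achieves high disambiguation accuracy even when the annotated corpus is small.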
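
One way to combine a Bayesian classifier with dictionary definitions is sketched below. Each sense's gloss is treated as extra pseudo-context, so the dictionary supplies evidence that the small tagged corpus lacks. The class name, the `def_weight` parameter, add-one smoothing, and the toy glosses are assumptions for illustration and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


class NaiveBayesWSD:
    """Naive Bayes sense classifier whose counts are seeded with MRD glosses."""

    def __init__(self, definitions: Dict[str, str], def_weight: int = 1) -> None:
        self.senses: List[str] = list(definitions)
        self.sense_counts: Counter = Counter()
        self.word_counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab = set()
        # Treat each dictionary gloss as a pseudo-context for its sense.
        for sense, gloss in definitions.items():
            for w in gloss.lower().split():
                self.word_counts[sense][w] += def_weight
                self.vocab.add(w)

    def train(self, examples: List[Tuple[List[str], str]]) -> None:
        """Add counts from a small sense-tagged corpus."""
        for context, sense in examples:
            if sense not in self.senses:
                self.senses.append(sense)
            self.sense_counts[sense] += 1
            for w in context:
                self.word_counts[sense][w.lower()] += 1
                self.vocab.add(w.lower())

    def predict(self, context: List[str]) -> str:
        """argmax over senses of log P(s) + sum log P(w | s), add-one smoothed."""
        total = sum(self.sense_counts.values())
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen words
        best, best_lp = self.senses[0], -math.inf
        for sense in self.senses:
            lp = math.log((self.sense_counts[sense] + 1) / (total + len(self.senses)))
            n = sum(self.word_counts[sense].values())
            for w in context:
                lp += math.log((self.word_counts[sense][w.lower()] + 1) / (n + v))
            if lp > best_lp:
                best, best_lp = sense, lp
        return best


wsd = NaiveBayesWSD({
    "bank/finance": "institution that keeps money and lends loans",
    "bank/river": "sloping land beside a river or lake",
})
wsd.train([(["deposit", "money", "account"], "bank/finance"),
           (["fishing", "river", "water"], "bank/river")])
print(wsd.predict(["loans", "money"]))  # bank/finance
print(wsd.predict(["lake", "water"]))   # bank/river
```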

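Semantics is currently a major research focus in artificial intelligence, the Semantic Web, semantic dictionaries, and related fields, and it can effectively support technologies such as machine translation and natural language processing. Drawing on the distinctive grammatical characteristics of Tibetan, Tibetan logical case, and knowledge from computational linguistics, this paper builds a fairly complete semantic field for Tibetan semantic relation extraction while preserving the original characteristics of Tibetan. This provides a foundational method for building a Tibetan semantic dictionary.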

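This paper proposes using object-oriented theory to build a base class for machine translation dictionaries, so that the electronic dictionaries for the various subject domains in a machine translation system are all managed through one general pattern. The new method considerably improves the reliability, maintainability, and reusability of the machine translation system and has been applied successfully in the NHWIN Chinese-Japanese/Japanese-Chinese machine translation system.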
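
The base-class idea can be sketched as an abstract interface that every dictionary implements, plus a manager that uses only that interface. The class names, the in-memory backend, and the domain-first lookup order are hypothetical illustrations. The paper does not describe the NHWIN class hierarchy at this level of detail.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class MTDictionary(ABC):
    """Hypothetical common base class shared by every dictionary in the system."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def lookup(self, word: str) -> List[str]:
        """Return the translations of `word` (empty list if unknown)."""

    @abstractmethod
    def add_entry(self, word: str, translation: str) -> None:
        """Add one translation for `word`."""


class InMemoryDictionary(MTDictionary):
    """One concrete storage backend; other backends would subclass MTDictionary too."""

    def __init__(self, name: str) -> None:
        super().__init__(name)
        self._entries: Dict[str, List[str]] = {}

    def lookup(self, word: str) -> List[str]:
        return list(self._entries.get(word, []))

    def add_entry(self, word: str, translation: str) -> None:
        self._entries.setdefault(word, []).append(translation)


class DictionaryManager:
    """Consults dictionaries in priority order through the base-class interface only."""

    def __init__(self, dictionaries: List[MTDictionary]) -> None:
        self.dictionaries = dictionaries

    def lookup(self, word: str) -> Optional[List[str]]:
        for d in self.dictionaries:
            found = d.lookup(word)
            if found:
                return found
        return None


computing = InMemoryDictionary("computing")
computing.add_entry("计算机", "コンピュータ")
general = InMemoryDictionary("general")
general.add_entry("计算机", "計算機")
general.add_entry("你好", "こんにちは")
manager = DictionaryManager([computing, general])  # domain dictionary first
print(manager.lookup("计算机"), manager.lookup("你好"))  # ['コンピュータ'] ['こんにちは']
```

Because the manager depends only on `MTDictionary`, adding a new domain dictionary or storage format means writing one subclass and leaves the lookup logic unchanged.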

The paper addresses the problem of automatic dictionary translation. The proposed method translates a dictionary by mining repositories in the source and target languages, without any directly given relationships connecting the two languages. It consists of two stages: (1) translation by lexical similarity, where words are compared graphically, and (2) translation by semantic similarity, where contexts are compared. In the experiments, the Polish and English versions of Wikipedia were used as text corpora. The method and its phases are thoroughly analyzed. The results allow this method to be implemented in human-in-the-middle systems.
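
The two stages can be sketched as follows, under stated assumptions. Stage 1 compares spellings with Python's `difflib.SequenceMatcher.ratio()`, which stands in for whatever graphical comparison the paper uses. Stage 2 builds bag-of-words context vectors, maps the source vector into the target language through the stage-1 pairs, and picks the target candidate with the highest cosine similarity. The threshold, window size, function names, and toy corpora are all hypothetical.

```python
import math
from collections import Counter
from difflib import SequenceMatcher
from typing import Dict, List


def lexical_matches(src_words: List[str], tgt_words: List[str],
                    threshold: float = 0.8) -> Dict[str, str]:
    """Stage 1: pair words whose spellings are similar (e.g. cognates)."""
    pairs = {}
    for s in src_words:
        best, best_ratio = None, threshold
        for t in tgt_words:
            ratio = SequenceMatcher(None, s, t).ratio()
            if ratio >= best_ratio:
                best, best_ratio = t, ratio
        if best is not None:
            pairs[s] = best
    return pairs


def context_vector(word: str, corpus: List[List[str]], window: int = 2) -> Counter:
    """Count the words appearing within `window` positions of `word`."""
    vec = Counter()
    for sent in corpus:
        for i, w in enumerate(sent):
            if w == word:
                lo, hi = max(0, i - window), min(len(sent), i + window + 1)
                vec.update(sent[j] for j in range(lo, hi) if j != i)
    return vec


def cosine_counts(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def semantic_match(src_word: str, src_corpus: List[List[str]], tgt_candidates: List[str],
                   tgt_corpus: List[List[str]], seed: Dict[str, str]) -> str:
    """Stage 2: translate the source context through `seed`, then compare contexts."""
    src_vec = Counter()
    for w, c in context_vector(src_word, src_corpus).items():
        if w in seed:
            src_vec[seed[w]] += c
    scores = {t: cosine_counts(src_vec, context_vector(t, tgt_corpus)) for t in tgt_candidates}
    return max(scores, key=scores.get)


seed = lexical_matches(["komputer", "internet", "pies"], ["computer", "internet", "dog"])
print(seed)  # {'komputer': 'computer', 'internet': 'internet'}
src_corpus = [["pies", "szczeka"], ["myszka", "komputer", "internet"]]
tgt_corpus = [["dog", "barks"], ["mouse", "computer", "internet"]]
print(semantic_match("myszka", src_corpus, ["dog", "mouse"], tgt_corpus, seed))  # mouse
```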

Capturing the underlying semantic relationships of sentences is helpful for machine translation. Variational neural machine translation approaches provide an effective way to model the uncertain underlying semantics in languages by introducing latent variables. Multitask learning is applied in multimodal machine translation to integrate multimodal data. However, these approaches usually lack a strong interpretation in utilizing out-of-text information in machine translation tasks. In this paper, we propose a novel architecture-free multimodal translation model, called variational multimodal machine translation (VMMT), under the variational framework which can model the uncertainty in languages caused by ambiguity through utilizing visual and textual information. In addition, the proposed model can eliminate the discrepancy between training and prediction in the existing variational translation models by constructing encoders only relying on source data. More importantly, the proposed multimodal translation model is designed as multitask learning in which the shared semantic representation for different modes is learned and the gap among semantic representation from various modes is reduced by incorporating additional constraints. Moreover, the information bottleneck theory is adopted in our variational encoder–decoder model, which helps the encoder to filter redundancy and the decoder to concentrate on useful information. Experiments on multimodal machine translation demonstrate that the proposed model is competitive.
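
The abstract does not give VMMT's training objective. The sketch below shows only the standard ingredients such variational encoder-decoder models build on: reparameterized sampling of a diagonal-Gaussian latent, the closed-form KL divergence between two diagonal Gaussians, and a β-weighted KL term in the style of the variational information bottleneck. The function names and the β value are assumptions, and plain Python replaces an autodiff framework.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mu: Sequence[float], log_var: Sequence[float],
                   rng: random.Random) -> List[float]:
    """Sample z = mu + sigma * eps with eps ~ N(0, 1).

    In an autodiff framework this form keeps z differentiable in mu and log_var.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_diag_gaussians(mu_q, log_var_q, mu_p, log_var_p) -> float:
    """KL(q || p) for diagonal Gaussians, summed over dimensions."""
    return 0.5 * sum(
        lv_p - lv_q + (math.exp(lv_q) + (m_q - m_p) ** 2) / math.exp(lv_p) - 1.0
        for m_q, lv_q, m_p, lv_p in zip(mu_q, log_var_q, mu_p, log_var_p)
    )


def ib_loss(nll: float, kl: float, beta: float) -> float:
    """Information-bottleneck style objective: fit the output, compress the latent."""
    return nll + beta * kl


rng = random.Random(0)
z = reparameterize([0.0, 1.0], [0.0, 0.0], rng)
kl = kl_diag_gaussians([1.0], [0.0], [0.0], [0.0])  # q = N(1, 1), p = N(0, 1)
print(kl, ib_loss(2.0, kl, beta=0.1))               # 0.5 2.05
```

A larger β pushes the encoder to discard more information about the input, which is the "filter redundancy" effect the abstract attributes to the bottleneck.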

This article describes the IXM2 associative processor and its main application, speech-to-speech translation. The IXM2 is a semantic memory machine that began as a faithful implementation of the NETL semantic network machine and grew into a massively parallel SIMD machine that has demonstrated the power of large associative memories. Such processors can support robust performance in speech applications; in fact, the IXM2 with 73 transputers has outperformed a Cray in some language-translation tasks. We selected speech-to-speech translation as our main application because it is one of the grand challenges of massively parallel artificial intelligence. The social implications of successful automatic translation are enormous; for example, people who speak different languages could communicate in real time by using interpreting telephony.

In this paper, we address the demanding task of developing intelligent systems with machine creativity that can perform design tasks automatically. The main challenge is how to model human creativity mathematically and mimic it computationally. We propose a "synthesis reasoning model" as the underlying mechanism for simulating human creative thinking in design tasks. We present the theory of the synthesis reasoning model and the detailed procedure for designing an intelligent system based on it. We offer a case study of an intelligent Chinese calligraphy generation system that we have developed. Based on our experience implementing the calligraphy generation system and a few other systems for real-world problems, we suggest a generic methodology for constructing intelligent systems using the synthesis reasoning model.

A sememe is defined in linguistics as the minimum semantic unit of language. Sememe knowledge bases are built by manually annotating words and phrases with sememes. HowNet is the best-known sememe knowledge base. It was used extensively in many natural language processing tasks in the era of statistical NLP and proved effective for understanding and using language. In the deep learning era, data are considered vitally important, yet some studies have incorporated sememe knowledge bases such as HowNet into neural network models to improve system performance, with successful results in tasks including word representation learning, language modeling, and semantic composition. In addition, because manually annotating and updating sememe knowledge bases is costly, some work has used machine learning to predict sememes for words and phrases automatically and thereby expand sememe knowledge bases. Other studies have tried to extend HowNet to other languages by automatically predicting sememes for words and phrases in a new language. In this paper, we summarize recent studies on the application and expansion of sememe knowledge bases and point out future research directions for sememes.
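
Automatic sememe prediction can be illustrated with a simple nearest-neighbor scheme. It scores each sememe by summing the similarity between the new word and the k most similar annotated words that carry that sememe. This is one common idea in the line of work the survey covers, not a specific surveyed model. The embeddings, sememe labels, and `k` below are toy assumptions, not real HowNet data.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def predict_sememes(target_vec: Sequence[float],
                    annotated: Dict[str, Tuple[List[float], Set[str]]],
                    k: int = 2) -> List[Tuple[str, float]]:
    """Rank sememes by the summed similarity of the k nearest annotated words."""
    neighbors = sorted(
        ((cosine(target_vec, vec), sememes) for vec, sememes in annotated.values()),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    scores: Dict[str, float] = defaultdict(float)
    for sim, sememes in neighbors:
        for s in sememes:
            scores[s] += sim
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy 2-d "embeddings" and illustrative sememe labels.
annotated = {
    "苹果": ([1.0, 0.0], {"fruit", "food"}),
    "香蕉": ([0.9, 0.1], {"fruit"}),
    "汽车": ([0.0, 1.0], {"vehicle"}),
}
ranked = predict_sememes([1.0, 0.05], annotated)
print([s for s, _ in ranked])  # ['fruit', 'food']
```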

A survey of current machine translation systems is given, covering not only activities in Japan but also those abroad, especially in Europe, the US, and Canada. The components of a machine translation system are then explained from the standpoints of software, linguistic components, and user demands. The importance of pre-editing and post-editing is stressed. Semantic and contextual processing are essential for better translation quality and remain problems for future work. Attention is drawn to the difficulty of adopting a pivot method instead of transfer methods, because projecting a word or phrase onto a concept is very difficult when a very exact concept representation and translation are required. A new transfer method that incorporates pre-transfer and post-transfer structural adjustment is explained. This method was adopted by the Japanese governmental machine translation project directed by the author. The various structural transformations in the transfer and generation processes are explained; they are necessitated by translation between languages from different families, such as Japanese and English. Finally, some comments are given from the standpoint of users of machine translation systems: systems are always imperfect, and users must use them with an understanding of their possibilities and limitations.

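As basic linguistic databases and knowledge bases, corpora are the foundation on which natural language processing methods are built. With the widespread use of statistical methods in natural language processing, corpus construction has become an important research topic. Automatic word segmentation is an indispensable foundation for syntactic analysis, and its performance directly affects parsing. Based on a statistical analysis of 850,000 bytes of Tibetan text and a study of the distribution and grammatical functions of Tibetan words, this paper presents a model for a dictionary-based automatic Tibetan word segmentation system. It gives the structure of the segmentation dictionary, a case-based chunking algorithm, and a restoration algorithm. The system lays a foundation for research on Tibetan input methods, construction of Tibetan electronic dictionaries, Tibetan character and word frequency statistics, search engine design and implementation, machine translation system development, network information security, Tibetan corpus construction, and Tibetan semantic analysis.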
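
A dictionary-based segmenter along these lines might look like the sketch below. Text is split into syllables at the tsheg (U+0F0B), cut into chunks at standalone case particles, and each chunk is segmented by forward maximum matching against the dictionary. The small particle set, the `max_len` limit, and the choice of forward maximum matching are assumptions. The paper's restoration algorithm, which presumably handles particles fused onto the preceding syllable, is not reproduced.

```python
from typing import List, Set

TSHEG = "\u0f0b"  # intersyllabic delimiter
SHAD = "\u0f0d"   # phrase delimiter
# Small illustrative set of standalone case particles (genitive, agentive, la-don, ablative).
CASE_PARTICLES = {"གི", "ཀྱི", "གྱི", "གིས", "ཀྱིས", "གྱིས", "ལ", "ནས", "ལས"}


def syllables(text: str) -> List[str]:
    """Split text into syllables, treating shad like a syllable boundary."""
    text = text.replace(SHAD, TSHEG)
    return [s.strip() for s in text.split(TSHEG) if s.strip()]


def max_match(sylls: List[str], lexicon: Set[str], max_len: int) -> List[str]:
    """Forward maximum matching; unknown syllables become one-syllable words."""
    words, i = [], 0
    while i < len(sylls):
        for n in range(min(max_len, len(sylls) - i), 0, -1):
            cand = TSHEG.join(sylls[i:i + n])
            if n == 1 or cand in lexicon:
                words.append(cand)
                i += n
                break
    return words


def segment(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Chunk at case particles, then segment each chunk against the lexicon."""
    words: List[str] = []
    chunk: List[str] = []
    for s in syllables(text):
        if s in CASE_PARTICLES:
            words.extend(max_match(chunk, lexicon, max_len))
            words.append(s)
            chunk = []
        else:
            chunk.append(s)
    words.extend(max_match(chunk, lexicon, max_len))
    return words


lexicon = {"བོད", TSHEG.join(["སྐད", "ཡིག"])}
text = TSHEG.join(["བོད", "ཀྱི", "སྐད", "ཡིག"])  # "Tibetan language"
print(segment(text, lexicon))  # three words: བོད | ཀྱི | སྐད་ཡིག
```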

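Chinese and Uyghur differ greatly in syntactic structure and word order. A complete Chinese-Uyghur machine translation system must analyze the source language and express target-language tense and voice accurately. Statistical machine translation models capture little syntactic and semantic information, which leads to accuracy and word-order problems. To address this, the paper establishes conversion and matching rules for use in a hybrid machine translation approach, with the aim of improving the performance of the translation system.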
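
A conversion rule of this kind can be as simple as a tag pattern plus a new order for the matched chunks. The sketch below assumes the input is already chunked and tagged. It applies two hypothetical rules: verb-object becomes object-verb, and preposition-NP becomes NP-postposition. Together they move Chinese SVO order toward Uyghur SOV order. The rule table and tag names are illustrative only. Tense and voice generation are not covered, and the tokens are left untranslated.

```python
from typing import List, Sequence, Tuple

Chunk = Tuple[str, str]  # (surface text, chunk tag)

# Hypothetical rules: a tag pattern and the new order of the matched chunks.
REORDER_RULES: List[Tuple[Tuple[str, ...], Tuple[int, ...]]] = [
    (("V", "NP"), (1, 0)),  # verb + object -> object + verb (SVO -> SOV)
    (("P", "NP"), (1, 0)),  # preposition + NP -> NP + postposition/case marker
]


def reorder(chunks: Sequence[Chunk], rules=REORDER_RULES) -> List[Chunk]:
    """Apply the first matching rule at each position, scanning left to right."""
    out: List[Chunk] = []
    i = 0
    while i < len(chunks):
        for pattern, order in rules:
            window = chunks[i:i + len(pattern)]
            if tuple(tag for _, tag in window) == pattern:
                out.extend(window[j] for j in order)
                i += len(pattern)
                break
        else:  # no rule matched: copy the chunk unchanged
            out.append(chunks[i])
            i += 1
    return out


# "我 在 北京 学习 汉语" (I study Chinese in Beijing), chunked and tagged by hand.
sentence = [("我", "NP"), ("在", "P"), ("北京", "NP"), ("学习", "V"), ("汉语", "NP")]
print(" ".join(text for text, _ in reorder(sentence)))  # 我 北京 在 汉语 学习
```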
