Similar Documents
20 similar documents found.
1.
Machine Translation (MT) is the focus of extensive scientific investigation driven by regular evaluation campaigns, but these are mostly oriented towards one rather particular task: translating news articles into English. In this paper, we investigate how well current MT approaches deal with a real-world task. We have rationally reconstructed one of the few MT systems in daily use that produces high-quality translations: the Météo system. We show how a combination of a sentence-based memory approach, a phrase-based statistical engine and a neural-network rescorer can give results comparable to those of the current system. We also explore another possible prospect for MT technology: the translation of weather alerts, which are currently translated manually by translators at the Canadian Translation Bureau.
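A minimal sketch of the tiered combination the abstract describes, under assumed interfaces: `memory` is a translation-memory dictionary, `smt_translate` stands in for the phrase-based engine's n-best output, and `rescore` for the neural rescorer. None of these names come from the Météo reconstruction itself.

```python
def translate(source, memory, smt_translate, rescore):
    """Tiered pipeline: exact translation-memory hit, else SMT n-best + rescoring.

    memory        -- dict mapping source sentences to known translations
    smt_translate -- callable returning a list of candidate target sentences
    rescore       -- callable assigning a score to (source, candidate) pairs
    """
    # 1. Sentence-based memory: reuse a stored translation verbatim if available.
    if source in memory:
        return memory[source]
    # 2. Phrase-based statistical engine: generate n-best candidates.
    candidates = smt_translate(source)
    # 3. Neural-network rescorer: pick the highest-scoring candidate.
    return max(candidates, key=lambda cand: rescore(source, cand))
```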

2.
We propose an alternative method of machine-aided translation: Structure-Based Machine Translation (SBMT). SBMT uses language-structure matching techniques to reduce complicated grammar rules and provide efficient and feasible translation results. SBMT comprises the following four features: (1) source-language input sentence analysis; (2) source-language sentence transformation into the target-language structure; (3) dictionary lookup; and (4) semantic disambiguation or word sense disambiguation (WSD) for correct output selection. SBMT has been designed and a prototype system has been implemented that generates satisfactory translations.

3.
Data dependence testing is the basic step in detecting loop-level parallelism in numerical programs. The problem is undecidable in the general case, so work has concentrated on a simplified problem, affine memory disambiguation. In this simpler domain, array references and loop bounds are assumed to be linear integer functions of loop variables, and dataflow information is ignored. For this domain, we have shown that in practice the problem can be solved accurately and efficiently.(1) This paper studies empirically the effectiveness of this domain restriction: how many real references are affine and flow-insensitive. We use Larus's llpp system(2) to find all the data dependences dynamically and compare them to the results given by our affine memory disambiguation system, which is exact for all the cases we see in practice. We show that while the affine approximation is reasonable, memory disambiguation is not a sufficient approximation for data dependence analysis, and we propose extensions to improve the analysis. This research was supported in part by a fellowship from AT&T Bell Laboratories and by DARPA contract N00014-87-K-0828.
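The affine restriction is what makes exact, efficient tests possible. As a standalone illustration (not the system described in the abstract), here is the classic GCD dependence test for two affine array subscripts:

```python
from math import gcd
from functools import reduce

def gcd_test(coeffs_a, const_a, coeffs_b, const_b):
    """Classic GCD dependence test for two affine array subscripts.

    A reference a0 + a1*i1 + ... and b0 + b1*j1 + ... may touch the same
    element only if the gcd of all loop-variable coefficients divides the
    difference of the constant terms.  Returns False when a dependence is
    impossible; True means "maybe dependent".
    """
    coeffs = [c for c in list(coeffs_a) + list(coeffs_b) if c != 0]
    if not coeffs:                      # both subscripts are constants
        return const_a == const_b
    g = reduce(gcd, (abs(c) for c in coeffs))
    return (const_b - const_a) % g == 0

# Example: A[2*i] vs A[2*j + 1] can never alias (gcd 2 does not divide 1).
assert gcd_test([2], 0, [2], 1) is False
```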

4.
The electronic dictionary is the component of a machine translation system that contains the largest amount of information; the quality and coverage of the dictionary directly limit the quality and application range of the machine translation system. Unlike an ordinary electronic dictionary, every entry in an MT dictionary must additionally carry part-of-speech information, semantic-category information, idioms and so on. This paper takes frequency statistics and frequency-distribution statistics as the entry-selection principles for a Uyghur-Chinese MT dictionary, counts the number of commonly used Uyghur words, and discusses the design of the dictionary. The entry information the dictionary should contain is described with BNF notation and Jackson diagrams. Finally, the paper presents the concrete construction method of the dictionary, the entry-ordering principle, the data structures of the index table and attribute base, and the lookup method for dictionary information. Experiments show that the dictionary is effective in resolving Uyghur lexical and structural ambiguity and in improving the accuracy of the Chinese translations.

5.
DPLL-based SAT solvers progress by implicitly applying binary resolution. The resolution proofs that they generate are used, after the SAT solver's run has terminated, for various purposes. The most notable uses in formal verification are extracting an unsatisfiable core, extracting an interpolant, and detecting clauses that can be reused in an incremental satisfiability setting (the latter uses the proof only implicitly, during the run of the SAT solver). Making the resolution proof smaller can benefit all of these goals: it can lead to smaller cores, smaller interpolants, and smaller clauses that are propagated to the next SAT instance in an incremental setting. We suggest two methods for doing so that are linear in the size of the proof. Our first technique, called Recycle-Units, uses each learned constant (unit clause) (x) to simplify resolution steps in which x was the pivot, prior to when it was learned. Our second technique, called Recycle-Pivots, simplifies proofs in which several nodes in the resolution graph, one of which dominates the others, correspond to the same pivot. Our experiments with industrial instances show that these simplifications reduce the core by ≈5% and the proof by ≈13%. This reduces the core less than competing methods such as run-till-fix, but whereas our algorithms are linear in the size of the proof, the latter and other competing techniques are all exponential, as they are based on SAT runs. If we consider the size of the proof (the resolution graph) as polynomial in the number of variables (which is not necessarily the case in general), this gives our method an exponential time reduction compared to existing tools for small-core extraction. Our experiments show that this advantage is evident in practice, more so for the second method: it rarely takes more than a few seconds, even when competing tools time out, and hence it can be used as a cheap proof post-processing procedure.

6.
Providing machine tractable dictionary tools
Machine-readable dictionaries (MRDs) contain knowledge about language and the world essential for tasks in natural language processing (NLP). However, this knowledge, collected and recorded by lexicographers for human readers, is not presented in a manner that lets MRDs be used directly for NLP tasks. What are badly needed are machine-tractable dictionaries (MTDs): MRDs transformed into a format usable for NLP. This paper discusses three different but related large-scale computational methods to transform MRDs into MTDs. The MRD used is the Longman Dictionary of Contemporary English (LDOCE). The three methods differ in the amount of knowledge they start with and the kinds of knowledge they provide. All require some hand-coding of initial information but are largely automatic. Method I, a statistical approach, uses the least hand-coding. It generates relatedness networks for words in LDOCE and presents a method for doing partial word sense disambiguation. Method II employs the most hand-coding because it develops and builds lexical entries for a very carefully controlled defining vocabulary of 2,000 word senses (1,000 words). The payoff is that the method will provide an MTD containing highly structured semantic information. Method III requires the hand-coding of a grammar and of the semantic patterns used by its parser, but not of any lexical material, because it builds up lexical material from sources wholly within LDOCE. The information extracted is a set of sources of information, individually weak, but which can be combined to give a strong and determinate linguistic database.
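As a rough illustration of the statistical flavour of Method I, not the paper's actual procedure: relate senses through the overlap of their definition vocabularies, and prefer the sense whose definition shares the most vocabulary with the context, leaving the word undisambiguated when nothing overlaps.

```python
from collections import Counter

def relatedness(def_a, def_b):
    """Relatedness of two senses as the overlap of their definition word bags."""
    bag_a, bag_b = Counter(def_a.lower().split()), Counter(def_b.lower().split())
    return sum((bag_a & bag_b).values())

def pick_sense(senses, context_words):
    """Partial WSD: choose the sense whose definition overlaps the context most.

    senses        -- dict mapping a sense id to its dictionary definition text
    context_words -- iterable of words surrounding the ambiguous word
    Returns None when no sense overlaps at all (the word stays undisambiguated).
    """
    context = " ".join(context_words)
    scored = {sid: relatedness(d, context) for sid, d in senses.items()}
    best = max(scored, key=scored.get)
    return best if scored[best] > 0 else None
```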

7.
Machine translation units extracted from a bilingual corpus can better cover real language text. This paper presents an algorithm that obtains translation units by finding the "same" and "different" parts, not consisting entirely of high-frequency function words, between two bilingual sentence pairs, and then aligning these parts using a translation dictionary and a dynamic programming algorithm. Three filters are designed to check the correctness of the candidate translation units: a bilingual word-string similarity filter checks their semantic correspondence, a part-of-speech similarity filter checks their grammatical correspondence, and a leading/trailing stop-word filter checks their collocational correctness. Sampling shows that the extracted translation units reach a precision of 86% and a recall of about 61.34%; the algorithm thus provides a new and practical way of acquiring machine translation units.
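A minimal sketch of one ingredient under stated assumptions (it says nothing about the paper's own implementation): splitting two token sequences into their "same" and "different" segments with a standard longest-common-subsequence alignment, the kind of comparison the extraction step relies on.

```python
from difflib import SequenceMatcher

def same_and_diff(tokens_a, tokens_b):
    """Split two token sequences into aligned 'same' and 'different' chunks.

    Returns a list of (tag, chunk_a, chunk_b) triples, where tag is
    'same' for shared segments and 'diff' for the divergent ones.
    """
    chunks = []
    sm = SequenceMatcher(a=tokens_a, b=tokens_b, autojunk=False)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        label = "same" if tag == "equal" else "diff"
        chunks.append((label, tokens_a[i1:i2], tokens_b[j1:j2]))
    return chunks

# Example: the shared prefix comes out as 'same', the divergent tail as 'diff'.
print(same_and_diff("I like green tea".split(), "I like black coffee".split()))
```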

8.
This article describes a method that successfully exploits syntactic features for n-best translation candidate reranking using perceptrons. We motivate the utility of syntax by demonstrating the superior performance of parsers over n-gram language models in differentiating between Statistical Machine Translation output and human translations. Our approach uses discriminative language modelling to rerank the n-best translations generated by a statistical machine translation system. The performance is evaluated for Arabic-to-English translation using NIST’s MT-Eval benchmarks. While deep features extracted from parse trees do not consistently help, we show how features extracted from a shallow Part-of-Speech annotation layer outperform a competitive baseline and a state-of-the-art comparative reranking approach, leading to significant BLEU improvements on three different test sets.
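A compact sketch of this style of discriminative n-best reranking, with made-up shallow POS-bigram features; the paper's actual feature templates and training regime are not reproduced here.

```python
from collections import defaultdict

def pos_features(candidate_pos_tags):
    """Shallow features: counts of POS bigrams in a candidate translation."""
    feats = defaultdict(float)
    tags = ["<s>"] + list(candidate_pos_tags) + ["</s>"]
    for a, b in zip(tags, tags[1:]):
        feats[f"pos:{a}_{b}"] += 1.0
    return feats

def score(weights, feats):
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())

def perceptron_rerank_train(nbest_lists, oracle_indices, epochs=5):
    """Train a reranking perceptron.

    nbest_lists    -- list of n-best lists; each candidate is a POS-tag sequence
    oracle_indices -- index of the best (e.g. highest-BLEU) candidate per list
    """
    weights = defaultdict(float)
    for _ in range(epochs):
        for nbest, oracle in zip(nbest_lists, oracle_indices):
            feats = [pos_features(c) for c in nbest]
            predicted = max(range(len(nbest)), key=lambda i: score(weights, feats[i]))
            if predicted != oracle:           # standard perceptron update
                for f, v in feats[oracle].items():
                    weights[f] += v
                for f, v in feats[predicted].items():
                    weights[f] -= v
    return weights
```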

9.
Research on ontology-based English-Chinese machine translation
A high-quality Machine Translation system must fully integrate linguistic knowledge with language-neutral world knowledge. In recent years, ontologies have been widely used to model world knowledge at the conceptual level. This paper presents an ontology-based English-Chinese machine translation model system in which the ontology, as the model of world knowledge, is built by organizing concepts into a hierarchy and establishing rich relations between them. By mapping the vocabulary of a language onto concepts in the ontology, the system supports disambiguation during source-language analysis and lexical selection during target-language generation, and the ontology also provides the concepts for the intermediate representation between the source and target languages. In this system, the intermediate representation is expressed with Conceptual Graphs.

10.
By building a dictionary model of classical Chinese, constructing a knowledge base of syntactic rules based on Li Jinxi's sentence-based (句本位) grammar, and applying a word sense disambiguation algorithm, this paper studies rule-based machine translation of classical Chinese. In the experiments, the Analects (《论语》), syntactically annotated according to the sentence-based grammar, serves as the test corpus, and translation is performed sentence by sentence. Candidate senses are retrieved, a sense-selection model is built, and the syntactic order is adjusted to generate a set of candidate translations, from which a bigram language model selects the best one as the final machine translation result. Finally, the translation results are analysed and evaluated.
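The final selection step can be pictured with a toy add-one-smoothed bigram model that scores each candidate translation and keeps the best one; this is an assumed simplification, not the paper's exact model.

```python
import math
from collections import Counter

class BigramLM:
    """Add-one-smoothed bigram language model over whitespace-tokenised text."""

    def __init__(self, sentences):
        self.unigrams, self.bigrams = Counter(), Counter()
        for s in sentences:
            toks = ["<s>"] + s.split() + ["</s>"]
            self.unigrams.update(toks)
            self.bigrams.update(zip(toks, toks[1:]))
        self.vocab = len(self.unigrams)

    def logprob(self, sentence):
        toks = ["<s>"] + sentence.split() + ["</s>"]
        return sum(
            math.log((self.bigrams[(a, b)] + 1) / (self.unigrams[a] + self.vocab))
            for a, b in zip(toks, toks[1:])
        )

def best_translation(candidates, lm):
    """Pick the candidate translation the bigram model likes best."""
    return max(candidates, key=lm.logprob)
```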

11.
This paper reports on a series of experiments which aim at integrating Example-based Machine Translation and Translation Memories with Rule-based Machine Translation. We start by examining the potentials of each MT paradigm in terms of system-internal and system-external parameters. Whereas the system-external parameters include the expected translation quality and translation coverage, system-internal parameters relate to adaptability and recall of translation units. We prefer a dynamic linkage of different MT paradigms where the sharing of labor amongst the modules involved, such as segmentation and segment translation, is decided dynamically during runtime. We motivate the communication of linguistically rich data structures between the different components in a hybrid system and show that this linkage leads to better translation results and improves the customization possibilities of the system.

12.
Example-Based Machine Translation (EBMT) is a corpus-based approach to Machine Translation (MT) that utilizes the translation-by-analogy concept. In our EBMT system, translation templates are extracted automatically from bilingual aligned corpora by substituting the similarities and differences in pairs of translation examples with variables. In earlier versions of the system, the translation results were ranked solely using confidence factors of the translation templates. In this study, we introduce an improved ranking mechanism that dynamically learns from user feedback. When a user, such as a professional human translator, submits his or her evaluation of the generated translation results, the system learns “context-dependent co-occurrence rules” from this feedback. The newly learned rules are then consulted while ranking the results of subsequent translations. Through successive translation-evaluation cycles, we expect the output of the ranking mechanism to comply better with user expectations, listing the more preferred results in higher ranks. We also present an evaluation of our ranking method using precision values at top results and the BLEU metric.
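A toy sketch of the feedback loop under assumed data shapes (the system's actual context-dependent co-occurrence rules are richer than bare counts): user-approved translations update co-occurrence counts, which then bias the ranking of later results.

```python
from collections import defaultdict

def learn_rules(rule_counts, chosen_translation, context_words):
    """Update context-dependent co-occurrence counts from one user choice.

    rule_counts        -- defaultdict(int) mapping (context_word, target_word) to a count
    chosen_translation -- target tokens of the translation the user marked as correct
    context_words      -- source-side context tokens of the translated segment
    """
    for c in context_words:
        for t in chosen_translation:
            rule_counts[(c, t)] += 1

def rerank(results, rule_counts, context_words, rule_weight=0.1):
    """Re-rank (confidence, target_tokens) pairs using the learned counts."""
    def score(item):
        confidence, target_tokens = item
        support = sum(rule_counts.get((c, t), 0)
                      for c in context_words for t in target_tokens)
        return confidence + rule_weight * support
    return sorted(results, key=score, reverse=True)
```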

13.
This paper presents a methodology for evaluating Arabic Machine Translation (MT) systems. We are specifically interested in evaluating lexical coverage, grammatical coverage, semantic correctness and pronoun resolution correctness. The methodology presented is statistical and is based on earlier work on evaluating MT lexicons, in which the importance of a specific word sense to a given application domain determines how its presence or absence in the lexicon affects the MT system's lexical quality, which in turn affects the overall quality of the system's output. The same idea is used in this paper and generalized so as to apply to grammatical coverage, semantic correctness and correctness of pronoun resolution. The approach adopted in this paper has been implemented and applied to evaluating four English-Arabic commercial MT systems. The results of the evaluation of these systems are presented for the domain of the Internet and Arabization.
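The weighting idea can be made concrete with a small, hypothetical sketch (the paper's actual scoring formulas are not reproduced): each word sense carries a domain-importance weight, and a system's lexical score is the weighted fraction of senses it handles correctly.

```python
def weighted_coverage(importance, handled):
    """Importance-weighted coverage of a test sample of word senses.

    importance -- dict mapping a word sense to its weight in the domain
                  (e.g. its relative frequency in a domain corpus)
    handled    -- set of senses the MT system translated correctly
    """
    total = sum(importance.values())
    covered = sum(w for sense, w in importance.items() if sense in handled)
    return covered / total if total else 0.0

# A frequent sense that is missing hurts the score far more than a rare one.
print(weighted_coverage({"bank/finance": 0.8, "bank/river": 0.2}, {"bank/river"}))  # 0.2
```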

14.
A constraint system includes a set of variables and a set of relations among these variables, called constraints. The solution of a constraint system is an assignment of values to variables so that all, or many, of the relations are made true. A simple and efficient method for constraint resolution has been proposed in the work of B.N. Freeman-Benson, J. Maloney, and A. Borning. We show how their method is related to the classical problem of graph matching, and from this connection we derive new resolution algorithms.
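To make the connection to graph matching concrete, here is a textbook augmenting-path bipartite matching routine, not the authors' specific resolution algorithm: constraints on one side, the variables each could determine on the other; a maximum matching decides which constraint computes which variable.

```python
def max_bipartite_matching(candidates):
    """Maximum bipartite matching via augmenting paths (Kuhn's algorithm).

    candidates -- dict mapping each constraint to the variables it could
                  be used to compute.
    Returns a dict variable -> constraint chosen to determine it.
    """
    match = {}                                   # variable -> constraint

    def try_assign(constraint, visited):
        for var in candidates[constraint]:
            if var in visited:
                continue
            visited.add(var)
            # Take a free variable, or evict its owner onto another variable.
            if var not in match or try_assign(match[var], visited):
                match[var] = constraint
                return True
        return False

    for constraint in candidates:
        try_assign(constraint, set())
    return match

# Example: three one-way constraints competing for two variables.
print(max_bipartite_matching({"c1": ["x"], "c2": ["x", "y"], "c3": ["y"]}))
```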

15.
We propose Generate and Repair Machine Translation (GRMT), a constraint-based approach to machine translation that focuses on accurate translation output. GRMT performs the translation by generating a Translation Candidate (TC), verifying the syntax and semantics of the TC and repairing the TC when required. GRMT comprises three modules: Analysis Lite Machine Translation (ALMT), Translation Candidate Evaluation (TCE) and Repair and Iterate (RI). The key features of GRMT are simplicity, modularity, extendibility, and multilinguality.
An English-Thai translation system has been implemented to illustrate the performance of GRMT. The system has been developed and run under SWI-Prolog 3.2.8. The English and Thai grammars have been developed based on Head-Driven Phrase Structure Grammar (HPSG) and implemented on the Attribute Logic Engine (ALE). GRMT was tested to generate the translations for a number of sentences/phrases. Examples are provided throughout the article to illustrate how GRMT performs the translation process.

16.
Lower and upper bounds in zone-based abstractions of timed automata
Timed automata have an infinite semantics. For verification purposes, one usually uses zone-based abstractions w.r.t. the maximal constants to which clocks of the timed automaton are compared. We show that by distinguishing maximal lower and upper bounds, significantly coarser abstractions can be obtained. We show soundness and completeness of the new abstractions w.r.t. reachability and demonstrate how information about lower and upper bounds can be used to optimise the algorithm for bringing a difference bound matrix into normal form. Finally, we experimentally demonstrate that the new techniques dramatically increase the scalability of the real-time model checker UPPAAL.

17.
To improve the translation accuracy of the system, template structures are extracted automatically by combining a template method with the phrase-based approach. During decoding, template matching is performed first and the matched template structures are applied to the translation, after which the remaining translation proceeds with the Beam Search algorithm. The method can therefore effectively correct the word-order errors of purely statistical translation. Taking Chinese-Mongolian translation as an example, experimental results show that the method effectively improves translation quality, with translation performance 10% higher than that of the phrase-based statistical translation method.

18.
Existing Chinese short-text entity disambiguation models mostly consider only the semantic matching feature between the mention context and the candidate entity description, and pay insufficient attention to other effective disambiguation features such as the co-occurrence of candidate entities within the same query text and the similarity between a candidate entity's type and the mention's type. To address these problems, this paper first obtains the semantic matching feature between the mention context and candidate entity descriptions with a pre-trained language model; it then derives co-occurrence features and type features from entity embeddings and mention-type embeddings; finally, these features are fused to build an entity disambiguation model based on multi-feature fusion. Experimental results demonstrate the feasibility and effectiveness of the proposed co-occurrence and type features for entity disambiguation, and show that the proposed multi-feature-fusion disambiguation method achieves better disambiguation results.
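As an illustration only (the paper's fusion model and feature extractors are not specified here), the fusion step can be pictured as a weighted combination of the three per-candidate feature scores:

```python
def fuse_and_rank(candidates, weights=(0.6, 0.2, 0.2)):
    """Rank candidate entities by a weighted fusion of disambiguation features.

    candidates -- list of dicts with keys:
                  'entity'   -- candidate entity id
                  'semantic' -- mention-context vs. entity-description match score
                  'cooccur'  -- co-occurrence score with other candidates in the query
                  'type_sim' -- similarity between entity type and mention type
    """
    w_sem, w_co, w_type = weights

    def fused(c):
        return w_sem * c["semantic"] + w_co * c["cooccur"] + w_type * c["type_sim"]

    return sorted(candidates, key=fused, reverse=True)
```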

19.
As a key supporting technique for knowledge base construction, information retrieval and other applications, entity disambiguation plays an important role in natural language processing. In short texts, however, the traditional approach of modelling the context features of an entity can hardly extract enough features for disambiguation. Targeting the characteristics of short texts, this paper proposes a graph-model-based Chinese short-text disambiguation method built on entity-topic relations. First, topics are inferred with the TextRank algorithm over a corpus constructed from knowledge-base information, and the inference results serve as the representation of relations between entities; then a disambiguation graph is built for the text to be disambiguated, combined with the disambiguation scores produced by a BERT-based semantic matching model; finally, the disambiguation result is obtained by search and ranking. The method is evaluated on the dataset of the CCKS2020 short-text entity linking task, and the experimental results show that it disambiguates short texts better than other methods and can effectively handle Chinese short-text entity disambiguation when entity relations are missing from the knowledge base.
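A schematic sketch of the graph step under assumed inputs (not the paper's implementation): nodes are candidate entities, edges carry topic-similarity weights, every node starts from its BERT matching score, and a PageRank-style iteration propagates support before candidates are ranked.

```python
def rank_candidates(nodes, edges, match_score, damping=0.85, iters=50):
    """Score candidate entities on a disambiguation graph.

    nodes       -- list of candidate entity ids
    edges       -- dict (u, v) -> topic-similarity weight (undirected)
    match_score -- dict entity id -> semantic matching score from the matcher
    """
    # Build a symmetric weighted adjacency list.
    adj = {n: {} for n in nodes}
    for (u, v), w in edges.items():
        adj[u][v] = w
        adj[v][u] = w

    score = dict(match_score)
    for _ in range(iters):
        new_score = {}
        for n in nodes:
            received = sum(
                score[m] * w / max(sum(adj[m].values()), 1e-9)
                for m, w in adj[n].items()
            )
            # Personalised teleport back to the node's own matching score.
            new_score[n] = (1 - damping) * match_score[n] + damping * received
        score = new_score
    return score
```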

20.
Word sense disambiguation is a challenging problem in natural language processing. As an effective semi-supervised disambiguation algorithm, the genetic ant colony WSD algorithm can disambiguate a whole text quickly. It represents semantic relations with a graph model built over local context and performs disambiguation on that graph. During disambiguation, however, global semantic information is lost, conflicting disambiguation results arise, and the accuracy of the algorithm drops. This paper therefore proposes an improved graph model that represents semantics using global domain information and a short-term memory factor to solve this problem. The graph model introduces global domain information, strengthening the graph's ability to handle global semantic information. At the same time, following the principle of human short-term memory, a short-term memory factor is introduced into the model, strengthening the linear relations between senses and avoiding the impact of conflicting results on disambiguation. Extensive experimental results show that, compared with classical WSD algorithms, the proposed improved graph model raises the accuracy of word sense disambiguation.
