Similar Documents
A total of 20 similar documents were retrieved.
1.
High-Speed Multilingual Machine Translation Based on Pruning of Semantic-Unit Representation Trees   (Cited by: 9 in total; 0 self-citations, 9 by others)
高小宇, 高庆狮, 胡玥, 李莉. 《软件学报》 (Journal of Software), 2005, 16(11): 1909-1919
This paper proposes a high-speed multilingual machine translation method based on pruning of semantic-unit representation trees. The method translates Chinese into other languages without first performing Chinese word segmentation. Moreover, the translation time is O(L) rather than O(L·N), where L is the length of the text and N is the number of semantic units in the semantic-unit library, typically several hundred thousand to several million.
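As a rough illustration of why lookup time can be made independent of the number of stored units, here is a minimal sketch (not the authors' algorithm) of a trie-based longest-match scan over unsegmented text: the scan cost grows with the text length L and the maximum unit length, not with the number of stored units N. The `SEMANTIC_UNITS` entries below are hypothetical.

```python
# Minimal sketch: trie-based longest-match scanning of unsegmented Chinese text.
# Cost depends on the text length and maximum unit length, not on the library size N.
SEMANTIC_UNITS = {"机器翻译": "MT", "翻译": "translate", "机器": "machine"}  # hypothetical entries

def build_trie(units):
    root = {}
    for surface, label in units.items():
        node = root
        for ch in surface:
            node = node.setdefault(ch, {})
        node["$"] = label  # end-of-unit marker carries the unit's label
    return root

def scan(text, trie):
    """Greedy longest-match scan of the raw character stream."""
    i, matches = 0, []
    while i < len(text):
        node, j, best = trie, i, None
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if "$" in node:
                best = (i, j, node["$"])
        if best:
            matches.append(best)
            i = best[1]
        else:
            i += 1
    return matches

print(scan("机器翻译系统", build_trie(SEMANTIC_UNITS)))  # [(0, 4, 'MT')]
```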

2.
This paper is about the use of natural language to communicate with computers. Most research that has pursued this goal considers only requests expressed in English. One way to facilitate the use of several languages in natural language systems is to use an interlingua. An interlingua is an intermediary representation for natural language information that can be processed by machines. We propose to convert natural language requests into an interlingua [universal networking language (UNL)] and to execute these requests using software components. To achieve this goal, we propose OntoMap, an ontology-based architecture that performs the semantic mapping between UNL sentences and software components. OntoMap also performs component search and retrieval based on semantic information formalized in ontologies and rules.

3.
The intermediate languages used in existing Web vulnerability detection methods are designed for specific programming languages; when abstracting source code written in multiple programming languages, they cannot represent the same type of vulnerability across languages with a single unified intermediate language, which complicates subsequent vulnerability analysis. To address this problem, we propose an intermediate-language representation method based on taint analysis that gives a unified abstract representation of the same type of vulnerability information across programming languages. In the design of this intermediate language, the occurrence of a vulnerability is abstracted as a triple, the code elements related to the triple are abstracted as keywords of the intermediate language, and the grammar of the intermediate language is designed according to the semantic relations among the triple's elements. During translation, taint analysis is used to track the execution paths of taint sources, and the source code along those paths is translated into the intermediate-language representation. Finally, the intermediate language is applied to a vulnerability detection model. Experimental results show that, compared with a baseline intermediate language, the proposed language provides a more general abstract representation of vulnerability information across programming languages and is effective for vulnerability detection.
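The abstract does not spell out the triple's components; one plausible reading, sketched below purely for illustration, is (taint source, propagation steps, sink). The keyword names and the `TaintedFlow` structure are assumptions, not the paper's actual intermediate language.

```python
# Illustrative sketch only: a language-agnostic (source, propagation, sink) record
# and its flattening into keyword-style intermediate-language statements.
from dataclasses import dataclass
from typing import List

@dataclass
class TaintedFlow:
    source: str             # where untrusted data enters, e.g. an HTTP parameter
    propagation: List[str]  # statements the tainted value passes through
    sink: str               # the dangerous operation that consumes the value

def to_il(flow: TaintedFlow) -> List[str]:
    """Flatten a triple into keyword-prefixed intermediate-language lines."""
    lines = [f"SOURCE {flow.source}"]
    lines += [f"PROP {step}" for step in flow.propagation]
    lines.append(f"SINK {flow.sink}")
    return lines

# A SQL-injection-like flow expressed identically regardless of the source language.
flow = TaintedFlow(source="request.param('id')",
                   propagation=["query = concat(sql_prefix, id)"],
                   sink="db.execute(query)")
print("\n".join(to_il(flow)))
```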

4.
This paper discusses some basic notions involved in designing, developing, and implementing the Intermediary Language (IL) for Machine Translation applied to a set of languages. The stages for the design of the IL would include the independent analysis and synthesis of each language in its own terms. Then each could be mapped once into the IL dictionary and grammar, creating the IL text. From the IL text the transfer routine would synthesize the target text for a particular language. It is assumed that the IL text would have an algebraic representation of the variables to be instantiated in the target language on the basis of the IL text information. The IL should contain all the information occurring in the set of languages plus such generalizations as might be justified on the basis of inductive implications and/or deductively oriented postulates to be verified by adding new languages for testing the capacity of the IL. Given five languages spoken by more than a hundred million people, if N equals 5, pairwise translation (say, into English) requires N² − N = 20 programs, whereas IL translation requires 2N + 1, so we can manage with eleven programs, yielding a significant gain. The IL metalanguage, ideally, should have the capacity to function as an algebraic representation of both paradigmatic units (the selection axis) and their relationships (the contiguity axis). Both should be correlated with the extralinguistic fragments in terms of determiners, quantifiers, and classifiers. The structure of the IL grammar contains four components: dictionary, context-free information providing the nonterminal dictionary (i.e., classification), parser/synthesizer, and the initial string.
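The arithmetic behind the claimed gain, written out for N = 5:

```latex
% Direct pairwise transfer vs. translation through the IL, for N = 5 languages
N^2 - N = 25 - 5 = 20 \quad \text{(direct pairwise translation programs)}
2N + 1  = 10 + 1 = 11 \quad \text{(analysis/synthesis programs via the IL)}
```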

5.
Computer, 2003, 36(11): 14-16
Machine translation currently translates text from one language into another. However, work is under way on speech-to-speech translation. There are two kinds of machine translation: knowledge-based and statistical. Knowledge-based systems translate documents by converting words and grammar directly from one language into another. Rather than using the knowledge-based system's direct word-by-word translation techniques, statistical approaches translate documents by statistically analyzing entire phrases and, over time, "learning" how various languages work. The article examines the pros and cons of both systems and predicts that statistical methods will become more popular; however, the future will involve combining statistical and knowledge-based methods to create better systems.

6.
Neural machine translation is currently the most widely used machine translation method and achieves good results on language pairs with abundant corpus resources, but it performs poorly on pairs such as Chinese-Vietnamese that lack bilingual data. Considering the differences in grammatical structure between Chinese and Vietnamese, this paper proposes a Chinese-Vietnamese neural machine translation method that incorporates the source-language syntactic parse tree: a vectorized representation of the parse tree is obtained by depth-first traversal, and the syntax vector is added to the source-language word embeddings as input for training the translation model. Experiments on the Chinese-Vietnamese language pair show an improvement of 0.6 BLEU over the baseline system. The results indicate that incorporating syntactic parse trees can effectively improve machine translation performance in low-resource settings.
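A minimal sketch of one plausible reading of this setup, assuming the DFS yields a node-label sequence that is embedded and then added to the word embeddings (the abstract does not give these details); the label set, dimensions, and pooling are illustrative assumptions.

```python
# Sketch: add a syntax vector (from a DFS over the parse tree) to word embeddings.
import numpy as np

def dfs_labels(tree):
    """Depth-first traversal of a nested (label, children) parse tree."""
    label, children = tree
    out = [label]
    for child in children:
        out += dfs_labels(child)
    return out

rng = np.random.default_rng(0)
dim = 8
syntax_vocab = {lab: rng.normal(size=dim) for lab in ["S", "NP", "VP", "N", "V"]}

tree = ("S", [("NP", [("N", [])]), ("VP", [("V", []), ("NP", [("N", [])])])])
syntax_vec = np.mean([syntax_vocab[lab] for lab in dfs_labels(tree)], axis=0)

word_embeddings = rng.normal(size=(4, dim))  # 4 hypothetical source tokens
model_input = word_embeddings + syntax_vec   # broadcast the sentence-level syntax vector
print(model_input.shape)                     # (4, 8)
```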

7.
Danilo Montesi 《Knowledge》1996,9(8):809-507
Heterogeneous knowledge representation allows several knowledge representation techniques to be combined. For instance, connectionist and symbolic systems are two different computational paradigms and knowledge representations. Unfortunately, integrating different paradigms and knowledge representations is not easy and is very often informal. In this paper, we propose a formal approach to integrating these two paradigms, where the symbolic system we consider is a (logic) rule-based system. The integration operates at the language level, between neural networks and rule languages. The formal model that allows the integration is based on constraint logic programming and provides an integrated framework to represent and process heterogeneous knowledge. To achieve this, we define a new language that allows these issues to be expressed and modelled in a natural and intuitive way, together with its operational semantics.

8.
Interlingua and transfer-based approaches to machine translation have long been in use in competing and complementary ways. The former proves economical in situations where translation among multiple languages is involved, and can be used as a knowledge-representation scheme. But given a particular interlingua, its adoption depends on its ability (a) to capture the knowledge in texts precisely and accurately and (b) to handle cross-language divergences. This paper studies the language divergence between English and Hindi and its implications for machine translation between these languages using the Universal Networking Language (UNL). UNL has been introduced by the United Nations University, Tokyo, to facilitate the transfer and exchange of information over the internet. The representation works at the level of single sentences and defines a semantic net-like structure in which nodes are word concepts and arcs are semantic relations between these concepts. The language divergences between Hindi, an Indo-European language, and English can be considered as representing the divergences between the SOV and SVO classes of languages. The work presented here is the only one to our knowledge that describes language divergence phenomena in the framework of computational linguistics through a South Asian language.
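As a rough picture of the net-like structure described above (nodes are word concepts, arcs are semantic relations), here is a toy graph for "John eats rice". The relation labels `agt` and `obj` follow common UNL usage, but the attribute syntax is omitted and the encoding below is an illustration, not valid UNL.

```python
# Toy semantic net in the spirit of UNL: word-concept nodes, labelled relation arcs.
relations = [
    ("agt", "eat", "John"),   # John is the agent of eating
    ("obj", "eat", "rice"),   # rice is the object of eating
]

def neighbours(concept, rels):
    """Return (relation, target) pairs leaving a given concept node."""
    return [(r, tgt) for r, src, tgt in rels if src == concept]

print(neighbours("eat", relations))  # [('agt', 'John'), ('obj', 'rice')]
```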

9.
An instruction set is given for an abstract machine which uses a pushdown stack as its principal memory. The proposed instructions serve the dual purposes of (1) defining the dynamic semantics of programming languages by describing the operations of programs on the abstract machine and (2) describing an intermediate language to be used in compiling programming languages into machine language. It is shown how the intermediate language can be used in translating the programming languages ADA, FORTRAN, and PASCAL into IBM 360 assembly language, and what advantages it offers over other intermediate languages such as three-address code and P-code.
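A toy stack-machine fragment in the spirit described above, evaluating `a := b + 2`; the opcode names are made up for illustration and are not the paper's instruction set.

```python
# Toy pushdown-stack machine: a handful of illustrative opcodes.
def run(program, memory):
    stack = []
    for op, *args in program:
        if op == "PUSHC":      # push a constant
            stack.append(args[0])
        elif op == "LOAD":     # push the value of a variable
            stack.append(memory[args[0]])
        elif op == "ADD":      # pop two values, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "STORE":    # pop a value into a variable
            memory[args[0]] = stack.pop()
    return memory

# Intermediate code for:  a := b + 2
program = [("LOAD", "b"), ("PUSHC", 2), ("ADD",), ("STORE", "a")]
print(run(program, {"b": 40}))  # {'b': 40, 'a': 42}
```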

10.
Quotients and factors are important notions in the design of various computational procedures for regular languages and for the analysis of their logical properties. We propose a new representation of regular languages, by linear systems of language equations, which is suitable for the following computations: language reversal, left quotients and factors, right quotients and factors, and factor matrices. We present algorithms for the computation of all these notions, and indicate an application of the factor matrix to the computation of solutions of a particular language reconstruction problem. The advantage of these algorithms is that they all operate only on linear systems of language equations, whereas designing the same algorithms for other representations often requires translation to other representations.
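For concreteness, a small example of the kind of representation meant here (my own illustration, not the paper's construction): the language (ab)* is the X₁-component of the solution of a right-linear system of language equations, and an operation such as reversal can be read off by transforming the system itself.

```latex
% (ab)^* as the X_1-component of a right-linear system of language equations
X_1 = a\,X_2 \cup \{\varepsilon\}, \qquad X_2 = b\,X_1
% Reversing every concatenation gives the left-linear system
%   X_1 = X_2\,a \cup \{\varepsilon\}, \qquad X_2 = X_1\,b,
% whose X_1-component is the reversal (ba)^*.
```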

11.
In this paper we present two actor languages and a semantics preserving translation between them. The source of the translation is a high-level language that provides object-based programming abstractions. The target is a simple functional language extended with basic primitives for actor computation. The semantics preserved is the interaction semantics of actor systems, that is, the sets of possible interactions of a system with its environment. The proof itself is of interest since it demonstrates a methodology based on the actor theory framework for reasoning about correctness of transformations and translations of actor programs and languages and, more generally, of concurrent object languages.

12.
13.
We propose a novel approach to cross-lingual language model and translation lexicon adaptation for statistical machine translation (SMT) based on bilingual latent semantic analysis. Bilingual LSA enables latent topic distributions to be efficiently transferred across languages by enforcing a one-to-one topic correspondence during training. Using the proposed bilingual LSA framework, model adaptation can be performed by first inferring the topic posterior distribution of the source text and then applying the inferred distribution to an n-gram language model of the target language and to the translation lexicon via marginal adaptation. The background phrase table is enhanced with additional phrase scores computed using the adapted translation lexicon. The proposed framework also features rapid bootstrapping of LSA models for new languages based on a source LSA model of another language. Our approach is evaluated on the Chinese–English MT06 test set using a medium-scale SMT system and the GALE SMT system, measured in BLEU and NIST scores. Improvement in both scores is observed on both systems when the adapted language model and the adapted translation lexicon are applied individually. When the adapted language model and the adapted translation lexicon are applied simultaneously, the gain is additive. At the 95% confidence interval of the unadapted baseline system, the gain in both scores is statistically significant using the medium-scale SMT system, while the gain in the NIST score is statistically significant using the GALE SMT system.
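The marginal-adaptation step mentioned above is commonly written in the unigram-scaling form below; β is a tuning factor and Z(h) a history-dependent normalizer. This is the standard textbook form of marginal adaptation, not a formula quoted from the paper.

```latex
% Adapting a background n-gram LM toward the LSA-inferred topic marginal
p_{\mathrm{adapt}}(w \mid h) \;=\; \frac{1}{Z(h)}
  \left( \frac{p_{\mathrm{LSA}}(w)}{p_{\mathrm{bg}}(w)} \right)^{\beta} p_{\mathrm{bg}}(w \mid h)
```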

14.
The interlingual approach to machine translation (MT) is used successfully in multilingual translation. It aims to achieve the translation task in two independent steps. First, meanings of the source-language sentences are represented in an intermediate language-independent (Interlingua) representation. Then, sentences of the target language are generated from those meaning representations. Arabic natural language processing in general is still underdeveloped, and Arabic natural language generation (NLG) is even less developed. In particular, Arabic NLG from Interlinguas has so far been investigated only with template-based approaches. Moreover, tools used for other languages are not easily adaptable to Arabic due to the language's complexity at both the morphological and syntactic levels. In this paper, we describe a rule-based generation approach for task-oriented Interlingua-based spoken dialogue that transforms a relatively shallow semantic interlingual representation, called interchange format (IF), into Arabic text that corresponds to the intentions underlying the speaker's utterances. This approach addresses the handling of the problems of Arabic syntactic structure determination, and Arabic morphological and syntactic generation, within the Interlingual MT approach. The generation approach is developed primarily within the framework of the NESPOLE! (NEgotiating through SPOken Language in E-commerce) multilingual speech-to-speech MT project. The IF-to-Arabic generator is implemented in SICStus Prolog. We conducted evaluation experiments using the input and output from the English analyzer developed by the NESPOLE! team at Carnegie Mellon University. The results of these experiments were promising and confirmed the ability of the rule-based approach to generate Arabic translations from Interlingua representations in the travel and tourism domain.

15.
In this paper we develop a formalization of semantic relations that facilitates efficient implementations of relations in lexical databases or knowledge representation systems using bases. The formalization of relations is based on a modeling of hierarchical relations in Formal Concept Analysis. Further, relations are analyzed according to Relational Concept Analysis, which allows a representation of semantic relations consisting of relational components and quantificational tags. This representation utilizes mathematical properties of semantic relations. The quantificational tags imply inheritance rules among semantic relations that can be used to check the consistency of relations and to reduce the redundancy in implementations by storing only the basis elements of semantic relations. The research presented in this paper is an example of an application of Relational Concept Analysis to lexical databases and knowledge representation systems (cf. Priss 1996), which is part of a larger framework of research on natural language analysis and formalization.

16.
Following the development of question-answering systems, consultation systems have emerged as a useful integration of database management, knowledge representation, and natural-language interaction with the computer. The idea developed in this paper is to give a consultation system an input language that is formal but natural-oriented: a feasible input that allows natural communication with the computer without involving too many theoretical difficulties or practical disadvantages in terms of cost and execution time. The paper illustrates an implementation based on the classical facilities of goal-oriented languages. Using microplanner as the implementation language, we directly obtain the deep structure of the input quasi-natural language expressed in microplanner, which makes the deduction activity more reliable and direct. The problems related to knowledge representation and the basic functions of the system are discussed, and examples of its use are illustrated.

17.
Compilers and optimizers for declarative query languages use some form of intermediate language to represent user-level queries. The advent of compositional query languages for orthogonal type systems (e.g., OQL) calls for internal query representations beyond extensions of relational algebra. This work adopts a view of query processing which is greatly influenced by ideas from the functional programming domain. A uniform formal framework is presented which covers all query translation phases, including user-level query language compilation, query optimization, and execution plan generation. We pursue the type-based design, based on initial algebras, of a core functional language which is then developed into an intermediate representation that fits the needs of advanced query processing. Based on the principle of structural recursion we extend the language by monad comprehensions (which provide us with a calculus-style sublanguage that proves to be useful during the optimization of nested queries) and combinators (abstractions of the query operators implemented by the underlying target query engine). Due to its functional nature, the language is susceptible to program transformation techniques that were developed by the functional programming as well as the functional data model communities. We show how database query processing can substantially benefit from these techniques.
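To make the comprehension style concrete: a nested query of the kind discussed above can be written as a comprehension and, equivalently, unfolded into structural recursion (a fold) over the input collection. The Python sketch below only illustrates that equivalence; it is not the paper's calculus or combinator set.

```python
# A nested query written two ways: as a comprehension and as structural recursion.
orders = [{"customer": "c1", "items": [3, 5]},
          {"customer": "c2", "items": [7]}]

# Comprehension style: flatten the items of orders that contain more than one item.
q1 = [x for o in orders if len(o["items"]) > 1 for x in o["items"]]

# The same query expressed as a fold over the list of orders.
def fold(f, unit, xs):
    acc = unit
    for x in xs:
        acc = f(acc, x)
    return acc

q2 = fold(lambda acc, o: acc + (o["items"] if len(o["items"]) > 1 else []), [], orders)
assert q1 == q2 == [3, 5]
```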

18.
Hebrew and Arabic are related but mutually incomprehensible languages with complex morphology and scarce parallel corpora. Machine translation between the two languages is therefore interesting and challenging. We discuss similarities and differences between Hebrew and Arabic, the benefits and challenges that they induce, respectively, and their implications for machine translation. We highlight the shortcomings of using English as a pivot language and advocate a direct, transfer-based and linguistically informed (but still statistical, and hence scalable) approach. We report preliminary results of the two systems we are currently developing, for translation in both directions.

19.
This paper presents two grammars for reading numbers in classical and modern Arabic. The grammars make use of the structured Arabic counting system to present an accurate and compact grammar that can be easily implemented on different platforms. Automating the process of reading numbers from their numerical representation into their sentential form has many applications: inquiring about your bank balance over the phone, automatically writing the amounts on checks (from numerical form to letter form), and reading for blind people are some of the fields in which automated reading of numbers can be of service. The parsing problem for the sentential representation of numbers in Arabic is also addressed, and a grammar to convert from the sentential representation to the numerical representation is presented. The grammars presented can be used to translate sentential Arabic numbers into sentential English numbers, and vice versa, by using the common numerical representation as an intermediate code. Such a methodology can be used to aid automatic translation between the two natural languages. All grammars described in this paper have been implemented on a UNIX system. Examples of different number representations and the output of the implemented grammars are given as part of the paper.
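The numeric-to-sentential direction is the easy half to sketch. The toy function below does it for English numbers up to 999; it only illustrates the direction of the mapping and does not attempt the far richer agreement and case rules that the paper's Arabic grammars handle.

```python
# Toy numeric -> sentential converter for 0..999 in English (illustration only).
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

def read_number(n: int) -> str:
    assert 0 <= n <= 999
    if n < 20:
        return ONES[n]
    if n < 100:
        return TENS[n // 10] + ("" if n % 10 == 0 else "-" + ONES[n % 10])
    head = ONES[n // 100] + " hundred"
    rest = n % 100
    return head if rest == 0 else head + " and " + read_number(rest)

print(read_number(215))   # two hundred and fifteen
```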

20.
An intelligent machine can be thought of as a human-friendly machine system that identifies or understands the problems of generating tasks, developing plans, and compiling and executing those tasks automatically. High-performance, dependable intelligent systems must understand and translate natural languages. The translation of natural languages has been one of the most challenging problems in intelligent systems from the very beginning. It is the responsibility of a translation system to give the machine the task-generation ability needed to automate program generation.

In this paper, the problem of advanced machine translation capabilities is approached by examining the Sinhala natural language. Sinhalese has not previously been analyzed using computational linguistics; our earlier system on Sinhalese morphology is the first attempt at such a study. This paper extends it to syntactic and semantic analysis. We formalize grammar rules for units, phrases, clauses, and sentences, develop a semantically characterized Sinhalese dictionary, and build a conceptual dictionary based on English, Japanese, and Sinhalese. The syntactic and semantic analyses are implemented on the computer, and sound experimental results are obtained.
