期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Morphological mismatches in machine translation

Igor Mel’čuk Leo Wanner 《Machine Translation》2008,22(3):101-152

This paper addresses one of the least studied, although very important, problems of machine translation—the problem of morphological mismatches between languages and their handling during transfer. The level at which we assume transfer to be carried out is the Deep-Syntactic Structure (DSyntS) as proposed in the Meaning-Text Theory. DSyntS is abstract enough to avoid all types of surface morphological divergences. For the remaining ‘genuine’ divergences between grammatical significations, we propose a morphological transfer model. To illustrate this model, we apply it to the transfer of grammemes of definiteness and aspect for the language pair Russian–German and German–Russian, respectively. 相似文献

2.

Regular Languages Defined by Generalized First-Order Formulas with a Bounded Number of Bound Variables

Straubing Thérien 《Theory of Computing Systems》2008,36(1):29-69

Abstract. We consider generalized first-order sentences over < using both ordinary and modular quantifiers. It is known that the languages definable by such sentences are exactly the regular languages whose syntactic monoids contain only solvable groups. We show that any sentence in this logic is equivalent to one using three variables only, and we prove that the languages expressible with two variables are those whose syntactic monoids belong to a particular pseudovariety of finite monoids, namely the wreath product of the pseudovariety DA (which corresponds to the languages definable by ordinary first-order two-variable sentences) with the pseudovariety of finite solvable groups. This generalizes earlier work of Thérien and Wilke on the expressive power of two-variable formulas in which only ordinary quantifiers are present. If all modular quantifiers in the sentence are of the same prime modulus, this provides an algorithm to decide if a regular language has such a two-variable definition. 相似文献

3.

Regular Languages Defined by Generalized First-Order Formulas with a Bounded Number of Bound Variables

Howard Straubing Denis Thérien 《Theory of Computing Systems》2003,36(1):29-69

Abstract. We consider generalized first-order sentences over < using both ordinary and modular quantifiers. It is known that the languages definable by such sentences are exactly the regular languages whose syntactic monoids contain only solvable groups. We show that any sentence in this logic is equivalent to one using three variables only, and we prove that the languages expressible with two variables are those whose syntactic monoids belong to a particular pseudovariety of finite monoids, namely the wreath product of the pseudovariety DA (which corresponds to the languages definable by ordinary first-order two-variable sentences) with the pseudovariety of finite solvable groups. This generalizes earlier work of Thérien and Wilke on the expressive power of two-variable formulas in which only ordinary quantifiers are present. If all modular quantifiers in the sentence are of the same prime modulus, this provides an algorithm to decide if a regular language has such a two-variable definition. 相似文献

4.

机器翻译中异化现象的分析及映射运算

郭宏蕾姚天顺《软件学报》1998,9(1):18-24

本文从词汇－语义角度分析了机器翻译中与汉英语言相关的一些异化现象，探讨了语义－句法映射运算集的有效性和合理性．同时，本文讨论了通过变异映射集、变异类型指示器和参数合一机制解决机器翻译中异化现象的方法．这些方法提高了译准率，使生成的句子既能正确表达原中间语言的语义，又符合目标语言的表述习惯．相似文献

5.

Multilingual generation: The role of telicity in lexical choice and syntactic realization

Bonnie J. Dorr Mari Broman Olsen 《Machine Translation》1996,11(1-3):37-74

Multilingual generation in machine translation (MT) requires a knowledge organization that facilitates the task of lexical choice, i.e. selection of lexical units to be used in the generation of a target-language sentence. This paper investigates the extent to which lexicalization patterns involving the lexical aspect feature [+telic] may be used for translating events and states among languages. Telicity has been correlated syntactically with both transitivity and unaccusativity, and semantically with Talmy's path of a motion event, the representation of which characterizes languages parametrically.Taking as our starting point the syntactic/semantic classification in Levin's English Verb Classes and Alternations, we examine the relation between telicity and the syntactic contexts, or alternations, outlined in this work, identifying systematic relations between the lexical aspect features and the semantic components that potentiate these alternations. Representing lexical aspect — particularly telicity — is therefore crucial for the tasks of lexical choice and syntactic realization. Having enriched the data in Levin (by correlating the syntactic alternations (Part I) and semantic verb classes (Part II) and marking them for telicity) we assign to verbs lexical semantic templates (LSTs). We then demonstrate that it is possible from these templates to build a large-scale repository for lexical conceptual structures which encode meaning components that correspond to different values of the telicity feature. The LST framework preserves both semantic content and semantic structure (following Grimshaw during the processes of lexical choice and syntactic realization. Application of this model identifies precisely where the Knowledge Representation component may profitably augment our rules of composition, to identify cases where the interlingua underlying the source language sentence must be either reduced or modified in order to produce an appropriate target language sentence. 相似文献

6.

Interlingua-based English–Hindi Machine Translation and Language Divergence

Shachi Dave Jignashu Parikh Pushpak Bhattacharyya 《Machine Translation》2001,16(4):251-304

Interlingua and transfer-based approaches tomachine translation have long been in use in competing and complementary ways. The former proves economical in situations where translation among multiple languages is involved, and can be used as a knowledge-representation scheme. But given a particular interlingua, its adoption depends on its ability (a) to capture the knowledge in texts precisely and accurately and (b) to handle cross-language divergences. This paper studies the language divergence between English and Hindi and its implication to machine translation between these languages using the Universal Networking Language (UNL). UNL has been introduced by the United Nations University, Tokyo, to facilitate the transfer and exchange of information over the internet. The representation works at the level of single sentences and defines a semantic net-like structure in which nodes are word concepts and arcs are semantic relations between these concepts. The language divergences between Hindi, an Indo-European language, and English can be considered as representing the divergences between the SOV and SVO classes of languages. The work presented here is the only one to our knowledge that describes language divergence phenomena in the framework of computational linguistics through a South Asian language. 相似文献

7.

Lexical paraphrases in multilingual sentence generation

Manfred Stede 《Machine Translation》1996,11(1-3):75-107

We describe the architecture of a sentence generation module that maps a language-neutral deep representation to a language-specific sentence-semantic specification, which is then handed over to a conventional front-end generator. Lexicalization is seen as the main task in the mapping step, and we specifically examine the role of verb semantics in the process. By separating the various kinds of knowledge involved, for related languages (such as English and German) the task of multilingual sentence generation can be treated as a variant of the monolingual paraphrasing problem. 相似文献

8.

A two-layered interface architecture

Judith D. Richardson Thomas J. Wheeler 《Computer Standards & Interfaces》1991,13(1-3):151-154

The problems of interfacing to external subsystems and using multiple paradigms in a single software system center on resolving the impedance mismatch (the differences in the data models and thought patterns of each paradigm) and how to reflect the differences across the language boundary. One technique is to build an interface between the two paradigms. The interface should strive to resolve the mismatches while providing access to those language features valuable to the problem solution. This means that semantic issues as well as syntactic issues must be addressed. Features of the implementation languages that contribute to the mismatch need to be identified and examined in order to develop a solution. This paper describes a two-layer architecture based upon the idea of abstract interfaces. This architecture appears to provide a basis for interfacing programming languages to object databases which addresses the semantic gap, or impedance mismatch, between the two as well as the syntactic differences. 相似文献

9.

Formal syntax methods for natural language

Dale Johnson Barrett R. Bryant 《Information Processing Letters》1984,19(3):135-143

This paper describes on-going research into the applications of some techniques normally used to formally specify and analyze the context-sensitive syntax of programming languages to the specification and analysis of the syntax of a natural language, namely English. The specific formal methods presently being investigated are two-level grammar (2LG) and the Vienna Definition Language (VDL). A preliminary subset of English has been established consisting of fifteen basic sentence patterns. 2LG and VDL specifications are given for one of these sentence types and the syntactic analysis of an English sentence using each of the two specifications is illustrated through an example. 相似文献

10.

Relational Interpretations of Recursive Types in an Operational Setting

Lars Birkedal Robert Harper 《Information and Computation》1999,155(1-2)

Relational interpretations of type systems are useful for establishing properties of programming languages. For languages with recursive types it is difficult to establish the existence of a relational interpretation. The usual approach is to pass to a domain-theoretic model of the language and, exploiting the structure of the model, to derive relational properties of it. We investigate the construction of relational interpretations of recursive types in a purely operational setting, drawing on recent ideas from domain theory and operational semantics as a guide. We prove syntactic minimal invariance for an extension of PCF with a recursive type, a syntactic analogue of the minimal invariance property used by Freyd and Pitts to characterize the domain interpretation of a recursive type. As Pitts has shown in the setting of domains, syntactic minimal invariance suffices to establish the existence of relational interpretations. We give two applications of this construction. First, we derive a notion of logical equivalence for expressions of the language that we show coincides with experimental equivalence and which, by virtue of its construction, validates useful induction and coinduction principles for reasoning about the recursive type. Second, we give a relational proof of correctness of the continuation-passing transformation, which is used in some compilers for functional languages. 相似文献

11.

面向机器翻译的中国手语的理解与合成 总被引：4，自引：0，他引：4

徐琳高文《计算机学报》2000,23(1):60-65

自然语言与可视化语言之间的自动翻译研究具有重大的现实意义和学术研究价值,它是一个崭新的、有发展前任的研究领域。该文从机器翻译的角度来考察汉语和中国手语之间的相同之处和差异,探讨两种语言在语序、句子结构、短语结构、特殊词类等方面的特点,建立了汉语中国手语机器翻译的一系列规则。在此基础之上,采用规则解释方法实现了一个汉语至可视化语言中国手语的翻译系统。相似文献

12.

用于关系抽取的注意力图长短时记忆神经网络

下载免费PDF全文

张勇高大林巩敦卫陶一凡《智能系统学报》2021,16(3):518-527

关系抽取是信息获取中一项关键技术。句子结构树能够捕获单词之间的长距离依赖关系,已被广泛用于关系抽取任务中。但是,现有方法存在过度依赖句子结构树本身信息而忽略外部信息的不足。本文提出一种新型的图神经网络模型,即注意力图长短时记忆神经网络(attention graph long short term memory neural network, AGLSTM)。该模型采用一种软修剪策略自动学习对关系抽取有用的句子结构信息;通过引入注意力机制,结合句法图信息学习句子的结构特征;并设计一种新型的图长短时记忆神经网络,使得模型能够更好地融合句法图信息和句子的时序信息。与10种典型的关系抽取方法进行对比,实验验证了该模型的优异性能。相似文献

13.

Konrad Zuse's Plankalku¨l: the first high-level, “non vonNeumann” programming language

Giloi W.K. 《Annals of the History of Computing, IEEE》1997,19(2):17-24

Konrad Zuse was the first person in history to build a working digital computer, a fact that is still not generally acknowledged. Even less known is that in the years 1943-1945, Zuse developed a high-level programming model and, based on it, an algorithmic programming language called Plankalku¨l (Plan Calculus). The Plankalku¨l features binary data structure types, thus supporting a loop-free programming style for logical or relational problems. As a language for numerical applications, Plankalku¨l already had the essential features of a “von Neumann language”, though at the level of an operator language. Consequently, the Plankalku¨l is in some aspects equivalent and in others more powerful than the von Neumann programming model that came to dominate programming for a long time. To find language concepts similar to those of the Plankalku¨l, one has to look at “non-von Neumann languages” such as APL or the relational algebra. This paper conveys the syntactic and semantic flavor of the Plankalku¨l, without presenting all its syntactic idiosyncrasies. Rather, it points out that the Plankalku¨l was not only the first high-level programming language but in some aspects conceptually ahead of the high-level languages that evolved a decade later 相似文献

14.

基于神经网络的复句判定及其关系识别研究

贾旭楠魏庭新曲维光顾彦慧周俊生《计算机工程》2021,47(11):54-61

复句是自然语言的基本单位之一,复句的判定及其语义关系的识别,对于句法解析、篇章理解等都有着非常重要的作用。基于神经网络模型识别自然语料中的复句,判断其复句关系,构造复句判定和复句关系识别联合模型,以最大程度地减少误差传递。在复句判定任务中通过Bi-LSTM获得上下文语义信息,采用注意力机制捕获句内跨距离搭配信息,利用CNN捕获句子局部信息。在复句关系识别任务中,使用Bert增强句子的语义表示,运用Tree-LSTM对句法结构和成分标记进行建模。在CAMR中文语料上的实验结果表明,基于注意力机制的复句判定模型F1值达到91.7%,基于Tree-LSTM的复句关系识别模型F1值达到69.15%。在联合模型中,2项任务的F1值分别达到92.15%和66.25%,说明联合学习能够使不同任务获得更多特征,从而提高模型性能。相似文献

15.

开放域上基于深度语义计算的复述模板获取方法 总被引：1，自引：0，他引：1

刘明童张玉洁徐金安陈钰枫《中文信息学报》2018,32(2):94-101

利用实体关系从网络大规模单语语料获取复述模板的方法可以规避对单语平行语料或可比语料的依赖,但是后期需要人工对有语义差异的关系模板分类后获取复述模板。针对这一遗留问题,该文提出基于深度语义计算的复述模板自动获取方法,首先设计基于统计特征的模板裁剪方法,从非复述语料中获取高质量的关系模板,然后设计基于深度语义计算的关系模板聚类方法获取高精度的复述模板。我们在四类实体关系数据上的实验结果表明,该方法实现了关系模板的自动获取与自动聚类,可以获得语义相近度更高、表现形式多样的复述模板。相似文献

16.

Annotation of sentence structure

Markéta Lopatková Petr Homola Natalia Klyueva 《Language Resources and Evaluation》2012,46(1):25-36

The focus of this article is on the creation of a collection of sentences manually annotated with respect to their sentence structure. We show that the concept of linear segments—linguistically motivated units, which may be easily detected automatically—serves as a good basis for the identification of clauses in Czech. The segment annotation captures such relationships as subordination, coordination, apposition and parenthesis; based on segmentation charts, individual clauses forming a complex sentence are identified. The annotation of a sentence structure enriches a dependency-based framework with explicit syntactic information on relations among complex units like clauses. We have gathered a collection of 3,444 sentences from the Prague Dependency Treebank, which were annotated with respect to their sentence structure (these sentences comprise 10,746 segments forming 6,341 clauses). The main purpose of the project is to gain a development data—promising results for Czech NLP tools (as a dependency parser or a machine translation system for related languages) that adopt an idea of clause segmentation have been already reported. The collection of sentences with annotated sentence structure provides the possibility of further improvement of such tools. 相似文献

17.

汉语复句关系的特征结构

冯文贺《中文信息学报》2015,29(6):13-22

通常复句关系分析基于分类机制,由于缺乏统一逻辑,面临不少分歧。该文提出基于特征结构描写复句关系。复句关系的特征结构由[特征: 值]元组构成,该文初步构拟汉语复句关系的特征结构系统,并用于具体分析。较之分类机制,特征结构对复句关系的描写深刻,且分析判断准确、易行。目前特征结构系统开放,但特征调整,可以完善而不大量更改已有特征描写结果。特征结构可用于复句关系的深度语义分析资源构建与计算研究。相似文献

18.

Integrating linguistic primitives in learning context-dependentrepresentation

Chan S.W.K. 《Knowledge and Data Engineering, IEEE Transactions on》2001,13(2):157-175

The paper presents an explicit connectionist-inspired, language learning model in which the process of settling on a particular interpretation for a sentence emerges from the interaction of a set of “soft” lexical, semantic, and syntactic primitives. We address how these distinct linguistic primitives can be encoded from different modular knowledge sources but strongly involved in an interactive processing in such a way as to make implicit linguistic information explicit. The learning of a quasi-logical form called context-dependent representation, is inherently incremental and dynamical in such a way that every semantic interpretation will be related to what has already been presented in the context created by prior utterances. With the aid of the context-dependent representation, the capability of the language learning model in text understanding is strengthened. This approach also shows how the recursive and compositional role of a sentence as conveyed in the syntactic structure can be modeled in a neurobiologically motivated linguistics based on dynamical systems rather on combinatorial symbolic architecture. Experiments with more than 2000 sentences in different languages illustrating the influences of the context-dependent representation on semantic interpretation, among other issues, are included 相似文献

19.

结合语法结构和语义信息的情感三元组提取

杨芳捷冯广唐业凯《计算机系统应用》2024,33(3):255-263

针对目前大多数方面情感三元组提取方法存在着没有充分考虑语法结构和语义相关性的问题. 本文提出一种结合语法结构和语义信息的方面情感三元组提取模型, 首先提出使用依赖解析器得到所有依赖弧的概率矩阵构建语法图, 提取丰富的语法结构信息. 其次利用自注意力机制构建语义图, 表示单词与单词之间的语义相关性, 从而减低噪声词的干扰. 最后设计了一个相互仿射变换层, 让模型可以更好地交换语法图和语义图之间的相关特征, 提升模型情感三元组提取的表现. 在多个公开数据集上进行验证. 实验表明, 与现有的情感三元组提取模型相比, 精确度(P)、召回率(R)和F1值整体都有提高, 验证了结合语法结构和语义信息在方面情感三元组提取的有效性. 相似文献

20.

The use of lexical semantics in interlingual machine translation

Bonnie J. Dorr 《Machine Translation》1992,7(3):135-193

This paper describes the lexical-semantic basis for UNITRAN, an implemented scheme for translating Spanish, English, and German bidirectionally. Two claims made here are that the current representation handles many distinctions (or divergences) across languages without recourse to language-specific rules and that the lexical-semantic framework provides the basis for a systematic mapping between the interlingua and the syntactic structure. The representation adopted is an extended version of lexical conceptual structure which is suitable to the task of translating between divergent structures for two reasons: (1) it provides an abstraction of language-independent properties from structural idiosyncrasies; and (2) it is compositional in nature. The lexical-semantic approach addresses the divergence problem by using a linguistically grounded mapping that has access to parameter settings in the lexicon. We will examine a number of relevant issues including the problem of defining primitives, the issue of interlinguality, the cross-linguistic coverage of the system, and the mapping between the syntactic structure and the interlingua. A detailed example of lexical-semantic composition will be presented. 相似文献