期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Lexical choice for complex noun phrases: Structure,modifiers, and determiners

Michael Elhadad 《Machine Translation》1996,11(1-3):159-184

This paper presents a lexical choice component for complex noun phrases. We first explain why lexical choice for NPs deserves special attention within the standard pipeline architecture for a generator. The task of the lexical chooser for NPs is more complex than for clauses because the syntax of NPs is less understood than for clauses, and therefore, syntactic realization components, while they accept a predicate-argument structure as input for clauses, require a purely syntactic tree as input for NPs. The task of mapping conceptual relations to different syntactic modifiers is therefore left to the lexical chooser for NPs.The paper focuses on the syntagmatic aspect of lexical choice, identifying a process called NP planning. It focuses on a set of communicative goals that NPs can satisfy and specifies an interface between the different components of the generator and the lexical chooser.The technique presented for NP planning encapsulates a rich lexical knowledge and allows for the generation of a wide variety of syntactic constructions. It also allows for a large paraphrasing power because it dynamically maps conceptual information to various syntactic slots. 相似文献

2.

Multilingual generation: The role of telicity in lexical choice and syntactic realization

Bonnie J. Dorr Mari Broman Olsen 《Machine Translation》1996,11(1-3):37-74

Multilingual generation in machine translation (MT) requires a knowledge organization that facilitates the task of lexical choice, i.e. selection of lexical units to be used in the generation of a target-language sentence. This paper investigates the extent to which lexicalization patterns involving the lexical aspect feature [+telic] may be used for translating events and states among languages. Telicity has been correlated syntactically with both transitivity and unaccusativity, and semantically with Talmy's path of a motion event, the representation of which characterizes languages parametrically.Taking as our starting point the syntactic/semantic classification in Levin's English Verb Classes and Alternations, we examine the relation between telicity and the syntactic contexts, or alternations, outlined in this work, identifying systematic relations between the lexical aspect features and the semantic components that potentiate these alternations. Representing lexical aspect — particularly telicity — is therefore crucial for the tasks of lexical choice and syntactic realization. Having enriched the data in Levin (by correlating the syntactic alternations (Part I) and semantic verb classes (Part II) and marking them for telicity) we assign to verbs lexical semantic templates (LSTs). We then demonstrate that it is possible from these templates to build a large-scale repository for lexical conceptual structures which encode meaning components that correspond to different values of the telicity feature. The LST framework preserves both semantic content and semantic structure (following Grimshaw during the processes of lexical choice and syntactic realization. Application of this model identifies precisely where the Knowledge Representation component may profitably augment our rules of composition, to identify cases where the interlingua underlying the source language sentence must be either reduced or modified in order to produce an appropriate target language sentence. 相似文献

3.

Identifying semantic equivalence for multi-document summarisation

Eamonn Newman Joe Carthy John Dunnion Nicola Stokes 《Artificial Intelligence Review》2006,25(1-2):55-65

We describe Semantic Equivalence and Textual Entailment Recognition, and outline a system which uses a number of lexical, syntactic and semantic features to classify pairs of sentences as “semantically equivalent”. We describe an experiment to show how syntactic and semantic features improve the performance of an earlier system, which used only lexical features. We also outline some areas for future work. 相似文献

4.

基于词汇支配度的汉语依存分析模型 总被引：7，自引：0，他引：7

刘挺马金山李生《软件学报》2006,17(9):1876-1883

如何应用句法结构和词汇化是句法分析建模所面临的两个主要问题,汉语依存分析对这两方面做了初步的探索.首先通过对大规模依存树库的统计学习,获取其中的词汇依存信息,建立了一个词汇化的概率分析模型.然后引入词汇支配度的概念,以充分利用了句子中的结构信息.词汇化方法有效地弥补了以前工作中词性信息的粒度过粗问题.同时,词汇支配度增强了对句法结构的识别,有效地避免了非法结构的生成.在4 000句的测试集上,依存分析获得了约74%的正确率. 相似文献

5.

显式融合词法和句法特征的抽取式机器阅读理解模型

闫维宏李少博单丽莉孙承杰刘秉权《计算机系统应用》2022,31(9):352-359

预训练语言模型虽然能够为每个词提供优良的上下文表示特征,但却无法显式地给出词法和句法特征,而这些特征往往是理解整体语义的基础.鉴于此,本文通过显式地引入词法和句法特征,探究其对于预训练模型阅读理解能力的影响.首先,本文选用了词性标注和命名实体识别来提供词法特征,使用依存分析来提供句法特征,将二者与预训练模型输出的上下文表示相融合.随后,我们设计了基于注意力机制的自适应特征融合方法来融合不同类型特征.在抽取式机器阅读理解数据集CMRC2018上的实验表明,本文方法以极低的算力成本,利用显式引入的词法和句法等语言特征帮助模型在F₁和EM指标上分别取得0.37%和1.56%的提升. 相似文献

6.

平滑学习算法在机器翻译消歧中的应用

刘颖《计算机应用与软件》2001,18(11):56-59

机器翻译中,在词性标注和句法语义分析阶段经常会遇到歧义,使用基于统计方法的词汇评分和句法语义评分就是对词性标注和句法语义分析阶段产生的歧义进行消歧,在用统计方法消歧时,经常遇到的一个现象就是数据稀疏问题,本文对词汇评分和句法语义评分遇到数据稀疏现象使用改进的Turing公式来平滑参数,给出平滑算法对词汇评分平滑的处理过程,在实验中给出语料与参数数量,正确率的实验结果。相似文献

7.

汉语析句的形式化问题

彭炜明宋继华王宁《中文信息学报》2016,30(3):30-35

该文讨论了形式化析句的基本概念,从语言和言语、描写和解释、层次和线性、短语和句式、词法和句法等多个语言学视角梳理了汉语析句中的形式化问题,并介绍了在句本位语法图解析句形式化中总结的若干经验、原则和待解决问题。相似文献

8.

Linguistic measures for automatic machine translation evaluation

Jesús Giménez Lluís Màrquez 《Machine Translation》2010,24(3-4):209-240

Assessing the quality of candidate translations involves diverse linguistic facets. However, most automatic evaluation methods in use today rely on limited quality assumptions, such as lexical similarity. This introduces a bias in the development cycle which in some cases has been reported to carry very negative consequences. In order to tackle this methodological problem, we explore a novel path towards heterogeneous automatic Machine Translation evaluation. We have compiled a rich set of specialized similarity measures operating at different linguistic dimensions and analyzed their individual and collective behaviour over a wide range of evaluation scenarios. Results show that measures based on syntactic and semantic information are able to provide more reliable system rankings than lexical measures, especially when the systems under evaluation are based on different paradigms. At the sentence level, while some linguistic measures perform better than most lexical measures, some others perform substantially worse, mainly due to parsing problems. Their scores are, however, suitable for combination, yielding a substantially improved evaluation quality. 相似文献

9.

基于CCMO的现代汉语介词词义结构描写

邱庆山《计算机工程与应用》2014,50(6):19-24

认知组合性词义观（CCMO）是对词义生成和理解之规律的一种模型化观照,是进行词义结构描写的基础性理念。在理解词义时,CCMO要求牢记词义具有认知特性和句法组合特性,要从认知和句法组合的角度出发,坚持认为词义结构的生成是认知结构向语言符号投射的结果,而词义结构的显现则是词语间的句法组合驱动的结果。基于CCMO,句法结构的生成可看成是词义结构扩展的结果,词义结构是句法结构生成的基础。以“在、从、经”类介词为例,描写构建了介词词义的球结构模型,说明了句法语义运算功能是介词词义的本质特性。相似文献

10.

Secondary predication and the lexical representation of verbs

T. R. Rapoport 《Machine Translation》1990,5(1):31-55

In this paper it is assumed that syntactic structure is projected from the lexicon. The lexical representation, which encodes the linguistically relevant aspects of the meanings of words, thus determines and constrains the syntax. Therefore, if semantic analysis of syntactic structures is to be possible, it is necessary to determine the content and structure of lexical semantic representations. The paper argues for a certain form of lexical representation by presenting the problem of a particular non-standard structure, the verb phrase of the form V-NP-Adj corresponding to various constructions of secondary predication in English. It is demonstrated that the solution to the semantic analysis of this structure lies in the meaning of the structure's predicators, in particular the lexical semantic representation of the verb. Verbs are classified according to the configuration of the lexical semantic representations, whether basic or derived. It is these specific configurations that restrict the possibilities of secondary predication. Given the class of a verb, its relation to the secondary predicate is predictable; and the correct interpretation of the V-NP-Adj string is therefore possible.This work is based on papers presented to the 1988 meetings of the Canadian Linguistic Association and the Brandeis Workshop on Theoretical and Computational Issues in Lexical Semantics. I am grateful to the audiences at these two meetings for comments, and to Anna-Maria di Sciullo, Diane Massam, Yves Roberge and James Pustejovsky for helpful discussion. I also thank SSHRC for funding the research of which this work forms part. 相似文献

11.

Compositionality and lexical alignment of multi-word terms

Emmanuel Morin Béatrice Daille 《Language Resources and Evaluation》2010,44(1-2):79-95

The automatic compilation of bilingual lists of terms from specialized comparable corpora using lexical alignment has been successful for single-word terms (SWTs), but remains disappointing for multi-word terms (MWTs). The low frequency and the variability of the syntactic structures of MWTs in the source and the target languages are the main reported problems. This paper defines a general framework dedicated to the lexical alignment of MWTs from comparable corpora that includes a compositional translation process and the standard lexical context analysis. The compositional method which is based on the translation of lexical items being restrictive, we introduce an extended compositional method that bridges the gap between MWTs of different syntactic structures through morphological links. We experimented with the two compositional methods for the French–Japanese alignment task. The results show a significant improvement for the translation of MWTs and advocate further morphological analysis in lexical alignment. 相似文献

12.

Language independent semantic kernels for short-text classification

《Expert systems with applications》2014,41(2):735-743

Short-text classification is increasingly used in a wide range of applications. However, it still remains a challenging problem due to the insufficient nature of word occurrences in short-text documents, although some recently developed methods which exploit syntactic or semantic information have enhanced performance in short-text classification. The language-dependency problem, however, caused by the heavy use of grammatical tags and lexical databases, is considered the major drawback of the previous methods when they are applied to applications in diverse languages. In this article, we propose a novel kernel, called language independent semantic (LIS) kernel, which is able to effectively compute the similarity between short-text documents without using grammatical tags and lexical databases. From the experiment results on English and Korean datasets, it is shown that the LIS kernel has better performance than several existing kernels. 相似文献

13.

一种基于优先关系的LSD分析算法 总被引：1，自引：0，他引：1

李沐姚天顺《计算机研究与发展》2001,38(5):597-603

句法分析是机器翻译中的一个重要环节,首先介绍了基于LSD方法进行句法分析的基金概念,然后提出了一种苦于优先关系的确定性LSD算法,主要讨论了基于名法结构信息优先关系和基于词谍统计优先关系的句法结构歧义消解策略,并给出了具体实现方法和复杂性分析,实验结果表明,该方法在保持确定性算法分析效率的前提下,提高了分析结果的正确率和规则应用的召回率。相似文献

14.

Improve syntax-based translation using deep syntactic structures

Xianchao Wu Takuya Matsuzaki Jun’ichi Tsujii 《Machine Translation》2010,24(2):141-157

相似文献

15.

What can be learned from raw texts?

Roberto Basili Maria Teresa Pazienza Paola Velardi 《Machine Translation》1993,8(3):147-173

The growing availability of large on-line corpora encourages the study of word behaviour directly from accessible raw texts. However, the methods by which lexical knowledge should be extracted from plain texts is still a matter of debate and experimentation. In this paper we present an integrated tool for lexical acquisition from corpora, ARIOSTO, based on a hybrid methodology that combines typical NLP techniques, such as (shallow) syntax and semantic markers, with numerical processing. The lexical data extracted by this method, calledclustered association data, are used for a variety of interesting purposes, such as the detection of selectional restrictions, the derivation of syntactic ambiguity criteria and the acquisition of taxonomic relations. 相似文献

16.

A separation of syntactic and nonsyntactic (1,+k)-branching programs

D. Sieling 《Computational Complexity》2000,9(3-4):247-263

For (1,+k)-branching programs and read-k-times branching programs syntactic and nonsyntactic variants can be distinguished. The nonsyntactic variants correspond in a natural way to sequential computations with restrictions on reading the input while lower bound proofs are easier for the syntactic variants. In this paper it is shown that nonsyntactic (1,+k)-branching programs are really more powerful than syntactic (1,+k)-branching programs by presenting an explicitly defined function with polynomial size nonsyntactic (1,+1)-branching programs but only exponential size syntactic (1,+k)-branching programs. Another separation of these variants of branching programs is obtained by comparing the complexity of the satisfiability test for both variants. Received: July 16, 1998. 相似文献

17.

基于实例的汉语语义超常搭配的自动发现

杨芸李剑锋周昌乐黄孝喜《计算机科学》2008,35(9):195-197

搭配在语言信息处理中具有重要的应用价值,通常我们主要关注符合语法规则的常规搭配.实际上,语言中还存在着大量的语法上符合规则而语义上不符合常规认知的语义超常搭配现象,而这样的现象与语言的隐喻表达和思维有着密切的联系,对自然语言理解将产生重要的影响.本文面向汉语隐喻理解来研究文本中语义超常搭配的自动发现方法,从汉语语义超常搭配判断的心理机制出发,提出了基于实例的汉语语义超常搭配识别的量化计算方法.实验以动词为中心的搭配语料为测试集,语义超常搭配识别的召回率为80.7%,准确率为81.5%.实验结果表明本文所给出的基于实例语义超常搭配判断的办法是切实可行的. 相似文献

18.

观点句中评价对象/属性的缺省项识别方法研究

刘慧慧王素格赵策力《中文信息学报》2014,28(6):175-182

在多对象、多属性的评论文本中,评价对象和评价属性的缺省识别对于观点挖掘有着重要的作用。针对情感观点句中评价对象和评价属性的缺省问题,该文提出一种有效的缺省项识别方法。首先构造缺省项识别规则集,用于获取待识别的缺省项侯选集;将缺省项识别问题看作一个二元分类问题,选用词法和依存句法作为特征,使用决策树分类算法C4.5训练分类器模型,在测试集上对待识别的缺省项进行判别。实验结果表明,使用依存句法特征集分类的F值优于词法特征集约2%。将词法和依存句法两类特征融合与单类特征相比,分类精确率和F值分别提高了10%和5%左右,说明词法特征和依存句法特征的融合有利于缺省项识别。相似文献

19.

Efficient accurate syntactic direct translation models: one tree at a time

Hany Hassan Khalil Sima’an Andy Way 《Machine Translation》2012,26(1-2):121-136

A challenging aspect of Statistical Machine Translation from Arabic to English lies in bringing the Arabic source morpho-syntax to bear on the lexical as well as word-order choices of the English target string. In this article, we extend the feature-rich discriminative Direct Translation Model 2 (DTM2) with a novel linear-time parsing algorithm based on an eager, incremental interpretation of Combinatory Categorial Grammar. This way we can reap the benefits of a target syntactic enhancement that leads to more grammatical output while also enabling dynamic decoding without the risk of blowing up decoding space and time requirements. Our model defines a mix of model parameters, some of which involve DTM2 source morpho-syntactic features, and others are novel target side syntactic features. Alongside translation features extracted from the derived parse tree, we explore syntactic features extracted from the incremental derivation process. Our empirical experiments show that our model significantly outperforms the state-of-the-art DTM2 system. 相似文献

20.

电子词典与词汇知识表达 总被引：3，自引：0，他引：3

陈克健《中文信息学报》2002,16(4):2-12

词汇知识的表达与取得是自然语言处理极须克服的问题,本论文提出一个初步的架构与常识的抽取机制。语言处理系统是以词为讯息处理单元,登录在词项下的讯息可以包括统计、语法、语义、常识等。语言分析系统利用〈词〉为引得取得输入语句中相关词汇的语法、语义、常识等信息,让语言处理系统有更好的聚焦能力,可以藉以解决分词歧义、结构的歧义。对于不易以人工整理取得的常识,本论文也提出计算机自动学习的策略,以渐进式的方式累积概念与概念之间的语义关系,来增进语言系统的分析能力。这个策略可行的几个关键技术,包括(1)未登录词判别及语法语义自动分类, (2)词义分析, (3)应用语法语义及常识的剖析系统。相似文献