Similar Documents
20 similar documents were found (search time: 46 ms).
1.
Multilingual generation in machine translation (MT) requires a knowledge organization that facilitates the task of lexical choice, i.e. selection of the lexical units to be used in the generation of a target-language sentence. This paper investigates the extent to which lexicalization patterns involving the lexical aspect feature [+telic] may be used for translating events and states among languages. Telicity has been correlated syntactically with both transitivity and unaccusativity, and semantically with Talmy's path of a motion event, the representation of which characterizes languages parametrically. Taking as our starting point the syntactic/semantic classification in Levin's English Verb Classes and Alternations, we examine the relation between telicity and the syntactic contexts, or alternations, outlined in this work, identifying systematic relations between the lexical aspect features and the semantic components that potentiate these alternations. Representing lexical aspect, particularly telicity, is therefore crucial for the tasks of lexical choice and syntactic realization. Having enriched the data in Levin (by correlating the syntactic alternations (Part I) with the semantic verb classes (Part II) and marking them for telicity), we assign to verbs lexical semantic templates (LSTs). We then demonstrate that it is possible from these templates to build a large-scale repository of lexical conceptual structures which encode meaning components that correspond to different values of the telicity feature. The LST framework preserves both semantic content and semantic structure (following Grimshaw) during the processes of lexical choice and syntactic realization. Application of this model identifies precisely where the Knowledge Representation component may profitably augment our rules of composition, to identify cases where the interlingua underlying the source-language sentence must be either reduced or modified in order to produce an appropriate target-language sentence.
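A minimal sketch, assuming a simple Python encoding, of a verb entry that records its Levin-style class, licensed alternations, and a [+/-telic] feature, plus a toy compatibility check for lexical choice. The class name, fields, and check are illustrative stand-ins, not the LST formalism described in the abstract.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class LexicalSemanticTemplate:
        """Illustrative container for one verb sense: a Levin-style class,
        the syntactic alternations it licenses, and a lexical aspect feature."""
        verb: str
        levin_class: str            # e.g. "Verbs of Motion"
        alternations: List[str] = field(default_factory=list)
        telic: bool = False         # [+telic] marks events with an inherent endpoint

    def compatible(src: LexicalSemanticTemplate, tgt: LexicalSemanticTemplate) -> bool:
        """Toy lexical-choice test: only substitute a target-language verb
        whose template agrees with the source verb on telicity."""
        return src.telic == tgt.telic

    # Example: "run" (atelic activity) vs. "arrive" (telic achievement)
    run = LexicalSemanticTemplate("run", "Verbs of Motion", ["induced action"], telic=False)
    arrive = LexicalSemanticTemplate("arrive", "Verbs of Appearance", [], telic=True)
    print(compatible(run, arrive))   # False: a telicity mismatch blocks direct substitution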

2.
Support Vector Learning for Semantic Argument Classification
The natural language processing community has recently experienced a growth of interest in domain-independent shallow semantic parsing: the process of assigning a Who did What to Whom, When, Where, Why, How etc. structure to plain text. This process entails identifying groups of words in a sentence that represent these semantic arguments and assigning specific labels to them. It could play a key role in NLP tasks like Information Extraction, Question Answering and Summarization. We propose a machine learning algorithm for semantic role parsing, extending the work of Gildea and Jurafsky (2002), Surdeanu et al. (2003) and others. Our algorithm is based on Support Vector Machines, which we show give a large improvement in performance over earlier classifiers. We show performance improvements through a number of new features designed to improve generalization to unseen data, such as automatic clustering of verbs. We also report on various analytic studies examining which features are most important, comparing our classifier to other machine learning algorithms in the literature, and testing its generalization to a new test set from a different genre. On the task of assigning semantic labels to the PropBank (Kingsbury, Palmer, & Marcus, 2002) corpus, our final system has a precision of 84% and a recall of 75%, which are the best results currently reported for this task. Finally, we explore a completely different architecture which does not require a deep syntactic parse. We reformulate the task as a combined chunking and classification problem, thus allowing our algorithm to be applied to new languages or genres of text for which statistical syntactic parsers may not be available. Editors: Dan Roth and Pascale Fung. This research was partially supported by the ARDA AQUAINT program via contract OCG4423B and by the NSF via grant IIS-9978025.
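A hedged sketch of the core classification step: a linear SVM over one-hot-encoded categorical argument features, built with scikit-learn. The features and labels below are toy stand-ins for the paper's actual PropBank feature set (predicate, parse-tree path, phrase type, voice, and so on), not the authors' implementation.

    from sklearn.feature_extraction import DictVectorizer
    from sklearn.svm import LinearSVC
    from sklearn.pipeline import make_pipeline

    # Toy training data: one dict of categorical features per candidate argument.
    X = [
        {"predicate": "give", "phrase_type": "NP", "position": "before", "voice": "active"},
        {"predicate": "give", "phrase_type": "NP", "position": "after",  "voice": "active"},
        {"predicate": "give", "phrase_type": "PP", "position": "after",  "voice": "active"},
        {"predicate": "hit",  "phrase_type": "NP", "position": "before", "voice": "passive"},
    ]
    y = ["ARG0", "ARG1", "ARG2", "ARG1"]   # PropBank-style role labels

    # DictVectorizer one-hot encodes the categorical features; LinearSVC classifies.
    model = make_pipeline(DictVectorizer(sparse=True), LinearSVC())
    model.fit(X, y)

    test = {"predicate": "give", "phrase_type": "PP", "position": "after", "voice": "active"}
    print(model.predict([test])[0])   # likely ARG2: identical features to the third example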

3.
We introduce a dual-use methodology for automating the maintenance and growth of two types of knowledge sources, which are crucial for natural language text understanding: background knowledge of the underlying domain and linguistic knowledge about the lexicon and the grammar of the underlying natural language. A particularity of this approach is that learning occurs simultaneously with the on-going text understanding process. The knowledge assimilation process is centered around the linguistic and conceptual 'quality' of various forms of evidence underlying the generation, assessment and on-going refinement of lexical and concept hypotheses. On the basis of the strength of evidence, hypotheses are ranked according to qualitative plausibility criteria, and the most reasonable ones are selected for assimilation into the already given lexical class hierarchy and domain ontology.

4.
This paper proposes a knowledge representation model for the sentence-planning stage of a multilingual automatic text generation system. The model determines the concrete form of the text by means of sentence structure classes, syntactic rules, and a semantic lexicon; the structure of the model and its matching criteria are described in detail.

5.
Lexical collocations have particular statistical distributions. We have developed a set of statistical techniques for retrieving and identifying collocations from large textual corpora. The techniques we developed are able to identify collocations of arbitrary length as well as flexible collocations. These techniques have been implemented in a lexicographic tool, Xtract, which is able to automatically acquire collocations with high retrieval performance. Xtract works in three stages. The first stage is based on a statistical technique for identifying word pairs involved in a syntactic relation. The words can appear in the text in any order and can be separated by an arbitrary number of other words. The second stage is based on a technique for extracting n-word collocations (or n-grams) in a much simpler way than related methods. These collocations can involve closed-class words such as particles and prepositions. The third stage takes the output of the first stage and applies parsing techniques to sentences involving a given word pair in order to identify the proper syntactic relation between the two words. A secondary effect of the third stage is to filter out a number of candidate collocations as irrelevant and thus produce higher-quality output. In this paper we present an overview of Xtract and describe several uses for Xtract and the knowledge it retrieves, such as language generation and machine translation. Frank Smadja is in the Department of Computer Science at Columbia University and has been working on lexical collocations for his doctoral thesis.
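A minimal sketch of the stage-one idea under simplifying assumptions: words co-occurring with a target word inside a fixed window are kept if their co-occurrence frequency is unusually high relative to the other co-occurrences. The z-score filter and window size below are illustrative, not Xtract's actual statistics.

    from collections import Counter
    from statistics import mean, stdev

    def candidate_collocates(tokens, target, window=5, min_z=1.0):
        """Count words co-occurring with `target` within +/- `window` tokens and
        keep those whose frequency is at least `min_z` standard deviations above
        the mean co-occurrence frequency (a simplified stage-one filter)."""
        counts = Counter()
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
        if len(counts) < 2:
            return []
        mu, sigma = mean(counts.values()), stdev(counts.values())
        if sigma == 0:
            return []
        return sorted(
            (w, (c - mu) / sigma) for w, c in counts.items() if (c - mu) / sigma >= min_z
        )

    text = ("stock market data show the stock market rising while stock prices "
            "and the stock market index climbed").split()
    # e.g. 'market' scores well above the mean co-occurrence frequency with 'stock'
    print(candidate_collocates(text, "stock", window=3))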

6.
Context: Terminological inconsistencies owing to errors in the usage of terms in requirements specifications can result in subtle yet critical problems in interpreting and applying these specifications in various phases of the SDLC. Objective: In this paper, we consider a special class of terminological inconsistencies arising from term-aliasing, wherein multiple terms spread across a corpus of natural language requirements may refer to the same entity. Identifying such alias entity-terms is a difficult problem for manual analysis as well as for tool support. Method: We consider the cases of syntactic and semantic aliasing and propose a systematic approach for identifying both. Identification of syntactic aliasing involves the automated generation of patterns for detecting syntactic variants of terms, including abbreviations and introduced aliases. Identification of semantic aliasing involves extracting multidimensional features (linguistic, statistical, and locational) from the given requirement text to estimate semantic relatedness among terms. Based upon the estimated relatedness and refinement against a standard language database, clusters of potential semantic aliases are generated. The results of these analyses, with user refinement, lead to the generation of an entity-term alias glossary and the unification of term usage across requirements. Results: A prototype tool was developed to assess the effectiveness of the proposed approach for automated analysis of term-aliasing in requirements given as plain English text. Experimental results suggest that the approach is effective in identifying syntactic as well as semantic aliases; however, when aiming for higher recall on a larger corpus, user selection is necessary to eliminate false positives. Conclusion: The proposed approach reduces the time-consuming and error-prone task of identifying multiple terms that might refer to the same entity to a process of tool-assisted identification of such term-aliases.
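A small illustration of the syntactic-aliasing side only: matching an introduced abbreviation in parentheses against the initials of the preceding words. The regular expression and heuristic are assumptions made for this sketch, not the tool's implementation.

    import re

    def abbreviation_pairs(text):
        """Find 'long form (SHORT)' patterns and keep those where the short form's
        letters match the initials of the preceding words (a common alias heuristic)."""
        pairs = []
        for match in re.finditer(r"((?:\w+\s+){1,6})\(([A-Z]{2,})\)", text):
            long_form, short_form = match.group(1).split(), match.group(2)
            if len(long_form) >= len(short_form):
                candidate = long_form[-len(short_form):]
                initials = "".join(w[0].upper() for w in candidate)
                if initials == short_form:
                    pairs.append((" ".join(candidate), short_form))
        return pairs

    req = ("The Air Traffic Control (ATC) module shall log every request. "
           "The module forwards data to the Flight Data Processor (FDP).")
    print(abbreviation_pairs(req))
    # [('Air Traffic Control', 'ATC'), ('Flight Data Processor', 'FDP')]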

7.
This paper demonstrates the capabilities of FOIDL, an inductive logic programming (ILP) system whose distinguishing characteristics are the ability to produce first-order decision lists, the use of an output completeness assumption as a substitute for negative examples, and its intensionality. The system was originally motivated by the problem of learning to generate the past tense of English verbs; however, this paper demonstrates its superior performance on two different sets of benchmark ILP problems. Tests on the finite element mesh design problem show that FOIDL's decision lists enable it to produce generally more accurate results than a range of methods previously applied to this problem. Tests with a selection of list-processing problems from Bratko's introductory Prolog text demonstrate that the combination of implicit negatives and intensionality allows FOIDL to learn correct programs from far fewer examples than FOIL. This research was supported by a fellowship from AT&T awarded to the first author and by the National Science Foundation under grant IRI-9310819. Mary Elaine Califf is currently pursuing her doctorate in Computer Science at the University of Texas at Austin, where she is supported by a fellowship from AT&T. Her research interests include natural language understanding, particularly using machine learning methods to build practical natural language understanding systems such as information extraction systems, and inductive logic programming. Raymond Joseph Mooney is an Associate Professor of Computer Sciences at the University of Texas at Austin. He received his Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign in 1988. His current research interests include applying machine learning to natural language understanding, inductive logic programming, knowledge-base and theory refinement, learning for planning, and learning for recommender systems. He serves on the editorial boards of the journal New Generation Computing, the Machine Learning journal, the Journal of Artificial Intelligence Research, and the journal Applied Intelligence.
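The sketch below shows only the first-match execution semantics of a decision list on the past-tense task, with hand-written rules standing in for the first-order rules that FOIDL would induce from examples; it is not the ILP learning procedure itself.

    # An ordered decision list: the first rule whose condition fires gives the answer.
    # Rules are hand-written here purely to illustrate first-match semantics.
    PAST_TENSE_RULES = [
        (lambda w: w == "go",           lambda w: "went"),         # exceptions first
        (lambda w: w.endswith("e"),     lambda w: w + "d"),        # bake -> baked
        (lambda w: w.endswith("y") and w[-2] not in "aeiou",
                                        lambda w: w[:-1] + "ied"), # try -> tried
        (lambda w: True,                lambda w: w + "ed"),       # default: walk -> walked
    ]

    def past_tense(word):
        for condition, transform in PAST_TENSE_RULES:
            if condition(word):
                return transform(word)

    for w in ["go", "bake", "try", "walk"]:
        print(w, "->", past_tense(w))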

8.
We describe a comprehensive framework for text understanding, based on the representation of context. It is designed to serve as a representation of semantics for the full range of interpretive and inferential needs of general natural language processing. Its most distinctive feature is its uniform representation of the various simple and independent linguistic sources that play a role in determining meaning: lexical associations, syntactic restrictions, case-role expectations, and most importantly, contextual effects. Compositional syntactic structure from a shallow parsing is represented in a neural net-based associative memory, where it then interacts through a Bayesian network with semantic associations and the context or "gist" of the passage carried forward from preceding sentences. Experiments with more than 2000 sentences in different languages are included.

9.
People understand utterances in real time. Blank [1] described a natural language processor which also parses sentences in linear time. Like human performance, it stays within fixed and finite short-term memory; indeed, these limits prevent it from being overwhelmed by syntactic ambiguities. This paper reviews the parser and describes enhancements that allow it to perform morphosyntactic agreement and semantic interpretation, still within linear time and with predictable resources. The lexicon has been extended semi-automatically, using data from tagged corpora and WordNet, to cover the typical vocabulary of utterances in the domain of air traffic information service (ATIS). Comparing our interpreter with the performance of Pundit's top-down parser [2] for utterances in the ATIS domain, we get improvements of at least an order of magnitude and avoid asymptotic cases due to Pundit's unbounded backtracking. RVG, developed with support from the National Science Foundation under grant IRI-8902658, is available along with a Tutorial and User Manual [3], at no cost for research or educational purposes. Contact Glenn D. Blank.

10.

In this paper we present an implemented account of multilingual linguistic resources for multilingual text generation that improves significantly on the degree of reuse of resources both across languages and across applications. We argue that this is a necessary step for multilingual generation in order to reduce the high cost of constructing linguistic resources and to make natural language generation relevant for a wider range of applications, particularly, in this paper, for multilingual software and user interfaces. We begin by contrasting a weak and a strong approach to multilinguality in the state of the art in multilingual text generation. Neither approach has provided sufficient principles for organizing multilingual work. We then introduce our framework, where multilingual variation is included as an intrinsic feature of all levels of representation. We provide an example of multilingual tactical generation using this approach and discuss some of the performance, maintenance, and development issues that arise.

11.
Although pretrained language models provide high-quality contextual representations for each word, they do not explicitly expose lexical or syntactic features, and such features are often the basis for understanding the overall semantics. In view of this, this paper explicitly introduces lexical and syntactic features and investigates their effect on the reading comprehension ability of pretrained models. First, part-of-speech tagging and named entity recognition are used to provide lexical features, and dependency parsing is used to provide syntactic features; both are fused with the contextual representations output by the pretrained model. We then design an attention-based adaptive feature fusion method to combine the different feature types. Experiments on the extractive machine reading comprehension dataset CMRC2018 show that, at very low computational cost, the explicitly introduced lexical and syntactic features help the model achieve improvements of 0.37% in F1 and 1.56% in EM.
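A minimal PyTorch sketch of attention-weighted fusion of contextual, lexical, and syntactic feature vectors per token; the module name, dimensions, and scoring function are assumptions for illustration, not the paper's exact architecture.

    import torch
    import torch.nn as nn

    class AdaptiveFeatureFusion(nn.Module):
        """Fuse per-token feature vectors from several sources (contextual, lexical,
        syntactic) with learned attention weights."""
        def __init__(self, hidden_size: int):
            super().__init__()
            self.score = nn.Linear(hidden_size, 1)   # one scalar score per source vector

        def forward(self, features: torch.Tensor) -> torch.Tensor:
            # features: (batch, seq_len, num_sources, hidden_size)
            weights = torch.softmax(self.score(features).squeeze(-1), dim=-1)  # (B, L, S)
            return torch.einsum("bls,blsh->blh", weights, features)            # (B, L, H)

    # Stand-ins for the three feature streams, all projected to the same size:
    # encoder hidden states, POS/NER embeddings, dependency-label embeddings.
    # Random tensors are used here purely to show the shapes.
    batch, seq_len, hidden = 2, 8, 768
    contextual = torch.randn(batch, seq_len, hidden)
    lexical = torch.randn(batch, seq_len, hidden)
    syntactic = torch.randn(batch, seq_len, hidden)

    fusion = AdaptiveFeatureFusion(hidden)
    fused = fusion(torch.stack([contextual, lexical, syntactic], dim=2))
    print(fused.shape)   # torch.Size([2, 8, 768])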

12.
A large number of wording choices naturally occurring in English sentences cannot be accounted for on semantic or syntactic grounds. They represent arbitrary word usages and are termed collocations. In this paper, we show how collocations can enhance the task of lexical selection in language generation. Previous language generation systems were not able to account for collocations for two reasons: they did not have the lexical information in compiled form and the lexicon formalisms available were not able to handle the variations in collocational knowledge. We describe an implemented generator, Cook, which uses a wide range of collocations to produce sentences in the stock market domain. Cook uses a flexible lexicon containing a range of collocations, from idiomatic phrases to word pairs that were compiled automatically from text corpora using a lexicographic tool, Xtract. We show how Cook is able to merge collocations of various types to produce a wide variety of sentences.
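A toy sketch of collocation-driven lexical selection in the stock-market domain: a lexicon maps a semantic message to a verb collocation and an optional adverbial collocation, which are merged into a sentence. The lexicon entries and template are invented for illustration, not Cook's actual resources.

    # Toy collocation lexicon for the stock-market domain; entries are invented,
    # not Cook's lexicon compiled with Xtract.
    COLLOCATIONS = {
        ("rise", "small"): {"verb": "edged up",   "adverbial": None},
        ("rise", "large"): {"verb": "surged",     "adverbial": "sharply"},
        ("fall", "large"): {"verb": "plummeted",  "adverbial": None},
    }

    def realize(subject, direction, magnitude, points):
        """Merge a subject-verb collocation with an optional adverbial collocation
        and a fixed prepositional pattern (a template-style sketch)."""
        entry = COLLOCATIONS[(direction, magnitude)]
        parts = [subject, entry["verb"]]
        if entry["adverbial"]:
            parts.append(entry["adverbial"])
        parts.append(f"by {points} points")
        return " ".join(parts) + "."

    print(realize("The Dow Jones average", "rise", "large", 30))
    # The Dow Jones average surged sharply by 30 points.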

13.
Recognizing textual entailment is an effective way to address the problem of natural language expressing the same meaning in different forms. Although many textual entailment recognition models have been proposed at home and abroad, the factors affecting entailment recognition are intricate and recognition accuracy is generally low. This paper treats textual entailment recognition as a binary classification problem: lexical features, syntactic dependency features, and features from the FrameNet semantic knowledge base are extracted to build a feature matrix, and an SVM classifier is trained to recognize entailment. Tested on the test set of the international textual entailment evaluation RTE3, the method reaches an accuracy of 78.1% on positive entailment cases, exceeding the best 2-way result of the RTE3 evaluation.
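A hedged sketch of the classification setup: one feature vector per (text, hypothesis) pair fed to an SVM. The two lexical-overlap features below stand in for the paper's richer feature matrix of lexical, dependency, and FrameNet features.

    import numpy as np
    from sklearn.svm import SVC

    def overlap_features(text, hypothesis):
        """Toy lexical features for a (text, hypothesis) pair: word-overlap ratio
        and length ratio, standing in for the full feature matrix."""
        t, h = set(text.lower().split()), set(hypothesis.lower().split())
        return [len(t & h) / len(h), len(h) / len(t)]

    pairs = [
        ("the cat sat on the mat", "a cat is on the mat", 1),
        ("john bought a new car yesterday", "john sold his car", 0),
        ("the company hired ten engineers", "ten engineers were hired", 1),
        ("it rained all day in paris", "the weather in paris was sunny", 0),
    ]
    X = np.array([overlap_features(t, h) for t, h, _ in pairs])
    y = np.array([label for _, _, label in pairs])

    clf = SVC(kernel="linear").fit(X, y)
    print(clf.predict([overlap_features("the dog chased the ball", "a dog chased a ball")]))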

14.
This paper describes the design and function of the English generation phase in JETS, a minimal transfer, Japanese-English machine translation system that is based on the linguistic framework of relational grammar. To facilitate the development of relational grammar generators, we have built a generator shell that provides a high-level relational grammar rule-writing language and is independent of both the natural language and the application. The implemented English generator (called GENIE) maps abstract canonical structures, representing the basic predicate-argument structures of sentences, into well-formed English sentences via a two-stage plan-and-execute design. The modularity inherent in the plan-and-execute design permits the development of a very general and stable deterministic execution grammar. Another major feature of the GENIE generator is that it is category-driven, i.e., planning rules and execution rules are distributed over a part-of-speech hierarchy (down to individual lexical items) and are invoked via an inheritance mechanism only if appropriate for the category being processed. Category-driven processing facilitates the handling of exceptions. The use of a syntactic planner and category-driven processing together provide a great deal of flexibility without sacrificing determinism in the generation process.
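A minimal sketch of category-driven rule lookup via inheritance: planning rules live on a small part-of-speech hierarchy and exceptions override them through normal method resolution. The class names and rules are invented for illustration, not GENIE's grammar.

    # Rules distributed over a category hierarchy; more specific categories
    # (down to individual lexical items) can override inherited behavior.
    class Category:
        def plan(self, lexeme):
            return [lexeme]                  # default: emit the word as-is

    class Verb(Category):
        def plan(self, lexeme):
            return [lexeme + "s"]            # toy rule: 3rd-person singular agreement

    class AuxiliaryVerb(Verb):
        def plan(self, lexeme):
            return [lexeme]                  # exception: auxiliaries stay uninflected

    for cat, word in [(Category(), "quickly"), (Verb(), "run"), (AuxiliaryVerb(), "can")]:
        print(type(cat).__name__, "->", cat.plan(word))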

15.
We present a unified probabilistic framework for statistical language modeling which can simultaneously incorporate various aspects of natural language, such as local word interaction, syntactic structure and semantic document information. Our approach is based on a recent statistical inference principle we have proposed, the latent maximum entropy principle, which allows relationships over hidden features to be effectively captured in a unified model. Our work extends previous research on maximum entropy methods for language modeling, which only allow observed features to be modeled. The ability to conveniently incorporate hidden variables allows us to extend the expressiveness of language models while alleviating the necessity of pre-processing the data to obtain explicitly observed features. We describe efficient algorithms for marginalization, inference and normalization in our extended models. We then use these techniques to combine two standard forms of language models: local lexical models (Markov N-gram models) and global document-level semantic models (probabilistic latent semantic analysis). Our experimental results on the Wall Street Journal corpus show that we obtain an 18.5% reduction in perplexity compared to the baseline tri-gram model with Good-Turing smoothing. Editors: Dan Roth and Pascale Fung.
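A sketch of the evaluation side only, under strong simplifications: a local bigram probability is linearly interpolated with a document-level unigram probability and perplexity is computed on held-out text. The paper instead combines the components inside a latent maximum entropy model, not by linear interpolation; the smoothing and numbers below are toy assumptions.

    import math
    from collections import Counter

    def perplexity(test_tokens, local_prob, topic_prob, lam=0.7):
        """Perplexity of a linearly interpolated model:
        p(w | h) = lam * p_local(w | h) + (1 - lam) * p_topic(w)."""
        log_sum = 0.0
        for i, w in enumerate(test_tokens):
            h = test_tokens[i - 1] if i > 0 else "<s>"
            p = lam * local_prob(w, h) + (1 - lam) * topic_prob(w)
            log_sum += math.log(p)
        return math.exp(-log_sum / len(test_tokens))

    train = "the market rose and the market fell".split()
    bigrams = Counter(zip(["<s>"] + train[:-1], train))
    unigrams = Counter(train)
    V = len(unigrams) + 1

    def local_prob(w, h):   # add-one smoothed bigram
        return (bigrams[(h, w)] + 1) / (unigrams[h] + V)

    def topic_prob(w):      # add-one smoothed unigram, standing in for a PLSA mixture
        return (unigrams[w] + 1) / (len(train) + V)

    print(perplexity("the market rose".split(), local_prob, topic_prob))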

16.
The Cognitive-Compositional view of word Meaning (CCMO) is a model-oriented perspective on the regularities of word meaning generation and understanding, and a foundational notion for describing word meaning structure. When interpreting word meaning, CCMO requires keeping in mind that word meaning has both cognitive and syntactic-compositional properties. Starting from the perspectives of cognition and syntactic composition, it holds that the generation of word meaning structure is the result of projecting cognitive structure onto linguistic symbols, while the manifestation of word meaning structure is driven by the syntactic combination of words. Under CCMO, the generation of syntactic structure can be seen as the result of the expansion of word meaning structure, and word meaning structure is the basis for generating syntactic structure. Taking prepositions of the "在 (at), 从 (from), 经 (via)" type as examples, a sphere-structure model of prepositional meaning is constructed, showing that syntactic-semantic computational function is the essential property of prepositional meaning.

17.
Providing machine tractable dictionary tools
Machine-readable dictionaries (MRDs) contain knowledge about language and the world essential for tasks in natural language processing (NLP). However, this knowledge, collected and recorded by lexicographers for human readers, is not presented in a manner that allows MRDs to be used directly for NLP tasks. What is badly needed are machine tractable dictionaries (MTDs): MRDs transformed into a format usable for NLP. This paper discusses three different but related large-scale computational methods to transform MRDs into MTDs. The MRD used is the Longman Dictionary of Contemporary English (LDOCE). The three methods differ in the amount of knowledge they start with and the kinds of knowledge they provide. All require some handcoding of initial information but are largely automatic. Method I, a statistical approach, uses the least handcoding. It generates relatedness networks for words in LDOCE and presents a method for doing partial word sense disambiguation. Method II employs the most handcoding because it develops and builds lexical entries for a very carefully controlled defining vocabulary of 2,000 word senses (1,000 words). The payoff is that the method will provide an MTD containing highly structured semantic information. Method III requires the handcoding of a grammar and the semantic patterns used by its parser, but not the handcoding of any lexical material. This is because the method builds up lexical material from sources wholly within LDOCE. The information extracted is a set of sources of information, individually weak, but which can be combined to give a strong and determinate linguistic database.
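A toy sketch of the Method I idea: relatedness between headwords estimated from the overlap of their defining vocabularies, then thresholded into a small relatedness network. The dictionary entries and the Dice-style score are invented for illustration; the paper works over LDOCE's controlled defining vocabulary.

    from collections import defaultdict

    # Toy "dictionary": headword -> definition text (entries are invented, not LDOCE).
    definitions = {
        "bank":   "an organization that keeps and lends money",
        "loan":   "money that is lent and must be paid back",
        "river":  "a large natural stream of water",
        "stream": "a natural flow of water smaller than a river",
    }

    def relatedness(w1, w2, defs):
        """Dice-style overlap between the defining vocabularies of two headwords:
        2 * |shared defining words| / (|def1| + |def2|)."""
        d1, d2 = set(defs[w1].split()), set(defs[w2].split())
        return 2 * len(d1 & d2) / (len(d1) + len(d2))

    # Build a small relatedness network: keep edges above a threshold.
    network = defaultdict(dict)
    words = list(definitions)
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            score = relatedness(a, b, definitions)
            if score > 0.1:
                network[a][b] = round(score, 2)

    print(dict(network))   # 'bank'-'loan' and 'river'-'stream' end up linked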

18.
19.
This paper presents a lexical choice component for complex noun phrases. We first explain why lexical choice for NPs deserves special attention within the standard pipeline architecture for a generator. The task of the lexical chooser for NPs is more complex than for clauses because the syntax of NPs is less well understood than that of clauses; therefore, syntactic realization components, while they accept a predicate-argument structure as input for clauses, require a purely syntactic tree as input for NPs. The task of mapping conceptual relations to different syntactic modifiers is therefore left to the lexical chooser for NPs. The paper focuses on the syntagmatic aspect of lexical choice, identifying a process called NP planning. It identifies a set of communicative goals that NPs can satisfy and specifies an interface between the different components of the generator and the lexical chooser. The technique presented for NP planning encapsulates rich lexical knowledge and allows for the generation of a wide variety of syntactic constructions. It also allows for a large paraphrasing power because it dynamically maps conceptual information to various syntactic slots.

20.