TempEval is a framework for evaluating systems that automatically annotate texts with temporal relations. It was created in the context of the SemEval 2007 workshop and uses the TimeML annotation language. The evaluation consists of three subtasks of temporal annotation: anchoring an event to a time expression in the same sentence, anchoring an event to the document creation time, and ordering main events in consecutive sentences. In this paper we describe the TempEval task and the systems that participated in the evaluation. In addition, we describe how further task decomposition can bring even more structure to the evaluation of temporal relations.

When trying to understand a speaker's argument, it is necessary to determine what her claim is and what evidence she provides for it. It is necessary, therefore, to be able to recognize evidence relations in terms of the speaker's beliefs. This paper describes an implementation of an evidence oracle, which tests for evidence between statements and builds a model of the speaker based on the evidence relations found. This implementation is intended to be an advance in the development of practical discourse analysis systems, proposing a basis for verifying certain relationships between utterances. Another contribution of the work is a stratified speaker model which allows for varying levels of acceptance of beliefs attributed to the speaker. Integration of the implemented evidence oracle into a full discourse analyser is presented, together with output illustrating the analysis for several sample arguments. Some extensions of this approach for plan inference are also discussed.

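The lack of methods for evaluating the imperceptibility of natural language watermarking (NLW) has seriously hindered the development of techniques in this field. To address this, a perception model of NLW imperceptibility and a corresponding evaluation scheme are proposed, drawing on the characteristics of NLW and on psycholinguistics and based on the speed and difficulty with which humans interpret language. Imperceptibility is evaluated from three aspects: grammatical errors, collocation errors, and semantic loss. Finally, four different watermarking techniques (the T-Lex watermarking system based on absolute synonym substitution, a watermarking system based on relative synonym substitution, a watermarking system based on syntactic trees, and a Chinese syntactic watermarking system) were evaluated both automatically with this scheme and manually at a 90% confidence level. Both methods reached the same conclusion: lexical NLW techniques offer better imperceptibility than sentence-based NLW techniques, which shows that the automatic scheme is an effective way to evaluate NLW imperceptibility.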

Temporal relations are among the most complicated and poorly understood topics in linguistics. The analysis used the linguistic superoperators of the semantic language SL. Objective and subjective frames of reference were introduced. The multi-model nature of time was studied, and it was suggested that natural language tends toward a three-dimensional model of time. The relations specified by temporal prepositions were analyzed. The temporal relation proved to be invariant in the space of three-dimensional time, i.e., with respect to its use in the past, present, and future, and to be specified by three parameters: dynamics, topological zone, and time marker.

The paper presents a parallel parsing system for Definite Clause Grammars that is suited to committed-choice parallel logic programming languages. Grammatical elements such as words and nonterminal symbols are defined as parallel processes, and parsing is carried out through communication between these processes. The advantage of the system is that all the grammar rules are compiled into the parallel logic programming language, and the resulting program has neither side effects nor duplicated computation.

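The Internet contains a vast amount of readily available address descriptions in natural-language form, which carry rich spatial information. Because these texts are unstructured, a method is proposed for automatically extracting lexical and syntactic information from Chinese natural-language address descriptions, to support in-depth mining of spatial knowledge. First, a Chinese word segmentation algorithm that does not rely on a gazetteer is designed from the statistical regularities of character-string co-occurrence in an address corpus, and a predefined vocabulary of common words that play indicative and qualifying roles in address text is used to improve segmentation and to assist part-of-speech tagging. After segmentation, a finite-state machine model that expresses the common syntax of Chinese address descriptions is defined and used to automatically match and recognize the syntactic structure of address text. Finally, statistical segmentation and syntax-recognition experiments on a large real-world corpus demonstrate the usability and effectiveness of the method.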
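The finite-state matching step can be illustrated with a small sketch. It assumes a hypothetical tag set (PROVINCE, CITY, DISTRICT, ROAD, NUMBER, POI), tags each segmented token by its final indicator character in the spirit of the predefined indicator-word list, and uses an invented transition table; the paper's actual vocabulary, states, and transitions are not given in the abstract.

```python
# Hypothetical indicator suffixes -> token type (not the paper's word list).
SUFFIX_TAGS = [
    ("省", "PROVINCE"),
    ("市", "CITY"),
    ("区", "DISTRICT"),
    ("路", "ROAD"),
    ("街", "ROAD"),
    ("号", "NUMBER"),
]

# Hypothetical finite-state machine over token types.
TRANSITIONS = {
    "START": {"PROVINCE": "PROV", "CITY": "CITY", "DISTRICT": "DIST"},
    "PROV": {"CITY": "CITY"},
    "CITY": {"DISTRICT": "DIST", "ROAD": "ROAD"},
    "DIST": {"ROAD": "ROAD", "POI": "POI"},
    "ROAD": {"NUMBER": "NUM", "POI": "POI"},
    "NUM": {"POI": "POI"},
    "POI": {},
}
ACCEPTING = {"ROAD", "NUM", "POI"}


def tag_token(token: str) -> str:
    """Tag a segmented token by its indicator suffix; default to a point of interest."""
    for suffix, tag in SUFFIX_TAGS:
        if token.endswith(suffix):
            return tag
    return "POI"


def match_address(tokens: list) -> bool:
    """Return True if the token-type sequence is accepted by the address FSM."""
    state = "START"
    for token in tokens:
        state = TRANSITIONS[state].get(tag_token(token))
        if state is None:  # no transition: not a recognized address structure
            return False
    return state in ACCEPTING


print(match_address(["湖北省", "武汉市", "洪山区", "珞喻路", "129号"]))  # True
```

Segmentation is assumed to have happened already; the statistical, gazetteer-free segmenter described above is not reproduced here.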

A Prolog-based natural language front-end system is described. The major issues discussed are: domain independence of the syntax analyser, achieved through the 'generate-and-test' approach and a domain-independent semantic representation; the treatment of determiners as higher-order predicates; and a technique called 'syntactic feature', used to write a readable parser in Prolog.
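Treating a determiner as a higher-order predicate means that a word such as "every" or "some" takes two predicates (a restriction contributed by the noun phrase and a scope contributed by the rest of the sentence) and returns a truth value. The sketch below shows the idea in Python rather than Prolog; the toy database and all predicate names are invented for illustration.

```python
from typing import Callable, Iterable

Pred = Callable[[str], bool]


def every(restriction: Pred, scope: Pred, domain: Iterable[str]) -> bool:
    """'every R S': S holds of all individuals satisfying R."""
    return all(scope(x) for x in domain if restriction(x))


def some(restriction: Pred, scope: Pred, domain: Iterable[str]) -> bool:
    """'some R S': at least one individual satisfies both R and S."""
    return any(restriction(x) and scope(x) for x in domain)


def no(restriction: Pred, scope: Pred, domain: Iterable[str]) -> bool:
    """'no R S': no individual satisfies both R and S."""
    return not some(restriction, scope, domain)


# Hypothetical toy database behind the front-end.
DOMAIN = ["ann", "bob", "cy", "sales", "it"]
EMPLOYEES = {"ann", "bob", "cy"}
WORKS_IN = {("ann", "sales"), ("bob", "sales"), ("cy", "it")}


def is_employee(x: str) -> bool:
    return x in EMPLOYEES


def works_in(dept: str) -> Pred:
    """Build the predicate 'x works in dept'."""
    return lambda x: (x, dept) in WORKS_IN


# "Does every employee work in sales?"
print(every(is_employee, works_in("sales"), DOMAIN))  # False (cy works in it)
```

Because the determiner receives its restriction and scope as arguments, the same three functions serve any domain, which fits the goal of a domain-independent semantic representation.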

Knowledge and Information Systems - As an essential component of human cognition, cause–effect relations appear frequently in text, and curating cause–effect relations from text helps...

Existing attempts to automate construction document analysis are limited in understanding the varied semantic properties of different documents. Because of these semantic conflicts, construction specifications are still reviewed manually in practice, despite the promising performance of existing approaches. This research aimed to develop an automated system for reviewing construction specifications by analyzing their different semantic properties with natural language processing techniques. The proposed method analyzed the semantic properties of 56 specifications from five countries in terms of vocabulary, sentence structure, and the organizing style of provisions. First, the authors developed a semantic thesaurus for construction terms, including 208 word-replacement rules based on Word2Vec embeddings, to handle differing vocabularies. Second, the authors developed a named entity recognition model based on a bi-directional long short-term memory network with a conditional random field layer, which identified the required keywords in given provisions with an average F1 score of 0.928. Third, the authors developed a provision-pairing model based on Doc2Vec embeddings, which identified the most relevant provisions with an average accuracy of 84.4%. A web-based prototype demonstrated that the proposed system can facilitate the specification review process by reducing the time spent, supplementing the reviewer's experience, improving accuracy, and achieving consistency. The results contribute to risk management in the construction industry, enabling practitioners to review construction specifications thoroughly despite tight schedules and few available experts.
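The provision-pairing step ranks candidate provisions by the similarity of their document embeddings. The sketch below assumes the vectors have already been produced (for example by a Doc2Vec model, which is not reproduced here), uses cosine similarity as the relevance score, and applies a hypothetical word-replacement table before embedding; the rules and provision ids shown are invented examples, not the paper's 208 rules.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical thesaurus rules mapping variant terms to one canonical term.
REPLACEMENT_RULES = {"rebar": "reinforcement", "reinforcing_bar": "reinforcement"}


def normalize(tokens: List[str], rules: Dict[str, str]) -> List[str]:
    """Apply word-replacement rules so different vocabularies share terms."""
    return [rules.get(t, t) for t in tokens]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def pair_provision(
    query: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> Tuple[str, float]:
    """Return the id of the most similar candidate provision and its score."""
    best = max(candidates, key=lambda pid: cosine(query, candidates[pid]))
    return best, cosine(query, candidates[best])


candidates = {"A-3.1": [0.9, 0.1, 0.0], "B-7.2": [0.1, 0.8, 0.3]}
print(pair_provision([1.0, 0.0, 0.0], candidates)[0])  # A-3.1
```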

The focus of computerised learning has shifted from content delivery towards personalised online learning with Intelligent Tutoring Systems (ITS). Oscar Conversational ITS (CITS) is a sophisticated ITS that uses a natural language interface to enable learners to construct their own knowledge through discussion. Oscar CITS aims to mimic a human tutor by dynamically detecting and adapting to an individual's learning styles whilst directing the conversational tutorial. Oscar CITS is currently live and being successfully used to support learning by university students. The major contribution of this paper is the development of the novel Oscar CITS adaptation algorithm and its application to the Felder–Silverman learning styles model. The generic Oscar CITS adaptation algorithm uniquely combines the strength of an individual's learning style preference with the available adaptive tutoring material for each tutorial question to decide the best fitting adaptation. A case study is described, where Oscar CITS is implemented to deliver an adaptive SQL tutorial. Two experiments are reported which empirically test the Oscar CITS adaptation algorithm with students in a real teaching/learning environment. The results show that learners experiencing a conversational tutorial personalised to their learning styles performed significantly better during the tutorial than those with an unmatched tutorial.
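The adaptation decision combines how strongly a learner leans towards each learning-style pole with which adaptive material actually exists for the current question. The abstract does not give the algorithm's details, so the sketch below is a simplified, hypothetical version: preference strengths are assumed to be non-negative integers per Felder–Silverman pole (for example "visual" or "verbal"), and the material for the most strongly preferred available pole is chosen, falling back to default material when no preference matches.

```python
from typing import Dict


def choose_material(preferences: Dict[str, int], available: Dict[str, str], default: str) -> str:
    """Pick the tutoring material whose style pole the learner prefers most strongly.

    preferences: style pole -> preference strength (hypothetical scale, 0 = none)
    available:   style pole -> material id for the current question
    """
    best_material, best_strength = default, 0
    for pole, material in available.items():
        strength = preferences.get(pole, 0)
        if strength > best_strength:  # only a positive preference overrides the default
            best_material, best_strength = material, strength
    return best_material


learner = {"visual": 9, "active": 3, "sequential": 1}
print(choose_material(learner, {"visual": "q1_diagram", "verbal": "q1_text"}, "q1_standard"))  # q1_diagram
print(choose_material(learner, {"global": "q2_overview"}, "q2_standard"))  # q2_standard
```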

This paper presents a natural deduction system of temporal logic, which includes two collections of inference rules, called "horizontal inference rules" and "vertical inference rules" respectively. It is also proved that the system is both sound and complete under an appropriate interpretation. Proofs in the system are natural and generally short, and each can be represented by a matrix whose entries record the inference process.

A conceptual model is proposed for a system whose function is to solve the problem of automatic classification of text documents in a natural language, i.e., to determine whether a new text document belongs to a predefined class. The functional requirements of the future system are given. Various representations of natural language texts, as well as statistical and logical-combinatorial methods of text analysis, are discussed. This work may be of interest to specialists in natural-language processing, data mining, and computational linguistics.

The primary issues that affect the design of indexing methods are examined, and several structures and algorithms for specific cases are proposed. The append-only tree (AP-tree) structure indexes data in append-only databases to support event-join optimization and queries that can exploit the inherent time ordering of such databases. Two-variable indexing on the surrogate and time is discussed. It is shown that a nested index can be a very efficient structure in this context and is preferable to a composite B-tree or to an index that involves linear lists of historical tuples. The problems of indexing time intervals, as related to nonsurrogate joint indexing, are discussed, and several algorithms to partition the time line are introduced. A two-variable AT index based on nested indexing is also outlined.
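The nested-index idea can be sketched as an outer index on the surrogate whose entries are inner indexes on time. Because the database is append-only, each inner index stays sorted simply by appending, and an "as of" lookup becomes a binary search. The sketch uses Python dictionaries and lists in place of tree nodes, so it illustrates the access pattern rather than the AP-tree's actual node layout; the class and method names are hypothetical.

```python
import bisect
from collections import defaultdict
from typing import Any, Dict, List, Optional


class NestedTimeIndex:
    """Outer index on surrogate; inner time-ordered index per surrogate (append-only)."""

    def __init__(self) -> None:
        self._times: Dict[str, List[int]] = defaultdict(list)
        self._values: Dict[str, List[Any]] = defaultdict(list)

    def append(self, surrogate: str, time: int, value: Any) -> None:
        times = self._times[surrogate]
        if times and time < times[-1]:
            raise ValueError("append-only index: time must not decrease")
        times.append(time)  # stays sorted because appends arrive in time order
        self._values[surrogate].append(value)

    def as_of(self, surrogate: str, time: int) -> Optional[Any]:
        """Return the value valid at `time`, or None if nothing was recorded yet."""
        times = self._times.get(surrogate)
        if not times:
            return None
        i = bisect.bisect_right(times, time) - 1  # last entry with t <= time
        return self._values[surrogate][i] if i >= 0 else None


idx = NestedTimeIndex()
idx.append("emp1", 10, "clerk")
idx.append("emp1", 20, "manager")
print(idx.as_of("emp1", 15))  # clerk
```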

In this paper, we address the issue of generating in-domain language model training data when little or no real user data are available. The two-stage approach taken begins with a data induction phase whereby linguistic constructs from out-of-domain sentences are harvested and integrated with artificially constructed in-domain phrases. After some syntactic and semantic filtering, a large corpus of synthetically assembled user utterances is induced. In the second stage, two sampling methods are explored to filter the synthetic corpus to achieve a desired probability distribution of the semantic content, both on the sentence level and on the class level. The first method utilizes user simulation technology, which obtains the probability model via an interplay between a probabilistic user model and the dialogue system. The second method synthesizes novel dialogue interactions from the raw data by modelling them on a small set of dialogues produced by the developers during the course of system refinement. Evaluation is conducted on recognition performance in a restaurant information domain. We show that a partial match to usage-appropriate semantic content distribution can be achieved via user simulations. Furthermore, word error rate can be reduced when limited amounts of in-domain training data are augmented with synthetic data derived by our methods.
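The second stage reshapes the synthetic corpus so that its semantic content follows a desired distribution. The paper obtains that distribution through user simulation or from developer dialogues; the sketch below shows only a simpler final step, class-level resampling, assuming each synthetic sentence already carries a semantic class label and the target probabilities are given. The labels and sentences are invented.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def resample_to_target(
    corpus: List[Tuple[str, str]], target: Dict[str, float], n: int, seed: int = 0
) -> List[Tuple[str, str]]:
    """Draw n (sentence, class) pairs so class frequencies follow `target`.

    Target classes with no synthetic sentences are skipped.
    """
    rng = random.Random(seed)
    by_class: Dict[str, List[str]] = defaultdict(list)
    for sentence, cls in corpus:
        by_class[cls].append(sentence)
    classes = [c for c in target if by_class[c]]
    weights = [target[c] for c in classes]
    sample = []
    for _ in range(n):
        cls = rng.choices(classes, weights=weights)[0]  # pick a class by target weight
        sample.append((rng.choice(by_class[cls]), cls))  # then a sentence uniformly
    return sample


synthetic = [
    ("show me cheap restaurants", "price"),
    ("any inexpensive places nearby", "price"),
    ("i want thai food", "cuisine"),
]
resampled = resample_to_target(synthetic, {"price": 0.2, "cuisine": 0.8}, n=1000)
```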
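
This paper explores a new architecture for natural language understanding systems and a typical subsystem within that architecture. The feedback-based natural language processing system analyzes a single sentence, especially one that is difficult to understand, repeatedly, looking both ahead and back. It serves as a large platform onto which various subsystems based on classical algorithms can be attached, and it is easy to extend. Context-free grammar theory is mature, as are the various algorithms based on it. Within the feedback-based natural language processing system, simple noun phrases are analyzed with an algorithm based on context-free grammar.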
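The abstract does not say which context-free parsing algorithm is used for simple noun phrases. As one standard choice, the sketch below is a CYK recognizer over a tiny, invented grammar in binary form; single nouns are listed as NP directly in the lexicon so that no unary rules are needed. English words are used for readability.

```python
from typing import Dict, List, Set, Tuple

# Invented toy lexicon; nouns are also listed as NP to avoid unary rules.
LEXICON: Dict[str, Set[str]] = {
    "the": {"DET"},
    "big": {"ADJ"},
    "red": {"ADJ"},
    "apple": {"N", "NP"},
    "tree": {"N", "NP"},
}
# Binary rules (lhs, left child, right child).
RULES: List[Tuple[str, str, str]] = [
    ("NP", "DET", "NP"),
    ("NP", "ADJ", "NP"),
    ("NP", "N", "NP"),  # noun-noun compound, e.g. "apple tree"
]


def cyk(words: List[str], lexicon, rules, start: str = "NP") -> bool:
    """Return True if `words` can be derived from `start` under the binary grammar."""
    n = len(words)
    # table[i][j] holds the categories that can span words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        table[i][i + 1] = set(lexicon.get(w, ()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for lhs, left, right in rules:
                    if left in table[i][k] and right in table[k][j]:
                        table[i][j].add(lhs)
    return n > 0 and start in table[0][n]


print(cyk("the big red apple tree".split(), LEXICON, RULES))  # True
```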

The present paper investigates multiword expressions (MWEs) in spoken language and possible ways of identifying MWEs automatically in speech corpora. Two MWEs that emerged from previous studies and that occur frequently in Dutch are analyzed to study their pronunciation characteristics and compare them to those of other utterances in a large speech corpus. The analyses reveal that these MWEs display extreme pronunciation variation and reduction, i.e., many phonemes and even syllables are deleted. Several measures of pronunciation reduction are calculated for these two MWEs and for all other utterances in the corpus. Five of these measures are more than twice as high for the MWEs, indicating considerable reduction. One overall measure of pronunciation deviation is then calculated and used to automatically identify MWEs in a large speech corpus. The results show that neither this overall measure nor frequency of co-occurrence alone is suitable for identifying MWEs. The best results are obtained with a metric that combines overall pronunciation reduction with weighted frequency. In this way, recurring "islands of pronunciation reduction" that contain (potential) MWEs can be identified in a large speech corpus.
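The abstract does not give the formula for the combined metric. As a hypothetical illustration, the sketch measures pronunciation deviation as the phoneme-level edit distance between the canonical and realized transcriptions, normalized by the canonical length, and multiplies it by a log-weighted frequency; the phoneme strings, labels, and counts are invented placeholders.

```python
import math
from typing import Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def deviation(canonical: Sequence[str], realized: Sequence[str]) -> float:
    """Edit distance normalized by the canonical length (0 = no reduction)."""
    return edit_distance(canonical, realized) / len(canonical) if canonical else 0.0


def combined_score(canonical: Sequence[str], realized: Sequence[str], frequency: int) -> float:
    """Hypothetical combination: deviation weighted by log(1 + frequency)."""
    return deviation(canonical, realized) * math.log1p(frequency)


# Hypothetical candidates: (label, canonical phonemes, realized phonemes, corpus frequency)
candidates = [
    ("expr_A", list("abcdefgh"), list("aceh"), 400),
    ("expr_B", list("pqrstu"), list("pqrstu"), 900),
    ("expr_C", list("abcdef"), list("abdf"), 15),
]
ranked = sorted(candidates, key=lambda c: combined_score(c[1], c[2], c[3]), reverse=True)
print([label for label, *_ in ranked])  # ['expr_A', 'expr_C', 'expr_B']
```

Under this weighting, an unreduced but frequent sequence (expr_B) scores zero, while a strongly reduced, frequent one ranks first, which mirrors the observation that neither reduction nor frequency alone suffices.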

