Similar Documents
Found 20 similar documents (search time: 31 ms)
1.
SEKE is a semantic expectation-based knowledge extraction system for extracting causation knowledge from natural language texts. It is inspired by human behavior in analyzing texts and capturing information with semantic expectations. The framework of SEKE consists of different kinds of generic templates organized in a hierarchical fashion: semantic templates, sentence templates, reason templates, and consequence templates. The design of the templates is based on the expected semantics of causation knowledge, making them robust and flexible. The semantic template represents the target relation, while the sentence templates act as a middle layer reconciling the semantic templates with natural language texts. With these templates, SEKE is able to extract causation knowledge from complex sentences. Another characteristic of SEKE is that it can discover unseen knowledge for reason and consequence by means of pattern discovery. Using simple linguistic information, SEKE can discover extraction patterns from previously extracted causation knowledge and apply the newly generated patterns for knowledge discovery. To demonstrate the adaptability of SEKE to different domains, we investigate its application to two domains of news articles: Hong Kong stock market movement and global warming. Although these two domains are completely different with respect to the expected semantics of reason and consequence, SEKE can effectively handle the natural language texts in both for causation knowledge extraction. © 2005 Wiley Periodicals, Inc. Int J Int Syst 20: 327–358, 2005.
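The template-driven extraction described above can be illustrated with a minimal sketch. The patterns below are illustrative stand-ins, not SEKE's actual sentence templates, which are richer and hierarchically organized:

```python
import re

# Hypothetical, simplified sentence templates in the spirit of SEKE:
# each pattern captures a reason span and a consequence span.
SENTENCE_TEMPLATES = [
    re.compile(r"(?P<consequence>.+?) (?:because of|due to) (?P<reason>.+)"),
    re.compile(r"(?:as a result of|owing to) (?P<reason>.+?), (?P<consequence>.+)"),
    re.compile(r"(?P<reason>.+?) (?:led to|caused|resulted in) (?P<consequence>.+)"),
]

def extract_causation(sentence: str):
    """Return (reason, consequence) if a template matches, else None."""
    for template in SENTENCE_TEMPLATES:
        m = template.match(sentence)
        if m:
            return m.group("reason").rstrip("."), m.group("consequence").rstrip(".")
    return None

pair = extract_causation("The index fell because of weak export figures.")
print(pair)  # ('weak export figures', 'The index fell')
```

Reason and consequence templates would then be applied recursively inside the captured spans; pattern discovery would add new entries to `SENTENCE_TEMPLATES` from previously extracted pairs.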

2.
The idea of automatic summarization dates back to 1958, when Luhn invented the "auto abstract" (Luhn, 1958). Since then, many diverse automatic summarization approaches have been proposed, but no single technique has met the increasingly urgent need for automatic summarization. Rather than proposing one more such technique, we suggest that the best solution is likely a system able to combine multiple summarization techniques as required by the type of documents being summarized. Thus, this paper presents HAUSS: a framework to quickly build specialized summarizers, integrating several base techniques into a single approach. To recognize relevant text fragments, rules are created that combine frequency, centrality, citation and linguistic information in a context-dependent way. An incremental knowledge acquisition framework strongly supports the creation of these rules, using a training corpus to guide rule acquisition and produce a powerful knowledge base specific to the domain. Using HAUSS, we created a knowledge base for catchphrase extraction in legal text. The system outperforms existing state-of-the-art general-purpose summarizers and machine learning approaches. Legal experts rated the extracted summaries as similar to the original catchphrases given by the court. Our investigation of knowledge acquisition methods for summarization therefore demonstrates that it is possible to quickly create effective special-purpose summarizers that combine multiple techniques into a single context-aware approach.
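A rule that combines several kinds of evidence can be sketched as a weighted score per sentence. The features (term frequency, sentence position) and the weight `alpha` below are assumptions for illustration, not HAUSS's actual rule language:

```python
from collections import Counter

def score_sentences(sentences, alpha=0.7):
    """Combine term-frequency evidence with a position-based prior.
    Features and the weight alpha are illustrative assumptions."""
    words = [s.lower().split() for s in sentences]
    tf = Counter(w for ws in words for w in ws)
    scores = []
    for i, ws in enumerate(words):
        freq = sum(tf[w] for w in ws) / max(len(ws), 1)  # frequency evidence
        position = 1.0 / (1 + i)                         # lead sentences favored
        scores.append(alpha * freq + (1 - alpha) * position)
    return scores

def summarize(sentences, k=1):
    """Return the k highest-scoring sentences in document order."""
    scores = score_sentences(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: -scores[i])
    return [sentences[i] for i in sorted(ranked[:k])]
```

In an incremental knowledge acquisition setting, a domain expert would add or reweight such features per rule, guided by errors on the training corpus.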

3.
Causal knowledge is a very common type of knowledge and an important component of domain knowledge bases. Automatically extracting causal knowledge from Internet information resources is of great significance for modeling social computing systems and building intelligent systems. Focusing on open-source Chinese texts, this paper develops and implements a method for automatically extracting causal knowledge, in order to effectively support online knowledge engineering, the automatic acquisition of causal intelligence in the security domain, and the construction of causal knowledge bases.

4.
周浪, 张亮, 冯冲, 黄河燕. 《计算机科学》 (Computer Science), 2009, 36(5): 177-180
This paper proposes a term extraction method that combines rules and statistics to extract phrasal terms consisting of multiple words. Most existing statistical methods focus on measuring the structural completeness of a term, but they do not capture the domain characteristics that tie a term to its field. Based on observations of how terms are distributed across documents, a method is proposed that uses statistics on the variation of a term's frequency distribution over the corpus to test its domain relevance; combined with linguistic knowledge acquired through machine learning, it extracts phrasal terms with clear domain characteristics from a computer-science corpus. Experiments show that the method discriminates well between low-frequency terms and high-frequency ordinary word strings.

5.
A novel approach is introduced in this paper for the implementation of a question-answering based tool for the extraction of information and knowledge from texts. This effort resulted in the computer implementation of a system that answers bilingual questions directly from a text using Natural Language Processing. The system uses domain knowledge concerning categories of actions and implicit semantic relations. The present state of the art in information extraction is based on the template approach, which relies on a predefined user model. The model guides the extraction of information and the instantiation of a template, similar to a frame or set of attribute-value pairs, as the result of the extraction process. Our question-answering based approach aims to create flexible information extraction tools that accept natural language questions and generate answers containing information extracted from text either directly or after applying deductive inference. Our approach also addresses the problem of implicit semantic relations occurring either in the questions or in the texts from which information is extracted. These relations are made explicit with the use of domain knowledge. Examples of the application of our methods are presented for four domains of quite different nature: oceanography, medical physiology, aspirin pharmacology, and ancient Greek law. Questions are expressed both in Greek and English. Another important point of our method is to process text directly, avoiding any kind of formal representation when inference is required for the extraction of facts not mentioned explicitly in the text. This idea of using text as a knowledge base was first presented in Kontos [7] and further elaborated in [9,11,12] as the ARISTA method, a new method for knowledge acquisition from texts based on using natural language itself for knowledge representation.

6.
Presently, we are confronted with an enormous amount of legal documents, which are increasingly recorded in electronic format. There is a need to make the information in legal texts easily and automatically accessible. In this paper we argue that in the legal field, where we are confronted with specific text types, knowledge about discourse structures and the linguistic cues that signal them is very valuable to incorporate in information extraction systems and in text processing systems in general. We also demonstrate the need for adequate formalisms for representing discourse patterns. However, intertextual analysis of texts that describes and explains the properties of text types and genres is underdeveloped in the legal field.

7.
Knowledge is information that has been contextualised in a certain domain so that it can be used or applied. It represents the basic core of our Cultural Heritage, and natural language provides us with a prime, versatile means of construing experience at multiple levels of organization. The field of natural language generation is concerned with creating texts that convey information contained in other kinds of sources (numerical data, graphics, taxonomies and ontologies, or even other texts), with the aim of making such texts indistinguishable, as far as possible, from those created by humans. Knowledge extraction, in turn, based on text mining and text analysis tasks (examples of the many applications born from computational linguistics), provides summarization, categorization, and topic extraction from textual resources using linguistic concepts that deal with the imprecision and ambiguity of human language. This paper presents a research activity focused on exploring and scientifically describing the structure and organization of the knowledge involved in the generation of textual resources. A novel multidimensional model for the representation of conceptual knowledge is proposed. Furthermore, a real case study in the Cultural Heritage domain is described to demonstrate the effectiveness and feasibility of the proposed model and approach.

8.
Ontologies play a very important role in knowledge management and the Semantic Web, and their use has been exploited in many current applications. Ontologies are especially useful because they support the exchange and sharing of information. Ontology learning from text is the process of deriving high-level concepts and their relations. An important task in ontology learning from text is to obtain a set of representative concepts to model a domain and organize them into a hierarchical structure (taxonomy) from unstructured information. In the process of building a taxonomy, the identification of hypernym/hyponym relations between terms is essential. How to automatically build the appropriate structure to represent the information contained in unstructured texts is a challenging task. This paper presents a novel method to obtain, from unstructured texts, representative concepts and their taxonomic relationships in a specific knowledge domain. This approach builds a concept hierarchy from a specific-domain corpus by using a clustering algorithm, a set of linguistic patterns, and additional contextual information extracted from the Web that improves the discovery of the most representative hypernym/hyponym relationships. A set of experiments was carried out using four different corpora. We evaluated the quality of the constructed taxonomies against gold-standard ontologies, and the experiments show promising results.
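Linguistic patterns for hypernym/hyponym discovery are typically Hearst-style lexico-syntactic patterns. The single "such as" pattern below is an illustrative sketch, not the paper's actual pattern set:

```python
import re

# One classic lexico-syntactic (Hearst-style) pattern:
# "NP such as NP, NP and NP" signals hypernym -> hyponyms.
SUCH_AS = re.compile(r"(\w+) such as ([\w, ]+)")

def hypernym_pairs(sentence: str):
    """Extract (hyponym, hypernym) pairs from one 'such as' enumeration."""
    m = SUCH_AS.search(sentence)
    if not m:
        return []
    hypernym = m.group(1)
    hyponyms = re.split(r",\s*|\s+and\s+", m.group(2))
    return [(h.strip(), hypernym) for h in hyponyms if h.strip()]
```

Contextual evidence from the Web (e.g. hit counts for candidate pairs) can then be used to keep only the most reliable of the extracted relations.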

9.
The authors and their team have carried out a long-term series of studies on knowledge acquisition techniques for the agricultural domain. This paper describes manual and automatic/semi-automatic knowledge acquisition methods that employ intelligent guidance, machine learning, data mining, and intelligent computing. These methods can effectively acquire domain knowledge, discover hidden patterns, and refine knowledge. Knowledge acquisition tools have also been developed. These methods and tools reflect the important role that knowledge acquisition technology plays in agricultural information engineering.

10.
There is an urgent need to automatically identify information in legal texts. In this paper, we argue that discourse analysis yields valuable knowledge to be incorporated in text processing systems. Knowledge about discourse patterns has already been applied in legal text generation systems. But, it is equally important to incorporate this kind of knowledge in legal information extraction systems. This knowledge is helpful for locating information in texts. Also, we demonstrate the need for adequate, maintainable, and possibly sharable knowledge representations of discourse patterns. The findings are illustrated by explicating the role discourse analysis played when building the SALOMON system, a system that automatically abstracts Belgian criminal cases.

11.
A generic information extraction architecture for financial applications
The advent of computing has exacerbated the problem of overwhelming information. To manage the deluge of information, information extraction systems can be used to automatically extract relevant information from free-form text for update to databases or for report generation. One of the major challenges in information extraction is the representation of domain knowledge: how to represent the meaning of the input text, the knowledge of the field of application, and the knowledge about the target information to be extracted. We have chosen a directed graph structure, a domain ontology, and a frame representation, respectively. We have further developed a generic information extraction (GIE) architecture that combines these knowledge structures for the extraction task. The GIE system is able to extract information from free-form text and further infer and derive new information. It analyzes the input text into a graph structure and subsequently unifies the graph and the ontology to extract relevant information, driven by the frame structure during a template filling process. The GIE system has been adopted for use in a message formatting expert system, a large-scale information extraction system for a specific financial application within a major bank in Singapore.

12.
Text categorization assigns predefined categories to either electronically available texts or those resulting from document image analysis. A generic system for text categorization is presented which is based on statistical analysis of representative text corpora. Significant features are automatically derived from training texts by selecting substrings from actual word forms and applying statistical information and general linguistic knowledge. The dimension of the feature vectors is then reduced by linear transformation, keeping the essential information. The classification is a minimum least-squares approach based on polynomials. The described system can be efficiently adapted to new domains or different languages. In application, the adapted text categorizers are reliable, fast, and completely automatic. Two example categorization tasks achieve recognition scores of approximately 80% and are very robust against recognition or typing errors.
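The pipeline of substring features followed by a minimum least-squares classifier can be sketched as below. The tiny corpus, the feature choice (character trigrams), and the omission of the dimension-reducing linear transformation are simplifications for illustration, not the paper's setup:

```python
import numpy as np

def char_ngrams(text, n=3):
    """Substring features derived from word forms (here: character trigrams)."""
    text = f" {text.lower()} "
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def vectorize(texts, vocab=None):
    """Build count vectors over a (possibly fixed) trigram vocabulary."""
    if vocab is None:
        vocab = sorted({g for t in texts for g in char_ngrams(t)})
    index = {g: i for i, g in enumerate(vocab)}
    X = np.zeros((len(texts), len(vocab)))
    for r, t in enumerate(texts):
        for g in char_ngrams(t):
            if g in index:
                X[r, index[g]] += 1
    return X, vocab

# Tiny two-class corpus (illustrative, not from the paper).
train = ["stock prices rise", "market shares fall",
         "rainfall and storms", "storm winds rise"]
y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])  # one-hot class targets

X, vocab = vectorize(train)
W, *_ = np.linalg.lstsq(X, y, rcond=None)  # minimum least-squares classifier

test_X, _ = vectorize(["market prices fall"], vocab)
pred = np.argmax(test_X @ W)  # predicted class index for the unseen text
```

A linear least-squares fit is the degree-1 case of the polynomial classifier; higher degrees add products of features before the fit.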

13.
14.
Modern communication environments have changed the cognitive patterns of individuals, who are now used to the interaction of information encoded in different semiotic modalities, especially visual and linguistic. Despite this, the main premise of Corpus Linguistics still holds: our perception of and experience with the world is conveyed in texts, which nowadays need to be studied from a multimodal perspective. Therefore, multimodal corpora are becoming extremely useful for extracting specialized knowledge and exploring the insights of specialized language and its relation to non-language-specific representations of knowledge. It is our assertion that the analysis of the image-text interface can help us understand the way visual and linguistic information converge in subject-field texts. In this article, we use Frame-based terminology to sketch a novel proposal to study images in a corpus rich in pictorial representations for their inclusion in a terminological resource on the environment. Our corpus-based approach provides the methodological underpinnings to create meaning within terminographic entries, thus facilitating specialized knowledge transfer and acquisition through images.

15.
The goal of the reading comprehension (RC) task is to understand a document and return an answer sentence for a posed question. This paper presents a method that makes full use of external resources to improve the performance of RC systems; the resulting system improves on both the Remedia and ChungHwa corpora. In particular, analysis of the Remedia-based RC system shows that 24.1% of the performance gain is attributable to Web-based answer pattern matching and 11.1% to the linguistic-feature matching strategy. t-tests show that the gains from answer pattern matching, linguistic-feature matching, and lexical-semantic relation inference are statistically significant.

16.
Growing practical evidence shows that lexical knowledge will be an indispensable component of future natural language processing systems. Using a machine-readable dictionary as the resource, this work first classifies the sense definitions, then automatically generates templates for extracting lexical knowledge based on an analysis of the definitions, and finally applies template matching to extract lexical knowledge automatically. A supervised machine learning method based on a maximum entropy model is used to filter the results. Applied to the Applied Chinese Dictionary (《应用汉语词典》), the approach achieves good extraction results.

17.
Using a cross-classification mechanism to share information resources in different languages on the Internet is an important method of knowledge mining. This paper presents a model for bilingual cross-classification and its implementation. The main idea is to avoid machine translation and manual annotation: a text feature extraction mechanism selects category feature terms and document feature terms, and translation mapping rules based on concept expansion automatically generate category and document feature vectors. On this basis, latent semantic analysis unifies the bilingual texts at the semantic level, and classification is performed according to the semantic similarity between categories and documents, thereby achieving high precision.

18.
Automatic classification methods based on semantic relatedness and concept relatedness
Unlike traditional word-based methods for automatic Chinese text classification, the approach described here considers the linguistic information of words and the relatedness between word concepts when selecting text features. A semantics-based method and a concept-attribute-based method are proposed, and a classification model is built on them. Experiments show that these two improved methods give the classification system higher precision.

19.
With advances in science and technology, computing systems are becoming increasingly complex, with a growing number of heterogeneous software and hardware components. They are thus becoming more difficult to monitor, manage, and maintain. Traditional approaches to system management have largely relied on domain experts through a knowledge acquisition process that translates domain knowledge into operating rules and policies, a process well known to be cumbersome, labor intensive, and error prone. In addition, traditional approaches to system management struggle to keep up with rapidly changing environments. There is a pressing need for automatic and efficient approaches to monitor and manage complex computing systems. In this paper, we propose an integrated data-driven framework for computing system management that acquires the needed knowledge automatically from a large amount of historical log data. Specifically, we apply text mining techniques to automatically categorize log messages into a set of canonical categories, incorporate temporal information to improve categorization performance, develop temporal mining techniques to discover relationships between different events, and take a novel approach called event summarization to provide a concise interpretation of the temporal patterns.
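The first step, mapping raw log messages to canonical categories, can be approximated by masking the variable parts of each message. The paper uses text mining for this step, so the masking rules below are only an illustrative stand-in:

```python
import re
from collections import defaultdict

def canonical_form(message: str) -> str:
    """Map a raw log message to a canonical category by masking variable
    parts (IP addresses, hex ids, numbers). Illustrative stand-in for the
    text-mining-based categorization described in the paper."""
    msg = re.sub(r"\b\d+\.\d+\.\d+\.\d+\b", "<ip>", message)  # IPs first
    msg = re.sub(r"0x[0-9a-fA-F]+", "<hex>", msg)             # hex ids
    msg = re.sub(r"\b\d+\b", "<num>", msg)                    # other numbers
    return msg

def categorize(log_lines):
    """Group raw log lines by their canonical form."""
    categories = defaultdict(list)
    for line in log_lines:
        categories[canonical_form(line)].append(line)
    return categories

logs = [
    "connection from 10.0.0.1 port 22",
    "connection from 192.168.1.9 port 8080",
    "disk 3 failure at 0xdeadbeef",
]
cats = categorize(logs)
print(len(cats))  # 2 canonical categories
```

Once messages are grouped into categories, temporal mining can operate on the resulting event sequence (category, timestamp) rather than on raw text.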

20.
Named-entity recognition (NER) involves the identification and classification of named entities in text. This is an important subtask in most language engineering applications, in particular information extraction, where different types of named entity are associated with specific roles in events. In this paper, we present a prototype NER system for Greek texts that we developed based on a NER system for English. Both systems are evaluated on corpora of the same domain and of similar size. The time-consuming process for the construction and update of domain-specific resources in both systems led us to examine a machine learning method for the automatic construction of such resources for a particular application in a specific language.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号