Similar Documents
20 similar documents retrieved (search time: 781 ms)
1.
The increasing amount of unstructured text published on the Web is demanding new tools and methods to automatically process and extract relevant information. Traditional information extraction has focused on harvesting domain-specific, pre-specified relations, which usually requires manual labor and heavy machinery; especially in the biomedical domain, the main efforts have been directed toward the recognition of well-defined entities such as genes or proteins, which constitutes the basis for extracting the relationships between the recognized entities. The intrinsic features and scale of the Web demand new approaches able to cope with the diversity of documents, where the number of relations is unbounded and not known in advance. This paper presents a scalable method for the extraction of domain-independent relations from text that exploits the knowledge in the semantic annotations. The method is not geared to any specific domain (e.g., protein–protein interactions and drug–drug interactions) and does not require any manual input or deep processing. Moreover, the method uses the extracted relations to compute groups of abstract semantic relations characterized by their signature types and synonymous relation strings. This constitutes a valuable source of knowledge when constructing formal knowledge bases, as we enable seamless integration of the extracted relations with the available knowledge resources through the process of semantic annotation. The proposed approach has successfully been applied to a large text collection in the biomedical domain and the results are very encouraging.
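As a rough illustration of the grouping step described in this abstract (not the authors' implementation), the Python sketch below clusters extracted relation triples by the semantic types of their arguments, i.e. their signature, and collects the relation strings observed for each signature. All triples and type names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical extracted triples: (subject_type, relation_string, object_type).
# In the paper these types come from semantic annotations; the values are made up.
extracted = [
    ("Protein", "interacts with", "Protein"),
    ("Protein", "binds to", "Protein"),
    ("Drug", "inhibits", "Protein"),
    ("Drug", "blocks", "Protein"),
    ("Drug", "interacts with", "Drug"),
]

# Group relation strings by their type signature (domain type, range type).
abstract_relations = defaultdict(set)
for subj_type, relation, obj_type in extracted:
    abstract_relations[(subj_type, obj_type)].add(relation)

# Each signature together with its set of (near-)synonymous relation strings
# approximates one abstract semantic relation.
for (domain_type, range_type), strings in abstract_relations.items():
    print(f"{domain_type} -> {range_type}: {sorted(strings)}")
```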

2.
罗军  高琦  王翊 《计算机工程》2010,36(23):85-87
An important prerequisite for realizing the Semantic Web vision is annotating Web resources with ontology vocabulary. To this end, a weakly supervised (bootstrapping-based) ontology annotation method is proposed. A given ontology is parsed to generate rule files, and domain documents are filtered out through text classification. Bootstrapping is then used for annotation extraction and ontology reasoning; after a few iterations, good annotation results can be achieved with only a small amount of training text. Experiments show that the method achieves high entity recognition accuracy and good annotation quality.
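A minimal sketch of a bootstrapping-style annotation loop in the spirit of the method above, assuming a small seed lexicon of ontology terms; the paper's rule-file generation, text classification, and ontology reasoning are not reproduced, and all documents and terms are invented.

```python
# Hypothetical seed lexicon: ontology class -> seed instance terms (invented).
lexicon = {"Gene": {"BRCA1"}, "Disease": {"breast cancer"}}

# Unlabeled domain documents (toy examples).
docs = [
    "BRCA1 mutations are associated with breast cancer",
    "TP53 mutations are associated with lung cancer",
    "BRCA1 interacts with TP53",
]

def annotate(doc, lexicon):
    """Return (term, class) annotations found by simple lexicon matching."""
    return [(t, c) for c, terms in lexicon.items() for t in terms if t in doc]

# Bootstrapping: learn a lexical cue following each annotated term, then use the
# cue to harvest new candidate terms and grow the lexicon for the next pass.
for _ in range(2):
    patterns = set()
    for doc in docs:
        for term, cls in annotate(doc, lexicon):
            after = doc.partition(term)[2].split()
            if after:
                patterns.add((cls, after[0]))      # e.g. ("Gene", "mutations")
    for doc in docs:
        words = doc.split()
        for i, w in enumerate(words[:-1]):
            for cls, cue in patterns:
                if words[i + 1] == cue:
                    lexicon[cls].add(w)            # promote the word preceding the cue

print(lexicon)
```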

3.
4.
To address the difficulty of acquiring prior knowledge when fusing heterogeneous data in the mining domain, the poor real-time performance of IoT ontology repositories, and the low efficiency of manually annotating instance data, an automatic semantic annotation method for the semantic Internet of Things in mines is proposed. A framework for the semantic processing of sensor data is presented. On the one hand, the professional domain and scope of the ontology are determined, and the domain ontology is constructed by reusing the Stream Annotation Ontology (SAO) as the basis for driving semantic annotation. On the other hand, machine learning is used to extract features from and analyze the sensor data streams, mining relations between concepts from massive data. The knowledge obtained through data mining drives the updating and refinement of the ontology, enabling dynamic ontology updating, extension, and more precise semantic annotation, and enhancing machine understanding. Taking a main shaft fault of a mine hoisting system as an example, the semantic annotation process from ontology to instantiation is described: combining domain expert knowledge with ontology reuse, the main-drive fault ontology of the mine hoisting system is built using the "seven-step" method; to improve the accuracy of instance data property descriptions, principal component analysis (PCA) and K-means clustering are used to reduce the dimensionality of and group the data set, extracting the relations between data properties and concepts; the relations between specific antecedent conditions and consequent concepts are then annotated with the Semantic Web Rule Language (SWRL) to refine the domain ontology. Experimental results show that, during ontology instantiation, machine learning techniques can automatically extract concepts from sensor data and achieve automatic semantic annotation of the sensor data.
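The PCA-plus-K-means grouping step described above can be sketched with scikit-learn as follows; the synthetic feature matrix stands in for windowed hoisting-system sensor features, and the cluster-to-concept mapping and SWRL rules from the paper are not reproduced.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Synthetic stand-in for windowed sensor features of a mine hoist main shaft
# (e.g. RMS, peak, kurtosis per window); real feature extraction is not shown.
rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(50, 6))
faulty = rng.normal(loc=3.0, scale=1.0, size=(50, 6))
X = np.vstack([normal, faulty])

# Dimensionality reduction with PCA, as in the property-extraction step.
X_reduced = PCA(n_components=2).fit_transform(X)

# K-means grouping; each cluster is a candidate concept/state (e.g. "normal"
# vs. "main shaft fault") to be linked to ontology classes afterwards.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_reduced)

for cluster_id in np.unique(labels):
    centroid = X_reduced[labels == cluster_id].mean(axis=0)
    print(f"cluster {cluster_id}: {np.sum(labels == cluster_id)} windows, "
          f"centroid in PCA space = {np.round(centroid, 2)}")
```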

5.
6.
A user-based document management system has been developed for small communities on the Web. The system is based on the free annotation of documents by users. A number of annotation support tools are used to suggest possible annotations, including suggesting terms from external ontologies. This paper outlines some evaluation data on how users actually interact with the system when annotating their documents, especially regarding the use of standard ontologies. Results indicate that although an established external taxonomy can be useful in proposing annotation terms, users appear to be very selective in their use of the terms proposed and to have little interest in adhering to the particular hierarchical structure provided.

7.
Ontology Concept Extraction from Web Data    Total citations: 1 (self-citations: 0, citations by others: 1)
Ontologies are becoming increasingly important in knowledge management and the Semantic Web, but building an ontology usually takes a great deal of time, and maintaining it after construction is likewise time-consuming for knowledge managers. Automatic construction of domain ontologies can overcome the shortcomings of manual methods and has become a current research focus. Concepts are among the most important components of an ontology, and the efficiency and accuracy of automatically extracting concepts from semi-structured Web documents directly determine the quality of the automatically built ontology. This paper proposes an automatic ontology concept extraction model that does not depend on a domain dictionary or a core ontology and can quickly and effectively construct and update domain ontology concepts by mining Chinese Web text.
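For illustration only, the sketch below generates candidate domain concepts from toy Chinese Web snippets using character n-gram frequencies with no dictionary or core ontology; this is a crude stand-in for the paper's extraction model, and all text is invented.

```python
from collections import Counter
import re

# Toy Chinese Web-page snippets (invented); in the paper these would be
# semi-structured Web documents in the target domain.
pages = [
    "本体学习是语义网研究的重要方向, 本体学习方法包括概念抽取和关系抽取",
    "概念抽取是本体学习的第一步, 概念抽取的质量决定本体的质量",
]

def ngrams(text, n):
    """Character n-grams over the CJK characters of the text."""
    text = re.sub(r"[^\u4e00-\u9fff]", "", text)   # keep CJK characters only
    return [text[i:i + n] for i in range(len(text) - n + 1)]

# Crude candidate generation: frequent 2- to 4-character strings are kept
# as candidate domain concepts.
counts = Counter()
for page in pages:
    for n in (2, 3, 4):
        counts.update(ngrams(page, n))

candidates = [(term, c) for term, c in counts.items() if c >= 2]
for term, c in sorted(candidates, key=lambda x: -x[1])[:10]:
    print(term, c)
```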

8.
Full implementation of the Semantic Web requires widespread availability of OWL ontologies. Manual ontology development using current OWL editors remains a tedious and cumbersome task that requires significant understanding of the new ontology language and can easily result in a knowledge acquisition bottleneck. On the other hand, abundant domain knowledge has been specified by existing database schemata such as UML class diagrams. Thus developing an automatic tool for extracting OWL ontologies from existing UML class diagrams is helpful to Web ontology development. In this paper we propose an automatic, semantics-preserving approach for extracting OWL ontologies from existing UML class diagrams. This approach establishes a precise conceptual correspondence between UML and OWL through a semantics-preserving schema translation algorithm. The experiments with our implemented prototype tool, UML2OWL, show that the proposed approach is effective and a fully automatic ontology extraction is achievable. The proposed approach and tool will facilitate the development of Web ontologies and the realization of semantic interoperations between existing Web database applications and the Semantic Web.
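The kind of mapping such a translation performs can be sketched with rdflib: UML class to owl:Class, attribute to owl:DatatypeProperty, association to owl:ObjectProperty, generalization to rdfs:subClassOf. The toy UML model below is expressed as plain Python dicts and is not UML2OWL's actual input format; the namespace is a placeholder.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS, XSD

EX = Namespace("http://example.org/uml2owl#")   # hypothetical namespace

# Toy UML class diagram expressed as plain dicts (not the tool's real input).
uml = {
    "classes": {
        "Person": {"attributes": {"name": "string"}, "parent": None},
        "Student": {"attributes": {"studentId": "string"}, "parent": "Person"},
    },
    "associations": [("Student", "enrolledIn", "Course")],
}

g = Graph()
g.bind("ex", EX)
g.bind("owl", OWL)

for cls, info in uml["classes"].items():
    g.add((EX[cls], RDF.type, OWL.Class))                      # UML class -> owl:Class
    if info["parent"]:
        g.add((EX[cls], RDFS.subClassOf, EX[info["parent"]]))   # generalization -> subClassOf
    for attr in info["attributes"]:
        g.add((EX[attr], RDF.type, OWL.DatatypeProperty))       # attribute -> datatype property
        g.add((EX[attr], RDFS.domain, EX[cls]))
        g.add((EX[attr], RDFS.range, XSD.string))

for source, role, target in uml["associations"]:
    g.add((EX[target], RDF.type, OWL.Class))
    g.add((EX[role], RDF.type, OWL.ObjectProperty))             # association -> object property
    g.add((EX[role], RDFS.domain, EX[source]))
    g.add((EX[role], RDFS.range, EX[target]))

print(g.serialize(format="turtle"))
```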

9.
Ontologies are playing an increasingly important role in knowledge management and the Semantic Web. This study presents a novel episode-based ontology construction mechanism to extract a domain ontology from unstructured text documents. Additionally, fuzzy numbers for conceptual similarity computing are presented for concept clustering and taxonomic relation definitions. Moreover, concept attributes and operations can be extracted from episodes to construct a domain ontology, while non-taxonomic relations can be generated from episodes. The fuzzy inference mechanism is also applied to obtain new instances for ontology learning. Experimental results show that the proposed approach can effectively construct a Chinese domain ontology from unstructured text documents.

10.
Ontologies provide formal, machine-readable, and human-interpretable representations of domain knowledge. Therefore, ontologies have attracted growing attention with the development of Semantic Web technologies. People who want to use ontologies need an understanding of the ontology, but this understanding is very difficult to attain if the ontology user lacks the background knowledge necessary to comprehend the ontology or if the ontology is very large. Thus, software tools that facilitate the understanding of ontologies are needed. Ontology visualization is an important research area because visualization can help in the development, exploration, verification, and comprehension of ontologies. This paper introduces the design of a new ontology visualization tool, which differs from traditional visualization tools by providing important metrics and analytics about ontology concepts and warning the ontology developer about potential ontology design errors. The tool, called Onyx, also has advantages in terms of speed and readability. Thus, Onyx offers a suitable environment for the representation of large ontologies, especially those used in biomedical and health information systems and those that contain many terms. It is clear that these additional functionalities will increase the value of traditional ontology visualization tools during ontology exploration and evaluation.
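To illustrate the kind of per-concept metrics such a tool might surface (this is not Onyx's code), the sketch below computes subclass counts and hierarchy depth over a tiny class hierarchy with rdflib; the ontology content and namespace are invented.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS, OWL

EX = Namespace("http://example.org/onto#")   # hypothetical ontology namespace

# A tiny class hierarchy standing in for a large biomedical ontology.
g = Graph()
for cls in ("Entity", "Disease", "Cancer", "LungCancer", "Gene"):
    g.add((EX[cls], RDF.type, OWL.Class))
for child, parent in (("Disease", "Entity"), ("Cancer", "Disease"),
                      ("LungCancer", "Cancer"), ("Gene", "Entity")):
    g.add((EX[child], RDFS.subClassOf, EX[parent]))

def depth(cls):
    """Length of the subClassOf path from cls up to a class with no parent."""
    d, current = 0, cls
    while True:
        parents = list(g.objects(current, RDFS.subClassOf))
        if not parents:
            return d
        current, d = parents[0], d + 1

for cls in g.subjects(RDF.type, OWL.Class):
    n_children = len(list(g.subjects(RDFS.subClassOf, cls)))
    print(f"{cls.split('#')[-1]}: subclasses={n_children}, depth={depth(cls)}")
```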

11.
One of the goals of the knowledge puzzle project is to automatically generate a domain ontology from plain text documents and use this ontology as the domain model in computer-based education. This paper describes the generation procedure followed by TEXCOMON, the knowledge puzzle ontology learning tool, to extract concept maps from texts. It also explains how these concept maps are exported into a domain ontology. Data sources and techniques deployed by TEXCOMON for ontology learning from texts are briefly described herein. Then, the paper focuses on evaluating the generated domain ontology and advocates the use of a three-dimensional evaluation: structural, semantic, and comparative. Based on a set of metrics, structural evaluations consider ontologies as graphs. Semantic evaluations rely on human expert judgment, and finally, comparative evaluations are based on comparisons between the outputs of state-of-the-art tools and those of new tools such as TEXCOMON, using the very same set of documents in order to highlight the improvements of new techniques. Comparative evaluations performed in this study use the same corpus to contrast results from TEXCOMON with those of one of the most advanced tools for ontology generation from text. Results generated by such experiments show that TEXCOMON yields superior performance, especially regarding conceptual relation learning.

12.
13.
The Semantic Web and ontologies have received increased attention in recent years. The delivery of well-designed ontologies enhances the effect of Semantic Web services, but building ontologies from scratch requires considerable time and effort. Modularizing ontologies and integrating ontology modules into a given context help users effectively develop ontologies and revitalize ontology dissemination. Therefore, various tools for modularizing ontologies have been developed. However, selecting an appropriate tool to fit a given context is difficult because the assumptions behind the approaches vary greatly. Thus, a suitable framework is required to compare and help screen the most suitable modularization tool. In this research, we propose a new evaluation framework for selecting an appropriate ontology modularization tool. We present three aspects of tool evaluation as the main dimensions for the assessment of modularization tools: tool performance, data performance, and usability. This study provides an implicit evaluation and an empirical analysis of three modularization tools. It also provides an evaluation method for ontology modularization, enabling ontology engineers to compare different modularization tools and easily choose an appropriate one for the production of qualifying ontology modules. The experimental results indicate that the proposed evaluation criteria for ontology modularization tools are valid and effective. This research provides a useful method for assessing and selecting ontology modularization tools. Modularization performance, data performance, and usability are the three modularization aspects designed and applied to the context of ontology. We provide a comprehensive framework for evaluating the performance and usability of ontology modularization tools. The proposed framework should be of value both to ontology engineers, who are interested in ontology modularization, and to practitioners, who need information on how to evaluate and select a specific type of ontology tool in accordance with the requirements of the individual environment.

14.
Semantic Annotation of Chinese Web Pages: From Sentences to RDF Representation    Total citations: 5 (self-citations: 0, citations by others: 5)
Realizing the Semantic Web vision requires automated semantic annotation methods. This paper proposes a semantic annotation method for Chinese Web pages guided by a domain ontology. Using statistical methods and natural language processing techniques, it takes the sentences in a document as the processing unit and maps sentences to RDF representations in two stages, recognition and combination. The method has the following characteristics: domain-related vocabulary is obtained statistically and compiled into a domain vocabulary annotation list that serves as external domain knowledge, reducing the dependence on a general linguistic ontology; an explicit property-type annotation step identifies the words in a sentence that express relations and labels them as property types, which facilitates subsequent relation extraction; and a syntactic dependency tree (forest) is constructed for each sentence, with words combined according to the dependency relations to form RDF statements. Experimental results show that this method is more effective than semantic annotation based on subject-verb-object grammatical relations.
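The combination stage can be illustrated with rdflib as below: already-recognized (instance, property type, value) tuples are assembled into RDF statements. The recognition stage (statistical lexicon construction, dependency parsing) is not reproduced, and the tuples, identifiers, and namespace are invented stand-ins for entities recognized in Chinese sentences.

```python
from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/domain#")   # hypothetical domain ontology namespace

# Output of a hypothetical recognition stage: for each sentence, the instance,
# the property type assigned to the relation word, and the value.
recognized = [
    ("TsinghuaUniversity", "locatedIn", "Beijing"),
    ("TsinghuaUniversity", "foundedIn", "1911"),
]

g = Graph()
g.bind("ex", EX)

# Combination stage: assemble each labelled tuple into an RDF statement.
for subject, prop, value in recognized:
    g.add((EX[subject], EX[prop], Literal(value)))

print(g.serialize(format="turtle"))
```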

15.
XML plays an important role as the standard language for representing structured data for the traditional Web, and hence many Web-based knowledge management repositories store data and documents in XML. If semantics about the data are formally represented in an ontology, then it is possible to extract knowledge: This is done as ontology definitions and axioms are applied to XML data to automatically infer knowledge that is not explicitly represented in the repository. Ontologies also play a central role in realizing the burgeoning vision of the semantic Web, wherein data will be more sharable because their semantics will be represented in Web-accessible ontologies. In this paper, we demonstrate how an ontology can be used to extract knowledge from an exemplar XML repository of Shakespeare’s plays. We then implement an architecture for this ontology using de facto languages of the semantic Web including OWL and RuleML, thus preparing the ontology for use in data sharing. It has been predicted that the early adopters of the semantic Web will develop ontologies that leverage XML, provide intra-organizational value such as knowledge extraction capabilities that are irrespective of the semantic Web, and have the potential for inter-organizational data sharing over the semantic Web. The contribution of our proof-of-concept application, KROX, is that it serves as a blueprint for other ontology developers who believe that the growth of the semantic Web will unfold in this manner.
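The general idea (not KROX itself) can be sketched as follows: a small XML fragment of a play is parsed with ElementTree, triples are asserted against a hypothetical ontology namespace, and a hand-coded rule stands in for the OWL/RuleML reasoning that would infer knowledge not explicit in the repository. The XML snippet and vocabulary are invented.

```python
import xml.etree.ElementTree as ET
from rdflib import Graph, Namespace, Literal
from rdflib.namespace import RDF

PLAY = Namespace("http://example.org/play#")   # hypothetical ontology namespace

# Tiny XML fragment standing in for the Shakespeare repository.
xml_doc = """
<PLAY>
  <TITLE>Hamlet</TITLE>
  <PERSONAE>
    <PERSONA>HAMLET</PERSONA>
    <PERSONA>OPHELIA</PERSONA>
  </PERSONAE>
</PLAY>
"""

root = ET.fromstring(xml_doc)
g = Graph()
g.bind("play", PLAY)

title = root.findtext("TITLE")
play_uri = PLAY["Play_" + title]
g.add((play_uri, RDF.type, PLAY.Play))
g.add((play_uri, PLAY.title, Literal(title)))
for persona in root.iter("PERSONA"):
    char_uri = PLAY[persona.text.title()]
    g.add((char_uri, RDF.type, PLAY.Character))
    g.add((char_uri, PLAY.appearsIn, play_uri))    # explicit triples from the XML

# Hand-coded stand-in for one ontology rule ("characters in the same play are
# co-characters"); in the paper this would be expressed in OWL/RuleML.
chars = list(g.subjects(PLAY.appearsIn, play_uri))
for a in chars:
    for b in chars:
        if a != b:
            g.add((a, PLAY.coCharacterOf, b))      # inferred knowledge

print(g.serialize(format="turtle"))
```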

16.
When users need to find something on the Web that is related to a place, chances are place names will be submitted along with some other keywords to a search engine. However, automatic recognition of geographic characteristics embedded in Web documents, which would allow for a better connection between documents and places, remains a difficult task. We propose an ontology-driven approach to facilitate the process of recognizing, extracting, and geocoding partial or complete references to places embedded in text. Our approach combines an extraction ontology with urban gazetteers and geocoding techniques. This ontology, called OnLocus, is used to guide the discovery of geospatial evidence from the contents of Web pages. We show that addresses and positioning expressions, along with fragments such as postal codes or telephone area codes, provide satisfactory support for local search applications, since they are able to determine approximations to the physical location of services and activities named within Web pages. Our experiments show the feasibility of performing automated address extraction and geocoding to identify locations associated to Web pages. Combining location identifiers with basic addresses improved the precision of extractions and reduced the number of false positive results.
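A simple sketch of the kind of geospatial evidence gathering described above: regular expressions pull address fragments, postal codes, and phone area codes from page text and resolve them against a mini-gazetteer. The patterns are illustrative US-style ones, not OnLocus's, and the page text and gazetteer entries are invented.

```python
import re

# Toy page text; the patterns below are illustrative, not the paper's.
page_text = """Visit our store at 1600 Main Street, Springfield.
Call (217) 555-0134. ZIP: 62701."""

# Hypothetical mini-gazetteer mapping local evidence to a place.
gazetteer = {"62701": "Springfield, IL", "217": "Central Illinois"}

patterns = {
    "street_address": r"\d{1,5}\s+[A-Z][a-z]+\s+(?:Street|Avenue|Road)",
    "postal_code": r"\b\d{5}\b",
    "area_code": r"\((\d{3})\)\s*\d{3}-\d{4}",
}

# Collect one piece of evidence per pattern, if present.
evidence = {}
for kind, pat in patterns.items():
    m = re.search(pat, page_text)
    if m:
        evidence[kind] = m.group(1) if m.groups() else m.group(0)

# Resolve the fragments against the gazetteer to approximate the page location.
location_hints = {k: gazetteer.get(v, v) for k, v in evidence.items()}
print(evidence)
print(location_hints)
```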

17.
18.
The LEMO annotation framework: weaving multimedia annotations with the web    Total citations: 3 (self-citations: 0, citations by others: 3)
Cultural institutions and museums have realized that annotations contribute valuable metadata for search and retrieval, which in turn can increase the visibility of the digital items they expose via their digital library systems. By exploiting annotations created by others, visitors can discover content they would not have found otherwise, which implies that annotations must be accessible and processable for humans and machines. Currently, however, there exists no widely adopted annotation standard that goes beyond specific media types. Most institutions build their own in-house annotation solution and employ proprietary annotation models, which are not interoperable with those of other systems. As a result, annotation data are usually stored in closed data silos and visible and processable only within the scope of a certain annotation system. As the main contribution of this paper, we present the LEMO Annotation Framework. It (1) provides a uniform annotation model for multimedia contents and various types of annotations, (2) can address fragments of various content-types in a uniform, interoperable manner and (3) pulls annotations out of closed data silos and makes them available as interoperable, dereferencable Web resources. With the LEMO Annotation Framework annotations become part of the Web and can be processed, linked, and referenced by other services. This in turn leads to even higher visibility and increases the potential value of annotations.
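In the spirit of publishing annotations as dereferencable Web resources that address media fragments, the sketch below represents one annotation as RDF whose target is an image region expressed with a W3C Media Fragments style xywh suffix. The vocabulary URIs are placeholders, not LEMO's actual model.

```python
from rdflib import Graph, Namespace, Literal, URIRef
from rdflib.namespace import RDF, DCTERMS

ANNO = Namespace("http://example.org/anno#")     # placeholder vocabulary, not LEMO's

g = Graph()
g.bind("anno", ANNO)
g.bind("dcterms", DCTERMS)

# The annotation itself is a Web resource with its own URI ...
annotation = URIRef("http://example.org/annotations/42")
g.add((annotation, RDF.type, ANNO.Annotation))
g.add((annotation, DCTERMS.creator, Literal("museum-visitor-7")))
g.add((annotation, ANNO.body, Literal("The painter's signature is visible here.")))

# ... and it targets a fragment of a multimedia item, here an image region
# addressed by an xywh media-fragment suffix on the image URI.
target = URIRef("http://example.org/images/portrait.jpg#xywh=120,80,200,150")
g.add((annotation, ANNO.target, target))

print(g.serialize(format="turtle"))
```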

19.
To address the problems that the methods in existing information extraction systems are not reusable and cannot extract semantic information, a topic-oriented Web information extraction framework based on a domain ontology is proposed. For Chinese Web pages, information is interpreted with the help of external resources and the ontology. The techniques for source document and information collection, document preprocessing, and document storage in the collection and preprocessing stage are analyzed and designed; word segmentation, lexicon lookup, and named entity recognition algorithms for the text conversion stage are proposed; and a knowledge extraction scheme is given. Experimental results show that the method yields extraction results with relatively high performance.
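The lexicon-lookup step can be illustrated with a greedy forward maximum matching segmenter against a small domain word list, as a stand-in for the paper's segmentation and named entity recognition algorithms; the word list, labels, and sentence are invented.

```python
# Hypothetical domain word list (longest-match lexicon); entries are invented.
lexicon = {"语义网": "Topic", "信息抽取": "Topic", "本体": "Topic", "清华大学": "Organization"}
max_len = max(len(w) for w in lexicon)

def forward_max_match(text):
    """Greedy forward maximum matching: longest lexicon word first, else one character."""
    tokens, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + n]
            if piece in lexicon or n == 1:
                tokens.append((piece, lexicon.get(piece, "O")))
                i += n
                break
    return tokens

sentence = "清华大学的研究人员研究语义网和信息抽取"
for token, label in forward_max_match(sentence):
    print(token, label)
```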

20.
Ontology-Based Annotation of Deep Web Data    Total citations: 3 (self-citations: 0, citations by others: 3)
袁柳  李战怀  陈世亮 《软件学报》2008,19(2):237-245
Drawing on the idea of deep annotation from the Semantic Web field, a method for semantically annotating the query results of Web databases is proposed. To obtain complete and consistent annotation results, a domain ontology is introduced into the annotation process as the global schema that the Web databases conform to. The characteristics of query interfaces and query results are analyzed in detail, and a query-condition resetting strategy is adopted to determine the semantic labels of the query result data. Tests on Web databases from several different domains show that, with the support of a domain ontology, the method can attach correct semantic labels to Web database query results, which validates its effectiveness.
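As a rough illustration of using a domain ontology as a global schema for labelling query results (the paper's query-condition resetting strategy is not reproduced), the sketch below labels result-table columns by matching their values against known values of ontology properties; the ontology fragment and records are invented.

```python
# Hypothetical domain ontology fragment: property -> known example values.
ontology = {
    "hasAuthor": {"Tom Clancy", "J. K. Rowling"},
    "hasPublisher": {"Penguin", "Bloomsbury"},
    "hasPrice": set(),          # numeric values handled by the fallback below
}

# Unlabelled query-result records scraped from a Web database (invented data).
results = [
    ["J. K. Rowling", "Bloomsbury", "12.99"],
    ["Tom Clancy", "Penguin", "9.50"],
]

def label_column(values):
    """Pick the ontology property whose known values overlap the column most."""
    best, best_overlap = None, 0
    for prop, known in ontology.items():
        overlap = len(known & set(values))
        if overlap > best_overlap:
            best, best_overlap = prop, overlap
    if best is None and all(v.replace(".", "", 1).isdigit() for v in values):
        best = "hasPrice"       # fall back to a simple datatype check
    return best or "unknown"

columns = list(zip(*results))   # transpose rows into columns
labels = [label_column(col) for col in columns]
print(labels)                   # expected: ['hasAuthor', 'hasPublisher', 'hasPrice']
```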
