Similar Documents
20 similar documents found (search time: 187 ms)
1.
The nation’s massive underground utility infrastructure must comply with a multitude of regulations. The regulatory compliance checking of underground utilities requires an objective and consistent interpretation of the regulations. However, utility regulations contain a variety of domain-specific terms and numerous spatial constraints regarding the location and clearance of underground utilities. It is challenging for the interpreters to understand both the domain and spatial semantics in utility regulations. To address the challenge, this paper adopts an ontology and rule-based Natural Language Processing (NLP) framework to automate the interpretation of utility regulations – the extraction of regulatory information and the subsequent transformation into logic clauses. Two new ontologies have been developed. The urban product ontology (UPO) is domain-specific to model domain concepts and capture domain semantics on top of heterogeneous terminologies in utility regulations. The spatial ontology (SO) consists of two layers of semantics – linguistic spatial expressions and formal spatial relations – for better understanding the spatial language in utility regulations. Pattern-matching rules defined on syntactic features (captured using common NLP techniques) and semantic features (captured using ontologies) were encoded for information extraction. The extracted information elements were then mapped to their semantic correspondences via ontologies and finally transformed into deontic logic (DL) clauses to achieve the semantic and logical formalization. The approach was tested on the spatial configuration-related requirements in utility accommodation policies. Results show it achieves a 98.2% precision and a 94.7% recall in information extraction, a 94.4% precision and a 90.1% recall in semantic formalization, and an 83% accuracy in logical formalization.  相似文献   
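As an illustration of the kind of transformation described in this abstract, the following minimal Python sketch extracts one spatial clearance requirement from a hypothetical regulation sentence and renders it as a deontic-logic-style obligation clause. The sentence, the regular-expression patterns, and the clause syntax are illustrative stand-ins for the paper's ontology-backed (UPO/SO) pattern-matching rules, not the authors' implementation.

```python
import re

# A hypothetical utility-regulation sentence (illustrative only).
sentence = ("Underground gas pipelines shall maintain a minimum horizontal "
            "clearance of 12 inches from any water main.")

# Simplified surface patterns standing in for the paper's ontology- and
# syntax-based pattern-matching rules.
PATTERN = re.compile(
    r"(?P<subject>[A-Z][\w\s]+?)\s+(?P<deontic>shall|must|should)\s+"
    r"maintain a minimum (?P<relation>horizontal|vertical) clearance of\s+"
    r"(?P<value>\d+)\s*(?P<unit>inches|feet)\s+from\s+(?P<reference>[\w\s]+)\."
)

def extract_requirement(text):
    """Extract the information elements of one spatial clearance requirement."""
    m = PATTERN.search(text)
    if not m:
        return None
    return {k: v.strip() for k, v in m.groupdict().items()}

def to_deontic_clause(elements):
    """Render the extracted elements as an obligation clause, O(...), in the
    spirit of the deontic-logic formalization described in the abstract
    ('shall' is read as an obligation)."""
    return ("O(clearance({subj}, {ref}, {rel}) >= {val} {unit})".format(
        subj=elements["subject"].lower().replace(" ", "_"),
        ref=elements["reference"].lower().replace(" ", "_"),
        rel=elements["relation"],
        val=elements["value"],
        unit=elements["unit"]))

elements = extract_requirement(sentence)
print(elements)
print(to_deontic_clause(elements))
```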

2.
3.
Abstract: Vast amounts of medical information reside within text documents, so the automatic retrieval of such information would be highly beneficial for clinical activities. The need to overcome the bottleneck caused by manual ontology construction has motivated a considerable body of research on semi-automatic methods for building ontologies. Most techniques for learning domain ontologies from free text have important limitations: they extract concepts in such a way that only taxonomies are generally produced, although other types of semantic relations are also relevant in knowledge modelling. This paper presents a language-independent approach for extracting knowledge from medical natural language documents. The knowledge is represented by means of ontologies that can hold multiple kinds of semantic relationships among concepts.

4.
Open Information Extraction (OIE) systems focus on identifying and extracting general relations from text. Most OIE systems rely on simple linguistic structure, such as part-of-speech or dependency features, to extract relations and arguments from a sentence. These approaches are simple and fast to implement, but suffer from two main drawbacks: i) they are less effective at handling complex sentences with multiple relations and shared arguments, and ii) they tend to extract overly specific relations. This paper proposes an approach to Information Extraction called SemIE, which addresses both drawbacks. SemIE identifies significant relations from domain-specific text by utilizing a semantic structure that describes the domain of discourse. SemIE exploits the predicate-argument structure of a text, which enables it to handle complex sentences. The semantics of the arguments are made explicit by mapping them to relevant concepts in the semantic structure. SemIE uses a semi-supervised learning approach to bootstrap training examples that cover all relations expressed in the semantic structure. It takes pairs of structured documents as input and uses a Greedy Mapping module to bootstrap a full set of training examples, which are then used to learn the extraction and mapping rules. We evaluated the performance of SemIE by comparing it with OLLIE, a state-of-the-art OIE system. We tested SemIE and OLLIE on the task of extracting relations from text in the “movie” domain and found that, on average, SemIE outperforms OLLIE. Furthermore, we also examined how the performance varies with sentence complexity and sentence length. The results demonstrate the effectiveness of SemIE in handling complex sentences.
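The following toy Python sketch illustrates only the mapping idea: predicate-argument tuples (assumed to come from an upstream parsing or SRL step) are kept when their arguments map to the concept types that a tiny, hypothetical "movie" semantic structure expects. The concept lexicon and relation signatures are invented for illustration and stand in for SemIE's learned extraction and mapping rules.

```python
# A tiny, hypothetical "movie"-domain semantic structure: relation names with
# the concept types they expect for their arguments.
SEMANTIC_STRUCTURE = {
    "directed":   ("Director", "Movie"),
    "starred_in": ("Actor", "Movie"),
}

# A hand-coded lexicon standing in for learned argument-to-concept mappings.
CONCEPT_LEXICON = {
    "Christopher Nolan": "Director",
    "Leonardo DiCaprio": "Actor",
    "Inception":         "Movie",
}

# Predicate-argument tuples for one complex sentence with a shared argument
# ("Inception"), e.g. produced by an upstream parsing/SRL step for:
# "Christopher Nolan directed Inception, which starred Leonardo DiCaprio."
pred_args = [
    ("directed",   "Christopher Nolan", "Inception"),
    ("starred_in", "Leonardo DiCaprio", "Inception"),
]

def map_to_relations(tuples):
    """Keep only tuples whose arguments map to the concept types that the
    domain semantic structure expects for that relation."""
    extractions = []
    for pred, arg1, arg2 in tuples:
        expected = SEMANTIC_STRUCTURE.get(pred)
        if expected is None:
            continue
        types = (CONCEPT_LEXICON.get(arg1), CONCEPT_LEXICON.get(arg2))
        if types == expected:
            extractions.append((arg1, pred, arg2))
    return extractions

for triple in map_to_relations(pred_args):
    print(triple)
```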

5.
Cross-domain word representation aims to learn high-quality semantic representations in an under-resourced domain by leveraging information from a resource-rich domain. However, most existing methods mainly transfer the semantics of common words across domains, ignoring the semantic relations among domain-specific words. In this paper, we propose a domain structure-based transfer learning method to learn cross-domain representations by leveraging the relations among domain-specific words. To accomplish this, we first construct a semantic graph to capture the latent domain structure using domain-specific co-occurrence information. Then, in the domain adaptation process, beyond domain alignment, we employ Laplacian Eigenmaps to ensure the domain structure is consistently distributed in the learned embedding space. As such, the learned cross-domain word representations not only capture shared semantics across domains, but also maintain the latent domain structure. We performed extensive experiments on two tasks, namely sentiment analysis and query expansion. The experimental results show the effectiveness of our method for tasks in under-resourced domains.
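Below is a minimal numpy sketch of the Laplacian Eigenmaps step only, under the assumption that a small symmetric co-occurrence matrix over a few hypothetical domain-specific words is already available; it is not the full cross-domain transfer method, just the structure-preserving embedding of the semantic graph.

```python
import numpy as np

# Hypothetical domain-specific words and a symmetric co-occurrence (affinity)
# matrix standing in for the semantic graph built from in-domain counts.
words = ["amplifier", "treble", "bass", "warranty", "shipping"]
W = np.array([
    [0, 4, 4, 1, 0],
    [4, 0, 5, 0, 0],
    [4, 5, 0, 0, 1],
    [1, 0, 0, 0, 3],
    [0, 0, 1, 3, 0],
], dtype=float)

# Unnormalized graph Laplacian L = D - W.
D = np.diag(W.sum(axis=1))
L = D - W

# Laplacian Eigenmaps: eigenvectors of the generalized problem L v = lambda D v.
# Solving the equivalent symmetric problem with D^{-1/2} keeps this numpy-only.
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D)))
eigvals, eigvecs = np.linalg.eigh(D_inv_sqrt @ L @ D_inv_sqrt)

# Skip the trivial constant eigenvector; keep the next two as 2-D coordinates
# that preserve the latent domain structure (words that co-occur often land
# close together).
embedding = D_inv_sqrt @ eigvecs[:, 1:3]
for word, coords in zip(words, embedding):
    print(f"{word:10s} {coords}")
```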

6.
This paper proposes a non-domain-specific metadata ontology as a core component of a semantic model-based document management system (DMS), a potential contender for the next generation of enterprise information systems. What we developed is the core semantic component of an ontology-driven DMS, providing a robust semantic base for describing documents' metadata. We also enabled semantic services such as automated semantic translation of metadata from one domain to another. The core semantic base consists of three semantic layers, each one serving a different view of documents' metadata. The base layer of the core semantic component is a non-domain-specific metadata ontology founded on the ebRIM specification; its main purpose is to serve as a meta-metadata ontology for other domain-specific metadata ontologies, and it provides a generic metadata view. To enable domain-specific views of documents' metadata, we implemented two domain-specific metadata ontologies, semantically layered on top of ebRIM. To enable semantic translation of metadata from one domain to another, we established model-to-model mappings between these semantic layers by introducing SWRL rules. Automating the semantic translation of metadata not only allows effortless switching between different metadata views, but also opens the door to automating the long-term archiving of documents. For the case study, we chose the judicial domain as a promising ground for improving the efficiency of the judiciary by introducing semantics into this field.
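A minimal sketch of the semantic-translation idea, assuming hypothetical metadata fields for two domain views and a shared core (ebRIM-like) layer; plain Python dictionaries stand in for the paper's SWRL model-to-model mapping rules.

```python
# Hypothetical domain-specific metadata fields mapped onto a shared core
# (ebRIM-like) layer; plain dictionaries stand in for SWRL mapping rules.
JUDICIAL_TO_CORE = {
    "caseNumber":  "registryobject:externalIdentifier",
    "courtName":   "registryobject:submittingOrganization",
    "verdictDate": "registryobject:creationDate",
}
ARCHIVE_FROM_CORE = {
    "registryobject:externalIdentifier":     "recordId",
    "registryobject:submittingOrganization": "creatingAgency",
    "registryobject:creationDate":           "dateCreated",
}

def translate(metadata, to_core, from_core):
    """Translate one domain's metadata view into another via the core layer."""
    core_view = {to_core[k]: v for k, v in metadata.items() if k in to_core}
    return {from_core[k]: v for k, v in core_view.items() if k in from_core}

judicial_doc = {
    "caseNumber": "K-2024-118",
    "courtName": "Appellate Court",
    "verdictDate": "2024-05-17",
}
print(translate(judicial_doc, JUDICIAL_TO_CORE, ARCHIVE_FROM_CORE))
```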

7.
8.
The increasing amount of unstructured text published on the Web is demanding new tools and methods to automatically process and extract relevant information. Traditional information extraction has focused on harvesting domain-specific, pre-specified relations, which usually requires manual labor and heavy machinery; especially in the biomedical domain, the main efforts have been directed toward the recognition of well-defined entities such as genes or proteins, which constitutes the basis for extracting the relationships between the recognized entities. The intrinsic features and scale of the Web demand new approaches able to cope with the diversity of documents, where the number of relations is unbounded and not known in advance. This paper presents a scalable method for the extraction of domain-independent relations from text that exploits the knowledge in the semantic annotations. The method is not geared to any specific domain (e.g., protein–protein interactions and drug–drug interactions) and does not require any manual input or deep processing. Moreover, the method uses the extracted relations to compute groups of abstract semantic relations characterized by their signature types and synonymous relation strings. This constitutes a valuable source of knowledge when constructing formal knowledge bases, as we enable seamless integration of the extracted relations with the available knowledge resources through the process of semantic annotation. The proposed approach has successfully been applied to a large text collection in the biomedical domain and the results are very encouraging.  相似文献   
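The grouping step described above can be sketched as follows, assuming a handful of invented biomedical extractions whose arguments already carry semantic annotations (types); triples with the same (subject type, object type) signature are collected into one abstract relation together with its synonymous relation strings.

```python
from collections import defaultdict

# Hypothetical extractions: (subject, relation string, object), plus semantic
# annotations mapping each argument to a type from a reference vocabulary.
extractions = [
    ("aspirin",       "inhibits",       "COX-1"),
    ("ibuprofen",     "blocks",         "COX-2"),
    ("TNF-alpha",     "is produced by", "macrophages"),
    ("interleukin-6", "is secreted by", "T cells"),
]
annotations = {
    "aspirin": "Drug", "ibuprofen": "Drug",
    "COX-1": "Protein", "COX-2": "Protein",
    "TNF-alpha": "Protein", "interleukin-6": "Protein",
    "macrophages": "Cell", "T cells": "Cell",
}

def group_by_signature(triples, types):
    """Group relation strings by their (subject type, object type) signature,
    yielding abstract relations with their synonymous surface forms."""
    groups = defaultdict(set)
    for subj, rel, obj in triples:
        signature = (types[subj], types[obj])
        groups[signature].add(rel)
    return groups

for signature, synonyms in group_by_signature(extractions, annotations).items():
    print(signature, "->", sorted(synonyms))
```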

9.
INTERACTIVE SEMANTIC ANALYSIS OF TECHNICAL TEXTS
Sentence syntax is the basis for organizing semantic relations in TANKA, a project that aims to acquire knowledge from technical text. Other hallmarks include an absence of precoded domain-specific knowledge; significant use of public-domain generic linguistic information sources; involvement of the user as a judge and source of expertise; and learning from the meaning representations produced during processing. These elements shape the realization of the TANKA project: implementing a trainable text processing system to propose correct semantic interpretations to the user. A three-level model of sentence semantics, including a comprehensive Case system, provides the framework for TANKA's representations. Text is first processed by the DIPETT parser, which can handle a wide variety of unedited sentences. The semantic analysis module HAIKU then semi-automatically extracts semantic patterns from the parse trees and composes them into domain knowledge representations. HAIKU's dictionaries and main algorithm are described with the aid of examples and traces of user interaction. Encouraging experimental results are described and evaluated.  相似文献   

10.
This paper introduces an ontology-based semantic integration method for the data warehouse field. First, a domain ontology and local ontologies for the data sources are built; a global ontology for the data sources is then obtained through a mapping algorithm between the concept trees corresponding to the local ontologies, and this global ontology is in turn mapped to the domain ontology to obtain the mapping relations. Finally, implicit semantic relations are derived through ontology reasoning, and the resulting semantic relations are used to guide the extraction, transformation and loading process, achieving data integration at the semantic level in the data warehouse.
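A minimal sketch of the concept-tree mapping step, with two invented local ontologies represented as child-to-parent trees and a crude lexical label similarity standing in for the mapping algorithm referred to in the abstract.

```python
from difflib import SequenceMatcher

# Two hypothetical local ontologies expressed as concept trees (child -> parent).
source_tree = {"CustomerOrder": "Sales", "OrderLine": "CustomerOrder", "Sales": None}
target_tree = {"PurchaseOrder": "Selling", "OrderItem": "PurchaseOrder", "Selling": None}

def label_similarity(a, b):
    """Crude lexical similarity between concept labels (a stand-in for the
    concept-tree mapping algorithm mentioned in the abstract)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def map_concepts(src, tgt, threshold=0.5):
    """Propose source -> target concept correspondences above a threshold."""
    mappings = []
    for s in src:
        best = max(tgt, key=lambda t: label_similarity(s, t))
        score = label_similarity(s, best)
        if score >= threshold:
            mappings.append((s, best, round(score, 2)))
    return mappings

for s, t, score in map_concepts(source_tree, target_tree):
    print(f"{s} <-> {t}  (similarity {score})")
```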

11.
Abstract: As ontologies become more prevalent for information management, the need to manage the ontologies themselves increases. Multiple organizations within a domain often join forces to work on specific projects. When separate organizations come together to communicate, an alignment of terminology and semantics is required. Ontology creation is often carried out privately by these individual organizations to represent their own view of the domain. This creates problems with alignment and integration, making it necessary to consider how much each ontology should influence the current decision to be made. To assist with determining influence, a trust-based approach on authors and their ontologies provides a mechanism for ranking reasoning results. A representation of the authors and of the individual resources they provide for the merged ontology becomes necessary. The authors are weighted by trust, and the trust of the resources they provide to the ontology is then calculated. This is used to assist the integration process, allowing an evolutionary trust model to calculate the level of credibility of resources. Once the integration is complete, semantic agreement between ontologies allows for the revision of the authors' trust.
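A toy sketch of the trust-based ranking idea, with invented author trust scores and inferred statements; the average-trust aggregation is only an illustrative choice, not the model described in the paper.

```python
# Hypothetical trust scores for the authors contributing to a merged ontology.
author_trust = {"alice": 0.9, "bob": 0.6, "carol": 0.3}

# Reasoning results from the merged ontology, each supported by the resources
# (axioms/assertions) contributed by one or more authors.
results = [
    ("Pump subClassOf RotatingEquipment",  ["alice"]),
    ("Valve subClassOf RotatingEquipment", ["carol"]),
    ("Compressor subClassOf Pump",         ["bob", "carol"]),
]

def credibility(supporters):
    """Aggregate trust of the supporting authors (here: a simple average)."""
    return sum(author_trust[a] for a in supporters) / len(supporters)

# Rank inferred statements so low-credibility conclusions surface for review.
ranked = sorted(results, key=lambda r: credibility(r[1]), reverse=True)
for statement, supporters in ranked:
    print(f"{credibility(supporters):.2f}  {statement}  (from {supporters})")
```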

12.
Support for semantic content is becoming more common in Web-accessible information systems. We see this support emerging with the use of ontologies and machine-readable, annotated documents. The practice of domain modeling coupled with the extraction of domain-specific, contextually relevant metadata also supports the use of semantics. These advancements enable knowledge discovery approaches that define complex relationships between data that is autonomously collected and managed. The InfoQuilt (One of the incarnations of the InfoQuilt system, as applied to geographic information as part of the NSF Digital Library II initiative, is the ADEPT-UGA system [Ade]. This research was funded in part by National Science Foundation grant IIS-9817432.) system supports one such knowledge discovery approach. This paper presents (parts of) the InfoQuilt system with a focus on its use for modeling and utilizing complex semantic inter-domain relationships to enable human-assisted knowledge discovery over Web-accessible heterogeneous data. This includes the specification and execution of Information Scale (IScapes), a semantically rich information request and correlation mechanism. Edited by: L. Liu. Received: October 28, 2001 / Accepted: June 1, 2002 / Published online: September 25, 2002

13.
《Information Systems》2006,31(4-5):321-339
An essential element in defining the semantics of Web services is the domain knowledge. Medical informatics is one of the few domains to have considerable domain knowledge exposed through standards. These standards offer significant value in terms of expressing the semantics of Web services in the healthcare domain. In this paper, we describe the architecture of the Artemis project, which exploits ontologies based on the domain knowledge exposed by healthcare information standards from standards bodies such as HL7, CEN TC251, ISO TC215 and GEHR. We use these standards for two purposes: first, to describe the Web service functionality semantics, that is, the meaning associated with what a Web service does; and second, to describe the meaning associated with the messages or documents exchanged through Web services. The Artemis Web service architecture uses ontologies to describe semantics, but it does not propose globally agreed ontologies; rather, healthcare institutes reconcile their semantic differences through a mediator component. The mediator component uses ontologies based on prominent healthcare standards as references to facilitate semantic mediation among the involved institutes. Mediators have a P2P communication architecture to provide scalability and to facilitate the discovery of other mediators.

14.
This paper addresses the problem of handling semantic heterogeneity during database schema integration. We focus on the semantics of terms used as identifiers in schema definitions. Our solution does not rely on the names of the schema elements or the structure of the schemas. Instead, we utilize formal ontologies consisting of intensional definitions of terms represented in a logical language. The approach is based on similarity relations between intensional definitions in different ontologies. We present the definitions of similarity relations based on intensional definitions in formal ontologies. The extensional consequences of intensional relations are addressed. The paper shows how similarity relations are discovered by a reasoning system using a higher-level ontology. These similarity relations are then used to derive an integrated schema in two steps. First, we show how to use similarity relations to generate the class hierarchy of the global schema. Second, we explain how to enhance the class definitions with attributes. This approach reduces the cost of generating or re-generating global schemas for tightly-coupled federated databases.  相似文献   
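A small sketch of the first integration step, under the simplifying assumption that intensional definitions can be approximated as sets of required property constraints; set containment stands in for the similarity reasoning over formal ontologies, and the class and property names are hypothetical.

```python
# Intensional definitions approximated as sets of required property constraints.
definitions = {
    "Person":   {"hasName"},
    "Employee": {"hasName", "hasEmployer"},
    "Manager":  {"hasName", "hasEmployer", "managesTeam"},
    "Project":  {"hasTitle", "hasBudget"},
}

def subsumes(general, specific):
    """A definition subsumes another if every constraint it imposes is also
    imposed by the more specific definition (set containment as a stand-in
    for reasoning over intensional definitions)."""
    return definitions[general] < definitions[specific]

# Derive direct subclass edges for the global schema's class hierarchy.
edges = []
for sub in definitions:
    supers = [sup for sup in definitions if subsumes(sup, sub)]
    # Keep only the most specific superclasses (direct parents).
    direct = [s for s in supers
              if not any(subsumes(s, other) for other in supers if other != s)]
    for parent in direct:
        edges.append((sub, parent))

for sub, parent in sorted(edges):
    print(f"{sub} subClassOf {parent}")
```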

15.
Over the last decade, ontology engineering has been pursued by “learning” the ontology from domain-specific electronic documents. Most research works focus on the extraction of concepts and taxonomic relations, while the extraction of non-taxonomic relations is often neglected and not well researched. In this paper, we present a multi-phase correlation search framework to extract non-taxonomic relations from unstructured text. Our framework addresses the two main problems in any non-taxonomic relation extraction: (a) the discovery of non-taxonomic relations and (b) the labelling of non-taxonomic relations. First, our framework is capable of extracting correlated concepts beyond the ordinary search window of a single sentence. Interesting correlations are then filtered using association rule mining with the lift interestingness measure. Next, our framework distinguishes non-taxonomic concept pairs from taxonomic concept pairs based on an existing domain ontology. Finally, our framework uses domain-related verbs as labels for the non-taxonomic relations. The proposed framework has been tested on the marine biology domain. The results have been validated by domain experts, are reliable, and demonstrate a significant improvement over the traditional association rule approach in finding non-taxonomic relations in unstructured text.
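A small numeric sketch of the filtering step: for invented text segments and the concepts found in them, candidate concept pairs are kept when their lift exceeds 1 (positive correlation). The segments, concepts, and threshold are illustrative.

```python
from itertools import combinations

# Hypothetical text segments (e.g. multi-sentence windows) with the domain
# concepts found in each one.
segments = [
    {"dolphin", "sonar"}, {"dolphin", "sonar", "pod"}, {"dolphin", "pod"},
    {"coral", "reef"}, {"coral", "reef", "fish"}, {"fish", "reef"},
    {"dolphin", "fish"}, {"coral", "fish"},
]
n = len(segments)

def support(*concepts):
    """Fraction of segments containing all the given concepts."""
    return sum(all(c in seg for c in concepts) for seg in segments) / n

def lift(a, b):
    """lift(A, B) = P(A and B) / (P(A) * P(B)); > 1 means positive correlation."""
    return support(a, b) / (support(a) * support(b))

concepts = sorted(set().union(*segments))
for a, b in combinations(concepts, 2):
    if support(a, b) > 0 and lift(a, b) > 1.0:
        print(f"{a} -- {b}: lift = {lift(a, b):.2f}")
```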

16.
Currently, most of the information available on the Web is adapted primarily for human consumption, but there is now so much information, in both digital and physical formats, that it can no longer be processed by a person in a reasonable time. The idea of the Semantic Web arose to solve this problem: it deals with adding machine-readable information to Web pages. Ontologies are a very important element of this Web, as they provide a valid and robust structure to represent knowledge based on concepts, relations, axioms, etc. The need to overcome the bottleneck caused by the manual construction of ontologies has generated several studies and a body of research on semi-automatic methods for learning ontologies. In this context, this paper proposes a new ontology learning methodology based on semantic role labeling over digital Spanish documents. The method makes it possible to represent multiple semantic relations, especially taxonomic and partonomic ones, in the standardized OWL 2.0. A set of experiments performed with the approach, implemented in the educational domain, shows promising results.
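A minimal rdflib-based sketch of the final representation step, assuming that a semantic-role-labeling stage has already produced normalized (subject, cue, object) tuples from Spanish text; the cue-to-relation mapping and class names are invented, and a plain object property stands in for whatever partonomic modelling the paper actually uses.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS, OWL

# Hypothetical output of a semantic-role-labeling step over Spanish text,
# already normalized to (subject, cue, object) tuples.
srl_tuples = [
    ("Triangulo", "es_un", "Poligono"),   # taxonomic cue ("is a")
    ("Lado", "parte_de", "Poligono"),     # partonomic cue ("part of")
]

EX = Namespace("http://example.org/ontology#")
g = Graph()
g.bind("ex", EX)

# A partonomic relation is not built into OWL, so declare one explicitly.
g.add((EX.parteDe, RDF.type, OWL.ObjectProperty))

for subj, cue, obj in srl_tuples:
    g.add((EX[subj], RDF.type, OWL.Class))
    g.add((EX[obj], RDF.type, OWL.Class))
    if cue == "es_un":
        g.add((EX[subj], RDFS.subClassOf, EX[obj]))   # taxonomic relation
    elif cue == "parte_de":
        g.add((EX[subj], EX.parteDe, EX[obj]))        # partonomic relation

for s, p, o in sorted(g):
    print(s, p, o)
```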

17.
Research on Data Cleaning Based on Domain Ontology
王浩  徐宏炳 《计算机工程与设计》2006,27(22):4274-4276,4280
The semantic problems arising in the data cleaning process are classified, and the notions of a domain concept tree and a precision-level node set are proposed on the basis of a domain ontology. Combining the domain concept tree with the precision-level node set, a data cleaning method based on the domain ontology is given. By exploiting the semantic information contained in the domain ontology, this method improves the quality of data cleaning. Compared with traditional data cleaning methods, the method interacts only with the ontology-based domain model and is not restricted to a particular domain, so it is more extensible and achieves higher data cleaning quality.
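A minimal sketch of the cleaning idea, with a hypothetical domain concept tree and raw values; a precision level selects the tree depth at which values are canonicalized.

```python
# A hypothetical domain concept tree (child -> parent); the root has no parent.
concept_tree = {
    "Beverage": None,
    "Tea": "Beverage", "Coffee": "Beverage",
    "Green tea": "Tea", "Black tea": "Tea",
    "Espresso": "Coffee", "Latte": "Coffee",
}

def ancestors(concept):
    """Return the chain from the concept up to the root."""
    chain = [concept]
    while concept_tree[concept] is not None:
        concept = concept_tree[concept]
        chain.append(concept)
    return chain

def depth(concept):
    return len(ancestors(concept)) - 1

def clean(value, precision_level):
    """Lift a raw value to its ancestor at the requested precision level
    (0 = root); values already above that level fall back to the root."""
    chain = ancestors(value)
    for node in chain:
        if depth(node) == precision_level:
            return node
    return chain[-1]

raw_values = ["Green tea", "Espresso", "Black tea", "Latte"]
print([clean(v, precision_level=1) for v in raw_values])  # Tea/Coffee level
```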

18.
Provenance information in eScience is metadata that's critical to effectively manage the exponentially increasing volumes of scientific data from industrial-scale experiment protocols. Semantic provenance, based on domain-specific provenance ontologies, lets software applications unambiguously interpret data in the correct context. The semantic provenance framework for eScience data comprises expressive provenance information and domain-specific provenance ontologies and applies this information to data management. The authors' "two degrees of separation" approach advocates the creation of high-quality provenance information using specialized services. In contrast to workflow engines generating provenance information as a core functionality, the specialized provenance services are integrated into a scientific workflow on demand. This article describes an implementation of the semantic provenance framework for glycoproteomics.  相似文献   

19.
Abstract: Managing multiple ontologies is now a core question in most applications that require semantic interoperability, and the Semantic Web is surely the most significant illustration of this: the current challenge is not to design, develop and deploy domain ontologies but to define semantic correspondences among multiple ontologies covering overlapping domains. In this paper, we introduce a new approach to ontology matching, named axiom-based ontology matching. As this approach is founded on the use of axioms, it is mainly dedicated to heavyweight ontologies, but it can also be applied to lightweight ontologies as a complement to current techniques based on the analysis of natural language expressions, instances and/or taxonomical structures of ontologies. This new matching paradigm is defined in the context of the conceptual graphs model, where the projection (i.e. the main operator for reasoning with conceptual graphs, which corresponds to graph homomorphism) is used as a means to semantically match the concepts and the relations of two ontologies through the explicit representation of the axioms in terms of conceptual graphs. We also introduce an ontology of representation, called MetaOCGL, dedicated to reasoning about heavyweight ontologies at the meta-level.
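A compact sketch of the projection operator in the sense described above: a query conceptual graph projects into a target graph if its nodes can be mapped to label-compatible (more specific) nodes while preserving every labelled edge. The type hierarchy and graphs are invented, and brute-force search stands in for a real projection algorithm.

```python
from itertools import product

# A tiny concept-type hierarchy (subtype -> supertype); a node with a more
# specific type may support a node with a more general type.
hierarchy = {"Car": "Vehicle", "Vehicle": "Entity", "Person": "Entity"}

def is_subtype(sub, sup):
    while sub is not None:
        if sub == sup:
            return True
        sub = hierarchy.get(sub)
    return False

# Conceptual graphs as (nodes: id -> type, edges: {(src, relation, dst)}).
query  = ({"x": "Person", "y": "Vehicle"}, {("x", "owns", "y")})
target = ({"p1": "Person", "c1": "Car", "c2": "Car"},
          {("p1", "owns", "c1"), ("p1", "drives", "c2")})

def project(query_graph, target_graph):
    """Return one projection (query node -> target node) if it exists:
    target node types must specialize the query types and every query edge
    must have an image edge with the same relation label."""
    (q_nodes, q_edges), (t_nodes, t_edges) = query_graph, target_graph
    q_ids = list(q_nodes)
    for image in product(t_nodes, repeat=len(q_ids)):
        mapping = dict(zip(q_ids, image))
        if not all(is_subtype(t_nodes[mapping[q]], q_nodes[q]) for q in q_ids):
            continue
        if all((mapping[s], r, mapping[d]) in t_edges for s, r, d in q_edges):
            return mapping
    return None

print(project(query, target))   # e.g. {'x': 'p1', 'y': 'c1'}
```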

20.
Ontologies represent domain concepts and relations in the form of a semantic network. Many research works use ontologies for information matchmaking and retrieval. This trend is further accelerated by the convergence of various information sources supported by ontologies. In this paper, we propose a novel multi-modality ontology model that integrates both low-level image features and high-level text information to represent image contents for image retrieval. By embedding this ontology into an image retrieval system, we are able to realize intelligent image retrieval with high precision. Moreover, benefiting from the soft-coded ontology model, the system has good flexibility and can easily be extended to larger domains. Currently, our experiments are conducted on the canine subset of the animal domain. An ontology has been built from the low-level features and the domain knowledge of canines. A prototype retrieval system has been set up to assess the performance. We compare our experimental results with a traditional text-based image search engine and demonstrate the advantages of our approach.
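A toy sketch of the fusion idea: each indexed image carries a low-level feature vector (here a tiny colour histogram) and the high-level ontology concepts attached to it, and a query is ranked by a weighted combination of feature similarity and concept overlap. The vectors, concepts, and weight alpha are made up.

```python
import numpy as np

# Hypothetical image index: low-level features (a tiny "color histogram")
# plus high-level concepts attached via the domain ontology.
images = {
    "img_husky.jpg":  (np.array([0.7, 0.2, 0.1]), {"Canine", "Husky", "Snow"}),
    "img_beagle.jpg": (np.array([0.3, 0.5, 0.2]), {"Canine", "Beagle"}),
    "img_tabby.jpg":  (np.array([0.4, 0.4, 0.2]), {"Feline", "Tabby"}),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def concept_overlap(query_concepts, image_concepts):
    return len(query_concepts & image_concepts) / len(query_concepts)

def score(query_hist, query_concepts, alpha=0.5):
    """Fuse low-level similarity and high-level concept match (the weight
    alpha is illustrative); higher scores rank first."""
    ranked = []
    for name, (hist, concepts) in images.items():
        s = alpha * cosine(query_hist, hist) + \
            (1 - alpha) * concept_overlap(query_concepts, concepts)
        ranked.append((round(s, 3), name))
    return sorted(ranked, reverse=True)

print(score(np.array([0.6, 0.3, 0.1]), {"Canine", "Husky"}))
```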

