Similar documents
20 similar documents found (search time: 31 ms)
1.
In the study of data exchange one usually assumes an open-world semantics, making it possible to extend instances of target schemas. An alternative closed-world semantics only moves ‘as much data as needed’ from the source to the target to satisfy constraints of a schema mapping. It avoids some of the problems exhibited by the open-world semantics, but limits the expressivity of schema mappings. Here we propose a mixed approach: one can designate different attributes of target schemas as open or closed, to combine the additional expressivity of the open-world semantics with the better behavior of query answering in closed worlds. We define such schema mappings, and show that they cover a large space of data exchange solutions with two extremes being the known open and closed-world semantics. We investigate the problems of query answering and schema mapping composition, and prove two trichotomy theorems, classifying their complexity based on the number of open attributes. We find conditions under which schema mappings compose, extending known results to a wide range of closed-world mappings. We also provide results for restricted classes of queries and mappings guaranteeing lower complexity.
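To make the mixed semantics concrete, here is a minimal sketch in Python (illustrative only; the names, the None-as-null convention, and the admissibility test are simplifications, not the paper's formalism). A candidate target instance is accepted when it agrees with a canonical instance on every position annotated closed, while open positions may be filled in freely:

```python
# A minimal sketch of per-attribute open/closed annotations on a target
# relation (simplified reading, not the paper's definitions).

OPEN, CLOSED = "open", "closed"

def matches(canon_tuple, cand_tuple, annotations):
    """cand_tuple realizes canon_tuple if they agree on all closed positions."""
    return all(c == d
               for c, d, a in zip(canon_tuple, cand_tuple, annotations)
               if a == CLOSED)

def admissible(canonical, candidate, annotations):
    # Every canonical tuple must be realized by some candidate tuple ...
    realized = all(any(matches(t, u, annotations) for u in candidate)
                   for t in canonical)
    # ... and every candidate tuple must be justified by a canonical one
    # (so fully closed relations admit nothing beyond the canonical data).
    justified = all(any(matches(t, u, annotations) for t in canonical)
                    for u in candidate)
    return realized and justified

# Target relation R(name, dept, phone): phone is open, the rest closed.
ann = (CLOSED, CLOSED, OPEN)
canonical = {("ann", "cs", None)}            # None stands for a labeled null
candidate = {("ann", "cs", "555-0100")}      # invents data only in the open slot
print(admissible(canonical, candidate, ann)) # True
```

In this toy reading, annotating every attribute closed pins the candidate to the canonical data, while annotating every attribute open leaves it nearly unconstrained, roughly mirroring the two extremes mentioned in the abstract.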

2.
The rapid growth of the Internet has made the integration of multiple data sources increasingly important. However, heterogeneity in data structure and semantics across different data sources makes data integration quite difficult. This paper proposes a query plan generation algorithm based on schema mapping. Given correctly defined mapping rules, the algorithm automatically constructs query plans according to different query conditions and different source schemas, and guarantees that the resulting data satisfies the structural and referential-integrity requirements of the target schema.
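As a hedged illustration of such mapping-driven plan generation (a GAV-style unfolding; the relation names, column maps, and generated SQL text are invented, not the paper's algorithm):

```python
# A sketch of generating a source query plan from schema-mapping rules:
# each target relation is defined over a source GAV-style, and a target
# query unfolds into a plan over whichever source its conditions touch.

# Mapping rules: target relation -> (source relation, {target col: source col})
rules = {
    "Customer": ("crm.clients", {"id": "cid", "name": "full_name"}),
    "Order":    ("erp.orders",  {"id": "oid", "cust": "client_id"}),
}

def generate_plan(target_rel, conditions):
    source, colmap = rules[target_rel]
    where = " AND ".join(f"{colmap[c]} = {v!r}" for c, v in conditions.items())
    cols = ", ".join(f"{s} AS {t}" for t, s in colmap.items())
    return f"SELECT {cols} FROM {source}" + (f" WHERE {where}" if where else "")

print(generate_plan("Customer", {"name": "Ann"}))
# SELECT cid AS id, full_name AS name FROM crm.clients WHERE full_name = 'Ann'
```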

3.
Schema mappings are high-level specifications that describe the relationship between database schemas. They are an important tool in several areas of database research, notably in data integration and data exchange. However, a concrete theory of schema mapping optimization, including the formulation of optimality criteria and the construction of algorithms for computing optimal schema mappings, is completely lacking to date. The goal of this work is to fill this gap. We start by presenting a system of rewrite rules to minimize sets of source-to-target tuple-generating dependencies. Moreover, we show that the result of this minimization is unique up to variable renaming. Hence, our optimization also yields a schema mapping normalization. By appropriately extending our rewrite rule system, we also provide a normalization of schema mappings containing equality-generating target dependencies. An important application of such a normalization is in defining the semantics of query answering in data exchange, since several definitions in this area depend on the concrete syntactic representation of the mappings. This is, in particular, the case for queries with negated atoms and for aggregate queries. The normalization of schema mappings allows us to eliminate the effect of the concrete syntactic representation of the mapping from the semantics of query answering. We discuss in detail how our results can be fruitfully applied to aggregate queries.
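One rewrite rule used in this kind of minimization can be sketched compactly (a simplified reading; tgds are plain Python tuples here, not the paper's machinery): a source-to-target tgd whose conclusion atoms fall into groups sharing no existential variable can be split into one tgd per group.

```python
# Split a source-to-target tgd  phi(x) -> exists y . psi(x, y)  whose
# conclusion decomposes into atoms sharing no existential variable.
# Atom = (predicate, (terms...)); variables are plain strings.

def split_tgd(premise, conclusion, exist_vars):
    """Partition conclusion atoms into components connected by shared
    existential variables; each component becomes its own tgd."""
    comps = []  # list of (atoms, existential vars occurring in them)
    for atom in conclusion:
        ys = {t for t in atom[1] if t in exist_vars}
        merged, rest = [atom], []
        for comp, comp_ys in comps:
            if ys & comp_ys:        # overlapping component: merge it in
                merged.extend(comp)
                ys |= comp_ys
            else:
                rest.append((comp, comp_ys))
        comps = rest + [(merged, ys)]
    return [(premise, atoms, sorted(ys)) for atoms, ys in comps]

# S(x1,x2) -> exists y,z . T(x1,y), U(y,x2), V(x2,z)
premise = [("S", ("x1", "x2"))]
conclusion = [("T", ("x1", "y")), ("U", ("y", "x2")), ("V", ("x2", "z"))]
for tgd in split_tgd(premise, conclusion, {"y", "z"}):
    print(tgd)
# T and U stay together (they share y); V becomes a tgd of its own.
```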

4.
Preserving mapping consistency under schema changes
In dynamic environments like the Web, data sources may change not only their data but also their schemas, their semantics, and their query capabilities. When a schema change leaves a mapping inconsistent, the inconsistency has to be detected and the mapping updated. We present a novel framework and a tool (ToMAS) for automatically adapting (rewriting) mappings as schemas evolve. Our approach considers not only local changes to a schema but also changes that may affect and transform many components of a schema. Our algorithm detects mappings affected by structural or constraint changes and generates all the rewritings that are consistent with the semantics of the changed schemas. Our approach explicitly models mapping choices made by a user and maintains these choices, whenever possible, as the schemas and mappings evolve. When there is more than one candidate rewriting, the algorithm may rank them based on how close they are to the semantics of the existing mappings.
Received: 13 January 2004 · Accepted: 26 March 2004 · Published online: 12 August 2004 · Edited by: M. Carey

5.
Xyleme is a large-scale warehouse integrating XML data from the Web. Xyleme considers a simple data model with data trees and tree types for describing the data sources, and a simple query language based on tree queries with Boolean conditions. The main components of the data model are a mediated schema, modeled by an abstract tree type that serves as a view over a set of tree types associated with actual data trees (called concrete tree types), and a mapping expressing the connection between the mediated schema and the concrete tree types. The first contribution of this paper is formal: we provide a declarative model-theoretic semantics for Xyleme tree queries, a way of checking tree query containment, and a characterization of tree queries as a composition of branch queries. The other contributions are algorithmic and handle the potentially huge size of the mapping relation, which is a crucial issue for semantic integration and query evaluation in Xyleme. First, we propose a method for pre-evaluating queries at compile time by storing specific meta-information about the mapping in map translation tables. These map translation tables summarize the set of all branch queries that can be generated from the mediated schema and the set of all mappings. Then, we propose different operators and strategies for relaxing queries which, having an empty map translation table, would have no answer if evaluated against the data. Finally, we present a method for semi-automatically generating the mapping relation.
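The compile-time pre-evaluation can be illustrated with a small hedged sketch (the paths and table contents are invented; Xyleme's real translation tables and relaxation operators are richer):

```python
# Branch queries over the mediated schema are pre-mapped to concrete
# paths; a branch query with an empty translation table can be relaxed
# at compile time, before touching any data.

translation = {
    "book/title":       ["novel/name", "publication/title"],
    "book/author/name": ["novel/writer"],
}

def translate(branch):
    """Look up the pre-computed concrete paths for an abstract branch."""
    return translation.get(branch, [])

def relax(branch):
    """Drop the final step -- one possible relaxation operator among several."""
    return branch.rsplit("/", 1)[0]

query = "book/author/name/first"
while not translate(query) and "/" in query:
    print("empty translation table for", query, "-> relaxing")
    query = relax(query)
print(query, "->", translate(query))   # book/author/name -> ['novel/writer']
```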

6.
The task of combining data residing at different sources to provide the user a unified view is known as data integration. Schema mappings act as the glue between the global schema and the source schemas of a data integration system. Global-and-local-as-view (GLAV) is one of the approaches for specifying schema mappings. Tableaux are used for expressing queries and functional dependencies on a single database. We investigate a general technique for expressing a GLAV mapping by a tabular structure called mapping assertion tableaux (MAT). In a similar way, we also express the tuple-generating dependency (tgd) and equality-generating dependency (egd) constraints by tabular forms, called tabular tgds (TTGDs) and tabular egds (TEGDs), respectively. A set consisting of the MATs, TTGDs and TEGDs is called schema mapping tableaux (SMT). We present algorithms that use the SMT as an operator on an instance of the source schema to produce an instance of the target schema. We show that the target instances computed by the SMT are ‘minimal’ and ‘most general’ in nature. We also define the notion of equivalence between the schema mappings of two data integration systems and present algorithms that optimize schema mappings through manipulation of the SMT.
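A hedged sketch of the operator view (plain Python tuples rather than the paper's tableaux; the chase-style step below is a simplification): premise variables are bound against source facts, and each existential variable in the conclusion receives a fresh labeled null, which is what keeps the computed target instance ‘most general’.

```python
# Apply one GLAV mapping assertion as an operator on a source instance.
from itertools import count

_counter = count()

def fresh():
    return f"N{next(_counter)}"   # a labeled null

def bindings(premise, source, b=None, i=0):
    """Enumerate all ways to match the premise atoms against the source."""
    b = b or {}
    if i == len(premise):
        yield dict(b)
        return
    pred, terms = premise[i]
    for fact in source.get(pred, ()):
        b2 = dict(b)
        if all(b2.setdefault(t, v) == v for t, v in zip(terms, fact)):
            yield from bindings(premise, source, b2, i + 1)

def apply_assertion(premise, conclusion, source):
    target = {}
    for b in bindings(premise, source):
        nulls = {}   # one fresh null per existential variable per match
        for pred, terms in conclusion:
            row = tuple(b[t] if t in b else nulls.setdefault(t, fresh())
                        for t in terms)
            target.setdefault(pred, set()).add(row)
    return target

# Emp(name, dept) -> exists did . Dept(did, dept), Works(name, did)
src = {"Emp": [("ann", "cs"), ("bob", "ee")]}
print(apply_assertion([("Emp", ("n", "d"))],
                      [("Dept", ("i", "d")), ("Works", ("n", "i"))], src))
# {'Dept': {('N0','cs'), ('N1','ee')}, 'Works': {('ann','N0'), ('bob','N1')}}
```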

7.
The Web as a global information space is developing from a Web of documents to a Web of data. This development opens new ways for addressing complex information needs. Search is no longer limited to matching keywords against documents; instead, complex information needs can be expressed in a structured way, with precise answers as results. In this paper, we present Hermes, an infrastructure for data Web search that addresses a number of challenges involved in realizing search on the data Web. To provide an end-user oriented interface, we support expressive user information needs by translating keywords into structured queries. We integrate heterogeneous Web data sources with automatically computed mappings. Schema-level mappings are exploited in constructing structured queries against the integrated schema. These structured queries are decomposed into queries against the local Web data sources, which are then processed in a distributed way. Finally, heterogeneous result sets are combined using an algorithm called map join, making use of data-level mappings. In evaluation experiments with real-life data sets from the data Web, we show the practicability and scalability of the Hermes infrastructure.
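The final combination step can be sketched as follows (a simplified reading of the map-join idea; the function name, key arguments, and the sameAs-style mapping are illustrative, not Hermes' API):

```python
# Combine heterogeneous result sets with data-level mappings: same_as
# links identifiers of one source to identifiers of the other, replacing
# literal key equality in the join condition.

def map_join(left, right, left_key, right_key, same_as):
    """Join rows whose keys are linked by a data-level mapping."""
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    for row in left:
        for rid in same_as.get(row[left_key], []):
            for other in index.get(rid, []):
                yield {**row, **other}

dbpedia  = [{"res": "dbp:Berlin", "population": 3_700_000}]
geonames = [{"gid": "gn:2950159", "lat": 52.52, "lon": 13.40}]
same_as  = {"dbp:Berlin": ["gn:2950159"]}
print(list(map_join(dbpedia, geonames, "res", "gid", same_as)))
```

Indexing the right-hand result set keeps the combination linear in the size of the inputs plus the mapping, which matters when result sets come from many distributed sources.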

8.
This research investigates an approach to query processing in a multidatabase system that uses an object-oriented model to capture the semantics of other data models. The object-oriented model is used to construct a global schema, defining an integrated view of the different schemas in the environment. The model is also used as a self-describing model to build a meta-database for storing information about the global schema. A unique aspect of this work is that the object-oriented model is used to describe the different data models of the multidatabase environment, thereby extending the meta-database with semantic information about the local schemas. With the global and local schemas all represented in an object-oriented form, structural mappings between the global schema and each local schema are then easily supported. An object algebra then provides a query language for expressing global queries, using the structural mappings to translate object algebra queries into SQL queries over the local relational schemas. The advantage of using an object algebra is that the object-oriented database can be viewed as a blackboard for temporary storage of local data and for establishing relationships between different databases. The object algebra can be used to directly retrieve temporarily stored data from the object-oriented database or to transparently retrieve data from local sources using the translation process described in this paper.

9.
10.
Currently, a good portion of the datasets on the Internet are accessed through data services, where users' queries are answered as a composition of multiple data services. Defining the semantics of data services is the first step towards automating their composition. An interesting approach to defining the semantics of data services is to describe them as semantic views over a domain ontology. However, defining such semantic views cannot always be done with certainty, especially when the service's returned data are too complex. In such cases, a data service is associated with several possible semantic views. In addition, complex correlations may be present among these possible semantic views, mainly when data services encapsulate the same data sources. In this paper, we propose a probabilistic approach to model the semantic uncertainty of data services. Services along with their possible semantic views are represented in a probabilistic service registry. The correlations among service semantics are modeled through a directed probabilistic graphical model (a Bayesian network). Based on this modeling, we study the problem of composing correlated data services to answer a user query, and propose an efficient method to compute the different possible compositions and their probabilities.
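A toy sketch of ranking compositions under semantic uncertainty (the services, views, and probabilities are invented; the paper derives the joint distribution from a Bayesian network rather than listing it explicitly as done here):

```python
# Each joint choice of semantic views induces one candidate composition;
# summing the joint probabilities ranks the compositions.
from itertools import product
from collections import defaultdict

# Possible semantic views per service (names illustrative).
views = {
    "S1": ["doctors(x)", "nurses(x)"],
    "S2": ["treats(x,y)", "supervises(x,y)"],
}
# Explicit joint distribution over view choices, encoding correlation:
joint = {
    ("doctors(x)", "treats(x,y)"):     0.56,
    ("doctors(x)", "supervises(x,y)"): 0.14,
    ("nurses(x)",  "treats(x,y)"):     0.06,
    ("nurses(x)",  "supervises(x,y)"): 0.24,
}

def composition_probabilities(services):
    probs = defaultdict(float)
    for choice in product(*(views[s] for s in services)):
        plan = " JOIN ".join(choice)    # stand-in for a real composition plan
        probs[plan] += joint.get(choice, 0.0)
    return sorted(probs.items(), key=lambda kv: -kv[1])

for plan, p in composition_probabilities(["S1", "S2"]):
    print(f"{p:.2f}  {plan}")
```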

11.
The recently introduced Datalog+/- family of ontology languages is especially useful for representing and reasoning over lightweight ontologies, and is set to play a central role in the context of query answering and information extraction for the Semantic Web. Recently, it has become apparent that it is necessary to develop a principled way to handle uncertainty in this domain. In addition to uncertainty as an inherent aspect of the Web, one must also deal with forms of uncertainty due to inconsistency and incompleteness, uncertainty resulting from automatically processing Web data, as well as uncertainty stemming from the integration of multiple heterogeneous data sources. In this paper, we take an important step in this direction by developing a probabilistic extension of Datalog+/-. This extension uses Markov logic networks as the underlying probabilistic semantics. Here, we focus especially on scalable algorithms for answering threshold queries, which correspond to the question “what is the set of all ground atoms that are inferred from a given probabilistic ontology with a probability of at least p?”. These queries are especially relevant to Web information extraction, since uncertain rules lead to uncertain facts, and only information with a certain minimum confidence is desired. We present several algorithms, namely a basic approach, an anytime one, and one based on heuristics, which is guaranteed to return sound results. Furthermore, we also study inconsistency in probabilistic Datalog+/- ontologies. We propose two approaches for computing preferred repairs based on two different notions of distance between repairs, namely symmetric and score-based distance. We also study the complexity of the decision problems corresponding to computing such repairs, which turn out to be polynomial and NP-complete in the data complexity, respectively.
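A threshold query itself is simple once marginals are available; the hard, scalability-critical part the paper addresses is computing those marginals. A hedged sketch (atoms and probabilities invented):

```python
# Given marginal probabilities of ground atoms (e.g., as estimated by a
# Markov logic inference engine), keep the atoms inferred with
# probability at least p.

def threshold_query(marginals, p):
    """Return all ground atoms whose probability is at least p."""
    return {atom for atom, prob in marginals.items() if prob >= p}

marginals = {
    "worksFor(alice, ibm)": 0.92,
    "worksFor(alice, sap)": 0.31,
    "locatedIn(ibm, usa)":  0.99,
}
print(sorted(threshold_query(marginals, p=0.9)))
# ['locatedIn(ibm, usa)', 'worksFor(alice, ibm)']
```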

12.
13.
Peer Data Management Systems (PDMSs) are a novel, useful, but challenging paradigm for distributed data management and query processing. Conventional integrated information systems have a hierarchical structure with an integration component that manages a global schema and distributes queries against this schema to the underlying data sources. PDMSs are a natural extension of this architecture: each participating system (peer) may act both as a data source and as an integrator. Peers are interconnected by schema mappings, which guide the rewriting of queries between the heterogeneous schemas, and thus form a P2P (peer-to-peer)-like network.

Despite several years of research, the development of efficient PDMSs still holds many challenges. In this article we first survey the state of the art on peer data management: we classify PDMSs by characteristics concerning their system model, their semantics, their query planning schemes, and their maintenance. Then we systematically examine open research directions in each of those areas. In particular, we observe that research results from both the domain of P2P systems and of conventional distributed data management can have an impact on the development of PDMSs.

14.
The CQL continuous query language: semantic foundations and query execution
CQL, a continuous query language, is supported by the STREAM prototype data stream management system (DSMS) at Stanford. CQL is an expressive SQL-based declarative language for registering continuous queries against streams and stored relations. We begin by presenting an abstract semantics that relies only on “black-box” mappings among streams and relations. From these mappings we define a precise and general interpretation for continuous queries. CQL is an instantiation of our abstract semantics using SQL to map from relations to relations, window specifications derived from SQL-99 to map from streams to relations, and three new operators to map from relations to streams. Most of the CQL language is operational in the STREAM system. We present the structure of CQL's query execution plans as well as details of the most important components: operators, interoperator queues, synopses, and sharing of components among multiple operators and queries. Examples throughout the paper are drawn from the Linear Road benchmark recently proposed for DSMSs. We also curate a public repository of data stream applications that includes a wide variety of queries expressed in CQL. The relative ease of capturing these applications in CQL is one indicator that the language contains an appropriate set of constructs for data stream processing.
Edited by: M. Franklin
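The stream-relation mappings at the core of the abstract semantics can be sketched as follows (illustrative Python generators, not STREAM's executor; tuples are assumed distinct for brevity):

```python
# A row-based sliding window turns a stream into a sequence of relation
# states (stream-to-relation), and Istream turns relation states back
# into a stream of insertions (relation-to-stream).
from collections import deque

def row_window(stream, n):
    """Stream-to-relation: at each arrival, emit the relation holding
    the last n tuples ([Rows n] in CQL syntax)."""
    window = deque(maxlen=n)
    for tup in stream:
        window.append(tup)
        yield frozenset(window)

def istream(relation_states):
    """Relation-to-stream: emit each tuple when it first enters the relation."""
    previous = frozenset()
    for state in relation_states:
        for tup in state - previous:
            yield tup
        previous = state

ticks = [("msft", 10), ("goog", 20), ("msft", 11)]
for tup in istream(row_window(iter(ticks), n=2)):
    print(tup)   # every tick reappears on the output stream exactly once
```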

15.
Schema matching and value mapping across two heterogeneous information sources are critical tasks in applications involving data integration, data warehousing, and federation of databases. Before data can be integrated from multiple tables, the columns and the values appearing in the tables must be matched. The complexity of the problem grows quickly with the number of data attributes/columns to be matched and due to the multiple semantics of data values. Traditional research has tackled schema matching and value mapping independently. We propose a novel method that optimizes embedded value mappings to enhance schema matching in the presence of opaque data values and column names. In this approach, the fitness objective for matching a pair of attributes from two schemas depends on the value mapping function for each of the two attributes. Suitable fitness objectives include the Euclidean distance measure, which we use in our experimental study, as well as relative (cross) entropy. We propose a heuristic local descent optimization strategy that uses sorting and two-opt switching to jointly optimize value mappings and attribute matches. Our experiments show that the proposed technique outperforms earlier uninterpreted schema matching methods and thus should form a useful addition to a suite of (semi-)automated tools for resolving structural heterogeneity.
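The fitness idea can be sketched as follows (a greedy baseline with invented columns, not the authors' two-opt local descent; sorting the relative frequencies makes the comparison invariant to the actual opaque values):

```python
# Two opaque columns match well if their value-frequency distributions
# are close; Euclidean distance serves as the fitness objective.
from collections import Counter
from math import dist

def distribution(column):
    """Sorted relative frequencies -- invariant to renaming of values,
    which is what makes opaque columns comparable at all."""
    counts = Counter(column)
    total = sum(counts.values())
    return sorted((c / total for c in counts.values()), reverse=True)

def column_distance(col_a, col_b):
    da, db = distribution(col_a), distribution(col_b)
    pad = max(len(da), len(db))
    da += [0.0] * (pad - len(da))
    db += [0.0] * (pad - len(db))
    return dist(da, db)

# Opaque column names and coded values (illustrative).
schema_a = {"c1": ["m", "f", "m", "m"], "c2": ["x", "y", "z", "w"]}
schema_b = {"k1": ["0", "1", "0", "0"], "k2": ["a", "b", "c", "d"]}
for ca, va in schema_a.items():
    best = min(schema_b, key=lambda cb: column_distance(va, schema_b[cb]))
    print(ca, "->", best)   # c1 -> k1, c2 -> k2
```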

16.
Intuitively, data management and data integration tools should be well suited for exchanging information in a semantically meaningful way. Unfortunately, they suffer from two significant problems: they typically require a common and comprehensive schema design before they can be used to store or share information, and they are difficult to extend because schema evolution is heavyweight and may break backward compatibility. As a result, many large-scale data sharing tasks are more easily facilitated by non-database-oriented tools that have little support for semantics.

The goal of the peer data management system (PDMS) is to address this need: we propose the use of a decentralized, easily extensible data management architecture in which any user can contribute new data, schema information, or even mappings between other peers' schemas. PDMSs represent a natural step beyond data integration systems, replacing their single logical schema with an interlinked collection of semantic mappings between peers' individual schemas.

This paper considers the problem of schema mediation in a PDMS. Our first contribution is a flexible language for mediating between peer schemas that extends known data integration formalisms to our more complex architecture. We precisely characterize the complexity of query answering for our language. Next, we describe a reformulation algorithm for our language that generalizes both global-as-view and local-as-view query answering algorithms. Then we describe several methods for optimizing the reformulation algorithm and an initial set of experiments studying its performance. Finally, we define and consider several global problems in managing semantic mappings in a PDMS.
Received: 16 December 2002 · Accepted: 14 April 2003 · Published online: 12 December 2003 · Edited by: V. Atluri

17.
The Semantic Web envisions a World Wide Web in which data is described with rich semantics and applications can pose complex queries. To this point, researchers have defined new languages for specifying meanings for concepts and developed techniques for reasoning about them, using RDF as the data model. To flourish, the Semantic Web needs to provide interoperability, both between sites with different terminologies and with existing data and the applications operating on them. To achieve this, we are faced with two problems. First, most of the world's data is available not in RDF but in XML; XML and the applications consuming it rely not only on the domain structure of the data, but also on its document structure. Hence, to provide interoperability between such sources, we must map between both their domain structures and their document structures. Second, data management practitioners often prefer to exchange data through local point-to-point data translations, rather than by mapping to common mediated schemas or ontologies.

This paper describes the Piazza system, which addresses these challenges. Piazza offers a language for mediating between data sources on the Semantic Web, and it maps both the domain structure and the document structure. Piazza also enables interoperation of XML data with RDF data that is accompanied by rich OWL ontologies. Mappings in Piazza are provided at a local scale between small sets of nodes, and our query answering algorithm is able to chain sets of mappings together to obtain relevant data from across the Piazza network. We also describe an implemented scenario in Piazza and the lessons we learned from it.

18.
One of the main steps toward integration or exchange of data is to design the mappings that describe the (often complex) relationships between the source schemas or formats and the desired target schema. In this paper, we introduce a new operator, called MapMerge, that can be used to correlate multiple, independently designed schema mappings of smaller scope into larger schema mappings. This allows a more modular construction of complex mappings from various types of smaller mappings such as schema correspondences produced by a schema matcher or pre-existing mappings that were designed by either a human user or via mapping tools. In particular, the new operator also enables a new “divide-and-merge” paradigm for mapping creation, where the design is divided (on purpose) into smaller components that are easier to create and understand and where MapMerge is used to automatically generate a meaningful overall mapping. We describe our MapMerge algorithm and demonstrate the feasibility of our implementation on several real and synthetic mapping scenarios. In our experiments, we make use of a novel similarity measure between two database instances with different schemas that quantifies the preservation of data associations. We show experimentally that MapMerge improves the quality of the schema mappings, by significantly increasing the similarity between the input source instance and the generated target instance. Finally, we provide a new algorithm that combines MapMerge with schema mapping composition to correlate flows of schema mappings.
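A hedged sketch of an instance-similarity measure in this spirit (a simplified proxy via value co-occurrence pairs and Jaccard similarity, not the paper's exact measure):

```python
# Collect the unordered value pairs that co-occur in some tuple of each
# instance, then compare the two sets: a mapping that keeps related
# values together scores high, one that scatters them scores low.
from itertools import combinations

def associations(instance):
    pairs = set()
    for rows in instance.values():
        for row in rows:
            pairs.update(frozenset(p) for p in combinations(set(row), 2))
    return pairs

def association_similarity(source, target):
    a, b = associations(source), associations(target)
    return len(a & b) / len(a | b) if a | b else 1.0

src  = {"Emp": [("ann", "cs"), ("bob", "ee")]}
good = {"DeptA": [("cs", "ann")], "DeptB": [("ee", "bob")]}       # kept together
bad  = {"Name": [("ann",), ("bob",)], "Dept": [("cs",), ("ee",)]} # scattered
print(association_similarity(src, good))  # 1.0
print(association_similarity(src, bad))   # 0.0
```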

19.
Data exchange is the problem of taking data structured under a source schema and creating an instance of a target schema that reflects the source data as accurately as possible. In this paper, we address foundational and algorithmic issues related to the semantics of data exchange and to the query answering problem in the context of data exchange. These issues arise because, given a source instance, there may be many target instances that satisfy the constraints of the data exchange problem.

We give an algebraic specification that selects, among all solutions to the data exchange problem, a special class of solutions that we call universal. We show that a universal solution has no more and no less data than required for data exchange and that it represents the entire space of possible solutions. We then identify fairly general, yet practical, conditions that guarantee the existence of a universal solution and yield algorithms to compute a canonical universal solution efficiently. We adopt the notion of the “certain answers” in indefinite databases for the semantics of query answering in data exchange. We investigate the computational complexity of computing the certain answers in this context and also address other algorithmic issues that arise in data exchange. In particular, we study the problem of computing the certain answers of target queries by simply evaluating them on a canonical universal solution, and we explore the boundary of what queries can and cannot be answered this way in a data exchange setting.
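The “evaluate on a canonical universal solution, then discard answers containing nulls” recipe can be sketched directly (a naive conjunctive-query evaluator over an invented instance; for unions of conjunctive queries this computes exactly the certain answers, as the paper shows):

```python
# Evaluate a conjunctive query over an instance containing labeled
# nulls, then keep only the null-free answers.

class Null(str):
    """Labeled null; distinguishable from constants by type."""

def evaluate(query_atoms, head_vars, instance, binding=None):
    binding = binding or {}
    if not query_atoms:
        yield tuple(binding[v] for v in head_vars)
        return
    (pred, terms), *rest = query_atoms
    for fact in instance.get(pred, ()):
        b = dict(binding)
        if all(b.setdefault(t, v) == v for t, v in zip(terms, fact)):
            yield from evaluate(rest, head_vars, instance, b)

def certain_answers(query_atoms, head_vars, universal_solution):
    return {ans for ans in evaluate(query_atoms, head_vars, universal_solution)
            if not any(isinstance(v, Null) for v in ans)}

# A universal solution with a labeled null for an unknown department id.
J = {"Dept": {(Null("n0"), "cs")}, "Works": {("ann", Null("n0"))}}
# q(n, d) :- Works(n, i), Dept(i, d)
q = [("Works", ("n", "i")), ("Dept", ("i", "d"))]
print(certain_answers(q, ("n", "d"), J))   # {('ann', 'cs')}
```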

20.
Data integration over multiple heterogeneous data sources has become increasingly important for modern applications. The integrated data is usually stored as materialized views to allow better access, performance, and high availability. In loosely coupled environments, such as the data grid, the data sources are autonomous. Hence, source updates can be concurrent and cause erroneous results during view maintenance. State-of-the-art maintenance strategies apply compensating queries to correct such errors, making the restrictive assumption that all source schemata remain static over time. However, in such dynamic environments, the data sources may change not only their data but also their schemas. Consequently, either the maintenance queries or the compensating queries may fail. In this paper, we propose a novel framework called DyDa that overcomes these limitations and handles both source data updates and schema changes. We identify three types of maintenance anomalies, caused by source data updates, data-preserving schema changes, or non-data-preserving schema changes. We propose a compensation algorithm to solve the first two types of anomalies. We show that the third type of anomaly is caused by the violation of dependencies between maintenance processes. We then propose dependency detection and correction algorithms to identify and resolve these violations. Put together, DyDa extends prior maintenance solutions to solve all types of view maintenance anomalies. The experimental results show that DyDa imposes minimal overhead on data update processing while providing the extended functionality to handle concurrent schema changes.
