期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Data exchange and schema mappings in open and closed worlds

Leonid Libkin Cristina Sirangelo 《Journal of Computer and System Sciences》2011,77(3):542-571

In the study of data exchange one usually assumes an open-world semantics, making it possible to extend instances of target schemas. An alternative closed-world semantics only moves ‘as much data as needed’ from the source to the target to satisfy constraints of a schema mapping. It avoids some of the problems exhibited by the open-world semantics, but limits the expressivity of schema mappings. Here we propose a mixed approach: one can designate different attributes of target schemas as open or closed, to combine the additional expressivity of the open-world semantics with the better behavior of query answering in closed worlds. We define such schema mappings, and show that they cover a large space of data exchange solutions with two extremes being the known open and closed-world semantics. We investigate the problems of query answering and schema mapping composition, and prove two trichotomy theorems, classifying their complexity based on the number of open attributes. We find conditions under which schema mappings compose, extending known results to a wide range of closed-world mappings. We also provide results for restricted classes of queries and mappings guaranteeing lower complexity. 相似文献

2.

多数据库系统中的模式映射方法

李瑞轩卢正鼎肖卫军李兵《计算机工程与科学》2004,26(3):65-68

多数据库系统一般具有四级模式结构，全局用户只能访问全局模式，而最终的数据必须从各局部数据库系统中获得，因此必须建立多数据库系统的模式映射，它表示了局部模式通过输出模式集成为全局模式的相应转换。本文给出了一种多数据库系统中的模式映射方法，并使用模式映射树来存储和表达这种模式映射。相似文献

3.

MapMerge: correlating independent schema mappings

Bogdan Alexe Mauricio Hernández Lucian Popa Wang-Chiew Tan 《The VLDB Journal The International Journal on Very Large Data Bases》2012,21(2):191-211

One of the main steps toward integration or exchange of data is to design the mappings that describe the (often complex) relationships between the source schemas or formats and the desired target schema. In this paper, we introduce a new operator, called MapMerge, that can be used to correlate multiple, independently designed schema mappings of smaller scope into larger schema mappings. This allows a more modular construction of complex mappings from various types of smaller mappings such as schema correspondences produced by a schema matcher or pre-existing mappings that were designed by either a human user or via mapping tools. In particular, the new operator also enables a new “divide-and-merge” paradigm for mapping creation, where the design is divided (on purpose) into smaller components that are easier to create and understand and where MapMerge is used to automatically generate a meaningful overall mapping. We describe our MapMerge algorithm and demonstrate the feasibility of our implementation on several real and synthetic mapping scenarios. In our experiments, we make use of a novel similarity measure between two database instances with different schemas that quantifies the preservation of data associations. We show experimentally that MapMerge improves the quality of the schema mappings, by significantly increasing the similarity between the input source instance and the generated target instance. Finally, we provide a new algorithm that combines MapMerge with schema mapping composition to correlate flows of schema mappings. 相似文献

4.

Schema mediation for large-scale semantic data sharing

Alon Y. Halevy Zachary G. Ives Dan Suciu Igor Tatarinov 《The VLDB Journal The International Journal on Very Large Data Bases》2005,14(1):68-83

Intuitively, data management and data integration tools should be well suited for exchanging information in a semantically meaningful way. Unfortunately, they suffer from two significant problems: they typically require a common and comprehensive schema design before they can be used to store or share information, and they are difficult to extend because schema evolution is heavyweight and may break backward compatibility. As a result, many large-scale data sharing tasks are more easily facilitated by non-database-oriented tools that have little support for semantics.The goal of the peer data management system (PDMS) is to address this need: we propose the use of a decentralized, easily extensible data management architecture in which any user can contribute new data, schema information, or even mappings between other peers schemas. PDMSs represent a natural step beyond data integration systems, replacing their single logical schema with an interlinked collection of semantic mappings between peers individual schemas.This paper considers the problem of schema mediation in a PDMS. Our first contribution is a flexible language for mediating between peer schemas that extends known data integration formalisms to our more complex architecture. We precisely characterize the complexity of query answering for our language. Next, we describe a reformulation algorithm for our language that generalizes both global-as-view and local-as-view query answering algorithms. Then we describe several methods for optimizing the reformulation algorithm and an initial set of experiments studying its performance. Finally, we define and consider several global problems in managing semantic mappings in a PDMS.Received: 16 December 2002, Accepted: 14 April 2003, Published online: 12 December 2003Edited by: V. Atluri 相似文献

5.

Normalization and optimization of schema mappings

Georg Gottlob Reinhard Pichler Vadim Savenkov 《The VLDB Journal The International Journal on Very Large Data Bases》2011,20(2):277-302

Schema mappings are high-level specifications that describe the relationship between database schemas. They are an important tool in several areas of database research, notably in data integration and data exchange. However, a concrete theory of schema mapping optimization including the formulation of optimality criteria and the construction of algorithms for computing optimal schema mappings is completely lacking to date. The goal of this work is to fill this gap. We start by presenting a system of rewrite rules to minimize sets of source-to-target tuple-generating dependencies. Moreover, we show that the result of this minimization is unique up to variable renaming. Hence, our optimization also yields a schema mapping normalization. By appropriately extending our rewrite rule system, we also provide a normalization of schema mappings containing equality-generating target dependencies. An important application of such a normalization is in the area of defining the semantics of query answering in data exchange, since several definitions in this area depend on the concrete syntactic representation of the mappings. This is, in particular, the case for queries with negated atoms and for aggregate queries. The normalization of schema mappings allows us to eliminate the effect of the concrete syntactic representation of the mapping from the semantics of query answering. We discuss in detail how our results can be fruitfully applied to aggregate queries. 相似文献

6.

Uninterpreted Schema Matching with Embedded Value Mapping under Opaque Column Names and Data Values 总被引：1，自引：0，他引：1

Jaiswal Anuj Miller David J. Mitra Prasenjit 《Knowledge and Data Engineering, IEEE Transactions on》2010,22(2):291-304

Schema matching and value mapping across two heterogenous information sources are critical tasks in applications involving data integration, data warehousing, and federation of databases. Before data can be integrated from multiple tables, the columns and the values appearing in the tables must be matched. The complexity of the problem grows quickly with the number of data attributes/columns to be matched and due to multiple semantics of data values. Traditional research has tackled schema matching and value mapping independently. We propose a novel method that optimizes embedded value mappings to enhance schema matching in the presence of opaque data values and column names. In this approach, the fitness objective for matching a pair of attributes from two schemas depends on the value mapping function for each of the two attributes. Suitable fitness objectives include the euclidean distance measure, which we use in our experimental study, as well as relative (cross) entropy. We propose a heuristic local descent optimization strategy that uses sorting and two-opt switching to jointly optimize value mappings and attribute matches. Our experiments show that our proposed technique outperforms earlier uninterpreted schema matching methods, and thus, should form a useful addition to a suite of (semi) automated tools for resolving structural heterogeneity. 相似文献

7.

Data integration with uncertainty 总被引：1，自引：0，他引：1

Xin Luna Dong Alon Halevy Cong Yu 《The VLDB Journal The International Journal on Very Large Data Bases》2009,18(2):469-500

This paper reports our first set of results on managing uncertainty in data integration. We posit that data-integration systems need to handle uncertainty at three levels and do so in a principled fashion. First, the semantic mappings between the data sources and the mediated schema may be approximate because there may be too many of them to be created and maintained or because in some domains (e.g., bioinformatics) it is not clear what the mappings should be. Second, the data from the sources may be extracted using information extraction techniques and so may yield erroneous data. Third, queries to the system may be posed with keywords rather than in a structured form. As a first step to building such a system, we introduce the concept of probabilistic schema mappings and analyze their formal foundations. We show that there are two possible semantics for such mappings: by-table semantics assumes that there exists a correct mapping but we do not know what it is; by-tuple semantics assumes that the correct mapping may depend on the particular tuple in the source data. We present the query complexity and algorithms for answering queries in the presence of probabilistic schema mappings, and we describe an algorithm for efficiently computing the top-k answers to queries in such a setting. Finally, we consider using probabilistic mappings in the scenario of data exchange. 相似文献

8.

Schema mapping and query translation in heterogeneous P2P XML databases

Angela Bonifati Elaine Chang Terence Ho Laks V. S. Lakshmanan Rachel Pottinger Yongik Chung 《The VLDB Journal The International Journal on Very Large Data Bases》2010,19(2):231-256

Peers in a peer-to-peer data management system often have heterogeneous schemas and no mediated global schema. To translate queries across peers, we assume each peer provides correspondences between its schema and a small number of other peer schemas. We focus on query reformulation in the presence of heterogeneous XML schemas, including data–metadata conflicts. We develop an algorithm for inferring precise mapping rules from informal schema correspondences. We define the semantics of query answering in this setting and develop query translation algorithm. Our translation handles an expressive fragment of XQuery and works both along and against the direction of mapping rules. We describe the HePToX heterogeneous P2P XML data management system which incorporates our results. We report the results of extensive experiments on HePToX on both synthetic and real datasets. We demonstrate our system utility and scalability on different P2P distributions. 相似文献

9.

The Piazza peer data management system 总被引：5，自引：0，他引：5

Halevy A.Y. Ives Z.G. Jayant Madhavan Mork P. Suciu D. Tatarinov I. 《Knowledge and Data Engineering, IEEE Transactions on》2004,16(7):787-798

Intuitively, data management and data integration tools are well-suited for exchanging information in a semantically meaningful way. Unfortunately, they suffer from two significant problems: They typically require a comprehensive schema design before they can be used to store or share information and they are difficult to extend because schema evolution is heavyweight and may break backward compatibility. As a result, many small-scale data sharing tasks are more easily facilitated by nondatabase-oriented tools that have little support for semantics. The goal of the peer data management system (PDMS) is to address this need: We propose the use of a decentralized, easily extensible data management architecture in which any user can contribute new data, schema information, or even mappings between other peers' schemes. PDMSs represent a natural step beyond data integration systems, replacing their single logical schema with an interlinked collection of semantic mappings between peers' individual schemas. This paper describes-several aspects of the Piazza PDMS, including the schema mediation formalism, query answering and optimization algorithms, and the relevance of PDMSs to the semantic Web. 相似文献

10.

Object-oriented query language access to relational databases: A semantic framework for query translation

Susan D. Urban Taoufik Ben Abdellatif 《Journal of Systems Integration》1995,5(2):123-156

This research investigates and approach to query processing in a multidatabase system that uses an objectoriented model to capture the semantics of other data models. The object-oriented model is used to construct a global schema, defining an integrated view of the different schemas in the environment. The model is also used as a self-describing model to build a meta-database for storing information about the global schema. A unique aspect of this work is that the object-oriented model is used to describe the different data models of the multidatabase environment, thereby extending the meta database with semantic information about the local schemas. With the global and local schemas all represented in an object-oriented form, structural mappings between the global schema and each local schema are then easily supported. An object algebra then provides a query language for expressing global queries, using the structural mappings to translate object algebra queries into SQL queries over local relational schema. The advantage of using an object algebra is that the object-oriented database can be viewed as a blackboard for temporary storage of local data and for establishing relationships between different databases. The object algebra can be used to directly retrieve temporarily-stored data from the object-oriented database or to transparently retrieve data from local sources using the translation process described in this paper. 相似文献

11.

Efficient management of uncertainty in XML schema matching

Jian Gong Reynold Cheng David W. Cheung 《The VLDB Journal The International Journal on Very Large Data Bases》2012,21(3):385-409

Despite advances in machine learning technologies a schema matching result between two database schemas (e.g., those derived from COMA++) is likely to be imprecise. In particular, numerous instances of ??possible mappings?? between the schemas may be derived from the matching result. In this paper, we study problems related to managing possible mappings between two heterogeneous XML schemas. First, we study how to efficiently generate possible mappings for a given schema matching task. While this problem can be solved by existing algorithms, we show how to improve the performance of the solution by using a divide-and-conquer approach. Second, storing and querying a large set of possible mappings can incur large storage and evaluation overhead. For XML schemas, we observe that their possible mappings often exhibit a high degree of overlap. We hence propose a novel data structure, called the block tree, to capture the commonalities among possible mappings. The block tree is useful for representing the possible mappings in a compact manner and can be efficiently generated. Moreover, it facilitates the evaluation of a probabilistic twig query (PTQ), which returns the non-zero probability that a fragment of an XML document matches a given query. For users who are interested only in answers with k-highest probabilities, we also propose the top-k PTQ and present an efficient solution for it. An extensive evaluation on real-world data sets shows that our approaches significantly improve the efficiency of generating, storing, and querying possible mappings. 相似文献

12.

Database application evolution: A transformational approach

Jean-Marc Jean-Luc 《Data & Knowledge Engineering》2006,59(3):534-558

While recent data management technologies, such as object oriented techniques, address the problem of database schema evolution, standard information systems currently in use raise challenging evolution problems. This paper examines database evolution from the developer point of view. It shows how requirements changes are propagated to database schemas, to data and to programs through a general strategy. This strategy requires the documentation of database design. When absent, such documentation has to be rebuilt through reverse engineering techniques. Our approach, called DB-MAIN, relies on a generic database model and on transformational paradigm that states that database engineering processes can be modeled by schema transformations. Indeed, a transformation provides both structural and instance mappings that formally define how to modify database structures and contents. We describe both the complete and a simplified approaches, and compare their merits and drawbacks. We then analyze the problem of program modification and describe a CASE tool that can assist developers in their task of system evolution. We illustrate our approach with Biomaze, a biochemical knowledge-based the database of which is rapidly evolving. 相似文献

13.

A history-driven approach at evolving views under meta data changes

Andreas Koeller Elke A. Rundensteiner 《Knowledge and Information Systems》2005,8(1):34-67

相似文献

14.

An object-oriented operation-based approach to translation between MOF metaschemas

Ruth Antoni 《Data & Knowledge Engineering》2008,67(3):444-462

This paper proposes a new approach to the schema translation problem. We deal with schemas whose metaschemas are instances of the OMG’s MOF. Most metaschemas can be defined as an instance of the MOF; therefore, our approach is widely applicable. We leverage the well-known object-oriented concepts embedded in the MOF and its instances (object types, attributes, relationship types, operations, IsA hierarchies, refinements, invariants, pre- and postconditions, etc.) to define metaschemas, schemas and their translations.The main contribution of our approach is the extensive use of object-oriented concepts in the definition of translation mappings, particularly the use of operations (and their refinements) and invariants, both of which are formalized in OCL. Our translation mappings can be used to check that two schemas are translations of each other, and to translate one into the other, in both directions. The translation mappings are declaratively defined by means of pre- and postconditions and invariants, and they can be implemented in any suitable language. From an implementation point of view, by taking a MOF-based approach we have a wide set of tools available, including tools that execute OCL. By way of example, we have defined all schemas and metaschemas in this paper and executed all the OCL expressions in the USE tool. 相似文献

15.

A research agenda for query processing in large-scale peer data management systems

Katja Hose Armin Roth Andr Zeitz Kai-Uwe Sattler Felix Naumann 《Information Systems》2008,33(7-8):597

Peer Data Management Systems (Pdms) are a novel, useful, but challenging paradigm for distributed data management and query processing. Conventional integrated information systems have a hierarchical structure with an integration component that manages a global schema and distributes queries against this schema to the underlying data sources. Pdms are a natural extension to this architecture by allowing each participating system (peer) to act both as a data source and as an integrator. Peers are interconnected by schema mappings, which guide the rewriting of queries between the heterogeneous schemas, and thus form a P2P (peer-to-peer)-like network.Despite several years of research, the development of efficient Pdms still holds many challenges. In this article we first survey the state of the art on peer data management: We classify Pdms by characteristics concerning their system model, their semantics, their query planning schemes, and their maintenance. Then we systematically examine open research directions in each of those areas. In particular, we observe that research results from both the domain of P2P systems and of conventional distributed data management can have an impact on the development of Pdms. 相似文献

16.

基于本体的关系数据集成的查询处理

王进鹏张亚非苗壮《计算机科学》2010,37(12):134-137

为实现异构关系数据库的语义集成,针对传统集成技术存在的问题,在对语义网等相关技术进行分析的基础上,研究基于本体的关系数据集成系统中的查询处理问题,提出了一种基于本体的关系数据库集成框架。设计了基于本体的关系数据的描述方法,使用本体作为集成的全局模式来描述关系模式的语义。设计了查询重写算法,该算法可以将基于全局模式的SPARQL查询重写为针对具体关系数据库的查询,从而实现对异构关系数据库的集成。实验表明,该算法具有良好的可扩展性。相似文献

17.

基于模式映射的查询计划生成算法

李由刘东波张维明《计算机科学》2006,33(3):125-128

因特网的迅速发展使得多数据源综合集成日益重要.但是,不同数据源之间数据结构和语义的异构性导致数据集成是相当困难的.本文提出了一种基于模式映射的查询计划生成算法.该算法在正确定义映射规则的前提下,根据不同的查询条件和不同的数据源模式,自动构造查询计划,并保证结果数据满足目标模式结构与引用完整性要求. 相似文献

18.

Ontology evolution without tears

《Journal of Web Semantics》2013

The evolution of ontologies is an undisputed necessity in ontology-based data integration. Yet, few research efforts have focused on addressing the need to reflect the evolution of ontologies used as global schemata onto the underlying data integration systems. In most of these approaches, when ontologies change their relations with the data sources, i.e., the mappings, are recreated manually, a process which is known to be error-prone and time-consuming. In this paper, we provide a solution that allows query answering in data integration systems under evolving ontologies without mapping redefinition. This is achieved by rewriting queries among ontology versions and then forwarding them to the underlying data integration systems to be answered. To this purpose, initially, we automatically detect and describe the changes among ontology versions using a high level language of changes. Those changes are interpreted as sound global-as-view (GAV) mappings, and they are used in order to produce equivalent rewritings among ontology versions. Whenever equivalent rewritings cannot be produced we a) guide query redefinition or b) provide the best “over-approximations”, i.e., the minimally-containing and minimally-generalized rewritings. We prove that our approach imposes only a small overhead over traditional query rewriting algorithms and it is modular and scalable. Finally, we show that it can greatly reduce human effort spent since continuous mapping redefinition is no longer necessary. 相似文献

19.

An XML query rewriting mechanism with multiple ontologies integration based on complex semantic mapping

Jinguang Gu Baowen Xu Xinmeng Chen 《Information Fusion》2008,9(4):512-522

Semantic integration, which can be divided into three parts including ontology mapping, mapping representation, and reasoning and query rewriting with mappings, plays a key role in information integration systems. This paper develops an XML query rewriting and ontology integration mechanism, which acts as a global-as-view (GAV) approach to represent and query semantic information in mediator based information integration environment. It proposes the patterns and properties of ontology mappings, discusses the procedure and algorithm of ontology integration firstly, and then proposes the ontology based XML query mechanism, especially the XML query rewriting mechanism. Finally, a mediator-based implementation of the mechanism in OBSA system is introduced. 相似文献

20.

基于GLAV集成中的模式匹配方法研究

下载免费PDF全文

陈祥松邓苏黄宏斌《计算机工程与科学》2006,28(6):86-89

基于GLAV集成的系统具体更好的伸缩性.本文研究了这种集成系统中的模式匹配方法,并基于模式树进行匹配,实现了全局模式与源模式之间的映射. 相似文献