共查询到20条相似文献,搜索用时 109 毫秒
1.
The Piazza peer data management system 总被引:5,自引:0,他引:5
Halevy A.Y. Ives Z.G. Jayant Madhavan Mork P. Suciu D. Tatarinov I. 《Knowledge and Data Engineering, IEEE Transactions on》2004,16(7):787-798
Intuitively, data management and data integration tools are well-suited for exchanging information in a semantically meaningful way. Unfortunately, they suffer from two significant problems: They typically require a comprehensive schema design before they can be used to store or share information and they are difficult to extend because schema evolution is heavyweight and may break backward compatibility. As a result, many small-scale data sharing tasks are more easily facilitated by nondatabase-oriented tools that have little support for semantics. The goal of the peer data management system (PDMS) is to address this need: We propose the use of a decentralized, easily extensible data management architecture in which any user can contribute new data, schema information, or even mappings between other peers' schemes. PDMSs represent a natural step beyond data integration systems, replacing their single logical schema with an interlinked collection of semantic mappings between peers' individual schemas. This paper describes-several aspects of the Piazza PDMS, including the schema mediation formalism, query answering and optimization algorithms, and the relevance of PDMSs to the semantic Web. 相似文献
2.
GridVine is a semantic overlay infrastructure based on a peer-to-peer (P2P) access structure. Built following the principle of data independence, it separates a logical layer - in which data, schemas, and schema mappings are managed - from a physical layer consisting of a structured P2P network supporting decentralized indexing, key load-balancing, and efficient routing. The system is decentralized, yet fosters semantic interoperability through pair-wise schema mappings and query reformulation. GridVine's heterogeneous but semantically related information sources can be queried transparently using iterative query reformulation. The authors discuss a reference implementation of the system and several mechanisms for resolving queries collaboratively. 相似文献
3.
QUERY ROUTING IN A PEER-TO-PEER SEMANTIC LINK NETWORK 总被引:9,自引:0,他引:9
A semantic link peer-to-peer (P2P) network specifies and manages semantic relationships between peers' data schemas and can be used as the semantic layer of a scalable Knowledge Grid. The proposed approach consists of an automatic semantic link discovery method, a tool for building and maintaining P2P semantic link networks (P2PSLNs), a semantic-based peer similarity measurement for efficient query routing, and the schema mapping algorithms for query reformulation and heterogeneous data integration. The proposed approach has three important aspects. First, it uses semantic links to enrich the relationships between peers' data schemas. Second, it considers not only nodes but also the XML structure in measuring the similarity between schemas to efficiently and accurately forward queries to relevant peers. Third, it copes with semantic and structural heterogeneity and data inconsistency so that peers can exchange and translate heterogeneous information within a uniform view. 相似文献
4.
Angela Bonifati Elaine Chang Terence Ho Laks V. S. Lakshmanan Rachel Pottinger Yongik Chung 《The VLDB Journal The International Journal on Very Large Data Bases》2010,19(2):231-256
Peers in a peer-to-peer data management system often have heterogeneous schemas and no mediated global schema. To translate queries across peers, we assume each peer provides correspondences between its schema and a small number of other peer schemas. We focus on query reformulation in the presence of heterogeneous XML schemas, including data–metadata conflicts. We develop an algorithm for inferring precise mapping rules from informal schema correspondences. We define the semantics of query answering in this setting and develop query translation algorithm. Our translation handles an expressive fragment of XQuery and works both along and against the direction of mapping rules. We describe the HePToX heterogeneous P2P XML data management system which incorporates our results. We report the results of extensive experiments on HePToX on both synthetic and real datasets. We demonstrate our system utility and scalability on different P2P distributions. 相似文献
5.
Diego Calvanese Giuseppe De Giacomo Domenico Lembo Maurizio Lenzerini Riccardo Rosati 《Information Systems》2008,33(4-5):360-384
We study peer-to-peer (P2P) data integration, where each peer models an autonomous system that exports data in terms of its own schema, and data interoperation is achieved by means of mappings among the peer schemas, rather than through a unique global schema. We propose a multi-modal epistemic logical formalization based on the idea that each peer is conceived as a rational agent that exchanges knowledge/belief with other peers, thus nicely modeling the modular structure of the system. We then address the issue of dealing with possible inconsistencies, and distinguish between two types of inconsistencies, called local and P2P, respectively. We define a nonmonotonic extension of our logic that is able to reason on the beliefs of peers under both local and P2P inconsistency tolerance. Tolerance to local inconsistency essentially means that the presence of inconsistency within one peer does not affect the consistency of the whole system. Tolerance to P2P inconsistency means being able to resolve inconsistencies arising from the interaction between peers. We study query answering in the new nonmonotonic logic, with the main goal of establishing its decidability and its computational complexity. Indeed, we show that, under reasonable assumptions on peer schemas, query answering is decidable, and is coNP-complete with respect to data complexity, i.e., the size of the data stored at the peers. 相似文献
6.
In the study of data exchange one usually assumes an open-world semantics, making it possible to extend instances of target schemas. An alternative closed-world semantics only moves ‘as much data as needed’ from the source to the target to satisfy constraints of a schema mapping. It avoids some of the problems exhibited by the open-world semantics, but limits the expressivity of schema mappings. Here we propose a mixed approach: one can designate different attributes of target schemas as open or closed, to combine the additional expressivity of the open-world semantics with the better behavior of query answering in closed worlds. We define such schema mappings, and show that they cover a large space of data exchange solutions with two extremes being the known open and closed-world semantics. We investigate the problems of query answering and schema mapping composition, and prove two trichotomy theorems, classifying their complexity based on the number of open attributes. We find conditions under which schema mappings compose, extending known results to a wide range of closed-world mappings. We also provide results for restricted classes of queries and mappings guaranteeing lower complexity. 相似文献
7.
Katja Hose Armin Roth Andr Zeitz Kai-Uwe Sattler Felix Naumann 《Information Systems》2008,33(7-8):597
Peer Data Management Systems (Pdms) are a novel, useful, but challenging paradigm for distributed data management and query processing. Conventional integrated information systems have a hierarchical structure with an integration component that manages a global schema and distributes queries against this schema to the underlying data sources. Pdms are a natural extension to this architecture by allowing each participating system (peer) to act both as a data source and as an integrator. Peers are interconnected by schema mappings, which guide the rewriting of queries between the heterogeneous schemas, and thus form a P2P (peer-to-peer)-like network.Despite several years of research, the development of efficient Pdms still holds many challenges. In this article we first survey the state of the art on peer data management: We classify Pdms by characteristics concerning their system model, their semantics, their query planning schemes, and their maintenance. Then we systematically examine open research directions in each of those areas. In particular, we observe that research results from both the domain of P2P systems and of conventional distributed data management can have an impact on the development of Pdms. 相似文献
8.
Georg Gottlob Reinhard Pichler Vadim Savenkov 《The VLDB Journal The International Journal on Very Large Data Bases》2011,20(2):277-302
Schema mappings are high-level specifications that describe the relationship between database schemas. They are an important tool in several areas of database research, notably in data integration and data exchange. However, a concrete theory of schema mapping optimization including the formulation of optimality criteria and the construction of algorithms for computing optimal schema mappings is completely lacking to date. The goal of this work is to fill this gap. We start by presenting a system of rewrite rules to minimize sets of source-to-target tuple-generating dependencies. Moreover, we show that the result of this minimization is unique up to variable renaming. Hence, our optimization also yields a schema mapping normalization. By appropriately extending our rewrite rule system, we also provide a normalization of schema mappings containing equality-generating target dependencies. An important application of such a normalization is in the area of defining the semantics of query answering in data exchange, since several definitions in this area depend on the concrete syntactic representation of the mappings. This is, in particular, the case for queries with negated atoms and for aggregate queries. The normalization of schema mappings allows us to eliminate the effect of the concrete syntactic representation of the mapping from the semantics of query answering. We discuss in detail how our results can be fruitfully applied to aggregate queries. 相似文献
9.
This framework for peer-to-peer data-sharing systems allows efficient query answering over a network of semantically related peers. A simple class-based language appropriate for practical applications defines peer schemas as hierarchies of atomic classes and mappings as inclusions of logical combinations of atomic classes. 相似文献
10.
Inter-business collaborative contexts prefigure a distributed scenario where companies organize and coordinate themselves to develop common and shared opportunities, but traditional business intelligence systems do not provide support to this end. To fill this gap, in this paper we envision a peer-to-peer data warehousing architecture based on a network of heterogeneous peers, each exposing query answering functionalities aimed at sharing business information. To enhance the decision making process, an OLAP query expressed on a peer needs to be properly reformulated on the local multidimensional schemata of the other peers. To this end, we present a language for the definition of mappings between the multidimensional schemata of peers and we introduce a query reformulation framework that relies on the translation of mappings, queries, and multidimensional schemata onto the relational level. Then, we formalize a query reformulation algorithm and prove two properties: correctness and closure, that are essential in a peer-to-peer setting. Finally, we discuss the main implementation issues related to the reformulation setting proposed, with specific reference to the case in which the local multidimensional engines hosted by peers use the standard MDX language. 相似文献
11.
This research investigates and approach to query processing in a multidatabase system that uses an objectoriented model to capture the semantics of other data models. The object-oriented model is used to construct a global schema, defining an integrated view of the different schemas in the environment. The model is also used as a self-describing model to build a meta-database for storing information about the global schema. A unique aspect of this work is that the object-oriented model is used to describe the different data models of the multidatabase environment, thereby extending the meta database with semantic information about the local schemas. With the global and local schemas all represented in an object-oriented form, structural mappings between the global schema and each local schema are then easily supported. An object algebra then provides a query language for expressing global queries, using the structural mappings to translate object algebra queries into SQL queries over local relational schema. The advantage of using an object algebra is that the object-oriented database can be viewed as a blackboard for temporary storage of local data and for establishing relationships between different databases. The object algebra can be used to directly retrieve temporarily-stored data from the object-oriented database or to transparently retrieve data from local sources using the translation process described in this paper. 相似文献
12.
13.
14.
The problem of integrating data from multiple data sources—either on the Internet or within enterprises—has received much attention in the database and AI communities. The focus has been on building data integration systems that provide a uniform query interface to the sources. A key bottleneck in building such systems has been the laborious manual construction of semantic mappings between the query interface and the source schemas. Examples of mappings are element location maps to address and price maps to listed-price. We propose a multistrategy learning approach to automatically find such mappings. The approach applies multiple learner modules, where each module exploits a different type of information either in the schemas of the sources or in their data, then combines the predictions of the modules using a meta-learner. Learner modules employ a variety of techniques, ranging from Naive Bayes and nearest-neighbor classification to entity recognition and information retrieval. We describe the LSD system, which employs this approach to find semantic mappings. To further improve matching accuracy, LSD exploits domain integrity constraints, user feedback, and nested structures in XML data. We test LSD experimentally on several real-world domains. The experiments validate the utility of multistrategy learning for data integration and show that LSD proposes semantic mappings with a high degree of accuracy. 相似文献
15.
Preserving mapping consistency under schema changes 总被引:1,自引:0,他引:1
In dynamic environments like the Web, data sources may change not only their data but also their schemas, their semantics, and their query capabilities. When a mapping is left inconsistent by a schema change, it has to be detected and updated. We present a novel framework and a tool (ToMAS) for automatically adapting (rewriting) mappings as schemas evolve. Our approach considers not only local changes to a schema but also changes that may affect and transform many components of a schema. Our algorithm detects mappings affected by structural or constraint changes and generates all the rewritings that are consistent with the semantics of the changed schemas. Our approach explicitly models mapping choices made by a user and maintains these choices, whenever possible, as the schemas and mappings evolve. When there is more than one candidate rewriting, the algorithm may rank them based on how close they are to the semantics of the existing mappings.Received: 13 January 2004, Accepted: 26 March 2004, Published online: 12 August 2004Edited by: M. Carey 相似文献
16.
Guija Choe Young-Kwang Nam Joseph Goguen Guilian Wang 《Computer Languages, Systems and Structures》2009,35(4):422-434
We describe a method for generating queries for retrieving data from distributed heterogeneous semistructured documents, and its implementation in the metadata interface DDXMI (distributed document XML metadata interchange). The proposed system generates local queries appropriate to local schemas from a user query over the global schema. The system constructs mappings between global schema and local schemas (extracted from local documents if not given), path substitution, and node identification for resolving the heterogeneity among nodes with the same label that often exist in semistructured data. The system uses Quilt as its XML query language. An experiment is reported over three local semistructured documents: ‘thesis’, ‘reports’, and ‘journal’ documents with ‘article’ global schema. The prototype was developed under Windows system with Java and JavaCC. 相似文献
17.
查询接口集成是Deep Web数据集成的关键,在动态环境下,Web数据源的变化会引起数据模式映射的失效,使得查询接口集成维护难度增加,因此数据模式映射失效检测是Deep Web数据集成研究中的热点问题.针对目前数据模式映射失效检测方法的局限,在模糊聚集算子的研究基础上,提出一种适用于数据模式映射失效检测的结果融合算法.通过实验对比测试,并对映射失效检测方法的性能和效率进行了分析和实验,结果证明了提出的方法对于失效模型的检测是有效的. 相似文献
18.
Prof. Stefano Spaccapietra Christine Parent Yann Dupont 《The VLDB Journal The International Journal on Very Large Data Bases》1992,1(1):81-126
Due to the proliferation of database applications, the integration of existing databases into a distributed or federated system is one of the major challenges in responding to enterprises' information requirements. Some proposed integration techniques aim at providing database administrators (DBAs) with a view definition language they can use to build the desired integrated schema. These techniques leave to the DBA the responsibility of appropriately restructuring schema elements from existing local schemas and of solving inter-schema conflicts. This paper investigates theassertion-based approach, in which the DBA's action is limited to pointing out corresponding elements in the schemas and to defining the nature of the correspondence in between. This methodology is capable of: ensuring better integration by taking into account additional semantic information (assertions about links); automatically solving structural conflicts; building the integrated schema without requiring conforming of initial schemas; applying integration rules to a variety of data models; and performing view as well as database integration. This paper presents the basic ideas underlying our approach and focuses on resolution of structural conflicts. 相似文献
19.
Data integration with uncertainty 总被引:1,自引:0,他引:1
Xin Luna Dong Alon Halevy Cong Yu 《The VLDB Journal The International Journal on Very Large Data Bases》2009,18(2):469-500
This paper reports our first set of results on managing uncertainty in data integration. We posit that data-integration systems
need to handle uncertainty at three levels and do so in a principled fashion. First, the semantic mappings between the data
sources and the mediated schema may be approximate because there may be too many of them to be created and maintained or because
in some domains (e.g., bioinformatics) it is not clear what the mappings should be. Second, the data from the sources may
be extracted using information extraction techniques and so may yield erroneous data. Third, queries to the system may be
posed with keywords rather than in a structured form. As a first step to building such a system, we introduce the concept
of probabilistic schema mappings and analyze their formal foundations. We show that there are two possible semantics for such
mappings: by-table semantics assumes that there exists a correct mapping but we do not know what it is; by-tuple semantics assumes that the correct mapping may depend on the particular tuple in the source data. We present the query complexity
and algorithms for answering queries in the presence of probabilistic schema mappings, and we describe an algorithm for efficiently
computing the top-k answers to queries in such a setting. Finally, we consider using probabilistic mappings in the scenario of data exchange. 相似文献
20.
We consider data exchange for XML documents: given source and target schemas, a mapping between them, and a document conforming to the source schema, construct a target document and answer target queries in a way that is consistent with the source information. The problem has primarily been studied in the relational context, in which data-exchange systems have also been built. Since many XML documents are stored in relations, it is natural to consider using a relational system for XML data exchange. However, there is a complexity mismatch between query answering in relational and in XML data exchange. This indicates that to make the use of relational systems possible, restrictions have to be imposed on XML schemas and mappings, as well as on XML shredding schemes. We isolate a set of five requirements that must be fulfilled in order to have a faithful representation of the XML data-exchange problem by a relational translation. We then demonstrate that these requirements naturally suggest the in-lining technique for data-exchange tasks. Our key contribution is to provide shredding algorithms for schemas, documents, mappings and queries, and demonstrate that they enable us to correctly perform XML data-exchange tasks using a relational system. 相似文献