Similar articles
A total of 20 similar articles were found (search time: 78 ms)
1.
The integration of information systems is becoming increasingly important. A common requirement in distributed data-intensive applications, such as data warehousing and data mining, is that the various databases involved be joined in a process called schema integration. The entity-relationship (ER) model or a variant of it is often used as the common data model. To aid the schema conforming, merging and restructuring phases of the integration process, various transformations have been defined to map between equivalent ER representations. In this paper, we describe a different approach to integrating ER schemas. We focus on the resolution of structural conflicts, that is, cases where related real-world concepts are modeled using different constructs in different schemas. Unlike previous work, our approach resolves the structural conflict between an entity type in one schema and an attribute in another schema, and shows that the other structural conflicts are then resolved automatically. This reduces the manual effort required in integration. We give a detailed algorithm that transforms an attribute in one schema into an equivalent entity type in another schema without any loss of semantics; that is, our transformation is both information preserving and constraint preserving.
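To make the attribute-versus-entity-type conflict concrete, the following is a minimal Python sketch of lifting an attribute into an entity type on a toy dict-based ER representation; the data structures and names are illustrative assumptions, not the paper's actual algorithm.

```python
# Minimal sketch of an attribute-to-entity-type transformation on a toy,
# dict-based ER representation. Structures and names are illustrative
# assumptions, not the paper's data model or algorithm.

def attribute_to_entity_type(schema, entity, attribute, new_entity, rel_name):
    """Lift `attribute` of `entity` into a new entity type `new_entity`,
    linked back to `entity` via the relationship `rel_name`."""
    attrs = schema["entities"][entity]["attributes"]
    if attribute not in attrs:
        raise KeyError(f"{entity} has no attribute {attribute}")
    # Remove the attribute from the original entity type ...
    attrs.remove(attribute)
    # ... and re-introduce it as the key of a new entity type.
    schema["entities"][new_entity] = {"attributes": [attribute], "key": [attribute]}
    # A many-to-one relationship preserves the original functional dependency
    # entity -> attribute, so the rewrite loses no information or constraints.
    schema["relationships"][rel_name] = {
        "participants": [entity, new_entity],
        "cardinality": {entity: "N", new_entity: "1"},
    }
    return schema

if __name__ == "__main__":
    s = {
        "entities": {"Employee": {"attributes": ["emp_no", "dept_name"],
                                  "key": ["emp_no"]}},
        "relationships": {},
    }
    print(attribute_to_entity_type(s, "Employee", "dept_name", "Department", "works_in"))
```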

2.
View integration: a step forward in solving structural conflicts (total citations: 1; self-citations: 0, citations by others: 1)
Thanks to the development of the federated systems approach on the one hand and the emphasis on user involvement in database design on the other, interest in schema integration techniques is increasing significantly. Theories, methods and tools have been proposed. Conflict resolution is the key issue. Different perceptions by schema designers may lead to different representations. A way must be found to support these different representations within a single system. Most current integration methodologies rely on modification of the initial schemas to solve conflicts. This approach requires strong interaction with the database administrator, who has the authority to modify the initial schemas. This paper presents an approach to view integration specifically intended to support the coexistence of different representations of the same real-world objects. The main characteristics of this approach are the following: automatic resolution of structural conflicts, conflict resolution performed without modification of the initial views, use of a formal declarative approach for user definition of inter-view correspondences, applicability to a variety of data models, and automatic generation of structural and operational mappings between the views and the integrated schema. Allowing users' views to be kept unchanged should result in improved user satisfaction. Each user is able to define his own view of the database without having to conform to some other user's view. Moreover, such a feature is essential in database integration if existing programs are to be preserved.

3.
Schema integration aims to create a mediated schema as a unified representation of existing heterogeneous sources sharing a common application domain. These sources are increasingly written in XML due to its versatility and expressive power. Unfortunately, the sources often use different elements and structures to express the same concepts and relations, causing substantial semantic and structural conflicts. This challenge impedes the creation of high-quality mediated schemas and has not been adequately addressed by existing integration methods. In this paper, we propose a novel method, named XINTOR, for automating the integration of heterogeneous schemas. Given a set of XML sources and a set of correspondences between the source schemas, our method aims to create a complete and minimal mediated schema: it completely captures all of the concepts and relations in the sources without duplication, provided that the concepts do not overlap. Our contributions are fourfold. First, we resolve structural conflicts inherent in the source schemas. Second, we introduce a new statistics-based measure, called path cohesion, for selecting the concepts and relations to include in the mediated schema. Path cohesion is computed statistically from multiple path quality dimensions such as average path length and path frequency. Third, we resolve semantic conflicts by augmenting the semantics of similar concepts with context-dependent information. Finally, we propose a novel double-layered mediated schema that retains a wider range of concepts and relations than existing mediated schemas, which are at best either complete or minimal, but not both. Experiments on both real and synthetic datasets show that XINTOR outperforms existing methods with respect to (i) mediated-schema quality, measured by precision, recall, F-measure, and schema minimality; and (ii) execution performance, based on execution time and scale-up behavior.
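As a rough illustration of how a path statistic might combine the two quality dimensions named above (path length and path frequency), here is a toy Python sketch; the weighting is an assumption for illustration only and is not XINTOR's actual path-cohesion formula.

```python
# Toy score combining the two path quality dimensions the abstract mentions:
# frequent paths score higher, long paths score lower. Illustrative only.
from collections import Counter

def path_cohesion(paths):
    """paths: list of element paths such as 'book/author/name'."""
    counts = Counter(paths)
    total = len(paths)
    scores = {}
    for path, freq in counts.items():
        length = path.count("/") + 1
        rel_freq = freq / total            # frequent paths are preferred ...
        scores[path] = rel_freq / length   # ... and shorter paths are preferred
    return scores

if __name__ == "__main__":
    sample = ["book/title", "book/title", "book/author/name",
              "book/author/name", "book/publisher/address/city"]
    for p, s in sorted(path_cohesion(sample).items(), key=lambda kv: -kv[1]):
        print(f"{s:.3f}  {p}")
```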

4.
On resolving schematic heterogeneity in multidatabase systems (total citations: 4; self-citations: 0, citations by others: 4)
The objective of a multidatabase system is to provide a single uniform interface for accessing multiple independent databases managed by multiple independent, and possibly heterogeneous, database systems. One crucial element in the design of a multidatabase system is the design of a data definition language for specifying a schema that represents the integration of the schemas of multiple independent databases. The design of such a language in turn requires a comprehensive classification of the conflicts (i.e., discrepancies) among the schemas of the independent databases and the development of techniques for resolving (i.e., homogenizing) all of the conflicts in the classification. An earlier paper provided a comprehensive classification of the schematic conflicts that may arise when integrating multiple independent relational database (RDB) schemas into a single multidatabase (MDB) schema. In this paper, we provide a comprehensive classification of techniques for resolving the schematic conflicts that may arise when integrating multiple RDB schemas, RDB schemas and object-oriented database (OODB) schemas, or multiple OODB schemas. The classification of conflict resolution techniques covers not only those necessary for resolving the schematic conflicts identified in the earlier paper, but also additional conflicts that arise when OODBs become part of the databases to be integrated. Most of the conflict resolution techniques discussed in the paper have already been incorporated into SQL/M, a multidatabase language implemented in UniSQL/M, a commercially available multidatabase system from UniSQL, Inc., which integrates SQL-based relational database systems with the UniSQL/X unified relational and object-oriented database system.

5.
The integration of views and schemas is an important part of database design and evolution and permits the sharing of data across complex applications. The view and schema integration methodologies used to date are driven purely by semantic considerations and allow integration of objects only if that is valid from both semantic and structural viewpoints. We discuss a new integration method, called structural integration, that has the advantage of being able to integrate objects that have structural similarities even if they differ semantically. This is made possible by the object-oriented Dual Model, which allows separate representation of structure and semantics. Structural integration has several advantages, including the identification of shared common structures, which is important for the sharing of data and methods.

6.
In schema integration, schematic discrepancies occur when data in one database correspond to metadata in another. We explicitly declare the context, that is, the meta-information relating to the source, classification, property, etc. of entities, relationships, or attribute values in entity-relationship (ER) schemas. We present algorithms that resolve schematic discrepancies by transforming metadata into attribute values of entity types while keeping the information and constraints of the original schemas. Although it focuses on the resolution of schematic discrepancies, our technique works seamlessly with existing techniques for resolving other semantic heterogeneities in schema integration.
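The classic instance of such a discrepancy is data in one source appearing as metadata (e.g., attribute names) in another. The sketch below, written in Python over invented toy data, folds metadata names back into attribute values; it illustrates the general idea only, not the paper's ER-level algorithms.

```python
# Minimal sketch of resolving a schematic discrepancy by turning metadata into
# attribute values: column names that encode stock exchanges are folded into an
# ordinary 'exchange' attribute. The data and names are invented for illustration.

def fold_metadata_into_values(rows, meta_columns, meta_attr, value_attr):
    """Unpivot `meta_columns` of each row into (meta_attr, value_attr) pairs."""
    result = []
    for row in rows:
        common = {k: v for k, v in row.items() if k not in meta_columns}
        for col in meta_columns:
            result.append({**common, meta_attr: col, value_attr: row[col]})
    return result

if __name__ == "__main__":
    # In one source, exchanges appear as metadata (attribute names) ...
    source = [{"stock": "ACME", "NYSE": 10.5, "TSE": 10.7}]
    # ... in the integrated schema they become values of an 'exchange' attribute.
    print(fold_metadata_into_values(source, ["NYSE", "TSE"], "exchange", "price"))
```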

7.
8.
This paper addresses the problem of handling semantic heterogeneity during database schema integration. We focus on the semantics of terms used as identifiers in schema definitions. Our solution does not rely on the names of the schema elements or the structure of the schemas. Instead, we utilize formal ontologies consisting of intensional definitions of terms represented in a logical language. The approach is based on similarity relations between intensional definitions in different ontologies. We present the definitions of similarity relations based on intensional definitions in formal ontologies. The extensional consequences of intensional relations are addressed. The paper shows how similarity relations are discovered by a reasoning system using a higher-level ontology. These similarity relations are then used to derive an integrated schema in two steps. First, we show how to use similarity relations to generate the class hierarchy of the global schema. Second, we explain how to enhance the class definitions with attributes. This approach reduces the cost of generating or re-generating global schemas for tightly-coupled federated databases.
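As a rough sketch of the first step (deriving a class hierarchy from similarity relations), the Python fragment below merges terms related by an "equivalent" relation and orders the resulting classes by a "subsumes" relation; the relation format is an assumption, and the relations themselves would come from the reasoning system described above.

```python
# Toy construction of a global class hierarchy from pairwise similarity
# relations. The relation encoding is an illustrative assumption.

def build_hierarchy(terms, relations):
    """relations: list of (kind, a, b) with kind in {'equivalent', 'subsumes'}.
    Returns a mapping: canonical class -> set of its direct subclasses."""
    # Start with every term as its own canonical class.
    canon = {t: t for t in terms}
    # Merge terms declared equivalent into a single canonical class.
    for kind, a, b in relations:
        if kind == "equivalent":
            ca, cb = canon[a], canon[b]
            root = min(ca, cb)
            for t in terms:
                if canon[t] in (ca, cb):
                    canon[t] = root
    # Subsumption between terms becomes subclassing between canonical classes.
    hierarchy = {c: set() for c in set(canon.values())}
    for kind, a, b in relations:
        if kind == "subsumes":  # 'a subsumes b' => b is a subclass of a
            hierarchy[canon[a]].add(canon[b])
    return hierarchy

if __name__ == "__main__":
    rels = [("equivalent", "Employee", "Worker"), ("subsumes", "Person", "Employee")]
    print(build_hierarchy(["Person", "Employee", "Worker"], rels))
```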

9.
This research investigates an approach to query processing in a multidatabase system that uses an object-oriented model to capture the semantics of other data models. The object-oriented model is used to construct a global schema, defining an integrated view of the different schemas in the environment. The model is also used as a self-describing model to build a meta-database for storing information about the global schema. A unique aspect of this work is that the object-oriented model is used to describe the different data models of the multidatabase environment, thereby extending the meta-database with semantic information about the local schemas. With the global and local schemas all represented in an object-oriented form, structural mappings between the global schema and each local schema are easily supported. An object algebra then provides a query language for expressing global queries, using the structural mappings to translate object algebra queries into SQL queries over the local relational schemas. The advantage of using an object algebra is that the object-oriented database can be viewed as a blackboard for temporary storage of local data and for establishing relationships between different databases. The object algebra can be used to directly retrieve temporarily stored data from the object-oriented database or to transparently retrieve data from local sources using the translation process described in this paper.
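To illustrate the kind of translation described above, here is a toy Python sketch that rewrites a simple object-algebra selection over a global class into an SQL query against the local relational schema it maps to; the mapping table and the algebra encoding are illustrative assumptions, not the system's actual structures.

```python
# Toy translation of an object-algebra selection into SQL over a local
# relational schema, driven by a structural mapping. Illustrative only.

STRUCTURAL_MAPPING = {
    # global class -> (local table, {global property: local column})
    "Employee": ("emp", {"name": "ename", "salary": "sal"}),
}

def select_to_sql(global_class, predicate_prop, op, value):
    """Translate select[predicate_prop op value](global_class) into SQL."""
    table, cols = STRUCTURAL_MAPPING[global_class]
    return (f"SELECT {', '.join(cols.values())} FROM {table} "
            f"WHERE {cols[predicate_prop]} {op} {value!r}")

if __name__ == "__main__":
    # object algebra: select[salary > 50000](Employee)
    print(select_to_sql("Employee", "salary", ">", 50000))
```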

10.
Schemaless databases, and document-oriented databases in particular, are preferred to relational ones for storing heterogeneous data with variable schemas and structural forms. However, the absence of a unique schema adds complexity to analytical applications, in which a single analysis often involves large sets of data with different schemas. In this paper we propose an original approach to OLAP on collections stored in document-oriented databases. The basic idea is to stop fighting against schema variety and welcome it as an inherent source of information wealth in schemaless sources. Our approach builds on four stages: schema extraction, schema integration, FD enrichment, and querying; these stages are discussed in detail in the paper. To make users aware of the impact of schema variety, we propose a set of indicators inspired by the definition of attribute density. Finally, we experimentally evaluate our approach in terms of efficiency and effectiveness.
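As a hint of what a density-style indicator could look like, the following Python sketch computes, for each flattened attribute path, the fraction of documents in which it appears; the flattening and the exact indicator definition are assumptions for illustration, not the paper's formal definition.

```python
# Toy attribute-density indicator over a collection of schemaless documents:
# for each flattened attribute path, the fraction of documents containing it.
from collections import Counter

def attribute_density(docs):
    def paths(doc, prefix=""):
        for k, v in doc.items():
            p = f"{prefix}{k}"
            if isinstance(v, dict):
                yield from paths(v, p + ".")
            else:
                yield p
    counts = Counter()
    for doc in docs:
        counts.update(set(paths(doc)))   # count each path at most once per document
    return {p: c / len(docs) for p, c in counts.items()}

if __name__ == "__main__":
    collection = [
        {"order": 1, "customer": {"name": "Ann"}, "total": 10},
        {"order": 2, "customer": {"name": "Bo", "city": "Rome"}},
    ]
    print(attribute_density(collection))  # e.g. 'customer.city' has density 0.5
```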

11.
12.
Intuitively, data management and data integration tools should be well suited for exchanging information in a semantically meaningful way. Unfortunately, they suffer from two significant problems: they typically require a common and comprehensive schema design before they can be used to store or share information, and they are difficult to extend because schema evolution is heavyweight and may break backward compatibility. As a result, many large-scale data sharing tasks are more easily facilitated by non-database-oriented tools that have little support for semantics. The goal of the peer data management system (PDMS) is to address this need: we propose the use of a decentralized, easily extensible data management architecture in which any user can contribute new data, schema information, or even mappings between other peers' schemas. PDMSs represent a natural step beyond data integration systems, replacing their single logical schema with an interlinked collection of semantic mappings between peers' individual schemas. This paper considers the problem of schema mediation in a PDMS. Our first contribution is a flexible language for mediating between peer schemas that extends known data integration formalisms to our more complex architecture. We precisely characterize the complexity of query answering for our language. Next, we describe a reformulation algorithm for our language that generalizes both global-as-view and local-as-view query answering algorithms. Then we describe several methods for optimizing the reformulation algorithm and an initial set of experiments studying its performance. Finally, we define and consider several global problems in managing semantic mappings in a PDMS. Received: 16 December 2002 / Accepted: 14 April 2003 / Published online: 12 December 2003. Edited by V. Atluri.
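For intuition, the sketch below shows plain global-as-view unfolding, the simplest of the formalisms the reformulation algorithm above generalizes: a query over a mediated relation is expanded into queries over the peers whose mappings define it. The mapping encoding is an invented toy format, not the PDMS mediation language.

```python
# Toy global-as-view (GAV) unfolding: a query over a mediated relation is
# rewritten into queries over the contributing peers. Illustrative only.

GAV_MAPPINGS = {
    # mediated relation -> list of (peer, peer relation) contributing tuples
    "Flight(src, dst)": [("peer_A", "Routes(src, dst)"),
                         ("peer_B", "Hops(src, dst)")],
}

def unfold(mediated_query):
    """Rewrite a query over a mediated relation into per-peer queries."""
    sources = GAV_MAPPINGS.get(mediated_query, [])
    return [f"ask {peer}: {relation}" for peer, relation in sources]

if __name__ == "__main__":
    for q in unfold("Flight(src, dst)"):
        print(q)
```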

13.
In this paper we consider an approach to developing complex database schemas. Apart from the theoretical model of the approach, we have also developed a CASE tool named Integrated Information Systems*Case, R.6.2 (IIS*Case) that supports the practical application of the approach. This paper outlines the basis of our approach to the design and integration of database schemas and the ways of using IIS*Case, and describes the main features of a new version of IIS*Case developed in Java. IIS*Case is based on the concept of the 'form type' and supports the conceptual modelling of a database schema, generating subschemas and integrating them into a relational database schema in 3NF. IIS*Case provides intelligent support for complex and highly formalized design and programming tasks. Advanced knowledge of information systems and database design is not a compulsory prerequisite for using IIS*Case. IIS*Case is based on a methodology of gradual integration of independently designed subschemas into a database schema. The process of independent subschema design may lead to collisions in expressing real-world constraints. IIS*Case uses specialized algorithms for checking the consistency of constraints embedded in a database schema and its subschemas. The paper briefly outlines the process of detecting collisions and the actions the designer may take to resolve them.

14.
This paper presents a specific approach to integrating a relational database system into a federated database system. The underlying database integration process consists of three steps. First, the external database systems have to be connected to the integrated database system environment and the external data models have to be mapped into a canonical data model; this step is often called syntactic transformation, includes structural enrichment, and leads to component schemas for each external DBMS. Second, the resulting schemas from the first step are used to construct export schemas, which are then integrated into global, individual schemas or views in the third step. In this paper we focus on the first step for relational databases, i.e., the connection of a relational database system and the mapping of the relational model into a canonical data model. We take POSTGRES as the relational database system and the object-oriented federated database system VODAK as the integration platform, which provides an open, object-oriented data model as the canonical data model for the integration. We show different variations of mapping the relational model. By exploiting the metaclass concept provided by VML, the modelling language of VODAK, we show how to tailor VML so that the canonical data model meets the requirements of integrating POSTGRES into the global database system VODAK in an efficient way.
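A bare-bones version of the syntactic-transformation step might look like the Python fragment below, which maps a relational table definition into a class of an object-oriented canonical model, turning foreign keys into reference-valued properties; the structures are illustrative assumptions and far simpler than VODAK's metaclass-based VML mapping.

```python
# Toy mapping of a relational table definition into an object-oriented class:
# plain columns become attributes, foreign keys become object references.

def relation_to_class(table):
    cls = {"name": table["name"].capitalize(), "properties": {}, "references": {}}
    fk_cols = {fk["column"]: fk["references"] for fk in table.get("foreign_keys", [])}
    for col, typ in table["columns"].items():
        if col in fk_cols:
            cls["references"][col] = fk_cols[col].capitalize()  # reference-valued property
        else:
            cls["properties"][col] = typ                        # plain attribute
    return cls

if __name__ == "__main__":
    emp = {
        "name": "employee",
        "columns": {"emp_no": "int", "name": "text", "dept_no": "int"},
        "foreign_keys": [{"column": "dept_no", "references": "department"}],
    }
    print(relation_to_class(emp))
```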

15.
This paper studies certain transformations of XML schemas that are widely used in algorithms for XML data management. Because the properties and functional characteristics of XML documents differ considerably from those of other types of data, solutions to a number of typical data management problems (such as XML data validation, schema inference, and data translation to/from other models) are more complicated for them. The general idea of our approach to solving these problems is to transform the original structure (i.e., the structural schema constraints) into another structure without loss of information about those properties of the original data that are important for applications. The suggested technique has been successfully used in various algorithms for solving problems of this kind. In this paper, a systematic approach to solving these problems is discussed. Methods for reducing XML schemas to several canonical forms are presented, and algorithms for solving the management problems for data satisfying schemas represented in the canonical forms are examined.

16.
A key aspect of interoperation among data-intensive systems involves the mediation of metadata and ontologies across database boundaries. One way to achieve such mediation between a local database and a remote database is to fold remote metadata into the local metadata, thereby creating a common platform through which information sharing and exchange becomes possible. Schema implantation and semantic evolution, our approach to the metadata folding problem, is a partial database integration scheme in which remote and local (meta)data are integrated in a stepwise manner over time. We introduce metadata implantation and stepwise evolution techniques to interrelate database elements in different databases, and to resolve conflicts on the structure and semantics of database elements (classes, attributes, and individual instances). We employ a semantically rich canonical data model, and an incremental integration and semantic heterogeneity resolution scheme. In our approach, relationships between local and remote information units are determined whenever enough knowledge about their semantics is acquired. The metadata folding problem is solved by implanting remote database elements into the local database, a process that imports remote database elements into the local database environment, hypothesizes the relevance of local and remote classes, and customizes the organization of remote metadata. We have implemented a prototype system and demonstrated its use in an experimental neuroscience environment. Received June 19, 1998 / Accepted April 20, 1999

17.
In this paper, we introduce an approach to task-driven ontology design that is based on information discovery from database schemas. We propose techniques for semi-automatically discovering the terms and relationships used in the information space, denoting concepts, their properties, and their links; these techniques are applied in two stages. The first stage focuses on discovering the heterogeneity and ambiguity of data representations in different schemas. For this purpose, schema elements are compared according to defined comparison features and similarity coefficients are evaluated. This stage produces a set of candidates for unification into ontology concepts. At the second stage, decisions are made on which candidates to unify into concepts and on how to relate the concepts by semantic links. Ontology concepts and links can be accessed according to different perspectives, so that the ontology can serve different purposes, such as providing a search space for powerful concept-location mechanisms, setting a basis for query formulation and processing, and establishing a reference for recognizing terminological relationships between elements in different schemas.
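A bare-bones version of the first-stage comparison could look like the sketch below, which combines a name-similarity feature with a type-match feature into a single coefficient; the features and weights are assumptions for illustration, not those defined in the paper.

```python
# Toy similarity coefficient between two schema elements, combining a
# name-similarity feature with a type-match feature. Weights are illustrative.
from difflib import SequenceMatcher

def similarity(elem_a, elem_b, w_name=0.6, w_type=0.4):
    name_sim = SequenceMatcher(None, elem_a["name"].lower(),
                               elem_b["name"].lower()).ratio()
    type_sim = 1.0 if elem_a["type"] == elem_b["type"] else 0.0
    return w_name * name_sim + w_type * type_sim

if __name__ == "__main__":
    a = {"name": "cust_name", "type": "string"}
    b = {"name": "customerName", "type": "string"}
    # Elements scoring above a chosen threshold become unification candidates.
    print(f"similarity = {similarity(a, b):.2f}")
```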

18.
Systems based on GLAV integration offer better scalability. This paper studies schema matching methods for this kind of integration system, performs the matching on the basis of schema trees, and implements the mappings between the global schema and the source schemas.

19.
While recent data management technologies, such as object-oriented techniques, address the problem of database schema evolution, the standard information systems currently in use raise challenging evolution problems. This paper examines database evolution from the developer's point of view. It shows how requirements changes are propagated to database schemas, to data, and to programs through a general strategy. This strategy requires documentation of the database design; when such documentation is absent, it has to be rebuilt through reverse engineering techniques. Our approach, called DB-MAIN, relies on a generic database model and on a transformational paradigm which states that database engineering processes can be modeled by schema transformations. Indeed, a transformation provides both structural and instance mappings that formally define how to modify database structures and contents. We describe both the complete and a simplified approach, and compare their merits and drawbacks. We then analyze the problem of program modification and describe a CASE tool that can assist developers in their task of system evolution. We illustrate our approach with Biomaze, a biochemical knowledge base whose database is rapidly evolving.

20.
A methodology for integration of heterogeneous databases (total citations: 6; self-citations: 0, citations by others: 6)
The transformation of existing local databases to meet diverse application needs at the global level is performed through a four-layered procedure that stresses total schema integration and virtual integration of local databases. The proposed methodology covers both schema integration and database integration, and uses a four-layered schema architecture (local schemata, local object schemata, global schema, and global view schemata), with each layer presenting an integrated view of the concepts that characterize the layer below. Mechanisms for accomplishing this objective are presented in theoretical terms, along with a running example. Object equivalence classes, property equivalence classes, and other related concepts are discussed in the context of logical integration of heterogeneous schemata, while object instance equivalence classes, property instance equivalence classes, and other related concepts are discussed for data integration purposes. The proposed methodology resolves naming conflicts, scaling conflicts, type conflicts, level-of-abstraction conflicts, and other types of conflicts during schema integration, and resolves data inconsistencies during data integration.
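As a small illustration of two of the conflict types listed above, the following Python sketch uses a property equivalence table to resolve a naming conflict and a scale factor to resolve a scaling conflict when mapping local values to the global schema; the table and conversion factors are invented for illustration and do not come from the paper.

```python
# Toy resolution of a naming conflict (same property, different names) and a
# scaling conflict (same property, different units) via a property equivalence
# table. Values and factors are invented for illustration.

# local (database, property) -> (global property, scale factor to the global unit)
PROPERTY_EQUIV = {
    ("db1", "salary_usd"): ("salary_eur", 0.92),
    ("db2", "wage"):       ("salary_eur", 1.0),
}

def to_global(db, prop, value):
    global_prop, factor = PROPERTY_EQUIV[(db, prop)]
    return global_prop, value * factor

if __name__ == "__main__":
    print(to_global("db1", "salary_usd", 1000))  # ('salary_eur', 920.0)
    print(to_global("db2", "wage", 900))         # ('salary_eur', 900.0)
```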
