Similar Documents
20 similar documents retrieved.
1.
Within the database field, schema refinements have proved useful for documentation and maintenance purposes; moreover, schemata describing the reality of interest at different levels of abstraction are extensively used in Computer Aided Software Engineering tools and visual query languages. Much effort has therefore been spent on analyzing schema transformations and schema refinements. Until now, however, while the syntax of schema transformations has been deeply investigated, their semantics has very often been neglected. In this paper we present a full formal framework supporting both the syntax and the semantics of schema refinements. This formal framework is used to support a methodology able to merge a set of schemata and the top-down chains of refinement planes produced during their design. The result of this kind of integration, which we call multilevel integration, is an integrated schema plus an associated top-down chain of schemata. The integrated schema and the chain are related to the input schemata by interesting properties, giving rise to a two-dimensional structure useful for exploring the data content of complex information systems.

2.
Integrated access to multiple data sources requires a homogeneous interface provided by a federated schema. Such a federated schema should correctly reflect the semantics of the component schemata of which it is composed. Since the semantics of a database schema is also determined by a set of semantic integrity constraints, a correct schema integration has to deal with the integrity constraints existing in the different component schemata. Traditionally, most schema integration approaches concentrate solely on the structural integration of the given database schemata. Local integrity constraints are often simply neglected, and their relationship to global extensional assertions, which form the basic integration constraints, is ignored completely. In this paper, we discuss the impact of global extensional assertions and local integrity constraints on federated schemata. In particular, we point out the correspondence between local integrity constraints and global extensional assertions. The knowledge about the correspondences between the given integrity constraints and extensional assertions can then be utilized for an augmented schema integration process.

3.
Schema matching is the problem of finding relationships among concepts across data sources that are heterogeneous in format and in structure. Starting from the "hidden meaning" associated with schema labels (i.e. class/attribute names), it is possible to discover lexical relationships among the elements of different schemata. In this work, we propose an automatic method aimed at discovering probabilistic lexical relationships in the environment of "on the fly" data integration. Our method is based on a probabilistic lexical annotation technique, which automatically associates one or more meanings with schema elements with respect to a thesaurus or lexical resource. However, the accuracy of automatic lexical annotation methods on real-world schemata suffers from the abundance of non-dictionary words such as compound nouns and abbreviations. We address this problem by including a method for schema label normalization which increases the number of comparable labels. From the annotated schemata, we derive the probabilistic lexical relationships to be collected in the Probabilistic Common Thesaurus. The method is applied within the MOMIS data integration system but can easily be generalized to other data integration systems.
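The label normalization step described above can be illustrated with a minimal sketch (not taken from MOMIS): compound labels are split on case changes and underscores, and abbreviations are expanded against a small lookup table before meanings are sought in a lexical resource. The abbreviation table and the example labels are hypothetical.

import re

# Hypothetical abbreviation table; a real annotator would derive expansions
# from a domain dictionary or a lexical resource such as WordNet.
ABBREVIATIONS = {"qty": "quantity", "addr": "address", "cust": "customer"}

def normalize_label(label: str) -> list[str]:
    """Split a schema label into comparable dictionary words."""
    # Break camelCase compounds, then snake_case ones, into tokens.
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", label).replace("_", " ")
    tokens = spaced.lower().split()
    # Expand known abbreviations so more labels become dictionary words.
    return [ABBREVIATIONS.get(t, t) for t in tokens]

print(normalize_label("custAddr"))   # ['customer', 'address']
print(normalize_label("order_qty"))  # ['order', 'quantity']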

4.
In most of today's companies we find heterogeneous database systems containing redundant and inconsistent data. This threatens the ability to make coordinated, organization-wide responses to business problems. The benefits of data integration, however, do not necessarily outweigh its costs. In spite of this, we argue that some kind of common understanding of information structures is necessary, and thus that data integration is necessary at least to some degree. The literature treats only the technical aspects of schema integration, and the complexity issue (large schemata are hard to understand) is usually not addressed explicitly. We present a business-process-oriented strategy for data integration. This method allows the determination of the order and the degree of integration. Complexity is reduced by schema clustering during the pre-integration phase.

5.
In a heterogeneous distributed database environment, each component database is characterized by its own logical schema and its own set of integrity constraints. The task of generating a global schema from the constituent local schemata has been addressed by many researchers. The complementary problem of using multiple sets of integrity constraints to create a new set of global integrity constraints is examined in this paper. These global integrity constraints facilitate both query optimization and update validation tasks.

6.
Extensible Markup Language (XML) is a common standard for data representation and exchange over the Web. Considering the increasing need for managing data on the Web, integration techniques are required to access heterogeneous XML sources. In this paper, we describe a unification method for heterogeneous XML schemata. The input to the unification method is a set of object-oriented canonical schemata that conceptually abstract the local Document Type Definitions of the involved sources. The unification process applies specific algorithms and rules to the concepts of the canonical schemata to generate a preliminary ontology. Further adjustments on this preliminary ontology generate a reference ontology that acts as a front-end for user queries to the XML sources.

7.
A methodology for integration of heterogeneous databases
The transformation of existing local databases to meet diverse application needs at the global level is performed through a four-layered procedure that stresses total schema integration and virtual integration of the local databases. The proposed methodology covers both schema integration and database integration, and uses a four-layered schema architecture (local schemata, local object schemata, global schema, and global view schemata) with each layer presenting an integrated view of the concepts that characterize the layer below. Mechanisms for accomplishing this objective are presented in theoretical terms, along with a running example. Object equivalence classes, property equivalence classes, and other related concepts are discussed in the context of logical integration of heterogeneous schemata, while object instance equivalence classes, property instance equivalence classes, and other related concepts are discussed for data integration purposes. The proposed methodology resolves naming conflicts, scaling conflicts, type conflicts, level-of-abstraction conflicts, and other types of conflicts during schema integration, as well as data inconsistencies during data integration.
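The equivalence-class idea above can be illustrated with a toy grouping step: attributes from different local schemata that have been declared to correspond (after conflicts are resolved) are collected into one property equivalence class. The correspondences below are invented and stand in for the paper's own conflict-resolution machinery.

from collections import defaultdict

# Hypothetical attribute correspondences between two local schemata,
# e.g. obtained after resolving naming and scaling conflicts.
correspondences = [
    ("db1.Employee.salary", "db2.Staff.monthly_pay"),
    ("db1.Employee.emp_no", "db2.Staff.id"),
]

# A small union-find groups corresponding attributes into equivalence classes.
parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

for a, b in correspondences:
    union(a, b)

classes = defaultdict(set)
for attr in list(parent):
    classes[find(attr)].add(attr)
print(list(classes.values()))  # two property equivalence classes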

8.
Rank Aggregation for Automatic Schema Matching
Schema matching is a basic operation of data integration, and several tools for automating it have been proposed and evaluated in the database community. Research in this area reveals that there is no single schema matcher that is guaranteed to succeed in finding a good mapping for all possible domains; thus, an ensemble of schema matchers should be considered. In this paper, we introduce schema metamatching, a general framework for composing an arbitrary ensemble of schema matchers and generating a list of best-ranked schema mappings. Informally, schema metamatching stands for computing a "consensus" ranking of alternative mappings between two schemata, given the "individual" graded rankings provided by several schema matchers. We introduce several algorithms for this problem, varying from adaptations of standard techniques for general quantitative rank aggregation to novel techniques specific to the problem of schema matching, and to combinations of both. We provide a formal analysis of the applicability and relative performance of these algorithms and evaluate them empirically on a set of real-world schemata.
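The consensus-ranking idea can be shown with a toy Borda-count aggregation over the graded rankings of a hypothetical matcher ensemble; the paper's metamatching algorithms are more elaborate, and the matcher names and candidate mappings below are invented.

# Each matcher ranks the same candidate mappings, best first (hypothetical data).
rankings = {
    "name_matcher":  ["m1", "m2", "m3"],
    "type_matcher":  ["m2", "m1", "m3"],
    "graph_matcher": ["m1", "m3", "m2"],
}

# Borda count: a candidate receives (n - position) points from every matcher.
scores = {}
for ranked in rankings.values():
    n = len(ranked)
    for pos, mapping in enumerate(ranked):
        scores[mapping] = scores.get(mapping, 0) + (n - pos)

consensus = sorted(scores, key=scores.get, reverse=True)
print(consensus)  # ['m1', 'm2', 'm3']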

9.
The task of combining data residing at different sources to provide the user with a unified view is known as data integration. Schema mappings act as the glue between the global schema and the source schemas of a data integration system. Global-and-local-as-view (GLAV) is one of the approaches for specifying the schema mappings. Tableaux are used for expressing queries and functional dependencies on a single database. We investigate a general technique for expressing a GLAV mapping by a tabular structure called mapping assertion tableaux (MAT). In a similar way, we also express the tuple generating dependency (tgd) and equality generating dependency (egd) constraints by tabular forms, called tabular tgd (TTGD) and tabular egd (TEGD), respectively. A set consisting of MATs, TTGDs and TEGDs is called schema mapping tableaux (SMT). We present algorithms that use the SMT as an operator on an instance of the source schema to produce an instance of the target schema. We show that the target instances computed by the SMT are ‘minimal’ and ‘most general’ in nature. We also define the notion of equivalence between the schema mappings of two data integration systems and present algorithms that optimize schema mappings through the manipulation of the SMT.
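A very small sketch of applying a mapping assertion as an operator on a source instance: each source tuple matching the assertion's source side generates a target tuple, with a fresh labeled null for every target attribute the mapping leaves unspecified, which is what makes the result 'most general'. The relations and attributes are hypothetical, and the sketch ignores tgd/egd enforcement.

from itertools import count

# Hypothetical source instance: Person(name, city).
source = [("alice", "rome"), ("bob", "paris")]

# Target relation Resident(name, city, country); the mapping asserts that every
# Person tuple must appear as a Resident, but says nothing about country.
fresh = count(1)

def apply_mapping(src_tuples):
    target = []
    for name, city in src_tuples:
        # The unspecified target attribute becomes a labeled null.
        target.append((name, city, f"N{next(fresh)}"))
    return target

print(apply_mapping(source))
# [('alice', 'rome', 'N1'), ('bob', 'paris', 'N2')]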

10.
Schema matching is the task of providing correspondences between concepts describing the meaning of data in various heterogeneous, distributed data sources. It is recognized to be one of the basic operations required by the process of data and schema integration, and its outcome serves in many tasks such as targeted content delivery and view integration. Schema matching research has been going on for more than 25 years now. An interesting research topic that has largely been left untouched involves the automatic selection of schema matchers into an ensemble, a set of schema matchers. To the best of our knowledge, none of the existing algorithmic solutions offer such a selection feature. In this paper we provide a thorough investigation of this research topic. We introduce a new heuristic, Schema Matcher Boosting (SMB). We show that SMB has the ability to choose among schema matchers and to tune their importance. As such, SMB introduces a new promise for schema matcher designers: instead of trying to design a perfect schema matcher, a designer can focus on finding better-than-random schema matchers. For the effective utilization of SMB, we propose a complementary approach to the design of new schema matchers. We separate schema matchers into first-line and second-line matchers. First-line schema matchers were designed, by and large, as applications of existing work in other areas (e.g., machine learning and information retrieval) to schemata. Second-line schema matchers operate on the outcome of other schema matchers to improve their original outcome. SMB selects matcher pairs, where each pair contains a first-line matcher and a second-line matcher. We run a thorough set of experiments to analyze SMB's ability to effectively choose schema matchers and show that SMB performs better than other, state-of-the-art ensemble matchers.
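The better-than-random requirement mentioned above is the same condition under which boosting assigns a matcher positive weight. A minimal AdaBoost-style sketch (not the SMB algorithm itself, and with invented error rates) shows how matchers whose error on validated correspondences is below 0.5 are kept and weighted, while a random-guess matcher contributes nothing.

import math

# Hypothetical error rates of candidate matchers on a labeled validation set.
error_rates = {"edit_distance": 0.35, "wordnet_sense": 0.25, "random_guess": 0.50}

weights = {}
for matcher, eps in error_rates.items():
    # AdaBoost-style weight: positive only for better-than-random matchers.
    weights[matcher] = 0.5 * math.log((1 - eps) / eps) if eps < 0.5 else 0.0

print(weights)  # random_guess gets weight 0.0, the others get positive weights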

11.
Web data-extraction systems in use today mainly focus on the generation of extraction rules, i.e., wrapper induction. Thus, they appear ad hoc and are difficult to integrate when a holistic view is taken. Each phase in the data-extraction process is disconnected and does not share a common foundation, which makes building a complete system far from straightforward. In this paper, we demonstrate a holistic approach to Web data extraction. The principal component of our proposal is the notion of a document schema. Document schemata are patterns of structures embedded in documents. Once the document schemata are obtained, the various phases (e.g. training set preparation, wrapper induction and document classification) can be easily integrated. The implication of this is improved efficiency and better control over the extraction procedure. Our experimental results confirmed this. More importantly, because a document can be represented as a vector of schemata, it can be easily incorporated into existing systems as the fabric for integration.

12.
王博, 郭波. 《计算机科学》, 2007, 34(10): 129-132
The main task of a heterogeneous data source integration system is to hide the heterogeneity of the data models of the underlying sources and to provide uniform access to their data. Common data models, model transformations, and intermediate data models are widely used to solve this problem. As the workload, the objects to be integrated, and the overall scale of data integration grow ever larger and more complex, a common data model alone can no longer satisfy current integration requirements. Although the inherent interrelationships among the data models of heterogeneous sources make semantics-oriented integration more complicated, they also favor the application of model transformation methods in data integration. Model transformation is the foundation of schema integration. This paper presents a formal description method for heterogeneous data source models and a formal framework of basic model transformation operations. The framework keeps models, instances, and constraints mutually independent, and is applicable to most basic data models and to transformation and integration among them. Using the framework, the transformation of heterogeneous data source models and the schema integration process can be simplified and standardized.

13.
Answering heterogeneous database queries with degrees of uncertainty
In heterogeneous database systems, partial values have been used to resolve some schema integration problems. Performing operations on partial values may produce maybe tuples in the query result which cannot be compared. Thus, users have no way to distinguish which maybe tuple is the most probable answer. In this paper, the concept of partial values is generalized to probabilistic partial values. We propose an approach that resolves the schema integration problems using probabilistic partial values and develop a full set of extended relational operators for manipulating relations containing probabilistic partial values. With this approach, the uncertain answer tuples of a query are associated with degrees of uncertainty (represented by probabilities). That provides users a comparison among maybe tuples and a better understanding of the query results. Besides, extended selection and join are generalized to threshold-based selection and join, respectively, which can be used to filter out maybe tuples whose probabilities fall below a given threshold.
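A minimal sketch of the threshold-selection idea described above: every maybe tuple carries a probability, and the extended selection keeps only those whose probability reaches a chosen cutoff. The relation and the probabilities are hypothetical.

# Hypothetical query result: (tuple, probability that it is a true answer).
maybe_tuples = [
    (("smith", "manager"), 0.90),
    (("jones", "manager"), 0.40),
    (("brown", "manager"), 0.05),
]

def threshold_selection(tuples, threshold):
    """Keep maybe tuples whose probability is at least the threshold."""
    return [(t, p) for t, p in tuples if p >= threshold]

print(threshold_selection(maybe_tuples, 0.5))  # only the 0.90 tuple survives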

14.
Data integration with uncertainty
This paper reports our first set of results on managing uncertainty in data integration. We posit that data-integration systems need to handle uncertainty at three levels and do so in a principled fashion. First, the semantic mappings between the data sources and the mediated schema may be approximate because there may be too many of them to be created and maintained or because in some domains (e.g., bioinformatics) it is not clear what the mappings should be. Second, the data from the sources may be extracted using information extraction techniques and so may yield erroneous data. Third, queries to the system may be posed with keywords rather than in a structured form. As a first step to building such a system, we introduce the concept of probabilistic schema mappings and analyze their formal foundations. We show that there are two possible semantics for such mappings: by-table semantics assumes that there exists a correct mapping but we do not know what it is; by-tuple semantics assumes that the correct mapping may depend on the particular tuple in the source data. We present the query complexity and algorithms for answering queries in the presence of probabilistic schema mappings, and we describe an algorithm for efficiently computing the top-k answers to queries in such a setting. Finally, we consider using probabilistic mappings in the scenario of data exchange.
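A small sketch of the by-table semantics mentioned above: each candidate mapping has a probability, the query is answered under every mapping, and an answer's probability is the sum of the probabilities of the mappings that produce it. The source relation, the mappings, and the query are invented for illustration.

from collections import defaultdict

# Hypothetical source relation with two unnamed columns.
source = [("alice", "rome"), ("bob", "paris")]

# Two candidate mappings of the mediated attribute "city" onto a source column,
# with probabilities summing to 1 (by-table: exactly one mapping is correct).
mappings = [({"city": 1}, 0.7), ({"city": 0}, 0.3)]

# Query: SELECT city FROM the mediated table.
answer_prob = defaultdict(float)
for mapping, p in mappings:
    col = mapping["city"]
    for row in source:
        answer_prob[row[col]] += p

print(dict(answer_prob))
# {'rome': 0.7, 'paris': 0.7, 'alice': 0.3, 'bob': 0.3}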

15.
Research on Dynamic Dictionary Construction in Data Source Integration Systems
Starting from the perspective of heterogeneous data source integration systems, this paper introduces the concepts of templates and dynamic dictionaries to describe the schemas of data from various sources in a uniform way. A dynamic dictionary can describe not only the structural characteristics of objects but also their behavioral characteristics, fully conforming to object-oriented principles. In addition, the paper defines five template operations and proves that the templates of OIM object operations can be composed from the corresponding template operations. On this basis, it gives a method for constructing the global dynamic dictionary of the integration system from the template operations of the local dynamic dictionaries, without scanning the underlying databases.

16.
Database Integration Using Neural Networks: Implementation and Experiences
Applications in a wide variety of industries require access to multiple heterogeneous distributed databases. One step in heterogeneous database integration is semantic integration: identifying corresponding attributes in different databases that represent the same real-world concept. The rules of semantic integration cannot be ‘pre-programmed’ since the information to be accessed is heterogeneous and attribute correspondences can be fuzzy. Manually comparing all possible pairs of attributes is an unreasonably large task. We have applied artificial neural networks (ANNs) to this problem. Metadata describing attributes is automatically extracted from a database to represent their ‘signatures’. The metadata is used to train neural networks to find similar patterns of metadata describing corresponding attributes in other databases. In our system, the rules for determining corresponding attributes are discovered through machine learning. This paper describes how we applied neural network techniques to a database integration problem and how we represent an attribute with its metadata as discriminators. The paper focuses on our experiments on the effectiveness of neural networks and of each discriminator. We also discuss difficulties of using neural networks for this problem and our wish list for the Machine Learning community.
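The signature idea can be sketched without the neural network: each attribute is described by a normalized vector of metadata discriminators, and attributes from different databases are compared by distance between these vectors. The paper trains ANNs on such vectors; the nearest-neighbour comparison below is a simplified stand-in, and the metadata values are invented.

import math

# Hypothetical discriminators per attribute: (avg value length, numeric fraction,
# null fraction), each already normalized to [0, 1].
db1 = {"emp.salary": (0.30, 1.00, 0.02), "emp.name": (0.80, 0.00, 0.00)}
db2 = {"staff.pay": (0.28, 0.98, 0.05), "staff.fullname": (0.85, 0.00, 0.01)}

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# For each db1 attribute, report the db2 attribute with the closest signature.
for attr, sig in db1.items():
    best = min(db2, key=lambda other: distance(sig, db2[other]))
    print(attr, "->", best)  # emp.salary -> staff.pay, emp.name -> staff.fullname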

17.
It is well known that the complexity of testing the correctness of an arbitrary update to a database view can be far greater than the complexity of testing a corresponding update to the main schema. However, views are generally managed according to some protocol which limits the admissible updates to a subset of all possible changes. The question thus arises as to whether there is a more tractable relationship between these two complexities in the presence of such a protocol. In this paper, this question is addressed for closed update strategies, which are based upon the constant-complement approach of Bancilhon and Spyratos. The approach is to address a more general question – that of characterizing the complexity of axiomatization of views, relative to the complexity of axiomatization of the main schema. For schemata constrained by denial or consistency constraints, that is, statements which rule out certain situations, such as the equality-generating dependencies (EGDs) or, more specifically, the functional dependencies (FDs) of the relational model, a broad and comprehensive result is obtained in a very general framework which is not tied to the relational model in any way. It states that every such schema is governed by an equivalent set of constraints which embed into the component views, and which are no more complex than the original set. For schemata constrained by generating dependencies, of which tuple-generating dependencies (TGDs) in general and, more specifically, both join dependencies (JDs) and inclusion dependencies (INDs) are examples within the relational model, a similar result is obtained, but only within a context known as meet-uniform decompositions, which fails to recapture some important situations. To address the all-important case of relational schemata constrained by both FDs and INDs, a hybrid approach is also developed, in which the general theory regarding denial constraints is blended with a focused analysis of a special but very practical subset of the INDs known as fanout-free unary inclusion dependencies (fanout-free UINDs), to obtain results parallel to the above-mentioned cases: every such schema is governed by an equivalent set of constraints which embed into the component views, and which are no more complex than the original set. In all cases, the question of view update complexity is then answered via a corollary to this main result. Parts of this paper are based upon work reported in [21].

18.
A Formal Foundation for Schema Integration in the SCOPE/CIMS System
In multidatabase systems, schema integration is the process of integrating several existing schemata into a single unified schema, and it is one of the key problems in achieving heterogeneous information integration. To meet the schema integration needs of SCOPE/CIMS, an object-oriented multi-data-source integration system, this paper proposes a formal foundation supporting schema integration, laying the groundwork for a semi-automatic schema integration assistant tool. The main contributions include: (1) a correspondence description model, to support the analysis and comparison of schemata; (2) a set of schema integration rules, to provide the principles for schema merging and restructuring; (3) equivalence classes

19.
The self-organizing knowledge representation aspects in heterogeneous information environments involving object-oriented databases, relational databases, and rulebases are investigated. The authors consider a facet of self-organizability which sustains the structural semantic integrity of an integrated schema regardless of the dynamic nature of the local schemata. To achieve this objective, they propose an overall scheme for schema translation and schema integration with an object-oriented data model as the common data model, and it is shown that integrated schemata can be maintained effortlessly by propagating updates in local schemata to the integrated schemata unambiguously.

20.
Schema theory is the most well-known model of evolutionary algorithms. Following genetic algorithms (GA), nearly all schemata defined for genetic programming (GP) refer to a set of points in the search space that share some syntactic characteristics. In GP, however, syntactically similar individuals do not necessarily have similar semantics. The instances of a syntactic schema do not behave similarly, hence the corresponding schema theory becomes unreliable. Therefore, these theories have rarely been used to improve the performance of GP. The main objective of this study is to propose a schema theory which could be a more realistic model for GP and could potentially be employed for improving GP in practice. To achieve this aim, the concept of a semantic schema is introduced. This schema partitions the search space according to the semantics of trees, regardless of their syntactic variety. We interpret the semantics of a tree in terms of the mutual information between its output and the target. The semantic schema is characterized by a set of semantic building blocks and their joint probability distribution. After introducing the semantic building blocks, an algorithm for finding them in a given population is presented. An extraction method that looks for the most significant schema of the population is provided. Moreover, an exact microscopic schema theorem is suggested that predicts the expected number of schema samples in the next generation. Experimental results demonstrate the capability of the proposed schema definition in representing the semantics of the schema instances. It is also revealed that the semantic schema theorem's estimation is more realistic than that of previously defined schemata.
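The semantic measure mentioned above, the mutual information between a tree's output and the target, can be computed for discretized outputs in a few lines. The binning into two classes and the sample data below are invented, and the sketch leaves out the building-block extraction itself.

import math
from collections import Counter

def mutual_information(xs, ys):
    """I(X;Y) for two equal-length discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

# Hypothetical discretized outputs of a GP tree and the corresponding targets.
tree_output = [0, 1, 1, 0, 1, 0, 0, 1]
target      = [0, 1, 1, 0, 1, 1, 0, 1]
print(mutual_information(tree_output, target))  # about 0.55 bits (1 bit if identical)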

