期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Automating the database schema evolution process

Carlo Curino Hyun Jin Moon Alin Deutsch Carlo Zaniolo 《The VLDB Journal The International Journal on Very Large Data Bases》2013,22(1):73-98

相似文献

2.

Schema versioning and database conversion techniques for bi-temporal databases

Han-Chieh Wei Ramez Elmasri 《Annals of Mathematics and Artificial Intelligence》2000,30(1-4):23-52

Schema evolution and schema versioning are two techniques used for managing database evolution. Schema evolution keeps only the current version of a schema and database after applying schema changes. Schema versioning creates new schema versions and converts the corresponding data while preserving the old schema versions and data. To provide the most generality, bi-temporal databases can be used to realize schema versioning, since they allow both retroactive and proactive updates to the schema and database. In this paper we first study two proposed database conversion approaches for supporting schema evolution and schema versioning: single table version approach and multiple table version approach. We then propose the partial table version approach to solve the problems encountered in these approaches when applied to bi-temporal databases. This revised version was published online in June 2006 with corrections to the Cover Date. 相似文献

3.

A transparent schema-evolution system based on object-oriented viewtechnology

Young-Gook Ra Rundensteiner E.A. 《Knowledge and Data Engineering, IEEE Transactions on》1997,9(4):600-624

When a database is shared by many users, updates to the database schema are almost always prohibited because there is a risk of making existing application programs obsolete when they run against the modified schema. The paper addresses the problem by integrating schema evolution with view facilities. When new requirements necessitate schema updates for a particular user, then the user specifies schema changes to his personal view, rather than to the shared base schema. Our view schema evolution approach then computes a new view schema that reflects the semantics of the desired schema change, and replaces the old view with the new one. We show that our system provides the means for schema change without affecting other views (and thus without affecting existing application programs). The persistent data is shared by different views of the schema, i.e., both old as well as newly developed applications can continue to interoperate. The paper describes a solution approach of realizing the evolution mechanism as a working system, which as its key feature requires the underlying object oriented view system to support capacity augmenting views. We present algorithms that implement the complete set of typical schema evolution operations as view definitions. Lastly, we describe the transparent schema evolution system (TSE) that we have built on top of GemStone, including our solution for supporting capacity augmenting view mechanisms 相似文献

4.

Efficient Revalidation of XML Documents

Raghavachari M. Shmueli O. 《Knowledge and Data Engineering, IEEE Transactions on》2007,19(4):554-567

We study the problem of schema revalidation where XML data known to conform to one schema must be validated with respect to another schema. Such revalidation algorithms have applications in schema evolution, query processing, XML-based programming languages, and other domains. We describe how knowledge of conformance to an XML Schema may be used to determine conformance to another XML Schema efficiently. We examine both the situation where an XML document is modified before it is revalidated and the situation where it is unmodified 相似文献

5.

An empirical approach to studying software evolution

Kemerer C.F. Slaughter S. 《IEEE transactions on pattern analysis and machine intelligence》1999,25(4):493-509

With the approach of the new millennium, a primary focus in software engineering involves issues relating to upgrading, migrating, and evolving existing software systems. In this environment, the role of careful empirical studies as the basis for improving software maintenance processes, methods, and tools is highlighted. One of the most important processes that merits empirical evaluation is software evolution. Software evolution refers to the dynamic behaviour of software systems as they are maintained and enhanced over their lifetimes. Software evolution is particularly important as systems in organizations become longer-lived. However, evolution is challenging to study due to the longitudinal nature of the phenomenon in addition to the usual difficulties in collecting empirical data. We describe a set of methods and techniques that we have developed and adapted to empirically study software evolution. Our longitudinal empirical study involves collecting, coding, and analyzing more than 25000 change events to 23 commercial software systems over a 20-year period. Using data from two of the systems, we illustrate the efficacy of flexible phase mapping and gamma sequence analytic methods, originally developed in social psychology to examine group problem solving processes. We have adapted these techniques in the context of our study to identify and understand the phases through which a software system travels as it evolves over time. We contrast this approach with time series analysis. Our work demonstrates the advantages of applying methods and techniques from other domains to software engineering and illustrates how, despite difficulties, software evolution can be empirically studied 相似文献

6.

Incremental entity resolution on rules and data

Steven Euijong Whang Hector Garcia-Molina 《The VLDB Journal The International Journal on Very Large Data Bases》2014,23(1):77-102

Entity resolution (ER) identifies database records that refer to the same real-world entity. In practice, ER is not a one-time process, but is constantly improved as the data, schema and application are better understood. We first address the problem of keeping the ER result up-to-date when the ER logic or data “evolve” frequently. A naïve approach that re-runs ER from scratch may not be tolerable for resolving large datasets. This paper investigates when and how we can instead exploit previous “materialized” ER results to save redundant work with evolved logic and data. We introduce algorithm properties that facilitate evolution, and we propose efficient rule and data evolution techniques for three ER models: match-based clustering (records are clustered based on Boolean matching information), distance-based clustering (records are clustered based on relative distances), and pairs ER (the pairs of matching records are identified). Using real datasets, we illustrate the cost of materializations and the potential gains of evolution over the naïve approach. 相似文献

7.

Domain-knowledge-guided schema evolution for accounting database systems

Jia-Lin ChenDennis McLeodDaniel O''Leary 《Expert systems with applications》1995,9(4):491-501

The static meta-data view of accounting database management is that the schema of a database is designed before the database is populated and remains relatively fixed over the life cycle of the system. However, the need to support accounting database evolution is clear: a static meta-data view of an accounting database cannot support next generation dynamic environment where system migration, organization reengineering, and heterogeneous system interoperation are essential. This paper presents a knowledge-based approach and mechanism to support dynamic accounting database schema evolution in an object-based data modeling context. When an accounting database schema does not meet the requirements of a firm, the schema must be changed. Such schema evolution can be realized via a sequence of evolution operators. As a result, this paper considers the question: what heuristics and knowledge are necessary to guide a system to choose a sequence of operators to complete a given evolution task for an accounting database? In particular, we first define a set of basic evolution schema operators, employing heuristics to guide the evolution process. Second, we explore how domain-specific knowledge can be used to guide the use of the operators to complete the evolution task. A well-known accounting data model, REA model, is used here to guide the schema evolution process. Third, we discuss a prototype system, REAtool, to demonstrate and test our approach. 相似文献

8.

Incremental maintenance of schema-restructuring views in SchemaSQL

Koeller A. Rundensteiner E.A. 《Knowledge and Data Engineering, IEEE Transactions on》2004,16(9):1096-1111

The integration of data, especially from heterogeneous sources, is a hard and widely studied problem. One particularly challenging issue is the integration of sources that are semantically equivalent but schematically heterogeneous. While two such data sources may represent the same information, one may store the information inside tuples (data) while the other may store it in attribute or relation names (schema). The SchemaSQL query language is a recent solution to this problem powerful enough to restructure such sources into each other without the loss of information. We propose the first incremental view maintenance strategy for such schema-restructuring views. Our strategy, based on an algebraic representation of the view query, correctly transforms a data update or a schema change to a source into sequences of schema and data updates to be applied to the view. We also introduce an optimization of incremental maintenance using batching. We present a proof of correctness of the propagation approach. We also describe the implementation of our SchemaSQL Query Processor and View Maintainer. Last, our experimental results demonstrate that, in many cases, incremental SchemaSQL view maintenance is significantly faster than complete view recomputation. 相似文献

9.

MeDEA: A database evolution architecture with traceability

Eladio Jorge ngel L. María A. 《Data & Knowledge Engineering》2008,65(3):419-441

One of the most important challenges that software engineers (designers, developers) still have to face in their everyday work is the evolution of working database systems. As a step for the solution of this problem in this paper we propose MeDEA, which stands for Metamodel-based Database Evolution Architecture. MeDEA is a generic evolution architecture that allows us to maintain the traceability between the different artifacts involved in any database development process. MeDEA is generic in the sense that it is independent of the particular modeling techniques being used. In order to achieve this, a metamodeling approach has been followed for the development of MeDEA. The other basic characteristic of the architecture is the inclusion of a specific component devoted to storing the translation of conceptual schemas to logical ones. This component, which is one of the most noteworthy contributions of our approach, enables any modification (evolution) realized on a conceptual schema to be traced to the corresponding logical schema, without having to regenerate this schema from scratch, and furthermore to be propagated to the physical and extensional levels. 相似文献

10.

On the application of model-driven engineering in data reengineering

《Information Systems》2017

Model-Driven Engineering (MDE) emphasizes the systematic use of models to improve software productivity and some aspects of the software quality such as maintainability or interoperability. Model-driven techniques have proven useful not only as regards developing new software applications but also the reengineering of legacy systems. Models and metamodels provide a high-level formalism with which to represent artefacts commonly manipulated in the different stages of a software evolution process (e.g., a software migration) while model transformation allows the automation of the evolution tasks to be performed. Some approaches and experiences of model-driven software reengineering have recently been presented but they have been focused on the code while data reengineering aspects have been overlooked. The objective of this work is to assess to what extent data reengineering processes could also take advantage of MDE techniques.The article starts by characterising data-reengineering in terms of the tasks involved. It then goes on to state that MDE is particularly amenable as regards addressing the tasks previously identified. We present an MDE-based approach for the reengineering of data whose purpose is to improve the quality of the logical schema in a relational data migration scenario. As a proof of concept, the approach is illustrated for two common problems in data re-engineering: undeclared foreign keys and disabled constraints. This approach is organised following the three stages of a software reengineering process: reverse engineering, restructuring and forward engineering. We show how each stage is implemented by means of model transformation chains. A running example is used to illustrate each stage of the process throughout the article. The approach is validated with a real widely-used database. An assessment of the application of MDE in each stage is then presented, and we conclude by identifying the main benefits and drawbacks of using MDE in data reengineering. 相似文献

11.

Handling probabilistic integrity constraints in pay-as-you-go reconciliation of data models

《Information Systems》2019

Data models capture the structure and characteristic properties of data entities, e.g., in terms of a database schema or an ontology. They are the backbone of diverse applications, reaching from information integration, through peer-to-peer systems and electronic commerce to social networking. Many of these applications involve models of diverse data sources. Effective utilisation and evolution of data models, therefore, calls for matching techniques that generate correspondences between their elements. Various such matching tools have been developed in the past. Yet, their results are often incomplete or erroneous, and thus need to be reconciled, i.e., validated by an expert. This paper analyses the reconciliation process in the presence of large collections of data models, where the network induced by generated correspondences shall meet consistency expectations in terms of integrity constraints. We specifically focus on how to handle data models that show some internal structure and potentially differ in terms of their assumed level of abstraction. We argue that such a setting calls for a probabilistic model of integrity constraints, for which satisfaction is preferred, but not required. In this work, we present a model for probabilistic constraints that enables reasoning on the correctness of individual correspondences within a network of data models, in order to guide an expert in the validation process. To support pay-as-you-go reconciliation, we also show how to construct a set of high-quality correspondences, even if an expert validates only a subset of all generated correspondences. We demonstrate the efficiency of our techniques for real-world datasets comprising database schemas and ontologies from various application domains. 相似文献

12.

Database design with user-definable modelling concepts

Peter C. Lockemann Guido Moerkotte Andrea Neufeld Klaus Radermacher Norbert Runge 《Data & Knowledge Engineering》1993,10(3):229-257

Modelling is an integral part of engineering processes. Consequently, database design for engineering applications should take into account the modelling concepts used by engineers. On the other hand, these applications exhibit a wide diversity of modelling concepts. Rather than consolidating these into one single semantic data model one should aim for correspondingly specialized semantic models. This paper takes a constructive approach to developing such specialized models by proposing an Extensible Semantic Model (ESM) as the basis for declaring specialized semantic data models. The paper introduces a computerized environment for database design based on an ESM, and discusses the consequences of the ESM for a number of design tools: the need for a formal definition of the notion of modelling concept in order to have reliable and precise foundation for the extensions, declarative techniques for quickly introducing graphical representations for new concepts and for using them during schema design, conceptual-level test data generation for a designer-oriented evaluation of designs, and optimization techniques to control the wide latitude in mapping a conceptual schema to a logical schema. First experiences seem to point to considerable productivity gains during database design. 相似文献

13.

Semantic heterogeneity resolution in federated databases by metadata implantation and stepwise evolution 总被引：3，自引：0，他引：3

Goksel Aslan Dennis McLeod 《The VLDB Journal The International Journal on Very Large Data Bases》1999,8(2):120-132

A key aspect of interoperation among data-intensive systems involves the mediation of metadata and ontologies across database boundaries. One way to achieve such mediation between a local database and a remote database is to fold remote metadata into the local metadata, thereby creating a common platform through which information sharing and exchange becomes possible. Schema implantation and semantic evolution, our approach to the metadata folding problem, is a partial database integration scheme in which remote and local (meta)data are integrated in a stepwise manner over time. We introduce metadata implantation and stepwise evolution techniques to interrelate database elements in different databases, and to resolve conflicts on the structure and semantics of database elements (classes, attributes, and individual instances). We employ a semantically rich canonical data model, and an incremental integration and semantic heterogeneity resolution scheme. In our approach, relationships between local and remote information units are determined whenever enough knowledge about their semantics is acquired. The metadata folding problem is solved by implanting remote database elements into the local database, a process that imports remote database elements into the local database environment, hypothesizes the relevance of local and remote classes, and customizes the organization of remote metadata. We have implemented a prototype system and demonstrated its use in an experimental neuroscience environment. Received June 19, 1998 / Accepted April 20, 1999 相似文献

14.

Robust and simple database evolution

Kai Herrmann Hannes Voigt Jonas Rausch Andreas Behrend Wolfgang Lehner 《Information Systems Frontiers》2018,20(1):45-61

相似文献

15.

The GMAP: a versatile tool for physical data independence 总被引：1，自引：0，他引：1

Odysseas G. Tsatalos Marvin H. Solomon Yannis E. Ioannidis 《The VLDB Journal The International Journal on Very Large Data Bases》1996,5(2):101-118

Physical data independence is touted as a central feature of modern database systems. It allows users to frame queries in terms of the logical structure of the data, letting a query processor automatically translate them into optimal plans that access physical storage structures. Both relational and object-oriented systems, however, force users to frame their queries in terms of a logical schema that is directly tied to physical structures. We present an approach that eliminates this dependence. All storage structures are defined in a declarative language based on relational algebra as functions of a logical schema. We present an algorithm, integrated with a conventional query optimizer, that translates queries over this logical schema into plans that access the storage structures. We also show how to compile update requests into plans that update all relevant storage structures consistently and optimally. Finally, we report on experiments with a prototype implementation of our approach that demonstrate how it allows storage structures to be tuned to the expected or observed workload to achieve significantly better performance than is possible with conventional techniques. Edited by Matthias Jarke, Jorge Bocca, Carlo Zaniolo. Received September 15, 1994 / Accepted September 1, 1995 相似文献

16.

Scalable keyword search on large data streams

Lu Qin Jeffrey Xu Yu Lijun Chang 《The VLDB Journal The International Journal on Very Large Data Bases》2011,20(1):35-57

It is widely recognized that the integration of information retrieval (IR) and database (DB) techniques provides users with a broad range of high quality services. Along this direction, IR-styled m-keyword query processing over a relational database in an rdbms framework has been well studied. It finds all hidden interconnected tuple structures, for example connected trees that contain keywords and are interconnected by sequences of primary/foreign key relationships among tuples. A new challenging issue is how to monitor events that are implicitly interrelated over an open-ended relational data stream for a user-given m-keyword query. Such a relational data stream is a sequence of tuple insertion/deletion operations. The difficulty of the problem is related to the number of costly joins to be processed over time when tuples are inserted and/or deleted. Such cost is mainly affected by three parameters, namely, the number of keywords, the maximum size of interconnected tuple structures, and the complexity of the database schema when it is viewed as a schema graph. In this paper, we propose new approaches. First, we propose a novel algorithm to efficiently determine all the joins that need to be processed for answering an m-keyword query. Second, we propose a new demand-driven approach to process such a query over a high speed relational data stream. We show that we can achieve high efficiency by significantly reducing the number of intermediate results when processing joins over a relational data stream. The proposed new techniques allow us to achieve high scalability in terms of both query plan generation and query plan execution. We conducted extensive experimental studies using synthetic data and real data to simulate a relational data stream. Our approach significantly outperforms existing algorithms. 相似文献

17.

Automatic assessment of interactive OLAP explorations

《Information Systems》2019

Interactive Database Exploration (IDE) is the process of exploring a database by means of a sequence of queries aiming at answering an often imprecise user information need. In this paper, we are interested in the following problem: how to automatically assess the quality of such an exploration. We study this problem under the following angles. First, we formulate the hypothesis that the quality of the exploration can be measured by evaluating the improvement of the skill of writing queries that contribute to the exploration. Second, we restrict to a particular use case of database exploration, namely OLAP explorations of data cubes. Third, we propose to use simple query features to model its contribution to an exploration. The first hypothesis allows to use the Knowledge Tracing, a popular model for skill acquisition, to measure the evolution of the ability to write contributive queries. The restriction to OLAP exploration allows to take advantage of well known OLAP primitives and schema. Finally, using query features allows to apply a supervised learning approach to model query contribution. We show on both real and artificial explorations that automatic assessment of OLAP explorations is feasible and is consistent with the user’s and expert’s viewpoints. 相似文献

18.

A survey of approaches to automatic schema matching 总被引：76，自引：1，他引：75

Erhard Rahm Philip A. Bernstein 《The VLDB Journal The International Journal on Very Large Data Bases》2001,10(4):334-350

Schema matching is a basic problem in many database application domains, such as data integration, E-business, data warehousing, and semantic query processing. In current implementations, schema matching is typically performed manually, which has significant limitations. On the other hand, previous research papers have proposed many techniques to achieve a partial automation of the match operation for specific application domains. We present a taxonomy that covers many of these existing approaches, and we describe the approaches in some detail. In particular, we distinguish between schema-level and instance-level, element-level and structure-level, and language-based and constraint-based matchers. Based on our classification we review some previous match implementations thereby indicating which part of the solution space they cover. We intend our taxonomy and review of past work to be useful when comparing different approaches to schema matching, when developing a new match algorithm, and when implementing a schema matching component. Received: 5 February 2001 / Accepted: 6 September 2001 Published online: 21 November 2001 相似文献

19.

A framework for user driven data management

《Information Systems》2014

Scientists within the materials engineering community produce a wide variety of data, with datasets differing in size and complexity. Examples include large 3D volume densitometry files (voxel) generated by microfocus computer tomography (μCT) and simple text files containing results from a tensile test. Increasingly, there is a need to share this data as part of international collaborations. The design of a suitable database schema and the architecture of a system that can cope with the varying information is a continuing problem in the management of heterogeneous data. We present a model flexible enough to meet users’ diverse requirements. Metadata is held using a database and its design allows users to control their own data structures. Data is held in a file store which, in combination with the metadata, gives huge flexibility. Using examples from materials engineering we illustrate how the model can be applied. 相似文献

20.

A logical view of structured files

Serge Abiteboul Sophie Cluet Tova Milo 《The VLDB Journal The International Journal on Very Large Data Bases》1998,7(2):96-114

Structured data stored in files can benefit from standard database technology. In particular, we show here how such data can be queried and updated using declarative database languages. We introduce the notion of structuring schema, which consists of a grammar annotated with database programs. Based on a structuring schema, a file can be viewed as a database structure, queried and updated as such. For queries, we show that almost standard database optimization techniques can be used to answer queries without having to construct the entire database. For updates, we study in depth the propagation to the file of an update specified on the database view of this file. The problem is not feasible in general and we present a number of negative results. The positive results consist of techniques that allow to propagate updates efficiently under some reasonable locality conditions on the structuring schemas. Received November 1, 1995 / Accepted June 20, 1997 相似文献