Similar Literature
20 similar documents found.
1.
View materialization is an effective method for increasing query efficiency in a data warehouse and improving OLAP query performance. However, materializing all possible views in advance runs into insufficient storage space. Reducing query time by selecting a proper, low-cost set of materialized views is therefore crucial for efficient data warehousing. In addition, the costs of data warehouse creation, querying, and maintenance have to be taken into account when views are materialized. In this paper, we propose efficient algorithms that select a proper set of materialized views under storage and cost constraints to help speed up the entire data warehousing process. We derive a cost model for data warehouse query and maintenance, as well as efficient view selection algorithms that effectively exploit gain and loss metrics. The main contribution of our paper is to speed up the selection of materialized views, which at the same time greatly reduces the overall cost of data warehouse query and maintenance.
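As a rough illustration of gain/loss-driven selection, the sketch below greedily picks views by net gain (query gain minus maintenance loss) per unit of storage. It assumes each view's gain and loss are known and independent of which other views are chosen, which is a simplification of the paper's cost model; `CandidateView` and `select_views` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CandidateView:
    name: str
    size: float        # storage the view would occupy
    query_gain: float  # query cost saved per period if materialized (assumed known)
    maint_loss: float  # cost per period of keeping the view up to date (assumed known)

def select_views(candidates, space_budget):
    """Greedy selection by net gain per unit of space.

    Views whose maintenance loss outweighs their query gain are never chosen.
    """
    profitable = [v for v in candidates if v.query_gain > v.maint_loss]
    # Best net-benefit density first.
    ranked = sorted(profitable,
                    key=lambda v: (v.query_gain - v.maint_loss) / v.size,
                    reverse=True)
    chosen, used = [], 0.0
    for v in ranked:
        if used + v.size <= space_budget:
            chosen.append(v)
            used += v.size
    return chosen
```

For example, with views `by_month` (size 10, gain 50, loss 5), `by_day` (40, 80, 30), and `by_store` (20, 30, 35), `by_store` is discarded because its loss exceeds its gain, and a budget of 45 admits only `by_month`.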

2.
This paper describes the integration of a multidatabase system and a knowledge-base system to support the data-integration component of a data warehouse. The multidatabase system integrates various component databases through a common query language; however, it does not provide schema integration or the other utilities necessary for data warehousing. The knowledge-base system complements it with a declarative logic language that has second-order syntax but first-order semantics, used for integrating the schemas of the data sources into the warehouse and for defining complex, recursively defined materialized views. Deductive rules are also used for cleaning, integrity checking, and summarizing the data imported into the data warehouse. The knowledge-base system features an efficient incremental view maintenance mechanism that refreshes the data warehouse without querying the data sources.
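The incremental maintenance idea can be sketched for a classic recursively defined view, the transitive closure `reach` of an `edge` relation. The sketch below uses semi-naive evaluation for insertions only and assumes the warehouse keeps its own copy of `edge`; it is not the knowledge-base system's actual mechanism, and all function names are hypothetical.

```python
from collections import defaultdict

def _successors(edges):
    """Index an edge relation by source node."""
    succ = defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
    return succ

def insert_edges(reach, edges, new_edges):
    """Refresh reach = closure(edges) after inserting new_edges.

    Rules of the recursive view:
        reach(X, Y) :- edge(X, Y).
        reach(X, Z) :- reach(X, Y), edge(Y, Z).
    `edges` is the warehouse's own copy of the edge relation, so the
    remote sources are never queried.
    """
    succ = _successors(set(edges) | set(new_edges))
    new_succ = _successors(new_edges)
    reach = set(reach)
    # Seed: the new edges, plus existing paths X->a extended by a new edge a->b.
    delta = set(new_edges)
    for x, y in reach:
        for b in new_succ.get(y, ()):
            delta.add((x, b))
    delta -= reach
    # Semi-naive iteration: only facts derived in the previous round are extended.
    while delta:
        reach |= delta
        delta = {(x, z) for x, y in delta for z in succ.get(y, ())} - reach
    return reach

def closure(edges):
    """Full recomputation, for comparison: maintain starting from an empty view."""
    return insert_edges(set(), set(), edges)
```

The refresh touches only paths that use at least one new edge, which is what lets it avoid recomputing the whole closure.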

3.
Information Systems, 1999, 24(3): 229-253
Most database researchers have studied data warehouses (DW) in their role as buffers of materialized views, mediating between update-intensive OLTP systems and query-intensive decision support. This neglects the organizational role of data warehousing as a means of centralized information flow control. As a consequence, many quality aspects relevant to data warehousing cannot be expressed with current DW meta models. This paper makes two contributions towards solving these problems. First, we enrich the meta data about DW architectures with explicit enterprise models. Second, because many very different mathematical techniques for measuring or optimizing particular aspects of DW quality are being developed, we adapt the Goal-Question-Metric approach from software quality management to a meta data management environment in order to link these specialized techniques to a generic conceptual framework of DW quality. The approach has been fully implemented on top of the ConceptBase repository system and has undergone some validation through its application to specific quality-oriented methods, tools, and application projects in data warehousing.

4.
Joseph Fong, Herbert Shiu, Davy Cheung. Software, 2008, 38(11): 1183-1213
Integrating information from multiple data sources is becoming increasingly important for enterprises that partner with other companies in e-commerce. However, companies deploy their internal business applications on diverse platforms, and no standard solution exists for integrating information from these sources. To support business intelligence queries, it is useful to build a data warehouse on top of middleware that aggregates the data obtained from various heterogeneous database systems. Online analytical processing (OLAP) can then provide fast access to materialized views in the data warehouse. Since Extensible Markup Language (XML) documents are a common data representation standard on the Internet and relational tables are commonly used for production data, OLAP must handle both relational and XML data. SQL and XQuery can be used to process the materialized relational and XML data cubes created from the aggregated data. This paper shows how to handle the two kinds of data cubes from a relational-XML data warehouse using extract, transform, and load (ETL). Copyright © 2008 John Wiley & Sons, Ltd.
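A minimal sketch of holding the same materialized cube in both forms, using only Python's standard `sqlite3` and `xml.etree.ElementTree` modules. The `sales` schema and the function names are assumptions for illustration; the paper's ETL process and its XQuery processing are not reproduced.

```python
import sqlite3
import xml.etree.ElementTree as ET

def build_cube(rows):
    """Materialize a (region, product) -> SUM(amount) cube in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    # The relational data cube: one row per (region, product) cell.
    conn.execute("""CREATE TABLE sales_cube AS
                    SELECT region, product, SUM(amount) AS total
                    FROM sales GROUP BY region, product""")
    return conn

def cube_to_xml(conn):
    """Export the same cube as an XML document, one <cell> element per cube row."""
    root = ET.Element("cube", name="sales_cube")
    for region, product, total in conn.execute(
            "SELECT region, product, total FROM sales_cube ORDER BY region, product"):
        ET.SubElement(root, "cell", region=region, product=product, total=str(total))
    return ET.tostring(root, encoding="unicode")
```

`ElementTree` supports only a small XPath subset, for example `root.findall("cell[@region='east']")`; full XQuery needs a separate processor.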

5.
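To speed up query processing over large volumes of data, data warehouses usually store data as materialized views. When the base data change, these materialized views must be updated accordingly, so view self-maintenance and consistency maintenance become important problems for data warehouses. This paper proposes creating auxiliary views from the intermediate results of view computation and materializing them in the data warehouse, then using an efficient incremental maintenance algorithm to compute the exact changes to the materialized views, thereby achieving self-maintenance of data warehouse views.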
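To illustrate how auxiliary views make a view self-maintainable, the sketch below maintains a join view V(a, b, c) = R(a, b) JOIN S(b, c) from base-table deltas alone. For simplicity, the auxiliary views are full bag copies of R and S indexed on b; the approach described above derives them from intermediate results, and in practice they can be smaller (for example, with local selections and projections applied). The class name is hypothetical. Only insertions into R and S and deletions from R are shown; deletions from S would be symmetric.

```python
from collections import Counter, defaultdict

class SelfMaintainedJoin:
    """V(a, b, c) = R(a, b) JOIN S(b, c), maintained without querying R or S."""

    def __init__(self):
        self.aux_r = defaultdict(Counter)  # auxiliary view: b -> bag of a values
        self.aux_s = defaultdict(Counter)  # auxiliary view: b -> bag of c values
        self.view = Counter()              # (a, b, c) -> multiplicity

    def insert_r(self, a, b):
        # Delta of V for a new R tuple: {(a, b)} JOIN aux_s.
        for c, n in self.aux_s[b].items():
            self.view[(a, b, c)] += n
        self.aux_r[b][a] += 1

    def insert_s(self, b, c):
        # Delta of V for a new S tuple: aux_r JOIN {(b, c)}.
        for a, n in self.aux_r[b].items():
            self.view[(a, b, c)] += n
        self.aux_s[b][c] += 1

    def delete_r(self, a, b):
        # Assumes (a, b) is currently in R.
        for c, n in self.aux_s[b].items():
            self.view[(a, b, c)] -= n
        self.aux_r[b][a] -= 1
        if self.aux_r[b][a] == 0:
            del self.aux_r[b][a]
        self.view += Counter()  # in-place add drops zero counts
```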

6.
Selection of views to materialize in a data warehouse (citations: 4; self-citations: 0; citations by others: 4)
A data warehouse stores materialized views of data from one or more sources, with the purpose of efficiently implementing decision-support or OLAP queries. One of the most important decisions in designing a data warehouse is the selection of materialized views to be maintained at the warehouse. The goal is to select an appropriate set of views that minimizes total query response time and the cost of maintaining the selected views, given a limited amount of resources, e.g., materialization time or storage space. In this work, we develop a theoretical framework for the general problem of selecting views in a data warehouse. We present polynomial-time heuristics for selecting views to optimize total query response time under a disk-space constraint, for some important special cases of the general data warehouse scenario, namely: 1) an AND view graph, where each query/view has a unique evaluation, e.g., when a multiple-query optimizer can be used to generate a global evaluation plan for the queries, and 2) an OR view graph, in which any view can be computed from any one of its related views, e.g., data cubes. We prove that the algorithms are guaranteed to provide a solution that is fairly close to (within a constant-factor ratio of) the optimal solution. We extend our heuristic to general AND-OR view graphs. Finally, we address in detail the view-selection problem under a maintenance-cost constraint and present provably competitive heuristics.
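For the OR-view-graph case, the widely known benefit-per-unit-space greedy heuristic on a data cube lattice gives a feel for the approach. This sketch assumes a linear cost model (answering a query costs the size of the smallest materialized view whose group-by attributes contain the query's) and a space budget that includes the always-materialized base cuboid. It does not reproduce the paper's exact algorithms or their approximation bounds; the function names are hypothetical.

```python
from itertools import combinations

def lattice(dims):
    """All group-by views of a data cube, as frozensets of dimension names."""
    return [frozenset(c) for k in range(len(dims) + 1) for c in combinations(dims, k)]

def query_cost(q, materialized, size):
    # OR view graph: q can be computed from any materialized view v with q <= v;
    # linear cost model: the cost is the size of the cheapest such view.
    return min(size[v] for v in materialized if q <= v)

def greedy_select(dims, size, space_budget):
    views = lattice(dims)
    top = frozenset(dims)
    materialized = {top}  # the base cuboid is always kept
    used = size[top]
    while True:
        total_before = sum(query_cost(q, materialized, size) for q in views)
        best, best_ratio = None, 0.0
        for v in views:
            if v in materialized or used + size[v] > space_budget:
                continue
            total_after = sum(query_cost(q, materialized | {v}, size) for q in views)
            ratio = (total_before - total_after) / size[v]  # benefit per unit space
            if ratio > best_ratio:
                best, best_ratio = v, ratio
        if best is None:
            return materialized
        materialized.add(best)
        used += size[best]
```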

7.
View selection for designing the global data warehouse (citations: 1; self-citations: 0; citations by others: 1)
A global data warehouse (DW) integrates data from multiple distributed heterogeneous databases and other information sources. A global DW can be abstractly seen as a set of materialized views. The selection of views for materialization is an important decision in the design of a DW, yet current commercial products do not provide tools for automatic DW design. We provide a general method that, given a set of select-project-join queries to be satisfied by the DW, generates sets of materialized views that satisfy all the input queries. This process is complex because 'common subexpressions' between the queries need to be detected and exploited. We then apply our method to the problem of selecting a materialized view set that fits in the space allocated to the DW for materialization and minimizes the combined overall query evaluation and view maintenance cost. We design and implement algorithms for this problem and report on their experimental evaluation.
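Detecting common subexpressions is the step that makes this problem hard. The sketch below deliberately simplifies each select-project-join query to a set of join predicates, and treats every query, plus every non-empty set of predicates shared by two or more candidates, as a candidate view. Selections and projections, which the actual method must reconcile, are ignored, and the predicate strings are hypothetical.

```python
from itertools import combinations

def candidate_views(queries):
    """queries: name -> frozenset of join predicates. Returns candidate views."""
    candidates = set(queries.values())
    while True:
        # A shared, non-empty set of predicates is a common subexpression.
        new = {a & b for a, b in combinations(candidates, 2) if a & b} - candidates
        if not new:
            return candidates
        candidates |= new

def users(candidate, queries):
    """Queries whose join predicates include the candidate, so they could reuse it."""
    return {name for name, preds in queries.items() if candidate <= preds}
```

The fixpoint loop terminates because every new candidate is a subset of the finite union of all predicates.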

8.
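Data warehouses typically compute over large volumes of data and answer user queries with condensed results, which makes materialized-view techniques especially important in data warehouses. However, existing methods for automatically selecting materialized views are static, which contradicts the dynamic nature of online analytical processing (OLAP) and decision support systems (DSS). This paper proposes a scalable dynamic materialized-view approach. By decomposing the overall materialized view selection (MVS) problem into three stages, it reduces the complexity of the problem and improves the effectiveness of the materialized views. Through dynamic adjustment, the materialized views adapt to query demands in real time. A complexity analysis of the algorithm demonstrates the scalability of the scheme, and simulation experiments with the dynamic adjustment algorithm confirm its good adaptivity.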
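A simplified picture of dynamic adjustment: keep decayed access frequencies for views and periodically re-select the views with the highest benefit density under the space budget. The decay factor, the benefit formula (frequency times cost saved per access, divided by size), and the class name are illustrative assumptions; the paper's three-stage decomposition is not reproduced.

```python
class DynamicViewPool:
    """Keeps the currently most beneficial views under a space budget."""

    def __init__(self, space_budget, decay=0.9):
        self.space_budget = space_budget
        self.decay = decay         # older queries count less (assumed policy)
        self.freq = {}             # view -> decayed access count
        self.materialized = set()

    def record_query(self, view):
        for v in self.freq:
            self.freq[v] *= self.decay
        self.freq[view] = self.freq.get(view, 0.0) + 1.0

    def readjust(self, size, saving):
        """Re-pick views by benefit density; returns (added, dropped) view sets."""
        ranked = sorted(self.freq,
                        key=lambda v: self.freq[v] * saving[v] / size[v],
                        reverse=True)
        chosen, used = set(), 0.0
        for v in ranked:
            if used + size[v] <= self.space_budget:
                chosen.add(v)
                used += size[v]
        added = chosen - self.materialized
        dropped = self.materialized - chosen
        self.materialized = chosen
        return added, dropped
```

Calling `readjust` periodically rather than on every query keeps the cost of re-materialization bounded.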

9.
OLAP queries involve many aggregations over large amounts of data in data warehouses. To process expensive OLAP queries efficiently, we propose a new method to rewrite a given OLAP query using the various kinds of materialized views that already exist in the data warehouse. We first define normal forms of OLAP queries and materialized views based on selection and aggregation granularities, which are derived from the lattice of dimension hierarchies. Conditions for the usability of materialized views in rewriting a given query are specified by relationships between the components of their normal forms. We present a rewriting algorithm for OLAP queries that can effectively use materialized views with different selection granularities, selection regions, and aggregation granularities together. We also propose an algorithm to find a set of materialized views that yields a rewritten query that can be executed efficiently. We demonstrate the effectiveness and performance of the algorithm experimentally.
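The usability condition for aggregation granularity can be sketched with two hypothetical dimension hierarchies: a SUM view can answer a SUM query if, on every dimension, the query's level is the same as or coarser than the view's, in which case the view's rows are rolled up. Selection granularities and selection regions, which the paper's normal forms also cover, are omitted; all names and data are made up for illustration.

```python
from collections import defaultdict

DIMS = ("time", "store")  # fixed dimension order for row keys
HIERARCHIES = {"time": ["day", "month", "year"], "store": ["store", "city", "country"]}
STORE_CITY = {"s1": "Paris", "s2": "Lyon", "s3": "Berlin"}
CITY_COUNTRY = {"Paris": "FR", "Lyon": "FR", "Berlin": "DE"}
PARENT = {  # (dimension, level) -> function mapping a member to its parent member
    "time": {"day": lambda d: d[:7], "month": lambda m: m[:4]},  # "2024-03-15" -> "2024-03" -> "2024"
    "store": {"store": STORE_CITY.__getitem__, "city": CITY_COUNTRY.__getitem__},
}

def level(dim, name):
    return HIERARCHIES[dim].index(name)

def usable(view_gran, query_gran):
    """True if the query is at the same or a coarser level on every dimension."""
    return all(level(d, query_gran[d]) >= level(d, view_gran[d]) for d in DIMS)

def lift(dim, member, from_level, to_level):
    """Roll a member up its hierarchy, one parent step per level."""
    levels = HIERARCHIES[dim]
    for lvl in levels[levels.index(from_level):levels.index(to_level)]:
        member = PARENT[dim][lvl](member)
    return member

def rewrite(view_rows, view_gran, query_gran):
    """Answer a SUM query at query_gran from a SUM view stored at view_gran."""
    if not usable(view_gran, query_gran):
        raise ValueError("view is coarser than the query on some dimension")
    out = defaultdict(float)
    for key, total in view_rows.items():
        new_key = tuple(lift(d, m, view_gran[d], query_gran[d]) for d, m in zip(DIMS, key))
        out[new_key] += total
    return dict(out)
```

Re-aggregation is valid here because SUM is distributive; an AVG query would need the view to store both SUM and COUNT.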

10.
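Existing static materialized-view selection algorithms have several shortcomings: their search space is too large, their time complexity is high, and they do not consider the probability and distribution of queries. Moreover, when the source data change, the changes cannot be reflected in the data warehouse immediately, so these algorithms are unsuitable for online operation. To address these problems, we build on a candidate-view generation algorithm and the IGA algorithm and add dynamic adjustment, obtaining a new dynamic materialized-view adjustment algorithm, CNUMV. Experiments show that the algorithm reduces the search space and time complexity of view selection. More importantly, it accounts for how interdependencies among views affect view benefit, which allows it to adjust dynamically online. Experiments demonstrate the superiority of CNUMV and show that it achieves the intended goals.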

11.
Research on data integration of heterogeneous data sources (citations: 2; self-citations: 0; citations by others: 2)
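The object proxy model can serve as a general data model for data integration. By establishing proxy objects and source objects, the correspondence between query processing on the two can also be realized well: an application can translate queries against proxy objects into queries against the source objects of the local data sources, and can return the results of queries on the local data sources in the form the user application expects. The object proxy model can be implemented in a Smalltalk environment. This paper discusses how to integrate heterogeneous information sources based on the object proxy model in a Smalltalk environment.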
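The proxy idea can be sketched in Python (rather than Smalltalk): a proxy class maps global attribute names to each source's field names, translates a query on the proxy into a query on the source, and maps the results back. `SourceAdapter`, `ProxyClass`, and `find_all` are hypothetical names for illustration.

```python
class SourceAdapter:
    """A local data source that uses its own field names (hypothetical)."""

    def __init__(self, rows):
        self.rows = rows

    def select(self, field, value):
        return [r for r in self.rows if r.get(field) == value]

class ProxyClass:
    """Proxy objects over one local source, exposing the global attribute names."""

    def __init__(self, source, attr_map):
        self.source = source
        self.attr_map = attr_map                            # proxy attr -> source field
        self.reverse = {f: a for a, f in attr_map.items()}  # source field -> proxy attr

    def find(self, attr, value):
        rows = self.source.select(self.attr_map[attr], value)  # translated query
        # Map source rows back into the shape the application expects.
        return [{self.reverse[f]: v for f, v in r.items() if f in self.reverse}
                for r in rows]

def find_all(proxies, attr, value):
    """Integrated query: the union of the answers from every proxy."""
    return [obj for p in proxies for obj in p.find(attr, value)]
```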

12.
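Research on the consistency problem in online maintenance of data warehouses (citations: 5; self-citations: 0; citations by others: 5)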
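A data warehouse is an integrated information repository that stores information for querying and decision analysis; its information comes from databases or other information sources at different locations. Materialized views are the main information entities stored in a data warehouse, and when the original data change, the materialized views in the warehouse must be updated accordingly. During online maintenance of the warehouse's materialized views, concurrent online analytical processing (OLAP) queries can cause data inconsistency. This paper proposes the MVCA (multiversion compensating algorithm) to solve this problem. MVCA uses version control and applies compensation and an acknowledgment mechanism to coordinate update maintenance between the databases and the data warehouse, thereby guaranteeing data consistency. Finally, a typical example illustrates how the algorithm is applied in practice.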
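The version-control part of the idea can be illustrated with a generic multiversion view: maintenance builds a new snapshot privately and publishes it atomically, while each OLAP query reads the version it pinned when it started. This is a sketch under stated assumptions, not MVCA itself: its compensation and acknowledgment steps are not modeled, old versions are never garbage-collected, and the class name is hypothetical.

```python
import threading

class MultiVersionView:
    """Materialized view whose refreshes never disturb queries already running."""

    def __init__(self, rows):
        self._versions = {0: dict(rows)}   # version id -> snapshot
        self._current = 0
        self._meta = threading.Lock()      # guards _current and _versions
        self._writer = threading.Lock()    # serializes refreshes

    def begin_query(self):
        """Pin the currently published version for the whole OLAP query."""
        with self._meta:
            return self._current

    def read(self, version, key):
        with self._meta:
            snapshot = self._versions[version]
        return snapshot.get(key)

    def refresh(self, delta):
        """Apply (key, change) pairs as one new version, invisible until published."""
        with self._writer:
            with self._meta:
                base = self._current
                snapshot = self._versions[base]
            new = dict(snapshot)  # copy-on-write; pinned readers keep the old one
            for key, change in delta:
                new[key] = new.get(key, 0) + change
            with self._meta:
                self._versions[base + 1] = new
                self._current = base + 1
            return base + 1
```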

13.
For speeding up query processing on Big Data, frequent sub-queries or views may be materialized so as to minimize query processing cost while keeping the cost of maintaining the materialized views and/or queries optimal. Materializing frequent sub-queries and views means that the result data sets of the views reside in the memory of one or more nodes in the cluster, which reduces the MapReduce cost as well as the submission and scheduling cost of distributed file system jobs for query processing. We define materialized views as the result data of frequent sub-queries and aggregation functions of a set of Big Data warehousing queries, saved to enhance query performance. The problem is formulated as a multi-objective optimization problem: minimize the total MapReduce cost of query processing, the MapReduce cost of maintaining the materialized views, and the number of views selected for materialization, while maximizing the total size of the selected views. We applied the Differential Evolution algorithm and NSGA-II and studied their performance in developing a recommendation system for selecting views to materialize in Big Data warehousing.
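Both Differential Evolution and NSGA-II compare candidate view selections by Pareto dominance. The sketch below shows only that step, extracting the first non-dominated front as NSGA-II does before crowding-distance ranking; the objective vectors are hypothetical, and the maximized total view size is negated so that every objective is minimized.

```python
def dominates(a, b):
    """a dominates b: no worse on every objective and strictly better on one (all minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(solutions):
    """solutions: name -> (query_cost, maint_cost, n_views, -total_size)."""
    return [name for name, obj in solutions.items()
            if not any(dominates(other, obj) for other in solutions.values())]
```

A solution never dominates itself, so no self-exclusion is needed in the inner loop.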

14.
Information Systems, 2001, 26(5): 363-381
A data warehouse (DW) can be abstractly seen as a set of materialized views defined over a set of remote data sources. A DW is intended to satisfy a set of queries. The views materialized in a DW relate to each other in a complex manner, through common subexpressions, in order to guarantee high query performance and low view maintenance cost. DWs evolve over time: new materialized views are added to satisfy new queries or for performance reasons, while old queries are dropped. This evolution can result in a redundant set of materialized views. In this paper, we address the problem of detecting redundant materialized views in a given DW view selection, that is, materialized views that can be removed from the DW without negatively affecting query evaluation or the view maintenance process. Using an AND/OR dag representation for multiple queries and views, we first formalize the process of propagating source relation changes to the materialized views by exploiting common subexpressions between views and by using other materialized views that are not affected by these changes. Then, we provide an algorithm for detecting materialized views that are not needed in the process of propagating source relation changes to the DW. We also show how trivially redundant views can be identified in this process. Finally, we use these results to provide a procedure for detecting materialized views that are redundant in a DW. Our approach considers a broad class of views that includes grouping/aggregation views and does not depend on a specific cost model.
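A much-simplified version of the redundancy check is plain reachability: if the chosen plans for queries (and for propagating source changes) never read a materialized view, directly or transitively, that view can be dropped. The `uses` mapping below is a hypothetical stand-in for the paper's AND/OR dag analysis, which is considerably finer-grained.

```python
def redundant_views(materialized, uses, queries):
    """Materialized views that no query depends on, directly or transitively.

    uses[x] lists the materialized views read by the chosen plan for query or
    view x, whether for evaluation or for propagating source changes.
    """
    needed, stack = set(), list(queries)
    while stack:  # depth-first traversal from the queries
        x = stack.pop()
        for v in uses.get(x, ()):
            if v not in needed:
                needed.add(v)
                stack.append(v)
    return set(materialized) - needed
```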

15.
Physical data layout is a crucial factor in the performance of queries and updates in large data warehouses. Data layout enhances and complements other performance features such as materialized views and dynamic caching of aggregated results. Prior work has identified that the multidimensional nature of large data warehouses imposes natural restrictions on the query workload, and a method based on a "uniform" query class approach has been proposed for data clustering and shown to be optimal. However, we believe that realistic query workloads will exhibit data access skew. For instance, if time is a dimension in the data model, more queries are likely to focus on the most recent time interval. The query class approach does not adequately model the possibility of multidimensional data access skew. We propose the affinity graph model for capturing workload characteristics in the presence of access skew and describe an efficient algorithm for physical data layout. Our algorithm takes into account declustering and load balancing, which are inherent to the multidisk data layout problem. We demonstrate the validity of this approach experimentally.
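To make the affinity-graph idea concrete, the sketch below weights each pair of data chunks by how many queries access both, then greedily places chunks so that strongly co-accessed chunks land on different disks (declustering, so they can be read in parallel), breaking ties by disk load. The workload format and the placement rule are assumptions for illustration, not the paper's algorithm.

```python
from collections import Counter
from itertools import combinations

def affinity_graph(workload):
    """Edge weight = number of queries that access both chunks."""
    aff = Counter()
    for chunks in workload:
        for a, b in combinations(sorted(set(chunks)), 2):
            aff[(a, b)] += 1
    return aff

def decluster(chunks, workload, n_disks):
    """Greedy placement: least affinity with chunks already on a disk, then least load."""
    aff = affinity_graph(workload)

    def weight(a, b):
        return aff[(a, b) if a < b else (b, a)]

    degree = Counter()
    for (a, b), n in aff.items():
        degree[a] += n
        degree[b] += n
    disks = [[] for _ in range(n_disks)]
    # Place the most strongly connected (e.g. hot, recent) chunks first.
    for c in sorted(chunks, key=lambda c: -degree[c]):
        best = min(range(n_disks),
                   key=lambda d: (sum(weight(c, o) for o in disks[d]), len(disks[d])))
        disks[best].append(c)
    return disks
```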

16.
Views over databases have regained attention in the context of data warehouses, which are seen as materialized views. In this setting, efficient view maintenance is an important issue, for which the notion of self-maintainability has been identified as desirable. In this paper, we extend the concept of self-maintainability to (query and update) independence within a formal framework, where independence with respect to arbitrary given sets of queries and updates over the sources can be guaranteed. To this end, we establish an intuitively appealing connection between warehouse independence and view complements. Moreover, we study special kinds of complements, namely monotonic complements, and show how to compute minimal ones in the presence of keys and foreign keys in the underlying databases. Taking advantage of these complements, we propose an algorithmic approach for specifying warehouses that are independent with respect to given sets of queries and updates. Received: 21 November 2000 / Accepted: 1 May 2001 / Published online: 6 September 2001
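The link between independence and complements can be seen on a small example: if k is a key of R(k, a, b), then the projection V of R on (k, a) and its complement C, the projection of R on (k, b), together determine R. A warehouse storing both can therefore answer any query over R without consulting the source. The relation, attribute names, and helper functions are hypothetical.

```python
def project(rel, attrs):
    """Set-semantics projection of a relation given as a list of dicts."""
    return {tuple(row[a] for a in attrs) for row in rel}

def join_on_key(v, c):
    """Reconstruct R(k, a, b) from V(k, a) and its complement C(k, b).

    Because k is a key of R, each k value occurs exactly once in V and once in C,
    so the join is lossless.
    """
    c_by_key = dict(c)
    return {(k, a, c_by_key[k]) for k, a in v}
```

Without the key, the same pair of projections could be lossy, which is why keys and foreign keys matter when computing minimal complements.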

17.
Information Systems, 2001, 26(7): 477-506
The issue of discovering functional dependencies from populated databases has received a great deal of attention because it is a key concern in database analysis. Such a capability is strongly required in database administration and design, and it is of great interest in other application fields such as query folding. Investigated for many years, the issue has recently been addressed in a novel and more efficient way by applying principles of data mining algorithms. The two algorithms following this trend, TANE and Dep-Miner, strongly improve on previous proposals. In this paper, we propose a new approach that adopts a data mining point of view. We define a novel characterization of minimal functional dependencies; this formal framework is sound and simpler than related work. We introduce the new concept of free set to capture the sources (left-hand sides) of functional dependencies, and we characterize the targets of such dependencies using the concepts of closure and quasi-closure of attribute sets. Our approach is implemented in the algorithm FUN, which is particularly efficient: it matches or outperforms the two best operational solutions known to us, TANE and Dep-Miner. It uses various optimization techniques and can work on very large databases. Comparative experiments on real-life and synthetic data with varying degrees of correlation assess the performance of FUN against TANE and Dep-Miner. Moreover, our approach also exhibits, without significant additional execution time, embedded functional dependencies, i.e., dependencies that hold in any subset of the originally considered attribute set. Embedded dependencies capture knowledge that is especially relevant in all fields where materialized data sets are managed (e.g., materialized views, which are widely used in data warehouses).
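The counting criteria behind this kind of characterization can be sketched by brute force: X -> A holds in a relation r exactly when X and X plus A have the same number of distinct value combinations, and X is free when every proper subset of X has strictly fewer distinct combinations. This is a naive illustration on hypothetical data, not FUN's level-wise search or its optimizations.

```python
def count(rel, attrs):
    """Number of distinct value combinations of rel on attrs."""
    return len({tuple(row[a] for a in attrs) for row in rel})

def holds(rel, lhs, rhs):
    """X -> A holds iff adding A creates no new distinct combinations."""
    return count(rel, lhs) == count(rel, tuple(lhs) + (rhs,))

def is_free(rel, attrs):
    """X is free iff every proper subset has strictly fewer distinct combinations.

    Counts grow monotonically with the attribute set, so it suffices to check
    the subsets obtained by removing a single attribute.
    """
    c = count(rel, attrs)
    return all(count(rel, attrs[:i] + attrs[i + 1:]) < c for i in range(len(attrs)))
```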

18.
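Data warehouse self-maintenance is essentially achieved by maintaining materialized views. However, existing self-maintenance strategies for materialized views cannot effectively reduce redundant data at the data warehouse integrator and at the data source monitors, which hurts the overall response speed of the data warehouse environment. A view decomposition system based on a data warehouse self-maintenance method improves on existing view decomposition schemes: it decomposes globally defined materialized views into sets of locally defined single-source views, reducing the unnecessary data stored in the data warehouse. It thereby decomposes and rewrites existing materialized-view self-maintenance strategies and improves the efficiency of data warehouse self-maintenance.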
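View decomposition can be pictured as pushing each source's projections and single-source selections down into a local view, so that the source monitor ships only what the warehouse needs. The dictionary format for the global view is a hypothetical assumption; predicates spanning several sources would stay at the integrator.

```python
def decompose(global_view):
    """Split a global SPJ view into one locally defined view per source.

    Each local view keeps the attributes the global view outputs or joins on,
    plus the selection predicates that mention that source alone.
    """
    local = {}
    for source, attrs in global_view["attrs"].items():
        local[source] = {
            "project": sorted(set(attrs) | set(global_view["join_attrs"].get(source, ()))),
            "where": [p for p in global_view["selections"] if p[0] == source],
        }
    return local
```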

19.
An asynchronous iterative query processing method for Web data warehouses (citations: 2; self-citations: 0; citations by others: 2)
He Zhenying, Li Jianzhong, Gao Hong. Journal of Software, 2002, 13(2): 214-218
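The rapid growth of the volume of information in data warehouses poses a huge challenge, and improving the query efficiency of data warehouses in the Web environment has become an important research problem in the field. This paper studies the architecture and query methods of Web data warehouses. Based on an analysis of several approaches to implementing Web data warehouses, it proposes a hierarchical architecture for Web data warehouses and, on this basis, an asynchronous iterative query method. The method makes full use of pipelined parallelism: during query processing, nodes at different levels operate in a pipelined fashion and process the query in parallel, which improves query efficiency. Theoretical analysis shows that the method can effectively improve the query efficiency of Web data warehouses.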
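A toy version of the pipelined hierarchy, with threads standing in for Web nodes and in-process queues for the network. Leaves stream partial sums in batches, inner nodes combine and forward them as soon as small groups are complete, and the root accumulates the final answer. The two-level tree, batch sizes, and function names are illustrative assumptions.

```python
import queue
import threading

DONE = object()  # end-of-stream marker

def leaf(partition, out, batch=2):
    """Leaf node: scan a local partition, streaming one partial sum per batch."""
    for i in range(0, len(partition), batch):
        out.put(sum(partition[i:i + batch]))
    out.put(DONE)

def inner(inbox, n_children, out, fan_in=2):
    """Inner node: combine partial sums in small groups and pass each group up
    as soon as it is complete, instead of waiting for all children to finish."""
    finished, buffer = 0, []
    while finished < n_children:
        item = inbox.get()
        if item is DONE:
            finished += 1
            continue
        buffer.append(item)
        if len(buffer) == fan_in:
            out.put(sum(buffer))
            buffer = []
    if buffer:
        out.put(sum(buffer))
    out.put(DONE)

def run(groups):
    """groups: one list of leaf partitions per inner node; returns the total."""
    root_inbox = queue.Queue()
    threads = []
    for group in groups:
        inbox = queue.Queue()
        threads.append(threading.Thread(target=inner, args=(inbox, len(group), root_inbox)))
        threads += [threading.Thread(target=leaf, args=(p, inbox)) for p in group]
    for t in threads:
        t.start()
    total, finished = 0, 0
    while finished < len(groups):  # root: consume results as they stream in
        item = root_inbox.get()
        if item is DONE:
            finished += 1
        else:
            total += item
    for t in threads:
        t.join()
    return total
```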

20.
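An efficient update algorithm for multi-source materialized views in data warehouses (citations: 4; self-citations: 0; citations by others: 4)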
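A data warehouse stores large amounts of summary data to support queries and related decision making, and these summary data are often materialized views defined over several data sources. When the data sources change, the materialized views must be updated accordingly, which inevitably imposes a large overhead on the data warehouse, so updating materialized views efficiently is a very important problem. The BinPartition algorithm minimizes the computation cost; the paper then proves the algorithm's correctness and analyzes its time complexity.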
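For insertions at several sources at once, the delta of a join view follows from expanding (R + dR) JOIN (S + dS) = R JOIN S + dR JOIN S + R JOIN dS + dR JOIN dS. The sketch below computes this delta with bag semantics for two sources; with n changed sources the expansion has 2^n - 1 delta terms, which is why the way the computation is organized matters. It does not reproduce BinPartition, whose details the abstract does not give, and the function names are hypothetical.

```python
from collections import Counter

def join(r, s):
    """Bag join of R(a, b) and S(b, c); relations are Counters of tuples."""
    return Counter({(a, b, c): m * n
                    for (a, b), m in r.items()
                    for (b2, c), n in s.items() if b == b2})

def view_delta(r, s, dr, ds):
    """Insert-only delta of V = R JOIN S when both sources change at once.

    r and s are the OLD source states; dr and ds are the inserted tuples.
    """
    return join(dr, s) + join(r, ds) + join(dr, ds)
```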
