Similar Literature
 Found 20 similar articles (search time: 15 ms)
1.
Meshing Streaming Updates with Persistent Data in an Active Data Warehouse   Total citations: 1, self-citations: 0, citations by others: 1
Active data warehousing has emerged as an alternative to conventional warehousing practices in order to meet the high demand of applications for up-to-date information. In a nutshell, an active warehouse is refreshed online and thus achieves a higher consistency between the stored information and the latest data updates. The need for online warehouse refreshment introduces several challenges in the implementation of data warehouse transformations, with respect to their execution time and their overhead to the warehouse processes. In this paper, we focus on a frequently encountered operation in this context, namely, the join of a fast stream S of source updates with a disk-based relation R, under the constraint of limited memory. This operation lies at the core of several common transformations such as surrogate key assignment, duplicate detection, or identification of newly inserted tuples. We propose a specialized join algorithm, termed mesh join (MESHJOIN), which compensates for the difference in the access cost of the two join inputs by 1) relying entirely on fast sequential scans of R and 2) sharing the I/O cost of accessing R across multiple tuples of S. We detail the MESHJOIN algorithm and develop a systematic cost model that enables the tuning of MESHJOIN for two objectives: maximizing throughput under a specific memory budget or minimizing memory consumption for a specific throughput. We present an experimental study that validates the performance of MESHJOIN on synthetic and real-life data. Our results verify the scalability of MESHJOIN to fast streams and large relations and demonstrate its numerous advantages over existing join algorithms.
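The two ideas named in this abstract (cyclic sequential scans of R, and sharing each scanned piece of R across many stream tuples) can be illustrated with a small in-memory sketch. This is not the authors' implementation: the function name `mesh_join`, the parameters `num_parts` and `batch_size`, and the use of one small hash index per stream batch are assumptions made for illustration, and R is a Python list standing in for a disk-based relation.

```python
from collections import defaultdict, deque
from itertools import islice


def mesh_join(stream, relation, key_s, key_r, num_parts=4, batch_size=2):
    """Yield (s, r) pairs with key_s(s) == key_r(r).

    Hypothetical sketch: R is read partition by partition in a cycle, and
    every stream batch stays in memory until it has met each partition once.
    """
    if not relation:
        return
    part_len = -(-len(relation) // num_parts)  # ceiling division
    parts = [relation[i:i + part_len] for i in range(0, len(relation), part_len)]
    k = len(parts)

    window = deque()        # at most k batches; each is a key -> tuples index
    it = iter(stream)
    p = 0                   # next partition of R to "read from disk"
    while True:
        batch = list(islice(it, batch_size))
        # Stop once the stream is drained and no buffered tuples remain.
        if not batch and not any(window):
            break
        index = defaultdict(list)
        for s in batch:
            index[key_s(s)].append(s)
        window.append(index)

        # One sequential read of partition p is shared by all batches in memory.
        for r in parts[p]:
            for idx in window:
                for s in idx.get(key_r(r), ()):
                    yield (s, r)
        p = (p + 1) % k

        # The oldest batch has now met all k partitions, so it can expire.
        if len(window) == k:
            window.popleft()


stream = [(1, "a"), (2, "b"), (3, "c"), (1, "d"), (9, "z")]
relation = [(1, "X"), (2, "Y"), (3, "Z"), (4, "W"), (1, "V")]
pairs = list(mesh_join(stream, relation, lambda s: s[0], lambda r: r[0],
                       num_parts=2, batch_size=2))
```

Under these assumptions, memory use is bounded by roughly `num_parts * batch_size` stream tuples plus one partition of R, which is the kind of trade-off the abstract's cost model is said to tune.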

2.
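To speed up query processing over large volumes of data, a data warehouse usually stores data as materialized views. When the base data change, these materialized views must be updated as well, so view self-maintenance and consistency maintenance are important problems for data warehouses. This paper proposes creating auxiliary views from the intermediate results of view computation, materializing them in the data warehouse, and using an efficient incremental maintenance algorithm to compute the exact changes to the materialized views, thereby achieving self-maintenance of data warehouse views.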
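A minimal sketch of the self-maintenance idea, assuming a single equi-join view over two source relations R and S: the warehouse keeps an auxiliary copy of each input indexed by join key, so the exact change to the view can be computed from an incoming delta alone, without querying the sources. The class name `SelfMaintainedJoinView`, its `apply` method, and the counting representation are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


class SelfMaintainedJoinView:
    """Materialized view R JOIN S kept current from deltas only (sketch)."""

    def __init__(self):
        # Auxiliary views: join key -> multiset of payloads seen on each side.
        self.aux_r = defaultdict(Counter)
        self.aux_s = defaultdict(Counter)
        # The view itself: (key, r_payload, s_payload) -> multiplicity.
        self.view = Counter()

    def apply(self, side, key, payload, sign):
        """Apply one source change; sign is +1 for insert, -1 for delete.

        Assumes deletes only remove tuples that were previously inserted.
        """
        if side == "R":
            own, other = self.aux_r, self.aux_s
        else:
            own, other = self.aux_s, self.aux_r

        # Delta of the view = changed tuple joined with the other auxiliary view.
        for other_payload, n in other.get(key, {}).items():
            if side == "R":
                row = (key, payload, other_payload)
            else:
                row = (key, other_payload, payload)
            self.view[row] += sign * n
            if self.view[row] == 0:
                del self.view[row]

        # Keep the auxiliary view for this side in step with its source.
        own[key][payload] += sign
        if own[key][payload] == 0:
            del own[key][payload]


v = SelfMaintainedJoinView()
v.apply("R", 1, "r1", +1)
v.apply("S", 1, "s1", +1)
v.apply("S", 1, "s2", +1)
snapshot = dict(v.view)
v.apply("R", 1, "r1", -1)
```

Here the auxiliary views are full copies of the inputs; the abstract's approach of reusing intermediate results of the view computation would typically store less, which this sketch does not attempt to model.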

3.
In a distributed environment, materialized views are used to integrate data from different information sources and then store them in some centralized location. In order to maintain such materialized views, maintenance queries need to be sent to information sources by the data warehouse management system. Due to the independence of the information sources and the data warehouse, concurrency issues are raised between the maintenance queries and the local update transactions at each information source. Recent solutions such as ECA and Strobe tackle such concurrent maintenance, however with the requirement of quiescence of the information sources. SWEEP and POSSE overcome this limitation by decomposing the global maintenance query into smaller subqueries to be sent to every information source and then performing conflict correction locally at the data warehouse. Note that all these previous approaches handle the data updates one at a time. Hence either some of the information sources or the data warehouse is likely to be idle during most of the maintenance process. In this paper, we propose that a set of updates should be maintained in parallel by several concurrent maintenance processes so that both the information sources as well as the warehouse would be utilized more fully throughout the maintenance process. This parallelism should then improve the overall maintenance performance. For this we have developed a parallel view maintenance algorithm, called PVM, that substantially improves upon the performance of previous maintenance approaches by handling a set of data updates at the same time. The parallel handling of a set of updates is orthogonal to the particular maintenance algorithm applied to the handling of each individual update. 
In order to perform parallel view maintenance, we have identified two critical issues that must be overcome: (1) detecting maintenance-concurrent data updates in a parallel mode and (2) correcting the problem that the data warehouse commit order may not correspond to the data warehouse update processing order due to parallel maintenance handling. In this work, we provide solutions to both issues. For the former, we insert a middle-layer timestamp assignment module for detecting maintenance-concurrent data updates without requiring any global clock synchronization. For the latter, we introduce the negative counter concept to solve the problem of variant orders of committing effects of data updates to the data warehouse. We provide a proof of the correctness of PVM that guarantees that our strategy indeed generates the correct final data warehouse state. We have implemented both SWEEP and PVM in our EVE data warehousing system. Our performance study demonstrates that a manyfold performance improvement is achieved by PVM over SWEEP. Received: 12 November 2001, Accepted: 18 December 2002, Published online: 31 July 2003. This work was supported in part by the NSF NYI grant IIS-979624 and NSF CISE Instrumentation grant IRIS 97-29878 and NSF grant IIS-9988776.

4.
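Consistency Maintenance Algorithms for a Multi-Source Data Warehouse: Improvements to the Strobe Algorithm   Total citations: 3, self-citations: 0, citations by others: 3
A data warehouse is a data repository that integrates information from multiple distributed, autonomous, or heterogeneous data sources in order to support user queries and analysis. This paper introduces the strategy used by the DM3 data warehouse to maintain the consistency of materialized views over multiple data sources, analyzes the causes of view inconsistency and their remedies, and presents the improved consistency maintenance algorithms: the Strobe algorithm and the T-Strobe algorithm.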

5.
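Research on Materialized View Maintenance Algorithms for Distributed Data Sources   Total citations: 1, self-citations: 0, citations by others: 1
As the integrated data center of a decision support system, a data warehouse can be regarded as a set of materialized views defined over multiple different data sources. In recent years, materialized view maintenance algorithms for data warehouses have attracted the attention of many researchers. When multiple independent data sources are updated concurrently, traditional materialized view maintenance algorithms may cause view maintenance anomalies. This paper proposes a bidirectional-scan parallel-processing (BSP) materialized view maintenance algorithm that ensures complete consistency between the materialized views and the data sources. An experimental comparison with other similar algorithms indicates that the algorithm is efficient.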

6.
Schema Evolution in Data Warehouses   Total citations: 2, self-citations: 0, citations by others: 2
In this paper, we address the issues related to the evolution and maintenance of data warehousing systems, when underlying data sources change their schema capabilities. These changes can invalidate views at the data warehousing system. We present an approach for dynamically adapting views according to schema changes arising on source relations. This type of maintenance concerns both the schema and the data of the data warehouse. The main issue is to avoid the view recomputation from scratch especially when views are defined from multiple sources. The data of the data warehouse is used primarily in organizational decision-making and may be strategic. Therefore, the schema of the data warehouse can evolve for modeling new requirements resulting from analysis or data-mining processing. Our approach provides means to support schema evolution of the data warehouse independently of the data sources. Received 20 March 2000 / Revised 5 January 2001 / Accepted in revised form 20 April 2001

7.
Data warehouse systems typically designate downtime for view maintenance, ranging from tens of minutes to hours depending on the system size. We develop a multiagent system that achieves immediate incremental view maintenance (IIVM) for continuous updating of data warehouse views. We describe an IIVM system that processes updates as transactions are executed at the underlying data sources to eliminate view maintenance downtime for the data warehouse, a crucial requirement for internet applications. The use of a multiagent framework provides considerable process speed improvement when compared with other IIVM systems. Since agents are used to delegate the data sources and warehouse views, it is easy to reorganize the components of the system. Through the use of cooperative agents, the data consistency of IIVM can be easily maintained. The test results from this research show that the proposed system increases the availability of the data warehouse while preserving a stringent requirement on data consistency.

8.
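Research on the Consistency of View Maintenance and Drill-Down Queries in Data Warehouses   Total citations: 4, self-citations: 0, citations by others: 4
A data warehouse is an integrated information repository that stores data for querying and decision analysis. Materialized views, the main information entities stored in a data warehouse, consist of data extracted, transformed, transferred, and loaded from the level above or from external data sources. How to maintain the consistency of materialized views and of OLAP queries when the source data change is a research topic of practical significance. The algorithm proposed in this paper, Glide, uses version control, compensation, and an acknowledgement mechanism to coordinate data updates between the source databases and the data warehouse. It guarantees the consistency of view maintenance and drill-down queries, improves the robustness of the algorithm and CPU utilization on the source database side, and is a substantial improvement over earlier algorithms of the same kind. The paper states that Glide achieves complete consistency and gives a rigorous mathematical proof. An example illustrates how the algorithm is applied in practice.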

9.
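Research on Consistency Maintenance Algorithms for the DM3 Multi-Source Data Warehouse   Total citations: 2, self-citations: 1, citations by others: 1
A data warehouse is a data repository that integrates information from multiple distributed, autonomous, or heterogeneous data sources in order to support user queries and analysis. This paper introduces the strategy used by the DM3 data warehouse to maintain the consistency of materialized views over multiple data sources, analyzes the causes of view inconsistency and their remedies, and presents the improved consistency maintenance algorithms: the Strobe* algorithm and the T-Strobe* algorithm.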

10.
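In traditional materialized view maintenance, a data source reports incremental data to the data warehouse as an XML document; the warehouse parses the data out of this document and uses JDBC to update the materialized view. This paper proposes that the data source instead encapsulate the incremental data as serialized objects stored in a file and report that file to the warehouse, which reads the objects from the file and uses Hibernate to apply them directly to the materialized view. A performance comparison of the two schemes shows that the latter is feasible and more efficient.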

11.
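MSMiner: A Multi-Strategy General-Purpose Data Mining Tool   Total citations: 6, self-citations: 0, citations by others: 6
This paper describes the design and implementation of MSMiner, a multi-strategy general-purpose data mining tool. MSMiner is built on top of a data warehouse and uses an object-oriented approach to describe metadata about data sources, mining algorithms, mining steps, and users. The system integrates several kinds of data mining algorithms, including decision trees, association rules, traditional statistical analysis, cluster analysis, neural networks, and visualization, and it generates and executes data mining and decision support tasks in the form of task models. It supports data sources such as databases, data warehouses, text, and Web pages; mining algorithms can be added dynamically; data and mining strategies are organized flexibly and effectively; and the tool is highly extensible and general.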

12.
I/O parallelism is considered to be a promising approach to achieving high performance in parallel data warehousing systems where huge amounts of data and complex analytical queries have to be processed. This paper proposes a parallel secondary data cube storage structure (PHC for short) to efficiently support the processing of range sum queries and dynamic updates on data cube using parallel computing systems. Based on PHC, two parallel algorithms for processing range sum queries and updates are also proposed. Both the algorithms have the same time complexity, O(log^d n / P). The analytical and experimental results show that PHC and the parallel algorithms have high performance and achieve optimum speedup.

13.
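Data warehouse architecture is an important theoretical foundation for building and maintaining data warehouses. The traditional architectural framework is simple and easy to apply, but it is incomplete. The WHIPS model proposed by Stanford University solves the problem of automatically detecting updates at the information sources, but a bottleneck in the model itself causes parallel update processing to block. This paper therefore proposes an improved scheme that introduces a timestamp unit, increases the parallel processing capability of two important modules, and gives a revised data warehouse system architecture.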

14.
黄曼妮 《数字社区&智能家居》2009,5(6):4113-4114,4125
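Starting from a brief analysis of data warehouse technology, this paper introduces its application in the aviation industry. It analyzes in detail the aviation industry's requirements for data warehouses, designs a data warehouse application scheme suited to the development of airlines, and discusses the prospects for applying data warehouse and data mining technology in the aviation industry.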

15.
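Starting from a brief analysis of data warehouse technology, this paper introduces its application in the aviation industry. It analyzes in detail the aviation industry's requirements for data warehouses, designs a data warehouse application scheme suited to the development of airlines, and discusses the prospects for applying data warehouse and data mining technology in the aviation industry.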

16.
View materialization is an effective method to increase query efficiency in a data warehouse and improve OLAP query performance. However, one encounters the problem of space insufficiency if all possible views are materialized in advance. Reducing query time by means of selecting a proper set of materialized views with a lower cost is crucial for efficient data warehousing. In addition, the costs of data warehouse creation, query, and maintenance have to be taken into account while views are materialized. In this paper, we propose efficient algorithms to select a proper set of materialized views, constrained by storage and cost considerations, to help speed up the entire data warehousing process. We derive a cost model for data warehouse query and maintenance as well as efficient view selection algorithms that effectively exploit the gain and loss metrics. The main contribution of our paper is to speed up the selection process of materialized views. Concurrently, this will greatly reduce the overall cost of data warehouse query and maintenance.
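To make the selection problem concrete, the sketch below uses a plain greedy heuristic: rank candidate views by net benefit per unit of storage and take them while the space budget allows. This is a swapped-in textbook heuristic, not the paper's algorithm; the abstract does not define its gain and loss metrics, so the `gain` (query cost saved), `loss` (added maintenance cost), and `size` fields, as well as the function name `select_views`, are assumptions, and gains are treated as independent of which other views are chosen.

```python
def select_views(candidates, space_budget):
    """Greedy sketch: pick views by (gain - loss) per unit of storage.

    candidates: list of dicts with hypothetical keys
        name, size (> 0), gain (query cost saved), loss (maintenance cost).
    Returns (chosen view names in pick order, storage used).
    """
    ranked = sorted(
        candidates,
        key=lambda v: (v["gain"] - v["loss"]) / v["size"],
        reverse=True,
    )
    chosen, used = [], 0
    for v in ranked:
        net = v["gain"] - v["loss"]
        # Skip views that cost more to maintain than they save, or do not fit.
        if net > 0 and used + v["size"] <= space_budget:
            chosen.append(v["name"])
            used += v["size"]
    return chosen, used


candidates = [
    {"name": "A", "size": 4, "gain": 20, "loss": 4},
    {"name": "B", "size": 5, "gain": 12, "loss": 2},
    {"name": "C", "size": 3, "gain": 3, "loss": 6},
    {"name": "D", "size": 2, "gain": 11, "loss": 1},
]
chosen, used = select_views(candidates, space_budget=7)
```

In practice the benefit of a view usually depends on which other views are already materialized, so a more faithful method would recompute gains after each pick; the static ranking here only illustrates the storage-constrained trade-off.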

17.
Providing integrated access to multiple, distributed, heterogeneous databases and other information sources has become one of the leading issues in database research and the industry. One of the most effective approaches is to extract and integrate information of interest from each source in advance and store them in a centralized repository (known as a data warehouse). When a query is posed, it is evaluated directly at the warehouse without accessing the original information sources. One of the techniques that this approach uses to improve the efficiency of query processing is materialized view(s). Essentially, materialized views are used for data warehouses, and various methods for relational databases have been developed. In this paper, we first discuss an object deputy approach to realize materialized object views for data warehouses which can also incorporate object-oriented databases. A framework has been developed using Smalltalk to prepare data for data warehousing, in which an object deputy model and database connecting tools have been implemented. The object deputy model can provide an easy-to-use way to resolve inconsistency and conflicts while preparing data for data warehousing, as evidenced by our empirical study.

18.
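Intranet Data Warehouse Technology and Applications   Total citations: 1, self-citations: 0, citations by others: 1
This paper discusses the requirements for and implementation of intranet data warehouse technology, proposes an enterprise intranet data warehouse system with a five-layer structure consisting of an operational data layer, a data extraction layer, a data warehouse layer, an analytical processing layer, and a user layer, and examines in detail the key technologies for building a practical intranet data warehouse system.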

19.
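Online Maintenance in a Multi-View Data Warehouse Environment   Total citations: 3, self-citations: 0, citations by others: 3
Online view maintenance in a data warehouse means keeping the materialized views in the warehouse consistent in real time with the databases at the information sources, without disrupting front-end users' normal use of the warehouse. To solve the consistency problem between online view maintenance and drill-down queries in a multi-view environment, this paper introduces a "base store" (基库) model into the data warehouse architecture and proposes a corresponding view maintenance algorithm, 3VPA.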

20.
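In materialized view maintenance for data warehouses, handling concurrent updates effectively is an important and difficult problem. This paper describes typical situations in a P2P environment in which schema and data updates are fully concurrent, analyzes why concurrent updates cause view maintenance anomalies, and proposes a corresponding correction strategy for each case. It presents a method for detecting concurrent updates based on temporal calculus and an effective algorithm for detecting correlated updates under mixed updates. Finally, it proposes an enhanced proxy mechanism that solves the out-of-order commit problem and ensures consistency between the data warehouse and the data sources.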
