The cgmCUBE project: Optimizing parallel data cube generation for ROLAP   Total citations: 5, self-citations: 0, citations by others: 5
On-line Analytical Processing (OLAP) has become one of the most powerful and prominent technologies for knowledge discovery in VLDB (Very Large Database) environments. Central to the OLAP paradigm is the data cube, a multi-dimensional hierarchy of aggregate values that provides a rich analytical model for decision support. Various sequential algorithms for the efficient generation of the data cube have appeared in the literature. However, given the size of contemporary data warehousing repositories, multi-processor solutions are crucial for the massive computational demands of current and future OLAP systems. In this paper we discuss the cgmCUBE Project, a multi-year effort to design and implement a multi-processor platform for data cube generation that targets the relational database model (ROLAP). More specifically, we discuss new algorithmic and system optimizations relating to (1) a thorough optimization of the underlying sequential cube construction method and (2) a detailed and carefully engineered cost model for improved parallel load balancing and faster sequential cube construction. These optimizations were key in allowing us to build a prototype that is able to produce data cube output at a rate of over one TeraByte per hour. Research supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).

Driven by globalization and Internet-speed competition, enterprises today are often divided into many departments or organizations, or even merge with companies located in other regions, to improve their competitiveness and responsiveness. As a result, a geographically distributed enterprise typically runs a number of data warehouse systems. To meet distributed decision-making requirements, the data in these warehouses must be made available for exchange and integration. An open, vendor-independent, and efficient standard for transferring data between data warehouses over the Internet is therefore an important issue. However, current solutions for cross-warehouse data exchange rely only on record-based approaches or on transferring plain-text files, which are neither adequate nor efficient. In this research, issues in multidimensional data exchange are studied and an intelligent XML-based multidimensional data exchange model is developed. In addition, a generic-construct-based approach is proposed to enable systematic many-to-many mapping between distributed data warehouses, introducing a consistent and unique standard exchange format. Based on the transformation model we develop between the multidimensional data model and the XML data model, and enhanced by the multidimensional metadata management mechanism proposed in this research, a general-purpose intelligent XML-based multidimensional data exchange process over the web becomes more efficient and of higher quality. Moreover, we develop an intelligent XML-based prototype system for exchanging multidimensional data, which shows that the proposed exchange model is feasible and that the exchange process is more systematic and efficient when metadata is used.

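This paper proposes a new data cube structure that obtains query results through indexes and set intersection and union operations. In particular, for range queries it avoids decomposing the range into points and then running a point query for each one, which improves range-query performance while keeping disk usage low and point-query response fast. Construction and query algorithms are given, and the approach is validated experimentally on synthetic and real data.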
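
The idea of answering a range query with index lookups and set operations can be illustrated with a minimal Python sketch. It is not the structure proposed in the paper, whose details the abstract does not give: the per-dimension inverted index, the function names `build_index` and `range_sum`, and the sample rows are all assumptions made for illustration. Within one dimension the row-id sets of all values in the range are united, and the per-dimension results are then intersected, so the range is never expanded into individual point queries.

```python
from collections import defaultdict


def build_index(rows, dims):
    """Map each dimension value to the set of row ids that hold it (hypothetical layout)."""
    index = {d: defaultdict(set) for d in dims}
    for rid, row in enumerate(rows):
        for d in dims:
            index[d][row[d]].add(rid)
    return index


def range_sum(rows, index, ranges, measure):
    """Sum `measure` over rows whose value lies in [lo, hi] for every dimension in `ranges`."""
    candidates = None
    for d, (lo, hi) in ranges.items():
        # Union of the id sets of all values inside the range on this dimension.
        matched = set()
        for value, ids in index[d].items():
            if lo <= value <= hi:
                matched |= ids
        # Intersection across dimensions narrows the candidate set.
        candidates = matched if candidates is None else candidates & matched
    if candidates is None:  # no restriction: every row qualifies
        candidates = set(range(len(rows)))
    return sum(rows[rid][measure] for rid in candidates)


rows = [
    {"year": 2004, "region": 1, "sales": 10},
    {"year": 2005, "region": 2, "sales": 20},
    {"year": 2005, "region": 3, "sales": 30},
    {"year": 2006, "region": 2, "sales": 40},
]
index = build_index(rows, ["year", "region"])
total = range_sum(rows, index, {"year": (2005, 2006), "region": (2, 3)}, "sales")
```

A production structure would keep the index values sorted so that the values inside a range can be located without scanning every key; the linear scan here only keeps the sketch short.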

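Huang Fei. 《计算机科学》 (Computer Science), 2006, 33(12): 200-202
This paper introduces the basic principles and methods of data mining for e-commerce websites and, in connection with data warehouse research, discusses the multidimensional data model, association rule mining algorithms, the basic process of data mining for e-commerce websites, and online analytical processing.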

The lifecycle of a data cube involves efficient construction and storage, fast query answering, and incremental updating. Existing ROLAP methods that implement data cubes are weak with respect to one or more of the above, focusing mainly on construction and storage. In this paper, we present a comprehensive ROLAP solution that efficiently addresses all functionality in the lifecycle of a cube and can be implemented easily over existing relational servers. It is a family of algorithms centered around a purely ROLAP construction method that provides fast computation of a fully materialized cube in compressed form, is incrementally updateable, and exhibits quick query response times that can be improved by low-cost indexing and caching. This is demonstrated through comprehensive experiments on both synthetic and real-world datasets, whose results show great promise for the performance and scalability of the proposed techniques with respect to both the size and the dimensionality of the fact table. The project is co-financed within Op. Education by the ESF (European Social Fund) and National Resources.

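Research and application of data warehouses in securities trading   Total citations: 12, self-citations: 0, citations by others: 12
The paper focuses on the application of data warehouse technology in securities trading, describes in detail the design and construction of the data warehouse in the trading system, and further discusses how to build a warehouse-based decision support model for securities trading.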

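Design and development of a fiscal data warehouse   Total citations: 6, self-citations: 0, citations by others: 6
After introducing the basic concepts, characteristics, and architecture of data warehouses, the article proposes a new data warehouse solution that addresses the business development needs of public finance in Shanghai Pudong, and analyzes and discusses dimensional database design, online analytical processing, and data analysis with Discoverer.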

Compressed Data Cube for Approximate OLAP Query Processing   Total citations: 4, self-citations: 0, citations by others: 4
Approximate query processing has emerged as an approach to dealing with the huge data volumes and complex queries of data warehouse environments. In this paper, we present a novel method that provides approximate answers to OLAP queries. Our method builds a compressed (approximate) data cube with a clustering technique and uses this compressed data cube to answer queries directly, which improves query performance. We also provide the algorithm for OLAP queries and the confidence intervals of query results. An extensive experimental study with the OLAP Council benchmark shows the effectiveness and scalability of our cluster-based approach compared to sampling.

The data cube operator computes group-bys for all possible combinations of a set of dimension attributes. Since computing a data cube typically incurs a considerable cost, the data cube is often precomputed and stored as materialized views in data warehouses. A materialized data cube needs to be updated when the source relations are changed. The incremental maintenance of a data cube is to compute and propagate only its changes, rather than recompute the entire data cube from scratch. For $n$ dimension attributes, the data cube consists of $2^n$ group-bys, each of which is called a cuboid. To incrementally maintain a data cube with $2^n$ cuboids, the conventional methods compute $2^n$ delta cuboids, each of which represents the change of a cuboid. In this paper, we propose an efficient incremental maintenance method that can maintain a data cube using only a subset of the $2^n$ delta cuboids. We formulate an optimization problem to find the optimal subset of the $2^n$ delta cuboids that minimizes the total maintenance cost, and propose a heuristic solution that allows us to maintain a data cube using only a subset of the delta cuboids. As a result, the cost of maintaining a data cube is substantially reduced. Through various experiments, we show the performance advantages of the proposed method over the conventional methods. We also extend the proposed method to handle partially materialized cubes and dimension hierarchies.
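
To make the $2^n$ cuboids and the conventional delta-cuboid approach concrete, here is a small Python sketch. It shows only the baseline that the abstract describes (one delta cuboid per cuboid), not the paper's reduced-subset method. The function names `compute_cube` and `apply_delta`, the SUM aggregate, and the sample rows are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def compute_cube(rows, dims, measure):
    """Return {group-by attributes: Counter of SUM(measure) per group} for all 2^n cuboids."""
    cube = {}
    for k in range(len(dims) + 1):
        for attrs in combinations(dims, k):
            cuboid = Counter()
            for row in rows:
                key = tuple(row[a] for a in attrs)
                cuboid[key] += row[measure]
            cube[attrs] = cuboid
    return cube


def apply_delta(cube, delta_rows, dims, measure):
    """Conventional maintenance: build one delta cuboid per cuboid and merge it in place."""
    delta = compute_cube(delta_rows, dims, measure)
    for attrs, delta_cuboid in delta.items():
        for key, change in delta_cuboid.items():
            cube[attrs][key] += change
    return cube


dims = ["item", "city"]
base = [
    {"item": "pen", "city": "Oslo", "amount": 5},
    {"item": "ink", "city": "Oslo", "amount": 7},
]
inserted = [{"item": "pen", "city": "Rome", "amount": 3}]

cube = compute_cube(base, dims, "amount")       # 2^2 = 4 cuboids
cube = apply_delta(cube, inserted, dims, "amount")
```

For these insert-only changes, the maintained cube equals the cube recomputed from scratch over `base + inserted`, but only the changed rows are scanned. Deletions could be modelled as rows with negative measures, at the cost of leaving zero-valued groups that a real system would have to purge.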

In some business applications, such as trading management in financial institutions, ad hoc aggregate queries over data streams must be answered accurately. Materializing and incrementally maintaining a full data cube, or even a compressed or approximate version of it, over a data stream is often computationally prohibitive. On the other hand, although previous studies proposed approximate methods for continuous aggregate queries, they cannot provide accurate answers. In this paper, we develop a novel prefix aggregate tree (PAT) structure for online warehousing of data streams and answering ad hoc aggregate queries. Often, a data stream can be partitioned into a historical segment, which is stored in a traditional data warehouse, and a transient segment, which can be stored in a PAT to answer ad hoc aggregate queries. The size of a PAT is linear in the size of the transient segment, and only one scan of the data stream is needed to create and incrementally maintain a PAT. Although answering a query with a PAT costs more than with a fully materialized data cube, the query answering time remains linear in the size of the transient segment. Our extensive experimental results on both synthetic and real data sets illustrate the efficiency and scalability of our design.
Moonjung Cho is a Ph.D. candidate in the Department of Computer Science and Engineering at the State University of New York at Buffalo. She obtained her Master's degree from the same university in 2003. She has four years of industry experience as an associate researcher. Her research interests are in data mining, data warehousing, and data cubing. She has received a full scholarship from the Institute of Information Technology Assessment in Korea.
Jian Pei received the Ph.D. degree in Computing Science from Simon Fraser University, Canada, in 2002. He is currently an Assistant Professor of Computing Science at Simon Fraser University, Canada. In 2002–2004, he was an Assistant Professor of Computer Science and Engineering at the State University of New York at Buffalo, USA. His research interests can be summarized as developing advanced data analysis techniques for emerging applications. In particular, he is currently interested in various techniques of data mining, data warehousing, online analytical processing, and database systems, as well as their applications in bioinformatics. His current research is supported in part by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the National Science Foundation (NSF). He has published over 70 papers in refereed journals, conferences, and workshops, has served on the program committees of over 60 international conferences and workshops, and has been a reviewer for some leading academic journals. He is a member of the ACM, the ACM SIGMOD, and the ACM SIGKDD.
Ke Wang received his Ph.D. from the Georgia Institute of Technology. He is currently a professor at the School of Computing Science, Simon Fraser University. Before joining Simon Fraser, he was an associate professor at the National University of Singapore. He has taught in the areas of database and data mining. Ke Wang's research interests include database technology, data mining and knowledge discovery, machine learning, and emerging applications, with recent interests focusing on the end use of data mining. This includes explicitly modeling the business goal (such as profit mining, bio-mining and web mining) and exploiting user prior knowledge (such as extracting unexpected patterns and actionable knowledge). He is interested in combining the strengths of various fields such as database, statistics, machine learning and optimization to provide actionable solutions to real-life problems. Ke Wang has published in database, information retrieval, and data mining conferences, including SIGMOD, SIGIR, PODS, VLDB, ICDE, EDBT, SIGKDD, SDM and ICDM. He is an associate editor of the IEEE TKDE journal and has served on program committees for international conferences including DASFAA, ICDE, ICDM, PAKDD, PKDD, SIGKDD and VLDB.

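A multidimensional data model in data warehouses and its object-relational implementation   Total citations: 1, self-citations: 0, citations by others: 1
Data warehousing and online analytical processing (OLAP) are current research priorities in commercial data processing. Traditional relational database technology can hardly meet the demands of OLAP for analytical queries over the large volumes of data in a data warehouse. The multidimensional data model has become the core technology of data warehousing and OLAP. This paper designs an intuitive and effective multidimensional data model and gives an object-relational implementation.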

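With the rapid growth of memory capacity, workstations equipped with gigabytes of memory have appeared. However, current OLAP systems do not make full use of large-capacity RAM. In view of this, the article proposes a memory-based data cube query processing system. The system uses an in-memory data structure with a two-level index that makes full use of limited memory space, organizes the tuples of each cuboid effectively, and achieves efficient data cube queries.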

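Research and application of an OLAP data warehouse in power grid dispatching decisions   Total citations: 6, self-citations: 1, citations by others: 6
Taking a power system as the research background, a power grid dispatching data warehouse is built on the basis of analyzing and reorganizing the original data sources, and a data cube with a multidimensional snowflake schema is established. OLAP and data mining techniques are used to analyze and query the warehouse data quickly from multiple perspectives and at multiple levels, putting load forecasting and dispatching on a more scientific footing. The paper also shows that the OLAP data warehouse can provide grid dispatching managers with effective decision information.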

The Model-Driven Architecture (MDA) is an approach that aligns modeling and automation for software development. By applying such an approach to data warehouse (DW) projects, we can save a great deal of time and cost. Furthermore, most OnLine Analytical Processing (OLAP) platforms behave like black boxes that only provide business intelligence developers with wizards to create and manipulate OLAP objects, without supporting their sustainability or their migration from one platform to another. That is why many works in the literature have proposed using the MDA approach in DW projects. However, most of them have mainly focused on generating the DW relational model from the conceptual one, and they overlooked the OLAP model and the cube implementation. To deal with this problem, we propose in this paper an MDA solution that automates the process of obtaining the OLAP cube and its implementation through a set of metamodels and automatic transformations among them. The proposal generates the OLAP and DW relational models (PSMs) from the conceptual one, also using a PDM that describes the target business intelligence platform. After that, the source code to create the cube is generated from both PSMs. To this end, we define a set of transformation rules implemented in the Atlas Transformation Language. Finally, a case study is provided to validate our approach.

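Research on implementation techniques for large data warehouses   Total citations: 2, self-citations: 0, citations by others: 2
Large data warehouses are an effective way to store massive amounts of data, but implementing them raises many problems. Based on an analysis of these problems, strategies for implementing large data warehouses are proposed, and several key techniques are studied: efficient computation of data cubes, incremental update maintenance, index optimization, failure recovery, cost models for schema design and query optimization, and the definition and management of metadata.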

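Online analytical query processing involves ad hoc complex queries over large volumes of data, and such queries usually include group-by aggregation. The paper analyzes the characteristics of the star-schema storage structure and of data updates in relational data warehouses, treats an entity relation as a global relation in a distributed database that is segmented according to the size of the in-memory sort buffer, and performs distributed aggregation for group-by operations. An improved MuSA algorithm is given that effectively improves performance.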
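
The general pattern of segmenting a relation by buffer size, aggregating each segment locally, and merging the partial results can be sketched in Python as follows. This is not the MuSA algorithm or the paper's improved variant, which the abstract does not specify: the function names, the `buffer_rows` parameter standing in for the sort-buffer size, and the SUM/COUNT/AVG aggregates are assumptions for illustration.

```python
from collections import defaultdict


def partial_aggregate(segment, key, measure):
    """Aggregate one segment locally: group value -> [sum, count]."""
    partial = defaultdict(lambda: [0, 0])
    for row in segment:
        acc = partial[row[key]]
        acc[0] += row[measure]
        acc[1] += 1
    return partial


def merge_partials(partials):
    """Combine per-segment partial aggregates into the global group-by result."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for group, (s, c) in partial.items():
            merged[group][0] += s
            merged[group][1] += c
    return {g: {"sum": s, "count": c, "avg": s / c} for g, (s, c) in merged.items()}


def grouped_aggregate(rows, key, measure, buffer_rows):
    """Split `rows` into segments of at most `buffer_rows` rows, then aggregate and merge."""
    segments = [rows[i:i + buffer_rows] for i in range(0, len(rows), buffer_rows)]
    return merge_partials(partial_aggregate(seg, key, measure) for seg in segments)


rows = [
    {"dept": "a", "amount": 1},
    {"dept": "b", "amount": 2},
    {"dept": "a", "amount": 3},
    {"dept": "b", "amount": 4},
    {"dept": "a", "amount": 5},
]
result = grouped_aggregate(rows, "dept", "amount", buffer_rows=2)
```

The average is carried as a (sum, count) pair because averages of segments cannot be merged directly, whereas sums and counts can.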

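Design of a multidimensional analysis model for data warehouses   Total citations: 13, self-citations: 0, citations by others: 13
This paper introduces the theoretical basis for building a multidimensional analysis model for a data warehouse, proposes a design method for a multidimensional analysis model of a meteorological data warehouse, and presents a concrete example, providing a necessary and practical foundation for the analytical environment of the data warehouse.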

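Current CRM architectures place the data warehouse at the core of the whole system. In practice, however, different parts of a CRM system require different response times, and short-term information updates must be combined with long-term transaction history, so how to connect the data warehouse organically with the other modules has become an urgent problem. To address this, a data-warehouse-based CRM architecture is proposed with the goal of improving the efficiency of the entire CRM system. Under this architecture, different users can all perform CRM analysis operations from a unified view, which in turn better supports CRM security management.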

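A hierarchical Cube storage structure for data warehouse systems   Total citations: 11, self-citations: 0, citations by others: 11
Gao Hong, Li Jianzhong, Li Jinbao. 《软件学报》 (Journal of Software), 2003, 14(7): 1258-1266
Range queries are an important operation for supporting on-line analytical processing (OLAP) on data warehouses. In recent years, several Cube storage structures that support range queries and data updates have been proposed. However, their space and time complexity are high, which makes them hard to use in practice. To this end, a hierarchical Cube storage structure, HDC (hierarchical data cube), and its associated algorithms are proposed. Both the range-query cost and the update cost on HDC are $O(\log^d n)$, and the overall performance is $O((\log n)^{2d})$ (under the $C_q C_u$ model) or $O(K(\log n)^d)$ (under the $C_q n_q + C_u n_u$ model). Theoretical analysis and experiments show that HDC outperforms all existing Cube storage structures in range-query cost, update cost, space cost, and overall performance.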
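
For intuition about how a structure can offer both range-sum queries and point updates in $O(\log^d n)$ time, the following Python sketch implements a two-dimensional Fenwick tree (binary indexed tree), so $d = 2$. This is a well-known substitute technique with the same asymptotic bounds, not the HDC structure itself, whose layout the abstract does not describe. The class name `Fenwick2D` and its methods are illustrative.

```python
class Fenwick2D:
    """2-D binary indexed tree: point update and range sum, each in O(log^2 n)."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        # 1-based internal layout; row 0 and column 0 are unused padding.
        self.tree = [[0] * (cols + 1) for _ in range(rows + 1)]

    def update(self, r, c, delta):
        """Add `delta` to cell (r, c), using 0-based coordinates."""
        i = r + 1
        while i <= self.rows:
            j = c + 1
            while j <= self.cols:
                self.tree[i][j] += delta
                j += j & -j
            i += i & -i

    def prefix_sum(self, r, c):
        """Sum of cells (0..r, 0..c) inclusive; 0 when r or c is -1."""
        total = 0
        i = r + 1
        while i > 0:
            j = c + 1
            while j > 0:
                total += self.tree[i][j]
                j -= j & -j
            i -= i & -i
        return total

    def range_sum(self, r1, c1, r2, c2):
        """Sum over the rectangle [r1..r2] x [c1..c2] by inclusion-exclusion."""
        return (self.prefix_sum(r2, c2) - self.prefix_sum(r1 - 1, c2)
                - self.prefix_sum(r2, c1 - 1) + self.prefix_sum(r1 - 1, c1 - 1))


cube = Fenwick2D(4, 4)
for r, c, value in [(0, 0, 5), (1, 2, 7), (2, 1, 4), (3, 3, 1)]:
    cube.update(r, c, value)
inner = cube.range_sum(1, 1, 2, 2)  # covers cells (1, 2) and (2, 1)
```

Each additional dimension adds one more nested loop and one more logarithmic factor, which is where the $\log^d n$ form of the bound comes from.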

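The basic principles and design framework of a simple online analytical processing system are introduced, and the key data structures and algorithms are described.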
