Similar Documents
18 similar documents found (search time: 760 ms).
1.
Parallel Dwarf Cube Construction in a MapReduce Environment
For data-intensive applications, a parallel Dwarf data-cube construction algorithm based on the MapReduce framework is proposed. The algorithm partitions the traditional Dwarf cube into multiple equivalent, independent sub-Dwarf cubes and uses the MapReduce architecture to parallelize Dwarf cube construction, querying, and updating. Experiments show that the parallel Dwarf algorithm combines the parallelism and high scalability of the MapReduce framework on the one hand, and on the other...
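As a loose illustration of the partition-then-merge idea (not the paper's actual algorithm), the sketch below hash-partitions a fact table, builds the group-by aggregates of each partition independently, as a "map" task would, and merges the partial sub-cubes in a "reduce" step. All names (`build_subcube`, `merge`) are invented for the example.

```python
from collections import defaultdict
from itertools import combinations

def build_subcube(rows, dims):
    """Map step: aggregate one partition over every dimension subset."""
    agg = defaultdict(float)
    for row in rows:
        for k in range(len(dims) + 1):
            for subset in combinations(range(len(dims)), k):
                key = tuple(row[i] if i in subset else '*' for i in range(len(dims)))
                agg[key] += row[-1]          # last column is the measure
    return agg

def merge(partials):
    """Reduce step: partial aggregates for the same cell simply add up (SUM)."""
    total = defaultdict(float)
    for part in partials:
        for key, v in part.items():
            total[key] += v
    return total

rows = [('a1', 'b1', 10.0), ('a1', 'b2', 5.0), ('a2', 'b1', 7.0)]
# Partition by the first dimension so each sub-cube builds independently.
parts = defaultdict(list)
for r in rows:
    parts[hash(r[0]) % 2].append(r)
cube = merge(build_subcube(p, ('A', 'B')) for p in parts.values())
print(cube[('a1', '*')], cube[('*', '*')])   # 15.0 22.0
```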

2.
Further Reducing the Size of Dwarf
Dwarf not only reduces the storage cost of data cubes but is also structurally simple and easy to implement, query, and maintain, making it an attractive way to organize a data cube. To further reduce Dwarf's storage size, this paper studies the Dwarf structure and proposes condensed Dwarf and iceberg Dwarf: the former removes content from the Dwarf structure that is redundant for queries, while the latter drops content that is trivial to users. Experiments and analysis show that condensed Dwarf effectively reduces Dwarf's storage size, while iceberg Dwarf suits applications where details can be ignored and greatly lowers Dwarf's storage cost.
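A minimal sketch of the iceberg condition, assuming a simple COUNT-based threshold: cells whose support falls below `min_count` are the kind of "trivial content" an iceberg variant would drop. The function name and threshold are illustrative, not from the paper.

```python
from collections import defaultdict
from itertools import combinations

def iceberg_cells(rows, n_dims, min_count=2):
    """Keep only cube cells whose support reaches min_count -- the
    iceberg condition that lets trivial detail cells be dropped."""
    count = defaultdict(int)
    for row in rows:
        for k in range(n_dims + 1):
            for subset in combinations(range(n_dims), k):
                key = tuple(row[i] if i in subset else '*' for i in range(n_dims))
                count[key] += 1
    return {cell: c for cell, c in count.items() if c >= min_count}

rows = [('a1', 'b1'), ('a1', 'b2'), ('a2', 'b1')]
print(iceberg_cells(rows, 2))   # only cells seen at least twice survive
```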

3.
Erasure coding is an effective way to improve the fault tolerance and availability of P2P storage systems by introducing data redundancy. In unstructured P2P storage systems based on erasure coding, maintaining data by flooding generates a large number of redundant messages and makes the system inefficient. This paper proposes an algorithm that uses a binary tree to store the metadata of file blocks. Once the binary tree is built, update messages propagate among its nodes instead of being flooded across the network. Analysis shows that, compared with flooding, the algorithm significantly reduces the number of redundant messages and improves the efficiency of data maintenance, at only a tiny storage cost.
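The message-saving idea can be sketched by laying an implicit binary tree over the peers that hold a file's blocks: an update travels parent-to-children, so each peer receives it exactly once instead of many times as under flooding. This toy version (invented names, array-encoded tree) illustrates only the propagation pattern, not the paper's maintenance protocol.

```python
def propagate(nodes, update, i=0, count=0):
    """Deliver `update` to every node by walking an implicit binary tree
    over the node list: each node forwards to at most two children, so
    exactly one message reaches each peer (flooding sends many more)."""
    if i >= len(nodes):
        return count
    nodes[i]['block_info'] = update                      # apply locally
    count += 1                                           # one message used
    count = propagate(nodes, update, 2 * i + 1, count)   # left child
    count = propagate(nodes, update, 2 * i + 2, count)   # right child
    return count

peers = [{'id': k, 'block_info': None} for k in range(7)]
sent = propagate(peers, {'file': 'f1', 'version': 2})
print(sent)   # 7 messages for 7 peers
```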

4.
A Real-Time Compression Algorithm for Multi-Sensor Signal Data Acquisition
To address the problem that multi-sensor test systems sample large volumes of data that are inconvenient to store and transmit in real time, this paper proposes real-time compression of the raw samples based on an improved binary-tree algorithm. Exploiting the high information redundancy of multi-sensor systems, it removes redundancy in two dimensions, using both the correlation between sensor signals and the correlation between successive samples, thereby saving storage space and improving transmission efficiency. The algorithm is simple to program, computationally light, and easy to implement in hardware, and it achieved good compression results in experiments on dynamic weighing pressure-sensor signals.
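One way to read "removing redundancy in two dimensions" is a double differencing pass, first along time and then across sensor channels; correlated signals leave small residuals that code compactly. A minimal, invertible sketch under that assumption (not the paper's binary-tree coder):

```python
import numpy as np

def two_d_delta(x):
    """Decorrelate in two directions: difference along time (rows), then
    along sensors (columns); correlated channels leave small residuals."""
    dt = np.diff(x, axis=0, prepend=np.zeros((1, x.shape[1]), x.dtype))
    return np.diff(dt, axis=1, prepend=np.zeros((dt.shape[0], 1), x.dtype))

def reconstruct(r):
    """Invert both difference passes with cumulative sums."""
    return np.cumsum(np.cumsum(r, axis=1), axis=0)

x = np.array([[100, 101, 102], [103, 104, 105], [107, 108, 110]], dtype=np.int64)
r = two_d_delta(x)
assert np.array_equal(reconstruct(r), x)   # lossless round trip
print(r)   # residuals are small wherever sensors track each other
```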

5.
In real applications data often change dynamically, and static attribute-reduction algorithms consume large amounts of computation time and storage when processing such data. For dynamically changing data in set-valued decision information systems, this paper introduces the concepts of conditional information quantity and attribute significance and proposes a heuristic dynamic attribute-reduction algorithm. When a new attribute set is added to the decision information system, the algorithm reuses the reduct of the original system to quickly update the reduct after the addition, and then removes in a backward pass any redundant attributes that may remain in the updated reduct, keeping knowledge acquisition concise and improving computational efficiency. Finally, a worked example further demonstrates the effectiveness and feasibility of the algorithm.
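A hedged sketch of the heuristic, taking conditional entropy of the decision attribute as a stand-in for "conditional information quantity" and the entropy drop as attribute significance; the paper's exact measures and update rules may differ. When new attributes arrive, the old reduct is grown greedily and redundant attributes are then removed backward:

```python
from collections import Counter
from math import log2

def cond_entropy(rows, attrs, decision):
    """H(decision | attrs) over the equivalence partition induced by attrs;
    used here as a stand-in for 'conditional information quantity'."""
    n = len(rows)
    groups = Counter(tuple(r[a] for a in attrs) for r in rows)
    joint = Counter((tuple(r[a] for a in attrs), r[decision]) for r in rows)
    return -sum(c / n * log2(c / groups[g]) for (g, _), c in joint.items())

def update_reduct(rows, reduct, new_attrs, decision):
    """Grow the old reduct greedily by significance (entropy drop) when new
    attributes arrive, then strip attributes that became redundant."""
    for a in new_attrs:
        gain = cond_entropy(rows, reduct, decision) - cond_entropy(rows, reduct + [a], decision)
        if gain > 1e-12:
            reduct = reduct + [a]
    target = cond_entropy(rows, reduct, decision)
    for a in list(reduct):                        # backward elimination
        rest = [b for b in reduct if b != a]
        if cond_entropy(rows, rest, decision) <= target + 1e-12:
            reduct = rest
    return reduct

rows = [{'a': 0, 'b': 0, 'd': 0}, {'a': 0, 'b': 1, 'd': 1}, {'a': 1, 'b': 1, 'd': 1}]
print(update_reduct(rows, ['a'], ['b'], 'd'))     # ['b']: 'a' became redundant
```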

6.
The closed data cube is an effective lossless compression technique: it removes redundant information from the data cube, which substantially reduces storage space and speeds up computation while barely affecting query performance. Hadoop's MapReduce parallel computing model provides the technical basis for computing data cubes, and Hadoop's distributed file system HDFS underpins their storage. To save storage space and accelerate queries, this paper extends the traditional data cube with the closed histogram cube, which builds on the closed data cube, further saving storage space through encoding and speeding up queries through indexing. The Hadoop parallel platform guarantees both the scalability and the load balance of the closed histogram cube. Experiments show that the closed histogram cube compresses the data cube effectively, offers high query performance, and, in line with Hadoop's characteristics, computes markedly faster as nodes are added.

7.
Instance selection can effectively remove noise and redundant data, but existing methods struggle to achieve reduction while improving generalization. To address this, a redundant-instance-pair elimination algorithm is proposed for instance selection. The notion of a nearest same-class instance pair is defined; the algorithm computes all nearest same-class pairs in a dataset and removes the instances that satisfy the elimination condition. Simulations on 11 datasets show that the datasets processed by the algorithm clearly improve on the original sample sets in both classification accuracy and storage compression rate. Compared with the edited nearest-neighbor rule, the algorithm raises the average storage compression rate by more than 35% while preserving classification accuracy, fully retains the data distribution of the original sample set, and strikes a balance between classification accuracy and storage compression rate.
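A simplified reading of the elimination condition: mutual nearest neighbours that share a class label form a "nearest same-class instance pair", and one member of each pair is dropped as locally redundant. The exact condition in the paper may differ; this sketch shows only the core idea.

```python
import numpy as np

def prune_same_class_pairs(X, y):
    """Drop one member of each mutual nearest-neighbour pair that shares a
    class label -- a simplified redundant-instance-pair elimination."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = d.argmin(axis=1)
    keep = np.ones(len(X), dtype=bool)
    for i in range(len(X)):
        j = nn[i]
        # mutual nearest neighbours with the same label: keep only one
        if keep[i] and keep[j] and nn[j] == i and y[i] == y[j] and i < j:
            keep[j] = False
    return X[keep], y[keep]

X = np.array([[0.0, 0], [0.1, 0], [5, 5], [5.1, 5]])
y = np.array([0, 0, 1, 1])
Xs, ys = prune_same_class_pairs(X, y)
print(len(Xs))   # 2 -- one representative per dense same-class pair
```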

8.
卓越. 《计算机应用研究》 (Application Research of Computers), 2021, 38(5): 1463-1467
How to deploy neural networks on mobile or embedded devices with limited computing and storage capacity is a problem that neural network development must face. To compress model size and relieve computational pressure, a hybrid neural-network compression scheme based on the information bottleneck theory is proposed. Guided by the information bottleneck, redundant information between adjacent network layers is identified and the corresponding redundant neurons are pruned; the remaining neurons are then ternary-quantized to further reduce the memory needed to store the model. Experiments on the MNIST and CIFAR-10 datasets show that, compared with similar algorithms, the proposed method achieves a higher compression rate with lower computational cost.
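Ternary quantization itself is standard; a common threshold-based form (in the style of ternary weight networks, not necessarily the paper's exact quantizer) maps small weights to 0 and the rest to ±α:

```python
import numpy as np

def ternarize(w, delta_scale=0.7):
    """Threshold ternary quantization: small weights become 0, the rest
    become +alpha or -alpha, with alpha the mean surviving magnitude."""
    delta = delta_scale * np.abs(w).mean()     # common heuristic threshold
    mask = np.abs(w) > delta
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0
    return alpha * np.sign(w) * mask           # values in {-alpha, 0, +alpha}

w = np.random.randn(4, 4).astype(np.float32)
wq = ternarize(w)
print(np.unique(np.round(wq, 4)))              # at most three distinct values
```

Each weight then needs only two bits plus one shared scale per tensor, which is where the storage saving comes from.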

9.
Building on existing temporal data models, an improved temporal data model is proposed by trading off redundant storage of temporal data against query efficiency, and update algorithms for insertion, deletion, and modification are designed for the model. Practical use shows that, after adding about 16% of redundant storage, the improved model gains nearly 58% in temporal query performance.

10.
An adaptive lifting wavelet transform algorithm is proposed. Based on the local characteristics of the signal being analyzed, the algorithm adaptively selects the update operator; the selection mainly aims to keep image edges from being smoothed and to avoid producing large wavelet coefficients. To ensure the stability of the system, an update-first, predict-later lifting structure is adopted. In this structure the prediction step does not affect the update step, which also makes reconstruction of the adaptive scheme possible. Its stability, small high-frequency coefficients, and perfect reconstruction make the method favorable for compression performance. Experiments demonstrate the effectiveness of the algorithm.
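An update-first lifting step can be sketched directly: the even samples are updated from the odd ones, then the odd samples are predicted from the updated evens, so prediction never feeds back into the update and the transform inverts exactly. The filter choices below (1/4 update, 1/2 predict, edge extension, even-length input) are illustrative, not the paper's adaptive operators.

```python
import numpy as np

def forward(x):
    """Update-first lifting: the update does not depend on the prediction,
    which is what keeps the adaptive variant exactly invertible."""
    e, o = x[0::2].astype(float), x[1::2].astype(float)
    ol = np.concatenate(([o[0]], o[:-1]))      # edge-extended left neighbour
    s = e + 0.25 * (ol + o)                    # update first
    sr = np.concatenate((s[1:], [s[-1]]))      # edge-extended right neighbour
    d = o - 0.5 * (s + sr)                     # then predict
    return s, d

def inverse(s, d):
    sr = np.concatenate((s[1:], [s[-1]]))
    o = d + 0.5 * (s + sr)                     # undo predict
    ol = np.concatenate(([o[0]], o[:-1]))
    e = s - 0.25 * (ol + o)                    # undo update
    x = np.empty(2 * len(s))
    x[0::2], x[1::2] = e, o
    return x

x = np.array([3.0, 5, 4, 6, 8, 7, 9, 10])
s, d = forward(x)
assert np.allclose(inverse(s, d), x)           # perfect reconstruction
print(d)                                       # detail coefficients stay small
```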

11.
We present a new full cube computation technique and a cube storage representation approach, called the multidimensional cyclic graph (MCG) approach. The data cube relational operator has exponential complexity and therefore its materialization involves both a huge amount of memory and a substantial amount of time. Reducing the size of data cubes, without a loss of generality, thus becomes a fundamental problem. Previous approaches, such as Dwarf, Star and MDAG, have substantially reduced the cube size using graph representations. In general, they eliminate prefix redundancy and some suffix redundancy from a data cube. The MCG differs significantly from previous approaches as it completely eliminates prefix and suffix redundancies from a data cube. A data cube can be viewed as a set of sub-graphs. In general, redundant sub-graphs are quite common in a data cube, but eliminating them is a hard problem. Dwarf, Star and MDAG approaches only eliminate some specific common sub-graphs. The MCG approach efficiently eliminates all common sub-graphs from the entire cube, based on an exact sub-graph matching solution. We propose a matching function to guarantee one-to-one mapping between sub-graphs. The function is computed incrementally, in a top-down fashion, and its computation uses a minimal amount of information to generate unique results. In addition, it is computed for any measurement type: distributive, algebraic or holistic. MCG performance analysis demonstrates that MCG is 20-40% faster than Dwarf, Star and MDAG approaches when computing sparse data cubes. Dense data cubes have a small number of aggregations, so there is not enough room for runtime and memory consumption optimization, therefore the MCG approach is not useful in computing such dense cubes. The compact representation of sparse data cubes enables the MCG approach to reduce memory consumption by 70-90% when compared to the original Star approach, proposed in [33]. In the same scenarios, the improved Star approach, proposed in [34], reduces memory consumption by only 10-30%, Dwarf by 30-50% and MDAG by 40-60%, when compared to the original Star approach. The MCG is the first approach that uses an exact sub-graph matching function to reduce cube size, avoiding unnecessary aggregation, i.e. improving cube computation runtime.
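MCG's exact matching function is incremental and top-down; as a much-simplified illustration of why a one-to-one sub-graph signature eliminates common sub-graphs, the sketch below hash-conses dictionary-encoded cube paths bottom-up so that equal sub-graphs collapse to a single shared object. This is only the core idea, not MCG's algorithm.

```python
def share(node, pool):
    """Bottom-up hash-consing: identical sub-graphs collapse to one shared
    object.  A toy version of suffix-redundancy elimination; MCG's exact
    top-down incremental matching function is more involved."""
    if not isinstance(node, dict):
        return node                                   # leaf aggregate value
    children = {k: share(v, pool) for k, v in node.items()}
    # signature: leaves by value, shared sub-graphs by identity
    sig = tuple(sorted(
        (k, v if not isinstance(v, dict) else id(v)) for k, v in children.items()
    ))
    if sig not in pool:
        pool[sig] = children
    return pool[sig]

cube = {'a1': {'b1': 3, 'b2': 5}, 'a2': {'b1': 3, 'b2': 5}}
pool = {}
shared = {k: share(v, pool) for k, v in cube.items()}
print(shared['a1'] is shared['a2'])   # True: one copy of the common sub-graph
```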

12.
New Algorithm for Computing Cube on Very Large Compressed Data Sets
Data compression is an effective technique to improve the performance of data warehouses. Since the cube operation represents the core of online analytical processing in data warehouses, it is a major challenge to develop efficient algorithms for computing cube on compressed data warehouses. To our knowledge, very few cube computation techniques have been proposed for compressed data warehouses to date in the literature. This paper presents a novel algorithm to compute cubes on compressed data warehouses. The algorithm operates directly on compressed data sets without the need of first decompressing them. The algorithm is applicable to a large class of mapping complete data compression methods. The complexity of the algorithm is analyzed in detail. The analytical and experimental results show that the algorithm is more efficient than all other existing cube algorithms. In addition, a heuristic algorithm to generate an optimal plan for computing cube is also proposed.
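A "mapping complete" compression method lets cells be addressed and aggregated without decompression. As a minimal stand-in (run-length encoding, far simpler than the methods the paper targets), a grouped COUNT can be computed on the runs themselves:

```python
from collections import defaultdict

def rle(col):
    """Run-length encode a clustered column as [value, run_length] pairs."""
    runs = []
    for v in col:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

def group_count(runs):
    """GROUP BY ... COUNT computed on the runs themselves -- the compressed
    form is aggregated directly, nothing is decompressed."""
    out = defaultdict(int)
    for v, n in runs:
        out[v] += n
    return dict(out)

print(group_count(rle(['x', 'x', 'x', 'y', 'y', 'x'])))   # {'x': 4, 'y': 2}
```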

13.
Data cube pre-computation is an important concept for supporting OLAP (Online Analytical Processing) and has been studied extensively. It is often not feasible to compute a complete data cube due to the huge storage requirement. The recently proposed quotient cube addressed this issue through a partitioning method that groups cube cells into equivalence partitions. Such an approach not only is useful for distributive aggregate functions such as SUM but also can be applied to the maintenance of holistic aggregate functions like MEDIAN, which requires the storage of a set of tuples for each equivalence class. Unfortunately, as changes are made to the data sources, maintaining the quotient cube is non-trivial since the partitioning of the cube cells must also be updated. In this paper, the authors design incremental algorithms to update a quotient cube efficiently for both SUM and MEDIAN aggregate functions. For the aggregate function SUM, concepts are borrowed from the principle of the Galois lattice to develop CPU-efficient algorithms to update a quotient cube. For the aggregate function MEDIAN, the concept of a pseudo class is introduced to further reduce the size of the quotient cube. Coupled with a novel sliding window technique, an efficient algorithm is developed for maintaining a MEDIAN quotient cube that takes up reasonably small storage space. The performance study shows that the proposed algorithms are efficient and scalable over large databases.
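A sliding-window median is a standard piece of bookkeeping; the sketch below maintains one with a sorted list and binary search, only to illustrate the kind of per-class state a MEDIAN cell needs under updates (it is not the paper's algorithm):

```python
import bisect

class MedianWindow:
    """Median of the last `size` values: O(log n) insert position lookup
    into a sorted list, plus a FIFO for evictions."""
    def __init__(self, size):
        self.size, self.fifo, self.sorted = size, [], []

    def push(self, v):
        self.fifo.append(v)
        bisect.insort(self.sorted, v)
        if len(self.fifo) > self.size:
            old = self.fifo.pop(0)                       # evict oldest value
            self.sorted.pop(bisect.bisect_left(self.sorted, old))

    def median(self):
        s, n = self.sorted, len(self.sorted)
        return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2

w = MedianWindow(3)
for v in [5, 1, 9, 7, 3]:
    w.push(v)
print(w.median())   # median of the last three values: 7
```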

14.
Data cube computation is a well-known expensive operation and has been studied extensively. It is often not feasible to compute a complete data cube due to the huge storage requirement. The recently proposed quotient cube addressed this fundamental issue through a partitioning method that groups cube cells into equivalence partitions. The effectiveness and efficiency of the quotient cube for cube compression and computation have been proved. However, as changes are made to the data sources, maintaining such a quotient cube is non-trivial since the equivalence classes in it must be split or merged. In this paper, incremental algorithms are designed to update an existing quotient cube efficiently based on the Galois lattice. The performance study shows that these algorithms are efficient and scalable for large databases.

15.
StreamDwarf, a multidimensional stream data-cube framework based on the Dwarf structure, is proposed for answering online OLAP queries quickly and exactly, together with the corresponding computation algorithms. A pruning strategy at the granularity of Dwarf subtrees shrinks the Dwarf structure significantly and lowers maintenance cost, which gives StreamDwarf good adaptability. Experiments show that StreamDwarf performs very well on sparse datasets with large data skew (Zipf factor), a property that data in most real applications happen to have.

16.
The cube adaptive-hash join algorithm, which combines the merits of nested-loop and hybrid-hash, is presented. The performance of these algorithms is compared through analytical cost modeling. The nonuniform data value distribution of the inner relation is shown to have a greater impact than that of the outer relation. The cube adaptive-hash algorithm outperforms the cube hybrid-hash algorithm when bucket overflow occurs. In the worst case, this algorithm converges to the cube nested-loop-hash algorithm. When there is no hash table overflow, the cube adaptive-hash algorithm converges to the cube hybrid-hash algorithm. Since the cube adaptive-hash algorithm adapts itself depending on the characteristics of the relations, it is relatively immune to the data distribution.
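The adapt-on-overflow idea can be sketched per bucket: partitions of the inner relation that fit a memory budget are joined by hash lookup, and partitions that overflow fall back to a nested loop. The partition count, the `mem_limit` parameter, and all names are invented for the illustration:

```python
from collections import defaultdict

def adaptive_hash_join(r, s, key, mem_limit):
    """Hash join that degrades per bucket: inner partitions that fit in
    memory use hash lookup; overflowing partitions fall back to a nested
    loop, echoing the adaptive-hash behaviour on bucket overflow."""
    buckets = defaultdict(list)
    for t in r:                                  # partition inner relation
        buckets[hash(t[key]) % 4].append(t)
    out = []
    for b, inner in buckets.items():
        probe = [t for t in s if hash(t[key]) % 4 == b]
        if len(inner) <= mem_limit:              # fits: hash join
            table = defaultdict(list)
            for t in inner:
                table[t[key]].append(t)
            out += [(u, t) for u in probe for t in table.get(u[key], [])]
        else:                                    # overflow: nested loop
            out += [(u, t) for u in probe for t in inner if t[key] == u[key]]
    return out

r = [{'k': i % 3, 'v': i} for i in range(6)]
s = [{'k': 1, 'w': 'x'}, {'k': 2, 'w': 'y'}]
print(len(adaptive_hash_join(r, s, 'k', mem_limit=1)))   # 4 matching pairs
```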

17.
冯玉才, 刘玉葆, 冯剑琳. 《软件学报》 (Journal of Software), 2003, 14(10): 1706-1716
Constrained cube gradient mining is an important mining task whose main goal is to mine, from a data cube, gradient-probe tuple pairs that satisfy a gradient constraint. Existing research, however, is based on ordinary data cubes. This paper studies mining constrained cube gradients in condensed data cubes. By extending the LiveSet-driven algorithm, an eLiveSet algorithm is proposed. Tests show that it achieves higher cube gradient mining efficiency than existing algorithms.

18.
On-line analytical processing (OLAP) typically involves complex aggregate queries over large datasets. The data cube has been proposed as a structure that materializes the results of such queries in order to accelerate OLAP. A significant fraction of the related work has been on Relational-OLAP (ROLAP) techniques, which are based on relational technology. Existing ROLAP cubing solutions mainly focus on “flat” datasets, which do not include hierarchies in their dimensions. Nevertheless, as shown in this paper, the nature of hierarchies introduces several complications into the entire lifecycle of a data cube including the operations of construction, storage, indexing, query processing, and incremental maintenance. This fact renders existing techniques essentially inapplicable in a significant number of real-world applications and mandates revisiting the entire cube lifecycle under the new perspective. In order to overcome this problem, the CURE algorithm has been recently proposed as an efficient mechanism to construct complete cubes over large datasets with arbitrary hierarchies and store them in a highly compressed format, compatible with the relational model. In this paper, we study the remaining phases in the cube lifecycle and introduce query-processing and incremental-maintenance algorithms for CURE cubes. These are significantly different from earlier approaches, which have been proposed for flat cubes constructed by other techniques and are inadequate for CURE due to its high compression rate and the presence of hierarchies. Our methods address issues such as cube indexing, query optimization, and lazy update policies. Especially regarding updates, such lazy approaches are applied for the first time on cubes. We demonstrate the effectiveness of CURE in all phases of the cube lifecycle through experiments on both real-world and synthetic datasets. Among the experimental results, we distinguish those that have made CURE the first ROLAP technique to complete the construction and usage of the cube of the highest-density dataset in the APB-1 benchmark (12 GB). CURE was in fact quite efficient on this, showing great promise with respect to the potential of the technique overall.
