18 similar documents found; search took 62 ms.
1.
Parallel Dwarf Cube Construction in a MapReduce Environment (total citations: 1, self-citations: 0, others: 1)
For data-intensive applications, this paper proposes a parallel Dwarf data-cube construction algorithm based on the MapReduce framework. The algorithm partitions the traditional Dwarf cube into multiple equivalent, independent sub-Dwarf cubes and uses the MapReduce architecture to build, query, and update the Dwarf cube in parallel. Experiments show that the parallel Dwarf algorithm combines the parallelism and high scalability of the MapReduce framework with...
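The abstract above is truncated, but the general map/reduce pattern for cube aggregation that such algorithms build on can be sketched in a few lines. This is an illustration only, not the paper's algorithm: the field names (`store`, `product`, `sales`) are invented, and the shuffle is simulated in-process rather than distributed.

```python
from itertools import combinations
from collections import defaultdict

# Toy records: dimension values plus one measure. Field names are illustrative.
DIMS = ("store", "product")
records = [
    {"store": "s1", "product": "p1", "sales": 10},
    {"store": "s1", "product": "p2", "sales": 5},
    {"store": "s2", "product": "p1", "sales": 7},
]

def map_phase(rec):
    """Emit one (group-by key, measure) pair per cuboid the record falls into."""
    for r in range(len(DIMS) + 1):
        for subset in combinations(DIMS, r):
            key = tuple((d, rec[d]) if d in subset else (d, "*") for d in DIMS)
            yield key, rec["sales"]

def reduce_phase(pairs):
    """Sum the measure per group-by key (the shuffle is implicit here)."""
    agg = defaultdict(int)
    for key, v in pairs:
        agg[key] += v
    return dict(agg)

cube = reduce_phase(kv for rec in records for kv in map_phase(rec))
# The fully generalized cell (*, *) holds the grand total.
print(cube[(("store", "*"), ("product", "*"))])  # 22
```

Each record is mapped to all 2^d cuboids it contributes to; the reducer then aggregates per cell, which is what makes the computation embarrassingly parallel across keys.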
2.
A Fast Algorithm for Generating the Minimal Condensed Data Cube (total citations: 2, self-citations: 0, others: 2)
Semantic OLAP is a current research focus, and the condensed data cube is one such technique. This paper presents SQCube, an algorithm for quickly generating a minimal condensed data cube. The algorithm works in two phases: it first uses the BottomUpBST algorithm to generate a non-minimal condensed data cube, then post-processes the result, compressing all pure BSTs and hidden BSTs into a single BST, thereby producing a minimal condensed data cube. Experiments show that SQCube clearly outperforms MinCube, a previously proposed algorithm of the same kind.
3.
4.
Data-cube computation is a very expensive operation and has been widely studied. Because of space constraints, storing a fully materialized data cube is infeasible. Dwarf, a recently proposed semantically compressed data cube, eliminates prefix and suffix redundancy to store a fully materialized cube in a very small space. However, when the source data change, its update process is complex. By studying how Dwarf's aggregate nodes change during updates, this paper proposes a new Dwarf-based incremental update algorithm that keeps the data cube fully materialized without recomputation, greatly improving update efficiency. Experiments further confirm the algorithm's efficiency and effectiveness, especially for high-dimensional data sets in data warehouses.
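The two redundancies Dwarf eliminates can be illustrated with a toy sketch. This shows the idea only, not the paper's construction or update algorithm: prefix redundancy disappears when tuples share a trie, and suffix redundancy disappears when structurally identical subtrees are stored exactly once.

```python
def build_trie(rows):
    """rows: (dim1, dim2, measure). Shared prefixes are stored once (a trie)."""
    root = {}
    for d1, d2, m in rows:
        node = root.setdefault(d1, {})
        node[d2] = node.get(d2, 0) + m
    return root

def coalesce_suffixes(root):
    """Suffix redundancy: structurally identical subtrees collapse to one
    shared object, found via a canonical key in a hash-cons pool."""
    pool = {}
    def canon(node):
        if not isinstance(node, dict):
            return node
        key = tuple(sorted((k, repr(canon(v))) for k, v in node.items()))
        return pool.setdefault(key, node)
    return {k: canon(v) for k, v in root.items()}

rows = [("s1", "p1", 3), ("s2", "p1", 3)]  # two stores with identical suffixes
trie = coalesce_suffixes(build_trie(rows))
print(trie["s1"] is trie["s2"])  # True: the identical subtree is stored once
```

In the real Dwarf structure this coalescing is interleaved with cube computation and also covers the special ALL value, which is what makes incremental updates, the subject of the paper above, nontrivial.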
5.
6.
7.
8.
The closed data cube is an effective lossless compression technique: it removes redundant information from the data cube, substantially reducing storage space and speeding up computation while barely affecting query performance. Hadoop's MapReduce parallel computing model provides the technical support for computing data cubes, and its distributed file system, HDFS, provides the storage. To save storage space and speed up queries, this paper builds on the traditional data cube to propose the closed histogram cube, which further reduces storage through encoding on top of the closed data cube and accelerates queries through indexing. Both the scalability and the load balancing of the Hadoop platform support the closed histogram cube. Experiments show that the closed histogram cube compresses the data cube effectively and offers high query performance, and that, in line with Hadoop's characteristics, adding nodes clearly speeds up computation.
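The notion of a closed cell behind the closed data cube can be made concrete with a brute-force sketch (illustrative only; the paper's Hadoop-based computation and histogram encoding are not reproduced). A cell is closed iff no more specific cell covers exactly the same base tuples, i.e. iff the cell equals the closure of its own cover.

```python
from itertools import product

rows = [("a1", "b1"), ("a1", "b2")]  # toy two-dimensional base table
vals = [sorted({r[i] for r in rows}) + ["*"] for i in range(2)]

def cover(cell):
    """Base tuples matched by a cell ('*' is a wildcard)."""
    return [r for r in rows if all(c in ("*", v) for c, v in zip(cell, r))]

def closure(tuples):
    """Most specific cell covering exactly these tuples: keep a dimension's
    value where all tuples agree, otherwise generalize to '*'."""
    cols = [set(col) for col in zip(*tuples)]
    return tuple(col.pop() if len(col) == 1 else "*" for col in cols)

# A cell is closed iff it equals the closure of its own cover.
closed = [c for c in product(*vals) if cover(c) and closure(cover(c)) == c]
print(sorted(closed))  # [('a1', '*'), ('a1', 'b1'), ('a1', 'b2')]
```

Here ("*", "*") is not closed because it covers the same two tuples as the more specific ("a1", "*"); dropping such cells loses nothing, which is why the compression is lossless.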
9.
Wang Guoqing. 《计算技术与自动化》 (Computing Technology and Automation), 2008, 27(2): 134-137
Preprocessing the data before mining, through selection and preparation, directly affects both the process and the results of data mining. Among these steps, data reduction minimizes the data volume to improve the speed and efficiency of mining. This paper surveys several typical data-reduction methods, describes how they are applied, analyzes their characteristics, and studies them by applying them to training-set data.
10.
To improve the efficiency of data mining, the original data set is usually simplified while preserving equivalent analysis results. This paper proposes Sodra, an effective data-reduction algorithm that combines unsupervised and supervised learning to generate a reduced data set suitable for classification. Classification experiments on real and synthetic data sets show that the algorithm greatly reduces space requirements without harming classification performance. Moreover, the feature-analysis algorithm Relif-P, applied on the reduced set, further improves the algorithm's robustness to irrelevant features.
11.
This paper proposes StreamDwarf, a Dwarf-based framework for multidimensional data-stream cubes that answers online OLAP queries quickly and exactly, together with the corresponding algorithms. A pruning strategy at the granularity of Dwarf subtrees significantly reduces the size of the Dwarf structure and lowers its maintenance cost, making StreamDwarf highly adaptable. Experiments show that StreamDwarf performs well on sparse data sets with high skew (large Zipf values), a property that most real-world data happens to have.
12.
Research on Data-Cube Gradient Mining (total citations: 2, self-citations: 0, others: 2)
1. Introduction. As our ability to generate, collect, and store digital data has grown enormously, the world faces an explosive growth of raw data of every kind. Great advances in database technology have enabled efficient storage of massive amounts of data, and thousands of large databases are widely used in business, government, research, and other sectors. This accumulation of data resources provides the foundation for discovering useful information in historical data, and people expect databases to offer intelligent, or at least semi-automated, data analysis capabilities. Data warehousing, online analytical processing (OLAP), and data mining arose to meet this need.
13.
Katerina Doka, Dimitrios Tsoumakos, Nectarios Koziris. Journal of Parallel and Distributed Computing, 2011, 71(11): 1434-1446
In this paper we present the Brown Dwarf, a distributed data analytics system designed to efficiently store, query and update multidimensional data over commodity network nodes, without the use of any proprietary tool. Brown Dwarf distributes a centralized indexing structure among peers on-the-fly, reducing cube creation and querying times by enforcing parallelization. Analytical queries are naturally performed on-line through cooperating nodes that form an unstructured Peer-to-Peer overlay. Updates are also performed on-line, eliminating the usually costly over-night process. Moreover, the system employs an adaptive replication scheme that adjusts to workload skew as well as network churn by expanding or shrinking the units of the distributed data structure. Our system has been thoroughly evaluated on an actual testbed: it manages to accelerate cube creation and querying by up to several tens of times compared to the centralized solution by exploiting the capabilities of the available network nodes working in parallel. It also manages to quickly adapt even after sudden bursts in load and remains unaffected by a considerable fraction of frequent node failures. These advantages are even more apparent for dense and skewed data cubes and workloads.
14.
15.
Ying Chen, Frank Dehne, Todd Eavis, Andrew Rau-Chaplin. Distributed and Parallel Databases, 2008, 23(2): 99-126
We present “Pipe ’n Prune” (PnP), a new hybrid method for iceberg-cube query computation. The novelty of our method is that it achieves a tight integration of top-down piping for data aggregation with bottom-up a priori data pruning. A particular strength of PnP is that it is efficient for all of the following scenarios: (1) sequential iceberg-cube queries, (2) external-memory iceberg-cube queries, and (3) parallel iceberg-cube queries on shared-nothing PC clusters with multiple disks.
We performed an extensive performance analysis of PnP for the above scenarios, with the following main results. In the first scenario PnP performs very well for both dense and sparse data sets, providing an interesting alternative to BUC and Star-Cubing. In the second scenario PnP shows a surprisingly efficient handling of disk I/O, with an external-memory running time that is less than twice the running time for full in-memory computation of the same iceberg-cube query. In the third scenario PnP scales very well, providing near-linear speedup for a larger number of processors and thereby solving the scalability problem observed for the parallel iceberg-cubes proposed by Ng et al.
Research partially supported by the Natural Sciences and Engineering Research Council of Canada. A preliminary version of this work appeared in the International Conference on Data Engineering (ICDE’05).
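The bottom-up a priori pruning that PnP integrates can be sketched as a plain BUC-style recursion (a simplification for illustration, not PnP itself): a partition smaller than the iceberg threshold is discarded outright, since refining a cell can only shrink its cover, so counts are anti-monotone.

```python
def iceberg(rows, ndims, minsup):
    """Return every cell (tuple with '*' wildcards) whose count >= minsup,
    pruning bottom-up: a partition below minsup cannot produce any
    qualifying refinement, so its whole subtree is skipped."""
    results = {}

    def recurse(part, d, cell):
        if len(part) < minsup:          # a priori pruning step
            return
        results[cell + ("*",) * (ndims - d)] = len(part)
        for j in range(d, ndims):       # refine on each remaining dimension
            groups = {}
            for r in part:
                groups.setdefault(r[j], []).append(r)
            for v, sub in groups.items():
                recurse(sub, j + 1, cell + ("*",) * (j - d) + (v,))

    recurse(rows, 0, ())
    return results

rows = [("a1", "b1"), ("a1", "b2"), ("a2", "b1")]
print(iceberg(rows, 2, 2))  # only the cells with count >= 2 survive
```

Because dimensions are refined in increasing index order, each cell is generated exactly once; PnP's contribution is to interleave this pruning with top-down piped aggregation rather than run it alone.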
16.
17.
The existing data-cube gradient query language, CubegradeQL, mainly targets non-materialized data cubes; in practice, data warehouses store many materialized cubes to improve OLAP query efficiency. This paper improves on CubegradeQL and presents a new query language, dmGQL, which supports gradient queries over both materialized and non-materialized data cubes. Finally, we discuss query processing for dmGQL.
18.
Non-derivable itemset mining (total citations: 3, self-citations: 2, others: 3)
All frequent itemset mining algorithms rely heavily on the monotonicity principle for pruning. This principle allows for excluding candidate itemsets from the expensive counting phase. In this paper, we present sound and complete deduction rules to derive bounds on the support of an itemset. Based on these deduction rules, we construct a condensed representation of all frequent itemsets, by removing those itemsets for which the support can be derived, resulting in the so-called Non-Derivable Itemsets (NDI) representation. We also present connections between our proposal and other recent proposals for condensed representations of frequent itemsets. Experiments on real-life datasets show the effectiveness of the NDI representation, making the search for frequent non-derivable itemsets a useful and tractable alternative to mining all frequent itemsets.
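The deduction rules can be sketched directly from inclusion-exclusion (a brute-force illustration, not the paper's optimized algorithm): each proper subset J of an itemset I yields a bound on supp(I), an upper bound when |I \ J| is odd and a lower bound when it is even; I is derivable exactly when the bounds coincide.

```python
from itertools import combinations

def subset_supports(transactions, items):
    """Brute-force supports of every proper subset of `items`."""
    supp = {}
    for r in range(len(items)):
        for X in combinations(items, r):
            supp[frozenset(X)] = sum(1 for t in transactions if set(X) <= t)
    return supp

def support_bounds(items, supp):
    """For each proper subset J of I, the deduction rule gives the bound
    sum over J <= X < I of (-1)^(|I\\X|+1) * supp(X): an upper bound when
    |I \\ J| is odd, a lower bound when it is even."""
    I = frozenset(items)
    lo, hi = 0, float("inf")
    for r in range(len(I)):
        for J in map(frozenset, combinations(sorted(I), r)):
            diff = sorted(I - J)
            bound = 0
            for k in range(len(diff)):          # all X with J <= X < I
                for extra in combinations(diff, k):
                    X = J | frozenset(extra)
                    bound += (-1) ** (len(I - X) + 1) * supp[X]
            if len(diff) % 2:
                hi = min(hi, bound)
            else:
                lo = max(lo, bound)
    return lo, hi

tx = [{"a", "b"}, {"a"}, {"b"}, {"a", "b"}]
supp = subset_supports(tx, ("a", "b"))
print(support_bounds(("a", "b"), supp))  # (2, 3): lo != hi, so {a,b} is not derivable
```

When lo == hi the counting phase for that itemset can be skipped entirely, which is exactly the pruning the NDI representation exploits.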