首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 11 毫秒
1.
Multidimensional aggregation is a dominant operation on data warehouses for on-line analytical processing(OLAP).Many efficinet algorithms to compute multidimensional aggregation on relational database based data warehouses have been developed.However,to our knowledge,there is nothing to date in the literature about aggregation algorithms on multidimensional data warehouses that store datasets in mulitidimensional arrays rather than in tables.This paper presents a set of multidimensional aggregation algorithms on very large and compressed multidimensional data warehouses.These algorithms operate directly on compressed datasets in multidimensional data warehouses without the need to first decompress them.They are applicable to a variety of data compression methods.The algorithms have different performance behavior as a function of dataset parameters,sizes of out puts and ain memory availability.The algorithms are described and analyzed with respect to the I/O and CPU costs,A decision procedure to select the most efficient algorithm ,given an aggregation request,is also proposed.The analytical and experimental results show that the algorithms are more efficient than the traditional aggregation algorithms.  相似文献   

2.
Efficient algorithms for large-scale temporal aggregation   总被引:2,自引:0,他引:2  
The ability to model time-varying natures is essential to many database applications such as data warehousing and mining. However, the temporal aspects provide many unique characteristics and challenges for query processing and optimization. Among the challenges is computing temporal aggregates, which is complicated by having to compute temporal grouping. We introduce a variety of temporal aggregation algorithms that overcome major drawbacks of previous work. First, for small-scale aggregations, both the worst-case and average-case processing time have been improved significantly. Second, for large-scale aggregations, the proposed algorithms can deal with a database that is substantially larger than the size of available memory. Third, the parallel algorithm designed on a shared-nothing architecture achieves scalable performance by delivering nearly linear scale-up and speed-up, even at the presence of data skew. The contributions made in this paper are particularly important because the rate of increase in database size and response time requirements has out-paced advancements in processor and mass storage technology.  相似文献   

3.
多维数据实视图选择问题是一个NP完全问题。提出一种基于约束的多目标优化遗传算法,将查询代价和维护代价分开考虑,更有效地解决复杂的实视图选择问题。实验结果表明,该算法具有更好的性能,特别是在获得的Pareto前沿的分布性上。  相似文献   

4.
5.
刘金岭 《计算机应用》2008,28(7):1689-1691
对空间多维数据的复杂查询是多维数据研究的重点和难点,目前研究的结论相对较少。在传统算法的基础上,进行了几个方面的改进:按分组属性值进行数据分块;对分组数据进行有效的排序;在聚集函数的应用上进行优化。模拟数据的试验表明:改进算法较大地提高了查询效率。  相似文献   

6.
7.
Designing data warehouses   总被引:9,自引:0,他引:9  
A Data Warehouse (DW) is a database that collects and stores data from multiple remote and heterogeneous information sources. When a query is posed, it is evaluated locally, without accessing the original information sources. In this paper we deal with the issue of designing a DW, in the context of the relational model, by selecting a set of views to materialize in the DW. First, we briefly present a theoretical framework for the DW design problem, which concerns the selection of a set of views that (a) fit in the space allocated to the DW, (b) answer all the queries of interest, and (c) minimize the total query evaluation and view maintenance cost. We then formalize the DW design problem as a state space search problem by taking into account multiquery optimization over the maintenance queries (i.e., queries that compute changes to the materialized views) and the use of auxiliary views for reducing the view maintenance cost. Finally, incremental algorithms and heuristics for pruning the search space are presented.  相似文献   

8.
无线传感器网络中的分布式数据处理处理时,会出现数据压缩率低和数据相关性差的特点,在数据采集和处理中采用分布式压缩感知算法.对基础矩阵和观测矩阵的合理建立,在满足无线传感器网络数据传输的可靠性同时,充分考虑数据之间的相关性,大大降低数据的冗余度,减少了数据量的传输.实验结果表明,在满足信号数据可靠性的同时,有效的增加了信号的压缩比,减少了节点数据采集量.  相似文献   

9.
Energy is a scarce resource in Wireless Sensor Networks (WSN). Some studies show that more than 70% of energy is consumed in data transmission in WSN. Since most of the time, the sensed information is redundant due to geographically collocated sensors, most of this energy can be saved through data aggregation. Furthermore, data aggregation improves bandwidth usage and reduces collisions due to interference. Unfortunately, while aggregation eliminates redundancy, it makes data integrity verification more complicated since the received data is unique.  相似文献   

10.
Many data sharing applications require that publishing data should protect sensitive information pertaining to individuals, such as diseases of patients, the credit rating of a customer, and the salary of an employee. Meanwhile, certain information is required to be published. In this paper, we consider data-publishing applications where the publisher specifies both sensitive information and shared information. An adversary can infer the real value of a sensitive entry with a high confidence by using publishing data. The goal is to protect sensitive information in the presence of data inference using derived association rules on publishing data. We formulate the inference attack framework, and develop complexity results. We show that computing a safe partial table is an NP-hard problem. We classify the general problem into subcases based on the requirements of publishing information, and propose algorithms for finding a safe partial table to publish. We have conducted an empirical study to evaluate these algorithms on real data. The test results show that the proposed algorithms can produce approximate maximal published data and improve the performance of existing algorithms. Supported by the Program for New Century Excellent Talents in Universities (Grant No. NCET-06-0290), the National Natural Science Foundation of China (Grant Nos. 60828004, 60503036), and the Fok Ying Tong Education Foundation Award (Grant No. 104027)  相似文献   

11.
Data distribution has been one of the most important research topics in parallelizing compilers for distributed memory parallel computers. Good data distribution schema should consider both the computation load balance and the communication overhead. In this paper, we show that data redistribution is necessary for executing a sequence of Do-loops if the communication cost due to performing this sequence of Do-loops is larger than a threshold value. Based on this observation, we can prune the searching space and derive efficient dynamic programming algorithms for determining effective data distribution schema to execute a sequence of Do-loops with a general structure. Experimental studies on a 32-node nCUBE-2 computer are also presented  相似文献   

12.
Views over databases have regained attention in the context of data warehouses, which are seen as materialized views. In this setting, efficient view maintenance is an important issue, for which the notion of self-maintainability has been identified as desirable. In this paper, we extend the concept of self-maintainability to (query and update) independence within a formal framework, where independence with respect to arbitrary given sets of queries and updates over the sources can be guaranteed. To this end we establish an intuitively appealing connection between warehouse independence and view complements. Moreover, we study special kinds of complements, namely monotonic complements, and show how to compute minimal ones in the presence of keys and foreign keys in the underlying databases. Taking advantage of these complements, an algorithmic approach is proposed for the specification of independent warehouses with respect to given sets of queries and updates. Received: 21 November 2000 / Accepted: 1 May 2001 Published online: 6 September 2001  相似文献   

13.
With the rapid development of applications for wireless sensor networks, efficient data aggregation methods are becoming increasingly emphasized. Many researchers have studied the problem of reporting data with minimum energy cost when data is allowed to be aggregated many times. However, some aggregation functions used to aggregate multiple data into one packet are unrepeatable; that is, every data is aggregated only at most once. This problem motivated us to study reporting data with minimum energy cost subject to that a fixed number of data are allowed to be aggregated into one packet and every data is aggregated at most once. In this paper, we propose novel data aggregation and routing structures for reporting generated data. With the structures, we study the problem of scheduling data to nodes in the networks for data aggregation such that the energy cost of reporting data is minimized, termed MINIMUM ENERGY-COST DATA-AGGREGATION SCHEDULING. In addition, we show that MINIMUM ENERGY-COST DATA-AGGREGATION SCHEDULING is NP-complete. Furthermore, a distributed data scheduling algorithm is proposed accordingly. Simulations show that the proposed algorithm provides a good solution for MINIMUM ENERGY-COST DATA-AGGREGATION SCHEDULING.  相似文献   

14.
Clustering aggregation, known as clustering ensembles, has emerged as a powerful technique for combining different clustering results to obtain a single better clustering. Existing clustering aggregation algorithms are applied directly to data points, in what is referred to as the point-based approach. The algorithms are inefficient if the number of data points is large. We define an efficient approach for clustering aggregation based on data fragments. In this fragment-based approach, a data fragment is any subset of the data that is not split by any of the clustering results. To establish the theoretical bases of the proposed approach, we prove that clustering aggregation can be performed directly on data fragments under two widely used goodness measures for clustering aggregation taken from the literature. Three new clustering aggregation algorithms are described. The experimental results obtained using several public data sets show that the new algorithms have lower computational complexity than three well-known existing point-based clustering aggregation algorithms (Agglomerative, Furthest, and LocalSearch); nevertheless, the new algorithms do not sacrifice the accuracy.  相似文献   

15.
16.
Due to an explosive increase of XML documents, it is imperative to manage XML data in an XML data warehouse. XML warehousing imposes challenges, which are not found in the relational data warehouses. In this paper, we firstly present a framework to build an XML data warehouse schema. For the purpose of scalability due to the increase of data volume, we propose a number of partitioning techniques for multi-version XML data warehouses, including document based partitioning, schema based partitioning, and cascaded (mixed) partitioning model. Finally, we formulate cost models to evaluate various types of queries for an XML data warehouse.  相似文献   

17.
This paper investigates the design of fault-tolerant TDMA-based data aggregation scheduling (DAS) protocols for wireless sensor networks (WSNs). DAS is a fundamental pattern of communication in wireless sensor networks where sensor nodes aggregate and relay data to a sink node. However, any such DAS protocol needs to be cognisant of the fact that crash failures can occur. We make the following contributions: (i) we identify a necessary condition to solve the DAS problem, (ii) we introduce a strong and weak version of the DAS problem, (iii) we show several impossibility results due to the crash failures, (iv) we develop a modular local algorithm that solves stabilising weak DAS and (v) we show, through simulations and an actual deployment on a small testbed, how specific instantiations of parameters can lead to the algorithm achieving very efficient stabilisation.  相似文献   

18.
In domains like bioinformatics, information retrieval and social network analysis, one can find learning tasks where the goal consists of inferring a ranking of objects, conditioned on a particular target object. We present a general kernel framework for learning conditional rankings from various types of relational data, where rankings can be conditioned on unseen data objects. We propose efficient algorithms for conditional ranking by optimizing squared regression and ranking loss functions. We show theoretically, that learning with the ranking loss is likely to generalize better than with the regression loss. Further, we prove that symmetry or reciprocity properties of relations can be efficiently enforced in the learned models. Experiments on synthetic and real-world data illustrate that the proposed methods deliver state-of-the-art performance in terms of predictive power and computational efficiency. Moreover, we also show empirically that incorporating symmetry or reciprocity properties can improve the generalization performance.  相似文献   

19.
Many data warehouses contain massive amounts of data, accumulated over long periods of time. In some cases, it is necessary or desirable to either delete “old” data or to maintain the data at an aggregate level. This may be due to privacy concerns, in which case the data are aggregated to levels that ensure anonymity. Another reason is the desire to maintain a balance between the uses of data that change as the data age and the size of the data, thus avoiding overly large data warehouses. This paper presents effective techniques for data reduction that enable the gradual aggregation of detailed data as the data ages. With these techniques, data may be aggregated to higher levels as they age, enabling the maintenance of more compact, consolidated data and the compliance with privacy requirements. Special care is taken to avoid semantic problems in the aggregation process. The paper also describes the querying of the resulting data warehouses and an implementation strategy based on current database technology.  相似文献   

20.
In energy-constrained wireless sensor networks, energy efficiency is critical for prolonging the network lifetime. A family of ant colony algorithms called DAACA for data aggregation are proposed in this paper. DAACA consists of three phases: initialization, packets transmissions and operations on pheromones. In the transmission phase, each node estimates the remaining energy and the amount of pheromones of neighbor nodes to compute the probabilities for dynamically selecting the next hop. After certain rounds of transmissions, the pheromones adjustments are performed, which take the advantages of both global and local merits for evaporating or depositing pheromones. Four different pheromones adjustment strategies which constitute DAACA family are designed to prolong the network lifetime. Experimental results indicate that, compared with other data aggregation algorithms, DAACA shows higher superiority on average degree of nodes, energy efficiency, prolonging the network lifetime, computation complexity and success ratio of one hop transmission. At last, the features of DAACA are analyzed.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号