Similar Articles
1.
Yao Liu, Hui Xiong. Information Sciences, 2006, 176(9): 1215-1240
A data warehouse stores current and historical records consolidated from multiple transactional systems. Securing data warehouses is of ever-increasing interest, especially in settings where data are sold in pieces to third parties for data mining practices. In this case, existing data warehouse security techniques, such as data access control, may not be easy to enforce and can be ineffective. Instead, this paper proposes a data perturbation-based approach, called the cubic-wise balance method, to provide privacy-preserving range queries on data cubes in a data warehouse. This approach is motivated by the following observation: analysts are usually interested in summary data rather than individual data values. Indeed, our approach can provide closely estimated summary data for range queries without providing access to actual individual data values. As demonstrated by our experimental results on the APB benchmark data set from the OLAP Council, the cubic-wise balance method can achieve both better privacy preservation and better range query accuracy than random data perturbation alternatives.
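To make the idea of perturbation that still answers range queries concrete, here is a minimal sketch (not the published cubic-wise balance method): noise is added to individual cube cells but forced to sum to zero within each block, so sums over block-aligned ranges stay exact while individual cells are masked. The function name, block size and noise scale are illustrative assumptions.

```python
import numpy as np

def perturb_cube_zero_sum(cube, block=4, scale=10.0, seed=0):
    """Perturb a 1-D measure array with noise that sums to zero within every
    block, so sums over block-aligned ranges are preserved exactly.
    (Simplified illustration only, not the paper's cubic-wise balance method.)"""
    rng = np.random.default_rng(seed)
    perturbed = cube.astype(float).copy()
    for start in range(0, len(cube), block):
        idx = slice(start, min(start + block, len(cube)))
        noise = rng.normal(0.0, scale, perturbed[idx].shape)
        noise -= noise.mean()          # force the block's noise to sum to zero
        perturbed[idx] += noise
    return perturbed

cells = np.arange(1, 17, dtype=float)          # toy 16-cell cube slice
priv = perturb_cube_zero_sum(cells)
print(cells[4:8].sum(), priv[4:8].sum())       # block-aligned range sum is preserved
```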

2.
Providing both privacy protection and integrity protection for aggregated data transmitted in wireless sensor networks is a major challenge in current Internet of Things applications. The PRDA (Polynomial Regression Based Secure Data Aggregation) protocol proposed by Ozdemir et al. is based on clustering and exploits polynomial properties to protect the privacy of aggregated data, but it cannot verify data integrity. To address the problem that aggregated data in PRDA may be tampered with or forged, this paper proposes iPRDA, a secure data aggregation protocol that can detect integrity violations. The protocol protects data privacy with polynomial functions and data perturbation, and performs integrity checking at the base station by exploiting correlations among the data. Experiments show that the scheme detects integrity violations effectively without compromising data confidentiality.

3.
Physical data layout is a crucial factor in the performance of queries and updates in large data warehouses. Data layout enhances and complements other performance features such as materialized views and dynamic caching of aggregated results. Prior work has identified that the multidimensional nature of large data warehouses imposes natural restrictions on the query workload. A method based on a “uniform” query class approach has been proposed for data clustering and shown to be optimal. However, we believe that realistic query workloads will exhibit data access skew. For instance, if time is a dimension in the data model, then more queries are likely to focus on the most recent time interval. The query class approach does not adequately model the possibility of multidimensional data access skew. We propose the affinity graph model for capturing workload characteristics in the presence of access skew and describe an efficient algorithm for physical data layout. Our proposed algorithm considers declustering and load balancing issues which are inherent to the multidisk data layout problem. We demonstrate the validity of this approach experimentally.

4.
Multidimensional data modeling for location-based services
With the recent and continuing advances in areas such as wireless communications and positioning technologies, mobile, location-based services are becoming possible. Such services deliver location-dependent content to their users. More specifically, these services may capture the movements and requests of their users in multidimensional databases, i.e., data warehouses, and content delivery may be based on the results of complex queries on these data warehouses. Such queries aggregate detailed data in order to find useful patterns, e.g., in the interaction of a particular user with the services. The application of multidimensional technology in this context poses a range of new challenges. The specific challenge addressed here concerns the provision of an appropriate multidimensional data model. In particular, the paper extends an existing multidimensional data model and algebraic query language to accommodate spatial values that exhibit partial containment relationships instead of the total containment relationships normally assumed in multidimensional data models. Partial containment introduces imprecision in aggregation paths. The paper proposes a method for evaluating the imprecision of such paths. The paper also offers transformations of dimension hierarchies with partial containment relationships to simple hierarchies, to which existing precomputation techniques are applicable.

5.
We consider how an untrusted data aggregator can be assessed over multiple data streams. The aggregator could be the sink node in a sensor network where all the sensory data are gathered, or a smart-meter responsible for computing power measurements of a group of households, or any other entity that is basically in charge of answering aggregation queries such as average or summation in a data streaming environment. In these applications, important decisions are made based on the aggregated results and therefore, it is vitally important to investigate the authenticity and integrity of aggregated values. One possible approach for solving this problem is marking the data before sending it out to the aggregators (i.e. marking it at the point of origin) such that the existence of those marks can be verified after the aggregation process. Our goal is to produce hidden marks that remain detectable after aggregation, so that not only the trustworthiness of every individual data source but also that of the aggregators can be verified. This problem is referred to as secure data aggregation and has been investigated by means of digital watermarking and steganography techniques in recent years. Data synchronization, however, is a serious problem that existing schemes have not addressed. Therefore, in this paper, a new watermarking construction is proposed that provides ‘synchronization marks’ in the aggregated data stream and helps protect the data itself at the end-points. Our method works at the data layer, so standard transport layer security methods can be used to protect the transport of data if required. Finally, a set of experiments is conducted using synthesized and real sensory data as a proof of concept.

6.
With the rapid development of applications for wireless sensor networks, efficient data aggregation methods are receiving increasing emphasis. Many researchers have studied the problem of reporting data with minimum energy cost when data is allowed to be aggregated many times. However, some aggregation functions used to combine multiple data items into one packet are unrepeatable; that is, every data item is aggregated at most once. This motivated us to study reporting data with minimum energy cost subject to the constraints that only a fixed number of data items may be aggregated into one packet and every data item is aggregated at most once. In this paper, we propose novel data aggregation and routing structures for reporting generated data. With these structures, we study the problem of scheduling data to nodes in the network for data aggregation such that the energy cost of reporting data is minimized, termed MINIMUM ENERGY-COST DATA-AGGREGATION SCHEDULING. In addition, we show that MINIMUM ENERGY-COST DATA-AGGREGATION SCHEDULING is NP-complete. Furthermore, a distributed data scheduling algorithm is proposed accordingly. Simulations show that the proposed algorithm provides a good solution for MINIMUM ENERGY-COST DATA-AGGREGATION SCHEDULING.

7.
The increasing diffusion of Automatic Meter Reading (AMR) and the possibility of opening the system to third-party services have raised many concerns about the protection of personal data related to energy, water or gas consumption, from which details about the habits of the users can be inferred. This paper proposes an infrastructure and a communication protocol for allowing utilities and third parties (data Consumers) to collect measurement data with different levels of spatial and temporal aggregation from smart meters without revealing the individual measurements to any single node of the architecture. The proposed infrastructure introduces a set of functional nodes in the smart grid, namely the Privacy Preserving Nodes (PPNs), which collect customer data encrypted by means of Shamir's Secret Sharing Scheme, and are supposed to be controlled by independent parties. By exploiting the homomorphic properties of the sharing scheme, the measurements can be aggregated directly in the encrypted domain. Therefore, an honest-but-curious attacker can obtain neither disaggregated nor aggregated data. The PPNs perform different spatial and temporal aggregation for each Consumer according to its needs and access rights. The information Consumers recover the aggregated data by collecting multiple shares from the PPNs. The paper also discusses the problem of deploying the information flows from the customers to the PPNs and, then, to the information Consumers in a resource-constrained environment. We prove that minimizing the number of PPNs is an NP-hard problem and propose a fast greedy algorithm. The scalability of the infrastructure is first analyzed under the assumption that the communication network is reliable and timely, then in the presence of communication errors and node failures. The paper also evaluates the anonymity of external attackers.
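As a rough illustration of aggregation in the encrypted domain with Shamir's Secret Sharing, the sketch below splits two meter readings into shares, lets intermediate nodes add the shares they hold, and reconstructs only the sum. It is a toy under assumed parameters (prime field, threshold, node count), not the paper's protocol.

```python
import random

PRIME = 2**61 - 1  # a large prime field for the shares

def share(secret, threshold, n_nodes):
    """Split a meter reading into n shares; any `threshold` of them recover it."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(threshold - 1)]
    return [(x, sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME)
            for x in range(1, n_nodes + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the (possibly aggregated) secret."""
    total = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        total = (total + yi * num * pow(den, PRIME - 2, PRIME)) % PRIME
    return total

# Two meters send shares of their readings to 3 PPN-like nodes (threshold 2).
m1, m2 = share(120, 2, 3), share(75, 2, 3)
# Each node adds the shares it holds -- aggregation happens in the encrypted domain.
agg = [(x, (y1 + y2) % PRIME) for (x, y1), (_, y2) in zip(m1, m2)]
print(reconstruct(agg[:2]))   # -> 195, the sum, without exposing 120 or 75
```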

8.
This article is a position paper on the current security issues in Vehicular Ad hoc Networks (VANETs). VANETs face many interesting research challenges in multiple areas, from privacy and anonymity to the detection and eviction of misbehaving nodes and many others in between. Multiple solutions have been proposed to address those issues. This paper surveys the most relevant ones while discussing their benefits and drawbacks. The paper explores the newest trends in privacy, anonymity, misbehaving nodes, the dissemination of false information and secure data aggregation, giving a perspective on how we foresee the future of this research area. First, the paper discusses the use of Public Key Infrastructure (PKI) (and certificate revocation), location privacy, anonymity and group signatures for VANETs. Then, it compares several proposals to identify and evict misbehaving and faulty nodes. Finally, the paper explores the differences between syntactic and semantic aggregation techniques, both cluster- and non-cluster-based and with fixed and dynamic areas, while presenting secure as well as probabilistic aggregation schemes.

9.
Random-data perturbation techniques and privacy-preserving data mining
Privacy is becoming an increasingly important issue in many data-mining applications. This has triggered the development of many privacy-preserving data-mining techniques. A large fraction of them use randomized data-distortion techniques to mask the data for preserving the privacy of sensitive data. This methodology attempts to hide the sensitive data by randomly modifying the data values, often using additive noise. This paper questions the utility of the random-value distortion technique in privacy preservation. The paper first notes that random matrices have predictable structures in the spectral domain and then it develops a random matrix-based spectral-filtering technique to retrieve original data from a dataset distorted by adding random values. The proposed method works by comparing the spectrum generated from the observed data with that of random matrices. This paper presents the theoretical foundation and extensive experimental results to demonstrate that, in many cases, random-data distortion preserves very little data privacy. The analytical framework presented in this paper also points out several possible avenues for the development of new privacy-preserving data-mining techniques, such as algorithms that explicitly guard against privacy breaches through linear transformations and that exploit multiplicative and colored noise to preserve privacy in data-mining applications.
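A simplified sketch of the spectral-filtering idea follows: data distorted with additive random noise is projected onto the covariance eigen-directions whose eigenvalues exceed the level random-matrix theory predicts for pure noise, which recovers a close approximation of the original records. It assumes the noise variance is known and is only an illustration of the principle, not the paper's exact construction.

```python
import numpy as np

def spectral_filter(perturbed, noise_var):
    """Estimate original records from additively perturbed data by keeping
    only covariance eigen-directions whose eigenvalues exceed the level
    expected for pure noise (Marchenko-Pastur upper edge). Simplified sketch."""
    n, m = perturbed.shape                        # n records, m attributes
    centered = perturbed - perturbed.mean(axis=0)
    cov = centered.T @ centered / n
    eigvals, eigvecs = np.linalg.eigh(cov)
    noise_ceiling = noise_var * (1 + np.sqrt(m / n)) ** 2
    signal = eigvecs[:, eigvals > noise_ceiling]  # directions dominated by real data
    return centered @ signal @ signal.T + perturbed.mean(axis=0)

rng = np.random.default_rng(1)
base = rng.normal(size=(2000, 1)) @ rng.normal(size=(1, 8))   # strongly correlated data
noisy = base + rng.normal(scale=1.0, size=base.shape)          # random additive distortion
recovered = spectral_filter(noisy, noise_var=1.0)
print(np.abs(recovered - base).mean(), np.abs(noisy - base).mean())  # filtering reduces the error
```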

10.
Converting XML DTDs to UML diagrams for conceptual data integration
Extensible Markup Language (XML) is fast becoming the new standard for data representation and exchange on the World Wide Web, e.g., in B2B e-commerce. Modern enterprises need to combine data from many sources in order to answer important business questions, creating a need for integration of web-based XML data. Previous web-based data integration efforts have focused almost exclusively on the logical level of data models, creating a need for techniques that focus on the conceptual level in order to communicate the structure and properties of the available data to users at a higher level of abstraction. The most widely used conceptual model at the moment is the Unified Modeling Language (UML).

This paper presents algorithms for automatically constructing UML diagrams from XML DTDs, enabling fast and easy graphical browsing of XML data sources on the web. The algorithms capture important semantic properties of the XML data such as precise cardinalities and aggregation (containment) relationships between the data elements. As a motivating application, it is shown how the generated diagrams can be used for the conceptual design of data warehouses based on web data, and an integration architecture is presented. The choice of data warehouses and On-Line Analytical Processing as the motivating application is another distinguishing feature of the presented approach.


11.
Incremental maintenance of data warehouses has attracted a lot of research attention for the past few years. Nevertheless, most of the previous work is confined to the relational setting. Recently, object-oriented data warehouses have been regarded as a better means to integrate data from modern heterogeneous data sources. However, existing approaches to incremental maintenance of data warehouses do not directly apply to object-oriented data warehouses. In this paper, therefore, we propose an approach to incremental maintenance of object-oriented data warehouses. We focus on two primary issues specifically. First, we identify six categories of potential updates to an object-oriented view and propose an algorithm to find potential updates from the definition of the view. Second, we propose an incremental view maintenance algorithm for maintaining object-oriented data warehouses. We have implemented a prototype system for incremental maintenance of object-oriented data warehouses. Performance evaluation has been conducted, which indicates that our approach is correct and efficient.

12.
Multidimensional aggregation is a dominant operation on data warehouses for on-line analytical processing (OLAP). Many efficient algorithms to compute multidimensional aggregation on relational-database-based data warehouses have been developed. However, to our knowledge, there is nothing to date in the literature about aggregation algorithms on multidimensional data warehouses that store datasets in multidimensional arrays rather than in tables. This paper presents a set of multidimensional aggregation algorithms for very large and compressed multidimensional data warehouses. These algorithms operate directly on compressed datasets without the need to first decompress them, and are applicable to a variety of data compression methods. The algorithms have different performance behavior as a function of dataset parameters, sizes of outputs and main memory availability. The algorithms are described and analyzed with respect to their I/O and CPU costs, and a decision procedure to select the most efficient algorithm for a given aggregation request is also proposed. The analytical and experimental results show that the algorithms are more efficient than traditional aggregation algorithms.

13.
Data preparation, whether for populating enterprise data warehouses or as a precursor to more exploratory analyses, is recognised as being laborious, and as a result is a barrier to cost-effective data analysis. Several steps that recur within data preparation pipelines are amenable to automation, but it seems important that automated decisions can be refined in the light of user feedback on data products. There has been significant work on how individual data preparation steps can be refined in the light of feedback. This paper goes further, by proposing an approach in which feedback on the correctness of values in a data product can be used to revise the results of diverse data preparation components. The approach uses statistical techniques, both in determining which actions should be applied to refine the data preparation process and to identify the values on which it would be most useful to obtain further feedback. The approach has been implemented to refine the results of matching, mapping and data repair components in the VADA data preparation system, and is evaluated using deep web and open government data sets from the real estate domain. The experiments have shown how the approach enables feedback to be assimilated effectively for use with individual data preparation components, and furthermore that synergies result from applying the feedback to several data preparation components.

14.
Data warehouses are based on multidimensional modeling. Using On-Line Analytical Processing (OLAP) tools, decision makers navigate through and analyze multidimensional data. Typically, users need to analyze data at different aggregation levels (using roll-up and drill-down functions). Therefore, aggregation knowledge should be adequately represented in conceptual multidimensional models, and mapped in subsequent logical and physical models. However, current conceptual multidimensional models poorly represent aggregation knowledge, which (1) has a complex structure and dynamics and (2) is highly contextual. In order to account for the characteristics of this knowledge, we propose to represent it with objects (UML class diagrams) and rules in the Production Rule Representation language (PRR). Static aggregation knowledge is represented in the class diagrams, while rules represent the dynamics (i.e. how aggregation may be performed depending on context). We present the class diagrams, and a typology and examples of associated rules. We argue that this representation of aggregation knowledge enables an early modeling of user requirements in a data warehouse project. A prototype has been developed based on the Java Expert System Shell (Jess).

15.
As the total amount of traffic data in networks has been growing at an alarming rate, there is currently a substantial body of research that attempts to mine traffic data with the purpose of obtaining useful information. For instance, there are some investigations into the detection of Internet worms and intrusions by discovering abnormal traffic patterns. However, since network traffic data contain information about the Internet usage patterns of users, network users’ privacy may be compromised during the mining process. In this paper, we propose an efficient and practical method that preserves privacy during sequential pattern mining on network traffic data. In order to discover frequent sequential patterns without violating privacy, our method uses the N-repository server model, which operates as a single mining server, and the retention replacement technique, which changes the answer to a query probabilistically. In addition, our method accelerates the overall mining process by maintaining meta tables in each site so as to determine quickly whether candidate patterns have ever occurred in the site or not. Extensive experiments with real-world network traffic data revealed the correctness and the efficiency of the proposed method.
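The retention replacement step can be illustrated with a small sketch: each reported value is kept with probability p and otherwise replaced by a uniformly random value from the domain, and the miner later inverts the known perturbation to estimate true counts. This is a simplified, hypothetical illustration; it omits the N-repository model, the meta tables and the sequential-pattern specifics.

```python
import random

def retention_replace(value, domain, p_retain=0.8):
    """Report `value` truthfully with probability p_retain;
    otherwise report a value drawn uniformly from the domain."""
    return value if random.random() < p_retain else random.choice(domain)

def estimate_true_count(perturbed, target, domain, p_retain=0.8):
    """Invert the perturbation: observed count = p*true + (1-p)*n/|domain|."""
    n, observed = len(perturbed), sum(v == target for v in perturbed)
    return (observed - (1 - p_retain) * n / len(domain)) / p_retain

domain = list(range(10))
true_values = [3] * 6000 + [7] * 4000          # toy "traffic item" stream
reported = [retention_replace(v, domain) for v in true_values]
print(round(estimate_true_count(reported, 3, domain)))   # close to 6000
```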

16.
OLAP queries involve a lot of aggregations on a large amount of data in data warehouses. To process expensive OLAP queries efficiently, we propose a new method to rewrite a given OLAP query using various kinds of materialized views which already exist in data warehouses. We first define the normal forms of OLAP queries and materialized views based on the selection and aggregation granularities, which are derived from the lattice of dimension hierarchies. Conditions for usability of materialized views in rewriting a given query are specified by relationships between the components of their normal forms. We present a rewriting algorithm for OLAP queries that can effectively utilize materialized views having different selection granularities, selection regions, and aggregation granularities together. We also propose an algorithm to find a set of materialized views that results in a rewritten query which can be executed efficiently. We show the effectiveness and performance of the algorithm experimentally.
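A minimal sketch of the granularity test behind such rewriting: a materialized view can be rolled up to answer a query only if, for every dimension, the view's aggregation level is at least as fine as the query's in the dimension hierarchy. The hierarchies, level names and function names below are illustrative assumptions, and the selection-region check described in the paper is omitted.

```python
# Each dimension hierarchy is ordered from finest to coarsest level.
HIERARCHIES = {
    "time":     ["day", "month", "quarter", "year"],
    "location": ["store", "city", "country"],
}

def finer_or_equal(dim, level_a, level_b):
    """True if level_a is at least as fine as level_b in dim's hierarchy."""
    order = HIERARCHIES[dim]
    return order.index(level_a) <= order.index(level_b)

def view_usable(view_granularity, query_granularity):
    """A materialized view can be rolled up to answer the query only if its
    aggregation granularity is at least as fine as the query's on every
    dimension (selection-region containment is omitted in this sketch)."""
    return all(finer_or_equal(dim, view_granularity[dim], query_granularity[dim])
               for dim in query_granularity)

view  = {"time": "month", "location": "city"}      # SUM(sales) by month, city
query = {"time": "quarter", "location": "country"} # SUM(sales) by quarter, country
print(view_usable(view, query))                    # True: roll the view up further
```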

17.
In this paper, we review the role of information fusion in data privacy. To that end, we introduce data privacy, and describe how information and data fusion are used in some fields of data privacy. Our study focuses on the use of aggregation for privacy protection and on record linkage techniques.

18.
Efficient aggregation algorithms for compressed data warehouses
Aggregation and cube are important operations for online analytical processing (OLAP). Many efficient algorithms to compute aggregation and cube for relational OLAP have been developed. Some work has been done on efficiently computing the cube for multidimensional data warehouses that store data sets in multidimensional arrays rather than in tables. However, to our knowledge, there is nothing to date in the literature describing aggregation algorithms on compressed data warehouses for multidimensional OLAP. This paper presents a set of aggregation algorithms on compressed data warehouses for multidimensional OLAP. These algorithms operate directly on data sets compressed by mapping-complete compression methods, without the need to first decompress them. The algorithms have different performance behaviors as a function of the data set parameters, sizes of outputs and main memory availability. The algorithms are described and the I/O and CPU cost functions are presented in this paper. A decision procedure to select the most efficient algorithm for a given aggregation request is also proposed. The analysis and experimental results show that the algorithms have better performance on sparse data than previous aggregation algorithms.
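To illustrate aggregating without decompression, the sketch below stores only non-empty cells as (linear offset, value) pairs, a simple stand-in for a mapping-complete compression scheme, and sums out one dimension by decoding coordinates straight from the offsets. The dimension sizes and sample cells are made-up assumptions.

```python
from collections import defaultdict

# A compressed 3-D cube stored as (linear offset, value) pairs for non-empty
# cells only -- the offset alone is enough to recover every dimension coordinate.
DIMS = (4, 5, 6)                     # |product| x |store| x |day|

def aggregate_over(compressed, drop_dim):
    """Sum out one dimension directly on the compressed cells, without first
    materialising the full (mostly empty) array."""
    totals = defaultdict(float)
    for offset, value in compressed:
        coords = []
        for size in reversed(DIMS):               # decode offset -> coordinates
            coords.append(offset % size)
            offset //= size
        coords = tuple(reversed(coords))
        key = tuple(c for d, c in enumerate(coords) if d != drop_dim)
        totals[key] += value
    return dict(totals)

cells = [(0, 10.0), (7, 2.5), (29, 4.0), (119, 1.0)]   # sparse sales cube
print(aggregate_over(cells, drop_dim=2))               # sales by (product, store)
```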

19.
Privacy preserving clustering on horizontally partitioned data
Data mining has been a popular research area for more than a decade due to its vast spectrum of applications. However, the popularity and wide availability of data mining tools have also raised concerns about the privacy of individuals. The aim of privacy-preserving data mining researchers is to develop data mining techniques that can be applied to databases without violating the privacy of individuals. Privacy-preserving techniques for various data mining models have been proposed, initially for classification on centralized data and then for association rules in distributed environments. In this work, we propose methods for constructing the dissimilarity matrix of objects from different sites in a privacy-preserving manner, which can be used for privacy-preserving clustering as well as database joins, record linkage and other operations that require pair-wise comparison of individual private data objects horizontally distributed to multiple sites. We show the communication and computation complexity of our protocol by conducting experiments over synthetically generated and real datasets. Each experiment is also performed for a baseline protocol with no privacy protection, so that comparing it with our protocol shows the overhead introduced by security and privacy.

20.
Quality management problems and methods for data warehouses
A data warehouse is often a large-scale information system for an enterprise, so its quality management is important and difficult. Recently, some researchers have studied the problems of quality management in data warehouses from different viewpoints and achieved good results. This paper broadly introduces the concepts, methods and techniques of quality management in data warehouses, and discusses the important quality factors in data warehouses in detail.
