Found 20 similar documents (search time: 531 ms)
1.
Hillol Kargupta Souptik Datta Qi Wang Krishnamoorthy Sivakumar 《Knowledge and Information Systems》2005,7(4):387-414
Privacy is becoming an increasingly important issue in many data-mining applications. This has triggered the development of many privacy-preserving data-mining techniques. A large fraction of them use randomized data-distortion techniques to mask the data for preserving the privacy of sensitive data. This methodology attempts to hide the sensitive data by randomly modifying the data values often using additive noise. This paper questions the utility of the random-value distortion technique in privacy preservation. The paper first notes that random matrices have predictable structures in the spectral domain and then it develops a random matrix-based spectral-filtering technique to retrieve original data from the dataset distorted by adding random values. The proposed method works by comparing the spectrum generated from the observed data with that of random matrices. This paper presents the theoretical foundation and extensive experimental results to demonstrate that, in many cases, random-data distortion preserves very little data privacy. The analytical framework presented in this paper also points out several possible avenues for the development of new privacy-preserving data-mining techniques. Examples include algorithms that explicitly guard against privacy breaches through linear transformations, exploiting multiplicative and colored noise for preserving privacy in data mining applications.
2.
Kamalika Das Kanishka Bhaduri Hillol Kargupta 《Peer-to-Peer Networking and Applications》2011,4(2):192-209
This paper proposes a scalable, local privacy-preserving algorithm for distributed Peer-to-Peer (P2P) data aggregation useful
for many advanced data mining/analysis tasks such as average/sum computation, decision tree induction, feature selection,
and more. Unlike most multi-party privacy-preserving data mining algorithms, this approach works in an asynchronous manner
through local interactions and it is highly scalable. It particularly deals with the distributed computation of the sum of
a set of numbers stored at different peers in a P2P network in the context of a P2P web mining application. The proposed optimization-based
privacy-preserving technique for computing the sum allows different peers to specify different privacy requirements without
having to adhere to a global set of parameters for the chosen privacy model. Since distributed sum computation is a frequently
used primitive, the proposed approach is likely to have significant impact on many data mining tasks such as multi-party privacy-preserving
clustering, frequent itemset mining, and statistical aggregate computation.
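The distributed sum primitive mentioned in this abstract can be illustrated with a much simpler, classic technique: additive secret sharing. This is not the paper's optimization-based, asynchronous method, and it does not model per-peer privacy requirements; it is only a minimal single-process sketch of how a sum can be computed without any peer revealing its own value. The names `MODULUS`, `make_shares`, and `secure_sum` are hypothetical.

```python
import random

# Hypothetical modulus; all values and their sum are assumed to lie in [0, MODULUS).
MODULUS = 2**61 - 1


def make_shares(value, n_peers, rng):
    """Split value into n_peers random shares that add up to value mod MODULUS."""
    shares = [rng.randrange(MODULUS) for _ in range(n_peers - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(values, seed=0):
    """Simulate all peers in one process and return the sum of their values.

    random.Random is used only to keep the sketch reproducible; a real
    deployment would need a cryptographically secure source such as the
    `secrets` module.
    """
    rng = random.Random(seed)
    n = len(values)
    # shares[i][j] is the share that peer i sends to peer j.
    shares = [make_shares(v, n, rng) for v in values]
    # Each peer j publishes only the sum of the shares it received.
    partials = [sum(shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    return sum(partials) % MODULUS


print(secure_sum([12, 7, 30, 1]))  # 50
```

Each peer sees only random-looking shares and published partial sums, so no single share reveals the value it came from.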
3.
Privacy preserving clustering on horizontally partitioned data (cited by: 3; self-citations: 0; citations by others: 3)
Data mining has been a popular research area for more than a decade due to its vast spectrum of applications. However, the popularity and wide availability of data mining tools also raised concerns about the privacy of individuals. The aim of privacy preserving data mining researchers is to develop data mining techniques that could be applied on databases without violating the privacy of individuals. Privacy preserving techniques for various data mining models have been proposed, initially for classification on centralized data, then for association rules in distributed environments. In this work, we propose methods for constructing the dissimilarity matrix of objects from different sites in a privacy preserving manner, which can be used for privacy preserving clustering as well as database joins, record linkage and other operations that require pair-wise comparison of individual private data objects horizontally distributed to multiple sites. We show the communication and computation complexity of our protocol by conducting experiments over synthetically generated and real datasets. Each experiment is also run with a baseline protocol that has no privacy protection, so that comparing the two protocols shows the overhead that security and privacy introduce.
4.
5.
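This paper introduces enterprise credit evaluation and recent advances in privacy-preserving data mining. Using a large-scale distributed privacy-preserving data mining architecture suited to enterprise credit evaluation, it discusses distributed privacy-preserving data mining for enterprise credit evaluation built on that architecture. The work should aid the development of privacy-preserving data mining systems for large-scale distributed environments and help advance the "Credit China" initiative, so as to better serve the economy.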
6.
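With rapid economic development, many enterprises now form industry chains, and distributed business intelligence analysis of these chains can yield much valuable information. This paper studies a large-scale distributed privacy-preserving data mining architecture suited to industry-chain data. It focuses on general-purpose algorithm components for distributed privacy-preserving data mining based on secure multi-party computation, and in particular on distributed privacy-preserving data mining algorithms for industry-chain data. The work should aid the development of privacy-preserving data mining systems for large-scale distributed environments and better serve the economy.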
7.
A framework for condensation-based anonymization of string data (cited by: 1; self-citations: 0; citations by others: 1)
In recent years, privacy preserving data mining has become an important problem because of the large amount of personal data
which is tracked by many business applications. An important method for privacy preserving data mining is the method of condensation.
This method is often used in the case of multi-dimensional data in which pseudo-data is generated to mask the true values
of the records. However, these methods are not easily applicable to the case of string data, since they require the use of
multi-dimensional statistics in order to generate the pseudo-data. String data are especially important in the privacy preserving
data-mining domain because most DNA and biological data are coded as strings. In this article, we will discuss a new method
for privacy preserving mining of string data with the use of simple template-based models. The template-based model turns
out to be effective in practice, and preserves important statistical characteristics of the strings such as intra-record distances.
We will explore the behavior in the context of a classification application, and show that the accuracy of the application
is not affected significantly by the anonymization process.
This article is an extended version of the paper presented at the SIAM Conference on Data Mining,
2007 (Aggarwal and Yu 2007). Available at .
8.
张国荣 《数字社区&智能家居》2006,(3):30-30,212
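Privacy preservation is an important research direction in data mining, and how to discover meaningful knowledge with data mining tools without violating privacy rules is an active problem. This paper surveys the current state of privacy preservation in distributed data mining, focusing on its problems and techniques.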
9.
10.
Data perturbation is a popular technique in privacy-preserving data mining. A major challenge in data perturbation is to balance
privacy protection and data utility, which are normally considered as a pair of conflicting factors. We argue that selectively
preserving the task/model specific information in perturbation will help achieve better privacy guarantee and better data
utility. One type of such information is the multidimensional geometric information, which is implicitly utilized by many
data-mining models. To preserve this information in data perturbation, we propose the Geometric Data Perturbation (GDP) method.
In this paper, we describe several aspects of the GDP method. First, we show that several types of well-known data-mining
models will deliver a comparable level of model quality over the geometrically perturbed data set as over the original data
set. Second, we discuss the intuition behind the GDP method and compare it with other multidimensional perturbation methods
such as random projection perturbation. Third, we propose a multi-column privacy evaluation framework for evaluating the effectiveness
of geometric data perturbation with respect to different levels of attacks. Finally, we use this evaluation framework to study
a few attacks on geometrically perturbed data sets. Our experimental study also shows that geometric data perturbation can
not only provide a satisfactory privacy guarantee but also preserve modeling accuracy well.
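The idea of preserving geometric information under perturbation can be made concrete with a small sketch. It assumes the perturbation is a rotation plus a translation plus optional Gaussian noise; the abstract does not spell out the exact form, so this is an illustration rather than the authors' method. The function names and the 2-D restriction are hypothetical choices made to keep the example short.

```python
import math
import random


def rotate_translate(points, theta, shift, noise_sd=0.0, rng=None):
    """Rotate 2-D points by theta, translate by shift, optionally add Gaussian noise."""
    rng = rng or random.Random(0)
    c, s = math.cos(theta), math.sin(theta)
    perturbed = []
    for x, y in points:
        px = c * x - s * y + shift[0] + rng.gauss(0.0, noise_sd)
        py = s * x + c * y + shift[1] + rng.gauss(0.0, noise_sd)
        perturbed.append((px, py))
    return perturbed


def pairwise_distances(points):
    """Euclidean distances between all unordered pairs of points."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]


original = [(1.0, 2.0), (4.0, 6.0), (-3.0, 0.5)]
# With noise_sd=0 the transformation is rigid, so distances are unchanged.
perturbed = rotate_translate(original, theta=1.1, shift=(10.0, -7.0))

max_gap = max(
    abs(a - b)
    for a, b in zip(pairwise_distances(original), pairwise_distances(perturbed))
)
print(max_gap < 1e-9)  # True
```

Because distance-based models such as k-nearest neighbours depend only on pairwise distances, they behave the same on the noise-free perturbed points as on the originals; adding noise trades some of that fidelity for resistance to attacks.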
11.
张国荣 《数字社区&智能家居》2006,(8)
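Privacy preservation is an important research direction in data mining, and how to discover meaningful knowledge with data mining tools without violating privacy rules is an active problem. This paper surveys the current state of privacy preservation in distributed data mining, focusing on its problems and techniques.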
12.
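To address the weak security and low mining efficiency of privacy-preserving association rule mining algorithms for vertically partitioned data, this paper proposes a privacy-preserving association rule mining algorithm. The algorithm uses a new dot-product protocol that hides the original inputs by introducing an inverse matrix and random numbers, giving good security. It mines maximal frequent itemsets instead of all frequent itemsets, using a depth-first traversal combined with several pruning strategies, which markedly speeds up frequent itemset generation and greatly reduces computational cost. Experimental results show that mining efficiency is greatly improved.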
13.
14.
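Privacy preservation is very important in data mining applications. Adding noise to data protects user privacy to some extent, but it reduces data accuracy and thus the validity of mining results. This paper proposes an efficient distributed privacy-preserving data mining framework based on rational cryptography, in which every participant is assumed to be rational, rather than simply malicious or honest as in classical cryptography. Under this assumption and with a semi-trusted third party, many data mining functions, such as sum, average, product, comparison, and frequent-item computation, can be implemented efficiently within the framework.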
15.
Data collection is a necessary step in the data mining process. Due to privacy reasons, collecting data from different parties becomes difficult. Privacy concerns may prevent the parties from directly sharing the data and some types of information about the data. How multiple parties collaboratively conduct data mining without breaching data privacy presents a challenge. The objective of this paper is to provide solutions for privacy-preserving collaborative data mining problems. In particular, we illustrate how to conduct privacy-preserving naive Bayesian classification, which is one of the data mining tasks. To measure the privacy level for privacy-preserving schemes, we propose a definition of privacy and show that our solutions preserve data privacy.
16.
17.
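Privacy-preserving distributed data mining based on game theory (cited by: 1; self-citations: 1; citations by others: 0)
Privacy-preserving distributed data mining is an active research topic in data mining, but studying it from an economic perspective with game-theoretic methods is still at an early stage. Based on payoff maximization, this paper studies the strategy decisions of participants (two-party or multi-party) in distributed data mining under a static game of complete information, and reaches the following conclusions: when the data mining satisfies certain conditions, the participants' semi-honest attack strategy is a Pareto-optimal Nash equilibrium strategy; under the semi-honest attack assumption, the non-collusion strategy of participants (multi-party) is not a Nash equilibrium strategy. The paper also gives the mixed-strategy Nash equilibrium of the game, which offers theoretical and practical guidance for participants' decisions in privacy-preserving distributed data mining.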
18.
19.
Privacy preserving association rule mining has recently become an active research area. Two different approaches to this
problem exist: perturbation based and secure multiparty computation based. One drawback of the perturbation based
approach is that it cannot always fully preserve an individual's privacy while achieving precision of mining results. The secure
multiparty computation based approach works only for distributed environments and needs sophisticated protocols, which constrains
its practical usage. In this paper, we propose a new approach for preserving privacy in association rule mining. The main
idea is to use keyed Bloom filters to represent transactions as well as data items. The proposed approach can fully preserve
privacy while maintaining the precision of mining results. The tradeoff between mining precision and storage requirement is
investigated. We also propose a δ-folding technique to further reduce the storage requirement without sacrificing mining precision and running time.
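As a rough illustration of the keyed Bloom filter idea, the sketch below derives bit positions from an HMAC under a secret key, so that bit patterns cannot be matched to items without the key. It assumes HMAC-SHA256 as the keyed hash and a bitwise subset test for itemset containment; the abstract specifies neither, and the δ-folding technique is not shown. The class name, method names, and parameters are hypothetical.

```python
import hashlib
import hmac


class KeyedBloomFilter:
    """Bloom filter whose bit positions depend on a secret key (illustrative only)."""

    def __init__(self, key, num_bits=256, num_hashes=4):
        self.key = key
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array stored in a single Python int

    def _positions(self, item):
        # One HMAC per hash index; the index is mixed into the message.
        for i in range(self.num_hashes):
            message = f"{i}:{item}".encode()
            digest = hmac.new(self.key, message, hashlib.sha256).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # May return a false positive, never a false negative.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def covers(self, other):
        """True if every bit set in `other` is also set in this filter."""
        return other.bits & ~self.bits == 0


key = b"example-secret-key"  # placeholder key for the sketch

transaction = KeyedBloomFilter(key)
for item in ("milk", "bread", "eggs"):
    transaction.add(item)

itemset = KeyedBloomFilter(key)
for item in ("milk", "bread"):
    itemset.add(item)

print("milk" in transaction)        # True
print(transaction.covers(itemset))  # True
```

A candidate itemset is counted as supported by a transaction when its filter is covered by the transaction's filter; false positives from the Bloom filter are one source of the precision and storage tradeoff the abstract mentions.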
20.
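Table semantic summarization uses data generalization to summarize a data table: over the concept hierarchies of the values of multidimensional attributes, it constructs a small number of tuples with abstract semantics to replace a large number of original tuples with detailed semantics. Given the original table and the concept hierarchies of its multidimensional attribute values, the goal is to produce a summary table that meets a specified compression ratio while retaining as much semantic information as possible. Existing algorithms search the set of generalized tuples for the best combination, converting the task into a Set-Covering problem. Despite several optimizations (such as preprocessing and level-wise processing) to improve efficiency, they still suffer from high conversion overhead, a complex algorithmic framework, and poor scalability to high-dimensional attributes. This paper defines a metric space over the multidimensional attribute hierarchies to convert the task into a clustering problem in a multidimensional hierarchical space, and introduces Dewey encoding to make the conversion more efficient. It proposes two clustering algorithms, one based on fast-converging hierarchical agglomeration and one based on adjusting the resolution of the hierarchical space, to build semantic summary tables efficiently. Experiments on real datasets show that the new algorithms outperform existing methods in both execution efficiency and summary quality.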