首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 531 毫秒
1.
Random-data perturbation techniques and privacy-preserving data mining   总被引:2,自引:4,他引:2  
Privacy is becoming an increasingly important issue in many data-mining applications. This has triggered the development of many privacy-preserving data-mining techniques. A large fraction of them use randomized data-distortion techniques to mask the data for preserving the privacy of sensitive data. This methodology attempts to hide the sensitive data by randomly modifying the data values often using additive noise. This paper questions the utility of the random-value distortion technique in privacy preservation. The paper first notes that random matrices have predictable structures in the spectral domain and then it develops a random matrix-based spectral-filtering technique to retrieve original data from the dataset distorted by adding random values. The proposed method works by comparing the spectrum generated from the observed data with that of random matrices. This paper presents the theoretical foundation and extensive experimental results to demonstrate that, in many cases, random-data distortion preserves very little data privacy. The analytical framework presented in this paper also points out several possible avenues for the development of new privacy-preserving data-mining techniques. Examples include algorithms that explicitly guard against privacy breaches through linear transformations, exploiting multiplicative and colored noise for preserving privacy in data mining applications.  相似文献   

2.
This paper proposes a scalable, local privacy-preserving algorithm for distributed Peer-to-Peer (P2P) data aggregation useful for many advanced data mining/analysis tasks such as average/sum computation, decision tree induction, feature selection, and more. Unlike most multi-party privacy-preserving data mining algorithms, this approach works in an asynchronous manner through local interactions and it is highly scalable. It particularly deals with the distributed computation of the sum of a set of numbers stored at different peers in a P2P network in the context of a P2P web mining application. The proposed optimization-based privacy-preserving technique for computing the sum allows different peers to specify different privacy requirements without having to adhere to a global set of parameters for the chosen privacy model. Since distributed sum computation is a frequently used primitive, the proposed approach is likely to have significant impact on many data mining tasks such as multi-party privacy-preserving clustering, frequent itemset mining, and statistical aggregate computation.  相似文献   

3.
Privacy preserving clustering on horizontally partitioned data   总被引:3,自引:0,他引:3  
Data mining has been a popular research area for more than a decade due to its vast spectrum of applications. However, the popularity and wide availability of data mining tools also raised concerns about the privacy of individuals. The aim of privacy preserving data mining researchers is to develop data mining techniques that could be applied on databases without violating the privacy of individuals. Privacy preserving techniques for various data mining models have been proposed, initially for classification on centralized data then for association rules in distributed environments. In this work, we propose methods for constructing the dissimilarity matrix of objects from different sites in a privacy preserving manner which can be used for privacy preserving clustering as well as database joins, record linkage and other operations that require pair-wise comparison of individual private data objects horizontally distributed to multiple sites. We show communication and computation complexity of our protocol by conducting experiments over synthetically generated and real datasets. Each experiment is also performed for a baseline protocol, which has no privacy concern to show that the overhead comes with security and privacy by comparing the baseline protocol and our protocol.  相似文献   

4.
分布式隐私保护数据挖掘研究*   总被引:2,自引:2,他引:0  
隐私保护挖掘是近年来数据挖掘领域的热点之一,主要研究在避免敏感数据泄露的同时在数据中挖掘出潜在的知识。实际应用中,大量的数据分别存放在多个站点,因此分布式隐私保护数据挖掘(distributed privacy preserving data mining, DPPDM)的研究更具有实际意义。对该领域的研究进行了详细的阐述,比较了各种方法的优缺点,对现有方法进行了分类和总结,最后指出了该领域未来的研究方向。  相似文献   

5.
介绍企业信用评估和当前隐私保护数据挖掘技术的最新进展,利用适用于企业信用评估的大规模分布式隐私保护数据挖掘架构,讨论了基于该架构的面向企业信用评估的分布式隐私保护数据挖掘。该研究不仅将有助于大规模分布式环境下的隐私保护数据挖掘系统的研发,而且能够有力推动“信用中国”的建设步伐,以达到更好地服务经济的目的。  相似文献   

6.
随着经济的快速发展,当前很多企业构成了产业链,通过对其进行分布式的商务智能分析,能够获取很多有价值的信.研究了适用于产业链型数据的大规模分布式隐私保护数据挖掘架构,重点研究基于安全多方计算技术的分布式隐私保护数据挖掘通用算法组件,特别是研究面向产业链型数据的分布式隐私保护数据挖掘算法.该研究不仅将有助于大规模分布式环境下的隐私保护数据挖掘系统的研发,而且能够达到更好地服务经济的目的.  相似文献   

7.
A framework for condensation-based anonymization of string data   总被引:1,自引:0,他引:1  
In recent years, privacy preserving data mining has become an important problem because of the large amount of personal data which is tracked by many business applications. An important method for privacy preserving data mining is the method of condensation. This method is often used in the case of multi-dimensional data in which pseudo-data is generated to mask the true values of the records. However, these methods are not easily applicable to the case of string data, since they require the use of multi-dimensional statistics in order to generate the pseudo-data. String data are especially important in the privacy preserving data-mining domain because most DNA and biological data are coded as strings. In this article, we will discuss a new method for privacy preserving mining of string data with the use of simple template-based models. The template-based model turns out to be effective in practice, and preserves important statistical characteristics of the strings such as intra-record distances. We will explore the behavior in the context of a classification application, and show that the accuracy of the application is not affected significantly by the anonymization process. This article is an extended version of the conference version of the paper presented at the SIAM Conference on Data Mining, 2007 Aggarwal and Yu (2007). Available at .  相似文献   

8.
隐私保护是数据挖掘中一个重要的研究方向,如何在不违反隐私规定的情况下,利用数据挖掘工具发现有意义的知识是一个热点问题。本文介绍了分布式数据挖掘中隐私保护的现状,着重介绍分布式数据挖掘中隐私保护问题和技术。  相似文献   

9.
针对基于随机响应的隐私保护分类挖掘算法仅适用于原始数据属性值是二元的问题,设计了一种适用于多属性值原始数据的隐私保护分类挖掘算法。算法分为两个部分:a)通过比较参数设定值和随机产生数之间的大小,决定是否改变原始数据的顺序,以实现对原始数据进行变换,从而起到保护数据隐私性的目的;b)通过求解信息增益比例的概率估计值,在伪装后的数据上构造决策树。  相似文献   

10.
Geometric data perturbation for privacy preserving outsourced data mining   总被引:1,自引:1,他引:0  
Data perturbation is a popular technique in privacy-preserving data mining. A major challenge in data perturbation is to balance privacy protection and data utility, which are normally considered as a pair of conflicting factors. We argue that selectively preserving the task/model specific information in perturbation will help achieve better privacy guarantee and better data utility. One type of such information is the multidimensional geometric information, which is implicitly utilized by many data-mining models. To preserve this information in data perturbation, we propose the Geometric Data Perturbation (GDP) method. In this paper, we describe several aspects of the GDP method. First, we show that several types of well-known data-mining models will deliver a comparable level of model quality over the geometrically perturbed data set as over the original data set. Second, we discuss the intuition behind the GDP method and compare it with other multidimensional perturbation methods such as random projection perturbation. Third, we propose a multi-column privacy evaluation framework for evaluating the effectiveness of geometric data perturbation with respect to different level of attacks. Finally, we use this evaluation framework to study a few attacks to geometrically perturbed data sets. Our experimental study also shows that geometric data perturbation can not only provide satisfactory privacy guarantee but also preserve modeling accuracy well.  相似文献   

11.
隐私保护是数据挖掘中一个重要的研究方向,如何在不违反隐私规定的情况下,利用数据挖掘工具发现有意义的知识是一个热点问题。本文介绍了分布式数据挖掘中隐私保护的现状,着重介绍分布式数据挖掘中隐私保护问题和技术。  相似文献   

12.
针对垂直分布下的隐私保护关联规则挖掘算法安全性不高和挖掘效率较低的问题,提出了一种隐私保护关联规则挖掘算法.算法采用一种新的点积协议,通过引入逆矩阵和随机数隐藏原始输入信息,具有较好的安全性;利用挖掘最大频繁项集来代替挖掘所有频繁项集,采用深度优先遍历策略,结合各种剪枝策略,明显加快了频繁项集的生成速度,大大减少计算代价.实验结果表明,挖掘效率得到了很大提高.  相似文献   

13.
吕品  陈年生  董武世 《微机发展》2006,16(7):147-149
隐私与安全是数据挖掘中一个越来越重要的问题。隐私与安全问题的解决能破坏图谋不轨的挖掘工程。文中研究了数据挖掘中隐私保护技术的发展现状,总结出了隐私保护技术的分类,详细讨论了隐私保护技术中最重要的隐私保持技术,最后得出了隐私保护技术算法的评估指标。  相似文献   

14.
在数据挖掘的应用中,隐私保护非常重要。在数据中加上噪声可以在一定程度上保护用户的隐私,但会降低数据的准确性,进而影响数据挖掘结果的有效性。提出一种高效的基于理性密码学的分布式隐私保护数据挖掘框架,在此框架中每个参与方都被认为是理性的,而不像在经典密码学中简单地把每个参与方认为是恶意的或诚实的。基于此种假设和一个半可信的第三方,许多数据挖掘函数,如求和、求平均值、求积、比较、和求频繁项等,都可以在本框架下高效地实现。  相似文献   

15.
Data collection is a necessary step in data mining process. Due to privacy reasons, collecting data from different parties becomes difficult. Privacy concerns may prevent the parties from directly sharing the data and some types of information about the data. How multiple parties collaboratively conduct data mining without breaching data privacy presents a challenge. The objective of this paper is to provide solutions for privacy-preserving collaborative data mining problems. In particular, we illustrate how to conduct privacy-preserving naive Bayesian classification which is one of the data mining tasks. To measure the privacy level for privacy- preserving schemes, we propose a definition of privacy and show that our solutions preserve data privacy.  相似文献   

16.
隐私保护数据挖掘*   总被引:4,自引:0,他引:4  
隐私保护数据挖掘的目标是寻找一种数据集变换方法,使得敏感数据或敏感知识在实施数据挖掘的过程中不被发现。近年出现了大量相关算法,按照隐私保持技术可将它们分为基于启发式技术、基于安全多方技术和基于重构技术三种。结合目前研究的热点对关联规则和分类规则的隐私保护数据挖掘进行介绍,并给出算法的评估方法,最后提出了关联规则隐私保护数据挖掘未来研究工作的方向。  相似文献   

17.
基于博弈论的隐私保护分布式数据挖掘   总被引:1,自引:1,他引:0  
葛新景  朱建明 《计算机科学》2011,38(11):161-166
隐私保护的分布式数据挖掘问题是数据挖掘领域的一个研究热点,而基于经济视角,利用博弈论的方法对隐私保护分布式数据挖掘进行研究只是处于初始阶段。基于收益最大化,研究了完全信息静态博弈下分布式数据挖掘中参与者(两方或多方)的策略决策问题,得出了如下结论:数据挖掘在满足一定的条件下,参与者(两方或多方)的准诚信攻击策略是一个帕累托最优的纳什均衡策略;在准诚信攻击的假设下,参与者(多方)的非共谋策略并不是一个纳什均衡策略。同时给出了该博弈的混合战略纳什均衡,它对隐私保护分布式数据挖掘中参与者的决策具有一定的理论和指导意义。  相似文献   

18.
基于多参数随机扰动的布尔规则挖掘   总被引:2,自引:0,他引:2  
陈芸  张伟  周霆  邹汉斌 《计算机工程》2006,32(10):63-65
在MASK算法基础上提出了基于多参数随机扰动后布尔规则的挖掘过程,通过对实验结果的评估分析,表明该算法能够提供较高的隐私保护,并讨论了隐私保护及挖掘精度之间的关系。最后对未来多参数随机扰动数据挖掘研究进行了展望。  相似文献   

19.
Privacy preserving association rule mining has been an active research area since recently. To this problem, there have been two different approaches—perturbation based and secure multiparty computation based. One drawback of the perturbation based approach is that it cannot always fully preserve individual’s privacy while achieving precision of mining results. The secure multiparty computation based approach works only for distributed environment and needs sophisticated protocols, which constrains its practical usage. In this paper, we propose a new approach for preserving privacy in association rule mining. The main idea is to use keyed Bloom filters to represent transactions as well as data items. The proposed approach can fully preserve privacy while maintaining the precision of mining results. The tradeoff between mining precision and storage requirement is investigated. We also propose δ-folding technique to further reduce the storage requirement without sacrificing mining precision and running time.  相似文献   

20.
通过数据概化,在多维属性的属性值概念分层上构造少量的具有抽象语义的元组来替换大量具有详细语义的原始元组,从而汇总数据表,这称作表语义汇总。给定原始数据表及其多维属性的属性值的概念分层,表语义汇总的目标是产生规定压缩率且保留尽可能多的语义信息的汇总表。现有算法采用在概化元组集合中寻找最佳概化元组组合的策略将其转换成Set-Covering问题来解决,尽管采取了多种优化策略(如预处理、分级处理)来提高效率,但仍存在转换开销大、算法框架复杂且不易扩展到高维属性等缺点。通过定义多维属性层次结构的度量空间将该问题转换为多维层次空间聚类问题并引入dewey编码来提高转换效率,提出了基于快速收敛的层次凝聚和基于层次空间分辨率调整的两种聚类算法来高效地建立语义汇总表。经真实数据集上的实验表明,新算法在执行效率和汇总质量上都优于现有方法。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号