Similar literature
Found 20 similar documents; search took 193 ms
1.
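Differential privacy offers a rigorously proven privacy guarantee and is therefore widely applied to location privacy protection. When a user issues continuous location queries, the noise accumulates and query accuracy degrades. Existing differential privacy methods based on regular tree structures reduce the query error, but they generate a large number of useless zero-valued nodes and oversized data structures, leaving room to improve query accuracy. This paper proposes a differentially private location privacy protection method based on irregular segment trees. It introduces the irregular segment tree into differential privacy and derives an evaluation function for irregular segment trees from the node coverage ratio and the sensitivity of the Laplace mechanism, which is then used to select a better irregular segment tree structure. The method effectively mitigates the loss of query accuracy caused by noise accumulation in continuous queries, achieves a smaller query error than other methods for improving differentially private query accuracy, and adapts to LBS location query services in environments of different densities.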

2.
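Transaction data arise in many application scenarios, such as shopping records and page browsing histories. To provide better services, service providers collect and analyze user data, but collecting transaction data can leak users' private information. To address this problem, a transaction data collection method is proposed based on the condensed local differential privacy model. First, a new score function for candidate itemsets is defined. Second, based on this function, the sample space of candidate itemsets is partitioned into several subspaces. Then one subspace is selected at random, and transaction data are randomly generated from it and sent to the untrusted data collector. Finally, to address the setting of the privacy parameter, a heuristic privacy parameter setting strategy is designed based on the maximum posterior confidence attack model. Theoretical analysis shows that the method protects both the length and the content of transaction data and satisfies condensed local differential privacy. Experimental results show that, compared with the current state-of-the-art work, the collected data have higher utility and the privacy parameter setting is more semantically meaningful.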

3.
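To address the problems that query functions over users' electricity consumption data have a large global sensitivity and high computational complexity, and that independent noise is easily filtered out, a differential privacy protection method based on period sensitivity (Period Sensitivity Method, PSM) is proposed. PSM decomposes the electricity consumption sequence into a set of stable-period sequences and a set of active-period sequences, and applies two privacy protection strategies according to the differences in data sparsity and correlation. Independent and identically distributed noise is added to the stable-period sequences, and the noisy stable-period sequences are then smoothed with a smoothing filter; correlated noise whose autocorrelation function matches that of the active-period sequences is added to the active-period sequences. Theoretical analysis and experimental results show that PSM satisfies differential privacy and offers better utility and lower computational complexity.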

4.
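To address the problems that, in big-data settings for electricity consumption, the non-interactive differential privacy model cannot provide accurate query results and incurs high computational overhead, a differentially private data publishing method based on the maximal information coefficient and data anonymization is proposed. Some privacy attributes are selected from the original dataset as a feature set; the maximal information coefficient is used to select the data highly correlated with this feature set as the private dataset; a collaborative privacy protection algorithm is applied to protect the private dataset; and an electricity consumption big dataset satisfying differential privacy is published. Theoretical analysis and experimental results show that the proposed method improves the efficiency of privacy protection processing for big data while effectively dividing the sensitivity of the query function and improving the utility of the published data.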

5.
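To strengthen privacy protection and improve data utility, a data protection method is proposed that can apply differential privacy to data tables with mixed attributes. The method first uses the ICMD (insensitive clustering for mixed data) clustering algorithm to cluster and anonymize the dataset, and then applies ε-differential privacy protection on that basis. ICMD computes distances and centroids differently for the categorical and numerical attributes in the table, and introduces a total order function to meet the requirements of applying differential privacy. Clustering shifts the query sensitivity from individual records to groups of records, reducing both information loss and the risk of information disclosure. Experimental results demonstrate the effectiveness of the method.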

6.
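Differential privacy is now widely used as a privacy protection mechanism. Although many histogram publishing methods exist for static datasets, few address sliding-window histogram publishing in data stream settings, and those that do suffer from high publishing error. To address this, a differentially private data stream histogram publishing algorithm for the sliding window model (histogram publishing algorithm for sliding window model, HPA-SW) is proposed. The algorithm first divides a sliding window into k sub-blocks, following the idea of data blocking, and uses this parameter to control and adjust the statistical error of the histogram. It then optimizes the privacy budget allocation of the current window by comparing the difference between the data distributions of two adjacent histograms, so that a locally optimal histogram can be computed quickly. To verify the algorithm, a rigorous theoretical derivation first confirms that it satisfies differential privacy and that its approximation error does not exceed W/2k. Experimental comparison on real datasets then shows that the algorithm has a low publishing error, 50% lower than that of the SSHP algorithm.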

7.
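How to add noise more reasonably in differentially private location protection is a current research focus, but adding the same noise at different locations reduces both service utility and the degree of privacy protection. To address this problem, a differentially private location privacy protection method that incorporates semantic locations is proposed. The method first builds the expected distance using the "geo-indistinguishability" framework, then constructs semantic location information by defining a privacy quality function and a demand function to determine the sensitivity of different location points, and finally adds Laplace noise at fine granularity to different types of regions according to the sensitivity of the location points, systematically resolving the conflict among location privacy protection, service utility, and time overhead. Simulation experiments on two public datasets compare the method with existing ones in terms of query success rate under a Bayesian attack, service utility quantified by expected distance, and time overhead. The results confirm the feasibility and effectiveness of the proposed method, which achieves a better trade-off among privacy protection, service utility, and time overhead.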

8.
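Differential privacy is a new type of privacy protection mechanism proposed by Dwork in 2006. It addresses two problems in privacy protection, namely how to define privacy when sharing data and how to provide privacy protection when publishing data whose utility must be preserved, and it provides a mathematical model of privacy protection for both. Because its definition of privacy does not depend on the attacker's background knowledge, differential privacy has been widely applied as a privacy protection model in data mining, machine learning, and other fields. This paper introduces the basic theory and current research progress of differential privacy, reviews existing differential privacy theories and techniques, and concludes with an outlook on future work and research hotspots.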
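
As a minimal illustration of the kind of mechanism such a survey covers, the sketch below applies the Laplace mechanism to a counting query. It assumes a sensitivity of 1 (adding or removing one record changes the count by at most 1). The names `laplace_noise` and `private_count` are illustrative and do not come from the surveyed paper.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that satisfy `predicate`.

    Assumption: a counting query has sensitivity 1, so the noise scale
    is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    ages = [23, 35, 41, 29, 52, 38, 61, 27]
    print(private_count(ages, lambda age: age >= 40, epsilon=0.5))
```

A smaller `epsilon` gives stronger privacy and a larger noise scale, so repeated runs spread more widely around the true count.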

9.
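To address the problem that, in big-data settings, non-interactive differential privacy cannot accurately answer and process large numbers of range queries, a privacy-preserving data query model based on the maximal information coefficient and machine learning is proposed. The maximal information coefficient is applied to the original dataset to select data with low correlation as the training sample set; the parallel composition property of differential privacy is then used to partition it into blocks, yielding a privacy-protected training sample set; finally, a linear regression algorithm is trained on this sample set to obtain a differentially private prediction model, which answers both currently submitted queries and large numbers of unknown queries in a privacy-preserving manner. Experimental results show that the proposed model improves the utility of the published data while also improving query processing efficiency.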

10.
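To protect patients' personal privacy in medical auxiliary diagnosis systems, this paper proposes a new bidirectional privacy protection method that combines decision trees with Oblivious Transfer (OT). The method first uses a decision tree to classify existing diagnostic information to form the auxiliary diagnosis, and uses differential privacy to ensure that building the decision tree does not leak the privacy of the database. It then uses OT to protect privacy during queries, and proposes a decision tree index protocol that effectively combines the decision tree algorithm with the OT protocol. The proposed method is the first to apply decision trees and OT to medical auxiliary diagnosis systems. While the client queries medical data and obtains accurate results, the method protects the private information of the client, the server, and the database very well, achieving more comprehensive bidirectional privacy protection. Theoretical analysis shows that the method achieves high communication efficiency while protecting privacy. Experimental results further show that it offers both high query efficiency and high query accuracy.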

11.
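Shen Siqian (沈思倩), Mao Yuguang (毛宇光), Jiang Guanru (江冠儒). Computer Science (《计算机科学》), 2017, 44(6): 139-143, 149
This paper studies how to incorporate differential privacy protection when performing decision tree analysis on incomplete datasets. It first briefly introduces the differentially private ID3 algorithm and the differentially private random forest decision tree algorithm; it then modifies these algorithms to address their defects and shortcomings, proposing a differentially private random forest decision tree algorithm based on the exponential mechanism; finally, for incomplete datasets it proposes a new WP (Weight Partition) missing-value handling method that, without interpolation, allows decision tree analysis algorithms to satisfy differential privacy while achieving higher prediction accuracy and adaptability. Experiments show that the proposed method works with both the Laplace mechanism and the exponential mechanism, and with both the ID3 algorithm and the random forest decision tree algorithm.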

12.
Spatial databases are essential to applications in a wide variety of domains. One of the main privacy concerns when answering statistical queries, such as range counting queries, over a spatial database is that an adversary observing changes in query answers may be able to determine whether or not a particular geometric object is present in the database. Differential privacy addresses this concern by guaranteeing that the presence or absence of a geometric object has little effect on query answers. Most of the current differentially private mechanisms for spatial databases ignore the fact that privacy is personal and, thus, provide the same privacy protection for all geometric objects. However, some particular geometric objects may be more sensitive to privacy issues than others, requiring stronger differential privacy guarantees. In this paper, we introduce the concept of spatial personalized differential privacy for spatial databases where different geometric objects have different privacy protection requirements. Also, we present SPDP-PCE, a novel spatial personalized differentially private mechanism to answer range counting queries over spatial databases that fully considers the privacy protection requirements of geometric objects in the underlying geometric space in both steps of noise addition and consistency enforcement. Our experimental results on real datasets demonstrate the effectiveness of SPDP-PCE under various total privacy budgets, query shapes, and privacy level distributions.

13.
When querying databases containing sensitive information, the privacy of individuals stored in the database has to be guaranteed. Such guarantees are provided by differentially private mechanisms which add controlled noise to the query responses. However, most such mechanisms do not take into consideration the valid range of the query being posed. Thus, noisy responses that fall outside of this range may potentially be produced. To rectify this and therefore improve the utility of the mechanism, the commonly-used Laplace distribution can be truncated to the valid range of the query and then normalized. However, such a data-dependent operation of normalization leaks additional information about the true query response, thereby violating the differential privacy guarantee. Here, we propose a new method which preserves the differential privacy guarantee through a careful determination of an appropriate scaling parameter for the Laplace distribution. We adapt the privacy guarantee in the context of the Laplace distribution to account for data-dependent normalization factors and study this guarantee for different classes of range constraint configurations. We provide derivations of the optimal scaling parameter (i.e., the minimal value that preserves differential privacy) for each class or provide an approximation thereof. As a result of this work, one can use the Laplace distribution to answer queries in a range-adherent and differentially private manner. To demonstrate the benefits of our proposed method of normalization, we present an experimental comparison against other range-adherent mechanisms. We show that our proposed approach is able to provide improved utility over the alternative mechanisms.
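
The truncate-and-normalize step described in this abstract can be sketched with rejection sampling, which draws from a Laplace distribution restricted to the valid range. This is only an illustration: the function name `truncated_laplace` and its parameters are hypothetical, and the abstract does not state the optimal scaling parameter, so `scale` is left as an input. As the abstract points out, reusing the untruncated scale does not in general preserve the differential privacy guarantee.

```python
import random


def truncated_laplace(true_answer, scale, lower, upper, rng=None, max_tries=10000):
    """Sample from Laplace(true_answer, scale) restricted to [lower, upper].

    Rejection sampling is equivalent to truncating the density to the range
    and renormalizing it. Assumption: `scale` has already been chosen so that
    the truncated mechanism is differentially private; this sketch does not
    compute it.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    if not lower <= true_answer <= upper:
        raise ValueError("true_answer must lie inside [lower, upper]")
    if rng is None:
        rng = random.Random()
    for _ in range(max_tries):
        # Difference of two exponentials is a Laplace(0, scale) draw.
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        candidate = true_answer + noise
        if lower <= candidate <= upper:
            return candidate
    raise RuntimeError("no in-range sample found; the range may be too narrow")


if __name__ == "__main__":
    # A count known to lie between 0 and 8, answered with a placeholder scale.
    print(truncated_laplace(5.0, scale=2.0, lower=0.0, upper=8.0))
```

Every returned value lies inside the valid range, which is the range-adherence property the abstract refers to.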

14.
Latent Dirichlet allocation (LDA) is a topic model widely used for discovering hidden semantics in massive text corpora. Collapsed Gibbs sampling (CGS), a widely used algorithm for learning the parameters of LDA, carries a risk of privacy leakage. Specifically, word count statistics and updates of latent topics in CGS, which are essential for parameter estimation, could be employed by adversaries to conduct effective membership inference attacks (MIAs). To date, two kinds of methods have been used in CGS to defend against MIAs: adding noise to word count statistics and utilizing inherent privacy. Each has its limitations. Noise sampled from the Laplacian distribution sometimes produces negative word count statistics, which leads to poor parameter estimation in CGS. Utilizing inherent privacy provides only a weak privacy guarantee when defending against MIAs. An effective framework that obtains accurate parameter estimates with guaranteed differential privacy is therefore desirable. The key to obtaining accurate parameter estimates when introducing differential privacy into CGS is making good use of the privacy budget so that a precise noise scale is derived. This work introduces Rényi differential privacy (RDP) into CGS for the first time and proposes RDP-LDA, an effective framework for analyzing the privacy loss of any differentially private CGS. RDP-LDA can be used to derive a tighter upper bound on the privacy loss than the overestimated results that existing differentially private CGS obtains with ε-DP. In RDP-LDA, we propose a novel truncated-Gaussian mechanism that keeps word count statistics non-negative. We also propose distribution perturbation, which provides a more rigorous privacy guarantee than utilizing inherent privacy. Experiments validate that our proposed methods produce more accurate parameter estimation under the JS-divergence metric and obtain lower precision and recall when defending against MIAs.

15.
We propose a new differentially-private decision forest algorithm that minimizes both the number of queries required, and the sensitivity of those queries. To do so, we build an ensemble of random decision trees that avoids querying the private data except to find the majority class label in the leaf nodes. Rather than using a count query to return the class counts like the current state-of-the-art, we use the Exponential Mechanism to only output the class label itself. This drastically reduces the sensitivity of the query – often by several orders of magnitude – which in turn reduces the amount of noise that must be added to preserve privacy. Our improved sensitivity is achieved by using “smooth sensitivity”, which takes into account the specific data used in the query rather than assuming the worst-case scenario. We also extend work done on the optimal depth of random decision trees to handle continuous features, not just discrete features. This, along with several other improvements, allows us to create a differentially private decision forest with substantially higher predictive power than the current state-of-the-art.
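
The idea of releasing only the majority class label of a leaf can be sketched with the Exponential Mechanism as follows. This is a simplified illustration, not the paper's algorithm: it uses the class count as the utility score with a global sensitivity of 1, whereas the paper relies on smooth sensitivity. The name `private_majority_label` is hypothetical.

```python
import math
import random
from collections import Counter


def private_majority_label(labels, classes, epsilon, rng=None):
    """Pick a class label with probability proportional to exp(eps * count / 2).

    Assumption: utility = number of records with that label, which changes by
    at most 1 when one record is added or removed (global sensitivity 1).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    counts = Counter(labels)
    scores = [epsilon * counts.get(label, 0) / 2.0 for label in classes]
    top = max(scores)
    # Subtract the maximum score before exponentiating for numerical stability.
    weights = [math.exp(score - top) for score in scores]
    return rng.choices(classes, weights=weights, k=1)[0]


if __name__ == "__main__":
    leaf_labels = ["yes"] * 12 + ["no"] * 9
    print(private_majority_label(leaf_labels, ["yes", "no"], epsilon=1.0))
```

Because only a label is released, no noisy count leaves the leaf; the closer the class counts are, the more often the minority label is returned.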

16.
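Zhang Xiaojian (张啸剑), Xu Yaxin (徐雅鑫), Xia Qingrong (夏庆荣). Journal of Software (《软件学报》), 2022, 33(6): 2348-2363
Histogram publication based on centralized/local differential privacy has attracted wide attention from researchers. The conflict between users' privacy requirements and the collector's analysis accuracy directly limits the usability of histogram publication. Since existing histogram publication methods struggle to balance user privacy with the collector's analysis accuracy, a histogram publication algorithm based on shuffled differential privacy, HP-SDP (histogram publication with shuffled differential privacy), is proposed. The algorithm's shuffled randomized response mechanism SRR (shuffled randomized response), designed with local hash encoding, perturbs user data through linear decomposition and removes the influence of the data domain size. For the user messages produced by SRR, a uniform random permutation algorithm for user messages, MRS (message random shuffling), is designed based on the heap permutation technique; the shuffler uses MRS to randomly permute the messages of all users. Because the messages shuffled by MRS satisfy centralized differential privacy, a malicious collector cannot identify a target user through the link between messages and users. In addition, HP-SDP uses POP (post-processing), a post-processing algorithm based on quadratic programming, to refine the shuffled histogram. The HP-SDP algorithm and existing...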
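
The randomize, shuffle, and estimate pipeline of the shuffle model can be sketched as below. This is a generic stand-in under stated assumptions, not HP-SDP itself: it uses plain k-ary randomized response instead of the hash-based SRR mechanism, Python's built-in `shuffle` instead of MRS, and a simple unbiased estimator instead of the quadratic-programming POP step. The function names are hypothetical.

```python
import math
import random
from collections import Counter


def randomize(value, domain, epsilon, rng):
    """k-ary randomized response: keep the value with probability p, else lie."""
    k = len(domain)
    p_keep = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_keep:
        return value
    return rng.choice([v for v in domain if v != value])


def shuffled_histogram(values, domain, epsilon, rng=None):
    """Estimate per-value counts from locally randomized, shuffled reports."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    k, n = len(domain), len(values)
    reports = [randomize(v, domain, epsilon, rng) for v in values]
    rng.shuffle(reports)  # the shuffler breaks the link between user and report
    p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)  # P(report true value)
    q = 1.0 / (math.exp(epsilon) + k - 1)                # P(report a given other value)
    counts = Counter(reports)
    # Invert the expected count n_v * p + (n - n_v) * q to get an unbiased estimate.
    return {v: (counts.get(v, 0) - n * q) / (p - q) for v in domain}


if __name__ == "__main__":
    data = ["A"] * 500 + ["B"] * 300 + ["C"] * 200
    print(shuffled_histogram(data, ["A", "B", "C"], epsilon=1.0))
```

Individual estimates can be negative for rare values, which is one reason a post-processing step such as the one named in the abstract is useful.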

17.
This paper analyzes a novel method for publishing data while still protecting privacy. The method is based on computing weights that make an existing dataset, for which there are no confidentiality issues, analogous to the dataset that must be kept private. The existing dataset may be genuine but public already, or it may be synthetic. The weights are importance sampling weights, but to protect privacy, they are regularized and have noise added. The weights allow statistical queries to be answered approximately while provably guaranteeing differential privacy. We derive an expression for the asymptotic variance of the approximate answers. Experiments show that the new mechanism performs well even when the privacy budget is small, and when the public and private datasets are drawn from different populations.

18.
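Frequent itemset mining algorithms for uncertain data have been widely studied. For uncertain data that record users' sensitive information, an attacker can use background knowledge to analyze the frequent itemsets mined from the uncertain data and thereby obtain users' sensitive information. To mine the top-K most frequent itemsets by expected support from an uncertain dataset while ensuring that the mining results satisfy differential privacy, this paper proposes the FIMUDDP algorithm (Frequent Itemsets Mining for Uncertain Data based on Differential Privacy). FIMUDDP uses the exponential mechanism and the Laplace mechanism of differential privacy to ensure that the top-K most frequent itemsets by expected support mined from uncertain data, together with their expected supports, satisfy differential privacy. Theoretical analysis and experimental evaluation of FIMUDDP verify its effectiveness.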
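
To make the notion of expected support concrete, the sketch below computes it under the common model in which each item appears in a transaction with an independent existence probability, and then perturbs the supports of already selected itemsets with Laplace noise. This is an assumption-laden illustration rather than FIMUDDP: the selection step with the exponential mechanism is omitted, the privacy budget is simply split evenly across the itemsets, and all names are hypothetical.

```python
import random


def expected_support(itemset, uncertain_db):
    """Sum, over transactions, of the probability that all items are present.

    Each transaction is a dict mapping item -> existence probability.
    Assumption: item occurrences within a transaction are independent.
    """
    total = 0.0
    for transaction in uncertain_db:
        probability = 1.0
        for item in itemset:
            probability *= transaction.get(item, 0.0)
        total += probability
    return total


def noisy_expected_supports(itemsets, uncertain_db, epsilon, rng=None):
    """Release expected supports with Laplace noise.

    One transaction contributes at most 1 to each expected support, so each
    release has sensitivity 1; splitting epsilon over len(itemsets) releases
    gives a per-release noise scale of len(itemsets) / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    scale = len(itemsets) / epsilon
    result = {}
    for itemset in itemsets:
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        result[itemset] = expected_support(itemset, uncertain_db) + noise
    return result


if __name__ == "__main__":
    db = [{"milk": 0.9, "bread": 0.6}, {"milk": 0.4}, {"bread": 0.8, "milk": 0.5}]
    print(noisy_expected_supports([("milk",), ("milk", "bread")], db, epsilon=1.0))
```

Itemsets are passed as tuples so that they can serve as dictionary keys in the result.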

19.
Mining of spatial data is an enabling technology for mobile services, Internet-connected cars and the Internet of Things. But the very distinctiveness of spatial data that drives utility can cost user privacy. Past work has focused upon points and trajectories for differentially private release. In this work, we continue the tradition of privacy-preserving spatial analytics, focusing not on point or path data, but on planar spatial regions. Such data represent the area of a user’s most frequent visitation, such as “around home and nearby shops”. Specifically we consider the differentially private release of data structures that support range queries for counting users’ spatial regions. Counting planar regions leads to unique challenges not faced in existing work. A user’s spatial region that straddles multiple data structure cells can lead to duplicate counting at query time. We provably avoid this pitfall by leveraging the Euler characteristic for the first time with differential privacy. To address the increased sensitivity of range queries to spatial region data, we calibrate privacy-preserving noise using bounded user region size and a constrained inference that uses robust least absolute deviations. Our novel constrained inference reduces noise and promotes covertness by (privately) imposing consistency. We provide a full end-to-end theoretical analysis of both differential privacy and high-probability utility for our approach using concentration bounds. A comprehensive experimental study on several real-world datasets establishes practical validity.

20.
The big data era is coming with strong and ever-growing demands on analyzing personal information and footprints in the cyber world. To enable such analysis without the risk of privacy leaks, differential privacy (DP) has risen quickly in recent years as the first practical privacy protection model with a rigorous theoretical guarantee. This paper discusses how to publish differentially private histograms on events in the time series domain, with sequences of personal events over graphs with events as edges. Such individual-generated sequences commonly appear in formalized industrial workflows, online game logs, and spatial-temporal trajectories. Directly publishing the statistics of sequences may compromise personal privacy. While existing DP mechanisms mainly target normalized domains with fixed and aligned dimensions, our problem raises new challenges when the sequences could follow arbitrary paths on the graph. To tackle the problem, we reformulate it with a three-step framework, which 1) carefully truncates the original sequences, trading off errors introduced by the truncation with those introduced by the noise added to guarantee privacy, 2) decomposes the event graph into path sub-domains based on a group of event pivots, and 3) employs a deeply optimized tree-based histogram construction approach for each sub-domain to benefit from less noise addition. We present a careful analysis of our framework to support thorough optimizations over each step, and verify the large improvements of our proposals over state-of-the-art solutions.
