Similar Literature
19 similar documents found.
1.
兰丽辉, 鞠时光, 金华. 《计算机科学》, 2011, 38(11): 156-160
Social network data should be published to support scientific research and data sharing, but releasing such data directly compromises individual privacy, so privacy protection must accompany publication. For the scenario in which an attacker uses neighborhood information as background knowledge to re-identify target nodes, a privacy protection scheme based on k-anonymous publication is proposed. Different privacy protection levels are set according to each individual's privacy requirements, so that data can be shared to the greatest extent and its utility improved. The KNP algorithm for anonymous publication is designed and implemented and validated on datasets; experimental results show that it effectively resists neighborhood attacks.
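The abstract does not spell out the KNP algorithm itself; as a rough, hypothetical illustration of the kind of check involved, the sketch below groups nodes by a simplified one-hop neighborhood signature and tests whether every group reaches size k (the signature, the networkx example graph, and all names are assumptions, not the authors' construction).

    from collections import defaultdict
    import networkx as nx

    def neighborhood_signature(G, v):
        # Simplified one-hop signature: the sorted degrees of v's neighbors
        # (a stand-in for the richer neighborhood structure an attacker may know).
        return tuple(sorted(G.degree(u) for u in G.neighbors(v)))

    def is_neighborhood_k_anonymous(G, k):
        groups = defaultdict(list)
        for v in G.nodes:
            groups[neighborhood_signature(G, v)].append(v)
        # Every signature class must contain at least k nodes, otherwise an
        # attacker knowing a target's neighborhood can narrow it to < k candidates.
        return all(len(members) >= k for members in groups.values())

    G = nx.karate_club_graph()
    print(is_neighborhood_k_anonymous(G, 2))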

2.
Because the risk of privacy leakage keeps growing and collected data typically contain a large amount of private information, data collectors are reluctant to share their data, creating "data silos". Federated learning enables data sharing without data leaving the local site, but it still faces problems in multi-institution sharing: on one hand, centralized processing on a central server is costly and introduces a single point of failure; on the other hand, malicious nodes mixed into the participants may disturb the training process and leak private data. Based on this analysis, this paper proposes a new distributed federated learning architecture that combines blockchain and federated learning to achieve efficient node selection and communication, removing the central server and allowing participating nodes to communicate directly. On top of this architecture, a reputation-based node selection scheme (RBLNS) is proposed to screen participating nodes and protect their privacy. Simulation results show that RBLNS significantly improves the experimental performance of the model.
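RBLNS itself is not described in detail in the abstract; the following minimal sketch only illustrates generic reputation-based participant selection (the threshold, update rule, and node names are assumptions for illustration, not the paper's scheme).

    def select_participants(reputation, k, threshold=0.5):
        # Pick the k most reputable nodes whose reputation meets the threshold.
        eligible = [(node, r) for node, r in reputation.items() if r >= threshold]
        eligible.sort(key=lambda item: item[1], reverse=True)
        return [node for node, _ in eligible[:k]]

    def update_reputation(reputation, node, contribution_ok, lr=0.1):
        # Simple exponential update: reward useful model updates, penalise suspicious ones.
        target = 1.0 if contribution_ok else 0.0
        reputation[node] = (1 - lr) * reputation[node] + lr * target

    reputation = {"nodeA": 0.9, "nodeB": 0.4, "nodeC": 0.7}
    print(select_participants(reputation, k=2))   # ['nodeA', 'nodeC']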

3.
Data perturbation based on differential privacy is a current research focus in privacy protection. To protect sensitive data with differential privacy while keeping the data as usable as possible, reasonable setting of the privacy parameter and post-processing of the noisy data are key techniques. This paper proposes RBPPA, an algorithm for setting the privacy parameter, and DPSRUKF, an algorithm for optimizing the noisy data. RBPPA builds the privacy parameter on the reputation of data accessors and contributors and links it to the data's privacy level and access permission value, yielding a fine-grained privacy parameter setting scheme; DPSRUKF applies a square-root unscented Kalman filter to the noisy data, improving the utility of the differentially private data. Experimental analysis shows that the algorithms achieve fine-grained privacy parameter setting and higher accuracy of the optimized noisy data, providing security guarantees for sensitive data applications while offering data accessors high data utility.
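DPSRUKF's square-root unscented Kalman filter is not reproduced here; the sketch below only conveys the general idea of post-filtering Laplace-perturbed values, using a much simpler one-dimensional linear Kalman filter with assumed process and measurement variances.

    import numpy as np

    def kalman_smooth(noisy_series, process_var=1e-3, measurement_var=2.0):
        # Treat each differentially private value as a noisy measurement of a
        # slowly varying true statistic and filter it with a 1-D Kalman filter.
        x, p = noisy_series[0], 1.0            # initial state estimate and variance
        smoothed = [x]
        for z in noisy_series[1:]:
            p += process_var                    # predict
            gain = p / (p + measurement_var)    # Kalman gain
            x += gain * (z - x)                 # update with the noisy observation
            p *= (1 - gain)
            smoothed.append(x)
        return np.array(smoothed)

    rng = np.random.default_rng(0)
    true_counts = np.linspace(100, 120, 50)
    noisy = true_counts + rng.laplace(scale=2.0, size=50)   # Laplace-perturbed data
    print(np.abs(kalman_smooth(noisy) - true_counts).mean())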

4.
胡闯, 杨庚, 白云璐. 《计算机科学》, 2019, 46(2): 120-126
Data mining in the big data era has advanced considerably in both research and applications, but the disclosure of large amounts of sensitive information exposes users to many threats and losses. How to protect data privacy during cluster analysis has therefore become a hot topic in data mining and privacy protection. The traditional differentially private k-means algorithm is sensitive to the choice of initial centers and chooses the number of clusters k somewhat blindly, which reduces the utility of the clustering results. To further improve the utility of differentially private k-means clustering, this paper proposes a new DPk-means-up clustering algorithm based on differential privacy, together with theoretical analysis and comparative experiments. The theoretical analysis shows that the algorithm satisfies ε-differential privacy and is applicable to datasets of different sizes and dimensions. Experimental results show that, at the same privacy level, the proposed algorithm yields better clustering utility than other differentially private k-means methods.
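For orientation only, the sketch below shows the standard way Laplace noise typically enters one differentially private k-means iteration, i.e. perturbing the per-cluster sums and counts before recomputing centroids; it is not the DPk-means-up algorithm, whose centre initialisation and choice of k are its actual contributions, and the budget split is an assumption.

    import numpy as np

    def dp_kmeans_step(X, centers, epsilon, rng):
        # One noisy k-means iteration; assumes features are scaled to [0, 1],
        # so each cluster sum has L1 sensitivity d and each count has sensitivity 1.
        k, d = centers.shape
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            # Half of epsilon for the sum, half for the count (an illustrative split).
            noisy_sum = members.sum(axis=0) + rng.laplace(scale=2 * d / epsilon, size=d)
            noisy_count = len(members) + rng.laplace(scale=2 / epsilon)
            if noisy_count > 1:
                new_centers[j] = np.clip(noisy_sum / noisy_count, 0, 1)
        return new_centers, labels

    rng = np.random.default_rng(1)
    X = rng.random((500, 2))
    centers = rng.random((3, 2))
    for _ in range(5):
        centers, labels = dp_kmeans_step(X, centers, epsilon=1.0, rng=rng)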

5.
With the development of blockchain technology, on-chain data sharing is becoming increasingly important. Blockchain transaction data is currently open and transparent on the chain, which limits the sharing of private data, and the Hyperledger Fabric platform lacks support for Chinese national cryptographic algorithms, restricting its domestic use. This paper first retrofits the Hyperledger Fabric platform with national cryptographic algorithms; it then proposes a transaction data privacy protection scheme that uses these algorithms to achieve secure and time-limited sharing of transaction data; finally, it implements and benchmarks both the modified platform and the proposed scheme. Experimental results show that the method completes the national-cryptography retrofit of Hyperledger Fabric and that the scheme's execution efficiency and system performance both meet practical requirements.

6.
Differential Privacy Protection and Its Applications
Privacy protection in data publishing and data mining is currently a research hotspot in information security. As a rigorous and provable definition of privacy, differential privacy has attracted great attention and been widely studied in recent years. This paper analyses the advantages of the differential privacy model over traditional security models and surveys the theoretical foundations of differential privacy and its applications in data publishing and data mining. On the publishing side, it reviews interactive and non-interactive differentially private publication methods and compares them chiefly in terms of accuracy and sample complexity. On the mining side, it describes how differentially private data mining algorithms are realised in the interface-based and full-access modes and analyses their performance. Finally, it introduces applications of differential privacy in other fields and outlines future research directions.
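As a concrete anchor for the interactive publication methods the survey covers, here is a minimal sketch of the Laplace mechanism for a counting query; this is the standard construction, with the data and parameter names invented for illustration.

    import numpy as np

    def laplace_mechanism(true_value, sensitivity, epsilon, rng=None):
        # Epsilon-differentially private answer to a numeric query: the noise scale
        # is sensitivity / epsilon, and a counting query has L1 sensitivity 1 because
        # adding or removing one record changes the count by at most 1.
        rng = rng or np.random.default_rng()
        return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

    ages = [23, 35, 41, 29, 52, 60, 33]
    true_count = sum(1 for a in ages if a > 30)
    print(laplace_mechanism(true_count, sensitivity=1, epsilon=0.5))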

7.
This paper introduces the basic concepts and mechanisms of differential privacy and proposes a differentially private data publishing algorithm for decision tree analysis. The algorithm first fully generalises the data, then uses the exponential mechanism to specialise it step by step within a given privacy budget, and finally adds Laplace noise so that the whole procedure satisfies differential privacy; it also improves the candidate-selection strategy within the exponential mechanism. Compared with existing algorithms, it produces less generalised data under the same privacy budget, so decision tree models built on the published data achieve higher classification accuracy. Experimental results confirm the algorithm's effectiveness and its advantage over other approaches.
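The paper's improved selection strategy is not reproduced here; the sketch below only shows the generic exponential mechanism the algorithm builds on, applied to a toy choice of which attribute to specialise next (the candidate names and utility scores are made up).

    import numpy as np

    def exponential_mechanism(candidates, utility, epsilon, sensitivity, rng=None):
        # Sample one candidate with probability proportional to
        # exp(epsilon * utility / (2 * sensitivity)).
        rng = rng or np.random.default_rng()
        scores = np.array([utility(c) for c in candidates], dtype=float)
        weights = np.exp(epsilon * (scores - scores.max()) / (2 * sensitivity))  # stable
        probs = weights / weights.sum()
        return candidates[rng.choice(len(candidates), p=probs)]

    splits = ["age", "zipcode", "gender"]
    info_gain = {"age": 0.30, "zipcode": 0.55, "gender": 0.10}
    print(exponential_mechanism(splits, lambda a: info_gain[a],
                                epsilon=1.0, sensitivity=1.0))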

8.
Existing research on sharing medical data suffers from patients' lack of control over their own data, privacy leakage, and low sharing efficiency. To address these problems, a personal medical data sharing scheme based on attribute-based encryption is proposed. Its advantages are that patients set their own sharing policies, enabling fine-grained data sharing; an attribute revocation algorithm updates users' sharing permissions in time; searchable encryption supports multi-keyword encrypted retrieval; and a medical scenario is described to validate the scheme's rationality. Security analysis and experimental results show that the scheme shares personal medical data with low communication overhead while protecting patient privacy.

9.
Because collaborative filtering relies on users' data, it carries a high risk of privacy leakage. Differential privacy offers rigorous privacy guarantees, but most existing differentially private recommendation algorithms ignore implicit feedback. To address this, a new differentially private collaborative filtering algorithm is proposed. It first factorises the implicit feedback matrix to obtain latent feature vectors for users and items, then fuses these vectors into the explicit feedback model; mean perturbation and gradient perturbation are added during model fitting so that the algorithm satisfies ε-differential privacy. Finally, the algorithm is used to predict ratings and is evaluated on the MovieLens dataset. Experimental results show that it achieves an effective balance between recommendation accuracy and user privacy.
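The fusion of implicit and explicit feedback is the paper's contribution and is not shown here; the sketch below only illustrates the generic gradient-perturbation pattern in a matrix-factorisation update, with the clipping bound, noise calibration, and toy ratings chosen purely for illustration.

    import numpy as np

    def dp_sgd_step(U, V, ratings, lr=0.01, reg=0.02, epsilon=1.0, clip=1.0, rng=None):
        # One SGD pass over observed (user, item, rating) triples with
        # clipped, Laplace-perturbed gradients.
        rng = rng or np.random.default_rng()
        for (u, i, r) in ratings:
            err = r - U[u] @ V[i]
            grad_u = -err * V[i] + reg * U[u]
            grad_v = -err * U[u] + reg * V[i]
            grad_u = np.clip(grad_u, -clip, clip) + rng.laplace(scale=clip / epsilon, size=U.shape[1])
            grad_v = np.clip(grad_v, -clip, clip) + rng.laplace(scale=clip / epsilon, size=V.shape[1])
            U[u] -= lr * grad_u
            V[i] -= lr * grad_v
        return U, V

    rng = np.random.default_rng(0)
    U, V = rng.normal(size=(100, 8)), rng.normal(size=(50, 8))
    ratings = [(3, 7, 4.0), (10, 2, 5.0), (42, 7, 3.0)]
    U, V = dp_sgd_step(U, V, ratings, rng=rng)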

10.
The traditional interactive differential privacy model perturbs query results and cannot satisfy users' diverse needs for the data. To use data effectively while meeting privacy requirements, the idea of local differential privacy is applied, building linkage-attack protection for the dataset on top of randomized response. First, given the distribution of the original data, the paper studies how to better choose the random transition matrix P so as to achieve linkage privacy protection on the basis of data utility and privacy, thereby preventing identity and attribute disclosure. Second, it discusses privacy protection methods and utility maximisation for sensitive attributes, quasi-identifiers, and their combinations, and gives a data perturbation algorithm. Finally, with the mean and variance of the data distribution known, experiments measure the KL divergence and chi-square statistic between the original and perturbed data. The results show that the randomisation incurs only a small utility loss.
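A concrete instance of randomized response over a categorical attribute looks like the sketch below; it uses the standard k-ary construction (whose transition matrix P keeps the true value on the diagonal), with the category set and epsilon chosen for illustration rather than taken from the paper.

    import numpy as np

    def randomized_response(value, categories, epsilon, rng=None):
        # Keep the true category with probability e^eps / (e^eps + k - 1),
        # otherwise report a uniformly random different category.
        rng = rng or np.random.default_rng()
        k = len(categories)
        p_keep = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
        if rng.random() < p_keep:
            return value
        others = [c for c in categories if c != value]
        return others[rng.integers(len(others))]

    categories = ["A", "B", "O", "AB"]
    reports = [randomized_response("O", categories, epsilon=1.0) for _ in range(10)]
    print(reports)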

11.
In current software defect prediction (SDP) research, most previous empirical studies only use datasets provided by the PROMISE repository, and this may threaten the external validity of previous empirical results. Instead of SDP dataset sharing, SDP model sharing is a potential solution to alleviate this problem and can encourage researchers in the research community and practitioners in the industrial community to share more models. However, directly sharing models may result in privacy disclosure, such as model inversion attacks. To the best of our knowledge, we are the first to apply differential privacy (DP) to privacy-preserving SDP model sharing and then propose a novel method DP-Share, since DP mechanisms can prevent this attack when the privacy budget is carefully selected. In particular, DP-Share first performs data preprocessing for the dataset, such as over-sampling for minority instances (i.e., defective modules) and conducting discretization for continuous features to optimize privacy budget allocation. Then, it uses a novel sampling strategy to create a set of training sets. Finally, it constructs decision trees based on these training sets, and these decision trees form a random forest (i.e., the model). The last phase of DP-Share uses Laplace and exponential mechanisms to satisfy the requirements of DP. In our empirical studies, we choose nine experimental subjects from real software projects. Then, we use AUC (area under ROC curve) as the performance measure and holdout as our model validation technique. After privacy and utility analysis, we find that DP-Share can achieve better performance than a baseline method DF-Enhance in most cases when using the same privacy budget. Moreover, we also provide guidelines to effectively use our proposed method. Our work attempts to fill the research gap in terms of differential privacy for SDP, which can encourage researchers and practitioners to share more SDP models and thus effectively advance the state of the art of SDP.

12.

In this paper, new algorithms are proposed to address the motif finding problem: cuckoo search, modified cuckoo search, and a hybrid of the gravitational search and particle swarm optimization algorithms. Motif finding is the task of reliably locating expressive motifs in huge DNA sequences. DNA motif finding is important because it plays a significant role in understanding gene regulation. Existing motif finding programs show low accuracy and cannot be used to find motifs across different types of datasets. Practical tests of the proposed nature-inspired algorithms are run first on synthetic datasets and then on benchmark real datasets. The results reveal that the hybrid of the gravitational search and particle swarm algorithms gives higher precision and recall values, with an average F-score improvement of up to 0.24 compared to other existing algorithms and tools, and that cuckoo search and modified cuckoo search are also able to successfully locate motifs in DNA sequences.


13.
With increasing digitization, more and more information is collected from individuals and organizations, leading to several privacy concerns. These risks are further heightened in the mobile realm, where data collection can occur continuously and ubiquitously. When individuals use their own devices in work settings, these issues become concerns for organizations as well. The question then is how to ensure individuals perform proper information protection behaviors on mobile devices. In this research, we develop a model of mobile information protection based on an integration of the Theory of Planned Behavior and the information privacy literature to explore the antecedents of individuals' attitudes towards sharing information on their mobile devices, their intentions to use protective settings, and their actual practices. The model is tested with data from 228 iPhone users. The results indicate that mobile information protection intention leads to actual privacy settings practice, and that attitude towards information sharing and mobile privacy protection self-efficacy affect this intention. Determinants of attitude towards information sharing include mobile privacy concern and trust of the mobile platform. Finally, prior invasion experience is related to privacy concern. These findings provide insights into factors that can be targeted to enhance individuals' protective actions to limit the amount of digital information they share via their smartphones.

14.
For the motif discovery problem on chromatin immunoprecipitation sequencing (ChIP-Seq) datasets produced by next-generation sequencing (NGS), a motif discovery algorithm based on Fisher's exact test, FisherNet, is proposed. It first uses Fisher's exact test to compute the P-value of every k-mer and selects motif seeds; it then builds a position weight matrix for the initial motif; finally, it scans all k-mers with the position weight matrix to form the final motif. Validation on ChIP-Seq datasets of mouse embryonic stem cells (mESC) and erythrocytes and a human lymphoblastoid cell line, as well as data from the ENCODE database, shows that the algorithm is more accurate and faster than other common motif discovery algorithms and can find more than 80% of the known core motifs of transcription factors and their co-regulator motifs. The algorithm maintains high accuracy while scaling to large sequencing datasets.
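The seed-scoring step, i.e. testing whether a k-mer is enriched in the ChIP-Seq foreground relative to background with Fisher's exact test, can be illustrated as below; the counts and threshold are made up, and FisherNet's subsequent position-weight-matrix refinement is omitted.

    from scipy.stats import fisher_exact

    def kmer_pvalue(fg_with, fg_total, bg_with, bg_total):
        # One-sided test that the k-mer occurs more often in foreground sequences.
        table = [[fg_with, fg_total - fg_with],
                 [bg_with, bg_total - bg_with]]
        _, p = fisher_exact(table, alternative="greater")
        return p

    # Toy counts: the k-mer appears in 120 of 400 peak sequences but only
    # 60 of 800 background sequences.
    p = kmer_pvalue(fg_with=120, fg_total=400, bg_with=60, bg_total=800)
    print(p, p < 1e-6)   # keep as a motif seed if sufficiently significant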

15.
To resolve the incompatibility between accuracy and time complexity in earlier shape motif discovery algorithms, a shape motif query algorithm based on wavelets and dynamic time warping (DTW) is proposed. The algorithm first applies the wavelet transform to reduce dimensionality and prune the data, lowering the complexity of motif search; it then exploits DTW's high matching accuracy to find shape motifs, combined with a v-shift formula that ignores vertical offsets, so that motifs which are similar in shape but differ in scale, as often happens in the real world, can still be matched. Experimental results show that the method achieves high matching accuracy at low computational cost, finds shape-similar motifs without false dismissals, and is therefore of strong practical value.
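A minimal dynamic time warping distance, together with the mean-removal step that the v-shift idea of ignoring vertical offsets suggests, might look like the sketch below (the wavelet pruning stage is omitted and all names are illustrative).

    import numpy as np

    def dtw_distance(a, b):
        # Classic O(len(a) * len(b)) dynamic time warping distance.
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = abs(a[i - 1] - b[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]

    def v_shift(x):
        # Remove the vertical offset so shapes are compared regardless of level.
        x = np.asarray(x, dtype=float)
        return x - x.mean()

    s1 = [1, 2, 3, 2, 1]
    s2 = [11, 12, 13, 13, 12, 11]          # similar shape, shifted upwards
    print(dtw_distance(v_shift(s1), v_shift(s2)))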

16.
Time series motifs are sets of very similar subsequences of a long time series. They are of interest in their own right, and are also used as inputs in several higher-level data mining algorithms including classification, clustering, rule discovery and summarization. In spite of extensive research in recent years, finding time series motifs exactly in massive databases remains an open problem. Previous efforts either found approximate motifs or considered relatively small datasets residing in main memory. In this work, we build on previous work on pivot-based indexing to introduce a disk-aware algorithm that finds time series motifs exactly in multi-gigabyte databases containing on the order of tens of millions of time series. We have evaluated our algorithm on datasets from diverse areas including medicine, anthropology, computer networking and image processing, and show that we can find interesting and meaningful motifs in datasets that are many orders of magnitude larger than anything considered before.

17.

A large amount of data and applications need to be shared with various parties and stakeholders in the cloud environment for storage, computation, and data utilization. Since a third party operates the cloud platform, owners cannot fully trust this environment, and it has become a challenge to preserve privacy while sharing data effectively among different parties. This paper proposes a novel model that partitions data into sensitive and non-sensitive parts, injects noise into the sensitive data, and performs classification tasks using k-anonymization, differential privacy, and machine learning approaches. It allows multiple owners to share their data in the cloud environment for various purposes. The model specifies a communication protocol among the multiple untrusted parties involved in processing the owners' data, and preserves the actual data by providing a robust mechanism. Experiments are performed over the Heart Disease, Arrhythmia, Hepatitis, Indian-liver-patient, and Framingham datasets with Support Vector Machine, K-Nearest Neighbor, Random Forest, Naive Bayes, and Artificial Neural Network classifiers to evaluate the proposed model in terms of accuracy, precision, recall, and F1-score. The achieved results reach accuracy, precision, recall, and F1-score of up to 93.75%, 94.11%, 100%, and 87.99%, with improvements of up to 16%, 29%, 12%, and 11%, respectively, compared to previous works.
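A rough sketch of the partition-perturb-classify pipeline the model describes could look like the following; the column split, Laplace noise scale, and choice of a random forest classifier are assumptions made for illustration, not the paper's exact protocol.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    # Toy dataset: two "sensitive" and two "non-sensitive" numeric features.
    X = rng.random((1000, 4))
    y = (X[:, 0] + X[:, 2] > 1.0).astype(int)
    sensitive_cols = [0, 1]

    # Inject Laplace noise only into the sensitive part before sharing.
    epsilon = 1.0
    X_shared = X.copy()
    X_shared[:, sensitive_cols] += rng.laplace(scale=1.0 / epsilon,
                                               size=(len(X), len(sensitive_cols)))

    X_tr, X_te, y_tr, y_te = train_test_split(X_shared, y, test_size=0.3, random_state=0)
    clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    print("accuracy on perturbed data:", clf.score(X_te, y_te))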


18.
Differential privacy is a model that offers strong privacy guarantees. In the non-interactive setting, a data curator can publish a dataset processed with differential privacy for researchers to mine and analyse, but the large amount of noise that must be added during publication damages data utility. This paper therefore proposes a differentially private mixed-data publishing algorithm based on k-prototype clustering. It first improves the k-prototype clustering algorithm by using different attribute dissimilarity measures for numeric and categorical attributes, grouping records in the mixed dataset that are more likely to be related and thereby lowering the differential privacy sensitivity. It then protects the records with differential privacy using the cluster centre values, applying the Laplace mechanism to numeric attributes and the exponential mechanism to categorical ones. The privacy of the algorithm is proved from both the definition and the composition properties of differential privacy. Experimental results show that the algorithm effectively improves data utility.
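The mixed-attribute dissimilarity at the heart of k-prototype clustering, squared distance on numeric attributes plus a weighted mismatch count on categorical ones, can be sketched as follows; the weight gamma, the record layout, and the example values are illustrative, and the Laplace/exponential noise addition described in the abstract is omitted.

    import numpy as np

    def mixed_dissimilarity(record, prototype, num_idx, cat_idx, gamma=1.0):
        # k-prototype style distance between a mixed record and a cluster prototype.
        num_r = np.array([record[i] for i in num_idx], dtype=float)
        num_p = np.array([prototype[i] for i in num_idx], dtype=float)
        numeric_part = ((num_r - num_p) ** 2).sum()
        categorical_part = sum(record[i] != prototype[i] for i in cat_idx)
        return numeric_part + gamma * categorical_part

    record    = [35, 72.5, "engineer", "female"]
    prototype = [40, 70.0, "engineer", "male"]
    print(mixed_dissimilarity(record, prototype, num_idx=[0, 1], cat_idx=[2, 3], gamma=2.0))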

19.
Sharing cyber security information helps firms decrease cyber security risks, prevent attacks, and increase their overall resilience, and hence helps reduce the overall social cost of security. Although cyber security information sharing was previously performed in an informal and ad hoc manner, nowadays, through the development of information sharing and analysis centers (ISACs), it has become more structured, regular, and frequent. At the same time, privacy risk and information disclosure concerns remain major challenges faced by ISACs and act as barriers to realizing their potential impact. This paper provides insights on decisions about security investments and information sharing in consideration of privacy risk and security knowledge growth. By the latter concept, i.e., security knowledge growth, we mean fusing the collected security information, adding prior knowledge, and performing extra analyses to enrich the shared information. The impact of this concept on increasing firms' motivation to voluntarily share their sensitive information with authorities such as ISACs is analytically studied for the first time in this paper. We propose a differential game model in which a linear fusion model characterizes the process of knowledge growth via the ISAC. The Nash equilibrium of the proposed game, including the optimal security investment and the data-sharing thresholds under the price of privacy, is highlighted. We analytically find the threshold at which the gain achieved by sharing sensitive information outweighs the privacy risks, so that firms have a natural incentive to share their security information. Moreover, since the data-sharing threshold and security investment levels chosen at the Nash equilibrium may be lower than the social optimum, we design mechanisms that encourage firms and lead to a socially optimal outcome. The direct impact of these results is on analyzing how ISACs can convince firms to share their security information with them.
