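Research on Privacy-Preserving Data Publication for Clustering
Cite this article: Ni Weiwei, Chen Geng, Chong Zhihong, Wu Yingjie. Research on Privacy-Preserving Data Publication for Clustering [J]. 计算机研究与发展 (Journal of Computer Research and Development), 2012, 49(5): 1095-1104.
Authors: Ni Weiwei  Chen Geng  Chong Zhihong  Wu Yingjie
Affiliations: 1. College of Computer Science and Engineering, Southeast University, Nanjing 210096
2. School of Information Science, Nanjing Audit University, Nanjing 211815
Funding: National Natural Science Foundation of China; Open Fund of the Ministry of Education Key Laboratory of Network and Information Integration, Southeast University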
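Abstract: Privacy-preserving data publication seeks a compromise between protecting data privacy and maintaining data usability, and has received sustained attention from researchers in recent years. Both the motivation for and the goal of privacy-preserving data publication stem from the value of using the data. Clustering, an important step in realizing the deeper value of data, has been widely studied in data mining. Clustering depends on the characteristics of individual records, whereas the guiding idea of obfuscation is to weaken those characteristics; this conflict makes clustering-oriented privacy-preserving data publication difficult. This paper summarizes existing research on privacy-preserving data publication for clustering. From the perspective of the granularity at which clustering features are preserved, it analyzes the relationship among that granularity, clustering usability, and privacy security. From the perspective of maintaining clustering usability, it analyzes the principles and characteristics of the main obfuscation methods, including anonymization, randomization, data swapping, and synthetic data substitution. Based on an in-depth comparison of existing techniques, it points out open problems and future directions in clustering-oriented privacy-preserving data publication.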

Keywords: privacy preservation  cluster mining  data hiding  clustering usability  data publication

Privacy-Preserving Data Publication for Clustering
Ni Weiwei, Chen Geng, Chong Zhihong, Wu Yingjie. Privacy-Preserving Data Publication for Clustering [J]. Journal of Computer Research and Development, 2012, 49(5): 1095-1104.
Authors: Ni Weiwei  Chen Geng  Chong Zhihong  Wu Yingjie
Affiliation: 1. College of Computer Science and Engineering, Southeast University, Nanjing 210096; 2. School of Information Science, Nanjing Audit University, Nanjing 211815
Abstract: Privacy-preserving data publication, which seeks a trade-off between protecting data privacy and maintaining data utility, has attracted sustained attention in recent years. Clustering is a crucial step in advanced data analysis and has been widely studied in data mining. Clustering and data obfuscation, however, pull in opposite directions: clustering relies heavily on the characteristics of individual records to partition data into clusters, whereas obfuscation typically suppresses individual characteristics to avoid leaking individual privacy. It is therefore difficult to satisfy both data privacy and clustering utility in the published data. Various distortion and limited-distribution techniques have been applied to this problem. This paper surveys the state of the art in data obfuscation methods for clustering applications. It discusses how the granularity of the clustering features to be preserved constrains both clustering usability and the security of data privacy. It then compares the principles and merits of prevalent methods, such as data anonymization, data randomization, data swapping, and synthetic data substitution, in terms of how well they reconcile privacy preservation with clustering usability. Following a comprehensive analysis of existing techniques, some open problems and future directions are highlighted.
Keywords: privacy preservation  clustering  data obfuscation  clustering utility  data publication
This article is indexed in CNKI, Wanfang Data (万方数据), and other databases.