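MI Loss Evaluation Model for k-Anonymity in PPDM
Cite this article: GU Qingzhu, DONG Hongbin. MI Loss Evaluation Model for k-Anonymity in PPDM[J]. Computer Engineering, 2022, 48(4): 143-147.
Authors: GU Qingzhu, DONG Hongbin
Affiliation: School of Cyber Science and Engineering, Wuhan University, Wuhan 430000, China
Funding: National Natural Science Foundation of China, "Research on the Continuous Immune Response Mechanism of Computer Immune Intelligence and Its Application" (61877045)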
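Abstract: Privacy Preserving Data Mining (PPDM) uses methods such as anonymization so that data owners can safely publish data sets that remain useful for data mining without disclosing private information. The k-anonymity algorithm, one of the most widely used algorithms in PPDM research, offers low computational overhead, little data distortion, and resistance to linking attacks. However, the data utility evaluation models used in some k-anonymity studies assign attribute weights unreasonably, so the anonymized data set that the algorithm selects as optimal yields low accuracy in subsequent classification tasks. This paper proposes a Mutual Information Loss (MI Loss) evaluation model that derives these weights from mutual information. Mutual information (MI) reflects the dependence between variables. The MI Loss model computes each quasi-identifier's weight from its mutual information with the class label, obtains each quasi-identifier's information loss with the Loss formula, and takes the weighted sum of these losses as the information loss of the whole data set, thereby remedying the shortcoming of existing evaluation models. Experimental results show that using the MI Loss model to guide the k-anonymity algorithm markedly reduces the utility loss of the anonymized data set in subsequent classification; compared with the Loss and Entropy Loss models, it improves classification accuracy by 0.73% to 3.00%.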
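The abstract describes the MI Loss computation only at a high level. The Python sketch below illustrates one possible reading of it, under assumptions that the abstract does not state: mutual information is estimated empirically (in bits) from the original, un-anonymized quasi-identifier columns; the weights are the MI values normalized to sum to 1; and the "Loss formula" is taken to be the commonly used Iyengar-style loss metric (interval width over domain width for numeric attributes, and (leaves under the generalized node - 1) / (total leaves - 1) for categorical taxonomies). The functions `mutual_information`, `numeric_loss`, `categorical_loss`, and `mi_loss`, and the toy data, are hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Empirical mutual information I(X; Y), in bits, of two discrete columns."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


def numeric_loss(lo: float, hi: float, dom_lo: float, dom_hi: float) -> float:
    """Assumed Loss of a numeric value generalized to [lo, hi]: interval width / domain width."""
    return 0.0 if dom_hi == dom_lo else (hi - lo) / (dom_hi - dom_lo)


def categorical_loss(leaves_under_node: int, total_leaves: int) -> float:
    """Assumed Loss of a categorical value generalized to a taxonomy node."""
    return 0.0 if total_leaves <= 1 else (leaves_under_node - 1) / (total_leaves - 1)


def mi_loss(
    per_qi_losses: Dict[str, List[float]],
    original_qis: Dict[str, Sequence[Hashable]],
    labels: Sequence[Hashable],
) -> float:
    """Hypothetical MI Loss: sum over QIs of w_j * (mean per-record Loss of QI j),
    with w_j = I(QI_j; label) / sum_k I(QI_k; label)."""
    mis = {q: mutual_information(col, labels) for q, col in original_qis.items()}
    total = sum(mis.values())
    if total == 0:
        # Assumed fallback: no QI carries label information, so weight them equally.
        weights = {q: 1 / len(mis) for q in mis}
    else:
        weights = {q: m / total for q, m in mis.items()}
    return sum(weights[q] * sum(ls) / len(ls) for q, ls in per_qi_losses.items())


# Hypothetical toy table: age determines the label, ZIP is independent of it.
labels = ["yes", "yes", "no", "no"]
original = {"age": [25, 27, 45, 47], "zip": ["A", "B", "A", "B"]}

# One candidate anonymization: ages coarsened to [20, 30] / [40, 50]
# (age domain assumed to be [0, 100]); ZIP fully generalized to "*" (2 leaves).
losses = {
    "age": [numeric_loss(20, 30, 0, 100), numeric_loss(20, 30, 0, 100),
            numeric_loss(40, 50, 0, 100), numeric_loss(40, 50, 0, 100)],
    "zip": [categorical_loss(2, 2)] * 4,
}

# Baseline for comparison: every QI weighted equally.
equal_weight = sum(sum(ls) / len(ls) for ls in losses.values()) / len(losses)
print(f"MI Loss:           {mi_loss(losses, original, labels):.2f}")
print(f"Equal-weight Loss: {equal_weight:.2f}")
```

On this hypothetical data, ZIP has zero mutual information with the label, so its full generalization receives weight 0 and the data-set loss is 0.1, while an equal-weight average of the two per-attribute losses is 0.55. The equal-weight score penalizes a generalization that cannot hurt classification, which is the kind of mis-weighting the abstract attributes to existing evaluation models.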

Keywords: Privacy Preserving Data Mining (PPDM); k-anonymity algorithm; data utility; classification accuracy; MI Loss evaluation model
Received: 2021-05-20
Revised: 2021-07-10
