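K-Anonymity Feature Optimization Based on the Importance of Random Forest Features
Cite this article: Huang Mei, Zhu Yan. K-Anonymity Feature Optimization Based on the Importance of Random Forest Features[J]. Computer Applications and Software (计算机应用与软件), 2020, 37(3): 266-270.
Authors: Huang Mei (黄梅), Zhu Yan (朱焱)
Affiliation (both authors): College of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, Sichuan, China
Abstract: In the era of big data, sharing and mining data carries the risk of privacy leakage. Protecting privacy by K-anonymity can sharply reduce the performance of classification mining. To address this, the paper proposes RFKA, a K-anonymity feature selection algorithm for classification mining that is based on random forest feature importance. Random forest feature importance is used to measure the classification performance of each feature. A sequential forward search then selects, at each step, the feature that has the highest classification performance among those that do not violate K-anonymity, and adds it to the feature subset. The data set restricted to this feature subset is used to build models for the classification experiments. The experimental results show that the algorithm balances K-anonymity and classification performance more effectively and runs more efficiently.

Keywords: feature selection; K-anonymity; random forest; classification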

K-ANONYMITY FEATURE OPTIMIZATION BASED ON THE IMPORTANCE OF RANDOM FOREST FEATURES
Huang Mei, Zhu Yan. K-ANONYMITY FEATURE OPTIMIZATION BASED ON THE IMPORTANCE OF RANDOM FOREST FEATURES[J]. Computer Applications and Software, 2020, 37(3): 266-270.
Authors: Huang Mei, Zhu Yan
Affiliation: (College of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, Sichuan, China)
Abstract: In the era of big data, sharing and mining data carries a risk of privacy leakage. Using K-anonymity to protect privacy can greatly reduce the performance of classification mining. To address this problem, we propose RFKA, a K-anonymity feature selection algorithm for classification mining based on random forest feature importance. The classification performance of each feature is measured by its random forest feature importance. A sequential forward search then adds to the feature subset, at each step, the feature with the highest classification performance among those that do not violate K-anonymity. The data set corresponding to the feature subset is used to build models for the classification experiments. The results show that our algorithm balances K-anonymity and classification performance more effectively and runs more efficiently.
Keywords: Feature selection; K-anonymity; Random forest; Classification
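
The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' RFKA implementation, and it rests on several assumptions: every selected feature is treated as a quasi-identifier, a data set counts as K-anonymous when each combination of selected feature values occurs in at least k records, and the importance scores are supplied as a plain dictionary (in practice they might come from a random forest, for example the `feature_importances_` attribute of scikit-learn's `RandomForestClassifier`). The names `is_k_anonymous`, `select_features`, `rows` and `importance` are hypothetical.

```python
from collections import Counter


def is_k_anonymous(rows, features, k):
    """Return True if every combination of values over `features`
    occurs in at least k rows. An empty feature list is trivially fine."""
    if not features:
        return True
    counts = Counter(tuple(row[f] for f in features) for row in rows)
    return all(c >= k for c in counts.values())


def select_features(rows, importance, k):
    """Greedy forward selection (assumed reading of the abstract).

    Features are visited in order of decreasing importance; a feature is
    added only if the data restricted to the selected features plus this
    one is still k-anonymous. Adding features can only split groups, so a
    feature rejected once would also be rejected later, and a single pass
    is enough.
    """
    selected = []
    for feature in sorted(importance, key=importance.get, reverse=True):
        if is_k_anonymous(rows, selected + [feature], k):
            selected.append(feature)
    return selected


# Toy records; the values and importance scores are made up for illustration.
rows = [
    {"age": "30s", "zip": "611", "sex": "F"},
    {"age": "30s", "zip": "612", "sex": "F"},
    {"age": "40s", "zip": "611", "sex": "M"},
    {"age": "40s", "zip": "612", "sex": "M"},
]
importance = {"age": 0.5, "zip": 0.3, "sex": 0.2}

print(select_features(rows, importance, k=2))
```

With k=2 the sketch keeps `age`, rejects `zip` because each (age, zip) pair occurs only once, and then accepts `sex`, which does not split the existing groups any further.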
This article is indexed in VIP (维普), Wanfang Data (万方数据), and other databases.