K-Anonymity Feature Optimization Based on the Importance of Random Forest Features

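Cite this article:	Huang Mei, Zhu Yan. K-anonymity feature optimization based on the importance of random forest features [J]. Computer Applications and Software, 2020, 37(3): 266-270.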

Authors:	Huang Mei, Zhu Yan

Affiliation:	College of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, Sichuan, China (both authors)

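Abstract:	In the big data era, sharing and mining data carries a risk of privacy leakage. Protecting privacy through K-anonymity can sharply degrade classification performance. To address this, a K-anonymity feature selection algorithm based on random forest feature importance (RFKA) is proposed for classification mining. Random forest feature importance is used to measure the classification performance of each feature; a sequential forward search then adds to the feature subset, at each step, the feature with the highest classification performance that does not violate K-anonymity; finally, a model is built on the dataset restricted to the selected feature subset and evaluated in classification experiments. The experimental results show that the algorithm balances K-anonymity and classification performance more effectively and runs more efficiently.
Keywords:	Feature selection; K-anonymity; Random forest; Classification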
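
The K-anonymity condition that the abstract relies on can be illustrated with a small check: a table is K-anonymous over a set of features if every combination of values on those features is shared by at least K records. The sketch below is only an illustration of that definition. The function name `is_k_anonymous`, the dictionary row format, and the toy records are assumptions made here and do not come from the paper.

```python
from collections import Counter


def is_k_anonymous(rows, features, k):
    """Return True if every value combination over `features`
    appears in at least `k` rows.

    `rows` is a list of dicts (a hypothetical record format).
    """
    if not rows:
        return True
    # Group records by their values on the chosen features.
    groups = Counter(tuple(row[f] for f in features) for row in rows)
    # The smallest group decides whether the table is k-anonymous.
    return min(groups.values()) >= k


# Toy records, invented for illustration only.
rows = [
    {"age": "30-39", "zip": "6117*", "sex": "F"},
    {"age": "30-39", "zip": "6117*", "sex": "M"},
    {"age": "40-49", "zip": "6117*", "sex": "F"},
    {"age": "40-49", "zip": "6117*", "sex": "M"},
]

print(is_k_anonymous(rows, ["age", "zip"], 2))         # True
print(is_k_anonymous(rows, ["age", "zip", "sex"], 2))  # False
```

Adding a feature can only split existing groups further, which is why a larger feature subset is harder to keep K-anonymous.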
K-ANONYMITY FEATURE OPTIMIZATION BASED ON THE IMPORTANCE OF RANDOM FOREST FEATURES

Huang Mei, Zhu Yan. K-ANONYMITY FEATURE OPTIMIZATION BASED ON THE IMPORTANCE OF RANDOM FOREST FEATURES [J]. Computer Applications and Software, 2020, 37(3): 266-270.

Authors:	Huang Mei, Zhu Yan

Affiliation:	College of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, Sichuan, China

Abstract:	In the era of big data, sharing and mining data carries a risk of privacy leakage. Using K-anonymity to protect privacy can greatly reduce the performance of classification mining. To address this problem, we propose a K-anonymity feature selection algorithm (RFKA) based on random forest feature importance. The classification performance of each feature is measured by its random forest feature importance. A sequential forward search then selects, at each step, the feature with the highest classification performance that does not violate K-anonymity and adds it to the feature subset. The dataset restricted to the feature subset is used to build a model for classification experiments. The results show that our algorithm balances K-anonymity and classification performance more effectively and runs more efficiently.

Keywords:	Feature selection; K-anonymity; Random forest; Classification
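
The selection loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' RFKA implementation: the importance scores are taken as a precomputed dictionary (in practice they might come from something like scikit-learn's `RandomForestClassifier.feature_importances_`), scores are assumed fixed during the search, and the names `select_features` and `is_k_anonymous`, the row format, and the toy values are all hypothetical.

```python
from collections import Counter


def is_k_anonymous(rows, features, k):
    """True if every value combination over `features` covers >= k rows."""
    groups = Counter(tuple(row[f] for f in features) for row in rows)
    return not groups or min(groups.values()) >= k


def select_features(rows, importances, k):
    """Greedy forward selection (illustrative sketch).

    Visits features from most to least important and keeps each one
    only if the table stays k-anonymous on the enlarged subset.
    """
    selected = []
    for feature in sorted(importances, key=importances.get, reverse=True):
        candidate = selected + [feature]
        if is_k_anonymous(rows, candidate, k):
            selected = candidate
    return selected


# Toy records and made-up importance scores, for illustration only.
rows = [
    {"age": "30-39", "zip": "6117*", "sex": "F"},
    {"age": "30-39", "zip": "6117*", "sex": "M"},
    {"age": "40-49", "zip": "6117*", "sex": "F"},
    {"age": "40-49", "zip": "6117*", "sex": "M"},
]
importances = {"sex": 0.5, "age": 0.3, "zip": 0.2}

print(select_features(rows, importances, 2))  # ['sex', 'zip']
```

With fixed scores, a single pass in descending order of importance gives the same result as repeatedly picking the best admissible feature, because a feature that breaks K-anonymity for the current subset also breaks it for any larger one. Whether the paper recomputes importance during the search is not stated in the abstract.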
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).
