
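A phishing website detection method based on feature selection and ensemble learning
Cite this article: Zhou Chuanhua, Liu Zhicai, Ding Jing'an, Zhou Jiayi. A phishing website detection method based on feature selection and ensemble learning[J]. Application Research of Computers (计算机应用研究), 2019, 36(4).
Authors: Zhou Chuanhua (周传华)  Liu Zhicai (柳智才)  Ding Jing'an (丁敬安)  Zhou Jiayi (周家亿)
Affiliations: School of Management Science and Engineering, Anhui University of Technology, Ma'anshan, Anhui 243002; School of Computer Science and Technology, University of Science and Technology of China, Hefei 230026; School of Management Science and Engineering, Anhui University of Technology, Ma'anshan, Anhui 243002; IPS School, Waseda University, Tokyo, Japan
Funding: National Natural Science Foundation of China; National Natural Science Foundation of China; Innovation Project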
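Abstract: To address the low detection accuracy and high false positive rate of most current phishing website detection methods, this paper proposes a phishing website detection method based on feature selection and ensemble learning. The method first performs feature selection with the FSIGR algorithm. FSIGR combines the advantages of the filter and wrapper approaches: it measures each feature jointly in terms of information relevance and classification ability, selects features with a forward-addition, backward recursive elimination strategy, and evaluates candidate feature subsets by classification accuracy to obtain the optimal feature subset. The data restricted to the optimal feature subset are then used to train a random forest classification model. Experiments on a UCI dataset show that the proposed method effectively improves the accuracy of phishing website detection and reduces the false positive rate, and is of practical value.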

Keywords: phishing website  random forest  information gain ratio  feature selection
Received: 2017/10/30
Revised: 2019/3/1

Method of phishing website detection based on feature selection and integrated learning
Zhou Chuanhua, Liu Zhicai, Ding Jing'an and Zhou Jiayi. Method of phishing website detection based on feature selection and integrated learning[J]. Application Research of Computers, 2019, 36(4).
Authors: Zhou Chuanhua  Liu Zhicai  Ding Jing'an and Zhou Jiayi
Affiliation: School of Management Science and Engineering, Anhui University of Technology
Abstract: Most existing phishing website detection methods suffer from low detection accuracy and a high false positive rate. To address this, the paper proposes a phishing website detection method based on feature selection and ensemble learning. Feature selection is performed with the FSIGR algorithm, which combines the advantages of the filter and wrapper approaches. First, it scores each feature on two aspects, information correlation and classification ability. Second, it selects features with a forward-addition, backward recursive elimination strategy, using classification accuracy as the criterion for evaluating feature subsets. Finally, it outputs the optimal feature subset. The data for the optimal feature subset are then used to train a random forest classification model. Experiments on the UCI dataset show that the method improves the accuracy of phishing website detection and reduces the false positive rate.
Keywords:phishing website  random forest  information gain ratio  feature selection
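
The abstract gives only an outline of FSIGR, so the following is a minimal sketch of the general idea, not the authors' algorithm. It assumes the filter step ranks discrete features by information gain ratio (taken from the keywords), and that the wrapper step adds features in ranked order and then drops any whose removal does not lower accuracy. All names here (`gain_ratio`, `select_features`, `lookup_accuracy`) are hypothetical, and the toy majority-vote scorer is only a dependency-free stand-in for the random forest accuracy the paper uses.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(column, labels):
    """Information gain of a discrete feature divided by its split information."""
    n = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy(column)
    if split_info == 0:  # a constant feature cannot separate the classes
        return 0.0
    return (entropy(labels) - conditional) / split_info


def select_features(X, y, score):
    """Forward addition followed by backward elimination (assumed strategy).

    X is a list of rows, y the labels, and score(X, y, subset) returns an
    accuracy-like number for the feature indices in `subset`.
    """
    ranked = sorted(range(len(X[0])),
                    key=lambda j: gain_ratio([row[j] for row in X], y),
                    reverse=True)
    subset, best = [], float("-inf")
    # Forward phase: keep a feature only if it strictly improves the score.
    for j in ranked:
        s = score(X, y, subset + [j])
        if s > best:
            subset, best = subset + [j], s
    # Backward phase: drop features whose removal does not lower the score.
    changed = True
    while changed and len(subset) > 1:
        changed = False
        for j in list(subset):
            trial = [k for k in subset if k != j]
            s = score(X, y, trial)
            if s >= best:
                subset, best, changed = trial, s, True
                break
    return subset, best


def lookup_accuracy(X, y, subset):
    """Toy stand-in scorer: training accuracy of a majority-vote lookup table."""
    votes = {}
    for row, label in zip(X, y):
        key = tuple(row[j] for j in subset)
        votes.setdefault(key, Counter())[label] += 1
    return sum(max(c.values()) for c in votes.values()) / len(y)


# Made-up data: feature 0 determines the label, feature 1 is noise,
# feature 2 is constant. Labels: 1 = phishing, -1 = legitimate (assumed coding).
X = [[1, 1, 0], [1, -1, 0], [1, 1, 0], [-1, -1, 0], [-1, 1, 0], [-1, -1, 0]]
y = [1, 1, 1, -1, -1, -1]

selected, best = select_features(X, y, lookup_accuracy)
print("selected feature indices:", selected, "score:", best)
```

To move closer to the setup the abstract describes, `score` could instead return the cross-validated accuracy of a random forest trained on the chosen columns. The selection loop itself would not need to change.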
This article is indexed in Wanfang Data (万方数据) and other databases.