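A Malicious Code Static Detection Framework Based on Multi-Feature Ensemble Learning
Cite this article: Yang Wang, Gao Mingzhe, Jiang Ting. A Malicious Code Static Detection Framework Based on Multi-Feature Ensemble Learning[J]. Journal of Computer Research and Development (计算机研究与发展), 2021, 58(5): 1021-1034. DOI: 10.7544/issn1000-1239.2021.20200912
Authors: Yang Wang  Gao Mingzhe  Jiang Ting
Affiliations: (School of Cyber Science and Engineering, Southeast University, Nanjing 211189) (Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, Nanjing 211189) (Jiangsu Provincial Key Laboratory of Computer Network Technology (Southeast University), Nanjing 211189) (wangy@njnet.edu.cn)
Funding: National Natural Science Foundation of China (62072100).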
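Abstract (translated from the Chinese): With the spread of the Internet and the rapid development of 5G communication technology, threats to cyberspace are growing. In particular, the volume of malware is rising exponentially and the variants of malware families are multiplying explosively. Traditional detection based on manually crafted signatures is too slow to handle the millions of new malware samples that appear every day, while ordinary machine learning classifiers have markedly high false positive and false negative rates. Adversarial techniques such as packing and obfuscation make the problem worse. To address this, the paper proposes a static malware detection framework based on multi-feature ensemble learning. The framework extracts five groups of features from malware: non-PE (Portable Executable) structure features, visible strings, assembly code sequences, PE structure features, and function call relationships. It builds a model matched to each feature group and applies the Bagging and Stacking ensemble algorithms to improve model stability and reduce the risk of overfitting. A weighted strategy voting algorithm then further aggregates the outputs of the five ensemble models. In testing, the multi-feature, multi-model aggregation reaches a detection accuracy of up to 96.99%. This result indicates that, compared with other static detection methods, the approach identifies malware better and also achieves a high recognition rate on packed and obfuscated malware.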

Keywords (translated from the Chinese): malicious code  multi-feature  ensemble learning  strategy voting  static detection

A Malicious Code Static Detection Framework Based on Multi-Feature Ensemble Learning
Yang Wang, Gao Mingzhe, Jiang Ting. A Malicious Code Static Detection Framework Based on Multi-Feature Ensemble Learning[J]. Journal of Computer Research and Development, 2021, 58(5): 1021-1034. DOI: 10.7544/issn1000-1239.2021.20200912
Authors:Yang Wang  Gao Mingzhe  Jiang Ting
Affiliation: (School of Cyber Science and Engineering, Southeast University, Nanjing 211189) (Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, Nanjing 211189) (Jiangsu Provincial Key Laboratory of Computer Network Technology (Southeast University), Nanjing 211189)
Abstract: With the popularity of the Internet and the rapid development of 5G communication technology, threats to cyberspace are increasing. In particular, the number of malware samples is growing exponentially and the number of variants within malware families is growing explosively. Traditional signature-based malware detection is too slow to handle the millions of new malware samples that emerge every day, while the false positive and false negative rates of general machine learning classifiers are too high. Meanwhile, adversarial techniques such as packing and obfuscation make the situation worse. To address this, we propose a static malware detection framework based on multi-feature ensemble learning. We extract five kinds of features from malware: non-PE (Portable Executable) structure features, visible string features, assembly code sequence features, PE structure features, and function call relationship features. We then construct a model matched to each feature and use the Bagging and Stacking ensemble algorithms to reduce the risk of overfitting. Finally, we adopt a weighted voting algorithm to further aggregate the outputs of the ensemble models. Experimental results show that the detection accuracy of the multi-feature, multi-model aggregation algorithm can reach 96.99%, which indicates that the method identifies malware better than other static detection methods and achieves a higher recognition rate for malware that uses packing or obfuscation techniques.
Keywords:malicious code  multiple features  ensemble learning  policy voting  static detection
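
The final aggregation step described in the abstract can be illustrated with a minimal sketch. It assumes that each of the five per-feature ensemble models outputs a probability that a sample is malicious and that the weighted vote is a normalized weighted average compared against a threshold. The abstract does not specify the voting rule, so this is one plausible reading rather than the authors' algorithm. The function name `weighted_vote`, the feature-group keys, the weights, the probabilities, and the 0.5 threshold are all hypothetical.

```python
from typing import Dict, Tuple

# Feature-group names follow the abstract. Every number below is a made-up
# illustrative value, not a weight or score reported in the paper.
example_probs: Dict[str, float] = {
    "non_pe_structure": 0.9,
    "visible_strings": 0.2,
    "assembly_sequences": 0.8,
    "pe_structure": 0.7,
    "function_calls": 0.6,
}
example_weights: Dict[str, float] = {
    "non_pe_structure": 0.1,
    "visible_strings": 0.1,
    "assembly_sequences": 0.2,
    "pe_structure": 0.3,
    "function_calls": 0.3,
}


def weighted_vote(
    probs: Dict[str, float],
    weights: Dict[str, float],
    threshold: float = 0.5,
) -> Tuple[float, bool]:
    """Combine per-group malicious probabilities into one weighted score.

    Returns the normalized weighted average and whether it reaches the
    (assumed) decision threshold.
    """
    if probs.keys() != weights.keys():
        raise ValueError("probs and weights must cover the same feature groups")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(weights[group] * probs[group] for group in probs) / total
    return score, score >= threshold


score, is_malicious = weighted_vote(example_probs, example_weights)
print(f"weighted score = {score:.2f}, malicious = {is_malicious}")
```

With these illustrative inputs, the low score from the visible-string model is outvoted by the more heavily weighted groups. In practice the weights would presumably be derived from each model's validation performance, which is an assumption here and not a detail stated in the abstract.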
This article is indexed in databases including VIP (维普) and Wanfang Data (万方数据).