Vulnerability Mining Method Based on Code Property Graph and Attention BiLSTM

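Cite this article:	DUAN Xu, WU Jing-Zheng, LUO Tian-Yue, YANG Mu-Tian, WU Yan-Jun. Vulnerability Mining Method Based on Code Property Graph and Attention BiLSTM[J]. Journal of Software (软件学报), 2020, 31(11): 3404-3420.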

Authors:	DUAN Xu (段旭), WU Jing-Zheng (吴敬征), LUO Tian-Yue (罗天悦), YANG Mu-Tian (杨牧天), WU Yan-Jun (武延军)

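Affiliations:	Intelligent Software Research Center (Institute of Software, Chinese Academy of Sciences), Beijing 100190; University of Chinese Academy of Sciences, Beijing 100049; Intelligent Software Research Center (Institute of Software, Chinese Academy of Sciences), Beijing 100190; State Key Laboratory of Computer Science (Institute of Software, Chinese Academy of Sciences), Beijing 100190; Intelligent Software Research Center (Institute of Software, Chinese Academy of Sciences), Beijing 100190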

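Funding:	National Key Research and Development Program of China (2018YFB0803600); National Natural Science Foundation of China (61772507); Beijing Municipal Science and Technology Commission special project for promoting industrial technology innovation strategic alliances (Z181100000518032)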

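Abstract:	As the information security situation grows more severe, software vulnerabilities have become one of the main threats to computer security, and accurately mining the vulnerabilities present in programs is a key problem in the field. However, the accuracy of existing static vulnerability mining methods drops markedly on vulnerabilities whose features are not obvious. On the one hand, rule-based methods mine vulnerabilities by matching vulnerability patterns predefined by experts against the target source program; these patterns are rigid and narrow and cannot cover fine-grained features, which leads to low accuracy and a high false positive rate. On the other hand, learning-based methods neither model the feature information of program source code adequately nor capture the key feature information effectively, so they cannot mine such vulnerabilities accurately. To address these problems, a source-code-level vulnerability mining method based on the code property graph and an attention-based bidirectional LSTM is proposed. The method first converts the program source code into a code property graph containing semantic feature information and slices it to remove redundant information unrelated to sensitive operations. Second, an encoding algorithm encodes the code property graph into a feature tensor. Then, a neural network based on bidirectional LSTM and an attention mechanism is trained on a large-scale feature dataset. Finally, the trained network is used to mine vulnerabilities in the target program. Experimental results show that, on the SARD buffer error dataset, the SARD resource management error dataset, and the two subsets of them that consist of C programs, the method achieves F1 scores of 82.8%, 77.4%, 82.5%, and 78.0%, respectively, a significant improvement over the rule-based static mining tools Flawfinder and RATS and the learning-based program analysis model TBCNN.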
Keywords:	vulnerability mining; deep learning; static analysis; attention mechanism; code property graph
Received:	2019/7/8
Revised:	2020/4/11
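
The abstract names an attention mechanism on top of a bidirectional LSTM but does not give its formulation. As a rough illustration only, the sketch below shows one common form of attention pooling: each time step's hidden vector is scored against a query vector, the scores are normalized with a softmax, and the hidden vectors are summed with those weights. The function name `attention_pool`, the `tanh`-then-dot-product scoring, and the toy inputs are all assumptions made for this example, not the authors' network.

```python
import math


def attention_pool(hidden_states, context):
    """Attention-weighted sum of per-step hidden vectors.

    hidden_states: list of T vectors (lists of floats), standing in for
        the concatenated forward/backward outputs of a BiLSTM.
    context: vector of the same width, standing in for a learned query
        (hypothetical; a real model would learn it during training).
    Returns (pooled_vector, weights).
    """
    # Score each time step: dot product of tanh(h_t) with the context vector.
    scores = [
        sum(math.tanh(h_i) * c_i for h_i, c_i in zip(h, context))
        for h in hidden_states
    ]
    # Numerically stable softmax over the time steps.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the hidden vectors, one output value per dimension.
    width = len(hidden_states[0])
    pooled = [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(width)
    ]
    return pooled, weights


# Toy input: three time steps, two hidden dimensions (made-up values).
states = [[0.1, -0.2], [2.0, 1.5], [0.0, 0.3]]
pooled, weights = attention_pool(states, context=[1.0, 1.0])
```

In this toy input the second step has the largest activations, so it receives the largest weight and dominates the pooled vector. That is the sense in which attention can emphasize the few code elements that matter for a vulnerability.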
Vulnerability Mining Method Based on Code Property Graph and Attention BiLSTM

DUAN Xu, WU Jing-Zheng, LUO Tian-Yue, YANG Mu-Tian, WU Yan-Jun. Vulnerability Mining Method Based on Code Property Graph and Attention BiLSTM[J]. Journal of Software, 2020, 31(11): 3404-3420.

Authors:	DUAN Xu, WU Jing-Zheng, LUO Tian-Yue, YANG Mu-Tian, WU Yan-Jun

Affiliation:	Intelligent Software Research Center (Institute of Software, Chinese Academy of Sciences), Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; Intelligent Software Research Center (Institute of Software, Chinese Academy of Sciences), Beijing 100190, China; State Key Laboratory of Computer Science (Institute of Software, Chinese Academy of Sciences), Beijing 100190, China

Abstract:	With information security threats growing increasingly serious, software vulnerabilities have become one of the main threats to computer security. How to accurately mine vulnerabilities in programs is a key issue in the field of information security. However, existing static vulnerability mining methods have low accuracy when mining vulnerabilities whose features are not obvious. On the one hand, rule-based methods mine vulnerabilities by matching expert-defined vulnerability patterns in target programs. Their predefined patterns are rigid and narrow, cannot cover detailed features, and result in low accuracy and high false positive rates. On the other hand, learning-based methods cannot adequately model the features of the source code and cannot effectively capture the key features, so they fail to accurately mine vulnerabilities whose features are not obvious. To solve this issue, a source-code-level vulnerability mining method based on the code property graph and attention BiLSTM is proposed. It first transforms the program source code into a code property graph that contains semantic features, and performs program slicing to remove redundant information unrelated to sensitive operations. Then, it encodes the code property graph into a feature tensor with an encoding algorithm. After that, a neural network based on BiLSTM and the attention mechanism is trained on a large-scale feature dataset. Finally, the trained neural network model is used to mine vulnerabilities in the target program. Experimental results show that the F1 scores of the method reach 82.8%, 77.4%, 82.5%, and 78.0% on the SARD buffer error dataset, the SARD resource management error dataset, and their two subsets composed of C programs, respectively, which is significantly higher than those of the rule-based static mining tools Flawfinder and RATS and the learning-based program analysis model TBCNN.

Keywords:	vulnerability mining; deep learning; static analysis; attention mechanism; code property graph
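
The results in the abstract are reported as F1 scores. For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below computes it from confusion-matrix counts; the function name `f1_score` and the counts in the usage line are hypothetical and are not taken from the paper's experiments.

```python
def f1_score(true_pos, false_pos, false_neg):
    """F1 score from raw counts: harmonic mean of precision and recall."""
    predicted = true_pos + false_pos  # samples flagged as vulnerable
    actual = true_pos + false_neg     # samples that really are vulnerable
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when nothing is correctly flagged
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to make the arithmetic easy to follow:
# precision = 80 / 100, recall = 80 / 100.
example = f1_score(true_pos=80, false_pos=20, false_neg=20)
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a detector cannot reach a high F1 by maximizing recall while flooding the user with false positives, which is the failure mode the abstract attributes to rule-based tools.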
This article is indexed in Wanfang Data and other databases.