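Software Source Code Vulnerability Detection Based on a CNN-GAP Interpretable Model
Cite this article:WANG Jian, KUANG Hongyu, LI Ruilin, SU Yunfei. Software Source Code Vulnerability Detection Based on a CNN-GAP Interpretable Model[J]. Journal of Electronics & Information Technology, 2022, 44(7): 2568-2575.
Authors:WANG Jian  KUANG Hongyu  LI Ruilin  SU Yunfei
Affiliation:College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China
Funding:National Natural Science Foundation of China (61702540), Natural Science Foundation of Hunan Province (2018JJ3615)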
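Abstract:Source code vulnerability detection is an important means of keeping software systems secure. In recent years, many deep learning models have been applied to source code vulnerability detection and have greatly improved its efficiency, but problems remain: user-defined identifiers lead to too many out-of-vocabulary words, the semantics of the embedded word vectors are not accurate enough, and the neural network models lack interpretability. To address this, this paper proposes a source code vulnerability detection method based on an interpretable model that combines a Convolutional Neural Network (CNN) with Global Average Pooling (GAP). First, during source code preprocessing, some user-defined identifiers are normalized and one-hot encoding is used for word embedding, which alleviates the out-of-vocabulary problem. Next, a CNN-GAP neural network model is built to identify functions that contain CWE-119 buffer overflow vulnerabilities. Finally, the Class Activation Mapping (CAM) interpretability method is used to visualize the results and mark the code that may be related to a vulnerability. A comparison with the model proposed by Russell et al. and the VulDeePecker model proposed by Li et al. shows that the CNN-GAP model achieves comparable or even better performance and offers a degree of interpretability, which helps researchers analyze vulnerabilities in more depth.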

Keywords:Source code vulnerability detection    Deep learning    Neural network interpretability
Received:2021-05-12
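
The preprocessing step described in the abstract (normalizing some user-defined identifiers, then one-hot encoding the tokens) can be illustrated with a minimal sketch. The abstract does not give the actual tokenizer, the list of names that are kept, or the placeholder scheme, so the `KEEP` set, the `VAR1`, `VAR2`, ... naming, and the `<UNK>` slot below are all assumptions made for illustration.

```python
import re

# Assumption: a tiny illustrative subset of C keywords and library calls
# that are kept verbatim; everything else that looks like an identifier
# is treated as user-defined.
KEEP = {"int", "char", "void", "if", "else", "for", "while", "return",
        "sizeof", "strcpy", "memcpy", "malloc", "free"}

IDENT_RE = re.compile(r"[A-Za-z_]\w*")
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize(code):
    """Tokenize C-like code and replace each user-defined identifier
    with a placeholder (VAR1, VAR2, ...) in order of first appearance."""
    mapping = {}
    tokens = []
    for tok in TOKEN_RE.findall(code):
        if IDENT_RE.fullmatch(tok) and tok not in KEEP:
            if tok not in mapping:
                mapping[tok] = f"VAR{len(mapping) + 1}"
            tok = mapping[tok]
        tokens.append(tok)
    return tokens


def one_hot(tokens, vocab):
    """Return one one-hot row per token; tokens missing from vocab
    fall back to index 0, assumed here to be the '<UNK>' slot."""
    index = {word: i for i, word in enumerate(vocab)}
    rows = []
    for tok in tokens:
        row = [0] * len(vocab)
        row[index.get(tok, 0)] = 1
        rows.append(row)
    return rows
```

Because every user-defined name collapses to a small set of placeholders, the vocabulary stays small enough for one-hot vectors to be practical, which is the motivation the abstract gives for combining the two steps.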

Software Source Code Vulnerability Detection Based on CNN-GAP Interpretability Model
WANG Jian, KUANG Hongyu, LI Ruilin, SU Yunfei. Software Source Code Vulnerability Detection Based on CNN-GAP Interpretability Model[J]. Journal of Electronics & Information Technology, 2022, 44(7): 2568-2575.
Authors:WANG Jian  KUANG Hongyu  LI Ruilin  SU Yunfei
Affiliation:College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China
Abstract:Source code vulnerability detection is an important means of ensuring the security of software systems. In recent years, a variety of deep learning models have been applied to source code vulnerability detection, greatly improving detection efficiency. However, several problems remain: user-defined identifiers produce too many out-of-vocabulary words, the semantics of the embedded word vectors are not accurate enough, and the neural network models lack interpretability. To address these problems, a software source code vulnerability detection method is proposed based on an interpretable model that combines a Convolutional Neural Network (CNN) with Global Average Pooling (GAP). First, some user-defined identifiers are normalized during source code preprocessing, and one-hot encoding is used for word embedding to alleviate the out-of-vocabulary problem. Then, a CNN-GAP neural network model is built to identify functions containing CWE-119 (buffer overflow) vulnerabilities. Finally, the Class Activation Mapping (CAM) interpretability method is used to visualize the results and highlight the code that may be related to a vulnerability. Comparison with the model proposed by Russell et al. and the VulDeePecker model proposed by Li et al. shows that the CNN-GAP model achieves comparable or even better performance and offers a degree of interpretability, which helps researchers analyze vulnerabilities in greater depth.
Keywords:Source code vulnerability detection    Deep learning    Neural network interpretability
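
How a CNN followed by GAP yields a per-token explanation through CAM can be sketched in a few lines of NumPy. This is a generic illustration of the standard CAM formulation, not the authors' network: the abstract does not state the layer count, kernel widths, or number of filters, so the single convolution layer, the shapes, and the function names `conv1d_relu` and `cnn_gap_forward` are assumptions.

```python
import numpy as np


def conv1d_relu(x, kernels):
    """Valid 1D convolution over the token axis followed by ReLU.

    x:       (T, V) matrix, one one-hot row per token
    kernels: (K, width, V) filter bank
    returns: (T - width + 1, K) feature maps
    """
    num_filters, width, _ = kernels.shape
    steps = x.shape[0] - width + 1
    out = np.empty((steps, num_filters))
    for t in range(steps):
        window = x[t:t + width]  # (width, V)
        # Contract each filter with the window to get K responses.
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    return np.maximum(out, 0.0)


def cnn_gap_forward(x, kernels, w, b):
    """Return class logits and the class activation map.

    w: (K, C) weights of the dense layer after GAP, b: (C,) bias.
    """
    features = conv1d_relu(x, kernels)  # (T', K)
    pooled = features.mean(axis=0)      # global average pooling -> (K,)
    logits = pooled @ w + b             # (C,)
    # CAM: reuse the dense weights at every position instead of
    # only on the pooled vector.
    cam = features @ w                  # (T', C)
    return logits, cam
```

Since GAP is a plain average, the mean of the CAM column for a class plus its bias equals that class's logit, so positions with large CAM values are the ones that contribute most to a "vulnerable" prediction. That identity is what allows the map to be projected back onto the source tokens for visualization.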