
Code vulnerability detection method based on contextual feature fusion
Cite this article: Ze-xin XU, Li-juan DUAN, Wen-jian WANG, Qing EN. Code vulnerability detection method based on contextual feature fusion[J]. Journal of Zhejiang University (Engineering Science), 2022, 56(11): 2260-2270.
Authors: Ze-xin XU, Li-juan DUAN, Wen-jian WANG, Qing EN
Affiliations: 1. Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China; 2. Beijing Key Laboratory of Trusted Computing, Beijing 100124, China; 3. National Engineering Laboratory for Critical Technologies of Information Security Classified Protection, Beijing 100124, China; 4. Artificial Intelligence and Machine Learning Laboratory, School of Computer Science, Carleton University, Ottawa K1S 5B6, Canada
Funding: National Natural Science Foundation of China (62176009, 62106065); Key Project of the Beijing Municipal Education Commission (KZ201910005008)
Abstract: To address the high false positive and false negative rates of existing code vulnerability detection methods, a detection method based on contextual feature fusion is proposed. The method decouples code features into local features of code blocks and global features of the context. The local features capture the semantics of key tokens within a code block and their short-distance dependencies. The global features are obtained by fusing the local features, and they capture long-distance dependencies across lines of code. Learning local and global information jointly improves the feature learning ability of the model, so that the programming patterns of code vulnerabilities are mined more accurately. A contrastive mapping module for code vulnerabilities is added to widen the distance between positive and negative samples in the embedding space, which helps the model separate the two classes accurately. Experiments on a real-world dataset that mixes the source code of 9 software projects show that precision improves by up to 29% and recall by up to 16%.

Keywords: code vulnerability detection; code block local feature extraction; contextual global feature fusion; short-distance dependence; long-distance dependence
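
The abstract says that a mapping module widens the distance between positive and negative samples in the embedding space, but it does not give the loss function. As a minimal sketch of the general idea, the Python below uses the classic pairwise margin-based contrastive loss, swapped in here purely for illustration; it should not be read as the authors' formulation. The function names, the `margin` parameter, and the 2-D embeddings are all hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(u, v, same_label, margin=1.0):
    """Classic pairwise margin loss (an assumption, not the paper's formula).

    Pairs with the same label are penalised by their squared distance,
    which pulls them together. Pairs with different labels are penalised
    only while they are closer than `margin`, which pushes them apart.
    """
    d = euclidean(u, v)
    if same_label:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Hypothetical 2-D embeddings of code samples (illustrative values only).
vulnerable_a = [0.0, 0.0]
vulnerable_b = [3.0, 4.0]    # distance 5 from vulnerable_a
benign_far = [9.0, 12.0]     # distance 15 from vulnerable_a

# Two vulnerable samples: the loss grows with their distance.
pull = contrastive_loss(vulnerable_a, vulnerable_b, same_label=True, margin=10.0)

# A vulnerable/benign pair that is already farther apart than the margin
# contributes no loss, so only pairs that are too close get pushed apart.
push = contrastive_loss(vulnerable_a, benign_far, same_label=False, margin=10.0)

print(pull, push)
```

Under this loss, minimising over many labelled pairs moves same-class embeddings closer and keeps different-class embeddings at least `margin` apart, which is one common way to obtain the separation the abstract describes.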