自然语言数据驱动的智能化软件安全评估方法 (Natural Language Data Driven Approach for Software Intelligent Safety Evaluation)

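Cite this article:	ZHANG Yi-Fan, TANG En-Yi, SU Yan-Zi, YANG Kai-Mao, KUANG Hong-Yu, CHEN Xin. 自然语言数据驱动的智能化软件安全评估方法 (Natural Language Data Driven Approach for Software Intelligent Safety Evaluation) [J]. 软件学报 (Journal of Software), 2018, 29(8): 2336-2349.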

Authors:	ZHANG Yi-Fan (张一帆), TANG En-Yi (汤恩义), SU Yan-Zi (苏琰梓), YANG Kai-Mao (杨开懋), KUANG Hong-Yu (匡宏宇), CHEN Xin (陈鑫)

Affiliations (by author position):	Authors 1 and 6: State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu 210023; Department of Computer Science and Technology, Nanjing University, Nanjing, Jiangsu 210023. Authors 2 to 5: State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, Jiangsu 210023; Software Institute, Nanjing University, Nanjing, Jiangsu 210093.

Funding:	National Key Research and Development Program of China (No. 2016YFB1000802); National Natural Science Foundation of China (61772260, 61402222)

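Abstract (translated from the Chinese):	Software safety is an important property that measures whether software can withstand malicious attacks. In today's Internet environment, attacks are ubiquitous, so it has become necessary to estimate the number and types of vulnerabilities that software may contain, that is, to evaluate its safety. In practice, users need to evaluate not only unreleased or newly released software but also software that has already been released. For example, a user choosing among several competing products on the market wants a low-cost, reasonably objective third-party evaluation and comparison of them. To meet this need, this paper proposes an intelligent software safety evaluation approach driven by natural language data. The approach evaluates the safety of a software product from the usage experience of its existing users: it first adaptively crawls the natural language comments that users have written about the software while using it, and then obtains safety evaluation metrics through a dual training of a deep learning method and a machine learning evaluation model. Because the adaptive crawler adjusts its feature words based on feedback and uses search engines to obtain heterogeneous data, the evaluation can draw on a wide range of natural language data. In addition, one-to-many machine translation training effectively solves the problem of converting natural language data into semantic encodings, so that the machine learning model used for safety evaluation can be built on the semantic features of natural language. We further evaluated the approach experimentally on the Common Vulnerabilities and Exposures (CVE) database and the U.S. National Vulnerability Database (NVD). The results show that the approach is highly effective at evaluating the number, types, and severity of software vulnerabilities.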
Keywords:	software safety evaluation; natural language processing; machine learning; web crawler
Received:	2017/7/18
Revised:	2017/9/28
Natural Language Data Driven Approach for Software Intelligent Safety Evaluation

ZHANG Yi-Fan, TANG En-Yi, SU Yan-Zi, YANG Kai-Mao, KUANG Hong-Yu and CHEN Xin. Natural Language Data Driven Approach for Software Intelligent Safety Evaluation[J]. Journal of Software, 2018, 29(8): 2336-2349.

Authors:	ZHANG Yi-Fan, TANG En-Yi, SU Yan-Zi, YANG Kai-Mao, KUANG Hong-Yu and CHEN Xin

Affiliation (by author position):	Authors 1 and 6: State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China; Department of Computer Science and Technology, Nanjing University, Nanjing 210023, China. Authors 2 to 5: State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China; Software Institute, Nanjing University, Nanjing 210093, China.

Abstract:	Software safety is a key property that measures how well software withstands malicious attacks. Internet attacks are now ubiquitous, so it is important to estimate the number and category of defects in software. Users need to evaluate the safety not only of software that was released recently or has not yet been released, but also of software that has been on the market for some time. For example, users who want to compare the safety of several competing software systems before making a purchase need a low-cost, objective evaluation approach. In this paper, we propose a natural language data driven approach for evaluating the safety of software that has already been released. Our approach crawls natural language data adaptively and applies a dual training to evaluate software safety. Because our self-adaptive web crawler adjusts its feature words based on feedback and acquires heterogeneous data from search engines, the evaluation draws on extensive data sources automatically. Furthermore, a customized machine translation model converts natural language to its semantic encoding efficiently. On this basis, we build a machine learning model that evaluates software safety from the semantic characteristics of natural language. We conduct experiments on the Common Vulnerabilities and Exposures (CVE) database and the National Vulnerability Database (NVD). The results show that our approach evaluates the number, impact, and category of defects in software precisely.

Keywords:	software safety evaluation; natural language processing; machine learning; web crawler
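
The abstract says the self-adaptive crawler "adjusts feature words from the feedback" but does not give the update rule. Purely as an illustration of that idea, and not as the authors' algorithm, the Python sketch below runs one feedback round: words that appear mostly on pages judged relevant are added to the feature-word set, and current feature words that appear mostly on irrelevant pages are dropped. The function names, the relevance flag, the scoring rule, and the sample pages are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (a crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def update_feature_words(feature_words, pages, top_k=3):
    """Return a new feature-word set after one hypothetical feedback round.

    pages is a list of (text, is_relevant) pairs; is_relevant stands in for
    whatever feedback signal the crawler receives (an assumption here).
    """
    scores = Counter()
    for text, is_relevant in pages:
        # Count each word once per page so long pages do not dominate.
        for word in set(tokenize(text)):
            scores[word] += 1 if is_relevant else -1

    # Keep current feature words whose score is not negative.
    kept = {w for w in feature_words if scores[w] >= 0}

    # Add the top_k new words seen mostly on relevant pages
    # (ties broken alphabetically so the result is deterministic).
    candidates = [(s, w) for w, s in scores.items() if s > 0 and w not in kept]
    candidates.sort(key=lambda sw: (-sw[0], sw[1]))
    return kept | {w for _, w in candidates[:top_k]}


# Made-up sample pages with made-up relevance feedback.
pages = [
    ("buffer overflow exploit crash", True),
    ("overflow exploit in parser", True),
    ("great price and fast shipping", False),
    ("crash on price page", False),
]
print(sorted(update_feature_words({"crash", "price"}, pages, top_k=2)))
# prints ['crash', 'exploit', 'overflow']
```

A real crawler would need a proper tokenizer, stop-word handling, and a relevance signal derived from the downstream model; the simple plus-one, minus-one vote is used here only to keep the feedback loop visible.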
