Auxiliary Method for Code Commit Comprehension Based on Core-Class Identification (基于关键类判定的代码提交理解辅助方法)

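Cite this article:	HUANG Yuan, LIU Zhi-Yong, CHEN Xiang-Ping, XIONG Ying-Fei, LUO Xiao-Nan. 基于关键类判定的代码提交理解辅助方法 (Auxiliary Method for Code Commit Comprehension Based on Core-Class Identification) [J]. 软件学报 (Journal of Software), 2017, 28(6): 1418-1434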

Authors:	HUANG Yuan (黄袁), LIU Zhi-Yong (刘志勇), CHEN Xiang-Ping (陈湘萍), XIONG Ying-Fei (熊英飞), LUO Xiao-Nan (罗笑南)

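Affiliations:	(1) School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510006; National Engineering Research Center of Digital Life, Guangzhou, Guangdong 510006. (2) School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510006; National Engineering Research Center of Digital Life, Guangzhou, Guangdong 510006. (3) National Engineering Research Center of Digital Life, Guangzhou, Guangdong 510006; Institute of Advanced Technology, Sun Yat-sen University, Guangzhou, Guangdong 510006. (4) Institute of Software, School of Electronics Engineering and Computer Science, Peking University, Beijing 100871; Key Laboratory of High Confidence Software Technologies (Ministry of Education), Peking University, Beijing 100871. (5) School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510006; National Engineering Research Center of Digital Life, Guangzhou, Guangdong 510006.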

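Funding:	NSFC-Guangdong Joint Fund (U1201252); National Key Research and Development Program of China (2016YFB1000101); National Natural Science Foundation of China (61672545, 61672045); Science and Technology Planning Project of Guangdong Province (2015B040403005)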

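Abstract:	Code commits are among the most important data on software version evolution and are widely used in software review and software comprehension. For programmers, a commit becomes harder to understand as the number of affected classes and the amount of modified code increase. By analyzing a large amount of data, this paper finds that identifying the core modified classes in a commit (core classes), together with the classes changed only as dependent modifications needed to complete that core change (non-core classes), can aid commit comprehension. Inspired by the effectiveness of machine learning techniques in classification, this paper proposes a machine-learning-based method for identifying core classes. The method models the identification of core classes in a commit as a binary classification problem (i.e., core versus non-core classes) and extracts discriminative features from the massive commit data produced during software evolution to measure how core a class is. Experimental results on multiple datasets show that the method identifies core classes with an overall accuracy of 87%. Compared with developers comprehending commits directly, using core-class hints to assist comprehension significantly improves developers' efficiency and correctness.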
Keywords:	code changes; code change comprehension; code commit; machine learning; discriminative features
Received:	2016-07-28
Revised:	2016-10-10
Auxiliary Method for Code Commit Comprehension Based on Core-Class Identification

HUANG Yuan, LIU Zhi-Yong, CHEN Xiang-Ping, XIONG Ying-Fei and LUO Xiao-Nan. Auxiliary Method for Code Commit Comprehension Based on Core-Class Identification[J]. Journal of Software, 2017, 28(6): 1418-1434

Authors:	HUANG Yuan, LIU Zhi-Yong, CHEN Xiang-Ping, XIONG Ying-Fei, LUO Xiao-Nan

Affiliation:	(1) School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006, China; National Engineering Research Center of Digital Life, Guangzhou 510006, China. (2) School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006, China; National Engineering Research Center of Digital Life, Guangzhou 510006, China. (3) National Engineering Research Center of Digital Life, Guangzhou 510006, China; Institute of Advanced Technology, Sun Yat-sen University, Guangzhou 510006, China. (4) School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China; Key Laboratory of High Confidence Software Technologies of Ministry of Education, Peking University, Beijing 100871, China. (5) School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510006, China; National Engineering Research Center of Digital Life, Guangzhou 510006, China.

Abstract:	Code commits are among the most important software evolution data and are widely used in software review and code comprehension. A commit that involves many modified classes and a large amount of changed code is hard to review. By analyzing a large amount of commit data, we find that identifying the core modified classes in a commit can speed up commit review for developers. Inspired by the effectiveness of machine learning techniques in classification, we model core-class identification as a binary classification problem (i.e., core and non-core) and extract discriminative features from a large number of commits to characterize core modified classes. Experimental results show that our approach achieves 87% accuracy and that using core-class information in commit review yields a significant improvement over review without it.

Keywords:	code changes; code changes comprehension; software commit; machine learning; discriminative feature
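
The abstract frames core-class identification as binary classification over discriminative features, but this page names neither the features nor the learning algorithm. The sketch below only illustrates the general idea under stated assumptions: the two per-class features, the toy training data, the class names, and the choice of logistic regression are all hypothetical and are not taken from the paper.

```python
import math

# Hypothetical training data: one row per modified class in some past commit.
# The two features are illustrative assumptions, not the paper's feature set:
#   [share of the commit's changed lines that fall in this class,
#    share of the other modified classes that reference this class]
# Label 1 = core class, 0 = non-core class.
TRAIN = [
    ([0.70, 0.80], 1),
    ([0.55, 0.60], 1),
    ([0.60, 0.90], 1),
    ([0.10, 0.00], 0),
    ([0.15, 0.20], 0),
    ([0.05, 0.10], 0),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(features, weights, bias):
    """Estimated probability that a class is core."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train(samples, epochs=2000, lr=0.5):
    """Fit logistic regression with plain batch gradient descent."""
    n_features = len(samples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in samples:
            error = score(features, weights, bias) - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        weights = [w - lr * g / len(samples) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(samples)
    return weights, bias


def is_core(features, weights, bias, threshold=0.5):
    """Binary decision: core (True) or non-core (False)."""
    return score(features, weights, bias) >= threshold


WEIGHTS, BIAS = train(TRAIN)

# Hypothetical commit touching two classes (names are made up).
commit = {"OrderService": [0.70, 0.80], "OrderDto": [0.10, 0.00]}
core = [name for name, f in commit.items() if is_core(f, WEIGHTS, BIAS)]
print("classes flagged as core:", core)
```

In a review setting, the classes flagged this way would be the ones shown to the developer first, with the remaining classes treated as dependent changes. Any real feature set and classifier would need to be taken from the full paper.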
