Code Clone Detection: A Literature Review

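Cite this article:	CHEN Qiu-Yuan, LI Shan-Ping, YAN Meng, XIA Xin. Code Clone Detection: A Literature Review[J]. Journal of Software (软件学报), 2019, 30(4): 962-980.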

Authors:	CHEN Qiu-Yuan, LI Shan-Ping, YAN Meng, XIA Xin

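Affiliations:	College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang 310007, China; College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang 310007, China; College of Computer Science and Technology, Zhejiang University, Hangzhou, Zhejiang 310007, China; Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia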

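Abstract:	A code clone is a set of two or more identical or similar source code fragments in a code base. Code clones are an important research topic in software engineering. Cloning is common in software development; it can improve efficiency and bring some benefits. However, studies show that code clones also harm the development and maintenance of software systems, for example by reducing software stability, making the code base redundant, and propagating software defects. Code clone detection aims to find automated methods for detecting clones, so that their negative effects can be reduced at low cost. Researchers have produced a series of detection techniques, which can be divided into four levels according to how much source code information they use: text-based, lexical, syntactic, and semantic. Existing techniques detect textually similar clones effectively, but clones at higher levels of abstraction remain a challenge that calls for more advanced theory and techniques. This paper reviews and summarizes recent progress in code clone detection, focusing on how source code is represented. The main content includes: (1) existing clone detection methods are described and classified by source code representation; (2) the experimental validation methods and performance metrics used in model evaluation are summarized; (3) the key problems of code clone research are summarized from three aspects (scientific soundness, practicality, and technical difficulty), and possible solutions and future research trends are discussed around four aspects: data annotation, representation methods, model construction, and engineering practice.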
Keywords:	code clone; clone detection; code representation
Received:	2018/8/21
Revised:	2018/10/7
Code Clone Detection: A Literature Review

CHEN Qiu-Yuan, LI Shan-Ping, YAN Meng and XIA Xin. Code Clone Detection: A Literature Review[J]. Journal of Software, 2019, 30(4): 962-980.

Authors:	CHEN Qiu-Yuan, LI Shan-Ping, YAN Meng and XIA Xin

Affiliation:	College of Computer Science and Technology, Zhejiang University, Hangzhou 310007, China; College of Computer Science and Technology, Zhejiang University, Hangzhou 310007, China; College of Computer Science and Technology, Zhejiang University, Hangzhou 310007, China; Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia

Abstract:	Code clone refers to two or more duplicate or similar code fragments existing in a software system. Code cloning is a common phenomenon in software development; it can facilitate development and has positive effects on software systems. However, research shows that code clones can also harm the development and maintenance of software systems, including but not limited to reduced stability, redundancy in the source code repository, and propagation of software defects. Code clone research is one of the most active areas in software engineering. Various detection techniques have therefore been proposed to detect code clones in software systems automatically, which helps improve software quality. Many results have been achieved in this area, and the techniques can be categorized as text-based, lexical, syntactic, and semantic. Current techniques achieve effective results in detecting textually similar clones, but still face challenges in detecting other types of code clones. More advanced and unified theoretical and technical guidelines are needed to improve code clone detection techniques. This paper therefore presents a literature review of code clone detection, with particular attention to source code representation. In summary, the contributions of this study are: (1) current code clone detection techniques are summarized and classified from the perspective of code representation; (2) the model validation methods and performance measures used in model evaluation are summarized; and (3) the key issues of code clone research are summarized from three aspects: scientific, practical, and technical difficulties. Possible solutions to these problems and future directions of the research are discussed, focusing on data annotation, representation methods, model construction, and engineering practice.

Keywords:	code clone; clone detection; code representation
