
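Clone code detection based on token edit distance
Cite this article: ZHANG Jiujie, WANG Chunhui, ZHANG Liping, HOU Min, LIU Dongsheng. Clone code detection based on token edit distance [J]. Journal of Computer Applications, 2015, 35(12): 3536-3543.
Authors: ZHANG Jiujie  WANG Chunhui  ZHANG Liping  HOU Min  LIU Dongsheng
Affiliation: College of Computer and Information Engineering, Inner Mongolia Normal University, Hohhot 010022
Funding: National Natural Science Foundation of China (61363017, 61462071); Natural Science Foundation of Inner Mongolia (2014MS0613); Inner Mongolia Autonomous Region Master's Student Research Innovation Fund (S20141013524); Inner Mongolia Normal University Graduate Research Innovation Fund (CXJJS14077).
Abstract: To address the scarcity and low efficiency of current Type-3 clone code detection tools, a token-based method that effectively detects Type-3 clone code is proposed. The method also effectively detects Type-1 and Type-2 clone code. First, the source code is tokenized into token strings at a specified code granularity. Next, all fixed-length substrings of the token strings are mapped; based on queries against this mapping information, the edit distance algorithm is used to determine clone pairs. Clone groups are then built quickly with the union-find (disjoint-set) algorithm, and finally the clone code information is reported. A prototype tool, FClones, was implemented, evaluated with a code-mutation-based framework, and compared with two of the stronger tools in the field, NiCad and SimCad. The experimental results show that, when detecting the three types of clone code, FClones achieves recall of no less than 95% and precision of no less than 98%, and that it detects Type-3 clone code better.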
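The edit-distance step in the abstract can be illustrated with a minimal Python sketch. It is not the FClones implementation: the regex tokenizer, the small keyword set, the `ID`/`NUM` placeholders, and the similarity formula are all assumptions made for illustration, since the abstract does not specify them.

```python
import re

# Hypothetical keyword set; a real lexer would use the target language's grammar.
KEYWORDS = {"if", "else", "for", "while", "return", "int"}


def tokenize(source):
    """Rough lexer (assumption): identifiers, integers, single punctuation marks."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def normalize(tokens):
    """Replace identifiers and numbers with placeholders so that renamed
    (Type-2) copies produce identical token sequences."""
    out = []
    for tok in tokens:
        if tok in KEYWORDS:
            out.append(tok)
        elif re.fullmatch(r"[A-Za-z_]\w*", tok):
            out.append("ID")
        elif tok.isdigit():
            out.append("NUM")
        else:
            out.append(tok)
    return out


def token_edit_distance(a, b):
    """Levenshtein distance where the unit of edit is a whole token."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, 1):
        cur = [i]
        for j, tok_b in enumerate(b, 1):
            cost = 0 if tok_a == tok_b else 1
            cur.append(min(prev[j] + 1,          # delete tok_a
                           cur[j - 1] + 1,       # insert tok_b
                           prev[j - 1] + cost))  # substitute or match
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Assumed similarity measure: 1 - distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - token_edit_distance(a, b) / longest
```

Under this normalization, `x = y + 1;` and `foo = bar + 2;` become the same token sequence (a Type-2 match at distance 0), while a copy with an extra inserted statement has a nonzero distance and would count as a Type-3 clone only if its similarity clears whatever threshold is chosen.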

Keywords: clone code; clone detection; edit distance; Type-3; token
Received: 2015-06-12
Revised: 2015-09-06

Clone code detection based on Levenshtein distance of token
ZHANG Jiujie, WANG Chunhui, ZHANG Liping, HOU Min, LIU Dongsheng. Clone code detection based on Levenshtein distance of token[J]. Journal of Computer Applications, 2015, 35(12): 3536-3543.
Authors:ZHANG Jiujie  WANG Chunhui  ZHANG Liping  HOU Min  LIU Dongsheng
Affiliation: College of Computer and Information Engineering, Inner Mongolia Normal University, Hohhot, Nei Mongol 010022, China
Abstract: To address the scarcity and low efficiency of current Type-3 clone code detection tools, an effective Type-3 clone detection method based on the Levenshtein distance between token sequences is proposed. The method also detects Type-1 and Type-2 clones efficiently. First, the source code of a subject system is tokenized into token sequences at a specified code granularity. Second, each fixed-length substring of the token sequences is mapped to a corresponding index. Third, by querying this mapping information, clone pairs are identified with the Levenshtein distance algorithm and clone groups are built with the disjoint-set algorithm. Finally, the clone code information is reported. A prototype tool named FClones was implemented, evaluated with a code-mutation-based framework, and compared with two state-of-the-art tools, SimCad and NiCad. The experimental results show that, for all three clone types, the recall of FClones is at least 95% and its precision is at least 98%, and that FClones detects Type-3 clones better than the other two tools.
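The mapping and grouping steps can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the FClones code: the function names, the use of a dictionary keyed by token windows as the "mapping", and the fragment ids are all hypothetical. In the described method, candidate pairs would first be confirmed by the Levenshtein check; here confirmed pairs are simply passed to `clone_groups`.

```python
from collections import defaultdict
from itertools import combinations


def kgram_index(fragments, k):
    """Map every length-k token window to the ids of fragments containing it.
    `fragments` is a dict of fragment id -> token list (assumed layout)."""
    index = defaultdict(set)
    for frag_id, tokens in fragments.items():
        for i in range(len(tokens) - k + 1):
            index[tuple(tokens[i:i + k])].add(frag_id)
    return index


def candidate_pairs(index):
    """Fragments sharing at least one window become candidate clone pairs."""
    pairs = set()
    for ids in index.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


class DisjointSet:
    """Union-find with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def clone_groups(confirmed_pairs):
    """Merge confirmed clone pairs into clone groups (connected components)."""
    ds = DisjointSet()
    for a, b in confirmed_pairs:
        ds.union(a, b)
    groups = defaultdict(set)
    for member in list(ds.parent):
        groups[ds.find(member)].add(member)
    return sorted(sorted(group) for group in groups.values())
```

The index restricts the expensive pairwise comparison to fragments that already share a token window, and the disjoint-set structure turns pairwise results into groups without repeated graph traversals.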
Keywords: clone code; clone detection; Levenshtein distance; Type-3; token