期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

An insight into the dispersion of changes in cloned and non-cloned code: A genealogy based empirical study

《Science of Computer Programming》2014

In this paper, we present an in-depth empirical study of a new metric, change dispersion, that measures the extent changes are scattered throughout the code of a software system. Intuitively, highly dispersed changes, the changes that are scattered throughout many software entities (such as files, classes, methods, and variables), should require more maintenance effort than the changes that only affect a few entities. In our research we investigate change dispersion on the code-base of a number of subject systems as a whole, and separately on each system's cloned and non-cloned code. Our central objective is to determine whether cloned code negatively affects software evolution and maintenance. The granularity of our focus is at the method level.Our experimental results on 16 open source subject systems written in four different programming languages (Java, C, C#, and Python) involving two clone detection tools (CCFinderX and NiCad) and considering three major types of clones (Type 1: exact, Type 2: dissimilar naming, and Type 3: some dissimilar code) suggests that change dispersion has a positive and statistically significant correlation with the change-proneness (or instability) of source code. Cloned code, especially in Java and C systems, often exhibits a higher change dispersion than non-cloned code. Also, changes to Type 3 clones are more dispersed compared to changes to Type 1 and Type 2 clones. According to our analysis, a primary cause of high change dispersion in cloned code is that clones from the same clone class often require corresponding changes to ensure they remain consistent. 相似文献

2.

An empirical study on the maintenance of source code clones

Suresh Thummalapenta Luigi Cerulo Lerina Aversano Massimiliano Di Penta 《Empirical Software Engineering》2010,15(1):1-34

Code cloning has been very often indicated as a bad software development practice. However, many studies appearing in the literature indicate that this is not always the case. In fact, either changes occurring in cloned code are consistently propagated, or cloning is used as a sort of templating strategy, where cloned source code fragments evolve independently. This paper (a) proposes an automatic approach to classify the evolution of source code clone fragments, and (b) reports a fine-grained analysis of clone evolution in four different Java and C software systems, aimed at investigating to what extent clones are consistently propagated or they evolve independently. Also, the paper investigates the relationship between the presence of clone evolution patterns and other characteristics such as clone radius, clone size and the kind of change the clones underwent, i.e., corrective maintenance or enhancement. 相似文献

3.

Method and implementation for investigating code clones in a software system

《Information and Software Technology》2007,49(9-10):985-998

Maintaining software systems is becoming more difficult as the size and complexity of software increase. One factor that complicates software maintenance is the presence of code clones. A code clone is a code fragment that has identical or similar code fragments to it in the source code. Code clones are introduced for various reasons such as reusing code by ‘copy and paste’. If modifying a code clone with many similar code fragments, we must consider whether to modify each of them. Especially for large-scale software, such a process is very complicated and expensive. In this paper, we propose methods of visualizing and featuring code clones to support their understanding in large-scale software. The methods have been implemented as a tool called Gemini, which has applied to an open source software system. Application results show the usefulness and capability of our system. 相似文献

4.

基于字符串的代码克隆检测方法的分析

安帝玟唐艳宾《数字社区&智能家居》2009,(31)

克隆代码是指在软件源程序中存在的相同或相似的代码片段。克隆代码在很多软件工程中,例如程序理解,代码质量分析,剽窃检测,漏洞查找和病毒检测,都需要通过找出语义或语法上相似的代码片段来实现。目前常用的检测方法有四种:基于文本(text-based)的检测,基于字符序列(token-based)的检测,基于语法树(tree-based)的检测和基于关系图(PDG-based)的检测。基于字符序列的克隆检测首先对源程序进行预处理转换,再经过匹配算法得到克隆检测结果。克隆代码的检测是软件分析的一个重要的部分。相似文献

5.

基于分组的代码克隆增量检测方法

王海林云彭鑫赵文耘《计算机科学与探索》2014,(4):446-455

代码克隆是指软件程序中一组相同或相近的代码片段,它广泛存在于软件中,因此如何发现代码克隆成为软件维护的一个重要问题。目前已有的克隆检测工具大多针对单一版本进行完整的克隆检测,然而对于大规模、复杂软件系统而言,在软件演化过程中随着代码的改变,不断重新检测代码克隆将花费较高的代价。针对这一问题,提出了一种基于分组的增量克隆检测方法。该方法根据前后两个版本的差异将源代码分为发生变化和未发生变化的两组,通过组内和组间的克隆分析实现增量的克隆检测。基于所提出的方法,在克隆检测工具CCFinderX的基础上实现了一个名为ICDBG（incremental clone detector based on grouping）的原型工具。实验证明,在变更较小时,该方法能够在保证正确性的同时显著减少克隆检测时间。相似文献

6.

Is cloned code really stable?

Manishankar Mondal Md Saidur Rahman Chanchal K. Roy Kevin A. Schneider 《Empirical Software Engineering》2018,23(2):693-770

Clone has emerged as a controversial term in software engineering research and practice. The impact of clones is of great importance from software maintenance perspectives. Stability is a well investigated term in assessing the impacts of clones on software maintenance. If code clones appear to exhibit a higher instability (i.e., higher change-proneness) than non-cloned code, then we can expect that code clones require higher maintenance effort and cost than non-cloned code. A number of studies have been done on the comparative stability of cloned and non-cloned code. However, these studies could not come to a consensus. While some studies show that code clones are more stable than non-cloned code, the other studies provide empirical evidence of higher instability of code clones. The possible reasons behind these contradictory findings are that different studies investigated different aspects of stability using different clone detection tools on different subject systems using different experimental setups. Also, the subject systems were not of wide varieties. Emphasizing these issues (with several others mentioned in the motivation) we have conducted a comprehensive empirical study where we have - (i) implemented and investigated seven existing methodologies that explored different aspects of stability, (ii) used two clone detection tools (NiCad and CCFinderX) to implement each of these seven methodologies, and (iii) investigated the stability of three types (Type-1, Type-2, Type-3) of clones. Our investigation on 12 diverse subject systems covering three programming languages (Java, C, C#) with a list of 8 stability assessment metrics suggest that (i) cloned code is often more unstable (change-prone) than non-cloned code in the maintenance phase, (ii) both Type 1 and Type 3 clones appear to exhibit higher instability than Type 2 clones, (iii) clones in Java and C programming languages are more change-prone than the clones in C#, and (iv) changes to the clones in procedural programming languages seem to be more dispersed than the changes to the clones in object oriented languages. We also systematically replicated the original studies with their original settings and found mostly equivalent results as of the original studies. We believe that our findings are important for prioritizing code clones from management perspectives. 相似文献

7.

基于字符串的代码克隆检测方法的分析

安帝玟唐艳宾《数字社区&智能家居》2009,(11):8756-8757

克隆代码是指在软件源程序中存在的相同或相似的代码片段。克隆代码在很多软件工程中,例如程序理解,代码质量分析,剽窃检测,漏洞查找和病毒检测,都需要通过找出语义或语法上相似的代码片段来实现,目前常用的检测方法有四种：基于文本（text—based）检测,基于字符序列（token-based）的检测,基于语法树（tree-based）的检测和基于关系图（PDG—based）的检测。基于字符序列的克隆检测首先对源程序进行预处理转换,再经过匹配算法得到克隆检测结果：克隆代码的检测是软件分析的一个重要的部分。相似文献

8.

软件源代码中的代码克隆现象及其检测方法

叶青青《计算机应用与软件》2008,25(9)

如果软件源程序中的一个代码段和同一程序中的另一个代码段在结构或语义上类似,这些代码段就成了代码克隆.概述代码克隆存在的各种形式,分析代码克隆产生的原因,并在概括了代码克隆检测的一般过程以后进一步阐述两类代码克隆检测方法:基于语义抽象树的检测方法和基于Token序列的检测方法. 相似文献

9.

Towards clone detection in UML domain models

Harald Störrle 《Software and Systems Modeling》2013,12(2):307-329

Code clones (i.e., duplicate fragments of code) have been studied for long, and there is strong evidence that they are a major source of software faults. Anecdotal evidence suggests that this phenomenon occurs similarly in models, suggesting that model clones are as detrimental to model quality as they are to code quality. However, programming language code and visual models have significant differences that make it difficult to directly transfer notions and algorithms developed in the code clone arena to model clones. In this article, we develop and propose a definition of the notion of “model clone” based on the thorough analysis of practical scenarios. We propose a formal definition of model clones, specify a clone detection algorithm for UML domain models, and implement it prototypically. We investigate different similarity heuristics to be used in the algorithm, and report the performance of our approach. While we believe that our approach advances the state of the art significantly, it is restricted to UML models, its results leave room for improvements, and there is no validation by field studies. 相似文献

10.

大规模代码克隆的检测方法

郭颖陈峰宏周明辉《计算机科学与探索》2014,(4):417-426

代码克隆检测在剽窃检测、版权侵犯调查、软件演化分析、代码压缩、错误检测,以及寻找bug,发现复用模式等方面有重要作用。现有的代码克隆检测工具算法复杂,或需要消耗大量的计算资源,不适用于规模巨大的代码数据。为了能够在大规模的数据上检测代码克隆,提出了一种新的代码克隆检测算法。该算法结合数据消重中的基于内容可变长度分块（content-defined chunking,CDC）思想和网页查重中的Simhash算法思想,采用了对代码先分块处理再模糊匹配的方法。在一个包含多种开源项目,超过5亿个代码文件,共约10 TB代码内容的数据源上,实现了该算法。通过实验,比较了不同分块长度对代码克隆检测率和所需要时间的影响,验证了新算法可以运用于大规模代码克隆检测,并且能够检测出一些级别3的克隆代码,达到了较高的准确率。相似文献

11.

代码克隆检测研究进展

陈秋远李善平鄢萌夏鑫《软件学报》2019,30(4):962-980

代码克隆（code clone）,是指存在于代码库中两个及以上相同或者相似的源代码片段.代码克隆相关问题是软件工程领域研究的重要课题.代码克隆是软件开发中的常见现象,它能够提高效率,产生一定的正面效益.但是研究表明,代码克隆也会对软件系统的开发、维护产生负面的影响,包括降低软件稳定性,造成代码库冗余和软件缺陷传播等.代码克隆检测技术旨在寻找检测代码克隆的自动化方法,从而用较低成本减少代码克隆的负面效应.研究者们在代码克隆检测方面获得了一系列的检测技术成果,根据这些技术利用源代码信息的程度不同,可以将它们分为基于文本、词汇、语法、语义4个层次.现有的检测技术针对文本相似的克隆取得了有效的检测结果,但同时也面临着更高抽象层次克隆的挑战,亟待更先进的理论、技术来解决.着重从源代码表征方式角度入手,对近年来代码克隆检测研究进展进行了梳理和总结.主要内容包括：（1）根据源代码表征方式阐述并归类了现有的克隆检测方法;（2）总结了模型评估中使用的实验验证方法与性能评估指标;（3）从科学性、实用性和技术难点这3个方面归纳总结了代码克隆研究的关键问题,围绕数据标注、表征方法、模型构建和工程实践4个方面,阐述了问题的可能解决思路和研究的未来发展趋势. 相似文献

12.

预训练增强的代码克隆检测技术

冷林珊刘爽田承霖窦淑洁王赞张梅山《软件学报》2022,33(5):1758-1773

代码克隆检测是软件工程领域的一项重要任务, 对于语义相似但语法差距较大的四型代码克隆的检测尤为困难. 基于深度学习的方法在四型代码克隆的检测上已经取得了较好的效果, 但是使用人工标注的代码克隆对进行监督学习的成本较高. 提出了两种简单有效的预训练策略来增强基于深度学习的代码克隆检测模型的代码表示, 以减少监督学习模型中... 相似文献

13.

Increasing clone maintenance support by unifying clone detection and refactoring activities

《Information and Software Technology》2012,54(12):1297-1307

ContextClone detection tools provide an automated mechanism to discover clones in source code. On the other side, refactoring capabilities within integrated development environments provide the necessary functionality to assist programmers in refactoring. However, we have observed a gap between the processes of clone detection and refactoring.ObjectiveIn this paper, we describe our work on unifying the code clone maintenance process by bridging the gap between clone detection and refactoring.MethodThrough an Eclipse plug-in called CeDAR (Clone Detection, Analysis, and Refactoring), we forward clone detection results to the refactoring engine in Eclipse. In this case, the refactoring engine is supplied with information about the detected clones to which it can then determine those clones that can be refactored. We describe the extensions to Eclipse’s refactoring engine to allow clones with additional similarity properties to be refactored.ResultsOur evaluation of open source artifacts shows that this process yields considerable increases in the instances of clone groups that may be suggested to the programmer for refactoring within Eclipse.ConclusionBy unifying the processes of clone detection and refactoring, in addition to providing extensions to the refactoring engine of an IDE, the strengths of both processes (i.e., more significant detection capabilities and an established framework for refactoring) can be garnered. 相似文献

14.

基于类粒度的克隆代码群稳定性实证研究

张久杰陈超聂宏轩夏玉芹张丽萍马占飞《计算机科学》2021,48(5):75-85

克隆代码研究与软件工程中的各类问题密切相关。现有的克隆代码稳定性研究主要集中于克隆代码与非克隆代码的比较以及不同克隆代码类型之间的比较,少有研究对克隆代码的稳定性与克隆群所分布的面向对象类进行相关分析。基于面向对象类的粒度进行了克隆群稳定性实证研究,设计了4项与克隆群稳定性相关的研究问题,围绕这些研究问题,将克隆群分为类内、类间和混合3组,并基于4种视角下的9个演化模式进行了克隆群稳定性的对比分析。首先,检测软件系统所有子版本中的克隆代码,识别并标注所有克隆代码片段所属的类信息;其次,基于克隆片段映射方法完成相邻版本间克隆群的演化映射和演化模式的识别与标注,并将映射和标注结果合并为克隆代码演化谱系;然后,在不同视角下,针对3组克隆群进行稳定性计算;最后,根据实验结果对比分析了3组克隆群的稳定性差异。在7款面向对象开源软件系统总共近7700个版本上进行的克隆群稳定性实验结果表明:约60%的类内克隆群的生命周期率达到50%及以上,类间克隆和混合克隆群的生命周期率达到50%及以上的占比均约为35%;类内克隆群发生变化的次数最少,类间克隆群发生合并、分枝和延迟修复演化模式的次数相对略多,混合克隆群发生片段减少、内容一致变化和不一致变化的次数最多。总体而言,类内克隆群的稳定性表现最佳,混合克隆群在演化中可能需要重点跟踪或优先重构。克隆代码稳定性分析方法及实验结论将为克隆代码的跟踪、维护以及重构等克隆管理相关软件活动提供有力的参考和支持。相似文献

15.

CCFinder: a multilinguistic token-based code clone detection system for large scale source code 总被引：2，自引：0，他引：2

Kamiya T. Kusumoto S. Inoue K. 《IEEE transactions on pattern analysis and machine intelligence》2002,28(7):654-670

A code clone is a code portion in source files that is identical or similar to another. Since code clones are believed to reduce the maintainability of software, several code clone detection techniques and tools have been proposed. This paper proposes a new clone detection technique, which consists of the transformation of input source text and a token-by-token comparison. For its implementation with several useful optimization techniques, we have developed a tool, named CCFinder (Code Clone Finder), which extracts code clones in C, C++, Java, COBOL and other source files. In addition, metrics for the code clones have been developed. In order to evaluate the usefulness of CCFinder and metrics, we conducted several case studies where we applied the new tool to the source code of JDK, FreeBSD, NetBSD, Linux, and many other systems. As a result, CCFinder has effectively found clones and the metrics have been able to effectively identify the characteristics of the systems. In addition, we have compared the proposed technique with other clone detection techniques. 相似文献

16.

基于软件漏洞的克隆代码稳定性评估

赵玉武刘东升翟晔《计算机应用研究》2018,35(2):497-502

针对克隆代码与非克隆代码产生"漏洞"倾向性的问题进行了研究,基于"漏洞"对不同类型克隆和非克隆代码进行了比较分析。首先提取软件系统中具有漏洞的代码,并使用克隆检测工具检测出软件的克隆代码;其次分别提取能够产生"漏洞"的克隆和非克隆代码,并分别计算不同克隆类型和非克隆的BOC漏洞密度和LOC漏洞密度;最后对type-1、pure type-2、pure-type3的克隆和非克隆漏洞密度进行了对比分析,并对代码中产生的"漏洞"类型进行分类分析,使用曼—惠特尼检验(WMM)验证了结果的有效性。实验结果表明type-1类型的克隆更容易产生"漏洞",pure type-3类型的克隆引入漏洞的几率相对较小。研究还得出在克隆和非克隆代码中分别存在出现频率较高的"漏洞"集合,增加了对克隆特性的理解,帮助软件设计和开发人员减少代码克隆对软件造成的负面影响。相似文献

17.

An extended assessment of type-3 clones as detected by state-of-the-art tools

Rebecca Tiarks Rainer Koschke Raimar Falke 《Software Quality Journal》2011,19(2):295-331

Code reuse through copying and pasting leads to so-called software clones. These clones can be roughly categorized into identical fragments (type-1 clones), fragments with parameter substitution (type-2 clones), and similar fragments that differ through modified, deleted, or added statements (type-3 clones). Although there has been extensive research on detecting clones, detection of type-3 clones is still an open research issue due to the inherent vagueness in their definition. In this paper, we analyze type-3 clones detected by state-of-the-art tools and investigate type-3 clones in terms of their syntactic differences. Then, we derive their underlying semantic abstractions from their syntactic differences. Finally, we investigate whether there are code characteristics that indicate that a tool-suggested clone candidate is a real type-3 clone from a human’s perspective. Our findings can help developers of clone detectors and clone refactoring tools to improve their tools. 相似文献

18.

A Data Mining Approach for Detecting Higher-Level Clones in Software

Basit Hamid Abdul Jarzabek Stan 《IEEE transactions on pattern analysis and machine intelligence》2009,35(4):497-514

Code clones are similar program structures recurring in variant forms in software system(s). Several techniques have been proposed to detect similar code fragments in software, so-called simple clones. Identification and subsequent unification of simple clones is beneficial in software maintenance. Even further gains can be obtained by elevating the level of code clone analysis. We observed that recurring patterns of simple clones often indicate the presence of interesting higher-level similarities that we call structural clones. Structural clones show a bigger picture of similarity situation than simple clones alone. Being logical groups of simple clones, structural clones alleviate the problem of huge number of clones typically reported by simple clone detection tools, a problem that is often dealt with postdetection visualization techniques. Detection of structural clones can help in understanding the design of the system for better maintenance and in reengineering for reuse, among other uses. In this paper, we propose a technique to detect some useful types of structural clones. The novelty of our approach includes the formulation of the structural clone concept and the application of data mining techniques to detect these higher-level similarities. We describe a tool called Clone Miner that implements our proposed technique. We assess the usefulness and scalability of the proposed techniques via several case studies. We discuss various usage scenarios to demonstrate in what ways the knowledge of structural clones adds value to the analysis based on simple clones alone. 相似文献

19.

克隆代码分析方法研究

王克朝朱宸光王甜甜苏小红《计算机应用研究》2017,34(3)

针对已有克隆代码检测工具只输出克隆组形式的检测结果,而无法分析克隆代码对软件质量的影响问题,提出危害软件质量的关键克隆代码的识别方法。首先,定义了克隆代码的统一表示形式,使之可以分析各种克隆检测工具的检测结果;接下来,解析源程序和克隆检测结果,识别标识符命名不一致性潜在缺陷;然后,定义了克隆关联图,在此基础上检测跨越多个实现不同功能的文件、危害软件可维护性的克隆代码;最后,对检测结果进行可视化统计分析。本文的克隆代码分析工具被应用于分析开源代码httpd,检测出了1组标识符命名不一致的克隆代码和44组危害软件可维护性的关键克隆类,实验结果表明,本文方法可以有效辅助软件开发和维护人员分析、维护克隆代码。相似文献

20.

IBFET: Index-based features extraction technique for scalable code clone detection at file level granularity

Junaid Akram Majid Mumtaz Ping Luo 《Software》2020,50(1):22-46

Many techniques have been developed over the years to detect code clones in different software systems to maintain security measures. These techniques often require the source code to compare the subject system against a very large data set of big code. This paper presents index-based features extraction technique (IBFET) to detect code clones at a very large-scale level to billions of LOC at file level granularity. We performed preprocessing, indexing, and clone detection for more than 324 billion of LOC using a Hadoop distributed environment, which is quite faster and more efficient as compared to existing distributed indexing and clone detection techniques; meanwhile, it detects all three types of clones efficiently. The MapReduce rule of divide and conquer is used for a count and retrieve the similar features between different systems. We evaluated the execution time, scalability, precision, and recall of IBFET by using a well-known clone detection data set IJaDataset and BigCloneBench; furthermore, we compared the results with other state-of-the-art tools. Our approach is faster, flexible, scalable, and provides accurate results with high authenticity and can be implemented at a large-scale level. 相似文献