Similar literature
1.
Clone has emerged as a controversial term in software engineering research and practice. The impact of clones is of great importance from software maintenance perspectives. Stability is a well investigated term in assessing the impacts of clones on software maintenance. If code clones appear to exhibit a higher instability (i.e., higher change-proneness) than non-cloned code, then we can expect that code clones require higher maintenance effort and cost than non-cloned code. A number of studies have been done on the comparative stability of cloned and non-cloned code. However, these studies could not come to a consensus. While some studies show that code clones are more stable than non-cloned code, the other studies provide empirical evidence of higher instability of code clones. The possible reasons behind these contradictory findings are that different studies investigated different aspects of stability using different clone detection tools on different subject systems using different experimental setups. Also, the subject systems were not of wide varieties. Emphasizing these issues (with several others mentioned in the motivation) we have conducted a comprehensive empirical study where we have - (i) implemented and investigated seven existing methodologies that explored different aspects of stability, (ii) used two clone detection tools (NiCad and CCFinderX) to implement each of these seven methodologies, and (iii) investigated the stability of three types (Type-1, Type-2, Type-3) of clones. 
Our investigation on 12 diverse subject systems covering three programming languages (Java, C, C#) with a list of 8 stability assessment metrics suggests that (i) cloned code is often more unstable (change-prone) than non-cloned code in the maintenance phase, (ii) both Type 1 and Type 3 clones appear to exhibit higher instability than Type 2 clones, (iii) clones in Java and C programming languages are more change-prone than the clones in C#, and (iv) changes to the clones in procedural programming languages seem to be more dispersed than the changes to the clones in object oriented languages. We also systematically replicated the original studies with their original settings and found results mostly equivalent to those of the original studies. We believe that our findings are important for prioritizing code clones from management perspectives.

2.
Exact or nearly similar code fragments in a software system’s source code are referred to as code clones. It is often the case that updates (i.e., changes) to a code clone will need to be propagated to its related code clones to preserve their similarity and to maintain source code consistency. When there is a delay in propagating the changes (possibly because the developer is unaware of the related cloned code), the system might behave incorrectly. A delay in propagating a change is referred to as ‘late propagation,’ and a number of studies have investigated this phenomenon. However, these studies did not investigate the intensity of late propagation nor how late propagation differs by clone type. In this research, we investigate late propagation separately for each of the three clone types (Type 1, Type 2, and Type 3). According to our experimental results on thousands of revisions of eight diverse subject systems written in two programming languages, late propagation occurs more frequently in Type 3 clones compared with the other two clone types. More importantly, there is a higher probability that Type 3 clones will experience buggy late propagations compared with the other two clone types. Also, we discovered that block clones are more involved in late propagation than method clones. Refactoring and tracking of Similarity Preserving Change Pattern (SPCP) clones (i.e., the clone fragments that evolve following a SPCP) can help us minimize the occurrences of late propagation in clones.

3.
Code cloning has been very often indicated as a bad software development practice. However, many studies appearing in the literature indicate that this is not always the case. In fact, either changes occurring in cloned code are consistently propagated, or cloning is used as a sort of templating strategy, where cloned source code fragments evolve independently. This paper (a) proposes an automatic approach to classify the evolution of source code clone fragments, and (b) reports a fine-grained analysis of clone evolution in four different Java and C software systems, aimed at investigating to what extent clones are consistently propagated or they evolve independently. Also, the paper investigates the relationship between the presence of clone evolution patterns and other characteristics such as clone radius, clone size and the kind of change the clones underwent, i.e., corrective maintenance or enhancement.

4.
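This paper studies the tendency of cloned and non-cloned code to produce bugs, and compares different types of clones with non-cloned code on the basis of those bugs. First, the buggy code in a software system is extracted, and a clone detection tool is used to detect the system's cloned code. Second, the cloned and non-cloned code that can produce bugs is extracted separately, and the BOC bug density and LOC bug density are computed for each clone type and for non-cloned code. Finally, the bug densities of Type-1, pure Type-2, and pure Type-3 clones and of non-cloned code are compared, the types of bugs arising in the code are classified and analyzed, and the Mann-Whitney test (WMM) is used to verify the validity of the results. The experimental results show that Type-1 clones are more prone to bugs, while pure Type-3 clones are relatively less likely to introduce bugs. The study also finds that cloned and non-cloned code each have a set of frequently occurring bugs. These findings improve the understanding of clone characteristics and help software designers and developers reduce the negative effects of code cloning on software.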

5.
The literature presents conflicting claims regarding the effects of clones on software maintainability. For a community to progress, it is important to identify and address those areas of disagreement. Many claims, such as those related to developer behavior, either lack human-based empirical validation or are contradicted by other studies. This paper describes the results of two surveys to evaluate the level of agreement among clone researchers regarding claims that have not yet been validated through human-based empirical study. The surveys covered three key clone-related research topics: general information, developer behavior, and evolution. Survey 1 focused on high-level information about all three topics, whereas Survey 2 focused specifically on developer behavior. Approximately 20 clone researchers responded to each survey. The survey responses showed a lack of agreement on some major clone-related topics. First, the respondents disagree about the definitions of clone types, with some indicating the need for a taxonomy based upon developer intent. Second, the respondents were uncertain whether the ratio of cloned to non-cloned code affected system quality. Finally, the respondents disagree about the usefulness of various detection, analysis, evolution, and visualization tools for clone management tasks such as tracking and refactoring of clones. The overall results indicate the need for more focused, human-based empirical research regarding the effects of clones during maintenance. The paper proposes a strategy for future research regarding developer behavior and code clones in order to bridge the gap between clone research and the application of that research in clone maintenance.

6.
Reusing software through copying and pasting is a continuous plague in software development despite the fact that it creates serious maintenance problems. Various techniques have been proposed to find duplicated redundant code (also known as software clones). A recent study has compared these techniques and shown that token-based clone detection based on suffix trees is fast but yields clone candidates that are often not syntactic units. Current techniques based on abstract syntax trees, on the other hand, find syntactic clones but are considerably less efficient. This paper describes how we can make use of suffix trees to find syntactic clones in abstract syntax trees. This new approach is able to find syntactic clones in linear time and space. The paper reports the results of a large case study in which we empirically compare the new technique to other techniques using the Bellon benchmark for clone detectors. The Bellon benchmark consists of clone pairs validated by humans for eight software systems written in C or Java from different application domains. The new contributions of this paper over the conference paper are the additional analysis of Java programs, the exploration of an alternative path that uses parse trees instead of abstract syntax trees, and the investigation of the impact on recall and precision when clone analyses insist on consistent parameter renaming.

7.
Maintaining software systems is becoming more difficult as the size and complexity of software increase. One factor that complicates software maintenance is the presence of code clones. A code clone is a code fragment that has identical or similar code fragments to it in the source code. Code clones are introduced for various reasons such as reusing code by ‘copy and paste’. When modifying a code clone that has many similar code fragments, we must consider whether to modify each of them. Especially for large-scale software, such a process is very complicated and expensive. In this paper, we propose methods of visualizing and featuring code clones to support their understanding in large-scale software. The methods have been implemented as a tool called Gemini, which has been applied to an open source software system. Application results show the usefulness and capability of our system.

8.
Code cloning is one of the active research areas in the software engineering community. Specifically, researchers have conducted numerous empirical studies on code cloning and reported that 7% to 23% of the code in a typical software system has been cloned. However, there was less awareness of code clones in dynamically-typed languages and most studies are limited to statically-typed languages such as Java, C, and C++. In addition, most previous studies did not consider different application domains such as standalone projects or web applications. As a result, very little is known about clones in dynamically-typed languages, such as JavaScript, in different application domains. In this paper, we report a large-scale clone detection experiment in a dynamically-typed programming language, JavaScript, for different application domains: web pages and standalone projects. Our experimental results showed that unlike JavaScript standalone projects, JavaScript web applications have 95% of inter-file clones and 91–97% of widely scattered clones. We observed that web application developers created clones intentionally and such clones may not be as risky as claimed in previous studies. Understanding the risks of cloning in web applications requires further studies, as cloning may be due to either good or bad intentions. Also, we identified unique development practices such as including browser-dependent or device-specific code in code clones of JavaScript web applications. This indicates that features of programming languages and technologies affect how developers duplicate code.

9.
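A code clone is a set of identical or similar code fragments in a software program. Clones are widespread in software, so finding them has become an important problem in software maintenance. Most existing clone detection tools perform a complete clone detection on a single version. For large-scale, complex software systems, however, repeatedly re-detecting clones as the code changes during evolution is costly. To address this problem, a grouping-based incremental clone detection method is proposed. Based on the differences between two consecutive versions, the method divides the source code into a changed group and an unchanged group, and achieves incremental clone detection through intra-group and inter-group clone analysis. Based on the proposed method, a prototype tool named ICDBG (incremental clone detector based on grouping) was implemented on top of the clone detection tool CCFinderX. Experiments show that when the changes are small, the method significantly reduces clone detection time while preserving correctness.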
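
The grouping idea in this abstract can be illustrated with a minimal Python sketch. It is not ICDBG's implementation (which builds on CCFinderX). The fragment ids, the `incremental_clones` function, and the whitespace-only `is_clone` test are all hypothetical stand-ins chosen for illustration. The sketch keeps previously known clone pairs whose fragments are both unchanged, and re-checks only pairs that involve at least one changed fragment.

```python
from itertools import combinations, product


def is_clone(a, b):
    """Placeholder similarity test (an assumption): equal after whitespace normalization."""
    return " ".join(a.split()) == " ".join(b.split())


def incremental_clones(old, new, old_pairs):
    """Update clone pairs from one version to the next.

    old, new  -- dicts mapping a fragment id to its source text
    old_pairs -- set of frozensets, the clone pairs known for the old version
    """
    # Fragments that are new or whose text differs form the changed group.
    changed = {k for k in new if old.get(k) != new[k]}
    unchanged = set(new) - changed

    # Pairs between two unchanged fragments stay valid without re-detection.
    pairs = {p for p in old_pairs if p <= unchanged}

    # Re-check inside the changed group and between the two groups.
    candidates = list(combinations(sorted(changed), 2))
    candidates += list(product(sorted(changed), sorted(unchanged)))
    for a, b in candidates:
        if is_clone(new[a], new[b]):
            pairs.add(frozenset((a, b)))
    return pairs
```

When few fragments change, the number of re-checked pairs is small compared with a full pairwise comparison, which matches the abstract's claim that the gain is largest for small changes.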

10.
A code clone is a code portion in source files that is identical or similar to another. Since code clones are believed to reduce the maintainability of software, several code clone detection techniques and tools have been proposed. This paper proposes a new clone detection technique, which consists of the transformation of input source text and a token-by-token comparison. For its implementation with several useful optimization techniques, we have developed a tool, named CCFinder (Code Clone Finder), which extracts code clones in C, C++, Java, COBOL and other source files. In addition, metrics for the code clones have been developed. In order to evaluate the usefulness of CCFinder and metrics, we conducted several case studies where we applied the new tool to the source code of JDK, FreeBSD, NetBSD, Linux, and many other systems. As a result, CCFinder has effectively found clones and the metrics have been able to effectively identify the characteristics of the systems. In addition, we have compared the proposed technique with other clone detection techniques.
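
As a rough illustration of "transformation of input source text and a token-by-token comparison", the following Python sketch replaces identifiers and numbers with placeholders and then looks for matching runs of tokens. It is a simplified stand-in, not CCFinder's algorithm or rule set. The keyword list, the placeholder names `$id` and `$num`, the fixed window length, and both function names are assumptions made for this example.

```python
import re
from collections import defaultdict

KEYWORDS = {"int", "return", "if", "else", "for", "while"}  # illustrative subset only
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize(source):
    """Tokenize source text and replace identifiers and numbers with placeholders."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in KEYWORDS:
            tokens.append(tok)
        elif re.fullmatch(r"[A-Za-z_]\w*", tok):
            tokens.append("$id")
        elif tok.isdigit():
            tokens.append("$num")
        else:
            tokens.append(tok)
    return tokens


def clone_windows(a, b, window=8):
    """Return (i, j) token offsets where window-length token runs of a and b match."""
    ta, tb = normalize(a), normalize(b)
    index = defaultdict(list)
    for i in range(len(ta) - window + 1):
        index[tuple(ta[i:i + window])].append(i)
    matches = []
    for j in range(len(tb) - window + 1):
        for i in index.get(tuple(tb[j:j + window]), []):
            matches.append((i, j))
    return matches
```

Because identifiers are normalized, two functions that differ only in names (for example `sum(a, b)` and `add(x, y)` with the same body shape) produce identical token sequences and are reported as matching.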

11.
Clones are generally considered bad programming practice in software engineering folklore. They are identified as a bad smell (Fowler et al. 1999) and a major contributor to project maintenance difficulties. Clones inherently cause code bloat, thus increasing project size and maintenance costs. In this work, we try to validate the conventional wisdom empirically to see whether cloning makes code more defect prone. This paper analyses the relationship between cloning and defect proneness. For the four medium to large open source projects that we studied, we find that, first, the great majority of bugs are not significantly associated with clones. Second, we find that clones may be less defect prone than non-cloned code. Third, we find little evidence that clones with more copies are actually more error prone. Fourth, we find little evidence to support the claim that clone groups that span more than one file or directory are more defect prone than collocated clones. Finally, we find that developers do not need to put a disproportionately higher effort to fix clone dense bugs. Our findings do not support the claim that clones are really a “bad smell” (Fowler et al. 1999). Perhaps we can clone, and breathe easily, at the same time.

12.
Code smells are indicators of deeper design problems that may cause difficulties in the evolution of a software system. This paper investigates the capability of twelve code smells to reflect actual maintenance problems. Four medium-sized systems with equivalent functionality but dissimilar design were examined for code smells. Three change requests were implemented on the systems by six software developers, each of them working for up to four weeks. During that period, we recorded problems faced by developers and the associated Java files on a daily basis. We developed a binary logistic regression model, with “problematic file” as the dependent variable. Twelve code smells, file size, and churn constituted the independent variables. We found that violation of the Interface Segregation Principle (a.k.a. ISP violation) displayed the strongest connection with maintenance problems. Analysis of the nature of the problems, as reported by the developers in daily interviews and think-aloud sessions, strengthened our view about the relevance of this code smell. We observed, for example, that severe instances of problems relating to change propagation were associated with ISP violation. Based on our results, we recommend that code with ISP violation should be considered potentially problematic and be prioritized for refactoring.

13.
折蓉蓉, 张丽萍, 侯敏, 闫盛. 《计算机应用》 (Journal of Computer Applications), 2018, 38(7): 2037-2043
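Because heavy use of cloned code causes long-term software maintenance problems and can even introduce errors, a decision-tree-based classifier is proposed to recommend clones for refactoring. First, NiCad is used for clone detection. Second, features related to the clone relationship, the cloned code fragments, and the clone context are collected. Then, a decision tree classifier is trained. Finally, K-fold cross-validation is used to evaluate the classification results. Experiments on about 600 clone instances from 5 open-source software systems show that the proposed method achieves 80% precision when recommending clone refactoring instances for each target system.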

14.
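To address the problems that current methods for constructing clone genealogies are rather complex and that the set of evolution patterns needs to be expanded, new clone evolution patterns are proposed, and clone genealogies are constructed automatically from the clone mapping relationships between software versions. First, clone detection is performed on each version of the software, and latent Dirichlet allocation (LDA) is used to extract topic information from the cloned code. Then, the mapping between clones across versions is determined from the similarity of their topics. Next, evolution patterns are attached to the clones based on the established mappings, and their evolution characteristics are analyzed. Finally, the mapping information and the evolution pattern information are combined to build the clone genealogy. Clone genealogies were constructed for 4 open-source software systems. The results show that the proposed construction method is feasible and confirm that the newly proposed evolution patterns do occur during clone evolution. The experiments found that about 90% of cloned code remains relatively stable during software evolution, and that about 67% of clone groups live through no more than half of the total number of released versions. The experimental conclusions and theoretical analysis will provide strong support for follow-up research on cloned code and for its maintenance and management.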

15.
Making changes to software systems can prove costly and it remains a challenge to understand the factors that affect the costs of software evolution. This study sought to identify such factors by investigating the effort expended by developers to perform 336 change tasks in two different software organizations. We quantitatively analyzed data from version control systems and change trackers to identify factors that correlated with change effort. In-depth interviews with the developers about a subset of the change tasks further refined the analysis. Two central quantitative results found that dispersion of changed code and volatility of the requirements for the change task correlated with change effort. The analysis of the qualitative interviews pointed to two important, underlying cost drivers: Difficulties in comprehending dispersed code and difficulties in anticipating side effects of changes. This study demonstrates a novel method for combining qualitative and quantitative analysis to assess cost drivers of software evolution. Given our findings, we propose improvements to practices and development tools to manage and reduce the costs.

16.
Deployed software systems are typically composed of many pieces, not all of which may have been created by the main development team. Often, the provenance of included components, such as external libraries or cloned source code, is not clearly stated, and this uncertainty can introduce technical and ethical concerns that make it difficult for system owners and other stakeholders to manage their software assets. In this work, we motivate the need for the recovery of the provenance of software entities by a broad set of techniques that could include signature matching, source code fact extraction, software clone detection, call flow graph matching, string matching, historical analyses, and other techniques. We liken our provenance goals to that of Bertillonage, a simple and approximate forensic analysis technique based on biometrics that was developed in 19th century France before the advent of fingerprints. As an example, we have developed a fast, simple, and approximate technique called anchored signature matching for identifying the source origin of binary libraries within a given Java application. This technique involves a type of structured signature matching performed against a database of candidates drawn from the Maven2 repository, a 275 GB collection of open source Java libraries. To show the approach is both valid and effective, we conducted an empirical study on 945 jars from the Debian GNU/Linux distribution, as well as an industrial case study on 81 jars from an e-commerce application.

17.
Two identical or similar code fragments form a clone pair. Previous studies have identified cloning as a risky practice. Therefore, a developer needs to be aware of any clone pairs in order to properly propagate any changes between clones. A clone pair may experience many changes during the creation and maintenance of a software system. A change can either maintain or remove the similarity between clones in a clone pair. If a change maintains the similarity between clones, the clone pair is left in a consistent state. When a change makes the clones no longer similar, the clone pair is left in an inconsistent state. The set of states and changes experienced by clone pairs over time form an evolution history known as a clone genealogy. In this paper, we examine clone genealogies to identify fault-prone “patterns” of states and changes. We explore the use of clone genealogy information in fault prediction. We conduct a quasi-experiment with four long-lived software systems (i.e., Apache Ant, ArgoUML, JEdit, Maven) and identify clones using the NiCad and iClones clone detection tools. Overall, we find that the size of the clone can impact the fault-proneness of a clone pair. However, there is no clear impact of the time interval between changes to a clone pair on the fault-proneness of the clone pair. We also discover that adding clone genealogy information can increase the explanatory power of fault prediction models.
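
The notion of consistent and inconsistent states along a clone genealogy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the study identifies clones with NiCad and iClones, whereas the sketch below substitutes the standard-library `difflib.SequenceMatcher` ratio with an arbitrary threshold of 0.8, and the function name `genealogy_states` is hypothetical.

```python
from difflib import SequenceMatcher


def genealogy_states(snapshots, threshold=0.8):
    """Label each revision of a clone pair as consistent or inconsistent.

    snapshots -- list of (fragment_a, fragment_b) source strings, one per revision
    threshold -- assumed similarity cut-off; not taken from the paper
    """
    states = []
    for a, b in snapshots:
        similarity = SequenceMatcher(None, a, b).ratio()
        states.append("consistent" if similarity >= threshold else "inconsistent")
    return states
```

The resulting list of states per revision is the kind of sequence in which patterns of states and changes could then be searched.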

18.
Code clones are similar program structures recurring in variant forms in software system(s). Several techniques have been proposed to detect similar code fragments in software, so-called simple clones. Identification and subsequent unification of simple clones is beneficial in software maintenance. Even further gains can be obtained by elevating the level of code clone analysis. We observed that recurring patterns of simple clones often indicate the presence of interesting higher-level similarities that we call structural clones. Structural clones show a bigger picture of the similarity situation than simple clones alone. Being logical groups of simple clones, structural clones alleviate the problem of the huge number of clones typically reported by simple clone detection tools, a problem that is often dealt with using post-detection visualization techniques. Detection of structural clones can help in understanding the design of the system for better maintenance and in reengineering for reuse, among other uses. In this paper, we propose a technique to detect some useful types of structural clones. The novelty of our approach includes the formulation of the structural clone concept and the application of data mining techniques to detect these higher-level similarities. We describe a tool called Clone Miner that implements our proposed technique. We assess the usefulness and scalability of the proposed techniques via several case studies. We discuss various usage scenarios to demonstrate in what ways the knowledge of structural clones adds value to the analysis based on simple clones alone.

19.
Many techniques have been developed over the years to detect code clones in different software systems to maintain security measures. These techniques often require the source code to compare the subject system against a very large data set of big code. This paper presents an index-based feature extraction technique (IBFET) to detect code clones at very large scale, up to billions of LOC, at file-level granularity. We performed preprocessing, indexing, and clone detection for more than 324 billion LOC using a Hadoop distributed environment, which is considerably faster and more efficient than existing distributed indexing and clone detection techniques; meanwhile, it detects all three types of clones efficiently. The MapReduce divide-and-conquer approach is used to count and retrieve the similar features between different systems. We evaluated the execution time, scalability, precision, and recall of IBFET using the well-known clone detection data sets IJaDataset and BigCloneBench; furthermore, we compared the results with other state-of-the-art tools. Our approach is faster, flexible, scalable, and provides accurate results with high authenticity and can be implemented at a large-scale level.

20.
Many of the recently developed software systems are implemented in Java. For these systems, activities presently are mainly related to software development tasks rather than to dedicated software maintenance tasks. For these Java systems, therefore, experimental confirmation of established metrics for measuring code quantities that are related to software maintenance is not available. This also includes very basic size measures such as the LOC metric and the Halstead length. In this article, the application of these metrics for Java systems as well as some of the associated difficulties are outlined. The presented results are based on experimental data and include empirical correlations between the basic size metrics as well as newly derived scaling laws which are suitable for maintenance related software measurement.
