期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Software clone detection: A systematic review

《Information and Software Technology》2013,55(7):1165-1199

ContextReusing software by means of copy and paste is a frequent activity in software development. The duplicated code is known as a software clone and the activity is known as code cloning. Software clones may lead to bug propagation and serious maintenance problems.ObjectiveThis study reports an extensive systematic literature review of software clones in general and software clone detection in particular.MethodWe used the standard systematic literature review method based on a comprehensive set of 213 articles from a total of 2039 articles published in 11 leading journals and 37 premier conferences and workshops.ResultsExisting literature about software clones is classified broadly into different categories. The importance of semantic clone detection and model based clone detection led to different classifications. Empirical evaluation of clone detection tools/techniques is presented. Clone management, its benefits and cross cutting nature is reported. Number of studies pertaining to nine different types of clones is reported. Thirteen intermediate representations and 24 match detection techniques are reported.ConclusionWe call for an increased awareness of the potential benefits of software clone management, and identify the need to develop semantic and model clone detection techniques. Recommendations are given for future research. 相似文献

2.

Automated refactoring to the Null Object design pattern

《Information and Software Technology》2015

ContextNull-checking conditionals are a straightforward solution against null dereferences. However, their frequent repetition is considered a sign of poor program design, since they introduce source code duplication and complexity that impacts code comprehension and maintenance. The Null Object design pattern enables the replacement of null-checking conditionals with polymorphic method invocations that are bound, at runtime, to either a real object or a Null Object.ObjectiveThis work proposes a novel method for automated refactoring to Null Object that eliminates null-checking conditionals associated with optional class fields, i.e., fields that are not initialized in all class instantiations and, thus, their usage needs to be guarded in order to avoid null dereferences.MethodWe introduce an algorithm for automated discovery of refactoring opportunities to Null Object. Moreover, we specify the source code transformation procedure and an extensive set of refactoring preconditions for safely refactoring an optional field and its associated null-checking conditionals to the Null Object design pattern. The method is implemented as an Eclipse plug-in and is evaluated on a set of open source Java projects.ResultsSeveral refactoring candidates are discovered in the projects used in the evaluation and their refactoring lead to improvement of the cyclomatic complexity of the affected classes. The successful execution of the projects’ test suites, on their refactored versions, provides empirical evidence on the soundness of the proposed source code transformation. Runtime performance results highlight the potential for applying our method to a wide range of project sizes.ConclusionOur method automates the elimination of null-checking conditionals through refactoring to the Null Object design pattern. It contributes to improvement of the cyclomatic complexity of classes with optional fields. The runtime processing overhead of applying our method is limited and allows its integration to the programmer’s routine code analysis activities. 相似文献

3.

克隆代码技术研究综述 总被引：1，自引：1，他引：0

史庆庆孟繁军张丽萍刘东升《计算机应用研究》2013,30(6):1617-1623

软件系统中克隆代码的检测与管理是软件工程中的基本问题之一, 在软件的质量、维护、架构、进化、专利和剽窃等众多领域有着广泛的应用需求。综述了克隆检测的过程、技术及其优缺点、克隆进化方向上的相关研究, 以及克隆管理的一些技术, 并特别介绍了克隆重构技术。最后概括了该领域所取得的研究成果, 并讨论了目前克隆代码研究中所遇到的挑战性问题。相似文献

4.

A systematic literature review on the use of machine learning in code clone research

《Computer Science Review》2023

Context:Research related to code clones includes detection of clones in software systems, analysis, visualization and management of clones. Detection of semantic clones and management of clones have attracted use of machine learning techniques in code clone related research.Objective:The aim of this study is to report the extent of machine learning usage in code clone related research areas.Method:The paper uses a systematic review method to report the use of machine learning in research related to code clones. The study considers a comprehensive set of 57 articles published in leading conferences, workshops and journals.Results:Code clone related research using machine learning techniques is classified into different categories. Machine learning and deep learning algorithms used in the code clone research are reported. The datasets, features used to train machine learning models and metrics used to evaluate machine learning algorithms are reported. The comparative results of various machine learning algorithms presented in primary studies are reported.Conclusion:The research will help to identify the status of using machine learning in different code clone related research areas. We identify the need of more empirical studies to assess the benefits of machine learning in code clone research and give recommendations for future research. 相似文献

5.

基于决策树推荐克隆重构的方法

折蓉蓉张丽萍侯敏闫盛《计算机应用》2018,38(7):2037-2043

针对克隆代码的大量使用会导致长期软件维护问题甚至引入错误,提出了一种基于决策树的分类器来推荐克隆进行重构。首先,使用NiCad进行克隆检测;其次,收集了与克隆关系、克隆代码段和克隆上下文都相关的特征;然后,利用决策树分类器训练;最后,利用K折交叉评估分类结果。在5款开源软件中对近600多个克隆实例进行实验,实验结果表明所提方法为每个目标系统推荐克隆重构实例时达到80%的精度。相似文献

6.

CD-Form: A clone detector based on formal methods

《Science of Computer Programming》2014

相似文献

7.

SeByte: Scalable clone and similarity search for bytecode

《Science of Computer Programming》2014

While source code clone detection is a well-established research area, finding similar code fragments in binary and other intermediate code representations has been not yet that widely studied. In this paper, we introduce SeByte, a bytecode clone detection and search model that applies semantic-enabled token matching. It is developed based on the idea of relaxation on the code fingerprints. This approach separates the input content based on the types of tokens into different dimensions, with each dimension representing the input content from a specific point of view. Following this approach, SeByte compares each dimension separately and independently which we refer to as multi-dimensional comparison in our research. As the similarity search function we use a well-known measure that supports our multi-dimensional comparison heuristic, the Jaccard similarity coefficient. Our preliminary study shows that SeByte can detect clones that are missed by existing approaches due to the differences in the input data and the search algorithm. We then further exploit the model to build a scalable bytecode clone search engine. This extension meets the requirements of a classical search engine including the ranking of result sets. Our evaluation with a large dataset of 500,000 compiled Java classes, which we extracted from the six most recent versions of the Eclipse IDE, showed that our SeByte search is not only scalable but also capable of providing a reliable ranking. 相似文献

8.

CCFinder: a multilinguistic token-based code clone detection system for large scale source code 总被引：2，自引：0，他引：2

Kamiya T. Kusumoto S. Inoue K. 《IEEE transactions on pattern analysis and machine intelligence》2002,28(7):654-670

A code clone is a code portion in source files that is identical or similar to another. Since code clones are believed to reduce the maintainability of software, several code clone detection techniques and tools have been proposed. This paper proposes a new clone detection technique, which consists of the transformation of input source text and a token-by-token comparison. For its implementation with several useful optimization techniques, we have developed a tool, named CCFinder (Code Clone Finder), which extracts code clones in C, C++, Java, COBOL and other source files. In addition, metrics for the code clones have been developed. In order to evaluate the usefulness of CCFinder and metrics, we conducted several case studies where we applied the new tool to the source code of JDK, FreeBSD, NetBSD, Linux, and many other systems. As a result, CCFinder has effectively found clones and the metrics have been able to effectively identify the characteristics of the systems. In addition, we have compared the proposed technique with other clone detection techniques. 相似文献

9.

Code clones and developer behavior: results of two surveys of the clone research community

Debarshi Chatterji Jeffrey C. Carver Nicholas A. Kraft 《Empirical Software Engineering》2016,21(4):1476-1508

The literature presents conflicting claims regarding the effects of clones on software maintainability. For a community to progress, it is important to identify and address those areas of disagreement. Many claims, such as those related to developer behavior, either lack human-based empirical validation or are contradicted by other studies. This paper describes the results of two surveys to evaluate the level of agreement among clone researchers regarding claims that have not yet been validated through human-based empirical study. The surveys covered three key clone-related research topics: general information, developer behavior, and evolution. Survey 1 focused on high-level information about all three topics, whereas Survey 2 focused specifically on developer behavior. Approximately 20 clone researchers responded to each survey. The survey responses showed a lack of agreement on some major clone-related topics. First, the respondents disagree about the definitions of clone types, with some indicating the need for a taxonomy based upon developer intent. Second, the respondents were uncertain whether the ratio of cloned to non-cloned code affected system quality. Finally, the respondents disagree about the usefulness of various detection, analysis, evolution, and visualization tools for clone management tasks such as tracking and refactoring of clones. The overall results indicate the need for more focused, human-based empirical research regarding the effects of clones during maintenance. The paper proposes a strategy for future research regarding developer behavior and code clones in order to bridge the gap between clone research and the application of that research in clone maintenance. 相似文献

10.

On the use of clone detection for identifying crosscutting concern code 总被引：2，自引：0，他引：2

Bruntink M. van Deursen A. van Engelen R. Tourwe T. 《IEEE transactions on pattern analysis and machine intelligence》2005,31(10):804-818

In systems developed without aspect-oriented programming, code implementing a crosscutting concern may be spread over many different parts of a system. Identifying such code automatically could be of great help during maintenance of the system. First of all, it allows a developer to more easily find the places in the code that must be changed when the concern changes and, thus, makes such changes less time consuming and less prone to errors. Second, it allows the code to be refactored to an aspect-oriented solution, thereby improving its modularity. In this paper, we evaluate the suitability of clone detection as a technique for the identification of crosscutting concerns. To that end, we manually identify five specific crosscutting concerns in an industrial C system and analyze to what extent clone detection is capable of finding them. We consider our results as a stepping stone toward an automated "aspect miner" based on clone detection. 相似文献

11.

Client-based cohesion metrics for Java programs

Sami Mäkelä Ville Leppänen 《Science of Computer Programming》2009,74(5-6):355-378

One purpose of software metrics is to measure the quality of programs. The results can be for example used to predict maintenance costs or improve code quality. An emerging view is that if software metrics are going to be used to improve quality, they must help in finding code that should be refactored. Often refactoring or applying a design pattern is related to the role of the class to be refactored. In client-based metrics, a project gives the class a context. These metrics measure how a class is used by other classes in the context. We present a new client-based metric LCIC (Lack of Coherence in Clients), which analyses if the class being measured has a coherent set of roles in the program. Interfaces represent the roles of classes. If a class does not have a coherent set of roles, it should be refactored, or a new interface should be defined for the class.We have implemented a tool for measuring the metric LCIC for Java projects in the Eclipse environment. We calculated LCIC values for classes of several open source projects. We compare these results with results of other related metrics, and inspect the measured classes to find out what kind of refactorings are needed. We also analyse the relation of different design patterns and refactorings to our metric. Our experiments reveal the usefulness of client-based metrics to improve the quality of code. 相似文献

12.

Automated refactoring to the Strategy design pattern

《Information and Software Technology》2012,54(11):1202-1214

ContextThe automated identification of code fragments characterized by common design flaws (or “code smells”) that can be handled through refactoring, fosters refactoring activities, especially in large code bases where multiple developers are engaged without a detailed view on the whole system. Automated refactoring to design patterns enables significant contributions to design quality even from developers with little experience on the use of the required patterns.ObjectiveThis work targets the automated identification of refactoring opportunities to the Strategy design pattern and the elimination through polymorphism of respective “code smells” that are related to extensive use of complex conditional statements.MethodAn algorithm is introduced for the automated identification of refactoring opportunities to the Strategy design pattern. Suggested refactorings comprise conditional statements that are characterized by analogies to the Strategy design pattern, in terms of the purpose and selection mode of strategies. Moreover, this work specifies the procedure for refactoring to Strategy the identified conditional statements. For special cases of these statements, a technique is proposed for total replacement of conditional logic with method calls of appropriate concrete Strategy instances. The identification algorithm and the refactoring procedure are implemented and integrated in the JDeodorant Eclipse plug-in. The method is evaluated on a set of Java projects, in terms of quality of the suggested refactorings and run-time efficiency. The relevance of the identified refactoring opportunities is verified by expert software engineers.ResultsThe identification algorithm recalled, from the projects used during evaluation, many of the refactoring candidates that were identified by the expert software engineers. Its execution time on projects of varying size confirmed the run-time efficiency of this method.ConclusionThe proposed method for automated refactoring to Strategy contributes to simplification of conditional statements. Moreover, it enhances system extensibility through the Strategy design pattern. 相似文献

13.

克隆代码分析方法研究

王克朝朱宸光王甜甜苏小红《计算机应用研究》2017,34(3)

针对已有克隆代码检测工具只输出克隆组形式的检测结果,而无法分析克隆代码对软件质量的影响问题,提出危害软件质量的关键克隆代码的识别方法。首先,定义了克隆代码的统一表示形式,使之可以分析各种克隆检测工具的检测结果;接下来,解析源程序和克隆检测结果,识别标识符命名不一致性潜在缺陷;然后,定义了克隆关联图,在此基础上检测跨越多个实现不同功能的文件、危害软件可维护性的克隆代码;最后,对检测结果进行可视化统计分析。本文的克隆代码分析工具被应用于分析开源代码httpd,检测出了1组标识符命名不一致的克隆代码和44组危害软件可维护性的关键克隆类,实验结果表明,本文方法可以有效辅助软件开发和维护人员分析、维护克隆代码。相似文献

14.

Performance comparison of query-based techniques for anti-pattern detection

《Information and Software Technology》2015

ContextProgram queries play an important role in several software evolution tasks like program comprehension, impact analysis, or the automated identification of anti-patterns for complex refactoring operations. A central artifact of these tasks is the reverse engineered program model built up from the source code (usually an Abstract Semantic Graph, ASG), which is traditionally post-processed by dedicated, hand-coded queries.ObjectiveOur paper investigates the costs and benefits of using the popular industrial Eclipse Modeling Framework (EMF) as an underlying representation of program models processed by four different general-purpose model query techniques based on native Java code, OCL evaluation and (incremental) graph pattern matching.MethodWe provide in-depth comparison of these techniques on the source code of 28 Java projects using anti-pattern queries taken from refactoring operations in different usage profiles.ResultsOur results show that general purpose model queries can outperform hand-coded queries by 2–3 orders of magnitude, with the trade-off of an increased in memory consumption and model load time of up to an order of magnitude.ConclusionThe measurement results of usage profiles can be used as guidelines for selecting the appropriate query technologies in concrete scenarios. 相似文献

15.

An approach to clone detection in sequence diagrams and its application to security analysis

Manar H. Alalfi Elizabeth P. Antony James R. Cordy 《Software and Systems Modeling》2018,17(4):1287-1309

Duplication in software systems is an important issue in software quality assurance. While many methods for software clone detection in source code and structural models have been described in the literature, little has been done on similarity in the dynamic behaviour of interactive systems. In this paper, we present an approach to identifying near-miss interaction clones in reverse-engineered UML sequence diagrams. Our goal is to identify patterns of interaction (“conversations”) that can be used to characterize and abstract the run-time behaviour of web applications and other interactive systems. In order to leverage existing robust near-miss code clone technology, our approach is text-based, working on the level of XMI, the standard interchange serialization for UML. Clone detection in UML behavioural models, such as sequence diagrams, presents a number of challenges—first, it is not clear how to break a continuous stream of interaction between lifelines (representing the objects or actors in the system) into meaningful conversational units. Second, unlike programming languages, the XMI text representation for UML is highly non-local, using attributes to reference-related elements in the model file remotely. In this work, we use a set of contextualizing source transformations on the XMI text representation to localize related elements, exposing the hidden hierarchical structure of the model and allowing us to granularize behavioural interactions into conversational units. Then we adapt NICAD, a robust near-miss code clone detection tool, to help us identify conversational clones in reverse-engineered behavioural models. These conversational clones are then analysed to find worrisome interactions that may indicate security access violations. 相似文献

16.

An investigation of the fault-proneness of clone evolutionary patterns

Liliane Barbour Le An Foutse Khomh Ying Zou Shaohua Wang 《Software Quality Journal》2018,26(4):1187-1222

Two identical or similar code fragments form a clone pair. Previous studies have identified cloning as a risky practice. Therefore, a developer needs to be aware of any clone pairs in order to properly propagate any changes between clones. A clone pair may experience many changes during the creation and maintenance of a software system. A change can either maintain or remove the similarity between clones in a clone pair. If a change maintains the similarity between clones, the clone pair is left in a consistent state. When a change makes the clones no longer similar, the clone pair is left in an inconsistent state. The set of states and changes experienced by clone pairs over time form an evolution history known as a clone genealogy. In this paper, we examine clone genealogies to identify fault-prone “patterns” of states and changes. We explore the use of clone genealogy information in fault prediction. We conduct a quasi-experiment with four long-lived software systems (i.e., Apache Ant, ArgoUML, JEdit, Maven) and identify clones using the NiCad and iClones clone detection tools. Overall, we find that the size of the clone can impact the fault-proneness of a clone pair. However, there is no clear impact of the time interval between changes to a clone pair on the fault-proneness of the clone pair. We also discover that adding clone genealogy information can increase the explanatory power of fault prediction models. 相似文献

17.

克隆代码映射的方法与应用

陈桌张丽萍边琦《计算机工程与应用》2017,53(6):14-21

克隆代码是指重复或类似的代码片段,这些重复代码来自于“复制粘贴修改”的编程方式,此类代码会严重影响软件的可维护性。研究者们从各种角度来探索克隆代码的存在、发展和变化规律,对克隆代码进行追踪并发现在其演化过程中表现的特征和模式,从而更好地研究和管理,而克隆映射是整个研究过程的核心步骤。介绍了克隆相关概念及术语,详细阐述了不同类型的映射方法并总结方法的优缺点,说明了克隆映射在克隆演化分析和克隆质量评估方面的应用,对克隆映射的发展趋势进行了总结和展望。相似文献

18.

An extended assessment of type-3 clones as detected by state-of-the-art tools

Rebecca Tiarks Rainer Koschke Raimar Falke 《Software Quality Journal》2011,19(2):295-331

Code reuse through copying and pasting leads to so-called software clones. These clones can be roughly categorized into identical fragments (type-1 clones), fragments with parameter substitution (type-2 clones), and similar fragments that differ through modified, deleted, or added statements (type-3 clones). Although there has been extensive research on detecting clones, detection of type-3 clones is still an open research issue due to the inherent vagueness in their definition. In this paper, we analyze type-3 clones detected by state-of-the-art tools and investigate type-3 clones in terms of their syntactic differences. Then, we derive their underlying semantic abstractions from their syntactic differences. Finally, we investigate whether there are code characteristics that indicate that a tool-suggested clone candidate is a real type-3 clone from a human’s perspective. Our findings can help developers of clone detectors and clone refactoring tools to improve their tools. 相似文献

19.

基于下推自动机的细粒度锁自动重构方法

张杨邵帅张冬雯《软件学报》2021,32(12):3710-3727

针对粗粒度锁会严重影响并发程序的可伸缩性问题,提出一种面向细粒度锁的自动重构方法.该方法借助访问者模式分析、别名分析、负面效应分析等多种程序分析技术获取临界区代码的读写模式,然后使用下推自动机构建不同锁模式的识别方法,根据识别结果进行代码重构.与以往锁重构方法的不同之处在于,该方法考虑了锁降级模式,使重构适用性更广.基于此方法,在Eclipse JDT框架下,以插件的形式实现了自动重构工具FLock.在实验中,从重构个数、改变的代码行数、重构时间、准确性和重构后程序性能等方面对FLock进行了评估,并与已有的重构工具Relocker和CLOCK进行了对比.对HSQLDB,Jenkins和Cassandra等11个大型实际应用程序的重构结果表明：FLock共重构了1 757个内置监视器对象,每个程序重构平均用时17.5s.该重构工具可以有效地实现粗粒度锁到细粒度锁的转换,与手动重构相比,有效提升了细粒度锁的重构效率. 相似文献

20.

CCodeExtractor:一种针对C程序自动化的函数提取方法

张其良张昱周坤《计算机科学》2017,44(4):16-20, 29

随着程序规模和复杂性的增加,代码重构在改善软件质量、性能以及提高软件的扩展性和维护性等方面至关重要。目前的Eclipse中,C源代码重构工具的函数提取只能处理一些简单的代码,且处理过程需要人工参与,不能自动化处理。为此,提出一种C源代码级别自动化的函数提取方法(CCodeExtractor),它通过指定提取条件,在保证语义一致的前提下,将符合条件的代码片段自动分离成一个单独的函数,并将其放到新文件中,而原代码片段替换成了一个函数调用。为了验证CCodeExtractor的有效性,结合循环分析和优化在实际应用中的广泛应用,在LLVM上实现了一个提取循环的工具,将程序中满足条件的for循环封装成单独的函数。在实验中,使用该工具对6个不同规模的程序进行了测试并且对比了变换前后程序运行的结果。实验结果表明,CCodeExtractor代码提取方法在保证程序语义不变的前提下,适用于不同规模的程序。相似文献