首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 281 毫秒
1.
陈翔  于池  杨光  濮雪莲  崔展齐 《软件学报》2023,34(3):1310-1329
Bash是Linux默认的shell命令语言.它在Linux系统的开发和维护中起到重要作用.对不熟悉Bash语言的开发人员来说,理解Bash代码的目的和功能具有一定的挑战性.针对Bash代码注释自动生成问题提出了一种基于双重信息检索的方法 ExplainBash.该方法基于语义相似度和词法相似度进行双重检索,从而生成高质量代码注释.其中,语义相似度基于CodeBERT和BERT-whitening操作训练出代码语义表示,并基于欧式距离来实现;词法相似度基于代码词元构成的集合,并基于编辑距离来实现.以NL2Bash研究中共享的语料库为基础,进一步合并NLC2CMD竞赛共享的数据以构造高质量语料库.随后,选择了来自代码注释自动生成领域的9种基准方法,这些基准方法覆盖了基于信息检索的方法和基于深度学习的方法.实证研究和人本研究的结果验证了ExplainBash方法的有效性.然后设计了消融实验,对ExplainBash方法内设定(例如检索策略、BERT-whitening操作等)的合理性进行了分析.最后,基于所提方法开发出一个浏览器插件,以方便用户对Bash代码的理解.  相似文献   

2.
王敏  潘兴禄  邹艳珍  谢冰 《软件学报》2023,34(10):4705-4723
代码审查是现代软件分布式并行开发过程中的重要机制. 在代码评审时, 帮助代码评审者快速查看某一段源代码的演化过程, 可以让评审者快速理解此段代码变更的原因和必要性, 从而有效提升代码评审的效率与质量. 现有工作虽然提供了一些类似的代码提交历史回溯方法及对应工具, 但缺乏从历史数据中进一步提取辅助代码评审相关辅助信息的能力. 为此, 提出一个面向代码评审的细粒度代码变更溯源方法C2Tracker. 给定一段方法(函数)级别的细粒度代码变更, C2Tracker能够自动追溯到历史开发过程中修改该段代码相关的代码提交, 并在此基础上进一步挖掘其中与该段代码频繁共现修改的代码元素以及相关的变更片段, 辅助代码评审者对当前代码变更的理解与决策. 在10个著名开源项目的数据集下进行实验验证. 实验结果表明, C2Tracker在追溯历史提交的准确率上达到97%, 在挖掘频繁共现代码元素任务上的准确率达到95%, 在追溯相关代码变更片段任务上的准确率达到97%; 相比现有评审方式, C2Tracker在具体案例的代码评审效率和质量上均有较大提升, 在绝大多数的代码评审案例中被评审者认为能提供“明显帮助”或“很大帮助”.  相似文献   

3.
使用抽象语法树和静态分析的克隆代码自动重构方法   总被引:4,自引:0,他引:4  
单个软件系统中以及若干个相似系统之间的代码克隆给软件维护增加了很大困难.本文针对运用克隆侦测发现的相似代码片断,提出一种基于抽象语法树和静态分析的代码自动重构方法.该方法首先为克隆代码分别构造抽象语法树,然后运用语句差异度指标建立起语法树之间流程控制语句的对应关系.在此基础上,该方法根据控制流程和基本语句块两个层次上的差异性分析,最终通过代码可变点提取实现克隆代码的自动合并.针对Java代码开发了克隆代码重构支持工具原型,并分别针对JDK1.5和一个业务系统进行了自动重构实验.初步的结果表明,该方法能够准确、有效地辅助开发者实现克隆代码的自动重构.  相似文献   

4.
代码注释有助于提高程序的可读性和可理解性,而不断地创建和更新注释非常费时费力,这将影响对软件的理解、重用和维护.自动代码注释试图解决此类问题,其中代码的表示和文本生成是研究的核心问题.该文提出一种基于编码器-解码器结构的自动生成Java代码注释模型.方法将代码的顺序序列和代码结构作为单独的输入进行处理,允许模型学习Java方法的结构和语义;以一定的概率从模型生成的预测序列和真实词序列中采样作为下一步的输入,以提高模型的纠错恢复能力.通过与3种典型自动代码注释方法在11个Java项目上的对比实验,结果表明,提出的模型在BLEU得分上提升了16.1%,有助于提高自动代码注释的性能.  相似文献   

5.
在软件开发的过程中,开发人员通过复制粘贴式的开发方式或者模块化的开发方式来完成需求是十分常见的,这两种开发方式可以提高开发效率,但同时会导致软件系统中出现大量的相同代码或者相似代码,大量的相似代码会给软件维护等方面带来很大的困难,这也是最常见的重构对象。源代码相似性度量是指利用一定的检测方法分析程序源代码间的相似程度。该技术被应用于代码抄袭检测、代码克隆检测、软件知识产权保护、代码复用等多个领域。为了提高代码相似性度量的准确性,提出了一种基于多特征值的源代码相似性检测技术。构建了源代码注释、型构、代码文本语句与结构中特征提取的方法,并给出了源代码相似度检测的度量模型。通过与权威的代码相似检测系统Moss进行对比实验,结果表明该方法可以更准确地检测出相似代码。  相似文献   

6.
为了提高软件开发的质量和效率,代码自动生成是当前的研究热点,代码自动生成的性能是其中的重要问题.现有代码自动生成的性能分析方法较简单,难以评估代码自动生成过程中程序员与代码自动生成工具各自的特征.本文综合考虑了代码自动生成过程中程序员与代码自动生成工具的作用,提出了一种基于半监督学习的代码自动生成性能评估方法,通过抽取程序员行为与代码自动生成工具行为的重要特征,划分代码自动生成的性能类别,建立了基于深度神经网络的代码自动生成过程性能评估模型,并计算程序员行为特征与代码自动生成工具行为特征对性能的影响程度.实验结果表明,该方法可以有效分析程序员行为与代码自动生成工具行为对代码自动生成过程性能的影响.  相似文献   

7.
黄凯 《微计算机信息》2012,(5):29-30,33
基于MDA方法,使用UML profile建立企业应用模型,并应用模型转换工具实现代码自动生成,通过修改PSM模型自动实现代码的修改。详细探讨了PS_.NET UML profile的构成和转换过程,通过应用实例,证明该转换方法可以很好地应用于企业应用系统开发,提高开发效率,减少开发代价和维护。  相似文献   

8.
软件代码提交是最重要的软件版本演化数据之一,被广泛应用于软件审查和软件理解中.对于程序员,提交的理解难度随着受影响的类数量、修改的代码量的增加而增加.本文通过对大量数据的分析发现,识别出提交中核心的修改类(关键类),以及为了完成这个核心修改所进行的依赖性改动的类(非关键类),能够辅助代码提交的理解.受机器学习技术在分类领域有效性的启发,本文提出一种基于机器学习的关键类识别方法,将判定提交中的关键类建模为二分类问题(即:关键和非关键类),从软件演化过程中产生的海量提交数据中抽取可判别性特征来度量类的关键性.在多个数据集上的实验结果表明,我们的方法判定关键类的综合准确率达到了87%;相比于开发人员直接理解提交,使用关键类信息提示来辅助理解提交能够显著提高开发人员的效率和正确率.  相似文献   

9.
代码转换是代码自动生成过程中的重要环节.提出一种基于模板、可适用于任意文法描述代码之间转换的"属性匹配-替换"算法.利用该算法,成功实现了OSEK规范中OIL语法描述代码到C语言代码的转换.  相似文献   

10.
许福  郝亮  陈飞翔  李冬梅  崔晓晖 《计算机工程》2020,46(1):222-228,242
开源代码复用是重要的软件开发模式,但开源许可证侵权与代码同步更新是当前开源代码复用中的2个主要问题。利用代码快照间的高度相似性特点,设计一种代码仓库的高效增量分析方法,在此基础上,利用Simhash算法将函数代码映射成函数指纹,提出以函数为基本分析单元的工程相似度计算方法,从而降低分析结果的存储空间并提高代码比对速度。设计3组实验分别从代码分析效率、工程相似度判定和函数更新检测方面进行评估,结果表明,该方法能满足开源代码复用中相似度检测和代码溯源的需求,且能够有效缩短总体分析时间。  相似文献   

11.
In software development, developers often need to change or update lost of similar codes. How to perform code transformation automatically has become a research hotspot in software engineering. An effective way is extracting the modification pattern from a set of similar code changes and applying it to automatic code transformation. In the related work, deep-learning-based approaches have achieved much progress, but they suffer from the problem of significant long-term dependency between the code. To address this challenge, an automatic code transformation method is proposed, namely ExpTrans. Based on the graph-based representations of code changes, ExpTrans is enhanced with the structural information of code. It labels the dependency between variables in code parsing and adopts the graph convolutional network and Transformer structure to capture the long-term dependency between the code. ExpTrans is first compared with existing learning-based approaches to evaluate its effectiveness; the results show that ExpTrans gains 11.8%--30.8% precision increment. Then, it is compared with rule-based approaches and the results demonstrate that ExpTrans significantly improves the correct rate of the modified instances.  相似文献   

12.
In this paper, we present an in-depth empirical study of a new metric, change dispersion, that measures the extent changes are scattered throughout the code of a software system. Intuitively, highly dispersed changes, the changes that are scattered throughout many software entities (such as files, classes, methods, and variables), should require more maintenance effort than the changes that only affect a few entities. In our research we investigate change dispersion on the code-base of a number of subject systems as a whole, and separately on each system's cloned and non-cloned code. Our central objective is to determine whether cloned code negatively affects software evolution and maintenance. The granularity of our focus is at the method level.Our experimental results on 16 open source subject systems written in four different programming languages (Java, C, C#, and Python) involving two clone detection tools (CCFinderX and NiCad) and considering three major types of clones (Type 1: exact, Type 2: dissimilar naming, and Type 3: some dissimilar code) suggests that change dispersion has a positive and statistically significant correlation with the change-proneness (or instability) of source code. Cloned code, especially in Java and C systems, often exhibits a higher change dispersion than non-cloned code. Also, changes to Type 3 clones are more dispersed compared to changes to Type 1 and Type 2 clones. According to our analysis, a primary cause of high change dispersion in cloned code is that clones from the same clone class often require corresponding changes to ensure they remain consistent.  相似文献   

13.
This paper presents the results of an empirical study on the subjective evaluation of code smells that identify poorly evolvable structures in software. We propose use of the term software evolvability to describe the ease of further developing a piece of software and outline the research area based on four different viewpoints. Furthermore, we describe the differences between human evaluations and automatic program analysis based on software evolvability metrics. The empirical component is based on a case study in a Finnish software product company, in which we studied two topics. First, we looked at the effect of the evaluator when subjectively evaluating the existence of smells in code modules. We found that the use of smells for code evaluation purposes can be difficult due to conflicting perceptions of different evaluators. However, the demographics of the evaluators partly explain the variation. Second, we applied selected source code metrics for identifying four smells and compared these results to the subjective evaluations. The metrics based on automatic program analysis and the human-based smell evaluations did not fully correlate. Based upon our results, we suggest that organizations should make decisions regarding software evolvability improvement based on a combination of subjective evaluations and code metrics. Due to the limitations of the study we also recognize the need for conducting more refined studies and experiments in the area of software evolvability.
Casper LasseniusEmail:
  相似文献   

14.
Specialization of code is used to improve the performance of the applications. However, specialization based on ineffective profiles deteriorates the performance. Existing value profiling algorithms are not yet able to address the issue of code size explosion incurred due to specialization of code. This problem can be mitigated by capturing data through profiling that would be useful for specialization of code with minimum code size.In this article, we present an approach to optimize code through value profiling and specialization with code transformation. The values of the parameters selected through an analysis of code are captured in the intervals which are automatically adapted to dynamic behavior of the application. The code is then specialized based on value profiles. The specialized code contains optimizations and may be converted back to the generalized code through a transformation. This approach facilitates the code to obtain optimizations through specialization with minimum size, and no runtime overhead.Using this approach, the experiments performed on Itanium-II (IA-64) architecture with icc compiler v 9.0 show a significant improvement in the performance of the SPEC 2000 benchmarks.  相似文献   

15.
张杨  东春浩  刘辉  葛楚妍 《软件学报》2022,33(5):1551-1568
目前已有的代码坏味检测方法仅依赖于代码结构信息和启发式规则, 对嵌入在不同层次代码中的语义信息关注不够, 而且现有的代码坏味检测方法准确率还有进一步提升的空间. 针对该问题, 提出一种基于预训练模型和多层次信息的代码坏味检测方法DeepSmell, 首先采用静态分析工具提取程序中的代码坏味实例和多层次代码度量信息, 并...  相似文献   

16.
17.
The source code of a software system is in constant change. The impact of these changes spreads out across the software system and may lead to the sudden manifestation of failures in unchanged parts. To help developers fix such failures, we propose a method that, in a pre-processing stage, analyzes prior code changes to determine what functions have been modified. Next, given a particular period of time in the past, the functions changed during that period are propagated throughout the rest of the system using the dependence graph of the system. This information is visualized using Change Impact Graphs (CIGs). Through a case study based on the Apache Web Server, we demonstrate the benefit of using CIGs to investigate several real defects.  相似文献   

18.
When reengineering software systems, maintainers should be able to assess and compare multiple change scenarios for a given goal, so as to choose the most pertinent one. Because they implicitly consider one single working copy, revision control systems do not scale up well to perform simultaneous analyses of multiple versions of systems. We designed Orion, an interactive prototyping tool for reengineering, to simulate changes and compare their impact on multiple versions of software source code models. Our approach offers an interactive simulation of changes, reuses existing assessment tools, and has the ability to hold multiple and branching versions simultaneously in memory. Specifically, we devise an infrastructure which optimizes memory usage of multiple versions for large models. This infrastructure uses an extension of the FAMIX source code meta-model but it is not limited to source code analysis tools since it can be applied to models in general. In this paper, we validate our approach by running benchmarks on memory usage and computation time of model queries on large models. Our benchmarks show that the Orion approach scales up well in terms of memory usage, while the current implementation could be optimized to lower its computation time. We also report on two large case studies on which we applied Orion.  相似文献   

19.
函数自动命名技术旨在为输入的源代码自动生成目标函数名,增强程序代码的可读性以及加速软件开发进程,是软件工程领域中一项重要的研究任务.现有基于机器学习的技术主要是通过序列模型对源代码进行编码,进而自动生成函数名,但存在长程依赖问题和代码结构编码问题.为了更好的提取程序中的结构信息和语义信息,本文提出了一个基于图卷积(Graph Convolutional Network,GCN)的神经网络模型—TrGCN(a Transformer and GCN based automatic method naming).TrGCN利用了Transformer中的自注意力机制来缓解长程依赖问题,同时采用Character-word注意力机制提取代码的语义信息.TrGCN引入了一种基于图卷积的AST Encoder结构,丰富了AST节点特征向量的信息,可以很好地对源代码结构信息进行建模.在实证研究中,使用了3个不同规模的数据集来评估TrGCN的有效性,实验结果表明TrGCN比当前广泛使用的模型code2seq和Sequence-GNNs能更好的自动生成函数名,其中F1分数分别提高了平均5.2%、2.1%.  相似文献   

20.
Conventional approaches to analyze the behavior of software applications are black box based, that is, the software application is treated as a whole and only its interactions with the outside world are modeled. The black box approaches ignore information about the internal structure of the application and the behavior of the individual parts. Hence, they are inadequate to model the behavior of a realistic software application, which is likely to be made up of several interacting parts. Architecture-based analysis, which seeks to assess the behavior of a software application taking into consideration the behavior of its parts and the interactions among the parts is thus essential. Most of the research in the area of architecture-based analysis has been devoted to developing analytical models, with very little, if any effort being devoted to how these models might be applied to real software applications. In order to apply these models to software applications, methods must be developed to extract the parameters of the analytical models from information collected during the execution of the application. In this paper, we present an experimental approach to extract the parameters of architecture-based models from code coverage measurements obtained during the execution of the application. To facilitate this, we use a coverage analysis tool called automatic test analyzer in C (ATAC), which is a part of Telcordia Software Visualization and Analysis Toolsuite (TSVAT) developed at Telcordia Technologies. We demonstrate the approach by predicting the performance and reliability of an application called Symbolic Hierarchical Automated Reliability Predictor (SHARPE), which has been widely used to solve stochastic models of reliability, performance and performability.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号