首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Inconsistent identifiers make it difficult for developers to understand source code. In particular, large software systems written by several developers can be vulnerable to identifier inconsistency. Unfortunately, it is not easy to detect inconsistent identifiers that are already used in source code. Although several techniques have been proposed to address this issue, many of these techniques can result in false alarms since such techniques do not accept domain words and idiom identifiers that are widely used in programming practice. This paper proposes an approach to detecting inconsistent identifiers based on a custom code dictionary. It first automatically builds a Code Dictionary from the existing API documents of popular Java projects by using an Natural Language Processing (NLP) parser. This dictionary records domain words with dominant part-of-speech (POS) and idiom identifiers. This set of domain words and idioms can improve the accuracy when detecting inconsistencies by reducing false alarms. The approach then takes a target program and detects inconsistent identifiers of the program by leveraging the Code Dictionary. We provide CodeAmigo, a GUI-based tool support for our approach. We evaluated our approach on seven Java based open-/proprietary- source projects. The results of the evaluations show that the approach can detect inconsistent identifiers with 85.4 % precision and 83.59 % recall values. In addition, we conducted an interview with developers who used our approach, and the interview confirmed that inconsistent identifiers frequently and inevitably occur in most software projects. The interviewees then stated that our approach can help to better detect inconsistent identifiers that would have been missed through manual detection.  相似文献   

2.
牛长安  葛季栋  唐泽  李传艺  周宇  骆斌 《软件学报》2021,32(7):2142-2165
代码注释在软件质量保障中发挥着重要的作用,它可以提升代码的可读性,使代码更易理解、重用和维护.但是出于各种各样的原因,有时开发者并没有添加必要的注释,使得在软件维护的过程中,往往需要花费大量的时间来理解代码,大大降低了软件维护的效率.近年来,多项工作利用机器学习技术自动生成代码注释,这些方法从代码中提取出语义和结构化信...  相似文献   

3.
Source code comments are a valuable instrument to preserve design decisions and to communicate the intent of the code to programmers and maintainers. Nevertheless, commenting source code and keeping comments up-to-date is often neglected for reasons of time or programmers obliviousness. In this paper, we investigate the question whether developers comment their code and to what extent they add comments or adapt them when they evolve the code. We present an approach to associate comments with source code entities to track their co-evolution over multiple versions. A set of heuristics are used to decide whether a comment is associated with its preceding or its succeeding source code entity. We analyzed the co-evolution of code and comments in eight different open source and closed source software systems. We found with statistical significance that (1) the relative amount of comments and source code grows at about the same rate; (2) the type of a source code entity, such as a method declaration or an if-statement, has a significant influence on whether or not it gets commented; (3) in six out of the eight systems, code and comments co-evolve in 90% of the cases; and (4) surprisingly, API changes and comments do not co-evolve but they are re-documented in a later revision. As a result, our approach enables a quantitative assessment of the commenting process in a software system. We can, therefore, leverage the results to provide feedback during development to increase the awareness of when to add comments or when to adapt comments because of source code changes.  相似文献   

4.
Etzkorn  L.H. Davis  C.G. 《Computer》1997,30(10):66-71
Much object oriented code has been written without reuse in mind, making identification of useful components difficult. The Patricia (Program Analysis Tool for Reuse) system automatically identifies these components through understanding comments and identifiers. To understand a program, Patricia uses a unique heuristic approach, deriving information from the linguistic aspects of comments and identifiers and from other nonlinguistic aspects of OO code, such as a class hierarchy. In developing the Patricia system, we had to overcome the problems of syntactically parsing natural language comments and syntactically analyzing identifiers-all prior to a semantic understanding of the comments and identifiers. Another challenge was the semantic understanding phase, when the organization of the knowledge base and an inferencing scheme were developed  相似文献   

5.
马赛  董东 《计算机科学》2017,44(Z6):495-498
Large Class(上帝类)是面向对象设计中的一种设计瑕疵。为了弥补传统Large Class检测中使用面向代码结构度量的不足,提出基于潜在语义分析的平均概念相似性度量。根据源代码中提取的标识符和注释形成词-文档矩阵,在潜在语义空间下计算方法间的相似度,进而得到类的平均概念相似性;并将概念性度量与代码圈复杂度结合以对Large Class进行识别。在开源的Code Smell检测数据集Landfill上进行实验,结果表明,与传统上使用结构信息对Large Class进行检测相比,使用该方法时检测的准确率和召回率均得到了一定提升。  相似文献   

6.
黄袁  贾楠  周强  陈湘萍  熊英飞  罗笑南 《软件学报》2018,29(8):2226-2242
代码注释是辅助编程人员理解源代码的有效手段之一.高质量的注释决策不仅能覆盖软件系统中的核心代码片段,还能避免产生多余的代码注释.然而,在实际开发中,编程人员往往缺乏统一的注释规范,大部分的注释决策都取决于个人经验以及领域知识.对于新手程序员来说,注释决策显然成为了一项重要而艰巨的任务.为了减少编程人员投入过多的精力在注释决策上,文章从大量的代码注释实例中学习出一种通用的注释决策规范,并提出了一种新颖的CommentAdviser方法用以辅助编程人员在代码开发过程中做出恰当的注释决策.由于注释决策与代码本身的上下文信息密切相关,因此,从当前代码行的上下文代码中提取代码结构特征以及代码语义特征作为支持注释决策的主要依据.然后,利用机器学习算法判定当前代码行是否为可能的注释点.在GitHub中的10个大型开源软件的数据集上评估了我们提出的方法,实验结果以及用户调研表明代码注释决策支持方法CommentAdviser的可行性和有效性.  相似文献   

7.
王潮  徐卫伟  周明辉 《软件学报》2024,35(2):513-531
代码注释作为辅助软件开发群体协作的关键机制,被开发者所广泛使用以提升开发效率.然而,由于代码注释并不直接影响软件运行,使其常被开发者忽视,导致出现代码注释质量问题,进而影响开发效率.代码注释中存在的质量问题会影响开发者理解相关代码,甚至可能产生误解从而引入代码缺陷,因此这一问题受到研究者的广泛关注.采用系统文献调研,对近年来国内外学者在代码注释质量问题上的研究工作进行系统的分析.从代码注释质量的评价维度、度量指标以及提升策略这3个方面总结研究现状,并提出当前研究所存在的不足、挑战及建议.  相似文献   

8.
The maintainability of source code is a key quality characteristic for software quality. Many approaches have been proposed to quantitatively measure code maintainability. Such approaches rely heavily on code metrics, e.g., the number of Lines of Code and McCabe’s Cyclomatic Complexity. The employed code metrics are essentially statistics regarding code elements, e.g., the numbers of tokens, lines, references, and branch statements. However, natural language in source code, especially identifiers, is rarely exploited by such approaches. As a result, replacing meaningful identifiers with nonsense tokens would not significantly influence their outputs, although the replacement should have significantly reduced code maintainability. To this end, in this paper, we propose a novel approach (called DeepM) to measure code maintainability by exploiting the lexical semantics of text in source code. DeepM leverages deep learning techniques (e.g., LSTM and attention mechanism) to exploit these lexical semantics in measuring code maintainability. Another key rationale of DeepM is that measuring code maintainability is complex and often far beyond the capabilities of statistics or simple heuristics. Consequently, DeepM leverages deep learning techniques to automatically select useful features from complex and lengthy inputs and to construct a complex mapping (rather than simple heuristics) from the input to the output (code maintainability index). DeepM is evaluated on a manually-assessed dataset. The evaluation results suggest that DeepM is accurate, and it generates the same rankings of code maintainability as those of experienced programmers on 87.5% of manually ranked pairs of Java classes.  相似文献   

9.
Test suites are a valuable source of up-to-date documentation as developers continuously modify them to reflect changes in the production code and preserve an effective regression suite. While maintaining traceability links between unit test and the classes under test can be useful to selectively retest code after a change, the value of having traceability links goes far beyond this potential savings. One key use is to help developers better comprehend the dependencies between tests and classes and help maintain consistency during refactoring. Despite its importance, test-to-code traceability is not common in software development and, when needed, traceability information has to be recovered during software development and evolution. We propose an advanced approach, named SCOTCH+ (Source code and COncept based Test to Code traceability Hunter), to support the developer during the identification of links between unit tests and tested classes. Given a test class, represented by a JUnit class, the approach first exploits dynamic slicing to identify a set of candidate tested classes. Then, external and internal textual information associated with the classes retrieved by slicing is analyzed to refine this set of classes and identify the final set of candidate tested classes. The external information is derived from the analysis of the class name, while internal information is derived from identifiers and comments. The approach is evaluated on five software systems. The results indicate that the accuracy of the proposed approach far exceeds the leading techniques found in the literature.  相似文献   

10.
王亚芳  刘东升  侯敏 《计算机应用》2019,39(7):2074-2080
目前在代码克隆检测领域,学者们主要从文本、词汇、语法和语义四种角度展开研究,然而长期以来代码克隆检测效果并未取得新的突破。针对这一问题,从图像处理角度提出了一种基于图像相似度的新型代码克隆检测(CCIS)方法。首先对源代码进行移除注释、空白符等操作,以获取"干净"的函数片段,并将函数中的标识符、关键字等进行高亮处理;然后将处理好的源代码转换为图像,并对图像进行规范化处理;最后使用Jaccard距离和感知哈希算法进行检测,得到代码克隆信息。为了验证实验的有效性,使用6款开源软件构建评价数据集进行测试。实验结果表明,CCIS方法能够检测出100%的类型一代码克隆、88%的类型二代码克隆与60%的类型三代码克隆,因此CCIS方法可以很好地进行代码克隆检测。  相似文献   

11.
Recovering traceability links between code and documentation   总被引:2,自引:0,他引:2  
Software system documentation is almost always expressed informally in natural language and free text. Examples include requirement specifications, design documents, manual pages, system development journals, error logs, and related maintenance reports. We propose a method based on information retrieval to recover traceability links between source code and free text documents. A premise of our work is that programmers use meaningful names for program items, such as functions, variables, types, classes, and methods. We believe that the application-domain knowledge that programmers process when writing the code is often captured by the mnemonics for identifiers; therefore, the analysis of these mnemonics can help to associate high-level concepts with program concepts and vice-versa. We apply both a probabilistic and a vector space information retrieval model in two case studies to trace C++ source code onto manual pages and Java code to functional requirements. We compare the results of applying the two models, discuss the benefits and limitations, and describe directions for improvements.  相似文献   

12.
Technical debt is a metaphor for seeking short-term gains at expense of long-term code quality. Previous studies have shown that self-admitted technical debt, which is introduced intentionally, has strong negative impacts on software development and incurs high maintenance overheads. To help developers identify self-admitted technical debt, researchers have proposed many state-of-the-art methods. However, there is still room for improvement about the effectiveness of the current methods, as self-admitted technical debt comments have the characteristics of length variability, low proportion and style diversity. Therefore, in this paper, we propose a novel approach based on the bidirectional long short-term memory (BiLSTM) networks with the attention mechanism to automatically detect self-admitted technical debt by leveraging source code comments. In BiLSTM, we utilize a balanced cross entropy loss function to overcome the class unbalance problem. We experimentally investigate the performance of our approach on a public dataset including 62, 566 code comments from ten open source projects. Experimental results show that our approach achieves 81.75% in terms of precision, 72.24% in terms of recall and 75.86% in terms of F1-score on average and outperforms the state-of-the-art text mining-based method by 8.14%, 5.49% and 6.64%, respectively.  相似文献   

13.
随着IT社区和代码托管平台的发展,针对代码的用户评论数量急剧增加。用户在使用代码后给出的评论中包含丰富的静态和动态代码质量信息,对其进行提取与分析将有助于开发者了解用户关注的代码质量信息,以有针对性地提升代码质量,还有助于用户选择满足要求的代码。为此,文中提出了包含静态特性和动态特性的代码质量模型,以及识别并分析用户评论中代码质量信息的方法。首先,根据评价对象和评价句型规则识别出具有代码质量的用户评论;然后,应用评价对象和评价观点抽取代码质量属性表现;最后,通过分析代码质量属性表现和情感倾向给出代码静态和动态质量的相关结果。实验结果表明,所提方法能够有效地分析用户评论中的代码质量信息。  相似文献   

14.
15.
A large amount of open source code is now available online, presenting a great potential resource for software developers. This has motivated software engineering researchers to develop tools and techniques to allow developers to reap the benefits of these billions of lines of source code. However, collecting and analyzing such a large quantity of source code presents a number of challenges. Although the current generation of open source code search engines provides access to the source code in an aggregated repository, they generally fail to take advantage of the rich structural information contained in the code they index. This makes them significantly less useful than Sourcerer for building state-of-the-art software engineering tools, as these tools often require access to both the structural and textual information available in source code.We have developed Sourcerer, an infrastructure for large-scale collection and analysis of open source code. By taking full advantage of the structural information extracted from source code in its repository, Sourcerer provides a foundation upon which state-of-the-art search engines and related tools can easily be built. We describe the Sourcerer infrastructure, present the applications that we have built on top of it, and discuss how existing tools could benefit from using Sourcerer.  相似文献   

16.
在如今的软件开发中, 开源软件的使用越来越普遍, 但是对大型开源软件的理解和维护仍然是一项复杂的工作. 开源软件通常缺乏完善的文档和注释, 想要完整的理解开源系统难度较大, 研究界产生了一种通过分析大型开源软件的源代码, 进而深入理解系统, 发现和修复系统漏洞的软件分析型任务. 源代码分析注释是软件分析型任务的一项重要产出, 它是一种以注释形式存在的细粒度代码分析报告, 数量庞大, 难以快速做出质量评价. 在传统的软件质量评价中, 对注释的评价通常局限于覆盖度和文本长度, 不能满足源代码分析注释质量评价的要求. 为了更好的评价源代码分析注释的质量, 本文结合现有的对代码注释质量评价的研究以及信息质量领域的评价方法, 提出了一种综合考虑客观质量属性和主观质量属性的质量评价框架. 结合实际的项目数据分析, 本文的方法可以更有效的检测出注释中的冗余以及无关内容, 发现相关质量问题, 从而对源代码分析注释进行更全面的质量评价.  相似文献   

17.
When coding to an application programming interface (API), developers often encounter difficulties, unsure of which class to subclass, which objects to instantiate, and which methods to call. Example source code that demonstrates the use of the API can help developers make progress on their task. This paper describes an approach to provide such examples in which the structure of the source code that the developer is writing is matched heuristically to a repository of source code that uses the API. The structural context needed to query the repository is extracted automatically from the code, freeing the developer from learning a query language or from writing their code in a particular style. The repository is generated automatically from existing applications, avoiding the need for handcrafted examples. We demonstrate that the approach is effective, efficient, and more reliable than traditional alternatives through four empirical studies  相似文献   

18.
软件复用是在软件开发中避免重复劳动的解决方案.在复用一个已有的软件项目时,软件开发人员通常需要理解某些代码元素以及其间的关联关系,称之为代码结构.软件开发人员一般通过浏览软件源代码的方式理解代码结构.由于源代码往往规模较大且结构复杂,理解代码结构通常会耗费大量的时间与精力.因此,将软件开发人员想要理解的代码结构自动、清晰地展示出来是很有帮助的.提出一种基于图数据库的代码结构解析与搜索方法以实现这一目的.这一方法可对软件的代码结构进行解析,并在图数据库中对其进行有效的组织和管理.搜索时,软件开发人员输入自然语言查询语句,该方法中的搜索机制会分析查询语句,并从图数据库中截取出与其相对应的代码结构进行展示.该方法具有高度的可扩展性:不同粒度的结点与多样化的关联关系可以容易地存储进图数据库中,且面向不同搜索目的的代码结构搜索算法亦可以容易地集成进搜索机制中.这一方法已在相应的工具中得到了实现,其有效性在一个商业案例研究中得到了验证.  相似文献   

19.
徐海燕  姜瑛 《软件学报》2021,32(7):2183-2203
随着开发者社区和代码托管平台成为程序员获取代码的主要途径,针对代码的用户评论数量急剧增加.用户在使用代码后给出的评论中包含多种静态和动态的代码质量属性信息,但是由于用户评论多为复杂句,使得评论中包含的代码质量属性难以判断.针对复杂用户评论的代码质量属性判断将有助于分析用户评论中的代码质量信息,有助于开发者在了解用户的代...  相似文献   

20.
Predicting source code changes by mining change history   总被引:1,自引:0,他引:1  
Software developers are often faced with modification tasks that involve source which is spread across a code base. Some dependencies between source code, such as those between source code written in different languages, are difficult to determine using existing static and dynamic analyses. To augment existing analyses and to help developers identify relevant source code during a modification task, we have developed an approach that applies data mining techniques to determine change patterns - sets of files that were changed together frequently in the past - from the change history of the code base. Our hypothesis is that the change patterns can be used to recommend potentially relevant source code to a developer performing a modification task. We show that this approach can reveal valuable dependencies by applying the approach to the Eclipse and Mozilla open source projects and by evaluating the predictability and interestingness of the recommendations produced for actual modification tasks on these systems.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号