首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 718 毫秒
1.
Much of software developers' time is spent understanding unfamiliar code. To better understand how developers gain this understanding and how software development environments might be involved, a study was performed in which developers were given an unfamiliar program and asked to work on two debugging tasks and three enhancement tasks for 70 minutes. The study found that developers interleaved three activities. They began by searching for relevant code both manually and using search tools; however, they based their searches on limited and misrepresentative cues in the code, environment, and executing program, often leading to failed searches. When developers found relevant code, they followed its incoming and outgoing dependencies, often returning to it and navigating its other dependencies; while doing so, however, Eclipse's navigational tools caused significant overhead. Developers collected code and other information that they believed would be necessary to edit, duplicate, or otherwise refer to later by encoding it in the interactive state of Eclipse's package explorer, file tabs, and scroll bars. However, developers lost track of relevant code as these interfaces were used for other tasks, and developers were forced to find it again. These issues caused developers to spend, on average, 35 percent of their time performing the mechanics of navigation within and between source files. These observations suggest a new model of program understanding grounded in theories of information foraging and suggest ideas for tools that help developers seek, relate, and collect information in a more effective and explicit manner  相似文献   

2.
Source code examples are used by developers to implement unfamiliar tasks by learning from existing solutions. To better support developers in finding existing solutions, code search engines are designed to locate and rank code examples relevant to user’s queries. Essentially, a code search engine provides a ranking schema, which combines a set of ranking features to calculate the relevance between a query and candidate code examples. Consequently, the ranking schema places relevant code examples at the top of the result list. However, it is difficult to determine the configurations of the ranking schemas subjectively. In this paper, we propose a code example search approach that applies a machine learning technique to automatically train a ranking schema. We use the trained ranking schema to rank candidate code examples for new queries at run-time. We evaluate the ranking performance of our approach using a corpus of over 360,000 code snippets crawled from 586 open-source Android projects. The performance evaluation study shows that the learning-to-rank approach can effectively rank code examples, and outperform the existing ranking schemas by about 35.65 % and 48.42 % in terms of normalized discounted cumulative gain (NDCG) and expected reciprocal rank (ERR) measures respectively.  相似文献   

3.
陶传奇  包盼盼  黄志球  周宇  张智轶 《软件学报》2021,32(11):3351-3371
在软件开发的编程现场,有大量与当前开发任务相关的信息,比如代码上下文信息、用户开发意图等.如果能够根据已有的编程现场上下文给开发人员推荐当前代码行,不仅能够帮助开发人员更好地完成开发任务,还能提高软件开发的效率.而已有的一些方法通常是进行代码修复或者补全,又或者只是基于关键词匹配的搜索方法,很难达到推荐完整代码行的要求.针对上述问题,一种可行的解决方案是基于已有的海量源码数据,利用深度学习析取代码行的相关上下文因子,挖掘隐含的上下文信息,为精准推荐提供基础.因此,提出了一种基于深度学习的编程现场上下文深度感知的代码行推荐方法,能够在已有的大规模代码数据集中学习上下文之间潜在的关联关系,利用编程现场已有的源码数据和任务数据得到当前可能的代码行,并推荐Top-N给编程人员.代码行深度感知使用RNN Encoder-Decoder,该框架能够将编程现场已有的若干行上文代码行进行编码,得到一个包含已有代码行上下文信息的向量,然后根据该向量进行解码,得到预测的Top-N代码行输出.利用在开源平台上收集的大规模代码行数据集,对方法进行实验并测试,结果显示,该方法能够根据已有的上下文推荐相关的代码行给开发人员,Top-10的推荐准确率有60%左右,并且MRR值在0.3左右,表示用户满意的推荐项排在N个推荐结果中比较靠前的位置.  相似文献   

4.
Prior to performing a software change task, developers must discover and understand the subset of the system relevant to the task. Since the behavior exhibited by individual developers when investigating a software system is influenced by intuition, experience, and skill, there is often significant variability in developer effectiveness. To understand the factors that contribute to effective program investigation behavior, we conducted a study of five developers performing a change task on a medium-size open source system. We isolated the factors related to effective program investigation behavior by performing a detailed qualitative analysis of the program investigation behavior of successful and unsuccessful developers. We report on these factors as a set of detailed observations, such as evidence of the phenomenon of inattention blindness by developers skimming source code. In general, our results support the intuitive notion that a methodical and structured approach to program investigation is the most effective.  相似文献   

5.
Reverse engineering, also called reengineering, is used to modify systems that have functioned for many years, but which can no longer accomplish their intended tasks and, therefore, need to be updated. Reverse engineering can support the modification and extension of the knowledge in an already existing system. However, this can be an intricate task for a large, complex and poorly documented knowledge-based system. The rules in the knowledge base must be gathered, analyzed and understood, but also checked for verification and validation. We introduce an approach that uses reverse engineering for the knowledge in knowledge-based systems. The knowledge is encapsulated in rules, facts and conclusions, and in the relationships between them. Reverse engineering also collects functionality and source code. The outcome of reverse engineering is a model of the knowledge base, the functionality and the source code connected to the rules. These models are presented in diagrams using a graphic representation similar to Unified Modeling Language and employing ontology. Ontology is applied on top of rules, facts and relationships. From the diagrams, test cases are generated during the reverse engineering process and adopted to verify and validate the system.  相似文献   

6.
When coding to an application programming interface (API), developers often encounter difficulties, unsure of which class to subclass, which objects to instantiate, and which methods to call. Example source code that demonstrates the use of the API can help developers make progress on their task. This paper describes an approach to provide such examples in which the structure of the source code that the developer is writing is matched heuristically to a repository of source code that uses the API. The structural context needed to query the repository is extracted automatically from the code, freeing the developer from learning a query language or from writing their code in a particular style. The repository is generated automatically from existing applications, avoiding the need for handcrafted examples. We demonstrate that the approach is effective, efficient, and more reliable than traditional alternatives through four empirical studies  相似文献   

7.
As developers modify software entities such as functions or variables to introduce new features, enhance old ones, or fix bugs, they must ensure that other entities in the software system are updated to be consistent with these new changes. Many hard to find bugs are introduced by developers who did not notice dependencies between entities, and failed to propagate changes correctly. Most modern development environments offer tools to assist developers in propagating changes. For example, dependency browsers show static code dependencies between source code entities. Other sources of information such as historical co-change or code layout information could be used by tools to support developers in propagating changes. We present the Development Replay (DR) approach which empirically assess and compares the effectiveness of several not-yet-existing change propagation tools by reenacting the changes stored in source control repositories using these tools. We present a case study of five large open source systems with a total of over 40 years of development history. Our empirical results show that historical co-change information recovered from source control repositories along with code layout information can guide developers in propagating changes better than simple static dependency information.
Richard C. HoltEmail:
  相似文献   

8.
Program comprehension is an important skill for programmers – extending and debugging existing source code is part of the daily routine. Syntax highlighting is one of the most common tools used to support developers in understanding algorithms. However, most research in this area originates from a time when programmers used a completely different tool chain. We examined the influence of syntax highlighting on novices’ ability to comprehend source code. Additional analyses cover the influence of task type and programming experience on the code comprehension ability itself and its relation to syntax highlighting. We conducted a controlled experiment with 390 undergraduate students in an introductory Java programming course. We measured the correctness with which they solved small coding tasks. Each test subject received some tasks with syntax highlighting and some without. The data provided no evidence that syntax highlighting improves novices’ ability to comprehend source code. There are very few similar experiments and it is unclear as of yet which factors impact the effectiveness of syntax highlighting. One major limitation may be the types of tasks chosen for this experiment. The results suggest that syntax highlighting squanders a feedback channel from the IDE to the programmer that can be used more effectively.  相似文献   

9.
Searching application programming interfaces (APIs) is very important for developers to reuse software projects. Existing natural language based API search mainly faces the following challenges. 1) More accurate results are required as software projects evolve to be more heterogeneous and complex. 2) The semantic relationships between APIs (e.g., inheritances between classes, and invocations between methods) need to be illustrated so that developers can better understand their usage scenarios. To deal with these issues, we propose GeAPI, a novel graph embedding based approach for API graph search and recommendation in this paper. First, we build a software project's API graph automatically from its source code and represent each API using graph embedding methods. Second, we search the API graph with a question in natural language, and return the corresponding subgraph that is composed of relevant code elements and their associated relationships, as the best answer of the question. In experiments, we select three well-known open source projects, JodaTime, Apache Lucene and POI, as examples to perform API search tasks. The experimental results show that our approach GeAPI improves F1-score by 10% compared with the existing shortest path based API search approach, while reduces the average response time about 60 times.  相似文献   

10.
Interaction traces (ITs) are developers’ logs collected while developers maintain or evolve software systems. Researchers use ITs to study developers’ editing styles and recommend relevant program entities when developers perform changes on source code. However, when using ITs, they make assumptions that may not necessarily be true. This article assesses the extent to which researchers’ assumptions are true and examines noise in ITs. It also investigates the impact of noise on previous studies. This article describes a quasi-experiment collecting both Mylyn ITs and video-screen captures while 15 participants performed four realistic software maintenance tasks. It assesses the noise in ITs by comparing Mylyn ITs and the ITs obtained from the video captures. It proposes an approach to correct noise and uses this approach to revisit previous studies. The collected data show that Mylyn ITs can miss, on average, about 6% of the time spent by participants performing tasks and can contain, on average, about 85% of false edit events, which are not real changes to the source code. The approach to correct noise reveals about 45% of misclassification of ITs. It can improve the precision and recall of recommendation systems from the literature by up to 56% and 62%, respectively. Mylyn ITs include noise that biases subsequent studies and, thus, can prevent researchers from assisting developers effectively. They must be cleaned before use in studies and recommendation systems. The results on Mylyn ITs open new perspectives for the investigation of noise in ITs generated by other monitoring tools such as DFlow, FeedBag, and Mimec, and for future studies based on ITs.  相似文献   

11.
12.
The evolution of hardware is improving at an incredible rate. However, the advances in parallel software have been hampered for many reasons. Developing an efficient parallel application is still not an easy task. Applications rarely achieve good performances immediately and, therefore, careful performance analysis and optimization are crucial. These tasks are difficult and require a thorough understanding of the programs behavior. In this paper, we propose a systematic approach to online root-cause performance analysis. The automated analysis uses an online model to quickly identify the most important performance problems, and correlates them with application source code. Our technique is able to discover causal dependencies among the problems, infer their root causes and explain them to developers. In all of the scenarios we performed, this online modelling and analysis approach allowed us to understand the behavior of the applications, evaluate the performance and locate problem causes without specific knowledge of application internals.  相似文献   

13.
Developers need to comprehend a program before modifying it. In such cases, developers use cues to establish the relevance of a piece of information with a task. Being familiar with different kinds of cues will help the developers to comprehend a program faster. But unlike the experienced developers, novice developers fail to recognize the relevant cues, because (a) there are many cues and (b) they might be unfamiliar with the artifacts. However, not much is known about the developers’ choice of cue. Hence, we conducted two independent studies to understand the kind of cues used by the developers and how a tool influences their cue selection. First, from our user study on two common comprehension tasks, we found that developers actively seek the cues and their cue source choices are similar but task dependent. In our second exploratory study, we investigated whether an integrated development environment (IDE) influences a developer’s cue choices. By observing their interaction history while fixing bugs in Eclipse IDE, we found that the IDE’s influence on their cue choices was not statistically significant. Finally, as a case in point, we propose a novel task-specific program summarization approach to aid novice developers in comprehending an unfamiliar program. Our approach used developers cue choices to synthesize the summaries. A comparison of the synthesized summaries with the summaries recorded by the developers shows both had similar content. This promising result encourages us to explore task-specific cue models, which can aid novice developers to accomplish complex comprehension tasks faster.  相似文献   

14.
Making changes to software systems can prove costly and it remains a challenge to understand the factors that affect the costs of software evolution. This study sought to identify such factors by investigating the effort expended by developers to perform 336 change tasks in two different software organizations. We quantitatively analyzed data from version control systems and change trackers to identify factors that correlated with change effort. In-depth interviews with the developers about a subset of the change tasks further refined the analysis. Two central quantitative results found that dispersion of changed code and volatility of the requirements for the change task correlated with change effort. The analysis of the qualitative interviews pointed to two important, underlying cost drivers: Difficulties in comprehending dispersed code and difficulties in anticipating side effects of changes. This study demonstrates a novel method for combining qualitative and quantitative analysis to assess cost drivers of software evolution. Given our findings, we propose improvements to practices and development tools to manage and reduce the costs.  相似文献   

15.
在生产实际中,一个新的任务通常和已有任务存在一定的联系。迁移学习方法可以将已有数据集中的有用信息,迁移到新的任务,以减少重新建模过程中大量的时间和费用消耗。然而,由于任务之间的分布差异,在异构环境下如何避免负面迁移问题,仍未得到有效的解决。除了要衡量数据间的相似性,还需要衡量实例间的相关性,而大多数传统方法仅在一个层面进行操作。提出了基于压缩编码的迁移学习方法(TLCC),建立了两个层面的算法模型,具体来说,在数据层面,数据间的相似性可以表示为超平面分类器的编码长度,而在实例层面,通过进一步挑选出有价值的实例进行迁移,提升算法性能,避免负面迁移的发生。实验结果表明,提出的算法相比其他算法具有明显的优势,在噪声环境下也有较高的准确度。  相似文献   

16.
Topic models are generative probabilistic models which have been applied to information retrieval to automatically organize and provide structure to a text corpus. Topic models discover topics in the corpus, which represent real world concepts by frequently co-occurring words. Recently, researchers found topics to be effective tools for structuring various software artifacts, such as source code, requirements documents, and bug reports. This research also hypothesized that using topics to describe the evolution of software repositories could be useful for maintenance and understanding tasks. However, research has yet to determine whether these automatically discovered topic evolutions describe the evolution of source code in a way that is relevant or meaningful to project stakeholders, and thus it is not clear whether topic models are a suitable tool for this task.In this paper, we take a first step towards evaluating topic models in the analysis of software evolution by performing a detailed manual analysis on the source code histories of two well-known and well-documented systems, JHotDraw and jEdit. We define and compute various metrics on the discovered topic evolutions and manually investigate how and why the metrics evolve over time. We find that the large majority (87%–89%) of topic evolutions correspond well with actual code change activities by developers. We are thus encouraged to use topic models as tools for studying the evolution of a software system.  相似文献   

17.
杨程  范强  王涛  尹刚  王怀民 《软件学报》2017,28(6):1357-1372
随着软件协同开发技术与社交网络的深度融合,社交化开发范式已成为当前软件创作与生产的重要方式。这一软件开发模型的灵活性与开放性,吸引了大规模的外围贡献者加入到开源社区中,形成了巨大的软件生产力。在开源社区中,这些分布广泛、规模巨大的外围贡献者主要以一种无组织的松散方式进行协同。他们需要花费大量的时间和精力,在海量的开源项目中寻找到自己真正感兴趣的项目并进行长期贡献。为了提高大规模群体协同的效率,本文提出一种基于多维特征的开源项目个性化推荐方法(即RepoLike)。该方法从开源项目自身流行度、关联项目技术相关度以及大众贡献者之间的社交关联度等三个维度度量开发者和开源项目之间的关联关系,并利用线性组合和Learning To Rank方法构建推荐模型,从而为开发者提供个性化的项目推荐服务。通过大规模的实证实验表明,RepoLike在推荐20个候选项目时的推荐命中率超过25%,能够有效地为开发人员提供有价值的推荐服务。  相似文献   

18.
Application developers utilizing event-based middleware have sought to leverage domain-specific modeling for the advantages of intuitive specification, code synthesis, and support for design evolution. For legacy and cyber-physical systems, the use of event-based middleware may mean that changes in computational platform can result anomalous system behavior, due to the presence of implicit temporal dependencies. These anomalies are a function not of the component implementation, but of the model of computation employed for supporting system composition. In order to address these behavioral anomalies, the paper presents an approach where time-based blocks are inserted into the system to account for the temporal dependencies. An advantage of capturing the system composition in a domain-specific modeling language is the ability to efficiently refactor an application to include time-triggered, event-based schedulers. This paper describes how an existing event-based component topology can be modified to permit a time-triggered model of computation, with no changes to the existing component software. Further, the time-triggered components can be deployed alongside standard publish/subscribe methodologies. This strategy is beneficial to the maintenance of existing legacy systems upon upgrade, since the current operational mode could be maintained with minimal changes to the legacy software even under changes to the target platform which alter execution speed. These time-triggered layers are discussed in three permutations: fully triggered, start triggered, and release triggered. A discussion is provided regarding the limitations of each approach, and a brief example is given. The example shows how to apply these triggering approaches without the modification of existing components, but instead through the insertion of triggered buffers between legacy components.  相似文献   

19.
Test suites are a valuable source of up-to-date documentation as developers continuously modify them to reflect changes in the production code and preserve an effective regression suite. While maintaining traceability links between unit test and the classes under test can be useful to selectively retest code after a change, the value of having traceability links goes far beyond this potential savings. One key use is to help developers better comprehend the dependencies between tests and classes and help maintain consistency during refactoring. Despite its importance, test-to-code traceability is not common in software development and, when needed, traceability information has to be recovered during software development and evolution. We propose an advanced approach, named SCOTCH+ (Source code and COncept based Test to Code traceability Hunter), to support the developer during the identification of links between unit tests and tested classes. Given a test class, represented by a JUnit class, the approach first exploits dynamic slicing to identify a set of candidate tested classes. Then, external and internal textual information associated with the classes retrieved by slicing is analyzed to refine this set of classes and identify the final set of candidate tested classes. The external information is derived from the analysis of the class name, while internal information is derived from identifiers and comments. The approach is evaluated on five software systems. The results indicate that the accuracy of the proposed approach far exceeds the leading techniques found in the literature.  相似文献   

20.
Our current understanding of how programmers perform feature location during software maintenance is based on controlled studies or interviews, which are inherently limited in size, scope and realism. Replicating controlled studies in the field can both explore the findings of these studies in wider contexts and study new factors that have not been previously encountered in the laboratory setting. In this paper, we report on a field study about how software developers perform feature location within source code during their daily development activities. Our study is based on two complementary field data sets: one that reflects complete IDE activity of 67 professional developers over approximately one month, and the other that reflects usage of an IR-based code search tool by nearly 600 developers. Analyzing this data, we report results on how often developers use which type of code search tools, on the types of queries and retreival strategies used by developers, and on patterns of developer feature location behavior following code search. The results of the study suggest that there is (1) a need for helping developers to devise better code search queries; (2) a lack of adoption of niche code search tools; (3) a need for code search tool to handle both lookup and exploratory queries; and (4) a need for better integration between code search, structured navigation, and debugging tools in feature location tasks.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号