首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
A large amount of open source code is now available online, presenting a great potential resource for software developers. This has motivated software engineering researchers to develop tools and techniques to allow developers to reap the benefits of these billions of lines of source code. However, collecting and analyzing such a large quantity of source code presents a number of challenges. Although the current generation of open source code search engines provides access to the source code in an aggregated repository, they generally fail to take advantage of the rich structural information contained in the code they index. This makes them significantly less useful than Sourcerer for building state-of-the-art software engineering tools, as these tools often require access to both the structural and textual information available in source code.We have developed Sourcerer, an infrastructure for large-scale collection and analysis of open source code. By taking full advantage of the structural information extracted from source code in its repository, Sourcerer provides a foundation upon which state-of-the-art search engines and related tools can easily be built. We describe the Sourcerer infrastructure, present the applications that we have built on top of it, and discuss how existing tools could benefit from using Sourcerer.  相似文献   

2.
Developers commonly make use of a web search engine such as Google to locate online resources to improve their productivity. A better understanding of what developers search for could help us understand their behaviors and the problems that they meet during the software development process. Unfortunately, we have a limited understanding of what developers frequently search for and of the search tasks that they often find challenging. To address this gap, we collected search queries from 60 developers, surveyed 235 software engineers from more than 21 countries across five continents. In particular, we asked our survey participants to rate the frequency and difficulty of 34 search tasks which are grouped along the following seven dimensions: general search, debugging and bug fixing, programming, third party code reuse, tools, database, and testing. We find that searching for explanations for unknown terminologies, explanations for exceptions/error messages (e.g., HTTP 404), reusable code snippets, solutions to common programming bugs, and suitable third-party libraries/services are the most frequent search tasks that developers perform, while searching for solutions to performance bugs, solutions to multi-threading bugs, public datasets to test newly developed algorithms or systems, reusable code snippets, best industrial practices, database optimization solutions, solutions to security bugs, and solutions to software configuration bugs are the most difficult search tasks that developers consider. Our study sheds light as to why practitioners often perform some of these tasks and why they find some of them to be challenging. We also discuss the implications of our findings to future research in several research areas, e.g., code search engines, domain-specific search engines, and automated generation and refinement of search queries.  相似文献   

3.
As developers modify software entities such as functions or variables to introduce new features, enhance old ones, or fix bugs, they must ensure that other entities in the software system are updated to be consistent with these new changes. Many hard to find bugs are introduced by developers who did not notice dependencies between entities, and failed to propagate changes correctly. Most modern development environments offer tools to assist developers in propagating changes. For example, dependency browsers show static code dependencies between source code entities. Other sources of information such as historical co-change or code layout information could be used by tools to support developers in propagating changes. We present the Development Replay (DR) approach which empirically assess and compares the effectiveness of several not-yet-existing change propagation tools by reenacting the changes stored in source control repositories using these tools. We present a case study of five large open source systems with a total of over 40 years of development history. Our empirical results show that historical co-change information recovered from source control repositories along with code layout information can guide developers in propagating changes better than simple static dependency information.
Richard C. HoltEmail:
  相似文献   

4.
Developers need to comprehend a program before modifying it. In such cases, developers use cues to establish the relevance of a piece of information with a task. Being familiar with different kinds of cues will help the developers to comprehend a program faster. But unlike the experienced developers, novice developers fail to recognize the relevant cues, because (a) there are many cues and (b) they might be unfamiliar with the artifacts. However, not much is known about the developers’ choice of cue. Hence, we conducted two independent studies to understand the kind of cues used by the developers and how a tool influences their cue selection. First, from our user study on two common comprehension tasks, we found that developers actively seek the cues and their cue source choices are similar but task dependent. In our second exploratory study, we investigated whether an integrated development environment (IDE) influences a developer’s cue choices. By observing their interaction history while fixing bugs in Eclipse IDE, we found that the IDE’s influence on their cue choices was not statistically significant. Finally, as a case in point, we propose a novel task-specific program summarization approach to aid novice developers in comprehending an unfamiliar program. Our approach used developers cue choices to synthesize the summaries. A comparison of the synthesized summaries with the summaries recorded by the developers shows both had similar content. This promising result encourages us to explore task-specific cue models, which can aid novice developers to accomplish complex comprehension tasks faster.  相似文献   

5.
Predicting source code changes by mining change history   总被引:1,自引:0,他引:1  
Software developers are often faced with modification tasks that involve source which is spread across a code base. Some dependencies between source code, such as those between source code written in different languages, are difficult to determine using existing static and dynamic analyses. To augment existing analyses and to help developers identify relevant source code during a modification task, we have developed an approach that applies data mining techniques to determine change patterns - sets of files that were changed together frequently in the past - from the change history of the code base. Our hypothesis is that the change patterns can be used to recommend potentially relevant source code to a developer performing a modification task. We show that this approach can reveal valuable dependencies by applying the approach to the Eclipse and Mozilla open source projects and by evaluating the predictability and interestingness of the recommendations produced for actual modification tasks on these systems.  相似文献   

6.
A software repository provides a central information source for understanding and reengineering code in a software project. Complex reverse engineering tools can be built by analyzing information stored in the repository without reparsing the original source code. The most critical design aspect of a repository is its data model, which directly affects how effectively the repository supports various analysis tasks. This paper focuses on the design rationales behind a data model for a C++ software repository that supports reachability analysis and dead code detection at the declaration level. These two tasks are frequently needed in large software projects to help remove excess software baggage, select regression tests and support software reuse studies. The language complexity introduced by class inheritance, friendship, and template instantiation in C++ requires a carefully designed model to catch all necessary dependencies for correct reachability analysis. We examine the major design decisions and their consequences in our model and illustrate how future software repositories can be evaluated for completeness at a selected abstraction level. Examples are given to illustrate how our model also supports variants of reachability analysis: impact analysis, class visibility analysis, and dead code detection. Finally, we discuss the implementation and experience of our analysis tools on a few C++ software projects  相似文献   

7.
ContextDevelopers often need to find answers to questions regarding the evolution of a system when working on its code base. While their information needs require data analysis pertaining to different repository types, the source code repository has a pivotal role for program comprehension tasks. However, the coarse-grained nature of the data stored by commit-based software configuration management systems often makes it challenging for a developer to search for an answer.ObjectiveWe present Replay, an Eclipse plug-in that allows developers to explore the change history of a system by capturing the changes at a finer granularity level than commits, and by replaying the past changes chronologically inside the integrated development environment, with the source code at hand.MethodWe conducted a controlled experiment to empirically assess whether Replay outperforms a baseline (SVN client in Eclipse) on helping developers to answer common questions related to software evolution.ResultsThe experiment shows that Replay leads to a decrease in completion time with respect to a set of software evolution comprehension tasks.ConclusionWe conclude that there are benefits in using Replay over the state of the practice tools for answering questions that require fine-grained change information and those related to recent changes.  相似文献   

8.
Our current understanding of how programmers perform feature location during software maintenance is based on controlled studies or interviews, which are inherently limited in size, scope and realism. Replicating controlled studies in the field can both explore the findings of these studies in wider contexts and study new factors that have not been previously encountered in the laboratory setting. In this paper, we report on a field study about how software developers perform feature location within source code during their daily development activities. Our study is based on two complementary field data sets: one that reflects complete IDE activity of 67 professional developers over approximately one month, and the other that reflects usage of an IR-based code search tool by nearly 600 developers. Analyzing this data, we report results on how often developers use which type of code search tools, on the types of queries and retreival strategies used by developers, and on patterns of developer feature location behavior following code search. The results of the study suggest that there is (1) a need for helping developers to devise better code search queries; (2) a lack of adoption of niche code search tools; (3) a need for code search tool to handle both lookup and exploratory queries; and (4) a need for better integration between code search, structured navigation, and debugging tools in feature location tasks.  相似文献   

9.
Software developers often need to understand a large body of unfamiliar code with little or no documentation, no experts to consult, and little time to do it. A post appeared in January 2008 on Slashdot, a technology news Web site, asking for tools and techniques that could help. This article analyzes 301 often passionate and sometimes articulate responses to this query, including the themes and the associated tool recommendations. The most common suggestions were to use a code navigation tool, use a design recovery tool, use a debugger to step through the code, create a runtime trace, use problem-based learning, ask people for help, study the code from top down, and print out all the code. This analysis presents an intriguing snapshot of how software developers in industry go about comprehending big code.  相似文献   

10.
由于嵌入式系统日趋复杂,程序的复杂程度也与日俱增,这导致程序异常跳转、数组越界、堆栈溢出等异常现象时有发生,但编译器以及专业的静态代码扫描工具无法发现此类问题。程序运行时一旦发生此类异常,往往会导致系统死机等严重故障,系统死机后,留给开发人员的有用信息一般很少,常常需要花很多时间及精力才能查出导致程序异常的具体原因,因此,开发一种高效的程序异常检测手段就显得十分必要。利用DSP软件编译器的插桩功能,可完整记录程序异常前的运行轨迹数据,通过分析这些轨迹数据可重建程序异常前的运行轨迹,利用这些信息,开发人员可高效地查出程序异常的具体原因。  相似文献   

11.
12.
Interaction traces (ITs) are developers’ logs collected while developers maintain or evolve software systems. Researchers use ITs to study developers’ editing styles and recommend relevant program entities when developers perform changes on source code. However, when using ITs, they make assumptions that may not necessarily be true. This article assesses the extent to which researchers’ assumptions are true and examines noise in ITs. It also investigates the impact of noise on previous studies. This article describes a quasi-experiment collecting both Mylyn ITs and video-screen captures while 15 participants performed four realistic software maintenance tasks. It assesses the noise in ITs by comparing Mylyn ITs and the ITs obtained from the video captures. It proposes an approach to correct noise and uses this approach to revisit previous studies. The collected data show that Mylyn ITs can miss, on average, about 6% of the time spent by participants performing tasks and can contain, on average, about 85% of false edit events, which are not real changes to the source code. The approach to correct noise reveals about 45% of misclassification of ITs. It can improve the precision and recall of recommendation systems from the literature by up to 56% and 62%, respectively. Mylyn ITs include noise that biases subsequent studies and, thus, can prevent researchers from assisting developers effectively. They must be cleaned before use in studies and recommendation systems. The results on Mylyn ITs open new perspectives for the investigation of noise in ITs generated by other monitoring tools such as DFlow, FeedBag, and Mimec, and for future studies based on ITs.  相似文献   

13.
Searching for relevant code in the local code base is a common activity during software maintenance. However, previous research indicates that 88% of manually composed search queries retrieve no relevant results. One reason that many searches fail is existing search tools’ dependence on string matching algorithms, which cannot find semantically related code. To solve this problem by helping developers compose better queries, researchers have proposed numerous query recommendation techniques, relying on a variety of dictionaries and algorithms. However, few of these techniques are empirically evaluated by usage data from real-world developers. To fill this gap, we designed a multi-recommendation system that relies on the cooperation between several query recommendation techniques. We implemented and deployed this recommendation system within the Sando code search tool and conducted a longitudinal field study. Our study shows that over 34% of all queries were adopted from recommendation; and recommended queries retrieved results 11% more often than manual queries.  相似文献   

14.
When analyzing legacy code, generating a high‐level model of an application during the reverse engineering process helps the developers understand how the application is structured and how the dependencies relate the different software entities. Within the context of procedural programming languages (such as C), the existing approaches to get a model of the code require documentation and/or implicit knowledge that stakeholders acquire during the software building. These approaches use the code itself to build a syntactic model where we see the different software artifacts, such as variables, functions, and modules. However, there is no supporting methodology to detect and analyze if there are relationships/dependencies between those artifacts, such as which variable in a module is declared using an abstract data type described in another one, or which are the functions that are using parameters typed with an abstract data type; or any design decision taken by original developers, such as how the developer has implemented functions in different modules. On the other hand, current developers use object‐oriented (OO) paradigm to implement not only business applications but also useful methodologies/tools that allow semiautomatic analysis of any application. We must remark the legacy procedural code still has worth and is working in several industries, and as any evolving code, the developers have to be able to perform maintenance tasks minimizing the limitations offered by the language. Based on useful properties that the OO paradigm (and their supporting analysis tools) provide, such as UML models, we propose M2K as a methodology to generate a high‐level model from legacy procedural code, mainly written in Ansi C. To understand how C‐based applications were implemented is not a new problem in software reengineering. However, our contribution is based on building an OO model and suggesting different refactorings that help the developer to improve it and to eventually guide a new implementation of the target application. Specifically, the methodology builds cohesive software entities mapped from procedural code and makes the coupling between C entities explicit in the high‐level model. The result of our methodology is a set of refactored class candidates: a structure that groups a set of variables and a set of functions obtained from the C applications. Based on the class candidate model, we propose refactorings based on OO design principles to improve the design of the application. The most relevant design improvements were obtained with algorithm abstraction by applying the strategy pattern, attributes/methods relocalization, variables types generalization, and removing/renaming methods/attributes. Besides a methodology and the supporting tool, we provide 14 case studies based on real projects implemented in C, and we showed how the results validate our proposal.  相似文献   

15.
ContextThe functionality of a software system is most often expressed in terms of concepts from its problem or solution domains. The process of finding where these concepts are implemented in the source code is known as concept location and it is a prerequisite of software change.ObjectiveWe investigate a static approach to concept location named DepIR that combines program dependency search (DepS) with information retrieval-based search (IR). In this approach, programmers explore the static program dependencies of the source code components retrieved by the IR search engine.MethodThe paper presents an empirical study that compares DepIR with its constituent techniques. The evaluation is based on an empirical method of reenactment that emulates the steps of concept location for 50 past changes mined from software repositories of five software systems.ResultsThe results of the study indicate that DepIR significantly outperforms both DepS and IR.ConclusionDepIR allows developers to perform concept location efficiently. It allows finding concepts even with queries that do not rank the relevant software components highly. Since formulating a good query is not always easy, this tolerance of lower-quality queries significantly broadens the usability of DepIR compared to the traditional IR.  相似文献   

16.
Henninger  S. 《Software, IEEE》1994,11(5):48-59
Component libraries are the dominant paradigm for software reuse, but they suffer from a lack of tools that support the problem-solving process of locating relevant components. Most retrieval tools assume that retrieval is a simple matter of matching well-formed queries to a repository. But forming queries can be difficult. A designer's understanding of the problem evolves while searching for a component, and large repositories often use an esoteric vocabulary. CodeFinder is a retrieval system that combines retrieval by reformulation (which supports incremental query construction) and spreading activation (which retrieves items related to the query) to help users find information. I designed it to investigate the hypothesis that this design makes for a more effective retrieval system. My study confirmed that it was more helpful to users seeking relevant information with ill-defined tasks and vocabulary mismatches than other query systems. The study supports the hypothesis that combining techniques effectively satisfies the kind of information needs typically encountered in software design  相似文献   

17.
Topic models are generative probabilistic models which have been applied to information retrieval to automatically organize and provide structure to a text corpus. Topic models discover topics in the corpus, which represent real world concepts by frequently co-occurring words. Recently, researchers found topics to be effective tools for structuring various software artifacts, such as source code, requirements documents, and bug reports. This research also hypothesized that using topics to describe the evolution of software repositories could be useful for maintenance and understanding tasks. However, research has yet to determine whether these automatically discovered topic evolutions describe the evolution of source code in a way that is relevant or meaningful to project stakeholders, and thus it is not clear whether topic models are a suitable tool for this task.In this paper, we take a first step towards evaluating topic models in the analysis of software evolution by performing a detailed manual analysis on the source code histories of two well-known and well-documented systems, JHotDraw and jEdit. We define and compute various metrics on the discovered topic evolutions and manually investigate how and why the metrics evolve over time. We find that the large majority (87%–89%) of topic evolutions correspond well with actual code change activities by developers. We are thus encouraged to use topic models as tools for studying the evolution of a software system.  相似文献   

18.
为了对Java虚拟机(JVM)进行测试,开发人员通常需要手工设计或利用测试生成工具生成复杂的测试程序,从而检测JVM中潜在的缺陷。然而,复杂的测试程序给开发人员定位及修复缺陷带来了极高的成本。测试程序约简技术旨在保障测试程序缺陷检测能力的同时,尽可能的删减测试程序中与缺陷检测无关的代码。现有研究工作基于Delta调试在C程序和XML输入上可以取得较好的约简效果,但是在JVM测试场景中,具有复杂语法和语义依赖关系的Java测试程序约减仍存在粒度较粗、约简效果较差的问题,导致约简后的程序理解成本依然很高。因此,针对具有复杂程序依赖关系的Java测试程序,本文提出一种基于程序约束的细粒度测试程序约简方法JavaPruner。首先在语句块级别设计细粒度的代码度量方法,随后在Delta调试技术上引入语句块之间的依赖约束关系来对测试程序进行约简。以Java字节码测试程序为实验对象,通过从现有的针对JVM测试的测试程序生成工具中筛选出具有复杂依赖关系的50个测试程序作为基准数据集,并在这些数据集上验证JavaPruner的有效性。实验结果表明,JavaPruner可以有效删减Java字节码测试程序中的冗余代码。与现有方法相比,在所有基准数据集上约减能力平均可提升37.7%。同时,JavaPruner可以在保障程序有效性及缺陷检测能力的同时将Java字节码测试程序最大约简至其原有大小的1.09% ,有效降低了测试程序的分析和理解成本。  相似文献   

19.
王潮  徐卫伟  周明辉 《软件学报》2024,35(2):513-531
代码注释作为辅助软件开发群体协作的关键机制,被开发者所广泛使用以提升开发效率.然而,由于代码注释并不直接影响软件运行,使其常被开发者忽视,导致出现代码注释质量问题,进而影响开发效率.代码注释中存在的质量问题会影响开发者理解相关代码,甚至可能产生误解从而引入代码缺陷,因此这一问题受到研究者的广泛关注.采用系统文献调研,对近年来国内外学者在代码注释质量问题上的研究工作进行系统的分析.从代码注释质量的评价维度、度量指标以及提升策略这3个方面总结研究现状,并提出当前研究所存在的不足、挑战及建议.  相似文献   

20.
Software development is a global activity unconstrained by the bounds of time and space. A major effect of this increasing scale and distribution is that the shared understanding that developers previously acquired by formal and informal face-to-face meetings is difficult to obtain. This paper proposes a shared awareness model that uses information gathered automatically from developer IDE interactions to make explicit orderings of tasks, artefacts and developers that are relevant to particular work contexts in collaborative, and potentially distributed, software development projects. The research findings suggest that such a model can be used to: identify entities (developers, tasks, artefacts) most associated with a particular work context in a software development project; identify relevance relationships amongst tasks, developers and artefacts e.g. which developers and artefacts are currently most relevant to a task or which developers have contributed to a task over time; and, can be used to identify potential bottlenecks in a project through a ‘social graph’ view. Furthermore, this awareness information is captured and provided as developers work in different locations and at different times.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号