
1.

Supporting collaborative software development over GitHub

Ritu Arora Sanjay Goel Ravi Kant Mittal 《Software》2017,47(10):1393-1416

GitHub is a web‐based, distributed Software Configuration Management (SCM) system built over Git, which enables developers to host shared repositories over the Internet and access them from any location, at any time. It helps developers to effectively orchestrate their activities over shared codebases by capturing direct conflicts arising because of concurrent editing on the same shared artifact. However, SCM systems have limited support for capturing inconsistencies arising because of indirect conflicts, which arise because of software dependency relationships that exist between related artifacts and lead to the introduction of syntactic and semantic inconsistencies in codebases. In this paper, we propose a novel collaborative software development (CSD) tool named Collaboration Over GitHub (COG) that provides real‐time information about arising direct and indirect conflicts among collaborative developers, working over GitHub, through a collection of workspace awareness widgets. These widgets provide people‐centric information about direct and indirect collaborators over GitHub. Resource‐centric information about current and conflicting activities of real‐time collaborators is captured and propagated to others, based on the dependency relationships between software artifacts being manipulated by them. COG uses dependency graphs to store and process dependency relationship information which is required to ascertain information about indirect conflicts. Notably, the most important novel contribution of COG is that it not only captures indirect conflicts that lead to the introduction of syntactic inconsistencies but also changes that lead to semantic inconsistencies in the codebase. It also does so at finer levels of granularity, with changes to individual method bodies being traced, thereby capturing statement‐level conflicts as well. Copyright © 2016 John Wiley & Sons, Ltd.
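To make the distinction between direct and indirect conflicts concrete, here is a minimal sketch, not COG's actual implementation. It assumes a hypothetical dependency graph stored as a dictionary (artifact mapped to the artifacts it depends on) and a hypothetical map of developers to the artifacts they are currently editing; the names `reachable`, `find_conflicts`, and the sample artifacts are all illustrative.

```python
from itertools import combinations


def reachable(graph, start):
    """Return every artifact reachable from `start` by following dependency edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for dep in graph.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def find_conflicts(graph, edits):
    """Classify developer pairs as direct or indirect conflicts.

    graph: artifact -> artifacts it depends on (assumed format)
    edits: developer -> set of artifacts that developer is editing
    """
    direct, indirect = set(), set()
    for (dev_a, arts_a), (dev_b, arts_b) in combinations(sorted(edits.items()), 2):
        # Direct conflict: both developers edit the same artifact.
        if arts_a & arts_b:
            direct.add((dev_a, dev_b))
        # Indirect conflict: one developer edits something the other's
        # artifacts depend on, possibly through several dependency hops.
        deps_a = set().union(*(reachable(graph, a) for a in arts_a))
        deps_b = set().union(*(reachable(graph, b) for b in arts_b))
        if deps_a & arts_b or deps_b & arts_a:
            indirect.add((dev_a, dev_b))
    return direct, indirect


graph = {
    "Billing.pay": ["Account.debit"],
    "Account.debit": ["Ledger.post"],
}
edits = {
    "alice": {"Billing.pay"},
    "bob": {"Ledger.post"},
    "carol": {"Billing.pay"},
}
direct, indirect = find_conflicts(graph, edits)
```

In this example alice and carol edit the same method (a direct conflict), while bob edits a method that their code depends on transitively (an indirect conflict with each of them). A real tool would also need to decide which dependency kinds count and how to track statement-level changes, which this sketch does not attempt.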

2.

Why and how developers fork what from whom in GitHub

Jing Jiang David Lo Jiahuan He Xin Xia Pavneet Singh Kochhar Li Zhang 《Empirical Software Engineering》2017,22(1):547-578

Forking is the creation of a new software repository by copying another repository. Though forking is controversial in the traditional open source software (OSS) community, it is encouraged and is a built-in feature in GitHub. Developers freely fork repositories, use the code as their own and make changes. A deep understanding of repository forking can provide important insights for the OSS community and GitHub. In this paper, we explore why and how developers fork what from whom in GitHub. We collect a dataset containing 236,344 developers and 1,841,324 forks. We conduct surveys, and analyze programming languages and owners of forked repositories. Our main observations are: (1) Developers fork repositories to submit pull requests, fix bugs, add new features, keep copies, etc. Developers find repositories to fork from various sources: search engines, external sites (e.g., Twitter, Reddit), social relationships, etc. More than 42% of developers that we have surveyed agree that an automated recommendation tool is useful to help them pick repositories to fork, while more than 44.4% of developers do not value a recommendation tool. Developers care about repository owners when they fork repositories. (2) A repository written in a developer’s preferred programming language is more likely to be forked. (3) Developers mostly fork repositories from creators. In comparison with unattractive repository owners, attractive repository owners have a higher percentage of organizations, more followers and earlier registration in GitHub. Our results show that forking is mainly used for making contributions to original repositories, and it is beneficial for the OSS community. Moreover, our results show the value of recommendation and provide important insights for GitHub to recommend repositories.

3.

Multi-faceted quality and defect measurement for web software and source contents

Zhao Li Jeff Tian 《Journal of Systems and Software》2010,83(1):18-28

In this paper, we examine external failures and internal faults traceable to web software and source contents. We develop related defect and quality measurements based on different perspectives of customers, users, information or service hosts, maintainers, developers, integrators, and managers. These measurements can help web information and service providers with their quality assessment and improvement activities to meet the quality expectations of their customers and users. The different usages of our measurement framework by different stakeholders of web sites and web applications are also outlined and discussed. The data sources include existing web server logs and statistics reports, defect repositories from web application development and maintenance activities, and source files. We applied our approach to four diverse websites: one educational website, one open source software project website, one online catalog showroom for a small company, and one e-Commerce website for a large company. The results demonstrated the viability and effectiveness of our approach.

4.

An in-depth study of the promises and perils of mining GitHub

Eirini Kalliamvakou Georgios Gousios Kelly Blincoe Leif Singer Daniel M. German Daniela Damian 《Empirical Software Engineering》2016,21(5):2035-2071

With over 10 million git repositories, GitHub is becoming one of the most important sources of software artifacts on the Internet. Researchers mine the information stored in GitHub’s event logs to understand how its users employ the site to collaborate on software, but so far there have been no studies describing the quality and properties of the available GitHub data. We document the results of an empirical study aimed at understanding the characteristics of the repositories and users in GitHub; we see how users take advantage of GitHub’s main features and how their activity is tracked on GitHub and related datasets to point out misalignment between the real and mined data. Our results indicate that while GitHub is a rich source of data on software development, mining GitHub for research purposes should take various potential perils into consideration. For example, we show that the majority of the projects are personal and inactive, and that almost 40% of all pull requests do not appear as merged even though they were. Also, approximately half of GitHub’s registered users do not have public activity, while the activity of GitHub users in repositories is not always easy to pinpoint. We use our identified perils to see if they can pose validity threats; we review selected papers from the MSR 2014 Mining Challenge and see if there are potential impacts to consider. We provide a set of recommendations for software engineering researchers on how to approach the data in GitHub.

5.

Assessing the capability of code smells to explain maintenance problems: an empirical study combining quantitative and qualitative data

Aiko Yamashita 《Empirical Software Engineering》2014,19(4):1111-1143

Code smells are indicators of deeper design problems that may cause difficulties in the evolution of a software system. This paper investigates the capability of twelve code smells to reflect actual maintenance problems. Four medium-sized systems with equivalent functionality but dissimilar design were examined for code smells. Three change requests were implemented on the systems by six software developers, each of them working for up to four weeks. During that period, we recorded problems faced by developers and the associated Java files on a daily basis. We developed a binary logistic regression model, with “problematic file” as the dependent variable. Twelve code smells, file size, and churn constituted the independent variables. We found that violation of the Interface Segregation Principle (a.k.a. ISP violation) displayed the strongest connection with maintenance problems. Analysis of the nature of the problems, as reported by the developers in daily interviews and think-aloud sessions, strengthened our view about the relevance of this code smell. We observed, for example, that severe instances of problems relating to change propagation were associated with ISP violation. Based on our results, we recommend that code with ISP violation should be considered potentially problematic and be prioritized for refactoring.

6.

Replaying development history to assess the effectiveness of change propagation tools

Ahmed E. Hassan Richard C. Holt 《Empirical Software Engineering》2006,11(3):335-367

As developers modify software entities such as functions or variables to introduce new features, enhance old ones, or fix bugs, they must ensure that other entities in the software system are updated to be consistent with these new changes. Many hard-to-find bugs are introduced by developers who did not notice dependencies between entities and failed to propagate changes correctly. Most modern development environments offer tools to assist developers in propagating changes. For example, dependency browsers show static code dependencies between source code entities. Other sources of information such as historical co-change or code layout information could be used by tools to support developers in propagating changes. We present the Development Replay (DR) approach, which empirically assesses and compares the effectiveness of several not-yet-existing change propagation tools by reenacting the changes stored in source control repositories using these tools. We present a case study of five large open source systems with a total of over 40 years of development history. Our empirical results show that historical co-change information recovered from source control repositories along with code layout information can guide developers in propagating changes better than simple static dependency information.


7.

The limited impact of individual developer data on software defect prediction

Robert M. Bell Thomas J. Ostrand Elaine J. Weyuker 《Empirical Software Engineering》2013,18(3):478-505

Previous research has provided evidence that a combination of static code metrics and software history metrics can be used to predict with surprising success which files in the next release of a large system will have the largest numbers of defects. In contrast, very little research exists to indicate whether information about individual developers can profitably be used to improve predictions. We investigate whether files in a large system that are modified by an individual developer consistently contain either more or fewer faults than the average of all files in the system. The goal of the investigation is to determine whether information about which particular developer modified a file is able to improve defect predictions. We also extend earlier research evaluating use of counts of the number of developers who modified a file as predictors of the file’s future faultiness. We analyze change reports filed for three large systems, each containing 18 releases, with a combined total of nearly 4 million LOC and over 11,000 files. A buggy file ratio is defined for programmers, measuring the proportion of faulty files in Release R out of all files modified by the programmer in Release R-1. We assess the consistency of the buggy file ratio across releases for individual programmers both visually and within the context of a fault prediction model. Buggy file ratios for individual programmers often varied widely across all the releases that they participated in. A prediction model that takes account of the history of faulty files that were changed by individual developers shows improvement over the standard negative binomial model of less than 0.13% according to one measure, and no improvement at all according to another measure. In contrast, augmenting a standard model with counts of cumulative developers changing files in prior releases produced up to a 2% improvement in the percentage of faults detected in the top 20% of predicted faulty files. 
The cumulative number of developers interacting with a file can be a useful variable for defect prediction. However, the study indicates that adding information to a model about which particular developer modified a file is not likely to improve defect predictions.
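The buggy file ratio defined in this abstract can be written as a short calculation. The following is a minimal sketch that follows the abstract's wording only; the function name, the set-based input format, and the sample file names are assumptions made for illustration, and the handling of a programmer who modified no files (returning `None`) is a choice of this sketch, not something the abstract specifies.

```python
def buggy_file_ratio(modified_in_prev_release, faulty_in_release):
    """Proportion of files a programmer modified in Release R-1 that are faulty in Release R.

    modified_in_prev_release: files the programmer changed in Release R-1
    faulty_in_release: files found faulty in Release R
    Returns None when the programmer modified no files (ratio undefined).
    """
    modified = set(modified_in_prev_release)
    if not modified:
        return None
    faulty = set(faulty_in_release)
    return len(modified & faulty) / len(modified)


# Hypothetical data: four files modified in R-1, two of them faulty in R.
ratio = buggy_file_ratio(
    {"a.c", "b.c", "c.c", "d.c"},
    {"b.c", "d.c", "z.c"},
)
```

Here `ratio` is 0.5, because two of the four modified files turn out to be faulty. Computing this value per programmer and per release is what allows the consistency across releases to be examined.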

8.

Implicit ownership types for memory management

Tian Zhao Jason Baker 《Science of Computer Programming》2008,71(3):213-241

The Real-time Specification for Java (RTSJ) introduced a range of language features for explicit memory management. While the RTSJ gives programmers fine control over memory use and allows linear allocation and constant-time deallocation, the RTSJ relies upon dynamic runtime checks for safety, making it unsuitable for safety-critical applications. We introduce ScopeJ, a statically-typed, multi-threaded, object calculus in which scopes are first class constructs. Scopes reify allocation contexts and provide a safe alternative to automatic memory management. Safety follows from the use of an ownership type system that enforces a topology on run-time patterns of references. ScopeJ’s type system is novel in that ownership annotations are implicit. This substantially reduces the burden for developers and increases the likelihood of adoption. The notion of implicit ownership is particularly appealing when combined with pluggable type systems, as one can apply different type constraints to different components of an application depending on the requirements without changing the source language. In related work we have demonstrated the usefulness of our approach in the context of highly-responsive systems and stream processing.

9.

Analysis of contribution composition patterns of code files

Tan Xin Lin Zeyan Zhang Yuxia Zhou Minghui 《Journal of Software (软件学报)》2018,29(8):2283-2293

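During software development, a single code file is often developed and maintained jointly by several developers, each of whom contributes a different amount of code, which gives the file its own contribution composition. Whether a code file's contribution composition is reasonable directly affects how tasks are assigned to developers, and in turn affects software quality and development efficiency. How to characterize and determine a reasonable contribution composition pattern for different types of code files has become a pressing problem. Because the tools supporting collaborative development have matured, developers' activities can be recorded effectively, and the massive data they produce lays a foundation for data-driven intelligent software development. First, based on code ownership, three measures that characterize contribution composition are proposed along three dimensions: concentration, complexity, and stability. Second, taking Nova, a core project of OpenStack, as a case study, the measures are computed on its version control data, 12 common file types are summarized, and 3 contribution composition patterns are identified. Finally, through e-mail and face-to-face interviews, the paper validates the effectiveness of the measures and the reasonableness of the contribution composition patterns, and offers guidance for the software development process from the perspective of contribution composition.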

10.

A scalable record locking scheme for parallel file access

H. Eckardt 《Computing》1997,58(2):113-128


11.

Sourcerer: mining and searching internet-scale software repositories

Erik Linstead Sushil Bajracharya Trung Ngo Paul Rigor Cristina Lopes Pierre Baldi 《Data mining and knowledge discovery》2009,18(2):300-336

Large repositories of source code available over the Internet, or within large organizations, create new challenges and opportunities for data mining and statistical machine learning. Here we first develop Sourcerer, an infrastructure for the automated crawling, parsing, fingerprinting, and database storage of open source software on an Internet scale. In one experiment, we gather 4,632 Java projects from SourceForge and Apache totaling over 38 million lines of code from 9,250 developers. Simple statistical analyses of the data first reveal robust power-law behavior for package, method call, and lexical containment distributions. We then develop and apply unsupervised, probabilistic, topic and author-topic (AT) models to automatically discover the topics embedded in the code and extract topic-word, document-topic, and AT distributions. In addition to serving as a convenient summary for program function and developer activities, these and other related distributions provide a statistical and information-theoretic basis for quantifying and analyzing source file similarity, developer similarity and competence, topic scattering, and document tangling, with direct applications to software engineering and software development staffing. Finally, by combining software textual content with structural information captured by our CodeRank approach, we are able to significantly improve software retrieval performance, increasing the area under the curve (AUC) retrieval metric to 0.92, roughly 10–30% better than previous approaches based on text alone. A prototype of the system is available at: . Erik Linstead, Sushil Bajracharya, and Trung Ngo have contributed equally to this work.

12.

Privacy and security constraints for code contributions

Rodrigo Andrade Paulo Borba 《Software》2020,50(10):1905-1929

In collaborative software development, developers submit their contributions to repositories that are used to integrate code from various collaborators. To avoid privacy and security issues, code contributions are often reviewed before integration. Although careful manual code review can detect such issues, it might be time-consuming, expensive, and error-prone. Automatic analysis tools can also detect privacy and security issues, but they often demand significant developer effort, or are domain specific, considering fixed framework-specific vulnerability sources and sinks. To reduce these problems, in this paper we propose the Salvum policy language to support the specification of constraints that help to protect sensitive information from being inadvertently accessed by specific code contributions. We implement a tool that automatically checks Salvum policies for systems of different technical domains. We also investigate whether Salvum can find policy violations for a number of open-source projects. We find evidence that Salvum helps to detect violations even for well-supported and highly active projects. Moreover, our tool helps to find 80 violations in benchmark projects.

13.

Towards semantically enhanced Web service repositories

Marta Sabou Jeff Pan 《Journal of Web Semantics》2007,5(2):142-150

The success of Web services technology has brought topics such as software reuse and discovery once again onto the agenda of software engineers. While there are several efforts towards automating Web service discovery and composition, many developers still search for services via online Web service repositories and then combine them manually. However, our analysis of these online repositories shows that, unlike traditional software libraries, they rely on little metadata to support service discovery. We believe that the major cause is the difficulty of automatically deriving metadata that would describe rapidly changing Web service collections. In this paper, we discuss the major shortcomings of state-of-the-art Web service repositories and, as a solution, we report on ongoing work and ideas on how to use techniques developed in the context of the Semantic Web (ontology learning, matching, metadata-based presentation) to improve the current situation.

14.

The FreeBSD project: a replication case study of open source development

Dinh-Trong T.T. Bieman J.M. 《IEEE Transactions on Software Engineering》2005,31(6):481-494

Case studies can help to validate claims that open source software development produces higher quality software at lower cost than traditional commercial development. One problem inherent in case studies is external validity: we do not know whether or not results from one case study apply to another development project. We gain or lose confidence in case study results when similar case studies are conducted on other projects. This case study of the FreeBSD project, a long-lived open source project, provides further understanding of open source development. The paper details a method for mining repositories and querying project participants to retrieve key process information. The FreeBSD development process is fairly well-defined, with prescribed methods for determining developer responsibilities, dealing with enhancements and defects, and managing releases. Compared to the Apache project, FreeBSD uses 1) a smaller set of core developers (developers who control the code base) that implement a smaller percentage of the system, 2) a larger set of top developers to implement 80 percent of the system, and 3) a more well-defined testing process. FreeBSD and Apache have a similar ratio of core developers to people involved in adapting and debugging the system and people who report problems. Both systems have similar defect densities and the developers are also users in both systems.
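The comparison of "top developers to implement 80 percent of the system" rests on a simple ranking calculation. The following is a minimal sketch of one way to compute such a set, assuming contribution is measured as a per-developer count (for example lines or commits); the function name, the input format, and the sample figures are hypothetical and are not taken from the FreeBSD study.

```python
def top_developers(contribution_by_dev, share=0.8):
    """Smallest prefix of developers, ranked by contribution, that covers `share` of the total."""
    total = sum(contribution_by_dev.values())
    ranked = sorted(contribution_by_dev.items(), key=lambda item: item[1], reverse=True)
    chosen, covered = [], 0
    for dev, amount in ranked:
        # Stop as soon as the developers chosen so far reach the target share.
        if covered >= share * total:
            break
        chosen.append(dev)
        covered += amount
    return chosen


# Hypothetical contribution counts for four developers.
contributions = {"ann": 500, "bo": 300, "cy": 150, "di": 50}
top = top_developers(contributions)
```

With these sample numbers, `top` contains ann and bo, who together account for 800 of 1000 units. Comparing the size of this set across projects is one way to express how concentrated development effort is.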

15.

Studying software evolution using topic models

《Science of Computer Programming》2014

Topic models are generative probabilistic models which have been applied to information retrieval to automatically organize and provide structure to a text corpus. Topic models discover topics in the corpus, which represent real world concepts by frequently co-occurring words. Recently, researchers found topics to be effective tools for structuring various software artifacts, such as source code, requirements documents, and bug reports. This research also hypothesized that using topics to describe the evolution of software repositories could be useful for maintenance and understanding tasks. However, research has yet to determine whether these automatically discovered topic evolutions describe the evolution of source code in a way that is relevant or meaningful to project stakeholders, and thus it is not clear whether topic models are a suitable tool for this task. In this paper, we take a first step towards evaluating topic models in the analysis of software evolution by performing a detailed manual analysis on the source code histories of two well-known and well-documented systems, JHotDraw and jEdit. We define and compute various metrics on the discovered topic evolutions and manually investigate how and why the metrics evolve over time. We find that the large majority (87%–89%) of topic evolutions correspond well with actual code change activities by developers. We are thus encouraged to use topic models as tools for studying the evolution of a software system.

16.

Extracting communication structure of a development organization from a software repository

Jongdae Han Woosung Jung 《Personal and Ubiquitous Computing》2014,18(6):1413-1421

Researchers have found that communication cost is one of the major overheads affecting the overall cost of software development. In geographically distributed software development, communication problems can be caused by cultural differences, language barriers, different time zones, and so on. Thus, extracting potential communication structural information is very useful in understanding and optimizing the development organization. Analyzing the communication structure is also crucial in order to resolve cost issues. While this is already true for general development organizations, geographically distributed software development organizations are especially sensitive to these issues because they are forced to rely on costly media. Therefore, this paper suggests a way to extract the development organization from software repositories with respect to temporal locality. Temporal locality is important when a project runs for a lengthy period of time. In order to evaluate these issues, we define two metrics which measure the contribution of the individual developer and the communication need between developers. We also provide a tool to extract a communication structure using these two metrics. The extracted communication structure is visualized to give insight to managers and developers. Finally, we provide statistical results of empirical research to support the soundness of our approach. The results show that our approach reflects the real-world relationships between developers well.

17.

Continuously mining distributed version control systems: an empirical study of how Linux uses Git

Daniel M. German Bram Adams Ahmed E. Hassan 《Empirical Software Engineering》2016,21(1):260-299

Distributed version control systems (D-VCSs, such as Git and Mercurial) and their hosting services (such as GitHub and Bitbucket) have revolutionized the way in which developers collaborate by allowing them to freely exchange and integrate code changes in a peer-to-peer fashion. However, this flexibility comes at a price: code changes are hard to track because of the proliferation of code repositories and because developers modify (“rebase”) and filter (“cherry-pick”) the history of these changes to streamline their integration into the repositories of other developers. As a consequence, researchers and practitioners, who typically only consider the (cleaned up) history in the official project repository, are unaware of important elements and activities in the collaborative software development process. In this paper, we present a method that continuously mines all known D-VCSs of a software project to uncover the complete development history of a project. We use this method to (1) show the divergence between the code history development in the official Linux kernel repository and the complete kernel development history, and (2) investigate the characteristics of the ecosystem of git repositories of the Linux kernel. Finally, we discuss how continuous mining could be adopted by current D-VCS hosting services.

18.

Developer social networks in software engineering: construction, analysis, and applications

ZHANG WeiQiang NIE LiMing JIANG He CHEN ZhenYu LIU Jia 《Science China Information Sciences》2014,(12):82-104

With the increasing popularity of the Internet, more and more developers are collaborating on software development. During the collaboration, a lot of information related to software development, including communication and coordination information of developers, can be recorded in software repositories. The information can be employed to construct Developer Social Networks (DSNs) for facilitating tasks in software engineering. In this paper, we survey recent advances of DSNs and examine three fundamental steps of DSNs, namely construction, analysis, and applications. We summarize the state-of-the-art methods in the three steps and investigate the relationships among them. Furthermore, we discuss the main issues and point out the future opportunities in the study of DSNs.

19.

Improving software configuration management through behavior optimization

Zhang Wei 《Computer Era (计算机时代)》2012,(10):67-69

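Software configuration management covers the management of code, documents, data, and so on, and its quality is limited by how project members actually operate. How should developers use their workspaces? Can code be updated and synchronized among members in a timely way? How should branches be used and changes merged so as to reduce wasted physical space and delays? These questions are often overlooked in real project development, or the team has not set detailed norms for member behavior, so many software projects suffer schedule slips or poor code quality. To address this, a series of management measures is proposed that regulate the behavior of project members by optimizing software configuration management, to ensure that efficient software configuration management is carried out.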

20.

A knowledge-based framework for extracting components in agile systems development

Vijayan Sugumaran Mohan Tanniru Veda C. Storey 《Information Technology and Management》2008,9(1):37-53

Considerable strides have been made in the use of components in software development. Many proprietary enterprise resource planning (ERP) software environments use modular components to develop and customize “best practices” to meet a specific organizational need. In agile application development, many developers and users are asked to design systems in a short period of time. These applications may use components that are embedded in software repositories. The challenge then is how to select the right software components (data and procedures) to meet an application requirement. Although experienced developers may select and customize components to meet the needs of an application, such expertise may not be available to other applications. This paper presents a knowledge-based framework to select and customize software components and demonstrates its value in deriving quality specifications, even when the developers are relatively inexperienced.