期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Augmenting and structuring user queries to support efficient free-form code search

Raphael Sirres Tegawendé F. Bissyandé Dongsun Kim David Lo Jacques Klein Kisub Kim Yves Le Traon 《Empirical Software Engineering》2018,23(5):2622-2654

Source code terms such as method names and variable types are often different from conceptual words mentioned in a search query. This vocabulary mismatch problem can make code search inefficient. In this paper, we present COde voCABUlary (CoCaBu), an approach to resolving the vocabulary mismatch problem when dealing with free-form code search queries. Our approach leverages common developer questions and the associated expert answers to augment user queries with the relevant, but missing, structural code entities in order to improve the performance of matching relevant code examples within large code repositories. To instantiate this approach, we build GitSearch, a code search engine, on top of GitHub and Stack Overflow Q&A data. We evaluate GitSearch in several dimensions to demonstrate that (1) its code search results are correct with respect to user-accepted answers; (2) the results are qualitatively better than those of existing Internet-scale code search engines; (3) our engine is competitive against web search engines, such as Google, in helping users solve programming tasks; and (4) GitSearch provides code examples that are acceptable or interesting to the community as answers for Stack Overflow questions. 相似文献

2.

<Emphasis Type="SmallCaps">EnTagRec</Emphasis><Superscript>++</Superscript>: An enhanced tag recommendation system for software information sites

Shaowei Wang David Lo Bogdan Vasilescu Alexander Serebrenik 《Empirical Software Engineering》2018,23(2):800-832

相似文献

3.

Linear-time filtering algorithms for the disjunctive constraint and a quadratic filtering algorithm for the cumulative not-first not-last

Hamed Fahimi Yanick Ouellet Claude-Guy Quimper 《Constraints》2018,23(3):272-293

We present new filtering algorithms for Disjunctive and Cumulative constraints, each of which improves the complexity of the state-of-the-art algorithms by a factor of log n. We show how to perform Time-Tabling and Detectable Precedences in linear time on the Disjunctive constraint. Furthermore, we present a linear-time Overload Checking for the Disjunctive and Cumulative constraints. Finally, we show how the rule of Not-first/Not-last can be enforced in quadratic time for the Cumulative constraint. These algorithms rely on the union find data structure, from which we take advantage to introduce a new data structure that we call it time line. This data structure provides constant time operations that were previously implemented in logarithmic time by the Θ-tree data structure. Experiments show that these new algorithms are competitive even for a small number of tasks and outperform existing algorithms as the number of tasks increases. We also show that the time line can be used to solve specific scheduling problems. 相似文献

4.

<Emphasis Type="SmallCaps">Origo</Emphasis>: causal inference by compression

Kailash Budhathoki Jilles Vreeken 《Knowledge and Information Systems》2018,56(2):285-307

相似文献

5.

Patterns and anomalies in <Emphasis Type="Italic">k</Emphasis>-cores of real-world graphs with applications

Kijung Shin Tina Eliassi-Rad Christos Faloutsos 《Knowledge and Information Systems》2018,54(3):677-710

How do the k-core structures of real-world graphs look like? What are the common patterns and the anomalies? How can we exploit them for applications? A k-core is the maximal subgraph in which all vertices have degree at least k. This concept has been applied to such diverse areas as hierarchical structure analysis, graph visualization, and graph clustering. Here, we explore pervasive patterns related to k-cores and emerging in graphs from diverse domains. Our discoveries are: (1) Mirror Pattern: coreness (i.e., maximum k such that each vertex belongs to the k-core) is strongly correlated with degree. (2) Core-Triangle Pattern: degeneracy (i.e., maximum k such that the k-core exists) obeys a 3-to-1 power-law with respect to the count of triangles. (3) Structured Core Pattern: degeneracy–cores are not cliques but have non-trivial structures such as core–periphery and communities. Our algorithmic contributions show the usefulness of these patterns. (1) Core-A, which measures the deviation from Mirror Pattern, successfully spots anomalies in real-world graphs, (2) Core-D, a single-pass streaming algorithm based on Core-Triangle Pattern, accurately estimates degeneracy up to 12 \(\times \) faster than its competitor. (3) Core-S, inspired by Structured Core Pattern, identifies influential spreaders up to 17 \(\times \) faster than its competitors with comparable accuracy. 相似文献

6.

Multi-ML: Programming Multi-BSP Algorithms in ML

V. Allombert F. Gava J. Tesson 《International journal of parallel programming》2017,45(2):340-361

bsp is a bridging model between abstract execution and concrete parallel systems. Structure and abstraction brought by bsp allow to have portable parallel programs with scalable performance predictions, without dealing with low-level details of architectures. In the past, we designed bsml for programming bsp algorithms in ml. However, the simplicity of the bsp model does not fit the complexity of today’s hierarchical architectures such as clusters of machines with multiple multi-core processors. The multi-bsp model is an extension of the bsp model which brings a tree-based view of nested components of hierarchical architectures. To program multi-bsp algorithms in ml, we propose the multi-ml language as an extension of bsml where a specific kind of recursion is used to go through a hierarchy of computing nodes. We define a formal semantics of the language and present preliminary experiments which show performance improvements with respect to bsml. 相似文献

7.

Prompter

Luca?Ponzanelli Email author Gabriele?Bavota Massimiliano?Di Penta Rocco?Oliveto Michele?Lanza 《Empirical Software Engineering》2016,21(5):2190-2231

Developers often require knowledge beyond the one they possess, which boils down to asking co-workers for help or consulting additional sources of information, such as Application Programming Interfaces (API) documentation, forums, and Q&A websites. However, it requires time and energy to formulate one’s problem, peruse and process the results. We propose a novel approach that, given a context in the Integrated Development Environment (IDE), automatically retrieves pertinent discussions from Stack Overflow, evaluates their relevance using a multi-faceted ranking model, and, if a given confidence threshold is surpassed, notifies the developer. We have implemented our approach in Prompter, an Eclipse plug-in. Prompter was evaluated in two empirical studies. The first study was aimed at evaluatingPrompter’s ranking model and involved 33 participants. The second study was conducted with 12 participants and aimed at evaluating Prompter’s usefulness when supporting developers during development and maintenance tasks. Since Prompter uses “volatile information” crawled from the web, we also replicated Study I after one year to assess the impact of such a “volatility” on recommenders like Prompter. Our results indicate that (i) Prompter recommendations were positively evaluated in 74 % of the cases on average, (ii) Prompter significantly helps developers to improve the correctness of their tasks by 24 % on average, but also (iii) 78 % of the provided recommendations are “volatile” and can change at one year of distance. While Prompter revealed to be effective, our studies also point out issues when building recommenders based on information available on online forums. 相似文献

8.

An empirical study of unspecified dependencies in make-based build systems

Cor-Paul Bezemer Shane McIntosh Bram Adams Daniel M. German Ahmed E. Hassan 《Empirical Software Engineering》2017,22(6):3117-3148

Software developers rely on a build system to compile their source code changes and produce deliverables for testing and deployment. Since the full build of large software systems can take hours, the incremental build is a cornerstone of modern build systems. Incremental builds should only recompile deliverables whose dependencies have been changed by a developer. However, in many organizations, such dependencies still are identified by build rules that are specified and maintained (mostly) manually, typically using technologies like make. Incomplete rules lead to unspecified dependencies that can prevent certain deliverables from being rebuilt, yielding incomplete results, which leave sources and deliverables out-of-sync. In this paper, we present a case study on unspecified dependencies in the make-based build systems of the glib, openldap, linux and qt open source projects. To uncover unspecified dependencies in make-based build systems, we use an approach that combines a conceptual model of the dependencies specified in the build system with a concrete model of the files and processes that are actually exercised during the build. Our approach provides an overview of the dependencies that are used throughout the build system and reveals unspecified dependencies that are not yet expressed in the build system rules. During our analysis, we find that unspecified dependencies are common. We identify 6 common causes in more than 1.2 million unspecified dependencies. 相似文献

9.

Efficient filtering for the Resource-Cost AllDifferent constraint

Sascha Van Cauwelaert Pierre Schaus 《Constraints》2017,22(4):493-511

This paper studies a family of optimization problems where a set of items, each requiring a possibly different amount of resource, must be assigned to different slots for which the price of the resource can vary. The objective is then to assign items such that the overall resource cost is minimized. Such problems arise commonly in domains such as production scheduling in the presence of fluctuating renewable energy costs or variants of the Travelling Salesman Problem. In Constraint Programming, this can be naturally modeled in two ways: (a) with a sum of element constraints; (b) with a MinimumAssignment constraint. Unfortunately the sum of element constraints obtains a weak filtering and the MinimumAssignment constraint does not scale well on large instances. This work proposes a third approach by introducing the ResourceCostAllDifferent constraint and an associated incremental and scalable filtering algorithm, running in \(\mathcal {O}(n \cdot m)\), where n is the number of unbound variables and m is the maximum domain size of unbound variables. Its goal is to compute the total cost in a scalable manner by dealing with the fact that all assignments must be different. We first evaluate the efficiency of the new filtering on a real industrial problem and then on the Product Matrix Travelling Salesman Problem, a special case of the Asymmetric Travelling Salesman Problem. The study shows experimentally that our approach generally outperforms the decomposition and the MinimumAssignment ones for the problems we considered. 相似文献

10.

Subdividing Long-Running,Variable-Length Analyses Into Short,Fixed-Length BOINC Workunits

Adam L. Bazinet Michael P. Cummings 《Journal of Grid Computing》2016,14(3):429-441

We describe a scheme for subdividing long-running, variable-length analyses into short, fixed-length boinc workunits using phylogenetic analyses as an example. Fixed-length workunits decrease variance in analysis runtime, improve overall system throughput, and make boinc a more useful resource for analyses that require a relatively fast turnaround time, such as the phylogenetic analyses submitted by users of the garli web service at molecularevolution.org. Additionally, we explain why these changes will benefit volunteers who contribute their processing power to boinc projects, such as the Lattice boinc Project (http://boinc.umiacs.umd.edu). Our results, which demonstrate the advantages of relatively short workunits, should be of general interest to anyone who develops and deploys an application on the boinc platform. 相似文献

11.

BSP-Why: A Tool for Deductive Verification of BSP Algorithms with Subgroup Synchronisation

Jean Fortin Frédéric Gava 《International journal of parallel programming》2016,44(3):574-597

We present bsp-why, a tool for deductive verification of bsp algorithms with subgroup synchronisation. From bsp programs, bsp-why generates sequential codes for the back-end condition generator why and thus benefits from its large range of existing provers. By enabling subgroups, the user can prove the correctness of programs that run on hierarchical machines—e.g. clusters of multi-cores. In general, bsp-why is able to generate proof obligations of mpi programs that only use collective operations. Our case studies are distributed state-space construction algorithms, the basis of model-checking. 相似文献

12.

Ecosystem on the Web: non-linear mining and forecasting of co-evolving online activities

Yasuko Matsubara Yasushi Sakurai Christos Faloutsos 《World Wide Web》2017,20(3):439-465

Given a large collection of co-evolving online activities, such as searches for the keywords “Xbox”, “PlayStation” and “Wii”, how can we find patterns and rules? Are these keywords related? If so, are they competing against each other? Can we forecast the volume of user activity for the coming month? We conjecture that online activities compete for user attention in the same way that species in an ecosystem compete for food. We present EcoWeb, (i.e., Ecosystem on the Web), which is an intuitive model designed as a non-linear dynamical system for mining large-scale co-evolving online activities. Our second contribution is a novel, parameter-free, and scalable fitting algorithm, EcoWeb-Fit, that estimates the parameters of EcoWeb. Extensive experiments on real data show that EcoWeb is effective, in that it can capture long-range dynamics and meaningful patterns such as seasonalities, and practical, in that it can provide accurate long-range forecasts. EcoWeb consistently outperforms existing methods in terms of both accuracy and execution speed. 相似文献

13.

Degree-Constrained Graph Orientation: Maximum Satisfaction and Minimum Violation

Yuichi Asahiro Jesper Jansson Eiji Miyano Hirotaka Ono 《Theory of Computing Systems》2016,58(1):60-93

A degree-constrained graph orientation of an undirected graph G is an assignment of a direction to each edge in G such that the outdegree of every vertex in the resulting directed graph satisfies a specified lower and/or upper bound. Such graph orientations have been studied for a long time and various characterizations of their existence are known. In this paper, we consider four related optimization problems introduced in reference (Asahiro et al. LNCS 7422, 332–343 (2012)): For any fixed non-negative integer W, the problems MAX W-LIGHT, MIN W-LIGHT, MAX W-HEAVY, and MIN W-HEAVY take as input an undirected graph G and ask for an orientation of G that maximizes or minimizes the number of vertices with outdegree at most W or at least W. As shown in Asahiro et al. LNCS 7422, 332–343 (2012)). 相似文献

14.

Approximating the Sparsest k-Subgraph in Chordal Graphs

Rémi Watrigant Marin Bougeret Rodolphe Giroudeau 《Theory of Computing Systems》2016,58(1):111-132

Given a simple undirected graph G = (V, E) and an integer k < |V|, the Sparsest k-Subgraph problem asks for a set of k vertices which induces the minimum number of edges. As a generalization of the classical independent set problem, Sparsest k-Subgraph is ????-hard and even not approximable unless ?????? in general graphs. Thus, we investigate Sparsest k-Subgraph in graph classes where independent set is polynomial-time solvable, such as subclasses of perfect graphs. Our two main results are the ????-hardness of Sparsest k-Subgraph on chordal graphs, and a greedy 2-approximation algorithm. Finally, we also show how to derive a P T A S for Sparsest k-Subgraph on proper interval graphs. 相似文献

15.

Faster Parameterized Algorithms for <Emphasis Type="SmallCaps">Minimum Fill-in</Emphasis>

Hans L. Bodlaender Pinar Heggernes Yngve Villanger 《Algorithmica》2011,61(4):817-838

We present two parameterized algorithms for the Minimum Fill-in problem, also known as Chordal Completion: given an arbitrary graph G and integer k, can we add at most k edges to G to obtain a chordal graph? Our first algorithm has running time \(\mathcal {O}(k^{2}nm+3.0793^{k})\), and requires polynomial space. This improves the base of the exponential part of the best known parameterized algorithm time for this problem so far. We are able to improve this running time even further, at the cost of more space. Our second algorithm has running time \(\mathcal {O}(k^{2}nm+2.35965^{k})\) and requires \(\mathcal {O}^{\ast}(1.7549^{k})\) space. To achieve these results, we present a new lemma describing the edges that can safely be added to achieve a chordal completion with the minimum number of edges, regardless of k. 相似文献

16.

Toward the complexity of the existence of wonderfully stable partitions and strictly core stable coalition structures in enemy-oriented hedonic games

Anja Rey Jörg Rothe Hilmar Schadrack Lena Schend 《Annals of Mathematics and Artificial Intelligence》2016,77(3-4):317-333

We study the computational complexity of the existence and the verification problem for wonderfully stable partitions (WSPE and WSPV) and of the existence problem for strictly core stable coalition structures (SCSCS) in enemy-oriented hedonic games. In this note, we show that WSPV is NP-complete and both WSPE and SCSCS are DP-hard, where DP is the second level of the boolean hierarchy, and we discuss an approach for classifying the latter two problems in terms of their complexity. 相似文献

17.

On Participation Constrained Team Formation

下载免费PDF全文

Yu Zhou Jian-Bin Huang Xiao-Lin Jia He-Li Sun 《计算机科学技术学报》2017,32(1):139-154

The task assignment on the Internet has been widely applied to many areas, e.g., online labor market, online paper review and social activity organization. In this paper, we are concerned with the task assignment problem related to the online labor market, termed as ClusterHire. We improve the definition of the ClusterHire problem, and propose an efficient and effective algorithm, entitled Influence. In addition, we place a participation constraint on ClusterHire. It constrains the load of each expert in order to keep all members from overworking. For the participation-constrained ClusterHire problem, we devise two algorithms, named ProjectFirst and Era. The former generates a participationconstrained team by adding experts to an initial team, and the latter generates a participation-constrained team by removing the experts with the minimum influence from the universe of experts. The experimental evaluations indicate that 1) Influence performs better than the state-of-the-art algorithms in terms of effectiveness and time efficiency; 2) ProjectFirst performs better than Era in terms of time efficiency, yet Era performs better than ProjectFirst in terms of effectiveness. 相似文献

18.

A<Emphasis Type="SmallCaps">rgonauts</Emphasis>: a working system for motivated cooperative agents

Daniel Hölzgen Thomas Vengels Patrick Krümpelmann Matthias Thimm Gabriele Kern-Isberner 《Annals of Mathematics and Artificial Intelligence》2011,61(4):309-332

This paper presents the Argonauts multi-agent framework which was developed as part of a one year student project at Technische Universität Dortmund. The Argonauts framework builds on a BDI approach to model rational agents that act cooperatively in a dynamic and indeterministically changing environment. However, our agent model extends the traditional BDI approach in several aspects, most notably by incorporating motivation into the agent’s goal selection mechanism. The framework has been applied by the Argonauts team in the 2010 version of the annual multi-agent programming contest organized by Technische Universität Clausthal. In this paper, we present a high-level specification and analysis of the actual system used for solving the given scenario. We do this by applying the GAIA methodology, a high-level and iterative approach to model communication and roles in multi-agent scenarios. We further describe the technical details and insights gained during our participation in the multi-agent programming contest. 相似文献

19.

Using regression makes extraction of shared variation in multiple datasets easy

Jussi Korpela Andreas Henelius Lauri Ahonen Arto Klami Kai Puolamäki 《Data mining and knowledge discovery》2016,30(5):1112-1133

In many data analysis tasks it is important to understand the relationships between different datasets. Several methods exist for this task but many of them are limited to two datasets and linear relationships. In this paper, we propose a new efficient algorithm, termed cocoreg, for the extraction of variation common to all datasets in a given collection of arbitrary size. cocoreg extends redundancy analysis to more than two datasets, utilizing chains of regression functions to extract the shared variation in the original data space. The algorithm can be used with any linear or non-linear regression function, which makes it robust, straightforward, fast, and easy to implement and use. We empirically demonstrate the efficacy of shared variation extraction using the cocoreg algorithm on five artificial and three real datasets. 相似文献

20.

Translation-based approaches for solving disjunctive temporal problems with preferences

Enrico Giunchiglia Marco Maratea Luca Pulina 《Constraints》2018,23(4):383-402

Disjunctive Temporal Problems (DTPs) with Preferences (DTPPs) extend DTPs with piece-wise constant preference functions associated to each constraint of the form l ≤ x ? y ≤ u, where x,y are (real or integer) variables, and l,u are numeric constants. The goal is to find an assignment to the variables of the problem that maximizes the sum of the preference values of satisfied DTP constraints, where such values are obtained by aggregating the preference functions of the satisfied constraints in it under a “max” semantic. The state-of-the-art approach in the field, implemented in the native DTPP solver Maxilitis, extends the approach of the native DTP solver Epilitis. In this paper we present alternative approaches that translate DTPPs to Maximum Satisfiability of a set of Boolean combination of constraints of the form l?x ? y?u, ? ∈{<,≤}, that extend previous work dealing with constant preference functions only. We prove correctness and completeness of the approaches. Results obtained with the Satisfiability Modulo Theories (SMT) solvers Yices and MathSAT on randomly generated DTPPs and DTPPs built from real-world benchmarks, show that one of our translation is competitive to, and can be faster than, Maxilitis (This is an extended and revised version of Bourguet et al. 2013). 相似文献