期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

<Emphasis Type="SmallCaps">EnTagRec</Emphasis><Superscript>++</Superscript>: An enhanced tag recommendation system for software information sites

Shaowei Wang David Lo Bogdan Vasilescu Alexander Serebrenik 《Empirical Software Engineering》2018,23(2):800-832

相似文献

2.

Approximating the Sparsest k-Subgraph in Chordal Graphs

Rémi Watrigant Marin Bougeret Rodolphe Giroudeau 《Theory of Computing Systems》2016,58(1):111-132

Given a simple undirected graph G = (V, E) and an integer k < |V|, the Sparsest k-Subgraph problem asks for a set of k vertices which induces the minimum number of edges. As a generalization of the classical independent set problem, Sparsest k-Subgraph is ????-hard and even not approximable unless ?????? in general graphs. Thus, we investigate Sparsest k-Subgraph in graph classes where independent set is polynomial-time solvable, such as subclasses of perfect graphs. Our two main results are the ????-hardness of Sparsest k-Subgraph on chordal graphs, and a greedy 2-approximation algorithm. Finally, we also show how to derive a P T A S for Sparsest k-Subgraph on proper interval graphs. 相似文献

3.

Reverse Furthest Neighbors Query in Road Networks

下载免费PDF全文

Xiao-Jun Xu Jin-Song Bao Bin Yao Jing-Yu Zhou Fei-Long Tang Min-Yi Guo Jian-Qiu Xu 《计算机科学技术学报》2017,32(1):155-167

Given a road network G = (V,E), where V (E) denotes the set of vertices(edges) in G, a set of points of interest P and a query point q residing in G, the reverse furthest neighbors (Rfn _R) query in road networks fetches a set of points p ∈ P that take q as their furthest neighbor compared with all points in P ∪ {q}. This is the monochromatic Rfn _R (Mrfn _R) query. Another interesting version of Rfn _R query is the bichromatic reverse furthest neighbor (Brfn _R) query. Given two sets of points P and Q, and a query point q ∈ Q, a Brfn _R query fetches a set of points p ∈ P that take q as their furthest neighbor compared with all points in Q. This paper presents efficient algorithms for both Mrfn _R and Brfn _R queries, which utilize landmarks and partitioning-based techniques. Experiments on real datasets confirm the efficiency and scalability of proposed algorithms. 相似文献

4.

Improved Approximation Algorithms for Label Cover Problems

Moses Charikar MohammadTaghi Hajiaghayi Howard Karloff 《Algorithmica》2011,61(1):190-206

In this paper we consider both the maximization variant Max Rep and the minimization variant Min Rep of the famous Label Cover problem. So far the best approximation ratios known for these two problems were \(O(\sqrt{n})\) and indeed some authors suggested the possibility that this ratio is the best approximation factor for these two problems. We show, in fact, that there are a O(n ^1/3)-approximation algorithm for Max Rep and a O(n ^1/3log?^2/3 n)-approximation algorithm for Min Rep. In addition, we also exhibit a randomized reduction from Densest k-Subgraph to Max Rep, showing that any approximation factor for Max Rep implies the same factor (up to a constant) for Densest k-Subgraph. 相似文献

5.

Degree-Constrained Graph Orientation: Maximum Satisfaction and Minimum Violation

Yuichi Asahiro Jesper Jansson Eiji Miyano Hirotaka Ono 《Theory of Computing Systems》2016,58(1):60-93

A degree-constrained graph orientation of an undirected graph G is an assignment of a direction to each edge in G such that the outdegree of every vertex in the resulting directed graph satisfies a specified lower and/or upper bound. Such graph orientations have been studied for a long time and various characterizations of their existence are known. In this paper, we consider four related optimization problems introduced in reference (Asahiro et al. LNCS 7422, 332–343 (2012)): For any fixed non-negative integer W, the problems MAX W-LIGHT, MIN W-LIGHT, MAX W-HEAVY, and MIN W-HEAVY take as input an undirected graph G and ask for an orientation of G that maximizes or minimizes the number of vertices with outdegree at most W or at least W. As shown in Asahiro et al. LNCS 7422, 332–343 (2012)). 相似文献

6.

The Complexity Ecology of Parameters: An Illustration Using Bounded Max Leaf Number

Michael Fellows Daniel Lokshtanov Neeldhara Misra Matthias Mnich Frances Rosamond Saket Saurabh 《Theory of Computing Systems》2009,45(4):822-848

In the framework of parameterized complexity, exploring how one parameter affects the complexity of a different parameterized (or unparameterized problem) is of general interest. A well-developed example is the investigation of how the parameter treewidth influences the complexity of (other) graph problems. The reason why such investigations are of general interest is that real-world input distributions for computational problems often inherit structure from the natural computational processes that produce the problem instances (not necessarily in obvious, or well-understood ways). The max leaf number ml(G) of a connected graph G is the maximum number of leaves in a spanning tree for G. Exploring questions analogous to the well-studied case of treewidth, we can ask: how hard is it to solve 3-Coloring, Hamilton Path, Minimum Dominating Set, Minimum Bandwidth or many other problems, for graphs of bounded max leaf number? What optimization problems are W[1]-hard under this parameterization? We do two things:

(1)
We describe much improved FPT algorithms for a large number of graph problems, for input graphs G for which ml(G)≤k, based on the polynomial-time extremal structure theory canonically associated to this parameter. We consider improved algorithms both from the point of view of kernelization bounds, and in terms of improved fixed-parameter tractable (FPT) runtimes O ^*(f(k)).
(2)
The way that we obtain these concrete algorithmic results is general and systematic. We describe the approach, and raise programmatic questions.

相似文献

7.

Distributed computing connected components with linear communication cost

Xing Feng Lijun Chang Xuemin Lin Lu Qin Wenjie Zhang Long Yuan 《Distributed and Parallel Databases》2018,36(3):555-592

The paper studies three fundamental problems in graph analytics, computing connected components (CCs), biconnected components (BCCs), and 2-edge-connected components (ECCs) of a graph. With the recent advent of big data, developing efficient distributed algorithms for computing CCs, BCCs and ECCs of a big graph has received increasing interests. As with the existing research efforts, we focus on the Pregel programming model, while the techniques may be extended to other programming models including MapReduce and Spark. The state-of-the-art techniques for computing CCs and BCCs in Pregel incur \(O(m\times \#\text {supersteps})\) total costs for both data communication and computation, where m is the number of edges in a graph and #supersteps is the number of supersteps. Since the network communication speed is usually much slower than the computation speed, communication costs are the dominant costs of the total running time in the existing techniques. In this paper, we propose a new paradigm based on graph decomposition to compute CCs and BCCs with O(m) total communication cost. The total computation costs of our techniques are also smaller than that of the existing techniques in practice, though theoretically almost the same. Moreover, we also study distributed computing ECCs. We are the first to study this problem and an approach with O(m) total communication cost is proposed. Comprehensive empirical studies demonstrate that our approaches can outperform the existing techniques by one order of magnitude regarding the total running time. 相似文献

8.

Exact and approximate flexible aggregate similarity search

Feifei?Li Ke?Yi Yufei?Tao Bin?Yao Email author Yang?Li Dong?Xie Min?Wang 《The VLDB Journal The International Journal on Very Large Data Bases》2016,25(3):317-338

Aggregate similarity search, also known as aggregate nearest-neighbor (Ann) query, finds many useful applications in spatial and multimedia databases. Given a group Q of M query objects, it retrieves from a database the objects most similar to Q, where the similarity is an aggregation (e.g., \({{\mathrm{sum}}}\), \(\max \)) of the distances between each retrieved object p and all the objects in Q. In this paper, we propose an added flexibility to the query definition, where the similarity is an aggregation over the distances between p and any subset of \(\phi M\) objects in Q for some support \(0< \phi \le 1\). We call this new definition flexible aggregate similarity search and accordingly refer to a query as a flexible aggregate nearest-neighbor ( Fann ) query. We present algorithms for answering Fann queries exactly and approximately. Our approximation algorithms are especially appealing, which are simple, highly efficient, and work well in both low and high dimensions. They also return near-optimal answers with guaranteed constant-factor approximations in any dimensions. Extensive experiments on large real and synthetic datasets from 2 to 74 dimensions have demonstrated their superior efficiency and high quality. 相似文献

9.

Patterns and anomalies in <Emphasis Type="Italic">k</Emphasis>-cores of real-world graphs with applications

Kijung Shin Tina Eliassi-Rad Christos Faloutsos 《Knowledge and Information Systems》2018,54(3):677-710

How do the k-core structures of real-world graphs look like? What are the common patterns and the anomalies? How can we exploit them for applications? A k-core is the maximal subgraph in which all vertices have degree at least k. This concept has been applied to such diverse areas as hierarchical structure analysis, graph visualization, and graph clustering. Here, we explore pervasive patterns related to k-cores and emerging in graphs from diverse domains. Our discoveries are: (1) Mirror Pattern: coreness (i.e., maximum k such that each vertex belongs to the k-core) is strongly correlated with degree. (2) Core-Triangle Pattern: degeneracy (i.e., maximum k such that the k-core exists) obeys a 3-to-1 power-law with respect to the count of triangles. (3) Structured Core Pattern: degeneracy–cores are not cliques but have non-trivial structures such as core–periphery and communities. Our algorithmic contributions show the usefulness of these patterns. (1) Core-A, which measures the deviation from Mirror Pattern, successfully spots anomalies in real-world graphs, (2) Core-D, a single-pass streaming algorithm based on Core-Triangle Pattern, accurately estimates degeneracy up to 12 \(\times \) faster than its competitor. (3) Core-S, inspired by Structured Core Pattern, identifies influential spreaders up to 17 \(\times \) faster than its competitors with comparable accuracy. 相似文献

10.

Common2 extended to stacks and unbounded concurrency

Yehuda Afek Eli Gafni Adam Morrison 《Distributed Computing》2007,20(4):239-252

This paper extends Common2, the family of objects that implement and are wait-free implementable from 2 consensus objects, in two ways: First, the stack object is shown to be in the family, refuting a conjecture to the contrary [6]. Second, Common2 is investigated in the unbounded concurrency model, whereas until now it was considered only in an n-process model. We show that the fetch-and-add, test-and-set , and stack objects are in Common2 even with respect to this stronger notion of wait-free implementation. Our constructions rely on a wait-free implementation of immediate snapshots in the unbounded concurrency model, which was previously not known to be possible. The introduction of unbounded concurrency to the study of Common2 opens several directions of research: are there objects that have n-process implementations but are not unbounded concurrency implementable? We conjecture that swap is such an object. Additionally, the hope is that a queue impossibility proof, which eludes us in the n-process model, will be easier to establish in the unbounded concurrency model. 相似文献

11.

MTP: discovering high quality partitions in real world graphs

Yongsub?Lim Won-Jo?Lee Ho-Jin?Choi U?Kang Email author 《World Wide Web》2017,20(3):491-514

Given a real world graph, how can we find a large subgraph whose partition quality is much better than the original? How can we use a partition of that subgraph to discover a high quality global partition? Although graph partitioning especially with balanced sizes has received attentions in various applications, it is known NP-hard, and also known that there is no good cut at a large scale for real graphs. In this paper, we propose a novel approach for graph partitioning. Our first focus is on finding a large subgraph with high quality partitions, in terms of conductance. Despite the difficulty of the task for the whole graph, we observe that there is a large connected subgraph whose partition quality is much better than the original. Our proposed method MTP finds such a subgraph by removing “hub” nodes with large degrees, and taking the remaining giant connected component. Further, we extend MTP to gb MTP (Global Balanced MTP) for discovering a global balanced partition. gb MTP attaches the excluded nodes in MTP to the partition found by MTP in a greedy way. In experiments, we demonstrate that MTP finds a subgraph of a large size with low conductance graph partitions, compared with competing methods. We also show that the competitors cannot find connected subgraphs while our method does, by construction. This improvement in partition quality for the subgraph is especially noticeable for large scale cuts—for a balanced partition, down to 14 % of the original conductance with the subgraph size 70 % of the total. As a result, the found subgraph has clear partitions at almost all scales compared with the original. Moreover, gb MTP generally discovers global balanced partitions whose conductance are lower than those found by METIS, the state-of-the-art graph partitioning method. 相似文献

12.

Prompter

Luca?Ponzanelli Email author Gabriele?Bavota Massimiliano?Di Penta Rocco?Oliveto Michele?Lanza 《Empirical Software Engineering》2016,21(5):2190-2231

Developers often require knowledge beyond the one they possess, which boils down to asking co-workers for help or consulting additional sources of information, such as Application Programming Interfaces (API) documentation, forums, and Q&A websites. However, it requires time and energy to formulate one’s problem, peruse and process the results. We propose a novel approach that, given a context in the Integrated Development Environment (IDE), automatically retrieves pertinent discussions from Stack Overflow, evaluates their relevance using a multi-faceted ranking model, and, if a given confidence threshold is surpassed, notifies the developer. We have implemented our approach in Prompter, an Eclipse plug-in. Prompter was evaluated in two empirical studies. The first study was aimed at evaluatingPrompter’s ranking model and involved 33 participants. The second study was conducted with 12 participants and aimed at evaluating Prompter’s usefulness when supporting developers during development and maintenance tasks. Since Prompter uses “volatile information” crawled from the web, we also replicated Study I after one year to assess the impact of such a “volatility” on recommenders like Prompter. Our results indicate that (i) Prompter recommendations were positively evaluated in 74 % of the cases on average, (ii) Prompter significantly helps developers to improve the correctness of their tasks by 24 % on average, but also (iii) 78 % of the provided recommendations are “volatile” and can change at one year of distance. While Prompter revealed to be effective, our studies also point out issues when building recommenders based on information available on online forums. 相似文献

13.

Planar Feedback Vertex Set and Face Cover: Combinatorial Bounds and Subexponential Algorithms

Athanassios Koutsonas Dimitrios M. Thilikos 《Algorithmica》2011,60(4):987-1003

The Planar Feedback Vertex Set problem asks whether an n-vertex planar graph contains at most k vertices meeting all its cycles. The Face Cover problem asks whether all vertices of a plane graph G lie on the boundary of at most k faces of G. Standard techniques from parameterized algorithm design indicate that both problems can be solved by sub-exponential parameterized algorithms (where k is the parameter). In this paper we improve the algorithmic analysis of both problems by proving a series of combinatorial results relating the branchwidth of planar graphs with their face cover. Combining this fact with duality properties of branchwidth, allows us to derive analogous results on feedback vertex set. As a consequence, it follows that Planar Feedback Vertex Set and Face Cover can be solved in \(O(2^{15.11\cdot\sqrt{k}}+n^{2})\) and \(O(2^{10.1\cdot\sqrt {k}}+n^{2})\) steps, respectively. 相似文献

14.

Time-weighted counting for recently frequent pattern mining in data streams

Yongsub Lim U. Kang 《Knowledge and Information Systems》2017,53(2):391-422

How can we discover interesting patterns from time-evolving high-speed data streams? How to analyze the data streams quickly and accurately, with little space overhead? How to guarantee the found patterns to be self-consistent? High-speed data stream has been receiving increasing attention due to its wide applications such as sensors, network traffic, social networks, etc. The most fundamental task on the data stream is frequent pattern mining; especially, focusing on recentness is important in real applications. In this paper, we develop two algorithms for discovering recently frequent patterns in data streams. First, we propose TwMinSwap to find top-k recently frequent items in data streams, which is a deterministic version of our motivating algorithm TwSample providing theoretical guarantees based on item sampling. TwMinSwap improves TwSample in terms of speed, accuracy, and memory usage. Both require only O(k) memory spaces and do not require any prior knowledge on the stream such as its length and the number of distinct items in the stream. Second, we propose TwMinSwap-Is to find top-k recently frequent itemsets in data streams. We especially focus on keeping self-consistency of the discovered itemsets, which is the most important property for reliable results, while using O(k) memory space with the assumption of a constant itemset size. Through extensive experiments, we demonstrate that TwMinSwap outperforms all competitors in terms of accuracy and memory usage, with fast running time. We also show that TwMinSwap-Is is more accurate than the competitor and discovers recently frequent itemsets with reasonably large sizes (at most 5–7) depending on datasets. Thanks to TwMinSwap and TwMinSwap-Is, we report interesting discoveries in real world data streams, including the difference of trends between the winner and the loser of U.S. presidential candidates, and temporal human contact patterns. 相似文献

15.

Solvability conditions of problem of synthesizing proportional-integral control of a linear MIMO system

N. E. Zubov E. A. Mikrin M. Sh. Misrikhanov V. N. Ryabchenko 《Journal of Computer and Systems Sciences International》2016,55(5):821-831

The solvability conditions and just the solution of the problem of the regular and irregular proportional-integral (PI) control are found in accordance with the properties of invariant zeros of a multi-input multioutput (MIMO) system. It is proved that the problem of synthesizing the control of the MIMO system is solvable if and only if the pair of matrices (A, B) that describes a control plant is controllable and the matrix B_L^⊥AC_R^⊥ (where B_L^⊥ is the left zero divisor of the matrix B and C_R^⊥ is the right zero divisor of the output matrix C) has a complete row rank. 相似文献

16.

Ecosystem on the Web: non-linear mining and forecasting of co-evolving online activities

Yasuko Matsubara Yasushi Sakurai Christos Faloutsos 《World Wide Web》2017,20(3):439-465

Given a large collection of co-evolving online activities, such as searches for the keywords “Xbox”, “PlayStation” and “Wii”, how can we find patterns and rules? Are these keywords related? If so, are they competing against each other? Can we forecast the volume of user activity for the coming month? We conjecture that online activities compete for user attention in the same way that species in an ecosystem compete for food. We present EcoWeb, (i.e., Ecosystem on the Web), which is an intuitive model designed as a non-linear dynamical system for mining large-scale co-evolving online activities. Our second contribution is a novel, parameter-free, and scalable fitting algorithm, EcoWeb-Fit, that estimates the parameters of EcoWeb. Extensive experiments on real data show that EcoWeb is effective, in that it can capture long-range dynamics and meaningful patterns such as seasonalities, and practical, in that it can provide accurate long-range forecasts. EcoWeb consistently outperforms existing methods in terms of both accuracy and execution speed. 相似文献

17.

Towards a Realistic Analysis of the QuickSelect Algorithm

Julien Clément James Allen Fill Thu Hien Nguyen Thi Brigitte Vallée 《Theory of Computing Systems》2016,58(4):528-578

We revisit the analysis of the classical QuickSelect algorithm. Usually, the analysis deals with the mean number of key comparisons, but here we view keys as words produced by a source, and words are compared via their symbols in lexicographic order. Our probabilistic models belong to a broad category of information sources that encompasses memoryless (i.e., independent-symbols) and Markov sources, as well as many unbounded-correlation sources. The “realistic” cost of the algorithm is here the total number of symbol comparisons performed by the algorithm, and, in this context, the average-case analysis aims to provide estimates for the mean number of symbol comparisons. For the QuickSort algorithm, known average-case complexity results are of \({\Theta } (n \log n)\) in the case of key comparisons, and \({\Theta }(n\log ^{2} n)\) for symbol comparisons. For QuickSelect algorithms, and with respect to key comparisons, the average-case complexity is Θ(n). In this present article, we prove that, with respect to symbol comparisons, QuickSelect’s average-case complexity remains Θ(n). In each case, we provide explicit expressions for the dominant constants, closely related to the probabilistic behaviour of the source. 相似文献

18.

On negative results when using sentiment analysis tools for software engineering research

Robbert?Jongeling Proshanta?Sarkar Subhajit?Datta Alexander?Serebrenik Email author View author&#;s OrcID profile 《Empirical Software Engineering》2017,22(5):2543-2584

Recent years have seen an increasing attention to social aspects of software engineering, including studies of emotions and sentiments experienced and expressed by the software developers. Most of these studies reuse existing sentiment analysis tools such as SentiStrength and NLTK. However, these tools have been trained on product reviews and movie reviews and, therefore, their results might not be applicable in the software engineering domain. In this paper we study whether the sentiment analysis tools agree with the sentiment recognized by human evaluators (as reported in an earlier study) as well as with each other. Furthermore, we evaluate the impact of the choice of a sentiment analysis tool on software engineering studies by conducting a simple study of differences in issue resolution times for positive, negative and neutral texts. We repeat the study for seven datasets (issue trackers and Stack Overflow questions) and different sentiment analysis tools and observe that the disagreement between the tools can lead to diverging conclusions. Finally, we perform two replications of previously published studies and observe that the results of those studies cannot be confirmed when a different sentiment analysis tool is used. 相似文献

19.

Quantum algorithms on Walsh transform and Hamming distance for Boolean functions

Zhengwei Xie Daowen Qiu Guangya Cai 《Quantum Information Processing》2018,17(6):139

相似文献

20.

Continuous visible nearest neighbor query processing in spatial databases 总被引：1，自引：0，他引：1

Yunjun Gao Baihua Zheng Gencai Chen Qing Li Xiaofa Guo 《The VLDB Journal The International Journal on Very Large Data Bases》2011,20(3):371-396

In this paper, we identify and solve a new type of spatial queries, called continuous visible nearest neighbor (CVNN) search. Given a data set P, an obstacle set O, and a query line segment q in a two-dimensional space, a CVNN query returns a set of \({\langle p, R\rangle}\) tuples such that \({p \in P}\) is the nearest neighbor to every point r along the interval \({R \subseteq q}\) as well as p is visible to r. Note that p may be NULL, meaning that all points in P are invisible to all points in R due to the obstruction of some obstacles in O. In contrast to existing continuous nearest neighbor query, CVNN retrieval considers the impact of obstacles on visibility between objects, which is ignored by most of spatial queries. We formulate the problem, analyze its unique characteristics, and develop efficient algorithms for exact CVNN query processing. Our methods (1) utilize conventional data-partitioning indices (e.g., R-trees) on both P and O, (2) tackle the CVNN search by performing a single query for the entire query line segment, and (3) only access the data points and obstacles relevant to the final query result by employing a suite of effective pruning heuristics. In addition, several interesting variations of CVNN queries have been introduced, and they can be supported by our techniques, which further demonstrates the flexibility of the proposed algorithms. A comprehensive experimental evaluation using both real and synthetic data sets has been conducted to verify the effectiveness of our proposed pruning heuristics and the performance of our proposed algorithms. 相似文献