首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Given a simple undirected graph G = (V, E) and an integer k < |V|, the Sparsest k-Subgraph problem asks for a set of k vertices which induces the minimum number of edges. As a generalization of the classical independent set problem, Sparsest k-Subgraph is ????-hard and even not approximable unless ?????? in general graphs. Thus, we investigate Sparsest k-Subgraph in graph classes where independent set is polynomial-time solvable, such as subclasses of perfect graphs. Our two main results are the ????-hardness of Sparsest k-Subgraph on chordal graphs, and a greedy 2-approximation algorithm. Finally, we also show how to derive a P T A S for Sparsest k-Subgraph on proper interval graphs.  相似文献   

2.
How do the k-core structures of real-world graphs look like? What are the common patterns and the anomalies? How can we exploit them for applications? A k-core is the maximal subgraph in which all vertices have degree at least k. This concept has been applied to such diverse areas as hierarchical structure analysis, graph visualization, and graph clustering. Here, we explore pervasive patterns related to k-cores and emerging in graphs from diverse domains. Our discoveries are: (1) Mirror Pattern: coreness (i.e., maximum k such that each vertex belongs to the k-core) is strongly correlated with degree. (2) Core-Triangle Pattern: degeneracy (i.e., maximum k such that the k-core exists) obeys a 3-to-1 power-law with respect to the count of triangles. (3) Structured Core Pattern: degeneracy–cores are not cliques but have non-trivial structures such as core–periphery and communities. Our algorithmic contributions show the usefulness of these patterns. (1) Core-A, which measures the deviation from Mirror Pattern, successfully spots anomalies in real-world graphs, (2) Core-D, a single-pass streaming algorithm based on Core-Triangle Pattern, accurately estimates degeneracy up to 12 \(\times \) faster than its competitor. (3) Core-S, inspired by Structured Core Pattern, identifies influential spreaders up to 17 \(\times \) faster than its competitors with comparable accuracy.  相似文献   

3.
In this paper we consider both the maximization variant Max Rep and the minimization variant Min Rep of the famous Label Cover problem. So far the best approximation ratios known for these two problems were \(O(\sqrt{n})\) and indeed some authors suggested the possibility that this ratio is the best approximation factor for these two problems. We show, in fact, that there are a O(n 1/3)-approximation algorithm for Max Rep and a O(n 1/3log?2/3 n)-approximation algorithm for Min Rep. In addition, we also exhibit a randomized reduction from Densest k-Subgraph to Max Rep, showing that any approximation factor for Max Rep implies the same factor (up to a constant) for Densest k-Subgraph.  相似文献   

4.
Maximal clique enumeration is a fundamental problem in graph theory and has been extensively studied. However, maximal clique enumeration is time-consuming in large graphs and always returns enormous cliques with large overlaps. Motivated by this, in this paper, we study the diversified top-k clique search problem which is to find top-k cliques that can cover most number of nodes in the graph. Diversified top-k clique search can be widely used in a lot of applications including community search, motif discovery, and anomaly detection in large graphs. A naive solution for diversified top-k clique search is to keep all maximal cliques in memory and then find k of them that cover most nodes in the graph by using the approximate greedy max k-cover algorithm. However, such a solution is impractical when the graph is large. In this paper, instead of keeping all maximal cliques in memory, we devise an algorithm to maintain k candidates in the process of maximal clique enumeration. Our algorithm has limited memory footprint and can achieve a guaranteed approximation ratio. We also introduce a novel light-weight \(\mathsf {PNP}\)-\(\mathsf {Index}\), based on which we design an optimal maximal clique maintenance algorithm. We further explore three optimization strategies to avoid enumerating all maximal cliques and thus largely reduce the computational cost. Besides, for the massive input graph, we develop an I/O efficient algorithm to tackle the problem when the input graph cannot fit in main memory. We conduct extensive performance studies on real graphs and synthetic graphs. One of the real graphs contains 1.02 billion edges. The results demonstrate the high efficiency and effectiveness of our approach.  相似文献   

5.
The Planar Feedback Vertex Set problem asks whether an n-vertex planar graph contains at most k vertices meeting all its cycles. The Face Cover problem asks whether all vertices of a plane graph G lie on the boundary of at most k faces of G. Standard techniques from parameterized algorithm design indicate that both problems can be solved by sub-exponential parameterized algorithms (where k is the parameter). In this paper we improve the algorithmic analysis of both problems by proving a series of combinatorial results relating the branchwidth of planar graphs with their face cover. Combining this fact with duality properties of branchwidth, allows us to derive analogous results on feedback vertex set. As a consequence, it follows that Planar Feedback Vertex Set and Face Cover can be solved in \(O(2^{15.11\cdot\sqrt{k}}+n^{2})\) and \(O(2^{10.1\cdot\sqrt {k}}+n^{2})\) steps, respectively.  相似文献   

6.
A degree-constrained graph orientation of an undirected graph G is an assignment of a direction to each edge in G such that the outdegree of every vertex in the resulting directed graph satisfies a specified lower and/or upper bound. Such graph orientations have been studied for a long time and various characterizations of their existence are known. In this paper, we consider four related optimization problems introduced in reference (Asahiro et al. LNCS 7422, 332–343 (2012)): For any fixed non-negative integer W, the problems MAX W-LIGHT, MIN W-LIGHT, MAX W-HEAVY, and MIN W-HEAVY take as input an undirected graph G and ask for an orientation of G that maximizes or minimizes the number of vertices with outdegree at most W or at least W. As shown in Asahiro et al. LNCS 7422, 332–343 (2012)).  相似文献   

7.
The paper studies three fundamental problems in graph analytics, computing connected components (CCs), biconnected components (BCCs), and 2-edge-connected components (ECCs) of a graph. With the recent advent of big data, developing efficient distributed algorithms for computing CCs, BCCs and ECCs of a big graph has received increasing interests. As with the existing research efforts, we focus on the Pregel programming model, while the techniques may be extended to other programming models including MapReduce and Spark. The state-of-the-art techniques for computing CCs and BCCs in Pregel incur \(O(m\times \#\text {supersteps})\) total costs for both data communication and computation, where m is the number of edges in a graph and #supersteps is the number of supersteps. Since the network communication speed is usually much slower than the computation speed, communication costs are the dominant costs of the total running time in the existing techniques. In this paper, we propose a new paradigm based on graph decomposition to compute CCs and BCCs with O(m) total communication cost. The total computation costs of our techniques are also smaller than that of the existing techniques in practice, though theoretically almost the same. Moreover, we also study distributed computing ECCs. We are the first to study this problem and an approach with O(m) total communication cost is proposed. Comprehensive empirical studies demonstrate that our approaches can outperform the existing techniques by one order of magnitude regarding the total running time.  相似文献   

8.
In this paper, we define a new class of queries, the top-k multiple-type integrated query (simply, top-k MULTI query). It deals with multiple data types and finds the information in the order of relevance between the query and the object. Various data types such as spatial, textual, and relational data types can be used for the top-k MULTI query. The top-k MULTI query distinguishes itself from the traditional top-k query in that the component scores to calculate final scores are determined dependent of the query. Hence, each component score is calculated only when the query is given for each data type rather than being calculated apriori as in the top-k query. As a representative instance, the traditional top-k spatial keyword query is an instance of the top-k MULTI query. It deals with the spatial data type and text data type and finds the information based on spatial proximity and textual relevance between the query and the object, which is determined only when the query is given. In this paper, we first define the top-k MULTI query formally and define a new specific instance for the top-k MULTI query, the top-k spatial-keyword-relational(SKR) query, by integrating the relational data type into the traditional top-k spatial keyword query. Then, we investigate the processing approaches for the top-k MULTI query. We discuss the scalability of those approaches as new data types are integrated. We also devise the processing methods for the top-k SKR query. Finally, through extensive experiments on the top-k SKR query using real and synthetic data sets, we compare efficiency of the methods in terms of the query performance and storage.  相似文献   

9.
In the framework of parameterized complexity, exploring how one parameter affects the complexity of a different parameterized (or unparameterized problem) is of general interest. A well-developed example is the investigation of how the parameter treewidth influences the complexity of (other) graph problems. The reason why such investigations are of general interest is that real-world input distributions for computational problems often inherit structure from the natural computational processes that produce the problem instances (not necessarily in obvious, or well-understood ways). The max leaf number ml(G) of a connected graph G is the maximum number of leaves in a spanning tree for G. Exploring questions analogous to the well-studied case of treewidth, we can ask: how hard is it to solve 3-Coloring, Hamilton Path, Minimum Dominating Set, Minimum Bandwidth or many other problems, for graphs of bounded max leaf number? What optimization problems are W[1]-hard under this parameterization? We do two things:
  1. (1)
    We describe much improved FPT algorithms for a large number of graph problems, for input graphs G for which ml(G)≤k, based on the polynomial-time extremal structure theory canonically associated to this parameter. We consider improved algorithms both from the point of view of kernelization bounds, and in terms of improved fixed-parameter tractable (FPT) runtimes O *(f(k)).
     
  2. (2)
    The way that we obtain these concrete algorithmic results is general and systematic. We describe the approach, and raise programmatic questions.
     
  相似文献   

10.
Why-not and why questions can be posed by database users to seek clarifications on unexpected query results. Specifically, why-not questions aim to explain why certain expected tuples are absent from the query results, while why questions try to clarify why certain unexpected tuples are present in the query results. This paper systematically explores the why-not and why questions on reverse top-k queries, owing to its importance in multi-criteria decision making. We first formalize why-not questions on reverse top-k queries, which try to include the missing objects in the reverse top-k query results, and then, we propose a unified framework called WQRTQ to answer why-not questions on reverse top-k queries. Our framework offers three solutions to cater for different application scenarios. Furthermore, we study why questions on reverse top-k queries, which aim to exclude the undesirable objects from the reverse top-k query results, and extend the framework WQRTQ to efficiently answer why questions on reverse top-k queries, which demonstrates the flexibility of our proposed algorithms. Extensive experimental evaluation with both real and synthetic data sets verifies the effectiveness and efficiency of the presented algorithms under various experimental settings.  相似文献   

11.
The key issue in top-k retrieval, finding a set of k documents (from a large document collection) that can best answer a user’s query, is to strike the optimal balance between relevance and diversity. In this paper, we study the top-k retrieval problem in the framework of facility location analysis and prove the submodularity of that objective function which provides a theoretical approximation guarantee of factor 1?\(\frac{1}{e}\) for the (best-first) greedy search algorithm. Furthermore, we propose a two-stage hybrid search strategy which first obtains a high-quality initial set of top-k documents via greedy search, and then refines that result set iteratively via local search. Experiments on two large TREC benchmark datasets show that our two-stage hybrid search strategy approach can supersede the existing ones effectively and efficiently.  相似文献   

12.
The discovery of informative itemsets is a fundamental building block in data analytics and information retrieval. While the problem has been widely studied, only few solutions scale. This is particularly the case when (1) the data set is massive, calling for large-scale distribution, and/or (2) the length k of the informative itemset to be discovered is high. In this paper, we address the problem of parallel mining of maximally informative k-itemsets (miki) based on joint entropy. We propose PHIKS (Parallel Highly Informative \(\underline{K}\)-ItemSet), a highly scalable, parallel miki mining algorithm. PHIKS renders the mining process of large-scale databases (up to terabytes of data) succinct and effective. Its mining process is made up of only two efficient parallel jobs. With PHIKS, we provide a set of significant optimizations for calculating the joint entropies of miki having different sizes, which drastically reduces the execution time, the communication cost and the energy consumption, in a distributed computational platform. PHIKS has been extensively evaluated using massive real-world data sets. Our experimental results confirm the effectiveness of our proposal by the significant scale-up obtained with high itemsets length and over very large databases.  相似文献   

13.
Given a road network G = (V,E), where V (E) denotes the set of vertices(edges) in G, a set of points of interest P and a query point q residing in G, the reverse furthest neighbors (Rfn R ) query in road networks fetches a set of points pP that take q as their furthest neighbor compared with all points in P ∪ {q}. This is the monochromatic Rfn R (Mrfn R ) query. Another interesting version of Rfn R query is the bichromatic reverse furthest neighbor (Brfn R ) query. Given two sets of points P and Q, and a query point qQ, a Brfn R query fetches a set of points pP that take q as their furthest neighbor compared with all points in Q. This paper presents efficient algorithms for both Mrfn R and Brfn R queries, which utilize landmarks and partitioning-based techniques. Experiments on real datasets confirm the efficiency and scalability of proposed algorithms.  相似文献   

14.
We present two parameterized algorithms for the Minimum Fill-in problem, also known as Chordal Completion: given an arbitrary graph G and integer k, can we add at most k edges to G to obtain a chordal graph? Our first algorithm has running time \(\mathcal {O}(k^{2}nm+3.0793^{k})\), and requires polynomial space. This improves the base of the exponential part of the best known parameterized algorithm time for this problem so far. We are able to improve this running time even further, at the cost of more space. Our second algorithm has running time \(\mathcal {O}(k^{2}nm+2.35965^{k})\) and requires \(\mathcal {O}^{\ast}(1.7549^{k})\) space. To achieve these results, we present a new lemma describing the edges that can safely be added to achieve a chordal completion with the minimum number of edges, regardless of k.  相似文献   

15.
Given a large collection of co-evolving online activities, such as searches for the keywords “Xbox”, “PlayStation” and “Wii”, how can we find patterns and rules? Are these keywords related? If so, are they competing against each other? Can we forecast the volume of user activity for the coming month? We conjecture that online activities compete for user attention in the same way that species in an ecosystem compete for food. We present EcoWeb, (i.e., Ecosystem on the Web), which is an intuitive model designed as a non-linear dynamical system for mining large-scale co-evolving online activities. Our second contribution is a novel, parameter-free, and scalable fitting algorithm, EcoWeb-Fit, that estimates the parameters of EcoWeb. Extensive experiments on real data show that EcoWeb is effective, in that it can capture long-range dynamics and meaningful patterns such as seasonalities, and practical, in that it can provide accurate long-range forecasts. EcoWeb consistently outperforms existing methods in terms of both accuracy and execution speed.  相似文献   

16.
Disjunctive Temporal Problems (DTPs) with Preferences (DTPPs) extend DTPs with piece-wise constant preference functions associated to each constraint of the form lx ? yu, where x,y are (real or integer) variables, and l,u are numeric constants. The goal is to find an assignment to the variables of the problem that maximizes the sum of the preference values of satisfied DTP constraints, where such values are obtained by aggregating the preference functions of the satisfied constraints in it under a “max” semantic. The state-of-the-art approach in the field, implemented in the native DTPP solver Maxilitis, extends the approach of the native DTP solver Epilitis. In this paper we present alternative approaches that translate DTPPs to Maximum Satisfiability of a set of Boolean combination of constraints of the form l?x ? y?u, ? ∈{<,≤}, that extend previous work dealing with constant preference functions only. We prove correctness and completeness of the approaches. Results obtained with the Satisfiability Modulo Theories (SMT) solvers Yices and MathSAT on randomly generated DTPPs and DTPPs built from real-world benchmarks, show that one of our translation is competitive to, and can be faster than, Maxilitis (This is an extended and revised version of Bourguet et al. 2013).  相似文献   

17.
Sequential pattern mining (SPM) is an important data mining problem with broad applications. SPM is a hard problem due to the huge number of intermediate subsequences to be considered. State of the art approaches for SPM (e.g., PrefixSpan Pei et al. 2001) are largely based on the pattern-growth approach, where for each frequent prefix subsequence, only its related suffix subsequences need to be considered, and the database is recursively projected into smaller ones. Many authors have promoted the use of constraints to focus on the most promising patterns according to the interests of the end user. The top-k SPM problem is also used to cope with the difficulty of thresholding and to control the number of solutions. State of the art methods developed for SPM and top-k SPM, though efficient, are locked into a rather rigid search strategy, and suffer from the lack of declarativity and flexibility. Indeed, adding new constraints usually amounts to changing the data-structures used in the core of the algorithm, and combining these new constraints often require new developments. Recent works (e.g. Kemmar et al. 2014; Négrevergne and Guns 2015) have investigated the use of Constraint Programming (CP) for SPM. However, despite their nice declarative aspects, all these modelings have scaling problems, due to the huge size of their constraint networks. To address this issue, we propose the Prefix-Projection global constraint, which encapsulates both the subsequence relation as well as the frequency constraint. Its filtering algorithm relies on the principle of projected databases which allows to keep in the variables domain, only values leading to a frequent pattern in the database. Prefix-Projection filtering algorithm enforces domain consistency on the variable succeeding the current frequent prefix in polynomial time. This global constraint also allows for a straightforward implementation of additional constraints such as size, item membership, regular expressions and any combination of them. Experimental results show that our approach clearly outperforms existing CP approaches and competes well with the state-of-the-art methods on large datasets for mining frequent sequential patterns, sequential patterns under various constraints, and top-k sequential patterns. Unlike existing CP methods, our approach achieves a better scalability.  相似文献   

18.
19.
Aggregate similarity search, also known as aggregate nearest-neighbor (Ann) query, finds many useful applications in spatial and multimedia databases. Given a group Q of M query objects, it retrieves from a database the objects most similar to Q, where the similarity is an aggregation (e.g., \({{\mathrm{sum}}}\), \(\max \)) of the distances between each retrieved object p and all the objects in Q. In this paper, we propose an added flexibility to the query definition, where the similarity is an aggregation over the distances between p and any subset of \(\phi M\) objects in Q for some support \(0< \phi \le 1\). We call this new definition flexible aggregate similarity search and accordingly refer to a query as a flexible aggregate nearest-neighbor ( Fann ) query. We present algorithms for answering Fann queries exactly and approximately. Our approximation algorithms are especially appealing, which are simple, highly efficient, and work well in both low and high dimensions. They also return near-optimal answers with guaranteed constant-factor approximations in any dimensions. Extensive experiments on large real and synthetic datasets from 2 to 74 dimensions have demonstrated their superior efficiency and high quality.  相似文献   

20.
The Surjective Homomorphism problem is to test whether a given graph G called the guest graph allows a vertex-surjective homomorphism to some other given graph H called the host graph. The bijective and injective homomorphism problems can be formulated in terms of spanning subgraphs and subgraphs, and as such their computational complexity has been extensively studied. What about the surjective variant? Because this problem is NP-complete in general, we restrict the guest and the host graph to belong to graph classes \({{\mathcal G}}\) and \({{\mathcal H}}\), respectively. We determine to what extent a certain choice of \({{\mathcal G}}\) and \({{\mathcal H}}\) influences its computational complexity. We observe that the problem is polynomial-time solvable if \({{\mathcal H}}\) is the class of paths, whereas it is NP-complete if \({{\mathcal G}}\) is the class of paths. Moreover, we show that the problem is even NP-complete on many other elementary graph classes, namely linear forests, unions of complete graphs, cographs, proper interval graphs, split graphs and trees of pathwidth at most 2. In contrast, we prove that the problem is fixed-parameter tractable in k if \({{\mathcal G}}\) is the class of trees and \({{\mathcal H}}\) is the class of trees with at most k leaves, or if \({{\mathcal G}}\) and \({{\mathcal H}}\) are equal to the class of graphs with vertex cover number at most k.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号