Similar Documents
20 similar documents retrieved.
1.
Graphs appear in numerous applications including cyber security, the Internet, social networks, protein networks, recommendation systems, citation networks, and many more. Graphs with millions or even billions of nodes and edges are commonplace. How to store such large graphs efficiently? What are the core operations/queries on those graphs? How to answer the graph queries quickly? We propose Gbase, an efficient analysis platform for large graphs. The key novelties lie in (1) our storage and compression scheme for a parallel, distributed setting and (2) the carefully chosen graph operations and their efficient implementations. We designed and implemented an instance of Gbase using MapReduce/Hadoop. Gbase provides a parallel indexing mechanism for graph operations that both saves storage space and accelerates query responses. We run numerous experiments on real and synthetic graphs, spanning billions of nodes and edges, and we show that our proposed Gbase is indeed fast, scalable, and nimble, with significant savings in space and time.

2.
In a previous paper, we laid out the vision of a novel graph query processing paradigm: instead of processing a visual query graph after its construction, it interleaves visual query formulation and processing by exploiting the latency offered by the GUI to filter irrelevant matches and prefetch partial query results [8]. Our recent attempts at implementing this vision [8, 9] show significant improvement in system response time (SRT) for subgraph queries. However, these efforts are designed specifically for graph databases containing a large collection of small or medium-sized graphs. In this paper, we propose a novel algorithm called QUBLE (QUery Blender for Large nEtworks) to realize this visual subgraph querying paradigm on very large networks (e.g., protein interaction networks, social networks). First, it decomposes a large network into a set of graphlets and supergraphlets using a minimum cut-based graph partitioning technique. Next, it mines approximate frequent and small infrequent fragments (SIFs) from them and identifies their occurrences in these graphlets and supergraphlets. Then, the indexing framework of [9] is enhanced so that the mined fragments can be exploited to index graphlets for efficient blending of visual subgraph query formulation and query processing. Extensive experiments on large networks demonstrate the effectiveness of QUBLE.

3.
Despite a large body of work on XPath query processing in relational environments, the systematic study of queries containing not-predicates has received little attention in the literature. In particular, the XML support in several industrial-strength commercial RDBMSs fails to evaluate such queries efficiently. In this paper, we present an efficient and novel strategy to evaluate not-twig queries in a tree-unaware relational environment. Not-twig queries are XPath queries with ancestor–descendant and parent–child axes that contain one or more not-predicates. We propose a novel Dewey-based encoding scheme called ANDES (ANcestor Dewey-based Encoding Scheme), which enables us to efficiently filter out elements satisfying a not-predicate by comparing their ancestor group identifiers. In this approach, a set of elements under the same common ancestor at a specific level in the XML tree is assigned the same ancestor group identifier. Based on this scheme, we propose a novel SQL translation algorithm for not-twig query evaluation. Our experiments confirm that the proposed approach, built on top of an off-the-shelf commercial RDBMS, significantly outperforms state-of-the-art relational and native approaches. We also explore the query plans selected by a commercial relational optimizer to evaluate our translated queries under different input cardinalities. This exploration further validates the performance benefits of ANDES.
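To make the ancestor-group idea concrete, here is a minimal Python sketch, under our own assumptions, of filtering a not-predicate by comparing ancestor group identifiers: Dewey labels are taken to be dot-separated strings, and the prefix up to a given level serves as the group identifier. Function names are illustrative, not from the paper.

```python
# Minimal sketch of ancestor-group filtering for a not-predicate, based on the
# idea described in the abstract (not the paper's actual ANDES implementation).
# A Dewey label is a dot-separated path such as "1.3.2"; the ancestor group
# identifier at level L is assumed here to be the label prefix of length L.

def ancestor_group(dewey_label: str, level: int) -> str:
    """Group identifier: the Dewey prefix up to the given level (assumption)."""
    return ".".join(dewey_label.split(".")[:level])

def filter_not_predicate(context_labels, predicate_labels, level):
    """Keep context elements whose ancestor group contains NO element
    matching the negated predicate."""
    groups_with_match = {ancestor_group(p, level) for p in predicate_labels}
    return [c for c in context_labels
            if ancestor_group(c, level) not in groups_with_match]

# Example: //a[not(b)]-style filtering at level 2.
a_labels = ["1.1.1", "1.2.1", "1.3.4"]
b_labels = ["1.2.1.5"]          # a 'b' element under ancestor group "1.2"
print(filter_not_predicate(a_labels, b_labels, level=2))  # ['1.1.1', '1.3.4']
```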

4.
Given an undirected, edge-weighted graph G together with a set of ordered vertex-pairs, called st-pairs, we consider two problems of finding an orientation of all edges in G: min-sum orientation minimizes the sum of the shortest directed distances between all st-pairs, and min-max orientation minimizes the maximum shortest directed distance among all st-pairs. Note that the shortest directed paths for the st-pairs are not necessarily edge-disjoint. In this paper, we first show that both problems are strongly NP-hard for planar graphs even if all edge-weights are identical, and that both problems can be solved in polynomial time for cycles. We then consider the problems restricted to cacti, a graph class that contains trees and cycles but is a subclass of planar graphs. On cacti, min-sum orientation is solvable in polynomial time, whereas min-max orientation remains NP-hard even for two st-pairs. However, based on an LP relaxation, we present a polynomial-time 2-approximation algorithm for min-max orientation. Finally, we give a fully polynomial-time approximation scheme (FPTAS) for min-max orientation on cacti when the number of st-pairs is a fixed constant.
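For reference, the two objectives can be written out as follows; the notation, including the shortest-directed-distance symbol, is ours rather than the paper's.

```latex
% Given an orientation \vec{G} of G and st-pairs (s_1,t_1), ..., (s_p,t_p):
\[
\text{min-sum:}\quad \min_{\vec{G}} \sum_{i=1}^{p} d_{\vec{G}}(s_i, t_i)
\qquad\qquad
\text{min-max:}\quad \min_{\vec{G}} \max_{1 \le i \le p} d_{\vec{G}}(s_i, t_i)
\]
% where d_{\vec{G}}(s,t) denotes the shortest directed distance from s to t in \vec{G}.
```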

5.
We study the problem of answering k-hop reachability queries in a directed graph, i.e., whether there exists a directed path of length k from a source query vertex to a target query vertex in the input graph. The k-hop reachability problem generalizes classic reachability (where k = ∞). Existing indexes for processing classic reachability queries, as well as for processing shortest path distance queries, are either not applicable or not efficient for processing k-hop reachability queries. We propose an efficient index for processing k-hop reachability queries. Our experimental results on a wide range of real datasets show that our method is efficient and scalable in terms of both index construction and query processing.
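As a point of reference for what such an index competes against, the following is a naive baseline, not the paper's method: a breadth-first search truncated at depth k. We read "a path of length k" as "within k hops", which is our assumption.

```python
from collections import deque

# Naive baseline for k-hop reachability (not the paper's index): breadth-first
# search truncated at depth k.
def k_hop_reachable(adj, source, target, k):
    """adj: dict mapping a vertex to its list of out-neighbors."""
    if source == target:
        return True
    visited = {source}
    frontier = deque([(source, 0)])
    while frontier:
        v, depth = frontier.popleft()
        if depth == k:
            continue                      # cannot take another hop
        for w in adj.get(v, []):
            if w == target:
                return True
            if w not in visited:
                visited.add(w)
                frontier.append((w, depth + 1))
    return False

# Example: path a -> b -> c, so c is reachable from a within 2 hops but not 1.
adj = {"a": ["b"], "b": ["c"]}
print(k_hop_reachable(adj, "a", "c", 2))  # True
print(k_hop_reachable(adj, "a", "c", 1))  # False
```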

6.
We investigate a class of scheduling problems that arise in the optimization of SQL queries for parallel machines (Chekuri et al., PODS'95, pp. 255-265, 1995). In these problems, an undirected graph is used to represent communication and inter-operator parallelism. The goal is to minimize the global response time of the system. We provide a polynomial-time approximation scheme for the special case where the operator graph is a tree, thereby improving on a polynomial-time 2.87-approximation algorithm by Chekuri et al. The approximation scheme is generalized to the case where the operator graph has treewidth bounded by a constant. We analyze instances with small response times for general operator graphs: deciding whether a response time of four time units can be reached is easy, but deciding whether a response time of six time units can be reached is NP-hard. Finally, we prove that for general operator graphs the existence of a polynomial-time approximation algorithm with worst-case performance guarantee better than 4/3 would imply P=NP.

7.
Providing built-in keyword search capabilities in RDBMS
A common approach to performing keyword search over relational databases is to find the minimum Steiner trees in database graphs transformed from relational data. These methods, however, are rather expensive as the minimum Steiner tree problem is known to be NP-hard. Further, these methods are independent of the underlying relational database management system (RDBMS) and thus cannot benefit from its capabilities. As an alternative, in this paper we propose a new concept called Compact Steiner Tree (CSTree), which can be used to approximate the Steiner tree problem for answering top-k keyword queries efficiently. We propose a novel structure-aware index, together with an effective ranking mechanism, for fast, progressive, and accurate retrieval of the top-k highest-ranked CSTrees. The proposed techniques can be implemented on a standard RDBMS to benefit from its indexing and query-processing capabilities. We have implemented our techniques in MySQL, which can thus provide built-in keyword-search capabilities using SQL. The experimental results show a significant improvement in both search efficiency and result quality compared to existing state-of-the-art approaches.

8.
Search engine users often encounter the difficulty of phrasing the precise query that could lead to satisfactory search results. Query recommendation is considered an effective assistant in enhancing keyword-based queries in search engines and Web search software. In this paper, we present a Query-URL Bipartite based query reCommendation approach, called QUBiC. It utilizes the connectivity of a query-URL bipartite graph to recommend related queries and can significantly improve the accuracy and effectiveness of personalized query recommendation systems compared with the conventional pairwise-similarity-based approach. The main contribution of the QUBiC approach is its three-phase framework for personalized query recommendations. The first phase is the preparation of queries and their search results returned by a search engine, which generates a historical query-URL bipartite collection. The second phase is the discovery of similar queries by extracting a query affinity graph from the bipartite graph, instead of operating on the original bipartite graph directly using a biclique-based approach or graph clustering. The query affinity graph consists of only queries as its vertices, and its edges are weighted according to a query-URL vector based similarity (dissimilarity) measure. The third phase is the ranking of similar queries. We devise a novel ranking mechanism for ordering the related queries based on the merging distances of a hierarchical agglomerative clustering (HAC). By utilizing the query affinity graph and the HAC-based ranking, we are able to capture the propagation of similarity from query to query by inducing an implicit topical relatedness between queries. Furthermore, the flexibility of the HAC strategy makes it possible for users to participate interactively in the query recommendation process, and helps to bridge the gap between the determinacy of actual similarity values and the indeterminacy of users' information needs, allowing the lists of related queries to change from user to user and query to query, thus adaptively recommending related queries on demand. Our experimental evaluation results show that the QUBiC approach is highly efficient and more effective than conventional query recommendation systems, yielding an improvement of up to about 13.3% in precision.
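A minimal sketch of the second phase as described above: build an affinity graph whose vertices are queries and whose edge weights come from a query-URL vector similarity. Cosine similarity and the pruning threshold are our assumptions, not necessarily the measure used in the paper.

```python
from itertools import combinations
import math

# Sketch of the affinity-graph phase described in the abstract: vertices are
# queries, edge weights come from a query-URL vector similarity.
def cosine(u, v):
    common = set(u) & set(v)
    num = sum(u[x] * v[x] for x in common)
    den = math.sqrt(sum(w * w for w in u.values())) * \
          math.sqrt(sum(w * w for w in v.values()))
    return num / den if den else 0.0

def affinity_graph(query_url_weights, threshold=0.1):
    """query_url_weights: {query: {url: weight}} from the query-URL bipartite log."""
    edges = {}
    for q1, q2 in combinations(query_url_weights, 2):
        sim = cosine(query_url_weights[q1], query_url_weights[q2])
        if sim >= threshold:
            edges[(q1, q2)] = sim
    return edges

log = {"cheap flights": {"kayak.com": 3, "expedia.com": 2},
       "flight deals":  {"kayak.com": 2, "skyscanner.com": 1},
       "python list":   {"docs.python.org": 5}}
print(affinity_graph(log))   # only the two flight-related queries share an edge
```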

9.
We study the complexity of some algorithmic problems on directed hypergraphs and their strongly connected components (SCCs). The main contribution is an almost linear time algorithm computing the terminal strongly connected components (i.e., SCCs that do not reach any component but themselves). Almost linear here means that the complexity of the algorithm is linear in the size of the hypergraph up to a factor α(n), where α is the inverse of the Ackermann function and n is the number of vertices. Our motivation to study this problem arises from a recent application of directed hypergraphs to computational tropical geometry. We also discuss the problem of computing all SCCs. We establish a superlinear lower bound on the size of the transitive reduction of the reachability relation in directed hypergraphs, showing that it is combinatorially more complex than in directed graphs. Besides, we prove a linear time reduction from the well-studied problem of finding all minimal sets among a given family to the problem of computing the SCCs. Only subquadratic time algorithms are known for the former problem. These results strongly suggest that the problem of computing the SCCs is harder in directed hypergraphs than in directed graphs.

10.
The NP-complete Power Dominating Set problem is an "electric power networks variant" of the classical domination problem in graphs: given an undirected graph G=(V,E), find a minimum-size set P ⊆ V such that all vertices in V are "observed" by the vertices in P. Herein, a vertex observes itself and all its neighbors, and if an observed vertex has all but one of its neighbors observed, then the remaining neighbor becomes observed as well. We show that Power Dominating Set can be solved by "bounded-treewidth dynamic programs." For graphs whose treewidth is bounded by a constant, we achieve a linear-time algorithm. In particular, we present a simplified linear-time algorithm for Power Dominating Set in trees. Moreover, we simplify and extend several NP-completeness results, particularly showing that Power Dominating Set remains NP-complete for planar graphs, for circle graphs, and for split graphs. Specifically, our improved reductions imply that Power Dominating Set parameterized by |P| is W[2]-hard and that it cannot be approximated better than Dominating Set.
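The observation rules quoted above are simple enough to state as code. The following sketch, in our own words, checks whether a candidate set P power-dominates a graph by iterating the domination and propagation rules to a fixpoint; finding a minimum such P remains the NP-complete part.

```python
# Sketch of the observation rules from the abstract: P observes itself and its
# neighbors, and whenever an observed vertex has all but one neighbor observed,
# that last neighbor becomes observed too. This only verifies a given set P.
def is_power_dominating(adj, P):
    """adj: dict vertex -> set of neighbors (undirected graph)."""
    observed = set(P)
    for v in P:                                   # domination step
        observed |= adj[v]
    changed = True
    while changed:                                # propagation step
        changed = False
        for v in list(observed):
            unobserved = adj[v] - observed
            if len(unobserved) == 1:
                observed.add(unobserved.pop())
                changed = True
    return observed == set(adj)

# Example: a path a-b-c-d is power-dominated by {a} alone.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(is_power_dominating(path, {"a"}))  # True
```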

11.
A circle graph is the intersection graph of a set of chords in a circle. Keil [Discrete Appl. Math., 42(1):51–63, 1993] proved that Dominating Set, Connected Dominating Set, and Total Dominating Set are NP-complete in circle graphs. To the best of our knowledge, nothing was known about the parameterized complexity of these problems in circle graphs. In this paper we prove the following results, which contribute in this direction:
  • Dominating Set, Independent Dominating Set, Connected Dominating Set, Total Dominating Set, and Acyclic Dominating Set are W[1]-hard in circle graphs, parameterized by the size of the solution.
  • Whereas both Connected Dominating Set and Acyclic Dominating Set are W[1]-hard in circle graphs, it turns out that Connected Acyclic Dominating Set is polynomial-time solvable in circle graphs.
  • If T is a given tree, deciding whether a circle graph G has a dominating set inducing a graph isomorphic to T is NP-complete when T is part of the input, and FPT when parameterized by t=|V(T)|. We prove that the FPT algorithm runs in subexponential time, namely $2^{\mathcal{O}(t \cdot \frac{\log\log t}{\log t})} \cdot n^{\mathcal{O}(1)}$, where n=|V(G)|.

12.
The UML Profile for Modeling and Analysis of Real-Time and Embedded (RTE) systems, MARTE, has recently been adopted by the OMG. Its Time Model extends the informal and simplistic Simple Time package proposed by the Unified Modeling Language (UML2) and offers a broad range of capabilities required to model RTE systems, including discrete/dense and chronometric/logical time. The MARTE specification introduces a Time Structure inspired by several time models from concurrency theory and proposes a new clock constraint specification language (CCSL) to specify, within the context of UML, logical and chronometric time constraints. A semantic model in CCSL is attached to a (UML) model to give it a timed causality semantics. In that sense, CCSL is comparable to the Ptolemy environment, in which directors give semantics to models according to predefined models of computation and communication. This paper focuses on one historical model of computation of Ptolemy [Synchronous Data Flow (SDF)] and shows how to build SDF graphs by combining UML models and CCSL.

13.
Inclusion/exclusion and measure and conquer are two central techniques from the field of exact exponential-time algorithms that recently received a lot of attention. In this paper, we show that both techniques can be used in a single algorithm. This is done by looking at the principle of inclusion/exclusion as a branching rule. This inclusion/exclusion-based branching rule can be combined in a branch-and-reduce algorithm with traditional branching rules and reduction rules. The resulting algorithms can be analysed using measure and conquer, allowing us to obtain good upper bounds on their running times. In this way, we obtain the currently fastest exact exponential-time algorithms for a number of domination problems in graphs. Among these are faster polynomial-space and exponential-space algorithms for #Dominating Set and Minimum Weight Dominating Set (for the case where the set of possible weight sums is polynomially bounded), and a faster polynomial-space algorithm for Domatic Number. This approach is also extended in this paper to the setting where not all requirements in a problem need to be satisfied. This results in faster polynomial-space and exponential-space algorithms for Partial Dominating Set, and faster polynomial-space and exponential-space algorithms for the well-studied parameterised problem k-Set Splitting and its generalisation k-Not-All-Equal Satisfiability.
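The key idea, treating inclusion/exclusion as a branching rule, can be restated in our own notation for a counting problem: a "required" constraint on an element e branches into an "optional" and a "forbidden" subproblem. This restatement is ours; the abstract itself does not spell out the rule.

```latex
% Inclusion/exclusion viewed as a branching rule (our restatement, notation ours):
\[
  N(e\ \text{required}) \;=\; N(e\ \text{optional}) \;-\; N(e\ \text{forbidden})
\]
% The two subproblems can then be mixed with ordinary branching and reduction
% rules, and the resulting branch-and-reduce algorithm analysed with measure and conquer.
```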

14.
Classic Learning     
Frazier, Michael; Pitt, Leonard. Machine Learning, 1996, 25(2-3):151-193.

15.
Community detection and evaluation is an important task in graph mining. In many cases, a community is defined as a subgraph characterized by dense connections or interactions between its nodes. A variety of measures have been proposed to evaluate different quality aspects of such communities, in most cases ignoring the directed nature of edges. In this paper, we introduce novel metrics for evaluating the collaborative nature of directed graphs, a property not captured by single-node metrics or by other established community evaluation metrics. To accomplish this objective, we capitalize on the concept of graph degeneracy and define a novel D-core framework, extending the classic graph-theoretic notion of k-cores for undirected graphs to directed ones. Based on the D-core, which can essentially be seen as a measure of the robustness of a community under degeneracy, we devise a wealth of novel metrics for evaluating graph collaboration features of directed graphs. We applied the D-core approach to large synthetic and real-world graphs such as Wikipedia, DBLP, and ArXiv, and report interesting results at the graph level as well as at the node level.
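To illustrate how k-cores extend to the directed setting, here is a minimal peeling sketch of a directed (k, l)-core, our reading of the D-core idea: keep the maximal vertex set in which every vertex has in-degree at least k and out-degree at least l. The exact formulation and parameter names are assumptions, not taken from the paper.

```python
# Minimal peeling sketch for a directed (k, l)-core: iteratively remove vertices
# whose in-degree or out-degree within the surviving set falls below the thresholds.
def directed_core(edges, k, l):
    vertices = {v for e in edges for v in e}
    changed = True
    while changed:
        indeg = {v: 0 for v in vertices}
        outdeg = {v: 0 for v in vertices}
        for u, v in edges:
            if u in vertices and v in vertices:
                outdeg[u] += 1
                indeg[v] += 1
        survivors = {v for v in vertices if indeg[v] >= k and outdeg[v] >= l}
        changed = survivors != vertices
        vertices = survivors
    return vertices

# Example: a directed 3-cycle is a (1,1)-core; the dangling vertex d is peeled away.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
print(directed_core(edges, 1, 1))  # {'a', 'b', 'c'}
```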

16.
Given a large directed graph, rapidly answering reachability queries between source and target nodes is an important problem. Existing reachability methods trade off indexing time and space against query-time performance. However, the biggest limitation of existing methods is that they do not scale to very large real-world graphs. We present a simple yet scalable reachability index, called GRAIL, that is based on the idea of randomized interval labeling and that can effectively handle very large graphs. Based on an extensive set of experiments, we show that while more sophisticated methods work better on small graphs, GRAIL is the only index that can scale to millions of nodes and edges. GRAIL has linear indexing time and space, and its query time ranges from constant to linear in the graph order and size. Our reference C++ implementations are open source and available for download at http://www.code.google.com/p/grail/.
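A minimal sketch of the randomized interval-labeling idea named above, in our simplified single-label version rather than the released GRAIL code: each vertex receives an interval from a random-order DFS such that reachability implies interval containment, so non-containment rejects a query immediately, while containment falls back to a DFS check.

```python
import random

# Simplified randomized interval labeling for DAG reachability (our sketch).
# Each vertex u gets [low(u), post(u)] from a random-order DFS; if u reaches v
# then v's interval is contained in u's, so non-containment proves non-reachability.
def label(adj):
    post, low, counter = {}, {}, [0]

    def dfs(u):
        low[u] = float("inf")
        children = list(adj.get(u, []))
        random.shuffle(children)               # randomized traversal order
        for w in children:
            if w not in post:
                dfs(w)
            low[u] = min(low[u], low[w])
        counter[0] += 1
        post[u] = counter[0]
        low[u] = min(low[u], post[u])

    roots = list(adj)
    random.shuffle(roots)
    for r in roots:
        if r not in post:
            dfs(r)
    return {u: (low[u], post[u]) for u in post}

def reachable(adj, labels, u, v):
    """Assumes the graph is a DAG."""
    def contains(a, b):                        # a's interval contains b's interval
        return labels[a][0] <= labels[b][0] and labels[b][1] <= labels[a][1]
    if not contains(u, v):
        return False                           # exclusion: definitely unreachable
    if u == v:
        return True
    return any(reachable(adj, labels, w, v) for w in adj.get(u, []))

dag = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
labels = label(dag)
print(reachable(dag, labels, "a", "d"), reachable(dag, labels, "d", "a"))  # True False
```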

17.
The Contractibility problem takes as input two graphs G and H, and the task is to decide whether H can be obtained from G by a sequence of edge contractions. The Induced Minor and Induced Topological Minor problems are similar, but the first allows both edge contractions and vertex deletions, whereas the latter allows only vertex deletions and vertex dissolutions. All three problems are NP-complete, even for certain fixed graphs H. We show that these problems can be solved in polynomial time for every fixed H when the input graph G is chordal. Our results can be considered tight, since these problems are known to be W[1]-hard on chordal graphs when parameterized by the size of H. To solve Contractibility and Induced Minor, we define and use a generalization of the classic Disjoint Paths problem, where we require the vertices of each of the k paths to be chosen from a specified set. We prove that this variant is NP-complete even when k=2, but that it is polynomial-time solvable on chordal graphs for every fixed k. Our algorithm for Induced Topological Minor is based on another generalization of Disjoint Paths called Induced Disjoint Paths, where the vertices from different paths may no longer be adjacent. We show that this problem, which is known to be NP-complete when k=2, can be solved in polynomial time on chordal graphs even when k is part of the input. Our results fit into the general framework of graph containment problems, where the aim is to decide whether a graph can be modified into another graph by a sequence of specified graph operations. Allowing combinations of the four well-known operations edge deletion, edge contraction, vertex deletion, and vertex dissolution results in the following ten containment relations: (induced) minor, (induced) topological minor, (induced) subgraph, (induced) spanning subgraph, dissolution, and contraction. Our results, combined with existing results, settle the complexity of each of the ten corresponding containment problems on chordal graphs.

18.
Zeev Nutov. Algorithmica, 2014, 70(2):340-364.
We consider Degree Constrained Survivable Network problems. For the directed Degree Constrained k-Edge-Outconnected Subgraph problem, we slightly improve the best known approximation ratio by a simple proof. Our main contribution is a framework for handling node-connectivity degree-constrained problems with the iterative rounding method. In particular, for the degree-constrained versions of the Element-Connectivity Survivable Network problem on undirected graphs, and of the k-Outconnected Subgraph problem on both directed and undirected graphs, our algorithm computes a solution J of cost O(log k) times the optimum, with degrees O(2^k)·b(v). Similar results are obtained for the k-Connected Subgraph problem. The latter improves on the only previous degree approximation, O(k log n)·b(v) in O(n^k) time on undirected graphs, by Feder, Motwani, and Zhu.

19.
Conventional data warehouses employ the query-at-a-time model, which maps each query to a distinct physical plan. When several queries execute concurrently, this model introduces contention and thrashing, because the physical plans, unaware of each other, compete for access to the underlying I/O and computation resources. As a result, while modern systems can efficiently optimize and evaluate a single complex data analysis query, their performance suffers significantly and can be highly erratic when multiple complex queries run at the same time. We present in this paper CJoin, a new design that substantially improves throughput in large-scale data analytics systems processing many concurrent join queries. In contrast to the conventional query-at-a-time model, our approach employs a single physical plan that shares I/O, computation, and tuple storage across all in-flight join queries. We use an "always on" pipeline of non-blocking operators, managed by a controller that continuously examines the current query mix and optimizes the pipeline on the fly. Our design enables data analytics engines to scale gracefully to large data sets, provide predictable execution times, and reduce contention. We implemented CJoin as an extension to the PostgreSQL DBMS. This prototype outperforms conventional commercial systems by an order of magnitude for tens to hundreds of concurrent queries.
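A toy illustration of the shared-plan idea, our own simplification rather than CJoin's actual operator pipeline: a single scan serves all in-flight queries by tagging each tuple with the set of queries whose predicates it satisfies.

```python
# Toy illustration of sharing one scan across concurrent queries: each tuple is
# annotated with the set of interested queries instead of being scanned once per query.
def shared_scan(rows, query_predicates):
    """rows: iterable of dict tuples; query_predicates: {query_id: predicate_fn}."""
    for row in rows:
        interested = {qid for qid, pred in query_predicates.items() if pred(row)}
        if interested:
            yield row, interested      # downstream operators see one shared stream

rows = [{"region": "EU", "amount": 10},
        {"region": "US", "amount": 50},
        {"region": "EU", "amount": 70}]
queries = {1: lambda r: r["region"] == "EU",
           2: lambda r: r["amount"] > 40}
for row, qids in shared_scan(rows, queries):
    print(row, sorted(qids))
# {'region': 'EU', 'amount': 10} [1]
# {'region': 'US', 'amount': 50} [2]
# {'region': 'EU', 'amount': 70} [1, 2]
```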

20.