首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Due to its wide applications, subgraph query has attracted lots of attentions in database community. In this paper, we focus on subgraph query over a single large graph G, i.e., finding all embeddings of query Q in G. Different from existing feature-based approaches, we map all edges into a two-dimensional space R 2 and propose a bitmap structure to index R 2. At run time, we find a set of adjacent edge pairs (AEP) or star-style patterns (SSP) to cover Q. We develop edge join (EJ) algorithms to address both AEP and SSP subqueries. Based on the bitmap index, our method can optimize I/O and CPU cost. More importantly, our index has the linear space complexity instead of exponential complexity in feature-based approaches, which indicates that our index can scale well with respect to large data size. Furthermore, our index has light maintenance overhead, which has not been considered in most of existing work. Extensive experiments show that our method significantly outperforms existing ones in both online and offline processing with respect to query response time, index building time, index size and index maintenance overhead.  相似文献   

2.
The distance graph G(n, 2, 1) is a graph where vertices are identified with twoelement subsets of {1, 2,..., n}, and two vertices are connected by an edge whenever the corresponding subsets have exactly one common element. A random subgraph G p (n, 2, 1) in the Erd?os–Rényi model is obtained by selecting each edge of G(n, 2, 1) with probability p independently of other edges. We find a lower bound on the independence number of the random subgraph G1/2(n, 2, 1).  相似文献   

3.
A k-core of a graph is a maximal connected subgraph in which every vertex is connected to at least k vertices in the subgraph. k-core decomposition is often used in large-scale network analysis, such as community detection, protein function prediction, visualization, and solving NP-hard problems on real networks efficiently, like maximal clique finding. In many real-world applications, networks change over time. As a result, it is essential to develop efficient incremental algorithms for dynamic graph data. In this paper, we propose a suite of incremental k-core decomposition algorithms for dynamic graph data. These algorithms locate a small subgraph that is guaranteed to contain the list of vertices whose maximum k-core values have changed and efficiently process this subgraph to update the k-core decomposition. We present incremental algorithms for both insertion and deletion operations, and propose auxiliary vertex state maintenance techniques that can further accelerate these operations. Our results show a significant reduction in runtime compared to non-incremental alternatives. We illustrate the efficiency of our algorithms on different types of real and synthetic graphs, at varying scales. For a graph of 16 million vertices, we observe relative throughputs reaching a million times, relative to the non-incremental algorithms.  相似文献   

4.
Aggregate similarity search, also known as aggregate nearest-neighbor (Ann) query, finds many useful applications in spatial and multimedia databases. Given a group Q of M query objects, it retrieves from a database the objects most similar to Q, where the similarity is an aggregation (e.g., \({{\mathrm{sum}}}\), \(\max \)) of the distances between each retrieved object p and all the objects in Q. In this paper, we propose an added flexibility to the query definition, where the similarity is an aggregation over the distances between p and any subset of \(\phi M\) objects in Q for some support \(0< \phi \le 1\). We call this new definition flexible aggregate similarity search and accordingly refer to a query as a flexible aggregate nearest-neighbor ( Fann ) query. We present algorithms for answering Fann queries exactly and approximately. Our approximation algorithms are especially appealing, which are simple, highly efficient, and work well in both low and high dimensions. They also return near-optimal answers with guaranteed constant-factor approximations in any dimensions. Extensive experiments on large real and synthetic datasets from 2 to 74 dimensions have demonstrated their superior efficiency and high quality.  相似文献   

5.
Automatic annotation is an essential technique for effectively handling and organizing Web objects (e.g., Web pages), which have experienced an unprecedented growth over the last few years. Automatic annotation is usually formulated as a multi-label classification problem. Unfortunately, labeled data are often time-consuming and expensive to obtain. Web data also accommodate much richer feature space. This calls for new semi-supervised approaches that are less demanding on labeled data to be effective in classification. In this paper, we propose a graph-based semi-supervised learning approach that leverages random walks and ? 1 sparse reconstruction on a mixed object-label graph with both attribute and structure information for effective multi-label classification. The mixed graph contains an object-affinity subgraph, a label-correlation subgraph, and object-label edges with adaptive weight assignments indicating the assignment relationships. The object-affinity subgraph is constructed using ? 1 sparse graph reconstruction with extracted structural meta-text, while the label-correlation subgraph captures pairwise correlations among labels via linear combination of their co-occurrence similarity and kernel-based similarity. A random walk with adaptive weight assignment is then performed on the constructed mixed graph to infer probabilistic assignment relationships between labels and objects. Extensive experiments on real Yahoo! Web datasets demonstrate the effectiveness of our approach.  相似文献   

6.
Given a road network G = (V,E), where V (E) denotes the set of vertices(edges) in G, a set of points of interest P and a query point q residing in G, the reverse furthest neighbors (Rfn R ) query in road networks fetches a set of points pP that take q as their furthest neighbor compared with all points in P ∪ {q}. This is the monochromatic Rfn R (Mrfn R ) query. Another interesting version of Rfn R query is the bichromatic reverse furthest neighbor (Brfn R ) query. Given two sets of points P and Q, and a query point qQ, a Brfn R query fetches a set of points pP that take q as their furthest neighbor compared with all points in Q. This paper presents efficient algorithms for both Mrfn R and Brfn R queries, which utilize landmarks and partitioning-based techniques. Experiments on real datasets confirm the efficiency and scalability of proposed algorithms.  相似文献   

7.
We address efficient processing of SPARQL queries over RDF datasets. The proposed techniques, incorporated into the gStore system, handle, in a uniform and scalable manner, SPARQL queries with wildcards and aggregate operators over dynamic RDF datasets. Our approach is graph based. We store RDF data as a large graph and also represent a SPARQL query as a query graph. Thus, the query answering problem is converted into a subgraph matching problem. To achieve efficient and scalable query processing, we develop an index, together with effective pruning rules and efficient search algorithms. We propose techniques that use this infrastructure to answer aggregation queries. We also propose an effective maintenance algorithm to handle online updates over RDF repositories. Extensive experiments confirm the efficiency and effectiveness of our solutions.  相似文献   

8.
In the Intervalizing Coloured Graphs problem, one must decide for a given graph G = (V, E) with a proper vertex colouring of G whether G is the subgraph of a properly coloured interval graph. For the case that the number of colors is fixed, we give an exact algorithm that uses \(2^{\mathcal {O}(n/\log n)}\) time. We also give an \(\mathcal {O}^{\ast }(2^{n})\) algorithm for the case that the number of colors is not fixed.  相似文献   

9.
Tree patterns represent important fragments of XPath. In this paper, we show that some classes \({\mathcal{C}}\) of tree patterns exhibit such a property that, given a finite number of compatible tree patterns \({P_1, \ldots, P_n\in \mathcal{C}}\), there exists another pattern P such that P 1, . . . , P n are all contained in P, and for any tree pattern \({Q\in \mathcal{C}}\), P 1, . . . , P n are all contained in Q if and only if P is contained in Q. We experimentally demonstrate that the pattern P is usually much smaller than P 1, . . . , P n combined together. Using the existence of P above, we show that testing whether a tree pattern, P, is contained in another, \({Q\in \mathcal{C}}\), under an acyclic schema graph G, can be reduced to testing whether P G , a transformed version of P, is contained in Q without any schema graph, provided that the distinguished node of P is not labeled *. We then show that, under G, the maximal contained rewriting (MCR) of a tree pattern Q using a view V can be found by finding the MCR of Q using V G without G, when there are no *-nodes on the distinguished path of V and no *-nodes in Q.  相似文献   

10.
Resource-conscious technologies for cutting sheet material include the ICP and ECP technologies that allow for aligning fragments of the contours of cutouts. In this work, we show the mathematical model for the problem of cutting out parts with these technologies and algorithms for finding cutting tool routes that satisfy technological constraints. We give a solution for the problem of representing a cutting plan as a plane graph G = (V,F,E), which is a homeomorphic image of the cutting plan. This has let us formalize technological constraints on the trajectory of cutting the parts according to the cutting plan and propose a series of algorithms for constructing a route in the graph G = (V,F,E), which is an image of an admissible trajectory. Using known coordinates of the preimages of vertices of graph G = (V,F,E) and the locations of fragments of the cutting plan that are preimages of edges of graph G = (V,F,E), the resulting route in the graph G = (V,E) can be interpreted as the cutting tool’s trajectory.The proposed algorithms for finding routes in a connected graph G have polynomial computational complexity. To find the optimal route in an unconnected graph G, we need to solve, for every dividing face f of graph G, a travelling salesman problem on the set of faces incident to f.  相似文献   

11.
Navigational features have been largely recognized as fundamental for graph database query languages. This fact has motivated several authors to propose RDF query languages with navigational capabilities. In this paper, we propose the query language nSPARQL that uses nested regular expressions to navigate RDF data. We study some of the fundamental properties of nSPARQL and nested regular expressions concerning expressiveness and complexity of evaluation. Regarding expressiveness, we show that nSPARQL is expressive enough to answer queries considering the semantics of the RDFS vocabulary by directly traversing the input graph. We also show that nesting is necessary in nSPARQL to obtain this last result, and we study the expressiveness of the combination of nested regular expressions and SPARQL operators. Regarding complexity of evaluation, we prove that given an RDF graph G and a nested regular expression E, this problem can be solved in time O(|G||E|).  相似文献   

12.
The results for the corona P n ?°?P1 are generalized, which make it possible to state that P n ?°?P1 is not an ( a, d)-distance antimagic graph for arbitrary values of a and d. A condition for the existence of an ( a, d)-distance antimagic labeling of a hypercube Q n is obtained. Functional dependencies are found that generate this type of labeling for Q n . It is proved by the method of mathematical induction that Q n is a (2 n ?+?n???1,?n???2) -distance antimagic graph. Three types of graphs are defined that do not allow a 1-vertex bimagic vertex labeling. A relation between a distance magic labeling of a regular graph G and a 1-vertex bimagic vertex labeling of G?∪?G is established.  相似文献   

13.
Cloud computing is increasingly being seen as a way to reduce infrastructure costs and add elasticity, and is being used by a wide range of organizations. Cloud data management systems today need to serve a range of different workloads, from analytical read-heavy workloads to transactional (OLTP) workloads. For both the service providers and the users, it is critical to minimize the consumption of resources like CPU, memory, communication bandwidth, and energy, without compromising on service-level agreements if any. In this article, we develop a workload-aware data placement and replication approach, called SWORD, for minimizing resource consumption in such an environment. Specifically, we monitor and model the expected workload as a hypergraph and develop partitioning techniques that minimize the average query span, i.e., the average number of machines involved in the execution of a query or a transaction. We empirically justify the use of query span as the metric to optimize, for both analytical and transactional workloads, and develop a series of replication and data placement algorithms by drawing connections to several well-studied graph theoretic concepts. We introduce a suite of novel techniques to achieve high scalability by reducing the overhead of partitioning and query routing. To deal with workload changes, we propose an incremental repartitioning technique that modifies data placement in small steps without resorting to complete repartitioning. We propose the use of fine-grained quorums defined at the level of groups of data items to control the cost of distributed updates, improve throughput, and adapt to different workloads. We empirically illustrate the benefits of our approach through a comprehensive experimental evaluation for two classes of workloads. For analytical read-only workloads, we show that our techniques result in significant reduction in total resource consumption. For OLTP workloads, we show that our approach improves transaction latencies and overall throughput by minimizing the number of distributed transactions.  相似文献   

14.
In the Fixed Cost k-Flow problem, we are given a graph G = (V, E) with edge-capacities {u e eE} and edge-costs {c e eE}, source-sink pair s, tV, and an integer k. The goal is to find a minimum cost subgraph H of G such that the minimum capacity of an st-cut in H is at least k. By an approximation-preserving reduction from Group Steiner Tree problem to Fixed Cost k-Flow, we obtain the first polylogarithmic lower bound for the problem; this also implies the first non-constant lower bounds for the Capacitated Steiner Network and Capacitated Multicommodity Flow problems. We then consider two special cases of Fixed Cost k-Flow. In the Bipartite Fixed-Cost k-Flow problem, we are given a bipartite graph G = (AB, E) and an integer k > 0. The goal is to find a node subset S ? AB of minimum size |S| such G has k pairwise edge-disjoint paths between SA and SB. We give an \(O(\sqrt {k\log k})\) approximation for this problem. We also show that we can compute a solution of optimum size with Ω(k/polylog(n)) paths, where n = |A| + |B|. In the Generalized-P2P problem we are given an undirected graph G = (V, E) with edge-costs and integer charges {b v : vV}. The goal is to find a minimum-cost spanning subgraph H of G such that every connected component of H has non-negative charge. This problem originated in a practical project for shift design [11]. Besides that, it generalizes many problems such as Steiner Forest, k-Steiner Tree, and Point to Point Connection. We give a logarithmic approximation algorithm for this problem. Finally, we consider a related problem called Connected Rent or Buy Multicommodity Flow and give a log3+?? n approximation scheme for it using Group Steiner Tree techniques.  相似文献   

15.
A grid graph \(G_{\mathrm{g}}\) is a finite vertex-induced subgraph of the two-dimensional integer grid \(G^\infty \). A rectangular grid graph R(mn) is a grid graph with horizontal size m and vertical size n. A rectangular grid graph with a rectangular hole is a rectangular grid graph R(mn) such that a rectangular grid subgraph R(kl) is removed from it. The Hamiltonian path problem for general grid graphs is NP-complete. In this paper, we give necessary conditions for the existence of a Hamiltonian path between two given vertices in an odd-sized rectangular grid graph with a rectangular hole. In addition, we show that how such paths can be computed in linear time.  相似文献   

16.
The advancement of World Wide Web has revolutionized the way the manufacturers can do business. The manufacturers can collect customer preferences for products and product features from their sales and other product-related Web sites to enter and sustain in the global market. For example, the manufactures can make intelligent use of these customer preference data to decide on which products should be selected for targeted marketing. However, the selected products must attract as many customers as possible to increase the possibility of selling more than their respective competitors. This paper addresses this kind of product selection problem. That is, given a database of existing products P from the competitors, a set of company’s own products Q, a dataset C of customer preferences and a positive integer k, we want to find k-most promising products (k-MPP) from Q with maximum expected number of total customers for targeted marketing. We model k-MPP query and propose an algorithmic framework for processing such query and its variants. Our framework utilizes grid-based data partitioning scheme and parallel computing techniques to realize k-MPP query. The effectiveness and efficiency of the framework are demonstrated by conducting extensive experiments with real and synthetic datasets.  相似文献   

17.
Because of its wide application, the subgraph matching problem has been studied extensively during the past decade. However, most existing solutions assume that a data graph is a vertex/edge-labeled graph (i.e., each vertex/edge has a simple label). These solutions build structural indices by considering the vertex labels. However, some real graphs contain rich-content vertices such as user profiles in social networks and HTML pages on the World Wide Web. In this study, we consider the problem of subgraph matching using a more general scenario. We build a structural index that does not depend on any vertex content. Based on the index, we design a holistic subgraph matching algorithm that considers the query graph as a whole and finds one match at a time. In order to further improve efficiency, we propose a “partial evaluation and assembly” framework to find subgraph matches over large graphs. Last but not least, our index has light maintenance overhead. Therefore, our method can work well on dynamic graphs. Extensive experiments on real graphs show that our method outperforms the state-of-the-art algorithms.  相似文献   

18.
The k-truss of a graph is the largest edge-induced subgraph such that every edge is contained in at least k triangles within the subgraph, where a triangle is a cycle consisting of three vertices. As a new notion of cohesive subgraphs, truss has recently attracted a lot of research attentions in the database and data mining fields. At the same time, uncertainty is an intrinsic property of massive graph data, and truss decomposition (i.e., finding all k-trusses of a graph) has become a key primitive on uncertain graphs. In this paper, we study the truss decomposition problem on uncertain graphs, that is, finding all highly probable k-trusses of an uncertain graph. We first give an formal statement of the truss decomposition problem on uncertain graphs. Then, we prove that the truss decomposition of an uncertain graph attains two elegant properties, namely uniqueness and hierarchy. We show that the truss decomposition of an uncertain graph can be found in \(O(m^{1.5}Q)\) time by proposing an in-memory algorithm called \(\mathtt {TD_{mem}}\), where m is the number of edges of the uncertain graph, and Q is at most the maximum number of common neighbors of the endpoints of an edge. When an uncertain graph is too large to fit into main memory, we propose an external-memory algorithm \(\mathtt {TD_{I/O}}\) to find the truss decomposition of the uncertain graph. Extensive experiments have been carried out to evaluate the practical performance of the proposed algorithms. The experimental results verify that both \(\mathtt {TD_{mem}}\) and \(\mathtt {TD_{I/O}}\) are efficient when an uncertain graph is small enough to fit into main memory, and that \(\mathtt {TD_{I/O}}\) is much faster than \(\mathtt {TD_{mem}}\) when the graph is too large to fit into main memory.  相似文献   

19.
Existing work of XML keyword search focus on how to find relevant and meaningful data fragments for a query, assuming each keyword is intended as part of it. However, in XML keyword search, user queries usually contain irrelevant or mismatched terms, typos etc, which may easily lead to empty or meaningless results. In this paper, we introduce the problem of content-aware XML keyword query refinement, where the search engine should judiciously decide whether a user query Q needs to be refined during the processing of Q, and find a list of promising refined query candidates which guarantee to have meaningful matching results over the XML data, without any user interaction or a second try. To achieve this goal, we build a novel content-aware XML keyword query refinement framework consisting of two core parts: (1) we build a query ranking model to evaluate the quality of a refined query RQ, which captures the morphological/semantical similarity between Q and RQ and the dependency of keywords of RQ over the XML data; (2) we integrate the exploration of RQ candidates and the generation of their matching results as a single problem, which is fulfilled within a one-time scan of the related keyword inverted lists optimally. Finally, an extensive empirical study verifies the efficiency and effectiveness of our framework.  相似文献   

20.
Given a real world graph, how can we find a large subgraph whose partition quality is much better than the original? How can we use a partition of that subgraph to discover a high quality global partition? Although graph partitioning especially with balanced sizes has received attentions in various applications, it is known NP-hard, and also known that there is no good cut at a large scale for real graphs. In this paper, we propose a novel approach for graph partitioning. Our first focus is on finding a large subgraph with high quality partitions, in terms of conductance. Despite the difficulty of the task for the whole graph, we observe that there is a large connected subgraph whose partition quality is much better than the original. Our proposed method MTP finds such a subgraph by removing “hub” nodes with large degrees, and taking the remaining giant connected component. Further, we extend MTP to gb MTP (Global Balanced MTP) for discovering a global balanced partition. gb MTP attaches the excluded nodes in MTP to the partition found by MTP in a greedy way. In experiments, we demonstrate that MTP finds a subgraph of a large size with low conductance graph partitions, compared with competing methods. We also show that the competitors cannot find connected subgraphs while our method does, by construction. This improvement in partition quality for the subgraph is especially noticeable for large scale cuts—for a balanced partition, down to 14 % of the original conductance with the subgraph size 70 % of the total. As a result, the found subgraph has clear partitions at almost all scales compared with the original. Moreover, gb MTP generally discovers global balanced partitions whose conductance are lower than those found by METIS, the state-of-the-art graph partitioning method.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号