Efficient processing of label-constraint reachability queries in large graphs |
| |
Affiliation: | 1. AI Platform Department, Tencent Inc., Shenzhen, China;2. Web Data Mining Department, Baidu Inc., Shenzhen, China;3. Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong, China;4. Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA, USA |
| |
Abstract: | In this paper, we study a variant of reachability queries, called label-constraint reachability (LCR) queries. Specifically, given a label set S and two vertices u1 and u2 in a large directed graph G, we check the existence of a directed path from u1 to u2, where edge labels along the path are a subset of S. We propose the path-label transitive closure method to answer LCR queries. Specifically, we t4ransform an edge-labeled directed graph into an augmented DAG by replacing the maximal strongly connected components as bipartite graphs. We also propose a Dijkstra-like algorithm to compute path-label transitive closure by re-defining the “distance” of a path. Comparing with the existing solutions, we prove that our method is optimal in terms of the search space. Furthermore, we propose a simple yet effective partition-based framework (local path-label transitive closure+online traversal) to answer LCR queries in large graphs. We prove that finding the optimal graph partition to minimize query processing cost is a NP-hard problem. Therefore, we propose a sampling-based solution to find the sub-optimal partition. Moreover, we address the index maintenance issues to answer LCR queries over the dynamic graphs. Extensive experiments confirm the superiority of our method. |
| |
Keywords: | Graph database Reachability query |
本文献已被 ScienceDirect 等数据库收录! |
|