首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
基于最小生成树的图数据库索引算法   总被引:1,自引:0,他引:1  
李楠  高宏  李建中 《软件学报》2009,20(Z1):144-153
对复杂数据进行图模式建模近几年越来越流行,因此,在查询执行的优化过程中图索引技术变得至关重要.研究了图模式的索引问题,并且提出了一种近似的索引方法,称为MSTA方法.MSTA方法利用最小生成树结构作为索引特征,依据最小生成树边序列的包含关系和基于最大公共子图的图距离度量,将最小生成树组织到一个称为MST树的索引结构中.MST树索引结构可以高效地支持多种查询,例如子图查询.MSTA方法具备高效的索引性能.在索引大小和索引建立时间方面,传统方法是MSTA方法的数十倍,甚至上百倍.MSTA方法虽然不能返回完整结果,但是可以返回经图距离度量排序最好的部分结果.  相似文献   

2.
The increasing popularity of graph data in various domains has lead to a renewed interest in developing efficient graph matching techniques, especially for processing large graphs. In this paper, we study the problem of approximate graph matching in a large attributed graph. Given a large attributed graph and a query graph, we compute a subgraph of the large graph that best matches the query graph. We propose a novel structure-aware and attribute-aware index to process approximate graph matching in a large attributed graph. We first construct an index on the similarity of the attributed graph, by partitioning the large search space into smaller subgraphs based on structure similarity and attribute similarity. Then, we construct a connectivity-based index to give a concise representation of inter-partition connections. We use the index to find a set of best matching paths. From these best matching paths, we compute the best matching answer graph using a greedy algorithm. Experimental results on real datasets demonstrate the efficiency of both index construction and query processing. We also show that our approach attains high-quality query answers.  相似文献   

3.
We propose a new way of indexing a large database of small and medium-sized graphs and processing exact subgraph matching (or subgraph isomorphism) and approximate (full) graph matching queries. Rather than decomposing a graph into smaller units (e.g., paths, trees, graphs) for indexing purposes, we represent each graph in the database by its graph signature, which is essentially a multiset. We construct a disk-based index on all the signatures via bulk loading. During query processing, a query graph is also mapped into its signature, and this signature is searched using the index by performing multiset operations. To improve the precision of exact subgraph matching, we develop a new scheme using the concept of line graphs. Through extensive evaluation on real and synthetic graph datasets, we demonstrate that our approach provides a scalable and efficient disk-based solution for a large database of small and medium-sized graphs.  相似文献   

4.
5.
This paper proposes a graph indexing technique for processing constrained spatial queries and discusses the application of such a technique to road map databases where the graph topology is relatively stationary. The fundamental idea of our technique is to augment the original graph with selected augmented links so that query processing cost, especially I/O cost, is minimized. Based on the computational results derived from the probabilistic analysis, we found that the proposed graph indexing technique is a promising approach for significantly reducing costs of spatial queries.Scope and purposeSpatial data is found in geographic information systems where data attributes are associated with nodes and links in directed graphs. Queries on spatial data are generally expensive because of the recursive nature of spatial data traversal. We propose a graph indexing technique to expedite queries on spatial data. The graph index is an instrument for early identification of the relevant nodes and links to the query so that repeated accesses to the same data pages can be eliminated. This paper presents the graph indexing technique in the context of road map databases and shows that the graph indexing technique can improve significantly on the efficiency of constrained queries on spatial data.  相似文献   

6.
Retrieving 2D shapes using caterpillar decomposition   总被引:1,自引:0,他引:1  
Graphs provide effective data structures modeling complex relations and schemaless data such as images, XML documents, circuits, compounds, and proteins. Given a query graph, finding sufficiently similar database graphs without performing a sequential search is an important problem arising in different domains. In this paper, we propose a new method for indexing tree structures based on a graph-theoretic concept called caterpillar decomposition. Our algorithm starts by representing each tree along with its subtrees in the geometric space using its caterpillar decomposition. After representing the query in the same fashion, similar database trees are retrieved efficiently by means of nearest neighbor searches. We have successfully evaluated the proposed algorithm on two shape databases and include a set of perturbation experiments that establish the algorithm’s robustness to noise. We have also shown that the approach compares favorably to previous approaches for shape retrieval on these datasets.  相似文献   

7.
Graphs have been widely used for complex data representation in many real applications, such as social network, bioinformatics, and computer vision. Therefore, graph similarity join has become imperative for integrating noisy and inconsistent data from multiple data sources. The edit distance is commonly used to measure the similarity between graphs. The graph similarity join problem studied in this paper is based on graph edit distance constraints. To accelerate the similarity join based on graph edit distance, in the paper, we make use of a preprocessing strategy to remove the mismatching graph pairs with significant differences. Then a novel method of building indexes for each graph is proposed by grouping the nodes which can be reached in k hops for each key node with structure conservation, which is the k-hop tree based indexing method. As for each candidate pair, we propose a similarity computation algorithm with boundary filtering, which can be applied with good efficiency and effectiveness. Experiments on real and synthetic graph databases also confirm that our method can achieve good join quality in graph similarity join. Besides, the join process can be finished in polynomial time.  相似文献   

8.
Conventional keyword search engines are restricted to a given data model and cannot easily adapt to unstructured, semi-structured or structured data. In this paper, we propose an efficient and adaptive keyword search method, called EASE, for indexing and querying large collections of heterogeneous data. To achieve high efficiency in processing keyword queries, we first model unstructured, semi-structured and structured data as graphs, and then summarize the graphs and construct graph indices instead of using traditional inverted indices. We propose an extended inverted index to facilitate keyword-based search, and present a novel ranking mechanism for enhancing search effectiveness. We have conducted an extensive experimental study using real datasets, and the results show that EASE achieves both high search efficiency and high accuracy, and outperforms the existing approaches significantly.  相似文献   

9.
Building k-connected neighborhood graphs for isometric data embedding   总被引:2,自引:0,他引:2  
Isometric data embedding using geodesic distance requires the construction of a connected neighborhood graph so that the geodesic distance between every pair of data points can be estimated. This paper proposes an approach for constructing k-connected neighborhood graphs. The approach works by applying a greedy algorithm to add each edge, in a nondecreasing order of edge length, to a neighborhood graph if end vertices of the edge are not yet k-connected on the graph. The k-connectedness between vertices is tested using a network flow technique by assigning every vertex a unit flow capacity. This approach is applicable to a wide range of data. Experiments show that it gives better estimation of geodesic distances than other approaches, especially when the data are undersampled or nonuniformly distributed.  相似文献   

10.
This paper studies the problem of processing supergraph queries, that is, given a database containing a set of graphs, find all the graphs in the database of which the query graph is a supergraph. Existing works usually construct an index and performs a filtering-and-verification process, which still requires many subgraph isomorphism testings. There are also significant overheads in both index construction and maintenance. In this paper, we design a graph querying system that achieves both fast indexing and efficient query processing. The index is constructed by a simple but fast method of extracting the commonality among the graphs, which does not involve any costly operation such as graph mining. Our query processing has two key techniques, direct inclusion and filtering. Direct inclusion allows partial query answers to be included directly without candidate verification. Our filtering technique further reduces the candidate set by operating on a much smaller projected database. Experimental results show that our method is significantly more efficient than the existing works in both indexing and query processing, and our index has a low maintenance cost.  相似文献   

11.
With ever growing databases containing multimedia data, indexing has become a necessity to avoid a linear search. We propose a novel technique for indexing multimedia databases in which entries can be represented as graph structures. In our method, the topological structure of a graph as well as that of its subgraphs are represented as vectors whose components correspond to the sorted laplacian eigenvalues of the graph or subgraphs. Given the laplacian spectrum of graph G, we draw from recently developed techniques in the field of spectral integral variation to generate the laplacian spectrum of graph G+e without computing its eigendecomposition, where G+e is a graph obtained by adding edge e to graph G. This process improves the performance of the system for generating the subgraph signatures for 1.8% and 6.5% in datasets of size 420 and 1400, respectively. By doing a nearest neighbor search around the query spectra, similar but not necessarily isomorphic graphs are retrieved. Given a query graph, a voting schema ranks database graphs into an indexing hypothesis to which a final matching process can be applied. The novelties of the proposed method come from the powerful representation of the graph topology and successfully adopting the concept of spectral integral variation in an indexing algorithm. To examine the fitness of the new indexing framework, we have performed a number of experiments using an extensive set of recognition trials in the domain of 2D and 3D object recognition. The experiments, including a comparison with a competing indexing method using two different graph-based object representations, demonstrate both the robustness and efficacy of the overall approach.  相似文献   

12.
Polyhedral object recognition by indexing   总被引:1,自引:0,他引:1  
Radu  Humberto 《Pattern recognition》1995,28(12):1855-1870
In computer vision, the indexing problem is the problem of recognizing a few objects in a large database of objects while avoiding the help of the classical image-feature-to-object-feature matching paradigm. In this paper we address the problem of recognizing three-dimensional (3-D) polyhedral objects from 2-D images by indexing. Both the objects to be recognized and the images are represented by weighted graphs. The indexing problem is therefore the problem of determining whether a graph extracted from the image is present or absent in a database of model graphs. We introduce a novel method for performing this graph indexing process which is based both on polynomial characterization of binary and weighted graphs and on hashing. We describe in detail this polynomial characterization and then we show how it can be used in the context of polyhedral object recognition. Next we describe a practical recognition-by-indexing system that includes the organization of the database, the representation of polyhedral objects in terms of 2-D characteristic views, the representation of this views in terms of weighted graphs and the associated image processing. Finally, some experimental results allow the evaluation of the system performance.  相似文献   

13.
Querying graph data is a fundamental problem that witnesses an increasing interest especially for massive graph databases which come as a promising alternative to relational databases for big data modeling. In this paper, we study the problem of subgraph isomorphism search which consists to enumerate the embedding of a query graph in a data graph. The most known solutions of this NP-complete problem are backtracking-based and result in a high computational cost when we deal with massive graph databases. We address this problem and its challenges via graph compression with modular decomposition. In our approach, subgraph isomorphism search is performed on compressed graphs without decompressing them yielding substantial reduction of the search space and consequently a significant saving in processing time as well as in storage space for the graphs. We evaluated our algorithms on nine real-word datasets. The experimental results show that our approach is efficient and scalable.  相似文献   

14.
Graph-based semi-supervised classification depends on a well-structured graph. However, it is difficult to construct a graph that faithfully reflects the underlying structure of data distribution, especially for data with a high dimensional representation. In this paper, we focus on graph construction and propose a novel method called semi-supervised ensemble classification in subspaces, SSEC in short. Unlike traditional methods that execute graph-based semi-supervised classification in the original space, SSEC performs semi-supervised linear classification in subspaces. More specifically, SSEC first divides the original feature space into several disjoint feature subspaces. Then, it constructs a neighborhood graph in each subspace, and trains a semi-supervised linear classifier on this graph, which will serve as the base classifier in an ensemble. Finally, SSEC combines the obtained base classifiers into an ensemble classifier using the majority-voting rule. Experimental results on facial images classification show that SSEC not only has higher classification accuracy than the competitive methods, but also can be effective in a wide range of values of input parameters.  相似文献   

15.
16.
This paper studies continuous pattern detection over large evolving graphs, which plays an important role in monitoring-related applications. The problem is challenging due to the large size and dynamic updates of graphs, the massive search space of pattern detection and inconsistent query results on dynamic graphs. This paper first introduces a snapshot isolation requirement, which ensures that the query results come from a consistent graph snapshot instead of a mixture of partial evolving graphs. Second, we propose an SSD (single sink directed acyclic graph) plan friendly to vertex-centric-distributed graph processing frameworks. SSD plan can guide the message transformation and transfer among graph vertices, and determine the satisfaction of the pattern on graph vertices for the sink vertex. Third, we devise strategies for major steps in the SSD evaluation, including the location of valid messages to achieve snapshot isolation, AO-List to determine the satisfaction of transition rule over dynamic graph, and message-on-change policy to reduce outgoing messages. The experiments on billion-edge graphs using Giraph, an open source implementation of Pregel, illustrate the efficiency and effectiveness of our method.  相似文献   

17.
Zhang  Fan  Zou  Lei  Zeng  Li  Gou  Xiangyang 《World Wide Web》2020,23(2):873-903

A streaming graph is a graph formed by a sequence of incoming edges with time stamps. Unlike the static graphs, the streaming graph is highly dynamic and time-related. Streaming graphs in the real world, which are of the high volume and velocity, can be challenging to the classic graph data structures: data of internet traffic, social network communication, and financial transections, etc. The traditional graph storage models like the adjacency matrix and the adjacency list are no longer sufficient for the large amount data and high frequency updates. And most the streaming graph structures are only supports the specific graph algorithms. Here a new data structure is presented to meet the challenge: a double orthogonal list in hash table (Dolha) as a high speed and high memory efficiency graph structure. Dolha has constant time cost for single edge processing, and near-linear space cost. Moreover, time cost for neighborhood queries in Dolha is linear, which enables it to support most algorithms of graphs without extra cost. A persistent structure based on Dolha is also presented, to handle the sliding window update and time related queries.

  相似文献   

18.
Learning hash functions/codes for similarity search over multi-view data is attracting increasing attention, where similar hash codes are assigned to the data objects characterizing consistently neighborhood relationship across views. Traditional methods in this category inherently suffer three limitations: 1) they commonly adopt a two-stage scheme where similarity matrix is first constructed, followed by a subsequent hash function learning; 2) these methods are commonly developed on the assumption that data samples with multiple representations are noise-free,which is not practical in real-life applications; and 3) they often incur cumbersome training model caused by the neighborhood graph construction using all N points in the database (O(N)). In this paper, we motivate the problem of jointly and efficiently training the robust hash functions over data objects with multi-feature representations which may be noise corrupted. To achieve both the robustness and training efficiency, we propose an approach to effectively and efficiently learning low-rank kernelized1 hash functions shared across views. Specifically, we utilize landmark graphs to construct tractable similarity matrices in multi-views to automatically discover neighborhood structure in the data. To learn robust hash functions, a latent low-rank kernel function is used to construct hash functions in order to accommodate linearly inseparable data. In particular, a latent kernelized similarity matrix is recovered by rank minimization on multiple kernel-based similarity matrices. Extensive experiments on real-world multi-view datasets validate the efficacy of our method in the presence of error corruptions.We use kernelized similarity rather than kernel, as it is not a squared symmetric matrix for data-landmark affinity matrix.  相似文献   

19.
Graph structure is vital to graph based semi-supervised learning. However, the problem of constructing a graph that reflects the underlying data distribution has been seldom investigated in semi-supervised learning, especially for high dimensional data. In this paper, we focus on graph construction for semi-supervised learning and propose a novel method called Semi-Supervised Classification based on Random Subspace Dimensionality Reduction, SSC-RSDR in short. Different from traditional methods that perform graph-based dimensionality reduction and classification in the original space, SSC-RSDR performs these tasks in subspaces. More specifically, SSC-RSDR generates several random subspaces of the original space and applies graph-based semi-supervised dimensionality reduction in these random subspaces. It then constructs graphs in these processed random subspaces and trains semi-supervised classifiers on the graphs. Finally, it combines the resulting base classifiers into an ensemble classifier. Experimental results on face recognition tasks demonstrate that SSC-RSDR not only has superior recognition performance with respect to competitive methods, but also is robust against a wide range of values of input parameters.  相似文献   

20.
In retrieval from image databases, evaluation of similarity, based both on the appearance of spatial entities and on their mutual relationships, depends on content representation based on attributed relational graphs. This kind of modeling entails complex matching and indexing, which presently prevents its usage within comprehensive applications. In this paper, we provide a graph-theoretical formulation for the problem of retrieval based on the joint similarity of individual entities and of their mutual relationships and we expound its implications on indexing and matching. In particular, we propose the usage of metric indexing to organize large archives of graph models, and we propose an original look-ahead method which represents an efficient solution for the (sub)graph error correcting isomorphism problem needed to compute object distances. Analytic comparison and experimental results show that the proposed lookahead improves the state-of-the-art in state-space search methods and that the combined use of the proposed matching and indexing scheme permits for the management of the complexity of a typical application of retrieval by spatial arrangement  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号