We propose a new way of indexing a large database of small and medium-sized graphs and processing exact subgraph matching (or subgraph isomorphism) and approximate (full) graph matching queries. Rather than decomposing a graph into smaller units (e.g., paths, trees, graphs) for indexing purposes, we represent each graph in the database by its graph signature, which is essentially a multiset. We construct a disk-based index on all the signatures via bulk loading. During query processing, a query graph is also mapped into its signature, and this signature is searched using the index by performing multiset operations. To improve the precision of exact subgraph matching, we develop a new scheme using the concept of line graphs. Through extensive evaluation on real and synthetic graph datasets, we demonstrate that our approach provides a scalable and efficient disk-based solution for a large database of small and medium-sized graphs.
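
The multiset idea in the abstract above can be illustrated with a minimal sketch. The abstract does not say what the signature contains, so the signature used here (vertex labels plus endpoint-label pairs of edges) and the names `signature` and `may_contain` are assumptions made for illustration only. The sketch shows only the filtering step: multiset containment is a necessary condition for subgraph isomorphism, not a sufficient one, so surviving candidates still need verification.

```python
from collections import Counter

def signature(vertex_labels, edges):
    """Hypothetical signature (an assumption, not the paper's definition):
    a multiset of vertex labels plus unordered endpoint-label pairs of edges."""
    sig = Counter(("v", label) for label in vertex_labels.values())
    for u, v in edges:
        a, b = sorted((vertex_labels[u], vertex_labels[v]))
        sig[("e", a, b)] += 1
    return sig

def may_contain(db_sig, query_sig):
    """Multiset containment test: if it fails, the query cannot be a
    label-preserving subgraph of the database graph."""
    return all(db_sig[key] >= count for key, count in query_sig.items())

# A labeled triangle as the database graph.
data_sig = signature({1: "C", 2: "C", 3: "O"}, [(1, 2), (2, 3), (1, 3)])
# A single C-O edge passes the filter; an O-O edge is pruned.
hit = may_contain(data_sig, signature({1: "C", 2: "O"}, [(1, 2)]))
miss = may_contain(data_sig, signature({1: "O", 2: "O"}, [(1, 2)]))
```

In a disk-based setting the same containment test would run against signatures stored in the index rather than in memory.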

This paper revisits the problem of indexing a text for approximate string matching. Specifically, given a text $T$ of length $n$ and a positive integer $k$, we want to construct an index of $T$ such that for any input pattern $P$, we can find all its $k$-error matches in $T$ efficiently. This problem is well-studied in the internal-memory setting. Here, we extend some of these recent results to external-memory solutions, which are also cache-oblivious. Our first index occupies $O((n \log^k n)/B)$ disk pages and finds all $k$-error matches with $O((|P| + \mathrm{occ})/B + \log^k n \log\log_B n)$ I/Os, where $B$ denotes the number of words in a disk page. To the best of our knowledge, this index is the first external-memory data structure that does not require $\Omega(|P| + \mathrm{occ} + \mathrm{poly}(\log n))$ I/Os. The second index reduces the space to $O((n \log n)/B)$ disk pages, and the I/O complexity is $O((|P| + \mathrm{occ})/B + \log^{k(k+1)} n \log\log n)$.
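
To make the notion of a $k$-error match concrete, here is a minimal sketch of the classic online dynamic-programming scan for approximate matching. It is not the index described in the abstract above: it reads the whole text on every query, which is exactly the cost an index is built to avoid. The function name `k_error_match_ends` is hypothetical.

```python
def k_error_match_ends(text, pattern, k):
    """Return the 0-based text positions at which some substring within
    edit distance k of `pattern` ends (column-wise dynamic programming)."""
    m = len(pattern)
    col = list(range(m + 1))  # distances against the empty text prefix
    ends = []
    for j, ch in enumerate(text):
        prev_diag = col[0]
        col[0] = 0  # a match may start at any text position
        for i in range(1, m + 1):
            old = col[i]
            col[i] = min(
                prev_diag + (pattern[i - 1] != ch),  # match or substitution
                old + 1,                             # extra text character
                col[i - 1] + 1,                      # missing pattern character
            )
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends

exact = k_error_match_ends("abcdef", "bcd", 0)
one_error = k_error_match_ends("abcdef", "bcd", 1)
```

The scan takes time proportional to $|P| \cdot n$ per query, independent of how many matches exist.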

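Graph matching seeks the node correspondences between two or more graphs. In the field of images and graphics, it is a long-standing fundamental problem that continues to attract new work. From an optimization perspective, graph matching is a combinatorial optimization problem and is, in general, NP-hard (non-deterministic polynomial-time hard). Over the past decades, many approximate algorithms for two-graph matching have appeared and have been applied fairly widely across fields. However, limited by the theoretical difficulty of the optimization problem itself and by data-quality constraints in practical applications, the matching accuracy of two-graph matching algorithms is approaching saturation. In contrast, because it brings in more information and often fits the setting of practical problems better, joint matching of multiple graphs has gradually become an emerging and important research direction. This paper first introduces classical two-graph matching methods and then focuses on recent progress and related work in multi-graph matching. Finally, it discusses future developments in graph matching.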

Subgraph matching is a fundamental problem in graph search and is NP-complete. Recently, subgraph matching has become a popular research topic in knowledge graph analysis, with a wide range of applications including question answering and semantic search. In this paper, we study the problem of subgraph matching on knowledge graphs. Specifically, given a query graph q and a data graph G, the problem is to find all subgraph isomorphic mappings of q on G. A knowledge graph is a directed labeled multi-graph that may have multiple edges between a pair of vertices, and it has denser semantic and structural features than a general graph. To accelerate subgraph matching on knowledge graphs, we propose a novel subgraph matching algorithm based on a subgraph index, called FGqT-Match. The algorithm consists of two key designs. The first is a subgraph index, the matching-driven flow graph (FGqT), which eliminates redundant calculations in advance. The second is a multi-label weight matrix, which is used to select a near-optimal matching tree that minimizes the number of intermediate candidates. With these two designs, all subgraph isomorphic mappings are obtained quickly by traversing FGqT alone. Extensive empirical studies on real and synthetic graphs demonstrate that our techniques outperform the state-of-the-art algorithms.

Graphs are universal modeling tools. They are used to represent objects and their relationships in almost all domains: DNA, images, videos, social networks, XML documents, etc. When objects are represented by graphs, comparing the objects becomes a problem of comparing graphs. Comparing objects is a key task in our daily life. It is the core of a search engine, the backbone of a mining tool, etc. Nowadays, comparing objects faces the challenge of the large amount of data that this task must deal with. Moreover, when graphs are used to model these objects, graph comparison is known to be computationally hard, especially for large graphs. Research on simplifying graph comparison has therefore gained interest, and several solutions have been proposed. In this paper, we explore and evaluate a new solution for the comparison of large graphs. Our approach relies on a compact encoding of graphs called prime graphs. Prime graphs are smaller and simpler than the original ones, but they retain the structure and properties of the encoded graphs. We propose to approximate the similarity between two graphs by comparing the corresponding prime graphs. Simulation results show that this approach is effective for large graphs.

In this paper we present an efficient subquadratic-time algorithm for matching strings and limited expressions in large texts. Limited expressions are a subset of regular expressions that appear often in practice. The generalization from simple strings to limited expressions has a negligible effect on the speed of our algorithm, yet allows much more flexibility. Our algorithm is similar in spirit to that of Masek and Paterson [MP], but it is much faster in practice. Our experiments show a factor of four to five speedup against the algorithms of Sellers [Se] and Ukkonen [Uk1], independent of the sizes of the input strings. Experiments also reveal our algorithm to be faster, in most cases, than a recent improvement by Chang and Lampe [CL2], especially for small alphabet sizes, for which it is two to three times faster. The research of U. Manber was supported in part by a Presidential Young Investigator Award DCR-8451397, with matching funds from AT&T, and by NSF Grant CCR-9001619. G. Myers' research was supported in part by NIH Grant LM04960, NSF Grant CCR-9001619, and the Aspen Center for Physics.

In intelligence analysis, a situation of interest is commonly obscured by a far larger volume of unimportant data. This data can be broadly divided into two categories: hard, or physical sensor, data and soft, or human-observed, data. Soft intelligence data is collected by humans through human interaction, or human intelligence (HUMINT). The value of these observations, together with the difficulty of processing them manually given the volume of available data and the cognitive limitations of intelligence analysts, necessitates an information fusion approach toward their understanding. The data representation utilized in this work is an attributed graphical format. The uncertainties, size and complexity of the connections within this graph make accurate assessments difficult for the intelligence analyst. While this graphical form is easier for an intelligence analyst to consider than disconnected multi-source human and sensor reports, manual traversal for the purpose of obtaining situation awareness and accurately answering priority information requests (PIRs) is still infeasible. To overcome this difficulty, an automated stochastic graph matching approach is developed. This approach consists of three main processes: uncertainty alignment, graph matching result initialization and graph matching result maintenance. Uncertainty alignment attaches to each raw incoming observation a bias-adjusted uncertainty representation that captures the spread containing the observation's true value. The graph matching initialization step provides template graph to data graph matches for a newly initialized situation of interest (template graph). Finally, the graph matching result maintenance algorithm continuously updates graph matching results as incoming observations augment the cumulative data graph. Throughout these processes the uncertainties present in the original observations and the template to data graph matches are preserved, ultimately providing an indication of the uncertainties present in the current situation assessment. In addition to providing the technical details of this approach, this paper also provides an extensive numerical testing section which indicates a significant performance improvement of the proposed algorithm over a leading commercial solver.

A hierarchical scheme for elastic graph matching applied to hand gesture recognition is proposed. The proposed algorithm exploits the relative discriminatory capabilities of visual features scattered on the images, assigning corresponding weights to each feature. A boosting algorithm is used to determine the structure of the hierarchy of a given graph. The graph is expressed by annotating the nodes of interest over the target object to form a bunch graph. Three annotation techniques (manual, semi-automatic, and automatic) are used to determine the position of the nodes. The scheme and the annotation approaches are applied to explore the hand gesture recognition performance. A number of filter banks are applied to hand gesture images to investigate the effect of using different feature representation approaches. Experimental results show that the hierarchical elastic graph matching (HEGM) approach classified the hand posture with a gesture recognition accuracy of 99.85% when visual features were extracted by utilizing the Histogram of Oriented Gradient (HOG) representation. The results also report performance measures ranging from recognition accuracy to matching benefits, node position correlation, and consistency across the three annotation approaches, showing that the semi-automatic annotation method is more efficient and accurate than the other two methods.

This paper addresses the problem of global graph alignment on supercomputer-class clusters. We define the alignment of two graphs as a mapping of each vertex in the first graph to a unique vertex in the second graph so as to optimize a given similarity-based cost function. Using a state-of-the-art serial algorithm for the computation of vertex similarity scores called Network Similarity Decomposition (NSD), we derive corresponding parallel formulations. Coupling this parallel similarity algorithm with a parallel auction-based bipartite matching technique, we obtain a highly efficient and scalable graph matching pipeline. We validate the performance of our integrated approach on a large parallel platform and on diverse graph instances (including Protein Interaction, Wikipedia and Web networks). Experimental results demonstrate that our algorithms scale to large machine configurations (thousands of cores) and problem instances, enabling the alignment of networks of sizes two orders of magnitude larger than reported in the current literature.

Approximate string matching is used for spelling correction and personal name matching. In this paper we show how to use string matching techniques in conjunction with lexicon indexes to find approximate matches in a large lexicon. We test several lexicon indexing techniques, including n-grams and permuted lexicons, and several string matching techniques, including string similarity measures and phonetic coding. We propose methods for combining these techniques, and show experimentally that these combinations yield good retrieval effectiveness while keeping index size and retrieval time low. Our experiments also suggest that, in contrast to previous claims, phonetic codings are markedly inferior to string distance measures, which are demonstrated to be suitable for both spelling correction and personal name matching.
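
One way to combine an n-gram lexicon index with a string distance measure, as the abstract above describes in general terms, is sketched below. The concrete choices are assumptions made for illustration: padded bigrams, plain edit distance as the measure, a distance cutoff of 2, and the helper names `ngrams`, `build_index`, `edit_distance` and `lookup`. The abstract does not specify which combinations the authors evaluated.

```python
from collections import defaultdict

def ngrams(word, n=2):
    """Set of n-grams of a word padded with boundary markers."""
    padded = "$" + word + "$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def build_index(lexicon, n=2):
    """Inverted index mapping each n-gram to the lexicon words containing it."""
    index = defaultdict(set)
    for word in lexicon:
        for gram in ngrams(word, n):
            index[gram].add(word)
    return index

def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def lookup(query, index, max_dist=2, n=2):
    """Coarse step: collect words sharing an n-gram with the query.
    Fine step: keep and rank those within max_dist edits."""
    candidates = set()
    for gram in ngrams(query, n):
        candidates |= index.get(gram, set())
    scored = ((edit_distance(query, word), word) for word in candidates)
    return sorted((d, w) for d, w in scored if d <= max_dist)

index = build_index(["matching", "marching", "patching", "graph"])
results = lookup("matchng", index)
```

The index keeps the expensive distance computation away from words such as "graph" that share no bigram with the query.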

In this paper, we deal with both the complexity and the approximability of the labeled perfect matching problem in bipartite graphs. Given a simple graph G=(V,E) with |V|=2n vertices such that E contains a perfect matching (of size n), together with a color (or label) function, the labeled perfect matching problem consists in finding a perfect matching on G that uses a minimum or a maximum number of colors.

A method for segmentation and recognition of image structures based on graph homomorphisms is presented in this paper. It is a model-based recognition method where the input image is over-segmented and the obtained regions are represented by an attributed relational graph (ARG). This graph is then matched against a model graph thus accomplishing the model-based recognition task. This type of problem calls for inexact graph matching through a homomorphism between the graphs since no bijective correspondence can be expected, because of the over-segmentation of the image with respect to the model. The search for the best homomorphism is carried out by optimizing an objective function based on similarities between object and relational attributes defined on the graphs. The following optimization procedures are compared and discussed: deterministic tree search, for which new algorithms are detailed, genetic algorithms and estimation of distribution algorithms. In order to assess the performance of these algorithms using real data, experimental results on supervised classification of facial features using face images from public databases are presented.

The matching preclusion number of a graph with an even number of vertices is the minimum number of edges whose deletion destroys all perfect matchings in the graph. The optimal matching preclusion sets are often precisely those which are induced by a single vertex of minimum degree. To look for obstruction sets beyond these, the conditional matching preclusion number was introduced, which is defined similarly with the additional restriction that the resulting graph has no isolated vertices. In this paper we find the matching preclusion and conditional matching preclusion numbers and classify all optimal sets for the pancake graphs and burnt pancake graphs.
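
The definition in the abstract above can be checked directly on very small graphs with a brute-force sketch. This is only an illustration of the definition, with exponential running time, and is not the method used for pancake graphs; the function names are hypothetical.

```python
from itertools import combinations

def has_perfect_matching(vertices, edges):
    """Backtracking test: pair the first vertex with a neighbor and recurse."""
    vertices = list(vertices)
    if not vertices:
        return True
    v = vertices[0]
    for a, b in edges:
        if v not in (a, b) or a == b:
            continue
        other = b if a == v else a
        rest = [x for x in vertices if x not in (v, other)]
        rest_edges = [e for e in edges if v not in e and other not in e]
        if has_perfect_matching(rest, rest_edges):
            return True
    return False

def matching_preclusion_number(vertices, edges):
    """Smallest number of edges whose deletion leaves no perfect matching."""
    for size in range(len(edges) + 1):
        for removed in combinations(edges, size):
            kept = [e for e in edges if e not in removed]
            if not has_perfect_matching(vertices, kept):
                return size
    return None  # only reached for the empty graph

k4_edges = list(combinations(range(4), 2))       # complete graph on 4 vertices
c4_edges = [(0, 1), (1, 2), (2, 3), (0, 3)]      # 4-cycle
mp_k4 = matching_preclusion_number(range(4), k4_edges)
mp_c4 = matching_preclusion_number(range(4), c4_edges)
```

In both small examples the result equals the minimum degree, which matches the remark that optimal sets are often those induced by a single vertex of minimum degree.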

This paper proposes a weighted scheme for elastic graph matching hand posture recognition. Visual features scattered on the elastic graph are assigned corresponding weights according to their relative ability to discriminate between gestures. The weights' values are determined using adaptive boosting. A dictionary representing the variability of each gesture class is expressed in the form of a bunch graph. The positions of the nodes in the bunch graph are determined using three techniques: manually, semi-automatically, and automatically. Experimental results also show that the semi-automatic annotation method is efficient and accurate in terms of three performance measures: assignment cost, accuracy, and transformation error. In terms of the recognition accuracy, our results show that the hierarchical weighting on features has more significant discriminative power than the classic method (uniform weighting). The weighted elastic graph matching (WEGM) approach was used to classify a lexicon of ten hand postures, and it was found that the poses were recognized with a recognition accuracy of 97.08% on average. Using the weighted scheme, computing cycles can be decreased by only computing the features for those nodes whose weight is relatively high and ignoring the remaining nodes. It was found that only 30% of the nodes need to be computed to obtain a recognition accuracy of over 90%.

We propose a new variant of the bit-parallel NFA of Baeza-Yates and Navarro (BPD) for approximate string matching [R. Baeza-Yates, G. Navarro, Faster approximate string matching, Algorithmica 23 (1999) 127-158]. BPD is one of the most practical approximate string matching algorithms under moderate pattern lengths and error levels [G. Myers, A fast bit-vector algorithm for approximate string matching based on dynamic programming, J. ACM 46 (3) (1999) 395-415; G. Navarro, M. Raffinot, Flexible Pattern Matching in Strings—Practical On-line Search Algorithms for Texts and Biological Sequences, Cambridge University Press, Cambridge, UK, 2002]. Given a length-$m$ pattern and an error threshold $k$, the original BPD requires $(m-k)(k+2)$ bits of space to represent an NFA with $(m-k)(k+1)$ states. In this paper we remove redundancy from the original NFA representation. Our variant requires $(m-k)(k+1)$ bits of space, which is optimal in the sense that exactly one bit per state is used. The space efficiency is achieved by using an alternative, but equally or even more efficient, simulation algorithm for the bit-parallel NFA. We also present experimental results to compare our modified NFA against the original BPD and its main competitors. Our new variant is more efficient than the original BPD, and hence it takes over and extends the role of the original BPD as one of the most practical approximate string matching algorithms under moderate values of $k$ and $m$.

We are currently developing unified query processing strategies for image databases. To perform this task, model-based representations of images by content are being used, as well as a hierarchical generalization of a relatively new object-recognition technique called data-driven indexed hypotheses. As the name implies, it is index-based, from which its efficiency derives. Earlier approaches to data-driven model-based object recognition techniques were not capable of handling complex image data containing overlapping, partially visible, and touching objects due to the limitations of the features used for building models. Recently, a few data-driven techniques capable of handling complex image data have been proposed. In these techniques, as in traditional databases, iconic index structures are employed to store the image and shape representation in such a way that searching for a given shape or image feature can be conducted efficiently. Some of these techniques handle the insertion and deletion of shapes and/or image representations very efficiently and with very little influence on the overall system performance. However, the main disadvantage of all previous data-driven implementations is that they are main memory based. In the present paper, we describe a secondary memory implementation of data-driven indexed hypotheses along with some performance studies we have conducted.

