首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 93 毫秒
1.
Data mining in structured and semi-structured data focuses on frequent data values. However, in graph data mining, the focus is on common specific topologies. Graph mining, although its ubiquity, is a difficult task since it requires subgraph isomorphism which is known to be NP-complete. In order to effectively prune the search space and thereby save computational time, a graph mining algorithm requires that the support measure of a pattern to be no greater than that of its subpatterns. This property of the support measure is referred to in the literature as the down-closure, anti-monotonicity or admissibility. Unfortunately, when mining a single labeled graph, simply counting the occurrences of a graph pattern may not have the down-closure property. For this, most existing approaches mine frequent substructures in a set of labeled graphs (called also the transactional setting) and few efforts have been devoted to mining frequent globally distributed substructures in a single labeled graph. In this paper, we propose a graph mining algorithm, called NODAR(Non-Overlapping embeDding based grAph mineR), for computing common and globally distributed substructures in a single labeled graph. NODAR adopts the Depth-First Search (DFS) strategy and is based on the SMNOES (Size of Maximum Non Overlapping Embedding Set) as support measure. The core idea of NODAR is to automatically extract frequent subpatterns; and thus without frequency computation thanks to the down-closure property of SMNOES. By adopting this strategy in the computation of frequent substructures, NODAR reduces the number of subgraph isomorphism tests needed to compute pattern frequencies. Experimental results on monograph and transactional graph databases; and comparison with well-known probabilistic and exact algorithms; prove the efficacy of NODAR.  相似文献   

2.
Given a graph, we define a base set to be a set of integers of size equal to the number of vertices in the graph. Given a graph and a base set, a labeling of the graph from the base set is an assignment of distinct integers from the base set to the vertices of the graph. The gap of an edge in a labeled graph is the absolute value of the difference between the labels of its endpoints. The gap of a labeled graph is the sum of the gaps of its edges.The maximum gap graph labeling problem takes as input a graph and a base set and maximizes the gap of the graph over all possible labelings from the base set. We show that this problem is NP-complete even when the base set is restricted to consecutive integers. We also show that this restricted case has polynomial time approximations that achieve a factor of 2/3 for trees, of 1/2 for bipartite graphs, and of 1/4 for general graphs, with a deterministic algorithm, while an expected factor of 1/3 for general graphs is achieved with a randomized algorithm. The case of general base sets is approximated within an expected factor of 1/16 for general graphs with a randomized polynomial time algorithm. We finally give a polynomial time algorithm that solves the maximum gap graph labeling problem for a graph that has bounded degree and bounded treewidth. The maximum graph labeling problem shows connections with the graceful tree conjecture.  相似文献   

3.
A matching in a graph is a set of edges no two of which share a common vertex. In this paper we introduce a new, specialized type of matching which we call uniquely restricted matchings, originally motivated by the problem of determining a lower bound on the rank of a matrix having a specified zero/ non-zero pattern. A uniquely restricted matching is defined to be a matching M whose saturated vertices induce a subgraph which has only one perfect matching, namely M itself. We introduce the two problems of recognizing a uniquely restricted matching and of finding a maximum uniquely restricted matching in a given graph, and present algorithms and complexity results for certain special classes of graphs. We demonstrate that testing whether a given matching M is uniquely restricted can be done in O(|M||E|) time for an arbitrary graph G=(V,E) and in linear time for cacti, interval graphs, bipartite graphs, split graphs and threshold graphs. The maximum uniquely restricted matching problem is shown to be NP-complete for bipartite graphs, split graphs, and hence for chordal graphs and comparability graphs, but can be solved in linear time for threshold graphs, proper interval graphs, cacti and block graphs. Received April 12, 1998; revised June 21, 1999.  相似文献   

4.
Motivated by a problem of targeted advertising in social networks, we introduce a new model of online learning on labeled graphs where the graph is initially unknown and the algorithm is free to choose which vertex to predict next. For this learning model, we define an appropriate measure of regularity of a graph labeling called the merging degree. In general, the merging degree of a graph is small when its vertices can be partitioned into a few well-separated clusters within which labels are roughly constant. For the special case of binary labeled graphs, the merging degree is a more refined measure than the cutsize. After observing that natural nonadaptive exploration/prediction strategies, like depth-first with majority vote, do not behave satisfactorily on graphs with small merging degree, we introduce an efficiently implementable adaptive strategy whose cumulative loss is controlled by the merging degree. A matching lower bound shows that in the case of binary labels our analysis cannot be improved.  相似文献   

5.
Graph support measures are functions measuring how frequently a given subgraph pattern occurs in a given database graph. An important class of support measures relies on overlap graphs. A major advantage of overlap-graph based approaches is that they combine anti-monotonicity with counting the occurrences of a subgraph pattern which are independent according to certain criteria. However, existing overlap-graph based support measures are expensive to compute. In this paper, we propose a new support measure which is based on a new notion of independence. We show that our measure is the solution to a sparse linear program, which can be computed efficiently using interior point methods. We study the anti-monotonicity and other properties of this new measure, and relate it to the statistical power of a sample of embeddings in a network. We show experimentally that, in contrast to earlier overlap-graph based proposals, our support measure makes it feasible to mine subgraph patterns in large networks.  相似文献   

6.
The notion of a random graph is formally defined. It deals with both the probabilistic and the structural aspects of relational data. By interpreting an ensemble of attributed graphs as the outcomes of a random graph, we can use its lower order distribution to characterize the ensemble. To reflect the variability of a random graph, Shannon's entropy measure is used. To synthesize an ensemble of attributed graphs into the distribution of a random graph (or a set of distributions), we propose a distance measure between random graphs based on the minimum change of entropy before and after their merging. When the ensemble contains more than one class of pattern graphs, the synthesis process yields distributions corresponding to various classes. This process corresponds to unsupervised learning in pattern classification. Using the maximum likelihood rule and the probability computed for the pattern graph, based on its matching with the random graph distributions of different classes, we can classify the pattern graph to a class.  相似文献   

7.
Graph matching and graph edit distance have become important tools in structural pattern recognition. The graph edit distance concept allows us to measure the structural similarity of attributed graphs in an error-tolerant way. The key idea is to model graph variations by structural distortion operations. As one of its main constraints, however, the edit distance requires the adequate definition of edit cost functions, which eventually determine which graphs are considered similar. In the past, these cost functions were usually defined in a manual fashion, which is highly prone to errors. The present paper proposes a method to automatically learn cost functions from a labeled sample set of graphs. To this end, we formulate the graph edit process in a stochastic context and perform a maximum likelihood parameter estimation of the distribution of edit operations. The underlying distortion model is learned using an Expectation Maximization algorithm. From this model we finally derive the desired cost functions. In a series of experiments we demonstrate the learning effect of the proposed method and provide a performance comparison to other models.  相似文献   

8.
Planning graphs have been shown to be a rich source of heuristic information for many kinds of planners. In many cases, planners must compute a planning graph for each element of a set of states, and the naive technique enumerates the graphs individually. This is equivalent to solving a multiple-source shortest path problem by iterating a single-source algorithm over each source.We introduce a data-structure, the state agnostic planning graph, that directly solves the multiple-source problem for the relaxation introduced by planning graphs. The technique can also be characterized as exploiting the overlap present in sets of planning graphs. For the purpose of exposition, we first present the technique in deterministic (classical) planning to capture a set of planning graphs used in forward chaining search. A more prominent application of this technique is in conformant and conditional planning (i.e., search in belief state space), where each search node utilizes a set of planning graphs; an optimization to exploit state overlap between belief states collapses the set of sets of planning graphs to a single set. We describe another extension in conformant probabilistic planning that reuses planning graph samples of probabilistic action outcomes across search nodes to otherwise curb the inherent prediction cost associated with handling probabilistic actions. Finally, we show how to extract a state agnostic relaxed plan that implicitly solves the relaxed planning problem in each of the planning graphs represented by the state agnostic planning graph and reduces each heuristic evaluation to counting the relevant actions in the state agnostic relaxed plan. Our experimental evaluation (using many existing International Planning Competition problems from classical and non-deterministic conformant tracks) quantifies each of these performance boosts, and demonstrates that heuristic belief state space progression planning using our technique is competitive with the state of the art.  相似文献   

9.
We describe a polynomial time algorithm to find a minimum weight feedback vertex set, or equivalently, a maximum weight induced forest, in a circle graph. The circle graphs are the overlap graphs of intervals on a line.  相似文献   

10.
《国际计算机数学杂志》2012,89(10):2118-2141
A graph is clique-perfect if the maximum size of a clique-independent set (a set of pairwise disjoint maximal cliques) and the minimum size of a clique-transversal set (a set of vertices meeting every maximal clique) coincide for each induced subgraph. A graph is balanced if its clique-matrix contains no square submatrix of odd size with exactly two ones per row and column. In this work, we give linear-time recognition algorithms and minimal forbidden induced subgraph characterizations of clique-perfectness and balancedness of P4-tidy graphs and a linear-time algorithm for computing a maximum clique-independent set and a minimum clique-transversal set for any P4-tidy graph. We also give a minimal forbidden induced subgraph characterization and a linear-time recognition algorithm for balancedness of paw-free graphs. Finally, we show that clique-perfectness of diamond-free graphs can be decided in polynomial time by showing that a diamond-free graph is clique-perfect if and only if it is balanced.  相似文献   

11.
We consider the sum coloring and sum multicoloring problems on several fundamental classes of graphs, including the classes of interval and k-claw free graphs. We give an algorithm that approximates sum coloring within a factor of 1.796, for any graph in which the maximum k-colorable subgraph problem is polynomially solvable. In particular, this improves on the previous best known ratio of 2 for interval graphs. We introduce a new measure of coloring, robust throughput}, that indicates how quickly the graph is colored, and show that our algorithm approximates this measure within a factor of 1.4575. In addition, we study the contiguous (or non-preemptive) sum multicoloring problem on k-claw free graphs. This models, for example, the scheduling of dependent jobs on multiple dedicated machines, where each job requires the exclusive use of at most k machines. Assuming that k is a fixed constant, we obtain the first constant factor approximation for the problem.  相似文献   

12.
Community detection and evaluation is an important task in graph mining. In many cases, a community is defined as a subgraph characterized by dense connections or interactions between its nodes. A variety of measures are proposed to evaluate different quality aspects of such communities—in most cases ignoring the directed nature of edges. In this paper, we introduce novel metrics for evaluating the collaborative nature of directed graphs—a property not captured by the single node metrics or by other established community evaluation metrics. In order to accomplish this objective, we capitalize on the concept of graph degeneracy and define a novel D-core framework, extending the classic graph-theoretic notion of $k$ -cores for undirected graphs to directed ones. Based on the D-core, which essentially can be seen as a measure of the robustness of a community under degeneracy, we devise a wealth of novel metrics used to evaluate graph collaboration features of directed graphs. We applied the D-core approach on large synthetic and real-world graphs such as Wikipedia, DBLP, and ArXiv and report interesting results at the graph as well at the node level.  相似文献   

13.
Several graph libraries have been developed in the past few decades, and they were basically designed to work with a few graphs. However, there are many problems in which we have to consider all subgraphs satisfying certain constraints on a given graph. Since the number of subgraphs can increase exponentially with the graph size, explicitly representing these sets is infeasible. Hence, libraries concerned with efficiently representing a single graph instance are not suitable for such problems. In this paper, we develop Graphillion, a software library for very large sets of (vertex-)labeled graphs, based on zero-suppressed binary decision diagrams. Graphillion is not based on a traditional representation of graphs. Instead, a graph set is simply regarded as a “set of edge sets” ignoring vertices, which allows us to employ powerful tools of a “family of sets” (a set of sets) and permits large graph sets to be handled efficiently. We also utilize advanced graph enumeration algorithms, which enable the simple family tools to understand the graph structure. Graphillion is implemented as a Python library to encourage easy development of its applications, without introducing significant performance overheads. In experiments, we consider two case studies, a puzzle solver and a power network optimizer, in which several operations and heavy optimization have to be performed over very large sets of constrained graphs (i.e., cycles or forests with complicated conditions). The results show that Graphillion allows us to manage a huge number of graphs with very low development effort.  相似文献   

14.
In a graph G a matching is a set of edges in which no two edges have a common endpoint. An induced matching is a matching in which no two edges are linked by an edge of G. The maximum induced matching (abbreviated MIM) problem is to find the maximum size of an induced matching for a given graph G. This problem is known to be NP-hard even on bipartite graphs or on planar graphs. We present a polynomial time algorithm which given a graph G either finds a maximum induced matching in G, or claims that the size of a maximum induced matching in G is strictly less than the size of a maximum matching in G. We show that the MIM problem is NP-hard on line-graphs, claw-free graphs, chair-free graphs, Hamiltonian graphs and r-regular graphs for r \geq 5. On the other hand, we present polynomial time algorithms for the MIM problem on (P 5,D m )-free graphs, on (bull, chair)-free graphs and on line-graphs of Hamiltonian graphs.  相似文献   

15.
Partitioning a data set of attributed graphs into clusters arises in different application areas of structural pattern recognition and computer vision. Despite its importance, graph clustering is currently an underdeveloped research area in machine learning due to the lack of theoretical analysis and the high computational cost of measuring structural proximities. To address the first issue, we introduce the concept of metric graph spaces that enables central (or center-based) clustering algorithms to be applied to the domain of attributed graphs. The key idea is to embed attributed graphs into Euclidean space without loss of structural information. In addressing the second issue of computational complexity, we propose a neural network solution of the K-means algorithm for structures (KMS). As a distinguishing feature to improve the computational time, the proposed algorithm classifies the data graphs according to the principle of elimination of competition where the input graph is assigned to the winning model of the competition. In experiments we investigate the behavior and performance of the neural KMS algorithm.  相似文献   

16.
We study the parameterized complexity of several minimum label graph problems, in which we are given an undirected graph whose edges are labeled, and a property Π, and we are asked to find a subset of edges satisfying property Π with respect to G that uses the minimum number of labels. These problems have a lot of applications in networking. We show that all the problems under consideration are W[2]-hard when parameterized by the number of used labels, and that they remain W[2]-hard even on graphs whose pathwidth is bounded above by a small constant. On the positive side, we prove that most of these problems are FPT when parameterized by the solution size, that is, the size of the sought edge set. For example, we show that computing a maximum matching or an edge dominating set that uses the minimum number of labels, is FPT when parameterized by the solution size. Proving that some of these problems are FPT requires interesting algorithmic methods that we develop in this paper.  相似文献   

17.
Co-analyzing a set of 3D shapes is a challenging task considering a large geometrical variability of the shapes. To address this challenge, this paper proposes a new automatic 3D shape co-segmentation algorithm by using spectral graph method.Our method firstly represents input shapes as a set of weighted graphs and extracts multiple geometric features to measure the similarities of faces in each individual shape.Secondly all graphs are embedded into the spectral domain to find meaningful correspondences across the set.After that we build a joint weighted matrix for the graph set and then apply normalized cut criterion to find optimal co-segmentation of the input shapes.Finally we evaluate our approach on different categories of 3D shapes, and the experimental results demonstrate that our method can accurately co-segment a wide variety of shapes, which may have different poses and significant topology changes.  相似文献   

18.
A configuration management database (CMDB) can be considered to be a large graph representing the IT infrastructure entities and their interrelationships. Mining such graphs is challenging because they are large, complex, and multi-attributed and have many repeated labels. These characteristics pose challenges for graph mining algorithms, due to the increased cost of subgraph isomorphism (for support counting) and graph isomorphism (for eliminating duplicate patterns). The notion of pattern frequency or support is also more challenging in a single graph, since it has to be defined in terms of the number of its (potentially, exponentially many) embeddings. We present CMDB-Miner, a novel two-step method for mining infrastructure patterns from CMDB graphs. It first samples the set of maximal frequent patterns and then clusters them to extract the representative infrastructure patterns. We demonstrate the effectiveness of CMDB-Miner on real-world CMDB graphs, as well as synthetic graphs.  相似文献   

19.
Goldreich  Ron 《Algorithmica》2002,32(2):302-343
We further develop the study of testing graph properties as initiated by Goldreich, Goldwasser and Ron. Loosely speaking, given an oracle access to a graph, we wish to distinguish the case when the graph has a pre-determined property from the case when it is ``far'' from having this property. Whereas they view graphs as represented by their adjacency matrix and measure the distance between graphs as a fraction of all possible vertex pairs, we view graphs as represented by bounded-length incidence lists and measure the distance between graphs as a fraction of the maximum possible number of edges. Thus, while the previous model is most appropriate for the study of dense graphs, our model is most appropriate for the study of bounded-degree graphs. In particular, we present randomized algorithms for testing whether an unknown bounded-degree graph is connected, k -connected (for k>1 ), cycle-free and Eulerian. Our algorithms work in time polynomial in 1/? , always accept the graph when it has the tested property, and reject with high probability if the graph is ? -far from having the property. For example, the 2-connectivity algorithm rejects (with high probability) any N -vertex d -degree graph for which more than ? dN edges need to be added in order to make the graph 2-edge-connected. In addition we prove lower bounds of Ω(\sqrt N ) on the query complexity of testing algorithms for the bipartite and expander properties.  相似文献   

20.

Embedded real-time systems generate state sequences where time elapses between state changes. Ensuring that such systems adhere to a provided specification of admissible or desired behavior is essential. Formal model-based testing is often a suitable cost-effective approach. We introduce an extended version of the formalism of symbolic graphs, which encompasses types as well as attributes, for representing states of dynamic systems. Relying on this extension of symbolic graphs, we present a novel formalism of timed graph transformation systems (TGTSs) that supports the model-based development of dynamic real-time systems at an abstract level where possible state changes and delays are specified by graph transformation rules. We then introduce an extended form of the metric temporal graph logic (MTGL) with increased expressiveness to improve the applicability of MTGL for the specification of timed graph sequences generated by a TGTS. Based on the metric temporal operators of MTGL and its built-in graph binding mechanics, we express properties on the structure and attributes of graphs as well as on the occurrence of graphs over time that are related by their inner structure. We provide formal support for checking whether a single generated timed graph sequence adheres to a provided MTGL specification. Relying on this logical foundation, we develop a testing framework for TGTSs that are specified using MTGL. Lastly, we apply this testing framework to a running example by using our prototypical implementation in the tool AutoGraph.

  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号