首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
加权最大频繁子图挖掘算法的研究   总被引:2,自引:1,他引:1       下载免费PDF全文
如何从大量的图中挖掘出令人感兴趣的子图模式已经成为数据挖掘领域研究的热点之一。传统的频繁子图挖掘方法对满足最小支持度阈值的子图同等对待,但在真实数据库中不同的子图往往具有不同的重要程度。为解决上述问题,提出了一种深度优先的挖掘加权最大频繁子图的新算法。首先给出了一种新的用于计算图的邻接矩阵规范编码的结点排序策略,大大降低了求图规范编码的复杂度,并可以加速子图规范编码匹配的速度。其次,给出了加权最大频繁子图的定义,不仅可以找出较为重要的最大频繁子图,而且可以使挖掘结果同样具有反单调性,从而可加速剪枝。实验结果表明,提出的算法不仅可以有效地减少挖掘结果的数量,而且具有较高的效率。  相似文献   

2.
Bicliques are widely used to solve various real-world problems encountered in bio-informatics, data mining and networks. We consider c-isolated bicliques, a variation of bicliques. The c-isolated bicliques can model communities in social and biological networks. We propose an efficient algorithm to generate all c-isolated maximal bicliques in a given bipartite graph. Our algorithm exploits underlying properties of an isolated biclique to trim the input graph. Furthermore, the algorithm deploys the vertex cover enumeration algorithm based on fixed parameter tractability and lists all maximal c-isolated bicliques in linear time (in terms of graph size), for constant c.  相似文献   

3.
We study the polyhedral properties of three problems of constructing an optimal complete bipartite subgraph (a biclique) in a bipartite graph. In the first problem, we consider a balanced biclique with the same number of vertices in both parts and arbitrary edge weights. In the other two problems we are dealing with unbalanced subgraphs of maximum and minimum weight with non-negative edges. All three problems are established to be NP-hard. We study the polytopes and the cone decompositions of these problems and their 1-skeletons. We describe the adjacency criterion in the 1-skeleton of the polytope of the balanced complete bipartite subgraph problem. The clique number of the 1-skeleton is estimated from below by a superpolynomial function. For both unbalanced biclique problems we establish the superpolynomial lower bounds on the clique numbers of the graphs of nonnegative cone decompositions. These values characterize the time complexity in a broad class of algorithms based on linear comparisons.  相似文献   

4.
We combine the idea of confluent drawings with Sugiyama-style drawings in order to reduce the edge crossings in the resultant drawings. Furthermore, it is easier to understand the structures of graphs from the mixed-style drawings. The basic idea is to cover a layered graph by complete bipartite subgraphs (bicliques), then replace bicliques with tree-like structures. The biclique cover problem is reduced to a special edge-coloring problem and solved by heuristic coloring algorithms. Our method can be extended to obtain multi-depth confluent layered drawings.  相似文献   

5.
为减少频繁子图规范化检测的时间复杂度,对规范化邻接矩阵的相关性质进行分析。给出相关定理并证明其正确性,从而减少冗余候选子图的产生。在此基础上,提出一种频繁子图挖掘算法——FSM_CAM。实验结果证明,与现有频繁子图挖掘算法FSubGraphM相比,FSM_CAM算法的效率较高。  相似文献   

6.
时序图是一种边上带有时间戳的图结构,其中边上的时间戳表示该边出现时间,即图随时间变化不断变化.图数据中的稠密子图挖掘问题具有非常强烈的现实意义.目前,时序图中大多数现有的工作都集中在稠密子图检测问题,该问题目标是找到时序图中所有的目标子图.然而,当时序图的规模过大时,这一问题将变得极其复杂且收效甚微.旨在研究在时序图中...  相似文献   

7.
We represent a new method for finding all connected maximal common subgraphs in two graphs which is based on the transformation of the problem into the clique problem. We have developed new algorithms for enumerating all cliques that represent connected maximal common subgraphs. These algorithms are based on variants of the Bron–Kerbosch algorithm. In this paper we explain the transformation of the maximal common subgraph problem into the clique problem. We give a short summary of the variants of the Bron–Kerbosch algorithm in order to explain the modification of that algorithm such that the detected cliques represent connected maximal common subgraphs. After introducing and proving several variants of the modified algorithm we discuss the runtimes for all variants by means of random graphs. The results show the drastical reduction of the runtimes for the new algorithms.  相似文献   

8.
Frequent subgraphs proved to be powerful features for graph classification and prediction tasks. Their practical use is, however, limited by the computational intractability of pattern enumeration and that of graph embedding into frequent subgraph feature spaces. We propose a simple probabilistic technique that resolves both limitations. In particular, we restrict the pattern language to trees and relax the demand on the completeness of the mining algorithm, as well as on the correctness of the pattern matching operator by replacing transaction and query graphs with small random samples of their spanning trees. In this way we consider only a random subset of frequent subtrees, called probabilistic frequent subtrees, that can be enumerated efficiently. Our extensive empirical evaluation on artificial and benchmark molecular graph datasets shows that probabilistic frequent subtrees can be listed in practically feasible time and that their predictive and retrieval performance is very close even to those of complete sets of frequent subgraphs. We also present different fast techniques for computing the embedding of unseen graphs into (probabilistic frequent) subtree feature spaces. These algorithms utilize the partial order on tree patterns induced by subgraph isomorphism and, as we show empirically, require much less evaluations of subtree isomorphism than the standard brute-force algorithm. We also consider partial embeddings, i.e., when only a part of the feature vector has to be calculated. In particular, we propose a highly effective practical algorithm that significantly reduces the number of pattern matching evaluations required by the classical min-hashing algorithm approximating Jaccard-similarities.  相似文献   

9.
The output of frequent pattern mining is a huge number of frequent patterns, which are very redundant, causing a serious problem in understandability. We focus on mining frequent subgraphs for which well-considered approaches to reduce the redundancy are limited because of the complex nature of graphs. Two known, standard solutions are closed and maximal frequent subgraphs, but closed frequent subgraphs are still redundant and maximal frequent subgraphs are too specific. A more promising solution is δ-tolerance closed frequent subgraphs, which decrease monotonically in δ, being equal to maximal frequent subgraphs and closed frequent subgraphs for δ=0 and 1, respectively. However, the current algorithm for mining δ-tolerance closed frequent subgraphs is a naive, two-step approach in which frequent subgraphs are all enumerated and then sifted according to δ-tolerance closedness. We propose an efficient algorithm based on the idea of “reverse-search” by which the completeness of enumeration is guaranteed and for which new pruning conditions are incorporated. We empirically demonstrate that our approach significantly reduced the amount of real computation time of two compared algorithms for mining δ-tolerance closed frequent subgraphs, being pronounced more for practical settings.  相似文献   

10.
乔连鹏  侯会文  王国仁 《软件学报》2023,34(3):1277-1291
近年来,异质信息网络上的社区搜索问题已经吸引了越来越多的关注,而且被广泛应用在图数据分析工作中.但是现有异质信息网络上的社区搜索问题都没有考虑子图上属性的公平性.将属性的公平性与异质信息网络上的kPcore挖掘问题相结合,提出了基于属性公平的异质信息网络上的极大core挖掘问题.针对该问题,首先提出了一个子图模型FkPcore.当对FkPcore进行枚举时,基础算法Basic-FkPcore遍历了所有路径实例,并枚举了大量k Pcore及其子图.为了提高算法效率,提出了Adv-FkPcore算法,以避免在枚举FkPcore时对所有的kPcore及其子图进行判断.另外,为了提高点的P_neighbor的获取效率,提出了结合点标记的遍历方法(traversalmethod with vertex sign, TMS),并基于TMS算法提出了FkPcore枚举算法Opt-FkPcore.在异质信息网络数据集上进行的大量实验证明了所提方法的有效性和效率.  相似文献   

11.
We present polynomial time algorithms for induced biclique optimization problems in the following families of graphs: polygon-circle graphs, 4-hole-free graphs, complements of interval-filament graphs and complements of subtree-filament graphs. Such problems are to find maximum: induced bicliques, induced balanced bicliques and induced edge bicliques. These problems have applications for biclique clustering of proteins by PPI criteria, of documents, and of web pages.  相似文献   

12.
A significant number of applications require effective and efficient manipulation of relational graphs, towards discovering important patterns. Some example applications are: (i) analysis of microarray data in bioinformatics, (ii) pattern discovery in a large graph representing a social network, (iii) analysis of transportation networks, (iv) community discovery in Web data. The basic approach followed by existing methods is to apply mining techniques on graph data to discover important patterns, such as subgraphs that are likely to be useful. However, in some cases the number of mined patterns is large, posing difficulties in selecting the most important ones. For example, applying frequent subgraph mining on a set of graphs the system returns all connected subgraphs whose frequency is above a specified (usually user-defined) threshold. The number of discovered patterns may be large, and this number depends on the data characteristics and the frequency threshold specified. It would be more convenient for the user if “goodness” criteria could be set to evaluate the usefulness of these patterns, and if the user could provide preferences to the system regarding the characteristics of the discovered patterns. In this paper, we propose a methodology to support such preferences by applying subgraph discovery in relational graphs towards retrieving important connected subgraphs. The importance of a subgraph is determined by: (i) the order of the subgraph (the number of vertices) and (ii) the subgraph edge connectivity. The performance of the proposed technique is evaluated by using real-life as well as synthetically generated data sets.  相似文献   

13.
图挖掘已成为数据挖掘领域研究的热点,然而挖掘全部频繁子图很困难且得到的频繁子图过多,影响结果的理解和应用。可通过挖掘最大频繁子图来解决挖掘结果数量巨大的问题,最大频繁子图挖掘得到的结果数量很少且不丢失信息,节省了空间和以后的分析工作。基于算法FSG提出了最大频繁子图挖掘算法FSG-MaxGraph;结合节点的度、标记及邻接列表来计算规范编码,提出两个定理来减少子图同构判断的次数,并应用改进后的决策树来计算支持度。实验证明,新算法解决了挖掘结果太多理解困难的问题,且提高了挖掘效率。  相似文献   

14.
随着图的广泛应用,图的规模不断扩大,因此提高频繁子图挖掘效率势在必行。本文针对频繁子图挖掘所产生的庞大的结果集,提出了一个最大频繁子图挖掘算法MFME,从而极大地减少了结果集的数量。MFME使用了映射的思想将图集中的边映射到边表中并在此表上进行子图挖掘,有效地提高了算法的效率。实验结果表明,MFME的效率较经典算法SPIN有明显提高。  相似文献   

15.
敦景峰  张伟  柴然 《计算机工程》2011,37(20):27-29
传统Aprior频繁子图挖掘算法中存在大量冗余子图.针对该问题,提出一种新的频繁子图挖掘算法(GAI).介绍一种三层MADI索引结构,用于存储图集的信息,以减少图集的扫描次数,通过扩展ETree树构造频繁子图,并用表来存储候选子图,避免扩展过程中冗余图的产生以及对整个数据库的扫描,从而简化支持度的计算,提高图/子图同构...  相似文献   

16.
Maximal prime subgraph decomposition of Bayesian networks   总被引:1,自引:0,他引:1  
The authors present a method for decomposition of Bayesian networks into their maximal prime subgraphs. The correctness of the method is proven and results relating the maximal prime subgraph decomposition (MPD) to the maximal complete subgraphs of the moral graph of the original Bayesian network are presented. The maximal prime subgraphs of a Bayesian network can be organized as a tree which can be used as the computational structure for LAZY propagation. We also identify a number of tasks performed on Bayesian networks that can benefit from MPD. These tasks are: divide and conquer triangulation, hybrid propagation algorithms combining exact and approximative inference techniques, and incremental construction of junction trees. We compare the proposed algorithm with standard algorithms for decomposition of undirected graphs into their maximal prime subgraphs. The discussion shows that the proposed algorithm is simpler, more easy to comprehend, and it has the same complexity as the standard algorithms.  相似文献   

17.
Finding maximal homogeneous clique sets   总被引:1,自引:0,他引:1  
Many datasets can be encoded as graphs with sets of labels associated with the vertices. We consider this kind of graphs and we propose to look for patterns called maximal homogeneous clique sets, where such a pattern is a subgraph that is structured in several large cliques and where all vertices share enough labels. We present an algorithm based on graph enumeration to compute all patterns satisfying user-defined constraints on the number of separated cliques, on the size of these cliques, and on the number of labels shared by all the vertices. Our approach is tested on real datasets based on a social network of scientific collaborations and on a biological network of protein–protein interactions. The experiments show that the patterns are useful to exhibit subgraphs organized in several core modules of interactions. Performances are reported on real data and also on synthetic ones, showing that the approach can be applied on different kinds of large datasets.  相似文献   

18.
Bicliques of graphs have been studied extensively, partially motivated by the large number of applications. In this paper we improve Prisner’s upper bound on the number of maximal bicliques (Combinatorica, 20, 109–117, 2000) and show that the maximum number of maximal bicliques in a graph on n vertices is Θ(3 n/3). Our major contribution is an exact exponential-time algorithm. This branching algorithm computes the number of distinct maximal independent sets in a graph in time O(1.3642 n ), where n is the number of vertices of the input graph. We use this counting algorithm and previously known algorithms for various other problems related to independent sets to derive algorithms for problems related to bicliques via polynomial-time reductions.  相似文献   

19.
We show that three subclasses of bounded treewidth graphs are well quasi ordered by refinements of the minor order. Specifically, we prove that graphs with bounded vertex cover are well quasi ordered by the induced subgraph order, graphs with bounded feedback vertex set are well quasi ordered by the topological-minor order, and graphs with bounded circumference are well quasi ordered by the induced minor order. Our results give algorithms for recognizing any graph family in these classes which is closed under the corresponding minor order refinement.  相似文献   

20.
A configuration management database (CMDB) can be considered to be a large graph representing the IT infrastructure entities and their interrelationships. Mining such graphs is challenging because they are large, complex, and multi-attributed and have many repeated labels. These characteristics pose challenges for graph mining algorithms, due to the increased cost of subgraph isomorphism (for support counting) and graph isomorphism (for eliminating duplicate patterns). The notion of pattern frequency or support is also more challenging in a single graph, since it has to be defined in terms of the number of its (potentially, exponentially many) embeddings. We present CMDB-Miner, a novel two-step method for mining infrastructure patterns from CMDB graphs. It first samples the set of maximal frequent patterns and then clusters them to extract the representative infrastructure patterns. We demonstrate the effectiveness of CMDB-Miner on real-world CMDB graphs, as well as synthetic graphs.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号