首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
从不确定图中挖掘频繁子图模式   总被引:8,自引:0,他引:8  
邹兆年  李建中  高宏  张硕 《软件学报》2009,20(11):2965-2976
研究不确定图数据的挖掘,主要解决不确定图数据的频繁子图模式挖掘问题.介绍了一种数据模型来表示图的不确定性,以及一种期望支持度来评价子图模式的重要性.利用期望支持度的Apriori性质,给出了一种基于深度优先搜索策略的挖掘算法.该算法使用高效的期望支持度计算方法和搜索空间裁剪技术,使得计算子图模式的期望支持度所需的子图同构测试的数量从指数级降低到线性级.实验结果表明,该算法比简单的深度优先搜索算法快3~5个数量级,有很高的效率和可扩展性.  相似文献   

2.
敦景峰  张伟  柴然 《计算机工程》2011,37(20):27-29
传统Aprior频繁子图挖掘算法中存在大量冗余子图.针对该问题,提出一种新的频繁子图挖掘算法(GAI).介绍一种三层MADI索引结构,用于存储图集的信息,以减少图集的扫描次数,通过扩展ETree树构造频繁子图,并用表来存储候选子图,避免扩展过程中冗余图的产生以及对整个数据库的扫描,从而简化支持度的计算,提高图/子图同构...  相似文献   

3.
Mining globally distributed frequent subgraphs in a single labeled graph   总被引:1,自引:0,他引:1  
Xing  Hui  Chen  Ah-Hwee   《Data & Knowledge Engineering》2009,68(10):1034-1058
Recent years have observed increasing efforts on graph mining and many algorithms have been developed for this purpose. However, most of the existing algorithms are designed for discovering frequent subgraphs in a set of labeled graphs only. Also, the few algorithms that find frequent subgraphs in a single labeled graph typically identify subgraphs appearing regionally in the input graph. In contrast, for real-world applications, it is commonly required that the identified frequent subgraphs in a single labeled graph should also be globally distributed. This paper thus fills this crucial void by proposing a new measure, termed G-Measure, to find globally distributed frequent subgraphs, called G-Patterns, in a single labeled graph. Specifically, we first show that the G-Patterns, selected by G-Measure, tend to be globally distributed in the input graph. Then, we present that G-Measure has the downward closure property, which guarantees the G-Measure value of a G-Pattern is not less than those of its supersets. Consequently, a G-Miner algorithm is developed for finding G-Patterns. Experimental results on four synthetic and seven real-world data sets and comparison with the existing algorithms demonstrate the efficacy of the G-Measure and the G-Miner for finding G-Patterns. Finally, an application of the G-Patterns is given.  相似文献   

4.
A configuration management database (CMDB) can be considered to be a large graph representing the IT infrastructure entities and their interrelationships. Mining such graphs is challenging because they are large, complex, and multi-attributed and have many repeated labels. These characteristics pose challenges for graph mining algorithms, due to the increased cost of subgraph isomorphism (for support counting) and graph isomorphism (for eliminating duplicate patterns). The notion of pattern frequency or support is also more challenging in a single graph, since it has to be defined in terms of the number of its (potentially, exponentially many) embeddings. We present CMDB-Miner, a novel two-step method for mining infrastructure patterns from CMDB graphs. It first samples the set of maximal frequent patterns and then clusters them to extract the representative infrastructure patterns. We demonstrate the effectiveness of CMDB-Miner on real-world CMDB graphs, as well as synthetic graphs.  相似文献   

5.
社会网络上的模式挖掘是近年来的研究热点之一,合作模式是社会网络上个体间的合作方式,这种模式可以通过社会网络的子结构表示。已有的基于频繁模式的挖掘算法主要考虑合作关系的结构特征,并且往往需要给定支持度阈值来控制结果的规模。在本文中,我们认为社会网络中的模式不一定需要是频繁的,模式与社区也并不需要精确匹配。我们在合作模式中考虑节点的社会地位,并在加权图上给出了一种模式的定义方法,和一种基于互相似性的模式匹配衡量标准,目的在于找出网络中具有"代表性"的合作模式。我们设计了一种基于距离的聚类方法用于抽取这种模式,并在一个大规模的真实数据集上进行了验证。  相似文献   

6.
The concept of support is central to data mining. While the definition of support in transaction databases is intuitive and simple, that is not the case in graph datasets and databases. Most mining algorithms require the support of a pattern to be no greater than that of its subpatterns, a property called anti-monotonicity, or admissibility. This paper examines the requirements for admissibility of a support measure. Support measures for mining graphs are usually based on the notion of an instance graph---a graph representing all the instances of the pattern in a database and their intersection properties. Necessary and sufficient conditions for support measure admissibility, based on operations on instance graphs, are developed and proved. The sufficient conditions are used to prove admissibility of one support measure—the size of the independent set in the instance graph. Conversely, the necessary conditions are used to quickly show that some other support measures, such as weighted count of instances, are not admissible. *Partially supported by the KITE consortium under contract to the Israeli Ministry of Trade and Industry, and by the Paul Ivanier Center for Robotics and Production Management.  相似文献   

7.
鉴于图结构能简单方便地描绘复杂的数据以及实际应用中图数据的获得具有不确定性,不确定频繁子图挖掘算法得到广泛的研究。目前一个典型的图挖掘算法是MUSE,但MUSE算法存在期望支持度计算消耗大、时间效率不够高等问题。针对此问题提出了一种基于划分思想混合搜索策略的不确定子图挖掘算法EDFS,它用改进过的GSpan算法进行不确定的子图数据预处理,用裁剪子图模式的搜索空间裁剪不确定子图数据,用基于划分思想的混合策略进行频繁子图的挖掘。子图同构与边存在概率的实验结果证明了EDFS算法能更高效地挖掘出不确定数据频繁子图。  相似文献   

8.
不同时刻的动态网络往往具有不同权重,针对加权动态网络的频繁模式挖掘,提出一种挖掘算法WGDM,它适用于加权动态社会网络、生物网络等方面的频繁模式挖掘。WGDM算法利用支持度的反单调性裁剪搜索空间,从而减少冗余候选子图,提高算法效率。通过实验测试了WGDM算法的性能,并根据中国实际股票市场网络,利用WGDM算法挖掘股票市场网络中有趣的频繁模式。  相似文献   

9.
Frequent subgraphs proved to be powerful features for graph classification and prediction tasks. Their practical use is, however, limited by the computational intractability of pattern enumeration and that of graph embedding into frequent subgraph feature spaces. We propose a simple probabilistic technique that resolves both limitations. In particular, we restrict the pattern language to trees and relax the demand on the completeness of the mining algorithm, as well as on the correctness of the pattern matching operator by replacing transaction and query graphs with small random samples of their spanning trees. In this way we consider only a random subset of frequent subtrees, called probabilistic frequent subtrees, that can be enumerated efficiently. Our extensive empirical evaluation on artificial and benchmark molecular graph datasets shows that probabilistic frequent subtrees can be listed in practically feasible time and that their predictive and retrieval performance is very close even to those of complete sets of frequent subgraphs. We also present different fast techniques for computing the embedding of unseen graphs into (probabilistic frequent) subtree feature spaces. These algorithms utilize the partial order on tree patterns induced by subgraph isomorphism and, as we show empirically, require much less evaluations of subtree isomorphism than the standard brute-force algorithm. We also consider partial embeddings, i.e., when only a part of the feature vector has to be calculated. In particular, we propose a highly effective practical algorithm that significantly reduces the number of pattern matching evaluations required by the classical min-hashing algorithm approximating Jaccard-similarities.  相似文献   

10.
图挖掘已成为数据挖掘领域研究的热点,然而挖掘全部频繁子图很困难且得到的频繁子图过多,影响结果的理解和应用。可通过挖掘最大频繁子图来解决挖掘结果数量巨大的问题,最大频繁子图挖掘得到的结果数量很少且不丢失信息,节省了空间和以后的分析工作。基于算法FSG提出了最大频繁子图挖掘算法FSG-MaxGraph;结合节点的度、标记及邻接列表来计算规范编码,提出两个定理来减少子图同构判断的次数,并应用改进后的决策树来计算支持度。实验证明,新算法解决了挖掘结果太多理解困难的问题,且提高了挖掘效率。  相似文献   

11.
已提出很多图分类方法。这些方法在挖掘频繁子图时,只考虑了子图的结构信息,没有考虑子图的嵌入信息。实际上,有些频繁子图挖掘算法在计算子图的支持度时,可以获得嵌入信息。在L-CCAM子图编码的基础上,提出了一种基于嵌入集的图分类方法。该方法采用基于类别信息的特征子图选择策略,充分利用嵌入集,在频繁子图挖掘过程中直接选择特征子图。通过实验表明,该方法是有效的、可行的。  相似文献   

12.
Discovering Frequent Graph Patterns Using Disjoint Paths   总被引:3,自引:0,他引:3  
Whereas data mining in structured data focuses on frequent data values, in semistructured and graph data mining, the issue is frequent labels and common specific topologies. The structure of the data is just as important as its content. We study the problem of discovering typical patterns of graph data, a task made difficult because of the complexity of required subtasks, especially subgraph isomorphism. In this paper, we propose a new apriori-based algorithm for mining graph data, where the basic building blocks are relatively large, disjoint paths. The algorithm is proven to be sound and complete. Empirical evidence shows practical advantages of our approach for certain categories of graphs  相似文献   

13.
从图数据库中挖掘频繁跳跃模式   总被引:4,自引:0,他引:4  
刘勇  李建中  高宏 《软件学报》2010,21(10):2477-2493
很多频繁子图挖掘算法已被提出.然而,这些算法产生的频繁子图数量太多而不能被用户有效地利用.为此,提出了一个新的研究问题:挖掘图数据库中的频繁跳跃模式.挖掘频繁跳跃模式既可以大幅度地减少输出模式的数量,又能使有意义的图模式保留在挖掘结果中.此外,跳跃模式还具有抗噪声干扰能力强等优点.然而,由于跳跃模式不具有反单调性质,挖掘它们非常具有挑战性.通过研究跳跃模式自身的特性,提出了两种新的裁剪技术:基于内扩展的裁剪和基于外扩展的裁剪.在此基础上又给出了一种高效的挖掘算法GraphJP(an algorithm for mining jump patterns from graph databases).另外,还严格证明了裁剪技术和算法GraphJP的正确性.实验结果表明,所提出的裁剪技术能够有效地裁剪图模式搜索空间,算法GraphJP是高效、可扩展的.  相似文献   

14.
ESPM--频繁子树挖掘算法   总被引:13,自引:2,他引:13  
随着互联网的发展,频繁模式的挖掘由频繁项集扩展到结构化数据:树和图.在这些结构上的挖掘工作被应用于更为复杂的领域,比如生物信息学、网络日志和XML文档.提出了一个新颖的算法:ESPM,以挖掘有序标号树中的频繁子树.不同于以往的工作,把树同构的判断工作放到了算法的晚期,从而减少了整个挖掘过程的时间开销.人工数据集和真实数据集上的实验都证明ESPM相较于其他算法的优越性.还提出了一些可能的改进.  相似文献   

15.
频繁子图挖掘是各种图挖掘的基础和瓶颈,为了提高频繁子图挖掘算法的效率,在频繁闭图方法的基础上提出了一种新算法BPCG.首先使用了一种新结构表存储频繁子图集,从而不需扫描图集就可直接扩展最频繁邻接边及计算支持度阈值;然后算法又利用兄弟剪枝策略和删除局部频繁边,缩小搜索空间并减少不必要的操作.通过实验证明,算法优于其他子图挖掘算法.  相似文献   

16.
频繁子图挖掘算法研究   总被引:3,自引:1,他引:2  
图像能表达丰富语义,但增加了数据结构的复杂性和感兴趣子结构的挖掘难度。综合应用图论知识和数据挖掘的各种技术,对图像进行规范化编码,通过连接和扩展操作产生所有候选子图,引用嵌入集概念,计算候选子图的支持度和频繁度。提出频繁子图挖掘算法FSubgraphM,能从图数据库中挖掘频繁导出子图。  相似文献   

17.
子图查询返回图数据集合中所有包含查询图的数据图。在查询图和数据图同时为不确定性图的前提下,提出了不确定图间的期望子图同构定义和α-β子图同构匹配定义。不确定图间的期望子图同构是确定图上子图同构在概率图模型上的直接推广,不确定图间α-β子图同构利用两个限制阈值来衡量查询图和数据图间的匹配质量。文章详细阐述了α-β子图同构匹配的语义特点,分析了其和期望子图同构的联系和差别,设计实现α-β子图同构匹配判定算法。  相似文献   

18.
The frequent connected subgraph mining problem, i.e., the problem of listing all connected graphs that are subgraph isomorphic to at least a certain number of transaction graphs of a database, cannot be solved in output polynomial time in the general case. If, however, the transaction graphs are restricted to forests then the problem becomes tractable. In this paper we generalize the positive result on forests to graphs of bounded tree-width. In particular, we show that for this class of transaction graphs, frequent connected subgraphs can be listed in incremental polynomial time. Since subgraph isomorphism remains NP-complete for bounded tree-width graphs, the positive complexity result of this paper shows that efficient frequent pattern mining is possible even for computationally hard pattern matching operators.  相似文献   

19.
Frequent subgraph mining from a tremendous amount of small graphs is a primitive operation for many data mining applications. Existing approaches mainly focus on centralized systems and suffer from the scalability issue. Consider the increasing volume of graph data and mining frequent subgraphs is a memory-intensive task, it is difficult to tackle this problem on a centralized machine efficiently. In this paper, we therefore propose an efficient and scalable solution, called MRFSE, using MapReduce. MRFSE adopts the breadth-first search strategy to iteratively extract frequent subgraphs, i.e., all frequent subgraphs with \(i+1\) edges are generated based on frequent subgraphs with i edges at the ith iteration. In our design, existing frequent subgraph mining techniques in centralized systems can be easily extended and integrated. More importantly, new frequent subgraphs are generated without performing any isomorphism test which is costly and imperative in existing frequent subgraph mining techniques. Besides, various optimization techniques are proposed to further reduce the communication and I/O cost. Extensive experiments conducted on our in-house clusters demonstrate the superiority of our proposed solution in terms of both scalability and efficiency.  相似文献   

20.
本文以标记有序树作为半结构化数据的数据模型 ,研究了半结构化数据的树状最大频繁模式挖掘问题 .已有挖掘算法通常挖掘所有频繁模式 ,其中很多模式为其它模式的子模式 ,针对该问题 ,设计实现了一种最大模式挖掘算法 .该算法采用最右扩展枚举方法无重复枚举所有候选模式 ,利用频繁模式扩展森林实现高效剪枝扩展和挖掘频繁叶模式 ,通过计算频繁叶模式间的包含关系挖掘树状最大频繁模式 .试验结果表明该算法具有良好性能  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号