1.
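Because graph structures describe complex data simply and conveniently, and because graph data obtained in real applications is often uncertain, algorithms for mining frequent subgraphs from uncertain graphs have been widely studied. MUSE is a typical algorithm of this kind, but its expected-support computation is expensive and its time efficiency is limited. To address this, the paper proposes EDFS, an uncertain subgraph mining algorithm built on a partition-based hybrid search strategy. EDFS preprocesses the uncertain subgraph data with a modified GSpan algorithm, prunes that data by pruning the search space of subgraph patterns, and mines frequent subgraphs with the partition-based hybrid strategy. Experiments on subgraph isomorphism and edge existence probability show that EDFS mines frequent subgraphs from uncertain data more efficiently.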
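As a rough illustration of why expected support is costly, the sketch below computes it for the simplest possible case: a single-edge pattern, with every edge assumed to exist independently. The data layout and the function names `occurrence_probability` and `expected_support` are hypothetical and do not come from MUSE or EDFS; larger patterns would require reasoning over many embeddings, which is where the cost grows.

```python
from math import prod


def occurrence_probability(edge_probs):
    """Probability that at least one of several independent edges exists."""
    return 1.0 - prod(1.0 - p for p in edge_probs)


def expected_support(database, pattern):
    """Expected fraction of uncertain graphs that contain a single-edge pattern.

    database: list of uncertain graphs, each a list of (label_u, label_v, prob).
    pattern:  an unordered pair of vertex labels, e.g. ("A", "B").
    Assumption: edges exist independently of one another.
    """
    target = frozenset(pattern)
    total = 0.0
    for graph in database:
        # Existence probabilities of all edges that match the pattern's labels.
        probs = [p for (a, b, p) in graph if frozenset((a, b)) == target]
        total += occurrence_probability(probs)
    return total / len(database)


db = [
    [("A", "B", 0.5), ("A", "B", 0.5), ("B", "C", 0.9)],
    [("A", "C", 0.8)],
]
print(expected_support(db, ("A", "B")))
```

By linearity of expectation, the expected support is the mean, over graphs, of the probability that the pattern occurs in each graph, so no possible world has to be enumerated in this special case.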
2.
3.
4.
5.
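Graph mining algorithms in the big-data era must avoid subgraph isomorphism testing. To that end, the paper uses an information propagation model from social networks to mine neighborhood frequent patterns in a single large graph. It first computes the association strength of each node label with respect to neighboring nodes, then uses the joint probability distribution to compute the probabilistic support of a set of node labels. With probabilistic support as the criterion, it applies an improved inverted matrix + co-occurrence frequent item tree (COFI-tree) mining algorithm to mine frequent itemsets from the transaction dataset formed by the itemsets of each node's labels. Experimental analysis shows that the method is faster than traditional frequent subgraph mining algorithms for a single large graph, returns more results, and finds interesting patterns that traditional algorithms cannot. Compared with FP-tree-based frequent pattern mining, the inverted matrix + COFI-tree approach supports large-scale datasets and uses memory more efficiently.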
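The idea of replacing isomorphism tests with itemset mining can be sketched as follows. This is a simplified stand-in, not the paper's method: it uses plain frequency counts instead of probabilistic support, and brute-force enumeration instead of the inverted matrix and COFI-tree. The names `neighborhood_transactions` and `frequent_itemsets` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def neighborhood_transactions(labels, edges):
    """Build one transaction per node: its own label plus its neighbors' labels."""
    items = {node: {label} for node, label in labels.items()}
    for u, v in edges:
        items[u].add(labels[v])
        items[v].add(labels[u])
    return [frozenset(s) for s in items.values()]


def frequent_itemsets(transactions, min_support, max_size=3):
    """Count every label set up to max_size and keep those that are frequent.

    Assumption: brute-force counting stands in for a real itemset miner.
    """
    counts = Counter()
    for t in transactions:
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(t), k):
                counts[combo] += 1
    n = len(transactions)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


labels = {1: "a", 2: "b", 3: "a", 4: "c"}
edges = [(1, 2), (2, 3), (3, 4)]
tx = neighborhood_transactions(labels, edges)
result = frequent_itemsets(tx, min_support=0.75)
print(result)
```

Each transaction is only a set of labels, so checking whether a pattern occurs reduces to a subset test rather than a graph matching problem.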
6.
7.
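How to mine interesting subgraph patterns from a large number of graphs has become a hot topic in data mining research. Traditional frequent subgraph mining methods treat all subgraphs that meet the minimum support threshold equally, but in real databases different subgraphs often differ in importance. To address this, the paper proposes a new depth-first algorithm for mining weighted maximal frequent subgraphs. First, it gives a new node ordering strategy for computing the canonical code of a graph's adjacency matrix, which greatly reduces the complexity of computing canonical codes and speeds up canonical-code matching of subgraphs. Second, it defines weighted maximal frequent subgraphs; the definition both identifies the more important maximal frequent subgraphs and preserves anti-monotonicity in the mining results, which accelerates pruning. Experimental results show that the proposed algorithm effectively reduces the number of mined results and is also efficient.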
8.
9.
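Graph patterns are widely used to identify feature spaces for building efficient graph classification models. A synergy graph pattern is a graph structure whose internal nodes are highly correlated; compared with ordinary graph patterns, it has greater discriminative power and is therefore better suited to feature selection for classification models. The paper studies the problem of mining non-redundant synergy graph patterns from two-class graph data. By imposing the non-redundancy property that a synergy graph pattern's discriminative power must be far higher than that of all its subgraph patterns, it greatly reduces the number of mined results while retaining the synergy graph patterns with strong discriminative power. Because in principle all subgraphs of a synergy graph pattern must be checked against the constraint, mining these patterns is computationally challenging. Based on several properties of non-redundant synergy graph patterns, the paper proposes corresponding pruning rules; using bound estimates of discriminative power, it proposes two fast methods for detecting non-redundant synergy graph patterns, and on this basis gives an efficient depth-first mining algorithm, GINS. Experiments on a large number of real and synthetic datasets show that GINS clearly outperforms two other representative algorithms, and that non-redundant synergy graph patterns achieve high classification accuracy when used as features for graph classification models.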
10.
11.
Mining Frequent Jump Patterns from Graph Databases (total citations: 4; self-citations: 0; citations by others: 4)
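Many frequent subgraph mining algorithms have been proposed. However, they produce too many frequent subgraphs for users to exploit effectively. The paper therefore poses a new research problem: mining frequent jump patterns in graph databases. Mining frequent jump patterns greatly reduces the number of output patterns while keeping meaningful graph patterns in the results. Jump patterns are also highly robust to noise. However, because jump patterns are not anti-monotone, mining them is challenging. By studying the properties of jump patterns, the paper proposes two new pruning techniques: pruning based on internal extension and pruning based on external extension. On this basis it gives an efficient mining algorithm, GraphJP (an algorithm for mining jump patterns from graph databases). It also rigorously proves the correctness of the pruning techniques and of GraphJP. Experimental results show that the proposed pruning techniques effectively prune the graph pattern search space and that GraphJP is efficient and scalable.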
12.
In this paper, we propose an efficient graph-based mining (GBM) algorithm for mining the frequent trajectory patterns in a spatial-temporal database. The proposed method comprises two phases. First, we scan the database once to generate a mapping graph and trajectory information lists (TI-lists). Then, we traverse the mapping graph in a depth-first search manner to mine all frequent trajectory patterns in the database. By using the mapping graph and TI-lists, the GBM algorithm can localize support counting and pattern extension in a small number of TI-lists. Moreover, it utilizes the adjacency property to reduce the search space. Therefore, our proposed method can efficiently mine the frequent trajectory patterns in the database. The experimental results show that it outperforms the Apriori-based and PrefixSpan-based methods by more than one order of magnitude.
13.
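Frequent subgraph mining is an important research topic in graph mining. gSpan is an efficient subgraph mining algorithm that generates frequent subgraphs by rightmost extension, but it cannot guarantee that every extension yields a canonical code. To address this, the paper proposes an improved algorithm, CSGM. It uses the ADI++ storage structure, so it can handle larger graph sets, and it guarantees that every rightmost extension generates a canonical code. This avoids both computing support for graphs with non-canonical codes and checking whether an input code is canonical. Experiments on real datasets show that CSGM improves mining efficiency over the original algorithm.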
14.
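Dynamic networks often carry different weights at different times. For frequent pattern mining in weighted dynamic networks, the paper proposes a mining algorithm, WGDM, which is applicable to weighted dynamic social networks, biological networks, and similar settings. WGDM uses the anti-monotonicity of support to prune the search space, which reduces redundant candidate subgraphs and improves efficiency. Experiments test the performance of WGDM, and the algorithm is then applied to a network built from the actual Chinese stock market to mine interesting frequent patterns.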
15.
Subgraph mining from large graph databases has long been a crucial problem. Scalability is a further problem because of insufficient storage, and subgraph mining in today's on-demand systems raises several security challenges. To address these issues, our proposed work introduces a Blockchain-based Consensus algorithm for Authenticated query search in Large-Scale Dynamic Graphs (BCCA-LSDG). BCCA-LSDG handles a two-fold process: graph indexing and authenticated query search (query processing). A blockchain-based reputation system is intended to maintain trust in the blockchain and cloud server of the proposed architecture. To provide safe big data transmission, the proposed technique also combines blockchain with a consensus algorithm architecture. Security of the big data is ensured by dividing the blockchain network into distinct networks, each with a restricted number of allowed entities, with data kept in the cloud gate server and data analysis performed in the blockchain. The consensus algorithm is crucial for maintaining the speed, performance, and security of the blockchain. Dual Similarity based MapReduce then helps map and reduce the relevant subgraphs using optimal feature sets. Finally, a graph index refinement process is undertaken to improve the query results: fuzzy logic is used to refine the graph index dynamically with respect to query error. According to the findings, the proposed technique outperforms advanced methodologies in both blockchain and non-blockchain systems, and the combination of blockchain and subgraph mining provides a secure communication platform.
16.
Graph classification is critically important in a wide variety of applications, e.g. drug activity prediction and toxicology analysis. Current research on graph classification focuses on single-label settings. However, in many applications, each graph can be assigned a set of multiple labels simultaneously. Extracting good features using the multiple labels of the graphs becomes an important step before graph classification. In this paper, we study the problem of multi-label feature selection for graph classification and propose a novel solution, called gMLC, to efficiently search for optimal subgraph features for graph objects with multiple labels. Unlike existing feature selection methods in vector spaces, which assume the feature set is given, we perform multi-label feature selection for graph data progressively, together with the subgraph feature mining process. We derive an evaluation criterion to estimate the dependence between subgraph features and the multiple labels of graphs. Then, a branch-and-bound algorithm is proposed to efficiently search for optimal subgraph features by judiciously pruning the subgraph search space using multiple labels. Empirical studies demonstrate that our feature selection approach can effectively boost multi-label graph classification performance and is more efficient because it prunes the subgraph search space using multiple labels.
17.
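Graph mining is an important research direction in data mining, and it focuses mainly on mining frequent subgraphs within graph datasets. The key to frequent subgraph mining is an effective mechanism for reducing redundant candidate subgraphs, so that the required frequent subgraphs can be computed and processed efficiently. The paper proposes a path-based frequent subgraph mining algorithm. It first finds all frequent edges and from them mines frequent single paths, then extends these into more frequent paths through combination and bijective sum operations, and finally generates the full candidate set of frequent subgraphs through join operations. The correctness and completeness of the algorithm are proved by theorems, and theoretical analysis shows that its time complexity is lower than that of existing algorithms. Finally, experiments on two graph datasets confirm the algorithm's advantage in both the number of candidates generated and running time.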
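The first step described above, finding all frequent edges, can be sketched as below. The input layout (each graph as a list of labeled edges) and the function name `frequent_edges` are assumptions made for illustration; the later steps (path combination, bijective sum, and join) are not shown.

```python
from collections import Counter


def frequent_edges(graphs, min_support):
    """Return labeled edges that occur in at least min_support graphs.

    graphs: list of graphs, each a list of (label_u, edge_label, label_v).
    Assumption: edges are undirected, and an edge counts once per graph.
    """
    counts = Counter()
    for graph in graphs:
        seen = set()
        for lu, le, lv in graph:
            a, b = sorted((lu, lv))  # normalize direction
            seen.add((a, le, b))
        counts.update(seen)  # one count per graph, not per occurrence
    return {edge: c for edge, c in counts.items() if c >= min_support}


graphs = [
    [("C", "-", "O"), ("C", "-", "C")],
    [("O", "-", "C")],
    [("C", "-", "C"), ("C", "=", "O")],
]
print(frequent_edges(graphs, min_support=2))
```

Because support is anti-monotone, any path or subgraph containing an infrequent edge can be discarded, so only the edges returned here need to be considered when growing paths.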
18.
The existing methods for graph-based data mining (GBDM) follow the basic approach of applying a single-objective search with a user-defined threshold to discover interesting subgraphs. This obliges users to deal with simple thresholds and prevents them from evaluating the mined subgraphs by defining different "goodness" (i.e., multiobjective) criteria regarding the characteristics of the subgraphs. In previous papers, we defined a multiobjective GBDM framework to perform bi-objective graph mining in terms of subgraph support and size maximization. Two different search methods were considered with this aim: a multiobjective beam search and a multiobjective evolutionary programming (MOEP) method. In this contribution, we extend the latter formulation to a three-objective framework by incorporating another classical graph mining objective, the subgraph diameter. The proposed MOEP method for multiobjective GBDM is tested on five synthetic and real-world datasets, and its performance is compared against single and multiobjective subgraph mining approaches based on the classical Subdue technique in GBDM. The results highlight that applying multiobjective subgraph mining allows us to discover more diversified subgraphs in the objective space.
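What a multiobjective search returns, in place of a thresholded list, is a set of non-dominated subgraphs. The sketch below filters candidates by Pareto dominance over three objectives. It only illustrates the dominance test and is not the MOEP search itself; the objective tuples are made-up values, and treating all three objectives (support, size, diameter) as maximized is an assumption, since the abstract states the direction only for support and size.

```python
def dominates(a, b):
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the names of candidates that no other candidate dominates.

    candidates: dict mapping a subgraph name to an objective tuple,
    here (support, size, diameter), all assumed to be maximized.
    """
    return [
        name
        for name, objs in candidates.items()
        if not any(dominates(other, objs) for other in candidates.values())
    ]


candidates = {
    "g1": (5, 2, 1),
    "g2": (3, 4, 2),
    "g3": (3, 3, 2),  # dominated by g2
    "g4": (5, 2, 1),  # ties with g1, so neither dominates the other
}
print(pareto_front(candidates))
```

No single threshold separates `g1` from `g2`: one has higher support, the other is larger, and both stay in the front for the user to choose between.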