Similar literature
 20 similar articles found (search time: 31 ms)
1.
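An efficient frequent itemset mining algorithm based on a prefix tree   Total citations: 3, self-citations: 3, citations by others: 0
To address the low time and space efficiency of frequent itemset mining, an efficient frequent itemset mining algorithm based on a prefix tree is proposed. The transaction set is preprocessed, and an index table is created and index numbers are assigned, which ensures a consistent ordering of transactions in the prefix tree. A compact prefix tree is then built from the index numbers and related information, and frequent itemsets are mined bottom-up through mining and projection. Experimental results show that the algorithm mines efficiently and uses little space.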
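The preprocessing idea in this abstract (assign index numbers, then build a compact prefix tree) can be sketched roughly as follows. This is a minimal illustration, not the paper's algorithm: the names `PrefixNode`, `build_index` and `build_prefix_tree` are hypothetical, and numbering items by descending support is an assumption, since the abstract does not say how index numbers are assigned.

```python
from collections import Counter


class PrefixNode:
    """One node of the prefix tree: an item index, a count, and children."""

    def __init__(self, index=None):
        self.index = index
        self.count = 0
        self.children = {}


def build_index(transactions, min_support):
    """Assign index numbers 0, 1, 2, ... to frequent items.

    Assumed order: descending support, ties broken alphabetically.
    """
    support = Counter(item for t in transactions for item in set(t))
    frequent = [i for i, c in support.items() if c >= min_support]
    frequent.sort(key=lambda i: (-support[i], i))
    return {item: idx for idx, item in enumerate(frequent)}


def build_prefix_tree(transactions, index):
    """Insert each transaction, reordered by index number, into the tree."""
    root = PrefixNode()
    for t in transactions:
        # Infrequent items are dropped; the rest follow one global order,
        # so transactions with a common prefix share tree nodes.
        path = sorted({index[i] for i in t if i in index})
        node = root
        for idx in path:
            node = node.children.setdefault(idx, PrefixNode(idx))
            node.count += 1
    return root


transactions = [["a", "b", "c"], ["a", "b"], ["a", "d"], ["b", "c"]]
index = build_index(transactions, min_support=2)
tree = build_prefix_tree(transactions, index)
```

Because every transaction is sorted by the same index numbers, the three transactions that contain `a` share a single child of the root, which is where the compactness comes from.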

2.
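The main task of association rule mining is to find relationships among items from statistics over transactions. Traditional mining algorithms require items to have logical attributes and generate large numbers of intermediate itemsets during mining, which becomes the bottleneck of the algorithm. A way of organizing tabular data based on an association path tree is presented, and frequent itemsets are mined in a pattern-guided manner. The method does not require items to have logical attributes, and the iterations over itemset combinations with different initial patterns can be assigned to different CPUs, which improves execution efficiency. The algorithm was tested on the 1984 US congressional election data, and the results were completely correct.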

3.
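A binary-based algorithm for mining long frequent itemsets   Total citations: 1, self-citations: 1, citations by others: 0
Combining the top-down search strategy used for mining long frequent itemsets, a binary-based algorithm for mining long frequent itemsets is proposed. The algorithm generates candidates with a decreasing-numeric-value search strategy. In addition to using frequent itemsets to prune their subsets and so reduce the number of candidates, it uses transaction characteristics to reduce the number of transactions searched, and it computes support counts with the binary logical AND operation, which improves efficiency. Algorithm analysis and experiments show that the algorithm is effective and fast.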
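Counting support with a binary AND, as mentioned in this abstract, can be illustrated with a short sketch. It assumes a vertical layout in which each item has one integer whose bit t is set when transaction t contains the item; the paper may encode its data differently, and `item_bitmaps` and `support_count` are hypothetical names.

```python
def item_bitmaps(transactions):
    """Map each item to an integer whose bit t is set when transaction t
    contains the item."""
    bitmaps = {}
    for tid, transaction in enumerate(transactions):
        for item in set(transaction):
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def support_count(itemset, bitmaps, n_transactions):
    """AND the bitmaps of all items and count the surviving 1 bits."""
    mask = (1 << n_transactions) - 1  # start with every transaction selected
    for item in itemset:
        mask &= bitmaps.get(item, 0)  # unknown items clear the whole mask
    return bin(mask).count("1")


transactions = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b", "c", "d"]]
bitmaps = item_bitmaps(transactions)
```

With these four transactions, `support_count(["a", "c"], bitmaps, 4)` is 3, obtained with one AND per item instead of a scan over every transaction.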

4.
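Research on an intrusion detection system based on an improved Apriori algorithm   Total citations: 3, self-citations: 0, citations by others: 3
The ideas and performance of the classic Apriori algorithm are analyzed. To address its itemset generation bottleneck, namely that the join step makes the space complexity high, a non-join Apriori algorithm that removes the join step is proposed. By removing the self-join of frequent itemsets, the algorithm reduces the number of candidate itemsets generated and therefore the number of database scans, which optimizes space complexity. Experimental results show that the improved algorithm is markedly more efficient than the classic Apriori algorithm.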
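For context, the join step that this abstract identifies as the bottleneck is the candidate generation of classic Apriori, sketched below. This shows only the textbook join and prune steps, not the non-join variant the paper proposes, whose details are not given here. Representing itemsets as sorted tuples is an assumption of the sketch.

```python
from itertools import combinations


def apriori_gen(frequent_k):
    """Classic Apriori candidate generation from frequent k-itemsets.

    frequent_k: iterable of sorted tuples, all of the same length k.
    Returns the candidate (k+1)-itemsets as sorted tuples.
    """
    frequent = set(frequent_k)
    ordered = sorted(frequent)
    candidates = []
    for i, p in enumerate(ordered):
        for q in ordered[i + 1:]:
            # Join step: merge two itemsets that share their first k-1 items.
            # The list is sorted, so once the prefix differs we can stop.
            if p[:-1] != q[:-1]:
                break
            candidate = p + (q[-1],)
            # Prune step: every k-subset of the candidate must be frequent.
            if all(s in frequent for s in combinations(candidate, len(p))):
                candidates.append(candidate)
    return candidates


frequent_2 = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
candidates_3 = apriori_gen(frequent_2)
```

Every pair of itemsets with a common prefix is joined before pruning, which is why the number of intermediate candidates can grow quickly.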

5.
While frequent pattern mining is fundamental for many data mining tasks, mining maximal frequent patterns efficiently is important in both the theory and the applications of frequent pattern mining. The fundamental challenge is how to search a large space of item combinations. Most of the existing methods search an enumeration tree of item combinations in a depth-first manner. In this paper, we develop a new technique for more efficient max-pattern mining. Our method is pattern-aware: it uses the patterns already found to schedule its future search so that many search subspaces can be pruned. We present efficient techniques to implement the new approach. As indicated by a systematic empirical study using the benchmark data sets, our new approach clearly outperforms the currently fastest max-pattern mining algorithms, FPMax* and LCM2. The source code and the executable code (on both Windows and Linux platforms) are publicly available at .

6.
Conventional data mining methods for finding frequent itemsets require considerable computing time to produce their results from a large data set. For this reason, it is almost impossible to apply them to an analysis task in an online data stream where new transactions are continuously generated at a rapid rate. An algorithm for finding frequent itemsets over an online data stream should support a flexible trade-off between processing time and mining accuracy. Furthermore, the most up-to-date resulting set of frequent itemsets should be available quickly at any moment. To satisfy these requirements, this paper proposes a data mining method for finding frequent itemsets over an online data stream. The proposed method examines each transaction one by one without any candidate generation process. The count of an itemset that appears in each transaction is monitored by a lexicographic tree residing in main memory. The current set of monitored itemsets in an online data stream is minimized by two major operations: delayed insertion and pruning. The former delays the insertion of a new itemset in recent transactions until the itemset becomes significant enough to be monitored. The latter prunes a monitored itemset when the itemset turns out to be insignificant. The number of monitored itemsets can be flexibly controlled by the thresholds of these two operations. As the number of monitored itemsets is decreased, frequent itemsets in the online data stream are traced more rapidly but less accurately. The performance of the proposed method is analyzed through a series of experiments in order to identify its various characteristics.

7.
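Design and implementation of a prefix-tree algorithm for mining spatial association rules   Total citations: 5, self-citations: 0, citations by others: 5
Spatial association rule mining is an important class of knowledge discovery problems in spatial databases. A two-phase strategy for mining spatial association rules is proposed: through multiple rounds of single-level Boolean association rule mining, the granularity of the spatial predicates is refined step by step from the top down, which greatly reduces the amount of spatial predicate computation. In addition, a prefix-tree-based single-level Boolean association rule mining algorithm (FPT-Generate) is designed. It does not need to scan the database repeatedly, generates no candidate pattern sets, and achieves breakthroughs in key optimization techniques. Experiments show that a spatial association rule discovery system using FPT-Generate as its mining engine is far better in time efficiency and space scalability than a system using the classic Apriori algorithm as its engine.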

8.
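Most existing methods for mining frequent closed itemsets from microarray data require a minimum support to be given in advance, but in practice this minimum support is hard to determine. To address this problem, a top-k frequent closed itemset mining algorithm is proposed. It uses a top-down, breadth-first search strategy to mine the top-k frequent closed itemsets whose length is at least min_l, and it prunes the search space effectively to speed up the search. Experimental results show that the running time of the algorithm is better than that of the CARPENTER algorithm in most cases.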

9.
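Although the Apriori algorithm uses pruning when generating candidate sets, every database scan must go over the entire database, so the amount of data scanned is large and the algorithm is slow. The Apriori-sort algorithm is an improvement on Apriori. Its basic idea is to convert the transaction database into a transaction-degree database, in which transactions are represented by their degree, and to sort that database. When searching for frequent itemsets, Apriori-sort scans only those transactions in the database Dd that satisfy d(Ck) ≤ d(Ti). Because the database scan is pruned effectively, Apriori-sort is computationally efficient. Simulation experiments comparing Apriori-sort with Apriori on simulated data confirm the efficiency of the new algorithm.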
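The scan restriction d(Ck) ≤ d(Ti) can be sketched as follows, assuming that the degree of a transaction or candidate is simply its number of items (the abstract does not define it). The function names are hypothetical and this is an illustration of the filtering idea only, not the full Apriori-sort algorithm.

```python
from bisect import bisect_left


def build_degree_database(transactions):
    """Return transactions as sets sorted by degree, plus the degree list.

    Assumption: the degree of a transaction is its number of distinct items.
    """
    ordered = sorted((set(t) for t in transactions), key=len)
    degrees = [len(t) for t in ordered]
    return ordered, degrees


def count_support(candidate, ordered, degrees):
    """Count transactions containing candidate, skipping those too short.

    Returns (support, number of transactions actually scanned).
    """
    candidate = set(candidate)
    # First position whose degree is at least the candidate's degree.
    start = bisect_left(degrees, len(candidate))
    scanned = ordered[start:]
    return sum(candidate <= t for t in scanned), len(scanned)


transactions = [["a"], ["a", "b"], ["b", "c"], ["a", "b", "c"], ["c"]]
ordered, degrees = build_degree_database(transactions)
```

A transaction with fewer items than the candidate can never contain it, so sorting by degree lets a binary search skip all such transactions at once.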

10.
The representation of multiple continuous attributes as dimensions in a vector space has been among the most influential concepts in machine learning and data mining. We consider sets of related continuous attributes as vector data and search for patterns that relate a vector attribute to one or more items. The presence of an item set defines a subset of vectors that may or may not show unexpected density fluctuations. We test for fluctuations by studying density histograms. A vector–item pattern is considered significant if its density histogram significantly differs from what is expected for a random subset of transactions. Using two different density measures, we evaluate the algorithm on two real data sets and one that was artificially constructed from time series data.

11.
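Frequent closed itemsets provide a complete and minimal representation of frequent itemsets, and mining them has been a hot research topic in data mining in recent years, with researchers improving algorithms from various angles to raise efficiency. Based on the properties of co-occurring itemsets within frequent itemsets, a frequent closed itemset mining method that requires no subset checking is proposed. A variant of the FP-tree structure is designed to store the co-occurring itemset information of each node, which improves the CLOSET algorithm so that it no longer needs to traverse the result set to check closure. Experiments show that as the support threshold decreases and the result set grows, the running time of the improved algorithm grows more slowly than that of the original algorithm.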

12.
Mining high utility itemsets by dynamically pruning the tree structure   Total citations: 2, self-citations: 2, citations by others: 0
Mining high utility itemsets is one of the most important research issues in data mining owing to its ability to consider nonbinary frequency values of items in transactions and different profit values for each item. Mining such itemsets from a transaction database involves finding those itemsets with utility above a user-specified threshold. In this paper, we propose an efficient concurrent algorithm, called CHUI-Mine (Concurrent High Utility Itemsets Mine), for mining high utility itemsets by dynamically pruning the tree structure. A tree structure, called the CHUI-Tree, is introduced to capture the important utility information of the candidate itemsets. By recording changes in support counts of candidate high utility items during the tree construction process, we implement dynamic CHUI-Tree pruning, and discuss the rationality thereof. The CHUI-Mine algorithm makes use of a concurrent strategy, enabling the simultaneous construction of a CHUI-Tree and the discovery of high utility itemsets. Our algorithm reduces the problem of huge memory usage for tree construction and traversal in tree-based algorithms for mining high utility itemsets. Extensive experimental results show that the CHUI-Mine algorithm is both efficient and scalable.
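To make the notion of utility concrete, the sketch below computes itemset utility under the usual definition (quantity times unit profit, summed over the transactions that contain the whole itemset) and enumerates high utility itemsets by brute force. It does not implement CHUI-Mine or the CHUI-Tree; the data, the function names, and the choice to treat the threshold as "at or above" are assumptions for illustration.

```python
from itertools import combinations


def itemset_utility(itemset, database, profit):
    """Sum quantity * unit profit over transactions containing the itemset."""
    total = 0
    for transaction in database:  # each transaction maps item -> quantity
        if all(item in transaction for item in itemset):
            total += sum(transaction[i] * profit[i] for i in itemset)
    return total


def high_utility_itemsets(database, profit, min_utility):
    """Brute-force enumeration; only sensible for tiny examples."""
    items = sorted({i for t in database for i in t})
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, database, profit)
            if utility >= min_utility:  # assumed: at or above the threshold
                result[itemset] = utility
    return result


profit = {"a": 5, "b": 2, "c": 1}
database = [{"a": 1, "b": 2}, {"b": 4, "c": 3}, {"a": 2, "b": 1, "c": 1}]
found = high_utility_itemsets(database, profit, min_utility=14)
```

In this toy database the utility of `("a", "b")` is 21 while that of `("a",)` is 15, so utility can grow when an item is added. That is why the downward-closure pruning used for frequent itemsets does not carry over directly.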

13.
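Efficiently mining maximal frequent itemsets based on the FP-Tree   Total citations: 36, self-citations: 2, citations by others: 36
When mining maximal frequent itemsets with a low minimum support, superset checking is the main time-consuming operation of the algorithm. The proposed maximal frequent itemset mining algorithm FPMFI (frequent pattern tree for maximal frequent item set) uses a projection-based superset checking mechanism, which effectively shortens the time spent on superset checking. In addition, FPMFI removes redundant information from the FP subtree (conditional frequent pattern tree), which effectively compresses the subtree and reduces traversal cost. Analysis shows that FPMFI has advantages over comparable approaches. Experimental comparison shows that, when the minimum support is low, FPMFI outperforms comparable algorithms by more than 100%.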
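The superset check that this abstract calls the main cost can be seen in a naive baseline, sketched below. It enumerates frequent itemsets by brute force and keeps those with no frequent proper superset. This is not FPMFI and contains no projection or FP-tree; the function names and the data are assumptions for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Enumerate all frequent itemsets by brute force (tiny inputs only)."""
    sets = [frozenset(t) for t in transactions]
    items = sorted(set().union(*sets))
    frequent = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            if sum(candidate <= t for t in sets) >= min_support:
                frequent.append(candidate)
    return frequent


def maximal_itemsets(frequent):
    """Keep itemsets that have no frequent proper superset.

    Longer itemsets are visited first, so each candidate only needs to be
    checked against the maximal itemsets found so far.
    """
    maximal = []
    for candidate in sorted(frequent, key=len, reverse=True):
        if not any(candidate < m for m in maximal):  # the superset check
            maximal.append(candidate)
    return maximal


transactions = [["a", "b", "c"], ["a", "b", "c"], ["a", "b"], ["a", "d"], ["d"]]
maximal = maximal_itemsets(frequent_itemsets(transactions, min_support=2))
```

Even in this baseline, every candidate is compared against the growing list of maximal itemsets, and that list gets long when the minimum support is low.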

14.
By identifying useful knowledge embedded in the behavior of search engines, users can provide valuable information for web searching and data mining. Numerous algorithms have been proposed to find the desired interesting patterns, i.e., frequent patterns, in real-world applications. Most of those studies use frequency to measure the interestingness of patterns. However, each object may have a different importance in these real-world applications, and the frequent patterns do not usually contain a large portion of the desired patterns. In this paper, we present a novel method, called exploiting highly qualified patterns with frequency and weight occupancy (QFWO), to suggest possible highly qualified patterns using the ideas of co-occurrence and weight occupancy. By considering item weight, weight occupancy and the frequency of patterns, we define a new kind of highly qualified pattern. A novel set-enumeration tree called the frequency-weight (FW)-tree and two compact data structures named weight-list and FW-table are designed to hold the global downward closure property and partial downward closure property of quality and weight occupancy to further prune the search space. The proposed method can exploit highly qualified patterns in a recursive manner without candidate generation. Extensive experiments were conducted on both real-world and synthetic datasets to evaluate the effectiveness and efficiency of the proposed algorithm. Results demonstrate that the obtained patterns are reasonable and acceptable. Moreover, the designed QFWO with several pruning strategies is quite efficient in terms of runtime and search space.

15.
A transaction mapping algorithm for frequent itemsets mining   Total citations: 1, self-citations: 0, citations by others: 1
In this paper, we present a novel algorithm for mining complete frequent itemsets, referred to hereafter as the TM (transaction mapping) algorithm. In this algorithm, transaction ids of each itemset are mapped and compressed to continuous transaction intervals in a different space, and the counting of itemsets is performed by intersecting these interval lists in a depth-first order along the lexicographic tree. When the compression coefficient becomes smaller than the average number of comparisons for intervals intersection at a certain level, the algorithm switches to transaction id intersection. We have evaluated the algorithm against two popular frequent itemset mining algorithms, FP-growth and dEclat, using a variety of data sets with short and long frequent patterns. Experimental data show that the TM algorithm outperforms these two algorithms.
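The interval-list intersection at the core of this counting scheme can be sketched as below. The sketch assumes that each itemset's transaction ids have already been mapped to a sorted list of disjoint closed intervals; the mapping itself and the switch to plain id intersection are not shown, and the function names are hypothetical.

```python
def intersect_intervals(a, b):
    """Intersect two sorted lists of disjoint closed intervals (start, end)."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start <= end:  # the two current intervals overlap
            result.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


def support(intervals):
    """Number of transaction ids covered by a list of closed intervals."""
    return sum(end - start + 1 for start, end in intervals)


x = [(1, 5), (8, 10)]  # mapped ids of the transactions containing item x
y = [(3, 9)]           # mapped ids of the transactions containing item y
xy = intersect_intervals(x, y)
```

Here `xy` is `[(3, 5), (8, 9)]` with support 5, found in two comparisons, whereas intersecting the raw id lists would compare element by element.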

16.
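Combining bottom-up and top-down search strategies, an algorithm for quickly discovering maximal frequent itemsets is proposed. The algorithm uses infrequent itemsets to prune candidate maximal frequent itemsets and reduce their dimension, which reduces the number of candidates, shrinks the search space, and improves efficiency. Algorithm analysis and experiments show that the algorithm is effective and fast.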

17.
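A bit string that stores the prefix path is introduced for every node of the frequent pattern tree, and a method for constructing a frequent pattern tree containing positive and negative items is proposed. Frequent itemsets containing positive and negative items can be obtained without repeatedly traversing nodes. Compared with applying the FPgrowth algorithm directly, the method does not need to extend the original database with negative items, nor does it need to build and destroy additional data structures; it only modifies the original frequent pattern tree, so it has certain advantages in both time and space overhead. Experiments show that the proposed algorithm is more efficient than existing comparable mining algorithms and the direct FPgrowth algorithm.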

18.
We present an algorithm for frequent item set mining that identifies high-utility item combinations. In contrast to the traditional association rule and frequent item mining techniques, the goal of the algorithm is to find segments of data, defined through combinations of few items (rules), which satisfy certain conditions as a group and maximize a predefined objective function. We formulate the task as an optimization problem, present an efficient approximation to solve it through specialized partition trees, called High-Yield Partition Trees, and investigate the performance of different splitting strategies. The algorithm has been tested on “real-world” data sets, and achieved very good results.

19.
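A frequent itemset mining algorithm using frequent item linked-list transformation   Total citations: 1, self-citations: 0, citations by others: 1
Generating frequent itemsets is the key problem in association rule mining, and classic association rule mining algorithms do this through multiple scans of the transaction database. Recent research has begun to explore suitable data structures that support very few scans of the transaction database, reducing the large I/O overhead of association rule mining in order to achieve higher efficiency. Using the frequent item linked list data structure, this paper gives an association rule mining algorithm, called FILLT, that needs to scan the transaction database only twice. The algorithm adopts a divide-and-conquer strategy, partitioning and transforming the frequent item linked list to mine association rules. Finally, the efficiency of the algorithm is analyzed theoretically and verified experimentally.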

20.
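Among association rule mining algorithms, Apriori generates many candidate sets because it scans the database repeatedly; these repeated scans easily incur I/O overhead and lead to low mining efficiency. Matrix-based association rule mining does not delete infrequent itemsets during mining, so many scans are wasted and the improvement in mining efficiency is not significant. This paper proposes an improved association rule mining algorithm based on a matrix and a sorted index. First, it deletes those not needed...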
