首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到10条相似文献,搜索用时 140 毫秒
1.
Constraint-based sequential pattern mining: the pattern-growth methods   总被引:4,自引:0,他引:4  
Constraints are essential for many sequential pattern mining applications. However, there is no systematic study on constraint-based sequential pattern mining. In this paper, we investigate this issue and point out that the framework developed for constrained frequent-pattern mining does not fit our mission well. An extended framework is developed based on a sequential pattern growth methodology. Our study shows that constraints can be effectively and efficiently pushed deep into the sequential pattern mining under this new framework. Moreover, this framework can be extended to constraint-based structured pattern mining as well. This research is supported in part by NSERC Grant 312194-05, NSF Grants IIS-0308001, IIS-0513678, BDI-0515813 and National Science Foundation of China (NSFC) grants No. 60303008 and 69933010. All opinions, findings, conclusions and recommendations in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.  相似文献   

2.
韩萌  丁剑 《计算机应用》2019,39(3):719-727
一些先进应用如欺诈检测和趋势学习等带来了数据流频繁模式挖掘的发展。不同于静态数据,数据流挖掘面临着时空约束和项集组合爆炸等问题。对已有数据流频繁模式挖掘算法进行综述并对经典和最新算法进行分析。按照模式集合的完整程度进行分类,数据流中频繁模式分为全集模式和压缩模式。压缩模式主要包括闭合模式、最大模式、top-k模式以及三者的组合模式。不同之处是闭合模式是无损压缩的,而其他模式是有损压缩的。为了得到有趣的频繁模式,可以挖掘基于用户约束的模式。为了处理数据流中的新近事务,将算法分为基于窗口模型和基于衰减模型的方法。数据流中模式挖掘常见的还包含序列模式和高效用模式,对经典和最新算法进行介绍。最后给出了数据流模式挖掘的下一步工作。  相似文献   

3.
数据流频繁模式挖掘研究进展   总被引:21,自引:3,他引:21  
现实世界和工程实践产生了大量的数据流,这种数据不同于传统的静态数据,对其进行有效处理和挖掘遇到了极大的挑战.如何使用有限存储空间进行快速和近似的频繁模式挖掘是数据流挖掘的基本问题,具有非常重要的研究价值和实践意义,已经引起了国内外研究者的广泛关注.本文深入分析数据流中的频繁模式挖掘,对其特点和算法进行较为全面的总结和分类论述,并讨论了存在的主要问题和未来的研究方向.  相似文献   

4.
频繁模式挖掘是最重要的数据挖掘任务之一,传统的频繁模式挖掘算法是以"批处理"方式执行的,即一次性对所有数据进行挖掘,无法满足不断增长的大数据挖掘的需要。MapReduce是一种流行的并行计算模式,在并行数据挖掘领域已得到了广泛的应用。将传统频繁模式增量挖掘算法CanTree向MapReduce计算模型进行了迁移,实现了并行的频繁模式增量挖掘。实验结果表明,提出的算法实现了较好的负载均衡,执行效率有明显提升。  相似文献   

5.
Computing the minimum-support for mining frequent patterns   总被引:4,自引:4,他引:0  
Frequent pattern mining is based on the assumption that users can specify the minimum-support for mining their databases. It has been recognized that setting the minimum-support is a difficult task to users. This can hinder the widespread applications of these algorithms. In this paper we propose a computational strategy for identifying frequent itemsets, consisting of polynomial approximation and fuzzy estimation. More specifically, our algorithms (polynomial approximation and fuzzy estimation) automatically generate actual minimum-supports (appropriate to a database to be mined) according to users’ mining requirements. We experimentally examine the algorithms using different datasets, and demonstrate that our fuzzy estimation algorithm fittingly approximates actual minimum-supports from the commonly-used requirements. This work is partially supported by Australian ARC grants for discovery projects (DP0449535, DP0559536 and DP0667060), a China NSF Major Research Program (60496327), a China NSF grant (60463003), an Overseas Outstanding Talent Research Program of the Chinese Academy of Sciences (06S3011S01), and an Overseas-Returning High-level Talent Research Program of China Human-Resource Ministry. A preliminary and shortened version of this paper has been published in the Proceedings of the 8th Pacific Rim International Conference on Artificial Intelligence (PRICAI ’04).  相似文献   

6.
序列模式挖掘就是在时序数据库中挖掘相对时间或其他模式出现频率高的模式.序列模式发现是最重要的数据挖掘任务之一,并有着广阔的应用前景.针对静态数据库,序列模式挖掘已经被深入的研究.近年来,出现了一种新的数据形式:数据流.针对基于数据流的序列模式挖掘的研究还不是十分深入.提出一个有效的基于数据流的挖掘频繁序列模式的算法SSPM,利用到2个数据结构(F-list和Tatree)来处理基于数据流的序列模式挖掘的复杂性问题.SSPM的优点是可以最大限度地降低负正例的产生,实验表明SSPM具有较高的准确率.  相似文献   

7.
频繁模式挖掘在数据挖掘领域已经有广泛的应用.然而,对于增量更新频繁模式挖掘研究得不是很多.本文提出了一种新颖的增量更新频繁模式树结构(IUNP_Tree),构建它只需要对数据库扫描一次.此外,提出了基于条件矩阵(conditional matrix)的频繁模式挖掘算法(FPBM_Mine)和增量更新算法INUPA,可以有效地处理数据库的增量更新问题.实验表明,该算法是有效的,并且运行效率高于FP-growth算法.  相似文献   

8.
序列模式发现是最重要的数据挖掘任务之一,并有着广阔的应用前景。针对静态数据库,序列模式挖掘已经被深入地研究,但针对基于数据流的序列模式挖掘的研究还不是十分深入。数据流有着无限性的特性,因此往往不能保存数据流中全部的数据,同时很多时候只对最近的时间段的序列模式感兴趣,提出一个有效的结合滑动窗口技术的挖掘序列模式的算法FPM-SW,算法利用到3个数据结构(PatternTable,CountTable和Ta-tree)来处理基于数据流的序列模式挖掘的复杂性问题。算法通过CountTable结构来保存以往的潜在频繁序列,考虑到在某些情况下CountTable占用内存过多,算法还结合了一种压缩CountTable技术来减少内存占用。FPM-SW的优点是可以最大限度地降低负正例的产生,实验表明FPM-SW具有较高的准确率。  相似文献   

9.
As a core area in data mining, frequent pattern (or itemset) mining has been studied for a long time. Weighted frequent pattern mining prunes unimportant patterns and maximal frequent pattern mining discovers compact frequent patterns. These approaches contribute to improving mining performance by reducing the search space. However, we need to consider both the downward closure property and patterns' subset checking process when integrating these different methods in order to prevent unintended pattern losses. Moreover, it is also essential to extract valid patterns with faster runtime and less memory consumption. For this reason, in this paper, we propose more efficient maximal weighted frequent pattern (MWFP) mining approaches based on tree and array structures. We describe how to handle these problems more efficiently, maintaining the correctness of our method. We develop two types of maximal weighted frequent mining algorithms based on weight ascending order and support descending order and compare these two algorithms to conclude which is more suitable for MWFP mining. In addition, comprehensive tests in this paper show that our algorithms are more efficient and scalable than state‐of‐the‐art algorithms, and they also have the correctness of the MWFP mining in terms of their pattern generation results.  相似文献   

10.
程转流  王本年 《微机发展》2007,17(12):53-55
近年来,数据流挖掘越来越引起研究人员的关注,已逐渐成为许多领域有用的工具。如何利用有限的存储空间高效地挖掘出频繁模式已成为数据流挖掘的基本问题,具有很强的现实意义和理论价值。在论述数据流管理系统模型的基础上,深入分析了国内外的各种频繁模式挖掘算法,并指出这些算法的特点及其局限性。最后对未来的研究方向进行了展望。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号