首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
High utility pattern (HUP) mining is one of the most important research issues in data mining. Although HUP mining extracts important knowledge from databases, it requires long calculations and multiple database scans. Therefore, HUP mining is often unsuitable for real-time data processing schemes such as data streams. Furthermore, many HUPs may be unimportant due to the poor correlations among the items inside of them. Hence,the fast discovery of fewer but more important HUPs would be very useful in many practical domains. In this paper, we propose a novel framework to introduce a very useful measure, called frequency affinity, among the items in a HUP and the concept of interesting HUP with a strong frequency affinity for the fast discovery of more applicable knowledge. Moreover, we propose a new tree structure, utility tree based on frequency affinity (UTFA), and a novel algorithm, high utility interesting pattern mining (HUIPM), for single-pass mining of HUIPs from a database. Our approach mines fewer but more valuable HUPs, significantly reduces the overall runtime of existing HUP mining algorithms and is applicable to real-time data processing. Extensive performance analyses show that the proposed HUIPM algorithm is very efficient and scalable for interesting HUP mining with a strong frequency affinity.  相似文献   

2.
韩萌  丁剑 《计算机应用》2019,39(3):719-727
一些先进应用如欺诈检测和趋势学习等带来了数据流频繁模式挖掘的发展。不同于静态数据,数据流挖掘面临着时空约束和项集组合爆炸等问题。对已有数据流频繁模式挖掘算法进行综述并对经典和最新算法进行分析。按照模式集合的完整程度进行分类,数据流中频繁模式分为全集模式和压缩模式。压缩模式主要包括闭合模式、最大模式、top-k模式以及三者的组合模式。不同之处是闭合模式是无损压缩的,而其他模式是有损压缩的。为了得到有趣的频繁模式,可以挖掘基于用户约束的模式。为了处理数据流中的新近事务,将算法分为基于窗口模型和基于衰减模型的方法。数据流中模式挖掘常见的还包含序列模式和高效用模式,对经典和最新算法进行介绍。最后给出了数据流模式挖掘的下一步工作。  相似文献   

3.
基于滑动窗口的数据流频繁闭项集挖掘   总被引:2,自引:1,他引:1       下载免费PDF全文
李俊  杨天奇 《计算机工程》2009,35(13):37-39
针对数据流的特点,根据Moment算法提出一种基于频繁闭项集挖掘的增量式维护算法。该算法通过滑动窗口增量更新数据流中的事务,采取一种高效的项的位序列表示方法降低窗口滑动的时问和空间复杂度,应用压缩的模式树进行频繁闭项集检查,以确保挖掘结果的准确性。实验证明了该方法的有效性。  相似文献   

4.
Recently, high utility pattern (HUP) mining is one of the most important research issues in data mining due to its ability to consider the nonbinary frequency values of items in transactions and different profit values for every item. On the other hand, incremental and interactive data mining provide the ability to use previous data structures and mining results in order to reduce unnecessary calculations when a database is updated, or when the minimum threshold is changed. In this paper, we propose three novel tree structures to efficiently perform incremental and interactive HUP mining. The first tree structure, Incremental HUP Lexicographic Tree ({rm IHUP}_{{rm {L}}}-Tree), is arranged according to an item's lexicographic order. It can capture the incremental data without any restructuring operation. The second tree structure is the IHUP Transaction Frequency Tree ({rm IHUP}_{{rm {TF}}}-Tree), which obtains a compact size by arranging items according to their transaction frequency (descending order). To reduce the mining time, the third tree, IHUP-Transaction-Weighted Utilization Tree ({rm IHUP}_{{rm {TWU}}}-Tree) is designed based on the TWU value of items in descending order. Extensive performance analyses show that our tree structures are very efficient and scalable for incremental and interactive HUP mining.  相似文献   

5.
Frequent pattern mining in data streams is an important research topic in the data mining community. In previous studies, a minimum support threshold was assumed to be available for mining frequent patterns. However, setting such a threshold is typically difficult. Hence, it is more reasonable to ask users to set a bound on the result size. The present study considers mining top-k frequent patterns from data streams using a sliding window technique. A single-pass algorithm, called MSWTP, is developed for the generation of top-k frequent patterns without a threshold. In the method, the content of the transactions in the sliding window is incrementally maintained in a summary data structure, named SWTP-tree, by scanning the stream only once. To make the mining operation efficient, insignificant patterns are distinguished from others by applying the Chernoff bound. Two kinds of obsolete pattern and one kind of insignificant pattern are periodically pruned from the pattern tree. Whenever necessary, the k most frequent patterns can be selected from SWTP-tree in order of their descending frequency. The performance of the proposed technique is evaluated via simulation experiments. The results show that the proposed method is both efficient and scalable, and that it outperforms comparable algorithms.  相似文献   

6.
Processing changeable data streams in real time is one of the most important issues in the data mining field due to its broad applications such as retail market analysis, wireless sensor networks, and stock market prediction. In addition, it is an interesting and challenging problem to deal with the stream data since not only the data have unbounded, continuous, and high speed characteristics but also their environments have limited resources. High utility pattern mining, meanwhile, is one of the essential research topics in pattern mining to overcome major drawbacks of the traditional framework for frequent pattern mining that takes only binary databases and identical item importance into consideration. This approach conducts mining processes by reflecting characteristics of real world databases, non-binary quantities and relative importance of items. Although relevant algorithms were proposed for finding high utility patterns in stream environments, they suffer from a level-wise candidate generation-and-test and a large number of candidates by their overestimation techniques. As a result, they consume a huge amount of execution time, which is a significant performance issue since a rapid process is necessary in stream data analysis. In this paper, we propose an algorithm for mining high utility patterns from resource-limited environments through efficient processing of data streams in order to solve the problems of the overestimation-based methods. To improve mining performance with fewer candidates and search space than the previous ones, we develop two techniques for reducing overestimated utilities. Moreover, we suggest a tree-based data structure to maintain information of stream data and high utility patterns. The proposed tree is restructured by our updating method with decreased overestimation utilities to keep up-to-date stream information whenever the current window slides. Our approach also has an important effect on expert and intelligent systems in that it can provide users with more meaningful information than traditional analysis methods by reflecting the characteristics of real world non-binary databases in stream environments and emphasizing on recent data. Comprehensive experimental results show that our algorithm outperforms the existing sliding window-based one in terms of runtime efficiency and scalability.  相似文献   

7.
数据流高效用模式挖掘方法是以二进制的频繁模式挖掘方法为前提,引入项的内部效用和外部效用,在模式挖掘过程中可以考虑项的重要性,从而挖掘更有价值的模式。从关键窗口技术、常用方法、表示形式等角度对数据流高效用模式挖掘方法进行分析并总结其相关算法,从而研究其特点、优势、劣势以及其关键问题所在。具体来说,说明了数据流高效用模式常用的概念;对处理数据流高效用模式的关键窗口技术进行了分析,涉及到滑动、衰减、界标和倾斜窗口模型;研究了一阶段和两阶段的数据流高效用模式挖掘方法;分析了高效用模式的表示形式,即完全高效用模式和压缩高效用模式;介绍了其他的数据流高效用模式,包括序列高效用模式、混合高效用模式以及高平均效用模式等;最后展望了数据流高效用模式挖掘的进一步研究方向。  相似文献   

8.
High-utility pattern mining (HUPM) is an emerging topic in recent years instead of association-rule mining to discover more interesting and useful information for decision making. Many algorithms have been developed to find high-utility patterns (HUPs) from quantitative databases without considering timestamp of patterns, especially in recent intervals. A pattern may not be a HUP in an entire database but may be a HUP in recent intervals. In this paper, a new concept namely up-to-date high-utility pattern (UDHUP) is designed. It considers not only utility measure but also timestamp factor to discover the recent HUPs. The UDHUP-apriori is first proposed to mine UDHUPs in a level-wise way. Since UDHUP-apriori uses Apriori-like approach to recursively derive UDHUPs, a second UDHUP-list algorithm is then presented to efficiently discover UDHUPs based on the developed UDU-list structures and a pruning strategy without candidate generation, thus speeding up the mining process. A flexible minimum-length strategy with two specific lifetimes is also designed to find more efficient UDHUPs based on a users’ specification. Experiments are conducted to evaluate the performance of the proposed two algorithms in terms of execution time, memory consumption, and number of generated UDHUPs in several real-world and synthetic datasets.  相似文献   

9.
大数据环境下高效用项集挖掘算法中过多的候选项集极大地降低了算法的时空效率,提出了一种减少候选项集的数据流高效用项集挖掘算法。首先,通过数据流中当前窗口的一次扫描建立一个全局树,并降低全局树中头表入口与节点的冗余效用值;然后,基于全局树生成候选模式,基于增长算法降低局部树的候选项集效用;最终,从候选模式中选出高效用模式。基于真实数据流的实验结果表明,本算法的时空效率与内存占用比均优于其他数据流的高效用模式挖掘算法。  相似文献   

10.
Mining utility itemsets from data steams is one of the most interesting research issues in data mining and knowledge discovery. In this paper, two efficient sliding window-based algorithms, MHUI-BIT (Mining High-Utility Itemsets based on BITvector) and MHUI-TID (Mining High-Utility Itemsets based on TIDlist), are proposed for mining high-utility itemsets from data streams. Based on the sliding window-based framework of the proposed approaches, two effective representations of item information, Bitvector and TIDlist, and a lexicographical tree-based summary data structure, LexTree-2HTU, are developed to improve the efficiency of discovering high-utility itemsets with positive profits from data streams. Experimental results show that the proposed algorithms outperform than the existing approaches for discovering high-utility itemsets from data streams over sliding windows. Beside, we also propose the adapted approaches of algorithms MHUI-BIT and MHUI-TID in order to handle the case when we are interested in mining utility itemsets with negative item profits. Experiments show that the variants of algorithms MHUI-BIT and MHUI-TID are efficient approaches for mining high-utility itemsets with negative item profits over stream transaction-sensitive sliding windows.  相似文献   

11.
As data have been accumulated more quickly in recent years, corresponding databases have also become huger, and thus, general frequent pattern mining methods have been faced with limitations that do not appropriately respond to the massive data. To overcome this problem, data mining researchers have studied methods which can conduct more efficient and immediate mining tasks by scanning databases only once. Thereafter, the sliding window model, which can perform mining operations focusing on recently accumulated parts over data streams, was proposed, and a variety of mining approaches related to this have been suggested. However, it is hard to mine all of the frequent patterns in the data stream environment since generated patterns are remarkably increased as data streams are continuously extended. Thus, methods for efficiently compressing generated patterns are needed in order to solve that problem. In addition, since not only support conditions but also weight constraints expressing items’ importance are one of the important factors in the pattern mining, we need to consider them in mining process. Motivated by these issues, we propose a novel algorithm, weighted maximal frequent pattern mining over data streams based on sliding window model (WMFP-SW) to obtain weighted maximal frequent patterns reflecting recent information over data streams. Performance experiments report that MWFP-SW outperforms previous algorithms in terms of runtime, memory usage, and scalability.  相似文献   

12.
概念漂移数据流挖掘算法综述   总被引:1,自引:0,他引:1  
丁剑  韩萌  李娟 《计算机科学》2016,43(12):24-29, 62
数据流是一种新型的数据模型,具有动态、无限、高维、有序、高速和变化等特性。在真实的数据流环境中,一些数据分布是随着时间改变的,即具有概念漂移特征,称为可变数据流或概念漂移数据流。因此处理数据流模型的方法需要处理时空约束和自适应调整概念变化。对概念漂移问题和概念漂移数据流分类、聚类和模式挖掘等内容进行综述。首先介绍概念漂移的类型和常用概念改变检测方法。为了解决概念漂移问题,数据流挖掘中常使用滑动窗口模型对新近事务进行处理。数据流分类常用的模型包括单分类模型和集成分类模型,常用的方法包括决策树、分类关联规则等。数据流聚类方式通常包括基于k- means的和非基于k- means的。模式挖掘可以为分类、聚类和关联规则等提供有用信息。概念漂移数据流中的模式包括频繁模式、序列模式、episode、模式树、模式图和高效用模式等。最后详细介绍其中的频繁模式挖掘算法和高效用模式挖掘算法。  相似文献   

13.
Recently, high utility pattern mining (HUPM) has been extensively studied. Many approaches for HUPM have been proposed in recent years, but most of them aim at mining HUPs without any consideration for their frequency. This has the major drawback that any combination of a low utility item with a very high utility pattern is regarded as a HUP, even if this combination has low affinity and contains items that rarely co-occur. Thus, frequency should be a key criterion to select HUPs. To address this issue, and derive high utility interesting patterns (HUIPs) with strong frequency affinity, the HUIPM algorithm was proposed. However, it recursively constructs a series of conditional trees to produce candidates and then derive the HUIPs. This procedure is time-consuming and may lead to a combinatorial explosion when the minimum utility threshold is set relatively low. In this paper, an efficient algorithm named fast algorithm for mining discriminative high utility patterns (DHUPs) with strong frequency affinity (FDHUP) is proposed to efficiently discover DHUPs by considering both the utility and frequency affinity constraints. Two compact structures named EI-table and FU-tree and three pruning strategies are introduced in the proposed algorithm to reduce the search space, and efficiently and effectively discover DHUPs. An extensive experimental study shows that the proposed FDHUP algorithm considerably outperforms the state-of-the-art HUIPM algorithm in terms of execution time, memory consumption, and scalability.  相似文献   

14.
Sliding window-based frequent pattern mining over data streams   总被引:2,自引:0,他引:2  
Finding frequent patterns in a continuous stream of transactions is critical for many applications such as retail market data analysis, network monitoring, web usage mining, and stock market prediction. Even though numerous frequent pattern mining algorithms have been developed over the past decade, new solutions for handling stream data are still required due to the continuous, unbounded, and ordered sequence of data elements generated at a rapid rate in a data stream. Therefore, extracting frequent patterns from more recent data can enhance the analysis of stream data. In this paper, we propose an efficient technique to discover the complete set of recent frequent patterns from a high-speed data stream over a sliding window. We develop a Compact Pattern Stream tree (CPS-tree) to capture the recent stream data content and efficiently remove the obsolete, old stream data content. We also introduce the concept of dynamic tree restructuring in our CPS-tree to produce a highly compact frequency-descending tree structure at runtime. The complete set of recent frequent patterns is obtained from the CPS-tree of the current window using an FP-growth mining technique. Extensive experimental analyses show that our CPS-tree is highly efficient in terms of memory and time complexity when finding recent frequent patterns from a high-speed data stream.  相似文献   

15.
The FP-growth algorithm using the FP-tree has been widely studied for frequent pattern mining because it can dramatically improve performance compared to the candidate generation-and-test paradigm of Apriori. However, it still requires two database scans, which are not consistent with efficient data stream processing. In this paper, we present a novel tree structure, called CP-tree (compact pattern tree), that captures database information with one scan (insertion phase) and provides the same mining performance as the FP-growth method (restructuring phase). The CP-tree introduces the concept of dynamic tree restructuring to produce a highly compact frequency-descending tree structure at runtime. An efficient tree restructuring method, called the branch sorting method, that restructures a prefix-tree branch-by-branch, is also proposed in this paper. Moreover, the CP-tree provides full functionality for interactive and incremental mining. Extensive experimental results show that the CP-tree is efficient for frequent pattern mining, interactive, and incremental mining with a single database scan.  相似文献   

16.
Online mining of path traversal patterns from continuous Web click streams is one of the challenging research problems of Web usage mining. Most of previous works focus on mining path traversal patterns over the entire history of Web click streams. Mining the recent changes of Web click streams can provide valuable information for the analysis of the Web click streams. In this paper, we propose a new, online mining algorithm, called Top-DSW (top-k path traversal patterns of stream Damped Sliding Window), to discover the set of top-k path traversal patterns from streaming maximal forward references, where k is the desired number of path traversal patterns to be mined. An effective summary data structure, called TKP-DSW-list (a list of top-k path traversal patterns of stream Damped Sliding Windows) is developed to maintain the essential information about the top-k path traversal patterns from the maximal forward references within a stream damped sliding window. An effective space pruning mechanism, called TKR-list-maintain, is developed to control the memory requirement of the TKP-DSW-list. Experimental studies show that the proposed Top-DSW algorithm is an efficient, single-pass algorithm for online mining of the set of top-k path traversal patterns over stream damped sliding windows.  相似文献   

17.
张月琴  陈东 《计算机工程》2010,36(22):86-87
提出基于事务矩阵挖掘最大频繁项集的方法AFMI,该方法采取迭代精简事务矩阵的方式求解所有事务中的最大频繁项集,从精简后的事务向量交集的子集中搜索最大频繁项集,并运用逻辑运算和剪枝方法提高挖掘效率。基于AFMI方法,研究挖掘滑动窗口数据流最大频繁项集算法AFMI+,该算法可使用户周期性地挖掘当前窗口中的最大频繁项集。实验结果表明,AFMI和AFMI+算法均具有较好的性能。  相似文献   

18.
Effective and efficient mining of music structure patterns from music query data is one of the most interesting issues of multimedia data mining. In this paper, we introduce a new kind of pattern, called emerging melody structure (EMS), for knowledge discovery from music melody streams. EMSs are defined as music data items with melody strings whose support increase significantly from one sliding window to another window from streaming melody sequences. The discovered EMS can be used to predict the future trend of online music style recommendation, to personalize the Web service of music downloading priority, for music composers to compose new music or for service provider to collect more similar music. Therefore, an efficient data mining approach, called MEMSA (Mining Emerging Melody Structure Algorithm), is proposed to discover all EMSs from streaming music query data over sliding windows. In the framework of MEMSA, a prefix tree-based data structure, called EMS-tree (Emerging Melody Structure tree), is constructed for maintaining temporal EMSs effectively. Experimental results show that the proposed method MEMSA is an efficient algorithm for mining all EMSs from streaming melody sequences efficiently.  相似文献   

19.
基于滑动窗口的聚集查询是数据流研究领域的一个热点问题。在已有的研究工作中,聚集算法都是针对立即执行的连续查询提出的,这些算法均是当数据流新到一个元组立即计算一次聚集结果。而在实际应用中,连续查询有时采取的是周期执行方式。论文针对周期执行的连续查询提出了复合滑动窗口聚集算法,即数据流新到一个元组,将它插入到基本窗口中,当基本窗口被插满时计算一次聚集结果。给出了非增量式和增量式两种算法。理论分析和实验结果表明增量式算法具有较好的性能。  相似文献   

20.
In recent years, data stream mining has become an important research topic. With the emergence of new applications, the data we process are not again static, but the continuous dynamic data stream. Examples include network traffic analysis, Web click stream mining, network intrusion detection, and on-line transaction analysis. In this paper, we propose a new framework for data stream mining, called the weighted sliding window model. The proposed model allows the user to specify the number of windows for mining, the size of a window, and the weight for each window. Thus users can specify a higher weight to a more significant data section, which will make the mining result closer to user’s requirements. Based on the weighted sliding window model, we propose a single pass algorithm, called WSW, to efficiently discover all the frequent itemsets from data streams. By analyzing data characteristics, an improved algorithm, called WSW-Imp, is developed to further reduce the time of deciding whether a candidate itemset is frequent or not. Empirical results show that WSW-Imp outperforms WSW under the weighted sliding window model.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号