首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
Abstract

To overcome the limitation of high-utility itemset mining, more compact, lossless, and concise representations of high utility itemsets (HUIs) have been proposed in previous works, such as closed HUIs (CHUIs) or maximal HUIs (MHUIs). Focusing into MHUI mining, in this article, we present efficient approaches to directly mine MHUIs from transactional databases without generating any candidates. The proposed algorithms, which all execute in one phase, utilize many efficient data structures and pruning techniques such as EUCP combined with EUCS, CUIP combined with FUCS, and the P-set structure to significantly reduce the search space and remove nonpromising itemsets, thus, increase the performance of the MHUI mining process. Furthermore, while previous works assumed that the unit profit of items is fixed, which is not practical in many real-world applications, our work resolved this issue by applying a new utility calculation into the mining process to reflect the true nature of real-world databases, thus, generating more accurate results.  相似文献   

2.
Fuzzy utility mining has been an emerging research issue because of its simplicity and comprehensibility. Different from traditional fuzzy data mining, fuzzy utility mining considers not only quantities of items in transactions but also their profits for deriving high fuzzy utility itemsets. In this paper, we introduce a new fuzzy utility measure with the fuzzy minimum operator to evaluate the fuzzy utilities of itemsets. Besides, an effective fuzzy utility upper-bound model based on the proposed measure is designed to provide the downward-closure property in fuzzy sets, thus reducing the search space of finding high fuzzy utility itemsets. A two-phase fuzzy utility mining algorithm, named TPFU, is also proposed and described for solving the problem of fuzzy utility mining. At last, the experimental results on both synthetic and real datasets show that the proposed algorithm has good performance.  相似文献   

3.
王敬华  罗相洲  吴倩 《计算机应用》2016,36(11):3062-3066
高效用项集挖掘在数据挖掘领域中受到了广泛的关注,但是高效用项集挖掘并没有考虑项集长度对效用值的影响,所以高平均效用项集挖掘被提出;而目前的一些高平均效用项集挖掘算法需要耗费大量的时间才能挖掘出有效的高平均效用项集。针对此问题,给出了一个高平均效用项集挖掘的改进算法——FHAUI。FHAUI算法将效用信息保存到效用列表中,通过效用列表的比较来挖掘出所有的高平均效用值,同时FHAUI算法还采用了一个二维矩阵来有效减少二项效用值的连接比较次数。最后将FHAUI算法在多个经典的数据集上测试。实验结果表明,FHAUI算法在效用列表的连接比较次数上有了极大的降低,同时其时间性能也有非常大提高。  相似文献   

4.
On-shelf utility mining has recently received interest in the data mining field due to its practical considerations. On-shelf utility mining considers not only profits and quantities of items in transactions but also their on-shelf time periods in stores. Profit values of items in traditional on-shelf utility mining are considered as being positive. However, in real-world applications, items may be associated with negative profit values. This paper proposes an efficient three-scan mining approach to efficiently find high on-shelf utility itemsets with negative profit values from temporal databases. In particular, an effective itemset generation method is developed to avoid generating a large number of redundant candidates and to effectively reduce the number of data scans in mining. Experimental results for several synthetic and real datasets show that the proposed approach has good performance in pruning effectiveness and execution efficiency.  相似文献   

5.
基于图的关联规则改进算法   总被引:1,自引:0,他引:1  
关联规则挖掘是数据挖掘研究的最重要课题之一。基于图的关联规则挖掘DLG算法通过一次扫描数据库构建关联图,然后遍历该关联图产生频繁项集,有效地提高了关联规则挖掘的性能。在分析该算法基本原理基础上,提出了一种改进的算法—DLG#。改进算法在关联图构造同时构造项集关联矩阵,在候选项集生成时结合关联图和Apriori性质对冗余项集进行剪枝,减少了候选项集数,简化了候选项集的验证。比较实验结果表明,在不同数据集和不同支持度阈值下,改进算法都能更快速的发现频繁项集,当频繁项集平均长度较大时性能提高明显。  相似文献   

6.
Mining frequent itemsets is an essential problem in data mining and plays an important role in many data mining applications. In recent years, some itemset representations based on node sets have been proposed, which have shown to be very efficient for mining frequent itemsets. In this paper, we propose DiffNodeset, a novel and more efficient itemset representation, for mining frequent itemsets. Based on the DiffNodeset structure, we present an efficient algorithm, named dFIN, to mining frequent itemsets. To achieve high efficiency, dFIN finds frequent itemsets using a set-enumeration tree with a hybrid search strategy and directly enumerates frequent itemsets without candidate generation under some case. For evaluating the performance of dFIN, we have conduct extensive experiments to compare it against with existing leading algorithms on a variety of real and synthetic datasets. The experimental results show that dFIN is significantly faster than these leading algorithms.  相似文献   

7.
Association-rule mining, which is based on frequency values of items, is the most common topic in data mining. In real-world applications, customers may, however, buy many copies of products and each product may have different factors, such as profits and prices. Only mining frequent itemsets in binary databases is thus not suitable for some applications. Utility mining is thus presented to consider additional measures, such as profits or costs according to user preference. In the past, a two-phase mining algorithm was designed for fast discovering high utility itemsets from databases. When data come intermittently, the approach needs to process all the transactions in a batch way. In this paper, an incremental mining algorithm for efficiently mining high utility itemsets is proposed to handle the above situation. It is based on the concept of the fast-update (FUP) approach, which was originally designed for association mining. The proposed approach first partitions itemsets into four parts according to whether they are high transaction-weighted utilization itemsets in the original database and in the newly inserted transactions. Each part is then executed by its own procedure. Experimental results also show that the proposed algorithm executes faster than the two-phase batch mining algorithm in the intermittent data environment  相似文献   

8.
High-utility itemset mining (HUIM) is a popular data mining task with applications in numerous domains. However, traditional HUIM algorithms often produce a very large set of high-utility itemsets (HUIs). As a result, analyzing HUIs can be very time consuming for users. Moreover, a large set of HUIs also makes HUIM algorithms less efficient in terms of execution time and memory consumption. To address this problem, closed high-utility itemsets (CHUIs), concise and lossless representations of all HUIs, were proposed recently. Although mining CHUIs is useful and desirable, it remains a computationally expensive task. This is because current algorithms often generate a huge number of candidate itemsets and are unable to prune the search space effectively. In this paper, we address these issues by proposing a novel algorithm called CLS-Miner. The proposed algorithm utilizes the utility-list structure to directly compute the utilities of itemsets without producing candidates. It also introduces three novel strategies to reduce the search space, namely chain-estimated utility co-occurrence pruning, lower branch pruning, and pruning by coverage. Moreover, an effective method for checking whether an itemset is a subset of another itemset is introduced to further reduce the time required for discovering CHUIs. To evaluate the performance of the proposed algorithm and its novel strategies, extensive experiments have been conducted on six benchmark datasets having various characteristics. Results show that the proposed strategies are highly efficient and effective, that the proposed CLS-Miner algorithmoutperforms the current state-ofthe- art CHUD and CHUI-Miner algorithms, and that CLSMiner scales linearly.  相似文献   

9.
高效用序列模式挖掘是数据挖掘领域的一项重要内容,在生物信息学、消费行为分析等方面具有重要的应用.与传统基于频繁项模式挖掘方法不同,高效用序列模式挖掘不仅考虑项集的内外效用,更突出项集的时间序列含义,计算复杂度较高.尽管已经有一定数量的算法被提出应用于解决该类问题,挖掘算法的时空效率依然成为该领域的主要研究热点问题.鉴于...  相似文献   

10.
Utility of an itemset is considered as the value of this itemset, and utility mining aims at identifying the itemsets with high utilities. The temporal high utility itemsets are the itemsets whose support is larger than a pre-specified threshold in current time window of the data stream. Discovery of temporal high utility itemsets is an important process for mining interesting patterns like association rules from data streams. In this paper, we propose a novel method, namely THUI (Temporal High Utility Itemsets)-Mine, for mining temporal high utility itemsets from data streams efficiently and effectively. To the best of our knowledge, this is the first work on mining temporal high utility itemsets from data streams. The novel contribution of THUI-Mine is that it can effectively identify the temporal high utility itemsets by generating fewer candidate itemsets such that the execution time can be reduced substantially in mining all high utility itemsets in data streams. In this way, the process of discovering all temporal high utility itemsets under all time windows of data streams can be achieved effectively with less memory space and execution time. This meets the critical requirements on time and space efficiency for mining data streams. Through experimental evaluation, THUI-Mine is shown to significantly outperform other existing methods like Two-Phase algorithm under various experimental conditions.  相似文献   

11.
针对已有概率频繁项集挖掘算法采用模式增长的方式构建树时产生大量树节点,导致内存空间占用较大以及发现概率频繁项集效率低等问题,提出了改进的不确定数据频繁模式增长(PUFP-Growth)算法。该算法通过逐条读取不确定事务数据库中数据,构造类似频繁模式树(FP-Tree)的紧凑树结构,同时更新项头表中保存所有尾节点相同项集的期望值的动态数组。当所有事务数据插入到改进的不确定数据频繁模式树(PUFP-Tree)中以后,通过遍历数组得到所有的概率频繁项集。最后通过实验结果和理论分析表明:PUFP-Growth算法可以有效地发现概率频繁项集;与不确定数据频繁模式增长(UF-Growth)算法和压缩的不确定频繁模式挖掘(CUFP-Mine)算法相比,提出的PUFP-Growth算法能够提高不确定数据概率频繁项集挖掘的效率,并且减少了内存空间的使用。  相似文献   

12.
王乐  熊松泉  常艳芬  王水 《自动化学报》2015,41(9):1616-1626
高效用模式挖掘是数据挖掘领域的一个重要研究内容; 由于其计算过程包含对模式的内、外效用值的处理, 计算复杂度较大, 因此挖掘算法的主要研究热点问题就是提高算法的时间效率.针对此问题, 本文给出一个基于模式增长方式的高效用模式挖掘算法HUPM-FP, 该算法可以从全局树上挖掘高效用模式, 避免产生候选项集.实验中, 采用6个典型数据集进行实验, 并和目前效率较好的算法FHM (Faster high-utility itemset mining)做了对比, 实验结果表明本文给出的算法时空效率都有较大的提高, 特别是时间效率提高较大, 可以达到1个数量级以上.  相似文献   

13.
李慧  刘贵全  瞿春燕 《计算机科学》2015,42(5):82-87, 123
对从事务数据库中挖掘有意义的项集的研究已超过10年.然而,大多数的研究要么使用频繁度或支持度(如频繁项集挖掘),要么使用效用值或利润(如高效用项集挖掘)作为主要的衡量标准.单独使用这两种衡量方式都有各自的局限性,比如频繁度很高的项集其效用值有可能很低,而效用值很高的项集其频繁度往往很低,将这些项集推荐给用户没有意义.将这两种衡量标准综合考虑,希望找出那些频繁度和效用值都很高的项集.该项工作最大的挑战是效用值既不满足单调性也不满足反单调性.因此,提出了高效算法FHIMA.FHIMA采用PrefixSpan的思想,挖掘时能避免产生非频繁的候选项集.此外,还根据效用和质量上界的一些性质,有效地缩小了搜索空间,极大地提高了FHIMA算法的效率.  相似文献   

14.
High utility pattern mining has been studied as an essential topic in the field of pattern mining in order to satisfy requirements of many real-world applications that need to process non-binary databases including item importance such as market analysis. In this paper, we propose an efficient algorithm with a novel indexed list-based data structure for mining high utility patterns. Previous approaches first generate an enormous number of candidate patterns on the basis of overestimation methods in their mining processes and then identify actual high utility patterns from the candidates through an additional database scan, which leads to high computational overheads. Although several list-based algorithms to discover high utility patterns without candidate generation have been suggested in recent years, they require a large number of comparison operations. Our method facilitates efficient mining of high utility patterns with the proposed indexed list by effectively reducing the total number of such operations. Moreover, we develop two techniques based on this novel data structure to more enhance mining performance of the proposed method. Experimental results on real and synthetic datasets show that the proposed algorithm mines high utility patterns more efficiently than the state-of-the-art algorithms.  相似文献   

15.
针对最大频繁项目集挖掘算法(DMFIA)当候选项目集维数高而最大频繁项目集维数较低的情况下要产生大量的候选项目集的缺点,提出了一种改进的基于频繁模式树(FP-tree)结构的最大频繁项目集挖掘算法--FP-MFIA。该算法根据FP-tree的项目头表,采用自底向上的搜索策略逐层挖掘最大频繁项目集,从而加速每次对候选集计数的操作。在挖掘时根据每层的条件模式基产生维数较低的非频繁项目集,尽早对候选项目集进行剪枝和降维,可大量减少候选项目集的数量。同时在挖掘时充分利用最大频繁项集的性质,减少搜索空间。通过算法在不同支持度下挖掘时间的对比可知,算法FP-MFIA在最小支持度较低的情况下时间效率是DMFIA以及基于降维的最大频繁模式挖掘算法(BDRFI)的2倍以上,说明FP-MFIA在候选集维数较高的时候优势明显。  相似文献   

16.
Data stream mining is an emerging research topic in the data mining field. Finding frequent itemsets is one of the most important tasks in data stream mining with wide applications like online e-business and web click-stream analysis. However, two main problems existed in relevant studies: (1) The utilities (e.g., importance or profits) of items are not considered. Actual utilities of patterns cannot be reflected in frequent itemsets. (2) Existing utility mining methods produce too many patterns and this makes it difficult for the users to filter useful patterns among the huge set of patterns. In view of this, in this paper we propose a novel framework, named GUIDE (Generation of maximal high Utility Itemsets from Data strEams), to find maximal high utility itemsets from data streams with different models, i.e., landmark, sliding window and time fading models. The proposed structure, named MUI-Tree (Maximal high Utility Itemset Tree), maintains essential information for the mining processes and the proposed strategies further facilitates the performance of GUIDE. Main contributions of this paper are as follows: (1) To the best of our knowledge, this is the first work on mining the compact form of high utility patterns from data streams; (2) GUIDE is an effective one-pass framework which meets the requirements of data stream mining; (3) GUIDE generates novel patterns which are not only high utility but also maximal, which provide compact and insightful hidden information in the data streams. Experimental results show that our approach outperforms the state-of-the-art algorithms under various conditions in data stream environments on different models.  相似文献   

17.
Processing changeable data streams in real time is one of the most important issues in the data mining field due to its broad applications such as retail market analysis, wireless sensor networks, and stock market prediction. In addition, it is an interesting and challenging problem to deal with the stream data since not only the data have unbounded, continuous, and high speed characteristics but also their environments have limited resources. High utility pattern mining, meanwhile, is one of the essential research topics in pattern mining to overcome major drawbacks of the traditional framework for frequent pattern mining that takes only binary databases and identical item importance into consideration. This approach conducts mining processes by reflecting characteristics of real world databases, non-binary quantities and relative importance of items. Although relevant algorithms were proposed for finding high utility patterns in stream environments, they suffer from a level-wise candidate generation-and-test and a large number of candidates by their overestimation techniques. As a result, they consume a huge amount of execution time, which is a significant performance issue since a rapid process is necessary in stream data analysis. In this paper, we propose an algorithm for mining high utility patterns from resource-limited environments through efficient processing of data streams in order to solve the problems of the overestimation-based methods. To improve mining performance with fewer candidates and search space than the previous ones, we develop two techniques for reducing overestimated utilities. Moreover, we suggest a tree-based data structure to maintain information of stream data and high utility patterns. The proposed tree is restructured by our updating method with decreased overestimation utilities to keep up-to-date stream information whenever the current window slides. Our approach also has an important effect on expert and intelligent systems in that it can provide users with more meaningful information than traditional analysis methods by reflecting the characteristics of real world non-binary databases in stream environments and emphasizing on recent data. Comprehensive experimental results show that our algorithm outperforms the existing sliding window-based one in terms of runtime efficiency and scalability.  相似文献   

18.
高平均效用项集挖掘是当前研究的热点之一。针对高平均效用项集挖掘算法产生大量无意义的候选项集,而导致高内存消耗和运行时间长的问题,提出了dMHAUI算法。首先定义了集成矩阵Q,并提出了4种基于垂直数据库表示的紧凑平均效用上界及3种有效的修剪策略;将高平均效用项集挖掘所需的信息存储于IDUL结构树,利用改进的diffset技术快速计算项集的平均效用和上界;最后通过递归调用搜索函数得到高平均效用项集。与EHAUPM算法和MHAI算法进行仿真比较,结果表明,dMHAUI算法在运行时间、连接比较次数和可扩展性等方面都有较优的性能。  相似文献   

19.
Mining itemset utilities from transaction databases   总被引:4,自引:0,他引:4  
The rationale behind mining frequent itemsets is that only itemsets with high frequency are of interest to users. However, the practical usefulness of frequent itemsets is limited by the significance of the discovered itemsets. A frequent itemset only reflects the statistical correlation between items, and it does not reflect the semantic significance of the items. In this paper, we propose a utility based itemset mining approach to overcome this limitation. The proposed approach permits users to quantify their preferences concerning the usefulness of itemsets using utility values. The usefulness of an itemset is characterized as a utility constraint. That is, an itemset is interesting to the user only if it satisfies a given utility constraint. We show that the pruning strategies used in previous itemset mining approaches cannot be applied to utility constraints. In response, we identify several mathematical properties of utility constraints. Then, two novel pruning strategies are designed. Two algorithms for utility based itemset mining are developed by incorporating these pruning strategies. The algorithms are evaluated by applying them to synthetic and real world databases. Experimental results show that the proposed algorithms are effective on the databases tested.  相似文献   

20.
由于能反映用户的偏好,可以弥补传统频繁项集挖掘仅由支持度来衡量项集重要性的不足,高效用项集正在成为当前数据挖掘研究的热点。为使高效用项集挖掘更好地适应数据规模不断增大的实际需求,提出了一种高效用项集的并行挖掘算法PHUI-Mine。提出了记录挖掘高效用项集信息的DHUI-树结构,描述了DHUI-树的构造方法,论证了DHUI-树的动态剪枝策略。在此基础上,给出了高效用项集挖掘的并行算法描述。实验结果表明,PHUI-Mine算法具有较高的挖掘效率及较低的存储开销。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号