1.

An efficient algorithm for mining temporal high utility itemsets from data streams (total citations: 1; self-citations: 0; citations by others: 1)

Chun-Jung Chu, Tyne Liang 《Journal of Systems and Software》2008,81(7):1105-1117

The utility of an itemset is considered to be the value of that itemset, and utility mining aims at identifying the itemsets with high utilities. The temporal high utility itemsets are the itemsets whose support is larger than a pre-specified threshold in the current time window of the data stream. Discovery of temporal high utility itemsets is an important process for mining interesting patterns such as association rules from data streams. In this paper, we propose a novel method, namely THUI (Temporal High Utility Itemsets)-Mine, for mining temporal high utility itemsets from data streams efficiently and effectively. To the best of our knowledge, this is the first work on mining temporal high utility itemsets from data streams. The novel contribution of THUI-Mine is that it can effectively identify the temporal high utility itemsets by generating fewer candidate itemsets, so that the execution time for mining all high utility itemsets in data streams is reduced substantially. In this way, all temporal high utility itemsets under all time windows of a data stream can be discovered with less memory space and execution time, which meets the critical time and space requirements of data stream mining. Through experimental evaluation, THUI-Mine is shown to significantly outperform existing methods such as the Two-Phase algorithm under various experimental conditions.
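The quantities this abstract relies on can be illustrated with a small sketch. It is not THUI-Mine itself: it only computes, for one itemset, the utility and the transaction-weighted utilization (TWU, the upper bound used by the Two-Phase algorithm) over a sliding window. The profit table, the stream, and the window size of 3 are made-up values for illustration.

```python
from collections import deque

# Hypothetical per-unit profits (external utility); not taken from the paper.
PROFIT = {"A": 3, "B": 10, "C": 1}

def contains(itemset, transaction):
    """True if every item of the itemset occurs in the transaction."""
    return all(item in transaction for item in itemset)

def itemset_utility(itemset, transaction):
    """Utility of the itemset in one transaction (dict: item -> quantity)."""
    if not contains(itemset, transaction):
        return 0
    return sum(transaction[item] * PROFIT[item] for item in itemset)

def transaction_utility(transaction):
    """Total utility of all items in one transaction."""
    return sum(qty * PROFIT[item] for item, qty in transaction.items())

def window_stats(itemset, window):
    """Return (utility, TWU) of the itemset over the transactions in the window."""
    utility = sum(itemset_utility(itemset, t) for t in window)
    twu = sum(transaction_utility(t) for t in window if contains(itemset, t))
    return utility, twu

# Made-up stream; the window keeps only the 3 most recent transactions.
stream = [
    {"A": 1, "B": 1},
    {"A": 2, "C": 4},
    {"B": 2, "C": 1},
    {"A": 1, "B": 1, "C": 2},
]
window = deque(maxlen=3)
history = []
for transaction in stream:
    window.append(transaction)
    history.append(window_stats(("A", "B"), window))

print(history)
```

Because TWU never underestimates the utility of an itemset or of its supersets, an itemset whose TWU falls below the threshold in the current window can be pruned safely; that is the property candidate-reduction methods in this family build on.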

2.

On-shelf utility mining with negative item values

《Expert systems with applications》2014,41(7):3450-3459

On-shelf utility mining has recently received interest in the data mining field because of its practical relevance. It considers not only the profits and quantities of items in transactions but also their on-shelf time periods in stores. Profit values of items in traditional on-shelf utility mining are assumed to be positive. However, in real-world applications, items may be associated with negative profit values. This paper proposes a three-scan mining approach to efficiently find high on-shelf utility itemsets with negative profit values from temporal databases. In particular, an effective itemset generation method is developed to avoid generating a large number of redundant candidates and to reduce the number of data scans in mining. Experimental results on several synthetic and real datasets show that the proposed approach performs well in pruning effectiveness and execution efficiency.

3.

An efficient incremental updating algorithm for association rules (total citations: 3; self-citations: 0; citations by others: 3)

商志会 (Shang Zhihui), 陶树平 (Tao Shuping) 《计算机应用》 (Journal of Computer Applications) 2005,25(4):830-832

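The key ideas and performance of the FUP algorithm for mining association rules are studied, and an improved FUP algorithm, SFUP, is proposed. SFUP makes full use of the support counts of the candidate frequent itemsets in the original mining results, which effectively reduces the number of repeated database scans. Experiments comparing the two algorithms show that SFUP is clearly more efficient than FUP.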

5.

A new algorithm for fast mining frequent itemsets using N-lists

DENG ZhiHong, WANG ZhongHui & JIANG JiaJian 《中国科学:信息科学(英文版)》 (Science China Information Sciences) 2012,(9):2008-2030

Mining frequent itemsets has emerged as a fundamental problem in data mining and plays an essential role in many important data mining tasks. In this paper, we propose a novel vertical data representation called N-list, which originates from an FP-tree-like coding prefix tree called PPC-tree that stores crucial information about frequent itemsets. Based on the N-list data structure, we develop an efficient mining algorithm, PrePost, for mining all frequent itemsets. PrePost is efficient for three reasons. First, N-list is compact, since transactions with common prefixes share the same nodes of the PPC-tree. Second, the counting of itemsets’ supports is transformed into the intersection of N-lists, and the complexity of intersecting two N-lists can be reduced to O(m + n) by an efficient strategy, where m and n are the cardinalities of the two N-lists respectively. Third, PrePost can directly find frequent itemsets without generating candidate itemsets in some cases by making use of the single path property of N-list. We have experimentally evaluated PrePost against four state-of-the-art algorithms for mining frequent itemsets on a variety of real and synthetic datasets. The experimental results show that the PrePost algorithm is the fastest in most cases. Even though the algorithm consumes more memory when the datasets are sparse, it is still the fastest one.
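The O(m + n) intersection mentioned in this abstract can be sketched as a linear merge. The code below is an illustration under stated assumptions, not the authors' implementation: each N-list entry is taken to be a `(pre_order, post_order, count)` triple sorted by pre-order, a node is an ancestor of another when its pre-order is smaller and its post-order is larger, and the function name `intersect_nlists` and the toy tree are made up for the example.

```python
def intersect_nlists(nl_ancestor, nl_descendant):
    """Merge two N-lists in O(m + n).

    nl_ancestor belongs to the item that sits higher in the PPC-tree,
    nl_descendant to the item that sits lower. Both are lists of
    (pre_order, post_order, count) triples sorted by pre_order.
    """
    result = []
    i = j = 0
    while i < len(nl_ancestor) and j < len(nl_descendant):
        pre_a, post_a, _ = nl_ancestor[i]
        pre_d, post_d, count_d = nl_descendant[j]
        if pre_a < pre_d:
            if post_a > post_d:
                # Ancestor found: credit the descendant's count to it.
                if result and result[-1][0] == pre_a:
                    result[-1] = (pre_a, post_a, result[-1][2] + count_d)
                else:
                    result.append((pre_a, post_a, count_d))
                j += 1
            else:
                # This ancestor candidate ends before the descendant starts.
                i += 1
        else:
            # The descendant precedes every remaining ancestor candidate.
            j += 1
    return result

def support(nlist):
    """Support of an itemset is the sum of the counts in its N-list."""
    return sum(count for _, _, count in nlist)

# Toy PPC-tree for transactions {a,b,c}, {a,c}, {a,b} with item order a, b, c:
# root -> a(3) -> b(2) -> c(1), and a -> c(1).
nl_a = [(1, 3, 3)]
nl_b = [(2, 1, 2)]
nl_c = [(3, 0, 1), (4, 2, 1)]

nl_ac = intersect_nlists(nl_a, nl_c)
nl_bc = intersect_nlists(nl_b, nl_c)
print(nl_ac, support(nl_ac))  # [(1, 3, 2)] 2
print(nl_bc, support(nl_bc))  # [(2, 1, 1)] 1
```

Each loop iteration advances one of the two cursors, so the merge touches at most m + n entries, which is where the linear bound comes from.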

6.

Fuzzy utility mining with upper-bound measure

《Applied Soft Computing》2015

Fuzzy utility mining has become an emerging research issue because of its simplicity and comprehensibility. Unlike traditional fuzzy data mining, fuzzy utility mining considers not only the quantities of items in transactions but also their profits when deriving high fuzzy utility itemsets. In this paper, we introduce a new fuzzy utility measure that uses the fuzzy minimum operator to evaluate the fuzzy utilities of itemsets. In addition, an effective fuzzy utility upper-bound model based on the proposed measure is designed to provide the downward-closure property in fuzzy sets, thus reducing the search space for finding high fuzzy utility itemsets. A two-phase fuzzy utility mining algorithm, named TPFU, is also proposed and described for solving the problem of fuzzy utility mining. Finally, experimental results on both synthetic and real datasets show that the proposed algorithm performs well.

7.

A novel methodology for stock investment using high utility episode mining and genetic algorithm

《Applied Soft Computing》2017

In this paper, we present a novel methodology for stock investment using high utility episode mining and genetic algorithms. Our objective is to devise a profitable episode-based investment model that reveals hidden events associated with high utility in the stock market. The time series data of stock prices and the derived technical indicators, including moving average, moving average convergence and divergence, random index and bias index, are used to construct episode events. We then employ the genetic algorithm to simultaneously optimize the parameters and select subsets of models. The empirical results show that our proposed method significantly outperforms the state-of-the-art methods in terms of annualized return on investment and precision. We also provide a set of Z-tests to statistically validate the effectiveness of our proposed method. Based on the promising results obtained, we expect that this methodology can advance research in data mining for computational finance and provide an alternative for stock investment in practice.

8.

An FP-tree-based incremental updating algorithm for frequent itemsets

廖仁全 (Liao Renquan), 王利华 (Wang Lihua), 邱江涛 (Qiu Jiangtao) 《计算机工程与应用》 (Computer Engineering and Applications) 2007,43(4):176-178,233

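To address the problem of incrementally updating frequent itemsets, an algorithm named FIU is proposed. The algorithm stores on disk the FP-tree that holds the database transactions. When frequent itemsets are to be mined under a new support threshold, it only needs to read the FP-tree from disk and then mine the frequent itemsets under the new threshold. When new transaction records are added to the database, it first builds a new item table, then builds an FP-tree for the new transaction records according to that table, reads the FP-tree stored on disk, extracts all of its transaction records, and inserts them into the new FP-tree, which yields the incrementally updated FP-tree. Finally, frequent itemsets are mined on the updated FP-tree. Experiments show that the execution time of FIU does not vary with database size and that it performs well compared with other algorithms.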

9.

Applying the maximum utility measure in high utility sequential pattern mining

《Expert systems with applications》2014,41(11):5071-5081

Recently, high utility sequential pattern mining has become a popular emerging issue because it considers the quantities, profits and time orders of items. In the existing approach, the utilities of subsequences in sequences are difficult to calculate because three kinds of utility calculations are involved. To simplify the utility calculation, this work presents a maximum utility measure, which is derived from the principle of traditional sequential pattern mining that a subsequence is counted only once per sequence. The maximum measure is thus used to simplify the utility calculation for subsequences in mining. Meanwhile, an effective upper-bound model is designed to avoid information loss in mining, and an effective projection-based pruning strategy is designed to obtain more accurate sequence-utility upper bounds of subsequences. An indexing strategy is also developed to quickly find the relevant sequences for prefixes in mining, so that unnecessary search time can be reduced. Finally, experimental results on several datasets show that the proposed approach performs well in both pruning effectiveness and execution efficiency.

10.

DiffNodesets: An efficient structure for fast mining frequent itemsets

《Applied Soft Computing》2016

Mining frequent itemsets is an essential problem in data mining and plays an important role in many data mining applications. In recent years, some itemset representations based on node sets have been proposed and have been shown to be very efficient for mining frequent itemsets. In this paper, we propose DiffNodeset, a novel and more efficient itemset representation for mining frequent itemsets. Based on the DiffNodeset structure, we present an efficient algorithm, named dFIN, to mine frequent itemsets. To achieve high efficiency, dFIN finds frequent itemsets using a set-enumeration tree with a hybrid search strategy and, in some cases, directly enumerates frequent itemsets without candidate generation. To evaluate the performance of dFIN, we have conducted extensive experiments to compare it against existing leading algorithms on a variety of real and synthetic datasets. The experimental results show that dFIN is significantly faster than these leading algorithms.

11.

High utility itemset mining with techniques for reducing overestimated utilities and pruning candidates

《Expert systems with applications》2014,41(8):3861-3878

High utility itemset mining considers the importance of items, such as profit and item quantities in transactions. Recently, mining high utility itemsets has emerged as one of the most significant research issues because of its wide range of real-world applications, such as retail market data analysis and stock market prediction. Although many relevant algorithms have been proposed in recent years, they suffer from generating a large number of candidate itemsets, which degrades mining performance. In this paper, we propose an algorithm named MU-Growth (Maximum Utility Growth) with two techniques for pruning candidates effectively in the mining process. Moreover, we suggest a tree structure, named MIQ-Tree (Maximum Item Quantity Tree), which captures database information in a single pass. The proposed data structure is restructured to reduce overestimated utilities. Performance evaluation shows that MU-Growth not only decreases the number of candidates but also outperforms state-of-the-art tree-based algorithms with overestimated methods in terms of runtime, with similar memory usage.

12.

Efficient updating of discovered high-utility itemsets for transaction deletion in dynamic databases

《Advanced Engineering Informatics》2015,29(1):16-27

Most algorithms related to association rule mining are designed to discover frequent itemsets from a binary database. Other factors such as profit, cost, or quantity are not considered in binary databases. Utility mining was thus proposed to measure the utility values of purchased items for finding high-utility itemsets from a static database. In real-world applications, transactions in a dynamic database change through insertion or deletion. An existing maintenance approach for handling high-utility itemsets in dynamic databases with transaction deletion must rescan the database when necessary. In this paper, an efficient algorithm, called PRE-HUI-DEL, is proposed for updating high-utility itemsets under transaction deletion based on the pre-large concept. The pre-large concept is used to partition transaction-weighted utilization itemsets into three sets with nine cases according to whether they have large (high), pre-large, or small transaction-weighted utilization in the original database and in the deleted transactions. Specific procedures are then applied to each case for maintaining and updating the discovered high-utility itemsets. Experimental results show that the proposed PRE-HUI-DEL algorithm outperforms a batch two-phase algorithm and a FUP2-based algorithm in maintaining high-utility itemsets.
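The "three sets with nine cases" partition can be pictured with a short sketch. It assumes, as is usual for the pre-large concept, an upper and a lower threshold on the ratio of an itemset's transaction-weighted utilization to the total utility; the function names, threshold values, and numbers below are illustrative only, and the per-case update procedures of PRE-HUI-DEL are not reproduced.

```python
from itertools import product

LEVELS = ("large", "pre-large", "small")

def level(twu, total_utility, upper, lower):
    """Classify an itemset by its TWU ratio against two thresholds."""
    ratio = twu / total_utility if total_utility else 0.0
    if ratio >= upper:
        return "large"
    if ratio >= lower:
        return "pre-large"
    return "small"

def maintenance_case(orig_twu, orig_total, del_twu, del_total, upper, lower):
    """Pair the itemset's level in the original database with its level
    in the deleted transactions; 3 x 3 levels give the nine cases."""
    return (
        level(orig_twu, orig_total, upper, lower),
        level(del_twu, del_total, upper, lower),
    )

# Illustrative numbers: thresholds of 30% (upper) and 20% (lower).
example = maintenance_case(
    orig_twu=350, orig_total=1000, del_twu=25, del_total=100,
    upper=0.30, lower=0.20,
)
all_cases = list(product(LEVELS, LEVELS))

print(example)         # ('large', 'pre-large')
print(len(all_cases))  # 9
```

The point of the middle "pre-large" band is that itemsets close to the threshold are tracked in advance, so small batches of changes can often be absorbed without rescanning the original database.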

13.

Fast mining frequent itemsets using Nodesets

《Expert systems with applications》2014,41(10):4505-4512

Node-list and N-list, two novel data structures proposed in recent years, have been proven to be very efficient for mining frequent itemsets. The main problem with these structures is that both need to encode each node of a PPC-tree with a pre-order and a post-order code. This makes them memory-consuming and inconvenient for mining frequent itemsets. In this paper, we propose Nodeset, a more efficient data structure for mining frequent itemsets. Nodesets require only the pre-order (or post-order) code of each node, which saves half of the memory compared with N-lists and Node-lists. Based on Nodesets, we present an efficient algorithm called FIN to mine frequent itemsets. To evaluate the performance of FIN, we have conducted experiments to compare it with PrePost and FP-growth, two state-of-the-art algorithms, on a variety of real and synthetic datasets. The experimental results show that FIN performs well in both running time and memory usage.

14.

An efficient approach for mining cross-level closed itemsets and minimal association rules using closed itemset lattices

《Expert systems with applications》2014,41(6):2914-2938

Multilevel knowledge in transactional databases plays a significant role in real-life market basket analysis. Many researchers have mined hierarchical association rules and proposed various approaches. However, some of the existing approaches produce many multilevel and cross-level association rules that fail to convey quality information. From this large number of redundant association rules, it is extremely difficult to extract any meaningful information. There also exist some approaches that mine minimal association rules, but these have many shortcomings due to their naïve-based approaches. In this paper, we focus on the need to generate hierarchical minimal rules that provide maximal information. An algorithm is proposed to derive minimal multilevel association rules and cross-level association rules. Our work makes significant contributions to mining the minimal cross-level association rules, which express the mixed relationship between the generalized and specialized views of the transaction itemsets. We are the first to design an efficient algorithm using a closed itemset lattice-based approach, which can mine the most relevant minimal cross-level association rules. The parent–child relationship of the lattices is exploited while mining cross-level closed itemset lattices. We have extensively evaluated the efficiency of the proposed algorithm using a variety of real-life datasets and a large number of experiments. The proposed algorithm significantly outperformed the existing related work in this extensive performance comparison.

15.

A framework for mining interesting high utility patterns with a strong frequency affinity

Chowdhury Farhan Ahmed, Ho-Jin Choi 《Information Sciences》2011,181(21):4878-4894

High utility pattern (HUP) mining is one of the most important research issues in data mining. Although HUP mining extracts important knowledge from databases, it requires long calculations and multiple database scans. Therefore, HUP mining is often unsuitable for real-time data processing schemes such as data streams. Furthermore, many HUPs may be unimportant because of the poor correlations among the items they contain. Hence, the fast discovery of fewer but more important HUPs would be very useful in many practical domains. In this paper, we propose a novel framework that introduces a very useful measure, called frequency affinity, among the items in a HUP, and the concept of an interesting HUP with a strong frequency affinity, for the fast discovery of more applicable knowledge. Moreover, we propose a new tree structure, the utility tree based on frequency affinity (UTFA), and a novel algorithm, high utility interesting pattern mining (HUIPM), for single-pass mining of HUIPs from a database. Our approach mines fewer but more valuable HUPs, significantly reduces the overall runtime of existing HUP mining algorithms, and is applicable to real-time data processing. Extensive performance analyses show that the proposed HUIPM algorithm is very efficient and scalable for interesting HUP mining with a strong frequency affinity.

16.

An improved algorithm for vertical-format high utility pattern mining

《微型机与应用》 (Microcomputer & Its Applications) 2016,(22):22-25

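Because high utility pattern mining is relatively complex, improving the efficiency of its mining algorithms is a research hotspot in data mining. HUP-miner is a typical high utility pattern mining algorithm based on the vertical format. Although it effectively reduces the total number of utility lists, its utility lists require more space for the partitioning of itemsets. To address this problem, an improved algorithm, IHUI-miner, is proposed on the basis of HUI-miner; it takes full account of the correlation among itemsets in the 1-extension set and reduces the number of utility lists. Experimental results show that IHUI-miner outperforms both HUP-miner and HUI-miner in time efficiency and in reducing the number of utility lists.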

17.

An efficient projection-based indexing approach for mining high utility itemsets

Guo-Cheng Lan, Tzung-Pei Hong, Vincent S. Tseng 《Knowledge and Information Systems》2014,38(1):85-107

Recently, utility mining has been widely discussed in the field of data mining. It finds high utility itemsets by considering both the profits and quantities of items in transactional data sets. However, most of the existing approaches find high utility itemsets by level-wise processing, as in the traditional two-phase utility mining algorithm. In this paper, we propose an efficient utility mining approach that adopts an indexing mechanism to speed up execution and reduce the memory requirement of the mining process. The indexing mechanism can imitate traditional projection algorithms to achieve the aim of projecting sub-databases for mining. In addition, a pruning strategy is applied to reduce the number of unpromising itemsets in mining. Finally, experimental results on synthetic data sets and on a real data set show the superior performance of the proposed approach.

18.

A fast updated algorithm to maintain the discovered high-utility itemsets for transaction modification

《Advanced Engineering Informatics》2015,29(3):562-574

High-utility itemset mining (HUIM) is a critical issue that concerns not only the occurrence frequencies of itemsets, as in association-rule mining (ARM), but also the factors of quantity and profit in real-life applications. Many algorithms have been developed to efficiently mine high-utility itemsets (HUIs) from a static database. Discovered HUIs may become invalid, or new HUIs may arise, when transactions are inserted, deleted or modified. Existing approaches must re-process the updated database and re-mine HUIs each time, as previously discovered HUIs are not maintained. Previously, a pre-large concept was proposed to efficiently maintain and update the discovered information in ARM, but it cannot be directly applied to HUIM. In this paper, a maintenance algorithm for transaction modification (PRE-HUI-MOD), based on a new pre-large strategy, is presented to efficiently maintain and update the discovered HUIs. When transactions in the original database are modified, the discovered information is divided into three parts with nine cases. A specific procedure is then performed to maintain and update the discovered information for each case. With the designed PRE-HUI-MOD algorithm, it is unnecessary to rescan the original database until the accumulated total utility of the modified transactions reaches the designed safety bound, which greatly reduces the computation spent on multiple database scans compared with batch-mode approaches.

19.

A matrix and bit-string algorithm for fast mining of weighted frequent itemsets

李娟 (Li Juan), 张明义 (Zhang Mingyi), 汪维清 (Wang Weiqing) 《计算机工程与设计》 (Computer Engineering and Design) 2007,28(11):2533-2536

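Association rule mining is applied ever more widely, but most association rule mining algorithms proposed so far treat all items in the data warehouse equally. In the real world, however, different items often have different importance. In existing research on weighted association rules, the weighting methods are mostly unsatisfactory or the mining algorithms are not efficient enough. A new algorithm for mining weighted association rules is therefore proposed. It uses matrix and bit-string techniques, needs to scan the database only once, quickly mines all weighted frequent itemsets, and requires little space for auxiliary information. The study shows that the algorithm is more efficient than existing algorithms.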
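The single-scan bit-string idea can be sketched as follows. This is an illustration and not the algorithm from the paper: each item gets one bit string (a Python integer) built in a single pass over the transactions, and the support of an itemset is the number of set bits in the bitwise AND of its items' strings. The weighted-support definition used here (mean item weight times relative support) is an assumption, since the abstract does not state the weighting method; the weights and transactions are made up.

```python
def build_bitstrings(transactions):
    """Single scan: bit `tid` of bits[item] is set if the item occurs in
    transaction number `tid`."""
    bits = {}
    for tid, transaction in enumerate(transactions):
        for item in transaction:
            bits[item] = bits.get(item, 0) | (1 << tid)
    return bits

def support_count(itemset, bits):
    """Count the transactions containing every item of a non-empty itemset."""
    acc = bits.get(itemset[0], 0)
    for item in itemset[1:]:
        acc &= bits.get(item, 0)
    return bin(acc).count("1")

def weighted_support(itemset, bits, weights, n_transactions):
    """Assumed definition: mean item weight times relative support."""
    mean_weight = sum(weights[item] for item in itemset) / len(itemset)
    return mean_weight * support_count(itemset, bits) / n_transactions

# Made-up data for illustration.
transactions = [{"A", "B"}, {"A", "C"}, {"A", "B", "C"}, {"B"}]
weights = {"A": 0.9, "B": 0.3, "C": 0.6}

bits = build_bitstrings(transactions)
count_ab = support_count(("A", "B"), bits)
count_abc = support_count(("A", "B", "C"), bits)
ws_ab = weighted_support(("A", "B"), bits, weights, len(transactions))

print(count_ab, count_abc)  # 2 1
```

After the one scan, every support query is answered from the bit strings alone, which is why no further passes over the database are needed.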

20.

An efficient fast algorithm for discovering closed+ high utility itemsets

Jayakrushna Sahoo, Ashok Kumar Das, A. Goswami 《Applied Intelligence》2016,45(1):44-74

In recent years, mining high utility itemsets (HUIs) from transactional databases has become one of the fastest-emerging research topics in data mining because of its wide range of applications in online e-commerce data analysis, in identifying interesting patterns in biomedical data, and in cross-marketing solutions for retail business. It aims to discover the itemsets with high utilities efficiently by considering item quantities in a transaction and the profit value of each item. However, it produces a tremendous number of HUIs, which imposes a further burden on the analysis of the extracted patterns and also degrades the performance of mining methods. Mining the set of closed+ high utility itemsets (CHUIs) solves this issue, as it is a lossless and condensed representation of all HUIs. In this paper, we present a new algorithm for finding CHUIs from a transactional database, called CHUM (Closed+ High Utility itemset Miner), which is scalable and efficient. The proposed mining algorithm adopts a tricky aimed vertical representation of the database in order to speed up the generation of itemset closures and to compute their utility information without accessing the database. The proposed method makes use of an item co-occurrence strategy to further reduce the number of intersections that need to be performed. Several experiments are conducted on various sparse and dense datasets, and the simulation results clearly show the scalability and superior performance of our algorithm compared with the existing state-of-the-art CHUD (Closed+ High Utility itemset Discovery) algorithm.