DSM-FI: an efficient algorithm for mining frequent itemsets in data streams |
| |
Authors: | Hua-Fu Li Man-Kwan Shan Suh-Yin Lee |
| |
Affiliation: | (1) Department of Computer Science, Kainan University, Taoyuan, Taiwan;(2) Department of Computer Science, National Chengchi University, Taipei, Taiwan;(3) Department of Computer Science, National Chiao-Tung University, Hsinchu, Taiwan |
| |
Abstract: | Online mining of data streams is an important data mining problem with broad applications. However, it is also a difficult
problem since the streaming data possess some inherent characteristics. In this paper, we propose a new single-pass algorithm,
called DSM-FI (data stream mining for frequent itemsets), for online incremental mining of frequent itemsets over a continuous
stream of online transactions. According to the proposed algorithm, each transaction of the stream is projected into a set
of sub-transactions, and these sub-transactions are inserted into a new in-memory summary data structure, called SFI-forest
(summary frequent itemset forest) for maintaining the set of all frequent itemsets embedded in the transaction data stream
generated so far. Finally, the set of all frequent itemsets is determined from the current SFI-forest. Theoretical analysis
and experimental studies show that the proposed DSM-FI algorithm uses stable memory, makes only one pass over an online transactional
data stream, and outperforms the existing algorithms of one-pass mining of frequent itemsets.
|
| |
Keywords: | Data mining Data streams Frequent itemsets Single-pass algorithm Landmark window |
本文献已被 SpringerLink 等数据库收录! |
|