A survey on algorithms for mining frequent itemsets over data streams |
| |
Authors: | James Cheng Yiping Ke Wilfred Ng |
| |
Affiliation: | (1) Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, HKUST, Clear Water Bay, Kowloon, Hong Kong |
| |
Abstract: | The increasing prominence of data streams arising in a wide range of advanced applications such as fraud detection and trend
learning has led to the study of online mining of frequent itemsets (FIs). Unlike mining static databases, mining data streams
poses many new challenges. In addition to the one-scan nature, the unbounded memory requirement and the high data arrival
rate of data streams, the combinatorial explosion of itemsets exacerbates the mining task. The high complexity of the FI mining
problem hinders the application of the stream mining techniques. We recognize that a critical review of existing techniques
is needed in order to design and develop efficient mining algorithms and data structures that are able to match the processing
rate of the mining with the high arrival rate of data streams. Within a unifying set of notations and terminologies, we describe
in this paper the efforts and main techniques for mining data streams and present a comprehensive survey of a number of the
state-of-the-art algorithms on mining frequent itemsets over data streams. We classify the stream-mining techniques into two
categories based on the window model that they adopt in order to provide insights into how and why the techniques are useful.
Then, we further analyze the algorithms according to whether they are exact or approximate and, for approximate approaches, whether they are false-positive or false-negative. We also discuss various interesting issues, including the merits and limitations in existing research and substantive areas
for future research. |
| |
Keywords: | Frequent itemsets Stream mining Window models Approximate algorithms |
本文献已被 SpringerLink 等数据库收录! |
|