Tracking clusters in evolving data streams over sliding windows |
| |
Authors: | Aoying Zhou, Feng Cao, Weining Qian, Cheqing Jin |
| |
Affiliation: | (1) Department of Computer Science and Engineering, Fudan University, Shanghai, 200433, P.R. China; (2) IBM China Research Lab, Beijing, 100094, P.R. China; (3) Department of Computer Science, East China University of Science and Technology, Shanghai, 200237, P.R. China |
| |
Abstract: | Mining data streams poses great challenges due to limited memory and real-time query response requirements.
Clustering an evolving data stream is especially interesting because it captures not only the changing distribution of clusters
but also the evolving behaviors of individual clusters. In this paper, we present a novel method for tracking the evolution
of clusters over sliding windows. In our SWClustering algorithm, we combine the exponential histogram with the temporal cluster
features to form a novel data structure, the Exponential Histogram of Cluster Features (EHCF). The exponential histogram
is used to handle the in-cluster evolution, and the temporal cluster features represent the change of the cluster distribution.
Our approach has several advantages over existing methods: (1) the quality of the clusters is improved because the EHCF captures
the distribution of recent records precisely; (2) compared with previous methods, the mechanism employed to adaptively maintain the in-cluster synopsis
can track the cluster evolution better, while consuming much less memory; (3) the EHCF provides a flexible framework for analyzing
the cluster evolution and tracking a specific cluster efficiently without interfering with other clusters, thus reducing the
consumption of computing resources for data stream clustering. Both the theoretical analysis and extensive experiments show
the effectiveness and efficiency of the proposed method.
Aoying Zhou is currently a Professor in Computer Science at Fudan University, Shanghai, P.R. China. He received his Bachelor's and Master's degrees
in Computer Science from Sichuan University in Chengdu, Sichuan, P.R. China in 1985 and 1988, respectively, and his Ph.D. degree
from Fudan University in 1993. He has served as a member or chair of the program committees of many international conferences, such
as WWW, SIGMOD, VLDB, EDBT, ICDCS, ER, DASFAA, PAKDD, and WAIM. His papers have been published in ACM SIGMOD, VLDB, ICDE,
and several other international journals. His research interests include data mining and knowledge discovery, XML data management,
Web mining and searching, data stream analysis and processing, and peer-to-peer computing.
Feng Cao is currently an R&D engineer at IBM China Research Laboratories. He received a B.E. degree from Xi'an Jiao Tong University,
Xi'an, P.R. China, in 2000 and an M.E. degree from Huazhong University of Science and Technology, Wuhan, P.R. China, in 2003.
From October 2004 to March 2005, he worked at the Fudan-NUS Competency Center for Peer-to-Peer Computing, Singapore. In 2006,
he received his Ph.D. degree from Fudan University, Shanghai, P.R. China. His current research interests include data mining
and data streams.
Weining Qian is currently an Assistant Professor in Computer Science at Fudan University, Shanghai, P.R. China. He received his M.S. and
Ph.D. degrees in Computer Science from Fudan University in 2001 and 2004, respectively. He is supported by the Shanghai Rising-Star
Program under Grant No. 04QMX1404 and the National Natural Science Foundation of China (NSFC) under Grant No. 60673134. He has served
as a program committee member of several international conferences, including DASFAA 2006, 2007 and 2008, APWeb/WAIM 2007,
INFOSCALE 2007, and ECDM 2007. His papers have been published in ICDE, SIAM DM, and CIKM. His research interests include data
stream query processing and mining, and large-scale distributed computing for database applications.
Cheqing Jin is currently an Assistant Professor in Computer Science at East China University of Science and Technology. He received his
Bachelor's and Master's degrees in Computer Science from Zhejiang University in Hangzhou, P.R. China in 1999 and 2002, respectively,
and his Ph.D. degree from Fudan University, Shanghai, P.R. China. He worked as a Research Assistant at the E-business Technology
Institute, the Hong Kong University from December 2003 to May 2004. His current research interests include data mining and
data streams. |
| |
Keywords: | Cluster tracking; Evolving; Data streams; Sliding windows |
This article is indexed in SpringerLink and other databases. |
|
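
As a rough illustration of the EHCF idea described in the abstract, the following Python sketch keeps one temporal cluster feature per bucket and merges buckets in the style of an exponential histogram. It is a minimal sketch under stated assumptions, not the SWClustering implementation: the class names `TCF` and `EHCF`, the field layout (count, linear sum, squared sum, latest timestamp), the per-size bucket limit of ceil(1/epsilon) + 1, and the time-based expiry rule are all assumptions made here for illustration and are not specified in the abstract.

```python
from dataclasses import dataclass
from math import ceil
from typing import List, Sequence


@dataclass
class TCF:
    """Hypothetical temporal cluster feature summarizing one bucket of records."""
    n: int            # number of records in the bucket
    ls: List[float]   # per-dimension linear sum
    ss: List[float]   # per-dimension squared sum
    t: int            # timestamp of the most recent record in the bucket

    def merge(self, newer: "TCF") -> "TCF":
        # Cluster features are additive, so merging is component-wise addition.
        return TCF(
            self.n + newer.n,
            [a + b for a, b in zip(self.ls, newer.ls)],
            [a + b for a, b in zip(self.ss, newer.ss)],
            max(self.t, newer.t),
        )


class EHCF:
    """Sketch of an exponential histogram of cluster features (assumed design)."""

    def __init__(self, epsilon: float, window: int):
        # Assumption: at most ceil(1/epsilon) + 1 buckets of any one size.
        self.max_per_size = ceil(1 / epsilon) + 1
        self.window = window
        self.buckets: List[TCF] = []  # ordered oldest first

    def insert(self, point: Sequence[float], t: int) -> None:
        # Drop buckets whose newest record has left the window (t - window, t].
        self.buckets = [b for b in self.buckets if b.t > t - self.window]
        # Every new record starts as its own bucket of size 1.
        self.buckets.append(TCF(1, list(point), [x * x for x in point], t))
        # Cascade merges: whenever one size has too many buckets,
        # merge the two oldest of that size into one bucket of double size.
        size = 1
        while True:
            same = [i for i, b in enumerate(self.buckets) if b.n == size]
            if len(same) <= self.max_per_size:
                break
            i = same[0]  # equal-size buckets are adjacent, so i + 1 is the next oldest
            self.buckets[i:i + 2] = [self.buckets[i].merge(self.buckets[i + 1])]
            size *= 2

    def centroid(self) -> List[float]:
        # Approximate centroid of the records currently summarized.
        total = sum(b.n for b in self.buckets)
        if total == 0:
            raise ValueError("no records in the window")
        dims = len(self.buckets[0].ls)
        return [sum(b.ls[d] for b in self.buckets) / total for d in range(dims)]
```

In this sketch, old records are summarized in progressively larger buckets while recent records stay in small ones, so only the oldest bucket can straddle the window boundary. A call such as `EHCF(epsilon=0.5, window=100)` followed by repeated `insert(point, t)` calls maintains the synopsis for one cluster; how records are assigned to clusters is outside the scope of this sketch.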