Squeezer: An efficient algorithm for clustering categorical data
Authors: He Zengyou (zengyouhe@yahoo.com), Xu Xiaofei, Deng Shengchun
Affiliation:(1) Department of Computer Science and Engineering, Harbin Institute of Technology, 150001 Harbin, P.R. China
Abstract: This paper presents Squeezer, a new efficient algorithm for clustering categorical data that produces high-quality clustering results while scaling well. The Squeezer algorithm reads each tuple t in sequence and either assigns t to an existing cluster (initially there are none) or creates a new cluster from t; the choice is determined by the similarities between t and the existing clusters. Because of these characteristics, the proposed algorithm is well suited to clustering data streams, where, given a sequence of points, the objective is to maintain a consistently good clustering of the sequence so far using a small amount of memory and time. Outliers can also be handled efficiently and directly in Squeezer. Experimental results on real-life and synthetic datasets verify the superiority of Squeezer.

This work was supported by the National Natural Science Foundation of China (Grant No. 60084004) and the IBM AS/400 Research Fund.

HE Zengyou received his M.S. degree in computer science from Harbin Institute of Technology (HIT) in 2002. He is currently a Ph.D. candidate in the Department of Computer Science and Engineering, HIT. His main research interests include data mining, multi-database systems and approximate query answering.

XU Xiaofei received his M.S. and Ph.D. degrees in computer science from HIT in 1985 and 1988 respectively. He is currently a professor in the Department of Computer Science and Engineering, HIT. His main research interests include CIMS and database systems.

DENG Shengchun received his Ph.D. degree in computer science from HIT in 2002. He is currently an associate professor in the Department of Computer Science and Engineering, HIT. His main research interests include data mining and data warehousing.
Keywords: clustering; categorical data; data stream; data mining
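
The single-pass procedure described in the abstract (read each tuple in order, then either join the most similar existing cluster or start a new one) could be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not state the similarity measure or the decision rule, so the `similarity` function (sum over attributes of the fraction of cluster members sharing the tuple's value), the `threshold` parameter, and the name `squeezer_sketch` are all assumptions made here.

```python
from collections import Counter


def similarity(summary, size, tup):
    """Assumed measure: for each attribute, the fraction of the cluster's
    tuples that share tup's value, summed over all attributes."""
    return sum(counts[value] / size for counts, value in zip(summary, tup))


def squeezer_sketch(tuples, threshold):
    """One pass over the data; returns a cluster index for every tuple.

    Each cluster is summarized by one Counter per attribute plus its size,
    so the tuples themselves never need to be stored.
    """
    summaries = []  # per cluster: list of Counters, one per attribute
    sizes = []      # per cluster: number of tuples assigned so far
    labels = []
    for tup in tuples:
        # Find the existing cluster most similar to this tuple.
        best, best_sim = -1, float("-inf")
        for idx, (summary, size) in enumerate(zip(summaries, sizes)):
            sim = similarity(summary, size, tup)
            if sim > best_sim:
                best, best_sim = idx, sim
        # No clusters yet, or none similar enough: open a new cluster.
        if best == -1 or best_sim < threshold:
            summaries.append([Counter() for _ in tup])
            sizes.append(0)
            best = len(summaries) - 1
        # Fold the tuple into the chosen cluster's summary.
        for counts, value in zip(summaries[best], tup):
            counts[value] += 1
        sizes[best] += 1
        labels.append(best)
    return labels
```

For example, `squeezer_sketch([("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")], threshold=1.0)` returns `[0, 0, 1, 1]`. Because only per-cluster value counts are kept, memory grows with the number of clusters and distinct attribute values rather than with the number of tuples, which is in the spirit of the data-stream setting the abstract mentions.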
Indexed in VIP, Wanfang Data, SpringerLink, and other databases.
Source: Journal of Computer Science and Technology (《计算机科学技术学报》).