Mining frequent patterns with the pattern tree
Authors: Hao Huang, Xindong Wu, Richard Relue
Affiliation: (1) Department of Computer Science, University of Virginia, Charlottesville, Virginia 22904, USA; (2) Department of Computer Science, University of Vermont, Burlington, Vermont 05405, USA; (3) Department of Mathematical and Computer Sciences, Colorado School of Mines, Golden, Colorado 80401, USA
Abstract: Mining frequent patterns with a frequent pattern tree (FP-tree for short) avoids costly candidate generation and repeated checking of occurrence frequencies against the support threshold. It therefore achieves much better performance and efficiency than Apriori-like algorithms. However, the database still has to be scanned twice to build the FP-tree. This can be very time-consuming when new data is added to an existing database, because both scans may have to be repeated over not only the new data but also the existing data. In this research we propose a new data structure, the pattern tree (P-tree for short), and a new technique that builds the P-tree with only one scan of the database and derives the corresponding FP-tree for a specified support threshold. Updating a P-tree with new data requires one scan of the new data only; the existing data does not need to be rescanned. Our experiments show that the P-tree method outperforms the FP-tree method by up to an order of magnitude on large datasets.

A preliminary version of this paper was published in the Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM ’02), pp. 629–632.

Hao Huang: He is pursuing his Ph.D. degree in the Department of Computer Science at the University of Virginia. His research interests are Grid Computing, Data Mining, and their applications in Bioinformatics. He received his M.S. in Computer Science from the Colorado School of Mines in 2001.

Xindong Wu, Ph.D.: He is Professor and Chair of the Department of Computer Science at the University of Vermont, USA. He holds a Ph.D. in Artificial Intelligence from the University of Edinburgh, Britain. His research interests include data mining, knowledge-based systems, and Web information exploration. He has published extensively in these areas in various journals and conferences, including IEEE TKDE, TPAMI, ACM TOIS, IJCAI, AAAI, ICML, KDD, ICDM, and WWW. Dr. Wu was the Executive Editor (January 1, 1999 to December 31, 2004) and is an Honorary Editor-in-Chief (since January 1, 2005) of Knowledge and Information Systems (a peer-reviewed archival journal published by Springer). He is also the founder and current Steering Committee Chair of the IEEE International Conference on Data Mining (ICDM), a Series Editor of the Springer Book Series on Advanced Information and Knowledge Processing (AI&KP), and the Chair of the IEEE Computer Society Technical Committee on Computational Intelligence (TCCI). He served as an Associate Editor for the IEEE Transactions on Knowledge and Data Engineering (TKDE) between January 1, 2000 and December 31, 2003, and has been the Editor-in-Chief of TKDE since January 1, 2005. He is the winner of the 2004 ACM SIGKDD Service Award.

Richard Relue, Ph.D.: He received his Ph.D. in Computer Science from the Colorado School of Mines in 2003. His research interests include association rules in data mining, neural networks for automated classification, and artificial intelligence for robot navigation. He has been an Information Technology consultant since 1992, working with Ball Aerospace and Technology, Rational Software, Natural Fuels Corporation, and the Western Interstate Commission for Higher Education (WICHE).
Keywords: Data Mining, Association Rules, Frequent Patterns
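The abstract's central idea is that a threshold-free tree can be built in one database scan, updated by scanning only new data, and later restructured into an FP-tree for any chosen support threshold. The minimal Python sketch below illustrates that idea. It assumes, purely for illustration, that the P-tree is a prefix tree holding every item of each transaction in a fixed sorted order, with per-item supports counted during the same scan. The names `PTree`, `update`, `paths`, `to_fp_tree`, and `dump` are hypothetical and not taken from the paper, and the FP-tree's header table and node-links are omitted.

```python
from collections import Counter


class Node:
    """Prefix-tree node: one item plus the number of transactions sharing this prefix."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}


class PTree:
    """Hypothetical P-tree sketch (assumption): every item of every transaction is
    stored in a fixed canonical (sorted) order, so no support threshold is needed."""

    def __init__(self):
        self.root = Node()
        self.support = Counter()  # per-item supports, gathered in the same single scan

    def insert(self, transaction, count=1):
        node = self.root
        for item in sorted(set(transaction)):
            node = node.children.setdefault(item, Node(item))
            node.count += count
        for item in set(transaction):
            self.support[item] += count

    def update(self, transactions):
        # One pass over the given transactions; existing paths are reused, never rescanned.
        for t in transactions:
            self.insert(t)

    def paths(self):
        """Yield (items, n) where n transactions end exactly at that node."""
        stack = [(self.root, [])]
        while stack:
            node, prefix = stack.pop()
            ending = node.count - sum(c.count for c in node.children.values())
            if node is not self.root and ending > 0:
                yield prefix, ending
            for child in node.children.values():
                stack.append((child, prefix + [child.item]))

    def to_fp_tree(self, min_support):
        """Rebuild an FP-tree from the P-tree alone: drop infrequent items and
        reorder each path by descending support (ties broken by item)."""
        frequent = {i: s for i, s in self.support.items() if s >= min_support}
        fp_root = Node()
        for items, n in self.paths():
            kept = sorted((i for i in items if i in frequent),
                          key=lambda i: (-frequent[i], i))
            node = fp_root
            for item in kept:
                node = node.children.setdefault(item, Node(item))
                node.count += n
        return fp_root


def dump(node, depth=0):
    """Print the tree as indented item:count lines."""
    for item in sorted(node.children):
        child = node.children[item]
        print("  " * depth + f"{item}:{child.count}")
        dump(child, depth + 1)


pt = PTree()
pt.update([["a", "b", "c"], ["a", "b"], ["a", "d"]])  # initial database, one scan
pt.update([["b", "c"]])                               # new data only
fp = pt.to_fp_tree(min_support=2)
dump(fp)
# a:3
#   b:2
#     c:1
# b:1
#   c:1
```

Because the sketched P-tree keeps every item regardless of support, `to_fp_tree` can be called again with a different `min_support` without touching the transactions. The trade-off is that the P-tree can be larger than an FP-tree built for one fixed threshold.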
This article is indexed in SpringerLink and other databases.