首页 | 本学科首页   官方微博 | 高级检索  
     


Summarization – compressing data into an informative representation
Authors:Varun Chandola  Vipin Kumar
Affiliation:(1) Department of Computer Science, University of Minnesota, Minneapolis, MN 55414, USA
Abstract:In this paper, we formulate the problem of summarization of a data set of transactions with categorical attributes as an optimization problem involving two objective functions – compaction gain and information loss. We propose metrics to characterize the output of any summarization algorithm. We investigate two approaches to address this problem. The first approach is an adaptation of clustering and the second approach makes use of frequent itemsets from the association analysis domain. We illustrate one application of summarization in the field of network data where we show how our technique can be effectively used to summarize network traffic into a compact but meaningful representation. Specifically, we evaluate our proposed algorithms on the 1998 DARPA Off-Line Intrusion Detection Evaluation data and network data generated by SKAION Corp for the ARDA information assurance program. Vipin Kumar is currently William Norris Professor and Head of the Computer Science and Engineering Department at the University of Minnesota. His research interests include high-performance computing and data mining. He has authored over 200 research articles, and has coedited or coauthored nine books including the widely used text booksIntroduction to Parallel Computing andIntroduction to Data Mining, both published by Addison Wesley. He has served as chair/co-chair for many conferences/workshops in the area of data mining and parallel computing, including the IEEE International Conference on Data Mining (2002) and the 15th International Parallel and Distributed Processing Symposium (2001). He serves as the chair of the steering committee of the SIAM International Conference on Data Mining, and is a member of the steering committee of the IEEE International Conference on Data Mining. Dr. Kumar serves or has served on the editorial boards of several journals includingKnowledge and Information Systems,Journal of Parallel and Distributed Computing andIEEE Transactions of Data and Knowledge Engineering (1993–1997). He is a Fellow of the ACM and IEEE, and a member of SIAM. Varun Chandola received his BTech degree in Computer Science from the Indian Institute of Technology, Madras, India, in 2002. He is currently a PhD student in the Computer Science and Engineering Department at the University of Minnesota. His research interests include data mining, cyber-security and machine learning.
Keywords:Summarization  Frequent itemsets  Categorical attributes
本文献已被 SpringerLink 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号