Summarization – compressing data into an informative representation |
| |
Authors: | Varun Chandola, Vipin Kumar |
| |
Affiliation: | (1) Department of Computer Science, University of Minnesota, Minneapolis, MN 55414, USA |
| |
Abstract: | In this paper, we formulate the problem of summarization of a data set of transactions with categorical attributes as an optimization
problem involving two objective functions – compaction gain and information loss. We propose metrics to characterize the output
of any summarization algorithm. We investigate two approaches to address this problem. The first approach is an adaptation
of clustering and the second approach makes use of frequent itemsets from the association analysis domain. We illustrate one
application of summarization in the field of network data where we show how our technique can be effectively used to summarize
network traffic into a compact but meaningful representation. Specifically, we evaluate our proposed algorithms on the 1998
DARPA Off-Line Intrusion Detection Evaluation data and network data generated by SKAION Corp for the ARDA information assurance
program.
Vipin Kumar is currently William Norris Professor and Head of the Computer Science and Engineering Department at the University of Minnesota.
His research interests include high-performance computing and data mining. He has authored over 200 research articles, and
has coedited or coauthored nine books including the widely used textbooks Introduction to Parallel Computing and Introduction to Data Mining, both published by Addison Wesley. He has served as chair/co-chair for many conferences/workshops in the area of data mining
and parallel computing, including the IEEE International Conference on Data Mining (2002) and the 15th International Parallel
and Distributed Processing Symposium (2001). He serves as the chair of the steering committee of the SIAM International Conference
on Data Mining, and is a member of the steering committee of the IEEE International Conference on Data Mining. Dr. Kumar serves
or has served on the editorial boards of several journals including Knowledge and Information Systems, Journal of Parallel and Distributed Computing and IEEE Transactions of Data and Knowledge Engineering (1993–1997). He is a Fellow of the ACM and IEEE, and a member of SIAM.
Varun Chandola received his BTech degree in Computer Science from the Indian Institute of Technology, Madras, India, in 2002. He is currently
a PhD student in the Computer Science and Engineering Department at the University of Minnesota. His research interests include
data mining, cyber-security and machine learning. |
| |
Keywords: | Summarization; Frequent itemsets; Categorical attributes |
This article is indexed in SpringerLink and other databases. |
|