Efficient algorithms for segmentation of item-set time series |
| |
Authors: | Parvathi Chundi; Daniel J. Rosenkrantz |
| |
Affiliation: | (1) Computer Science Department, University of Nebraska at Omaha, Omaha, NE 68106, USA; (2) Computer Science Department, SUNY at Albany, Albany, NY 12222, USA |
| |
Abstract: | We propose a special type of time series, which we call an item-set time series, to facilitate the temporal analysis of software version histories, email logs, stock market data, etc. In an item-set time
series, each observed data value is a set of discrete items. We formalize the concept of an item-set time series and present
efficient algorithms for segmenting a given item-set time series. Segmentation of a time series partitions the time series
into a sequence of segments where each segment is constructed by combining consecutive time points of the time series. Each segment is associated with
an item set that is computed from the item sets of the time points in that segment, using a function which we call a measure function. We then define a concept called the segment difference, which measures the difference between the item set of a segment and the item sets of the time points in that segment. The
segment difference values are required to construct an optimal segmentation of the time series. We describe novel and efficient
algorithms to compute segment difference values for each of the measure functions described in the paper. We outline a dynamic
programming based scheme to construct an optimal segmentation of the given item-set time series. We use the item-set time
series segmentation techniques to analyze the temporal content of three different data sets: Enron email, stock market data,
and a synthetic data set. The experimental results show that an optimal segmentation of item-set time series data captures
much more temporal content than a segmentation constructed based on the number of time points in each segment, without examining
the item set data at the time points, and can be used to analyze different types of temporal data. |
| |
Keywords: | Item-set time series; Measure function; Segment difference; Segmentation algorithms; Optimal segmentation |
This article is indexed in SpringerLink and other databases. |
|
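The pipeline described in the abstract (measure function, segment difference, dynamic-programming segmentation) can be illustrated with a minimal Python sketch. Several parts are assumptions, because the abstract does not specify them: union and intersection stand in as example measure functions, the segment difference is taken to be the sum of symmetric-difference sizes between the segment's item set and each time point's item set, and the function names are hypothetical. The difference table is computed naively here, so this is not the efficient algorithm the paper proposes.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

ItemSet = FrozenSet[str]
Measure = Callable[[Sequence[ItemSet]], ItemSet]


def union_measure(sets: Sequence[ItemSet]) -> ItemSet:
    """Assumed measure function: items appearing at any time point."""
    return frozenset().union(*sets)


def intersection_measure(sets: Sequence[ItemSet]) -> ItemSet:
    """Assumed measure function: items appearing at every time point."""
    return frozenset.intersection(*sets)


def segment_difference(sets: Sequence[ItemSet], measure: Measure) -> int:
    """Assumed definition: total symmetric-difference size between the
    segment's item set and the item set of each time point in the segment."""
    segment_items = measure(sets)
    return sum(len(segment_items ^ s) for s in sets)


def optimal_segmentation(
    series: Sequence[ItemSet], k: int, measure: Measure
) -> Tuple[int, List[Tuple[int, int]]]:
    """Split `series` into k consecutive segments minimizing the summed
    segment difference. Returns (total cost, [(start, end), ...]) with
    half-open index ranges."""
    n = len(series)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(series)")

    # diff[i][j] holds the segment difference of series[i:j] (naive computation).
    diff = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(i + 1, n + 1):
            diff[i][j] = segment_difference(series[i:j], measure)

    inf = float("inf")
    # cost[p][j]: best cost of covering the first j time points with p segments.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0
    for p in range(1, k + 1):
        for j in range(p, n + 1):
            for i in range(p - 1, j):
                candidate = cost[p - 1][i] + diff[i][j]
                if candidate < cost[p][j]:
                    cost[p][j] = candidate
                    back[p][j] = i

    # Walk the back-pointers from the end to recover segment boundaries.
    bounds: List[Tuple[int, int]] = []
    j = n
    for p in range(k, 0, -1):
        i = back[p][j]
        bounds.append((i, j))
        j = i
    bounds.reverse()
    return cost[k][n], bounds
```

For example, calling `optimal_segmentation` with `k=2` and `union_measure` on the four item sets `{a, b}`, `{a, b}`, `{c}`, `{c, d}` places the boundary between the second and third time points, where the item content changes. A segmentation that splits purely by the number of time points would ignore that content.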