Clustering Time Series with Clipped Data期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

Clustering Time Series with Clipped Data

Authors:	Anthony Bagnall Gareth Janacek

Affiliation:	(1) School of Computing Sciences, University of East Anglia, Norwich, UK

Abstract:	Clustering time series is a problem that has applications in a wide variety of fields, and has recently attracted a large amount of research. Time series data are often large and may contain outliers. We show that the simple procedure of clipping the time series (discretising to above or below the median) reduces memory requirements and significantly speeds up clustering without decreasing clustering accuracy. We also demonstrate that clipping increases clustering accuracy when there are outliers in the data, thus serving as a means of outlier detection and a method of identifying model misspecification. We consider simulated data from polynomial, autoregressive moving average and hidden Markov models and show that the estimated parameters of the clipped data used in clustering tend, asymptotically, to those of the unclipped data. We also demonstrate experimentally that, if the series are long enough, the accuracy on clipped data is not significantly less than the accuracy on unclipped data, and if the series contain outliers then clipping results in significantly better clusterings. We then illustrate how using clipped series can be of practical benefit in detecting model misspecification and outliers on two real world data sets: an electricity generation bid data set and an ECG data set.

Keywords:	clustering time series clipping
本文献已被 SpringerLink 等数据库收录！

设为首页 | 免责声明 | 关于勤云 | 加入收藏