Storage-optimizing clustering algorithms for high-dimensional tick data
Affiliation:1. Faculty of Electrical Engineering and Informatics, Budapest University of Technology and Economics, 1117 Budapest, Magyar tudósok körútja 2., Hungary;2. University of Eichstätt-Ingolstadt, Auf der Schanz 49, 85049 Ingolstadt, Germany
Abstract: Tick data are used in applications that need to keep track of values changing over time, such as prices on the stock market or meteorological measurements. Because these values may change very frequently, the size of tick data tends to increase rapidly. It is therefore essential to reduce the storage space of tick data while still allowing queries to be executed efficiently. In this paper, we propose an approach that decomposes the original tick data matrix by clustering its attributes using a new clustering algorithm called Storage-Optimizing Hierarchical Agglomerative Clustering (SOHAC). We additionally propose a method for speeding up SOHAC, based on a new lower bounding technique, that allows SOHAC to be applied to high-dimensional tick data. Our experimental evaluation shows that the proposed approach compares favorably to several baselines in terms of compression. Additionally, it can lead to a significant speedup in running time.
Keywords: Tick data; Clustering; Storage
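The abstract does not spell out the storage cost function or the lower bound, so the following is only a rough sketch of the general idea of clustering attributes to reduce storage, not the SOHAC algorithm itself. It assumes a hypothetical cost model in which each attribute group is stored as its own table, a row (one timestamp cell plus one cell per attribute) is written only when some attribute in the group changes, and the two groups whose merge saves the most cells are merged greedily. The names `storage_cost` and `greedy_attribute_clustering` are illustrative, and the lower-bounding speedup is not shown.

```python
from itertools import combinations


def storage_cost(ticks, attrs):
    """Cells needed to store the sub-table for the attribute group `attrs`.

    Assumed cost model (not taken from the paper): a row is written only
    when at least one attribute in the group changes, and each row costs
    one timestamp cell plus one cell per attribute.
    """
    rows = 0
    prev = None
    for tick in ticks:
        key = tuple(tick[a] for a in attrs)
        if key != prev:
            rows += 1
            prev = key
    return rows * (len(attrs) + 1)


def greedy_attribute_clustering(ticks, n_attrs):
    """Agglomerative clustering of attributes driven by storage savings."""
    # Start with every attribute in its own group.
    clusters = [(a,) for a in range(n_attrs)]
    costs = {c: storage_cost(ticks, c) for c in clusters}
    while len(clusters) > 1:
        best = None
        # Evaluate every pair of groups and keep the most beneficial merge.
        for c1, c2 in combinations(clusters, 2):
            merged = tuple(sorted(c1 + c2))
            gain = costs[c1] + costs[c2] - storage_cost(ticks, merged)
            if best is None or gain > best[0]:
                best = (gain, c1, c2, merged)
        gain, c1, c2, merged = best
        if gain <= 0:
            break  # no merge reduces storage any further
        clusters = [c for c in clusters if c not in (c1, c2)] + [merged]
        costs[merged] = costs[c1] + costs[c2] - gain
    return clusters, sum(costs[c] for c in clusters)
```

Under this assumed model, attributes that tend to change at the same ticks end up in the same group, because storing them together shares timestamps without adding rows, while attributes that change at unrelated times stay apart.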
This article is indexed in ScienceDirect and other databases.