Non-uniformity issues and workarounds in bounded-size sampling
Authors: Rainer Gemulla, Peter J. Haas, Wolfgang Lehner
Affiliations:
1. Max-Planck-Institut für Informatik, Saarbrücken, Germany
2. IBM Almaden Research Center, San Jose, CA, USA
3. Technische Universität Dresden, Dresden, Germany
Abstract: A variety of schemes have been proposed in the literature to speed up query processing and analytics by incrementally maintaining a bounded-size uniform sample from a dataset in the presence of a sequence of insertion, deletion, and update transactions. These algorithms vary according to whether the dataset is an ordinary set or a multiset and whether the transaction sequence consists only of insertions or can include deletions and updates. We report on subtle non-uniformity issues that we found in a number of these prior bounded-size sampling schemes, including some of our own. We provide workarounds that can avoid the non-uniformity problem; these workarounds are easy to implement and incur negligible additional cost. We also consider the impact of non-uniformity in practice and describe simple statistical tests that can help detect non-uniformity in new algorithms.
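The abstract does not name a specific sampling algorithm or test. As a minimal sketch, assume an insertion-only stream over an ordinary set. Under that assumption, the classic reservoir sampling scheme (Algorithm R) maintains a bounded-size uniform sample. A chi-square statistic over the observed frequencies of k-subsets is one simple way to check empirically whether every subset of size k is equally likely. The functions `reservoir_sample`, `biased_sample`, and `subset_chi_square` are hypothetical illustrations. They are not the authors' algorithms, their workarounds, or their proposed tests, and they do not cover the deletion and update cases the paper addresses.

```python
import itertools
import random
from collections import Counter


def reservoir_sample(stream, k, rng):
    """Uniform sample of size min(k, n) from an insertion-only stream.

    Classic reservoir sampling (Algorithm R): the first k items fill the
    reservoir; the i-th item (1-based, i > k) overwrites a uniformly chosen
    slot with probability k / i.
    """
    reservoir = []
    for i, item in enumerate(stream, start=1):
        if i <= k:
            reservoir.append(item)
        else:
            j = rng.randrange(i)  # uniform in [0, i)
            if j < k:             # happens with probability k / i
                reservoir[j] = item
    return reservoir


def biased_sample(stream, k, rng):
    """Hypothetical broken variant used only for contrast: after the
    reservoir fills, each new item is accepted with a fixed probability 1/2
    instead of k / i, so later items are over-represented."""
    reservoir = []
    for i, item in enumerate(stream, start=1):
        if i <= k:
            reservoir.append(item)
        elif rng.random() < 0.5:
            reservoir[rng.randrange(k)] = item
    return reservoir


def subset_chi_square(sampler, n, k, trials, seed=0):
    """Chi-square statistic comparing observed k-subset frequencies with the
    uniform distribution over all C(n, k) subsets of range(n).

    Under uniformity the statistic is approximately chi-square distributed
    with C(n, k) - 1 degrees of freedom; a very large value signals bias.
    """
    rng = random.Random(seed)
    counts = Counter(
        frozenset(sampler(range(n), k, rng)) for _ in range(trials)
    )
    subsets = [frozenset(c) for c in itertools.combinations(range(n), k)]
    expected = trials / len(subsets)
    stat = sum((counts[s] - expected) ** 2 / expected for s in subsets)
    return stat, len(subsets) - 1


if __name__ == "__main__":
    stat, df = subset_chi_square(reservoir_sample, n=5, k=2, trials=20000)
    print("reservoir:", round(stat, 1), "df =", df)  # df = 9
    bstat, _ = subset_chi_square(biased_sample, n=5, k=2, trials=20000)
    print("biased:", round(bstat, 1))
```

With n = 5 and k = 2 there are 10 possible subsets, giving 9 degrees of freedom. The statistic for `reservoir_sample` should typically land near 9. The deliberately biased variant produces a value in the hundreds, because subsets such as {3, 4} appear far less often than 1/10 of the time. A subset-level test is used rather than per-item inclusion counts because uniformity here means every sample of size k is equally likely. A sampler can have correct per-item inclusion probabilities and still fail that stronger property.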
Indexed in SpringerLink and other databases.