Jump Filter: A Dynamic Sketch for Big Data Governance
Authors: Pengtao Fu, Lailong Luo, Deke Guo, Xiang Zhao, Shangsen Li, Huaimin Wang
Affiliation: College of Systems Engineering, National University of Defense Technology, Changsha 410073, China; College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China
Abstract: With the rapid development of information technology, data volume keeps growing exponentially while the value of data remains hard to mine. This poses significant challenges to efficiently managing and controlling every stage of the data life cycle, including collection, cleaning, storage, and sharing. A sketch uses a hash table, matrix, or bit vector to track core characteristics of data such as frequency, cardinality, and membership. A sketch is therefore itself a form of metadata, and sketches have been widely used in scenarios such as data sharing, transmission, and update. The fast-flowing nature of big data has given rise to dynamic sketches. Existing dynamic sketches expand or shrink their capacity with the size of the data stream by dynamically maintaining a list of probabilistic data structures organized as a chain or a tree. However, they suffer from excessive space overhead and from time overhead that grows with the dataset cardinality. This paper designs a dynamic sketch for big data governance based on the advanced jump consistent hash. The method simultaneously achieves space overhead that grows linearly with the dataset cardinality and constant time overhead for data processing and analysis, and thus effectively supports the demanding processing and analysis tasks of big data governance. Its validity and efficiency are verified by comparison with traditional methods on a variety of synthetic and natural datasets.
Keywords: big data, big data governance, metadata, dynamic sketch, probabilistic data structure
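
The abstract names jump consistent hash as the building block of the proposed sketch but does not describe the design itself. As a rough illustration of that building block only, the following Python sketch implements the widely published jump consistent hash routine (Lamping and Veach) and shows the property that makes it attractive for structures that grow: when the number of buckets increases from n to n + 1, a key either stays where it was or moves to the new bucket. The helper names `key_to_u64` and `moved_keys`, the choice of BLAKE2b to turn strings into 64-bit integers, and the idea of treating each bucket as one sub-sketch are assumptions made for this example, not details taken from the paper.

```python
import hashlib

MASK64 = 0xFFFFFFFFFFFFFFFF  # keep arithmetic within 64 bits, as in the C++ reference


def key_to_u64(item: str) -> int:
    """Map a string to a 64-bit integer key (hypothetical helper for this example)."""
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def jump_consistent_hash(key: int, num_buckets: int) -> int:
    """Return a bucket index in [0, num_buckets) for a 64-bit key."""
    if num_buckets < 1:
        raise ValueError("num_buckets must be at least 1")
    key &= MASK64
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # Linear congruential step that drives the pseudo-random jumps.
        key = (key * 2862933555777941757 + 1) & MASK64
        # Jump ahead to the next bucket count at which this key would move.
        j = int((b + 1) * ((1 << 31) / ((key >> 33) + 1)))
    return b


def moved_keys(items, old_n: int, new_n: int):
    """List (item, old_bucket, new_bucket) for items whose bucket changes (hypothetical helper)."""
    moved = []
    for item in items:
        k = key_to_u64(item)
        old_b = jump_consistent_hash(k, old_n)
        new_b = jump_consistent_hash(k, new_n)
        if old_b != new_b:
            moved.append((item, old_b, new_b))
    return moved


if __name__ == "__main__":
    items = [f"record-{i}" for i in range(10_000)]
    moved = moved_keys(items, old_n=4, new_n=5)
    # Every relocated item should land in the newly added bucket (index 4).
    print(len(moved), all(new_b == 4 for _, _, new_b in moved))
```

If each bucket index is read as the position of one fixed-size probabilistic structure, adding a structure relocates only the keys that map to the new one. The routine needs no lookup table, only a short loop over pseudo-random jumps. How the paper actually arranges and resizes its structures is not stated in the abstract, so this should be read as background on the hash function rather than as the Jump Filter itself.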