首页 | 本学科首页   官方微博 | 高级检索  
     

面向数据特征的内存跳表优化技术
引用本文:李梁,吴刚,王国仁.面向数据特征的内存跳表优化技术[J].软件学报,2020,31(3):663-679.
作者姓名:李梁  吴刚  王国仁
作者单位:东北大学计算机科学与工程学院,辽宁沈阳 110169;东北大学计算机科学与工程学院,辽宁沈阳 110169;计算机软件新技术国家重点实验室(南京大学),江苏南京210093;北京理工大学计算机学院,北京 100081
基金项目:国家自然科学基金(61872072,U1811262,61572119,61622202,61672145,61732003,61572121,61332006,61332014,61328202,61702087);中央高校基础科研业务费(N181605012,N171604007,N171904007);中国博士后科学基金(2018M631358)
摘    要:跳表作为数据库中被广泛采用的索引技术,优点在于可以达到类似折半查找的复杂度O(log(n)).但是标准跳表算法中,结点的层数是通过随机算法生成的,这就导致跳表的性能是不稳定的.在极端情况下,查找复杂度会退化到O(n).这是因为经典跳表结构没有结合数据的特征.一个稳定的跳表结构应该充分考虑数据的分布特征去决定结点层数.基于核密度估计的方式估计数据累积分布函数,预测数据在跳表中的位置,进而设计用于判定结点层数的跳表算法.另外,跳表的查找过程中,结点层数越大的结点被访问的概率越高.针对历史数据的访问频次,设计一种保证频繁访问的“热”数据尽可能地在跳表的上层,而访问较少的“冷”数据在跳表的下层的跳表算法.最后,基于合成数据和真实数据对标准跳表和5种改进的跳表算法进行了全面的实验评估并开源代码.实验结果表明,优化的跳表最高可以获取60%的性能提升.这为未来的科研工作者和系统开发人员指出了一个很好的方向.

关 键 词:内存索引  跳表  机器学习  密度估计
收稿时间:2019/7/19 0:00:00
修稿时间:2019/11/25 0:00:00

In-memory Skiplist Optimization Technologies Based on Data Feature
LI Liang,WU Gang and WANG Guo-Ren.In-memory Skiplist Optimization Technologies Based on Data Feature[J].Journal of Software,2020,31(3):663-679.
Authors:LI Liang  WU Gang and WANG Guo-Ren
Affiliation:School of Computer Science and Engineering, Northeastern University, Shenyang 110169,School of Computer Science and Engineering, Northeastern University, Shenyang 110169;State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210093 and School of Computer Science and Technology, Beijing Institute of Technology University, Beijing 100081
Abstract:Skiplist is a widely used indexing technology in the database systems.The advantage is that the complexity of skiplist is O(log(n)). However, in the standard skiplist algorithm, the level of each nodes are generated by a random generator, thus, the performance of the skiplist is unstable. In extremely cases, the searching complexity deceases to O(n) which is similar to the list searching time. This is because the classic skiplist do not combine data features to generate a its structure. We believe that a stable skiplist structure should fully consider the distribution characteristics of the data to determine the number of node levels. This paper estimates the data cumulative distribution function based on the kernel density estimation method, and predicts the position of the data in the skiplist, determines the number of node levels. In addition, we find that the node with a higher level has a higher probability of being accessed. This paper also focuses on the access frequency and the hot data of frequent access, make sure that the upper level of the skiplist are hot data, and access the less cold data in the lower level of skiplist. Finally, we perform a comprehensive experimental evaluation of the six kinds of skiplist algorithms based on the synthesis dataset and real dataset, besides, we open the source code. The results show that the best skiplist algorithm can achieve a 60% performance improvement. This point out a good direction for the future researchers and system developers.
Keywords:In-Memory Index  skiplist  machine learning  density estimation
本文献已被 维普 万方数据 等数据库收录!
点击此处可从《软件学报》浏览原始摘要信息
点击此处可从《软件学报》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号