首页 | 本学科首页   官方微博 | 高级检索  
     

桶外排序算法的抽样分点分发策略
引用本文:杨磊,黄辉,宋涛.桶外排序算法的抽样分点分发策略[J].软件学报,2005,16(5):643-651.
作者姓名:杨磊  黄辉  宋涛
作者单位:1. 清华大学,计算机科学与技术系,北京,100084
2. 中联绿盟信息技术(北京)有限公司,开发部,北京,100089
基金项目:Supported by the National Natural Science Foundation of China under Grant Nos.60223004,60321002,60303005(国家自然科学基金)
摘    要:计算机外排序常用二阶段多路归并算法和桶算法.后者运算开销小,效率更高.但基于关键字高位比特的子文件分发策略应用受限:关键字必须是整数;得到的子文件可能大小不一;子文件数不能任意选择.基于统计学理论,提出抽样分点分发策略克服以上问题,扩展桶排序的应用范围.讨论了抽样分点估计的收敛性,给出了不发生内存溢出的保证概率.该策略使桶排序算法在SheenkSort排序系统上得到成功应用,并最终获得2003年度PennySort世界排序比赛Indy组冠军.

关 键 词:外排序  桶排序  多路归并  分发策略  抽样分点  PennySort
文章编号:1000-9825/2005/16(05)0643
收稿时间:2004/3/23 0:00:00
修稿时间:2004年3月23日

The Sample-Seperators Based Distributing Scheme of the External Bucket Sort Algorithm
YANG Lei,HUANG Hui and SONG Tao.The Sample-Seperators Based Distributing Scheme of the External Bucket Sort Algorithm[J].Journal of Software,2005,16(5):643-651.
Authors:YANG Lei  HUANG Hui and SONG Tao
Abstract:Two ways to sort externally are Multi-Line Merging Sort and Bucket Sort, both with two passes. The Bucket Sort burdens the CPU less and is more efficient, while its usage is restricted heavily by the High-Bit scheme that distributes records into subfiles: the keys have to be integers; the sizes of subfiles may vary too much; the number of subfiles cannot be chosen freely. Based on statistical theory, this paper presents a sample-seperators scheme to broaden the ussage of bucket sort algorithm. A brief discussion on the convergance of sample-seperator estimation is given and the probability to avoid memory overflow is calculated. This scheme enables the bucket sort algorithm to be applied in the SheenkSort system to win the 2003 PennySort (the Indy category) competition.
Keywords:external sort  bucket sort  multi-line merging  distributing scheme  sample-separators  PennySort
本文献已被 CNKI 维普 万方数据 等数据库收录!
点击此处可从《软件学报》浏览原始摘要信息
点击此处可从《软件学报》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号