首页 | 本学科首页   官方微博 | 高级检索  
     

奇偶合并排序的数据级并行实现
引用本文:张珂良,李佳佳,陈钢,吴百锋.奇偶合并排序的数据级并行实现[J].小型微型计算机系统,2012,33(6):1343-1349.
作者姓名:张珂良  李佳佳  陈钢  吴百锋
作者单位:复旦大学 计算机科学技术学院,上海,201203
基金项目:专用集成电路与系统国家重点实验室,AMD大学合作计划基金
摘    要:针对奇偶合并排序中存在的巨大数据级并行性潜力,通过将其实现于提供了强大数据级并行性的GPU处理器之上而获取较高的加速比.同时,针对OpenCL不支持各工作组间的工作线程的同步问题,提出两种解决方法,一种是通过主机程序控制迭代过程,从而完全避免所有工作线程对于同步操作的需求;另一种是通过桶划分预处理技术将对于同步操作的需求控制在单个工作组,然后利用单个工作组提供的各工作线程间的同步机制以正确的处理同步操作.实验结果表明,按照本文方法实现的程序性能相对于C++STL库中的sort实现有着明显的提高.

关 键 词:奇偶合并排序  数据级并行  通用目的计算  图形处理器  OpenCL

Data Level Parallelism Implementation of Odd-even Merge Sort
ZHANG Ke-liang , LI Jia-jia , CHEN Gang , WU Bai-feng.Data Level Parallelism Implementation of Odd-even Merge Sort[J].Mini-micro Systems,2012,33(6):1343-1349.
Authors:ZHANG Ke-liang  LI Jia-jia  CHEN Gang  WU Bai-feng
Affiliation:(School of Computer Science,Fudan University,Shanghai 201203,China)
Abstract:Due to the significant data level parallelism of the odd-even merge sort algorithm,we implement it on the GPU processor with enormous computational potential for the sake of getting much better performance.However,there is no synchronization mechanism for different work-items among various work-groups.Thus,we present two methods to solve this problem: the first method is using the host program to control the iteration process,it can completely avoid the requirement of synchronization operation among all work-items;and the second method is utilizing the bucket partition preprocessing technology to make the requirement of synchronization operation to individual work-group only,and then can deal with the synchronization operation correctly by the synchronization mechanism supported by every single work-group.Experiment result shows that the performance of our implementation has achieved obvious performance improvement compared with the sort algorithm in the C++ STL library.
Keywords:odd-even merge sort  data level parallelism  GPGPU  GPU  OpenCL
本文献已被 CNKI 万方数据 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号