Skyline Computing on MapReduce with Hyperplane-Projections-Based Partition
Cite this article: Wang Shuyan, Yang Xin, Li Keqiu. Skyline Computing on MapReduce with Hyperplane-Projections-Based Partition[J]. Journal of Computer Research and Development (计算机研究与发展), 2014, 51(12): 2702-2710. DOI: 10.7544/issn1000-1239.2014.20131329
Authors: Wang Shuyan, Yang Xin, Li Keqiu
Affiliations: 1. School of Software, Dalian University of Technology, Dalian, Liaoning 116024; 2. School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024 (wangshuyandlut@gmail.com)
Funding: National Natural Science Foundation of China (61225010, 61432002, 61173162, 61300084); collaborative project between Microsoft Research Asia and the Computer Network Information Center, Chinese Academy of Sciences
Abstract: In recent years, Skyline computing has played an increasingly important role in decision-making applications. Research on single-machine (centralized) processing is relatively mature, but with the explosion of big data, Skyline computing must now cope with datasets that a single machine cannot handle. MapReduce is a parallel model widely used in data-intensive processing, and it requires that a task be decomposable. Existing methods for decomposing Skyline computing on MapReduce include grid partition and angle-based partition. Grid partition performs well only on low-dimensional data. Angle-based partition applies to both low-dimensional and high-dimensional data, but it requires a complex and time-consuming coordinate conversion before partitioning. In this paper, we decompose the dataset with hyperplane-projections-based partition, a method similar to angle-based partition. It applies to both low-dimensional and high-dimensional data, and its coordinate conversion before partitioning is much simpler. Based on this partition, we propose MR-HPP (MapReduce with hyperplane-projections-based partition), an algorithm for Skyline computing on MapReduce, and we propose an effective filtering algorithm, PSF (presorting filter), for the filter phase of MR-HPP. Extensive comparative experiments on Hadoop demonstrate that the algorithm is accurate, efficient, and stable.

Keywords: Skyline computing; big data; MapReduce; hyperplane-projections-based partition; filter
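
The abstract names the main stages (partition the data, filter each partition, combine the results) without giving their details, so the following Python sketch is only an illustration under stated assumptions, not the authors' MR-HPP or PSF. It assumes that smaller values are better in every dimension, that all coordinates are non-negative, and that "hyperplane projection" means scaling a point onto the hyperplane x1 + ... + xd = 1 and bucketing by its first projected coordinate. The filter step swaps in the well-known sort-first idea (sort by a monotone score, then scan once) in place of PSF. The names `dominates`, `partition_id`, `presorted_skyline`, and `mr_skyline` are hypothetical, and the MapReduce phases are simulated in a single process.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(p: Point, q: Point) -> bool:
    """True if p is no worse than q everywhere and strictly better somewhere.

    Assumption: smaller values are preferred in every dimension.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def partition_id(p: Point, num_partitions: int) -> int:
    """Assumed partition rule: project p onto x1 + ... + xd = 1, bucket by x1.

    The projection is p / sum(p), so its first coordinate lies in [0, 1].
    Coordinates are assumed to be non-negative.
    """
    total = sum(p)
    if total == 0:
        return 0  # the origin has no direction; put it in the first bucket
    share = p[0] / total
    return min(int(share * num_partitions), num_partitions - 1)


def presorted_skyline(points: Sequence[Point]) -> List[Point]:
    """Sort-first skyline: any dominator of p sorts before p.

    Sorting by (coordinate sum, point) guarantees that order, so each point
    only needs to be compared against the skyline found so far.
    """
    skyline: List[Point] = []
    for p in sorted(points, key=lambda x: (sum(x), x)):
        if not any(dominates(s, p) for s in skyline):
            skyline.append(p)
    return skyline


def mr_skyline(points: Sequence[Point], num_partitions: int = 4) -> List[Point]:
    """Simulate map (partition), reduce (local skyline), and a final merge."""
    # Map phase: key every point by its partition.
    groups: Dict[int, List[Point]] = defaultdict(list)
    for p in points:
        groups[partition_id(p, num_partitions)].append(p)
    # Reduce phase: each partition keeps only its local skyline.
    candidates = [s for group in groups.values() for s in presorted_skyline(group)]
    # Merge phase: the global skyline is the skyline of the local skylines.
    return presorted_skyline(candidates)
```

For example, with `points = [(1, 9), (2, 7), (3, 8), (4, 4), (6, 5), (7, 2), (9, 1), (8, 8)]`, `sorted(mr_skyline(points, 3))` returns `[(1, 9), (2, 7), (4, 4), (7, 2), (9, 1)]`. The merge step is required because a local skyline point in one partition can still be dominated by a point in another partition.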
This article is indexed in CNKI and other databases.