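Performance optimization of ItemBased recommendation algorithm based on Spark
Cite this article:LIAO Bin, ZHANG Tao, GUO Binglei, YU Jiong, ZHANG Xuguang, LIU Yan. Performance optimization of ItemBased recommendation algorithm based on Spark[J]. Journal of Computer Applications, 2017, 37(7): 1900-1905.
Authors:LIAO Bin  ZHANG Tao  GUO Binglei  YU Jiong  ZHANG Xuguang  LIU Yan
Affiliations:1. College of Statistics and Information, Xinjiang University of Finance and Economics, Urumqi 830012; 2. College of Medical Engineering and Technology, Xinjiang Medical University, Urumqi 830011; 3. School of Information Science and Engineering, Xinjiang University, Urumqi 830008; 4. School of Software, Tsinghua University, Beijing 100084
Funding:National Natural Science Foundation of China (61562078, 61262088); Natural Science Foundation of Xinjiang Uygur Autonomous Region (2016D01B014).
Abstract:In MapReduce computing scenarios, complex big-data mining algorithms usually need several MapReduce jobs working together, but the heavy redundant disk reads and writes and the repeated resource requests between those jobs severely reduce the algorithms' performance. To improve the computational efficiency of the ItemBased recommendation algorithm, the performance problems of the ItemBased collaborative filtering algorithm on the MapReduce platform are first analyzed. On that basis, Spark's advantages in iterative and in-memory computing are used to improve the algorithm's execution efficiency, and an ItemBased recommendation algorithm is implemented on the Spark platform. Experimental results show that with clusters of 10 and 20 nodes, the algorithm's running time on Spark is only 25.6% and 30.8%, respectively, of its running time on MapReduce; overall, the algorithm on Spark is more than 3 times as efficient as on MapReduce.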

Keywords:collaborative filtering  MapReduce  Spark algorithm  performance optimization  directed acyclic graph
Received:2017-01-16
Revised:2017-03-01
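
As a quick check on the reported figures: if the Spark run takes 25.6% and 30.8% of the MapReduce running time, the implied speedup is the reciprocal of each fraction. The short Python sketch below does that arithmetic. The names `spark_time_fraction` and `speedup` are illustrative only, and reading "efficiency" as MapReduce time divided by Spark time is an assumption about how the abstract's "more than 3 times" is meant.

```python
# Running time on Spark as a fraction of the MapReduce running time,
# as reported in the abstract for clusters of 10 and 20 nodes.
spark_time_fraction = {10: 0.256, 20: 0.308}


def speedup(time_fraction: float) -> float:
    """Assumed definition: speedup = MapReduce time / Spark time = 1 / fraction."""
    return 1.0 / time_fraction


for nodes, fraction in spark_time_fraction.items():
    print(f"{nodes} nodes: about {speedup(fraction):.2f}x")
# prints:
# 10 nodes: about 3.91x
# 20 nodes: about 3.25x
```

Under that reading, both cluster sizes come out above 3x, which is consistent with the abstract's overall claim.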

Performance optimization of ItemBased recommendation algorithm based on Spark
LIAO Bin, ZHANG Tao, GUO Binglei, YU Jiong, ZHANG Xuguang, LIU Yan. Performance optimization of ItemBased recommendation algorithm based on Spark[J]. Journal of Computer Applications, 2017, 37(7): 1900-1905.
Authors:LIAO Bin  ZHANG Tao  GUO Binglei  YU Jiong  ZHANG Xuguang  LIU Yan
Affiliation:1. College of Statistics and Information, Xinjiang University of Finance and Economics, Urumqi Xinjiang 830012, China;2. College of Medical Engineering and Technology, Xinjiang Medical University, Urumqi Xinjiang 830011, China;3. School of Information Science and Engineering, Xinjiang University, Urumqi Xinjiang 830008, China;4. School of Software, Tsinghua University, Beijing 100084, China
Abstract:In MapReduce computing scenarios, complex data mining algorithms typically require multiple MapReduce jobs to cooperate to complete a task. However, heavy redundant disk reads and writes and repeated resource requests among these jobs seriously degrade the algorithms' performance. To improve the computational efficiency of the ItemBased recommendation algorithm, the performance issues of the ItemBased collaborative filtering algorithm on the MapReduce platform were first analyzed. Then, the execution efficiency of the algorithm was improved by taking advantage of Spark's strengths in iterative and in-memory computing, and the ItemBased collaborative filtering algorithm was implemented on the Spark platform. The experimental results show that, with clusters of 10 and 20 nodes, the running time of the algorithm on Spark is only 25.6% and 30.8%, respectively, of that on MapReduce. Overall, the algorithm runs more than 3 times faster on Spark than on MapReduce.
Keywords:collaborative filtering  MapReduce  Spark algorithm  performance optimization  Directed Acyclic Graph (DAG)
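
For readers unfamiliar with item-based collaborative filtering, the following is a minimal single-machine sketch in plain Python. It is not the paper's Spark implementation: the abstract does not state which similarity measure or data layout the authors use, so cosine similarity, the toy `ratings` data, and the helper names `item_similarities` and `recommend` are all assumptions made for illustration. The group-by-user and pairwise-accumulation steps are the kind of stages that need intermediate results passed from one to the next, which is where the abstract locates the disk and scheduling overhead on MapReduce.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Hypothetical toy data: (user, item, rating) triples.
ratings = [
    ("u1", "A", 5.0), ("u1", "B", 3.0),
    ("u2", "A", 4.0), ("u2", "B", 2.0), ("u2", "C", 5.0),
    ("u3", "B", 4.0), ("u3", "C", 4.0),
]


def item_similarities(ratings):
    """Cosine similarity between item rating vectors (assumed measure)."""
    # Step 1: group ratings by user.
    by_user = defaultdict(dict)
    for user, item, r in ratings:
        by_user[user][item] = r

    # Step 2: accumulate squared norms per item and dot products per item pair.
    dot = defaultdict(float)
    norm = defaultdict(float)
    for items in by_user.values():
        for item, r in items.items():
            norm[item] += r * r
        for (i, ri), (j, rj) in combinations(sorted(items.items()), 2):
            dot[(i, j)] += ri * rj

    # Step 3: normalise, storing both orderings of each pair for easy lookup.
    sim = {}
    for (i, j), d in dot.items():
        s = d / (sqrt(norm[i]) * sqrt(norm[j]))
        sim[(i, j)] = s
        sim[(j, i)] = s
    return sim


def recommend(user, ratings, sim, top_n=3):
    """Score unseen items by a similarity-weighted average of the user's ratings."""
    seen = {item: r for u, item, r in ratings if u == user}
    all_items = {item for _, item, _ in ratings}
    scores = {}
    for cand in all_items - set(seen):
        num = sum(sim.get((cand, i), 0.0) * r for i, r in seen.items())
        den = sum(abs(sim.get((cand, i), 0.0)) for i in seen)
        if den > 0:
            scores[cand] = num / den
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


sim = item_similarities(ratings)
print(recommend("u1", ratings, sim))  # the only unseen item for u1 is "C"
```

On a cluster, each of the three steps in `item_similarities` would be a distributed group-or-reduce-by-key stage. The abstract's argument is that chaining such stages as separate MapReduce jobs forces repeated disk I/O and resource requests, whereas Spark can keep the intermediate data in memory within one job.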