
Performance Optimization of ItemBased Recommendation Algorithm Based on Spark

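Cite this article:	LIAO Bin, ZHANG Tao, GUO Binglei, YU Jiong, ZHANG Xuguang, LIU Yan. Performance optimization of ItemBased recommendation algorithm based on Spark[J]. Journal of Computer Applications (计算机应用), 2017, 37(7): 1900-1905.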

Authors:	LIAO Bin (廖彬), ZHANG Tao (张陶), GUO Binglei (国冰磊), YU Jiong (于炯), ZHANG Xuguang (张旭光), LIU Yan (刘炎)

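Affiliations:	1. College of Statistics and Information, Xinjiang University of Finance and Economics, Urumqi 830012; 2. College of Medical Engineering and Technology, Xinjiang Medical University, Urumqi 830011; 3. School of Information Science and Engineering, Xinjiang University, Urumqi 830008; 4. School of Software, Tsinghua University, Beijing 100084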

Funding:	National Natural Science Foundation of China (61562078, 61262088); Natural Science Foundation of Xinjiang Uygur Autonomous Region (2016D01B014).

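Abstract:	In MapReduce computing scenarios, complex big-data mining algorithms usually require several cooperating MapReduce jobs, but heavily redundant disk reads and writes and repeated resource requests between those jobs severely degrade performance. To improve the computational efficiency of the ItemBased recommendation algorithm, the performance problems of the ItemBased collaborative filtering algorithm on the MapReduce platform are first analyzed. On that basis, Spark's strengths in iterative and in-memory computation are used to improve execution efficiency, and an ItemBased recommendation algorithm is implemented on the Spark platform. Experimental results show that with clusters of 10 and 20 nodes, the algorithm's running time on Spark is only 25.6% and 30.8%, respectively, of that on MapReduce; overall, the algorithm on Spark is more than 3 times as efficient as on MapReduce.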
Keywords:	collaborative filtering; MapReduce; Spark; algorithm performance optimization; directed acyclic graph (DAG)
Received:	2017-01-16
Revised:	2017-03-01
Performance optimization of ItemBased recommendation algorithm based on Spark

LIAO Bin, ZHANG Tao, GUO Binglei, YU Jiong, ZHANG Xuguang, LIU Yan. Performance optimization of ItemBased recommendation algorithm based on Spark[J]. Journal of Computer Applications, 2017, 37(7): 1900-1905.

Authors:	LIAO Bin, ZHANG Tao, GUO Binglei, YU Jiong, ZHANG Xuguang, LIU Yan

Affiliation:	1. College of Statistics and Information, Xinjiang University of Finance and Economics, Urumqi, Xinjiang 830012, China; 2. College of Medical Engineering and Technology, Xinjiang Medical University, Urumqi, Xinjiang 830011, China; 3. School of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830008, China; 4. School of Software, Tsinghua University, Beijing 100084, China

Abstract:	Under MapReduce, complex data mining algorithms typically require multiple MapReduce jobs working together to complete a task. However, heavily redundant disk reads and writes and repeated resource requests among those jobs seriously degrade the algorithms' performance. To improve the computational efficiency of the ItemBased recommendation algorithm, the performance issues of the ItemBased collaborative filtering algorithm on the MapReduce platform were first analyzed. The algorithm's execution efficiency was then improved by exploiting Spark's advantages in iterative and in-memory computing, and the ItemBased collaborative filtering algorithm was implemented on the Spark platform. The experimental results show that, with clusters of 10 and 20 nodes, the running time of the algorithm on Spark is only 25.6% and 30.8%, respectively, of that on MapReduce. Overall, the algorithm's computing efficiency on Spark is more than 3 times that on MapReduce.

Keywords:	collaborative filtering; MapReduce; Spark; algorithm performance optimization; Directed Acyclic Graph (DAG)
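
To make the abstract's point about multi-stage computation concrete, the following is a minimal single-machine sketch of item-based collaborative filtering in plain Python. It is not the authors' Spark implementation, which this page does not show; the function names `item_similarities` and `recommend`, the choice of cosine similarity, and the sample ratings are all assumptions made for illustration. The three steps (grouping ratings by user, computing item-item similarities, scoring unseen items) are the kind of stages that would typically run as separate chained jobs on MapReduce, with intermediate results written to disk between them.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def item_similarities(ratings):
    """Cosine similarity between items, from (user, item, rating) triples."""
    # Stage 1: group ratings by user.
    by_user = defaultdict(dict)
    for user, item, rating in ratings:
        by_user[user][item] = rating

    # Stage 2: accumulate per-item norms and per-pair dot products.
    dot = defaultdict(float)   # sum of rating products for each item pair
    norm = defaultdict(float)  # sum of squared ratings for each item
    for items in by_user.values():
        for item, r in items.items():
            norm[item] += r * r
        for (a, ra), (b, rb) in combinations(sorted(items.items()), 2):
            dot[(a, b)] += ra * rb

    sim = defaultdict(dict)
    for (a, b), d in dot.items():
        if norm[a] > 0 and norm[b] > 0:  # skip items rated only with zeros
            s = d / (sqrt(norm[a]) * sqrt(norm[b]))
            sim[a][b] = s
            sim[b][a] = s
    return by_user, sim


def recommend(user, by_user, sim, top_n=2):
    """Stage 3: rank unseen items by similarity-weighted average rating."""
    seen = by_user.get(user, {})
    num = defaultdict(float)
    den = defaultdict(float)
    for item, r in seen.items():
        for other, s in sim.get(item, {}).items():
            if other not in seen:
                num[other] += s * r
                den[other] += abs(s)
    scores = {i: num[i] / den[i] for i in num if den[i] > 0}
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


# Hypothetical sample data: (user, item, rating).
ratings = [
    ("u1", "A", 5), ("u1", "B", 3),
    ("u2", "A", 4), ("u2", "B", 2), ("u2", "C", 5),
    ("u3", "B", 4), ("u3", "C", 4),
]
by_user, sim = item_similarities(ratings)
print(recommend("u1", by_user, sim))  # prints ['C']
```

In this sketch every intermediate structure (`by_user`, `dot`, `norm`, `sim`) stays in memory between stages, which loosely mirrors the advantage the abstract attributes to Spark over chained MapReduce jobs.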
