Optimization algorithm for big data mining based on the Parameter Server framework
Citation: LIU Yang, LIU Bo, WANG Feng. Optimization algorithm for big data mining based on the Parameter Server framework[J]. Journal of Shandong University (Engineering Science), 2017, 47(4): 1-6.
Authors: LIU Yang  LIU Bo  WANG Feng
Affiliation: 1. Institute of Cloud Computing and Big Data, Henan University of Economics and Law, Zhengzhou 450046, Henan, China; 2. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
Funding: Key Science and Technology Program of Henan Province (162102210096, 152102210088, 142102210090); Key Scientific Research Project of Higher Education Institutions of Henan Province (18A520014)
Received: 2016-09-03

Abstract: Traditional machine learning algorithms designed for small data sets cannot meet the real-time requirements of big data mining or cope with the diversity of data samples, so an optimization algorithm for training machine-learning models in big data mining is proposed. Analysis of the iterative computation of current algorithms shows that the iteration process can be divided, according to the change in the model vector, into a coarse-tuning phase and a fine-tuning phase, and that in the fine-tuning phase the vast majority of samples have a negligible effect on the result. The gradients of such samples therefore need not be recomputed in the fine-tuning phase; the results of the previous iteration are reused directly, which reduces the computation load and improves efficiency. Experimental results show that, in a distributed cluster environment, the algorithm reduces the computation of model training by about 35% while keeping the accuracy of the trained model within the normal range, effectively improving the real-time performance of big data mining.
Keywords: optimization algorithm; distributed system; big data; sample diversity; machine learning
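The gradient-reuse idea described in the abstract can be sketched in a few lines. The following is an illustrative reconstruction, not the authors' implementation: the objective (logistic regression), the function name, and the thresholds `phase_tol` and `grad_tol` are all assumptions. The switch from the coarse-tuning to the fine-tuning phase is detected here by a small change in the model vector between epochs, after which samples whose last computed gradient was negligible are skipped and their cached gradient is reused.

```python
import numpy as np

def train_logreg_lazy(X, y, lr=0.1, epochs=50,
                      phase_tol=1e-2, grad_tol=1e-3):
    """SGD for logistic regression with lazy gradient reuse.

    In the fine-tuning phase, samples whose cached gradient is below
    grad_tol are not recomputed; the previous result is reused, which
    is the computation-saving idea described in the abstract.
    """
    n, d = X.shape
    w = np.zeros(d)
    cached = np.zeros((n, d))   # last gradient computed for each sample
    fine_phase = False
    skipped = 0                 # count of reused (not recomputed) gradients
    for _ in range(epochs):
        w_prev = w.copy()
        for i in range(n):
            if fine_phase and np.linalg.norm(cached[i]) < grad_tol:
                g = cached[i]   # reuse: this sample barely moves the model
                skipped += 1
            else:
                p = 1.0 / (1.0 + np.exp(-X[i] @ w))
                g = (p - y[i]) * X[i]   # logistic-loss gradient
                cached[i] = g
            w -= lr * g
        # small change in the model vector => enter the fine-tuning phase
        if np.linalg.norm(w - w_prev) < phase_tol:
            fine_phase = True
    return w, skipped
```

In a Parameter Server deployment, the cached gradients would live on the workers and only the recomputed updates would be pushed to the servers, so the skipped samples save both computation and communication.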
This article is indexed by CNKI and other databases.
