Research on a Distributed Online Ensemble Learning Framework Based on Flink
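Citation: CAO Zhangyu, ZHONG Yuan, ZHOU Jing. Research on a distributed online ensemble learning framework based on Flink[J]. Application Research of Computers, 2023, 40(6)
Authors: CAO Zhangyu, ZHONG Yuan, ZHOU Jing
Affiliation: Southwest Petroleum University (all three authors)
Funding: National Natural Science Foundation of China (61873218); Southwest Petroleum University Innovation Base Project (642)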
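Abstract: In big-data settings, traditional machine learning algorithms are mostly trained offline on a single machine, which cannot keep up with continuously growing, large-scale streaming data. To address this problem, this paper proposes a distributed online ensemble learning algorithm built on the Flink distributed computing framework. The method first trains an online learning algorithm in a distributed, online fashion through data parallelism. It then uses stochastic gradient descent to assign dynamic weights to the resulting sub-models and aggregate their outputs. Meanwhile, sub-models that train poorly are updated online with their samples. Finally, single-machine and cluster environments are compared experimentally on different datasets. The results show that distributed ensemble training of online learning algorithms on Flink matches the performance of centralized training while greatly improving training time efficiency.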

Keywords: distributed stream computing; online learning; ensemble learning; Flink
Received: 2022-09-04
Revised: 2023-05-18
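The data-parallel training step described in the abstract can be sketched as follows. This is a minimal single-process Python sketch under stated assumptions, not the authors' Flink implementation: round-robin assignment stands in for Flink distributing stream records across parallel subtasks, and the sub-model (logistic regression updated per record by SGD), the names `OnlineLogReg`, `train_data_parallel` and `make_stream`, the learning rate, and the synthetic data are all hypothetical choices for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineLogReg:
    """Hypothetical sub-model: logistic regression updated one record at a time."""

    def __init__(self, dim, lr=0.1):
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr
        self.n_seen = 0  # number of stream records this worker has consumed

    def predict_proba(self, x):
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def partial_fit(self, x, y):
        # For log loss, the gradient w.r.t. the logit is (p - y)
        g = self.predict_proba(x) - y
        self.w = [wi - self.lr * g * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * g
        self.n_seen += 1


def train_data_parallel(stream, n_workers, dim):
    """Round-robin the stream across workers (a stand-in for Flink's parallel
    subtasks); each worker trains its own sub-model online."""
    models = [OnlineLogReg(dim) for _ in range(n_workers)]
    for i, (x, y) in enumerate(stream):
        models[i % n_workers].partial_fit(x, y)
    return models


def make_stream(n, seed):
    # Hypothetical synthetic stream: label is 1 when x0 + x1 > 0
    rng = random.Random(seed)
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        yield x, 1 if x[0] + x[1] > 0 else 0


if __name__ == "__main__":
    models = train_data_parallel(make_stream(2000, seed=0), n_workers=4, dim=2)
    test = list(make_stream(500, seed=1))
    for k, m in enumerate(models):
        acc = sum((m.predict_proba(x) >= 0.5) == (y == 1) for x, y in test) / len(test)
        print(f"worker {k}: {m.n_seen} records, accuracy {acc:.3f}")
```

Each worker only ever sees its own quarter of the stream, which is what lets the sub-models train in parallel; the price is that each one learns from less data, which is why the abstract pairs this step with an aggregation stage.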

Research on distributed online integrated learning framework based on Flink
Affiliation: Southwest Petroleum University
Abstract: In a big-data environment, traditional machine learning algorithms mostly rely on stand-alone offline training, which cannot adapt to continuously growing, large-scale streaming data. To address this problem, this paper proposed a distributed online ensemble learning algorithm based on the Flink distributed computing framework. The algorithm first performed distributed online training of an online learning algorithm through data parallelism. It then used stochastic gradient descent to dynamically allocate weights to the trained submodels and aggregate their results. At the same time, submodels with poor training performance were updated online using their samples. Finally, stand-alone and cluster environments were compared on different datasets. The experimental results show that distributed ensemble training of the online learning algorithm on Flink achieves the performance of centralized training while greatly improving training time efficiency.
Keywords: distributed stream computing; online learning; ensemble learning; Flink
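The abstract does not specify the exact form of the SGD-based dynamic weighting, so the Python sketch below assumes one plausible reading: a logistic combiner over the sub-models' log-odds whose weights are updated by one SGD step per labelled record. It also keeps an exponentially weighted error rate per sub-model so that poorly performing ones can be flagged; feeding those flagged sub-models extra samples for online updates is left out. The class `SGDWeightedEnsemble`, the learning rate, the smoothing factor `alpha`, and the `poor_models` threshold are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logit(p, eps=1e-6):
    # Map a probability to log-odds; eps guards against p = 0 or 1
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


class SGDWeightedEnsemble:
    """Aggregates K sub-model probabilities with weights learned online by SGD.

    Assumed combiner: a logistic layer over the sub-models' log-odds.
    """

    def __init__(self, k, lr=0.05, alpha=0.01):
        self.w = [1.0 / k] * k      # start from equal weights
        self.b = 0.0
        self.lr = lr
        self.alpha = alpha          # smoothing factor for the error estimate
        self.err = [0.0] * k        # running 0/1 error rate per sub-model

    def predict(self, probs):
        z = sum(wi * logit(p) for wi, p in zip(self.w, probs)) + self.b
        return sigmoid(z)

    def update(self, probs, y):
        # One SGD step on log loss; gradient w.r.t. w_i is (p_hat - y) * logit(p_i)
        g = self.predict(probs) - y
        self.w = [wi - self.lr * g * logit(p) for wi, p in zip(self.w, probs)]
        self.b -= self.lr * g
        # Exponentially weighted error rate of each individual sub-model
        for i, p in enumerate(probs):
            wrong = 1.0 if (p >= 0.5) != (y == 1) else 0.0
            self.err[i] = (1 - self.alpha) * self.err[i] + self.alpha * wrong

    def poor_models(self, threshold):
        # Indices of sub-models whose running error exceeds the threshold
        return [i for i, e in enumerate(self.err) if e > threshold]


if __name__ == "__main__":
    rng = random.Random(0)
    ens = SGDWeightedEnsemble(k=2)
    for _ in range(2000):
        y = rng.randint(0, 1)
        good = 0.9 if y == 1 else 0.1       # informative sub-model
        noisy = rng.choice([0.1, 0.9])      # uninformative sub-model
        ens.update([good, noisy], y)
    print("weights:", [round(w, 3) for w in ens.w])
    print("poor sub-models:", ens.poor_models(threshold=0.3))
```

In this toy run the informative sub-model's weight grows while the uninformative one's weight drifts toward zero, and only the uninformative sub-model is flagged. Learning weights online, rather than fixing them, lets the ensemble shift trust between sub-models as the stream changes.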