Short-Term Power Load Forecasting Based on Spark Platform and Parallel Random Forest Regression Algorithm
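Cite this article: LIU Qichen, LEI Jingsheng, HAO Jiawei, HUANG Yangang, LI Qiang, LUO Haibo. Short-Term Power Load Forecasting Based on Spark Platform and Parallel Random Forest Regression Algorithm[J]. Electric Power Construction, 2017, 38(10). DOI: 10.3969/j.issn.1000-7229.2017.10.012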
Authors: LIU Qichen  LEI Jingsheng  HAO Jiawei  HUANG Yangang  LI Qiang  LUO Haibo
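Affiliations: 1. State Grid Meishan Power Supply Company, State Grid Sichuan Electric Power Company, Meishan, Sichuan Province 620010, China; 2. College of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 200090, China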
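Funding: Project supported by the National Natural Science Foundation of China; State Grid Meishan Power Supply Company "Eagle" Innovation Team Project (Big Data Analysis and Application Based on the Dispatching Technical Support System)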
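Abstract: With the construction of the smart grid and the global energy Internet and the development of related technologies, a power big data landscape has taken shape in modern power systems. How to mine high-dimensional, massive data in depth so that it can be fully utilized has become a central concern for power engineers. This paper studies high-accuracy, real-time load forecasting in a power big data environment and proposes a short-term power load forecasting method based on the Spark platform and a parallel random forest regression algorithm (Spark platform and parallel random forest regression, SP-RFR). The single-machine random forest algorithm is parallelized through three resilient distributed dataset (RDD) transformations and deployed on a Spark distributed cluster. Experiments using actual power load data from a region were designed to train the model and perform regression forecasting. The experiments show that, on the same data set, the Spark-based parallel random forest regression algorithm forecasts more accurately than single-machine load forecasting algorithms; the parallel random forest algorithm is less affected by outlier data and remains robust as the data set grows; and compared with single-machine algorithms, the distributed-cluster method gains a clear running-time advantage as the data set grows. The proposed method can effectively perform power load forecasting in a distributed environment and offers a new approach to load forecasting.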

Keywords: power big data  distributed computing  parallel random forest regression algorithm  Spark platform  short-term power load forecasting
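The abstract does not spell out the three RDD transformations used by SP-RFR, so the following is only a hypothetical, single-machine sketch of why random forest regression parallelizes well: each tree is grown independently on its own bootstrap sample (a "map" over per-tree seeds), and the forest prediction is the average of the tree outputs (a "reduce"). It is plain Python, not Spark or MLlib. The function names, the `ThreadPoolExecutor` standing in for cluster executors, the d/3 feature-subset default, and the synthetic load data are all assumptions made for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def _sse(values):
    """Sum of squared errors around the mean (the CART regression split criterion)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, depth, max_depth, min_leaf, n_feat, rng):
    """Grow a regression tree. Leaves are floats; internal nodes are (feature, threshold, left, right)."""
    if depth >= max_depth or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return sum(y) / len(y)
    best = None  # (sse, feature, threshold)
    for f in rng.sample(range(len(X[0])), n_feat):  # random feature subset at each split
        levels = sorted({row[f] for row in X})
        for lo, hi in zip(levels, levels[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no admissible split: make a leaf
        return sum(y) / len(y)
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    left_node = build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth, min_leaf, n_feat, rng)
    right_node = build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth, min_leaf, n_feat, rng)
    return (f, t, left_node, right_node)


def predict_tree(node, x):
    """Walk from the root to a leaf and return the leaf's mean target."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def train_one(seed, X, y, max_depth, min_leaf, n_feat):
    """One task = one tree: draw a bootstrap sample with its own RNG, then grow a tree on it."""
    rng = random.Random(seed)
    idx = [rng.randrange(len(y)) for _ in range(len(y))]
    return build_tree([X[i] for i in idx], [y[i] for i in idx], 0, max_depth, min_leaf, n_feat, rng)


def fit_forest(X, y, n_trees=20, max_depth=6, min_leaf=2, n_feat=None, seed=0, workers=4):
    d = len(X[0])
    n_feat = min(d, n_feat or max(1, d // 3))  # d/3 features per split, a common regression default
    seeds = [seed + k for k in range(n_trees)]  # stage 1: one independent task per tree
    with ThreadPoolExecutor(max_workers=workers) as pool:  # stage 2: grow trees concurrently ("map")
        return list(pool.map(lambda s: train_one(s, X, y, max_depth, min_leaf, n_feat), seeds))


def predict_forest(trees, x):
    """Stage 3 ("reduce"): the forest prediction is the mean of the tree predictions."""
    return sum(predict_tree(t, x) for t in trees) / len(trees)


# Synthetic, hypothetical load data: features = [hour, temperature, weekday flag].
X, y = [], []
for hour in range(24):
    for temp in range(10, 35, 5):
        for weekday in (0, 1):
            X.append([hour, temp, weekday])
            y.append(500 + 20 * temp + (200 if 8 <= hour <= 20 else 0) + 30 * weekday)

forest = fit_forest(X, y, n_trees=20, seed=42)
day = predict_forest(forest, [14, 30, 1])
night = predict_forest(forest, [3, 10, 0])
print(round(day, 1), round(night, 1))
```

Because of Python's global interpreter lock, the threads here illustrate the task structure rather than a real speed-up; in a Spark deployment, each per-tree task would instead run on a cluster executor. Giving every tree its own seeded RNG keeps the result reproducible regardless of the order in which tasks finish.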

Short-Term Power Load Forecasting Based on Spark Platform and Parallel Random Forest Regression Algorithm Model
LIU Qichen,LEI Jingsheng,HAO Jiawei,HUANG Yangang,LI Qiang,LUO Haibo. Short-Term Power Load Forecasting Based on Spark Platform and Parallel Random Forest Regression Algorithm Model[J]. Electric Power Construction, 2017, 38(10). DOI: 10.3969/j.issn.1000-7229.2017.10.012
Authors:LIU Qichen  LEI Jingsheng  HAO Jiawei  HUANG Yangang  LI Qiang  LUO Haibo
Abstract: With the development of the smart grid, the global energy Internet and related technologies, the structure of power big data has already formed. How to make full use of high-dimensional massive data through data mining has aroused widespread concern among power engineers. Aiming at high-precision and real-time load forecasting in the context of power big data, this paper proposes a short-term power load forecasting method based on the Spark platform and a parallel random forest regression (SP-RFR) algorithm. The single-machine random forest algorithm is parallelized through three transformations of resilient distributed datasets (RDD) and deployed on a Spark distributed cluster. Experiments are designed using actual power load data from a transformer substation, and model training and regression prediction are carried out. The conclusions are as follows: for the same testing data set, the SP-RFR model forecasts more accurately than single-machine regression forecasting models; the SP-RFR model is less disturbed by outlier data and remains robust as the data set grows; and compared with the single-machine model, the SP-RFR model, which runs on a distributed cluster, gains a clear running-time advantage as the data set grows. The proposed method can effectively forecast power load in a distributed environment and provides a new idea for power load forecasting.
Keywords:power big data  distributed computing  parallel random forest regression algorithm  Spark platform  short-term power load forecasting
This article has been indexed in CNKI, Wanfang Data, and other databases.