Online optimal scheduling of a microgrid based on deep reinforcement learning

Funding:	National Natural Science Foundation of China (61733003).

Keywords:	microgrid; constrained Markov process; deep reinforcement learning; convolutional neural network

Cite this article:	JI Ying, WANG Jian-hui. Online optimal scheduling of a microgrid based on deep reinforcement learning[J]. Control and Decision, 2022, 37(7): 1675-1684.

Authors:	JI Ying, WANG Jian-hui

Affiliation:	College of Information Science and Engineering, Northeastern University, Shenyang 110004, China

Abstract:	This paper proposes an online optimal scheduling strategy for microgrids based on deep reinforcement learning (DRL). Uncertain renewable energy resources and complex power flow constraints make the economic and safe operation of microgrids challenging. To address this, we formulate the microgrid online scheduling problem as a constrained Markov decision process (CMDP) whose objective is to minimize operating cost subject to constraints on the operating states and scheduling actions. To avoid solving a complicated nonlinear optimal power flow problem and to reduce the dependency on accurate forecasts and a system model, we design a convolutional neural network (CNN) architecture to learn the optimal scheduling policy. The network extracts high-quality features from the raw observation data of the microgrid and makes scheduling decisions directly from the extracted features. To ensure that these decisions satisfy the complex power flow constraints, we propose a novel DRL algorithm that combines the Lagrange multiplier method with the soft actor-critic algorithm to train the network. To verify the effectiveness of the proposed approach, we perform simulation studies using real-world power system data. The simulation results show that the proposed online scheduling approach can effectively learn, from data, a cost-effective scheduling strategy that satisfies the power flow constraints, mitigating the effect of randomness on microgrid operation.
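
The abstract says the training algorithm combines the Lagrange multiplier method with soft actor-critic, but it gives no formulas. The sketch below illustrates only the multiplier mechanism, on a hypothetical one-variable dispatch problem: minimize the cost (p - 2)^2 subject to the limit p - 1 <= 0. It is not the authors' algorithm. There is no neural network and no soft actor-critic here; a plain gradient step stands in for the policy update, and all names (`penalized_cost`, `update_multiplier`, `solve_toy_dispatch`) and numbers are assumptions chosen for illustration.

```python
def penalized_cost(cost, violation, lam):
    """Lagrangian objective: operating cost plus lam times the constraint violation."""
    return cost + lam * violation


def update_multiplier(lam, violation, step_size):
    """Projected dual ascent: raise lam while the constraint is violated, keep lam >= 0."""
    return max(0.0, lam + step_size * violation)


def solve_toy_dispatch(steps=2000, lr=0.05):
    """Primal-dual iteration on: minimize (p - 2)^2 subject to p - 1 <= 0.

    p is a hypothetical scalar dispatch decision; lam is the Lagrange multiplier.
    """
    p, lam = 0.0, 0.0
    for _ in range(steps):
        violation = p - 1.0             # g(p) = p - 1, feasible when <= 0
        grad_p = 2.0 * (p - 2.0) + lam  # d/dp of penalized_cost((p - 2)^2, p - 1, lam)
        p -= lr * grad_p                # primal descent (stand-in for a policy update)
        lam = update_multiplier(lam, violation, lr)  # dual ascent on the multiplier
    return p, lam


if __name__ == "__main__":
    p, lam = solve_toy_dispatch()
    print(f"p = {p:.3f}, lambda = {lam:.3f}")
```

Without the constraint the cost is minimized at p = 2. With it, the iteration settles at the constrained optimum p = 1 with multiplier 2, the value at which the cost gradient 2(p - 2) is exactly balanced. In a constrained DRL setting the same two-timescale idea would apply to expected returns and expected constraint costs estimated from sampled transitions rather than to a closed-form function.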
