Building Energy Efficiency Method Based on Multi-Thread Parallel Reinforcement Learning

Cite this article: CHEN Jianping, KANG Yiyi, HU Lingyao, LU You, WU Hongjie, FU Qiming. Building Energy Efficiency Method Based on Multi-Thread Parallel Reinforcement Learning[J]. Computer Engineering and Applications, 2019, 55(15): 219-227. DOI: 10.3778/j.issn.1002-8331.1809-0134

Authors: CHEN Jianping  KANG Yiyi  HU Lingyao  LU You  WU Hongjie  FU Qiming

Affiliation: 1. College of Electronics and Information Engineering, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China; 2. Jiangsu Province Key Laboratory of Intelligent Building Energy Efficiency, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China; 3. Suzhou Key Laboratory of Mobile Network Technology and Application, Suzhou University of Science and Technology, Suzhou, Jiangsu 215009, China

Foundation: National Natural Science Foundation of China (No.61502329, No.61772357, No.61750110519, No.61772355, No.61702055, No.61672371, No.61602334); Natural Science Foundation of Jiangsu Province (No.BK20140283); Key Research and Development Program of Jiangsu Province (No.BE2017663); Natural Science Research Project of Jiangsu Higher Education Institutions (No.13KJB520020); Suzhou Applied Basic Research Program, Industrial Part (No.SYG201422)

Abstract: This paper proposes a building energy efficiency method based on parallel reinforcement learning, which combines multi-threading and experience replay into a multi-thread parallel reinforcement learning framework. Its novelty lies in introducing a bisimulation metric into the experience replay process: by computing distances between samples, the method selects samples with low similarity to build a diverse sample pool, and the agent learns from samples drawn from that pool, which effectively avoids wasting learning resources. The experiments compare the proposed method with the Q-Learning algorithm and with the classical PID control method on a simulated room model. The results show that the proposed parallel algorithm has faster learning and convergence rates, finds the optimal policy sooner, and runs more efficiently.

Keywords: reinforcement learning  parallel reinforcement learning  experience replay  multi-threading technology  building energy efficiency
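
The mechanism sketched in the abstract combines two ingredients: a diverse sample pool that admits an experience only when it is sufficiently far, under some distance, from the experiences already stored, and several threads that run Q-learning in parallel while feeding and replaying from that shared pool. The Python sketch below shows one way these pieces can fit together. It is a minimal illustration under stated assumptions, not the authors' implementation: the ToyRoom environment, the absolute temperature difference used as the distance (a stand-in for the paper's bisimulation metric), and every threshold and hyperparameter here are invented for demonstration.

import random
import threading
from collections import defaultdict

import numpy as np


class DiverseSamplePool:
    """Thread-safe replay pool that rejects near-duplicate experiences."""

    def __init__(self, min_distance=0.5, capacity=10_000):
        self.min_distance = min_distance
        self.capacity = capacity
        self.samples = []            # entries: (state, action, reward, next_state)
        self.lock = threading.Lock()

    def add(self, s, a, r, s2):
        with self.lock:
            # Absolute temperature difference stands in for the bisimulation
            # metric: admit a sample only if its state is far from all stored ones.
            if any(abs(s - old_s) < self.min_distance for old_s, _, _, _ in self.samples):
                return
            if len(self.samples) < self.capacity:
                self.samples.append((s, a, r, s2))

    def draw(self, batch_size):
        with self.lock:
            return random.sample(self.samples, min(batch_size, len(self.samples)))


class ToyRoom:
    """Crude 1-D room temperature model: action 0 = heater off, 1 = heater on."""

    def __init__(self, target=21.0):
        self.target = target
        self.temp = 15.0

    def reset(self):
        self.temp = random.uniform(10.0, 30.0)
        return round(self.temp)                  # discretized state for the Q-table

    def step(self, action):
        self.temp += 0.8 if action == 1 else -0.5
        reward = -abs(self.temp - self.target)   # penalize deviation from setpoint
        done = abs(self.temp - self.target) < 0.5
        return round(self.temp), reward, done


def worker(pool, q, q_lock, episodes=200, alpha=0.1, gamma=0.9, eps=0.1, batch=8):
    """One learner thread: act eps-greedily, feed the shared diverse pool,
    and update the shared Q-table from samples drawn out of the pool."""
    env = ToyRoom()
    for _ in range(episodes):
        s, done, steps = env.reset(), False, 0
        while not done and steps < 100:
            with q_lock:
                a = random.randrange(2) if random.random() < eps else int(np.argmax(q[s]))
            s2, r, done = env.step(a)
            pool.add(s, a, r, s2)
            for ps, pa, pr, ps2 in pool.draw(batch):     # experience replay
                with q_lock:
                    q[ps][pa] += alpha * (pr + gamma * np.max(q[ps2]) - q[ps][pa])
            s, steps = s2, steps + 1


pool = DiverseSamplePool()
q = defaultdict(lambda: np.zeros(2))                     # state -> action values
q_lock = threading.Lock()
threads = [threading.Thread(target=worker, args=(pool, q, q_lock)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"pool holds {len(pool.samples)} diverse samples; {len(q)} states visited")

With the distance filter in place, replay batches cover distinct regions of the state space instead of repeating near-identical transitions, which is the resource-saving effect the abstract describes.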
