Action stable updating algorithm for policy gradient methods in continuous time

Cite this article:	Song Jiangfan, Li Jinlong. Action stable updating algorithm for policy gradient methods in continuous time[J]. Application Research of Computers (计算机应用研究), 2023, 40(10): 2928-2932+2944

Authors:	Song Jiangfan, Li Jinlong

Affiliation:	School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui

Abstract:	In reinforcement learning, policy gradient methods often model a continuous-time problem as a discrete-time problem by sampling. Modeling the problem more accurately requires a higher sampling frequency, but an excessively high sampling frequency can make the action change too often, which reduces training efficiency. To address this problem, this paper proposes the action stable updating algorithm. The method uses the change in the output of the policy function to compute the probability of repeating the action, and then randomly repeats or changes the action according to that probability. The performance of the algorithm is analyzed theoretically and then evaluated in nine different environments against existing methods; it outperforms them in six of the nine. The experimental results show that the action stable updating algorithm can effectively improve the training efficiency of policy gradient methods in continuous-time problems.

Keywords:	reinforcement learning; continuous time; policy gradient; action repetition

Received:	2023-02-27
Revised:	2023-05-05
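
The repeat-or-change step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how the repeat probability is computed, so the exponential rule, the `scale` parameter, the function names, and the toy policy below are all assumptions made for illustration, not the authors' method. The sketch also treats the policy output as the proposed action, which is a simplification.

```python
import math
import random


def repeat_probability(prev_output, new_output, scale=1.0):
    # Assumed rule (not from the paper): the probability decays exponentially
    # with the Euclidean distance between the two policy outputs.
    change = math.dist(prev_output, new_output)
    return math.exp(-change / scale)


def select_action(prev_action, prev_output, new_output, rng, scale=1.0):
    # Returns (action, repeated). The policy output doubles as the proposed
    # action here, which is a simplification for illustration.
    if prev_action is None:  # first step: there is nothing to repeat yet
        return list(new_output), False
    if rng.random() < repeat_probability(prev_output, new_output, scale):
        return prev_action, True
    return list(new_output), False


def count_action_changes(policy, steps, dt, rng, scale=1.0):
    # Roll a time-dependent toy policy forward and count how often the
    # applied action actually changes.
    action, held_output, changes = None, None, 0
    for k in range(steps):
        output = policy(k * dt)
        action, repeated = select_action(action, held_output, output, rng, scale)
        if not repeated:
            held_output = output  # later outputs are compared with the one in use
            changes += 1
    return changes


if __name__ == "__main__":
    rng = random.Random(0)
    toy_policy = lambda t: [math.sin(t)]  # hypothetical slowly varying policy
    print(count_action_changes(toy_policy, steps=1000, dt=0.01, rng=rng, scale=0.5))
```

Comparing each new output with the output that produced the action currently in use, and not with the immediately preceding output, is a design choice of this sketch: drift accumulates over time, so the action is eventually updated even when each individual sampling step changes the output only slightly.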
