Reinforcement learning-based algorithm for multi-skill project scheduling problem

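Cite this article:	HU Zhen-tao, CUI Nan-fang, HU Xue-jun, LEI Xiao-qi. Reinforcement learning-based algorithm for multi-skill project scheduling problem [J]. Control Theory & Applications, 2024, 41(3): 502-511.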

Authors:	HU Zhen-tao, CUI Nan-fang, HU Xue-jun, LEI Xiao-qi

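Affiliations:	Huazhong University of Science and Technology; Huazhong University of Science and Technology; Hunan University; Huazhong University of Science and Technology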

Funding:	Supported by the National Natural Science Foundation of China (71971094, 71701067, 72071075) and the Natural Science Foundation of Hunan Province (2019JJ50039).

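Abstract:	Multi-skill project scheduling suffers from combinatorial explosion, so the problem is far more complex than traditional single-skill project scheduling, and heuristic and meta-heuristic algorithms each have their own shortcomings when solving the multi-skill project scheduling problem. To address this, this paper designs a reinforcement learning-based multi-skill project scheduling algorithm that draws on the characteristics of project scheduling and the algorithmic logic of reinforcement learning. First, the multi-skill project scheduling process is modeled as a sequential decision process that satisfies the Markov property, and a double-agent mechanism is designed around that decision process. Next, state integration and action decomposition reduce the difficulty of learning the value function. Finally, to further improve performance, a skill conflation method is designed for the multi-skill nature of the resources, which significantly reduces the time complexity of the resource allocation algorithm. Comparative experiments against heuristic algorithms show that the proposed reinforcement learning algorithm achieves better solution performance, and comparative experiments against meta-heuristic algorithms show that it is more stable and solves instances faster.
Keywords:	multi-skill resources; project scheduling; intelligent algorithm; reinforcement learning; parallel scheduling
Received:	2022/6/27
Revised:	2024/1/17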
Reinforcement learning-based algorithm for multi-skill project scheduling problem

HU Zhen-tao, CUI Nan-fang, HU Xue-jun and LEI Xiao-qi. Reinforcement learning-based algorithm for multi-skill project scheduling problem [J]. Control Theory & Applications, 2024, 41(3): 502-511.

Authors:	HU Zhen-tao, CUI Nan-fang, HU Xue-jun and LEI Xiao-qi

Affiliation:	Huazhong University of Science and Technology; Huazhong University of Science and Technology; Hunan University; Huazhong University of Science and Technology

Abstract:	Combinatorial explosion is common in multi-skill project scheduling, which makes the multi-skill project scheduling problem (MSPSP) more complex than the traditional single-skill project scheduling problem. Heuristics and meta-heuristics each have drawbacks when solving the MSPSP. Therefore, based on the characteristics of project scheduling and the algorithmic logic of reinforcement learning (RL), this paper designs an RL-based multi-skill project scheduling algorithm. First, the multi-skill project scheduling process is modeled as a Markov decision process (MDP). Then, a double-agent mechanism is proposed, and a state integration method and an action decomposition method are designed to reduce the complexity of value function learning. Finally, a skill conflation algorithm is developed to reduce the time complexity of allocating resources in the MSPSP. Comparative experiments against heuristics show that the proposed RL algorithm performs better, and comparative experiments against meta-heuristics show that it is more stable and runs faster.

Keywords:	multi-skill resource; project scheduling; intelligence algorithm; reinforcement learning; PSGS
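
The abstract names resource allocation as a costly step in the MSPSP but does not give the procedure. As a rough illustration of why that step is nontrivial, the sketch below checks whether a set of multi-skill workers can cover one activity's skill requirements, with each worker filling at most one slot. It uses a standard augmenting-path bipartite matching (Kuhn's algorithm), not the paper's skill conflation method. The function name `assign_workers`, the worker names, and the skill labels are all hypothetical.

```python
from typing import Dict, List, Optional, Set


def assign_workers(
    requirements: List[str],
    worker_skills: Dict[str, Set[str]],
) -> Optional[Dict[int, str]]:
    """Cover every required skill slot with a distinct worker, if possible.

    requirements: one entry per required skill unit, e.g. ["design", "test", "test"].
    worker_skills: hypothetical mapping from worker name to that worker's skills.
    Returns {slot index: worker name}, or None when no feasible assignment exists.
    """
    slot_of_worker: Dict[str, int] = {}  # worker -> slot currently held

    def try_slot(slot: int, visited: Set[str]) -> bool:
        # Search for an augmenting path that starts at this skill slot.
        for worker, skills in worker_skills.items():
            if requirements[slot] not in skills or worker in visited:
                continue
            visited.add(worker)
            # Use a free worker, or move the slot that currently holds this
            # worker onto someone else so the worker becomes available.
            if worker not in slot_of_worker or try_slot(slot_of_worker[worker], visited):
                slot_of_worker[worker] = slot
                return True
        return False

    for slot in range(len(requirements)):
        if not try_slot(slot, set()):
            return None
    return {slot: worker for worker, slot in slot_of_worker.items()}


# Illustrative data only: worker A has two skills, so a naive first-fit
# choice for "design" can block the later "test" slots.
workers = {"A": {"design", "test"}, "B": {"test"}, "C": {"design"}}
feasible = assign_workers(["design", "test", "test"], workers)
infeasible = assign_workers(["test", "test", "test"], workers)
```

Here `feasible` maps each of the three slots to a different worker, while `infeasible` is `None` because only two workers hold the `test` skill. A scheduler would have to run a check of this kind every time it considers starting an activity, which is one reason the allocation step's time complexity matters.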
