求解二维装箱问题的强化学习启发式算法 Reinforcement Learning Heuristic Algorithm for Solving the Two-dimensional Strip Packing Problem期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

求解二维装箱问题的强化学习启发式算法

引用本文：	阳名钢,陈梦烦,杨双远,张德富.求解二维装箱问题的强化学习启发式算法[J].软件学报,2021,32(12):3684-3697.

作者姓名：	阳名钢陈梦烦杨双远张德富

作者单位：	厦门大学信息学院, 福建厦门 361005

基金项目：	国家自然科学基金（61672439）

摘要：	二维带形装箱问题是一个经典的NP-hard的组合优化问题，该问题在实际的生活和工业生产中有着广泛的应用.研究该问题，对企业节约成本、节约资源以及提高生产效率有着重要的意义.提出了一个强化学习求解算法.新颖地使用强化学习为启发式算法提供一个初始的装箱序列，有效地改善启发式冷启动的问题.该强化学习模型能进行自我驱动学习，仅使用启发式计算的解决方案的目标值作为奖励信号来优化网络，使网络能学习到更好的装箱序列.使用简化版的指针网络来解码输出装箱序列，该模型由嵌入层、解码器和注意力机制组成.使用Actor-Critic算法对模型进行训练，提高了模型的效率.在714个标准问题实例和随机生成的400个问题实例上测试提出的算法，实验结果显示：提出的算法能有效地改善启发式冷启动的问题，性能超过当前最优秀的启发式求解算法.
关键词：	二维装箱问题强化学习指针网络启发式算法分层搜索
收稿时间：	2020/7/14 0:00:00
修稿时间：	2020/8/20 0:00:00
Reinforcement Learning Heuristic Algorithm for Solving the Two-dimensional Strip Packing Problem

YANG Ming-Gang,CHEN Meng-Fan,YANG Shuang-Yuan,ZHANG De-Fu.Reinforcement Learning Heuristic Algorithm for Solving the Two-dimensional Strip Packing Problem[J].Journal of Software,2021,32(12):3684-3697.

Authors:	YANG Ming-Gang CHEN Meng-Fan YANG Shuang-Yuan ZHANG De-Fu

Affiliation:	School of Informatics, Xiamen University, Xiamen 361005, China

Abstract:	The two-dimensional strip packing problem is a classic NP-hard combinatorial optimization problem, which has been widely used in daily life and industrial production. This study proposes a reinforcement learning heuristic algorithm for it. The reinforcement learning is used to provide an initial boxing sequence for the heuristic algorithm to effectively improve the heuristic cold start problem. The reinforcement learning model can perform self-driven learning, using only the value of the heuristically calculated solution as a reward signal to optimize the network, so that the network can learn a better packing sequence. A simplified version of the pointer network is used to decode the output boxing sequence. The model consists of an embedding layer, a decoder, and an attention mechanism. Actor-critic algorithm is used to train the model, which improves the efficiency of the model. The reinforcement learning heuristic algorithm is tested on 714 standard problem instances and 400 generated problem instances. Experimental results show that the proposed algorithm can effectively improve the heuristic cold start problem and outperform the state-of-the-art heuristics with much higher solution quality.

Keywords:	two-dimensional strip packing problem reinforcement learning pointer network heuristics hierarchical search

	点击此处可从《软件学报》浏览原始摘要信息
	点击此处可从《软件学报》下载全文

设为首页 | 免责声明 | 关于勤云 | 加入收藏