
Reinforcement Learning for Model Selection and Hyperparameter Optimization

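Cite this article:	Wu Jia, Chen Senpeng, Chen Xiuyun, Zhou Rui. Reinforcement Learning for Model Selection and Hyperparameter Optimization[J]. Journal of University of Electronic Science and Technology of China (Natural Science Edition), 2020, 49(2): 255-261. DOI: 10.12178/1001-0548.2018279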

Authors:	Wu Jia, Chen Senpeng, Chen Xiuyun, Zhou Rui

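Affiliation:	School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054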

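Funding:	National Natural Science Foundation of China (61503059); Key R&D Program of the Science and Technology Department of Sichuan Province (2018GZ0464)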

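Abstract:	As machine learning technology develops, the growing number of machine learning algorithms and the rising complexity of models have created two major difficulties in practice: selecting an algorithm model and optimizing its hyperparameters. To automate model selection and hyperparameter optimization, this paper proposes an optimization method based on deep reinforcement learning. A long short-term memory (LSTM) network is used to build an agent that automatically selects a machine learning model and the corresponding hyperparameter combination. The agent aims to maximize the accuracy of the machine learning model on the validation dataset, uses the accuracy of the selected model on the validation dataset as the reward, and keeps learning through a reinforcement learning algorithm until it finds the optimal model and hyperparameter combination. To verify the feasibility and performance of the method, it is compared on UCI benchmark datasets with two conventional optimization methods, the tree-structured Parzen estimator and random search. Results from repeated experiments show that the proposed method has advantages in stability, time efficiency, and accuracy.
Keywords:	deep reinforcement learning; hyperparameter optimization; LSTM network; machine learning; model selection
Received:	2018-10-31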
Reinforcement Learning for Model Selection and Hyperparameter Optimization

Affiliation:	School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054

Abstract:	As machine learning develops, the number of available algorithms grows rapidly and models become increasingly complex. This creates two major problems in practice: selecting a machine learning model and optimizing its hyperparameters. To tackle both, this paper proposes a method based on deep reinforcement learning. A long short-term memory (LSTM) network is used to build an agent that automatically selects the machine learning model and optimizes its hyperparameters for a given dataset. The agent aims to maximize the accuracy of the selected model on the validation dataset. At each iteration, it uses that validation accuracy as a reward signal to improve its next decision, and a reinforcement learning algorithm guides the agent's learning process. To verify the idea, the proposed method is compared on UCI datasets with two widely used optimization methods, the tree-structured Parzen estimator and random search. The results show that the proposed method outperforms both in stability, time efficiency, and accuracy.
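
The loop described in the abstract (sample a model and hyperparameters, score them on validation data, feed the score back as a reward) can be illustrated with a minimal sketch. It is not the paper's implementation: a plain softmax policy with one logit per option stands in for the LSTM agent, REINFORCE with a moving-average baseline is assumed as the reinforcement learning algorithm (the abstract does not name one), and `SEARCH_SPACE` and `toy_validation_accuracy` are hypothetical stand-ins for a real search space and for training and validating a real model.

```python
import math
import random

# Hypothetical search space: every decision is a categorical choice.
SEARCH_SPACE = {
    "model": ["svm", "random_forest", "knn"],
    "complexity": [1, 2, 3, 4],
}


def toy_validation_accuracy(config):
    """Toy stand-in for training a model and scoring it on validation data."""
    base = {"svm": 0.80, "random_forest": 0.90, "knn": 0.70}[config["model"]]
    penalty = 0.03 * abs(config["complexity"] - 3)
    return base - penalty


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_indices(logits, rng):
    """Sample one option index per decision from the current policy."""
    chosen = {}
    for name, options in SEARCH_SPACE.items():
        probs = softmax(logits[name])
        chosen[name] = rng.choices(range(len(options)), weights=probs)[0]
    return chosen


def reinforce_search(iterations=500, lr=0.5, seed=0):
    """Run a REINFORCE loop and return the best configuration seen."""
    rng = random.Random(seed)
    logits = {name: [0.0] * len(opts) for name, opts in SEARCH_SPACE.items()}
    baseline = None  # moving average of rewards, used to reduce variance
    best_config, best_reward = None, float("-inf")

    for _ in range(iterations):
        chosen = sample_indices(logits, rng)
        config = {name: SEARCH_SPACE[name][i] for name, i in chosen.items()}
        reward = toy_validation_accuracy(config)  # the reward signal

        if reward > best_reward:
            best_config, best_reward = config, reward

        if baseline is None:
            baseline = reward
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward

        # Gradient of log softmax w.r.t. logit k is 1[k == chosen] - p_k.
        for name, i in chosen.items():
            probs = softmax(logits[name])
            for k, p in enumerate(probs):
                grad = (1.0 if k == i else 0.0) - p
                logits[name][k] += lr * advantage * grad

    return best_config, best_reward, logits
```

Calling `reinforce_search()` returns the best configuration sampled, its reward, and the final policy logits. To approximate the setting in the abstract, `toy_validation_accuracy` would be replaced by a function that trains the chosen model and returns its validation accuracy, and the independent per-decision logits would be replaced by a recurrent policy so that later choices can depend on earlier ones.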

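Keywords:	deep reinforcement learning; hyperparameter optimization; LSTM network; machine learning; model selection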
