Service integration method based on adaptive multi-objective reinforcement learning

Citation: Xiao GUO, Chunshan LI, Yuyue ZHANG, Dianhui CHU. Service integration method based on adaptive multi-objective reinforcement learning[J]. Journal of Computer Applications, 2022, 42(11): 3500-3505.
Authors: Xiao GUO  Chunshan LI  Yuyue ZHANG  Dianhui CHU
Affiliation: School of Computer Science and Technology, Harbin Institute of Technology (Weihai), Weihai, Shandong 264209, China
Funding: National Key Research and Development Program of China (2018YFB1402500); National Natural Science Foundation of China (61902090); Natural Science Foundation of Shandong Province (ZR2020KF019)
Abstract: Service resources in the Internet of Services (IoS) are becoming increasingly refined and specialized, and single-function services cannot meet users' complex and changeable requirements, so service integration and scheduling methods have become a research hotspot in the field of service computing. However, most existing service integration and scheduling methods consider only the satisfaction of user requirements and ignore the sustainability of the IoS ecosystem. To address these problems, a service integration method based on adaptive multi-objective reinforcement learning was proposed. In this method, a multi-objective optimization strategy was introduced into the framework of the Asynchronous Advantage Actor-Critic (A3C) algorithm, so as to ensure the healthy development of the IoS ecosystem while satisfying user requirements. The integration weights of the multiple objective values were adjusted dynamically according to the regret value, which mitigated the imbalance among sub-objective values in multi-objective reinforcement learning. Service integration experiments were carried out in a real large-scale service environment. Experimental results show that the proposed method solves faster than traditional machine learning methods in large-scale service environments, and achieves more balanced solution quality across objectives than Reinforcement Learning (RL) with fixed weights.
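Note on the adaptive weighting idea (illustrative only): the abstract describes the regret-based weight adjustment at a high level and does not give the update rule, so the Python sketch below shows one plausible scheme rather than the paper's actual method. It assumes each objective's regret is the gap between the best value observed so far and the current episode's value, and turns regrets into scalarization weights via a softmax, so lagging objectives receive more weight in the next actor-critic update. The class name AdaptiveWeightScalarizer and the softmax temperature are hypothetical.

    import numpy as np

    class AdaptiveWeightScalarizer:
        """Regret-driven adaptive weights for scalarizing multi-objective rewards.

        Hypothetical sketch: regret per objective = best value seen so far minus
        the current episode's value; weights = softmax over regrets, so objectives
        that lag behind get a larger share of the scalarized reward signal.
        """

        def __init__(self, n_objectives, temperature=1.0):
            self.tau = temperature
            self.reference = np.full(n_objectives, -np.inf)  # running best per objective
            self.weights = np.full(n_objectives, 1.0 / n_objectives)

        def update(self, objective_values):
            """Refresh reference points and weights from one episode's objective values."""
            v = np.asarray(objective_values, dtype=float)
            self.reference = np.maximum(self.reference, v)   # update per-objective best
            regret = self.reference - v                      # larger gap => lagging objective
            z = np.exp((regret - regret.max()) / self.tau)   # numerically stable softmax
            self.weights = z / z.sum()
            return self.weights

        def scalarize(self, objective_rewards):
            """Weighted sum of per-objective rewards, fed to the A3C loss as a scalar."""
            return float(np.dot(self.weights, objective_rewards))

For example, with two objectives (user-requirement satisfaction and ecosystem health), calling update after each episode re-weights the objectives so that whichever falls further below its running best is emphasized, and scalarize then returns the scalar reward each A3C worker optimizes.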

Keywords: service integration; Reinforcement Learning (RL); Asynchronous Advantage Actor-Critic (A3C) algorithm; multi-objective optimization; adaptive weight
Received: 2021-12-06
Revised: 2021-12-29
