Similar Documents
20 similar documents found (search time: 15 ms)
1.
Research on a Q-learning Algorithm Based on the Metropolis Criterion (cited 3 times: 0 self-citations, 3 by others)
Exploration and exploitation are the key issues in action selection for Q-learning. Pure exploitation quickly traps the agent in a local optimum; exploration can escape local optima and accelerate learning, but excessive exploration degrades the algorithm's performance. By formulating the search for an optimal policy in Q-learning as the search for an optimal solution to a combinatorial optimization problem, the Metropolis criterion of the simulated annealing algorithm is applied to trade off exploration against exploitation in Q-learning, and a Metropolis-criterion-based Q-learning algorithm, SA-Q-learning, is proposed. Comparative experiments show that it converges faster and avoids the performance degradation caused by excessive exploration.
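The abstract does not spell out the exact acceptance rule; a minimal sketch of Metropolis-style action selection for tabular Q-learning, under the usual simulated-annealing assumption that a non-greedy action is accepted with probability exp(ΔQ/T), might look like this (the function and data layout are illustrative, not the authors' implementation):

```python
import math
import random

def metropolis_action(Q, state, actions, temperature):
    """Metropolis-style action selection for tabular Q-learning (illustrative sketch).

    A random candidate action is accepted over the greedy one with probability
    exp((Q[s, a_rand] - Q[s, a_greedy]) / T), so a high temperature favours
    exploration and a low temperature favours exploitation.
    """
    greedy = max(actions, key=lambda a: Q.get((state, a), 0.0))
    candidate = random.choice(actions)
    delta = Q.get((state, candidate), 0.0) - Q.get((state, greedy), 0.0)
    if delta >= 0 or random.random() < math.exp(delta / max(temperature, 1e-12)):
        return candidate
    return greedy
```

As the temperature is annealed toward zero over training, the selection rule above degenerates into the purely greedy (exploitative) policy.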

2.
The balance between exploration and exploitation directly affects the solution quality a meta-heuristic algorithm can reach within a limited time. In order to better balance the exploration and exploitation capabilities of the algorithm and meet the solution requirements of complex real-world problems, the adaptive balance optimization algorithm (ABOA) is proposed in this paper. The algorithm consists of a global search phase (GSP) and a local search phase (LSP) and is controlled by a fixed parameter. ABOA not only considers the balance of exploration and exploitation throughout the whole iterative process but also focuses on this balance within both GSP and LSP. The search in both phases proceeds from outside to inside around the respective search centers. ABOA balances the exploration and exploitation capabilities of the algorithm throughout the search process by two adaptive policies: changing the search area and changing the search center. Fifty-two unconstrained benchmark test functions were employed to evaluate the performance of ABOA, and the results were compared with nine excellent optimization algorithms from the literature. The statistical results and the Friedman test show that ABOA is highly competitive. Finally, the results on the examined engineering design problems show that ABOA solves constrained optimization problems better than the other methods.

3.
A Hybrid Genetic Algorithm for Nonlinear Programming with Inequality Constraints (cited once: 0 self-citations, 1 by others)
A hybrid genetic algorithm is proposed for nonlinear programming problems with inequality constraints. The algorithm is divided into a global exploration phase and a local exploitation phase: the global exploration phase quickly identifies promising regions by embedding a simplex search within promising niches, while the local exploitation phase performs a simplex search in the most promising region. The algorithm strengthens local search ability while maintaining population diversity, effectively addressing the premature convergence and weak local search of genetic algorithms. Typical nonlinear programming examples verify the efficiency, accuracy, and reliability of the hybrid algorithm.
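A common way to realize the "embed a simplex search in a promising niche" idea is to run a short Nelder-Mead polish on the best individual of a niche using a penalized objective. The sketch below is an assumption-laden illustration (the penalty form, iteration budget, and scipy-based refinement are not the paper's exact procedure):

```python
import numpy as np
from scipy.optimize import minimize

def refine_elite(objective, elite, penalty, max_iter=50):
    """Illustrative local exploitation step: polish a promising GA individual with
    a simplex (Nelder-Mead) search on a penalized objective. The niche detection
    and phase-switching logic of the paper are not reproduced here.
    """
    penalized = lambda x: objective(x) + penalty(x)   # fold inequality constraints into the objective
    result = minimize(penalized, np.asarray(elite, dtype=float), method="Nelder-Mead",
                      options={"maxiter": max_iter, "xatol": 1e-6, "fatol": 1e-6})
    return result.x, result.fun

# Toy usage: minimize (x-2)^2 + (y-1)^2 subject to x + y <= 2 (quadratic penalty).
obj = lambda v: (v[0] - 2.0) ** 2 + (v[1] - 1.0) ** 2
pen = lambda v: 1e3 * max(0.0, v[0] + v[1] - 2.0) ** 2
x_refined, f_refined = refine_elite(obj, [0.0, 0.0], pen)
```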

4.
韩红桂, 徐子昂, 王晶晶. 控制与决策, 2023, 38(11): 3039-3047.
The multi-task particle swarm optimization algorithm (MTPSO) achieves fast convergence through knowledge transfer learning and is widely used to solve multi-task multi-objective optimization problems. However, MTPSO cannot adaptively adjust the optimization process according to the evolutionary state of the population, so it easily falls into local optima and converges poorly. To address this problem, a Q-learning-based multi-task multi-objective particle swarm optimization algorithm (QM2PSO) is proposed, exploiting the self-evolution and prediction capabilities of reinforcement learning. First, a dynamic parameter-update method is designed in which Q-learning updates the inertia weight and acceleration coefficients of the particle swarm online, improving the ability of the current particles to converge to the Pareto front. Second, a Cauchy-distribution-based mutation search strategy is proposed that alternates global and local search for the multi-task optima, preventing the algorithm from getting trapped in local optima. Finally, a knowledge transfer method based on a positive-transfer criterion is designed, with Q-learning updating the knowledge transfer rate to mitigate negative transfer. Comparative experiments with existing classical algorithms show that the proposed QM2PSO has superior convergence.
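The abstract does not give the state definition or the parameter set used by the Q-learning controller; a minimal sketch in which Q-learning picks the PSO parameters from a small hypothetical action set, after which particles are updated with the standard velocity/position equations, could look like this:

```python
import random

# Hypothetical discrete parameter settings (w, c1, c2) the Q-learning agent can pick from;
# the actual settings and state definition in QM2PSO are not given in the abstract.
PARAM_ACTIONS = [(0.9, 2.0, 2.0), (0.7, 1.8, 1.8), (0.4, 1.5, 2.5)]

def select_pso_params(q_row, epsilon=0.1):
    """Epsilon-greedy choice of (inertia weight, acceleration coefficients) from one Q-table row."""
    if random.random() < epsilon:
        return random.randrange(len(PARAM_ACTIONS))
    return max(range(len(PARAM_ACTIONS)), key=lambda a: q_row[a])

def pso_step(position, velocity, pbest, gbest, params):
    """Standard PSO velocity/position update using the parameters chosen above."""
    w, c1, c2 = params
    new_v = [w * v
             + c1 * random.random() * (pb - x)
             + c2 * random.random() * (gb - x)
             for x, v, pb, gb in zip(position, velocity, pbest, gbest)]
    new_x = [x + v for x, v in zip(position, new_v)]
    return new_x, new_v
```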

5.
The Artificial Bee Colony (ABC) algorithm is a widely used optimization algorithm. However, ABC is strong in exploration but poor in exploitation. To improve the convergence performance of ABC and establish a better search mechanism for the global optimum, an improved ABC algorithm is proposed in this paper. Firstly, the proposed algorithm integrates information from the previous best solution into the search equation for employed bees, and the global best solution into the update equation for onlooker bees, to improve exploitation. Secondly, for a better balance between exploration and exploitation, an S-shaped adaptive scaling factor is introduced into the employed bees' search equation. Furthermore, the search policy of the scout bees is modified: the scout bees update their food source in every cycle in order to increase the diversity and stochasticity of the colony and mitigate stagnation. Finally, the improved algorithm is compared with two other improved ABC variants and three recent algorithms on a set of classical benchmark functions. The experimental results show that the proposed algorithm is effective and robust and outperforms the other algorithms.
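A sketch of what best-solution-guided candidate generation might look like, in the style of gbest-guided ABC (the coefficient bounds and single-dimension perturbation below are assumptions, not the paper's exact equations):

```python
import random

def guided_candidate(x_i, x_k, best, phi_bound=1.0, psi_bound=1.5):
    """Generate a candidate food source that blends the classic ABC neighbour term
    with a pull toward a best-so-far solution (illustrative only).
    x_i: current food source, x_k: random neighbour, best: best-so-far solution.
    """
    j = random.randrange(len(x_i))                  # perturb a single randomly chosen dimension
    phi = random.uniform(-phi_bound, phi_bound)     # classic ABC neighbour coefficient
    psi = random.uniform(0.0, psi_bound)            # attraction toward the best solution
    v = list(x_i)
    v[j] = x_i[j] + phi * (x_i[j] - x_k[j]) + psi * (best[j] - x_i[j])
    return v
```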

6.
To address the drawbacks of the bat algorithm when solving multimodal, complex nonlinear problems, namely reduced search efficiency and a tendency to fall into local optima, an improved bat algorithm is proposed. A fractional-order strategy with short-term memory is introduced to update bat positions, increasing population diversity and accelerating convergence; an Archimedean spiral strategy with Lévy flight generates new local solutions, strengthening local exploitation and helping the algorithm escape local optima; and a new nonlinear dynamic mechanism adjusts the loudness and pulse emission rate to balance exploration and exploitation. Simulation experiments on the CEC2014 test set, which contains unimodal, multimodal, hybrid, and composite functions, compare the proposed algorithm with other swarm intelligence algorithms; the results show improved search efficiency and solution accuracy over the compared algorithms, and Friedman statistical analysis confirms its superiority. The proposed algorithm is also applied to the speed-reducer design problem in mechanical engineering and compared with PSO-DE, WCA, and APSO, verifying its effectiveness.

7.

To address the weak exploitation ability of differential evolution, a new differential evolution algorithm with fast convergence is proposed. First, a Gaussian random walk around the best individual is used to improve the exploitation ability of the algorithm; then, a simplified crossover-mutation strategy based on individual optimization performance carries out the evolutionary operations of the population to strengthen its local search ability; finally, an individual screening strategy further improves the exploration ability of the algorithm to avoid getting trapped in local optima. Experimental results on 12 standard test functions and two constrained engineering optimization problems show that the proposed algorithm outperforms EPSDE, SaDE, JADE, BSA, CoBiDE, GSA, and ABC in convergence speed, reliability, and convergence accuracy, and that it effectively improves exploitation while strengthening exploration.


8.
The ability of an Evolutionary Algorithm (EA) to find a global optimal solution depends on its capacity to strike a good balance between exploitation of the elements found so far and exploration of the search space. Inspired by natural phenomena, researchers have developed many successful evolutionary algorithms which, in their original versions, define operators that mimic the way nature solves complex problems, with no actual consideration of the exploration-exploitation balance. In this paper, a novel nature-inspired algorithm called the States of Matter Search (SMS) is introduced. The SMS algorithm is based on the simulation of the states-of-matter phenomenon. In SMS, individuals emulate molecules which interact with each other through evolutionary operations based on the physical principles of the thermal-energy motion mechanism. The algorithm is devised by assigning a different exploration-exploitation ratio to each state of matter. The evolutionary process is divided into three phases which emulate the three states of matter: gas, liquid, and solid. In each state, molecules (individuals) exhibit different movement capacities. Beginning from the gas state (pure exploration), the algorithm modifies the intensities of exploration and exploitation until the solid state (pure exploitation) is reached. As a result, the approach can substantially improve the exploration-exploitation balance while preserving the good search capabilities of an evolutionary approach. To illustrate the proficiency and robustness of the proposed algorithm, it is compared to other well-known evolutionary methods, including novel variants that incorporate diversity preservation schemes. The comparison examines several standard benchmark functions which are commonly considered within the EA field. Experimental results show that the proposed method achieves good performance in comparison to its counterparts as a consequence of its better exploration-exploitation balance.
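The abstract does not give the phase lengths or movement parameters; a minimal sketch of a three-phase schedule in the spirit of SMS (the 50%/90% boundaries and the movement scales are assumptions, not the paper's values) could be:

```python
def phase_parameters(t, t_max):
    """Illustrative three-phase schedule: return a movement scale in [0, 1] that is
    large in the gas phase (exploration), moderate in the liquid phase, and small
    in the solid phase (exploitation).
    """
    progress = t / float(t_max)
    if progress < 0.5:      # gas phase: wide random movements
        return 0.8
    elif progress < 0.9:    # liquid phase: intermediate movements
        return 0.4
    else:                   # solid phase: small refinements around good solutions
        return 0.1
```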

9.
To address the tendency of the artificial bee colony (ABC) algorithm to fall into local optima and converge prematurely, an Artificial Bee Colony algorithm based on Feedback and the Law of the jungle (LFABC) is proposed. The algorithm introduces a feedback mechanism into the global search equation so that the search is directed to regions where the optimal solution is likely to lie, improving exploitation ability and convergence speed. A linear differential increment strategy is also added to balance exploitation and exploration across the stages of the algorithm. Following the law of the jungle, the algorithm randomly selects poorer individuals for reinitialization, effectively preventing it from getting trapped in local optima. Experimental results show that LFABC effectively improves convergence accuracy, and its convergence speed is outstanding.

10.
Age of Information (AoI) is a performance metric that measures the freshness of captured data from the destination's perspective. To improve the AoI performance of energy-constrained real-time sensing IoT systems, a joint policy of sampling and hybrid backscatter-communication updating is proposed. The policy minimizes the long-term average AoI of the system by allowing the source to choose the sampling action and the transmission mode of the update process. Specifically, the optimization problem is first modeled as an average-cost Markov decision process (MDP); when the environment dynamics are known, the optimal policy is obtained by a relative value iteration algorithm; when the environment dynamics are unknown, a Q-learning algorithm with an exploration-exploitation method learns the optimal policy through trial-and-error interaction with the environment. Simulation results show that, compared with two reference policies, the proposed policy significantly improves the system's AoI performance; the AoI performance also improves as the update packet size decreases or the battery capacity increases.
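For reference, in a slotted model the AoI at the destination typically evolves as follows (a standard assumption, not necessarily the exact system model of this paper): the age grows by one each slot and, upon a successful delivery, resets to the age of the delivered update,

```latex
\Delta(t+1) =
\begin{cases}
  \Delta_{\mathrm{U}}(t) & \text{if an update is successfully delivered in slot } t,\\
  \Delta(t) + 1          & \text{otherwise,}
\end{cases}
```

where $\Delta_{\mathrm{U}}(t)$ denotes the age of the delivered update at the moment of reception.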

11.
To solve the permutation flow-shop scheduling problem with makespan minimization more effectively, a self-adaptive hybrid particle swarm optimization algorithm (SHPSO) is proposed. The algorithm uses Q-learning to design an adaptive parameter-update strategy that balances exploration and exploitation; it also introduces a particle-stagnation detection method and improves the iterated-greedy-based local search strategy with a tie-breaking mechanism and Taillard's acceleration algorithm, performing local search around the global best to help particles escape local optima. Experimental results show that the average relative percentage deviation (RPDavg) obtained by SHPSO is at least 83.2% lower than that of four other improved PSO algorithms, giving it a clear advantage in solution quality.

12.
Floorplanning is an important problem in very large scale integrated-circuit (VLSI) design automation as it determines the performance, size, yield, and reliability of VLSI chips. From the computational point of view, VLSI floorplanning is an NP-hard problem. In this paper, a memetic algorithm (MA) for a nonslicing and hard-module VLSI floorplanning problem is presented. This MA is a hybrid genetic algorithm that uses an effective genetic search method to explore the search space and an efficient local search method to exploit information in the search region. The exploration and exploitation are balanced by a novel bias search strategy. The MA has been implemented and tested on popular benchmark problems. Experimental results show that the MA can quickly produce optimal or nearly optimal solutions for all the tested benchmark problems.

13.
To address the inefficiency of local search in the traditional artificial bee colony algorithm, a double-evolution artificial bee colony algorithm is proposed. For search operations that require two points, a semi-random search strategy is adopted: one point is chosen at random, while the other is determined by traversing the feasible solutions and taking the position of the best one. This strategy improves the insertion-point operator and the sequence-reversal operator, which search across the solution-space dimensions defined by the summed distances of two and three pairs of cities respectively; applying them in the local search forms the double-evolution process and improves search efficiency and fitness guidance. Experimental results show that, compared with existing methods, the algorithm improves convergence speed and solution quality, and its timeliness can be improved by setting a reasonable termination threshold.

14.
To address the shortcomings of the crow search algorithm (CSA), a multi-mode flight crow search algorithm (MFCSA) is proposed. Based on foraging ability, the population is divided into a stronger group and a weaker group. Stronger foragers follow the current best target of the population and, guided by swarm information, fly to the neighborhood of the current best position to search, which strengthens the local exploitation ability of the algorithm. Weaker foragers adopt two strategies: observing and learning the foraging methods of the stronger individuals, and quickly flying away when encountering danger; the former improves global exploration, while the latter maintains population diversity. Numerical experiments on 15 benchmark test functions and two engineering application problems show that MFCSA performs better in optimization accuracy and convergence speed, is better at avoiding local optima, and is more stable.

15.
The traditional mutation strategies of differential evolution (DE) cannot effectively balance global and local search, and their fixed operators lead to premature convergence and low search efficiency. Based on the performance of DE mutation strategies, a hybrid mutation strategy is proposed that aims to balance the exploration and exploitation abilities of the algorithm: it strengthens global search and maintains population diversity in the early stage, and emphasizes local search in the later stage so as to converge to the global optimum as quickly as possible. Meanwhile, the operators use a randomly generated, normally distributed scaling factor F and a time-varying crossover probability CR to further improve performance. Experiments on several typical benchmark test functions show that this improved differential evolution algorithm effectively avoids premature convergence and noticeably improves global convergence ability and search efficiency.
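A hedged sketch of how such a hybrid mutation might be arranged: an exploration-oriented DE/rand/1 mutation early in the run and an exploitation-oriented DE/best/1 mutation later, with a normally distributed F and a time-varying CR. The switching rule and the parameter distributions below are assumptions, not the paper's exact settings:

```python
import random

def hybrid_mutation(pop, best, i, t, t_max):
    """Generate a trial vector for individual i at generation t (population of lists of floats)."""
    r1, r2, r3 = random.sample([k for k in range(len(pop)) if k != i], 3)
    F = min(max(random.gauss(0.5, 0.3), 0.1), 1.0)   # random normal scaling factor, clamped
    CR = 0.9 - 0.5 * t / t_max                        # time-varying crossover probability (example schedule)
    if t < t_max / 2:                                 # early stage: DE/rand/1, global search
        donor = [pop[r1][j] + F * (pop[r2][j] - pop[r3][j]) for j in range(len(pop[i]))]
    else:                                             # late stage: DE/best/1, local search around the best
        donor = [best[j] + F * (pop[r1][j] - pop[r2][j]) for j in range(len(pop[i]))]
    jrand = random.randrange(len(pop[i]))             # binomial crossover with the time-varying CR
    trial = [donor[j] if (random.random() < CR or j == jrand) else pop[i][j]
             for j in range(len(pop[i]))]
    return trial
```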

16.
Optimisation in changing environments is a challenging research topic since many real-world problems are inherently dynamic. Inspired by the natural evolution process, evolutionary algorithms (EAs) are among the most successful and promising approaches for dynamic optimisation problems. However, managing the exploration/exploitation trade-off in EAs is still a prevalent issue, owing to the difficulties associated with controlling and measuring such behaviour. The proposal of this paper is to achieve a balance between exploration and exploitation in an explicit manner. The idea is to use two equally sized populations: the first one performs exploration while the second one is responsible for exploitation. These tasks are alternated from one generation to the next in a regular pattern, so as to obtain a balanced search engine; a sketch of this alternation appears below. Besides, we reinforce the ability of our algorithm to adapt quickly after changes by means of a memory of past solutions. Such a combination aims to restrain premature convergence, to broaden the search area, and to speed up the optimisation. We show through computational experiments, based on a series of dynamic problems and many performance measures, that our approach improves the performance of EAs and outperforms competing algorithms.
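A minimal sketch of the two-population alternation described above (the variation operators and the memory of past solutions are left abstract and are not the paper's implementation):

```python
def alternating_ea_step(pop_a, pop_b, fitness, generation, mutate_big, mutate_small):
    """One generation of the explicit exploration/exploitation split.

    On even generations pop_a explores and pop_b exploits; on odd generations the
    roles are swapped, giving a regular alternation between the two tasks.
    """
    if generation % 2 == 0:
        explore_pop, exploit_pop = pop_a, pop_b
    else:
        explore_pop, exploit_pop = pop_b, pop_a
    # Exploration: large perturbations of every individual in the exploring population.
    explore_pop[:] = [mutate_big(ind) for ind in explore_pop]
    # Exploitation: keep the best individual and search in its close neighbourhood.
    best = min(exploit_pop, key=fitness)
    exploit_pop[:] = [best] + [mutate_small(best) for _ in range(len(exploit_pop) - 1)]
```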

17.
Reinforcement learning (RL) has been applied to many fields and applications, but there are still dilemmas between exploration and exploitation in the action selection policy. Two well-known reinforcement learning algorithms are Q-learning and Sarsa, and they possess different characteristics. Generally speaking, the Sarsa algorithm has faster convergence characteristics, while the Q-learning algorithm has a better final performance. However, Sarsa easily gets stuck in local minima, and Q-learning needs a longer time to learn. Most of the literature investigates the action selection policy. Instead of studying an action selection strategy, this paper focuses on how to combine Q-learning with the Sarsa algorithm, and presents a new method, called backward Q-learning, which can be implemented in both the Sarsa algorithm and Q-learning. The backward Q-learning algorithm directly tunes the Q-values, and the Q-values then indirectly affect the action selection policy. Therefore, the proposed RL algorithms can enhance learning speed and improve final performance. Finally, three experiments, namely cliff walk, mountain car, and cart-pole balancing control, are used to verify the feasibility and effectiveness of the proposed scheme. All the simulations illustrate that the backward Q-learning based RL algorithm outperforms the well-known Q-learning and Sarsa algorithms.
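For reference, the standard one-step updates of the two algorithms contrasted above are well known (the backward Q-learning rule itself is not reproduced in the abstract):

```latex
% Q-learning (off-policy) one-step update
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
% Sarsa (on-policy) one-step update
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma\, Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right]
```

The only difference is the bootstrapping target: Q-learning bootstraps from the greedy action, Sarsa from the action actually taken by the behaviour policy.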

18.
Finite-time Analysis of the Multiarmed Bandit Problem (cited once: 0 self-citations, 1 by others)
Auer, Peter; Cesa-Bianchi, Nicolò; Fischer, Paul. Machine Learning, 2002, 47(2-3): 235-256.
Reinforcement learning policies face the exploration versus exploitation dilemma, i.e. the search for a balance between exploring the environment to find profitable actions and taking the empirically best action as often as possible. A popular measure of a policy's success in addressing this dilemma is the regret, that is, the loss incurred because the globally optimal policy is not followed at all times. One of the simplest examples of the exploration/exploitation dilemma is the multi-armed bandit problem. Lai and Robbins were the first to show that the regret for this problem has to grow at least logarithmically in the number of plays. Since then, policies which asymptotically achieve this regret have been devised by Lai and Robbins and many others. In this work we show that the optimal logarithmic regret is also achievable uniformly over time, with simple and efficient policies, and for all reward distributions with bounded support.
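One of the simple, efficient policies analysed in this line of work is UCB1, which plays every arm once and thereafter the arm with the largest empirical mean plus a confidence bonus. A minimal sketch (the bookkeeping layout is illustrative):

```python
import math

def ucb1_choice(counts, means, total_plays):
    """UCB1-style arm selection: try each arm once, then pick the arm maximizing
    its empirical mean plus the confidence bonus sqrt(2 ln n / n_j).

    counts[j]: number of plays of arm j, means[j]: empirical mean reward of arm j,
    total_plays: total number of plays so far (n).
    """
    for j, c in enumerate(counts):
        if c == 0:                      # initial round: play every arm once
            return j
    return max(range(len(counts)),
               key=lambda j: means[j] + math.sqrt(2.0 * math.log(total_plays) / counts[j]))
```

The bonus term shrinks as an arm is played more often, so under-explored arms keep getting sampled while the empirically best arm is played most of the time.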

19.
The Salp Swarm Algorithm (SSA) is a novel swarm intelligence algorithm with good performance. However, like other swarm-based algorithms, it suffers from low convergence precision and slow convergence speed on high-dimensional complex optimisation problems. In response to this issue, we propose an improved SSA named WASSA. First, a dynamic weight factor is added to the position-update formula of the population, aiming to balance global exploration and local exploitation. In addition, to avoid premature convergence and evolutionary stagnation, an adaptive mutation strategy is introduced during the evolution process: perturbing the global extremum helps the population jump out of local extrema and continue searching for an optimal solution. Experiments conducted on a set of 28 benchmark functions show that the improved algorithm presented in this paper is clearly superior in convergence performance, robustness, and the ability to escape local optima when compared with SSA.

20.
The prey predator algorithm is a population-based metaheuristic inspired by the interaction between a predator and its prey. In the algorithm, the best-performing solution is called the best prey and focuses entirely on exploitation, whereas the worst-performing solution is called the predator and focuses entirely on exploration. The remaining solutions are called ordinary prey and either exploit promising regions by following better-performing solutions or explore the solution space by randomly running away from the predator. Recently, it has been shown that by increasing the number of best prey or predators it is possible to adjust the degree of exploitation and exploration. Although this tuning has the advantage of easily controlling these search behaviours, it is not an easy task. As with any other metaheuristic algorithm, the performance of the prey predator algorithm depends on a proper degree of exploration and exploitation of the decision space. In this paper, the concept of a hyperheuristic is employed to balance the degrees of exploration and exploitation of the algorithm, so that it learns and decides the best search behaviour for the problem at hand across iterations. The ratios of the numbers of best prey and predators are used as low-level heuristics. The simulation results show that balancing the degree of exploration and exploitation with the hyperheuristic mechanism indeed improves the performance of the algorithm, and comparison with other algorithms shows the effectiveness of the proposed approach.
