Similar Documents
20 similar documents found (search time: 15 ms).
1.
A study on expertise of agents and its effects on cooperative Q-learning.
Cooperation in learning (CL) can be realized in a multiagent system if agents are capable of learning from both their own experiments and other agents' knowledge and expertise. The extra resources available in CL translate into higher efficiency and faster learning than individual learning (IL). In the real world, however, implementing CL is not straightforward, in part because agents may differ in their area of expertise (AOE). In this paper, homogeneous reinforcement-learning agents are considered in an environment with multiple goals or tasks; as a result, they become expert in different domains with different amounts of expertness. Each agent uses a one-step Q-learning algorithm and is capable of exchanging its Q-table with those of its teammates. Two crucial questions are addressed: "How can the AOE of an agent be extracted?" and "How can agents improve their performance in CL by knowing their AOEs?" An algorithm is developed to extract the AOE based on state transitions, which serves as a gold standard from a behavioral point of view. It is also discussed how the AOE can be obtained implicitly through agents' expertness at the state level. Three new methods for CL through the combination of Q-tables are developed and examined for overall performance after CL. Their performance is compared with that of IL, strategy sharing (SS), and weighted SS (WSS). The results show the superior performance of the AOE-based methods compared to existing CL methods that do not use the notion of AOE. These results strongly support the idea that cooperation based on the AOE performs better than general CL methods.
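As a hedged illustration of the strategy-sharing family this abstract compares against, the Python sketch below combines teammates' Q-tables in a weighted-strategy-sharing (WSS) style, with weights derived from each agent's expertness. The weighting scheme and names are assumptions for illustration only, not the paper's AOE-based methods.

```python
import numpy as np

def wss_combine(q_tables, expertness):
    """WSS-style combination of Q-tables.

    q_tables   : list of (n_states, n_actions) arrays, one per agent
    expertness : list of scalar expertness values, one per agent
    Returns a combined Q-table in which each teammate contributes
    in proportion to its (non-negative) expertness.
    """
    q = np.stack(q_tables)                      # (n_agents, n_states, n_actions)
    w = np.clip(np.asarray(expertness, dtype=float), 0.0, None)
    w = w / w.sum() if w.sum() > 0 else np.full(len(q), 1.0 / len(q))
    return np.tensordot(w, q, axes=1)           # weighted average over agents

# Example: two agents, 3 states, 2 actions.
q_a = np.array([[1.0, 0.0], [0.5, 0.2], [0.0, 0.0]])
q_b = np.array([[0.0, 0.8], [0.1, 0.9], [0.3, 0.4]])
combined = wss_combine([q_a, q_b], expertness=[3.0, 1.0])
```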

2.
To study the effect of negative Pearson correlation information on collaborative filtering, a collaborative filtering algorithm that takes negative correlation into account is proposed. The algorithm selects positively correlated users as nearest neighbors and negatively correlated users as farthest neighbors, and uses a parameter to tune the roles of the nearest and farthest neighbors in the recommendation process. Comparative experiments on the MovieLens dataset show that negative correlation not only improves the accuracy of the recommendations but also increases the diversity of the recommendation list; further analysis shows that negative correlation can also substantially improve recommendation accuracy for inactive users. This work indicates that negative correlation helps resolve the accuracy-diversity dilemma and the cold-start problem in recommender systems.
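A rough sketch of the idea described above: select positively correlated users as nearest neighbors and negatively correlated users as farthest neighbors, then blend their contributions with a tuning parameter. The function names, the mirror-around-the-mean treatment of negative neighbors, and the parameter `lam` are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def predict_rating(ratings, user, item, k=10, lam=0.7):
    """Predict ratings[user, item]; 0 in `ratings` means unrated.

    Rows of `ratings` must be non-constant for np.corrcoef to be defined.
    """
    corr = np.corrcoef(ratings)[user]           # Pearson correlation with every user
    cand = np.where(ratings[:, item] > 0)[0]    # users who rated this item
    cand = cand[cand != user]
    if cand.size == 0:
        return ratings[ratings > 0].mean()      # fall back to the global mean

    order = cand[np.argsort(corr[cand])]
    nearest, farthest = order[::-1][:k], order[:k]
    item_mean = ratings[cand, item].mean()

    near_pred = ratings[nearest, item].mean()
    far_pred = ratings[farthest, item].mean()
    # lam trades off trust in positive neighbors against the "anti-signal"
    # from negative neighbors, mirrored around the item mean.
    return lam * near_pred + (1.0 - lam) * (2.0 * item_mean - far_pred)
```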

3.
Neuro-fuzzy systems have great application potential in intelligent robot control, but almost all existing system construction methods face the major difficulty of scarce sample data. To overcome problems such as the "curse of dimensionality" that traditional construction methods may suffer from when samples are hard to obtain, this paper introduces a Q-learning mechanism into the fuzzy neural network and proposes a Q-learning-based fuzzy neural network model, thereby endowing the neuro-fuzzy system with self-learning capability. Simulation results for Sugeno's fuzzy car control are given at the end of the paper. The experiments show that incorporating the Q-learning mechanism into a neuro-fuzzy system is effective and can be used to realize self-learning of intelligent robot behavior. It is worth noting that the simulation experiments can just as easily be carried out on a real system, as long as the system provides sensory information that can serve as the evaluation signal.

4.
Reinforcement learning (RL) has been applied to many fields and applications, but the trade-off between exploration and exploitation in the action selection policy remains a dilemma. Two of the best-known reinforcement learning algorithms are Q-learning and Sarsa, and they have different characteristics: generally speaking, the Sarsa algorithm converges faster, while the Q-learning algorithm achieves better final performance. However, Sarsa is easily stuck in local minima, and Q-learning needs a longer time to learn. Most of the literature investigates the action selection policy. Instead of studying an action selection strategy, this paper focuses on how to combine Q-learning with the Sarsa algorithm and presents a new method, called backward Q-learning, which can be implemented within both Sarsa and Q-learning. The backward Q-learning algorithm directly tunes the Q-values, which in turn indirectly affect the action selection policy. Therefore, the proposed RL algorithms can increase learning speed and improve final performance. Finally, three experiments, including cliff walk, mountain car, and a cart-pole balancing control system, are used to verify the feasibility and effectiveness of the proposed scheme. All the simulations show that the backward Q-learning based RL algorithm outperforms both the well-known Q-learning and the Sarsa algorithm.
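For reference, a minimal Python sketch of the two textbook one-step updates the abstract contrasts; the proposed backward variant's extra tuning of stored Q-values is not shown here, and the step sizes are illustrative assumptions.

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    # Off-policy: bootstrap from the greedy action in the next state.
    Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.95):
    # On-policy: bootstrap from the action actually selected next.
    Q[s][a] += alpha * (r + gamma * Q[s_next][a_next] - Q[s][a])
```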

5.
6.
A dynamic channel assignment policy through Q-learning
One of the fundamental issues in the operation of a mobile communication system is the assignment of channels to cells and to calls. This paper presents a novel approach to the dynamic channel assignment (DCA) problem that uses a form of real-time reinforcement learning known as Q-learning in conjunction with a neural network representation. Instead of relying on a known teacher, the system is designed to learn an optimal channel assignment policy by directly interacting with the mobile communication environment. The performance of the Q-learning based DCA was examined through extensive simulation studies on a 49-cell mobile communication system under various conditions. Comparative studies with the fixed channel assignment (FCA) scheme and one of the best dynamic channel assignment strategies, MAXAVAIL, show that the proposed approach performs better than FCA in various situations and achieves performance similar to MAXAVAIL, but with significantly reduced computational complexity.

7.
8.
Multi-agent systems have been a popular research area in recent years, and Q-learning is one of the best-known and most widely used reinforcement learning algorithms. Building on the single-agent Q-learning algorithm, a new cooperative learning algorithm is proposed, and based on it a new architectural model for multi-agent systems is presented. The most distinctive features of this architecture are a knowledge-sharing mechanism, a team-structure concept, and the introduction of the notion of a service provider. Finally, simulation experiments demonstrate the advantages of the architecture.

9.
张峰  刘凌云  郭欣欣 《控制与决策》2019,34(9):1917-1922
Multi-stage group decision making is a typical class of dynamic group decision problems, and existing work mainly solves for the optimal group decision under discrete, deterministic states. In reality, however, decision makers mostly face uncertain state spaces, or even unknown environments (for example, the state transition probability matrix is completely unknown). To find a multi-stage group-optimal policy with a high degree of consensus, decision makers need to obtain further information through dynamic interaction with the environment. To address this problem, an optimal decision algorithm for multi-stage group decision making based on reinforcement learning is proposed to handle multi-stage group decisions in uncertain state spaces. Combining the Q-learning algorithm of reinforcement learning, a basic Q-learning model for multi-stage group decision making is established, and its iterative process is improved so that the group-optimal policy can be learned. It is also proved that the multi-stage group-optimal policy obtained via Q-learning is the policy with the highest group consensus. Finally, a computational example illustrates the rationality and feasibility of the algorithm.
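The basic tabular Q-learning loop that the group-decision model builds on can be sketched as below. This is a generic illustration under an assumed epsilon-greedy exploration rule and a simple `env.reset()`/`env.step()` interface; it does not reproduce the paper's improved iteration scheme or its consensus measure.

```python
import random

def tabular_q_learning(env, n_states, n_actions, episodes=500,
                       alpha=0.1, gamma=0.9, eps=0.1):
    """Generic tabular Q-learning. Assumes env.reset() returns a state index
    and env.step(a) returns (next_state, reward, done)."""
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            if random.random() < eps:                       # explore
                a = random.randrange(n_actions)
            else:                                           # exploit
                a = max(range(n_actions), key=lambda x: Q[s][x])
            s_next, r, done = env.step(a)
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
            s = s_next
    return Q
```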

10.
This research focuses on the study of the relationships between sample data characteristics and metamodel performance considering different types of metamodeling methods. In this work, four types of metamodeling methods, including multivariate polynomial method, radial basis function method, kriging method and Bayesian neural network method, three sample quality merits, including sample size, uniformity and noise, and four performance evaluation measures considering accuracy, confidence, robustness and efficiency, are considered. Different from other comparative studies, quantitative measures, instead of qualitative ones, are used in this research to evaluate the characteristics of the sample data. In addition, the Bayesian neural network method, which is rarely used in metamodeling and has never been considered in comparative studies, is selected in this research as a metamodeling method and compared with other metamodeling methods. A simple guideline is also developed for selecting candidate metamodeling methods based on sample quality merits and performance requirements.

11.
Network congestion has a negative impact on the performance of on-chip networks due to increased packet latency. Many congestion-aware routing algorithms have been developed to alleviate traffic congestion over the network. In this paper, we propose a congestion-aware routing algorithm based on the Q-learning approach for avoiding congested areas of the network. Through the learning method, local and global congestion information is made available to each switch and is dynamically updated whenever a switch receives a packet. However, the Q-learning approach suffers from high area overhead in NoCs because each switch needs a large routing table. To reduce this overhead, we also present a clustering approach that decreases the number of routing tables by a factor of 4. Results show that the proposed approach achieves a significant performance improvement over the traditional Q-learning, C-routing, DBAR, and Dynamic XY algorithms.
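A hedged sketch of how a per-switch Q-table can encode congestion-aware routing, in the spirit of classical Q-routing: each switch keeps an estimated delivery delay per (destination, next hop) and updates it from the observed queueing and link delays plus the neighbor's own best estimate. The function names, the delay-based cost, and the learning rate are illustrative assumptions; the paper's clustering scheme is not reproduced.

```python
def q_routing_update(Q, node, dest, next_hop, q_delay, link_delay,
                     q_next_min, alpha=0.5):
    """Update `node`'s estimated delivery delay to `dest` via `next_hop`.

    Q          : dict mapping (node, dest, next_hop) -> estimated delay
    q_delay    : time the packet waited in `node`'s input queue
    link_delay : hop delay from `node` to `next_hop`
    q_next_min : min over next_hop's own estimates of reaching `dest`
    """
    key = (node, dest, next_hop)
    old = Q.get(key, 0.0)
    target = q_delay + link_delay + q_next_min   # observed cost + downstream estimate
    Q[key] = old + alpha * (target - old)

def best_next_hop(Q, node, dest, neighbors):
    """Pick the neighbor currently believed to give the lowest delay to `dest`."""
    return min(neighbors, key=lambda y: Q.get((node, dest, y), 0.0))
```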

12.
A new Q-learning algorithm based on the Metropolis criterion
The balance between exploration and exploitation is one of the key problems of action selection in Q-learning. Pure exploitation causes the agent to reach locally optimal policies quickly, whereas excessive exploration degrades the performance of the Q-learning algorithm even though it may accelerate the learning process and help avoid locally optimal policies. In this paper, finding the optimal policy in Q-learning is cast as searching for the optimal solution of a combinatorial optimization problem. The Metropolis criterion of the simulated annealing algorithm is introduced to balance exploration and exploitation in Q-learning, and a modified Q-learning algorithm based on this criterion, SA-Q-learning, is presented. Experiments show that SA-Q-learning converges more quickly than Q-learning or Boltzmann exploration, and that the search does not suffer from performance degradation due to excessive exploration.
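A minimal Python sketch of Metropolis-style action selection of the kind described above; the proposal rule, tie-breaking, and the typical geometric cooling schedule mentioned in the comment are illustrative assumptions, not necessarily the SA-Q-learning paper's exact settings.

```python
import math
import random

def metropolis_action(Q_row, temperature):
    """Pick an action for one state using the Metropolis criterion.

    Q_row       : list of Q-values for the current state
    temperature : high T -> more exploration, low T -> near-greedy
    """
    greedy = max(range(len(Q_row)), key=lambda a: Q_row[a])
    candidate = random.randrange(len(Q_row))
    delta = Q_row[candidate] - Q_row[greedy]      # <= 0 unless candidate is greedy
    if delta >= 0 or random.random() < math.exp(delta / temperature):
        return candidate                          # accept the random proposal
    return greedy                                 # otherwise keep exploiting

# Annealing: the temperature is typically decayed, e.g. T <- 0.99 * T per episode.
```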

13.
This paper studies a multi-goal Q-learning algorithm for cooperative teams. Each member of a cooperative team is simulated by an agent. In the virtual cooperative team, agents adapt their knowledge according to cooperative principles. The multi-goal Q-learning algorithm addresses multiple learning goals. In the virtual team, agents learn what knowledge to adopt and how much to learn (by choosing a learning radius); the learning radius is interpreted in Section 3.1. Five basic experiments are carried out to validate the multi-goal Q-learning algorithm. It is found that the learning algorithm causes agents to converge to optimal actions, based on agents' continually updated cognitive maps of how actions influence learning goals. It is also shown that the learning algorithm benefits the multiple goals. Furthermore, the paper analyzes how sensitively the learning performance is affected by the parameter values of the learning algorithm.

14.
In heterogeneous vehicular network environments, the choice of access network is critical to the service experience of in-vehicle terminal users. Existing Q-learning-based network selection methods use interaction between an agent and the environment to iteratively learn a network selection policy and thus achieve relatively good network resource allocation. However, such methods typically suffer from low iteration efficiency and slow convergence caused by an overly large state space, and the overestimation introduced by Q-table updates easily leads to unbalanced utilization of network resources. To address the above…

15.
A two-layer cooperation model for soccer robots based on fuzzy Q-learning
To address the incoherent decision making of the traditional three-layer decision model for soccer robots and its lack of adaptability and learning ability, a two-layer cooperation model based on fuzzy Q-learning is proposed. The model separates coordinated decision making and robot motion into two functionally independent layers, turning the transition from group intention to individual behavior into a direct process; at the coordination layer, Q-learning is used to learn online the optimal policy for different states, enhancing the adaptability and learning ability of the decision system. In the Q-learning, the large number of system states is mapped onto a small number of fuzzy states, which greatly reduces the size of the state space, avoids the slow or even failed convergence of traditional Q-learning when the state and action spaces are large, and improves the convergence speed of the Q-learning algorithm. Finally, experiments on the SimuroSot soccer-robot simulation platform verify the effectiveness of the two-layer cooperation model.
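The state-reduction idea (mapping many raw states onto a few fuzzy states before tabular Q-learning) can be sketched as below; the triangular membership functions, the labels, and the single-feature example are assumptions for illustration only.

```python
def triangular(x, left, center, right):
    """Triangular membership function on one scalar input."""
    if x <= left or x >= right:
        return 0.0
    return (x - left) / (center - left) if x <= center else (right - x) / (right - center)

# Illustrative fuzzy partition of a 1-D feature (e.g. normalized distance to the ball).
FUZZY_SETS = {"near": (-0.1, 0.0, 0.5), "mid": (0.2, 0.5, 0.8), "far": (0.5, 1.0, 1.1)}

def fuzzy_state(x):
    """Map a raw (possibly continuous) state onto the best-matching fuzzy label,
    so the Q-table only needs one row per fuzzy state."""
    memberships = {name: triangular(x, *params) for name, params in FUZZY_SETS.items()}
    return max(memberships, key=memberships.get)
```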

16.
A Q-learning-based bounded rationality game model and its application
Traditional game theory models are built on the assumption of perfect rationality, which rarely matches reality; games with bounded rationality describe practical problems much better. Boundedly rational players in an incomplete-information game go through a process of gradually adapting to and learning about the rules, structure, and opponents of the game, so the game should be modeled as a dynamically evolving process. To address this, an incomplete-information game model based on the Q-learning algorithm is proposed: a strategy-selection probability distribution under a multi-index system is established according to Littman's minimax principle, and a mathematical model integrating Q-learning with the game is constructed, using the Q-learning mechanism to realize the dynamic evolution of the game model. Finally, the model is applied to a two-player pursuit simulation, and the results show that the proposed model reproduces the pursuit scenario well.
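A hedged sketch of the minimax principle underlying the strategy-selection distribution: given a payoff (or Q-value) matrix over own and opponent actions, the maximin mixed strategy can be computed with a small linear program, as in Littman's minimax-Q. SciPy is assumed here purely for illustration; the paper's multi-index construction is not reproduced.

```python
import numpy as np
from scipy.optimize import linprog

def minimax_strategy(payoff):
    """Return the mixed strategy pi maximizing min over opponent actions of
    sum_a pi[a] * payoff[a, o], plus the resulting game value.

    payoff : (n_actions, n_opponent_actions) array of values for the row player.
    """
    n, m = payoff.shape
    c = np.zeros(n + 1)                           # variables: pi_1..pi_n and value v
    c[-1] = -1.0                                  # linprog minimizes, so minimize -v
    # For every opponent action o: v - sum_a pi[a] * payoff[a, o] <= 0
    A_ub = np.hstack([-payoff.T, np.ones((m, 1))])
    b_ub = np.zeros(m)
    A_eq = np.array([[1.0] * n + [0.0]])          # probabilities sum to one
    b_eq = np.array([1.0])
    bounds = [(0.0, 1.0)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n], res.x[-1]                   # strategy, game value

# Example: matching pennies gives the uniform strategy and value 0.
pi, v = minimax_strategy(np.array([[1.0, -1.0], [-1.0, 1.0]]))
```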

17.
We propose two algorithms for Q-learning that use the two-timescale stochastic approximation methodology. The first of these updates Q-values of all feasible state-action pairs at each instant, while the second updates Q-values of states with actions chosen according to the ‘current’ randomized policy. A proof of convergence of the algorithms is given. Finally, numerical experiments using the proposed algorithms on an application of routing in communication networks are presented for a few different settings.

18.
For the task scheduling problem in collaborative work, this paper builds a corresponding Markov decision process model and, on that basis, proposes an improved Q-learning algorithm based on simulated annealing. By introducing simulated annealing, combining it with a greedy strategy, and filtering the state space, the algorithm significantly improves the convergence speed and shortens the execution time. Finally, a comparative analysis with related algorithms from other works verifies the efficiency of the improved algorithm.

19.
When the control function of a variable-universe fuzzy controller is "copied" into its successors, "distortion" often occurs, and the consequence of this phenomenon is error in the algorithm itself. To address this problem, this paper proposes an optimal design method for variable-universe fuzzy control based on the Q-learning algorithm. Building on the variable-universe fuzzy control algorithm, the method adjusts the universe of discourse by coordinating contraction-expansion factors and proportional factors, and uses Q-learning to search for the parameters that minimize the controller's performance index, reducing the "distortion rate" during control and thereby further improving controller performance. Finally, the algorithm is applied to a second-order system and a non-minimum-phase system; experiments show that it not only has good robustness and dynamic performance but also achieves better control performance than the variable-universe fuzzy controller.
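For context, a commonly used form of the contraction-expansion factor in variable-universe fuzzy control is sketched below; the exact factor forms and the Q-learning parameterization used in the paper may differ, and the exponent `tau` and the `eps` floor are illustrative assumptions.

```python
def contraction_expansion_factor(error, universe_bound, tau=0.5, eps=1e-6):
    """Commonly used factor alpha(e) = (|e| / E) ** tau + eps that shrinks the
    fuzzy universe [-E, E] when the error is small and restores it when large."""
    return (abs(error) / universe_bound) ** tau + eps

def scaled_universe(universe_bound, error, tau=0.5):
    """Current effective universe [-alpha*E, alpha*E] of the controller input."""
    alpha = contraction_expansion_factor(error, universe_bound, tau)
    return (-alpha * universe_bound, alpha * universe_bound)
```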

20.
Production systems continuously deteriorate with age and usage due to corrosion, fatigue, and cumulative wear in production processes, resulting in an increasing possibility of producing defective products. To prevent selling defective products, inspection is usually carried out to ensure that the performance of a sold product satisfies the customer requirements. Nevertheless, some defective products may still be sold in practice. In such a case, warranties are essential in marketing products and can improve the unfavorable image by applying higher product quality and better customer service. The purpose of this paper is to provide manufacturers with an effective inspection strategy in which the task of quality management is performed under the considerations of related costs for production, sampling, inventory, and warranty. A Weibull power law process is used to describe the imperfection of the production system, and a negative binomial sampling is adopted to learn the operational states of the production process. A free replacement warranty policy is assumed in this paper, and the reworking of defective products before shipment is also discussed. A numerical application is employed to demonstrate the usefulness of the proposed approach, and sensitivity analyses are performed to study the various effects of some influential factors.
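For reference, the Weibull power law (Crow-AMSAA type) process commonly used to model this kind of deterioration has intensity lambda(t) = (beta/theta) * (t/theta)**(beta - 1) and expected cumulative failures Lambda(t) = (t/theta)**beta. The small sketch below evaluates these standard quantities; the parameter names are illustrative and may differ from the paper's notation.

```python
def power_law_intensity(t, theta, beta):
    """Failure intensity lambda(t) = (beta/theta) * (t/theta)**(beta - 1)."""
    return (beta / theta) * (t / theta) ** (beta - 1.0)

def expected_failures(t, theta, beta):
    """Expected cumulative number of failures Lambda(t) = (t/theta)**beta."""
    return (t / theta) ** beta

# beta > 1 models a deteriorating system: the intensity grows with age.
```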
