Similar Documents
 Found 19 similar documents (search time: 125 ms)
1.
This paper discusses the application of case-based learning to global path planning for autonomous underwater vehicles. Case-based learning is an incremental learning process that learns and solves problems from past experience. The paper presents a preliminary study of a framework for applying case-based learning to planning, and proposes algorithms for case attribute extraction, case matching and selection, and case-base updating. Finally, several sets of simulation results are given.
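The retrieve-and-update cycle described above can be sketched minimally in Python; the attribute set (start/goal coordinates), the similarity measure, and all names here are illustrative assumptions, not the paper's algorithm:

```python
# Hypothetical sketch of case-based retrieval and case-base updating for
# global path planning: each case stores problem attributes and a stored path.

def similarity(query, case):
    """Similarity = inverse of the summed Euclidean distances between
    the query's and the case's start/goal attributes (an assumed measure)."""
    d = 0.0
    for key in ("start", "goal"):
        qx, qy = query[key]
        cx, cy = case[key]
        d += ((qx - cx) ** 2 + (qy - cy) ** 2) ** 0.5
    return 1.0 / (1.0 + d)

def retrieve(query, case_base):
    """Case matching and selection: return the best-matching stored case."""
    return max(case_base, key=lambda c: similarity(query, c))

def update_case_base(case_base, new_case, threshold=0.9):
    """Incremental learning: store a solved case only if no sufficiently
    similar case already exists in the case base."""
    if all(similarity(new_case, c) < threshold for c in case_base):
        case_base.append(new_case)
    return case_base
```

Storing a solved case only when no near-duplicate exists keeps the learning incremental without unbounded case-base growth.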

2.
尚游, 徐玉如, 庞永杰. 《机器人》, 1998, 20(6): 427-432
This paper discusses the application of case-based learning to global path planning for autonomous underwater vehicles. Case-based learning is an incremental learning process that learns and solves problems from past experience. The paper presents a preliminary study of a framework for applying case-based learning to planning, and proposes algorithms for case attribute extraction, case matching and selection, and case-base updating. Finally, several sets of simulation results are given.

3.
A Case Retrieval Strategy Based on Grid Clustering (cited 3 times: 1 self-citation, 2 by others)
An intelligent recommendation system based on case-based reasoning is an important component of the collaborative sharing network for large scientific instruments. By clustering the case base over a grid, a retrieval strategy for heterogeneous case bases is designed and implemented. The paper analyzes the grid-partitioning principles for the case base and the case clustering rules, and describes the case clustering algorithm and the retrieval strategy built on top of the clustering. Experimental results show that the method effectively reduces case retrieval time and improves the maintainability of the case base.
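A toy illustration of grid-based clustering and retrieval: cases are bucketed by grid cell so a query only scans its own cell. The cell size, the fallback full scan, and all identifiers are assumptions for this sketch, not the paper's implementation:

```python
def grid_key(point, cell_size):
    """Map a feature vector to the index of its grid cell."""
    return tuple(int(x // cell_size) for x in point)

def build_grid_index(cases, cell_size):
    """Cluster the case base by grid cell (a simple grid clustering)."""
    index = {}
    for case in cases:
        index.setdefault(grid_key(case["features"], cell_size), []).append(case)
    return index

def grid_retrieve(index, query, cell_size):
    """Search only the query's cell; fall back to a full scan if it is empty."""
    candidates = index.get(grid_key(query, cell_size), [])
    if not candidates:
        candidates = [c for cell in index.values() for c in cell]
    dist = lambda c: sum((a - b) ** 2 for a, b in zip(c["features"], query))
    return min(candidates, key=dist)
```

Restricting the distance computation to one cell is what yields the retrieval-time reduction the abstract reports.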

4.
Inspired by the biological prototype of local perception and interaction of individual agents in a population, a multi-agent local learning algorithm within a stochastic-game framework is proposed. While interacting with its local environment, each agent adopts a greedy policy to maximize its own payoff. The Nash-Q learning algorithm is improved for zero-sum games and for general-sum games with a single equilibrium and with multiple equilibria; a behavior-revision method is proposed, and the convergence of the algorithm and its reduced computational complexity are proved.
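The flavor of greedy value updates over joint actions in a stochastic game can be shown with a minimal tabular sketch (this is plain Q-learning over joint state-action pairs, not the paper's improved Nash-Q algorithm; all names are illustrative):

```python
def q_update(Q, state, joint_action, reward, next_state, joint_actions,
             alpha=0.1, gamma=0.9):
    """One-step update of Q(state, joint_action); the bootstrap term greedily
    maximizes the agent's own value over all joint actions in next_state."""
    best_next = max(Q.get((next_state, a), 0.0) for a in joint_actions)
    key = (state, joint_action)
    Q[key] = Q.get(key, 0.0) + alpha * (reward + gamma * best_next - Q.get(key, 0.0))
    return Q
```

In Nash-Q proper, the greedy `max` would be replaced by the value of an equilibrium of the stage game at `next_state`, which is exactly where the equilibrium-selection issues the abstract addresses arise.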

5.
Parallel multi-task allocation is a challenging problem in multi-agent systems. Driven by applications such as resource allocation and disaster emergency management, it studies how to assign a set of tasks to appropriate agent coalitions for execution. This paper proposes a distributed parallel multi-task allocation algorithm based on self-organizing, self-learning agents. The algorithm introduces P-learning to design a learning model in which an individual agent searches for tasks, and gives communication and negotiation strategies among agents. Comparative experiments show that the algorithm not only quickly finds a solving coalition for each task, but also explicitly gives the actual resource contribution of each agent member in the coalition, providing a valuable reference for practical control and decision-making tasks.
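The notion of a coalition with explicit per-member resource contributions can be illustrated with a simple greedy sketch (the recruitment order and all names are assumptions; the paper's algorithm is distributed and learning-based, which this toy version is not):

```python
def form_coalition(task_demand, agents):
    """Greedily recruit agents (largest capacity first) until their combined
    resources cover the task demand; return each member's actual contribution,
    or None if the task cannot be covered."""
    remaining = task_demand
    contributions = {}
    for name, capacity in sorted(agents.items(), key=lambda kv: -kv[1]):
        if remaining <= 0:
            break
        give = min(capacity, remaining)
        contributions[name] = give
        remaining -= give
    return contributions if remaining <= 0 else None
```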

6.
曹洁. 《电脑开发与应用》, 2010, 23(5): 44-46, 49
Broadening the user base of data mining systems so that ordinary users can operate them conveniently is the main goal of research on search strategies for data mining algorithms. A case base is built to store expert experience; cases are represented with an object-oriented approach, the organization of the case base is described with fuzzy quotient spaces, and case retrieval is implemented with statistical heuristic search, which narrows the search scope, speeds up problem solving, and improves efficiency and accuracy. The approach is validated on an example in which a bank account manager analyzes groups of churning customers, confirming the accuracy of this case-based-reasoning search strategy for data mining.

7.
Recent work has highlighted the importance of case-base maintenance in the case-based reasoning process, and it is increasingly accepted that a CBR system includes maintenance-related processes (Review and Restore). As a branch of CBR research, case-base maintenance has produced a variety of strategies, some of which limit the size of the case base, raising the issue of balancing the competence and efficiency of a CBR system. Similarity rough set techniques can exploit the discernibility matrix effectively: using different similarity thresholds, they detect and handle redundancy in the case base and selectively delete redundant cases, while the coverage of the case base does not decrease and the cost of case adaptation is reduced, thereby ensuring both the competence and the efficiency of the CBR system.
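The threshold-based redundancy deletion idea can be illustrated with a minimal sketch (the similarity function and the greedy retention order are assumptions; the paper's method operates on a discernibility matrix rather than this simple pass):

```python
def prune_case_base(cases, sim, threshold):
    """Delete a case when an already-retained case is at least
    `threshold`-similar to it, so each deleted case stays covered
    by a retained representative."""
    kept = []
    for case in cases:
        if not any(sim(case, k) >= threshold for k in kept):
            kept.append(case)
    return kept
```

Raising the threshold keeps more cases (larger, slower case base); lowering it prunes more aggressively, which is exactly the competence-versus-efficiency trade-off the abstract discusses.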

8.
Cooperative Multi-agent Learning in General-Sum Games (cited 1 time: 0 self-citations, 1 by others)
Rationality and convergence are the goals pursued in multi-agent learning research. For rational cooperative multi-agent systems, it is proposed to learn with Pareto-dominant solutions instead of non-cooperative Nash equilibria, making agents more rational; in addition, social conventions are introduced to initiate and constrain agents' reasoning and to unify the decisions of all agents in the system, thereby guaranteeing convergence of learning. Several algorithms are validated on a two-player grid game; a comparison of success rates shows that the proposed algorithm has good learning performance.

9.
In autonomous systems, agents perform their assigned tasks by interacting with the environment, and hierarchical reinforcement learning helps agents learn efficiently in large, complex environments. A new method is proposed that uses an ant-system optimization algorithm to identify hierarchy boundaries and discover subgoal states: ants deposit pheromone as they traverse the environment, a roughness measure is defined from the rate of change of the pheromone, and subgoals are delimited by this roughness. Agents use the discovered subgoals to create abstractions and can explore more effectively. The algorithm is evaluated in the taxi domain, and the experimental results show that the method significantly improves agents' learning efficiency.
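For a one-dimensional corridor of states, the pheromone-roughness idea can be sketched directly: states where the pheromone level changes sharply relative to both neighbors score high and are proposed as subgoals. The roughness formula and all names are illustrative assumptions, not the paper's definitions:

```python
def find_subgoals(pheromone, k=1):
    """Score each interior state by the 'roughness' of the pheromone trail
    (absolute change to both neighbors) and return the top-k states."""
    roughness = {}
    states = sorted(pheromone)
    for i in range(1, len(states) - 1):
        s, left, right = states[i], states[i - 1], states[i + 1]
        roughness[s] = (abs(pheromone[s] - pheromone[left])
                        + abs(pheromone[right] - pheromone[s]))
    return sorted(roughness, key=roughness.get, reverse=True)[:k]
```

A doorway-like bottleneck state, which many ant trajectories funnel through, accumulates a pheromone spike and therefore high roughness.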

10.
An Improved Case-Based Reasoning Classification Method (cited 1 time: 0 self-citations, 1 by others)
张春晓, 严爱军, 王普. 《自动化学报》, 2014, 40(9): 2015-2021
The weighting of feature attributes and the case retrieval strategy significantly affect the classification accuracy of case-based reasoning (CBR). This paper proposes an improved CBR classification method that combines a genetic algorithm, introspective learning, and group decision-making. First, multiple sets of attribute weights are obtained with a genetic algorithm, and each set is then iteratively adjusted according to the principle of introspective learning; next, a case-group retrieval strategy produces a group-decision classification result satisfying the majority principle; finally, comparative experiments on standard classification datasets show that the method further improves CBR classification accuracy. This indicates that introspective learning can guarantee reasonable weight assignment and that the case-group retrieval strategy makes full use of the latent information in the case base, significantly enhancing the learning ability of CBR.
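How several weight sets plus majority voting can combine is sketched below; in the paper the weight sets would come from the GA with introspective adjustment, whereas here they are given by hand, and all identifiers are illustrative:

```python
def weighted_distance(w, x, y):
    """Attribute-weighted Euclidean distance used for case retrieval."""
    return sum(wi * (a - b) ** 2 for wi, a, b in zip(w, x, y)) ** 0.5

def classify(weight_sets, query, cases, k=3):
    """Group decision: each weight set retrieves the k nearest cases and
    votes with their labels; the majority class across all retrievals wins."""
    votes = {}
    for w in weight_sets:
        nearest = sorted(cases, key=lambda c: weighted_distance(w, query, c[0]))[:k]
        for _, label in nearest:
            votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)
```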

11.
In this paper, a multi-agent reinforcement learning method based on action prediction of other agents is proposed. In a multi-agent system, action selection of the learning agent is unavoidably impacted by other agents' actions. Therefore, joint states and joint actions are involved in the multi-agent reinforcement learning system. A novel agent action prediction method based on the probabilistic neural network (PNN) is proposed: PNN is used to predict the actions of other agents. Furthermore, a policy-sharing mechanism is used to exchange the learned policies of multiple agents, the aim of which is to speed up learning. Finally, the application of the presented method to robot soccer is studied. Through learning, robot players can master the mapping policy from state information to the action space, and multi-robot coordination and cooperation are well realized.

12.
Learning Team Strategies: Soccer Case Studies (cited 1 time: 0 self-citations, 1 by others)
We use simulated soccer to study multiagent learning. Each team's players (agents) share an action set and a policy, but may behave differently due to position-dependent inputs. All agents making up a team are rewarded or punished collectively in case of goals. We conduct simulations with varying team sizes, and compare several learning algorithms: TD-Q learning with linear neural networks (TD-Q), Probabilistic Incremental Program Evolution (PIPE), and a PIPE version that learns by coevolution (CO-PIPE). TD-Q is based on learning evaluation functions (EFs) mapping input/action pairs to expected reward. PIPE and CO-PIPE search policy space directly. They use adaptive probability distributions to synthesize programs that calculate action probabilities from current inputs. Our results show that linear TD-Q encounters several difficulties in learning appropriate shared EFs. PIPE and CO-PIPE, however, do not depend on EFs and find good policies faster and more reliably. This suggests that in some multiagent learning scenarios direct search in policy space can offer advantages over EF-based approaches.
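The linear TD-Q weight update behind the EF-based approach can be written compactly; the feature vector and hyperparameters below are illustrative assumptions, not the paper's setup:

```python
def td_q_update(w, features, reward, next_best_q, alpha=0.1, gamma=0.95):
    """Linear TD-Q: Q(s, a) = w . phi(s, a); nudge the weights toward the
    TD target reward + gamma * max_a' Q(s', a')."""
    q = sum(wi * fi for wi, fi in zip(w, features))
    delta = reward + gamma * next_best_q - q
    return [wi + alpha * delta * fi for wi, fi in zip(w, features)]
```

Because every player shares one weight vector, each update shifts the evaluation function for the whole team at once, which hints at why learning a shared EF proved difficult.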

13.
In enterprise networks, companies interact on a temporal basis through client–server relationships between order agents (clients) and resource agents (servers) acting as autonomic managers. In this work, the autonomic MES (@MES) proposed by Rolón and Martinez (2012) has been extended to allow selfish behavior and adaptive decision-making in distributed execution control and emergent scheduling. Agent learning in the @MES is addressed by rewarding order agents in order to continuously optimize their processing routes based on cost and reliability of alternative resource agents (servers). Service providers are rewarded so as to learn the quality level corresponding to each task which is used to define the processing time and cost for each client request. Two reinforcement learning algorithms have been implemented to simulate learning curves of client–server relationships in the @MES. Emerging behaviors obtained through generative simulation in a case study show that despite selfish behavior and policy adaptation in order and resource agents, the autonomic MES is able to reject significant disturbances and handle unplanned events successfully.

14.
A Regret-Based Reinforcement Learning Model for Multi-agent Conflict Games (cited 1 time: 0 self-citations, 1 by others)
肖正, 张世永. 《软件学报》, 2008, 19(11): 2957-2967
For conflict games, a rational and conservative action-selection method is studied, namely minimizing the agent's worst-case regret. Under this method, the agent's current policy minimizes the loss it may cause in the future, and a Nash-equilibrium mixed strategy can be obtained without any information about other agents. Based on regret values, a reinforcement learning model and its algorithmic implementation for conflict games in complex multi-agent environments are proposed. In this model, a belief-update process is established by introducing the cross-entropy distance, further optimizing action selection in conflict games. The convergence of the algorithm is verified on a Markov repeated-game model, and the relation between beliefs and optimal policies is analyzed. Moreover, compared with the Q-learning extension under MMDP (multi-agent Markov decision process), the algorithm greatly reduces the number of conflicts, enhances the coordination of agent behaviors, improves system performance, and helps keep the system stable.
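A minimal sketch of regret-driven action selection, using the classic regret-matching scheme (a related regret-based method, not the paper's minimax-regret model; all names are illustrative):

```python
def update_regret(cum_regret, payoffs, chosen):
    """Regret for action a = payoff(a) - payoff(chosen action), accumulated."""
    return [r + payoffs[a] - payoffs[chosen] for a, r in enumerate(cum_regret)]

def regret_matching(cum_regret):
    """Mixed strategy proportional to positive cumulative regret;
    uniform when no action has positive regret."""
    pos = [max(r, 0.0) for r in cum_regret]
    total = sum(pos)
    n = len(cum_regret)
    return [p / total for p in pos] if total > 0 else [1.0 / n] * n
```

Playing such a regret-proportional mixed strategy repeatedly drives average regret toward zero, which is why regret-based selection can converge to equilibrium play without modeling the opponents.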

15.
Cooperative learning systems (COLS) are an interesting research direction in Artificial Intelligence, because a form of intelligence can emerge from the interaction of simple agents in these systems. In the literature we can find many learning techniques that can be improved by combining them into a cooperative learning scheme, which can be considered a special case of bagging. In particular, learning classifier systems (LCS) are well suited to cooperative learning because LCS manipulate rules, so knowledge exchange between agents is facilitated. However, a COLS has to use a combination mechanism to aggregate the information exchanged between agents, and this mechanism must take into consideration the nature of the information the agents manipulate. In this paper we investigate a cooperative learning system based on the Evidential Classifier System; the cooperative system uses Dempster–Shafer theory as a support for data fusion, since the Evidential Classifier System is itself based on this theory. We present some ways to realize cooperation with this architecture and discuss its characteristics by comparing the obtained results with those of an individual approach.
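Dempster's rule of combination, on which the data fusion above rests, can be sketched directly (a generic sketch with focal elements as frozensets, not the paper's agent architecture):

```python
def dempster_combine(m1, m2):
    """Dempster's rule: combine two mass functions whose keys are frozenset
    focal elements, renormalizing by the mass assigned to conflict."""
    combined = {}
    conflict = 0.0
    for A, a in m1.items():
        for B, b in m2.items():
            inter = A & B
            if inter:
                combined[inter] = combined.get(inter, 0.0) + a * b
            else:
                conflict += a * b  # empty intersection: conflicting evidence
    norm = 1.0 - conflict
    return {k: v / norm for k, v in combined.items()}
```

Each agent contributes a mass function over class hypotheses, and repeated pairwise combination fuses the group's evidence into one belief.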

16.
In enterprise networks, companies interact on a temporal basis through client–server relationships between order agents (clients) and resource agents (servers) acting as autonomic managers. In this work, the autonomic MES (@MES) proposed by Rolón and Martinez (2012) has been extended to allow selfish behavior and adaptive decision-making in distributed execution control and emergent scheduling. Agent learning in the @MES is addressed by rewarding order agents in order to continuously optimize their processing routes based on cost and reliability of alternative resource agents (servers). Service providers are rewarded so as to learn the quality level corresponding to each task which is used to define the processing time and cost for each client request. Two reinforcement learning algorithms have been implemented to simulate learning curves of client–server relationships in the @MES. Emerging behaviors obtained through generative simulation in a case study show that despite selfish behavior and policy adaptation in order and resource agents, the autonomic MES is able to reject significant disturbances and handle unplanned events successfully.

17.
刘健, 顾扬, 程玉虎, 王雪松. 《自动化学报》, 2022, 48(5): 1246-1258
By analyzing the process of gene mutation, this paper proposes using reinforcement learning to infer the progression of cancer patients from a normal state to a diseased state and to discover the key gene mutations that lead to patient death. First, genes are regarded as agents, and a multi-agent reinforcement-learning environment is designed on breast-cancer mutation data. Second, to ensure that agents explore the same policy as the expert policy and that larger numbers of agents learn quickly, two multi-agent deep Q-networks are proposed based on learning-from-demonstration theory: a multi-agent deep Q-network based on behavior cloning and a multi-agent deep Q-network based on pretrained memory. Finally, genes are ranked with the trained multi-agent deep Q-networks to predict pathogenic genes. Experimental results show that the proposed multi-agent reinforcement-learning method can mine pathogenic genes closely related to the occurrence and development of breast cancer.

18.
Multi-agent learning (MAL) studies how agents learn to behave optimally and adaptively from their experience when interacting with other agents in dynamic environments. The outcome of a MAL process is jointly determined by all agents' decision-making. Hence, each agent needs to think strategically about others' sequential moves when planning future actions. The strategic interactions among agents make MAL go beyond the direct extension of single-agent learning to multiple agents. With this strategic thinking, each agent aims to build a subjective model of others' decision-making using its observations. Such modeling is directly influenced by agents' perception during the learning process, which is called the information structure of the agent's learning. As it determines the input to MAL processes, information structures play a significant role in the learning mechanisms of the agents. This review creates a taxonomy of MAL and establishes a unified and systematic way to understand MAL from the perspective of information structures. We define three fundamental components of MAL: the information structure (i.e., what the agent can observe), the belief generation (i.e., how the agent forms a belief about others based on the observations), as well as the policy generation (i.e., how the agent generates its policy based on its belief). In addition, this taxonomy enables the classification of a wide range of state-of-the-art algorithms into four categories based on the belief-generation mechanisms of the opponents, including stationary, conjectured, calibrated, and sophisticated opponents. We introduce Value of Information (VoI) as a metric to quantify the impact of different information structures on MAL. Finally, we discuss the strengths and limitations of algorithms from different categories and point to promising avenues of future research.

19.
Fundamental to case-based reasoning is the assumption that similar problems have similar solutions. The meaning of the concept of “similarity” can vary in different situations and remains an issue. This paper proposes a novel similarity model consisting of fuzzy rules to represent the semantics and evaluation criteria for similarity. We believe that fuzzy if-then rules present a more powerful and flexible means to capture domain knowledge for utility oriented similarity modeling than traditional similarity measures based on feature weighting. Fuzzy rule-based reasoning is utilized as a case matching mechanism to determine whether and to which extent a known case in the case library is similar to a given problem in query. Further, we explain that such fuzzy rules for similarity assessment can be learned from the case library using genetic algorithms. The key to this is pair-wise comparisons of cases with known solutions in the case library such that sufficient training samples can be derived for genetic-based fuzzy rule learning. The evaluations conducted have shown the superiority of the proposed method in similarity modeling over traditional schemes as well as the feasibility of learning fuzzy similarity rules from a rather small case base while still yielding competent system performance.
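One way such fuzzy similarity rules can be evaluated is with a Mamdani-style weighted average over rule activations; the triangular membership functions, the rule base, and all names below are illustrative assumptions, not the paper's learned rules:

```python
def tri(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzy_similarity(diff, rules):
    """Evaluate rules of the form 'IF attribute-difference is F THEN
    similarity is s'; the output is the activation-weighted average of
    the rule consequents."""
    num = den = 0.0
    for (a, b, c), sim_level in rules:
        mu = tri(diff, a, b, c)
        num += mu * sim_level
        den += mu
    return num / den if den else 0.0
```

In the paper these rules are learned by a genetic algorithm from pairwise case comparisons; here a hand-written rule base stands in for the learned one.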

