Similar articles
20 similar articles found
1.
We present a novel, uniform formulation of the problem of reinforcement learning against bounded-memory adaptive adversaries in repeated games, together with methods for learning in this framework. First, we define a strategic notion of best response that optimises rewards over multiple steps, as opposed to the tactical best response of game theory. We show that learning a strategic best response reduces to learning an optimal policy in a Markov Decision Process (MDP), and we treat both the finite- and infinite-horizon versions of the problem. We adapt an existing Monte Carlo-based algorithm to learn optimal policies in such MDPs over a finite horizon in polynomial time, and show that this efficient algorithm can obtain higher average rewards than a previously known efficient algorithm against some opponents in the contract game. Although this improvement comes at the cost of additional domain knowledge, simple experiments in the Prisoner's Dilemma and in coordination games show that the error can remain small even when no extra domain knowledge is assumed beyond a known upper bound on the opponent's memory size. We also evaluate a general infinite-horizon learner, which uses function approximation to cope with the size of the history space, against a greedy bounded-memory opponent, and show that it can create and exploit opportunities for mutual cooperation in the Prisoner's Dilemma while remaining cautious enough to secure minimax payoffs in Rock–Scissors–Paper.
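To make the reduction concrete, here is a minimal Python sketch under strong simplifying assumptions: the opponent is tit-for-tat, which remembers only the learner's previous move; the payoffs are the textbook Prisoner's Dilemma values (T=5, R=3, P=1, S=0); and the opponent model is known, so the induced MDP is solved exactly by finite-horizon dynamic programming instead of being learned with the Monte Carlo algorithm the abstract refers to. The MDP state is the learner's last action, which is everything this bounded-memory opponent conditions on. All names are hypothetical.

```python
from functools import lru_cache

# Row player's payoffs in the Prisoner's Dilemma (assumed textbook values).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
ACTIONS = ("C", "D")


def tit_for_tat(last_learner_action):
    """Bounded-memory (memory-1) opponent: cooperate first, then copy the learner."""
    return "C" if last_learner_action in (None, "C") else "D"


def strategic_best_response(opponent, horizon):
    """Exact finite-horizon dynamic programming on the MDP induced by the opponent.

    State = the learner's previous action (None before round 1), which is all
    a memory-1 opponent such as tit_for_tat conditions on.
    Returns (optimal total payoff, optimal action sequence from the start state).
    """

    @lru_cache(maxsize=None)
    def value(state, steps_left):
        if steps_left == 0:
            return 0, ()
        opp_action = opponent(state)
        best = None
        for a in ACTIONS:
            future, plan = value(a, steps_left - 1)
            total = PAYOFF[(a, opp_action)] + future
            if best is None or total > best[0]:
                best = (total, (a,) + plan)
        return best

    return value(None, horizon)


def myopic_payoff(opponent, horizon):
    """Tactical best response: maximise only the current round's payoff."""
    state, total = None, 0
    for _ in range(horizon):
        opp_action = opponent(state)
        a = max(ACTIONS, key=lambda x: PAYOFF[(x, opp_action)])
        total += PAYOFF[(a, opp_action)]
        state = a
    return total


print(strategic_best_response(tit_for_tat, 5))  # (17, ('C', 'C', 'C', 'C', 'D'))
print(myopic_payoff(tit_for_tat, 5))  # 9
```

Over five rounds the strategic best response cooperates until the last round and earns 17, while repeatedly playing the one-step (tactical) best response earns only 9, which illustrates the gap between the two notions of best response.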

2.
Toshiharu Sugawara, Victor Lesser. Machine Learning, 1998, 33(2-3): 129-153
Coordination is an essential technique in cooperative, distributed multiagent systems. However, sophisticated coordination strategies are not always cost-effective in all problem-solving situations. This paper presents a learning method to identify what information will improve coordination in specific problem-solving situations. Learning is accomplished by recording and analyzing traces of inferences after problem solving. The analysis identifies situations where inappropriate coordination strategies caused redundant activities, or the lack of timely execution of important activities, thus degrading system performance. To remedy this problem, situation-specific control rules are created which acquire additional nonlocal information about activities in the agent networks and then select another plan or another scheduling strategy. Examples from a real distributed problem-solving application involving diagnosis of a local area network are described.

3.
This paper investigates the group consensus problem for agents with double-integrator dynamics. Two kinds of consensus protocols are proposed for networks with a fixed communication topology so that the agents' positions and velocities reach group consensus. Convergence is analyzed, and necessary and/or sufficient conditions for multiagent systems to achieve group consensus are presented. The first protocol yields dynamic consensus, in which the positions of all agents reach time-varying consensus values. Under the second protocol, both the agents' positions and their velocities reach constant consensus values; that is, static consensus is achieved. Simulation examples demonstrate the effectiveness of the theoretical results. Copyright © 2012 John Wiley & Sons, Ltd.
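The abstract does not spell out the two protocols. As a baseline for intuition, the sketch below simulates the standard single-group second-order consensus protocol u_i = sum_j a_ij[(x_j - x_i) + gamma(v_j - v_i)] on an assumed undirected path graph; the paper's group-consensus protocols additionally split the agents into groups, which is not shown. With this baseline, velocities converge to the average initial velocity and positions converge to a common trajectory that keeps moving, i.e. a time-varying (dynamic) consensus of the kind the first protocol is said to produce. The function name, gain, step size, and initial conditions are illustrative assumptions.

```python
def simulate_second_order_consensus(adjacency, x0, v0, gamma=1.0, dt=0.01, steps=3000):
    """Forward-Euler simulation of double integrators x_i' = v_i, v_i' = u_i with
    u_i = sum_j a_ij * ((x_j - x_i) + gamma * (v_j - v_i))."""
    n = len(x0)
    x, v = list(x0), list(v0)
    for _ in range(steps):
        u = [
            sum(adjacency[i][j] * ((x[j] - x[i]) + gamma * (v[j] - v[i])) for j in range(n))
            for i in range(n)
        ]
        x = [x[i] + dt * v[i] for i in range(n)]
        v = [v[i] + dt * u[i] for i in range(n)]
    return x, v


# Undirected path graph 1 - 2 - 3 (assumed example topology).
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
x_final, v_final = simulate_second_order_consensus(A, x0=[0.0, 5.0, -3.0], v0=[1.0, -2.0, 4.0])
# Velocities approach the initial average 1.0; positions approach 2/3 + 1.0 * t at t = 30.
print([round(val, 3) for val in v_final], [round(val, 3) for val in x_final])
```

Because the graph is undirected, the control inputs sum to zero, so the average velocity is conserved and the average position drifts linearly; the remaining disagreement decays exponentially for any gamma > 0 on a connected graph.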

4.
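Robot manipulation skill learning based on deep reinforcement learning has become a research hotspot, but because the tasks have sparse rewards, learning efficiency is low. This paper proposes a meta-learning-based, dual-experience-pool, adaptive soft-update hindsight experience replay method and applies it to robot manipulation skill learning with sparse rewards. First, building on the soft-update hindsight experience replay algorithm, a simplified value function that improves algorithmic efficiency is derived, and an adaptive temperature strategy is added that dynamically tunes the temperature parameter to suit different task environments. Second, following the idea of meta-learning, the experience replay is split into separate pools, and the proportion of real sampled data to constructed virtual data is adjusted dynamically during training; this yields the DAS-HER method. Then, DAS-HER is applied to robot manipulation skill learning, giving a general framework for learning robot manipulation skills in sparse-reward environments. Finally, comparative experiments on eight tasks in the Fetch and Hand environments under MuJoCo show that the proposed algorithm outperforms the other algorithms in both training efficiency and success rate.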
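The hindsight relabelling that DAS-HER builds on can be sketched in a few lines of Python. This is a minimal illustration that assumes the common "future" relabelling strategy, a sparse reward of 0 on success and -1 otherwise, and a fixed share of real transitions per batch; the paper instead adjusts that share dynamically via meta-learning and uses a soft-update learner with adaptive temperature, neither of which is shown. The function names and the transition dictionary layout are hypothetical.

```python
import random


def sparse_reward(achieved_goal, goal, tol=0.05):
    """Sparse reward (assumed convention): 0.0 if the goal is reached, else -1.0."""
    dist = sum((a - g) ** 2 for a, g in zip(achieved_goal, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0


def her_relabel(episode, k=4, rng=random):
    """'future' hindsight relabelling into two pools.

    episode: list of transitions (dicts) with keys 'obs', 'action', 'goal' and
    'achieved_goal' (the goal actually achieved after taking the action).
    Returns (real_pool, relabelled_pool).
    """
    real, relabelled = [], []
    T = len(episode)
    for t, tr in enumerate(episode):
        real.append({**tr, "reward": sparse_reward(tr["achieved_goal"], tr["goal"])})
        for _ in range(k):
            # Pretend a goal achieved at this or a later step was the goal all along.
            new_goal = episode[rng.randint(t, T - 1)]["achieved_goal"]
            relabelled.append({**tr, "goal": new_goal,
                               "reward": sparse_reward(tr["achieved_goal"], new_goal)})
    return real, relabelled


def mixed_batch(real_pool, relabelled_pool, real_fraction, batch_size, rng=random):
    """Draw a batch with a fixed share of real transitions (DAS-HER adapts this share)."""
    n_real = round(real_fraction * batch_size)
    return rng.choices(real_pool, k=n_real) + rng.choices(relabelled_pool, k=batch_size - n_real)


# Toy 1-D episode that never reaches the goal 1.0, so every real reward is -1.
episode = [{"obs": i, "action": 0, "goal": (1.0,), "achieved_goal": (0.1 * (i + 1),)}
           for i in range(3)]
real, relabelled = her_relabel(episode, k=2, rng=random.Random(0))
batch = mixed_batch(real, relabelled, real_fraction=0.25, batch_size=8, rng=random.Random(1))
```

In the toy episode none of the real transitions is rewarded, but relabelled copies whose substituted goal is the one actually achieved receive reward 0, which is what gives a sparse-reward learner a useful training signal.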

5.
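Conventional sparse-representation-based direction-of-arrival (DOA) estimation algorithms exploit only the spatial sparsity of the signal, so sparsity degrades at low signal-to-noise ratios, which impairs sparse signal reconstruction. To address this, block-sparse theory is used to decompose the signal sparsely. As the number of targets grows and mission requirements change, DOA estimation often takes the form of direction finding for groups of targets. To better exploit the structural and statistical characteristics of the signal, a space-time joint block-sparse DOA estimation algorithm is proposed. It uses block-sparse theory to mine the internal structure of the signal, making full use of intra-block sparsity and inter-block correlation to improve sparse reconstruction and, in turn, to substantially improve DOA estimation. Simulation experiments show that the method achieves better estimation performance than classical DOA methods.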

6.
Coordinating Multiple Agents via Reinforcement Learning (cited 2 times: 0 self-citations, 2 by others)
In this paper, we use reinforcement learning techniques to solve agent coordination problems in task-oriented environments. We present the Fuzzy Subjective Task Structure (FSTS) model to model agent coordination in general, and show that an agent coordination problem modeled in FSTS is a Decision-Theoretic Planning (DTP) problem to which reinforcement learning can be applied. Two learning algorithms, coarse-grained and fine-grained, are proposed to address agents' coordination behavior at two different levels: the coarse-grained algorithm operates at one level and handles hard system constraints, while the fine-grained algorithm operates at the other level and handles soft constraints. We argue that it is important to explicitly model and explore coordination-specific information, particularly system constraints; this underpins both algorithms and accounts for their effectiveness. The algorithms are formally proved to converge and are shown experimentally to be effective.

7.
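To address the problem of choosing a fixed k in the k-NN algorithm, sparse learning and reconstruction techniques are introduced into nearest-neighbor classification so that k is obtained in a data-driven way rather than set by hand. Because samples are correlated, all test samples are reconstructed from the training samples, producing a reconstruction coefficient matrix; an l1-norm penalty makes this matrix sparse, so that each test sample is reconstructed from its own k nearest training samples, where k varies from sample to sample. This avoids the classification errors caused by classic k-NN using the same k for every sample to be classified. Experiments on UCI datasets show that the improved k-NN algorithm classifies better than classic k-NN.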
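A per-sample version of the idea can be sketched as follows: reconstruct a test vector from the training vectors under an l1 penalty, treat the training samples with non-zero coefficients as its neighbours (so k is whatever the sparsity pattern yields), and vote by coefficient magnitude. This is an illustrative sketch rather than the paper's method: it solves one lasso problem per test sample with ISTA (iterative soft-thresholding), whereas the paper reconstructs all test samples jointly into a coefficient matrix; the penalty weight `lam`, the magnitude-weighted vote, and the toy data are assumptions.

```python
from collections import defaultdict


def soft_threshold(z, t):
    """Proximal operator of t*|z|."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_ista(train, x, lam=0.01, iters=2000):
    """Minimise 0.5*||x - sum_j w_j*train[j]||^2 + lam*||w||_1 with ISTA."""
    n, dim = len(train), len(x)
    # 1/||D||_F^2 never exceeds 1/L (L = largest eigenvalue of D^T D), so the step is safe.
    step = 1.0 / sum(val * val for row in train for val in row)
    w = [0.0] * n
    for _ in range(iters):
        recon = [sum(w[j] * train[j][d] for j in range(n)) for d in range(dim)]
        resid = [recon[d] - x[d] for d in range(dim)]
        grad = [sum(train[j][d] * resid[d] for d in range(dim)) for j in range(n)]
        w = [soft_threshold(w[j] - step * grad[j], step * lam) for j in range(n)]
    return w


def sparse_knn_predict(train, labels, x, lam=0.01):
    """Neighbours = training samples with non-zero coefficients; k varies per query."""
    w = lasso_ista(train, x, lam)
    neighbours = [j for j, wj in enumerate(w) if wj != 0.0]
    if not neighbours:
        raise ValueError("lam is too large: no training sample was selected")
    votes = defaultdict(float)
    for j in neighbours:
        votes[labels[j]] += abs(w[j])
    return max(votes, key=votes.get), len(neighbours)


train = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
labels = ["a", "a", "b", "b"]
label, k = sparse_knn_predict(train, labels, (1.0, 0.05))
```

Soft-thresholding sets small coefficients exactly to zero, which is what lets the neighbour count adapt to each query instead of being fixed in advance.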

8.
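By analyzing the discrepancy between the distribution of sampled particles in the particle-filter framework of classic sparse visual tracking algorithms and the true state of the moving target, an improved sparse visual tracking algorithm based on online discriminant analysis is proposed. Through an online logistic discriminant analysis model and its update procedure, the tracker autonomously captures the real-time state and changes of the moving target and increases the discriminability between the target and the background. It also pre-screens the sampled particles, discarding as far as possible those that differ greatly from the target; this improves the robustness of the tracker and, by reducing the number of L1 optimization problems to be solved, improves its execution efficiency. Experiments against five state-of-the-art trackers on four public video sequences show that the proposed algorithm tracks moving targets robustly over long periods and, compared with typical sparse trackers, markedly reduces computational complexity.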

9.
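Regression models used for attribute selection ignore the relationships among class labels, which leads to poor regression performance. To address this, a new robust low-rank attribute selection algorithm is proposed. Specifically, within a linear regression framework, a low-rank constraint accounts for correlations among class labels, and the ℓ2,p-norm from sparse learning accounts for the correlation structure among attributes, thereby removing the influence of irrelevant and redundant attributes. The algorithm also embeds a subspace learning method, linear discriminant analysis (LDA), to refine the attribute selection results. Experiments show that the proposed algorithm outperforms four competing algorithms on six public datasets.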

10.
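Applying traditional reinforcement learning to adaptive urban traffic signal timing decisions suffers from the curse of dimensionality and the lack of a coordination mechanism. To address this, a reinforcement learning algorithm with an interactive coordination mechanism is proposed. First, an independent Q-learning algorithm for urban traffic signal timing decisions is designed, using the average delay per vehicle as the performance measure. This independent algorithm is then extended with a direct interaction mechanism: the traffic-signal control agents at adjacent intersections directly exchange their timing actions and interaction-point values. Simulation experiments show that reinforcement learning with the interactive coordination mechanism performs markedly better than independent reinforcement learning and coordinates more effectively; the learning algorithm also converges well, with the interaction-point values tending to stabilize.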
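The independent learner described above is tabular Q-learning. A minimal Python sketch for a single intersection follows, assuming the reward is the negative average delay per vehicle so that lower delay means higher reward; the state encoding, phase names, and hyperparameters are hypothetical, and the paper's exchange of actions and interaction-point values between neighbouring agents is not reproduced.

```python
import random
from collections import defaultdict


class IntersectionQAgent:
    """Independent tabular Q-learning agent for one signalised intersection.

    Assumption: reward = -(average delay per vehicle), so less delay is better.
    """

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, rng=None):
        self.q = defaultdict(float)  # (state, action) -> estimated return
        self.actions = list(actions)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = rng or random.Random()

    def act(self, state):
        """Epsilon-greedy choice of the next signal timing action."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, avg_delay, next_state):
        """One Q-learning backup: Q += alpha * (r + gamma * max_a' Q(s', a') - Q)."""
        reward = -avg_delay
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_error = reward + self.gamma * best_next - self.q[(state, action)]
        self.q[(state, action)] += self.alpha * td_error


# Hypothetical discretised states s0/s1 and two signal phases.
agent = IntersectionQAgent(["NS_green", "EW_green"], alpha=0.5, epsilon=0.0)
agent.update("s0", "NS_green", avg_delay=10.0, next_state="s1")
print(agent.q[("s0", "NS_green")])  # -5.0
print(agent.act("s0"))  # EW_green
```

With a single observed delay of 10 vehicles-seconds (units assumed), the estimate for that phase drops to -5.0, so a greedy agent switches to the untried phase in that state.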

11.
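Given the proliferation of unlabeled high-dimensional data, this work studies unsupervised feature selection in machine learning and proposes an unsupervised feature selection algorithm that combines a self-representation similarity matrix with manifold learning. First, a similarity matrix is built from the self-representation property of the data; drawing on the manifold-learning idea that a low-dimensional manifold can represent the structure of high-dimensional data, an optimization model for unsupervised feature selection that incorporates manifold learning is formulated. Second, to select more useful and sparser features, the model is constrained by the ℓ2,1-norm, which makes features compete with one another and eliminates redundancy. The model is then solved by alternately and iteratively updating the variables, and the convergence of the algorithm is proved. Finally, comparative experiments against several other unsupervised feature selection algorithms on four datasets demonstrate the effectiveness of the proposed algorithm.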
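To show how an ℓ2,1-norm penalty turns into a feature ranking, the sketch below minimises the self-representation objective ||X - XW||_F^2 + gamma*||W||_{2,1} by iterative reweighting, W = (XᵀX + gamma*D)^(-1) XᵀX with D_ii = 1/(2||w^i||_2), and scores each feature by the ℓ2 norm of its row of W. This is a deliberate simplification: the manifold-learning term of the proposed model is omitted, and the solver, gamma, iteration count, and toy data are illustrative assumptions. Pure Python is used so the example runs without third-party packages.

```python
def solve(A, B):
    """Solve A X = B (A square, lists of lists) by Gaussian elimination with partial pivoting."""
    n, m = len(A), len(B[0])
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + m):
                M[r][c] -= f * M[col][c]
    X = [[0.0] * m for _ in range(n)]
    for i in reversed(range(n)):
        for k in range(m):
            s = M[i][n + k] - sum(M[i][j] * X[j][k] for j in range(i + 1, n))
            X[i][k] = s / M[i][i]
    return X


def l21_self_representation_scores(X, gamma=1.0, iters=30, eps=1e-8):
    """Row norms of W minimising ||X - XW||_F^2 + gamma*||W||_{2,1} (higher = more useful)."""
    d = len(X[0])
    G = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]  # X^T X
    D = [1.0] * d  # diagonal of the reweighting matrix, initialised to the identity
    for _ in range(iters):
        A = [[G[i][j] + (gamma * D[i] if i == j else 0.0) for j in range(d)] for i in range(d)]
        W = solve(A, G)
        norms = [sum(val * val for val in W[i]) ** 0.5 for i in range(d)]
        # Rows with small norms get large weights, pushing them further towards zero.
        D = [1.0 / (2.0 * max(nm, eps)) for nm in norms]
    return norms


# Toy data: 4 samples, 3 features; the third feature is constant zero (uninformative).
X = [[1.0, 2.0, 0.0],
     [2.0, 0.0, 0.0],
     [3.0, 1.0, 0.0],
     [0.0, 1.0, 0.0]]
scores = l21_self_representation_scores(X, gamma=1.0)
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
```

The row-wise penalty zeroes out whole rows of W, so a feature is either used to reconstruct the data or dropped entirely, which is why the ℓ2,1-norm is a natural fit for feature selection.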

12.
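This work studies attribute reduction methods for big data with high-dimensional attributes. Attribute selection and subspace learning are two common approaches to attribute reduction: attribute selection is highly interpretable, whereas subspace learning achieves better classification than attribute selection, yet the two are usually applied independently. This work therefore combines them into a new attribute selection method: it uses two subspace learning techniques, linear discriminant analysis (LDA) and locality preserving projections (LPP), to account for both the global and the local properties of the data, and adds a sparse regularization term to perform attribute selection. Experimental comparisons using classification accuracy, variance, and coefficient of variation show that the algorithm selects discriminative attributes more effectively than the competing algorithms and achieves good classification performance.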

13.
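Instance-based transfer learning suffers from mismatched data granularity when relating multi-source heterogeneous domain data. To address this, building on the single-domain hierarchical probabilistic self-organizing graph (HiPSOG) clustering method, a sparse unsupervised hierarchical probabilistic self-organizing graph method with transfer learning ability (TSHiPSOG) is proposed. First, a hierarchical self-organizing model based on a probabilistic mixture of multivariate Gaussian distributions is generated separately in the source and target domains, so that representation vectors of different granularities are extracted in each domain, and a sparse graph method controls model growth through a probabilistic criterion. Second, the maximal information coefficient (MIC) is used to find, in the information-rich source domain, the representation vectors most similar to those of the target domain, and the class labels of these source-domain vectors are used to refine the classification of target-domain data. Finally, experiments on the widely used 20 Newsgroups dataset and a spam-detection dataset show that the algorithm can use useful information from the source domain to aid classification in the target domain, raising classification accuracy by up to about 15.26% and 9.05%, respectively; compared with other classic transfer learning methods, the sparse hierarchy mines representation vectors of different granularities and raises classification accuracy by up to about 4.48% and 4.13%.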

14.
This paper presents a new formulation of the input-constrained optimal output synchronization problem and proposes an observer-based distributed optimal control protocol for discrete-time heterogeneous multiagent systems with input constraints, using model-free reinforcement learning. First, distributed adaptive observers are designed so that every agent can estimate the leader's trajectory without knowing the leader's dynamics. Next, the optimal control input associated with the optimal value function is derived from the solution of the tracking Hamilton-Jacobi-Bellman equation, which is generally difficult to solve analytically. To overcome this, a model-free Q-learning policy iteration algorithm, motivated by reinforcement learning, is proposed, and an actor-critic neural network structure is implemented to find the optimal tracking control input iteratively without knowledge of the system dynamics. Moreover, the inputs of all agents are kept within the permitted bounds by inserting a nonquadratic function into the performance function, which encodes the input constraints into the optimization problem. Finally, a numerical simulation example illustrates the effectiveness of the proposed theoretical results.
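The abstract does not say which nonquadratic function is used. A commonly used choice in constrained-input optimal control is W(u) = 2 * integral from 0 to u of lam*artanh(v/lam)*R dv for |u| < lam; minimising p*u + W(u) then gives u* = -lam*tanh(p/(2*lam*R)), a control that cannot leave the bound. The scalar Python sketch below assumes that form (scalar input, scalar weight R written as `r`, and `p` standing for the gradient term that multiplies u); it evaluates W(u) in closed form, cross-checks it with Simpson's rule, and computes the bounded control.

```python
import math


def input_penalty(u, lam, r=1.0):
    """Assumed nonquadratic cost W(u) = 2 * int_0^u lam * artanh(v / lam) * r dv, |u| < lam.

    Closed form: 2*lam*r*u*artanh(u/lam) + lam**2 * r * log(1 - (u/lam)**2).
    """
    if abs(u) >= lam:
        raise ValueError("|u| must be strictly below the bound lam")
    return 2 * lam * r * u * math.atanh(u / lam) + lam ** 2 * r * math.log(1 - (u / lam) ** 2)


def input_penalty_numeric(u, lam, r=1.0, n=2000):
    """Composite Simpson's rule on the defining integral (n must be even), as a cross-check."""
    h = u / n
    f = lambda v: 2 * lam * r * math.atanh(v / lam)
    s = f(0.0) + f(u) + sum((4 if k % 2 else 2) * f(k * h) for k in range(1, n))
    return s * h / 3


def constrained_control(p, lam, r=1.0):
    """Minimiser of p*u + W(u): setting p + 2*lam*r*artanh(u/lam) = 0 gives u = -lam*tanh(p/(2*lam*r))."""
    return -lam * math.tanh(p / (2 * lam * r))


print(round(input_penalty(0.5, 1.0), 4))  # 0.2616
print(abs(constrained_control(100.0, 2.0)) <= 2.0)  # True
```

Near u = 0 the penalty behaves like r*u^2, so small inputs are treated much as in a quadratic cost, while the cost grows without bound as |u| approaches lam; this is how the constraint is encoded in the performance function rather than enforced by clipping.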

15.
In this article, we present the development of a simple multiagent-based system for controlling a flexible manufacturing system. We followed the stages of a methodology specially conceived for developing agent-based systems, which integrates Gaia, the classical methodology for agent-oriented analysis and design, with AUML (Agent-Unified Modeling Language). Our case study is the CIMUBB Laboratory at the University of Bio-Bio, whose flexible manufacturing system comprises three flexible manufacturing cells interconnected by a conveyor belt. In the analysis stage, we identified the roles involved and designed models representing roles and protocols. In the design stage, we applied the Gaia agent, services, and acquaintance models and, as the adopted methodology suggests, complemented them with AUML. From these models, we built a fully functional system in which each agent runs as an independent process tree. Agents communicate by passing messages over the Ethernet network through socket interfaces. Various tests run on our laboratory-scale manufacturing system show the effectiveness of our implementation. Copyright © 2015 John Wiley & Sons, Ltd.

16.
Multi-Agent AGV Scheduling System Based on Hierarchical Reinforcement Learning (cited 3 times: 1 self-citation, 3 by others)
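Hierarchical reinforcement learning is an effective way for agents to make decisions in complex systems with very large state spaces. An AGV (automated guided vehicle) scheduling system, which has discrete dynamic characteristics, requires a real-time dynamic scheduling method; multiple agents equipped with MaxQ hierarchical reinforcement learning can achieve real-time AGV scheduling through efficient reinforcement learning and cooperation. Simulation experiments demonstrate the effectiveness of the method.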

17.
Current Research on Distributed Reinforcement Learning in Multi-Agent Systems (cited 4 times: 0 self-citations, 4 by others)
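This paper summarizes current research results on distributed reinforcement learning worldwide; analyzes and compares the characteristics, differences, and scope of application of three classes of distributed reinforcement learning methods, namely independent, social, and group reinforcement learning; and discusses the problems that distributed reinforcement learning still has to solve and its future directions.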

18.
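Most current image classification methods reduce image dimensionality with supervised or semi-supervised learning, both of which require images to carry label information. For the dimensionality reduction and classification of unlabeled images, a mixed-order stacked sparse autoencoder is proposed that reduces image dimensionality without supervision to enable classification. First, a serial stacked autoencoder network with three hidden layers is built; each hidden layer is trained separately, with the output of one hidden layer serving as the input of the next, to extract features and reduce data dimensionality. Second, the features of the first and second hidden layers of the trained stacked autoencoder are concatenated to form a matrix of mixed-order features. Finally, a support vector machine classifies the reduced image features, and the accuracy is evaluated. Comparative experiments against seven algorithms on four public image datasets show that the proposed method can extract features from unlabeled images, perform image classification, reduce classification time, and improve classification accuracy.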

19.
Interaction in the online learning environment has been regarded as one of the most critical elements affecting learning outcomes. This study examined which factors in learner–instructor interaction predict learners' outcomes in the online learning environment. Learners at K Online University took part by answering a survey, and data from 654 respondents were analysed. Results showed that factors related to instructional interaction predicted perceived learning achievement and satisfaction better than factors related to social interaction. However, the study also found that social interaction, such as social intimacy, could negatively affect perceived learning achievement and satisfaction. The study's contribution is that it identifies, with empirical evidence, factors in learner–instructor interaction that predict perceived learning achievement and satisfaction.

20.
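Traditional sparse-representation dictionary learning methods for image classification are inefficient in large-scale distributed environments. To address this, an image learning method based on a global sparse-representation dictionary is designed. The traditional dictionary learning steps are distributed across parallel nodes; each node learns a local dictionary by convex optimization, and the global dictionary is updated in real time, which improves both dictionary learning efficiency and classification efficiency on large-scale data. Parallel experiments on the MapReduce platform show that, without affecting classification accuracy, the method clearly speeds up the classification of large-scale distributed data and can be applied more efficiently to a variety of large-scale image classification tasks.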
