Similar literature
 10 similar documents found; search took 62 ms
1.
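A case-based reasoning approach to multi-agent reinforcement learning   Total citations: 3, self-citations: 0, citations by others: 3
A multi-agent reinforcement learning method based on case-based reasoning is proposed. A case base of system policies is built, and the relevant subset of the case base is selected by determining the cooperative relationships among agents. Simulated annealing is used to search this subset for the most suitable reusable case policy, and the agents select actions under the guidance of that case. When no usable case exists, the agents perform joint action learning (JAL). The system policy case base is updated in real time based on the learning results. Simulation results on the pursuit problem show that the proposed method markedly improves learning speed and convergence.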

2.
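After nearly 50 years of development, database systems are in widespread commercial use, but the arrival of the big-data era poses challenges on two fronts. First, data volumes keep growing while each individual query is expected to run faster. Second, query workloads change rapidly and vary widely, so database configurations and query-optimization preferences based on DBA experience cannot be adjusted to the best runtime state in real time. Meanwhile, performance optimization of database systems has reached a bottleneck: the room for optimization is narrowing, further gains can come only from new hardware accelerators, and traditional database systems cannot exploit modern hardware accelerators effectively. Database systems also have hundreds of tunable parameters; with workloads changing frequently, the volume of tedious parameter configuration exceeds what a DBA can handle, which leaves database systems unable to respond in real time to rapid and diverse change. Machine learning happens to meet both needs: it runs on modern accelerators, and it can learn from a large body of parameter-tuning experience. Machine-learning-based database systems introduce machine learning techniques into database system design. On the one hand, sequential scans are turned into computational models that can take advantage of modern hardware acceleration platforms; on the other hand, DBA experience is turned into predictive models, so that the database system adapts more intelligently and dynamically to rapid and diverse workload changes. This paper summarizes current research on machine-learning-based database systems, covering machine-learning-based storage management and query optimization as well as automated database management systems. Based on an analysis of existing techniques, it points out future research directions for such systems along with the problems and challenges they are likely to face.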

3.
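Using the massive clinical data of the healthcare domain to support medical decision-making is a core technology of smart healthcare and an inevitable trend. Medical decision support mainly comprises disease risk prediction and intelligent disease diagnosis: drawing on multiple data sources, both clinically accumulated and acquired in real time, it applies a variety of machine learning algorithms to classify a patient's disease type or to predict disease risk. Starting from the concept and methodological framework of medical decision support, this paper surveys, by disease category, the machine learning methods currently used for diagnosis and prediction, focuses on the characteristics of these methods and the differences among them, and analyzes open challenges and future developments.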

4.
Reinforcement Learning (RL) is learning through direct experimentation. It does not assume the existence of a teacher that provides examples upon which learning of a task takes place. Instead, in RL experience is the only teacher. With historical roots in the study of biological conditioned reflexes, RL attracts the interest of engineers and computer scientists because of its theoretical relevance and potential applications in fields as diverse as Operational Research and Intelligent Robotics. Computationally, RL is intended to operate in a learning environment composed of two subjects: the learner and a dynamic process. At successive time steps, the learner makes an observation of the process state, selects an action and applies it back to the process. Its goal is to find an action policy that controls the behavior of the dynamic process, guided by signals (reinforcements) that indicate how badly or well it has been performing the required task. These signals are usually associated with a dramatic condition, e.g., accomplishment of a subtask (reward) or complete failure (punishment), and the learner tries to optimize its behavior by using a performance measure (a function of the received reinforcements). The crucial point is that in order to do that, the learner must evaluate the conditions (associations between observed states and chosen actions) that led to rewards or punishments. Starting from basic concepts, this tutorial presents the many flavors of RL algorithms, develops the corresponding mathematical tools, assesses their practical limitations and discusses alternatives that have been proposed for applying RL to realistic tasks.
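The learner/process loop described in this abstract can be sketched roughly as follows. This is only an illustration: the `ToyProcess` class, its counter dynamics, the fixed policies, and the discounted sum used as the performance measure are all assumptions made for the example and are not taken from the tutorial itself.

```python
class ToyProcess:
    """Hypothetical dynamic process: a counter that should reach a target value."""

    def __init__(self, target=3):
        self.target = target
        self.state = 0

    def observe(self):
        """Return the current process state."""
        return self.state

    def apply(self, action):
        """Apply an action (+1 or -1) and return (reinforcement, done)."""
        self.state += action
        if self.state == self.target:
            return 1.0, True    # reward: the subtask was accomplished
        if self.state < -self.target:
            return -1.0, True   # punishment: complete failure
        return 0.0, False       # no signal at ordinary steps


def run_episode(process, policy, max_steps=100):
    """Observe the state, select an action, apply it; repeat until done."""
    history = []
    for _ in range(max_steps):
        state = process.observe()
        action = policy(state)
        reinforcement, done = process.apply(action)
        history.append((state, action, reinforcement))
        if done:
            break
    return history


def discounted_return(history, gamma=0.9):
    """One possible performance measure: a discounted sum of reinforcements."""
    return sum(gamma ** t * r for t, (_, _, r) in enumerate(history))


if __name__ == "__main__":
    good = run_episode(ToyProcess(), lambda state: +1)
    bad = run_episode(ToyProcess(), lambda state: -1)
    print(discounted_return(good), discounted_return(bad))
```

The recorded `history` of (state, action, reinforcement) triples is the raw material a learner would use to credit the state-action associations that led to a reward or a punishment.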

5.
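Artificial intelligence has permeated every corner of daily life and brought great convenience. In recent years in particular, with the rapid development of deep learning, a branch of machine learning, related applications have become increasingly common. Unfortunately, machine learning systems also face many security risks, and their widespread adoption further amplifies these risks. To expose these risks and to build robust machine learning systems, this paper surveys mainstream deep learning systems. It first designs an analytical model for dissecting deep learning systems and defines the scope of the survey. The surveyed systems span four domains: image classification, audio and speech recognition, malware detection, and natural language processing. Four corresponding types of security risk are extracted and then characterized and measured along several dimensions, including complexity, attack success rate, and damage. Defense techniques for deep learning systems and their characteristics are then reviewed. Finally, based on observations of these systems, recommendations are offered for building robust deep learning systems.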

6.
Multiagent systems are increasingly present in computational environments. However, the problem of agent design or control is an open research field. Reinforcement learning approaches offer solutions that allow autonomous learning with minimal supervision. The Q-learning algorithm is a model-free reinforcement learning solution that has proven its usefulness in single-agent domains; however, it suffers from the curse of dimensionality when applied to multiagent systems. In this article, we discuss two approaches, namely TRQ-learning and distributed Q-learning, that overcome the limitations of Q-learning and offer feasible solutions. We test these approaches in two separate domains. The first is the control of a hose by a team of robots. The second is the trash disposal problem. Computational results show the effectiveness of Q-learning solutions to multiagent systems' control.
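For readers unfamiliar with the baseline this abstract builds on, a minimal sketch of single-agent tabular Q-learning is given below. It is not the TRQ-learning or distributed Q-learning discussed in the article; the corridor environment, the function names, and all hyperparameters are hypothetical choices made for illustration.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
               epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor.

    States are 0..n_states-1; action 0 moves left, action 1 moves right.
    Entering the last state yields reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        while state != goal:
            # Epsilon-greedy selection; ties are broken at random.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # Model-free update: bootstrap from the best next-state value.
            target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Pick the higher-valued action in each state (right on ties)."""
    return [0 if left > right else 1 for left, right in q]


if __name__ == "__main__":
    table = q_learning()
    print(greedy_policy(table))
```

With a single agent the table has one row per state and one column per action. If several agents were handled by one learner over joint states and joint actions, the table would grow multiplicatively with each added agent, which is the dimensionality problem the abstract refers to.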

7.
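Research on machine learning in intelligent control systems   Total citations: 1, self-citations: 0, citations by others: 1
This paper summarizes the state of research on machine learning methods in manufacturing. To address the dynamically changing external environment and the internal technological evolution that intelligent control systems face, it proposes a learning-model framework that is realized through multi-agent cooperation and integrates multiple learning mechanisms. Finally, it analyzes and looks ahead to the directions in which machine learning research for intelligent manufacturing systems may develop.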

8.
Sun Mingxuan, He Xiongxiong, Chen Bingyu. Acta Automatica Sinica, 2007, 33(11): 1189-1195
Repetitive learning control is presented for finite-time trajectory tracking of uncertain time-varying robotic systems. A hybrid learning scheme is given to cope with the constant and time-varying unknowns in the system dynamics: the time functions are learned in an iterative learning way, without the aid of Taylor expression, while the conventional differential learning method is suggested for estimating the constant ones. A distinguishing feature is that the presented repetitive learning control avoids the requirement for initial repositioning at the beginning of each cycle, and the time-varying unknowns need not be periodic. It is shown that, with the adoption of hybrid learning, the boundedness of the state variables of the closed-loop system is guaranteed and the tracking error converges to zero as the number of iterations increases. The effectiveness of the proposed scheme is demonstrated through numerical simulation.

9.
Repetitive learning control is presented for finite-time trajectory tracking of uncertain time-varying robotic systems. A hybrid learning scheme is given to cope with the constant and time-varying unknowns in the system dynamics: the time functions are learned in an iterative learning way, without the aid of Taylor expression, while the conventional differential learning method is suggested for estimating the constant ones. A distinguishing feature is that the presented repetitive learning control avoids the requirement for initial repositioning at the beginning of each cycle, and the time-varying unknowns need not be periodic. It is shown that, with the adoption of hybrid learning, the boundedness of the state variables of the closed-loop system is guaranteed and the tracking error converges to zero as the number of iterations increases. The effectiveness of the proposed scheme is demonstrated through numerical simulation.

10.