Similar literature
 Found 20 similar documents; search took 15 ms
1.
Any agent in the real world has to be able to make distinctions between different types of objects, i.e. it must have the competence of categorization. In mobile agents, there is a large variation in proximal sensory stimulation originating from the same object. Therefore, categorization behavior is hard to achieve, and the successes in the past in solving this problem have been limited. In this paper it is proposed that the problem of categorization in the real world is significantly simplified if it is viewed as one of sensory-motor coordination, rather than one of information processing happening “on the input side”. A series of models are presented to illustrate the approach. It is concluded that we should consider replacing the metaphor of information processing for intelligent systems by the one of sensory-motor coordination. However, the principle of sensory-motor coordination is more than a metaphor. It offers concrete mechanisms for putting agents to work in the real world. These ideas are illustrated with a series of experiments.

2.
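Introducing intelligent detection and control into traffic signal control systems is now the prevailing trend. Reinforcement learning and deep reinforcement learning methods in particular have shown great technical advantages in scalability, stability, and generalizability, and have become a research hotspot in this field. This paper studies reinforcement-learning-based traffic signal control. Building on a broad survey of research on traffic signal control methods, it systematically reviews the classification and application of reinforcement learning and deep reinforcement learning in intelligent traffic signal control; summarizes feasible schemes that use multi-agent cooperation to solve large-scale traffic signal control problems; and gives a categorized overview of the traffic-scenario factors that affect large-scale traffic signal control. From the perspective of improving traffic signal controller performance, it identifies the challenges the field currently faces and potentially promising directions for future research.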

3.
Learning the complex control behaviour of autonomous mobile robots is a current research topic. In this article an intelligent control architecture is presented which integrates learning methods and available domain knowledge. This control architecture is based on Reinforcement Learning and allows continuous input and output parameters, hierarchical learning, multiple goals, a self-organized topology of the networks used, and online learning. As a testbed this architecture is applied to the six-legged walking machine LAURON to learn leg control and leg coordination.

4.
To accomplish diverse tasks successfully in a dynamic (i.e., changing over time) construction environment, robots should be able to prioritize assigned tasks to optimize their performance in a given state. Recently, a deep reinforcement learning (DRL) approach has shown potential for addressing such adaptive task allocation. It remains unanswered, however, whether or not DRL can address adaptive task allocation problems in dynamic robotic construction environments. In this paper, we developed and tested a digital-twin-driven DRL method to explore the potential of DRL for adaptive task allocation in robotic construction environments. Specifically, the digital twin synthesizes sensory data from physical assets and is used to simulate a variety of dynamic robotic construction site conditions with which a DRL agent can interact. As a result, the agent can learn an adaptive task allocation strategy that increases project performance. We tested this method with a case project in which a virtual robotic construction project (i.e., interlocking concrete bricks are delivered and assembled by robots) was digitally twinned for DRL training and testing. Results indicated that the DRL model's task allocation approach reduced construction time by 36% in three dynamic testing environments when compared to a rule-based imperative model. The proposed DRL method promises to be an effective tool for adaptive task allocation in dynamic robotic construction environments. Such an adaptive task allocation method can help construction robots cope with uncertainties and can ultimately improve construction project performance by efficiently prioritizing assigned tasks.

5.
The increasing demand for mobility in our society poses various challenges to traffic engineering, computer science in general, and artificial intelligence and multiagent systems in particular. As is often the case, it is not possible to provide additional capacity, so a more efficient use of the available transportation infrastructure is necessary. This relates closely to multiagent systems, as many problems in traffic management and control are inherently distributed. Also, many actors in a transportation system fit the concept of autonomous agents very well: the driver, the pedestrian, the traffic expert; in some cases, the intersection and the traffic signal controller can also be regarded as autonomous agents. However, the “agentification” of a transportation system is associated with some challenging issues: the number of agents is high, agents are typically highly adaptive, they react to changes in the environment at the individual level but cause an unpredictable collective pattern, and they act in a highly coupled environment. Therefore, this domain poses many challenges for standard techniques from multiagent systems such as coordination and learning. This paper has two main objectives: (i) to present problems, methods, approaches and practices in traffic engineering (especially regarding traffic signal control); and (ii) to highlight open problems and challenges so that future research in multiagent systems can address them.

6.
Some emerging computing systems (especially autonomic computing systems) raise several challenges to autonomous agents, including (1) how to reflect the dynamics of business requirements, (2) how to coordinate with external agents with a sufficient level of security and predictability, and (3) how to perform reasoning with dynamic and incomplete knowledge, including both informational knowledge (observations) and motivational knowledge (for example, policy rules and contract rules). On the basis of defeasible logic and argumentation, this paper proposes an autonomous, normative and guidable agent model, called ANGLE, to cope with these challenges. This agent is established by combining the beliefs-desires-intentions (BDI) architecture with a policy-based method and the mechanism of contract-based coordination. Its architecture, knowledge representation, and reasoning and decision-making are presented in this paper. ANGLE is characterized by three aspects. First, both its motivational knowledge and informational knowledge are changeable, and may be incomplete or inconsistent (conflicting). Second, its knowledge is represented in terms of extended defeasible logic with modal operators. Unlike existing defeasible theories, its theories (including belief theory, goal theory and intention theory) are dynamic (called dynamic theories), reflecting the variations of observations and external motivational knowledge. Third, its reasoning and decision-making are based on argumentation. Due to the dynamics of the underlying theories, argument construction is not a monotonic process, which differs from the existing argumentation framework where arguments are constructed incrementally.

7.
As an important management tool for winning competitive advantage, the induced learning effect has been widely studied in empirical research, but it is rarely considered in scheduling problems. In this paper, autonomous and induced learning are both taken into consideration. The investment in induced learning is interpreted as specialized time intervals for training, knowledge sharing, knowledge transfer, etc. We present algorithms to determine jointly the optimal job sequence and the optimal position of induced learning intervals, with the objective of minimizing makespan.

8.
Maze problems represent a simplified virtual model of the real environment and can be used for developing core algorithms of many real-world applications related to the problem of navigation. Learning Classifier Systems (LCS) are the most widely used class of algorithms for reinforcement learning in mazes. However, the best achievements of LCSs in maze problems are still mostly limited to non-aliasing environments, while LCS complexity seems to obstruct a proper analysis of the reasons for failure. Moreover, there is a lack of knowledge of what makes a maze problem hard to solve by a learning agent. To overcome this restriction we try to improve our understanding of the nature and structure of maze environments. In this paper we describe a new LCS agent that has a simpler and more transparent performance mechanism. We use the structure of a predictive LCS model, strip out the evolutionary mechanism, simplify the reinforcement learning procedure and equip the agent with the ability of Associative Perception, adopted from psychology. We then assess the new LCS with Associative Perception on an extensive set of mazes and analyse the results to discover which features of the environments play the most significant role in the learning process. We identify a particularly hard feature for learning in mazes, aliasing clones, which arise when groups of aliasing cells occur in similar patterns in different parts of the maze. We discuss the impact of aliasing clones and other types of aliasing on learning algorithms.
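To make the notion of aliasing in entry 8 concrete, the following is a minimal sketch, not the paper's method. It assumes the usual reading of an aliasing cell: a free cell whose local percept (here, the eight surrounding cells) is identical to that of a free cell elsewhere in the maze. The maze layout, the percept encoding, and the names `MAZE`, `percept`, and `aliased_groups` are all hypothetical choices made for illustration.

```python
from collections import defaultdict

# Hypothetical toy maze: '#' is a wall, ' ' is a free cell.
MAZE = [
    "#######",
    "#     #",
    "# # # #",
    "#     #",
    "#######",
]

# Offsets of the 8 neighbours, clockwise starting from north.
OFFSETS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def percept(maze, r, c):
    """Return what an agent at (r, c) would sense: its 8 surrounding cells."""
    return "".join(maze[r + dr][c + dc] for dr, dc in OFFSETS)


def aliased_groups(maze):
    """Group free cells by percept; keep only groups with more than one cell."""
    groups = defaultdict(list)
    for r, row in enumerate(maze):
        for c, ch in enumerate(row):
            if ch == " ":
                groups[percept(maze, r, c)].append((r, c))
    return [cells for cells in groups.values() if len(cells) > 1]
```

In this toy maze the cells (1, 2) and (1, 4) look identical to the agent, as do (3, 2) and (3, 4), so a purely reactive learner cannot tell them apart from sensing alone. Detecting aliasing clones, as described in the abstract, would additionally require comparing patterns of such groups across the maze, which this sketch does not attempt.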

9.
Iterative learning control (ILC) is a 2-degree-of-freedom technique that seeks to improve system performance along the time and iteration domains. Traditionally, ILC has been implemented to minimize trajectory-tracking errors across an entire cycle period. However, there are applications in which the necessity for improved tracking performance can be limited to a few specific locations. For such systems, a modified learning controller focused on improved tracking at the selected points can be leveraged to address multiple performance metrics, resulting in systems that exhibit significantly improved behaviors across a wide variety of performance metrics. This paper presents a Pareto learning control framework that incorporates multiple objectives into a single design architecture.
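The traditional, whole-cycle form of ILC mentioned in entry 9 can be sketched as follows. This is only an illustration of a textbook P-type update, u_{k+1}[t] = u_k[t] + gain * e_k[t+1], and not the Pareto framework the paper proposes. The first-order plant, its coefficients, the learning gain, and the function names are assumptions chosen so that the iteration converges.

```python
def simulate(u, a=0.5, b=1.0):
    """One trial of an assumed toy plant y[t+1] = a*y[t] + b*u[t], with y[0] = 0."""
    y = [0.0]
    for u_t in u:
        y.append(a * y[-1] + b * u_t)
    return y


def ilc(reference, gain=0.5, iterations=60):
    """P-type ILC over repeated trials.

    Returns the final input sequence and the peak tracking error of each trial.
    reference[0] should equal the plant's initial output (0 here).
    """
    n = len(reference) - 1
    u = [0.0] * n
    history = []
    for _ in range(iterations):
        y = simulate(u)
        error = [r - y_t for r, y_t in zip(reference, y)]
        history.append(max(abs(e) for e in error[1:]))
        # u[t] influences y[t+1], so it is corrected with the error one step ahead.
        u = [u_t + gain * error[t + 1] for t, u_t in enumerate(u)]
    return u, history
```

Calling `ilc([0.0] + [1.0] * 10)` repeats the same step-tracking task and records how the peak error shrinks from trial to trial. A point-focused variant in the spirit of the abstract would weight or restrict the error to selected time indices; that modification is not shown here.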

10.
We apply the recurrent reinforcement learning method of Moody, Wu, Liao, and Saffell (1998) in the context of the strategic asset allocation computed for sample data from the US, the UK, Germany, and Japan. It is found that the optimal asset allocation deviates substantially from the fixed-mix rule. The investor actively times the market and is able to outperform it consistently over the almost two decades we analyze.

11.
In this paper, we introduce an online algorithm that uses integral reinforcement knowledge for learning the continuous-time optimal control solution for nonlinear systems with infinite horizon costs and partial knowledge of the system dynamics. This algorithm is a data-based approach to the solution of the Hamilton–Jacobi–Bellman equation, and it does not require explicit knowledge of the system's drift dynamics. A novel adaptive control algorithm is given that is based on policy iteration and implemented using an actor/critic structure having two adaptive approximator structures. Both actor and critic approximation networks are adapted simultaneously. A persistence of excitation condition is required to guarantee convergence of the critic to the actual optimal value function. Novel adaptive control tuning algorithms are given for both critic and actor networks, with extra terms in the actor tuning law being required to guarantee closed-loop dynamical stability. The approximate convergence to the optimal controller is proven, and stability of the system is also guaranteed. Simulation examples support the theoretical result. Copyright © 2013 John Wiley & Sons, Ltd.
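For orientation, the equations that entry 11 refers to can be written out under a common set of assumptions that the abstract itself does not state: input-affine dynamics with drift f(x) and input map g(x), and a running cost that is quadratic in the control. The symbols f, g, Q, R, and the interval length T are therefore assumptions, and this is the standard textbook form rather than the paper's exact formulation.

```latex
% Assumed (not given in the abstract):
%   dynamics  \dot{x} = f(x) + g(x)u,   running cost  r(x,u) = Q(x) + u^\top R u
\begin{align}
  % Integral (interval) Bellman equation for a fixed policy u
  V(x(t)) &= \int_{t}^{t+T} r\bigl(x(\tau),u(\tau)\bigr)\,d\tau + V\bigl(x(t+T)\bigr) \\
  % Hamilton--Jacobi--Bellman equation for the optimal value V^*
  0 &= \min_{u}\Bigl[\, r(x,u) + \nabla V^*(x)^\top \bigl(f(x) + g(x)u\bigr) \Bigr] \\
  % Minimizing control obtained by setting the gradient with respect to u to zero
  u^*(x) &= -\tfrac{1}{2}\, R^{-1} g(x)^\top \nabla V^*(x)
\end{align}
```

The drift term f(x) appears in neither the first nor the third equation, which matches the abstract's statement that explicit knowledge of the drift dynamics is not needed: the value function can be evaluated from measured cost over an interval, and the control update uses only g(x).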

12.
We present a method for automatic grasp generation based on object shape primitives in a Programming by Demonstration framework. The system first recognizes the grasp performed by a demonstrator as well as the object it is applied to and then generates a suitable grasping strategy on the robot. We start by presenting how to model and learn grasps and map them to robot hands. We continue by performing dynamic simulation of the grasp execution with a focus on grasping objects whose pose is not perfectly known.

13.
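Grinding particle size and circulating load are the key operational indices for product quality and production efficiency in the grinding process. Compared with low-level control deviations, the loop setpoints affect these indices far more strongly. However, the grinding process is subject to varying factors such as ore composition and properties and equipment condition, so its operating conditions are dynamic and time-varying and hard to model, which makes it difficult to optimize the loop setpoints with traditional model-based methods. This paper combines reinforcement learning with case-based reasoning and proposes a data-driven setpoint optimization method for the grinding process. First, according to the current operating conditions, a case-based reasoning method based on Prey-Predator optimization selects a feasible Q-function network model based on an Elman neural network. Then, using actual operating data, the loop setpoints are optimized according to the Q-function network model within the reinforcement learning framework. Experiments on a METSIM-based grinding process simulation system show that the proposed method can optimize the loop setpoints online as operating conditions change, achieving optimized control of the grinding operational indices.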

14.
As the number of online education and training programs increases, researchers and practitioners are interested in investigating ways to design and develop effective e-learning programs. One of the major design decisions that affects learning effectiveness is the choice of media to present the contents of such programs. The prevailing tendency seems to be to use a “richer” medium, in the progression from text to graphics to audio to video, for designing and developing e-learning programs. It is not clear, however, if a “richer” medium provides proportionately higher learning effectiveness. To investigate this gap in our understanding, we developed an integrated research model and tested it empirically. Our results showed that the relationship between media choice in an e-learning program and the effectiveness of that program is moderated by the learning domain of the program and the learning styles of learners.

15.
The use of evolutionary methods to generate controllers for real-world autonomous agents has attracted recent attention. Most of the pertinent research has employed genetic algorithms or variations thereof. Recent research has indicated that the presence of epistasis drastically slows down genetic algorithms. For this reason, this paper uses a different evolutionary method, evolution strategies, for the evolution of various (complex) neuronal control architectures for mobile robots inspired by Braitenberg vehicles. In these experiments, the evolution strategy accelerates the development process by more than an order of magnitude (a few hours compared to more than two days). Furthermore, the evolution strategy yields the same efficacy when applied to receptive-field controllers that require many more parameters than Braitenberg controllers. This dramatic speedup is very important, since the development process is to be carried out on real robots.
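As a rough illustration of the kind of method entry 15 refers to, here is a minimal (1+1) evolution strategy with a 1/5-success-rule step-size adaptation. It is a generic textbook variant, not the specific strategy or controller encoding used in the paper. The sphere function stands in for a real fitness measure (for example, a robot controller's performance), and all names and constants are assumptions.

```python
import random


def sphere(x):
    """Placeholder fitness to minimize; a real setup would evaluate a controller."""
    return sum(v * v for v in x)


def one_plus_one_es(fitness, x0, sigma=0.3, generations=200, seed=0):
    """(1+1)-ES: mutate the single parent with Gaussian noise, keep the better one.

    The step size grows after a success and shrinks after a failure, with
    factors chosen so it is stationary at a success rate of about 1/5.
    """
    rng = random.Random(seed)
    parent, parent_fit = list(x0), fitness(x0)
    for _ in range(generations):
        child = [v + rng.gauss(0.0, sigma) for v in parent]
        child_fit = fitness(child)
        if child_fit < parent_fit:
            parent, parent_fit = child, child_fit
            sigma *= 1.5            # success: widen the search
        else:
            sigma *= 1.5 ** -0.25   # failure: narrow the search
    return parent, parent_fit
```

For example, `one_plus_one_es(sphere, [1.0] * 5)` searches a five-parameter vector starting from fitness 5.0. Unlike a genetic algorithm, this scheme uses no crossover and adapts its mutation strength during the run, which is one commonly cited reason such strategies can need fewer evaluations on real-valued parameter vectors.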

16.
This paper deals with model-free adaptive control (MFAC) based on the reinforcement learning (RL) strategy for a family of discrete-time nonlinear processes. The controller is constructed based on the approximation ability of a neural network architecture, and a new actor-critic algorithm for the neural network control problem is developed to estimate the strategic utility function and the performance index function. More specifically, the novel RL-based MFAC scheme can design the controller without needing to estimate y(k+1) information. Furthermore, based on the Lyapunov stability analysis method, the closed-loop system is guaranteed to be uniformly ultimately bounded. Simulations are shown to validate the theoretical results.

17.
In this paper, an adaptive reinforcement learning approach is developed for a class of discrete-time affine nonlinear systems with unmodeled dynamics. The multigradient recursive (MGR) algorithm is employed to solve the local optimum problem, which is inherent in the gradient descent method. The MGR radial basis function neural network approximates the utility functions and unmodeled dynamics, and it has a faster rate of convergence than the gradient descent method. A novel strategic utility function and cost function are defined for the affine systems. Finally, it is shown through the differential Lyapunov function method that all the signals in the closed-loop system are semiglobally uniformly ultimately bounded, and two simulation examples are presented to demonstrate the effectiveness of the proposed scheme.

18.
Composite adaptation and learning techniques were initially proposed for improving parameter convergence in adaptive control and have generated considerable research interest in the last three decades, inspiring numerous robot control applications. The key idea is that more sources of parametric information are applied to drive parameter estimates aside from trajectory tracking errors. Both composite adaptation and learning can ensure superior stability and performance. However, composite learning possesses a unique feature in that online data memory is fully exploited to extract parametric information such that parameter convergence can be achieved without a stringent condition termed persistent excitation. In this article, we provide the first systematic and comprehensive survey of prevalent composite adaptation and learning approaches for robot control, especially focusing on exponential parameter convergence. Composite adaptation is classified into regressor-filtering composite adaptation and error-filtering composite adaptation, and composite learning is classified into discrete-data regressor extension and continuous-data regressor extension. For the sake of clear presentation and better understanding, a general class of robotic systems is applied as a unifying framework to show the motivation, synthesis, and characteristics of each parameter estimation method for adaptive robot control. The strengths and deficiencies of all these methods are also discussed in detail. We conclude by suggesting possible directions for future research in this area.

19.
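Bionic robots are a typical class of multi-joint, nonlinear, underactuated systems, and their gait control is a very challenging problem. Traditional control and planning methods must be designed specifically for each motion task, which costs a great deal of time and effort, and the resulting controllers often lack generality. Data-driven reinforcement learning methods can learn different tasks autonomously and generalize well across different robots and motion tasks. As a result, reinforcement-learning-based methods have found many applications in gait control for bionic robots in recent years. This paper analyzes and summarizes existing research from three aspects: problem formulation, policy representation, and policy learning. It also summarizes the open problems in applying reinforcement learning to gait control of bionic robots and points out directions for future development.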

20.
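For flight control systems that have developed faults, a reinforcement learning fault-tolerant method based on an incremental policy is proposed on top of reinforcement learning algorithms. The method uses system state values obtained from sensors and a preset reward function to make the optimal decision for the current state of the control system while continually updating the value network. It converts the fault-tolerant control process into the sequential decision process of a reinforcement learning agent, and uses an improved incremental policy to gradually approach the correct compensation policy for the current fault. In addition, for continuous control systems, a state-transition prediction network is proposed to obtain the next-step state value. Finally, the effectiveness of the method is verified on the aircraft fault diagnosis experimental platform of the Ministry of Industry and Information Technology Key Laboratory of "Advanced Aircraft Navigation, Control and Health Management" at Nanjing University of Aeronautics and Astronautics.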
