Similar literature
1.
In this work, reinforcement learning (RL) is used to find an optimal policy for a marketing campaign. The data exhibit complex state and action spaces. Two approaches are proposed to circumvent this problem. The first approach is based on the self-organizing map (SOM), which is used to aggregate states. The second approach uses a multilayer perceptron (MLP) to carry out a regression of the action-value function. The results indicate that both approaches can improve a targeted marketing campaign. Moreover, the SOM approach allows an intuitive interpretation of the results, and the MLP approach yields robust results with generalization capabilities.

2.
In this paper, we propose fuzzy logic-based cooperative reinforcement learning for sharing knowledge among autonomous robots. The ultimate goal of this paper is to entice bio-insects towards desired goal areas using artificial robots without any human aid. To achieve this goal, we identified an interaction mechanism using a specific odor source and performed simulations and experiments [1]. For efficient learning without human aid, we employ cooperative reinforcement learning in a multi-agent domain. Additionally, we design a fuzzy logic-based expertise measurement system to enhance the learning ability. This structure enables the artificial robots to share knowledge while evaluating and measuring the performance of each robot. The performance of the proposed learning algorithms is evaluated through numerous experiments.

3.
T.  S. S.  C. C. 《Automatica》2000,36(12)
This paper focuses on adaptive control of strict-feedback nonlinear systems using multilayer neural networks (MNNs). By introducing a modified Lyapunov function, a smooth and singularity-free adaptive controller is first designed for a first-order plant. Then, an extension is made to high-order nonlinear systems using neural network approximation and adaptive backstepping techniques. The developed control scheme guarantees the uniform ultimate boundedness of the closed-loop adaptive systems. In addition, the relationship between the transient performance and the design parameters is given explicitly to guide the tuning of the controller. One important feature of the proposed NN controller is its highly structured form, which makes it particularly suitable for parallel processing in actual implementation. Simulation studies are included to illustrate the effectiveness of the proposed approach.

4.
This paper presents an adaptive neural control design for nonlinear pure-feedback systems with an input time-delay. Novel state variables and the corresponding transform are introduced, such that the state-feedback control of a pure-feedback system can be viewed as the output-feedback control of a canonical system. An adaptive predictor combined with a high-order neural network (HONN) observer is proposed to obtain predictions of the future system states, which are used in the control design to circumvent the input delay and nonlinearities. The proposed predictor, observer and controller are all implemented online without iterative predictive calculations, and the closed-loop system stability is guaranteed. The conventional backstepping design and analysis for pure-feedback systems are avoided, which makes the developed scheme simpler to synthesize and apply. Practical guidelines on the control implementation and the parameter design are provided. A simulation of a continuous stirred tank reactor (CSTR) and practical experiments on a three-tank liquid level process control system are included to verify the reliability and effectiveness of the scheme.

5.
Robot arm reaching through neural inversions and reinforcement learning   Cited by: 1 (self-citations: 0; citations by others: 1)
We present a neural method that computes the inverse kinematics of any kind of robot manipulator, both redundant and non-redundant. Inverse kinematics solutions are obtained through the inversion of a neural network that has been previously trained to approximate the manipulator forward kinematics. The inversion provides difference vectors in the joint space from difference vectors in the workspace. Our differential inverse kinematics (DIV) approach can be viewed as a neural network implementation of the Jacobian transpose method for arm kinematic control that does not require previous knowledge of the arm forward kinematics. Redundancy can be exploited to obtain a special inverse kinematic solution that meets a particular constraint (e.g. joint limit avoidance) by inverting an additional neural network. The usefulness of our DIV approach is further illustrated with sensor-based multilink manipulators that learn collision-free reaching motions in unknown environments. For this task, the neural controller has two modules: a reinforcement-based action generator (AG) and a DIV module that computes goal vectors in the joint space. The actions given by the AG are interpreted with regard to those goal vectors.
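The Jacobian transpose method that the abstract refers to can be sketched for a planar two-link arm. This is only an illustration under stated assumptions: it uses the analytic Jacobian of a hypothetical arm with unit link lengths rather than the inverted neural network described in the paper, and the function names, gain and step count are chosen here for the example.

```python
import math

def forward_kinematics(q, lengths=(1.0, 1.0)):
    """End-effector (x, y) of a planar two-link arm with joint angles q."""
    l1, l2 = lengths
    x = l1 * math.cos(q[0]) + l2 * math.cos(q[0] + q[1])
    y = l1 * math.sin(q[0]) + l2 * math.sin(q[0] + q[1])
    return (x, y)

def jacobian(q, lengths=(1.0, 1.0)):
    """2x2 Jacobian d(x, y)/d(q0, q1) of the arm above."""
    l1, l2 = lengths
    s1, c1 = math.sin(q[0]), math.cos(q[0])
    s12, c12 = math.sin(q[0] + q[1]), math.cos(q[0] + q[1])
    return [[-l1 * s1 - l2 * s12, -l2 * s12],
            [l1 * c1 + l2 * c12, l2 * c12]]

def jacobian_transpose_step(q, target, gain=0.1):
    """One update q <- q + gain * J^T * (target - position)."""
    x, y = forward_kinematics(q)
    ex, ey = target[0] - x, target[1] - y   # difference vector in the workspace
    J = jacobian(q)
    dq0 = gain * (J[0][0] * ex + J[1][0] * ey)  # first row of J^T times e
    dq1 = gain * (J[0][1] * ex + J[1][1] * ey)  # second row of J^T times e
    return [q[0] + dq0, q[1] + dq1]

def reach(q, target, steps=2000, gain=0.1):
    """Iterate the Jacobian transpose update for a fixed number of steps."""
    for _ in range(steps):
        q = jacobian_transpose_step(q, target, gain)
    return q
```

Calling `reach([0.3, 0.5], (1.0, 1.0))` moves the joint angles so that the end effector approaches the target. In the approach described above, the product with the transposed Jacobian would instead come from inverting the trained forward-kinematics network, so no analytic model is needed.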

6.
Learning complex control behaviour for autonomous mobile robots is a current research topic. This article presents an intelligent control architecture that integrates learning methods and available domain knowledge. The architecture is based on reinforcement learning and allows continuous input and output parameters, hierarchical learning, multiple goals, self-organized topology of the networks used, and online learning. As a testbed, the architecture is applied to the six-legged walking machine LAURON to learn leg control and leg coordination.

7.
In this paper, a new formulation for the optimal tracking control problem (OTCP) of continuous-time nonlinear systems is presented. This formulation extends the integral reinforcement learning (IRL) technique, a method for solving optimal regulation problems, to learn the solution to the OTCP. Unlike existing solutions to the OTCP, the proposed method does not need to have or to identify knowledge of the system drift dynamics, and it also takes into account the input constraints a priori. An augmented system composed of the error system dynamics and the command generator dynamics is used to introduce a new nonquadratic discounted performance function for the OTCP. This encodes the input constraints into the optimization problem. A tracking Hamilton–Jacobi–Bellman (HJB) equation associated with this nonquadratic performance function is derived, which gives the optimal control solution. An online IRL algorithm is presented to learn the solution to the tracking HJB equation without knowing the system drift dynamics. Convergence to a near-optimal control solution and stability of the whole system are shown under a persistence of excitation condition. Simulation examples are provided to show the effectiveness of the proposed method.
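For orientation, one form of nonquadratic discounted performance function that is common in the constrained-input optimal control literature is sketched below. The abstract does not state the formula, so every symbol here is an assumption rather than the paper's notation: X is the augmented state, γ > 0 the discount factor, Q_T a positive semidefinite weight, R a positive definite diagonal weight, and λ the bound on each input component.

```latex
V\bigl(X(t)\bigr) = \int_{t}^{\infty} e^{-\gamma(\tau - t)}
  \Bigl[ X(\tau)^{\top} Q_T \, X(\tau) + U\bigl(u(\tau)\bigr) \Bigr] \, d\tau,
\qquad
U(u) = 2 \int_{0}^{u} \bigl( \lambda \tanh^{-1}(v/\lambda) \bigr)^{\top} R \, dv
```

With a penalty of this kind, the minimizing control passes through a tanh saturation and therefore stays within the bound λ, which is one way the input constraints can be encoded into the optimization problem.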

8.
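Neural network control of nonlinear systems based on an adaptive critic   Cited by: 1 (self-citations: 0; citations by others: 1)
For a class of nonlinear systems, an adaptive critic method is proposed. The method controls the system output to track a reference signal, and its critic function can be obtained directly in analytic form. Only a single action network is needed to generate the control action, and the network weights can be initialized randomly. A Lyapunov approach is used to analyze the dynamic performance of the overall system, proving that under certain conditions the method guarantees that the closed-loop error and the network weights are uniformly ultimately bounded. The simulation results agree with the theoretical analysis and demonstrate the effectiveness of the proposed method.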

9.
S.N. Huang  K.K. Tan  T.H. Lee 《Automatica》2005,41(9):1645-1649
This paper designs a decentralized neural network (NN) controller for a class of nonlinear large-scale systems in which strong interconnections are involved. NNs are used to handle unknown functions. The proposed scheme is proved to guarantee the boundedness of the closed-loop subsystems using only local feedback signals.

10.
A combination of multiple neural networks (NNs) is selected and used to model nonlinear multi-input multi-output (MIMO) processes with time delays. An optimisation procedure for a nonlinear model-predictive control (MPC) algorithm based on this model is then developed. The proposed scheme has been applied and evaluated for two example problems, including the MPC of a multi-component distillation column.

11.
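For the problem of adaptive routing toward mobile sink nodes in wireless sensor networks, a reinforcement-learning-based adaptive routing method is proposed. It aims to optimize the use of node energy and of computing, storage, and communication resources during routing, and to optimize quality-of-service measures such as data transmission delay and delivery ratio. A composite reward function is designed to jointly optimize multiple metrics, including energy, delay, and delivery ratio. The routing protocol is designed in detail in terms of packet structure, route initialization, and path selection. Sink-node announcements and a periodic flooding mechanism are used to speed up convergence, thereby supporting fast-moving sink nodes. Theoretical analysis shows that the reinforcement-learning-based routing method converges quickly, has low protocol overhead, and has small storage and computation requirements, making it suitable for energy- and resource-constrained sensor nodes. Performance evaluation and comparative analysis on a simulation platform verify the feasibility and superiority of the adaptive routing algorithm.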

12.
A procedure is developed for the design of an adaptive neural network controller for a class of SISO uncertain nonlinear systems in pure-feedback form. The design procedure is a combination of adaptive backstepping and neural network based design techniques. It is shown that, under appropriate assumptions, the solution of the closed-loop system is uniformly ultimately bounded.

13.
The use of genetic algorithms to design neural networks for real-time control of flows in sewerage networks is discussed. In many control applications, standard supervised learning techniques (such as back-propagation) cannot be used for lack of training data. Reinforcement learning techniques, such as genetic algorithms, are a computationally expensive but viable alternative if a simulator is available for the system in question. The paper briefly describes why genetic algorithms and neural networks were selected, then reports the results of a feasibility study, which demonstrates that the approach does have merit. The implications of the high computational cost are discussed in terms of scaling up to significantly more complex problems.

14.
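Iterative learning control of nonlinear systems with unknown dead-zone input   Cited by: 1 (self-citations: 0; citations by others: 1)
For a class of nonlinear systems with dead-zone input, a neural network iterative learning algorithm is proposed to achieve trajectory tracking over a finite operation interval. The learning controller is designed using a Lyapunov-like method, which avoids the requirement in conventional iterative learning control that the nonlinearities of the controlled system satisfy a global Lipschitz continuity condition. To handle the input dead zone, a neural network is used to approximate this strongly nonlinear characteristic. In addition, the bound on the neural network approximation error is estimated and a compensation term is included in the controller to eliminate its effect, thereby improving the tracking performance of the system.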

15.
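For a class of unknown pure-feedback nonlinear discrete-time systems, an adaptive neural network control method based on backstepping design is proposed. To avoid the causality contradiction that may arise in backstepping design, the system is first transformed into an equivalent form, and the implicit function theorem is then used to establish the existence of the ideal virtual control inputs and the actual control input. High-order neural networks are used to estimate these control quantities, and an adaptive neural network control system is designed by backstepping. The closed-loop system is proved to be semi-globally uniformly ultimately bounded. Simulation results verify the effectiveness of the proposed method.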

16.
While driving a vehicle safely at its handling limit is essential for autonomous vehicles at Level 5 autonomy, it is a very challenging task for current conventional methods. Therefore, this study proposes a novel controller for trajectory planning and motion control for autonomous driving through manifold corners at the handling limit, to improve the speed and shorten the lap time of the vehicle. The proposed controller combines the advantages of a conventional model-based control algorithm, a model-free reinforcement learning algorithm, and prior expert knowledge to improve the training efficiency for autonomous driving in extreme conditions. The reward shaping of this algorithm draws on the procedure and experience of the race training of professional drivers in real time. After training on track maps that exhibit different levels of difficulty, the proposed controller implemented a superior strategy compared to the original reference trajectory, and can generalize to other, tougher maps based on the basic driving knowledge learned from the simpler map, which verifies its superiority and extensibility. We believe this technology can be further applied to daily life to expand the application scenarios and maneuvering envelopes of autonomous vehicles.

17.
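Application of the just-in-time learning algorithm to iterative learning control of nonlinear systems   Cited by: 4 (self-citations: 1; citations by others: 4)
Sun Wei  Wang Wei  Zhu Ruijun 《控制与决策》 (Control and Decision) 2003,18(3):263-266
A just-in-time learning algorithm is used to solve the initial-value problem of iterative learning control for a class of nonlinear systems. For any type of iterative learning control algorithm, the just-in-time learning algorithm can effectively estimate the initial control input, which reduces the initial output error and speeds up convergence, so that after a finite number of iterations the system output strictly tracks the desired signal. Simulation results on a robot system demonstrate the effectiveness of the method.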
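To show where the initial control input enters an iterative learning control loop, here is a minimal sketch of a generic P-type update on a scalar linear plant. It is not the just-in-time learning estimator from the paper: the plant x(t+1) = a·x(t) + b·u(t), the learning gain, and all names are assumptions made for this example.

```python
import math

def simulate(u, a=0.5, b=1.0):
    """Run x(t+1) = a*x(t) + b*u(t) from x(0) = 0 and return y(1..N) = x(1..N)."""
    x, ys = 0.0, []
    for ut in u:
        x = a * x + b * ut
        ys.append(x)
    return ys

def p_type_ilc(reference, u0, gain=0.8, iterations=50):
    """Repeat the same finite-horizon trial, correcting the input with the last error.

    u0 is the initial control sequence; a better u0 gives a smaller first-trial error.
    Returns the final input sequence and the maximum absolute error of each trial.
    """
    u = list(u0)
    history = []
    for _ in range(iterations):
        y = simulate(u)
        # error[t] is the tracking error at time t+1, which u[t] directly affects
        error = [r - yt for r, yt in zip(reference, y)]
        history.append(max(abs(e) for e in error))
        u = [ut + gain * e for ut, e in zip(u, error)]
    return u, history

# Hypothetical reference trajectory over a horizon of 10 steps
reference = [math.sin(0.3 * (t + 1)) for t in range(10)]
u_final, history = p_type_ilc(reference, [0.0] * len(reference))
```

For this plant the update converges because |1 - gain·b| = 0.2 < 1. Starting from an input sequence closer to the ideal one than the all-zero `u0` lowers `history[0]`, which is the role the abstract assigns to the estimated initial control.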

18.
A neural network model predictive controller   Cited by: 2 (self-citations: 0; citations by others: 2)
A neural network controller is applied to the optimal model predictive control of constrained nonlinear systems. The control law is represented by a neural network function approximator, which is trained to minimize a control-relevant cost function. The proposed procedure can be applied to construct controllers with arbitrary structures, such as optimal reduced-order controllers and decentralized controllers.

19.
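Nonlinear sliding-mode fault-tolerant control using a CMAC neural network based on balanced learning   Cited by: 2 (self-citations: 1; citations by others: 1)
Using an improved credit-assignment CMAC (cerebellar model articulation controller) neural network as the means of online fault diagnosis, and introducing variable-structure sliding-mode control into the fault-tolerant controller design, an active fault-tolerant control method for dynamic nonlinear systems is proposed. In the conventional CMAC learning algorithm, the error is distributed equally among all activated memory cells, regardless of how credible the data (weights) stored in each cell are. In the improved CMAC, the number of times an activated cell has previously been trained is used as its credibility, and the error correction is proportional to the (-p)th power of that count, which improves the speed and accuracy of the network's online learning. On this basis, a sliding-mode control algorithm reconstructs the fault-tolerant control law online, integrating online fault diagnosis and fault-tolerant control for dynamic nonlinear systems. The stability of the system is analyzed, and simulation results demonstrate the effectiveness of the improved fault learning algorithm and the fault-tolerant control.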
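The credit-assignment rule described above can be sketched on a one-dimensional CMAC. This is an illustration only: the abstract says the correction is proportional to the (-p)th power of each activated cell's prior learning count, while the "+1" offset, the normalization over the activated cells, the addressing scheme, and all names below are assumptions made for this example.

```python
class CreditAssignedCMAC:
    """1-D CMAC whose error correction favors cells that have been trained less."""

    def __init__(self, n_cells=64, n_active=8, lr=1.0, p=1.0):
        self.n_cells = n_cells
        self.n_active = n_active
        self.lr = lr
        self.p = p
        size = n_cells + n_active
        self.w = [0.0] * size       # stored weights
        self.count = [0] * size     # how many times each cell has been updated

    def active_cells(self, x):
        """Indices of the n_active consecutive cells activated by x in [0, 1]."""
        base = min(int(x * self.n_cells), self.n_cells - 1)
        return list(range(base, base + self.n_active))

    def predict(self, x):
        """Network output: the sum of the activated cells' weights."""
        return sum(self.w[i] for i in self.active_cells(x))

    def train(self, x, target):
        """Distribute the output error over the activated cells by credit."""
        cells = self.active_cells(x)
        error = target - self.predict(x)
        # Credit proportional to (count + 1)^(-p); p = 0 gives the conventional
        # equal split, larger p shifts more correction to rarely trained cells.
        credit = [(self.count[i] + 1) ** (-self.p) for i in cells]
        total = sum(credit)
        for i, c in zip(cells, credit):
            self.w[i] += self.lr * error * c / total
            self.count[i] += 1
```

Setting `p=0.0` recovers the conventional rule in which every activated cell receives the same share of the error, so the two schemes can be compared by changing a single parameter.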

20.
We describe an implementation of a vector quantization codebook design algorithm based on the frequency-sensitive competitive learning artificial neural network. The implementation, designed for use on high-performance computers, employs both multitasking and vectorization techniques. A C version of the algorithm tested on a CRAY Y-MP8/864 is discussed. We show how the implementation can be used to perform vector quantization, and demonstrate its use in compressing digital video image data. Two images are used, with codebooks of various sizes, to test the performance of the implementation. The results show that the supercomputer techniques employed have significantly decreased the total execution time without affecting vector quantization performance. This work was supported by a Cray University Research Award and by NASA Lewis research grant number NAG3-1164.
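A serial sketch of frequency-sensitive competitive learning for codebook design follows. It is written in Python for brevity and is not the multitasked, vectorized C implementation the abstract describes; the constant learning rate, the initialization from sampled training vectors, and the function names are assumptions made for this example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fscl_codebook(vectors, n_codewords, epochs=20, lr=0.1, seed=0):
    """Train a codebook with frequency-sensitive competitive learning.

    The winner minimizes (win count) * (distortion), so codewords that have
    won often are handicapped and every codeword tends to get used.
    """
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, n_codewords)]
    wins = [1] * n_codewords
    for _ in range(epochs):
        for v in vectors:
            scores = [wins[i] * sq_dist(v, c) for i, c in enumerate(codebook)]
            k = scores.index(min(scores))
            # Move only the winning codeword toward the training vector
            codebook[k] = [c + lr * (x - c) for c, x in zip(codebook[k], v)]
            wins[k] += 1
    return codebook, wins

def quantize(v, codebook):
    """Index of the nearest codeword (plain distortion, no frequency term)."""
    dists = [sq_dist(v, c) for c in codebook]
    return dists.index(min(dists))
```

For image compression, each training vector would typically be a flattened block of pixels, and `quantize` would replace each block with the index of its codeword. The distance computations inside the winner search are independent across codewords, which is the kind of loop that lends itself to vectorization.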
