Similar Articles
 20 similar articles found (search time: 750 ms)
1.
The probabilistic RAM (pRAM) is a hardware-realizable neural device that is stochastic in operation and highly nonlinear. Even small nets of pRAMs offer high levels of functionality. We show how a pRAM network generalizes when trained in noise and describe the resulting behavior.

2.
Second-order training of adaptive critics for online process control.
This paper deals with reinforcement learning for process modeling and control using a model-free, action-dependent adaptive critic (ADAC). A new modified recursive Levenberg-Marquardt (RLM) training algorithm, called temporal difference RLM, is developed to improve the ADAC performance. Novel application results for a simulated continuously stirred tank reactor process are included to show the superiority of the new algorithm to conventional temporal-difference stochastic backpropagation.

3.
This paper introduces ANASA (adaptive neural algorithm of stochastic activation), a new, efficient reinforcement learning algorithm for training neural units and networks with continuous output. The proposed method employs concepts from self-organizing neural network theory and from reinforcement estimator learning algorithms to extract and exploit information from previous input pattern presentations. In addition, it uses an adaptive learning rate function and a self-adjusting stochastic activation to accelerate the learning process. A form of optimal performance of the ANASA algorithm is proved (under a set of assumptions) via strong convergence theorems and concepts. Experimentally, the new algorithm yields results superior to existing associative reinforcement learning methods in terms of accuracy and convergence rate. The rapid convergence of ANASA is demonstrated in a simple learning task, where it is used as a single neural unit, and in mathematical function modeling problems, where it is used to train various multilayered neural networks.
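The kind of stochastic, reward-driven unit described above can be illustrated with a minimal sketch of a stochastic real-valued unit, a simpler relative of ANASA's self-adjusting stochastic activation. All names, step sizes, and the toy reward below are illustrative, not from the paper:

```python
import numpy as np

def srv_step(w, sigma, x, reward_fn, rng, alpha=0.05):
    """One update of a stochastic real-valued unit: perturb the mean output
    with Gaussian noise, observe a scalar reward, and move the weights along
    the perturbation direction scaled by the reward."""
    mean = float(w @ x)
    y = mean + sigma * rng.standard_normal()
    r = reward_fn(y)
    w = w + alpha * r * (y - mean) / sigma * x
    return w, y, r
```

Repeating this step drives the unit's mean output toward actions that earn higher reward, without any gradient of the reward function being available.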

4.
Adaptive virtual cut-through is considered a viable alternative to wormhole switching for fast and hardware-efficient interprocessor communication in multicomputers. Computer simulations show that our implementation of a minimal-path, fully adaptive virtual cut-through algorithm outperforms both deterministic and adaptive wormhole switching methods under uniform random message distributions as well as clustered distributions such as the matrix transpose. A hardware-efficient implementation of adaptive virtual cut-through has been realized in a semi-custom router chip that requires only 2.3% more area than a comparable deterministic wormhole router chip. A network interface controller chip, which is crucial to our adaptive virtual cut-through method, has also been designed and is under fabrication.

5.
We present a system for rapidly and easily building instructable and self-adaptive software agents that retrieve and extract information. Our Wisconsin Adaptive Web Assistant (WAWA) constructs intelligent agents by accepting user preferences in the form of instructions. These user-provided instructions are compiled into neural networks that are responsible for the adaptive capabilities of an intelligent agent. The agent’s neural networks are modified via user-provided and system-constructed training examples. Users can create training examples by rating Web pages (or documents), but more importantly, WAWA’s agents use techniques from reinforcement learning to internally create their own examples. Users can also provide additional instruction throughout the life of an agent. Our experimental evaluations on a ‘home-page finder’ agent and a ‘seminar-announcement extractor’ agent illustrate the value of using instructable and adaptive agents for retrieving and extracting information.

6.
This paper presents an adaptive polar-space motion controller for trajectory tracking and stabilization of a three-wheeled, embedded omnidirectional mobile robot with parameter variations and uncertainties caused by friction, slip, and payloads. With the dynamic model derived in polar coordinates, an adaptive motion controller is synthesized via the adaptive backstepping approach. The proposed polar-space robust adaptive motion controller was implemented on an embedded processor using a field-programmable gate array (FPGA) chip. The embedded controller works together with a reusable user IP (intellectual property) core library and an embedded real-time operating system (RTOS) on the same chip, using hardware/software co-design and SoPC (system-on-a-programmable-chip) technology, to steer the mobile robot along the desired trajectory. Simulations show the merit of the proposed polar-space control method in comparison with a conventional proportional-integral (PI) feedback controller and a non-adaptive polar-space kinematic controller. Finally, the effectiveness and performance of the proposed embedded adaptive motion controller are demonstrated in several experiments on steering an embedded omnidirectional mobile robot.

7.
The IBM RS/6000 SP is one of the most successful commercially available multicomputers. SP owes its success partially to its scalable, high-bandwidth, low-latency network. This paper describes the architecture of the Switch2 switch chip, the recently developed third-generation switching element on which future IBM RS/6000 SP systems may be based. Switch2 offers significant enhancements over the existing SP switch chips by incorporating advances in both VLSI technology and interconnection network research. One of its major new features, unique to this chip, is support for adaptive source routing. We describe the adaptive source routing architecture of the Switch2 chip and evaluate the performance of adaptive source routing and oblivious routing over a wide range of system characteristics and traffic patterns, showing that adaptive source routing outperforms or performs comparably with oblivious routing. We propose two novel algorithms for generating the adaptive route specifications required to use adaptive source routing, and discuss the cost of these algorithms against the performance improvement they provide. We also propose several output selection functions for implementing adaptive routing in switching elements; evaluating and comparing them, we find that the best selection functions for BMINs do not depend on the traffic pattern, message size, or system size.

8.
This paper addresses the design of an adaptive trajectory-tracking controller for a space continuum manipulator performing on-orbit operation tasks. First, for the strongly nonlinear dynamic model of the continuum manipulator, a variable-structure dynamic controller is designed that accounts for modeling errors and external disturbances during motion. Then, deep reinforcement learning (DRL) is used to adjust the variable-structure controller's parameters online, optimizing controller performance in real time. Finally, a sparse training method for the reinforcement learning network is proposed: during training, the fully connected layers of the neural network are replaced with sparsely connected layers having random sparse topology, and weakly connected links are iteratively pruned with a given probability, so that the DRL policy network evolves from its initial sparse topology into a scale-free network, compressing the network without degrading training accuracy. Simulation results show that the proposed reinforcement-learning-based adaptive controller effectively tracks trajectories of the continuum manipulator; with sparse learning, the controller preserves control accuracy while reducing the parameter count of the two-hidden-layer network by 99%, greatly cutting computational cost.

9.
The attitude control of a satellite is often characterized by a limit cycle, caused by measurement inaccuracies and noise in the sensor output. To reduce the limit cycle, a nonlinear fuzzy controller was applied. The controller was tuned by means of reinforcement learning without using any model of the sensors or the satellite. The reinforcement signal is computed as a fuzzy performance measure using a noncompensatory aggregation of two control subgoals. Convergence of the reinforcement learning scheme is improved by computing the temporal difference error over several time steps and adapting the critic and the controller at a lower sampling rate. The results show that an adaptive fuzzy controller can cope with sensor noise and nonlinearities better than a standard linear controller.
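The temporal difference error computed over several time steps, as used above, can be sketched as a generic n-step TD error (this is the standard formulation, not necessarily the paper's exact one):

```python
def n_step_td_error(rewards, values, gamma):
    """TD error over several time steps: the discounted sum of rewards in
    the window, plus the bootstrapped value at the window's end, minus the
    value at its start. `values` holds V(s_t), ..., V(s_{t+n})."""
    n = len(rewards)
    g = sum(gamma ** k * r for k, r in enumerate(rewards))
    return g + gamma ** n * values[-1] - values[0]
```

With a window of one step this reduces to the usual one-step TD error r + γV(s') − V(s).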

10.
A reinforcement learning method based on K-means clustering with adaptive discretization of continuous states
文锋  陈宗海  卓睿  周光明 《控制与决策》2006,21(2):143-148
A clustering algorithm is used to adaptively discretize a continuous state space, yielding a reinforcement learning method based on K-means clustering. The learning process has two parts: state-space learning, which adaptively discretizes the continuous state space using the K-means clustering algorithm; and policy learning, which searches for the optimal policy using the Sarsa learning algorithm with replacing eligibility traces. Simulation experiments on benchmark reinforcement learning problems with continuous states show that the method achieves adaptive discretization of the continuous state space and ultimately learns the optimal policy. Compared with a reinforcement learning method based on a CMAC network, the method saves storage space and shortens computation time.
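The state-space-learning half of this idea — cluster centres acting as discrete states for a tabular learner — might be sketched as follows. The function names and toy data are illustrative, not the paper's implementation:

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Plain K-means, used here to adaptively discretise a continuous
    state space into k regions (cluster centres)."""
    rng = np.random.default_rng(seed)
    centres = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # assign each point to its nearest centre, then recompute centres
        labels = np.argmin(((points[:, None] - centres[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = points[labels == j].mean(axis=0)
    return centres

def discretise(state, centres):
    """Map a continuous state to the index of its nearest cluster centre;
    this index then plays the role of a discrete state for Sarsa."""
    state = np.asarray(state, dtype=float)
    return int(np.argmin(((centres - state) ** 2).sum(-1)))
```

The returned index can be used directly as the row of a tabular Q-function, which is what makes the approach cheaper in storage than a CMAC-style function approximator.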

11.
DSP implementation of the speech chip in an IP telephony system
A speech chip with G.723.1 speech coding/decoding, adaptive echo cancellation, and DTMF encoding/decoding is presented to meet the low-bit-rate voice communication needs of IP telephony systems. Exploiting the characteristics of the DSP core, the G.723.1 algorithm is optimized to reduce its computational complexity; echo cancellation is implemented with a normalized least-mean-square (NLMS) adaptive filter, and DTMF detection uses the Goertzel algorithm. All of these functions are integrated on a single TMS320C5402 chip using modest resources.

12.
This paper proposes an adaptive critic tracking control design for a class of nonlinear systems using fuzzy basis function networks (FBFNs). The key component of the adaptive critic controller is the FBFN, which implements an associative learning network (ALN) to approximate unknown nonlinear system functions, and an adaptive critic network (ACN) to generate the internal reinforcement learning signal to tune the ALN. Another important component, the reinforcement learning signal generator, requires the solution of a linear matrix inequality (LMI), which should also be satisfied to ensure stability. Furthermore, the robust control technique can easily reject the effects of the approximation errors of the FBFN and external disturbances. Unlike traditional adaptive critic controllers that learn from trial-and-error interactions, the proposed on-line tuning algorithm for ALN and ACN is derived from Lyapunov theory, thereby significantly shortening the learning time. Simulation results of a cart-pole system demonstrate the effectiveness of the proposed FBFN-based adaptive critic controller.

13.
The traditional maintenance training method with a physical model of the product is costly and inconvenient. Computer-aided instruction (CAI) technology along with multimedia can provide much help in the training but provides limited interaction between the user and the system. In this article, a 3D model-based product structure browsing system for maintenance training, CAMT, is developed for complex products adopting desktop virtual reality technology. To improve training performance, the interaction between the trainee and the CAMT system is enhanced by adaptive change of the zoom level, mouse sensitivity, and rotation origin. Details about the implementation of adaptive interaction are discussed. Experiments were conducted to test the effectiveness of this adaptive interaction. Seventy participants were arranged randomly into two groups assigned to perform product structure learning tasks using software with or without adaptive interaction functions. Statistical analysis showed that there were significant differences between the groups in task time, operation convenience, and learning satisfaction. Most participants preferred to use the system with adaptive interaction. It may be concluded that using adaptive interaction with maintenance training systems can significantly improve the usability of the systems and the efficiency of interactive learning. Although adaptive interaction has obvious advantages, our experiment also suggested that it is not a good idea to provide only the adaptive interaction mode. It is better to set adaptive interaction as the default mode but also to provide the possibility for a user to switch to a mode without adaptive interaction because a static view scope is also helpful to learn the 3D structure of a complex product. © 2007 Wiley Periodicals, Inc.

14.
Model reference adaptive control based on reinforcement learning
A model reference adaptive control method based on reinforcement learning is proposed. The controller uses the adaptive heuristic critic algorithm and consists of two parts: an adaptive critic element and an associative search element. The reference model supplies the system's performance index, and the reinforcement signal fed back from the system is used to update the controller parameters online. Simulation results show that the method achieves stable and robust control of a class of complex nonlinear systems, with fast response, a high learning rate, and good real-time performance.
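The two-part structure described above — adaptive critic element plus associative search element — follows the classic adaptive heuristic critic pattern, which on linear features can be sketched in one update step. All symbols and step sizes here are illustrative, not the paper's:

```python
import numpy as np

def ahc_update(w_critic, w_actor, phi, r, phi_next, action_noise,
               gamma=0.95, alpha=0.1, beta=0.05):
    """One adaptive-heuristic-critic step on feature vector phi: the critic
    element forms a TD error (internal reinforcement), and the associative
    search element is nudged along the exploration noise it emitted."""
    td = r + gamma * float(w_critic @ phi_next) - float(w_critic @ phi)
    w_critic = w_critic + alpha * td * phi          # critic: TD update
    w_actor = w_actor + beta * td * action_noise * phi  # actor: correlate noise with TD
    return w_critic, w_actor, td
```

Here `action_noise` is the exploratory perturbation the actor added to its output; a positive TD error reinforces perturbations that improved the predicted return.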

15.
陈根社  朱志刚 《控制与决策》1994,9(5):391-393,400
This paper studies an indirect hybrid adaptive control algorithm, realized with parallel processing, for plants with unmodeled dynamics. The sequential least-squares estimator with periodic covariance resetting and the compensator gain computation are redesigned, and a structure amenable to VLSI systolic-array implementation is given, speeding up parameter synthesis for a high-speed, high-performance adaptive controller.

16.
This paper proposes a reinforcement fuzzy adaptive learning control network (RFALCON), constructed by integrating two fuzzy adaptive learning control networks (FALCON), each of which has a feedforward multilayer network and is developed for the realization of a fuzzy controller. One FALCON performs as a critic network (fuzzy predictor), the other as an action network (fuzzy controller). Using temporal difference prediction, the critic network can predict the external reinforcement signal and provide a more informative internal reinforcement signal to the action network. The action network performs a stochastic exploratory algorithm to adapt itself according to the internal reinforcement signal. An ART-based reinforcement structure/parameter-learning algorithm is developed for constructing the RFALCON dynamically. During the learning process, structure and parameter learning are performed simultaneously. RFALCON can construct a fuzzy control system through a reward/penalty signal. It has two important features: it reduces the combinatorial demands of system adaptive linearization, and it is highly autonomous.

17.
A reinforcement scheme for learning automata, applicable to real situations where the reinforcement received from the environment is delayed, is presented. The scheme divides the state space into regions following the boxes approach of Michie and Chambers. Each region maintains estimates of the reward characteristics of the environment and contains a local automaton that updates action probabilities whenever the system state enters it. Estimates of reward characteristics are obtained using reinforcement received during the period of eligibility. Results obtained through computer simulation of the inverted pendulum problem are compared with the adaptive critic learning developed by Barto et al. (1983).
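The boxes-style partitioning of the state space can be sketched as a simple indexing function: each state variable is binned over its range, and the bin indices are combined into a single region index. Bounds and bin counts below are illustrative:

```python
def box_index(state, bounds, bins):
    """Michie & Chambers 'boxes': map each state variable to a bin over its
    range and combine the bin indices into one region index, which selects
    the local automaton for that region."""
    idx = 0
    for x, (lo, hi), n in zip(state, bounds, bins):
        b = min(n - 1, max(0, int((x - lo) / (hi - lo) * n)))  # clamp to [0, n-1]
        idx = idx * n + b
    return idx
```

Each region index then owns its own action probabilities and reward estimates, so the delayed reinforcement is credited only to the regions the trajectory actually visited.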

18.
Different reinforcement learning algorithms perform differently in different environments, which makes it hard to choose an algorithm for a particular setting. To address this, a simulated vehicle platform for reinforcement learning was built on Gym and Gazebo and used to test Q-Learning, Sarsa, and DQN on locomotion control of a two-wheeled model vehicle. Three maps of differing complexity were used to test the effectiveness and robustness of the algorithms under the same number of training episodes. The experimental results matched expectations: Q-Learning achieved high reward on the simpler maps; Sarsa was more stable, converging faster and performing better; and DQN showed the best convergence and robustness. The platform provides an effective way to simulate physical motion control in a virtual environment.
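The key difference between two of the algorithms compared — Q-Learning bootstraps off-policy on the greedy next action, Sarsa on-policy on the action actually taken — fits in a few lines. This is a generic tabular sketch, not the platform's code:

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    # off-policy: bootstrap on the best action available in the next state
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    # on-policy: bootstrap on the action the behavior policy actually took
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])
```

This distinction is what makes Sarsa's learned values reflect its own (exploratory) behavior, which is one common explanation for its greater stability on control tasks.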

19.
Model-based reinforcement learning methods build a model of the environment from collected samples and use the learned model to generate virtual samples that assist training, promising higher sample efficiency. However, because training samples are scarce, the learned model is often inaccurate, and the samples it generates carry prediction errors that disturb the training process. To address this, a learnable sample-weighting mechanism is proposed that reweights generated samples to reduce their negative impact on training. A sample's impact is quantified as follows: the value and policy networks are first updated with the sample under evaluation, the loss on real samples is computed before and after the update, and the change in loss measures the sample's influence on training. Experimental results show that a reinforcement learning algorithm designed with this weighting mechanism outperforms existing model-based and model-free algorithms on multiple tasks.
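The influence measurement described above can be sketched in scalar form: take a trial update on the candidate sample, then compare the loss on real data before and after. The paper operates on full value and policy networks; here a single parameter and hypothetical `loss`/`grad` callables stand in:

```python
import numpy as np

def influence_weight(theta, candidate, real_batch, loss, grad, lr=0.1):
    """Weight a model-generated sample by the loss change it induces on real
    samples: take a trial gradient step on the candidate, compare the
    real-data loss before and after, and squash the change into (0, 1).
    A loss drop on real data yields a weight above 0.5."""
    before = loss(theta, real_batch)
    trial = theta - lr * grad(theta, candidate)   # trial update on the candidate
    after = loss(trial, real_batch)
    return 1.0 / (1.0 + np.exp(after - before))   # sigmoid of the loss change
```

Samples whose trial update hurts the real-data loss receive a weight below 0.5 and thus contribute less to subsequent training.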

20.
This paper provides an overview of the reinforcement learning and optimal adaptive control literature and its application to robotics. Reinforcement learning is bridging the gap between traditional optimal control, adaptive control, and bio-inspired learning techniques borrowed from animals. This work highlights some of the key techniques presented by well-known researchers from the combined areas of reinforcement learning and optimal control theory. Finally, an implementation of a novel model-free Q-learning-based discrete optimal adaptive controller for a humanoid robot arm is presented. The controller uses a novel adaptive dynamic programming (ADP) reinforcement learning (RL) approach to develop an optimal policy on-line. The RL joint-space tracking controller was implemented for two links (shoulder flexion and elbow flexion joints) of the arm of the humanoid Bristol-Elumotion-Robotic-Torso II (BERT II) torso. The constrained case (joint limits) of the RL scheme was tested for a single link (elbow flexion) of the BERT II arm by modifying the cost function to deal with the extra nonlinearity due to the joint constraints.
