1.
Recently, much attention has been paid to intelligent systems that can adapt themselves to dynamic and/or unknown environments by the use of learning methods. However, traditional learning methods have the disadvantage that the learning time grows enormously with the complexity of the systems and environments to be considered. We therefore propose a novel reinforcement learning method based on adaptive immunity. The proposed method can provide a near-optimal solution with less learning time by self-learning using the concept of adaptive immunity. The validity of the method is demonstrated through simulations of Sutton's maze problem.
This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February 2, 2008.
2.
One of the difficulties encountered in applying reinforcement learning methods to real-world problems is their limited ability to cope with large-scale or continuous spaces. To address the curse-of-dimensionality problem that results from discretizing continuous state or action spaces, a new fuzzy Actor-Critic reinforcement learning network (FACRLN) based on a fuzzy radial basis function (FRBF) neural network is proposed. The architecture of FACRLN is realized by a four-layer FRBF neural network that approximates both the action value function of the Actor and the state value function of the Critic simultaneously. The Actor and the Critic networks share the input, rule, and normalized layers of the FRBF network, which reduces the learning system's storage requirements and avoids repeated computation of the rule-unit outputs. Moreover, the FRBF network can adjust its structure and parameters adaptively with a novel self-organizing approach according to the complexity of the task and the progress of learning, which keeps the network economical in size. Experimental studies on cart-pole balancing control illustrate the performance and applicability of the proposed FACRLN.
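As a rough illustration of the shared-layer idea described above (not the authors' implementation; the Gaussian rule layer, layer sizes, and the TD update rule are assumptions), a normalized fuzzy-RBF rule layer can feed both an actor head and a critic head:

```python
import numpy as np

class SharedFRBFActorCritic:
    """Sketch of a fuzzy-RBF network whose rule layer feeds both an
    actor head and a critic head (names and sizes are illustrative)."""

    def __init__(self, n_inputs, n_rules, n_actions, sigma=1.0):
        rng = np.random.default_rng(0)
        self.centers = rng.uniform(-1, 1, (n_rules, n_inputs))  # rule centers
        self.sigma = sigma
        self.actor_w = np.zeros((n_rules, n_actions))   # actor head weights
        self.critic_w = np.zeros(n_rules)               # critic head weights

    def rule_layer(self, x):
        # Gaussian firing strength of each fuzzy rule, then normalization
        d2 = np.sum((self.centers - x) ** 2, axis=1)
        phi = np.exp(-d2 / (2 * self.sigma ** 2))
        return phi / (phi.sum() + 1e-12)

    def forward(self, x):
        phi = self.rule_layer(x)             # shared computation
        action_values = phi @ self.actor_w   # actor output
        state_value = phi @ self.critic_w    # critic output
        return phi, action_values, state_value

    def td_update(self, x, a, r, x_next, gamma=0.95, alpha=0.1):
        phi, _, v = self.forward(x)
        _, _, v_next = self.forward(x_next)
        delta = r + gamma * v_next - v       # TD error drives both heads
        self.critic_w += alpha * delta * phi
        self.actor_w[:, a] += alpha * delta * phi
```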
3.
Fuzzy Sarsa learning (FSL) is a fuzzy reinforcement learning algorithm derived from Sarsa learning. It is an on-policy algorithm that approximates the action value function, and within each fuzzy rule the next action is selected according to the Softmax formula. For complex learning tasks in continuous spaces, FSL cannot balance exploration and exploitation well. To address this, this paper proposes a new fuzzy reinforcement learning algorithm based on ant colony optimization (ACO-FSL); the main contribution is to combine the idea of ant colony optimization (ACO) with the traditional fuzzy reinforcement learning algorithm to form a new algorithm. The design principle, method, and concrete steps of the algorithm are given. Simulation experiments on the mountain-car problem show that the proposed ACO-FSL algorithm outperforms FSL in learning speed and stability.
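The per-rule Softmax action selection used by FSL, and one plausible way of biasing it with ant-colony pheromone trails, could look like the following sketch (the blending of Q values and pheromones and all parameter names are assumptions; the abstract does not give the exact ACO-FSL formulas):

```python
import numpy as np

def select_actions_per_rule(q, pheromone, tau=0.5, beta=1.0, rng=None):
    """Pick one candidate action for each fuzzy rule.

    q          : (n_rules, n_actions) action values of the candidates
    pheromone  : (n_rules, n_actions) ant-colony style desirability
    tau, beta  : Softmax temperature and pheromone weight (illustrative)
    """
    rng = rng or np.random.default_rng()
    # Blend value estimates with pheromone trails before the Softmax,
    # one plausible way to bias exploration toward promising actions.
    scores = q / tau + beta * np.log(pheromone + 1e-12)
    scores -= scores.max(axis=1, keepdims=True)          # numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    return np.array([rng.choice(q.shape[1], p=p) for p in probs])
```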
4.
This paper addresses a new method for combining supervised learning and reinforcement learning (RL). Applying supervised learning to robot navigation encounters serious challenges such as inconsistent and noisy data, the difficulty of gathering training data, and high error rates in the training data. RL capabilities such as training with only a scalar evaluation signal and a high degree of exploration have encouraged researchers to use RL for the robot navigation problem. However, RL algorithms are time-consuming and suffer from a high failure rate in the training phase. Here, we propose Supervised Fuzzy Sarsa Learning (SFSL) as a novel way of exploiting the advantages of both supervised and reinforcement learning. A zero-order Takagi–Sugeno fuzzy controller with several candidate actions for each rule is the main module of the robot's controller. The aim of training is to find the best action for each fuzzy rule. In the first step, a human supervisor drives an E-puck robot within the environment and the training data are gathered. In the second step, as a hard tuning, the training data are used to initialize the value (worth) of each candidate action in the fuzzy rules. Afterwards, the fuzzy Sarsa learning module, as a critic-only fuzzy reinforcement learner, fine-tunes the parameters of the conclusion parts of the fuzzy controller online. The proposed algorithm is used to drive an E-puck robot in an environment with obstacles. The experimental results show that the proposed approach decreases the learning time and the number of failures, and improves the quality of the robot's motion in the testing environments.
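A minimal sketch of the two training steps, the hard tuning from demonstration data and the online fuzzy Sarsa fine-tuning, assuming normalized rule firing strengths, a shared discrete candidate-action set, and one chosen candidate per rule (the exact SFSL initialization and update rules are not given in the abstract):

```python
import numpy as np

def initialize_from_demonstrations(firing, demo_actions, n_rules, n_candidates):
    """Hard-tuning step (illustrative): credit each rule's candidate actions
    by how often they match the supervisor's action, weighted by the rule's
    firing strength on the demonstration samples.

    firing       : (n_samples, n_rules)  normalized rule firing strengths
    demo_actions : (n_samples,)          index of the supervisor's action
    """
    q = np.zeros((n_rules, n_candidates))
    for phi, a in zip(firing, demo_actions):
        q[:, a] += phi                      # accumulate firing-weighted evidence
    return q / (q.sum(axis=1, keepdims=True) + 1e-12)

def fuzzy_sarsa_update(q, phi, a, r, phi_next, a_next, gamma=0.9, alpha=0.05):
    """One on-policy update of the per-rule action values (sketch).

    phi, phi_next : normalized firing strengths of all rules, shape (n_rules,)
    a, a_next     : candidate action chosen by each rule, shape (n_rules,)
    """
    rules = np.arange(q.shape[0])
    q_sa = phi @ q[rules, a]                 # fuzzy-weighted Q of current step
    q_next = phi_next @ q[rules, a_next]     # fuzzy-weighted Q of next step
    delta = r + gamma * q_next - q_sa        # Sarsa temporal-difference error
    q[rules, a] += alpha * delta * phi       # each rule updated by its firing
    return q
```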
5.
Fuzzy logic systems are promising for efficient obstacle avoidance. However, it is difficult to maintain the correctness, consistency, and completeness of a fuzzy rule base constructed and tuned by a human expert. A reinforcement learning method is capable of learning the fuzzy rules automatically, but it incurs a heavy learning phase and may result in an insufficiently learned rule base due to the curse of dimensionality. In this paper, we propose a neural fuzzy system with mixed coarse learning and fine learning phases. In the first phase, a supervised learning method is used to determine the membership functions for the input and output variables simultaneously. After sufficient training, fine learning is applied, which employs a reinforcement learning algorithm to fine-tune the membership functions for the output variables. For sufficient learning, a new learning method using a modification of Sutton and Barto's model is proposed to strengthen the exploration. Through this two-step tuning approach, the mobile robot is able to perform collision-free navigation. To deal with the difficulty of acquiring a large amount of training data with high consistency for supervised learning, we develop a virtual environment (VE) simulator, which is able to provide desktop virtual environment (DVE) and immersive virtual environment (IVE) visualization. By having a skilled human operator drive a mobile robot in the virtual environment (DVE/IVE), training data are readily obtained and used to train the neural fuzzy system.
6.
This paper studies evolutionary programming and adopts reinforcement learning theory to learn individual mutation operators. A novel algorithm named RLEP (Evolutionary Programming based on Reinforcement Learning) is proposed. In this algorithm, each individual learns its optimal mutation operator based on the immediate and delayed performance of the mutation operators. Mutation operator selection is thus mapped onto a reinforcement learning problem, where reinforcement learning methods learn optimal policies by maximizing the accumulated rewards. According to the learned Q function value of each candidate mutation operator, the operator that maximizes this value is selected. Four different mutation operators are employed as the basic candidates in RLEP, and one is selected for each individual in each generation. Our simulations show that the performance of RLEP is as good as or better than the best of the four basic mutation operators.
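A hedged sketch of the operator-selection mechanism, with per-individual Q values over candidate mutation operators and the fitness improvement used as reward (the operator names, the epsilon-greedy choice, and the exact update form are illustrative assumptions):

```python
import numpy as np

class MutationOperatorSelector:
    """Per-individual Q-learning over candidate mutation operators (sketch).

    The operator list is a placeholder; the abstract only says four basic
    candidate operators are used."""

    def __init__(self, operators, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.operators = operators              # e.g. [gaussian, cauchy, levy, uniform]
        self.q = np.zeros(len(operators))       # Q value of each operator
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = np.random.default_rng()

    def choose(self):
        if self.rng.random() < self.epsilon:    # occasional exploration
            return int(self.rng.integers(len(self.operators)))
        return int(np.argmax(self.q))           # operator with the highest Q

    def update(self, op_index, fitness_before, fitness_after):
        # Reward the operator by the fitness improvement it produced
        reward = fitness_before - fitness_after          # minimization problem
        target = reward + self.gamma * np.max(self.q)
        self.q[op_index] += self.alpha * (target - self.q[op_index])
```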
7.
The democratization of robotics technology and the development of new actuators progressively bring robots closer to humans. The applications that can now be envisaged contrast drastically with the requirements of industrial robots. In standard manufacturing settings, the criteria used to assess performance are usually related to the robot's accuracy, repeatability, speed, or stiffness. Learning a control policy to actuate such robots is characterized by the search for a single solution to the task, with a representation of the policy consisting of moving the robot through a set of points to follow a trajectory. With new environments such as homes and offices populated with humans, reproduction performance is assessed differently. These robots are expected to acquire rich motor skills that can be generalized to new situations, while behaving safely in the vicinity of users. Skill acquisition can no longer be guided by a single form of learning, and must instead combine different approaches to continuously create, adapt, and refine policies. The family of search strategies based on expectation-maximization (EM) looks particularly promising for coping with these new requirements. The exploration can be performed directly in the policy parameter space, by refining the policy together with exploration parameters represented in the form of covariances. With this formulation, reinforcement learning (RL) can be extended to a multi-optima search problem in which several policy alternatives can be considered. We present two applications exploiting EM-based exploration strategies, by considering parameterized policies based on dynamical systems and by using Gaussian mixture models for the search of multiple policy alternatives.
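One common EM-style formulation of this kind of exploration, reward-weighted refitting of a Gaussian over policy parameters so that the mean and the exploration covariance are refined together, is sketched below (a generic sketch, not the specific algorithms of the paper; `rollout_return` and all parameters are assumed placeholders):

```python
import numpy as np

def em_policy_update(theta_mean, theta_cov, n_samples, rollout_return, beta=5.0):
    """One EM-style exploration step in policy-parameter space (sketch).

    Samples policy parameters from a Gaussian, weights them by a soft-max of
    their returns, and refits the mean and covariance, so the exploration
    magnitude is adapted together with the policy.  `rollout_return` is a
    user-supplied function mapping a parameter vector to a scalar return.
    """
    rng = rng = np.random.default_rng()
    samples = rng.multivariate_normal(theta_mean, theta_cov, size=n_samples)
    returns = np.array([rollout_return(s) for s in samples])
    # Exponentiated returns act as (pseudo-)responsibilities in the M-step
    w = np.exp(beta * (returns - returns.max()))
    w /= w.sum()
    new_mean = w @ samples
    centered = samples - new_mean
    new_cov = (centered * w[:, None]).T @ centered + 1e-6 * np.eye(len(theta_mean))
    return new_mean, new_cov
```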
8.
Stock trading is an important decision-making problem that involves both stock selection and asset management. Though many promising results have been reported for predicting prices, selecting stocks, and managing assets using machine-learning techniques, considering all of them together is challenging because of their complexity. In this paper, we present a new stock trading method that incorporates dynamic asset allocation in a reinforcement-learning framework. The proposed asset allocation strategy, called meta policy (MP), is designed to utilize the temporal information from both stock recommendations and the ratio of the stock fund over the asset. Local traders are constructed with pattern-based multiple predictors and used to decide the purchase amount per recommendation. Formulating the MP in the reinforcement-learning framework is achieved by a compact design of the environment and the learning agent. Experimental results using the Korean stock market show that the proposed MP method outperforms other fixed asset-allocation strategies and reduces the risks inherent in local traders.
9.
Neural Computing and Applications - In this paper, an adaptive fuzzy control approach for incommensurate fractional-order multi-input multi-output (MIMO) systems with unknown nonlinearities and...
10.
This paper proposes a three-layered parallel fuzzy inference model called reinforcement fuzzy neural network with distributed prediction scheme (RFNN-DPS), which performs reinforcement learning with a novel distributed prediction scheme. In RFNN-DPS, an additional predictor for predicting the external reinforcement signal is not necessary, and the internal reinforcement information is distributed into fuzzy rules (rule nodes). Therefore, using RFNN-DPS, only one network is needed to construct a fuzzy logic system with the abilities of parallel inference and reinforcement learning. Basically, the information for prediction in RFNN-DPS is composed of credit values stored in fuzzy rule nodes, where each node holds a credit vector to represent the reliability of the corresponding fuzzy rule. The credit values are not only accessed for predicting external reinforcement signals, but also provide a more profitable internal reinforcement signal to each fuzzy rule itself. RFNN-DPS performs a credit-based exploratory algorithm to adjust its internal status according to the internal reinforcement signal. During learning, the RFNN-DPS network is constructed by a single-step or multistep reinforcement learning algorithm based on the ART concept. According to our experimental results, RFNN-DPS shows the advantages of simple network structure, fast learning speed, and explicit representation of rule reliability.
12.
In this study, a novel image-based visual servo (IBVS) controller for robot manipulators is investigated using an optimized extreme learning machine (ELM) algorithm and an offline reinforcement learning (RL) algorithm. First, the classical IBVS method and its difficulties in accurately estimating the image interaction matrix and avoiding the singularity of the pseudo-inverse are introduced. Subsequently, an IBVS method based on ELM and RL is proposed to solve the singularity problem of the pseudo-inverse solution and to tune the servo gain adaptively, improving servo efficiency and stability. Specifically, the ELM algorithm optimized by particle swarm optimization (PSO) is used to approximate the pseudo-inverse of the image interaction matrix and thereby reduce the influence of camera calibration errors. The RL algorithm is then adopted to tune the adaptive visual servo gain in continuous space and improve the convergence speed. Finally, comparative simulation experiments on a 6-DOF robot manipulator were conducted to verify the effectiveness of the proposed IBVS controller.
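A baseline ELM regressor of the kind used to approximate the mapping otherwise provided by the pseudo-inverse of the interaction matrix is sketched below (without the PSO optimization of the hidden layer described in the abstract; the names and training data are illustrative):

```python
import numpy as np

class ELMRegressor:
    """Generic extreme learning machine regressor (sketch).

    Random input weights, closed-form output weights.  In the paper the
    hidden-layer parameters are further tuned by PSO; here they are simply
    drawn at random, which is the baseline ELM formulation."""

    def __init__(self, n_hidden=50, reg=1e-3, seed=0):
        self.n_hidden, self.reg, self.seed = n_hidden, reg, seed

    def fit(self, X, Y):
        rng = np.random.default_rng(self.seed)
        self.W = rng.normal(size=(X.shape[1], self.n_hidden))  # random input weights
        self.b = rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)                        # hidden activations
        # Ridge-regularized least squares for the output weights
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ Y)
        return self

    def predict(self, X):
        return np.tanh(X @ self.W + self.b) @ self.beta

# Illustrative use: learn the mapping from image-feature errors to camera
# velocities, sidestepping an explicit pseudo-inverse of the interaction matrix.
# feature_errors, camera_velocities = ...   (training pairs, e.g. from simulation)
# elm = ELMRegressor().fit(feature_errors, camera_velocities)
```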
13.
The finite-time tracking control of an n-link robotic system is studied in the presence of model uncertainties and actuator saturation. First, a smooth function and an adaptive fuzzy neural network online learning algorithm are designed to address the actuator saturation and the uncertainties of the dynamic model. Second, a new finite-time command-filtered technique is proposed to filter the virtual control signal. The improved error compensation signal reduces the impact of filtering errors, and the tracking errors of the system quickly converge to a small compact set within finite time. Finally, the adaptive fuzzy neural network finite-time command-filtered control is shown to achieve finite-time stability via the Lyapunov stability criterion. Simulation results verify the effectiveness of the proposed control.
14.
To address the poor routing performance of existing routing algorithms in wide-area networks, which stems from their inability to adapt to diverse topologies and unbalanced loads, an adaptive routing algorithm based on reinforcement learning and implemented with gradient ascent, RLAR, is proposed. Reinforcement learning means learning a policy, i.e., constructing a mapping from states to actions based on feedback from the environment; in essence, it evaluates a set of policies through trial-and-error interaction with the environment. Applying reinforcement learning to network routing optimization provides a completely new line of thinking for routing research. The algorithm is compared with several existing routing algorithms, and the experimental results show that RLAR can effectively improve network routing performance.
15.
Fuzzy spiking neural P systems (FSN P systems, for short) are a novel class of distributed parallel computing models that can model fuzzy production rules and apply their dynamic firing mechanism to achieve fuzzy reasoning. However, these systems lack adaptive/learning ability. To address this problem, a class of FSN P systems with several new features, called adaptive fuzzy spiking neural P systems (AFSN P systems, for short), is proposed. AFSN P systems can not only model weighted fuzzy production rules in a fuzzy knowledge base but also perform dynamic fuzzy reasoning. It is important to note that AFSN P systems have a learning ability similar to neural networks. Based on the neurons' firing mechanisms, a fuzzy reasoning algorithm and a learning algorithm are developed. Moreover, an example is included to illustrate the learning ability of AFSN P systems.
16.
A new self-tuning fuzzy modeling method based on fuzzy competitive learning is proposed. With fuzzy competitive learning, the fuzzy system is able to carry out adaptive fuzzy inference. On the basis of the tuned fuzzy system, an online identification algorithm for estimating the parameters of nonlinear systems is presented. Finally, simulation results for several examples are given to demonstrate the effectiveness of the proposed algorithm.
17.
Taking handwritten digit recognition as the background problem, an adaptive fuzzy classifier based on a table-lookup learning algorithm is proposed, implemented in Matlab, and evaluated in simulation. The simulation results show that, for handwritten digit recognition, the adaptive fuzzy classifier outperforms a three-layer feedforward classifier trained with the BP algorithm in recognition performance, use of linguistic information, and computational complexity, demonstrating the superiority and potential of adaptive fuzzy techniques for pattern recognition.
18.
This paper proposes a TD (temporal difference) and GA (genetic algorithm)-based reinforcement (TDGAR) learning method and applies it to the control of a real magnetic bearing system. The TDGAR learning scheme is a new hybrid GA, which integrates the TD prediction method and the GA to perform the reinforcement learning task. The TDGAR learning system is composed of two integrated feedforward networks. One neural network acts as a critic network to guide the learning of the other network (the action network) which determines the outputs (actions) of the TDGAR learning system. The action network can be a normal neural network or a neural fuzzy network. Using the TD prediction method, the critic network can predict the external reinforcement signal and provide a more informative internal reinforcement signal to the action network. The action network uses the GA to adapt itself according to the internal reinforcement signal. The key concept of the TDGAR learning scheme is to formulate the internal reinforcement signal as the fitness function for the GA such that the GA can evaluate the candidate solutions (chromosomes) regularly, even during periods without external feedback from the environment. This enables the GA to proceed to new generations regularly without waiting for the arrival of the external reinforcement signal. This can usually accelerate the GA learning since a reinforcement signal may only be available at a time long after a sequence of actions has occurred in the reinforcement learning problem. The proposed TDGAR learning system has been used to control an active magnetic bearing (AMB) system in practice. A systematic design procedure is developed to achieve successful integration of all the subsystems including magnetic suspension, mechanical structure, and controller training. The results show that the TDGAR learning scheme can successfully find a neural controller or a neural fuzzy controller for a self-designed magnetic bearing system.
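The core coupling, a TD critic that converts sparse external reinforcement into a denser internal signal used as the GA's fitness, can be sketched as follows (a generic illustration under assumed interfaces, not the TDGAR implementation):

```python
def internal_reinforcement(critic, state, next_state, external_r, gamma=0.95):
    """TD-style internal reinforcement signal (sketch).

    `critic` is any callable predicting the expected external reinforcement
    for a state; the TD error serves as a denser fitness signal for the GA
    even on steps where the environment gives no external reward."""
    return external_r + gamma * critic(next_state) - critic(state)


def evaluate_chromosome(chromosome, run_episode, critic):
    """Fitness of one candidate action network: accumulated internal
    reinforcement over an episode.  `run_episode` is assumed to yield
    (state, next_state, external_r) transitions produced by the chromosome."""
    return sum(internal_reinforcement(critic, s, s_next, r)
               for s, s_next, r in run_episode(chromosome))
```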
19.
The unit commitment problem (UCP) aims at optimizing the generation cost for meeting a given load demand under several operational constraints. We propose a fuzzy reinforcement learning (RL) approach for an efficient and reliable solution to the unit commitment problem. In particular, we cast UCP as a multiagent fuzzy reinforcement learning task wherein individual generators act as players optimizing the cost of meeting a given load over a twenty-four-hour period. The unit commitment task is fuzzified, and the optimal unit commitment solution is generated by employing RL on this fuzzy multigenerator setup. Our proposed multiagent RL framework does not assume any a priori task or system knowledge, and the generators gradually learn to produce optimal output solely on the basis of their collective generation. We view the UCP as a sequential decision-making task with rewards and penalties for reducing the collective generation cost of the generators. To the best of our knowledge, ours is the first attempt at solving UCP by employing fuzzy reinforcement learning. We test our approach on a ten-generating-unit system with several equality and inequality constraints. Simulation results and comparisons against several recent UCP solution methods demonstrate the superiority and viability of our proposed multiagent fuzzy reinforcement learning technique.
20.
Multiagent systems (MASs) are increasingly popular for modeling distributed environments that are highly complex and dynamic, such as e-commerce, smart buildings, and smart grids. Typically, agents are assumed to be goal-driven with limited abilities, which constrains them to work with other agents to accomplish complex tasks. Trust is considered significant in MASs for making interactions effective, especially when agents cannot be sure that potential partners share the same core beliefs about the system or make accurate statements regarding their competencies and abilities. Because of the imprecise and dynamic nature of trust in MASs, we propose a hybrid trust model that uses fuzzy logic and Q-learning for trust modeling, as an improvement over Q-learning-based trust evaluation. Q-learning is used to estimate trust over the long term, fuzzy inferences are used to aggregate different trust factors, and suspension is used as a short-term response to dynamic changes. The performance of the proposed model is evaluated using simulation. The simulation results indicate that the proposed model can help agents select trustworthy partners to interact with, and that it performs better than some of the popular trust models in the presence of misbehaving interaction partners.
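A compact sketch of how the three ingredients might fit together, a Q-style long-term trust update, a simple aggregation of trust factors standing in for the fuzzy inference stage, and suspension as the short-term reaction (the factor names and thresholds are illustrative assumptions, not from the paper):

```python
import numpy as np

class HybridTrustModel:
    """Sketch of the hybrid idea: long-term trust learned incrementally,
    per-interaction aggregation of trust factors, and suspension of
    partners that misbehave."""

    def __init__(self, alpha=0.2, suspend_threshold=0.2, suspend_steps=5):
        self.q = {}                 # long-term trust estimate per partner
        self.suspended = {}         # remaining suspension steps per partner
        self.alpha = alpha
        self.suspend_threshold = suspend_threshold
        self.suspend_steps = suspend_steps

    def interaction_score(self, factors):
        # Stand-in for the fuzzy inference stage: aggregate trust factors
        # (e.g. success, timeliness, quality), each already scaled to [0, 1].
        return float(np.mean(list(factors.values())))

    def update(self, partner, factors):
        score = self.interaction_score(factors)
        old = self.q.get(partner, 0.5)
        self.q[partner] = old + self.alpha * (score - old)   # Q-style update
        if score < self.suspend_threshold:                   # short-term response
            self.suspended[partner] = self.suspend_steps

    def is_trustworthy(self, partner, min_trust=0.5):
        if self.suspended.get(partner, 0) > 0:
            self.suspended[partner] -= 1
            return False
        return self.q.get(partner, 0.5) >= min_trust
```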