1.
Recently, much attention has been paid to intelligent systems that can adapt themselves to dynamic and/or unknown environments by the use of learning methods. However, traditional learning methods have the disadvantage that the learning time grows enormously with the complexity of the systems and environments to be considered. We therefore propose a novel reinforcement learning method based on adaptive immunity. The proposed method can provide a near-optimal solution with less learning time by self-learning using the concept of adaptive immunity. The validity of the method is demonstrated through simulations of Sutton's maze problem.
This work was presented in part at the 13th International Symposium on Artificial Life and Robotics, Oita, Japan, January 31–February 2, 2008.
2.
One of the difficulties encountered in applying reinforcement learning methods to real-world problems is their limited ability to cope with large-scale or continuous spaces. To address the curse-of-dimensionality problem that results from discretizing continuous state or action spaces, a new fuzzy Actor-Critic reinforcement learning network (FACRLN) based on a fuzzy radial basis function (FRBF) neural network is proposed. The architecture of FACRLN is realized by a four-layer FRBF neural network that approximates both the action value function of the Actor and the state value function of the Critic simultaneously. The Actor and the Critic networks share the input, rule, and normalized layers of the FRBF network, which reduces the learning system's storage requirements and avoids repeated computation of the rule-unit outputs. Moreover, the FRBF network can adjust its structure and parameters adaptively with a novel self-organizing approach according to the complexity of the task and the progress of learning, which keeps the network economical in size. Experimental studies on cart-pole balancing control illustrate the performance and applicability of the proposed FACRLN.
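As a rough illustration of the shared-layer idea described above (not the authors' implementation; the Gaussian rule layer, layer sizes, and the TD update rule are assumptions), a normalized fuzzy-RBF rule layer can feed both an actor head and a critic head:

```python
import numpy as np

class SharedFRBFActorCritic:
    """Sketch of a fuzzy-RBF network whose rule layer feeds both an
    actor head and a critic head (names and sizes are illustrative)."""

    def __init__(self, n_inputs, n_rules, n_actions, sigma=1.0):
        rng = np.random.default_rng(0)
        self.centers = rng.uniform(-1, 1, (n_rules, n_inputs))  # rule centers
        self.sigma = sigma
        self.actor_w = np.zeros((n_rules, n_actions))   # actor head weights
        self.critic_w = np.zeros(n_rules)               # critic head weights

    def rule_layer(self, x):
        # Gaussian firing strength of each fuzzy rule, then normalization
        d2 = np.sum((self.centers - x) ** 2, axis=1)
        phi = np.exp(-d2 / (2 * self.sigma ** 2))
        return phi / (phi.sum() + 1e-12)

    def forward(self, x):
        phi = self.rule_layer(x)             # shared computation
        action_values = phi @ self.actor_w   # actor output
        state_value = phi @ self.critic_w    # critic output
        return phi, action_values, state_value

    def td_update(self, x, a, r, x_next, gamma=0.95, alpha=0.1):
        phi, _, v = self.forward(x)
        _, _, v_next = self.forward(x_next)
        delta = r + gamma * v_next - v       # TD error drives both heads
        self.critic_w += alpha * delta * phi
        self.actor_w[:, a] += alpha * delta * phi
```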
3.
Fuzzy Sarsa learning (FSL) is a fuzzy reinforcement learning algorithm derived from Sarsa learning. It is an on-policy algorithm that approximates the action value function, and within each fuzzy rule the next action is selected according to the Softmax formula. For complex learning tasks in continuous spaces, FSL cannot balance exploration and exploitation well. To address this, this paper proposes a new fuzzy reinforcement learning algorithm based on ant colony optimization (ACO-FSL); the main contribution is to combine the idea of ant colony optimization (ACO) with the traditional fuzzy reinforcement learning algorithm to form a new algorithm. The design principle, method, and concrete steps of the algorithm are given. Simulation experiments on the mountain-car problem show that the proposed ACO-FSL algorithm outperforms FSL in learning speed and stability.
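The per-rule Softmax action selection used by FSL, and one plausible way of biasing it with ant-colony pheromone trails, could look like the following sketch (the blending of Q values and pheromones and all parameter names are assumptions; the abstract does not give the exact ACO-FSL formulas):

```python
import numpy as np

def select_actions_per_rule(q, pheromone, tau=0.5, beta=1.0, rng=None):
    """Pick one candidate action for each fuzzy rule.

    q          : (n_rules, n_actions) action values of the candidates
    pheromone  : (n_rules, n_actions) ant-colony style desirability
    tau, beta  : Softmax temperature and pheromone weight (illustrative)
    """
    rng = rng or np.random.default_rng()
    # Blend value estimates with pheromone trails before the Softmax,
    # one plausible way to bias exploration toward promising actions.
    scores = q / tau + beta * np.log(pheromone + 1e-12)
    scores -= scores.max(axis=1, keepdims=True)          # numerical stability
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    return np.array([rng.choice(q.shape[1], p=p) for p in probs])
```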
4.
This paper addresses a new method for combining supervised learning and reinforcement learning (RL). Applying supervised learning to robot navigation encounters serious challenges such as inconsistent and noisy data, the difficulty of gathering training data, and high error rates in the training data. RL capabilities such as training with only a scalar evaluation signal and a high degree of exploration have encouraged researchers to use RL for the robot navigation problem. However, RL algorithms are time-consuming and suffer from a high failure rate in the training phase. Here, we propose Supervised Fuzzy Sarsa Learning (SFSL) as a novel way of exploiting the advantages of both supervised and reinforcement learning. A zero-order Takagi–Sugeno fuzzy controller with several candidate actions for each rule is the main module of the robot's controller. The aim of training is to find the best action for each fuzzy rule. In the first step, a human supervisor drives an E-puck robot within the environment and the training data are gathered. In the second step, as a hard tuning, the training data are used to initialize the value (worth) of each candidate action in the fuzzy rules. Afterwards, the fuzzy Sarsa learning module, as a critic-only fuzzy reinforcement learner, fine-tunes the parameters of the conclusion parts of the fuzzy controller online. The proposed algorithm is used to drive an E-puck robot in an environment with obstacles. The experimental results show that the proposed approach decreases the learning time and the number of failures, and improves the quality of the robot's motion in the testing environments.
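A minimal sketch of the two training steps, the hard tuning from demonstration data and the online fuzzy Sarsa fine-tuning, assuming normalized rule firing strengths, a shared discrete candidate-action set, and one chosen candidate per rule (the exact SFSL initialization and update rules are not given in the abstract):

```python
import numpy as np

def initialize_from_demonstrations(firing, demo_actions, n_rules, n_candidates):
    """Hard-tuning step (illustrative): credit each rule's candidate actions
    by how often they match the supervisor's action, weighted by the rule's
    firing strength on the demonstration samples.

    firing       : (n_samples, n_rules)  normalized rule firing strengths
    demo_actions : (n_samples,)          index of the supervisor's action
    """
    q = np.zeros((n_rules, n_candidates))
    for phi, a in zip(firing, demo_actions):
        q[:, a] += phi                      # accumulate firing-weighted evidence
    return q / (q.sum(axis=1, keepdims=True) + 1e-12)

def fuzzy_sarsa_update(q, phi, a, r, phi_next, a_next, gamma=0.9, alpha=0.05):
    """One on-policy update of the per-rule action values (sketch).

    phi, phi_next : normalized firing strengths of all rules, shape (n_rules,)
    a, a_next     : candidate action chosen by each rule, shape (n_rules,)
    """
    rules = np.arange(q.shape[0])
    q_sa = phi @ q[rules, a]                 # fuzzy-weighted Q of current step
    q_next = phi_next @ q[rules, a_next]     # fuzzy-weighted Q of next step
    delta = r + gamma * q_next - q_sa        # Sarsa temporal-difference error
    q[rules, a] += alpha * delta * phi       # each rule updated by its firing
    return q
```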
5.
Fuzzy logic systems are promising for efficient obstacle avoidance. However, it is difficult to maintain the correctness, consistency, and completeness of a fuzzy rule base constructed and tuned by a human expert. A reinforcement learning method is capable of learning the fuzzy rules automatically, but it incurs a heavy learning phase and may result in an insufficiently learned rule base due to the curse of dimensionality. In this paper, we propose a neural fuzzy system with mixed coarse learning and fine learning phases. In the first phase, a supervised learning method is used to determine the membership functions for the input and output variables simultaneously. After sufficient training, fine learning is applied, which employs a reinforcement learning algorithm to fine-tune the membership functions for the output variables. For sufficient learning, a new learning method using a modification of Sutton and Barto's model is proposed to strengthen the exploration. Through this two-step tuning approach, the mobile robot is able to perform collision-free navigation. To deal with the difficulty of acquiring a large amount of training data with high consistency for supervised learning, we develop a virtual environment (VE) simulator, which is able to provide desktop virtual environment (DVE) and immersive virtual environment (IVE) visualization. By having a skilled human operator drive a mobile robot in the virtual environment (DVE/IVE), training data are readily obtained and used to train the neural fuzzy system.
6.
This paper studies evolutionary programming and adopts reinforcement learning theory to learn individual mutation operators. A novel algorithm named RLEP (Evolutionary Programming based on Reinforcement Learning) is proposed. In this algorithm, each individual learns its optimal mutation operator based on the immediate and delayed performance of the mutation operators. Mutation operator selection is thus mapped onto a reinforcement learning problem, where reinforcement learning methods learn optimal policies by maximizing the accumulated rewards. According to the learned Q function value of each candidate mutation operator, the operator that maximizes this value is selected. Four different mutation operators are employed as the basic candidates in RLEP, and one is selected for each individual in each generation. Our simulations show that the performance of RLEP is as good as or better than the best of the four basic mutation operators.
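A hedged sketch of the operator-selection mechanism, with per-individual Q values over candidate mutation operators and the fitness improvement used as reward (the operator names, the epsilon-greedy choice, and the exact update form are illustrative assumptions):

```python
import numpy as np

class MutationOperatorSelector:
    """Per-individual Q-learning over candidate mutation operators (sketch).

    The operator list is a placeholder; the abstract only says four basic
    candidate operators are used."""

    def __init__(self, operators, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.operators = operators              # e.g. [gaussian, cauchy, levy, uniform]
        self.q = np.zeros(len(operators))       # Q value of each operator
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = np.random.default_rng()

    def choose(self):
        if self.rng.random() < self.epsilon:    # occasional exploration
            return int(self.rng.integers(len(self.operators)))
        return int(np.argmax(self.q))           # operator with the highest Q

    def update(self, op_index, fitness_before, fitness_after):
        # Reward the operator by the fitness improvement it produced
        reward = fitness_before - fitness_after          # minimization problem
        target = reward + self.gamma * np.max(self.q)
        self.q[op_index] += self.alpha * (target - self.q[op_index])
```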
7.
The democratization of robotics technology and the development of new actuators progressively bring robots closer to humans. The applications that can now be envisaged contrast drastically with the requirements of industrial robots. In standard manufacturing settings, the criteria used to assess performance are usually related to the robot's accuracy, repeatability, speed, or stiffness. Learning a control policy to actuate such robots is characterized by the search for a single solution to the task, with a representation of the policy consisting of moving the robot through a set of points to follow a trajectory. With new environments such as homes and offices populated with humans, reproduction performance is assessed differently. These robots are expected to acquire rich motor skills that can be generalized to new situations, while behaving safely in the vicinity of users. Skill acquisition can no longer be guided by a single form of learning, and must instead combine different approaches to continuously create, adapt, and refine policies. The family of search strategies based on expectation-maximization (EM) looks particularly promising for coping with these new requirements. The exploration can be performed directly in the policy parameter space, by refining the policy together with exploration parameters represented in the form of covariances. With this formulation, reinforcement learning (RL) can be extended to a multi-optima search problem in which several policy alternatives can be considered. We present two applications exploiting EM-based exploration strategies, by considering parameterized policies based on dynamical systems and by using Gaussian mixture models for the search of multiple policy alternatives.
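One common EM-style formulation of this kind of exploration, reward-weighted refitting of a Gaussian over policy parameters so that the mean and the exploration covariance are refined together, is sketched below (a generic sketch, not the specific algorithms of the paper; `rollout_return` and all parameters are assumed placeholders):

```python
import numpy as np

def em_policy_update(theta_mean, theta_cov, n_samples, rollout_return, beta=5.0):
    """One EM-style exploration step in policy-parameter space (sketch).

    Samples policy parameters from a Gaussian, weights them by a soft-max of
    their returns, and refits the mean and covariance, so the exploration
    magnitude is adapted together with the policy.  `rollout_return` is a
    user-supplied function mapping a parameter vector to a scalar return.
    """
    rng = rng = np.random.default_rng()
    samples = rng.multivariate_normal(theta_mean, theta_cov, size=n_samples)
    returns = np.array([rollout_return(s) for s in samples])
    # Exponentiated returns act as (pseudo-)responsibilities in the M-step
    w = np.exp(beta * (returns - returns.max()))
    w /= w.sum()
    new_mean = w @ samples
    centered = samples - new_mean
    new_cov = (centered * w[:, None]).T @ centered + 1e-6 * np.eye(len(theta_mean))
    return new_mean, new_cov
```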
8.
Stock trading is an important decision-making problem that involves both stock selection and asset management. Though many promising results have been reported for predicting prices, selecting stocks, and managing assets using machine-learning techniques, considering all of them together is challenging because of their complexity. In this paper, we present a new stock trading method that incorporates dynamic asset allocation in a reinforcement-learning framework. The proposed asset allocation strategy, called meta policy (MP), is designed to utilize the temporal information from both stock recommendations and the ratio of the stock fund over the asset. Local traders are constructed with pattern-based multiple predictors and used to decide the purchase amount per recommendation. Formulating the MP in the reinforcement-learning framework is achieved by a compact design of the environment and the learning agent. Experimental results using the Korean stock market show that the proposed MP method outperforms other fixed asset-allocation strategies and reduces the risks inherent in local traders.
9.
Neural Computing and Applications - In this paper, an adaptive fuzzy control approach for incommensurate fractional-order multi-input multi-output (MIMO) systems with unknown nonlinearities and...
10.
This paper proposes a three-layered parallel fuzzy inference model called reinforcement fuzzy neural network with distributed prediction scheme (RFNN-DPS), which performs reinforcement learning with a novel distributed prediction scheme. In RFNN-DPS, an additional predictor for predicting the external reinforcement signal is not necessary, and the internal reinforcement information is distributed into fuzzy rules (rule nodes). Therefore, using RFNN-DPS, only one network is needed to construct a fuzzy logic system with the abilities of parallel inference and reinforcement learning. Basically, the information for prediction in RFNN-DPS is composed of credit values stored in fuzzy rule nodes, where each node holds a credit vector to represent the reliability of the corresponding fuzzy rule. The credit values are not only accessed for predicting external reinforcement signals, but also provide a more profitable internal reinforcement signal to each fuzzy rule itself. RFNN-DPS performs a credit-based exploratory algorithm to adjust its internal status according to the internal reinforcement signal. During learning, the RFNN-DPS network is constructed by a single-step or multistep reinforcement learning algorithm based on the ART concept. According to our experimental results, RFNN-DPS shows the advantages of simple network structure, fast learning speed, and explicit representation of rule reliability.
12.
In this study, a novel image-based visual servo (IBVS) controller for robot manipulators is investigated using an optimized extreme learning machine (ELM) algorithm and an offline reinforcement learning (RL) algorithm. First, the classical IBVS method and its difficulties in accurately estimating the image interaction matrix and avoiding the singularity of the pseudo-inverse are introduced. Subsequently, an IBVS method based on ELM and RL is proposed to solve the singularity problem of the pseudo-inverse solution and to tune the servo gain adaptively, improving servo efficiency and stability. Specifically, the ELM algorithm optimized by particle swarm optimization (PSO) is used to approximate the pseudo-inverse of the image interaction matrix and thereby reduce the influence of camera calibration errors. The RL algorithm is then adopted to tune the adaptive visual servo gain in continuous space and improve the convergence speed. Finally, comparative simulation experiments on a 6-DOF robot manipulator were conducted to verify the effectiveness of the proposed IBVS controller.
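A baseline ELM regressor of the kind used to approximate the mapping otherwise provided by the pseudo-inverse of the interaction matrix is sketched below (without the PSO optimization of the hidden layer described in the abstract; the names and training data are illustrative):

```python
import numpy as np

class ELMRegressor:
    """Generic extreme learning machine regressor (sketch).

    Random input weights, closed-form output weights.  In the paper the
    hidden-layer parameters are further tuned by PSO; here they are simply
    drawn at random, which is the baseline ELM formulation."""

    def __init__(self, n_hidden=50, reg=1e-3, seed=0):
        self.n_hidden, self.reg, self.seed = n_hidden, reg, seed

    def fit(self, X, Y):
        rng = np.random.default_rng(self.seed)
        self.W = rng.normal(size=(X.shape[1], self.n_hidden))  # random input weights
        self.b = rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)                        # hidden activations
        # Ridge-regularized least squares for the output weights
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ Y)
        return self

    def predict(self, X):
        return np.tanh(X @ self.W + self.b) @ self.beta

# Illustrative use: learn the mapping from image-feature errors to camera
# velocities, sidestepping an explicit pseudo-inverse of the interaction matrix.
# feature_errors, camera_velocities = ...   (training pairs, e.g. from simulation)
# elm = ELMRegressor().fit(feature_errors, camera_velocities)
```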
13.
The finite-time tracking control of an n-link robotic system is studied in the presence of model uncertainties and actuator saturation. First, a smooth function and an adaptive fuzzy neural network online learning algorithm are designed to address the actuator saturation and the uncertainties of the dynamic model. Second, a new finite-time command-filtered technique is proposed to filter the virtual control signal. The improved error compensation signal reduces the impact of filtering errors, and the tracking errors of the system quickly converge to a small compact set within finite time. Finally, the adaptive fuzzy neural network finite-time command-filtered control is shown to achieve finite-time stability via the Lyapunov stability criterion. Simulation results verify the effectiveness of the proposed control.
14.
To address the poor routing performance of existing routing algorithms in wide-area networks, which stems from their inability to adapt to diverse topologies and unbalanced loads, an adaptive routing algorithm based on reinforcement learning and implemented with gradient ascent, RLAR, is proposed. Reinforcement learning means learning a policy, i.e., constructing a mapping from states to actions based on feedback from the environment; in essence, it evaluates a set of policies through trial-and-error interaction with the environment. Applying reinforcement learning to network routing optimization provides a completely new line of thinking for routing research. The algorithm is compared with several existing routing algorithms, and the experimental results show that RLAR can effectively improve network routing performance.
15.
Fuzzy spiking neural P systems (FSN P systems, for short) are a novel class of distributed parallel computing models that can model fuzzy production rules and apply their dynamic firing mechanism to achieve fuzzy reasoning. However, these systems lack adaptive/learning ability. To address this problem, a class of FSN P systems with several new features, called adaptive fuzzy spiking neural P systems (AFSN P systems, for short), is proposed. AFSN P systems can not only model weighted fuzzy production rules in a fuzzy knowledge base but also perform dynamic fuzzy reasoning. It is important to note that AFSN P systems have a learning ability similar to neural networks. Based on the neurons' firing mechanisms, a fuzzy reasoning algorithm and a learning algorithm are developed. Moreover, an example is included to illustrate the learning ability of AFSN P systems.
16.
A new self-tuning fuzzy modeling method based on fuzzy competitive learning is proposed. With fuzzy competitive learning, the fuzzy system is able to carry out adaptive fuzzy inference. On the basis of the tuned fuzzy system, an online identification algorithm for estimating the parameters of nonlinear systems is presented. Finally, simulation results for several examples are given to demonstrate the effectiveness of the proposed algorithm.
17.
Taking handwritten digit recognition as the background problem, an adaptive fuzzy classifier based on a table-lookup learning algorithm is proposed, implemented in Matlab, and evaluated in simulation. The simulation results show that, for handwritten digit recognition, the adaptive fuzzy classifier outperforms a three-layer feedforward classifier trained with the BP algorithm in recognition performance, use of linguistic information, and computational complexity, demonstrating the superiority and potential of adaptive fuzzy techniques for pattern recognition.
18.
This paper proposes a TD (temporal difference) and GA (genetic algorithm)-based reinforcement (TDGAR) learning method and applies it to the control of a real magnetic bearing system. The TDGAR learning scheme is a new hybrid GA, which integrates the TD prediction method and the GA to perform the reinforcement learning task. The TDGAR learning system is composed of two integrated feedforward networks. One neural network acts as a critic network to guide the learning of the other network (the action network) which determines the outputs (actions) of the TDGAR learning system. The action network can be a normal neural network or a neural fuzzy network. Using the TD prediction method, the critic network can predict the external reinforcement signal and provide a more informative internal reinforcement signal to the action network. The action network uses the GA to adapt itself according to the internal reinforcement signal. The key concept of the TDGAR learning scheme is to formulate the internal reinforcement signal as the fitness function for the GA such that the GA can evaluate the candidate solutions (chromosomes) regularly, even during periods without external feedback from the environment. This enables the GA to proceed to new generations regularly without waiting for the arrival of the external reinforcement signal. This can usually accelerate the GA learning since a reinforcement signal may only be available at a time long after a sequence of actions has occurred in the reinforcement learning problem. The proposed TDGAR learning system has been used to control an active magnetic bearing (AMB) system in practice. A systematic design procedure is developed to achieve successful integration of all the subsystems including magnetic suspension, mechanical structure, and controller training. The results show that the TDGAR learning scheme can successfully find a neural controller or a neural fuzzy controller for a self-designed magnetic bearing system.
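The core coupling, a TD critic that converts sparse external reinforcement into a denser internal signal used as the GA's fitness, can be sketched as follows (a generic illustration under assumed interfaces, not the TDGAR implementation):

```python
def internal_reinforcement(critic, state, next_state, external_r, gamma=0.95):
    """TD-style internal reinforcement signal (sketch).

    `critic` is any callable predicting the expected external reinforcement
    for a state; the TD error serves as a denser fitness signal for the GA
    even on steps where the environment gives no external reward."""
    return external_r + gamma * critic(next_state) - critic(state)


def evaluate_chromosome(chromosome, run_episode, critic):
    """Fitness of one candidate action network: accumulated internal
    reinforcement over an episode.  `run_episode` is assumed to yield
    (state, next_state, external_r) transitions produced by the chromosome."""
    return sum(internal_reinforcement(critic, s, s_next, r)
               for s, s_next, r in run_episode(chromosome))
```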
19.
The unit commitment problem (UCP) aims at optimizing the generation cost for meeting a given load demand under several operational constraints. We propose a fuzzy reinforcement learning (RL) approach for an efficient and reliable solution to the unit commitment problem. In particular, we cast UCP as a multiagent fuzzy reinforcement learning task wherein individual generators act as players optimizing the cost of meeting a given load over a twenty-four-hour period. The unit commitment task is fuzzified, and the optimal unit commitment solution is generated by employing RL on this fuzzy multigenerator setup. Our proposed multiagent RL framework does not assume any a priori task or system knowledge, and the generators gradually learn to produce optimal output solely on the basis of their collective generation. We view the UCP as a sequential decision-making task with rewards and penalties for reducing the collective generation cost of the generators. To the best of our knowledge, ours is the first attempt at solving UCP by employing fuzzy reinforcement learning. We test our approach on a ten-generating-unit system with several equality and inequality constraints. Simulation results and comparisons against several recent UCP solution methods demonstrate the superiority and viability of our proposed multiagent fuzzy reinforcement learning technique.
20.
Multiagent systems (MASs) are increasingly popular for modeling distributed environments that are highly complex and dynamic, such as e-commerce, smart buildings, and smart grids. Typically, agents are assumed to be goal-driven with limited abilities, which constrains them to work with other agents to accomplish complex tasks. Trust is considered significant in MASs for making interactions effective, especially when agents cannot be sure that potential partners share the same core beliefs about the system or make accurate statements regarding their competencies and abilities. Because of the imprecise and dynamic nature of trust in MASs, we propose a hybrid trust model that uses fuzzy logic and Q-learning for trust modeling, as an improvement over Q-learning-based trust evaluation. Q-learning is used to estimate trust over the long term, fuzzy inferences are used to aggregate different trust factors, and suspension is used as a short-term response to dynamic changes. The performance of the proposed model is evaluated using simulation. The simulation results indicate that the proposed model can help agents select trustworthy partners to interact with, and that it performs better than some of the popular trust models in the presence of misbehaving interaction partners.
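A compact sketch of how the three ingredients might fit together, a Q-style long-term trust update, a simple aggregation of trust factors standing in for the fuzzy inference stage, and suspension as the short-term reaction (the factor names and thresholds are illustrative assumptions, not from the paper):

```python
import numpy as np

class HybridTrustModel:
    """Sketch of the hybrid idea: long-term trust learned incrementally,
    per-interaction aggregation of trust factors, and suspension of
    partners that misbehave."""

    def __init__(self, alpha=0.2, suspend_threshold=0.2, suspend_steps=5):
        self.q = {}                 # long-term trust estimate per partner
        self.suspended = {}         # remaining suspension steps per partner
        self.alpha = alpha
        self.suspend_threshold = suspend_threshold
        self.suspend_steps = suspend_steps

    def interaction_score(self, factors):
        # Stand-in for the fuzzy inference stage: aggregate trust factors
        # (e.g. success, timeliness, quality), each already scaled to [0, 1].
        return float(np.mean(list(factors.values())))

    def update(self, partner, factors):
        score = self.interaction_score(factors)
        old = self.q.get(partner, 0.5)
        self.q[partner] = old + self.alpha * (score - old)   # Q-style update
        if score < self.suspend_threshold:                   # short-term response
            self.suspended[partner] = self.suspend_steps

    def is_trustworthy(self, partner, min_trust=0.5):
        if self.suspended.get(partner, 0) > 0:
            self.suspended[partner] -= 1
            return False
        return self.q.get(partner, 0.5) >= min_trust
```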