期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Developing reinforcement learning for adaptive co-construction of continuous high-dimensional state and action spaces

Masato Nagayoshi Hajime Murao Hisashi Tamaki 《Artificial Life and Robotics》2012,17(2):204-210

Engineers and researchers are paying more attention to reinforcement learning (RL) as a key technique for realizing adaptive and autonomous decentralized systems. In general, however, it is not easy to put RL into practical use. Our approach mainly deals with the problem of designing state and action spaces. Previously, an adaptive state space construction method which is called a ??state space filter?? and an adaptive action space construction method which is called ??switching RL??, have been proposed after the other space has been fixed. Then, we have reconstituted these two construction methods as one method by treating the former method and the latter method as a combined method for mimicking an infant??s perceptual and motor developments and we have proposed a method which is based on introducing and referring to ??entropy??. In this paper, a computational experiment was conducted using a so-called ??robot navigation problem?? with three-dimensional continuous state space and two-dimensional continuous action space which is more complicated than a so-called ??path planning problem??. As a result, the validity of the proposed method has been confirmed. 相似文献

2.

On-line PID tuning for engine idle-speed control using continuous action reinforcement learning automata

《Control Engineering Practice》2000,8(2):147-154

PID systems are widely used to apply control without the need to obtain a dynamic model. However, the performance of controllers designed using standard on-line tuning methods, such as Ziegler–Nichols, can often be significantly improved. In this paper the tuning process is automated through the use of continuous action reinforcement learning automata (CARLA). These are used to simultaneously tune the parameters of a three term controller on-line to minimise a performance objective. Here the method is demonstrated in the context of engine idle-speed control; the algorithm is first applied in simulation on a nominal engine model, and this is followed by a practical study using a Ford Zetec engine in a test cell. The CARLA provides marked performance benefits over a comparable Ziegler–Nichols tuned controller in this application. 相似文献

3.

A consideration of human immunity-based reinforcement learning with continuous states

Shu Hosokawa Kazushi Nakano Kazunori Sakurama 《Artificial Life and Robotics》2010,15(4):560-564

Many reinforcement learning methods have been studied on the assumption that a state is discretized and the environment size is predetermined. However, an operating environment may have a continuous state and its size may not be known in advance, e.g., in robot navigation and control. When applying these methods to the environment described above, we may need a large amount of time for learning or failing to learn. In this study, we improve our previous human immunity-based reinforcement learning method so that it will work in continuous state space environments. Since our method selects an action based on the distance between the present state and the memorized action, information about the environment (e.g., environment size) is not required in advance. The validity of our method is demonstrated through simulations for the swingup control of an inverted pendulum. 相似文献

4.

Efficient reinforcement learning in continuous state and action spaces with Dyna and policy approximation

Shan ZHONG Quan LIU Zongzhang ZHANG Qiming FU 《Frontiers of Computer Science》2019,13(1):106

Dyna is an effective reinforcement learning (RL) approach that combines value function evaluation with model learning. However, existing works on Dyna mostly discuss only its efficiency in RL problems with discrete action spaces. This paper proposes a novel Dyna variant, called Dyna-LSTD-PA, aiming to handle problems with continuous action spaces. Dyna-LSTD-PA stands for Dyna based on least-squares temporal difference (LSTD) and policy approximation. Dyna-LSTD-PA consists of two simultaneous, interacting processes. The learning process determines the probability distribution over action spaces using the Gaussian distribution; estimates the underlying value function, policy, and model by linear representation; and updates their parameter vectors online by LSTD(λ). The planning process updates the parameter vector of the value function again by using offline LSTD(λ). Dyna-LSTD-PA also uses the Sherman–Morrison formula to improve the efficiency of LSTD(λ), and weights the parameter vector of the value function to bring the two processes together. Theoretically, the global error bound is derived by considering approximation, estimation, and model errors. Experimentally, Dyna-LSTD-PA outperforms two representative methods in terms of convergence rate, success rate, and stability performance on four benchmark RL problems. 相似文献

5.

A deep reinforcement learning framework for continuous intraday market bidding

Boukas Ioannis Ernst Damien Théate Thibaut Bolland Adrien Huynen Alexandre Buchwald Martin Wynants Christelle Cornélusse Bertrand 《Machine Learning》2021,110(9):2335-2387

Machine Learning - The large integration of variable energy resources is expected to shift a large part of the energy exchanges closer to real-time, where more accurate forecasts are available. In... 相似文献

6.

Grounded action transformation for sim-to-real reinforcement learning

Hanna Josiah P. Desai Siddharth Karnan Haresh Warnell Garrett Stone Peter 《Machine Learning》2021,110(9):2469-2499

Machine Learning - Reinforcement learning in simulation is a promising alternative to the prohibitive sample cost of reinforcement learning in the physical world. Unfortunately, policies learned in... 相似文献

7.

Acrobot control by learning the switching of multiple controllers

Junichiro Yoshimoto Masaya Nishimura Yoichi Tokita Shin Ishii 《Artificial Life and Robotics》2005,9(2):67-71

Reinforcement learning (RL) has been applied to constructing controllers for nonlinear systems in recent years. Since RL methods do not require an exact dynamics model of the controlled object, they have a higher flexibility and potential for adaptation to uncertain or nonstationary environments than methods based on traditional control theory. If the target system has a continuous state space whose dynamic characteristics are nonlinear, however, RL methods often suffer from unstable learning processes. For this reason, it is difficult to apply RL methods to control tasks in the real world. In order to overcome the disadvantage of RL methods, we propose an RL scheme combining multiple controllers, each of which is constructed based on traditional control theory. We then apply it to a swinging-up and stabilizing task of an acrobot with a limited torque, which is a typical but difficult task in the field of nonlinear control theory. Our simulation result showed that our method was able to realize stable learning and to achieve fairly good control.This work was presented, in part, at the 9th International Symposium on Artificial Life and Robotics, Oita, Japan, January 28–30, 2004 相似文献

8.

A parallel scheduling algorithm for reinforcement learning in large state space

Quan LIU Xudong YANG Ling JING Jin LI Jiao LI 《Frontiers of Computer Science》2012,6(6):631-646

The main challenge in the area of reinforcement learning is scaling up to larger and more complex problems. Aiming at the scaling problem of reinforcement learning, a scalable reinforcement learning method, DCS-SRL, is proposed on the basis of divide-and-conquer strategy, and its convergence is proved. In this method, the learning problem in large state space or continuous state space is decomposed into multiple smaller subproblems. Given a specific learning algorithm, each subproblem can be solved independently with limited available resources. In the end, component solutions can be recombined to obtain the desired result. To address the question of prioritizing subproblems in the scheduler, a weighted priority scheduling algorithm is proposed. This scheduling algorithm ensures that computation is focused on regions of the problem space which are expected to be maximally productive. To expedite the learning process, a new parallel method, called DCS-SPRL, is derived from combining DCS-SRL with a parallel scheduling architecture. In the DCS-SPRL method, the subproblems will be distributed among processors that have the capacity to work in parallel. The experimental results show that learning based on DCS-SPRL has fast convergence speed and good scalability. 相似文献

9.

Embedding fuzzy mechanisms and knowledge in box-type reinforcement learning controllers 总被引：3，自引：0，他引：3

Shun-Feng Su Sheng-Hsiung Hsieh 《IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics》2002,32(5):645-653

In this paper, we report our study on embedding fuzzy mechanisms and knowledge into box-type reinforcement learning controllers. One previous approach for incorporating fuzzy mechanisms can only achieve one successful run out of nine tests compared to eight successful runs in a nonfuzzy learning control scheme. After analysis, the credit assignment problem and the weighting domination problem are identified. Furthermore, the use of fuzzy mechanisms in temporal difference seems to play a negative factor. Modifications to overcome those problems are proposed. Furthermore, several remedies are employed in that approach. The effects of those remedies applied to our learning scheme are presented and possible variations are also studied. Finally, the issue of incorporating knowledge into reinforcement learning systems is studied. From our simulations, it is concluded that the use of knowledge for the control network can provide good learning results, but the use of knowledge for the evaluation network alone seems unable to provide any significant advantages. Furthermore, we also employ Makarovic's (1988) rules as the knowledge for the initial setting of the control network. In our study, the rules are separated into four groups to avoid the ordering problem. 相似文献

10.

Logic-based switching controllers - A stepwise safe switching approach

Fotis N. Koumboulis Robert E. King 《Information Sciences》2007,177(13):2736-2755

相似文献

11.

A proposition of adaptive state space partition in reinforcement learning with Voronoi tessellation

Takayasu Fuchida Kathy Thi Aung 《Artificial Life and Robotics》2013,18(3-4):172-177

This paper presents a new adaptive segmentation of continuous state space based on vector quantization algorithm such as Linde–Buzo–Gray for high-dimensional continuous state spaces. The objective of adaptive state space partitioning is to develop the efficiency of learning reward values with an accumulation of state transition vector in a single-agent environment. We constructed our single-agent model in continuous state and discrete actions spaces using Q-learning function. Moreover, the study of the resulting state space partition reveals a Voronoi tessellation. In addition, the experimental results show that this proposed method can partition the continuous state space appropriately into Voronoi regions according to not only the number of actions, but also achieve a good performance of reward-based learning tasks compared with other approaches such as square partition lattice on discrete state space. 相似文献

12.

控制器失效的容错切换律设计

下载免费PDF全文

王霞唐予军王培光《计算机工程与应用》2012,48(3):225-227

针对所有控制器同时失效的情况,研究系统容错切换律的设计问题,发展了圆锥切换律,分析了系统稳定的充分条件,通过在不稳定控制器之间切换来保证系统稳定的控制效果。基于系统的状态和子系统的运动方向选择控制器,可保证系统具有最快的收敛速度。对于子系统同向和逆向的系统分别作了数值仿真,验证了结论的有效性。相似文献

13.

Dwell-time controllers for stochastic systems with switching Markov chain

S. Battilotti^{Author Vitae} A. De Santis Author Vitae 《Automatica》2005,41(6):923-934

We study the problem of feedback stabilization of a family of nonlinear stochastic systems with switching mechanism modeled by a Markov chain. We introduce a novel notion of stability under switching, which guarantees a given probability that the trajectories of the system hit some target set in finite time and remain thereinafter. Our main contribution is to prove that if the expectation of the time between two consecutive switching (dwell time) is “sufficiently large”, then the system is stable under switching with guaranteed probability. We illustrate this methodology by constructing measurement feedback controllers for a wide class of stochastic nonlinear systems. 相似文献

14.

A survey on deep learning and deep reinforcement learning in robotics with a tutorial on deep reinforcement learning

Morales Eduardo F. Murrieta-Cid Rafael Becerra Israel Esquivel-Basaldua Marco A. 《Intelligent Service Robotics》2021,14(5):773-805

This article is about deep learning (DL) and deep reinforcement learning (DRL) works applied to robotics. Both tools have been shown to be successful in delivering data-driven solutions for robotics tasks, as well as providing a natural way to develop an end-to-end pipeline from the robot’s sensing to its actuation, passing through the generation of a policy to perform the given task. These frameworks have been proven to be able to deal with real-world complications such as noise in sensing, imprecise actuation, variability in the scenarios where the robot is being deployed, among others. Following that vein, and given the growing interest in DL and DRL, the present work starts by providing a brief tutorial on deep reinforcement learning, where the goal is to understand the main concepts and approaches followed in the field. Later, the article describes the main, recent, and most promising approaches of DL and DRL in robotics, with sufficient technical detail to understand the core of the works and to motivate interested readers to initiate their own research in the area. Then, to provide a comparative analysis, we present several taxonomies in which the references can be classified, according to high-level features, the task that the work addresses, the type of system, and the learning techniques used in the work. We conclude by presenting promising research directions in both DL and DRL.

相似文献

15.

A new criterion using information gain for action selection strategy in reinforcement learning

Iwata K. Ikeda K. Sakai H. 《Neural Networks, IEEE Transactions on》2004,15(4):792-799

In this paper, we regard the sequence of returns as outputs from a parametric compound source. Utilizing the fact that the coding rate of the source shows the amount of information about the return, we describe /spl lscr/-learning algorithms based on the predictive coding idea for estimating an expected information gain concerning future information and give a convergence proof of the information gain. Using the information gain, we propose the ratio /spl omega/ of return loss to information gain as a new criterion to be used in probabilistic action-selection strategies. In experimental results, we found that our /spl omega/-based strategy performs well compared with the conventional Q-based strategy. 相似文献

16.

A learning search algorithm with propagational reinforcement learning

Zhang Wei 《Applied Intelligence》2021,51(11):7990-8009

When reinforcement learning with a deep neural network is applied to heuristic search, the search becomes a learning search. In a learning search system, there are two key components: (1) a deep neural network with sufficient expression ability as a heuristic function approximator that estimates the distance from any state to a goal; (2) a strategy to guide the interaction of an agent with its environment to obtain more efficient simulated experience to update the Q-value or V-value function of reinforcement learning. To date, neither component has been sufficiently discussed. This study theoretically discusses the size of a deep neural network for approximating a product function of p piecewise multivariate linear functions. The existence of such a deep neural network with O(n + p) layers and O(dn + dnp + dp) neurons has been proven, where d is the number of variables of the multivariate function being approximated, ?? is the approximation error, and n = O(p + log₂(pd/??)). For the second component, this study proposes a general propagational reinforcement-learning-based learning search method that improves the estimate h(.) according to the newly observed distance information about the goals, propagates the improvement bidirectionally in the search tree, and consequently obtains a sequence of more accurate V-values for a sequence of states. Experiments on the maze problems show that our method increases the convergence rate of reinforcement learning by a factor of 2.06 and reduces the number of learning episodes to 1/4 that of other nonpropagating methods.

相似文献

17.

Multiagent reinforcement learning applied to a chase problem in a continuous world

Hiroki Tamakoshi Shin Ishii 《Artificial Life and Robotics》2001,5(4):202-206

Reinforcement learning (RL) is one of the methods of solving problems defined in multiagent systems. In the real world, the state is continuous, and agents take continuous actions. Since conventional RL schemes are often defined to deal with discrete worlds, there are difficulties such as the representation of an RL evaluation function. In this article, we intend to extend an RL algorithm so that it is applicable to continuous world problems. This extension is done by a combination of an RL algorithm and a function approximator. We employ Q-learning as the RL algorithm, and a neural network model called the normalized Gaussian network as the function approximator. The extended RL method is applied to a chase problem in a continuous world. The experimental result shows that our RL scheme was successful. This work was presented in part at the Fifth International Symposium on Artificial Life and Robotics, Oita, Japan, January 26–28, 2000 相似文献

18.

Self-healing in transparent optical packet switching mesh networks: A reinforcement learning perspective

《Computer Networks》2014

While transparent optical networks become more and more popular as the basis of the Next Generation Internet (NGI) infrastructure, such networks raise many security issues because they lack the massive use of optoelectronic monitoring. To increase these networks’ security, they will need to be equipped with proactive and reactive mechanisms to protect themselves not only from failures and attacks but also from ordinary reliability problems. This work presents a novel self-healing framework to deal with attacks on Transparent Optical Packet Switching (TOPS) mesh networks. Contrary to traditional approaches which deal with attacks at the fiber level, our framework allows to overcome attacks at the wavelength level as well as to understand how they impact the network’s performance. The framework has two phases: the dimensioning phase (DP) dynamically determines the optical resources for a given mesh network topology whereas the learning phase (LP) generates an intelligent policy to gracefully overcome attacks in the network. DP uses heuristic reasoning to engineer the network while LP relies on a reinforcement learning algorithm that yields a self-healing policy within the network. We use a Monte Carlo simulation to analyze the performance of the aforementioned framework not only under different types of attacks but also using three realistically sized mesh topologies with up to 40 nodes. We compare our framework against shortest path (SP) and multiple path routing (MPR) showing that the self-organized routing outperforms both, leading to a reduction in packet loss of up to

88 %

with average packet loss rates of

1 \times 10^{- 3}

. Finally, some conclusions are presented as well as future research lines. 相似文献

19.

Riccati equation solution for controllers with continuous delays

W. Kilmer W. Kroll 《Systems & Control Letters》1983,3(4):203-209

We reduce the solution of a Riccati equation for infinite-time linear quadratic controllers with continuous delays and n state variables to the problem of finding scalar parameter values for an integral kernel whose form is completely specified. To simplify the exposition, the reduction is described only for a special case involving 2-dimensional state variables, but the method is entirely general. An abstract formulation of Vinter and Kwong is used throughout. 相似文献

20.

DCOB: Action space for reinforcement learning of high DoF robots

Akihiko Yamaguchi Jun Takamatsu Tsukasa Ogasawara 《Autonomous Robots》2013,34(4):327-346

Reinforcement learning (RL) for robot control is an important technology for future robots since it enables us to design a robot’s behavior using the reward function. However, RL for high degree-of-freedom robot control is still an open issue. This paper proposes a discrete action space DCOB which is generated from the basis functions (BFs) given to approximate a value function. The remarkable feature is that, by reducing the number of BFs to enable the robot to learn quickly the value function, the size of DCOB is also reduced, which improves the learning speed. In addition, a method WF-DCOB is proposed to enhance the performance, where wire-fitting is utilized to search for continuous actions around each discrete action of DCOB. We apply the proposed methods to motion learning tasks of a simulated humanoid robot and a real spider robot. The experimental results demonstrate outstanding performance. 相似文献