Similar Documents
20 similar documents found.
1.
The success of any reinforcement learning (RL) application is in large part due to the design of an appropriate reinforcement function. A methodological framework to support the design of reinforcement functions has not been defined yet, and this critical and often underestimated activity is left to the ability of the RL application designer. We propose an approach to support reinforcement function design in RL applications concerning learning behaviors for autonomous agents. We define some dimensions along which we can describe reinforcement functions; we consider the distribution of reinforcement values, their coherence, and how well they match the designer's perspective. We give hints for defining measures that objectively describe the reinforcement function; we discuss the trade-offs that should be considered to improve learning, and we introduce the dimensions along which this improvement can be expected. The approach we present is general enough to be adopted in a large number of RL projects. We show how to apply it in the design of learning classifier system (LCS) applications. We consider a simple but quite complete case study in evolutionary robotics, and we discuss reinforcement function design issues in this sample context.
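As a hedged illustration of the kind of objective measures the abstract refers to, the Python sketch below samples states of a toy navigation task and summarizes the distribution of reinforcement values (mean, spread, sparsity). The task, reward shape, and thresholds are invented for illustration and are not the paper's method.

```python
import numpy as np

def toy_reward(state, goal=np.array([0.9, 0.9]), eps=0.1):
    """Sparse toy reinforcement: +1 near the goal, -1 near walls, 0 elsewhere (invented)."""
    if np.linalg.norm(state - goal) < eps:
        return 1.0
    if np.any(state < 0.05) or np.any(state > 0.95):
        return -1.0
    return 0.0

rng = np.random.default_rng(0)
states = rng.uniform(0.0, 1.0, size=(10_000, 2))       # uniformly sampled states
rewards = np.array([toy_reward(s) for s in states])

# Simple descriptive measures of the reinforcement function's value distribution.
print("mean reward:         ", rewards.mean())
print("std of rewards:      ", rewards.std())
print("fraction of nonzero: ", np.mean(rewards != 0.0))  # sparsity of the signal
```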

2.
Scientific computation has unavoidable approximations built into its very fabric. One important source of error that is difficult to detect and control is round-off error propagation which originates from the use of finite precision arithmetic. We propose that there is a need to perform regular numerical ‘health checks’ on scientific codes in order to detect the cancerous effect of round-off error propagation. This is particularly important in scientific codes that are built on legacy software. We advocate the use of the CADNA library as a suitable numerical screening tool. We present a case study to illustrate the practical use of CADNA in scientific codes that are of interest to the Computer Physics Communications readership. In doing so we hope to stimulate a greater awareness of round-off error propagation and present a practical means by which it can be analyzed and managed.
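CADNA itself exposes stochastic-arithmetic types for Fortran/C/C++; the Python sketch below only illustrates the underlying CESTAC-style idea on which such tools rest: repeat a computation with randomly perturbed rounding and use the spread of the results to estimate how many digits are trustworthy. The perturbation scheme and the example summation are assumptions for illustration, not CADNA's API.

```python
import numpy as np

def perturb(x, rng):
    """Randomly perturb the last bits of each value (a crude stand-in for random rounding)."""
    return x * (1.0 + np.finfo(np.float64).eps * rng.uniform(-1.0, 1.0, size=np.shape(x)))

def estimated_significant_digits(samples):
    """Estimate the number of decimal digits on which the repeated runs agree."""
    mean, spread = np.mean(samples), np.std(samples)
    if spread == 0.0 or mean == 0.0:
        return 15  # all runs agree to full double precision (or the result is exactly zero)
    return max(0, int(np.floor(-np.log10(spread / abs(mean)))))

rng = np.random.default_rng(1)
# An ill-conditioned summation: massive cancellation magnifies round-off error.
terms = np.concatenate([np.full(1000, 1e16), np.full(1000, -1e16), np.full(1000, 1.2345)])

results = [float(np.sum(perturb(terms, rng))) for _ in range(8)]
print("perturbed results:", [f"{r:.6g}" for r in results])
print("estimated significant digits:", estimated_significant_digits(results))
```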

3.
We present a parallel implementation of the Bose Hubbard model, using imaginary time propagation to find the lowest quantum eigenstate and real time propagation for simulation of quantum dynamics. Scaling issues, performance of sparse matrix-vector multiplication, and a parallel algorithm for determining nonzero matrix elements are described. Implementation of imaginary time propagation yields O(N) linear convergence on a single processor and slightly better than ideal performance on up to 160 processors for a particular problem size. The determination of the nonzero matrix elements is intractable using sequential non-optimized techniques for large problem sizes. Thus, we discuss a parallel algorithm that takes advantage of the intrinsic structural characteristics of the Fock-space matrix representation of the Bose Hubbard Hamiltonian and utilizes a parallel implementation of a Fock-state lookup table to make this task solvable within reasonable timeframes. Our parallel algorithm demonstrates near-ideal scaling on thousands of processors. We include results for a matrix of dimension 22.6 million with 202 million nonzero elements, utilizing 2048 processors.
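As a hedged, single-process sketch of the imaginary-time idea (not the parallel Bose-Hubbard code described above), the snippet below repeatedly applies (I − τH) to a random vector and renormalizes, which drives the vector toward the lowest eigenstate of a sparse Hermitian H. Here H is just a small sparse tridiagonal matrix standing in for the Fock-space Hamiltonian; the size, potential, and hopping strength are invented.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Small sparse Hermitian stand-in for the Fock-space Hamiltonian:
# a diagonal "potential" plus nearest-neighbour hopping (invented parameters).
n = 2000
diag = np.linspace(0.0, 10.0, n)
hop = -0.5 * np.ones(n - 1)
H = sp.diags([hop, diag, hop], offsets=[-1, 0, 1], format="csr")

# Imaginary-time propagation: psi <- (I - tau*H) psi, renormalised every step.
tau = 0.05
rng = np.random.default_rng(2)
psi = rng.normal(size=n)
psi /= np.linalg.norm(psi)
for _ in range(3000):
    psi = psi - tau * (H @ psi)
    psi /= np.linalg.norm(psi)

e_itp = psi @ (H @ psi)                           # Rayleigh-quotient energy estimate
e_ref = spla.eigsh(H, k=1, which="SA")[0][0]      # reference lowest eigenvalue
print(f"imaginary-time estimate: {e_itp:.4f}   eigsh reference: {e_ref:.4f}")
```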

4.
We propose an efficient method for editing bidirectional texture functions (BTFs) based on an edit propagation scheme. In our approach, users specify sparse edits on a certain slice of the BTF. An edit propagation scheme is then applied to propagate the edits to the whole BTF data. The consistency of the BTF data is maintained by propagating similar edits to points with similar underlying geometry/reflectance. For this purpose, we propose to use view-independent features, including normals and reflectance features reconstructed from each view, to guide the propagation process. We also propose an adaptive sampling scheme for speeding up the propagation process. Since our method does not require accurate geometry or reflectance information, it allows users to edit complex BTFs with interactive feedback.
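A hedged sketch of the general edit-propagation idea, similar in spirit to affinity-based propagation but not the paper's BTF pipeline: sparse user edits are spread to all points with weights given by a Gaussian affinity over per-point features such as normals and reflectance. The feature vectors, bandwidth, and data below are made up for illustration.

```python
import numpy as np

def propagate_edits(features, edit_idx, edit_values, sigma=0.2):
    """Propagate sparse edits to all points, weighting by feature similarity."""
    diffs = features[:, None, :] - features[None, edit_idx, :]      # (N, K, D)
    w = np.exp(-np.sum(diffs**2, axis=-1) / (2.0 * sigma**2))       # Gaussian affinity (N, K)
    w /= w.sum(axis=1, keepdims=True) + 1e-12                       # normalize over the K edits
    return w @ edit_values                                          # blended edit per point

rng = np.random.default_rng(3)
N = 5000
# Made-up per-point features: (normal_x, normal_y, normal_z, albedo).
features = rng.uniform(0.0, 1.0, size=(N, 4))

edit_idx = np.array([10, 20, 30])            # points the user touched
edit_values = np.array([1.0, 0.5, 0.0])      # e.g. a brightness scale per stroke

edits = propagate_edits(features, edit_idx, edit_values)
print("propagated edit range:", edits.min(), edits.max())
```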

5.
A network security situational awareness method based on a Markov game model   (Cited by 8: 0 self-citations, 8 others)
To analyze the impact of threat propagation on a networked system, to evaluate the system's security accurately and comprehensively, and to provide corresponding hardening schemes, a network security situational awareness method based on Markov game analysis is proposed. Security data detected by multiple sensors are fused to obtain normalized data on assets, threats, and vulnerabilities; for each threat, its propagation pattern is analyzed and a corresponding threat propagation network is built; by analyzing the behaviors of threats, administrators, and ordinary users from a game-theoretic perspective, a ...

6.
This article proposes a reinforcement learning procedure for mobile robot navigation using a latent-like learning schema. Latent learning refers to learning that occurs in the absence of reinforcement signals and is not apparent until reinforcement is introduced. This concept considers that part of a task can be learned before the agent receives any indication of how to perform such a task. In the proposed topological reinforcement learning agent (TRLA), a topological map is used to perform the latent learning. The propagation of the reinforcement signal throughout the topological neighborhoods of the map permits the estimation of a value function that, on average, requires fewer trials and fewer updates per trial than six of the main temporal-difference reinforcement learning algorithms: Q-learning, SARSA, Q(λ)-learning, SARSA(λ), Dyna-Q and fast Q(λ)-learning. The RL agents were tested in four different environments designed to consider a growing level of complexity in accomplishing navigation tasks. The tests suggested that the TRLA chooses shorter trajectories (in the number of steps) and/or requires fewer value-function updates in each trial than the other six reinforcement learning (RL) algorithms.
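For reference, here is a minimal tabular Q-learning agent, one of the six baseline algorithms named above. This is a generic textbook sketch on a toy one-dimensional corridor task, not the TRLA or the environments used in the paper; the grid size, rewards, and hyperparameters are assumptions.

```python
import numpy as np

# Toy corridor: states 0..N-1, actions 0=left, 1=right, reward +1 at the right end.
N_STATES, GOAL = 12, 11
ALPHA, GAMMA, EPS, EPISODES = 0.1, 0.95, 0.1, 500

rng = np.random.default_rng(4)
Q = np.zeros((N_STATES, 2))

for _ in range(EPISODES):
    s = 0
    while s != GOAL:
        a = rng.integers(2) if rng.random() < EPS else int(np.argmax(Q[s]))
        s_next = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
        r = 1.0 if s_next == GOAL else 0.0
        # Q-learning update: bootstrap on the greedy value of the next state.
        Q[s, a] += ALPHA * (r + GAMMA * np.max(Q[s_next]) - Q[s, a])
        s = s_next

# Follow the greedy policy and count steps to the goal.
greedy_path_len, s = 0, 0
while s != GOAL and greedy_path_len < 100:
    s = min(N_STATES - 1, s + 1) if np.argmax(Q[s]) == 1 else max(0, s - 1)
    greedy_path_len += 1
print("greedy path length:", greedy_path_len)   # should approach 11 steps
```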

7.
We present a novel and effective method for modeling a developable surface to simulate paper bending in interactive and animation applications. The method exploits the representation of a developable surface as the envelope of rectifying planes of a curve in 3D, which is therefore necessarily a geodesic on the surface. We manipulate the geodesic to provide intuitive shape control for modeling paper bending. Our method ensures a natural continuous isometric deformation from a piece of bent paper to its flat state without any stretching. Test examples show that the new scheme is fast, accurate, and easy to use, thus providing an effective approach to interactive paper bending. We also show how to handle non-convex piecewise smooth developable surfaces.
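For concreteness, the standard parameterization of the rectifying developable of a curve (the envelope of its rectifying planes) is given below in textbook form; this is consistent with the abstract but is not necessarily the paper's notation. Here T, B, κ, and τ are the tangent, binormal, curvature, and torsion of the curve γ(s).

```latex
% Rectifying developable (envelope of rectifying planes) of a curve \gamma(s):
% the rulings point along the Darboux direction, and \gamma is a geodesic on the surface.
x(s, u) \;=\; \gamma(s) \;+\; u\,\bigl(\tau(s)\,T(s) \;+\; \kappa(s)\,B(s)\bigr)
```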

8.
Klein, Netzer, Lu. Algorithmica, 2008, 35(4): 321-345
We address the problem of detecting race conditions in programs that use semaphores for synchronization. Netzer and Miller showed that it is NP-complete to detect race conditions in programs that use many semaphores. We show in this paper that it remains NP-complete even if only two semaphores are used in the parallel programs. For the tractable case, i.e., using only one semaphore, we give two algorithms for detecting race conditions from the trace of executing a parallel program on p processors, where n semaphore operations are executed. The first algorithm determines in O(n) time whether a race condition exists between any two given operations. The second algorithm runs in O(np log n) time and outputs a compact representation from which one can determine in O(1) time whether a race condition exists between any two given operations. The second algorithm is near-optimal in that the running time is only O(log n) times the time required simply to write down the output.

9.
Distributed-air-jet MEMS-based systems have been proposed to manipulate small parts at high velocities and without any friction problems. The control of such distributed systems is very challenging, and the usual approaches for contact-arrayed systems do not produce satisfactory results. In this paper, we investigate reinforcement learning control approaches in order to position and convey an object. Reinforcement learning is a popular approach for finding controllers that are tailored exactly to the system without any prior model. We show how to apply reinforcement learning from a decentralized perspective in order to address the global-local trade-off. The simulation results demonstrate that the reinforcement learning method is a promising way to design control laws for such distributed systems.

10.
An implementation of the Constrained Interpolation Profile (CIP) algorithm for magnetohydrodynamic (MHD) simulations is presented. First we transform the original momentum and magnetic induction equations into unfamiliar forms by introducing Elsässer variables [W.M. Elsässer, The hydromagnetic equations, Phys. Rev. (1950)]. In this formulation, while the compressional and pressure gradient terms remain as non-advective terms, the advective and magnetic stress terms are expressed in the form of an advection equation, which enables us to use the CIP algorithm. We have examined some 1D test problems using the code based on this formulation. Linear Alfvén wave propagation tests reveal that the developed code is capable of solving any Alfvén wave propagation with only small numerical diffusion and phase errors up to kΔh = 2.5 (where Δh is the grid spacing). A shock tube test shows good agreement with a previous result, with less numerical oscillation at the shock front and the contact discontinuity, which are captured within a few grid points. Extension of the 1D code to the multi-dimensional case is straightforward. We have calculated the 3D nonlinear evolution of the Kelvin-Helmholtz instability (KHI) and compared the result with our previous study. We find that our new MHD code is capable of following the 3D turbulence excited by the KHI while retaining the solenoidal property of the magnetic field.
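For reference, the Elsässer variables and the advective form they produce in the incompressible ideal limit are shown below. These are the standard definitions; the paper's compressible formulation keeps additional non-advective compressional and pressure-gradient terms, as noted in the abstract.

```latex
% Elsasser variables: combine the velocity and the Alfven velocity.
\mathbf{z}^{\pm} \;=\; \mathbf{u} \,\pm\, \frac{\mathbf{B}}{\sqrt{\mu_0 \rho}}

% In incompressible ideal MHD the equations take an advection-like form,
% with z^{\mp} advecting z^{\pm}:
\frac{\partial \mathbf{z}^{\pm}}{\partial t}
  \;+\; \bigl(\mathbf{z}^{\mp} \cdot \nabla\bigr)\,\mathbf{z}^{\pm}
  \;=\; -\,\frac{1}{\rho}\,\nabla P_{\mathrm{tot}},
\qquad P_{\mathrm{tot}} \;=\; p + \frac{B^2}{2\mu_0}
```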

11.
We describe the development of a new toolkit for data analysis. The analysis package is based on Bayes' Theorem, and is realized with the use of Markov Chain Monte Carlo. This gives access to the full posterior probability distribution. Parameter estimation, limit setting and uncertainty propagation are implemented in a straightforward manner.
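The toolkit described above is a C++ package; as a hedged illustration of the Markov Chain Monte Carlo machinery it relies on, the Python sketch below runs a random-walk Metropolis sampler over a toy posterior (Gaussian likelihood with a flat prior) and reads off a parameter estimate and an interval from the sampled posterior. The model, step size, and burn-in are made up for illustration and do not reflect the toolkit's API.

```python
import numpy as np

rng = np.random.default_rng(5)
data = rng.normal(loc=2.0, scale=1.0, size=50)       # toy data with true mean 2.0

def log_posterior(mu):
    """Gaussian likelihood with known sigma = 1 and a flat prior on mu."""
    return -0.5 * np.sum((data - mu) ** 2)

def metropolis(log_post, x0, steps=20000, step=0.2):
    """Random-walk Metropolis: propose mu' = mu + N(0, step), accept with prob min(1, ratio)."""
    chain, x, lp = [], x0, log_post(x0)
    for _ in range(steps):
        x_prop = x + rng.normal(scale=step)
        lp_prop = log_post(x_prop)
        if np.log(rng.random()) < lp_prop - lp:
            x, lp = x_prop, lp_prop
        chain.append(x)
    return np.array(chain)

chain = metropolis(log_posterior, x0=0.0)
burned = chain[5000:]                                 # discard burn-in
lo, hi = np.percentile(burned, [16, 84])
print(f"posterior mean: {burned.mean():.3f}, 68% interval: [{lo:.3f}, {hi:.3f}]")
```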

12.
This research treats a bargaining process as a Markov decision process, in which a bargaining agent's goal is to learn the optimal policy that maximizes the total rewards it receives over the process. Reinforcement learning is an effective method for agents to learn how to determine actions at any time step of a Markov decision process. Temporal-difference (TD) learning is a fundamental method for solving the reinforcement learning problem, and it can tackle the temporal credit assignment problem. This research designs agents that apply TD-based reinforcement learning to deal with online bilateral bargaining with incomplete information. This research further evaluates the agents' bargaining performance in terms of the average payoff and settlement rate. The results show that agents using TD-based reinforcement learning are able to achieve good bargaining performance. This learning approach is sufficiently robust and convenient, and hence suitable for online automated bargaining in electronic commerce.
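The core TD(0) value update that such agents rely on is the standard one, shown here in textbook form rather than the paper's exact notation: α is the learning rate, γ the discount factor, and r_{t+1} the payoff signal received after acting in state s_t.

```latex
% Temporal-difference (TD(0)) update of the state-value estimate:
V(s_t) \;\leftarrow\; V(s_t) \;+\; \alpha \bigl[\, r_{t+1} + \gamma\, V(s_{t+1}) - V(s_t) \,\bigr]
```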

13.
It was confirmed that a real mobile robot with a simple visual sensor could learn appropriate motions to reach a target object by direct-vision-based reinforcement learning (RL). In direct-vision-based RL, raw visual sensory signals are put directly into a layered neural network, and then the neural network is trained using back propagation, with the training signal being generated by reinforcement learning. Because of the time-delay in transmitting the visual sensory signals, the actor outputs are trained by the critic output at two time-steps ahead. It was shown that a robot with a simple monochrome visual sensor can learn to reach a target object from scratch without any advance knowledge of this task by direct-vision-based RL. This work was presented in part at the 7th International Symposium on Artificial Life and Robotics, Oita, Japan, January 16–18, 2002.

14.
Online bin stretching is a semi-online variant of bin packing in which the algorithm has to use the same number of bins as an optimal packing, but is allowed to slightly overpack the bins. The goal is to minimize the amount of overpacking, i.e., the maximum size packed into any bin. We give an algorithm for online bin stretching with a stretching factor of 11/8 = 1.375 for three bins. Additionally, we present a lower bound of 45/33 = 1.3636... for online bin stretching on three bins and a lower bound of 19/14 for four and five bins that were discovered using a computer search.
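A hedged illustration of the problem setup, not the 11/8 algorithm above: items that an optimal offline packing fits into three unit-size bins arrive online, a naive greedy rule assigns each item to the currently least-loaded bin, and the resulting stretching factor is the largest final bin load. The item sequence below is made up.

```python
# Online bin stretching, naive baseline: the items are guaranteed to fit into 3 bins
# of size 1; the online algorithm here just picks the least-loaded bin, and the
# stretching factor is the maximum load it ends up with.
def greedy_stretch(items, bins=3):
    loads = [0.0] * bins
    for x in items:
        i = min(range(bins), key=lambda j: loads[j])   # least-loaded bin
        loads[i] += x
    return max(loads)

# Example sequence that an optimal packing fits in 3 unit bins
# ({0.5, 0.5} / {0.5, 0.5} / {0.4, 0.3, 0.3}).
items = [0.4, 0.3, 0.3, 0.5, 0.5, 0.5, 0.5]
print("greedy stretching factor:", greedy_stretch(items))   # exceeds 1.0
```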

15.
One of the main challenges in Grid computing is efficient allocation of resources (CPU-hours, network bandwidth, etc.) to the tasks submitted by users. Due to the lack of centralized control and the dynamic/stochastic nature of resource availability, any successful allocation mechanism should be highly distributed and robust to changes in the Grid environment. Moreover, it is desirable to have an allocation mechanism that does not rely on the availability of coherent global information. In this paper we examine a simple algorithm for distributed resource allocation in a simplified Grid-like environment that meets the above requirements. Our system consists of a large number of heterogeneous reinforcement learning agents that share common resources for their computational needs. There is no explicit communication or interaction between the agents: the only information an agent receives is the expected response time of a job it submitted to a particular resource, which serves as the agent's reinforcement signal. The results of our experiments suggest that even simple reinforcement learning can indeed be used to achieve load-balanced resource allocation in a large-scale heterogeneous system.
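A hedged, heavily simplified sketch of this kind of learner: each agent keeps a running value estimate per resource, uses the negated observed response time of its submitted job as the reinforcement signal, and picks resources ε-greedily with no inter-agent communication. The resource speeds, load model, and parameters are invented for illustration and are not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(6)
N_AGENTS, N_RESOURCES, ROUNDS = 50, 4, 2000
speeds = np.array([1.0, 2.0, 4.0, 8.0])          # invented resource capacities

Q = np.zeros((N_AGENTS, N_RESOURCES))            # per-agent value estimates
ALPHA, EPS = 0.1, 0.1

for _ in range(ROUNDS):
    # Each agent independently picks a resource (epsilon-greedy, no communication).
    choices = np.where(rng.random(N_AGENTS) < EPS,
                       rng.integers(N_RESOURCES, size=N_AGENTS),
                       Q.argmax(axis=1))
    load = np.bincount(choices, minlength=N_RESOURCES)
    # Response time grows with queue length and shrinks with resource speed.
    response = load[choices] / speeds[choices]
    reward = -response                           # the only feedback an agent receives
    Q[np.arange(N_AGENTS), choices] += ALPHA * (reward - Q[np.arange(N_AGENTS), choices])

final_load = np.bincount(Q.argmax(axis=1), minlength=N_RESOURCES)
print("greedy load per resource:", final_load, "resource speeds:", speeds)
```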

16.
Trekking in the Alps Without Freezing or Getting Tired   (Cited by 3: 0 self-citations, 3 others)

17.
Phillips, Stein, Torng, Wein. Algorithmica, 2008, 32(2): 163-200
We consider two fundamental problems in dynamic scheduling: scheduling to meet deadlines in a preemptive multiprocessor setting, and scheduling to provide good response time in a number of scheduling environments. When viewed from the perspective of traditional worst-case analysis, no good on-line algorithms exist for these problems, and for some variants no good off-line algorithms exist unless P = NP. We study these problems using a relaxed notion of competitive analysis, introduced by Kalyanasundaram and Pruhs, in which the on-line algorithm is allowed more resources than the optimal off-line algorithm to which it is compared. Using this approach, we establish that several well-known on-line algorithms that have poor performance from an absolute worst-case perspective are optimal for the problems in question when allowed moderately more resources. For optimization of average flow time, these are the first results of any sort, for any NP-hard version of the problem, that indicate that it might be possible to design good approximation algorithms.

18.
A new model for intrusion and its propagation through various attack schemes in networks is considered. The model is characterized by the number of network nodes n, and two parameters f and g. Parameter f represents the probability of failure of an attack on a node and is a gross measure of the level of security of the attacked system and perhaps of the intruder's skills; g represents a limit on the number of attacks that the intrusion software can ever try, due to the danger of being discovered, when it issues them from a particular (broken) network node. The success of the attack scheme is characterized by two factors: the number of nodes captured (the spread factor) and the number of virtual links that a defense mechanism has to trace from any node where the attack is active to the origin of the intrusion (the traceability factor). The goal of an intruder is to maximize both factors. In our model we present four different ways (attack schemes) by which an intruder can organize his attacks. Using analytic and experimental methods, we first show that for any 0 < f < 1, there exists a constant g for which any of our attack schemes can achieve a Θ(n) spread and traceability factor with high probability, given sufficient propagation time. We also show for three of our attack schemes that the spread and the traceability factors are, with high probability, linearly related during the whole duration of the attack propagation. This implies that it will not be easy for a detection mechanism to trace the origin of the intrusion, since it will have to trace a number of links proportional to the number of nodes captured.
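A hedged toy simulation in this spirit, but not one of the paper's four schemes: starting from an origin node, the intruder always attacks from the most recently captured node, each node may issue at most g attacks on uniformly random targets, and each attack fails independently with probability f. The spread and a crude traceability measure (depth of the capture tree) are reported; the graph model and parameters are invented.

```python
import random

def simulate_attack(n=1000, f=0.3, g=5, seed=7):
    """Toy scheme: attack from the most recently captured node (stack order);
    each node issues at most g attacks, and each attack fails with probability f."""
    rng = random.Random(seed)
    depth = {0: 0}                       # node 0 is the intrusion origin
    budget = {0: g}
    stack = [0]
    while stack:
        node = stack[-1]
        if budget[node] == 0:
            stack.pop()                  # attack budget exhausted, fall back to the previous node
            continue
        budget[node] -= 1
        target = rng.randrange(n)
        if target not in depth and rng.random() > f:
            depth[target] = depth[node] + 1   # one more virtual link to trace back
            budget[target] = g
            stack.append(target)
    spread = len(depth)                  # number of captured nodes
    traceability = max(depth.values())   # links from the deepest captured node to the origin
    return spread, traceability

spread, trace = simulate_attack()
print("spread:", spread, "traceability:", trace)
```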

19.
Operations research and management science are often confronted with sequential decision making problems with large state spaces. Standard methods that are used for solving such complex problems are associated with some difficulties. As we discuss in this article, these methods are plagued by the so-called curse of dimensionality and the curse of modelling. In this article, we discuss reinforcement learning, a machine learning technique for solving sequential decision making problems with large state spaces. We describe how reinforcement learning can be combined with a function approximation method to avoid both the curse of dimensionality and the curse of modelling. To illustrate the usefulness of this approach, we apply it to a problem with a huge state space—learning to play the game of Othello. We describe experiments in which reinforcement learning agents learn to play the game of Othello without the use of any knowledge provided by human experts. It turns out that the reinforcement learning agents learn to play the game of Othello better than players that use basic strategies.

20.
In the single rent-to-buy decision problem, without a priori knowledge of the amount of time a resource will be used we need to decide when to buy the resource, given that we can rent the resource for $1 per unit time or buy it once and for all for $c. In this paper we study algorithms that make a sequence of single rent-to-buy decisions, using the assumption that the resource use times are independently drawn from an unknown probability distribution. Our study of this rent-to-buy problem is motivated by important systems applications, specifically, problems arising from deciding when to spin down disks to conserve energy in mobile computers [4], [13], [15], thread blocking decisions during lock acquisition in multiprocessor applications [7], and virtual circuit holding times in IP-over-ATM networks [11], [19]. We develop a provably optimal and computationally efficient algorithm for the rent-to-buy problem. Our algorithm uses time and space, and its expected cost for the t-th resource use converges to optimal as t → ∞, for any bounded probability distribution on the resource use times. Alternatively, using O(1) time and space, the algorithm almost converges to optimal. We describe the experimental results for the application of our algorithm to one of the motivating systems problems: the question of when to spin down a disk to save power in a mobile computer. Simulations using disk access traces obtained from an HP workstation environment suggest that our algorithm yields significantly improved power/response time performance over the nonadaptive 2-competitive algorithm which is optimal in the worst-case competitive analysis model. Received October 22, 1996; revised September 25, 1997.
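For comparison, here is a minimal sketch of the nonadaptive 2-competitive break-even rule mentioned at the end of the abstract, applied to the disk spin-down setting: keep paying the rental (idle) cost until it reaches the buy (spin-down/spin-up) cost c, then buy. The idle-time values and c below are invented, and the paper's adaptive, distribution-learning algorithm is not reproduced here.

```python
def break_even_cost(idle_time, c):
    """Classic 2-competitive rent-to-buy rule: rent until the total rent paid equals c, then buy.
    Cost is idle_time if the threshold is never reached, else c (rent) + c (buy)."""
    return idle_time if idle_time < c else 2 * c

def optimal_cost(idle_time, c):
    """Offline optimum with hindsight: either rent the whole time or buy immediately."""
    return min(idle_time, c)

c = 10.0                                   # buy (spin-up) cost in rent units, invented
idle_times = [1.0, 4.0, 9.0, 15.0, 80.0]   # invented idle-period lengths
for t in idle_times:
    alg, opt = break_even_cost(t, c), optimal_cost(t, c)
    print(f"idle={t:5.1f}  algorithm={alg:5.1f}  optimal={opt:5.1f}  ratio={alg / opt:.2f}")
```

The printed ratios never exceed 2, which is exactly the 2-competitive guarantee the abstract uses as its baseline.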
