Similar Documents
20 similar documents found (search time: 15 ms)
1.
Reinforcement learning is the problem of generating optimal behavior in a sequential decision-making environment given the opportunity to interact with it. Many algorithms for solving reinforcement-learning problems work by computing improved estimates of the optimal value function. We extend prior analyses of reinforcement-learning algorithms and present a powerful new theorem that can provide a unified analysis of such value-function-based reinforcement-learning algorithms. The usefulness of the theorem lies in how it allows the convergence of a complex asynchronous reinforcement-learning algorithm to be proved by verifying that a simpler synchronous algorithm converges. We illustrate the application of the theorem by analyzing the convergence of Q-learning, model-based reinforcement learning, Q-learning with multistate updates, Q-learning for Markov games, and risk-sensitive reinforcement learning.
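To ground the discussion, here is a minimal tabular Q-learning sketch, one of the algorithms whose convergence such theorems address; the toy MDP (two states, two actions, the dynamics in `step`) is a hypothetical example, not one from the paper.

```python
# Minimal tabular Q-learning on a hypothetical two-state MDP.
import random

n_states, n_actions = 2, 2
gamma, alpha, epsilon = 0.9, 0.1, 0.1
Q = [[0.0] * n_actions for _ in range(n_states)]

def step(s, a):
    # Hypothetical dynamics: action 1 in state 0 pays off and moves on.
    if s == 0 and a == 1:
        return 1, 1.0
    return 0, 0.0

s = 0
for _ in range(10000):
    a = random.randrange(n_actions) if random.random() < epsilon \
        else max(range(n_actions), key=lambda a: Q[s][a])
    s2, r = step(s, a)
    # Asynchronous value-function update: only the visited (s, a) pair
    # moves toward the one-step Bellman estimate.
    Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
    s = s2

print(Q)
```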

2.
This paper investigates new learning algorithms (LF I and LF II) based on a Lyapunov function for the training of feedforward neural networks. Such algorithms have an interesting parallel with the popular backpropagation (BP) algorithm, in which the fixed learning rate is replaced by an adaptive learning rate computed using a convergence theorem based on Lyapunov stability theory. LF II, a modified version of LF I, has been introduced with an aim to avoid local minima. This modification also helps in improving the convergence speed in some cases. Conditions for achieving a global minimum for this kind of algorithm are studied in detail. The performances of the proposed algorithms are compared with the BP algorithm and extended Kalman filtering (EKF) on three benchmark function-approximation problems: XOR, 3-bit parity, and 8-3 encoder. The comparisons are made in terms of the number of learning iterations and the computational time required for convergence. The proposed algorithms (LF I and II) converge much faster than the other two in attaining the same accuracy. Finally, the comparison is made on a complex two-dimensional (2-D) Gabor function, and the effect of the adaptive learning rate on faster convergence is verified. In a nutshell, the investigations made in this paper help us better understand the learning procedure of feedforward neural networks in terms of adaptive learning rate, convergence speed, and local minima.
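As a hedged illustration of an adaptive, stability-motivated learning rate (not the paper's LF I/LF II formulas, which come from its own Lyapunov convergence theorem), here is an error-normalized update on a linear neuron; a quadratic Lyapunov argument shows such a step is stable for 0 < mu < 2.

```python
# A normalized (NLMS-style) adaptive learning rate on a linear neuron,
# standing in for the Lyapunov-derived rates discussed above. Data and
# model are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

w, mu, eps = np.zeros(3), 0.5, 1e-8
for x_t, y_t in zip(X, y):
    e = y_t - x_t @ w             # instantaneous output error
    eta = mu / (eps + x_t @ x_t)  # adaptive rate: shrinks for large inputs
    w += eta * e * x_t            # Lyapunov-stable for 0 < mu < 2

print(w)  # close to w_true after one pass
```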

3.
Incremental learning has been widely addressed in the machine learning literature to cope with learning tasks where the learning environment is ever changing or training samples become available over time. However, most research work explores incremental learning with statistical algorithms or neural networks, rather than evolutionary algorithms. The work in this paper employs genetic algorithms (GAs) as basic learning algorithms for incremental learning within one or more classifier agents in a multiagent environment. Four new approaches with different initialization schemes are proposed. They keep the old solutions and use an "integration" operation to integrate them with new elements to accommodate new attributes, while biased mutation and crossover operations are adopted to further evolve a reinforced solution. The simulation results on benchmark classification data sets show that the proposed approaches can deal with the arrival of new input attributes and integrate them with the original input space. It is also shown that the proposed approaches can be successfully used for incremental learning and improve classification rates as compared to the retraining GA. Possible applications for continuous incremental training and feature selection are also discussed.

4.
The frequency with which various elements of the search space of a given evolutionary algorithm are sampled is affected by the family of recombination (reproduction) operators. The original Geiringer theorem gives the limiting frequency of occurrence of a given individual under repeated application of crossover alone for the classical genetic algorithm. Recently, Geiringer's theorem has been generalized to include the case of linear GP with homologous crossover (which can also be thought of as a variable-length GA). In the current paper we prove a general theorem which tells us that, under rather mild conditions on a given evolutionary algorithm, call it A, the stationary distribution of a certain Markov chain of populations in the absence of selection is unique and uniform. This theorem not only implies the already existing versions of Geiringer's theorem, but also provides a recipe for obtaining similar facts for a rather wide class of evolutionary algorithms. The techniques used to prove this theorem involve a classical fact about random walks on a group and may allow us to compute and/or estimate the eigenvalues of the corresponding Markov transition matrix, which is directly related to the rate of convergence towards the unique limiting distribution.
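To make the flavor of such results concrete, here is a small, hedged simulation (population, target string, and operator choice are hypothetical): under repeated crossover with no selection or mutation, the time-averaged frequency of a target individual approaches the product of the initial per-position allele frequencies, the Geiringer limit.

```python
# Crossover-only dynamics: the time-averaged frequency of `target`
# approaches the product of initial per-position allele frequencies.
import random

random.seed(0)
pop = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [0, 0, 0]]
target = [1, 1, 1]

limit = 1.0
for j in range(3):
    limit *= sum(ind[j] == target[j] for ind in pop) / len(pop)

hits, trials = 0, 200000
for _ in range(trials):
    i, k = random.sample(range(len(pop)), 2)
    cut = random.randint(1, 2)  # one-point crossover, no selection
    pop[i], pop[k] = (pop[i][:cut] + pop[k][cut:],
                      pop[k][:cut] + pop[i][cut:])
    hits += sum(ind == target for ind in pop)

print(hits / (trials * len(pop)), "vs Geiringer limit", limit)
```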

5.
Theorems can be considered independent of abstract domains; a theorem instead depends on the set of properties necessary to prove it correct. Following this observation, theorems can be formulated and proven more generally, thereby improving the reuse of mathematical theorems. We discuss how this view influences the design of mathematical libraries and illustrate our approach with examples written in the Mizar language. We also argue that this approach allows both stating the requirements of generic algorithms and checking whether particular instantiations of generic algorithms are semantically correct.

6.
Efficient Markov Network Structure Discovery Using Independence Tests
We present two algorithms for learning the structure of a Markov network from data: GSMN* and GSIMN. Both algorithms use statistical independence tests to infer the structure by successively constraining the set of structures consistent with the results of these tests. Until very recently, algorithms for structure learning were based on maximum likelihood estimation, which has been proved to be NP-hard for Markov networks due to the difficulty of estimating the parameters of the network, needed for the computation of the data likelihood. The independence-based approach does not require the computation of the likelihood, and thus both GSMN* and GSIMN can compute the structure efficiently (as shown in our experiments). GSMN* is an adaptation of the Grow-Shrink algorithm of Margaritis and Thrun for learning the structure of Bayesian networks. GSIMN extends GSMN* by additionally exploiting Pearl's well-known properties of the conditional independence relation to infer novel independences from known ones, thus avoiding having to perform statistical tests to estimate them. To accomplish this efficiently, GSIMN uses the Triangle theorem, also introduced in this work, which is a simplified version of the set of Markov axioms. Experimental comparisons on artificial and real-world data sets show that GSIMN can yield significant savings with respect to GSMN*, while generating a Markov network of comparable or in some cases improved quality. We also compare GSIMN to a forward-chaining implementation, called GSIMN-FCH, that produces all possible conditional independences resulting from repeatedly applying Pearl's theorems to the known conditional independence tests. The results of this comparison show that GSIMN, by the sole use of the Triangle theorem, is nearly optimal in terms of the set of independence tests that it infers.
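A sketch of the Grow-Shrink Markov-blanket subroutine that GSMN* adapts, written against an abstract conditional-independence oracle `indep`; in practice the oracle is a statistical test on data, and the chain structure below is a hypothetical example.

```python
# Grow-Shrink Markov blanket discovery with an independence-test oracle.
def grow_shrink(x, variables, indep):
    mb = set()
    changed = True
    while changed:                 # grow phase: add dependent variables
        changed = False
        for y in variables:
            if y != x and y not in mb and not indep(x, y, mb):
                mb.add(y)
                changed = True
    for y in list(mb):             # shrink phase: drop false positives
        if indep(x, y, mb - {y}):
            mb.discard(y)
    return mb

# Hypothetical chain a - b - c: a and c are independent given b.
def indep(x, y, Z):
    return {x, y} == {"a", "c"} and "b" in Z

print(grow_shrink("a", ["a", "b", "c"], indep))  # {'b'}
```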

7.
Several studies have shown that natural gradient descent for on-line learning is much more efficient than standard gradient descent. In this article, we derive natural gradients in a slightly different manner and discuss implications for batch-mode learning and pruning, linking them to existing algorithms such as Levenberg-Marquardt optimization and optimal brain surgeon. The Fisher matrix plays an important role in all these algorithms. The second half of the article discusses a layered approximation of the Fisher matrix specific to multilayered perceptrons. Using this approximation rather than the exact Fisher matrix, we arrive at much faster "natural" learning algorithms and more robust pruning procedures.
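A hedged sketch of the natural-gradient step w ← w − η F⁻¹∇L on a toy linear-Gaussian model, where the Fisher matrix F = E[xxᵀ] is computable exactly; the model and data are assumptions for illustration, while the article's contribution is cheaper layered approximations of F for multilayer perceptrons.

```python
# Natural-gradient descent on a toy linear-Gaussian regression model.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2)) @ np.array([[1.0, 0.9], [0.0, 0.1]])  # correlated inputs
w_true = np.array([2.0, -1.0])
y = X @ w_true + 0.1 * rng.normal(size=500)

w = np.zeros(2)
F = X.T @ X / len(X)                    # Fisher matrix for this model
for _ in range(20):
    grad = -X.T @ (y - X @ w) / len(X)  # gradient of the squared loss
    w -= np.linalg.solve(F, grad)       # natural-gradient step
print(w)                                # close to w_true
```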

8.
Geiringer's theorem is a statement about the limiting frequency of occurrence of a certain individual when a classical genetic algorithm is executed in the absence of selection and mutation. Recently, Poli, Stephens, Wright, and Rowe extended the original theorem of Geiringer to include the case of variable-length genetic algorithms and linear genetic programming. In the current paper, a rather powerful finite-population version of Geiringer's theorem, established recently by Mitavskiy, is used to derive a schema-based version of the theorem for nonlinear genetic programming with homologous crossover. The theorem also applies in the presence of “node mutation”, and the corresponding formula for that case is established. The limitation of the finite-population Geiringer result is that it applies only in the absence of selection. In the current paper we also observe some general inequalities concerning the stationary distribution of the Markov chain associated with an evolutionary algorithm in which selection is the last (output) stage of a cycle. Moreover, we prove an “anti-communism” theorem which applies to a wide class of EAs and says that, for a small enough mutation rate, the stationary distribution of the Markov chain modelling the EA cannot be uniform.

9.
There is no known method to determine the optimal topology of a multi-layer neural network for a given problem. Usually the designer selects a topology for the network and then trains it. Since determining the optimal topology of neural networks belongs to the class of NP-hard problems, most existing algorithms for determining the topology are approximate. These algorithms can be classified into four main groups: pruning algorithms, constructive algorithms, hybrid algorithms, and evolutionary algorithms. These algorithms can produce near-optimal solutions. Most of them use a hill-climbing method and may get stuck in local minima. In this article, we first introduce a learning automaton and study its behaviour, and then present an algorithm based on the proposed learning automaton, called the survival algorithm, for determining the number of hidden units of three-layer neural networks. The survival algorithm uses learning automata as a global search method to increase the probability of obtaining the optimal topology. The algorithm treats the problem of optimizing the topology of neural networks as object partitioning rather than as searching or parameter optimization, as in existing algorithms. In the survival algorithm, training begins with a large network, and then, by adding and deleting hidden units, a near-optimal topology is obtained. The algorithm has been tested on a number of problems, and simulations show that the networks generated are near-optimal.
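To show the probability-updating machinery a learning automaton uses, here is a linear reward-penalty scheme on a hypothetical two-action environment; the survival algorithm's own automaton and action set are more elaborate, so this is a sketch of the mechanism only.

```python
# Linear reward-penalty (L_RP) learning automaton on a two-action environment.
import random

random.seed(0)
reward_prob = [0.2, 0.8]   # hypothetical environment
p = [0.5, 0.5]             # action probabilities
a_lr, b_lr = 0.05, 0.01    # reward and penalty step sizes

for _ in range(2000):
    a = 0 if random.random() < p[0] else 1
    rewarded = random.random() < reward_prob[a]
    if rewarded:           # move probability mass toward action a
        p[a] += a_lr * (1 - p[a])
    else:                  # move mass away from action a
        p[a] -= b_lr * p[a]
    p[1 - a] = 1 - p[a]

print(p)                   # p[1] should dominate
```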

10.
We describe a novel framework for the design and analysis of online learning algorithms based on the notion of duality in constrained optimization. We cast a sub-family of universal online bounds as an optimization problem. Using the weak duality theorem we reduce the process of online learning to the task of incrementally increasing the dual objective function. The amount by which the dual increases serves as a new and natural notion of progress for analyzing online learning algorithms. We are thus able to tie the primal objective value and the number of prediction mistakes using the increase in the dual. Editors: Hans Ulrich Simon, Gabor Lugosi, Avrim Blum. A preliminary version of this paper appeared at the 19th Annual Conference on Learning Theory under the title "Online learning meets optimization in the dual".
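The progress argument can be reconstructed in a hedged form as follows: let D_t denote the dual objective after round t, P* the optimum of the primal problem, M the number of prediction mistakes, and suppose each mistake increases the dual by at least Δ > 0 (notation introduced here for illustration). Weak duality then ties the quantities together:

```latex
M \,\Delta \;\le\; D_T - D_0 \;\le\; P^{*} - D_0
\quad\Longrightarrow\quad
M \;\le\; \frac{P^{*} - D_0}{\Delta}
```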

11.
The Schema Theorem and Building Blocks in Evolutionary Algorithms
Yang Haijun, Li Minqiang. Chinese Journal of Computers, 2003, 26(11): 1550-1554
This paper investigates the schema theorem and building-block theory in evolutionary algorithms. By introducing the concepts of schema evolution, schema evolvability, and suitably fit schemata, and taking the standard genetic algorithm as an example, it proves that, under the condition that the mutation operator is independent, the composition of schemata in an evolutionary algorithm does not depend on the order in which multi-point crossover and mutation are applied; it then proves that schemata with strong evolvability grow at an exponential rate. The schema theory in this paper differs from that proposed by Holland et al., in particular by adopting a multi-point crossover operator, for which the corresponding formulas are given; the derivation also argues for the plausibility of the building-block hypothesis, which may thus be called a building-block theory.
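For orientation, Holland's classical schema theorem for one-point crossover, which the multi-point-crossover theory above generalizes, can be stated as follows, where m(H,t) is the number of instances of schema H at generation t, f(H) its average fitness, f̄ the population's average fitness, δ(H) the defining length, o(H) the order, ℓ the string length, and p_c, p_m the crossover and mutation probabilities:

```latex
\mathbb{E}\big[m(H,\,t+1)\big] \;\ge\;
m(H,t)\,\frac{f(H)}{\bar{f}}
\left[1 - p_c\,\frac{\delta(H)}{\ell - 1} - o(H)\,p_m\right]
```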

12.
This paper studies the iterative learning control problem for first-order strongly hyperbolic distributed parameter systems. First, the mild solution of the system state is derived using the Fourier transform and semigroup methods. Then, based on the strong hyperbolicity condition and the Plancherel theorem, and allowing a certain deviation in the initial values across iterations, the convergence condition of the system under a P-type iterative learning control algorithm is given and proved. Finally, a worked example illustrates the effectiveness of the proposed method.
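For orientation, a generic P-type iterative learning control law has the form below, where k indexes iterations, y_d is the desired output, e_k the tracking error, and Γ a learning gain; the paper's actual convergence condition for the distributed parameter system (derived via the Fourier transform, semigroup methods, and the Plancherel theorem) is not reproduced here.

```latex
u_{k+1}(t) \;=\; u_k(t) + \Gamma\, e_k(t),
\qquad e_k(t) = y_d(t) - y_k(t)
```

Convergence proofs for such laws typically establish a contraction ‖e_{k+1}‖ ≤ ρ‖e_k‖ with ρ < 1 under a condition on Γ.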

13.
Scheduling with learning effects has received a lot of research attention lately, but the flowshop setting is relatively unexplored. Moreover, under an uncontrolled learning effect the actual processing time of a job drops precipitously toward zero as the number of jobs increases, which is unrealistic. Motivated by these observations, we consider a two-machine flowshop scheduling problem in which the actual processing time of a job in a schedule is a function of the job's position in the schedule and a control parameter of the learning function. The objective is to minimize the total completion time. We develop a branch-and-bound algorithm and three simulated annealing algorithms to solve the problem. Computational results show that the proposed algorithms are efficient in producing near-optimal solutions.
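A hedged simulated-annealing sketch for this problem class; the learning model p(j, r) = p_j · r^a with a < 0 is one common form from this literature and stands in for the paper's own learning function with its control parameter. Job data below are hypothetical.

```python
# Simulated annealing for a two-machine flowshop with a position-based
# learning effect, minimizing total completion time on machine 2.
import math, random

random.seed(0)
p = [(4, 3), (2, 5), (6, 2), (3, 4), (5, 5)]  # (machine 1, machine 2) times
a = -0.3                                      # learning index

def total_completion_time(seq):
    c1 = c2 = total = 0.0
    for r, j in enumerate(seq, start=1):
        c1 += p[j][0] * r ** a                # machine 1 finish time
        c2 = max(c1, c2) + p[j][1] * r ** a   # machine 2 finish time
        total += c2
    return total

seq = list(range(len(p)))
best, best_cost = seq[:], total_completion_time(seq)
T = 10.0
while T > 0.01:
    i, k = random.sample(range(len(seq)), 2)
    cand = seq[:]
    cand[i], cand[k] = cand[k], cand[i]       # pairwise-swap neighborhood
    d = total_completion_time(cand) - total_completion_time(seq)
    if d < 0 or random.random() < math.exp(-d / T):
        seq = cand
        if total_completion_time(seq) < best_cost:
            best, best_cost = seq[:], total_completion_time(seq)
    T *= 0.99                                 # geometric cooling

print(best, round(best_cost, 2))
```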

14.
The No Free Lunch (NFL) Theorem imposes a theoretical restriction on optimization algorithms: under certain assumptions, their average performance is equal across all problems. Nevertheless, in practice engineers developing machine learning applications usually perceive a "ranking" of algorithm performance. Questions that naturally arise are what kinds of biases the real world has and in what ways we can take advantage of them. Using exploratory data analysis (EDA) on classification examples, we gather insight into the traits that set apart algorithms, datasets, and evaluation measures, and into the extent to which the NFL theorem, a theoretical result, applies under typical real-world constraints.
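For reference, the usual statement of the NFL theorem for search (Wolpert and Macready) says that for any two algorithms a_1 and a_2 and any number of evaluations m, summed over all objective functions f, the distributions of observed cost sequences d^y_m coincide:

```latex
\sum_{f} P\!\left(d^{\,y}_m \mid f, m, a_1\right)
\;=\;
\sum_{f} P\!\left(d^{\,y}_m \mid f, m, a_2\right)
```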

15.
Barnard E. Neural Computation, 2011, 23(7): 1899-1909
We discuss the no-free-lunch (NFL) theorem for supervised learning as a logical paradox: a counterintuitive result that is correctly proven from apparently incontestable assumptions. We show that the uniform prior used in the proof of the theorem has a number of unpalatable consequences besides the NFL theorem, and propose a simple definition of determination (by a learning set of given size) that casts additional suspicion on the utility of this assumption for the prior. Whereas others have suggested that the assumptions of the NFL theorem are not practically realistic, we show these assumptions to be at odds with supervised learning in principle. This analysis suggests a route toward establishing a more realistic prior probability for use in the extended Bayesian framework.

16.
This paper introduces a setting for multiclass online learning with limited feedback and its application to utterance classification. In this learning setting, a parameter k limits the number of choices presented for selection by the environment (e.g. by the user in the case of an interactive spoken system) during each trial of the online learning sequence. New versions of standard additive and multiplicative weight update algorithms for online learning are presented that are better suited to the limited-feedback setting, while sharing the efficiency advantages of the standard ones. The algorithms are evaluated on an utterance classification task in two domains. In this task, no training material for the domain is provided (for training the speech recognizer or classifier) prior to the start of online learning. We present experiments on the effect of varying k and the weight update algorithms on the learning curve for online utterance classification. In these experiments, the new online learning algorithms improve classification accuracy compared with the standard ones. The methods presented are directly relevant to applications such as building call-routing systems that adapt from feedback rather than being trained in batch mode. Editors: Dan Roth and Pascale Fung. The work reported in this paper was carried out while the author was at AT&T Labs.
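A hedged sketch of a multiplicative weight update under limited (top-k) feedback; the feedback model (the true class becomes known after each trial, e.g. via a fallback dialogue), the data, and the exact update rule are illustrative assumptions, not the paper's algorithms.

```python
# Multiplicative updates where only the top-k classes are offered per trial
# and demotions touch only the offered set.
import numpy as np

rng = np.random.default_rng(0)
n_classes, n_feats, k, eta = 4, 10, 2, 0.1
W = np.ones((n_classes, n_feats))            # per-class weight vectors
proto = rng.random((n_classes, n_feats))     # hypothetical class prototypes

errs = []
for t in range(2000):
    c = int(rng.integers(n_classes))
    x = proto[c] + 0.1 * rng.random(n_feats)  # noisy "utterance" features
    shown = np.argsort(W @ x)[-k:]            # only top-k choices offered
    errs.append(c not in shown)
    W[c] *= np.exp(eta * x)                   # promote the true class
    for y in shown:
        if y != c:
            W[y] *= np.exp(-eta * x)          # demote wrongly offered classes
    W /= W.sum(axis=1, keepdims=True)         # keep weights normalized

print(sum(errs[:500]), "errors early vs", sum(errs[-500:]), "late")
```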

17.
We investigate the complexity of learning for the well-studied model in which the learning algorithm may ask membership and equivalence queries. While complexity-theoretic techniques have previously been used to prove hardness results in various learning models, these techniques typically are not strong enough to use when a learning algorithm may make membership queries. We develop a general technique for proving hardness results for learning with membership and equivalence queries (and for more general query models). We apply the technique to show that, assuming NP ≠ co-NP, no polynomial-time membership and (proper) equivalence query algorithms exist for exactly learning read-thrice DNF formulas, unions of halfspaces over the Boolean domain, or some other related classes. Our hardness results are representation dependent, and do not preclude the existence of representation-independent algorithms. The general technique introduces the representation problem for a class F of representations (e.g., formulas), which is naturally associated with the learning problem for F. This problem is related to the structural question of how to characterize functions representable by formulas in F, and is a generalization of standard complexity problems such as Satisfiability. While in general the representation problem is in Σ₂ᵖ, we present a theorem demonstrating that for "reasonable" classes F, the existence of a polynomial-time membership and equivalence query algorithm for exactly learning F implies that the representation problem for F is in fact in co-NP. The theorem is applied to prove hardness results such as the ones mentioned above, by showing that the representation problem for specific classes of formulas is NP-hard.

18.
This paper provides a time-domain feedback analysis of the perceptron learning algorithm and of training schemes for dynamic networks with output feedback. It studies the robustness performance of the algorithms in the presence of uncertainties that might be due to noisy perturbations in the reference signals or due to modeling mismatch. In particular, bounds are established on the step-size parameters in order to guarantee that the resulting algorithms will behave as robust filters. The paper also establishes that an intrinsic feedback structure can be associated with the training schemes. The feedback configuration is motivated via energy arguments and is shown to consist of two major blocks: a time-variant lossless (i.e., energy preserving) feedforward path and a time-variant feedback path. The stability of the feedback structure is then analyzed via the small gain theorem, and choices for the step-size parameter in order to guarantee faster convergence are deduced by using the mean-value theorem. Simulation results are included to demonstrate the findings.
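The shape of the argument can be sketched with the generic small gain theorem: if the lossless feedforward block has l2-gain at most 1 and the time-variant feedback block has l2-gain γ(μ) that depends on the step-size parameter μ, the interconnection is l2-stable whenever

```latex
1 \cdot \gamma(\mu) \;<\; 1
```

so step-size bounds amount to choosing μ so that this inequality holds. This is the generic small-gain template only; the paper's specific gain computations are not reproduced here.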

19.
Optimal convergence of on-line backpropagation
Many researchers are quite skeptical about the actual behavior of neural network learning algorithms like backpropagation. One of the major problems is the lack of clear theoretical results on optimal convergence, particularly for pattern-mode algorithms. In this paper, we prove a companion, for feedforward networks, of Rosenblatt's PC (perceptron convergence) theorem (1960), stating that pattern-mode backpropagation converges to an optimal solution for linearly separable patterns.

20.
Unsupervised learning is an important ability of the brain and of many artificial neural networks. A large variety of unsupervised learning algorithms have been proposed. This paper takes a different approach, considering the architecture of the neural network rather than the learning algorithm. It is shown that a self-organizing neural network architecture using pre-synaptic lateral inhibition enables a single learning algorithm to find distributed, local, and topological representations, as appropriate to the structure of the input data received. It is argued that such an architecture not only has computational advantages but is a better model of cortical self-organization.
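A hedged sketch of the idea: with pre-synaptic lateral inhibition, each unit suppresses rival units' inputs on the channels it has claimed, before integration. The specific inhibition rule and Hebbian update below are illustrative assumptions, not the paper's exact equations.

```python
# Pre-synaptic lateral inhibition: competition acts on inputs, not outputs.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_units, alpha, lr = 8, 2, 1.0, 0.05
W = rng.random((n_units, n_in))
W /= W.sum(axis=1, keepdims=True)

def activate(x, n_iter=10):
    y = np.zeros(n_units)
    for _ in range(n_iter):                   # settle activations iteratively
        for i in range(n_units):
            rivals = [k for k in range(n_units) if k != i]
            # each input channel is inhibited by the strongest rival claim on it
            inhib = np.max(W[rivals] * y[rivals][:, None], axis=0)
            x_eff = x * np.clip(1 - alpha * inhib, 0, 1)
            y[i] = W[i] @ x_eff
    return y

# Two hypothetical input patterns occupying different channels.
patterns = [np.array([1, 1, 1, 1, 0, 0, 0, 0.0]),
            np.array([0, 0, 0, 0, 1, 1, 1, 1.0])]
for t in range(500):
    x = patterns[t % 2]
    y = activate(x)
    W += lr * np.outer(y, x)                  # Hebbian update
    W /= W.sum(axis=1, keepdims=True)         # weight normalization

print(np.round(W, 2))  # units tend to specialize to different patterns
```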
