Similar Documents
20 similar documents found.
1.
Approximation and Estimation Bounds for Artificial Neural Networks
Barron, Andrew R. Machine Learning, 1994, 14(1): 115-133

2.
A modification of the stochastic-approximation estimation algorithm is presented. Under assumptions about the statistical distribution of the training data, it becomes possible to estimate not only the mean value but also well-directed deviating values of the data distribution. Detailed error models can therefore be identified by means of a parameter-linear formulation of the new algorithm. By defining suitable probabilities, these parametric error models estimate soft error bounds. In this way, an experimental identification method is provided that can support robust controller design. The method was applied to an industrial robot controlled by feedback linearisation. Based on a dynamic model realised by a neural network, the presented approach is used for the robust design of the stabilising decentralised controllers.

3.
A Robust Gradient-Based Fingerprint Orientation Field Estimation Algorithm
As a global feature of fingerprints, the orientation field plays a very important role in automatic fingerprint identification systems. A robust gradient-based fingerprint orientation field estimation algorithm is proposed. The new algorithm first normalises the point gradient vectors and computes block gradient vectors together with the corresponding block coherence; it then estimates the noisy regions; finally, an iterative procedure re-estimates all block gradient vectors and converts the gradient vector field into an orientation field. Experimental results show that, compared with existing gradient-based orientation field estimation algorithms, the new algorithm achieves higher accuracy and better noise resistance, and it can estimate the orientation field within large noisy blocks reasonably well, making it a fairly robust fingerprint orientation field estimation algorithm.
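The block-orientation step described above is commonly implemented with doubled-angle averaging of pixel gradients; a minimal sketch follows (NumPy; how the per-pixel gradients gx, gy are obtained, e.g. by Sobel filtering, and the block size are illustrative assumptions, not details taken from the paper).

import numpy as np

def block_orientation(gx, gy):
    """Estimate the dominant ridge orientation and coherence of one block
    from its pixel gradients (gx, gy), using doubled-angle averaging."""
    gxx = np.sum(gx * gx)
    gyy = np.sum(gy * gy)
    gxy = np.sum(gx * gy)
    # Doubled-angle representation of the averaged gradient direction.
    theta = 0.5 * np.arctan2(2.0 * gxy, gxx - gyy)
    # Ridge orientation is perpendicular to the gradient direction.
    orientation = theta + np.pi / 2.0
    # Coherence in [0, 1]: close to 0 in noisy or isotropic blocks.
    denom = gxx + gyy
    coherence = np.sqrt((gxx - gyy) ** 2 + 4.0 * gxy ** 2) / denom if denom > 0 else 0.0
    return orientation, coherence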

4.
We provide new bounds for the worst-case approximation ratio of the classic Longest Processing Time (Lpt) heuristic for related machine scheduling (Q||C_max). For different machine speeds, Lpt was first considered by Gonzalez et al. (SIAM J. Comput. 6(1):155–166, 1977). The best previously known bounds originate from more than 20 years back: Dobson (SIAM J. Comput. 13(4):705–716, 1984) and, independently, Friesen (SIAM J. Comput. 16(3):554–560, 1987) showed that the worst-case ratio of Lpt is in the interval (1.512, 1.583) and in (1.52, 1.67), respectively. We tighten the upper bound to $1+\sqrt{3}/3\approx1.5773$ and the lower bound to 1.54. Although this improvement might seem minor, we consider the structure of potential lower-bound instances more systematically than former works. We present a scheme for a job-exchanging process which, repeated any number of times, gradually increases the lower bound. For the new upper bound, this systematic method, together with a new idea of introducing fractional jobs, facilitated a proof that is surprisingly simple relative to the result. We present the upper-bound proof in parameterized terms, which leaves room for further improvements.
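For reference, a minimal sketch of the Lpt heuristic for related machines (Q||C_max): jobs are considered in non-increasing order of processing time and each is assigned to the machine on which it would finish earliest, given the machine speeds. The job sizes and speeds in the example are illustrative only.

def lpt_related_machines(processing_times, speeds):
    """Lpt list scheduling for Q||C_max: returns the machine loads
    (completion times), the assignment, and the resulting makespan."""
    loads = [0.0] * len(speeds)           # current completion time per machine
    assignment = [[] for _ in speeds]     # jobs placed on each machine
    # Consider jobs in order of non-increasing processing time.
    for j, p in sorted(enumerate(processing_times), key=lambda jp: -jp[1]):
        # Pick the machine that would finish this job earliest.
        i = min(range(len(speeds)), key=lambda m: loads[m] + p / speeds[m])
        loads[i] += p / speeds[i]
        assignment[i].append(j)
    return loads, assignment, max(loads)

# Example: 5 jobs on 2 machines with speeds 1 and 2.
loads, schedule, makespan = lpt_related_machines([7, 5, 4, 3, 2], [1.0, 2.0])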

5.
Gradient-based approaches in motion estimation (optical flow) refer to those techniques that estimate the motion of an image sequence based on local derivatives in the image intensity. In order to best evaluate local changes, specific filters are applied to the image sequence. These filters are typically composed of a spatiotemporal pre-smoothing filter followed by discrete derivative ones. The design of these filters plays an important role in the estimation accuracy. This paper proposes a method for such a design. Unlike previous methods that consider these filters as optimized approximations of continuum derivatives, the proposed design procedure defines the optimality directly with respect to the motion estimation goal. One possible result of the suggested scheme is a set of image-dependent filters that can be computed prior to the estimation process. An alternative interpretation is the creation of generic filters, capable of treating natural images. Simulations demonstrate the validity of the new design approach. This work was done while the authors were with Hewlett-Packard Laboratories, Israel.
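As a baseline for the filter structure this paper optimises, the sketch below applies Gaussian pre-smoothing followed by central-difference derivatives along x, y and t. The specific kernels (Gaussian sigma, the [-0.5, 0, 0.5] stencil) are conventional defaults chosen for illustration, not the optimised filters of the paper.

import numpy as np
from scipy.ndimage import gaussian_filter, convolve1d

def spatiotemporal_gradients(frames, sigma=1.0):
    """Baseline gradient filters for optical flow: spatiotemporal Gaussian
    pre-smoothing followed by central-difference derivatives.
    `frames` is a (T, H, W) float array."""
    smoothed = gaussian_filter(frames, sigma=sigma)
    d = np.array([-0.5, 0.0, 0.5])        # central-difference kernel
    It = convolve1d(smoothed, d, axis=0)  # temporal derivative
    Iy = convolve1d(smoothed, d, axis=1)  # vertical spatial derivative
    Ix = convolve1d(smoothed, d, axis=2)  # horizontal spatial derivative
    return Ix, Iy, It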

6.
Improved Accuracy in Gradient-Based Optical Flow Estimation
Optical flow estimation by means of first derivatives can produce surprisingly accurate and dense optical flow fields. In particular, recent empirical evidence suggests that the method that is based on local optimization of first-order constancy constraints is among the most accurate and reliable methods available. Nevertheless, a systematic investigation of the effects of the various parameters for this algorithm is still lacking. This paper reports such an investigation. Performance is assessed in terms of flow-field accuracy, density, and resolution. The investigation yields new information regarding pre-filter, differentiator, least-squares neighborhood, and reliability test selection. Several changes to previously-employed parameter settings result in significant overall performance improvements, while they simultaneously reduce the computational cost of the estimator.
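The "local optimization of first-order constancy constraints" referred to above amounts, in its simplest form, to a per-pixel least-squares solve over a small neighbourhood. A hedged sketch is given below; the window half-width and the eigenvalue reliability threshold tau are illustrative choices, not the settings investigated in the paper.

import numpy as np

def local_flow(Ix, Iy, It, y, x, half=2, tau=1e-2):
    """Estimate the flow (u, v) at pixel (y, x) by least squares over a
    (2*half+1)^2 window of first-order brightness-constancy constraints."""
    win = np.s_[y - half:y + half + 1, x - half:x + half + 1]
    A = np.stack([Ix[win].ravel(), Iy[win].ravel()], axis=1)  # N x 2 system
    b = -It[win].ravel()
    AtA = A.T @ A
    # Reliability test: reject windows whose smallest eigenvalue is tiny.
    if np.linalg.eigvalsh(AtA)[0] < tau:
        return None
    u, v = np.linalg.solve(AtA, A.T @ b)
    return u, v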

7.
The NP-hard microaggregation problem seeks a partition of data points into groups of minimum specified size k, so as to minimize the sum of the squared Euclidean distances of every point to its group's centroid. One recent heuristic provides an O(k^3) guarantee for this objective function and an O(k^2) guarantee for a version of the problem that seeks to minimize the sum of the distances of the points to their group's centroid. This paper establishes approximation bounds for another microaggregation heuristic, providing better approximation guarantees of O(k^2) for the squared distance measure and O(k) for the distance measure.
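For concreteness, the sketch below evaluates the within-group sum of squared errors that microaggregation heuristics minimise, for a given partition into groups of size at least k. It illustrates the objective only, not any particular heuristic from the paper.

import numpy as np

def microaggregation_sse(points, groups, k):
    """Sum of squared Euclidean distances of each point to its group centroid.
    `points` is an (n, d) array; `groups` is a list of index lists, each of size >= k."""
    assert all(len(g) >= k for g in groups), "every group must contain at least k points"
    sse = 0.0
    for g in groups:
        members = points[g]
        centroid = members.mean(axis=0)
        sse += np.sum((members - centroid) ** 2)
    return sse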

8.
We analyze the performance of simple algorithms for matching two planar point sets under rigid transformations so as to minimize the directed Hausdorff distance between the sets. This is a well-studied problem in computational geometry. Goodrich, Mitchell, and Orletsky presented a very simple approximation algorithm for this problem, which computes transformations based on aligning pairs of points. They showed that their algorithm achieves an approximation ratio of 4. We introduce a modification to their algorithm, which is based on aligning midpoints rather than endpoints. This modification has the same simplicity and running time as theirs, and we show that it achieves a better approximation ratio of roughly 3.14. We also analyze the approximation ratio in terms of an instance-specific parameter that is based on the ratio of the diameter of the pattern set to the optimum Hausdorff distance. We show that as this ratio increases (as is common in practical applications) the approximation ratio approaches 3 in the limit. We also investigate the performance of the algorithm by Goodrich et al. as a function of this ratio, and present nearly matching lower bounds on the approximation ratios of both algorithms. This work was supported by the National Science Foundation under grants CCR-0098151 and CCF-0635099.
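For reference, a minimal brute-force sketch of the directed Hausdorff distance that both algorithms approximate under rigid motions (a pure definition-level illustration; the alignment search itself is not shown):

import numpy as np

def directed_hausdorff(A, B):
    """Directed Hausdorff distance h(A, B) = max over a in A of min over b in B
    of ||a - b||, for planar point sets given as (n, 2) and (m, 2) arrays."""
    # Pairwise distances; fine for small point sets, O(n*m) memory.
    diffs = A[:, None, :] - B[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    return dists.min(axis=1).max()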

9.
The Influence of Gradient Computation on the Performance of Gradient-Based Optical Flow Methods
Gradient-based methods are an important class of optical flow techniques, and the computation of the gradients has a significant influence on the performance of the whole algorithm. This paper examines the effect of several commonly used gradient operators on optical flow computation and gives a theoretical analysis. Experimental results confirm that the theoretical analysis is correct.

10.
Foster and Vovk proved relative loss bounds for linear regression where the total loss of the on-line algorithm minus the total loss of the best linear predictor (chosen in hindsight) grows logarithmically with the number of trials. We give similar bounds for temporal-difference learning. Learning takes place in a sequence of trials where the learner tries to predict discounted sums of future reinforcement signals. The quality of the predictions is measured with the square loss and we bound the total loss of the on-line algorithm minus the total loss of the best linear predictor for the whole sequence of trials. Again the difference of the losses is logarithmic in the number of trials. The bounds hold for an arbitrary (worst-case) sequence of examples. We also give a bound on the expected difference for the case when the instances are chosen from an unknown distribution. For linear regression a corresponding lower bound shows that this expected bound cannot be improved substantially.
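A minimal sketch of temporal-difference learning with a linear predictor of discounted future reinforcement, the kind of on-line learner such bounds apply to. The constant learning rate and feature dimension are illustrative choices, not the algorithm analysed in the paper.

import numpy as np

def linear_td0(trials, gamma=0.9, lr=0.05, dim=4):
    """TD(0) with a linear value estimate v(x) = w . x.
    `trials` is an iterable of (x, r, x_next) feature/reward transitions."""
    w = np.zeros(dim)
    for x, r, x_next in trials:
        prediction = w @ x
        target = r + gamma * (w @ x_next)     # bootstrapped discounted return
        w += lr * (target - prediction) * x   # gradient step on the square loss
    return w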

11.
Antos, András; Lugosi, Gábor. Machine Learning, 1998, 30(1): 31-56
Minimax lower bounds for concept learning state, for example, that for each sample size n and learning rule g_n, there exists a distribution of the observation X and a concept C to be learnt such that the expected error of g_n is at least a constant times V/n, where V is the VC dimension of the concept class. However, these bounds do not tell anything about the rate of decrease of the error for a fixed distribution-concept pair. In this paper we investigate minimax lower bounds in such a (stronger) sense. We show that for several natural k-parameter concept classes, including the class of linear halfspaces, the class of balls, the class of polyhedra with a certain number of faces, and a class of neural networks, for any sequence of learning rules {g_n}, there exists a fixed distribution of X and a fixed concept C such that the expected error is larger than a constant times k/n for infinitely many n. We also obtain such strong minimax lower bounds for the tail distribution of the probability of error, which extend the corresponding minimax lower bounds.

12.
This paper is a sequel to the author's analysis of finite-step approximations for solving controlled Markov set-chains under the infinite-horizon discounted-reward criterion. For average-reward controlled Markov set-chains with finite state and action spaces, we develop a value-iteration-type algorithm and analyze an error bound relative to the optimal average reward that satisfies an optimality equation from the successive approximation under an ergodicity condition. We further analyze an error bound of the rolling-horizon control policy defined from a finite-step approximate value obtained by applying the value-iteration-type algorithm.
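To fix ideas, here is a hedged sketch of a value-iteration-type scheme for the average-reward criterion on an ordinary finite MDP, not the Markov set-chain model of the paper; the arrays P and R and the fixed iteration count are assumptions made only for illustration.

import numpy as np

def relative_value_iteration(P, R, iters=500):
    """Relative value iteration for an average-reward finite MDP.
    P has shape (S, A, S) (transition probabilities), R has shape (S, A).
    Returns the estimated optimal gain and a bias (relative value) vector."""
    S, A, _ = P.shape
    h = np.zeros(S)
    for _ in range(iters):
        Q = R + P @ h              # Q[s, a] = R[s, a] + sum_s' P[s, a, s'] * h[s']
        h_new = Q.max(axis=1)
        gain = h_new[0]            # subtract a reference state to keep values bounded
        h = h_new - gain
    return gain, h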

13.
Learning, interaction, and their combination are key capabilities required for building robust, autonomous agents. Reinforcement learning is an important part of agent learning, and agent reinforcement learning comprises single-agent and multi-agent reinforcement learning. This paper presents a comparative study of single-agent and multi-agent reinforcement learning, contrasting them in terms of basic concepts, environment frameworks, learning objectives, and learning algorithms; it points out their differences and connections and discusses some open problems they face.

14.
Connected dominating set (CDS) in unit disk graphs has a wide range of applications in wireless ad hoc networks. A number of approximation algorithms for constructing a small CDS in unit disk graphs have been proposed in the literature. The majority of these algorithms follow a general two-phased approach. The first phase constructs a dominating set, and the second phase selects additional nodes to interconnect the nodes in the dominating set. In the performance analyses of these two-phased algorithms, the relation between the independence number α and the connected domination number γ_c of a unit disk graph plays the key role. The best-known relation between them is $\alpha\leq3\frac{2}{3}\gamma_{c}+1$. In this paper, we prove that $\alpha\leq3.4306\gamma_{c}+4.8185$. This relation leads to tighter upper bounds on the approximation ratios of two approximation algorithms proposed in the literature.
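A minimal sketch of the generic two-phased pattern described above, on a connected graph G given as a networkx Graph: a greedy maximal independent set serves as the dominating set, and shortest-path connectors then join its pieces. This is the generic scheme only, not the specific algorithms whose ratios the paper tightens.

import networkx as nx

def two_phase_cds(G):
    """Generic two-phase CDS construction on a connected graph G."""
    # Phase 1: greedy maximal independent set -> dominating set.
    mis, covered = [], set()
    for v in G.nodes():
        if v not in covered:
            mis.append(v)
            covered.add(v)
            covered.update(G.neighbors(v))
    cds = set(mis)
    # Phase 2: add connector nodes until the chosen set induces one component.
    # In a unit disk graph, nearby independent nodes are at most 3 hops apart,
    # so each merge adds at most 2 connectors.
    while True:
        comps = list(nx.connected_components(G.subgraph(cds)))
        if len(comps) <= 1:
            break
        a, b = comps[0], comps[1]
        path = min((nx.shortest_path(G, u, w) for u in a for w in b), key=len)
        cds.update(path)
    return cds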

15.
Risk-Sensitive Reinforcement Learning
Mihatsch, Oliver; Neuneier, Ralph. Machine Learning, 2002, 49(2-3): 267-290
Most reinforcement learning algorithms optimize the expected return of a Markov Decision Problem. Practice has taught us the lesson that this criterion is not always the most suitable because many applications require robust control strategies which also take into account the variance of the return. Classical control literature provides several techniques to deal with risk-sensitive optimization goals like the so-called worst-case optimality criterion exclusively focusing on risk-avoiding policies or classical risk-sensitive control, which transforms the returns by exponential utility functions. While the first approach is typically too restrictive, the latter suffers from the absence of an obvious way to design a corresponding model-free reinforcement learning algorithm. Our risk-sensitive reinforcement learning algorithm is based on a very different philosophy. Instead of transforming the return of the process, we transform the temporal differences during learning. While our approach reflects important properties of the classical exponential utility framework, we avoid its serious drawbacks for learning. Based on an extended set of optimality equations we are able to formulate risk-sensitive versions of various well-known reinforcement learning algorithms which converge with probability one under the usual conditions.
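A hedged sketch of the core idea, transforming the temporal difference rather than the return, embedded in a Q-learning-style step. The piecewise-linear weighting by a risk parameter kappa in (-1, 1) follows the paper's general scheme, but the surrounding update loop, names, and constants are illustrative.

import numpy as np

def risk_transform(delta, kappa):
    """Weight positive and negative temporal differences asymmetrically.
    kappa > 0 penalises negative surprises more (risk-averse behaviour);
    kappa = 0 recovers the ordinary, risk-neutral update."""
    return (1.0 - kappa) * delta if delta > 0 else (1.0 + kappa) * delta

def risk_sensitive_q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95, kappa=0.5):
    """One Q-learning-style step with the transformed temporal difference."""
    delta = r + gamma * np.max(Q[s_next]) - Q[s, a]
    Q[s, a] += alpha * risk_transform(delta, kappa)
    return Q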

16.
Džeroski, Sašo; De Raedt, Luc; Driessens, Kurt. Machine Learning, 2001, 43(1-2): 7-52
Relational reinforcement learning is presented, a learning technique that combines reinforcement learning with relational learning or inductive logic programming. Due to the use of a more expressive representation language to represent states, actions and Q-functions, relational reinforcement learning can be potentially applied to a new range of learning tasks. One such task that we investigate is planning in the blocks world, where it is assumed that the effects of the actions are unknown to the agent and the agent has to learn a policy. Within this simple domain we show that relational reinforcement learning solves some existing problems with reinforcement learning. In particular, relational reinforcement learning allows us to employ structural representations, to abstract from specific goals pursued and to exploit the results of previous learning phases when addressing new (more complex) situations.

17.
The key approaches for machine learning, particularly learning in unknown probabilistic environments, are new representations and computation mechanisms. In this paper, a novel quantum reinforcement learning (QRL) method is proposed by combining quantum theory and reinforcement learning (RL). Inspired by the state superposition principle and quantum parallelism, a framework of a value-updating algorithm is introduced. The state (action) in traditional RL is identified as the eigen state (eigen action) in QRL. The state (action) set can be represented with a quantum superposition state, and the eigen state (eigen action) can be obtained by randomly observing the simulated quantum state according to the collapse postulate of quantum measurement. The probability of the eigen action is determined by the probability amplitude, which is updated in parallel according to rewards. Some related characteristics of QRL such as convergence, optimality, and balancing between exploration and exploitation are also analyzed, which shows that this approach makes a good tradeoff between exploration and exploitation using the probability amplitude and can speed up learning through quantum parallelism. To evaluate the performance and practicability of QRL, several simulated experiments are given, and the results demonstrate the effectiveness and superiority of the QRL algorithm for some complex problems. This paper is also an effective exploration of the application of quantum computation to artificial intelligence.
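A hedged, classical simulation of the two steps described above: drawing an eigen action by "measuring" an amplitude vector, and reinforcing the amplitude of the chosen action. The reward-weighted renormalisation below is an illustrative stand-in for the paper's amplitude-amplification update, and the learning rate eta is an assumed constant.

import numpy as np

rng = np.random.default_rng(0)

def observe_action(amplitudes):
    """'Measure' the action superposition: eigen actions are drawn with
    probability equal to the squared amplitude (collapse postulate)."""
    probs = np.abs(amplitudes) ** 2
    return rng.choice(len(amplitudes), p=probs / probs.sum())

def reinforce_amplitude(amplitudes, action, reward, eta=0.1):
    """Grow the chosen action's amplitude with the reward, then renormalise
    so the vector remains a valid (unit-norm) superposition."""
    amplitudes = amplitudes.copy()
    amplitudes[action] += eta * reward
    return amplitudes / np.linalg.norm(amplitudes)

# Uniform superposition over 4 eigen actions.
amps = np.ones(4) / 2.0
a = observe_action(amps)
amps = reinforce_amplitude(amps, a, reward=1.0)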

18.
Reinforcement Learning (RL) is learning through direct experimentation. It does not assume the existence of a teacher that provides examples upon which learning of a task takes place. Instead, in RL experience is the only teacher. With historical roots in the study of biological conditioned reflexes, RL attracts the interest of engineers and computer scientists because of its theoretical relevance and potential applications in fields as diverse as Operational Research and Intelligent Robotics. Computationally, RL is intended to operate in a learning environment composed of two subjects: the learner and a dynamic process. At successive time steps, the learner makes an observation of the process state, selects an action and applies it back to the process. Its goal is to find an action policy that controls the behavior of the dynamic process, guided by signals (reinforcements) that indicate how badly or well it has been performing the required task. These signals are usually associated with a dramatic condition, e.g., accomplishment of a subtask (reward) or complete failure (punishment), and the learner tries to optimize its behavior by using a performance measure (a function of the received reinforcements). The crucial point is that in order to do that, the learner must evaluate the conditions (associations between observed states and chosen actions) that led to rewards or punishments. Starting from basic concepts, this tutorial presents the many flavors of RL algorithms, develops the corresponding mathematical tools, assesses their practical limitations and discusses alternatives that have been proposed for applying RL to realistic tasks.
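The observe-act-reinforce loop described above is most often illustrated with tabular Q-learning; a minimal sketch follows. The small environment interface (reset() returning a state index, step(a) returning (state, reward, done)) is a hypothetical convention named here only for illustration.

import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.95, epsilon=0.1, rng=np.random.default_rng(0)):
    """Tabular Q-learning over the observe -> act -> reinforce loop."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection.
            a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)
            # Update towards the bootstrapped target built from the reward signal.
            Q[s, a] += alpha * (r + gamma * (0 if done else np.max(Q[s_next])) - Q[s, a])
            s = s_next
    return Q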

19.
Kernel-Based Reinforcement Learning
Ormoneit, Dirk; Sen, Śaunak. Machine Learning, 2002, 49(2-3): 161-178
We present a kernel-based approach to reinforcement learning that overcomes the stability problems of temporal-difference learning in continuous state-spaces. First, our algorithm converges to a unique solution of an approximate Bellman's equation regardless of its initialization values. Second, the method is consistent in the sense that the resulting policy converges asymptotically to the optimal policy. Parametric value function estimates such as neural networks do not possess this property. Our kernel-based approach also allows us to show that the limiting distribution of the value function estimate is a Gaussian process. This information is useful in studying the bias-variance tradeoff in reinforcement learning. We find that all reinforcement learning approaches to estimating the value function, parametric or non-parametric, are subject to a bias. This bias is typically larger in reinforcement learning than in a comparable regression problem.
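A hedged sketch of the kernel-averaging idea: the value at a query state is backed up from sampled transitions, weighted by a smoothing kernel. The Gaussian kernel, its bandwidth, and the array layout are illustrative assumptions, not the exact estimator of the paper.

import numpy as np

def kernel_weights(query, states, bandwidth=0.5):
    """Normalised Gaussian kernel weights of sampled states around a query state."""
    d2 = np.sum((states - query) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    return w / w.sum()

def kernel_backup(query, states, rewards, next_values, gamma=0.95, bandwidth=0.5):
    """Approximate Bellman backup at `query` for one action: a kernel-weighted
    average of the sampled one-step returns r_i + gamma * V(s_i')."""
    w = kernel_weights(query, states, bandwidth)
    return float(w @ (rewards + gamma * next_values))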

20.
Grids organize resource sharing, a fundamental requirement of large scientific collaborations. Seamless integration of Grids into everyday use requires responsiveness, which can be provided by elastic Clouds, in the Infrastructure as a Service (IaaS) paradigm. This paper proposes a model-free resource provisioning strategy supporting both requirements. Provisioning is modeled as a continuous action-state space, multi-objective reinforcement learning (RL) problem, under realistic hypotheses; simple utility functions capture the high-level goals of users, administrators, and shareholders. The model-free approach falls under the general program of autonomic computing, where the incremental learning of the value function associated with the RL model provides the so-called feedback loop. The RL model includes an approximation of the value function through an Echo State Network. Experimental validation on a real data-set from the EGEE Grid shows that introducing a moderate level of elasticity is critical to ensure a high level of user satisfaction.
