Similar Documents
20 similar documents retrieved.
1.
Approximation and Estimation Bounds for Artificial Neural Networks
Barron, Andrew R. Machine Learning, 1994, 14(1): 115-133

2.
To address the high computational complexity and slow convergence of online approximate policy iteration reinforcement learning, a CMAC structure is introduced as the value-function approximator, and a CMAC-based non-parametric approximate policy iteration reinforcement learning algorithm (NPAPI-CMAC) is proposed. The algorithm determines the CMAC generalization parameter by constructing a sample-collection process, determines the CMAC state partitioning through an initial partition and an expanded partition, and defines the reinforcement learning rate from the sample-count sets built by the quantized coding structure, so that the learning structure and parameters are constructed fully automatically. In addition, the algorithm adaptively adjusts the learning parameters during training using the delta rule and a nearest-neighbor idea, and applies a greedy policy to select among the outcomes of the action voter. Simulation results on balance control of a single inverted pendulum verify the algorithm's effectiveness, robustness, and fast convergence.
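Since the abstract centers on a CMAC (tile-coding) structure as the value-function approximator updated with a delta rule, a minimal sketch may help make that building block concrete. The tiling count, state bounds, and learning rate below are illustrative assumptions, not the parameters that NPAPI-CMAC derives automatically.

```python
import numpy as np

class CMAC:
    """Minimal CMAC (tile-coding) function approximator, a sketch of the kind
    of value-function approximator the abstract builds on."""

    def __init__(self, n_tilings=8, bins_per_dim=10,
                 state_low=(-1.0, -1.0), state_high=(1.0, 1.0), alpha=0.1):
        self.n_tilings = n_tilings
        self.bins = bins_per_dim
        self.low = np.asarray(state_low, dtype=float)
        self.high = np.asarray(state_high, dtype=float)
        self.alpha = alpha / n_tilings          # learning rate shared across tilings
        dims = len(state_low)
        self.weights = np.zeros((n_tilings,) + (bins_per_dim,) * dims)
        # Each tiling is shifted by a fraction of one bin width.
        self.offsets = [(self.high - self.low) / bins_per_dim * i / n_tilings
                        for i in range(n_tilings)]

    def _active_tiles(self, state):
        s = np.asarray(state, dtype=float)
        for t in range(self.n_tilings):
            idx = np.floor((s - self.low + self.offsets[t])
                           / (self.high - self.low) * self.bins)
            idx = np.clip(idx, 0, self.bins - 1).astype(int)
            yield (t,) + tuple(idx)

    def value(self, state):
        return sum(self.weights[tile] for tile in self._active_tiles(state))

    def update(self, state, target):
        """Delta-rule update of the active tiles toward a target value."""
        error = target - self.value(state)
        for tile in self._active_tiles(state):
            self.weights[tile] += self.alpha * error


if __name__ == "__main__":
    cmac = CMAC()
    for _ in range(2000):                       # fit a toy target function
        s = np.random.uniform(-1, 1, size=2)
        cmac.update(s, np.sin(3 * s[0]) * np.cos(3 * s[1]))
    print(round(cmac.value((0.3, -0.2)), 3))
```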

3.
A modification of the stochastic approximation estimation algorithm is presented. Under assumptions about the statistical distribution of the training data, it becomes possible to estimate not only the mean value but also well-directed deviating values of the data distribution. Thus, detailed error models can be identified by means of a parameter-linear formulation of the new algorithm. By defining suitable probabilities, these parametric error models estimate soft error bounds. In this way, an experimental identification method is provided that can support robust controller design. The method was applied to an industrial robot controlled by feedback linearisation. Based on a dynamic model realised by a neural network, the presented approach is utilised for the robust design of the stabilising decentralised controllers.
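One way to read the idea of estimating both the mean and a "well-directed deviating value" is a stochastic-approximation recursion that tracks a running mean alongside an upper quantile of the data. The sketch below is such an illustration only; the pinball-style quantile update and the step-size schedule are assumptions, not the paper's parameter-linear formulation.

```python
import numpy as np

def sa_mean_and_quantile(samples, tau=0.95, floor=0.01):
    """Stochastic-approximation sketch: a running mean plus an asymmetric
    update that settles near the tau-quantile (a "soft upper bound")."""
    mean_est, upper_est = 0.0, 0.0
    for n, y in enumerate(samples, start=1):
        step = max(floor, 1.0 / n)               # Robbins-Monro step with a small floor
        mean_est += step * (y - mean_est)        # running estimate of the mean
        # Drifts up on large samples, down on small ones, so the estimate
        # converges toward the tau-quantile of the data distribution.
        upper_est += step * (tau - (1.0 if y <= upper_est else 0.0))
    return mean_est, upper_est


if __name__ == "__main__":
    data = np.random.default_rng(0).normal(loc=2.0, scale=1.0, size=20000)
    mean_est, upper_est = sa_mean_and_quantile(data, tau=0.95)
    print(round(mean_est, 2), round(upper_est, 2))
```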

4.
We provide new bounds for the worst case approximation ratio of the classic Longest Processing Time (Lpt) heuristic for related machine scheduling (Q||C_max). For different machine speeds, Lpt was first considered by Gonzalez et al. (SIAM J. Comput. 6(1):155-166, 1977). The best previously known bounds originate from more than 20 years back: Dobson (SIAM J. Comput. 13(4):705-716, 1984) and independently Friesen (SIAM J. Comput. 16(3):554-560, 1987) showed that the worst case ratio of Lpt is in the interval (1.512, 1.583), and in (1.52, 1.67), respectively. We tighten the upper bound to $1+\sqrt{3}/3\approx 1.5773$ and the lower bound to 1.54. Although this improvement might seem minor, we consider the structure of potential lower bound instances more systematically than former works. We present a scheme for a job-exchanging process which, repeated any number of times, gradually increases the lower bound. For the new upper bound, this systematic method, together with a new idea of introducing fractional jobs, facilitated a proof that is surprisingly simple relative to the result. We present the upper-bound proof in parameterized terms, which leaves room for further improvements.
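For readers who want the heuristic itself in concrete form, the following is a small sketch of Lpt on related machines (my own rendering, not code from the paper): sort the jobs by processing requirement in decreasing order and place each job on the machine where it would finish earliest given the machine speeds.

```python
def lpt_related_machines(jobs, speeds):
    """Longest Processing Time heuristic for Q||C_max (related machines)."""
    loads = [0.0] * len(speeds)          # total processing requirement per machine
    schedule = [[] for _ in speeds]
    for p in sorted(jobs, reverse=True):
        # Finishing time of job p on machine i would be (loads[i] + p) / speeds[i].
        m = min(range(len(speeds)), key=lambda i: (loads[i] + p) / speeds[i])
        loads[m] += p
        schedule[m].append(p)
    makespan = max(l / s for l, s in zip(loads, speeds))
    return schedule, makespan


if __name__ == "__main__":
    sched, cmax = lpt_related_machines(jobs=[7, 5, 4, 3, 3, 2], speeds=[2.0, 1.0])
    print(sched, round(cmax, 3))
```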

5.
Gradient based approaches in motion estimation (Optical-Flow) refer to those techniques that estimate the motion of an image sequence based on local derivatives in the image intensity. In order to best evaluate local changes, specific filters are applied to the image sequence. These filters are typically composed of a spatiotemporal pre-smoothing filter followed by discrete derivative ones. The design of these filters plays an important role in the estimation accuracy. This paper proposes a method for such a design. Unlike previous methods that consider these filters as optimized approximations for continuum derivatives, the proposed design procedure defines the optimality directly with respect to the motion estimation goal. One possible result of the suggested scheme is a set of image dependent filters that can be computed prior to the estimation process. An alternative interpretation is the creation of generic filters, capable of treating natural images. Simulations demonstrate the validity of the new design approach. This work was done while the authors were with Hewlett-Packard laboratories, Israel.

6.
A Robust Gradient-Based Algorithm for Fingerprint Orientation Field Estimation
As a global feature of fingerprints, the orientation field plays a very important role in automatic fingerprint identification systems. A robust gradient-based fingerprint orientation field estimation algorithm is proposed. The new algorithm first normalizes the point gradient vectors and computes the block gradient vectors together with the corresponding block coherence; it then estimates the noisy regions; finally, it re-estimates all block gradient vectors with an iterative method and converts the gradient vector field into an orientation field. Experimental results show that, compared with existing gradient-based orientation field estimation algorithms, the new algorithm achieves higher accuracy and better noise resistance, and it can estimate the orientation field inside large noisy regions reasonably well, making it a fairly robust fingerprint orientation field estimation algorithm.
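To make the gradient-based building blocks concrete, here is a minimal sketch of block orientation and coherence estimation from point gradients. The block size is an arbitrary assumption, and the noise-region estimation and iterative re-estimation steps of the proposed algorithm are omitted.

```python
import numpy as np

def block_orientation_field(img, block=16):
    """Gradient-based block orientation and coherence (simplified sketch)."""
    gy, gx = np.gradient(img.astype(float))
    h, w = img.shape
    orientations = np.zeros((h // block, w // block))
    coherence = np.zeros_like(orientations)
    for bi in range(h // block):
        for bj in range(w // block):
            sl = (slice(bi * block, (bi + 1) * block),
                  slice(bj * block, (bj + 1) * block))
            gxx = np.sum(gx[sl] ** 2)
            gyy = np.sum(gy[sl] ** 2)
            gxy = np.sum(gx[sl] * gy[sl])
            # Ridge orientation is perpendicular to the averaged gradient direction.
            orientations[bi, bj] = 0.5 * np.arctan2(2 * gxy, gxx - gyy) + np.pi / 2
            denom = gxx + gyy
            coherence[bi, bj] = (np.sqrt((gxx - gyy) ** 2 + 4 * gxy ** 2) / denom
                                 if denom > 1e-12 else 0.0)
    return orientations, coherence


if __name__ == "__main__":
    img = np.random.default_rng(0).random((64, 64))
    ori, coh = block_orientation_field(img)
    print(ori.shape, float(coh.mean()))
```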

7.
The NP-hard microaggregation problem seeks a partition of data points into groups of minimum specified size k, so as to minimize the sum of the squared Euclidean distances of every point to its group's centroid. One recent heuristic provides an O(k^3) guarantee for this objective function and an O(k^2) guarantee for a version of the problem that seeks to minimize the sum of the distances of the points to their group's centroid. This paper establishes approximation bounds for another microaggregation heuristic, providing better approximation guarantees of O(k^2) for the squared distance measure and O(k) for the distance measure.
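To make the objective concrete, here is a naive fixed-size grouping sketch together with the sum-of-squared-distances objective; this simple rule is only an illustration and is not one of the heuristics whose bounds the paper analyzes.

```python
import numpy as np

def simple_microaggregation(points, k=3):
    """Naive microaggregation sketch: repeatedly group the point farthest
    from the overall centroid with its k-1 nearest unassigned neighbours."""
    pts = np.asarray(points, dtype=float)
    remaining = list(range(len(pts)))
    groups = []
    while len(remaining) >= 2 * k:
        centroid = pts[remaining].mean(axis=0)
        far = max(remaining, key=lambda i: np.linalg.norm(pts[i] - centroid))
        near = sorted(remaining, key=lambda i: np.linalg.norm(pts[i] - pts[far]))[:k]
        groups.append(near)
        remaining = [i for i in remaining if i not in near]
    groups.append(remaining)                      # leftover points form the last group
    # Objective: sum of squared distances of every point to its group's centroid.
    sse = sum(np.sum((pts[g] - pts[g].mean(axis=0)) ** 2) for g in groups)
    return groups, sse


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 8), (8, 9), (5, 5), (5, 6)]
    groups, sse = simple_microaggregation(pts, k=3)
    print(groups, round(sse, 2))
```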

8.
Improved Accuracy in Gradient-Based Optical Flow Estimation
Optical flow estimation by means of first derivatives can produce surprisingly accurate and dense optical flow fields. In particular, recent empirical evidence suggests that the method that is based on local optimization of first-order constancy constraints is among the most accurate and reliable methods available. Nevertheless, a systematic investigation of the effects of the various parameters for this algorithm is still lacking. This paper reports such an investigation. Performance is assessed in terms of flow-field accuracy, density, and resolution. The investigation yields new information regarding pre-filter, differentiator, least-squares neighborhood, and reliability test selection. Several changes to previously-employed parameter settings result in significant overall performance improvements, while they simultaneously reduce the computational cost of the estimator.
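As a concrete reference point for the local least-squares estimator the abstract evaluates, here is a minimal single-window sketch. The derivative filters, the window size, and the absence of pre-smoothing and reliability tests are simplifying assumptions rather than the parameter choices the paper investigates.

```python
import numpy as np

def lucas_kanade_flow(frame1, frame2, x, y, win=7):
    """First-order local least-squares flow estimate (u, v) at pixel (x, y)."""
    Ix = np.gradient(frame1.astype(float), axis=1)
    Iy = np.gradient(frame1.astype(float), axis=0)
    It = frame2.astype(float) - frame1.astype(float)
    r = win // 2
    sl = (slice(y - r, y + r + 1), slice(x - r, x + r + 1))
    A = np.stack([Ix[sl].ravel(), Iy[sl].ravel()], axis=1)
    b = -It[sl].ravel()
    # Solve the over-determined first-order constancy constraints in the
    # least-squares sense: [Ix Iy] [u v]^T = -It over the local window.
    flow, *_ = np.linalg.lstsq(A, b, rcond=None)
    return flow


if __name__ == "__main__":
    yy, xx = np.mgrid[0:32, 0:32]
    base = np.sin(xx / 3.0) * np.cos(yy / 4.0)    # smooth synthetic pattern
    shifted = np.roll(base, shift=1, axis=1)      # pattern moves one pixel to the right
    print(lucas_kanade_flow(base, shifted, x=16, y=16).round(2))
```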

9.
To address the difficulty of link maintenance in flying ad hoc network routing protocols under high-speed mobility, an adaptive link-state routing optimization algorithm based on reinforcement learning, QLA-OLSR, is proposed. Drawing on the Q-learning algorithm, it senses changes in the number of neighbor nodes and the traffic load in a dynamic environment and builds a value function to solve for the optimal HELLO interval, improving the nodes' ability to discover and maintain links. A state-similarity mechanism based on an optimized Kanerva coding algorithm reduces the complexity of QLA-OLSR and improves its stability. Simulation results show that QLA-OLSR effectively increases network throughput, reduces routing maintenance overhead, and, owing to its self-learning property, is well suited to flying ad hoc networks in highly dynamic environments.
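Since the abstract builds on Q-learning over sensed neighbour-change and load levels to pick a HELLO interval, here is a minimal tabular sketch of that loop. The state discretization, the candidate intervals, and the toy reward are illustrative assumptions, and the Kanerva-coding state-similarity mechanism is omitted.

```python
import random
from collections import defaultdict

class HelloIntervalQLearner:
    """Tabular Q-learning sketch: state = coarse (neighbour-change, load)
    level, action = HELLO interval to use for the next period."""

    ACTIONS = [0.5, 1.0, 2.0]        # candidate HELLO intervals in seconds (assumed)

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, state):
        if random.random() < self.epsilon:          # epsilon-greedy exploration
            return random.choice(self.ACTIONS)
        return max(self.ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        best_next = max(self.q[(next_state, a)] for a in self.ACTIONS)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])


if __name__ == "__main__":
    learner = HelloIntervalQLearner()
    state = ("low_change", "low_load")               # hypothetical single-state toy environment
    for _ in range(1000):
        interval = learner.choose(state)
        reward = 1.0 / interval                      # toy reward favouring short intervals
        learner.update(state, interval, reward, state)
    print(max(learner.ACTIONS, key=lambda a: learner.q[(state, a)]))
```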

10.
We analyze the performance of simple algorithms for matching two planar point sets under rigid transformations so as to minimize the directed Hausdorff distance between the sets. This is a well studied problem in computational geometry. Goodrich, Mitchell, and Orletsky presented a very simple approximation algorithm for this problem, which computes transformations based on aligning pairs of points. They showed that their algorithm achieves an approximation ratio of 4. We introduce a modification to their algorithm, which is based on aligning midpoints rather than endpoints. This modification has the same simplicity and running time as theirs, and we show that it achieves a better approximation ratio of roughly 3.14. We also analyze the approximation ratio in terms of an instance-specific parameter that is based on the ratio of the diameter of the pattern set to the optimum Hausdorff distance. We show that as this ratio increases (as is common in practical applications) the approximation ratio approaches 3 in the limit. We also investigate the performance of the algorithm by Goodrich et al. as a function of this ratio, and present nearly matching lower bounds on the approximation ratios of both algorithms. This work was supported by the National Science Foundation under grants CCR-0098151 and CCF-0635099.
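The following sketch shows the two ingredients the analysis revolves around: the directed Hausdorff distance and the rigid transform obtained by aligning a pair of points from each set. Which pairs to try, and the midpoint variant the paper introduces, are left out; the point values in the demo are arbitrary.

```python
import numpy as np

def directed_hausdorff(A, B):
    """max over a in A of min over b in B of ||a - b||."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return d.min(axis=1).max()

def align_pair(p1, p2, q1, q2):
    """Rigid transform (rotation R, translation t) that maps p1 onto q1 and
    rotates the direction p1->p2 onto the direction q1->q2 (endpoint
    alignment; the midpoint variant would translate the midpoints instead)."""
    ang = np.arctan2(*(q2 - q1)[::-1]) - np.arctan2(*(p2 - p1)[::-1])
    c, s = np.cos(ang), np.sin(ang)
    R = np.array([[c, -s], [s, c]])
    t = q1 - R @ p1
    return R, t


if __name__ == "__main__":
    P = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 0.8]])          # pattern set
    Q = np.array([[2.0, 1.0], [2.0, 2.0], [1.2, 1.5]])          # background set
    R, t = align_pair(P[0], P[1], Q[0], Q[1])
    print(round(directed_hausdorff(P @ R.T + t, Q), 3))
```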

11.
This paper is a sequel to the author's analysis of finite-step approximations for solving controlled Markov set-chains with infinite-horizon discounted reward. For average-reward-controlled Markov set-chains with finite state and action spaces, we develop a value-iteration-type algorithm and, under an ergodicity condition, analyze an error bound from the successive approximation relative to the optimal average reward, which satisfies an optimality equation. We further analyze an error bound for the rolling-horizon control policy defined from a finite-step approximate value obtained by applying the value-iteration-type algorithm.
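For orientation, the following is a sketch of ordinary relative value iteration for an average-reward MDP; the set-chain (transition-interval) structure and the error-bound analysis that are the actual subject of the paper are not reproduced, and the toy transition matrices in the demo are made up.

```python
import numpy as np

def relative_value_iteration(P, R, iters=500):
    """Relative value iteration for an average-reward MDP.

    P[a] is the transition matrix under action a, R[a][s] the reward of
    taking action a in state s. Returns the approximate optimal average
    reward (gain) and the relative value (bias) vector.
    """
    n_actions, n_states = len(P), P[0].shape[0]
    h = np.zeros(n_states)
    for _ in range(iters):
        q = np.array([R[a] + P[a] @ h for a in range(n_actions)])
        h_new = q.max(axis=0)
        gain = h_new[0]                           # subtract a reference state's value
        h = h_new - gain
    return gain, h


if __name__ == "__main__":
    P = [np.array([[0.9, 0.1], [0.2, 0.8]]), np.array([[0.5, 0.5], [0.6, 0.4]])]
    R = [np.array([1.0, 0.0]), np.array([0.8, 0.5])]
    gain, bias = relative_value_iteration(P, R)
    print(round(gain, 3), bias.round(3))
```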

12.
The Effect of Gradient Computation on the Performance of Gradient-Based Optical Flow Methods
Gradient-based methods are an important class of optical flow techniques, and the way gradients are computed has a significant impact on the performance of the whole algorithm. This paper examines the effect of several commonly used gradient operators on optical flow computation and provides a theoretical analysis. Experimental results confirm that the theoretical analysis is correct.
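As a point of reference for the kinds of operators compared, here are a few common 1-D derivative and smoothing kernels and a helper that applies them to an image; the abstract does not say which operators the paper examines, so these are only representative examples.

```python
import numpy as np

# Representative kernels, ordered from sample x-2 ... x+2 where applicable.
central_diff = np.array([-0.5, 0.0, 0.5])                  # 2nd-order central difference
five_point = np.array([1.0, -8.0, 0.0, 8.0, -1.0]) / 12.0   # 4th-order central difference
sobel_smooth = np.array([1.0, 2.0, 1.0]) / 4.0              # cross-direction smoothing (Sobel)

def x_derivative(image, deriv=central_diff, smooth=None):
    """Apply a 1-D derivative kernel along rows, optionally smoothing columns first."""
    img = image.astype(float)
    if smooth is not None:
        img = np.apply_along_axis(
            lambda c: np.convolve(c, smooth[::-1], mode="same"), 0, img)
    return np.apply_along_axis(
        lambda r: np.convolve(r, deriv[::-1], mode="same"), 1, img)


if __name__ == "__main__":
    yy, xx = np.mgrid[0:16, 0:16]
    img = xx ** 2.0                                 # true d/dx = 2x
    print(x_derivative(img, five_point)[8, 5:8])    # interior values approximate [10, 12, 14]
```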

13.
Foster and Vovk proved relative loss bounds for linear regression where the total loss of the on-line algorithm minus the total loss of the best linear predictor (chosen in hindsight) grows logarithmically with the number of trials. We give similar bounds for temporal-difference learning. Learning takes place in a sequence of trials where the learner tries to predict discounted sums of future reinforcement signals. The quality of the predictions is measured with the square loss and we bound the total loss of the on-line algorithm minus the total loss of the best linear predictor for the whole sequence of trials. Again the difference of the losses is logarithmic in the number of trials. The bounds hold for an arbitrary (worst-case) sequence of examples. We also give a bound on the expected difference for the case when the instances are chosen from an unknown distribution. For linear regression a corresponding lower bound shows that this expected bound cannot be improved substantially.
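To ground the setting, here is a minimal sketch of on-line temporal-difference learning with a linear predictor of discounted future reinforcement; the feature map, step size, and toy chain in the demo are assumptions for illustration, and the relative-loss analysis itself is not reproduced.

```python
import numpy as np

def td0_linear(transitions, phi, dim, alpha=0.05, gamma=0.9):
    """On-line TD(0) with a linear predictor w . phi(s) of the discounted
    sum of future reinforcement signals, scored by the square loss."""
    w = np.zeros(dim)
    for s, r, s_next in transitions:            # a stream of (state, reward, next state)
        v = w @ phi(s)
        v_next = 0.0 if s_next is None else w @ phi(s_next)
        td_error = r + gamma * v_next - v       # error of the discounted-return prediction
        w += alpha * td_error * phi(s)
    return w


if __name__ == "__main__":
    # Two-state toy chain: state 0 gives reward 1 and moves to state 1,
    # which gives reward 0 and moves back (made-up example).
    phi = lambda s: np.eye(2)[s]
    stream = [(0, 1.0, 1), (1, 0.0, 0)] * 2000
    print(td0_linear(stream, phi, dim=2).round(2))
```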

14.
Antos, András; Lugosi, Gábor. Machine Learning, 1998, 30(1): 31-56
Minimax lower bounds for concept learning state, for example, that for each sample size n and learning rule g_n, there exists a distribution of the observation X and a concept C to be learnt such that the expected error of g_n is at least a constant times V/n, where V is the VC dimension of the concept class. However, these bounds do not tell us anything about the rate of decrease of the error for a fixed distribution-concept pair. In this paper we investigate minimax lower bounds in such a (stronger) sense. We show that for several natural k-parameter concept classes, including the class of linear halfspaces, the class of balls, the class of polyhedra with a certain number of faces, and a class of neural networks, for any sequence of learning rules {g_n}, there exists a fixed distribution of X and a fixed concept C such that the expected error is larger than a constant times k/n for infinitely many n. We also obtain such strong minimax lower bounds for the tail distribution of the probability of error, which extend the corresponding minimax lower bounds.

15.
Learning, interaction, and their combination are essential capabilities for building robust, autonomous agents. Reinforcement learning is an important part of agent learning and comes in two forms: single-agent reinforcement learning and multi-agent reinforcement learning. This paper presents a comparative study of the two, contrasting them in terms of basic concepts, environment frameworks, learning goals, and learning algorithms; it points out their differences and connections and discusses some open problems they face.

16.
Connected dominating set (CDS) in unit disk graphs has a wide range of applications in wireless ad hoc networks. A number of approximation algorithms for constructing a small CDS in unit disk graphs have been proposed in the literature. The majority of these algorithms follow a general two-phased approach: the first phase constructs a dominating set, and the second phase selects additional nodes to interconnect the nodes in the dominating set. In the performance analyses of these two-phased algorithms, the relation between the independence number α and the connected domination number γ_c of a unit disk graph plays the key role. The best-known relation between them is α ≤ (11/3)γ_c + 1. In this paper, we prove that α ≤ 3.4306γ_c + 4.8185. This relation leads to tighter upper bounds on the approximation ratios of two approximation algorithms proposed in the literature.
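A small sketch of the general two-phased approach may help: phase 1 greedily builds a maximal independent set, which dominates a unit-disk graph, and phase 2 adds connector nodes until the chosen nodes are connected. The connector rule below is deliberately naive and assumes a connected unit-disk graph; the algorithms whose ratios the paper tightens choose connectors far more carefully.

```python
import itertools
import math
from collections import deque

def two_phase_cds(points, radius=1.0):
    """Two-phased connected-dominating-set heuristic on a unit-disk graph
    (simplified sketch, assuming the graph is connected)."""
    n = len(points)
    adj = {i: set() for i in range(n)}
    for i, j in itertools.combinations(range(n), 2):
        if math.dist(points[i], points[j]) <= radius:
            adj[i].add(j)
            adj[j].add(i)

    # Phase 1: a maximal independent set is also a dominating set.
    mis, covered = [], set()
    for v in range(n):
        if v not in covered:
            mis.append(v)
            covered.add(v)
            covered |= adj[v]

    # Phase 2: connect the MIS nodes by adding the interior nodes of
    # BFS shortest paths from the growing CDS to each remaining MIS node.
    cds = {mis[0]}
    for target in mis[1:]:
        parent = {v: v for v in cds}
        queue = deque(cds)
        while queue:
            u = queue.popleft()
            if u == target:
                break
            for w in adj[u]:
                if w not in parent:
                    parent[w] = u
                    queue.append(w)
        v = target
        while v not in cds:                      # walk the path back into the CDS
            cds.add(v)
            v = parent[v]
    return cds


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.8, 0.0), (1.6, 0.0), (2.4, 0.0), (0.4, 0.7), (2.0, 0.7)]
    print(sorted(two_phase_cds(pts, radius=1.0)))
```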

17.
In this note we refine strategies of the so-called dual-weighted-residual (DWR) approach to a posteriori error control for FE schemes. We derive rigorous error bounds; in particular, we control the approximation of the (unknown) dual solution that enters the proposed estimate.

18.
Developable surfaces have many desired properties in the manufacturing process. Since most existing CAD systems utilize tensor-product parametric surfaces, including B-splines, as design primitives, there is a great demand in industry to convert a general free-form parametric surface into developable patches within a prescribed global error bound. In this paper, we propose a practical and efficient solution to approximate a rectangular parametric surface with a small set of C^0-joint developable strips. The key contribution of the proposed algorithm is that several optimization problems are elegantly solved in a sequence that offers a controllable global error bound on the developable surface approximation. Experimental results are presented to demonstrate the effectiveness and stability of the proposed algorithm.

19.
The rapid growth in the number of devices and their connectivity has enlarged the attack surface and made cyber systems more vulnerable. As attackers become increasingly sophisticated and resourceful, mere reliance on traditional cyber protection, such as intrusion detection, firewalls, and encryption, is insufficient to secure cyber systems. Cyber resilience provides a new security paradigm that complements inadequate protection with resilience mechanisms. A Cyber-Resilient Mechanism (CRM) adapts to known or zero-day threats and uncertainties in real time and strategically responds to them to maintain the critical functions of the cyber systems in the event of successful attacks. Feedback architectures play a pivotal role in enabling the online sensing, reasoning, and actuation process of the CRM. Reinforcement Learning (RL) is an important family of algorithms that epitomize these feedback architectures for cyber resilience. It allows the CRM to provide dynamic and sequential responses to attacks with limited or no prior knowledge of the environment and the attacker. In this work, we review the literature on RL for cyber resilience and discuss cyber-resilient defenses against three major types of vulnerabilities, i.e., posture-related, information-related, and human-related vulnerabilities. We introduce moving target defense, defensive cyber deception, and assistive human security technologies as three application domains of CRMs to elaborate on their designs. The RL algorithms also have vulnerabilities themselves. We explain the major vulnerabilities of RL and present several attack models in which the attacker targets the information exchanged between the environment and the agent: the rewards, the state observations, and the action commands. We show that the attacker can trick the RL agent into learning a nefarious policy with minimal attacking effort. The paper introduces several defense methods to secure RL-enabled systems from these attacks. However, there is still a lack of work focusing on defensive mechanisms for RL-enabled systems. Last but not least, we discuss the future challenges of RL for cyber security and resilience and emerging applications of RL-based CRMs.

20.
The key approaches for machine learning, particularly learning in unknown probabilistic environments, are new representations and computation mechanisms. In this paper, a novel quantum reinforcement learning (QRL) method is proposed by combining quantum theory and reinforcement learning (RL). Inspired by the state superposition principle and quantum parallelism, a framework of a value-updating algorithm is introduced. The state (action) in traditional RL is identified as the eigen state (eigen action) in QRL. The state (action) set can be represented with a quantum superposition state, and the eigen state (eigen action) can be obtained by randomly observing the simulated quantum state according to the collapse postulate of quantum measurement. The probability of the eigen action is determined by the probability amplitude, which is updated in parallel according to rewards. Some related characteristics of QRL such as convergence, optimality, and balancing between exploration and exploitation are also analyzed, which shows that this approach makes a good tradeoff between exploration and exploitation using the probability amplitude and can speed up learning through quantum parallelism. To evaluate the performance and practicability of QRL, several simulated experiments are given, and the results demonstrate the effectiveness and superiority of the QRL algorithm for some complex problems. This paper is also an effective exploration of the application of quantum computation to artificial intelligence.
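As a purely classical illustration of the action-selection idea (eigen actions held in a superposition, selected with probability given by the squared amplitude, and reinforced according to reward), here is a small sketch; the additive update and its gain are assumptions for illustration and stand in for the amplitude-amplification scheme the paper actually analyzes.

```python
import numpy as np

class AmplitudeActionSelector:
    """Classical simulation sketch of probability-amplitude action selection."""

    def __init__(self, n_actions, k=0.05):
        self.amp = np.full(n_actions, 1.0 / np.sqrt(n_actions))  # uniform superposition
        self.k = k

    def measure(self):
        probs = self.amp ** 2                     # collapse postulate: P(a) = |amplitude|^2
        return np.random.choice(len(self.amp), p=probs / probs.sum())

    def reinforce(self, action, reward):
        self.amp[action] += self.k * reward       # strengthen the measured eigen action (assumed rule)
        self.amp = np.clip(self.amp, 1e-6, None)
        self.amp /= np.linalg.norm(self.amp)      # renormalize so probabilities sum to 1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    selector = AmplitudeActionSelector(n_actions=2)
    for _ in range(500):                          # toy two-armed bandit
        a = selector.measure()
        success_prob = 0.8 if a == 1 else 0.2
        selector.reinforce(a, 1.0 if rng.random() < success_prob else 0.0)
    print((selector.amp ** 2).round(2))           # action 1 typically ends up with most probability
```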
