Similar literature
 20 similar documents found
1.
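In online kernel regression, each time a new sample arrives the learner must compute the inverse of the kernel matrix, a step whose computational complexity is at least quadratic in the number of rounds. This work applies sketching to the hypothesis update and gives a more efficient sketching-based online kernel regression algorithm. First, with the loss function set to the squared loss, the Nyström approximation is used to approximate the kernel and, borrowing the idea of Follow-the-Leader (FTL), a new gradient-descent algorithm is proposed...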

2.
Bayesian policy gradient algorithms have recently been proposed for modeling the policy gradient of the performance measure in reinforcement learning as a Gaussian process. These methods are known to reduce the variance and the number of samples needed to obtain accurate gradient estimates in comparison to conventional Monte-Carlo policy gradient algorithms. In this paper, we propose an improvement over previous Bayesian frameworks for the policy gradient. We use the Hessian matrix distribution as a learning rate schedule to improve the performance of the Bayesian policy gradient algorithm in terms of the variance and the number of samples. As in computing the policy gradient distributions, the Bayesian quadrature method is used to estimate the Hessian matrix distributions. We prove that the posterior mean of the Hessian distribution estimate is symmetric, one of the important properties of the Hessian matrix. Moreover, we prove that with an appropriate choice of kernel, the computational complexity of the Hessian distribution estimate is equal to that of the policy gradient distribution estimates. Using simulations, we show encouraging experimental results comparing the proposed algorithm to the Bayesian policy gradient and the Bayesian policy natural gradient algorithms described in Ghavamzadeh and Engel [10].

3.
Machine learning based on the Regularized Least Squares (RLS) model requires one to solve a system of linear equations. Direct-solution methods exhibit predictable complexity and storage, but often prove impractical for large-scale problems; iterative methods attain approximate solutions at lower complexity, but depend heavily on learning parameters. The paper shows that applying the properties of Toeplitz matrices to RLS yields two benefits: first, both the computational cost and the memory required to train an RLS-based machine are reduced dramatically; second, timing and storage requirements are defined analytically. The paper proves this result formally for the one-dimensional case, and gives an analytical criterion for an effective approximation in multidimensional domains. The validity of the approach is demonstrated on several real-world problems involving huge data sets with high-dimensional data.

4.
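Building on an analysis of existing text-fit computation methods based on the vector space model, both in China and abroad, this paper proposes a recursive-descent method for computing the degree of fit of policy texts. Using policy-text preprocessing and recursive descent, it gives methods for computing the degree of fit of policy clauses and of whole policy documents. The method does not directly build the term-frequency vectors that classical algorithms generally rely on; instead it merges identical terms. It then uses recursive descent to analyze the consistency of policy documents at three levels (clause, paragraph, and document), which reduces the space complexity of the fit computation. Experimental results show that, compared with some existing fit computation methods, the approach effectively improves the efficiency and accuracy of consistency verification.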

5.
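In the traditional AHP method, judgment values are strongly affected by human factors, so the weights obtained for multiple factors are often inconsistent, which undermines the accuracy of the conclusions and the trustworthiness of the assessment. A hierarchical risk assessment model based on "network, host, service, assessment factor" is established, and a hierarchical computation model is used to assess the risk level of the network. Starting from a given interval judgment matrix, the interval judgment matrix is consistently approximated by an ordinary numeric judgment matrix, and an analytic hierarchy process that automatically corrects the judgment matrix is proposed, yielding approximate weights for the elements of each layer. Validation on a worked example shows that the method can quantify the real-time risk situation accurately and automatically.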
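The step from an interval judgment matrix to layer weights can be illustrated with a small example. This is a minimal sketch and not the paper's correction procedure: it assumes each interval judgment (lo, hi) is collapsed to its geometric mean, and that weights are taken as normalized row geometric means, a common AHP approximation. The function names and the example matrix are hypothetical.

```python
import math

def collapse_intervals(interval_matrix):
    """Collapse each (lo, hi) judgment to its geometric mean (assumed rule)."""
    return [[math.sqrt(lo * hi) for (lo, hi) in row] for row in interval_matrix]

def ahp_weights(matrix):
    """Approximate AHP weights as normalized row geometric means."""
    n = len(matrix)
    row_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(row_means)
    return [m / total for m in row_means]

def consistency_index(matrix, weights):
    """CI = (lambda_max - n) / (n - 1), with lambda_max estimated from A w."""
    n = len(matrix)
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    return (lambda_max - n) / (n - 1)

# Hypothetical 3x3 interval judgment matrix; entry (j, i) holds the
# reciprocal interval of entry (i, j).
intervals = [
    [(1, 1), (2, 4), (4, 6)],
    [(1 / 4, 1 / 2), (1, 1), (1, 3)],
    [(1 / 6, 1 / 4), (1 / 3, 1), (1, 1)],
]
numeric = collapse_intervals(intervals)
weights = ahp_weights(numeric)
ci = consistency_index(numeric, weights)
```

A consistency index close to zero indicates that the collapsed matrix is nearly consistent; a larger value is the usual signal that the judgments need correcting.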

6.
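Incremental nearest-neighbor temporal difference learning in continuous spaces   Total citations: 1, self-citations: 1, citations by others: 0
For reinforcement learning in continuous spaces, an incremental nearest-neighbor temporal difference (TD) learning framework based on locally weighted learning is proposed. A subset of observed states is selected online and incrementally to build an instance dictionary; the value function and policy of a newly observed state are approximated by its range-nearest-neighbor instances; and the TD algorithm iteratively updates the value functions and eligibility traces of the instances in the dictionary. Several design options are given for each main component of the framework, and its convergence is analyzed theoretically. Simulation experiments on 24 combinations of options show that the SNDN combination offers good learning performance and computational efficiency.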

7.
Incremental Multi-Step Q-Learning   Total citations: 23, self-citations: 0, citations by others: 23
Peng, Jing; Williams, Ronald J. Machine Learning, 1996, 22(1-3): 283-290
This paper presents a novel incremental algorithm that combines Q-learning, a well-known dynamic-programming based reinforcement learning method, with the TD(λ) return estimation process, which is typically used in actor-critic learning, another well-known dynamic-programming based reinforcement learning method. The parameter λ is used to distribute credit throughout sequences of actions, leading to faster learning and also helping to alleviate the non-Markovian effect of coarse state-space quantization. The resulting algorithm, Q(λ)-learning, thus combines some of the best features of the Q-learning and actor-critic learning paradigms. The behavior of this algorithm has been demonstrated through computer simulations.
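How the λ parameter spreads credit over a sequence can be seen in the generic λ-return, which blends one-step bootstrapped targets with longer returns. The sketch below computes it by backward recursion for a single finished episode. It is not the incremental Q(λ) update of the paper itself; the function name, argument layout, and sample numbers are assumptions made for illustration.

```python
def lambda_returns(rewards, next_values, gamma, lam):
    """Backward recursion for the lambda-return of each step in one episode.

    rewards[t]     : reward received after step t
    next_values[t] : current value estimate of the state reached after step t
                     (use 0.0 when that state is terminal)
    """
    returns = [0.0] * len(rewards)
    g_next = 0.0  # lambda-return beyond the end of the episode
    for t in reversed(range(len(rewards))):
        # Blend the one-step bootstrap with the longer lambda-return.
        bootstrap = (1.0 - lam) * next_values[t] + lam * g_next
        returns[t] = rewards[t] + gamma * bootstrap
        g_next = returns[t]
    return returns

# Hypothetical three-step episode ending in a terminal state.
rewards = [1.0, 0.0, 2.0]
next_values = [0.5, 1.0, 0.0]
one_step = lambda_returns(rewards, next_values, gamma=0.9, lam=0.0)
monte_carlo = lambda_returns(rewards, next_values, gamma=0.9, lam=1.0)
```

With λ = 0 each target uses only the next reward and the next value estimate; with λ = 1 the targets are the full discounted returns, so intermediate values of λ interpolate between the two.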

8.
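Research on XACML-based policy evaluation optimization techniques   Total citations: 3, self-citations: 0, citations by others: 3
To make layer-by-layer matching in XACML policy evaluation more efficient, rules are ordered by their request weight as a rule optimization; for policy evaluation, priorities for the XACML combining algorithms and a per-subject rule index table are introduced, so that policies and rules satisfying the match conditions are selected first and matching is faster. Simulation experiments confirm that these measures shorten the PDP's evaluation time and improve evaluation efficiency.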

9.
In this work, we combine model-based reinforcement learning (MBRL) and model-free reinforcement learning (MFRL) to stabilize a biped robot (NAO robot) on a rotating platform, where the angular velocity of the platform is unknown to the proposed learning algorithm and treated as an external disturbance. Nonparametric Gaussian processes normally require a large number of training data points to deal with the discontinuity of the estimated model. Although some improved methods such as probabilistic inference for learning control (PILCO) do not require an explicit global model, as the actions are obtained by directly searching the policy space, overfitting and lack of model complexity may still result in a large deviation between the prediction and the real system. Besides, none of these approaches consider the data error and measurement noise during the training process and test process, respectively. We propose a hierarchical Gaussian process (GP) model, containing two layers of independent GPs, from which the physically continuous probability transition model of the robot is obtained. Due to the physically continuous estimation, the algorithm overcomes the overfitting problem with a guaranteed model complexity, and the amount of training data is also reduced. The policy for any given initial state is generated automatically by minimizing the expected cost according to the predefined cost function and the obtained probability distribution of the state. Furthermore, a novel Q(λ)-based MFRL scheme is employed to improve the policy. Simulation results show that the proposed RL algorithm is able to balance the NAO robot on a rotating platform, and that it is capable of adapting to a platform with varying angular velocity.

10.
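A VPN performance evaluation model oriented to IPsec security policies   Total citations: 4, self-citations: 0, citations by others: 4
The complex semantics of IPsec security policies make IPsec VPN performance analysis harder. To address the lack of a framework in IPsec VPN performance analysis, without which the validity of an evaluation cannot be guaranteed, a VPN performance evaluation model based on IPsec security policies is proposed. The model builds an extensible virtual VPN environment, improves the controllability of VPN performance by maintaining the IPsec security policies, and uses multithreaded concurrency control to collect statistics in parallel. Experiments verify the model's reliability and usability for VPN performance evaluation.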

11.
Efficient covariance matrix update for variable metric evolution strategies   Total citations: 2, self-citations: 0, citations by others: 2
Randomized direct search algorithms for continuous domains, such as evolution strategies, are basic tools in machine learning. They are especially needed when the gradient of an objective function (e.g., loss, energy, or reward function) cannot be computed or estimated efficiently. Application areas include supervised and reinforcement learning as well as model selection. These randomized search strategies often rely on normally distributed additive variations of candidate solutions. In order to efficiently search in non-separable and ill-conditioned landscapes the covariance matrix of the normal distribution must be adapted, amounting to a variable metric method. Consequently, covariance matrix adaptation (CMA) is considered state-of-the-art in evolution strategies. In order to sample the normal distribution, the adapted covariance matrix needs to be decomposed, requiring in general Θ(n^3) operations, where n is the search space dimension. We propose a new update mechanism which can replace a rank-one covariance matrix update and the computationally expensive decomposition of the covariance matrix. The newly developed update rule reduces the computational complexity of the rank-one covariance matrix adaptation to Θ(n^2) without resorting to outdated distributions. We derive new versions of the elitist covariance matrix adaptation evolution strategy (CMA-ES) and the multi-objective CMA-ES. These algorithms are equivalent to the original procedures except that the update step for the variable metric distribution scales better in the problem dimension. We also introduce a simplified variant of the non-elitist CMA-ES with the incremental covariance matrix update and investigate its performance. Apart from the reduced time-complexity of the distribution update, the algebraic computations involved in all new algorithms are simpler compared to the original versions. The new update rule improves the performance of the CMA-ES for large scale machine learning problems in which the objective function can be evaluated fast.
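The idea of replacing a rank-one covariance update plus a fresh decomposition with a direct update of the factor can be illustrated as follows. This is a sketch under stated assumptions: the covariance is kept as a factor A with C = A Aᵀ, the rank-one update has the form C' = αC + βvvᵀ with v = Az, and z is nonzero. The abstract does not spell out the rule, so the formula in the code is one algebraic identity that achieves this in O(n²) time, not necessarily the authors' exact procedure; all names are hypothetical.

```python
import math

def matvec(a, x):
    """Matrix-vector product for a square list-of-lists matrix."""
    return [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]

def gram(a):
    """Return A A^T; used here only to check the result."""
    n = len(a)
    return [[sum(a[i][k] * a[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

def rank_one_factor_update(a, z, alpha, beta):
    """Return A' with A' A'^T = alpha * A A^T + beta * (A z)(A z)^T.

    Uses A' = sqrt(alpha) * A + coeff * (A z) z^T, which costs O(n^2)
    and needs no matrix decomposition. Assumes z is not the zero vector.
    """
    n = len(a)
    z_norm2 = sum(zi * zi for zi in z)
    v = matvec(a, z)  # v = A z
    root_alpha = math.sqrt(alpha)
    coeff = root_alpha / z_norm2 * (math.sqrt(1.0 + beta / alpha * z_norm2) - 1.0)
    return [[root_alpha * a[i][j] + coeff * v[i] * z[j] for j in range(n)]
            for i in range(n)]

# Hypothetical 2x2 example; in an evolution strategy z would be a sampled
# standard normal vector, here it is fixed for reproducibility.
a = [[2.0, 0.0], [1.0, 1.0]]
z = [1.0, -1.0]
alpha, beta = 0.8, 0.2
a_new = rank_one_factor_update(a, z, alpha, beta)

# Target covariance computed the direct way, for comparison.
v = matvec(a, z)
c_old = gram(a)
c_target = [[alpha * c_old[i][j] + beta * v[i] * v[j] for j in range(2)]
            for i in range(2)]
```

Because new samples are drawn as Az, keeping A up to date directly means the covariance matrix never has to be decomposed again after each update.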

12.
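周权, 周敏, 唐屹. 《计算机应用研究》 (Application Research of Computers), 2007, 24(12): 151-154
To address problems in security policy management for the IPSec protocol, the idea of trust management is introduced and an IPSec security policy management scheme based on trust management is presented. The scheme describes policies in a distributed network in a uniform way and, through proof of compliance, supports delegated authorization of policies, which greatly improves both the efficiency of IPSec security policy management and the flexibility of IPSec.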

13.
In this paper we consider the problem of finding a near-optimal policy in a continuous space, discounted Markovian Decision Problem (MDP) by employing value-function-based methods when only a single trajectory of a fixed policy is available as the input. We study a policy-iteration algorithm where the iterates are obtained via empirical risk minimization with a risk function that penalizes high magnitudes of the Bellman-residual. Our main result is a finite-sample, high-probability bound on the performance of the computed policy that depends on the mixing rate of the trajectory, the capacity of the function set as measured by a novel capacity concept (the VC-crossing dimension), the approximation power of the function set and the controllability properties of the MDP. Moreover, we prove that when a linear parameterization is used the new algorithm is equivalent to Least-Squares Policy Iteration. To the best of our knowledge this is the first theoretical result for off-policy control learning over continuous state-spaces using a single trajectory. Editors: Hans Ulrich Simon, Gabor Lugosi, Avrim Blum. This paper appeared in a preliminary form at COLT2007 (Antos, et al. in LNCS/LNAI, vol. 4005, pp. 574–588, 2006).

14.
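Deep matrix factorization uses a deep nonlinear mapping and thereby overcomes the bottleneck that the bilinear relationship in matrix factorization imposes on recommender performance. However, it ignores users' preferences for unrated items and has no advantage on large-scale, highly sparse data. A recommendation algorithm that combines matrix completion with deep matrix factorization is therefore proposed. First, a matrix completion model fills in the unknown entries of the original rating matrix; then, from the completed matrix, a deep learning model builds the user and item latent vectors. Tests on the MovieLens and SUSHI datasets show that the proposed algorithm significantly improves recommender performance compared with deep matrix factorization.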

15.
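A survey of direct policy search methods in reinforcement learning   Total citations: 1, self-citations: 0, citations by others: 1
Various policy search algorithms in reinforcement learning are briefly introduced, a theoretical framework for policy gradient methods is established, and, guided by this framework, some existing policy gradient algorithms are generalized. Several recently proposed methods for speeding up the convergence of policy gradient algorithms are discussed. Recent progress in non-gradient policy search algorithms is presented, and directions for further research are outlined.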

16.
Tesauro, Gerald. Machine Learning, 1998, 32(3): 241-243
The results obtained by Pollack and Blair substantially underperform my 1992 TD Learning results. This is shown by directly benchmarking the 1992 TD nets against Pubeval. A plausible hypothesis for this underperformance is that, unlike TD learning, the hillclimbing algorithm fails to capture nonlinear structure inherent in the problem, and despite the presence of hidden units, only obtains a linear approximation to the optimal policy for backgammon. Two lines of evidence supporting this hypothesis are discussed, the first coming from the structure of the Pubeval benchmark program, and the second coming from experiments replicating the Pollack and Blair results.

17.
The Daubechies wavelet based differentiation matrix will be constructed for periodic boundary conditions. It will be proved that this matrix displays the very important property of superconvergence. The relationship between Daubechies-based numerical methods and finite difference methods will be seen. This research was supported by AFOSR Grant 90-0093, by DARPA grant N00014-91-4016, and by NSF grant DMS-9211820, in partial fulfillment of a Ph.D. in Applied Mathematics under the guidance of Professor David Gottlieb.

18.
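Existing data-filling methods consider only rating information and traditional similarity and so cannot capture the true similarity between users. To address this, a matrix-factorization data-filling method based on session temporal similarity is proposed to alleviate data sparsity and improve recommendation accuracy. First, the shortcomings of traditional similarity are analyzed and a session-based temporal similarity measure is derived from temporal similarity and dissimilarity; it combines temporal context with rating information, captures the true relationships between users better, and is used to identify neighbors. Next, from the target user's neighbors and the items they consumed, a set of key items to be filled is extracted, carrying latent user and item influence factors, and matrix factorization is used to fill this set. Then, latent Dirichlet allocation (LDA) extracts each user's probabilistic topic distribution in each time period, and a time-penalty weight is used to build a dynamic user preference model. Finally, items are recommended using the correlation between users' probabilistic topic distributions together with user-based collaborative filtering. Experimental results show that, compared with other data-filling methods, the proposed method lowers the mean absolute error (MAE) at different sparsity levels and improves recommendation performance.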

19.
Efficient algorithm for matrix spectral factorization   Total citations: 3, self-citations: 0, citations by others: 3
J. Ježek, V. Kučera. Automatica, 1985, 21(6): 663-669
An algorithm is presented for the spectral factorization of polynomial (or rational) matrices arising in optimal control and filtering theory as well as in network theory. There are two versions of the algorithm: one applicable to continuous-time problems, the other to discrete-time ones. Both versions are based on Newton's method, feature quadratic convergence and provide a significant improvement in efficiency over the existing methods.

20.
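To address the low quality of automatically extracted attribute-based access control (ABAC) policies, a method for automatic ABAC policy extraction, optimization, and enhancement driven by access control logs is proposed. First, an ensemble learning model is built that maps user behavior and permission assignments to a policy logic tree, identifies the correlations and latent regularities in access authorization decisions, and generates an initial policy. Second, the policy is deeply optimized by two methods, single-attribute optimization and binary rule reduction, which simplify the policy structure and compress the policy size. Finally, a rule conflict resolution method based on an error metric is proposed to obtain an enhanced ABAC policy that is mutually exclusive and complete, and a policy performance balancing algorithm based on multi-objective optimization selects the best model for the requirements of different scenarios. Tests and validation on a balanced dataset and a sparse dataset show that the method reaches an accuracy of up to 96.69% on the balanced dataset while compressing the policy to 19.7% of its original size, and an accuracy of up to 87.74% on the sparse dataset while compressing the policy to 23% of its original size. The method balances predictive accuracy against structural simplicity, applies to both balanced and sparse logs, and supports efficient and secure access authorization management in practical access control systems.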
