Similar literature
 Found 20 similar documents; search took 0 ms
1.
Matrix-pattern-oriented linear classifier design has proven successful in improving classification performance. This paper proposes an efficient kernelized classifier for the Matrixized Least Square Support Vector Machine (MatLSSVM). The classifier is realized by introducing a kernel-induced distance metric and a majority-voting technique into MatLSSVM, and is therefore named the Kernel-based Matrixized Least Square Support Vector Machine (KMatLSSVM). First, the original Euclidean distance used to optimize MatLSSVM is replaced by a kernel-induced distance. Then different initializations of the weight vectors are given, and the correspondingly generated sub-classifiers are combined with the majority-vote rule, which expands the solution space and mitigates the local-solution problem of the original MatLSSVM. The experiments verify that one iteration is enough for each sub-classifier of KMatLSSVM to obtain superior performance. As a result, compared with the original linear MatLSSVM, the proposed method has significant advantages in classification accuracy and computational complexity.
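The kernel-induced distance mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a Gaussian (RBF) kernel, which the abstract does not specify, and the names `rbf_kernel`, `kernel_distance_sq` and `gamma` are hypothetical. The sketch relies only on the standard identity d^2(x, y) = k(x, x) - 2 k(x, y) + k(y, y) for the squared distance in the kernel's feature space; it is not the KMatLSSVM classifier itself.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel; the kernel choice and gamma are assumptions."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)

def kernel_distance_sq(x, y, kernel=rbf_kernel):
    """Squared distance between x and y in the kernel's feature space."""
    return kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)
```

With an RBF kernel this distance is bounded above by 2, so far-apart points contribute less than they would under the Euclidean distance.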

2.
Reinforcement learning (RL) is concerned with the identification of optimal controls in Markov decision processes (MDPs) where no explicit model of the transition probabilities is available. We propose a class of RL algorithms which always produces stable estimates of the value function. In detail, we use "local averaging" methods to construct an approximate dynamic programming (ADP) algorithm. Nearest-neighbor regression, grid-based approximations, and trees can all be used as the basis of this approximation. We provide a thorough theoretical analysis of this approach and we demonstrate that ADP converges to a unique approximation in continuous-state average-cost MDPs. In addition, we prove that our method is consistent in the sense that an optimal approximate strategy is identified asymptotically. With regard to a practical implementation, we suggest a reduction of ADP to standard dynamic programming in an artificial finite-state MDP.

3.
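Ji Ting, Zhang Hua. Journal of Computer Applications (《计算机应用》), 2018, 38(5): 1230-1238
To address the problem that the approximators of current approximate policy iteration reinforcement learning algorithms cannot be constructed fully automatically, a non-parametric approximate policy iteration reinforcement learning algorithm based on the Dyna framework (NPAPI-Dyna) is proposed. A sampling cache and a sampling change rate are introduced to design a two-stage random sampling process for collecting samples. Guided by the silhouette index, the K-means clustering algorithm is used to carry out a trial-and-error process that generates the core state basis functions. An estimation method whose goal is complete coverage of the samples generates the Q-value function approximator, a greedy policy is used to design the action selector, and the visit frequencies of the state basis functions are used to describe the topological features of the environment and to build an estimated environment model. Then, following the model-identification idea of the Dyna framework, learning and planning are combined, which further speeds up reinforcement learning. In simulation experiments on balancing control of a single inverted pendulum, when the reinforcement learning error rate is 0.01, the learning success rate of the algorithm is 100%, the minimum number of trials needed to learn successfully is only 2, the average number of trials is only 7.73, the mean absolute deviation of the angle is 3.0538°, and the mean oscillation range of the angle is 2.759°. When the reinforcement learning error rate is 0.1, in 100 independent simulation runs, the Online-LSPI and BLSPI algorithms need more than 150 trials on average to learn a control policy, whereas NPAPI-Dyna generally learns successfully within 50 trials. The experimental analysis shows that NPAPI-Dyna can construct and adjust the reinforcement learning structure fully automatically, and that it achieves high accuracy while converging quickly.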

4.
We consider the revenue management problem of capacity control under customer choice behavior. An exact solution of the underlying stochastic dynamic program is difficult because of the multi-dimensional state space and, thus, approximate dynamic programming (ADP) techniques are widely used. The key idea of ADP is to encode the multi-dimensional state space by a small number of basis functions, often leading to a parametric approximation of the dynamic program’s value function. In general, two classes of ADP techniques for learning value function approximations exist: mathematical programming and simulation. So far, the literature on capacity control largely focuses on the first class. In this paper, we develop a least squares approximate policy iteration (API) approach which belongs to the second class. We suggest value function approximations that are linear in the parameters, and we estimate the parameters via linear least squares regression. Exploiting both exact and heuristic knowledge from the value function, we enforce structural constraints on the parameters to facilitate learning a good policy. We perform an extensive simulation study to investigate the performance of our approach. The results show that it obtains revenues that are competitive with, and often higher than, those of state-of-the-art capacity control methods in reasonable computational time. Depending on the scarcity of capacity and the point in time, revenue improvements of around 1% or more can be observed. Furthermore, the proposed approach contributes to simulation-based ADP, bringing forth research on numerically estimating piecewise linear value function approximations and their application in revenue management environments.

5.
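Ji Ting, Zhang Hua. Control and Decision (《控制与决策》), 2017, 32(12): 2153-2161
To address the heavy computation and the lack of fully automatic basis-function construction that are common in current approximate policy iteration reinforcement learning algorithms, a non-parametric approximate generalized policy iteration reinforcement learning algorithm based on state clustering (NPAGPI-SC) is proposed. The algorithm collects samples with a two-stage random sampling process, computes the initial parameters of the approximator with a trial-and-error process and an estimation method aimed at complete sample coverage, adaptively adjusts the approximator during learning using the delta rule and the nearest-neighbor idea, and selects the action to execute with a greedy policy. Simulation results for balancing control of a single inverted pendulum verify the effectiveness and robustness of the proposed algorithm.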

6.
This paper makes a first step toward the integration of two subfields of machine learning, namely preference learning and reinforcement learning (RL). An important motivation for a preference-based approach to reinforcement learning is the observation that in many real-world domains, numerical feedback signals are not readily available, or are defined arbitrarily in order to satisfy the needs of conventional RL algorithms. Instead, we propose an alternative framework for reinforcement learning, in which qualitative reward signals can be directly used by the learner. The framework may be viewed as a generalization of the conventional RL framework in which only a partial order between policies is required instead of the total order induced by their respective expected long-term reward. Therefore, building on novel methods for preference learning, our general goal is to equip the RL agent with qualitative policy models, such as ranking functions that allow for sorting its available actions from most to least promising, as well as algorithms for learning such models from qualitative feedback. As a proof of concept, we realize a first simple instantiation of this framework that defines preferences based on utilities observed for trajectories. To that end, we build on an existing method for approximate policy iteration based on roll-outs. While this approach is based on the use of classification methods for generalization and policy learning, we make use of a specific type of preference learning method called label ranking. Advantages of preference-based approximate policy iteration are illustrated by means of two case studies.

7.
A learning algorithm for principal component analysis (PCA) is developed based on least-squares minimization. The dual learning rate parameters are adjusted adaptively to make the proposed algorithm capable of fast convergence and high accuracy for extracting all principal components. The proposed algorithm is robust to the error accumulation existing in the sequential PCA algorithm. We show that all information needed for PCA can be completely represented by the unnormalized weight vector, which is updated based only on the corresponding neuron input-output product. The updating of the normalized weight vector can be referred to as a leaky Hebb's rule. The convergence of the proposed algorithm is briefly analyzed. We also establish the relation between Oja's rule and the least squares learning rule. Finally, simulation results are given to illustrate the effectiveness of this algorithm for PCA and for tracking time-varying directions-of-arrival.
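For orientation, the classical Oja's rule that the abstract above relates to its least-squares rule can be sketched as follows. This is the textbook single-neuron update w <- w + lr * y * (x - y * w) with y = w . x, not the paper's proposed algorithm with adaptive dual learning rates. The function name, the fixed learning rate and the toy data are assumptions chosen for illustration.

```python
def oja_first_component(samples, w, lr=0.05, epochs=500):
    """Estimate the first principal direction of zero-mean samples with Oja's rule."""
    w = list(w)
    for _ in range(epochs):
        for x in samples:
            # Neuron output for the current sample.
            y = sum(wi * xi for wi, xi in zip(w, x))
            # Hebbian term y*x plus the decay term -y^2*w that keeps |w| near 1.
            w = [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]
    return w

# Toy zero-mean data whose variance is largest along the first axis.
data = [(2.0, 0.0), (-2.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
w_hat = oja_first_component(data, [0.6, 0.8])
```

On this toy data the weight vector settles on the first coordinate axis with unit norm, which is the dominant eigenvector of the sample covariance.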

8.
Orthogonal least squares learning algorithm for radial basis function networks   (Cited by: 146; self-citations: 0; citations by others: 146)
The radial basis function network offers a viable alternative to the two-layer neural network in many applications of signal processing. A common learning algorithm for radial basis function networks is based on first choosing randomly some data points as radial basis function centers and then using singular-value decomposition to solve for the weights of the network. Such a procedure has several drawbacks, and, in particular, an arbitrary selection of centers is clearly unsatisfactory. The authors propose an alternative learning procedure based on the orthogonal least-squares method. The procedure chooses radial basis function centers one by one in a rational way until an adequate network has been constructed. In the algorithm, each selected center maximizes the increment to the explained variance or energy of the desired output, and the procedure does not suffer from numerical ill-conditioning problems. The orthogonal least-squares learning strategy provides a simple and efficient means for fitting radial basis function networks. This is illustrated using examples taken from two different signal processing applications.

9.
Person re-identification means retrieving the same person from large numbers of images across disjoint camera views. An effective and robust similarity measure between a pair of person images plays an important role in re-identification tasks. In this work, we propose a new metric learning method based on least squares for person re-identification. Specifically, the similar training image pairs are used to learn a linear transformation matrix by projecting them to finite discrete discriminant points using a regression model; the metric matrix can then be deduced by solving a least squares problem with a closed-form solution. We call it the discriminant analytical least squares (DALS) metric. In addition, we develop an incremental learning scheme for DALS, which is particularly valuable for model retraining when additional samples are given. Furthermore, DALS can be effectively kernelized to further improve the matching performance. Extensive experiments on the VIPeR, GRID, PRID450S and CUHK01 datasets demonstrate the effectiveness and efficiency of our approaches.

10.
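Control and optimization of process systems require reliable process data. Measured process data contain random errors and gross errors; data reconciliation techniques can effectively reduce the errors in process measurements and thereby improve the accuracy of process control and optimization. The traditional least-squares data reconciliation method and the robust quasi-least-squares data reconciliation method are analyzed for their strengths and weaknesses, and a combined least-squares and quasi-least-squares method is proposed. The method first uses a quasi-least-squares estimator to detect and remove gross errors, and then uses a least-squares estimator to reconcile the data, combining the advantages of both methods and making the reconciliation results more accurate. The proposed combined method is applied to data reconciliation for linear and nonlinear systems, and a comparison of the reconciliation results shows that it has good gross-error detection capability and yields more accurate reconciled data. Finally, the method is applied to data reconciliation for an air separation flowsheet in a real process system, and the results demonstrate its effectiveness.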

11.
Bayesian policy gradient algorithms have been recently proposed for modeling the policy gradient of the performance measure in reinforcement learning as a Gaussian process. These methods were known to reduce the variance and the number of samples needed to obtain accurate gradient estimates in comparison to the conventional Monte-Carlo policy gradient algorithms. In this paper, we propose an improvement over previous Bayesian frameworks for the policy gradient. We use the Hessian matrix distribution as a learning rate schedule to improve the performance of the Bayesian policy gradient algorithm in terms of the variance and the number of samples. As in computing the policy gradient distributions, the Bayesian quadrature method is used to estimate the Hessian matrix distributions. We prove that the posterior mean of the Hessian distribution estimate is symmetric, one of the important properties of the Hessian matrix. Moreover, we prove that with an appropriate choice of kernel, the computational complexity of Hessian distribution estimate is equal to that of the policy gradient distribution estimates. Using simulations, we show encouraging experimental results comparing the proposed algorithm to the Bayesian policy gradient and the Bayesian policy natural gradient algorithms described in Ghavamzadeh and Engel [10].

12.
Neural Computing and Applications - As two popular kinds of data mining methods, metric learning and SVM have an interesting and valuable internal relationship. The basic idea of metric learning is...

13.
In classification problems, different classes often contain different numbers of samples. Sometimes the imbalance between classes is very high, and the interest is in classifying the samples that belong to the minority class. The support vector machine (SVM) is one of the widely used techniques for classification, and it has been applied to this problem through a fuzzy-based approach. In this paper, motivated by the work of Fan et al. (Knowledge-Based Systems 115: 87–99, 2017), we propose two efficient variants of the entropy-based fuzzy SVM (EFSVM). By considering the fuzzy membership value of each sample, we propose an entropy-based fuzzy least squares support vector machine (EFLSSVM-CIL) and an entropy-based fuzzy least squares twin support vector machine (EFLSTWSVM-CIL) for class-imbalanced datasets, where fuzzy membership values are assigned based on the entropy values of the samples. The proposed methods solve a system of linear equations instead of the quadratic programming problem (QPP) solved in EFSVM. The least squares versions of the entropy-based SVM are faster than EFSVM and give higher generalization performance, which shows their applicability and efficiency. Experiments are performed on various real-world class-imbalanced datasets, and the results of the proposed methods are compared with those of the new fuzzy twin support vector machine for pattern classification (NFTWSVM), the entropy-based fuzzy support vector machine (EFSVM), the fuzzy twin support vector machine (FTWSVM) and the twin support vector machine (TWSVM); the comparison clearly illustrates the superiority of the proposed EFLSTWSVM-CIL.

14.
The recursive least-squares algorithm with a forgetting factor has been extensively applied and studied for the on-line parameter estimation of linear dynamic systems. This paper explores the use of genetic algorithms to improve the performance of the recursive least-squares algorithm in the parameter estimation of time-varying systems. Simulation results show that the hybrid recursive algorithm (GARLS), combining recursive least-squares with genetic algorithms, can achieve better results than the standard recursive least-squares algorithm using only a forgetting factor.
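The baseline that the abstract above compares against, recursive least squares with a forgetting factor, can be sketched as below. This is the standard textbook update for a linear model y = theta . phi, not the GARLS hybrid; the function names, the forgetting factor `lam = 0.98` and the initial covariance scale `delta` are assumptions chosen for illustration.

```python
def rls_update(theta, P, phi, y, lam=0.98):
    """One recursive least-squares step with forgetting factor lam (0 < lam <= 1)."""
    n = len(theta)
    P_phi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(phi[i] * P_phi[i] for i in range(n))
    gain = [v / denom for v in P_phi]
    err = y - sum(theta[i] * phi[i] for i in range(n))  # a priori prediction error
    theta = [theta[i] + gain[i] * err for i in range(n)]
    # P is symmetric, so phi^T P equals (P phi)^T.
    P = [[(P[i][j] - gain[i] * P_phi[j]) / lam for j in range(n)] for i in range(n)]
    return theta, P

def rls_fit(samples, n, lam=0.98, delta=1000.0):
    """Run RLS over (phi, y) pairs, starting from theta = 0 and P = delta * I."""
    theta = [0.0] * n
    P = [[delta if i == j else 0.0 for j in range(n)] for i in range(n)]
    for phi, y in samples:
        theta, P = rls_update(theta, P, phi, y, lam)
    return theta

# Noise-free samples of y = 2*x + 1, with regressor phi = [x, 1].
samples = [([float(x), 1.0], 2.0 * x + 1.0) for x in range(20)]
theta_hat = rls_fit(samples, 2)
```

A value of `lam` below 1 discounts old samples geometrically, which is what lets the estimator follow time-varying parameters at the cost of noisier estimates.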

15.
Online prediction of mill load is useful for control system design in the grinding process. It is a challenging problem to estimate the parameters of the load inside the ball mill using measurable signals. This paper aims to develop a computational intelligence approach for predicting the mill load. Extreme learning machines (ELMs) are employed as learner models to implement the map between frequency spectral features and the mill load parameters. The inputs of the ELM model are reduced features, which are extracted and selected from the vibration frequency spectrum of the mill shell using the partial least squares (PLS) algorithm. Experiments are carried out in the laboratory, with comparisons against the well-known back-propagation learning algorithm, the original ELM and an optimization-based ELM (OELM). Results indicate that the reduced feature-based OELM can perform reasonably well at mill load parameter estimation, and it outperforms other learner models in terms of generalization capability.

16.
We provide the sample complexity of the problem of learning halfspaces with monotonic noise, using the regularized least squares algorithm in the reproducing kernel Hilbert space (RKHS) framework.

17.
In this paper, a new class of simplified low-cost analog artificial neural networks with on-chip adaptive learning algorithms is proposed for solving linear systems of algebraic equations in real time. The proposed learning algorithms for linear least squares (LS), total least squares (TLS) and data least squares (DLS) problems can be considered as modifications and extensions of well-known algorithms: the row-action projection-Kaczmarz algorithm and/or the LMS (Adaline) Widrow-Hoff algorithms. The algorithms can be applied to any problem which can be formulated as a linear regression problem. The correctness and high performance of the proposed neural networks are illustrated by extensive computer simulation results.
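The row-action projection (Kaczmarz) algorithm named in the abstract above as a starting point can be sketched in its classical digital form. This is not the analog neural-network implementation the paper proposes; the function name, the cyclic row order and the fixed number of sweeps are assumptions. Each step projects the current estimate onto the hyperplane defined by one equation of A x = b.

```python
def kaczmarz(A, b, sweeps=200):
    """Cyclic Kaczmarz iteration for A x = b, starting from x = 0."""
    x = [0.0] * len(A[0])
    for _ in range(sweeps):
        for row, b_i in zip(A, b):
            norm_sq = sum(a * a for a in row)
            if norm_sq == 0.0:
                continue  # skip all-zero rows
            residual = b_i - sum(a * xj for a, xj in zip(row, x))
            step = residual / norm_sq
            # Orthogonal projection onto the hyperplane row . x = b_i.
            x = [xj + step * a for a, xj in zip(row, x)]
    return x

# Consistent 2x2 system whose exact solution is x = [1, 3].
solution = kaczmarz([[2.0, 1.0], [1.0, 3.0]], [5.0, 10.0])
```

For a consistent system the iterates converge to a solution; for inconsistent least-squares problems a relaxation parameter or an LMS-style step size is normally needed, which this sketch omits.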

18.
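In recent years, kernel learning machines have become a hot topic in the machine learning community and have been applied successfully in many fields. As a new technology that is not yet mature, however, kernel learning machines still have many limitations. This paper introduces the basic idea of kernel methods, reviews kernel-based learning machines from the perspectives of supervised and unsupervised learning algorithms, and highlights the open problems in kernel learning machine research and the research directions that deserve attention, with the aim of giving a fairly comprehensive view of the field of kernel methods.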

19.
Extended least squares based algorithm for training feedforward networks.   (Cited by: 2; self-citations: 0; citations by others: 2)
An extended least squares-based algorithm for feedforward networks is proposed. The weights connecting the last hidden and output layers are first evaluated by a least squares algorithm. The weights between the input and hidden layers are then evaluated using modified gradient descent algorithms. This arrangement eliminates the stalling problem experienced by pure least squares type algorithms while still maintaining fast convergence. In the investigated problems, the total number of FLOPS required for the networks to converge using the proposed training algorithm is only 0.221%-16.0% of that using the Levenberg-Marquardt algorithm. The number of floating point operations per iteration of the proposed algorithm is only 1.517-3.521 times that of the standard backpropagation algorithm.

20.
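With its large transport capacity, high speed and all-weather operation, high-speed rail has developed rapidly. However, emergencies such as severe weather cause train delays, and in worse cases the delays keep propagating through the rail network, producing a domino effect in which large numbers of trains cannot run according to the planned timetable. The current dynamic scheduling approach, which relies on human experience, cannot meet the practical requirement for fast optimization and adjustment. Therefore, for the dynamic scheduling problem of high-speed trains delayed by emergencies, the optimization objective is set to minimizing the total arrival and departure delay of all trains at all stations, a mixed-integer nonlinear programming model is built for the case in which high-speed trains can still run, and a dynamic scheduling method for high-speed trains based on policy-gradient reinforcement learning is proposed. The method covers the construction of the interaction environment, the definition of the agent's state and action sets, the policy network structure and action selection method, and the construction of the reward function. Two improvements tailored to the problem, error amplification and threshold setting, are made to the policy-gradient reinforcement learning (REINFORCE) algorithm. Finally, the convergence of the algorithm and the performance gain from the improvements are studied in simulation and compared with the Q-learning algorithm. The results show that the proposed method can schedule high-speed trains dynamically and effectively, minimizing the delay caused by emergencies and thereby improving train operating efficiency.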
