20 related articles found.
1.
Reinforcement learning (RL) is concerned with the identification of optimal controls in Markov decision processes (MDPs) where no explicit model of the transition probabilities is available. We propose a class of RL algorithms which always produce stable estimates of the value function. In detail, we use "local averaging" methods to construct an approximate dynamic programming (ADP) algorithm. Nearest-neighbor regression, grid-based approximations, and trees can all be used as the basis of this approximation. We provide a thorough theoretical analysis of this approach and we demonstrate that ADP converges to a unique approximation in continuous-state average-cost MDPs. In addition, we prove that our method is consistent in the sense that an optimal approximate strategy is identified asymptotically. With regard to a practical implementation, we suggest a reduction of ADP to standard dynamic programming in an artificial finite-state MDP.
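As a rough illustration of the suggested reduction, the sketch below (a discounted toy problem with invented dynamics and rewards, not the paper's average-cost setting) maps each successor state to its nearest sampled state and runs ordinary value iteration on the resulting artificial finite-state MDP:

```python
import numpy as np

rng = np.random.default_rng(9)
S = np.sort(rng.uniform(0.0, 1.0, 200))    # sampled states in [0, 1]
actions = np.array([-0.1, 0.1])
gamma = 0.9

def reward(s):
    return -np.abs(s - 0.5)                # reward peaks at s = 0.5 (invented)

# Nearest-neighbor "local averaging": every successor state is snapped to its
# closest sampled state, which yields an artificial finite-state MDP.
succ = np.array([[np.argmin(np.abs(S - np.clip(s + a, 0.0, 1.0)))
                  for a in actions] for s in S])

V = np.zeros(len(S))
for _ in range(300):                       # standard value iteration on the finite MDP
    V = np.max(reward(S)[:, None] + gamma * V[succ], axis=1)

print(S[np.argmax(V)])                     # the most valuable sampled state
```

The greedy values concentrate around s = 0.5, the reward maximum, showing that the finite surrogate MDP preserves the structure of the continuous problem.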
2.
Matrix-pattern-oriented linear classifier design has been proven successful in improving classification performance. This paper proposes an efficient kernelized classifier for the Matrixized Least Square Support Vector Machine (MatLSSVM). The classifier is realized by introducing a kernel-induced distance metric and a majority-voting technique into MatLSSVM, and is thus named the Kernel-based Matrixized Least Square Support Vector Machine (KMatLSSVM). First, the original Euclidean distance for optimizing MatLSSVM is replaced by a kernel-induced distance; then different initializations of the weight vectors are used, and the correspondingly generated sub-classifiers are combined by the majority-vote rule, which expands the solution space and mitigates the local-solution problem of the original MatLSSVM. The experiments verify that one iteration is enough for each sub-classifier of the presented KMatLSSVM to obtain superior performance. As a result, compared with the original linear MatLSSVM, the proposed method has significant advantages in terms of classification accuracy and computational complexity.
3.
Johannes Fürnkranz, Eyke Hüllermeier, Weiwei Cheng, Sang-Hyeun Park, Machine Learning, 2012, 89(1-2):123-156
This paper takes a first step toward the integration of two subfields of machine learning, namely preference learning and reinforcement learning (RL). An important motivation for a preference-based approach to reinforcement learning is the observation that in many real-world domains, numerical feedback signals are not readily available, or are defined arbitrarily in order to satisfy the needs of conventional RL algorithms. Instead, we propose an alternative framework for reinforcement learning, in which qualitative reward signals can be directly used by the learner. The framework may be viewed as a generalization of the conventional RL framework in which only a partial order between policies is required instead of the total order induced by their respective expected long-term reward. Therefore, building on novel methods for preference learning, our general goal is to equip the RL agent with qualitative policy models, such as ranking functions that allow for sorting its available actions from most to least promising, as well as algorithms for learning such models from qualitative feedback. As a proof of concept, we realize a first simple instantiation of this framework that defines preferences based on utilities observed for trajectories. To that end, we build on an existing method for approximate policy iteration based on roll-outs. While this approach is based on the use of classification methods for generalization and policy learning, we make use of a specific type of preference learning method called label ranking. Advantages of preference-based approximate policy iteration are illustrated by means of two case studies.
4.
Shan Ouyang, Zheng Bao, Gui-Sheng Liao, IEEE Transactions on Neural Networks, 2000, 11(1):215-221
A learning algorithm for principal component analysis (PCA) is developed based on least-square minimization. The dual learning rate parameters are adjusted adaptively to make the proposed algorithm capable of fast convergence and high accuracy for extracting all principal components. The proposed algorithm is robust to the error accumulation existing in sequential PCA algorithms. We show that all information needed for PCA can be completely represented by the unnormalized weight vector, which is updated based only on the corresponding neuron input-output product. The updating of the normalized weight vector can be referred to as a leaky Hebb's rule. The convergence of the proposed algorithm is briefly analyzed. We also establish the relation between Oja's rule and the least squares learning rule. Finally, simulation results are given to illustrate the effectiveness of this algorithm for PCA and for tracking time-varying directions-of-arrival.
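The paper's dual-learning-rate algorithm is not reproduced here, but the leaky Hebbian update it relates to (Oja's rule) can be sketched on synthetic data; the data scales and learning rate below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
# Zero-mean data whose leading principal direction is the first axis
X = rng.normal(size=(2000, 3)) * np.array([3.0, 1.0, 0.5])

w = rng.normal(size=3)
w /= np.linalg.norm(w)
eta = 0.01
for x in X:
    y = w @ x                     # neuron output
    w += eta * y * (x - y * w)    # Hebbian term plus a "leaky" decay keeping ||w|| near 1

print(np.round(np.abs(w), 3))     # should align with the first coordinate axis
```

The decay term `-eta * y**2 * w` is what distinguishes Oja's rule from a plain Hebbian update and keeps the weight vector implicitly normalized.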
5.
Orthogonal least squares learning algorithm for radial basis function networks
The radial basis function network offers a viable alternative to the two-layer neural network in many applications of signal processing. A common learning algorithm for radial basis function networks is based on first choosing randomly some data points as radial basis function centers and then using singular-value decomposition to solve for the weights of the network. Such a procedure has several drawbacks, and, in particular, an arbitrary selection of centers is clearly unsatisfactory. The authors propose an alternative learning procedure based on the orthogonal least-squares method. The procedure chooses radial basis function centers one by one in a rational way until an adequate network has been constructed. In the algorithm, each selected center maximizes the increment to the explained variance or energy of the desired output and does not suffer numerical ill-conditioning problems. The orthogonal least-squares learning strategy provides a simple and efficient means for fitting radial basis function networks. This is illustrated using examples taken from two different signal processing applications.
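A minimal sketch of the greedy selection idea, assuming Gaussian basis functions and using every training sample as a candidate center; the classical Gram-Schmidt bookkeeping here is a simplified stand-in for the paper's orthogonal decomposition:

```python
import numpy as np

def rbf(X, C, width=1.0):
    # Gaussian RBF design matrix: one column per candidate center
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * width ** 2))

def ols_select(Phi, y, n_centers):
    """Greedy orthogonal least squares: at each step pick the column that
    maximizes the increment to the explained energy of the desired output."""
    selected, Q = [], []
    for _ in range(n_centers):
        best, best_err, best_q = None, -np.inf, None
        for j in range(Phi.shape[1]):
            if j in selected:
                continue
            q = Phi[:, j].copy()
            for qk in Q:                              # orthogonalize against chosen columns
                q -= (qk @ Phi[:, j]) / (qk @ qk) * qk
            if q @ q < 1e-12:
                continue
            err = (q @ y) ** 2 / (q @ q)              # energy explained by this column
            if err > best_err:
                best, best_err, best_q = j, err, q
        selected.append(best)
        Q.append(best_q)
    return selected

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0])
Phi = rbf(X, X, width=0.3)             # every sample is a candidate center
centers = ols_select(Phi, y, 8)
w, *_ = np.linalg.lstsq(Phi[:, centers], y, rcond=None)
resid = y - Phi[:, centers] @ w
print(len(centers), float(resid @ resid / (y @ y)))
```

Eight greedily chosen centers already drive the relative residual energy close to zero on this toy target, illustrating why the incremental selection avoids both arbitrary center choices and ill-conditioning.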
6.
Zhao Yang, Xiao Hu, Fei Dai, Jianxin Pang, Tao Jiang, Dapeng Tao, Machine Vision and Applications, 2018, 29(6):1019-1031
Person re-identification means retrieving the same person in large amounts of images across disjoint camera views. An effective and robust similarity measure between a pair of person images plays an important role in re-identification tasks. In this work, we propose a new metric learning method based on least squares for person re-identification. Specifically, similar training image pairs are used to learn a linear transformation matrix by being projected to finite discrete discriminant points using a regression model; then, the metric matrix can be deduced by solving a least squares problem with a closed-form solution. We call it the discriminant analytical least squares (DALS) metric. In addition, we develop an incremental learning scheme for DALS, which is particularly valuable for model retraining when additional samples are given. Furthermore, DALS can be effectively kernelized to further improve matching performance. Extensive experiments on the VIPeR, GRID, PRID450S and CUHK01 datasets demonstrate the effectiveness and efficiency of our approaches.
7.
Control and optimization of process systems require reliable process data. Measured process data contain random errors and gross errors; data reconciliation techniques can effectively reduce the errors in process measurements and thereby improve the accuracy of process control and optimization. This paper analyzes the respective strengths and weaknesses of the traditional least squares based data reconciliation method and the robust quasi-least squares based method, and proposes a combined least squares / quasi-least squares approach: a quasi-least squares estimator is first used to detect and eliminate gross errors, and a least squares estimator is then used to reconcile the data, combining the advantages of both methods to obtain more accurate reconciliation results. The combined method is applied to data reconciliation in linear and nonlinear systems, and a comparison of the reconciled results shows that it has good gross-error detection capability and high reconciliation accuracy. Finally, the method is applied to the data reconciliation of an actual air separation process, and the results demonstrate its effectiveness.
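The least squares reconciliation step alone (without the quasi-least squares gross-error detection stage) can be sketched for a hypothetical three-stream mass balance; all stream values and noise levels below are invented:

```python
import numpy as np

# Hypothetical splitter with mass balance x1 - x2 - x3 = 0
A = np.array([[1.0, -1.0, -1.0]])
x_true = np.array([10.0, 6.0, 4.0])
rng = np.random.default_rng(2)
sigma = np.array([0.2, 0.2, 0.2])
x_meas = x_true + rng.normal(scale=sigma)     # noisy measurements

# Weighted least squares reconciliation: minimize (x - x_meas)' S^-1 (x - x_meas)
# subject to A x = 0; Lagrange multipliers give the closed-form projection below.
S = np.diag(sigma ** 2)
x_hat = x_meas - S @ A.T @ np.linalg.solve(A @ S @ A.T, A @ x_meas)
print(x_hat, A @ x_hat)                       # reconciled values satisfy the balance
```

Because the reconciled estimate is a projection onto the constraint subspace that the true values also satisfy, it is never farther from the truth than the raw measurements.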
8.
Bayesian policy gradient algorithms have recently been proposed for modeling the policy gradient of the performance measure in reinforcement learning as a Gaussian process. These methods are known to reduce the variance and the number of samples needed to obtain accurate gradient estimates compared to conventional Monte-Carlo policy gradient algorithms. In this paper, we propose an improvement over previous Bayesian frameworks for the policy gradient. We use the Hessian matrix distribution as a learning rate schedule to improve the performance of the Bayesian policy gradient algorithm in terms of the variance and the number of samples. As in computing the policy gradient distributions, the Bayesian quadrature method is used to estimate the Hessian matrix distributions. We prove that the posterior mean of the Hessian distribution estimate is symmetric, one of the important properties of the Hessian matrix. Moreover, we prove that with an appropriate choice of kernel, the computational complexity of the Hessian distribution estimate is equal to that of the policy gradient distribution estimates. Using simulations, we show encouraging experimental results comparing the proposed algorithm to the Bayesian policy gradient and the Bayesian policy natural gradient algorithms described in Ghavamzadeh and Engel [10].
9.
Neural Computing and Applications - As two kinds of popular data mining methods, metric learning and SVM have an interesting and valuable internal relationship. The basic idea of metric learning is...
10.
In classification problems, different classes may contain very different numbers of samples. Sometimes the imbalance between the classes is very high, and the interest lies in classifying the samples belonging to the minority class. The support vector machine (SVM) is one of the widely used techniques for classification and has been applied to this problem using fuzzy-based approaches. In this paper, motivated by the work of Fan et al. (Knowledge-Based Systems 115: 87-99, 2017), we propose two efficient variants of the entropy-based fuzzy SVM (EFSVM). By considering a fuzzy membership value for each sample, we propose an entropy-based fuzzy least squares support vector machine (EFLSSVM-CIL) and an entropy-based fuzzy least squares twin support vector machine (EFLSTWSVM-CIL) for class-imbalanced datasets, where fuzzy membership values are assigned based on the entropy values of samples. Each solves a system of linear equations instead of the quadratic programming problem (QPP) in EFSVM. The least squares versions of the entropy-based SVM are faster than EFSVM and give higher generalization performance, which shows their applicability and efficiency. Experiments are performed on various real-world class-imbalanced datasets, and the results of the proposed methods are compared with the new fuzzy twin support vector machine for pattern classification (NFTWSVM), the entropy-based fuzzy support vector machine (EFSVM), the fuzzy twin support vector machine (FTWSVM) and the twin support vector machine (TWSVM), which clearly illustrates the superiority of the proposed EFLSTWSVM-CIL.
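The core computational point, that a least squares SVM trades the QPP for one linear system, can be sketched with a plain LSSVM (without the entropy-based fuzzy memberships that are this paper's contribution); the Suykens-style dual below is a standard formulation, and the data are synthetic:

```python
import numpy as np

def lssvm_train(X, y, gamma=1.0, width=1.0):
    """Least squares SVM: equality constraints turn the SVM QP into one
    linear system in (b, alpha), here with a Gaussian (RBF) kernel."""
    n = len(y)
    d2 = ((X[:, None] - X[None, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * width ** 2))
    Omega = (y[:, None] * y[None, :]) * K + np.eye(n) / gamma
    A = np.block([[np.zeros((1, 1)), y[None, :]],
                  [y[:, None], Omega]])
    sol = np.linalg.solve(A, np.r_[0.0, np.ones(n)])   # one solve, no QP
    return sol[0], sol[1:], X, y, width

def lssvm_predict(model, Xt):
    b, alpha, X, y, width = model
    d2 = ((Xt[:, None] - X[None, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * width ** 2))
    return np.sign(K @ (alpha * y) + b)

rng = np.random.default_rng(3)
X = np.r_[rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))]
y = np.r_[-np.ones(50), np.ones(50)]
model = lssvm_train(X, y, gamma=10.0, width=2.0)
acc = (lssvm_predict(model, X) == y).mean()
print(acc)
```

The fuzzy variants in the paper weight each sample's slack by a membership value, which only changes the diagonal term of the system; the one-solve structure is unchanged.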
11.
Jian Tang, Dianhui Wang, Tianyou Chai, Soft Computing - A Fusion of Foundations, Methodologies and Applications, 2012, 16(9):1585-1594
Online prediction of mill load is useful for control system design in the grinding process. It is a challenging problem to estimate the parameters of the load inside the ball mill using measurable signals. This paper aims to develop a computational intelligence approach for predicting the mill load. Extreme learning machines (ELMs) are employed as learner models to implement the map between frequency spectral features and the mill load parameters. The inputs of the ELM model are reduced features, which are extracted and selected from the vibration frequency spectrum of the mill shell using the partial least squares (PLS) algorithm. Experiments are carried out in the laboratory with comparisons against the well-known back-propagation learning algorithm, the original ELM and an optimization-based ELM (OELM). Results indicate that the reduced-feature-based OELM performs reasonably well at mill load parameter estimation and outperforms the other learner models in terms of generalization capability.
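A bare-bones ELM (without the PLS feature reduction or the OELM optimization) can be sketched on a synthetic regression task standing in for the spectral-feature-to-load map; all sizes and the target function are invented:

```python
import numpy as np

rng = np.random.default_rng(4)
# Synthetic stand-in for reduced spectral features and a load parameter
X = rng.uniform(-1, 1, (300, 5))
y = np.sin(2 * X[:, 0])                        # smooth target for illustration

L = 150                                        # random hidden nodes
W = rng.normal(size=(5, L))
b = rng.normal(size=L)
H = np.tanh(X @ W + b)                         # hidden layer is never trained
beta, *_ = np.linalg.lstsq(H, y, rcond=None)   # output weights by least squares

resid = y - H @ beta
ratio = float(resid @ resid / (y @ y))
print(ratio)
```

The defining trait of the ELM is visible here: only the linear output layer is fitted, so training reduces to a single least squares solve.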
12.
Ha Quang Minh, Information Processing Letters, 2011, 111(8):395-401
We provide sample complexity bounds for the problem of learning halfspaces with monotonic noise, using the regularized least squares algorithm in the reproducing kernel Hilbert space (RKHS) framework.
13.
K. Warwick, Y.-H. Kang, R. J. Mitchell, Soft Computing - A Fusion of Foundations, Methodologies and Applications, 1999, 3(4):200-205
The recursive least-squares algorithm with a forgetting factor has been extensively applied and studied for the on-line parameter estimation of linear dynamic systems. This paper explores the use of genetic algorithms to improve the performance of the recursive least-squares algorithm in the parameter estimation of time-varying systems. Simulation results show that the hybrid recursive algorithm (GARLS), combining recursive least-squares with genetic algorithms, can achieve better results than the standard recursive least-squares algorithm using only a forgetting factor.
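The standard recursive least-squares baseline with a forgetting factor (without the genetic-algorithm hybridization) can be sketched as follows; the system and noise level are invented for illustration:

```python
import numpy as np

def rls_step(theta, P, phi, y, lam=0.98):
    """One recursive least-squares update with forgetting factor lam."""
    k = P @ phi / (lam + phi @ P @ phi)        # gain vector
    theta = theta + k * (y - phi @ theta)      # prediction-error correction
    P = (P - np.outer(k, phi @ P)) / lam       # covariance update with forgetting
    return theta, P

rng = np.random.default_rng(5)
true_theta = np.array([2.0, -1.0])
theta = np.zeros(2)
P = 1e3 * np.eye(2)                            # large initial covariance
for _ in range(500):
    phi = rng.normal(size=2)                   # regressor
    y = phi @ true_theta + 0.01 * rng.normal() # noisy observation
    theta, P = rls_step(theta, P, phi, y)
print(theta)
```

A forgetting factor below 1 discounts old data so the estimator can track drifting parameters; the GA hybrid in the paper tunes this trade-off further.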
14.
Simplified neural networks for solving linear least squares and total least squares problems in real time
In this paper a new class of simplified low-cost analog artificial neural networks with on-chip adaptive learning algorithms is proposed for solving linear systems of algebraic equations in real time. The proposed learning algorithms for linear least squares (LS), total least squares (TLS) and data least squares (DLS) problems can be considered as modifications and extensions of well-known algorithms: the row-action projection (Kaczmarz) algorithm and/or the LMS (Adaline) Widrow-Hoff algorithm. The algorithms can be applied to any problem which can be formulated as a linear regression problem. The correctness and high performance of the proposed neural networks are illustrated by extensive computer simulation results.
15.
An extended least squares-based algorithm for feedforward networks is proposed. The weights connecting the last hidden and output layers are first evaluated by a least squares algorithm. The weights between the input and hidden layers are then evaluated using modified gradient descent algorithms. This arrangement eliminates the stalling problem experienced by pure least squares type algorithms while maintaining the characteristic of fast convergence. In the investigated problems, the total number of FLOPS required for the networks to converge using the proposed training algorithm is only 0.221%-16.0% of that using the Levenberg-Marquardt algorithm. The number of floating point operations per iteration of the proposed algorithm is only 1.517-3.521 times that of the standard backpropagation algorithm.
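The arrangement described above can be sketched roughly as follows, with the network size, step size, and toy target chosen arbitrarily for illustration rather than taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.uniform(-1, 1, (200, 1))
y = np.sin(3 * X[:, 0])

n_hidden, eta = 20, 0.05
W1 = 3.0 * rng.normal(size=(1, n_hidden))      # hidden-layer weights
b1 = rng.uniform(-3, 3, n_hidden)

for _ in range(200):
    H = np.c_[np.tanh(X @ W1 + b1), np.ones(len(X))]
    # Output weights in closed form by least squares at every epoch
    w2, *_ = np.linalg.lstsq(H, y, rcond=None)
    err = H @ w2 - y
    # Gradient descent step on the hidden layer only (tanh' = 1 - tanh^2)
    dpre = np.outer(err, w2[:-1]) * (1 - H[:, :-1] ** 2)
    W1 -= eta * X.T @ dpre / len(X)
    b1 -= eta * dpre.mean(0)

ratio = float(err @ err / (y @ y))
print(ratio)
```

Solving the output layer exactly at each epoch is what gives the hybrid its fast convergence; the gradient step only has to shape the hidden representations.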
16.
17.
To improve classification performance at low cost, it is necessary to exploit both labeled and unlabeled samples by applying semi-supervised learning methods, most of which are built upon pair-wise similarities between samples. While these similarities have so far been formulated in a heuristic manner, such as by k-NN, we propose methods to construct similarities from a probabilistic viewpoint. The kernel-based formulation of a transition probability is first proposed by comparing kernel least squares to variational least squares in the probabilistic framework. The formulation results in a simple quadratic program which flexibly introduces constraints to improve practical robustness and is efficiently solved by SMO. The kernel-based transition probability is by nature favorably sparse even without applying k-NN and induces a similarity measure with the same characteristics. Besides, to cope with multiple types of kernel functions, the multiple transition probabilities obtained from the respective kernels can be probabilistically integrated with prior probabilities represented by linear weights. We propose a computationally efficient method to optimize these weights in a discriminative manner. The optimized weights contribute to a composite similarity measure directly, as well as integrating the multiple kernels themselves as multiple kernel learning does, which consequently yields various types of multiple-kernel-based semi-supervised classification methods. In experiments on semi-supervised classification tasks, the proposed methods demonstrate favorable performance compared to other methods in terms of classification accuracy and computation time.
18.
Direct policy search is a promising reinforcement learning framework, in particular for controlling continuous, high-dimensional systems. Policy search often requires a large number of samples for obtaining a stable policy update estimator, and this is prohibitive when the sampling cost is expensive. In this letter, we extend an expectation-maximization-based policy search method so that previously collected samples can be efficiently reused. The usefulness of the proposed method, reward-weighted regression with sample reuse (R3), is demonstrated through robot learning experiments. (This letter is an extended version of our earlier conference paper: Hachiya, Peters, & Sugiyama, 2009.)
19.
Feiping Nie, Shiming Xiang, Yun Liu, Chenping Hou, Changshui Zhang, Pattern Recognition Letters, 2012, 33(5):485-491
In this paper, a new discriminant analysis for feature extraction is derived from the perspective of least squares regression. To obtain strong discriminative power between classes, all the data points in each class are expected to be regressed to a single vector, and the basic task is to find a transformation matrix such that the squared regression error is minimized. To this end, two least squares discriminant analysis methods are developed under the orthogonal or the uncorrelated constraint. We show that the orthogonal least squares discriminant analysis is an extension of the null space linear discriminant analysis, and the uncorrelated least squares discriminant analysis is exactly equivalent to traditional linear discriminant analysis. Comparative experiments show that the orthogonal one is preferable for real-world applications.
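The underlying idea, regressing all points of a class onto a single target vector, can be sketched with a ridge-regularized solve (the paper's orthogonal and uncorrelated constraints are omitted); the data and target vectors are invented:

```python
import numpy as np

rng = np.random.default_rng(8)
# Two Gaussian classes in 4-D; each class is regressed onto its own target vector
X = np.r_[rng.normal(-1, 0.3, (60, 4)), rng.normal(1, 0.3, (60, 4))]
T = np.r_[np.tile([1.0, 0.0], (60, 1)), np.tile([0.0, 1.0], (60, 1))]

Xb = np.c_[X, np.ones(len(X))]          # append a bias column
lam = 1e-3                              # small ridge term for stability
W = np.linalg.solve(Xb.T @ Xb + lam * np.eye(5), Xb.T @ T)
Z = Xb @ W                              # extracted discriminant features

# Class means should be well separated relative to the within-class spread
m0, m1 = Z[:60].mean(0), Z[60:].mean(0)
spread = max(Z[:60].std(0).max(), Z[60:].std(0).max())
print(np.linalg.norm(m0 - m1), spread)
```

Minimizing the squared regression error to per-class targets collapses each class around its target vector, which is exactly the discriminative effect the paper formalizes.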
20.
Regularized least squares support vector regression for the simultaneous learning of a function and its derivatives
Jayadeva, Information Sciences, 2008, 178(17):3402-3414
In this paper, we propose a support vector machine based on a regularized least squares approach for simultaneously approximating a function and its derivatives. The proposed algorithm is simple and fast, as no quadratic programming solver needs to be employed; effectively, only the solution of a structured system of linear equations is needed.