Similar literature
 Found 20 similar documents; search time 15 ms
1.
This paper proposes a new conjugate gradient method with smoothing \(L_{1/2}\) regularization, based on a modified secant equation, for training neural networks. A descent search direction is generated by selecting an adaptive learning rate based on the strong Wolfe conditions. Two adaptive parameters are introduced so that the new training method possesses both the quasi-Newton property and the sufficient descent property. In numerical experiments on five benchmark classification problems from the UCI repository, the new training algorithm shows roughly the same or even better learning capacity than the other conjugate gradient training algorithms, but significantly better generalization capacity and network sparsity. Under mild assumptions, a global convergence result for the proposed training method is also proved.

2.
Zou Difan, Cao Yuan, Zhou Dongruo, Gu Quanquan 《Machine Learning》2020, 109(3): 467-492

We study the problem of training deep fully connected neural networks with the Rectified Linear Unit (ReLU) activation function and cross entropy loss function for binary classification using gradient descent. We show that with proper random weight initialization, gradient descent can find the global minima of the training loss for an over-parameterized deep ReLU network, under certain assumptions on the training data. The key idea of our proof is that Gaussian random initialization followed by gradient descent produces a sequence of iterates that stay inside a small perturbation region centered at the initial weights, in which the training loss function of the deep ReLU network enjoys nice local curvature properties that ensure the global convergence of gradient descent. At the core of our proof technique are (1) a milder assumption on the training data; (2) a sharp analysis of the trajectory length for gradient descent; and (3) a finer characterization of the size of the perturbation region. Compared with the concurrent work (Allen-Zhu et al. in A convergence theory for deep learning via over-parameterization, 2018a; Du et al. in Gradient descent finds global minima of deep neural networks, 2018a) along this line, our result relies on a milder over-parameterization condition on the neural network width, and enjoys a faster global convergence rate of gradient descent for training deep neural networks.


3.
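The classification accuracy of the learning with local and global consistency (LLGC) algorithm depends heavily on a suitable setting of its control parameters. To address this, a concise variant with fewer parameters, BB-LLGC, is proposed. The objective function on the graph is simplified so that it is no longer affected by the parameter α. In addition, during label propagation, only the labels of unlabeled samples are passed to their neighbors according to similarity, while the labels of labeled samples are forcibly reset to ensure that the sources of label propagation stay accurate. Experimental results on UCI datasets show that, compared with LLGC, BB-LLGC not only has fewer control parameters and is simpler to use, but also achieves higher classification accuracy and faster convergence.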
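The propagation scheme described in this abstract can be illustrated with a generic clamped label propagation loop. This is only a sketch under stated assumptions, not the BB-LLGC algorithm itself: the function name `propagate_labels`, the similarity-weighted averaging rule, and the stopping tolerance are all hypothetical choices made for illustration.

```python
def propagate_labels(weights, labels, num_classes, max_iters=100, tol=1e-9):
    """Propagate class scores over a graph, clamping labeled nodes each round.

    weights: n x n list of lists of non-negative similarities (assumed symmetric).
    labels:  list of length n; a class index for labeled nodes, None otherwise.
    Returns the predicted class index for every node.
    """
    n = len(weights)
    # One-hot scores for labeled nodes, all zeros for unlabeled nodes.
    scores = [[0.0] * num_classes for _ in range(n)]
    for i, y in enumerate(labels):
        if y is not None:
            scores[i][y] = 1.0

    for _ in range(max_iters):
        new_scores = []
        for i in range(n):
            total = sum(weights[i][j] for j in range(n) if j != i)
            if labels[i] is not None or total == 0.0:
                # Clamp: labeled (or isolated) nodes keep their current scores.
                new_scores.append(scores[i][:])
                continue
            # Unlabeled node: similarity-weighted average of neighbor scores.
            new_scores.append([
                sum(weights[i][j] * scores[j][c] for j in range(n) if j != i) / total
                for c in range(num_classes)
            ])
        change = max(
            abs(a - b)
            for new_row, old_row in zip(new_scores, scores)
            for a, b in zip(new_row, old_row)
        )
        scores = new_scores
        if change < tol:
            break
    return [max(range(num_classes), key=lambda c: scores[i][c]) for i in range(n)]


# Chain graph 0 - 1 - 2 - 3 with the two end nodes labeled.
chain = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
print(propagate_labels(chain, [0, None, None, 1], num_classes=2))
```

On the chain example each unlabeled node ends up with the class of the nearer labeled end, because the labeled nodes are never overwritten.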

4.
The conjugate gradient method is a root-finding algorithm for non-linear equations. In this paper, we suggest extending this method for polynomials to the complex plane. Using methods from experimental and theoretical mathematics, we draw the following conclusions: (1) the conjugate gradient method is a dynamical system with two complex parameters; (2) local conditions for convergence to any root of a complex function are given; (3) the conjugate gradient method may fail to converge to all roots for a cubic with three simple roots; (4) the boundaries of the conjugate gradient basins are fractals in some cases, and depend on the parameters; (5) the algorithm is then improved by introducing a method to determine the optimal parameters.

5.
The training time of an ANN depends on the size of the ANN (i.e. the number of hidden layers and the number of neurons in each layer), the size of the training data, their normalization range and the type of mapping of training patterns (like X–Y, X–Y, X–Y and X–Y), the error functions and the learning algorithms. Efforts have been made in the past to reduce the training time of ANNs by selecting an optimal network and modifying the learning algorithms. In this paper, an attempt has been made to develop a new neuron model using a neuro-fuzzy approach, which overcomes the problems of ANNs by incorporating the features of fuzzy systems at the neuron level. This synergetic approach is used to fuzzify the neuron structure, which incorporates the features of the simple neuron as well as the high-order neuron.

6.
In this paper, we propose an implicit gradient descent algorithm for the classic k-means problem. The implicit gradient step, or backward Euler step, is solved via stochastic fixed-point iteration, in which we randomly sample a mini-batch gradient in every iteration. It is the average of the fixed-point trajectory that is carried over to the next gradient step. We draw connections between the proposed stochastic backward Euler and the recent entropy stochastic gradient descent for improving the training of deep neural networks. Numerical experiments on various synthetic and real datasets show that the proposed algorithm provides better clustering results than k-means algorithms, in the sense that it decreases the objective function (the cluster) and is much more robust to initialization.
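The backward Euler step with an averaged fixed-point trajectory, as described in this abstract, might look roughly as follows for one-dimensional data. This is a hedged sketch rather than the authors' implementation: the objective scaling, the function names, the step size `gamma`, and the iteration counts are assumptions, and the example run uses the full gradient (`batch_size=None`) to stay deterministic.

```python
import random


def kmeans_grad(centroids, points):
    """Gradient of phi(c) = 1/(2m) * sum_i min_j (c_j - p_i)^2 for 1-D points."""
    grad = [0.0] * len(centroids)
    for p in points:
        # Each point only contributes to the gradient of its nearest centroid.
        j = min(range(len(centroids)), key=lambda k: (centroids[k] - p) ** 2)
        grad[j] += (centroids[j] - p) / len(points)
    return grad


def backward_euler_step(x, points, gamma, inner_iters=20, batch_size=None, rng=None):
    """Approximate y = x - gamma * grad(y) by fixed-point iteration.

    Returns the average of the fixed-point trajectory, which becomes the next iterate.
    """
    rng = random if rng is None else rng
    y = list(x)
    running_sum = [0.0] * len(x)
    for _ in range(inner_iters):
        # Optionally sample a mini-batch; None means use every point.
        batch = points if batch_size is None else rng.sample(points, batch_size)
        g = kmeans_grad(y, batch)
        y = [xj - gamma * gj for xj, gj in zip(x, g)]
        running_sum = [s + yj for s, yj in zip(running_sum, y)]
    return [s / inner_iters for s in running_sum]


def implicit_kmeans(points, init, gamma=1.0, outer_iters=30, **kwargs):
    """Run several implicit (backward Euler) gradient steps from `init`."""
    x = list(init)
    for _ in range(outer_iters):
        x = backward_euler_step(x, points, gamma, **kwargs)
    return x


data = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
print(implicit_kmeans(data, init=[1.0, 9.0]))
```

Passing `batch_size=3` together with a seeded `random.Random` instance as `rng` switches the inner loop to sampled mini-batch gradients, which is closer in spirit to the stochastic variant the abstract describes.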

7.
Motivated by recent applications of wireless sensor networks in monitoring infrastructure networks, we address the problem of optimal coverage of infrastructure networks using sensors whose sensing performance decays with distance. We show that this problem can be formulated as a continuous \(p\)-median problem on networks. The literature has extensively addressed the discrete \(p\)-median problem on networks and in continuum domains, and the continuous \(p\)-median problem in continuum domains. However, in-depth analysis of the continuous \(p\)-median problem on networks has been lacking. With the sensing performance model that decays with distance, each sensor covers a region equivalent to its Voronoi partition on the network in terms of the shortest path distance metric. Using Voronoi partitions, we define a directional partial derivative of the coverage metric with respect to a sensor's location. We then propose a gradient descent algorithm to obtain a locally optimal solution with guaranteed convergence. The quality of an optimal solution depends on the choice of the initial configuration of sensors. We obtain an initial configuration using two approaches: by solving the discrete \(p\)-median problem on a lumped network and by random sampling. We consider two methods of random sampling: uniform sampling and \(D^2\)-sampling. The first approach, with the initial solution of the discrete \(p\)-median problem, leads to the best coverage performance for large networks, but at the cost of high running time. We also observe that gradient descent on the initial solution from the \(D^2\)-sampling method yields a solution that is within at most 7% of the previous solution, with much shorter running time.

8.
9.
We consider a distributed optimal control problem governed by an elliptic convection diffusion PDE, and propose a hybridizable discontinuous Galerkin method to approximate the solution. We use polynomials of degree \(k+1\) to approximate the state and dual state, and polynomials of degree \(k \ge 0\) to approximate their fluxes. Moreover, we use polynomials of degree \(k\) to approximate the numerical traces of the state and dual state on the faces, which are the only globally coupled unknowns. We prove optimal a priori error estimates for all variables when \(k \ge 0\). Furthermore, from the point of view of the number of degrees of freedom of the globally coupled unknowns, this method achieves superconvergence for the state, dual state, and control when \(k \ge 1\). We illustrate our convergence results with numerical experiments.

10.
Most optimization methods for logistic regression or maximum entropy solve the primal problem. They range from iterative scaling and coordinate descent to quasi-Newton and truncated Newton methods. Less effort has been made to solve the dual problem. In contrast, for linear support vector machines (SVM), methods have been shown to be very effective for solving the dual problem. In this paper, we apply coordinate descent methods to solve the dual form of logistic regression and maximum entropy. Interestingly, many details are different from the situation in linear SVM. We carefully study the theoretical convergence as well as numerical issues. The proposed method is shown to be faster than most state-of-the-art methods for training logistic regression and maximum entropy.

11.
Recently there has been significant research in multiple-instance learning, yet most of this work has only considered this model when there are Boolean labels. However, in many of the application areas for which the multiple-instance model fits, real-valued labels are more appropriate than Boolean labels. We define and study a real-valued multiple-instance model in which each multiple-instance example (bag) is given a real-valued label in [0, 1] that indicates the degree to which the bag satisfies the target concept. To provide additional structure to the learning problem, we associate a real-valued label with each point in the bag. These values are then combined using a real-valued aggregation operator to obtain the label for the bag. We then present on-line agnostic algorithms for learning real-valued multiple-instance geometric concepts defined by axis-aligned boxes in constant-dimensional space and describe several possible applications of these algorithms. We obtain our learning algorithms by reducing the problem to one in which the exponentiated gradient or gradient descent algorithm can be used. We also give a novel application of the virtual weights technique. In typical applications of the virtual weights technique, all of the concepts in a group have the same weight and prediction, allowing a single representative concept from each group to be tracked. However, in our application there is an exponential number of different weights and possible predictions. Hence, boxes in each group have different weights and predictions, making the computation of the contribution of a group significantly more involved. Nevertheless, we are able to both keep the number of groups polynomial in the number of trials and efficiently compute the overall prediction.
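For readers unfamiliar with the two update rules this abstract reduces to, the following sketch contrasts a plain gradient descent update with a normalized exponentiated gradient update on a weight vector. The function names, the learning rate `lr`, and the normalization onto the probability simplex are illustrative assumptions, not details taken from the paper.

```python
import math


def gradient_descent_update(weights, grad, lr):
    """Additive update: move each weight against its gradient component."""
    return [w - lr * g for w, g in zip(weights, grad)]


def exponentiated_gradient_update(weights, grad, lr):
    """Multiplicative update: scale each weight by exp(-lr * gradient), then renormalize.

    Assumes the weights are positive and should sum to 1.
    """
    unnormalized = [w * math.exp(-lr * g) for w, g in zip(weights, grad)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


w = [0.5, 0.5]
g = [1.0, 0.0]
print(gradient_descent_update(w, g, lr=0.1))
print(exponentiated_gradient_update(w, g, lr=math.log(2.0)))
```

The multiplicative form keeps every weight positive, which is one reason it suits settings where the weights act as a distribution over many candidate concepts.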

12.
13.
We introduce and analyze an augmented mixed finite element method for the Navier–Stokes–Brinkman problem with nonsolenoidal velocity. We employ a technique previously applied to the stationary Navier–Stokes equation, which consists of the introduction of a modified pseudostress tensor relating the gradient of the velocity and the pressure with the convective term, and propose an augmented pseudostress–velocity formulation for the model problem. The resulting augmented scheme is then written equivalently as a fixed point equation, so that the well-known Banach fixed point theorem, combined with the Lax–Milgram lemma, is applied to prove the unique solvability of the continuous and discrete systems. We point out that no discrete inf–sup conditions are required for the solvability analysis, and hence, in particular for the Galerkin scheme, arbitrary finite element subspaces of the respective continuous spaces can be utilized. For instance, given an integer \(k \ge 0\), the Raviart–Thomas spaces of order \(k\) and continuous piecewise polynomials of degree \(k+1\) constitute feasible choices of discrete spaces for the pseudostress and the velocity, respectively, yielding optimal convergence. We also emphasize that, since the Dirichlet boundary condition becomes a natural condition, the analysis for both the continuous and discrete problems can be derived without introducing any lifting of the velocity boundary datum. In addition, we derive a reliable and efficient residual-based a posteriori error estimator for the augmented mixed method. The proof of reliability makes use of a global inf–sup condition, a Helmholtz decomposition, and local approximation properties of the Clément interpolant and Raviart–Thomas operator. On the other hand, inverse inequalities, the localization technique based on element-bubble and edge-bubble functions, approximation properties of the \(L^2\)-orthogonal projector, and known results from previous works are the main tools for proving the efficiency of the estimator. Finally, some numerical results are reported that illustrate the performance of the augmented mixed method, confirm the theoretical rate of convergence and properties of the estimator, and show the behavior of the associated adaptive algorithms.

14.
15.
A convergent iterative regularization procedure based on the square of a dual norm is introduced for image restoration models with general (quadratic or non-quadratic) convex fidelity terms. Iterative regularization methods have been previously employed for image deblurring or denoising in the presence of Gaussian noise, which use \(L^2\) (Tadmor et al. in Multiscale Model. Simul. 2:554–579, 2004; Osher et al. in Multiscale Model. Simul. 4:460–489, 2005; Tadmor et al. in Commun. Math. Sci. 6(2):281–307, 2008), and \(L^1\) (He et al. in J. Math. Imaging Vis. 26:167–184, 2005) data fidelity terms, with rigorous convergence results. Recently, Iusem and Resmerita (Set-Valued Var. Anal. 18(1):109–120, 2010) proposed a proximal point method using inexact Bregman distance for minimizing a convex function defined on a non-reflexive Banach space (e.g. BV(Ω)), which is the dual of a separable Banach space. Based on this method, we investigate several approaches for image restoration such as image deblurring in the presence of noise or image deblurring via (cartoon+texture) decomposition. We show that the resulting proximal point algorithms approximate stably a true image. For image denoising-deblurring we consider Gaussian, Laplace, and Poisson noise models with the corresponding convex fidelity terms as in the Bayesian approach. We test the behavior of proposed algorithms on synthetic and real images in several numerical experiments and compare the results with other state-of-the-art iterative procedures based on the total variation penalization as well as the corresponding existing one-step gradient descent implementations. The numerical experiments indicate that the iterative procedure yields high quality reconstructions and superior results to those obtained by one-step standard gradient descent, with faster computational time.

16.
This article develops distributed optimal control policies via Q-learning for multi-agent systems (MASs) by solving dual games. According to game theory, first, the distributed consensus problem is formulated as a multi-player non-zero-sum game, where each agent is viewed as a player focusing only on its local performance and the whole MAS achieves Nash equilibrium. Second, for each agent, the anti-disturbance problem is formulated as a two-player zero-sum game, in which the control input and external disturbance are a pair of opponents. Specifically, (1) an offline, data-driven, off-policy distributed tracking algorithm based on momentum policy gradient (MPG) is developed, which can effectively achieve consensus of MASs with guaranteed -bounded synchronization error. (2) An actor-critic-disturbance neural network is employed to implement the MPG algorithm and obtain optimal policies. Finally, numerical and practical simulations are conducted to verify the effectiveness of the tracking policies developed via the MPG algorithm.

17.
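A structure-adaptive neural network and its training method   Total citations: 2, self-citations: 1, citations by others: 1
Song Yanpo, Peng Xiaoqi 《控制与决策》 (Control and Decision) 2010, 25(8): 1265-1268
Because the modeling performance of neural networks is overly sensitive to the network structure and the training method, a structure-adaptive neural network model and its training method are proposed. The model has a dual-network structure and is trained with early stopping, which reduces, to some extent, the sensitivity of modeling performance to the network structure. The model structure is adjusted adaptively according to information such as the noise variance of the modeling data and the current error of the model, which further improves modeling performance while keeping time efficiency high. Simulation results show that the method remedies some of the shortcomings of traditional methods such as early stopping and performs well.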

18.
For the sake of simplicity it is often desirable to restrict the number of feedbacks in a controller. In this case the optimal feedbacks depend on the disturbance to which the system is subjected. Using a quadratic error integral as a measure of the response, three criteria of optimization are considered:
  1. The response to a given initial disturbance.

  2. The worst response to an initial disturbance of given magnitude.

  3. The worst comparison with the unconstrained optimal system.

It is shown that for each of these criteria the gradient with respect to the feedbacks can be calculated by a uniform method. The solution may then be found either directly or by a descent procedure. The method is illustrated by an example.

19.
One of the main concerns in geotechnical engineering is slope stability prediction during earthquakes. In this study, two intelligent systems, namely artificial neural network (ANN) and particle swarm optimization (PSO)–ANN models, were developed to predict the factor of safety (FOS) of homogeneous slopes. The Geostudio program, based on the limit equilibrium method, was utilized to obtain 699 FOS values under different conditions. The most influential factors on FOS, such as slope height, gradient, cohesion, friction angle and peak ground acceleration, were considered as model inputs in the present study. A series of sensitivity analyses were performed in the modeling procedures of both intelligent systems. All 699 data samples were randomly divided into 5 different training and testing datasets. Considering some model performance indices, i.e., root mean square error, coefficient of determination (\(R^2\)) and variance account for (VAF), and using a simple ranking method, the best ANN and PSO–ANN models were selected. It was found that the PSO–ANN technique can predict FOS with higher performance capacities than ANN. \(R^2\) values on the testing datasets of 0.915 and 0.986 for the ANN and PSO–ANN techniques, respectively, suggest the superiority of the PSO–ANN technique.
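The three performance indices named in this abstract can be computed as in the sketch below. The formulas are the commonly used definitions and are an assumption here, since the abstract does not state them: \(R^2\) as one minus the ratio of residual to total sum of squares, and VAF as one minus the ratio of residual variance to target variance, expressed in percent. The function names and the sample values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def vaf(y_true, y_pred):
    """Variance accounted for, in percent: (1 - var(residuals) / var(y_true)) * 100."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return (1.0 - variance(residuals) / variance(y_true)) * 100.0


# Made-up values: every prediction is too high by exactly 0.5.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
print(rmse(measured, predicted), r_squared(measured, predicted), vaf(measured, predicted))
```

In the made-up example a constant offset lowers \(R^2\) and raises RMSE while VAF stays at 100, which illustrates why several indices are usually ranked together rather than relying on one.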

20.
The latest-generation earth observation instruments on airborne and satellite platforms are currently producing an almost continuous high-dimensional data stream. This exponentially growing data poses a new challenge for real-time image processing and recognition. Making full and effective use of the spectral information and spatial structure information of high-resolution remote sensing images is the key to the processing and recognition of high-resolution remote sensing data. In this paper, the adaptive multipoint moment estimation (AMME) stochastic optimization algorithm is proposed for the first time by using the finite lower-order moments and adding the estimating points. This algorithm not only reduces the probability of falling into a local optimum in the learning process, but also improves the convergence rate of the convolutional neural network (LeCun et al. in Advances in neural information processing systems, 1990). Second, since remote sensing images are characterized by complex backgrounds and small sensitive targets, we propose a feature extraction method named weighted pooling, which automatically discovers and locates small targets and gives them high weights, to further improve the performance of real-time image recognition. We combine the AMME and weighted pooling with the spatial pyramid representation (Harada et al. in Comput Vis Pattern Recognit 1617–1624, 2011) algorithm to form a new, multiscale, and multilevel real-time image recognition model and name it weighted spatial pyramid networks (WspNet). Finally, we use the MNIST, ImageNet, and remote sensing natural disaster data sets to test WspNet. Compared with other real-time image recognition models, WspNet achieves a new state of the art in terms of convergence rate and image feature extraction relative to conventional stochastic gradient descent methods [like AdaGrad, AdaDelta and Adam (Zeiler in Comput Sci, 2012; Kingma and Ba in Comput Sci, 2014; Duchi et al. in J Mach Learn Res 12(7):2121–2159, 2011)] and pooling methods [like max-pooling, avg-pooling and stochastic-pooling (Zeiler and Fergus in stochastic-pooling for regularization of deep convolutional neural networks, 2013)].
