Similar Documents
1.
This paper derives a family of differential learning rules that optimize the Shannon entropy at the output of an adaptive system via kernel density estimation. In contrast to parametric formulations of entropy, this nonparametric approach assumes no particular functional form of the output density. We address problems associated with quantized data and finite sample size, and implement efficient maximum likelihood techniques for optimizing the regularizer. We also develop a normalized entropy estimate that is invariant with respect to affine transformations, facilitating optimization of the shape, rather than the scale, of the output density. Kernel density estimates are smooth and differentiable; this makes the derived entropy estimates amenable to manipulation by gradient descent. The resulting weight updates are surprisingly simple and efficient learning rules that operate on pairs of input samples. They can be tuned for data-limited or memory-limited situations, or modified to give a fully online implementation.
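A minimal NumPy sketch of the pairwise, kernel-density-based entropy gradient this abstract describes; the Gaussian kernel, the bandwidth sigma, the linear single-output system and all variable names are illustrative assumptions rather than the paper's implementation:

```python
import numpy as np

def entropy_gradient(w, X, sigma=0.25):
    """Parzen-window estimate of the output entropy H(y), y = X @ w, and its
    gradient with respect to w, accumulated over all pairs of samples."""
    y = X @ w                                   # scalar outputs, shape (N,)
    d = y[:, None] - y[None, :]                 # pairwise differences y_i - y_j
    K = np.exp(-d**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    p = K.mean(axis=1)                          # density estimate at each y_i
    H = -np.mean(np.log(p))                     # resubstitution entropy estimate
    dX = X[:, None, :] - X[None, :, :]          # pairwise input differences x_i - x_j
    # d p_i / d w = mean_j K(d_ij) * (-d_ij / sigma^2) * (x_i - x_j)
    dp = np.mean(K[..., None] * (-d[..., None] / sigma**2) * dX, axis=1)
    grad = -np.mean(dp / p[:, None], axis=0)    # d H / d w
    return H, grad

# Toy usage: a few entropy-descent steps on a linear projection.  Note that the
# unnormalized estimate can be reduced simply by shrinking the output scale;
# the paper's affine-invariant normalized estimate removes that trivial solution.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w = rng.normal(size=3)
for _ in range(100):
    H, g = entropy_gradient(w, X)
    w -= 0.05 * g
print("final entropy estimate:", H)
```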

2.
For gradient descent learning to yield connectivity consistent with real biological networks, the simulated neurons would have to include more realistic intrinsic properties such as frequency adaptation. However, gradient descent learning cannot be used straightforwardly with adapting rate-model neurons because the derivative of the activation function depends on the activation history. The objectives of this study were to (1) develop a simple computational approach to reproduce mathematical gradient descent and (2) use this computational approach to provide supervised learning in a network formed of rate-model neurons that exhibit frequency adaptation. The results of mathematical gradient descent were used as a reference in evaluating the performance of the computational approach. For this comparison, standard (nonadapting) rate-model neurons were used for both approaches. The only difference was the gradient calculation: the mathematical approach used the derivative at a point in weight space, while the computational approach used the slope for a step change in weight space. Theoretically, the results of the computational approach should match those of the mathematical approach as the step size is reduced, but floating-point accuracy set a lower limit on usable step sizes. A systematic search for an optimal step size yielded a computational approach that faithfully reproduced the results of mathematical gradient descent. The computational approach was then used for supervised learning of both connection weights and intrinsic properties of rate-model neurons to convert a tonic input into a phasic-tonic output pattern. Learning produced biologically realistic connectivity that essentially used a monosynaptic connection from the tonic input neuron to an output neuron with strong frequency adaptation, as compared to a complex network when using nonadapting neurons. Thus, more biologically realistic connectivity was achieved by implementing rate-model neurons with more realistic intrinsic properties. Our computational approach could be applied to learning of other neuron properties.
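A small numerical illustration of the step-size issue discussed above: the slope for a step change in weight space approaches the mathematical derivative as the step shrinks, until floating-point cancellation sets a lower limit (the toy loss function is an assumption):

```python
import numpy as np

# Toy loss with a known analytic derivative, standing in for the network error.
loss  = lambda w: np.sin(w) + 0.5 * w**2
dloss = lambda w: np.cos(w) + w          # mathematical gradient (reference)

w0 = 0.7
exact = dloss(w0)
for h in [1e-1, 1e-3, 1e-5, 1e-7, 1e-9, 1e-11, 1e-13]:
    slope = (loss(w0 + h) - loss(w0)) / h    # "computational" gradient: slope for a step h
    print(f"h={h:.0e}  slope={slope:.10f}  |error|={abs(slope - exact):.2e}")
# The error first shrinks with h (truncation error) and then grows again once
# floating-point cancellation dominates, so an intermediate step size
# reproduces the mathematical gradient most faithfully.
```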

3.
For the system identification problem in which both the input and output observations are corrupted by noise, a robust total least squares (TLS) adaptive identification algorithm is proposed. Building on a study of the total least squares problem and of the Rayleigh quotient of a vector and its properties, the algorithm takes the Rayleigh quotient (RQ) of the augmented weight vector of the identified system as the loss function and derives an adaptive iterative update of the weight vector from the steepest gradient descent principle. The gradient is further corrected through an analysis of the weight-vector norm based on the stochastic discrete learning law, which improves noise robustness and yields a noise-robust TLS adaptive identification algorithm. The convergence of the algorithm is analyzed. Simulation results show that its noise robustness and steady-state convergence accuracy are clearly better than those of comparable methods; moreover, it can use a larger learning factor and still converges well in high-noise environments.
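A hedged sketch of the core idea: minimizing the Rayleigh quotient of the augmented correlation matrix by steepest descent yields a total-least-squares estimate. The simulated errors-in-variables system, step size and iteration count are assumptions, and the paper's noise-robust gradient correction is omitted:

```python
import numpy as np

rng = np.random.default_rng(1)
# Errors-in-variables linear system: both inputs and outputs are observed with noise.
w_true = np.array([0.8, -0.5, 0.3])
X_clean = rng.normal(size=(2000, 3))
X = X_clean + 0.1 * rng.normal(size=X_clean.shape)      # noisy inputs
y = X_clean @ w_true + 0.1 * rng.normal(size=len(X))    # noisy outputs

Z = np.column_stack([X, y])                 # augmented data [x, y]
C = Z.T @ Z / len(Z)                        # augmented correlation matrix

def rq(v):                                  # Rayleigh quotient used as the loss
    return (v @ C @ v) / (v @ v)

v = rng.normal(size=4)                      # augmented weight vector
mu = 0.1
for _ in range(500):
    grad = 2 * (C @ v - rq(v) * v) / (v @ v)   # steepest-descent direction of the RQ
    v -= mu * grad
    v /= np.linalg.norm(v)                     # keep the augmented vector normalized

w_tls = -v[:3] / v[3]                       # recover parameters from the augmented vector
print("TLS estimate:", w_tls, " true:", w_true)
```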

4.
Learning long-term dependencies with gradient descent is difficult
Recurrent neural networks can be used to map input sequences to output sequences, such as for recognition, production or prediction problems. However, practical difficulties have been reported in training recurrent neural networks to perform tasks in which the temporal contingencies present in the input/output sequences span long intervals. We show why gradient based learning algorithms face an increasingly difficult problem as the duration of the dependencies to be captured increases. These results expose a trade-off between efficient learning by gradient descent and latching on information for long periods. Based on an understanding of this problem, alternatives to standard gradient descent are considered.
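A small numerical illustration (not from the paper) of the underlying difficulty: with a linear recurrent map whose spectral radius is below one, the backpropagated gradient contribution from events T steps in the past decays exponentially with T:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
W = rng.normal(size=(n, n))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))     # recurrent weights, spectral radius 0.9

# Backpropagation through time multiplies the error signal by the (transposed)
# recurrent Jacobian once per time step.
g = rng.normal(size=n)
for T in [1, 10, 50, 100, 200]:
    gT = g.copy()
    for _ in range(T):
        gT = W.T @ gT
    print(f"T={T:4d}   |gradient contribution| = {np.linalg.norm(gT):.3e}")
# A spectral radius below one is what lets the network latch information
# reliably, yet it is exactly what makes the gradient from distant time steps
# vanish exponentially.
```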

5.
伦淑娴  胡海峰 《自动化学报》2017,43(7):1160-1168
To improve the performance of the leaky integrator echo state network (Leaky-ESN), this paper proposes optimizing its global parameters, such as the leaking rate, the spectral radius of the internal connection weight matrix and the input scaling factor, with a penalty-function interior-point method, which avoids the loss of performance caused by choosing parameter values by trial and error. The global parameters of a Leaky-ESN must guarantee that the network satisfies the echo state property, so they are subject to inequality constraints. Earlier work used stochastic gradient descent to optimize the spectral radius, input scaling factor and leaking rate, which improved the approximation accuracy of the Leaky-ESN to some extent. However, stochastic gradient descent is a basic algorithm for unconstrained optimization; when it is used to optimize the parameters, the constraint that they must satisfy the echo state property (an inequality constraint) is not taken into account, so the resulting parameter values are not optimal. The penalty-function interior-point method can solve optimization problems with inequality constraints, is widely applicable, converges quickly and has strong global search ability. This paper therefore optimizes the global parameters of the Leaky-ESN with the penalty-function interior-point method and, taking time series prediction as an example, evaluates the prediction performance of the optimized Leaky-ESN; simulation results demonstrate the effectiveness of the proposed method.
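A minimal sketch of a log-barrier interior-point iteration of the kind described above; the placeholder validation-error function, the simplified box constraints standing in for the echo state conditions, and the step sizes are all assumptions, not the paper's formulation:

```python
import numpy as np

def validation_error(rho, a, s):
    # Placeholder (hypothetical) for the Leaky-ESN validation error as a function
    # of spectral radius rho, leaking rate a and input scaling s; in practice this
    # would train and evaluate the reservoir.
    return (rho - 0.7)**2 + (a - 0.3)**2 + 0.5 * (s - 1.2)**2

def barrier_objective(p, mu):
    rho, a, s = p
    # Simplified illustrative inequality constraints (stand-ins for the echo
    # state conditions): 0 < rho < 1, 0 < a < 1, s > 0.
    g = np.array([rho, 1 - rho, a, 1 - a, s])
    if np.any(g <= 0):
        return np.inf
    return validation_error(rho, a, s) - mu * np.sum(np.log(g))

def num_grad(f, p, eps=1e-6):
    g = np.zeros_like(p)
    for i in range(len(p)):
        e = np.zeros_like(p); e[i] = eps
        g[i] = (f(p + e) - f(p - e)) / (2 * eps)
    return g

p = np.array([0.5, 0.5, 0.5])               # strictly feasible starting point
for mu in [1.0, 0.1, 0.01, 0.001]:          # interior-point path: shrink the barrier weight
    for _ in range(500):
        p -= 0.01 * num_grad(lambda q: barrier_objective(q, mu), p)
print("optimized (rho, a, s):", p)
```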

6.
A quantum BP network model and an improved learning algorithm are proposed. Based on the universality of the single-qubit phase-shift gate and the two-qubit controlled-NOT gate in quantum computing, the model first constructs a quantum neuron; the hidden layer is then built from these quantum neurons and trained by gradient descent. The output layer is built from conventional neurons and trained with an improved gradient descent method with momentum and an adaptive learning rate. Experiments with this model and algorithm on two UCI data sets show better convergence speed and accuracy than a conventional BP network.

7.
In deep learning tasks, stochastic variance reduced gradient (SVRG) methods reduce the variance of the stochastic gradient and therefore offer good stability and high computational efficiency. However, these methods use a constant learning rate throughout training, which limits their efficiency. Building on SVRG, this paper borrows the idea of momentum acceleration, applies a weighted-average strategy to the gradient estimate, and adjusts the learning rate automatically from historical gradient information, yielding an adaptive stochastic variance reduced gradient method. Its effectiveness is verified on the MNIST and CIFAR-10 data sets; the experimental results show that the adaptive method outperforms SVRG and stochastic gradient descent in convergence speed and stability.
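A minimal sketch of the underlying SVRG iteration on a least-squares problem; the paper's adaptive variant, which additionally averages the gradient estimate with momentum and adapts the step size from gradient history, is omitted here, and all problem sizes are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 10
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def grad_i(w, i):                    # per-sample gradient of 0.5*(a_i.w - b_i)^2
    return (A[i] @ w - b[i]) * A[i]

def full_grad(w):
    return A.T @ (A @ w - b) / n

w = np.zeros(d)
eta, epochs, m = 0.01, 30, n
for _ in range(epochs):
    w_snap = w.copy()
    mu = full_grad(w_snap)                        # full gradient at the snapshot
    for _ in range(m):
        i = rng.integers(n)
        g = grad_i(w, i) - grad_i(w_snap, i) + mu  # variance-reduced stochastic gradient
        w -= eta * g
print("residual:", np.linalg.norm(A @ w - b) / np.sqrt(n))
```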

8.
Research advances in stochastic gradient descent algorithms
In machine learning, gradient descent is the most important and fundamental method for solving optimization problems. As data sets keep growing, the traditional gradient descent algorithm can no longer solve large-scale machine learning problems effectively. Stochastic gradient descent (SGD) replaces the full gradient with the gradient of one or a few randomly selected samples at each iteration, reducing the computational cost. In recent years SGD has become a focus of machine learning research, and of deep learning in particular; continued exploration of search directions and step sizes has produced many improved variants, and this paper surveys the main research progress on them. The improvement strategies are roughly divided into four categories: momentum, variance reduction, incremental gradient, and adaptive learning rates. The first three mainly correct the gradient or search direction, while the fourth adapts the step size separately for each component of the parameter vector. The core ideas and principles of the SGD variants under each strategy are described, and the differences and connections between the algorithms are discussed. The main SGD algorithms are then applied to machine learning tasks such as logistic regression and deep convolutional neural networks, and their practical performance is compared quantitatively. The paper concludes with a summary of the work and an outlook on future directions for stochastic gradient descent.
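For reference, a schematic sketch of one representative update rule from three of the four families surveyed above (plain SGD, momentum, and adaptive learning rates); variance-reduction updates follow the SVRG pattern sketched under item 7. Hyper-parameter values are conventional defaults, not recommendations from the survey:

```python
import numpy as np

# One step of each update; g is the stochastic gradient, lr the step size.

def sgd(w, g, lr=0.01):
    return w - lr * g

def momentum(w, g, state, lr=0.01, beta=0.9):
    state["v"] = beta * state.get("v", 0.0) + g             # accumulate a velocity
    return w - lr * state["v"]

def adagrad(w, g, state, lr=0.1, eps=1e-8):
    state["s"] = state.get("s", 0.0) + g**2                 # per-coordinate squared-gradient sum
    return w - lr * g / (np.sqrt(state["s"]) + eps)

def adam(w, g, state, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    t = state["t"] = state.get("t", 0) + 1
    state["m"] = b1 * state.get("m", 0.0) + (1 - b1) * g    # first moment
    state["v"] = b2 * state.get("v", 0.0) + (1 - b2) * g**2 # second moment
    m_hat = state["m"] / (1 - b1**t)                        # bias correction
    v_hat = state["v"] / (1 - b2**t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)

# Schematic usage on a quadratic bowl f(w) = 0.5*||w||^2, so the gradient is w.
w, state = np.array([1.0, -2.0]), {}
for _ in range(100):
    w = adam(w, w, state)
print("after 100 Adam steps:", w)
```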

9.
Training of recurrent neural networks (RNNs) introduces considerable computational complexities due to the need for gradient evaluations. How to achieve fast convergence and low computational complexity remains a challenging, open problem. Besides, the transient response of the learning process of RNNs is a critical issue, especially for online applications. Conventional RNN training algorithms such as backpropagation through time and real-time recurrent learning have not adequately satisfied these requirements because they often suffer from slow convergence. If a large learning rate is chosen to improve performance, the training process may become unstable in terms of weight divergence. In this paper, a novel RNN training algorithm, named robust recurrent simultaneous perturbation stochastic approximation (RRSPSA), is developed with a specially designed recurrent hybrid adaptive parameter and adaptive learning rates. RRSPSA is a powerful twin-engine simultaneous perturbation stochastic approximation (SPSA) type of RNN training algorithm. It utilizes three specially designed adaptive parameters to maximize training speed for a recurrent training signal while exhibiting certain weight convergence properties with only two objective function measurements, as in the original SPSA algorithm. RRSPSA is proved to guarantee weight convergence and system stability in the sense of a Lyapunov function. Computer simulations were carried out to demonstrate the applicability of the theoretical results.
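RRSPSA builds on the simultaneous perturbation stochastic approximation (SPSA) gradient estimate; below is a minimal sketch of plain SPSA with standard decaying gain sequences, not the paper's robust twin-engine variant with its adaptive parameters (the gain constants and toy loss are assumptions):

```python
import numpy as np

def spsa_minimize(loss, w0, a=0.1, c=0.1, alpha=0.602, gamma=0.101, iters=2000, seed=0):
    """Basic SPSA: approximates the full gradient from only two (possibly noisy)
    loss evaluations per iteration, regardless of the problem dimension."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float).copy()
    for k in range(1, iters + 1):
        ak = a / k**alpha                               # decaying step size
        ck = c / k**gamma                               # decaying perturbation size
        delta = rng.choice([-1.0, 1.0], size=w.shape)   # Bernoulli +/-1 perturbation
        g_hat = (loss(w + ck * delta) - loss(w - ck * delta)) / (2 * ck) / delta
        w -= ak * g_hat
    return w

# Toy usage: fit a small quadratic "training error" without analytic gradients.
target = np.array([1.0, -2.0, 0.5])
loss = lambda w: np.sum((w - target)**2) + 0.01 * np.random.randn()  # noisy measurements
print(spsa_minimize(loss, np.zeros(3)))
```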

10.
Kivinen, J., & Warmuth, M. K. Machine Learning, 2001, 45(3): 301-329
We study on-line generalized linear regression with multidimensional outputs, i.e., neural networks with multiple output nodes but no hidden nodes. We allow at the final layer transfer functions such as the softmax function that need to consider the linear activations to all the output neurons. The weight vectors used to produce the linear activations are represented indirectly by maintaining separate parameter vectors. We get the weight vector by applying a particular parameterization function to the parameter vector. Updating the parameter vectors upon seeing new examples is done additively, as in the usual gradient descent update. However, by using a nonlinear parameterization function between the parameter vectors and the weight vectors, we can make the resulting update of the weight vector quite different from a true gradient descent update. To analyse such updates, we define a notion of a matching loss function and apply it both to the transfer function and to the parameterization function. The loss function that matches the transfer function is used to measure the goodness of the predictions of the algorithm. The loss function that matches the parameterization function can be used both as a measure of divergence between models in motivating the update rule of the algorithm and as a measure of progress in analyzing its relative performance compared to an arbitrary fixed model. As a result, we have a unified treatment that generalizes earlier results for the gradient descent and exponentiated gradient algorithms to multidimensional outputs, including multiclass logistic regression.
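A minimal sketch contrasting the two updates on a single linear neuron: an additive update applied directly to the weights (gradient descent) versus the same additive update applied to a parameter vector behind a softmax parameterization, which induces the multiplicative exponentiated-gradient update on the weights. Problem sizes and learning rates are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 20
w_star = np.zeros(d); w_star[:3] = [0.6, 0.3, 0.1]   # sparse target weights on the simplex

def example():
    x = rng.normal(size=d)
    return x, w_star @ x

eta_gd, eta_eg = 0.02, 0.2
w_gd = np.ones(d) / d                   # gradient descent: weights updated directly
theta = np.zeros(d)                     # parameter vector behind the EG weights

for _ in range(3000):
    x, y = example()
    # GD: additive update on the weight vector itself.
    w_gd -= eta_gd * (w_gd @ x - y) * x
    # EG: the weights are the softmax of theta; the additive update on theta uses
    # the loss gradient taken with respect to the weights, so the induced weight
    # update is multiplicative (the exponentiated gradient rule).
    w_eg = np.exp(theta) / np.sum(np.exp(theta))
    theta -= eta_eg * (w_eg @ x - y) * x

w_eg = np.exp(theta) / np.sum(np.exp(theta))
print("GD error:", np.linalg.norm(w_gd - w_star))
print("EG error:", np.linalg.norm(w_eg - w_star))
```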

11.
This paper considers the use of neural networks (NN's) in controlling a nonlinear, stochastic system with unknown process equations. The approach here is based on using the output error of the system to train the NN controller without the need to assume or construct a separate model (NN or other type) for the unknown process dynamics. To implement such a direct adaptive control approach, it is required that connection weights in the NN be estimated while the system is being controlled. As a result of the feedback of the unknown process dynamics, however, it is not possible to determine the gradient of the loss function for use in standard (backpropagation-type) weight estimation algorithms. In principle, stochastic approximation algorithms in the standard (Kiefer-Wolfowitz) finite-difference form can be used for this weight estimation since they are based on gradient approximations from available system output errors. However, these algorithms will generally require a prohibitive number of observed system outputs. Therefore, this paper considers the use of a new stochastic approximation algorithm for this weight estimation, which is based on a "simultaneous perturbation" gradient approximation. It is shown that this algorithm can greatly enhance the efficiency over more standard stochastic approximation algorithms based on finite-difference gradient approximations. The approach is illustrated on a simulated wastewater treatment system with stochastic effects and nonstationary dynamics.

12.
This paper proposes a stochastic gradient algorithm and two modified stochastic gradient algorithms for a nonlinear two-variable difference system. The output and the input of a two-variable parameter system depend on time and on spatial coordinates. A stochastic gradient algorithm is introduced to estimate the unknown parameters. In order to increase the convergence rate without increasing the computational effort, two modified stochastic gradient algorithms are also proposed. The simulation results indicate that the proposed methods are effective.

13.
A back-propagation (BP) neural network for handwritten digits consists of an input layer, a hidden layer and an output layer. The training data are 60,000 samples from the open MNIST handwritten digit set. The BP algorithm combines stochastic gradient descent with back-propagation, and the network is trained for 30 epochs on mini-batches to learn suitable weights and biases. On a field programmable gate array (FPGA) hardware platform, the design focuses on implementing the BP algorithm in Verilog, sequencing the training states of each layer, and a piecewise-linear fit of the sigmoid function and its derivative. Weights and biases are initialized from a Gaussian distribution with mean 0 and variance 1, the mini-batch size m is 10, and the learning rate η is 3. Samples and labels are fed into the system and simulated and analyzed with Quartus 13.0 and ModelSim; running 30 training epochs takes 4.5 s and the recognition accuracy is 91.6%, which, compared with a software implementation in Python 2.7, meets the real-time requirement of the hardware design while keeping a high accuracy in handwritten digit recognition.
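A hedged sketch of the kind of piecewise-linear sigmoid approximation such a design relies on; the interval, the segment count and the uniform chord fit are assumptions, not the paper's Verilog implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Piecewise-linear fit of the sigmoid on [-8, 8] with uniform segments, the kind
# of slope/intercept table that maps naturally onto FPGA logic.
breaks = np.linspace(-8, 8, 33)                       # 32 segments
slopes = np.diff(sigmoid(breaks)) / np.diff(breaks)   # slope of each chord
intercepts = sigmoid(breaks[:-1]) - slopes * breaks[:-1]

def sigmoid_pwl(x):
    x = np.clip(x, -8, 8 - 1e-9)
    seg = np.searchsorted(breaks, x, side="right") - 1
    return slopes[seg] * x + intercepts[seg]

xs = np.linspace(-8, 8, 10001)
err = np.max(np.abs(sigmoid(xs) - sigmoid_pwl(xs)))
print(f"max abs. error of the 32-segment linear fit: {err:.2e}")
```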

14.
Stochastic competitive learning
Competitive learning systems are examined as stochastic dynamical systems. This includes continuous and discrete formulations of unsupervised, supervised, and differential competitive learning systems. These systems estimate an unknown probability density function from random pattern samples and behave as adaptive vector quantizers. Synaptic vectors, in feedforward competitive neural networks, quantize the pattern space and converge to pattern class centroids or local probability maxima. A stochastic Lyapunov argument shows that competitive synaptic vectors converge to centroids exponentially quickly and reduces competitive learning to stochastic gradient descent. Convergence does not depend on a specific dynamical model of how neuronal activations change. These results extend to competitive estimation of local covariances and higher order statistics.
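A minimal sketch of the competitive-learning update described above: the winning synaptic vector moves toward each pattern with a decreasing gain, so each vector converges to the centroid of the patterns it wins (the two-cluster data and initialization are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two Gaussian pattern classes; the synaptic vectors should converge to their centroids.
data = np.vstack([rng.normal([-2, 0], 0.5, size=(500, 2)),
                  rng.normal([+2, 1], 0.5, size=(500, 2))])
rng.shuffle(data)

W = data[:2].copy()                   # two competing synaptic vectors, seeded from data
counts = np.zeros(2)
for x in data:
    j = np.argmin(np.linalg.norm(W - x, axis=1))   # winning (closest) unit
    counts[j] += 1
    W[j] += (x - W[j]) / counts[j]                 # decreasing gain: running centroid of won patterns
print("learned synaptic vectors:\n", W)
```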

15.
Multi-font Chinese character recognition with convolutional neural networks
Objective: Recognition of Chinese characters in multiple fonts has broad applications in automatic Chinese-text processing and intelligent input, and is an important topic in pattern recognition. In recent years, with the emergence of deep learning, Chinese character recognition based on deep convolutional neural networks has made breakthrough progress in both methods and performance. Existing methods, however, require large numbers of samples, long training times and difficult hyper-parameter tuning, and struggle to achieve the best results on large-category character recognition. Method: For unoccluded printed and handwritten Chinese character images, an end-to-end deep convolutional neural network model is proposed. Apart from auxiliary layers, the network consists mainly of three convolutional layers, two pooling layers, one fully connected layer and a Softmax regression layer. To address the shortage of samples, a data augmentation scheme combining ripple distortion, translation, rotation and scaling is proposed. To reduce the tuning difficulty and long training time of deep networks, batch normalization of the samples and fine-tuning of the network with a combination of several optimization methods are adopted. Results: The model recognizes the 3,755 level-1 characters of the national standard with a final accuracy of 98.336%. A series of comparison experiments verifies the contribution of each proposed technique: data augmentation, the mixed optimization scheme and batch normalization improve the recognition rate on test samples by 8.0%, 0.3% and 1.4%, respectively. Conclusion: Compared with methods in other papers that combine hand-crafted features with convolutional neural networks, the approach removes the manual feature-extraction effort; compared with a classical convolutional neural network, it extracts stronger features, achieves a higher recognition rate and trains faster.

16.
Perturbation analysis and optimization of stochastic flow networks
We consider a stochastic fluid model of a network consisting of several single-class nodes in tandem and perform perturbation analysis for the node queue contents and associated event times with respect to a threshold parameter at the first node. We then derive infinitesimal perturbation analysis (IPA) derivative estimators for loss and buffer occupancy performance metrics with respect to this parameter and show that these estimators are unbiased. We also show that the estimators depend only on data directly observable from a sample path of the actual underlying discrete event system, without any knowledge of the stochastic characteristics of the random processes involved. This renders them computable in online environments and easily implementable for network management and optimization. This is illustrated by combining the IPA estimators with standard gradient based stochastic optimization methods and providing simulation examples.

17.
The authors discuss the requirements of learning for generalization, where traditional methods based on gradient descent have limited success. A stochastic learning algorithm based on simulated annealing in weight space is presented. The authors verify the convergence properties and feasibility of the algorithm, and describe an implementation along with validation experiments.
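A minimal sketch of simulated annealing in weight space for a tiny logistic network (the XOR task, network size, perturbation scale and cooling schedule are all assumptions, not the authors' setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny 2-2-1 logistic network trained on XOR by annealing in weight space
# instead of gradient descent.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0., 1., 1., 0.])

def loss(w):
    W1, b1 = w[:4].reshape(2, 2), w[4:6]
    W2, b2 = w[6:8], w[8]
    h = 1 / (1 + np.exp(-(X @ W1 + b1)))
    out = 1 / (1 + np.exp(-(h @ W2 + b2)))
    return np.mean((out - y) ** 2)

w = rng.normal(size=9)
E = loss(w)
best_w, best_E = w.copy(), E
T = 0.5                                                # initial temperature
for step in range(20000):
    cand = w + rng.normal(scale=0.3, size=9)           # random move in weight space
    E_cand = loss(cand)
    if E_cand < E or rng.random() < np.exp(-(E_cand - E) / T):   # Metropolis acceptance
        w, E = cand, E_cand
        if E < best_E:
            best_w, best_E = w.copy(), E
    T *= 0.9995                                         # geometric cooling schedule
print("best mean-squared error on XOR:", best_E)
```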

18.
Learning algorithms are described for layered feedforward neural networks in which each unit generates a real-valued output through a logistic function. The problem of adjusting the weights of internal hidden units can be regarded as a problem of estimating (or identifying) constant parameters with a non-linear observation equation. The present algorithm, based on the extended Kalman filter, has a time-varying learning rate, while the well-known back-propagation (or generalized delta rule) algorithm based on gradient descent has a constant learning rate. Simulation examples show that when a sufficiently trained network is desired, the learning speed of the proposed algorithm is faster than that of the traditional back-propagation algorithm.
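A single-neuron sketch of the idea: the weights are treated as constant parameters and the logistic output as a nonlinear observation, so the extended Kalman filter supplies a time-varying, per-weight gain in place of a fixed learning rate. The noise variances and the single-neuron simplification are assumptions, not the paper's multilayer formulation:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
w_true = np.array([1.5, -2.0, 0.8])
sigmoid = lambda a: 1 / (1 + np.exp(-a))

w = np.zeros(d)                 # weight estimate (the EKF "state")
P = np.eye(d) * 10.0            # state covariance -> time-varying learning rate
R = 0.05                        # assumed observation-noise variance
for _ in range(2000):
    x = rng.normal(size=d)
    y = sigmoid(w_true @ x) + 0.05 * rng.normal()    # noisy teacher output
    yhat = sigmoid(w @ x)
    H = yhat * (1 - yhat) * x        # Jacobian of the logistic observation wrt the weights
    S = H @ P @ H + R                # innovation variance (scalar output)
    K = (P @ H) / S                  # Kalman gain: per-weight, time-varying step
    w = w + K * (y - yhat)           # measurement update of the weights
    P = P - np.outer(K, H @ P)       # covariance update (shrinks as data accrues)
print("estimated weights:", w, " true:", w_true)
```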

19.
To address the hidden-layer structure and training speed of process neural networks, a structure-adaptive extreme process neural network with hybrid optimization is proposed on the basis of the extreme learning machine. First, the model structure adapts itself by adding process neuron nodes to the hidden layer one at a time until the output error requirement is met. Then, to eliminate redundant nodes, Gram-Schmidt orthogonalization is applied to the output of each newly added temporary node to judge its correlation with the existing nodes. Finally, a quantum-derived cuckoo search algorithm is constructed to optimize the orthogonal-basis expansion coefficients of the input weight functions of the new nodes. Simulation experiments on Mackey-Glass prediction and shale-oil TOC prediction verify the effectiveness of the proposed method through comparative analysis; the results show that the approximation efficiency and training speed of the resulting model are clearly improved.

20.
Existing sparse multiple kernel learning (MKL) models tend to lose information and generalize poorly when they produce sparse kernel-weight solutions, and gradient-descent-based MKL converges slowly near the optimum. An elastic multiple kernel learning (EMKL) model based on the support vector machine (SVM) is established, and a Newton-gradient-optimized EMKL (NO-EMKL) is presented. The model introduces an elastic term into the MKL objective function, and an optimization algorithm based on second-order Newton gradient descent is designed. Experimental results show that the algorithm achieves both better classification accuracy and faster convergence.

