Similar Literature (20 results)
1.
On the problem of local minima in recurrent neural networks
Many researchers have recently focused their efforts on devising efficient algorithms, mainly based on optimization schemes, for learning the weights of recurrent neural networks. As in the case of feedforward networks, however, these learning algorithms may get stuck in local minima during gradient descent, thus discovering sub-optimal solutions. This paper analyses the problem of optimal learning in recurrent networks by proposing conditions that guarantee local-minima-free error surfaces. An example is given that also shows the constructive role of the proposed theory in designing networks suitable for solving a given task. Moreover, a formal relationship between recurrent and static feedforward networks is established such that the examples of local minima for feedforward networks already known in the literature can be associated with analogous ones in recurrent networks.
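As a toy illustration of the failure mode the paper studies (not the paper's own construction), plain gradient descent on a one-dimensional error surface with two basins converges to whichever minimum the initial weight happens to fall near:

    import numpy as np

    # Illustrative error surface: global minimum near w = -1,
    # local (suboptimal) minimum near w = +1.
    E  = lambda w: (w**2 - 1)**2 + 0.3 * w
    dE = lambda w: 4 * w * (w**2 - 1) + 0.3

    w = 0.9                        # start in the suboptimal basin
    for _ in range(500):
        w -= 0.01 * dE(w)
    print(w, E(w))                 # converges near +1, not the global minimum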

2.
Classical statistical techniques for prediction reach their limitations in applications with nonlinearities in the data set; neural models, however, can counteract these limitations. In this paper, we present a recurrent neural model in which an adaptive time constant is associated with each neuron-like unit, together with a learning algorithm to train these dynamic recurrent networks. We test the network by training it to predict the Mackey-Glass chaotic signal. To evaluate the quality of the prediction, we computed the power spectra of the two signals and the associated fractional error. The results show that introducing adaptive time constants associated with each neuron of a recurrent network improves both the quality of the prediction and the dynamical features of the neural model. Such dynamic recurrent neural networks outperform time-delay neural networks.
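A minimal sketch of the core idea: a leaky-integrator unit whose time constant tau is a trainable per-neuron parameter (the paper's exact discretization and learning rule may differ).

    import numpy as np

    def step(x, W, b, tau, dt=0.1):
        """One Euler step of a continuous-time recurrent unit; each neuron
        relaxes toward its driven activation at its own rate dt / tau_i."""
        return x + (dt / tau) * (-x + np.tanh(W @ x + b))

    n = 10
    W, b = 0.5 * np.random.randn(n, n), np.zeros(n)
    tau = np.ones(n)       # adaptive time constants, trained alongside W and b
    x = 0.1 * np.random.randn(n)
    for t in range(100):
        x = step(x, W, b, tau)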

3.
We present a Monte Carlo approach for training partially observable diffusion processes. We apply the approach to diffusion networks, a stochastic version of continuous recurrent neural networks. The approach is aimed at learning probability distributions of continuous paths, not just expected values. Interestingly, the relevant activation statistics used by the learning rule presented here are inner products in the Hilbert space of square integrable functions. These inner products can be computed using Hebbian operations and do not require backpropagation of error signals. Moreover, standard kernel methods could potentially be applied to compute such inner products. We propose that the main reason that recurrent neural networks have not worked well in engineering applications (e.g., speech recognition) is that they implicitly rely on a very simplistic likelihood model. The diffusion network approach proposed here is much richer and may open new avenues for applications of recurrent neural networks. We present some analysis and simulations to support this view. Very encouraging results were obtained on a visual speech recognition task in which neural networks outperformed hidden Markov models.
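For sampled activation trajectories, the inner products in the space of square integrable functions reduce to discretized integrals; a small sketch, assuming a uniform sampling grid:

    import numpy as np

    dt = 0.01
    t = np.arange(0, 1, dt)
    f = np.sin(2 * np.pi * t)          # two sampled activation paths
    g = np.cos(2 * np.pi * t)
    inner = np.trapz(f * g, dx=dt)     # <f, g> = integral of f(t) g(t) dt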

4.
How to efficiently train recurrent networks remains a challenging and active research topic. Most of the proposed training approaches are based on computational ways to efficiently obtain the gradient of the error function, and can be generally grouped into five major groups. In this study we present a derivation that unifies these approaches. We demonstrate that the approaches are only five different ways of solving a particular matrix equation. The second goal of this paper is to develop a new algorithm based on the insights gained from the novel formulation. The new algorithm, which is based on approximating the error gradient, has lower computational complexity in computing the weight update than the competing techniques for most typical problems. In addition, it reaches the error minimum in a much smaller number of iterations. A desirable characteristic of recurrent network training algorithms is the ability to update the weights in an online fashion. We have also developed an online version of the proposed algorithm, based on updating the error gradient approximation in a recursive manner.

5.
Although the potential of the powerful mapping and representational capabilities of recurrent network architectures is generally recognized by the neural network research community, recurrent neural networks have not been widely used for the control of nonlinear dynamical systems, possibly due to the relative ineffectiveness of simple gradient descent training algorithms. Developments in the use of parameter-based extended Kalman filter algorithms for training recurrent networks may provide a mechanism by which these architectures will prove to be of practical value. This paper presents a decoupled extended Kalman filter (DEKF) algorithm for training of recurrent networks with special emphasis on application to control problems. We demonstrate in simulation the application of the DEKF algorithm to a series of example control problems ranging from the well-known cart-pole and bioreactor benchmark problems to an automotive subsystem, engine idle speed control. These simulations suggest that recurrent controller networks trained by Kalman filter methods can combine the traditional features of state-space controllers and observers in a homogeneous architecture for nonlinear dynamical systems, while simultaneously exhibiting less sensitivity than do purely feedforward controller networks to changes in plant parameters and measurement noise.
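A minimal sketch of one decoupled EKF update for a scalar network output, with the weights split into groups that keep separate covariance blocks (simplified; the paper's formulation includes further details such as artificial process-noise schedules):

    import numpy as np

    def dekf_step(groups, err, R=1.0, Q=1e-6):
        """groups: list of dicts with 'w' (weight vector), 'P' (covariance),
        'H' (gradient of the scalar output w.r.t. that group's weights).
        err is the scalar innovation, target - output."""
        s = R + sum(g['H'] @ g['P'] @ g['H'] for g in groups)  # global scale
        for g in groups:
            K = (g['P'] @ g['H']) / s                # Kalman gain for the group
            g['w'] += K * err                        # weight update
            g['P'] -= np.outer(K, g['H'] @ g['P'])   # covariance update
            g['P'] += Q * np.eye(len(g['w']))        # artificial process noise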

6.
Zemel RS, Mozer MC. Neural Computation, 2001, 13(5): 1045-1064
Attractor networks, which map an input space to a discrete output space, are useful for pattern completion: cleaning up noisy or missing input features. However, designing a net to have a given set of attractors is notoriously tricky; training procedures are CPU intensive and often produce spurious attractors and ill-conditioned attractor basins. These difficulties occur because each connection in the network participates in the encoding of multiple attractors. We describe an alternative formulation of attractor networks in which the encoding of knowledge is local, not distributed. Although localist attractor networks have similar dynamics to their distributed counterparts, they are much easier to work with and interpret. We propose a statistical formulation of localist attractor net dynamics, which yields a convergence proof and a mathematical interpretation of model parameters. We present simulation experiments that explore the behavior of localist attractor networks, showing that they yield few spurious attractors, and they readily exhibit two desirable properties of psychological and neurobiological models: priming (faster convergence to an attractor if the attractor has been recently visited) and gang effects (in which the presence of an attractor enhances the attractor basins of neighboring attractors).
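A much-simplified sketch of localist dynamics: one unit per attractor, with the state relaxing toward a responsibility-weighted mixture of the stored patterns (the paper's statistical formulation also blends in the observation and derives the annealing of sigma):

    import numpy as np

    def localist_step(y, attractors, sigma):
        d2 = ((attractors - y) ** 2).sum(axis=1)   # distance to each attractor
        q = np.exp(-d2 / (2 * sigma ** 2))
        q /= q.sum()                               # attractor responsibilities
        return q @ attractors                      # move toward the soft winner

    attractors = np.array([[1.0, 1.0], [-1.0, 1.0], [0.0, -1.0]])
    y = np.array([0.4, 0.8])                       # noisy input state
    for sigma in np.linspace(1.0, 0.05, 40):       # anneal the width
        y = localist_step(y, attractors, sigma)    # y settles on one attractor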

7.
Recurrent neural networks have been successfully used for analysis and prediction of temporal sequences. This paper is concerned with the convergence of a gradient-descent learning algorithm for training a fully recurrent neural network. In the literature, stochastic process theory has been used to establish some convergence results of a probabilistic nature for the on-line gradient training algorithm, based on the assumption that a very large number (or, in theory, infinitely many) of training samples of the temporal sequences are available. In this paper, we consider the case where only a limited number of training samples of the temporal sequences are available, such that the stochastic treatment of the problem is no longer appropriate. Instead, we use an off-line gradient training algorithm for the fully recurrent neural network, and we accordingly prove some convergence results of a deterministic nature. The monotonicity of the error function in the iteration is also guaranteed. A numerical example is given to support the theoretical findings.

8.
Financial forecasting is an example of a signal processing problem which is challenging due to small sample sizes, high noise, non-stationarity, and non-linearity. Neural networks have been very successful in a number of signal processing applications. We discuss fundamental limitations and inherent difficulties when using neural networks for the processing of high noise, small sample size signals. We introduce a new intelligent signal processing method which addresses the difficulties. The method proposed uses conversion into a symbolic representation with a self-organizing map, and grammatical inference with recurrent neural networks. We apply the method to the prediction of daily foreign exchange rates, addressing difficulties with non-stationarity, overfitting, and unequal a priori class probabilities, and we find significant predictability in comprehensive experiments covering 5 different foreign exchange rates. The method correctly predicts the direction of change for the next day with an error rate of 47.1%. The error rate reduces to around 40% when rejecting examples where the system has low confidence in its prediction. We show that the symbolic representation aids the extraction of symbolic knowledge from the trained recurrent neural networks in the form of deterministic finite state automata. These automata explain the operation of the system and are often relatively simple. Automata rules related to well known behavior such as trend following and mean reversal are extracted.
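A hedged sketch of the symbolization step: quantizing daily returns into a small alphabet with a simplified one-dimensional SOM (the paper's SOM configuration, alphabet size, and data are not specified here; the price series below is synthetic):

    import numpy as np

    rng = np.random.default_rng(0)
    prices = 100 * np.exp(np.cumsum(0.01 * rng.standard_normal(500)))
    returns = np.diff(np.log(prices))

    def train_som_1d(x, n_symbols=4, epochs=20, lr=0.5):
        w = np.linspace(x.min(), x.max(), n_symbols)   # codebook
        for e in range(epochs):
            a = lr * (1 - e / epochs)                  # decaying rate
            for v in rng.permutation(x):
                i = np.abs(w - v).argmin()             # best-matching unit
                w[i] += a * (v - w[i])                 # move winner
                if i > 0:
                    w[i - 1] += 0.5 * a * (v - w[i - 1])   # half-strength neighbor
                if i < n_symbols - 1:
                    w[i + 1] += 0.5 * a * (v - w[i + 1])
        return np.sort(w)

    codebook = train_som_1d(returns)
    symbols = np.abs(returns[:, None] - codebook).argmin(axis=1)
    # `symbols` is the discrete sequence fed to the recurrent network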

9.
Limitations of nonlinear PCA as performed with generic neural networks
Kramer's (1991) nonlinear principal components analysis (NLPCA) neural networks are feedforward autoassociative networks with five layers. The third layer has fewer nodes than the input or output layers. This paper proposes a geometric interpretation for Kramer's method by showing that NLPCA fits a lower-dimensional curve or surface through the training data. The first three layers project observations onto the curve or surface giving scores. The last three layers define the curve or surface. The first three layers are a continuous function, which we show has several implications: NLPCA "projections" are suboptimal producing larger approximation error, NLPCA is unable to model curves and surfaces that intersect themselves, and NLPCA cannot parameterize curves with parameterizations having discontinuous jumps. We establish results on the identification of score values and discuss their implications on interpreting score values. We discuss the relationship between NLPCA and principal curves and surfaces, another nonlinear feature extraction method.
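A sketch of the five-layer autoassociative architecture the paper analyzes (layer sizes are illustrative, weights random; training by minimizing reconstruction error is omitted):

    import numpy as np

    sizes = [8, 12, 2, 12, 8]    # input, mapping, bottleneck, demapping, output
    Ws = [0.1 * np.random.randn(m, n) for n, m in zip(sizes[:-1], sizes[1:])]

    def nlpca_forward(x):
        h = np.tanh(Ws[0] @ x)       # mapping layer (first of the three
        score = Ws[1] @ h            #   projection layers); bottleneck scores
        h = np.tanh(Ws[2] @ score)   # demapping layer defines the curve/surface
        return Ws[3] @ h, score      # reconstruction and low-dim representation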

10.
Determining the architecture of a neural network is an important issue for any learning task. For recurrent neural networks no general methods exist that permit the estimation of the number of layers of hidden neurons, the size of layers or the number of weights. We present a simple pruning heuristic that significantly improves the generalization performance of trained recurrent networks. We illustrate this heuristic by training a fully recurrent neural network on positive and negative strings of a regular grammar. We also show that rules extracted from networks trained with this pruning heuristic are more consistent with the rules to be learned. This performance improvement is obtained by pruning and retraining the networks. Simulations are shown for training and pruning a recurrent neural net on strings generated by two regular grammars, a randomly-generated 10-state grammar and an 8-state, triple-parity grammar. Further simulations indicate that this pruning method can have generalization performance superior to that obtained by training with weight decay.
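A generic prune-and-retrain loop (the paper's heuristic is specific to recurrent networks; plain magnitude pruning stands in here for illustration):

    import numpy as np

    def prune_smallest(W, frac=0.1):
        """Zero out the fraction `frac` of smallest-magnitude nonzero weights."""
        nz = np.abs(W[W != 0])
        W[np.abs(W) < np.quantile(nz, frac)] = 0.0
        return W

    # while validation error keeps improving:
    #     W = prune_smallest(W)
    #     retrain(W)   # hypothetical retraining step on the grammar strings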

11.
Neurocomputing, 1999, 24(1-3): 13-36
This paper reviews different approaches to improving the real time recurrent learning (RTRL) algorithm and attempts to group them into common frameworks. The characteristics of sub-grouping strategy, mode exchange RTRL, and cellular genetic algorithms are discussed. The relationships between these algorithms are highlighted and their time complexities and convergence capability are compared. The learning algorithms are applied to train recurrent neural networks in an attempt to solve a long-term dependency problem, to model the Hénon map, and to predict the chaotic intensity pulsations of an NH3 laser. The results show that the original RTRL algorithm achieves the lowest error among the gradient-based algorithms, but it requires the longest training time; whereas the sub-grouping strategy uses the shortest training time but its convergence capability is the poorest. The results also demonstrate that the cellular genetic algorithm is an alternative means of training recurrent neural networks when the gradient-based methods fail to find an acceptable solution.
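For reference, the core RTRL recursion for a bare fully recurrent network x[t+1] = tanh(W x[t]): the sensitivities P[k, i, j] = dx_k / dW_ij are carried forward with the state, which is what makes plain RTRL expensive per step (a sketch without inputs or the paper's variants):

    import numpy as np

    n = 5
    W = 0.5 * np.random.randn(n, n)
    x = 0.1 * np.random.randn(n)
    P = np.zeros((n, n, n))                  # P[k, i, j] = d x_k / d W_ij

    for t in range(100):
        net = W @ x
        d = 1.0 - np.tanh(net) ** 2          # derivative of tanh
        newP = np.einsum('k,kl,lij->kij', d, W, P)   # d_k * sum_l W_kl P_lij
        for i in range(n):
            newP[i, i, :] += d[i] * x        # the delta_{ki} * x_j term
        P, x = newP, np.tanh(net)
        # gradient for a target on unit 0 would be (x[0] - target) * P[0]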

12.
A new learning algorithm for dynamic diagonal recurrent neural networks is proposed: local dynamic backpropagation (LDBP). The algorithm defines a new local mean-square-error function and establishes a new learning structure for the recurrent units. Once the desired output of each layer has been estimated, the multilayer recurrent network can be decomposed into a set of adaptive linear elements (Adalines), each of which can be trained by quadratic optimization. The conjugate gradient (CG) method, which finds the global optimum of such a quadratic problem in a finite number of steps, is used for this optimization. Because the learning process uses superlinear search, the number of iterations and the computation time are greatly reduced.
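Once a unit's desired outputs are estimated, its training is a quadratic least-squares problem, which CG solves exactly in at most n iterations; a sketch in which the decomposition into units follows the abstract and the CG routine is the standard one:

    import numpy as np

    def cg(A, b, tol=1e-10):
        """Conjugate gradient for A x = b, A symmetric positive definite;
        terminates in at most len(b) iterations in exact arithmetic."""
        x = np.zeros_like(b)
        r = b - A @ x
        p = r.copy()
        for _ in range(len(b)):
            rr = r @ r
            if rr < tol:
                break
            Ap = A @ p
            alpha = rr / (p @ Ap)
            x += alpha * p
            r -= alpha * Ap
            p = r + (r @ r) / rr * p
        return x

    # For an Adaline with inputs X (samples x features) and estimated
    # desired outputs d, the weights solve the normal equations:
    # w = cg(X.T @ X, X.T @ d)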

13.
The use of artificial neural network (ANN) models has grown considerably over the last decade. One of the difficulties in using ANNs is that in most cases a large number of candidate input variables is available. In the past, there was a tendency to use a large number of inputs in ANN applications. This can have a number of detrimental effects on the network during training, and it also requires a greater amount of data to estimate the connection weights efficiently. Additional inputs tend to increase the required training time and the risk of the training algorithm becoming stuck in a local minimum. A large number of inputs also increases the risk of including spurious variables that merely increase the noise in the forecasts. Consequently, it is important to use an appropriate technique for selecting the input variables in order to obtain the smallest number of independent inputs that are useful predictors for the system being studied. The aim of this paper is to review techniques that allow the selection of appropriate model inputs, based in particular on mutual information and genetic algorithms.
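A sketch of the mutual-information half of such a selection scheme: rank candidate inputs by a histogram-based MI estimate against the target and keep the top k (bin count and k are arbitrary choices here, not the paper's):

    import numpy as np

    def mutual_info(x, y, bins=16):
        pxy, _, _ = np.histogram2d(x, y, bins=bins)
        pxy /= pxy.sum()                         # joint distribution estimate
        px = pxy.sum(axis=1, keepdims=True)      # marginals
        py = pxy.sum(axis=0, keepdims=True)
        nz = pxy > 0
        return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

    def select_inputs(X, y, k=5):
        scores = [mutual_info(X[:, j], y) for j in range(X.shape[1])]
        return np.argsort(scores)[::-1][:k]      # indices of the k best inputs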

14.
This paper concerns the effect of noise on the performance of feedforward neural nets. We introduce and analyze various methods of injecting synaptic noise into dynamically driven recurrent nets during training. Theoretical results show that applying a controlled amount of noise during training may improve convergence and generalization performance. We analyze the effects of various noise parameters and predict that best overall performance can be achieved by injecting additive noise at each time step. Noise contributes a second-order gradient term to the error function which can be viewed as an anticipatory agent to aid convergence. This term appears to find promising regions of weight space in the beginning stages of training when the training error is large and should improve convergence on error surfaces with local minima. The first-order term is a regularization term that can improve generalization. Specifically, it can encourage internal representations where the state nodes operate in the saturated regions of the sigmoid discriminant function. While this effect can improve performance on automata inference problems with binary inputs and target outputs, it is unclear what effect it will have on other types of problems. To substantiate these predictions, we present simulations on learning the dual parity grammar from temporal strings for all noise models, and on learning a randomly generated six-state grammar using the predicted best noise model.
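The predicted-best variant, additive synaptic noise redrawn at every time step, amounts to a one-line change in the forward pass (a sketch; noise is applied during training only, and evaluation uses the clean weights):

    import numpy as np

    def noisy_step(W, x, sigma, training=True):
        Wn = W + sigma * np.random.randn(*W.shape) if training else W
        return np.tanh(Wn @ x)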

15.
Conventional gradient algorithms suffer from slow convergence. To address this problem, a gradient algorithm that adds a penalty term to the conventional error function is proposed for training recurrent pi-sigma neural networks. The algorithm not only improves the generalization ability of the network, but also overcomes the slow convergence caused by choosing overly small initial weights, converging faster than the gradient algorithm without the penalty term. The convergence of the penalized gradient algorithm is analyzed theoretically, and the effectiveness of the algorithm is verified experimentally.
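Assuming the familiar L2-style penalty lambda * ||w||^2 added to the error (the paper's exact penalty term may differ), the gradient step becomes:

    import numpy as np

    def penalized_step(w, grad_E, lr=0.01, lam=1e-4):
        # gradient of E(w) + lam * ||w||^2 is grad_E + 2 * lam * w
        return w - lr * (grad_E + 2 * lam * w)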

16.
The paper first summarizes a general approach to the training of recurrent neural networks by gradient-based algorithms, which leads to the introduction of four families of training algorithms. Because of the variety of possibilities thus available to the "neural network designer," the choice of the appropriate algorithm to solve a given problem becomes critical. We show that, in the case of process modeling, this choice depends on how noise interferes with the process to be modeled; this is evidenced by three examples of modeling of dynamical processes, where the detrimental effect of inappropriate training algorithms on the prediction error made by the network is clearly demonstrated.

17.
We study the application of neural networks to modeling the blood glucose metabolism of a diabetic. In particular, we consider recurrent neural networks and time series convolution neural networks, which we compare to linear models and to nonlinear compartment models. We include a linear error model to take into account the uncertainty in the system and to handle missing blood glucose observations. Our results indicate that the best performance is achieved by the combination of the recurrent neural network and the linear error model.

18.
Predictive control of multivariable nonlinear systems based on multilayer local recurrent neural networks
Taking a continuous stirred-tank reactor as an example, this paper studies predictive control of multivariable nonlinear systems and methods for improving control performance, addressing the strong coupling, nonlinearity, and time-varying behavior of complex multivariable systems. A multilayer local recurrent neural network is used to build the prediction model offline; the model error is compensated by combining bias compensation with model correction, and online correction is applied in the predictive control. By weighting the error terms in the performance index with negative exponential weights, predictive control performance is further improved. Simulation results demonstrate the effectiveness of the control algorithm.
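The negative-exponential weighting of the error terms in the performance index can be read as putting more weight on near-horizon errors; a sketch (the decay rate a and the horizon are illustrative, not taken from the paper):

    import numpy as np

    def weighted_cost(r_future, y_pred, a=0.2):
        """J = sum_j exp(-a j) * (r[t+j] - yhat[t+j])^2 over the horizon."""
        j = np.arange(len(r_future))
        return float((np.exp(-a * j) * (r_future - y_pred) ** 2).sum())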

19.
Real-time algorithms for gradient descent supervised learning in recurrent dynamical neural networks fail to support scalable VLSI implementation, due to their complexity which grows sharply with the network dimension. We present an alternative implementation in analog VLSI, which employs a stochastic perturbation algorithm to observe the gradient of the error index directly on the network in random directions of the parameter space, thereby avoiding the tedious task of deriving the gradient from an explicit model of the network dynamics. The network contains six fully recurrent neurons with continuous-time dynamics, providing 42 free parameters which comprise connection strengths and thresholds. The chip implementing the network includes local provisions supporting both the learning and storage of the parameters, integrated in a scalable architecture which can be readily expanded for applications of learning recurrent dynamical networks requiring larger dimensionality. We describe and characterize the functional elements comprising the implemented recurrent network and integrated learning system, and include experimental results obtained from training the network to represent a quadrature-phase oscillator.
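The model-free idea behind the chip's learning rule, observing the gradient along random directions of parameter space rather than deriving it from the dynamics, can be sketched as a simultaneous-perturbation estimate (a software analogue, not the chip's exact rule):

    import numpy as np

    def perturbation_gradient(loss, w, eps=1e-3):
        """Two-sided estimate of the gradient of `loss` along a random
        +/-1 direction; no model of the network dynamics is needed."""
        delta = np.random.choice([-1.0, 1.0], size=w.shape)
        g = (loss(w + eps * delta) - loss(w - eps * delta)) / (2 * eps)
        return g * delta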

20.
The extraction and interpretation of networks of lines from images yields important organizational information about the network under consideration. In this paper, a one-parameter algorithm for the extraction of line networks from images is presented. The parameter indicates the extracted saliency level from a hierarchical graph. Input for the algorithm is the domain-specific knowledge of interconnection points. Graph morphological tools are used to extract the minimum cost graph which best segments the network. We give an extensive error analysis for the general case of line extraction. Our method is shown to be robust against gaps in lines, and against spurious vertices at lines, which we consider the most prominent source of error in line detection. The method indicates detection confidence, thereby supporting error-proof interpretation of the network functionality. The method is demonstrated to be applicable to a broad variety of line networks, including dashed lines. Hence, the proposed method yields a major step towards general line tracking algorithms.
