20 similar documents found; search time: 10 ms
1.
P. S. NEELAKANTA, Connection Science, 1996, 8(1): 79-114
Conventionally, the square error (SE) and/or the relative entropy (RE) error are used as a cost function to be minimized in training neural networks via optimization algorithms. While the aforesaid error measures are deduced directly from the parameter values (such as the output and the teacher values of the network), an alternative approach is to elucidate an error measure from the information (or negentropy) content associated with such parameters. That is, a cost-function-based optimization can be specified in the information-theoretic plane in terms of generalized maximum and/or minimum entropy considerations associated with the network. A set of minimum cross-entropy (or mutual information) error measures, known as Csiszár's measures, is deduced in terms of probabilistic attributes of the 'guess' (output) and 'true' (teacher) value parameters pertinent to neural network topologies. Their relative effectiveness in training a neural network optimally towards convergence (by realizing a predicted output close to the teacher function) is discussed with simulated results obtained from a test multi-layer perceptron. The Csiszár family of error measures indicated in this paper offers an alternative set of error functions, defined over a training set, which can be adopted for gradient-descent learning in neural networks using the backpropagation algorithm in lieu of the conventional SE and/or RE error measures. Relevant pros and cons of using Csiszár's error measures are discussed.
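As a concrete illustration of the idea (not code from the paper), the sketch below computes a generic Csiszár f-divergence between a 'guess' distribution and a 'true' distribution, recovering the relative-entropy (KL) error for f(t) = t log t, alongside the conventional squared error. The function names and example distributions are illustrative assumptions:

```python
import math

def csiszar_divergence(p, q, f):
    """Generic Csiszar f-divergence D_f(p || q) = sum_i q_i * f(p_i / q_i)."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q) if qi > 0)

# With f(t) = t * log(t), D_f reduces to the Kullback-Leibler divergence,
# i.e. the relative-entropy (RE) error between 'guess' and 'true' values.
def kl_generator(t):
    return t * math.log(t) if t > 0 else 0.0

def squared_error(p, q):
    """Conventional squared-error (SE) cost over the same values."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))
```

Different choices of the convex generator f yield the other members of the Csiszár family, so a training loop can swap error measures by swapping f alone.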
2.
JOHN K. KRUSCHKE, Connection Science, 1993, 5(1): 3-36
Backpropagation (Rumelhart et al., 1986) was proposed as a general learning algorithm for multi-layer perceptrons. This article demonstrates that a standard version of backprop fails to attend selectively to input dimensions in the same way as humans, suffers catastrophic forgetting of previously learned associations when novel exemplars are trained, and can be overly sensitive to linear category boundaries. Another connectionist model, ALCOVE (Kruschke, 1990, 1992), does not suffer those failures. Previous researchers identified these problems; the present article reports quantitative fits of the models to new human learning data. ALCOVE can be functionally approximated by a network that uses linear-sigmoid hidden nodes, like standard backprop. It is argued that models of human category learning should incorporate quasi-local representations and dimensional attention learning, as well as error-driven learning, to address all three phenomena simultaneously.
3.
Creation of groups in Networks of Excellence (NoEs) based on knowledge mapping and expertise is a set-covering problem, which is NP-hard. It is therefore usually approached by heuristic methods, which yield good but not necessarily optimal coverage. Selecting teams to form a group within NoEs comprising tens of teams can also be formulated and solved as an integer linear programming (ILP) problem, whose solution is guaranteed to be optimal. This paper presents the ILP solution for team selection with typical objective functions. Several genetic-algorithm-based methods are also compared to the optimal solution in terms of convergence (time and solution quality). The compared methods differ in their schemes for selecting the next-generation population. The plain vanilla method is shown to be superior to both the roulette-based and the SUS methods.
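The two next-generation selection schemes compared above can be sketched as follows. This is a generic textbook rendering of roulette-wheel selection and stochastic universal sampling (SUS), not the authors' implementation; all names and parameters are illustrative:

```python
import random

def roulette_select(population, fitness, k):
    """Roulette-wheel selection: k independent spins, each landing on an
    individual with probability proportional to its fitness."""
    total = sum(fitness)
    picks = []
    for _ in range(k):
        r = random.uniform(0, total)
        acc = 0.0
        for ind, fit in zip(population, fitness):
            acc += fit
            if acc >= r:
                picks.append(ind)
                break
    return picks

def sus_select(population, fitness, k):
    """Stochastic universal sampling: one spin places k evenly spaced
    pointers, reducing the sampling variance of the roulette wheel."""
    total = sum(fitness)
    step = total / k
    start = random.uniform(0, step)
    pointers = [start + i * step for i in range(k)]
    picks, acc, i = [], 0.0, 0
    for ind, fit in zip(population, fitness):
        acc += fit
        while i < k and pointers[i] <= acc:
            picks.append(ind)
            i += 1
    return picks
```

With equal fitness values, SUS deterministically picks each individual once, whereas independent roulette spins may duplicate or omit individuals; that variance difference is the usual reason the two schemes converge differently.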
4.
Backpropagation learning (BP) is known for its serious limitations in generalizing knowledge from certain types of learning material. In this paper, we describe a new learning algorithm, BP-SOM, which overcomes some of these limitations, as is shown by its application to four benchmark tasks. BP-SOM is a combination of a multi-layered feedforward network (MFN) trained with BP and Kohonen's self-organizing maps (SOMs). During the learning process, hidden-unit activations of the MFN are presented as learning vectors to SOMs trained in parallel. The SOM information is used in addition to standard error backpropagation when updating the connection weights of the MFN. The effect of the augmented error signal is that, during learning, clusters of hidden-unit activation patterns of instances associated with the same class tend to become highly similar. In a number of experiments, BP-SOM is shown (i) to improve generalization performance (i.e. avoid overfitting); (ii) to increase the number of hidden units that can be pruned without loss of generalization performance; and (iii) to provide a means for automatic rule extraction from trained networks. The results are compared with those achieved by two other learning algorithms for MFNs: conventional BP and BP augmented with weight decay. From the experiments and the comparisons, we conclude that the hybrid BP-SOM architecture, in which supervised and unsupervised learning co-operate in finding adequate hidden-layer representations, successfully combines the advantages of supervised and unsupervised learning.
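The unsupervised half of such a hybrid can be illustrated with a minimal self-organizing map that clusters hidden-unit activation vectors. This is a generic SOM sketch under assumed parameters (grid size, learning rate, neighbourhood radius), not the BP-SOM code itself:

```python
import math
import random

def train_som(vectors, grid_w, grid_h, epochs=20, lr=0.5, radius=1.0):
    """Minimal SOM: map activation vectors onto a grid of prototypes."""
    dim = len(vectors[0])
    rng = random.Random(0)
    weights = {(x, y): [rng.random() for _ in range(dim)]
               for x in range(grid_w) for y in range(grid_h)}
    for _ in range(epochs):
        for v in vectors:
            # best-matching unit = prototype closest to the activation vector
            bmu = min(weights, key=lambda c: sum((wi - vi) ** 2
                      for wi, vi in zip(weights[c], v)))
            # pull the BMU and its grid neighbours towards the vector
            for c, w in weights.items():
                d = math.dist(c, bmu)
                if d <= radius:
                    h = math.exp(-d * d / (2 * radius ** 2))
                    for i in range(dim):
                        w[i] += lr * h * (v[i] - w[i])
    return weights
```

In a BP-SOM-style scheme, the class labels of the vectors captured by each prototype would then be inspected, and an extra error term would push same-class hidden activations towards the same prototype.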
5.
A number of recent simulation studies have shown that when feedforward neural nets are trained, using backpropagation, to memorize sets of items in sequential blocks and without negative exemplars, severe retroactive interference, or catastrophic forgetting, results. Both formal analysis and simulation studies are employed here to show why and under what circumstances such retroactive interference arises. The conclusion is that, on the one hand, approximations to 'ideal' network geometries can entirely alleviate interference if the training data sets have been generated from a learnable function (not arbitrary pattern associations). All that is required is either a representative training set or enough sequential memory sets. However, this elimination of interference comes at the cost of a breakdown in discrimination between input patterns that have been learned and those that have not: catastrophic remembering. On the other hand, localized geometries for subfunctions eliminate the discrimination problem but are easily disrupted by new training sets and thus cause catastrophic interference. The paper concludes with a formally guaranteed solution to the problems of interference and discrimination: the Hebbian Autoassociative Recognition Memory (HARM) model, which is essentially a neural-net implementation of a simple look-up table. Although it requires considerable memory resources, when used as a yardstick with which to evaluate other proposed solutions, it uses the same or fewer resources than they do.
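Since HARM is described as essentially a neural-net implementation of a look-up table, its functional behaviour (perfect recall of stored associations plus discrimination of learned from novel patterns) can be mimicked by a toy table. The sketch below illustrates that behaviour only; it is not the HARM model:

```python
class LookupMemory:
    """Toy stand-in for HARM's function: a look-up table that both
    recalls stored patterns and discriminates learned from novel input."""

    def __init__(self):
        self.store = {}

    def learn(self, pattern, response):
        # Each pattern occupies its own key, so sequential learning
        # cannot retroactively interfere with earlier associations.
        self.store[tuple(pattern)] = tuple(response)

    def recall(self, pattern):
        # None signals 'not recognized' -- no catastrophic remembering.
        return self.store.get(tuple(pattern))

    def recognizes(self, pattern):
        return tuple(pattern) in self.store
```

The memory cost is explicit here (one entry per stored pattern), which is exactly the yardstick role the abstract assigns to HARM.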
6.
ISTVAN S. N. BERKELEY, Connection Science, 1995, 7(2): 167-187
A particular backpropagation network, called a network of value units, was trained to detect problem type and validity of a set of logic problems. This network differs from standard networks in using a Gaussian activation function. After training was successfully completed, jittered density plots were computed for each hidden unit, and used to represent the distribution of activations produced in each hidden unit by the entire training set. The density plots revealed a marked banding. Further analysis revealed that almost all of these bands could be assigned featural interpretations, and played an important role in explaining how the network classified input patterns. These results are discussed in the context of other techniques for analyzing network structure, and in the context of other parallel distributed processing architectures.
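The contrast between a value unit's Gaussian activation and the monotone sigmoid of a standard backprop unit can be sketched as follows; the parameters mu and sigma are illustrative assumptions:

```python
import math

def gaussian_activation(net, mu=0.0, sigma=1.0):
    """Value-unit activation: peaks at net == mu and falls off on both
    sides, so the unit responds to a band of net input values."""
    return math.exp(-((net - mu) ** 2) / (2 * sigma ** 2))

def sigmoid(net):
    """Standard monotone activation for comparison."""
    return 1.0 / (1.0 + math.exp(-net))
```

Because the Gaussian responds to a band rather than a half-space, hidden-unit activations naturally cluster into the kind of bands the jittered density plots reveal.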
7.
JURGEN SCHMIDHUBER, Connection Science, 1989, 1(4): 403-412
Most known learning algorithms for dynamic neural networks in non-stationary environments need global computations to perform credit assignment. These algorithms are either not local in time or not local in space. Those algorithms which are local in both time and space usually cannot deal sensibly with 'hidden units'. In contrast, as far as we can judge, learning rules in biological systems with many 'hidden units' are local in both space and time. In this paper we propose a parallel on-line learning algorithm which performs local computations only, yet is still designed to deal with hidden units and with units whose past activations are 'hidden in time'. The approach is inspired by Holland's idea of the bucket brigade for classifier systems, which is transformed to run on a neural network with fixed topology. The result is a feedforward or recurrent 'neural' dissipative system which consumes 'weight-substance' and permanently tries to distribute this substance onto its connections in an appropriate way. Simple experiments demonstrating the feasibility of the algorithm are reported.
8.
PETER FLETCHER, Connection Science, 1992, 4(2): 125-141
Networks which add and remove nodes need a very clear idea of the role of each node and how it contributes to global performance. This paper presents two approaches to the theory of unsupervised node growth in self-configuring networks: the first sees the function of a node as constraining the probability distribution of dream states and the second sees the node as a non-terminal in a grammar describing the input patterns. I also present two ways of deciding whether to remove nodes. The first estimates the contribution of any desired set of nodes and may be applied to any number of sets simultaneously; the second is precise but may only be applied to one set of nodes at a time. Both are locally implementable. Finally, I discuss the feature language implicit in self-configuring networks and how this constrains what is learnable.
9.
10.
Production in Networks  (cited: 1; self-citations: 0; other citations: 1)
H.-P. Wiendahl, S. Lutz, CIRP Annals, 2002, 51(2): 573-586
New types of cooperation between companies in the manufacturing sector are coming into being. Since the creation of, and involvement in, supply chains is now standard practice for most companies, new forms of cooperation are emerging: production networks. The paper describes current developments in the field of production networks, along with techniques and methods for their operation and management.
11.
12.
Drawing on an engineering example from the Yunnan Copper smelting and processing plant, this paper explores the design of beam-to-column connections in steel platforms, and works out connection methods and design principles that are scientific, reasonable and suited to industrialized production. Adopting pinned (hinged) beam-to-column connections in the steel platform gives a clear load-transfer path and is simple, economical and feasible.
13.
JAAP M. J. MURRE, Connection Science, 1996, 8(2): 249-258
The Osgood surface for transfer in human associative learning (Osgood, 1949) is introduced. It describes the relationship between stimulus and response similarities and transfer of learning. In this paradigm, first a list A is learned, then a list B, followed by retesting on list A. Simulation results indicate that three-layer networks with backpropagation show not only 'catastrophic interference' but also 'hypertransfer'. Two-layer networks do not suffer from this. Hypertransfer is explained with reference to hidden-layer representations formed during learning. Since it cannot account for this very general trait of human behavior, backpropagation's role as a tool for models of human memory must be watched very carefully.
14.
In this paper, we investigate generalization in supervised feedforward Sigma-pi nets, with particular reference to means of augmenting the network's generalization for specific tasks. The work was initiated because logical (digital) neural networks of this type do not function in the same manner as the more usual semi-linear units; hence the general principle behind Sigma-pi networks' generalization required examination before means of augmenting their generalization abilities could be put forward. The paper studies four methods, two of which are novel methodologies for enhancing the generalization abilities of Sigma-pi networks. The networks are hardware-realizable, and the Sigma-pi units are logical (digital) nodes that respond to their input patterns at addressable locations; the locations (site-values) then define the probability of the output being a logical '1'. In this paper, we evaluate the performance of Sigma-pi nets on perceptual problems (in pattern recognition). This was carried out through comparative studies to evaluate how each of the methodologies improved the performance of these networks on previously unseen stimuli.
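The addressable-site behaviour described above (a binary input pattern selects one site, whose value gives the probability of outputting '1') can be sketched as a toy probabilistic node. The class name, learning rule and rate below are illustrative assumptions, not the paper's training scheme:

```python
import random

class SigmaPiNode:
    """Toy digital node: a binary input pattern addresses one site,
    and that site's value gives P(output = 1)."""

    def __init__(self, n_inputs, rng=None):
        # one site-value per addressable location, initially unbiased
        self.sites = [0.5] * (2 ** n_inputs)
        self.rng = rng or random.Random(0)

    def address(self, bits):
        addr = 0
        for b in bits:
            addr = (addr << 1) | b
        return addr

    def fire(self, bits):
        return 1 if self.rng.random() < self.sites[self.address(bits)] else 0

    def reinforce(self, bits, target, eta=0.2):
        # pull the addressed site-value towards the target probability
        a = self.address(bits)
        self.sites[a] += eta * (target - self.sites[a])
```

Because each input pattern addresses its own site, such a node memorizes the training patterns exactly; generalization to unseen stimuli has to come from elsewhere, which is why the augmentation methods the paper studies are needed.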
15.
16.
17.
This paper deals with the problem of variable binding in connectionist networks. Specifically, a more thorough solution to the variable binding problem based on the Discrete Neuron formalism is proposed and a number of issues arising in the solution are examined in relation to logic: consistency checking, binding generation, unification and functions. We analyze what is needed in order to resolve these issues and, based on this analysis, a procedure is developed for systematically setting up connectionist networks for variable binding based on logic rules. This solution compares favorably to similar solutions in simplicity and completeness.
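The unification and consistency-checking issues mentioned above can be illustrated with a minimal first-order unification routine over tuple terms (variables are strings beginning with '?'; the occurs check is omitted for brevity). This is a standard textbook sketch, not the connectionist procedure itself:

```python
def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def walk(t, s):
    """Follow the substitution chain until t is unbound or non-variable."""
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(a, b, s=None):
    """Return a substitution unifying terms a and b, or None on clash.
    Compound terms are tuples; variables are '?'-prefixed strings."""
    if s is None:
        s = {}
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            s = unify(x, y, s)
            if s is None:  # inconsistent bindings detected
                return None
        return s
    return None
```

A repeated variable bound to two different constants fails, which is exactly the consistency-checking requirement a connectionist binding scheme has to reproduce.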
18.
19.
We introduce a new connectionist paradigm which views neural networks as implementations of syntactic pattern recognition algorithms. Thus, learning is seen as a process of grammatical inference and recognition as a process of parsing. Naturally, the possible realizations of this theme are diverse; in this paper we present some initial explorations of the case where the pattern grammar is context-free, inferred (from examples) by a separate procedure, and then mapped onto a connectionist network. Unlike most neural networks for which structure is pre-defined, the resulting network has as many levels as are necessary and arbitrary connections between levels. Furthermore, by the addition of a delay element, the network becomes capable of dealing with time-varying patterns in a simple and efficient manner. Since grammatical inference algorithms are notoriously expensive computationally, we place an important restriction on the type of context-free grammars which can be inferred. This dramatically reduces complexity. The resulting grammars are called 'strictly-hierarchical' and map straightforwardly onto a temporal connectionist parser (TCP) using a relatively small number of neurons. The new paradigm is applicable to a variety of pattern-processing tasks such as speech recognition and character recognition. We concentrate here on hand-written character recognition; performance in other problem domains will be reported in future publications. Results are presented to illustrate the performance of the system with respect to a number of parameters, namely, the inherent variability of the data, the nature of the learning (supervised or unsupervised) and the details of the clustering procedure used to limit the number of non-terminals inferred. In each of these cases (eight in total), we contrast the performance of a stochastic and a non-stochastic TCP. The stochastic TCP does have greater powers of discrimination, but in many cases the results were very similar.
If this result holds in practical situations, it is important, because the non-stochastic version has a straightforward implementation in silicon.
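The parsing side of the paradigm can be illustrated with a standard CYK recognizer for a context-free grammar in Chomsky normal form. The toy grammar below (for the language a^n b^n) is an assumption for illustration, not a 'strictly-hierarchical' grammar from the paper:

```python
def cyk(word, grammar, start="S"):
    """CYK recognizer. grammar maps each nonterminal to a list of
    productions: either a terminal string or a (B, C) nonterminal pair."""
    n = len(word)
    if n == 0:
        return False
    # table[i][span] = set of nonterminals deriving word[i:i+span]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(word):
        for lhs, prods in grammar.items():
            if ch in prods:
                table[i][1].add(lhs)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            for split in range(1, span):
                for lhs, prods in grammar.items():
                    for prod in prods:
                        if (isinstance(prod, tuple)
                                and prod[0] in table[i][split]
                                and prod[1] in table[i + split][span - split]):
                            table[i][span].add(lhs)
    return start in table[0][n]
```

A stochastic variant would attach probabilities to productions and keep the best-scoring derivation in each cell instead of a bare membership set, which is the discrimination advantage the abstract attributes to the stochastic TCP.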