Similar Literature
20 similar documents found.
1.
This paper examines the inductive inference of a complex grammar with neural networks. Specifically, the task considered is that of training a network to classify natural language sentences as grammatical or ungrammatical, thereby exhibiting the same kind of discriminatory power provided by the Principles and Parameters linguistic framework, or Government-and-Binding theory. Neural networks are trained, without the division into learned vs. innate components assumed by Chomsky (1956), in an attempt to produce the same judgments as native speakers on sharply grammatical/ungrammatical data. How a recurrent neural network could possess linguistic capability is discussed, along with the properties of various common recurrent neural network architectures. The problem exhibits training behavior not typically present with smaller grammars, and training was initially difficult. However, after implementing several techniques aimed at improving the convergence of the gradient-descent backpropagation-through-time training algorithm, significant learning was possible. Certain architectures were found to be better able to learn an appropriate grammar. The operation of the networks and their training is analyzed. Finally, the extraction of rules in the form of deterministic finite-state automata is investigated.
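To make the setup concrete, here is a minimal sketch (not the authors' architecture) of a recurrent network that reads a sentence as a sequence of syntactic tags and emits a grammaticality score; the sizes, the one-hot tag encoding, and the random weights are illustrative assumptions, and the backpropagation-through-time training step is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 8 part-of-speech tags, 12 hidden (state) units.
n_in, n_hid = 8, 12
W_in  = rng.normal(0, 0.1, (n_hid, n_in))   # input -> hidden
W_rec = rng.normal(0, 0.1, (n_hid, n_hid))  # hidden -> hidden (recurrence)
w_out = rng.normal(0, 0.1, n_hid)           # hidden -> grammaticality score

def classify(tag_ids):
    """Read a sentence (sequence of tag ids) and return P(grammatical)."""
    h = np.zeros(n_hid)
    for t in tag_ids:
        x = np.eye(n_in)[t]                  # one-hot encode the tag
        h = np.tanh(W_in @ x + W_rec @ h)    # Elman-style state update
    return 1.0 / (1.0 + np.exp(-w_out @ h))  # sigmoid read-out

print(classify([0, 3, 5, 2]))  # untrained net: output near 0.5
```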

2.
Determining the architecture of a neural network is an important issue for any learning task. For recurrent neural networks no general methods exist that permit the estimation of the number of layers of hidden neurons, the size of layers or the number of weights. We present a simple pruning heuristic that significantly improves the generalization performance of trained recurrent networks. We illustrate this heuristic by training a fully recurrent neural network on positive and negative strings of a regular grammar. We also show that rules extracted from networks trained with this pruning heuristic are more consistent with the rules to be learned. This performance improvement is obtained by pruning and retraining the networks. Simulations are shown for training and pruning a recurrent neural net on strings generated by two regular grammars, a randomly-generated 10-state grammar and an 8-state, triple-parity grammar. Further simulations indicate that this pruning method can have generalization performance superior to that obtained by training with weight decay.
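A minimal sketch of a magnitude-based pruning step of this general kind (the paper's exact heuristic and retraining schedule may differ; the 10% cutoff is an assumption):

```python
import numpy as np

def prune_smallest(W, frac=0.1):
    """Zero out the fraction `frac` of nonzero weights with smallest magnitude."""
    mags = np.abs(W[W != 0])
    if mags.size == 0:
        return W
    k = max(1, int(frac * mags.size))
    cutoff = np.sort(mags)[k - 1]
    Wp = W.copy()
    Wp[np.abs(Wp) <= cutoff] = 0.0
    return Wp

rng = np.random.default_rng(0)
W = rng.normal(0, 1, (10, 10))      # recurrent weight matrix
W = prune_smallest(W)               # prune ...
# ... then retrain the surviving weights, and repeat while the
# validation error on grammar strings keeps improving.
print(np.count_nonzero(W), "weights remain of 100")
```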

3.
There has been an increased interest in combining fuzzy systems with neural networks because fuzzy neural systems merge the advantages of both paradigms. On the one hand, parameters in fuzzy systems have clear physical meanings, and rule-based and linguistic information can be incorporated into adaptive fuzzy systems in a systematic way. On the other hand, there exist powerful algorithms for training various neural network models. However, most of the proposed combined architectures are only able to process static input-output relationships; they are not able to process temporal input sequences of arbitrary length. Fuzzy finite-state automata (FFAs) can model dynamical processes whose current state depends on the current input and previous states. Unlike deterministic finite-state automata (DFAs), FFAs are not in one particular state; rather, each state is occupied to some degree defined by a membership function. Based on previous work on encoding DFAs in discrete-time second-order recurrent neural networks, we propose an algorithm that constructs an augmented recurrent neural network that encodes an FFA and recognizes a given fuzzy regular language with arbitrary accuracy. We then empirically verify the encoding methodology by correct string recognition of randomly generated FFAs. In particular, we examine how the networks' performance varies as a function of synaptic weight strengths.
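For intuition about what an FFA computes, here is a hypothetical 3-state automaton propagated by max-min composition, one standard fuzzy-transition semantics; the transition memberships and the composition rule are illustrative assumptions, not the paper's neural encoding:

```python
import numpy as np

# Hypothetical 3-state FFA over a 2-symbol alphabet.
# T[a][i, j] = membership degree of the transition i --a--> j.
T = {0: np.array([[0.9, 0.1, 0.0],
                  [0.0, 0.8, 0.2],
                  [0.3, 0.0, 0.7]]),
     1: np.array([[0.2, 0.8, 0.0],
                  [0.5, 0.0, 0.5],
                  [0.0, 0.4, 0.6]])}

def run_ffa(string, mu0):
    """Propagate the fuzzy state-membership vector with max-min composition."""
    mu = mu0
    for a in string:
        # new membership of state j: max over i of min(mu[i], T[a][i, j])
        mu = np.max(np.minimum(mu[:, None], T[a]), axis=0)
    return mu

print(run_ffa([0, 1, 1], np.array([1.0, 0.0, 0.0])))
```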

4.
Neural networks do not readily provide an explanation of the knowledge stored in their weights as part of their information processing; until recently, they were considered to be black boxes. Since then, research has produced a number of algorithms for extracting knowledge in symbolic form from trained neural networks. This article addresses the extraction of knowledge in symbolic form from recurrent neural networks trained to behave like deterministic finite-state automata (DFAs). To date, methods used to extract knowledge from such networks have relied on the hypothesis that networks' states tend to cluster and that clusters of network states correspond to DFA states. The computational complexity of such a cluster analysis has led to heuristics that either limit the number of clusters that may form during training or limit the exploration of the space of hidden recurrent state neurons. These limitations, while necessary, may lead to decreased fidelity, in which the extracted knowledge may not model the true behavior of a trained network, perhaps not even on the training set. The method proposed here uses a polynomial-time symbolic learning algorithm to infer DFAs solely from the observation of a trained network's input-output behavior. Thus, this method has the potential to increase the fidelity of the extracted knowledge.
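The following is a simplified, signature-based sketch of the underlying idea, inferring a DFA purely from a learner's accept/reject answers. It is not the paper's polynomial-time algorithm; an even-parity oracle stands in for the trained network, and the alphabet and query depth are assumptions:

```python
from itertools import product

def net_accepts(s):                     # stand-in for querying the trained net
    return s.count('1') % 2 == 0        # pretend it learned even parity

SIGMA, DEPTH = '01', 3

def signature(prefix):
    """Accept/reject behavior on every continuation up to DEPTH symbols:
    two prefixes with equal signatures are treated as the same DFA state."""
    suffixes = [''.join(p) for n in range(DEPTH + 1)
                for p in product(SIGMA, repeat=n)]
    return tuple(net_accepts(prefix + s) for s in suffixes)

states, trans = {}, {}
for n in range(DEPTH + 1):
    for p in map(''.join, product(SIGMA, repeat=n)):
        src = states.setdefault(signature(p), len(states))
        for a in SIGMA:
            dst = states.setdefault(signature(p + a), len(states))
            trans[(src, a)] = dst

print(len(states), "states inferred")   # 2 states for even parity
```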

5.
Arithmetic coding is one of the most outstanding techniques for lossless data compression. It attains its good performance with the help of a probability model which indicates at each step the probability of occurrence of each possible input symbol given the current context. The better this model, the greater the compression ratio achieved. This work analyses the use of discrete-time recurrent neural networks and their capability for predicting the next symbol in a sequence in order to implement that model. The focus of this study is on online prediction, a task much harder than the classical offline grammatical inference with neural networks. The results obtained show that recurrent neural networks have no problem when the sequences come from the output of a finite-state machine, easily giving high compression ratios. When compressing real texts, however, the dynamics of the sequences seem to be too complex to be learned online correctly by the net.
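The interface between the predictor and the coder can be sketched as follows: a toy untrained RNN (all sizes and weights are illustrative assumptions) emits a next-symbol distribution, and the cost an ideal arithmetic coder pays for a symbol s predicted with probability p[s] is -log2 p[s] bits:

```python
import numpy as np

rng = np.random.default_rng(1)
n_hid, n_sym = 8, 2                       # illustrative sizes
W = rng.normal(0, .5, (n_hid, n_hid))     # recurrent weights
U = rng.normal(0, .5, (n_hid, n_sym))     # input weights
V = rng.normal(0, .5, (n_sym, n_hid))     # output (prediction) weights

def step(h, s):
    """Consume symbol s; return the model's distribution for the NEXT symbol."""
    h = np.tanh(W @ h + U[:, s])
    logits = V @ h
    p = np.exp(logits - logits.max())
    return p / p.sum(), h

h, p, bits = np.zeros(n_hid), np.ones(n_sym) / n_sym, 0.0
for s in [0, 1, 1, 0, 1]:
    bits += -np.log2(p[s])                # code symbol s under the prediction
    p, h = step(h, s)                     # then update the model online
print(f"{bits:.2f} bits for 5 symbols")
```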

6.
Financial forecasting is an example of a signal processing problem which is challenging due to small sample sizes, high noise, non-stationarity, and non-linearity. Neural networks have been very successful in a number of signal processing applications. We discuss fundamental limitations and inherent difficulties when using neural networks for the processing of high noise, small sample size signals. We introduce a new intelligent signal processing method which addresses the difficulties. The method proposed uses conversion into a symbolic representation with a self-organizing map, and grammatical inference with recurrent neural networks. We apply the method to the prediction of daily foreign exchange rates, addressing difficulties with non-stationarity, overfitting, and unequal a priori class probabilities, and we find significant predictability in comprehensive experiments covering 5 different foreign exchange rates. The method correctly predicts the direction of change for the next day with an error rate of 47.1%. The error rate reduces to around 40% when rejecting examples where the system has low confidence in its prediction. We show that the symbolic representation aids the extraction of symbolic knowledge from the trained recurrent neural networks in the form of deterministic finite state automata. These automata explain the operation of the system and are often relatively simple. Automata rules related to well known behavior such as trend following and mean reversal are extracted.
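The symbolic-conversion stage can be illustrated with a toy one-dimensional self-organizing map that quantizes log-returns into a small alphabet for the recurrent network; the map size, learning rate, and data below are assumptions, not the paper's configuration:

```python
import numpy as np

def train_som_1d(x, n_nodes=3, epochs=20, lr=0.2):
    """Tiny 1-D self-organizing map: quantizes scalar returns into symbols."""
    rng = np.random.default_rng(0)
    w = rng.choice(x, n_nodes).astype(float)   # codebook initialization
    for e in range(epochs):
        radius = max(1.0 - e / epochs, 0.05) * n_nodes / 2
        for v in rng.permutation(x):
            bmu = np.argmin(np.abs(w - v))     # best-matching unit
            d = np.abs(np.arange(n_nodes) - bmu)
            w += lr * np.exp(-(d / radius) ** 2) * (v - w)
    return np.sort(w)

returns = np.diff(np.log([1.10, 1.12, 1.11, 1.15, 1.13, 1.16]))
codebook = train_som_1d(returns)
symbols = [int(np.argmin(np.abs(codebook - r))) for r in returns]
print(symbols)   # the symbolic sequence handed to the recurrent net
```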

7.
We explore a network architecture introduced by Elman (1990) for predicting successive elements of a sequence. The network uses the pattern of activation over a set of hidden units from time step t-1, together with element t, to predict element t+1. When the network is trained with strings from a particular finite-state grammar, it can learn to be a perfect finite-state recognizer for the grammar. When the network has a minimal number of hidden units, patterns on the hidden units come to correspond to the nodes of the grammar; however, this correspondence is not necessary for the network to act as a perfect finite-state recognizer. Next, we provide a detailed analysis of how the network acquires its internal representations. Using a probability analysis, we show that the network progressively encodes more and more temporal context. Finally, we explore the conditions under which the network can carry information about distant sequential contingencies across intervening elements. Such information is maintained with relative ease if it is relevant at each intermediate step; it tends to be lost when intervening elements do not depend on it. At first glance this may suggest that such networks are not relevant to natural language, in which dependencies may span indefinite distances. However, embeddings in natural language are not completely independent of earlier information. The final simulation shows that long-distance sequential contingencies can be encoded by the network even if only subtle statistical properties of embedded strings depend on the early information. The network encodes long-distance dependencies by shading the internal representations that are responsible for processing common embeddings in otherwise different sequences. This ability to represent simultaneously similarities and differences between several sequences relies on the graded nature of the representations used by the network, which contrast with the finite states of traditional automata. For this reason, the network and other similar architectures may be called Graded State Machines.

8.
Knowledge-Based Systems, 2005, 18(4-5): 135-141
This paper presents a novel connectionist memory-rule based model capable of learning the finite-state properties of an input language from a set of positive examples. The model is based upon an unsupervised recurrent self-organizing map with laterally interconnected neurons. A derivation of functional-equivalence theory is used that allows the model to exploit similarities between the future context of previously memorized sequences and the future context of the current input sequence. This bottom-up learning algorithm binds functionally related neurons together to form states. Results show that the model is able to learn the Reber grammar perfectly from a randomly generated training set and to generalize to sequences beyond the length of those found in the training set.

9.
Fuzzy neural systems have been a subject of great interest in the last few years, due to their ability to facilitate the exchange of information between symbolic and subsymbolic domains. However, the models in the literature are not able to deal with the structured organization of information that is typically required by symbolic processing. In many application domains, the patterns are not only structured, but a fuzziness degree is also attached to each subsymbolic pattern primitive. The purpose of this paper is to show how recursive neural networks, properly conceived for dealing with structured information, can represent nondeterministic fuzzy frontier-to-root tree automata. Whereas available prior knowledge expressed in terms of fuzzy state transition rules is injected into a recursive network, unknown rules are supposed to be filled in by data-driven learning. We also prove the stability of the encoding algorithm, extending previous results on the injection of fuzzy finite-state dynamics in high-order recurrent networks.

10.
We present a training approach using concepts from the theory of stochastic learning automata that eliminates the need for the computation of gradients. This approach also offers the flexibility of tailoring a number of specific training algorithms based on the selection of linear and nonlinear reinforcement rules for updating automaton action probabilities. The training efficiency is demonstrated by application to two complex temporal learning scenarios, viz., learning of time-dependent continuous trajectories and feedback controller designs for continuous dynamical plants. For the first problem, it is shown that training algorithms can be tailored following the present approach for a recurrent neural net to learn to generate a benchmark circular trajectory more accurately than is possible with existing gradient-based training procedures. For the second problem, it is shown that recurrent neural-network-based feedback controllers can be trained for different control objectives.
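As one concrete instance of such a reinforcement rule, here is the classical linear reward-inaction (L_RI) update sketched on a toy 4-action environment; the environment and the parameter values are stand-ins, not the paper's experiments:

```python
import numpy as np

def lri_update(p, action, reward, a=0.1):
    """Linear reward-inaction (L_RI) rule: on success, shift probability
    mass toward the chosen action; on failure, leave p unchanged."""
    if reward:
        p = p * (1 - a)
        p[action] += a
    return p

rng = np.random.default_rng(0)
p = np.ones(4) / 4                      # e.g. 4 candidate weight perturbations
for _ in range(200):
    act = rng.choice(4, p=p)
    reward = act == 2                   # stand-in: environment favors action 2
    p = lri_update(p, act, reward)
print(p.round(3))                       # mass concentrates on action 2
```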

11.
Grammatical inference has been extensively studied in recent years as a result of its wide field of application, and in turn, recurrent neural networks have proved themselves to be a good tool for grammatical inference. The learning algorithms for these neural networks, however, have been far less studied than those for feed-forward neural networks. Classical training methods for recurrent neural networks suffer from becoming trapped in local minima and from high computational cost. In addition, selecting the optimal size of a neural network for a particular application is a difficult task. This suggests that methods for determining optimal topologies and new training algorithms should be studied. In this paper, we present a multi-objective evolutionary algorithm which is able to determine the optimal size of recurrent neural networks for any particular application. This is analyzed in particular for the case of grammatical inference: we study how to establish the optimal size of a recurrent neural network in order to learn positive and negative examples of a certain language, and how to determine the corresponding automaton using a self-organizing map once training has been completed.
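The selection core of any multi-objective approach can be illustrated by Pareto filtering of (validation error, network size) pairs; the candidate values below are invented, and the full evolutionary loop (variation plus evaluation by training) is omitted:

```python
import numpy as np

def pareto_front(points):
    """Indices of non-dominated points (both objectives minimized)."""
    idx = []
    for i, p in enumerate(points):
        dominated = any(np.all(q <= p) and np.any(q < p) for q in points)
        if not dominated:
            idx.append(i)
    return idx

# Hypothetical candidates: (validation error, number of hidden neurons).
candidates = np.array([[0.30, 2], [0.12, 4], [0.11, 8], [0.05, 12], [0.05, 20]])
print([tuple(candidates[i]) for i in pareto_front(candidates)])
# (0.05, 20) is dominated by (0.05, 12): same error, larger network
```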

12.
We investigate possibilities of inducing temporal structures without fading memory in recurrent networks of spiking neurons strictly operating in the pulse-coding regime. We extend the existing gradient-based algorithm for training feedforward spiking neuron networks, SpikeProp (Bohte, Kok, & La Poutré, 2002), to recurrent network topologies, so that temporal dependencies in the input stream are taken into account. It is shown that temporal structures with unbounded input memory specified by simple Moore machines (MM) can be induced by recurrent spiking neuron networks (RSNN). The networks are able to discover pulse-coded representations of abstract information processing states coding potentially unbounded histories of processed inputs. We show that it is often possible to extract from a trained RSNN the target MM by grouping together similar spike trains appearing in the recurrent layer. Even when the target MM was not perfectly induced in an RSNN, the extraction procedure was able to reveal weaknesses of the induced mechanism and the extent to which the target machine had been learned.

13.
The paper focuses on methods for injecting prior knowledge into adaptive recurrent networks for sequence processing. In order to increase the flexibility needed for specifying partially known rules, a nondeterministic approach for modeling domain knowledge is proposed. The algorithms presented in the paper allow time-warping nondeterministic automata to be mapped into recurrent architectures with first-order connections. These kinds of automata are suitable for modeling temporal scale distortions in data such as the acoustic sequences occurring in speech recognition. The algorithms output a recurrent architecture and a feasible region in the connection weight space. It is demonstrated that, as long as the weights are constrained to the feasible region, the nondeterministic rules introduced as prior knowledge are not destroyed by learning. The paper focuses primarily on architectural issues, but the proposed method allows the connection weights to be subsequently tuned to adapt the behavior of the network to data.
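The weight-constraint mechanism can be sketched as a projection step after each gradient update; the interval-shaped feasible region below is an illustrative assumption (the paper derives the actual region from the injected automaton):

```python
import numpy as np

def project_to_feasible(W, lo, hi):
    """Clip each connection weight back into its feasible interval so the
    automaton rules encoded by the architecture are preserved."""
    return np.minimum(np.maximum(W, lo), hi)

# Hypothetical feasible region from the rule-injection step: each weight
# must stay within [lo, hi] entrywise for the encoded transitions to hold.
rng = np.random.default_rng(0)
W  = rng.uniform(1.0, 2.0, (4, 4))
lo = np.full((4, 4), 0.5)
hi = np.full((4, 4), 2.5)

grad = rng.normal(0, 1, (4, 4))          # gradient from data-driven learning
W = project_to_feasible(W - 0.1 * grad, lo, hi)
assert np.all((W >= lo) & (W <= hi))     # rules survive the update
```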

14.
Recurrent neural networks processing symbolic strings can be regarded as adaptive neural parsers. Given a set of positive and negative examples drawn from a given language, adaptive neural parsers can effectively be trained to infer the language's grammar. In this paper we use adaptive neural parsers to address the problem of inferring grammars from examples that are corrupted by a kind of noise that simply changes their membership. We propose a training algorithm, referred to as the hybrid finite-state filter, which is based on a parsimony principle that penalizes the development of complex rules. We report very promising experimental results showing that the proposed inductive inference scheme is indeed capable of capturing rules while removing noise.
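A minimal sketch of a parsimony-penalized objective of the kind described; the L2 complexity measure and the penalty weight are assumptions, not the paper's hybrid finite-state filter:

```python
import numpy as np

def penalized_loss(errors, W, lam=0.01):
    """Data term plus a parsimony penalty: complex weight configurations,
    which tend to encode spurious rules fitted to mislabeled strings,
    are discouraged. The L2 complexity measure is an assumption."""
    return np.mean(errors ** 2) + lam * np.sum(W ** 2)

rng = np.random.default_rng(0)
errors = rng.normal(0, 0.1, 50)          # per-string parsing errors
W = rng.normal(0, 1, (8, 8))             # parser (network) weights
print(penalized_loss(errors, W))
```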

15.
To address the shortcomings of current methods for the automatic detection of errors in Tibetan text, this paper proposes an automatic error-detection method that combines rules and statistics. First, based on the spelling grammar of Tibetan and drawing on formal language and automata theory, 37 deterministic finite automata are constructed to recognize modern Tibetan syllables; next, Sanskrit-transliterated Tibetan syllables are recognized by dictionary lookup; finally, statistical measures such as mutual information and the t-test difference are used to detect real-word errors such as Tibetan collocation errors and grammatical errors, achieving automatic error detection for Tibetan text...
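The mutual-information half of the statistical stage can be sketched as follows (on English tokens for readability; the paper applies it to Tibetan, and its t-test-difference measure and automaton-based spelling checks are omitted here):

```python
import math
from collections import Counter

def bigram_pmi(tokens):
    """Pointwise mutual information of adjacent word pairs; pairs with
    unusually low PMI are candidate collocation (real-word) errors."""
    uni, bi, n = Counter(tokens), Counter(zip(tokens, tokens[1:])), len(tokens)
    return {pair: math.log2(c * n / (uni[pair[0]] * uni[pair[1]]))
            for pair, c in bi.items()}

corpus = "the cat sat on the mat the cat ran".split()
scores = bigram_pmi(corpus)
# Pairs below a threshold tuned on a real corpus would be flagged:
for pair, s in sorted(scores.items(), key=lambda kv: kv[1])[:3]:
    print(pair, round(s, 2))
```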

16.
Extracting rules from trained neural networks
Presents an algorithm for extracting rules from trained neural networks. The algorithm is a decompositional approach which can be applied to any neural network whose output function is monotone such as a sigmoid function. Therefore, the algorithm can be applied to multilayer neural networks, recurrent neural networks and so on. It does not depend on training algorithms, and its computational complexity is polynomial. The basic idea is that the units of neural networks are approximated by Boolean functions. But the computational complexity of the approximation is exponential, and so a polynomial algorithm is presented. The author has applied the algorithm to several problems to extract understandable and accurate rules. The paper shows the results for the votes data, mushroom data, and others. The algorithm is extended to the continuous domain, where extracted rules are continuous Boolean functions. Roughly speaking, the representation by continuous Boolean functions means the representation using conjunction, disjunction, direct proportion, and reverse proportion. This paper shows the results for iris data.
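The "basic idea" the abstract mentions can be sketched directly: because the sigmoid is monotone, a unit with binary inputs is "true" exactly where w·x + b > 0, so enumerating inputs yields its Boolean approximation. The enumeration below is the exponential step; the paper's contribution is a polynomial algorithm, not shown here:

```python
import numpy as np
from itertools import product

def unit_to_boolean(w, b):
    """Approximate a sigmoid unit with binary inputs by a Boolean function:
    the unit is 'true' on inputs where its activation exceeds 0.5, i.e.
    where w.x + b > 0 (valid because the sigmoid is monotone)."""
    return {x: int(np.dot(w, x) + b > 0)
            for x in product([0, 1], repeat=len(w))}

# A hypothetical unit computing roughly "x0 AND x1":
print(unit_to_boolean(np.array([3.0, 3.0]), -4.0))
```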

17.
Recurrent neural networks have been successfully used for the analysis and prediction of temporal sequences. This paper is concerned with the convergence of a gradient-descent learning algorithm for training a fully recurrent neural network. In the literature, stochastic process theory has been used to establish some convergence results of a probabilistic nature for the on-line gradient training algorithm, based on the assumption that a very large number of (or, in theory, infinitely many) training samples of the temporal sequences are available. In this paper, we consider the case where only a limited number of training samples of the temporal sequences are available, so that the stochastic treatment of the problem is no longer appropriate. Instead, we use an off-line gradient training algorithm for the fully recurrent neural network, and we accordingly prove some convergence results of a deterministic nature. The monotonicity of the error function over the iterations is also guaranteed. A numerical example is given to support the theoretical findings.
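The deterministic setting can be illustrated with a batch gradient loop plus a backtracking safeguard that enforces the monotone error decrease the paper proves (the paper establishes this via learning-rate conditions, not backtracking); the scalar error surface is a stand-in for the recurrent network's total error over the limited training set:

```python
def train_monotone(E, grad, w, lr=0.5, steps=30):
    """Off-line (batch) gradient descent with a step-size safeguard that
    enforces a monotone decrease of the error."""
    errs = [E(w)]
    for _ in range(steps):
        step = lr
        while E(w - step * grad(w)) > errs[-1]:
            step *= 0.5                   # backtrack until error decreases
        w = w - step * grad(w)
        errs.append(E(w))
    return w, errs

# Stand-in error surface and its gradient:
E = lambda w: (w ** 2 - 1) ** 2
grad = lambda w: 4 * w * (w ** 2 - 1)
w, errs = train_monotone(E, grad, w=2.0)
assert all(a >= b for a, b in zip(errs, errs[1:]))   # monotone error
print(round(w, 4), round(errs[-1], 6))
```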

18.
In this Letter, a new approach to building a neural model for the fast identification of spatiotemporal sequences is proposed. The model, the Stochastic Neural Sequence Identifier (SNSI), is simple and rapidly learns and identifies a given sequence. The SNSI receives as input several patterns belonging to a particular spatiotemporal sequence and produces as output a label for the identified sequence and a probability that this classification is correct. The SNSI is able to identify a sequence from patterns learned during training or from novel ones, i.e., combinations of the sequence items distinct from those belonging to the training set. The SNSI was tested on a 2D set of both closed and open trajectories with varying levels of complexity. The results suggest that the SNSI is able to recognize all the patterns presented in training and most of the novel patterns used for testing.

19.
Concerns the effect of noise on the performance of feedforward neural nets. We introduce and analyze various methods of injecting synaptic noise into dynamically driven recurrent nets during training. Theoretical results show that applying a controlled amount of noise during training may improve convergence and generalization performance. We analyze the effects of various noise parameters and predict that best overall performance can be achieved by injecting additive noise at each time step. Noise contributes a second-order gradient term to the error function which can be viewed as an anticipatory agent to aid convergence. This term appears to find promising regions of weight space in the beginning stages of training when the training error is large and should improve convergence on error surfaces with local minima. The first-order term is a regularization term that can improve generalization. Specifically, it can encourage internal representations where the state nodes operate in the saturated regions of the sigmoid discriminant function. While this effect can improve performance on automata inference problems with binary inputs and target outputs, it is unclear what effect it will have on other types of problems. To substantiate these predictions, we present simulations on learning the dual parity grammar from temporal strings for all noise models, and present simulations on learning a randomly generated six-state grammar using the predicted best noise model.
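The predicted best model, additive synaptic noise drawn afresh at each time step, can be sketched as follows; the sizes, noise scale, and weights are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_step(W, h, x, sigma=0.05):
    """One recurrent step with additive synaptic noise: a fresh perturbation
    of the weights is drawn at every time step."""
    W_noisy = W + rng.normal(0.0, sigma, W.shape)
    return np.tanh(W_noisy @ np.concatenate([h, x]))

n_hid, n_in = 6, 2
W = rng.normal(0, 0.3, (n_hid, n_hid + n_in))
h = np.zeros(n_hid)
for x in np.eye(n_in)[[0, 1, 1, 0]]:     # a short binary input string
    h = noisy_step(W, h, x)              # noise applied during training only
print(h.round(3))
```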

20.
Applied Soft Computing, 2007, 7(1): 353-363
For a supervised learning method, the quality of the training data or of the training supervisor is very important for generating reliable neural networks. However, for real-world problems it is not always easy to obtain high-quality training data sets. In this research, we propose a learning method for a neural network ensemble model that can be trained with an imperfect training data set, i.e., a data set containing erroneous training samples. With a competitive training mechanism, the ensemble is able to exclude erroneous samples from the training process, thus generating a reliable neural network. Through experiments, we show that the proposed model is able to tolerate the existence of erroneous training samples while generating a reliable neural network. The ability to tolerate erroneous samples in the training data lessens the costly task of analyzing and arranging the training data, thus increasing the usability of neural networks for real-world problems.
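A toy analogue of training with an imperfect data set: per-sample losses are used to exclude the worst-fitting fraction each epoch, a simplified stand-in for the paper's competitive mechanism; all sizes, rates, and the 10% cutoff are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy imperfect training set: 100 points, 10 deliberately mislabeled.
X = rng.uniform(-1, 1, (100, 1))
y = (X[:, 0] > 0).astype(float)
bad = rng.choice(100, 10, replace=False)
y[bad] = 1 - y[bad]

W = rng.normal(0, 1, (2, 1))             # two logistic "ensemble members"
b = np.zeros(2)
def pred(k, Xs):
    return 1 / (1 + np.exp(-(Xs @ W[k] + b[k])))

for _ in range(200):
    ens = np.stack([pred(k, X) for k in range(2)]).mean(axis=0)
    per_sample = (ens - y) ** 2
    keep = per_sample < np.percentile(per_sample, 90)   # drop worst 10%
    for k in range(2):                   # members train on retained samples only
        p = pred(k, X[keep])
        g = (p - y[keep]) * p * (1 - p) / keep.sum()
        W[k] -= 0.5 * X[keep].T @ g
        b[k] -= 0.5 * g.sum()

ens = np.stack([pred(k, X) for k in range(2)]).mean(axis=0)
print("ensemble accuracy vs. clean labels:",
      np.mean((ens > 0.5) == (X[:, 0] > 0)))
```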
