1.
A neural-network forward-propagation algorithm was implemented on the GPU using the CUDA architecture. The algorithm exploits the parallelism of the neuron computations within each layer of the network: one kernel function computes the values of a layer's neurons in parallel, and each kernel is optimized according to the characteristics of the neural network and of the CUDA architecture. Experiments show that the algorithm runs about 7 times faster than a conventional CPU implementation. The results are a useful reference both for accelerating neural-network computation and for judging where CUDA is applicable.
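As an illustration of the layer-at-a-time parallelism described above, here is a minimal NumPy sketch in which each layer's neurons are evaluated in one vectorized step, the role played by one CUDA kernel launch per layer in the paper; the network shape, sigmoid activation, and random weights are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def forward(x, weights, biases):
    """Layer-by-layer forward pass. All neurons within a layer are
    evaluated at once, which is the per-layer parallelism the CUDA
    version maps onto one kernel launch per layer."""
    a = x
    for W, b in zip(weights, biases):
        a = 1.0 / (1.0 + np.exp(-(W @ a + b)))  # sigmoid activation
    return a

# Tiny example: 4 inputs, one hidden layer of 8 neurons, 3 outputs.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((8, 4)), rng.standard_normal((3, 8))]
biases = [np.zeros(8), np.zeros(3)]
print(forward(rng.standard_normal(4), weights, biases))
```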
2.
Masoud Yaghini, Mohammad M. Khoshraftar, Mehdi Fallahi 《Engineering Applications of Artificial Intelligence》2013,26(1):293-301
Artificial neural network (ANN) training is one of the major challenges in using an ANN-based prediction model. Gradient-based algorithms are the most common training algorithms but suffer from several drawbacks. This paper presents a method for training ANNs that combines the strengths of metaheuristics and greedy gradient-based algorithms, yielding a hybrid of improved opposition-based particle swarm optimization (PSO) and back-propagation with a momentum term. Opposition-based learning and random perturbation help diversify the population during iteration, time-varying parameters improve the search ability of standard PSO, and a constriction factor guarantees particle convergence. Since the weight space may contain many local minima, a new cross-validation method is proposed to prevent overfitting. The effectiveness and efficiency of the proposed method are compared with several well-known ANN training algorithms on various benchmark problems.
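The abstract's two signature ingredients, opposition-based learning and Clerc's constriction factor, can be sketched in a few lines of Python. The sphere objective, population size, and iteration count below are arbitrary stand-ins, and the paper's back-propagation refinement, random perturbation, and time-varying parameters are omitted:

```python
import numpy as np

rng = np.random.default_rng(1)

def sphere(x):                              # toy objective, not from the paper
    return float(np.sum(x * x))

lo, hi, n, dim = -5.0, 5.0, 20, 4
pos = rng.uniform(lo, hi, (n, dim))

# Opposition-based initialization: for each particle x, also evaluate
# its "opposite" lo + hi - x and keep whichever scores better.
opp = lo + hi - pos
better = np.array([sphere(o) < sphere(p) for o, p in zip(opp, pos)])
pos = np.where(better[:, None], opp, pos)

vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_f = np.array([sphere(p) for p in pbest])
gbest = pbest[np.argmin(pbest_f)]
phi = 2.05 + 2.05                           # c1 + c2 > 4 for constriction
chi = 2.0 / abs(2.0 - phi - (phi * phi - 4.0 * phi) ** 0.5)  # Clerc's factor
for t in range(200):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = chi * (vel + 2.05 * r1 * (pbest - pos) + 2.05 * r2 * (gbest - pos))
    pos = np.clip(pos + vel, lo, hi)
    f = np.array([sphere(p) for p in pos])
    improved = f < pbest_f
    pbest[improved], pbest_f[improved] = pos[improved], f[improved]
    gbest = pbest[np.argmin(pbest_f)]
print(gbest, pbest_f.min())
```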
3.
This paper presents a constructive algorithm for training cooperative neural-network ensembles (CNNEs). CNNE combines ensemble architecture design with cooperative training of the individual neural networks (NNs) in an ensemble. Unlike most previous studies on training ensembles, CNNE emphasizes both accuracy and diversity among the individual NNs. To maintain accuracy, the number of hidden nodes in each individual NN is also determined by a constructive approach. Incremental training based on negative correlation is used to train the individual NNs for different numbers of training epochs; the use of negative correlation learning and of different training epochs reflects CNNE's emphasis on diversity among individual NNs in an ensemble. CNNE has been tested extensively on a number of benchmark problems in machine learning and neural networks, including the Australian credit card assessment, breast cancer, diabetes, glass, heart disease, letter recognition, soybean, and Mackey-Glass time-series prediction problems. The experimental results show that CNNE can produce NN ensembles with good generalization ability.
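Negative correlation learning, the diversity mechanism CNNE builds on, penalizes each member for agreeing with the ensemble mean. A minimal sketch of the per-member loss (Liu-Yao style) follows; the predictions, targets, and penalty weight lam are made-up inputs, and CNNE's constructive growth of nodes and networks is not shown:

```python
import numpy as np

def ncl_losses(preds, y, lam=0.5):
    """Negative-correlation losses for an ensemble.

    preds: (M, N) predictions of M ensemble members on N samples.
    Each member's loss adds a penalty that rewards deviating from
    the ensemble mean, pushing members toward diverse errors."""
    f_bar = preds.mean(axis=0)                  # ensemble output
    mse = 0.5 * (preds - y) ** 2                # accuracy term
    penalty = -lam * (preds - f_bar) ** 2       # diversity term
    return (mse + penalty).mean(axis=1)         # per-member loss

preds = np.array([[0.9, 0.2, 0.4],
                  [0.8, 0.1, 0.6],
                  [0.7, 0.3, 0.5]])
y = np.array([1.0, 0.0, 0.5])
print(ncl_losses(preds, y))
```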
4.
This note presents a new method of parameter estimation, called cascading, for use in adaptive control. The algorithm is shown to be superior to a simple recursive least-squares estimator, especially for a system characterized by noisy measurements. The algorithm can be implemented easily on a parallel processor such as ORAC [1], [2] or on any sequential processor. When implemented on a parallel processor such as ORAC, the real time used to compute the parameter estimates is of the same order as that of a recursive least-squares estimator.
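The cascading algorithm itself is not specified in the abstract, but the recursive least-squares estimator it is benchmarked against is standard. A minimal sketch, with a toy two-parameter system and noise level chosen purely for illustration:

```python
import numpy as np

def rls_update(theta, P, x, y, lam=1.0):
    """One recursive least-squares step (the baseline estimator the
    note compares against); lam is an exponential forgetting factor."""
    Px = P @ x
    k = Px / (lam + x @ Px)              # gain vector
    theta = theta + k * (y - x @ theta)  # parameter update
    P = (P - np.outer(k, Px)) / lam      # covariance update
    return theta, P

# Identify y = 2*x1 - 1*x2 from noisy measurements.
rng = np.random.default_rng(2)
theta, P = np.zeros(2), 1e3 * np.eye(2)
for _ in range(500):
    x = rng.standard_normal(2)
    y = 2.0 * x[0] - 1.0 * x[1] + 0.1 * rng.standard_normal()
    theta, P = rls_update(theta, P, x, y)
print(theta)  # converges toward [2, -1]
```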
5.
Lixin Zhan 《Computer Physics Communications》2008,179(5):339-344
The Wang-Landau algorithm is a flat-histogram Monte Carlo method that performs random walks in the configuration space of a system to iteratively obtain a close estimate of the density of states. It has been applied successfully in many research fields. In this paper, we propose a parallel implementation of the Wang-Landau algorithm for shared-memory computers using the OpenMP API. The implementation is applied to Ising model systems with promising speedups. We also examine how different strategies for accessing the shared memory space during the update procedure affect the running speed; allowing data races is recommended for the sake of simulation efficiency, and this treatment does not affect the accuracy of the final density of states obtained.
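A serial Wang-Landau sketch for a tiny 2D Ising model conveys the flat-histogram idea; the lattice size, flatness threshold, and stopping criterion below are illustrative choices, and the paper's OpenMP multi-walker parallelization with tolerated data races is not reproduced (Python threads would not benefit here anyway):

```python
import numpy as np

rng = np.random.default_rng(3)
L = 4                                  # tiny 4x4 periodic Ising lattice
N = L * L
spins = rng.choice([-1, 1], size=(L, L))

def energy(s):
    return -int(np.sum(s * np.roll(s, 1, 0)) + np.sum(s * np.roll(s, 1, 1)))

idx = lambda E: (E + 2 * N) // 4       # map E in {-2N, ..., 2N} to a bin
lng = np.zeros(N + 1)                  # running estimate of ln g(E)
f = 1.0                                # ln of the modification factor
E = energy(spins)

while f > 1e-4:                        # production runs go to ~1e-8
    H = np.zeros(N + 1)
    flat = False
    while not flat:
        for _ in range(20000):
            i, j = rng.integers(L), rng.integers(L)
            dE = 2 * spins[i, j] * (spins[(i + 1) % L, j] + spins[i - 1, j] +
                                    spins[i, (j + 1) % L] + spins[i, j - 1])
            # Accept flip with probability min(1, g(E) / g(E + dE)).
            if rng.random() < np.exp(lng[idx(E)] - lng[idx(E + dE)]):
                spins[i, j] *= -1
                E += dE
            lng[idx(E)] += f
            H[idx(E)] += 1
        seen = H > 0                   # judge flatness on visited bins only
        flat = H[seen].min() > 0.8 * H[seen].mean()
    f /= 2.0

# Normalize with the known two-fold ground-state degeneracy g(-2N) = 2.
lng += np.log(2.0) - lng[idx(-2 * N)]
print(np.exp(lng[idx(-2 * N)]))        # ~ 2
```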
6.
A training algorithm for multilayer neural networks that minimizes the probability of classification error is proposed, and the claim is made that it possesses clear advantages over the standard backpropagation (BP) algorithm. A convergence analysis of the proposed procedure is performed, and convergence of the sequence of criterion realizations with probability one is proven. An experimental comparison with the BP algorithm on three artificial pattern recognition problems is given.
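The abstract does not give the criterion's exact form, but one common way to make the probability of classification error trainable is to smooth the 0-1 loss with a sigmoid of the margin. A hedged single-unit sketch (the paper trains multilayer networks; the data, beta, and learning rate here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy 2-class data with labels in {-1, +1}.
X = rng.standard_normal((200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)

w, b, beta, lr = np.zeros(2), 0.0, 4.0, 0.1
for _ in range(500):
    margin = y * (X @ w + b)
    s = 1.0 / (1.0 + np.exp(beta * margin))   # smoothed 0-1 loss per sample
    # d(loss)/d(margin) = -beta * s * (1 - s); chain rule through the margin
    g = -beta * s * (1.0 - s) * y
    w -= lr * (g @ X) / len(X)
    b -= lr * g.mean()
print("empirical error:", float(np.mean(y * (X @ w + b) <= 0)))
```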
7.
Le Ye, Nanehkaran Y. A., Mwakapesa Deborah Simon, Zhang Ruipeng, Yi Jianbing, Mao Yimin 《The Journal of supercomputing》2022,78(3):3791-3813
Deep convolutional neural networks (DCNNs) have been successfully used in many computer vision tasks. However, with the increasing complexity of the network and...
8.
A new algorithm is presented for training multilayer feedforward neural networks by integrating a genetic algorithm with an adaptive conjugate-gradient neural-network learning algorithm. The parallel hybrid learning algorithm has been implemented in C on an MIMD shared-memory machine (a Cray Y-MP8/864 supercomputer) and applied to two different domains, engineering design and image recognition. Its performance has been evaluated on three examples, and the superior convergence of the parallel hybrid neural-network learning algorithm is demonstrated.
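The coarse-grained parallelism such a hybrid exploits is easy to mimic on a multicore machine: fitness evaluations of the population are independent and can be farmed out. Below is a sketch of a few GA generations with parallel evaluation; the toy fitness function and GA operators are placeholders, not the paper's adaptive operators or conjugate-gradient stage:

```python
import numpy as np
from multiprocessing import Pool

def fitness(w):
    """Toy stand-in for network training error: a small tanh unit
    fitted to fixed data (the real algorithm trains full networks)."""
    X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
    t = np.array([0., 1., 1., 0.])
    out = np.tanh(X @ w[:2] + w[2])
    return -float(np.mean((out - t) ** 2))   # higher is better

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    pop = [rng.standard_normal(3) for _ in range(32)]
    for _ in range(10):                       # a few GA generations
        with Pool() as pool:                  # parallel fitness evaluation
            scores = pool.map(fitness, pop)
        order = np.argsort(scores)[::-1]
        parents = [pop[i] for i in order[:16]]
        children = [0.5 * (a + b) + 0.1 * rng.standard_normal(3)
                    for a, b in zip(parents, parents[1:] + parents[:1])]
        pop = parents + children
    print(max(map(fitness, pop)))
```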
9.
We derive cost formulae for three different parallelisation techniques for training both supervised and unsupervised networks. These formulae are parameterised by properties of the target computer architecture, so it is possible to decide both which technique is best for a given parallel computer and which parallel computer best suits a given technique. One technique, exemplar parallelism, is far superior on almost all parallel computer architectures. The formulae also take optimal batch learning into account as the overall training approach. Cost predictions are made for several of today's popular parallel computers.
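Exemplar parallelism splits the training set across processors, computes per-shard gradients independently, and combines them into one batch update. A NumPy sketch of the idea on a linear unit (shard count, learning rate, and data are illustrative):

```python
import numpy as np

def grad_on_shard(w, X, y):
    """Gradient of mean squared error for a linear unit on one shard."""
    return X.T @ (X @ w - y) / len(X)

rng = np.random.default_rng(6)
X = rng.standard_normal((1000, 3))
y = X @ np.array([1.0, -2.0, 0.5])
w = np.zeros(3)

shards = np.array_split(np.arange(len(X)), 4)  # one shard per "processor"
for _ in range(200):
    # Each shard gradient could run on a different processor; batch
    # learning lets us combine them into a single weight update.
    g = sum(grad_on_shard(w, X[s], y[s]) * len(s) for s in shards) / len(X)
    w -= 0.1 * g
print(w)   # converges toward [1, -2, 0.5]
```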
10.
R. Andonie, A. T. Chronopoulos, D. Grosu, H. Galmeanu 《Concurrency and Computation》2006,18(12):1559-1573
The focus of this study is how to efficiently implement the neural-network backpropagation algorithm on a network of computers (NOC) for concurrent execution. We assume a distributed system with heterogeneous computers and a neural network replicated on each computer. We propose an architecture model with efficient pattern allocation that takes the speed of the processors into account and overlaps communication with computation. The training pattern set is distributed among the heterogeneous processors, with the mapping fixed during the learning process, and we provide a heuristic pattern allocation algorithm that minimizes the execution time of backpropagation learning. Under the condition that each processor performs a task directly proportional to its speed, this allocation algorithm has polynomial-time complexity. We have implemented our model on a dedicated network of heterogeneous computers, using Sejnowski's NetTalk benchmark for testing.
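The core of speed-aware pattern allocation can be sketched as proportional splitting with largest-remainder rounding; the paper's full heuristic also overlaps communication with computation, which this sketch does not model:

```python
def allocate_patterns(n_patterns, speeds):
    """Split a training set across heterogeneous processors roughly in
    proportion to their measured speeds (largest-remainder rounding)."""
    total = sum(speeds)
    shares = [n_patterns * s / total for s in speeds]
    counts = [int(sh) for sh in shares]
    # Hand the leftover patterns to the largest fractional parts.
    leftovers = sorted(range(len(speeds)),
                       key=lambda i: shares[i] - counts[i], reverse=True)
    for i in leftovers[:n_patterns - sum(counts)]:
        counts[i] += 1
    return counts

print(allocate_patterns(1000, [3.0, 2.0, 1.0, 0.5]))  # [461, 308, 154, 77]
```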
11.
We consider the FJ‖C_max problem of servicing a given set of jobs with optimal performance on sequential and parallel machines. FJ‖C_max generalizes the classical J‖C_max problem to the case where the servicing system has not only sequential but also parallel (identical) machines. We propose a two-stage algorithm for a heuristic solution of FJ‖C_max. At the first stage, we solve the J‖C_max problem, i.e., we assume that the servicing system has no parallel machines; at the second stage, operations are distributed over the parallel machines. Both stages of the algorithm use neural-network decision-making models. The efficiency of the neural-network algorithm for the J‖C_max and FJ‖C_max problems was evaluated on 20 test examples, obtained from 20 known J‖C_max problems by adding a random number of copies of the sequential machines to the servicing system.
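The second stage, distributing operations over identical parallel machines, is a classical load-balancing step. As a baseline sketch (the paper uses neural-network decision models rather than this greedy rule), here is longest-processing-time list scheduling:

```python
import heapq

def assign_to_parallel_machines(durations, n_machines):
    """Greedy second-stage step: give each operation to the machine
    that becomes free first, in longest-processing-time order."""
    heap = [(0.0, m) for m in range(n_machines)]   # (finish time, machine)
    heapq.heapify(heap)
    assignment = {}
    for op, d in sorted(durations.items(), key=lambda kv: -kv[1]):
        t, m = heapq.heappop(heap)
        assignment[op] = m
        heapq.heappush(heap, (t + d, m))
    return assignment, max(t for t, _ in heap)     # schedule and makespan

ops = {"A": 5.0, "B": 3.0, "C": 3.0, "D": 2.0, "E": 2.0}
print(assign_to_parallel_machines(ops, 2))         # makespan 8.0
```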
12.
V. A. Svetlov, I. G. Persiantsev, J. S. Shugay, S. A. Dolenko 《Optical Memory & Neural Networks》2015,24(4):288-294
The article presents the development of an algorithm for the adaptive construction of hierarchical neural-network classifiers, based on automatic modification of the desired outputs of perceptrons with a small number of neurons in a single hidden layer. Testing of the new program implementation of this approach demonstrated that the algorithm is more computationally efficient than a standard multilayer perceptron and solves multiple classification problems with higher quality.
13.
Ioannis K. Konstantopoulos, Panos J. Antsaklis 《Journal of Intelligent and Robotic Systems》1995,12(3):197-228
Designing controllers with diagnostic capabilities is important because, in a feedback control system, detection and isolation of failures are generally affected by the particular control law used. A common approach to the control and failure-diagnosis problems therefore has significant merit. Controllers capable of performing failure diagnosis have additional diagnostic outputs to detect and isolate sensor and actuator faults; such a linear controller is usually called a four-parameter controller. Neural networks have proved to be a very powerful tool in the control systems area, where they have been used in the modelling and control of dynamical systems. In this paper, a neural-network model of a controller with diagnostic capabilities (CDC) is presented for the first time. This nonlinear neural controller is trained to operate as a traditional controller while at the same time reproducing a failure occurring at either the actuator or the sensor; the cases of actuator and sensor failure are studied independently. The validity of the results is verified by extensive simulations. A version of this paper, under the title "The Four-Parameter Controller: A Neural Network Implementation", was presented at the IEEE Mediterranean Symposium on New Directions in Control Theory and Applications, Chania, Crete, Greece, June 21–23, 1993.
14.
15.
As parallel machines become more widely available, many existing algorithms are being converted to take advantage of the improved speed such computers offer. However, the way the algorithm is distributed is crucial to obtaining the speed-ups required for many real-time tasks. This paper presents three parallel implementations of the Douglas-Peucker line simplification algorithm on a Sequent Symmetry computer and compares the performance of each with the original sequential algorithm.
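For reference, a compact sequential Douglas-Peucker implementation follows; the recursive subdivisions it makes are the natural units the paper's parallel versions distribute across processors. The sample polyline and tolerance are arbitrary:

```python
import numpy as np

def douglas_peucker(points, eps):
    """Sequential Douglas-Peucker simplification: keep the endpoints,
    find the interior point farthest from the chord, and recurse on
    both halves if it exceeds the tolerance eps."""
    points = np.asarray(points, dtype=float)
    if len(points) < 3:
        return points
    a, b = points[0], points[-1]
    ab = b - a
    rel = points[1:-1] - a
    # Perpendicular distance of each interior point to the chord a-b.
    d = np.abs(ab[0] * rel[:, 1] - ab[1] * rel[:, 0]) / (np.hypot(*ab) or 1.0)
    i = int(np.argmax(d)) + 1
    if d[i - 1] <= eps:
        return np.vstack([a, b])
    left = douglas_peucker(points[:i + 1], eps)
    right = douglas_peucker(points[i:], eps)
    return np.vstack([left[:-1], right])

line = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
print(douglas_peucker(line, 1.0))
```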
16.
《Computing Systems in Engineering》1995,6(4-5):409-414
One of the most widely used rendering algorithms in computer graphics is ray tracing. Standard (Whitted-style) ray tracing is a good rendering algorithm but has a drawback: the time needed to produce an image is very long (several hours of CPU time are needed for a good picture of a moderately sophisticated 3D scene), and the image can only be observed at the end of processing. This is difficult to accept in systems where interactivity is the first goal. "Increasing realism" in ray tracing avoids the problem by supplying the user with a preview of the final image. The preview can be calculated in considerably less time, yet lets the user anticipate (and sometimes see), within some margin of error, the final effects. With more processing time the image quality keeps improving without loss of previous results, and the user can interrupt the session at any time if the image does not match what he wants. Alongside this idea, image production must be accelerated; parallelism is then justified by the need for more processing power. The aim of this text is to describe the implementation of the interactive ray-tracing algorithm on a parallel architecture based on Transputers. An overview of the architecture used is presented, and the main parallel processes and related problems are discussed.
17.
Combining an RBF neural network with error-correcting coding, a new watermarking algorithm is proposed that embeds a chaotically encrypted image watermark in the wavelet domain and supports blind detection. First, the original image is decomposed by wavelet transform to obtain the wavelet coefficients of each subband. Second, a key selects the starting position for embedding the watermark in the wavelet coefficients, and the quantized coefficients serve as inputs to the RBF neural network model. Finally, the watermark information to be embedded is preprocessed with chaotic encryption and error-correcting coding to strengthen the security and robustness of the watermarking system, and the processed watermark is embedded into the wavelet coefficients produced by the RBF network model. Experiments show that the algorithm has good human-visual masking and achieves the expected robustness against common image-processing attacks such as JPEG compression, salt-and-pepper noise, and filtering.
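The chaotic-encryption preprocessing step can be illustrated with a logistic-map keystream XORed against the watermark bits; the key values x0 and mu below are placeholders, and the RBF network, wavelet embedding, and error-correcting code are not shown:

```python
import numpy as np

def logistic_keystream(n, x0=0.7, mu=3.99):
    """Binary keystream from the logistic map x <- mu*x*(1-x);
    x0 and mu (with mu near 4, the chaotic regime) act as the key."""
    x, bits = x0, np.empty(n, dtype=np.uint8)
    for i in range(n):
        x = mu * x * (1.0 - x)
        bits[i] = 1 if x > 0.5 else 0
    return bits

watermark = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)
cipher = watermark ^ logistic_keystream(len(watermark))   # encrypt
restored = cipher ^ logistic_keystream(len(watermark))    # decrypt
assert np.array_equal(restored, watermark)
print(cipher)
```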
18.
A two-stage algorithm combining the advantages of an adaptive genetic algorithm and a modified Newton method is developed for effective training of feedforward neural networks. The genetic algorithm, with adaptive reproduction, crossover, and mutation operators, searches for the initial weights and biases of the neural network, while the modified Newton method, similar to the BFGS algorithm, increases network training performance. Benchmark tests show that the two-stage algorithm is superior to many conventional methods (steepest descent, steepest descent with an adaptive learning rate, conjugate gradient, and Newton-based methods) and is suitable for small networks in engineering applications. In addition to numerical simulation, the effectiveness of the two-stage algorithm is validated by experiments on system identification and vibration suppression.
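A hedged sketch of the two-stage idea: a plain GA (standing in for the paper's adaptive GA) supplies initial weights, and SciPy's BFGS (standing in for the modified Newton method) refines them. The weighted quadratic loss is a toy surrogate for a network's error surface:

```python
import numpy as np
from scipy.optimize import minimize

def loss(w):
    """Toy training loss standing in for the network error surface."""
    return float(np.sum((w - np.array([1.0, -2.0, 0.5])) ** 2 * [1, 10, 100]))

rng = np.random.default_rng(7)
pop = rng.uniform(-5, 5, (30, 3))
for _ in range(40):                      # stage 1: simple GA for initial weights
    f = np.array([loss(w) for w in pop])
    parents = pop[np.argsort(f)[:10]]
    mates = parents[rng.integers(10, size=20)]
    children = 0.5 * (parents[rng.integers(10, size=20)] + mates)
    children += 0.2 * rng.standard_normal(children.shape)   # mutation
    pop = np.vstack([parents, children])

best = pop[np.argmin([loss(w) for w in pop])]
result = minimize(loss, best, method="BFGS")  # stage 2: quasi-Newton refinement
print(result.x)   # converges toward [1, -2, 0.5]
```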
19.
An efficient parallel implementation of a nonparaxial beam propagation method for the numerical study of the nonlinear Helmholtz equation is presented. Our solution focuses on minimizing the communication and computational demands of the method, which depend on a nonparaxiality parameter. Performance tests carried out on different types of parallel systems agree with theoretical predictions and show that our proposal behaves better than solutions based on conventional parallel fast Fourier transform implementations. The application of our design is illustrated in a particularly demanding scenario, the study of dark solitons at interfaces separating two defocusing Kerr media, where it is shown to play a key role.
20.
P. Yu. Izotov, N. L. Kazanskiy, D. L. Golovashkin, S. V. Sukhanov 《Optical Memory & Neural Networks》2011,20(2):98-106
Using a convolutional neural network as an example, we discuss specific aspects of implementing a pattern-recognition learning algorithm on a GPU using the NVIDIA CUDA architecture. The training time of the neural network on the video adapter is decreased by a factor of 5.96, and the recognition time for a test set is decreased by a factor of 8.76, compared with an optimized implementation on a central processing unit (CPU). We show that implementing neural-network algorithms on graphics processors holds promise.