期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Compact yet efficient hardware implementation of artificial neural networks with customized topology

Nadia Nedjah Rodrigo Martins da Silva Luiza de Macedo Mourelle 《Expert systems with applications》2012,39(10):9191-9206

There are several neural network implementations using either software, hardware-based or a hardware/software co-design. This work proposes a hardware architecture to implement an artificial neural network (ANN), whose topology is the multilayer perceptron (MLP). In this paper, we explore the parallelism of neural networks and allow on-the-fly changes of the number of inputs, number of layers and number of neurons per layer of the net. This reconfigurability characteristic permits that any application of ANNs may be implemented using the proposed hardware. In order to reduce the processing time that is spent in arithmetic computation, a real number is represented using a fraction of integers. In this way, the arithmetic is limited to integer operations, performed by fast combinational circuits. A simple state machine is required to control sums and products of fractions. Sigmoid is used as the activation function in the proposed implementation. It is approximated by polynomials, whose underlying computation requires only sums and products. A theorem is introduced and proven so as to cover the arithmetic strategy of the computation of the activation function. Thus, the arithmetic circuitry used to implement the neuron weighted sum is reused for computing the sigmoid. This resource sharing decreased drastically the total area of the system. After modeling and simulation for functionality validation, the proposed architecture synthesized using reconfigurable hardware. The results are promising. 相似文献

2.

OCEAN,a flexible adaptive Network-On-Chip for dynamic applications

Ludovic Devaux Sebastien Pillement 《Microprocessors and Microsystems》2014

The dynamic and partial reconfiguration of FPGAs enables the dynamic placement of applicatives tasks in reconfigurable zones. However, the dynamic management of the tasks impacts the communications since they are not present in the FPGA during all computation time. So, the task manager should ensure the allocation of each new task and their interconnection which is performed by a flexible interconnection network. In this article, various interconnection networks are studied. Each architecture is evaluated with respect to its suitability for the paradigm of the dynamic and partial reconfiguration in FPGA implementations. This study leads us to propose the OCEAN network that supports the communication constraints into the context of dynamic reconfigurations. Thanks to a generic platform allowing in situ characterizations of network performances, fair comparisons of various Networks-On-Chip can be realized. The FPGA and ASICs implementations of the OCEAN network are also discussed. 相似文献

3.

On resource allocation in multistage interconnection network-based systems

Eli Opper Miroslaw Malek 《Journal of Parallel and Distributed Computing》1984,1(2):206-220

A resource allocation problem in a reconfigurable multicomputer architecture based on rectangular banyan multistage interconnection network with arbitrary fanout and arbitrary number of levels is studied. Four commonly used problem structures such as ring, pipeline, broadcast, and macropipeline are introduced and the mapping problem of these structures on the system model, which is equivalent to the resource allocation problem, is discussed. Analytic solutions to several mapping questions are given and generalization of the results to other networks is presented. 相似文献

4.

FPGA Implementation of Neurocomputational Models: Comparison Between Standard Back-Propagation and C-Mantec Constructive Algorithm

Francisco Ortega-Zamorano José M. Jerez Gustavo E. Juárez Leonardo Franco 《Neural Processing Letters》2017,46(3):899-914

Recent advances in FPGA technology have permitted the implementation of neurocomputational models, making them an interesting alternative to standard PCs in order to speed up the computations involved taking advantage of the intrinsic FPGA parallelism. In this work, we analyse and compare the FPGA implementation of two neural network learning algorithms: the standard and well known Back-Propagation algorithm and C-Mantec, a constructive neural network algorithm that generates compact one hidden layer architectures with good predictive capabilities. One of the main differences between both algorithms is the fact that while Back-Propagation needs a predefined architecture, C-Mantec constructs its network while learning the input patterns. Several aspects of the FPGA implementation of both algorithms are analyzed, focusing in features like logic and memory resources needed, transfer function implementation, computation time, etc. The advantages and disadvantages of both methods in relationship to their hardware implementations are discussed. 相似文献

5.

An Adaptive Opto-electronic Neural Network for Associative Pattern Retrieval

《Journal of Parallel and Distributed Computing》1993,17(3):245-250

A novel adaptive trinary neural network model is proposed for associative pattern retrieval from incomplete data. Systematic analysis of the convergence mechanism for the character recognition problem is provided to illustrate the derivation of this novel adaptive thresholding scheme. The adaptive scheme with trinary input representation outperforms other associative retrieval schemes in terms of convergence and storage capacity. The inherent parallelism of this neural network architecture is exploited for a parallel optical implementation. The tremendous speed and free-space interconnection capability of optics results in a very efficient real-time character recognition system. The adaptive threshold scheme developed here may have far reaching implications for other neural networks in enhancing their learning and retrieval, speed, and accuracy. 相似文献

6.

A reconfigurable fuzzy neural network with in-situ learning

Pedrycz W. Hart Poskar C. Czerowski P.J. 《Micro, IEEE》1995,15(4):19-30

Our reconfigurable fuzzy processor (RFP) implements both aggregative and referential operations. Its architecture combines structural and parametric flexibility in a network implementing RFPs as a collection of fuzzy neurons. A fuzzy neural network using a bidirectionally linked series of shared buses facilitates a modular and scalable design environment for the RFP. An appropriate interface, separate from the RFP neuron itself, promotes the reuse of the neuron design with alternative interconnection networks 相似文献

7.

Intelligent-memory architecture for artificial neural networks

《Micro, IEEE》2002,22(3):32-40

Execution of artificial neural networks, especially for online pattern recognition, mainly depends on time-efficient execution of weighted sums. A new architecture achieves this goal, with a computation time superior to the time complexity of sequential von Neumann machines. This architecture uses additional logic to extend the functionality of conventional RAM. The authors discuss an implementation of this architecture that uses reconfigurable logic 相似文献

8.

A Cross-layer based mapping for spiking neural network onto network on chip

Yande Xiang 《International Journal of Parallel, Emergent and Distributed Systems》2018,33(5):526-544

Abstract

Network-on-Chip provides a packet-based and scalable inter-connected structure for spiking neural networks. However, existing neural mapping methods just distribute all neurons of a population into an on-chip network core or nearby cores sequentially. As there is no connection among population, the population based mapping degrades inter-neuron communicating performance between different cores. This paper presents a Cross-LAyer based neural MaPping method that maps synaptic connected neurons belonging to adjacent layers into the same on-chip network node. In order to adapt to various input patterns, the strategy also takes input spike rate into consideration and remap neurons for improving mapping efficiency. The method helps to reduce inter-core communication cost. The experimental results demonstrate the efficient results of the proposed mapping strategy in the aspect of spike transfer latency as well as dynamic energy cost improvement. In the applications of handwritten digits and edge extraction, in which the type of interconnection among neurons is different, the neural mapping algorithm reduces spike average transfer latency by maximum 42.83%, and reduces dynamic energy by maximum 36.29%. 相似文献

9.

Accelerating Molecular Dynamics Simulations with Reconfigurable Computers

Scrofano R. Gokhale M.B. Trouw F. Prasanna V.K. 《Parallel and Distributed Systems, IEEE Transactions on》2008,19(6):764-778

With advances in reconfigurable hardware, especially field-programmable gate arrays (FPGAs), it has become possible to use reconfigurable hardware to accelerate complex applications such as those in scientific computing. There has been a resulting development of reconfigurable computers, that is, computers that have both general-purpose processors and reconfigurable hardware, as well as memory and high-performance interconnection networks. In this paper, we describe the acceleration of molecular dynamics simulations with reconfigurable computers. We evaluate several design alternatives for the implementation of the application on a reconfigurable computer. We show that a single node accelerated with reconfigurable hardware, utilizing fine-grained parallelism in the reconfigurable hardware design, is able to achieve a speedup of about two times over the corresponding software-only simulation. We then parallelize the application and study the effect of acceleration on performance and scalability. Specifically, we study strong scaling, in which the problem size is fixed. We find that the unaccelerated version actually scales better, because it spends more time in computation than the accelerated version does. However, we also find that a cluster of P accelerated nodes gives better performance than a cluster of 2P unaccelerated nodes. 相似文献

10.

Parallel transport subsystem implementation for high-performance communication

T. Braun C. Schmidt 《Concurrency and Computation》1994,6(4):375-391

Requirements of emerging applications together with rapid changes in networking technology towards gigabit speeds require new adequate transport systems. Integrated designs of transport services, protocol architecture and implementation platforms are required by forthcoming applications in high-speed network environments. The transport subsystem PATROCLOS (parallel transport subsystem for cel/ based high-speed networks) is designed with special emphasis on a high degree of inherent parallelism to allow efficient implementations on multiprocessor architectures combined with specialized hardware for very time critical functions. The paper presents the new parallel protocol architecture of PATROCLOS, an appropriate implementation architecture based on transputer networks, and performance evaluation results, which indicate high throughput values. 相似文献

11.

一种神经网络硬件实现的可重构设计 总被引：1，自引：0，他引：1

万勇王沁李占才李昂《计算机应用》2006,26(1):202-0203

以BP网络为例,提出了一种可重构神经网络硬件实现方法。通过可重构体系结构、可重构部件的设计,可以灵活地实现不同规模、传递函数及学习方法的神经网络,从而搭建起神经网络快速硬件实现的平台。经过对一个模式识别问题的实现和测试,证明了这种设计方法的可行性。相似文献

12.

一种基于传输触发体系结构的可重构Hash函数处理器：TTAH

下载免费PDF全文

赵学秘王志英戴葵陆洪毅《计算机工程与科学》2007,29(3):66-69

Hash函数是密码学中保证数据完整性的有效手段,性能需求使得某些应用必须采用硬件实现。本文通过分析常用Hash函数在算法上的相似性设计出了专用可重构单元,并将这些可重构单元耦合到传输触发体系结构中,得到一种可重构Hash函数处理器TTAH。常用Hash算法在TTAH上的映射结果表明：与细粒度可重构结构相比,其速度快,资源利用率高;与ASIC相比,可以在额外开销增加较小的前提下有效地支持多种常用Hash函数。相似文献

13.

基于可重构阵列的CNN数据量化方法

朱家扬蒋林李远成宋佳刘帅《计算机应用研究》2024,41(4):1070-1076

针对卷积神经网络(CNN)模型中大量卷积操作,导致网络规模大幅增加,从而无法部署到嵌入式硬件平台,以及不同粒度数据与底层硬件结构不协调导致计算效率低的问题,基于项目组开发的可重构阵列处理器,面向支持多种位宽的运算单元,通过软硬件协同和可重构计算方法,采用KL(Kullback-Leibler)散度自定义量化阈值和随机取整进行截断处理的方式,寻找参数定长的最佳基点位置,设计支持多种计算粒度并行操作的指令及其卷积映射方案,并以此实现三种不同位宽的动态数据量化。实验结果表明,将权值与特征图分别量化到8 bit可以在准确率损失2%的情况下将模型压缩为原来的50%左右;将测试图像量化到三种位宽下进行硬件测试的加速比分别达到1.012、1.273和1.556,最高可缩短35.7%的执行时间和降低56.2%的访存次数,同时仅带来不足1%的相对误差,说明该方法可以在三种量化位宽下实现高效率的神经网络计算,进而达到硬件加速和模型压缩的目的。相似文献

14.

广义细胞自动机的结构及其硬件实现

帅典勋冯翔赵宏彬王兴《计算机学报》2004,27(11):1441-1450

该文作者曾提出了广义细胞自动机(GCA)的原理和并行算法．并且应用于网络快速包交换等动态优化问题．该文进一步讨论了这种新的广义细胞自动机的体系结构、算法的硬件实现及其电路设计。它们对于GCA的实际应用有重要意义．GCA结构不同于Hopfield神经网络(HNN)和细胞神经网络(CNN)，GCA由多层次多粒度宏细胞组成塔形结构．它具有多粒度的宏细胞动力学特征．相同粒度宏细胞之间没有交互，但不同粒度宏细胞之间存在一定程度的交互或反馈．分析和实验表明．在问题求解的优化性、实时性、硬件实现复杂性等方面．该文给出的GCA结构和硬件实现．与HNN和CNN相比有诸多优点．相似文献

15.

基于进化计算的神经网络设计与实现 总被引：15，自引：1，他引：14

何永勇褚福磊钟秉林《控制与决策》2001,16(3):257-262

基于进化算法可有产解决神经网络设计和实现中存在的一些问题,使网络具有更优的性能。在此对基于进化计算的神经网络设计和实现的研究内容及进展情况进行综述,讲座了网络实现的关键问题,包括网络权重的进化训练,网络结构进化设计,学习规则进化选取以及进化操作算子设计等,并分析了相关的研究和发展方向。相似文献

16.

Evolution in architectures and programming methodologies of coarse-grained reconfigurable computing

Bertil Svensson 《Microprocessors and Microsystems》2009,33(3):161-178

In order to meet the increased computational demands of, e.g., multimedia applications, such as video processing in HDTV, and communication applications, such as baseband processing in telecommunication systems, the architectures of reconfigurable devices have evolved to coarse-grained compositions of functional units or program controlled processors, which are operated in a coordinated manner to improve performance and energy efficiency.In this survey we explore the field of coarse-grained reconfigurable computing on the basis of the hardware aspects of granularity, reconfigurability, and interconnection networks, and discuss the effects of these on energy related properties and scalability. We also consider the computation models that are being adopted for programming of such machines, models that expose the parallelism inherent in the application in order to achieve better performance. We classify the coarse-grained reconfigurable architectures into four categories and present some of the existing examples of these categories. Finally, we identify the emerging trends of introduction of asynchronous techniques at the architectural level and the use of nano-electronics from technological perspective in the reconfigurable computing discipline. 相似文献

17.

The backpropagation algorithm on grid and hypercube architectures

Xiru Zhang Michael McKenna Jill P. Mesirov David L. Waltz 《Parallel Computing》1990,14(3):317-327

In this paper, we first describe a model for mapping the backpropagation artificial neural net learning algorithm onto a massively parallel computer architecture with a 2D-grid communications network. We then show how this model can be sped up by hypercube inter-processor connections that provide logarithmic time segmented parallel prefix operations. This approach can serve as a general model for implementing algorithms for layered neural nets on any massively parallel computers that have 2D-grid or hypercube communication networks.

We have implemented this model on the Connection Machine CM-2 — a general purpose, massively parallel computer with a hypercube topology. Initial tests show that this implementation offers about 180 million interconnections per second (IPS) for feed-forward computation and 40 million weight updates per second (WUPS) for learning. We use our model to evaluate this implementation: what machine-specific features have helped improve the performance and where further improvements can be made. 相似文献

18.

循环神经网络研究综述

刘建伟宋志妍《控制与决策》2022,37(11):2753-2768

循环神经网络是神经网络序列模型的主要实现形式,近几年得到迅速发展,其是机器翻译、机器问题回答、序列视频分析的标准处理手段,也是对于手写体自动合成、语音处理和图像生成等问题的主流建模手段.鉴于此,循环神经网络的各分支按照网络结构进行详细分类,大致分为3大类:一是衍生循环神经网络,这类网络是基于基本RNNs模型的结构衍生变体,即对RNNs的内部结构进行修改;二是组合循环神经网络,这类网络将其他一些经典的网络模型或结构与第一类衍生循环神经网络进行组合,得到更好的模型效果,是一种非常有效的手段;三是混合循环神经网络,这类网络模型既有不同网络模型的组合,又在RNNs内部结构上进行修改,是同属于前两类网络分类的结构.为了更加深入地理解循环神经网络,进一步介绍与循环神经网络经常混为一谈的递归神经网络结构以及递归神经网络与循环神经网络的区别和联系.在详略描述上述模型的应用背景、网络结构以及模型变种后,对各个模型的特点进行总结和比较,并对循环神经网络模型进行展望和总结. 相似文献

19.

Mapping computer-vision-related tasks onto reconfigurableparallel-processing systems

Siegel H.J. Armstrong J.B. Watson D.W. 《Computer》1992,25(2):54-63

A tutorial overview of how selected computer-vision-related algorithms can be mapped onto reconfigurable parallel-processing systems is presented. The reconfigurable parallel-processing system assumed for the discussions is a multiprocessor system capable of mixed-mode parallelism; that is, it can operate in either the SIMD or MIMD modes of parallelism and can dynamically switch between modes at instruction-level granularity with generally negligible overhead. In addition, it can be partitioned into independent or communicating submachines, each having the same characteristics as the original machine. Furthermore, this reconfigurable system model uses a flexible multistage cube interconnection network, which allows the connection patterns among the processors to be varied. It is demonstrated how reconfigurability can be used by reviewing and examining five computer-vision-related algorithms, each one emphasizing a different aspect of reconfigurability 相似文献

20.

Algorithmic transformations for neural computing and performance ofsupervised learning on a dataflow machine

Kim S.T. Suwunboriruksa K. Herath S. Jayasumana A. Herath J. 《IEEE transactions on pattern analysis and machine intelligence》1992,18(7):613-623

相似文献