期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Feedforward Neural Network Implementation in FPGA Using Layer Multiplexing for Effective Resource Utilization 总被引：1，自引：0，他引：1

Himavathi S. Anitha D. Muthuramalingam A. 《Neural Networks, IEEE Transactions on》2007,18(3):880-888

This paper presents a hardware implementation of multilayer feedforward neural networks (NN) using reconfigurable field-programmable gate arrays (FPGAs). Despite improvements in FPGA densities, the numerous multipliers in an NN limit the size of the network that can be implemented using a single FPGA, thus making NN applications not viable commercially. The proposed implementation is aimed at reducing resource requirement, without much compromise on the speed, so that a larger NN can be realized on a single chip at a lower cost. The sequential processing of the layers in an NN has been exploited in this paper to implement large NNs using a method of layer multiplexing. Instead of realizing a complete network, only the single largest layer is implemented. The same layer behaves as different layers with the help of a control block. The control block ensures proper functioning by assigning the appropriate inputs, weights, biases, and excitation function of the layer that is currently being computed. Multilayer networks have been implemented using Xilinx FPGA "XCV400hq240." The concept used is shown to be very effective in reducing resource requirements at the cost of a moderate overhead on speed. This implementation is proposed to make NN applications viable in terms of cost and speed for online applications. An NN-based flux estimator is implemented in FPGA and the results obtained are presented 相似文献

2.

A Resource-Efficient Communication Architecture for Chip Multiprocessors on FPGAs

下载免费PDF全文

Maggie Swetha Thota 《计算机科学技术学报》2011,26(3):434-447

Significant advances in field-programmable gate arrays (FPGAs) have made it viable to explore innovative multiprocessor solutions on a single FPGA chip.For multiprocessors,an efficient communication network that matches the needs of the target application is always critical to the overall performance.Wormhole packet-switching network-on-chip (NoC) solutions are replacing conventional shared buses to deal with scalability and complexity challenges coming along with the increasing number of processing elements (PEs).However,the quest for high performance networks has led to very complex and resource-expensive NoC designs,leaving little room for the real computing force,i.e.,PEs.Moreover,many techniques offer very small performance gains or none at all when network traffic is light while increasing the resource usage of routers.We argue that computation is still the primary task of multiprocessors and sufficient resources should be reserved for PEs.This paper presents our novel design and implementation of a resource-efficient communication network for multiprocessors on FPGAs.We reduce not only the required number of routers for a given number of PEs by introducing a new PE-router topology,but also the resource requirement of each router.Our communication network relies on the NEWS channels to transfer packets in a pipelined fashion following the path determined by the routing network.The implementation results on various Xilinx FPGAs show good performance in the typical range of network load for multiprocessor applications. 相似文献

3.

An analog silicon retina with multichip configuration 总被引：1，自引：0，他引：1

Kameda S. Yagi T. 《Neural Networks, IEEE Transactions on》2006,17(1):197-210

The neuromorphic silicon retina is a novel analog very large scale integrated circuit that emulates the structure and the function of the retinal neuronal circuit. We fabricated a neuromorphic silicon retina, in which sample/hold circuits were embedded to generate fluctuation-suppressed outputs in the previous study . The applications of this silicon retina, however, are limited because of a low spatial resolution and computational variability. In this paper, we have fabricated a multichip silicon retina in which the functional network circuits are divided into two chips: the photoreceptor network chip (P chip) and the horizontal cell network chip (H chip). The output images of the P chip are transferred to the H chip with analog voltages through the line-parallel transfer bus. The sample/hold circuits embedded in the P and H chips compensate for the pattern noise generated on the circuits, including the analog communication pathway. Using the multichip silicon retina together with an off-chip differential amplifier, spatial filtering of the image with an odd- and an even-symmetric orientation selective receptive fields was carried out in real time. The analog data transfer method in the present multichip silicon retina is useful to design analog neuromorphic multichip systems that mimic the hierarchical structure of neuronal networks in the visual system. 相似文献

4.

Xilinx XC4000系列FPGA自动测试平台搭建

刘肄倬杨志家王宏《微计算机信息》2010,(11)

随着FPGA的发展,FPGA测试技术也得到了相应的发展。因为FPGA的结构和传统专用集成电路(ASIC)有着本质的区别,在FPGA中不能形成可测性设计电路,但它的可编程能力决定了其测试电路可以通过编程的方法来实现。本文讨论了Xilinx XC4000系列FPGA中CLB资源和互连资源的自动测试方法。而且提出了一种新的测试资源坐标定位方法,使得由软件仿真向器件真实测试取得了突破。并搭建了硬件测试平台。相似文献

5.

Sorting networks on FPGAs

Rene Mueller Jens Teubner Gustavo Alonso 《The VLDB Journal The International Journal on Very Large Data Bases》2012,21(1):1-23

Computer architectures are quickly changing toward heterogeneous many-core systems. Such a trend opens up interesting opportunities but also raises immense challenges since the efficient use of heterogeneous many-core systems is not a trivial problem. Software-configurable microprocessors and FPGAs add further diversity but also increase complexity. In this paper, we explore the use of sorting networks on field-programmable gate arrays (FPGAs). FPGAs are very versatile in terms of how they can be used and can also be added as additional processing units in standard CPU sockets. Our results indicate that efficient usage of FPGAs involves non-trivial aspects such as having the right computation model (a sorting network in this case); a careful implementation that balances all the design constraints in an FPGA; and the proper integration strategy to link the FPGA to the rest of the system. Once these issues are properly addressed, our experiments show that FPGAs exhibit performance figures competitive with those of modern general-purpose CPUs while offering significant advantages in terms of power consumption and parallel stream evaluation. 相似文献

6.

一种高性能FPGA辐射发射抑制方法研究

王霞郑龙飞王蒙军张红丽吴建飞《计算机工程与科学》2021,43(5):814-819

随着半导体技术的不断发展,集成电路的电路速度、集成密度和I/O端口数量已大大增加,FPGA的小型化、高密度集成会引发电磁兼容性的问题,电磁屏蔽是抑制电磁辐射最有效的方法,选择高效的电磁屏蔽材料可以取得良好的屏蔽效果.而目前电磁屏蔽材料在FPGA上的应用较少,因此选取了一款具有代表性的高性能FPGA作为研究对象,通过近场... 相似文献

7.

Applying Genetic Parallel Programming to Synthesize Combinational Logic Circuits 总被引：1，自引：0，他引：1

Sin Man Cheang Kin Hong Lee Kwong Sak Leung 《Evolutionary Computation, IEEE Transactions on》2007,11(4):503-520

Experimental results show that parallel programs can be evolved more easily than sequential programs in genetic parallel programming (GPP). GPP is a novel genetic programming paradigm which evolves parallel program solutions. With the rapid development of lookup-table-based (LUT-based) field programmable gate arrays (FPGAs), traditional circuit design and optimization techniques cannot fully exploit the LUTs in LUT-based FPGAs. Based on the GPP paradigm, we have developed a combinational logic circuit learning system, called GPP logic circuit synthesizer (GPPLCS), in which a multilogic-unit processor is used to evaluate LUT circuits. To show the effectiveness of the GPPLCS, we have performed a series of experiments to evolve combinational logic circuits with two- and four-input LUTs. In this paper, we present eleven multi-output Boolean problems and their evolved circuits. The results show that the GPPLCS can evolve more compact four-input LUT circuits than the well-known LUT-based FPGA synthesis algorithms. 相似文献

8.

LVQ neural network optimized implementation on FPGA devices with multiple-wordlength operations for real-time systems

Blaiech Ahmed Ghazi Ben Khalifa Khaled Boubaker Mohamed Bedoui Mohamed Hedi 《Neural computing & applications》2018,29(2):509-528

The development of hardware platforms for artificial neural networks (ANN) has been hindered by the high consumption of power and hardware resources. In this paper, we present a methodology for ANN-optimized implementation, of a learning vector quantization (LVQ) type on a field-programmable gate array (FPGA) device. The aim was to provide an intelligent embedded system for real-time vigilance state classification of a subject from an analysis of the electroencephalogram signal. The present approach consists in applying the extension of the algorithm architecture adequacy (AAA) methodology with the arithmetic accuracy constraint, allowing the LVQ-optimized implementation on the FPGA. This extension improves the optimization phase of the AAA methodology by taking into account the operations wordlength required by applying and creating approximative-wordlength operation groups, where the operations in the same group will be performed with the same operator. This LVQ implementation will allow a considerable gain of circuit resources, power and maximum frequency while respecting the time and accuracy constraints. To validate our approach, the LVQ implementation has been tried for several network topologies on two Virtex devices. The accuracy–success rate relation has been studied and reported.

相似文献

9.

A new method for in situ measurement of parameters and degradation processes in modern nanoscale programmable devices

Petr Pfeifer Zdenek PlivaAuthor Vitae 《Microprocessors and Microsystems》2014

This paper presents a new method and results from measurement of internal parameters of programmable nanoscale circuits, namely Xilinx FPGA devices and especially Zynq SoC devices designed on 28 nm TSMC’s technology and older 45 nm Spartan 6 device as well as Xilinx Virtex product lines. The method utilizes a new undersampling approach for frequency measurement and an easy way of processing BRAM data streams. The proposed flexible circuits have been used in various measurements of timing parameters and delays in FPGAs, including measurements or detection of the aging issues. The paper presents results of measurements under various core voltage values as performed on selected Xilinx FPGA platforms, including key results about limited usability of the latest 28 nm devices under accelerated conditions and possibility of studying or mitigating aging effects in FPGAs. The paper presents rare results of experiments, real measurements and data available from current as well as previous technology nodes and it attempts to uncover new facts and areas of the latest high-end technologies, including the area of aging and degradation processes in general. The new methodology, presented approach and results can also be used in various dependable systems, including selected aerospace, medical, automotive or transportation ones. It is also directly and easily applicable to modern processor and multicore systems. 相似文献

10.

Using Lava to design and verify recursive and periodic sorters

Koen Claessen Mary Sheeran Satnam Singh 《International Journal on Software Tools for Technology Transfer (STTT)》2003,4(3):349-358

相似文献

11.

Design and VLSI implementation of a high-performance face detection engine

Dongil HanAuthor Vitae Jongho Choi^{Author Vitae} 《Computers & Electrical Engineering》2012,38(5):1222-1239

This paper proposes a novel hardware structure and field-programmable gate array (FPGA) implementation method for real-time detection of multiple human faces with robustness against illumination variations. These are designed to greatly improve face detection in various environments with using MCT techniques and the AdaBoost learning algorithm which is robust against variable illumination. We have designed, implemented, and verified the hardware architecture of the face detection engine for high-performance face detection and real-time processing. The face detection chip is developed by verifying and implementing it using a FPGA and an application-specific integrated circuit (ASIC). To verify and implement the chip, we used a Virtex5 LX330 FPGA board and a 0.18 μm 1-poly and 6-metal CMOS logic process. Performance results of the implementation and verification showed it is possible to detect at least 32 faces of a wide variety of sizes at a maximum speed of 147 frames per second. 相似文献

12.

Speed optimal FPGA implementation of the encryption algorithms for telecom applications

《Microprocessors and Microsystems》2020

The last two decades have seen a revolution in telecom technology with the evolution of three wireless mobile communication standards, namely, GPRS to 3G, 3G to 4G, and 4G to 5G. 5G offers faster download speeds and enables high connectivity between devices such as mobile phones, displays, smart homes, and smart cars because of its high reliability and high bandwidths (up to 10 Gbps). However, at the same time, data and personal information are also more susceptible to theft because of the high connectivity. Such threats can be addressed using electronic data encryption using the advanced encryption standard (AES). Because of their reconfigurable and parallel architectures, Field-Programmable Gate Arrays (FPGAs) are getting popular in VLSI design flows to enable the pre-silicon validation of designs faster data rates in real-time. FPGAs also serve as platforms for software development in the pre-silicon environment owing to their faster speeds. The design community is also heavily relying on High-Level Synthesis (HLS) tools in VLSI design flows. HLS platforms enable the new designs to improve the process with sustained authentication between two analytical selections from conventional functional specifications. We propose a high-throughput FPGA implementation based on high-level Synthesis for the AES algorithm. The implementation uses a 128-bit key and is highly suited for telecom applications such as 5G. Researchers have developed and tested the setup and then used the Vivado HLS tool to evaluate various HLS guidelines as per the implementation. The generated Verilog RTL was verified and implemented on Xilinx Kintex 7 and Virtex 6 FPGAs. Since using the same resources, we have seen significant results than existing methods achieved by individual investigators. We have also verified the design for functionality by checking the ciphertext output from our design against a reference design output for the same input plaintext. 相似文献

13.

FPGA implementation of an enhanced chaotic-KASUMI block cipher

《Microprocessors and Microsystems》2021

The radio link is a broadcast channel used to transmit data over mobile networks. Because of the sensitivity of this network part, a security mechanism is used to ensure users’ information. For example, the third generation of mobile network security is based on the KASUMI block cipher, which is standardized by the Third Generation Partnership Project (3GPP). This work proposes an optimized and enhanced implementation of the KASUMI block cipher based on a chaotic generator. The purpose is to develop an efficient ciphering algorithm with better performance and good security robustness while preserving the standardization. The proposed design was implemented on several Xilinx Virtex Field Programmable Gate Arrays (FPGA) technologies. The synthesis results and a comparison with previous works prove the performance improvement of the proposed cipher block in terms of throughput, used hardware logic resources, and resistance against most cryptanalysis attacks. 相似文献

14.

Neural network training based on FPGA with floating point number format and it��s performance

Mehmet Ali ?avu?lu Cihan Karakuzu Suhap ?ahin Mehmet Yakut 《Neural computing & applications》2011,20(2):195-202

In this paper, two-layered feed forward artificial neural network’s (ANN) training by back propagation and its implementation on FPGA (field programmable gate array) using floating point number format with different bit lengths are remarked based on EX-OR problem. In the study, being suitable with the parallel data-processing specification on ANN’s nature, it is especially ensured to realize ANN training operations parallel over FPGA. On the training, Virtex2vp30 chip of Xilinx FPGA family is used. The network created on FPGA is coded by using VHDL. By comparing the results to available literature, the technique developed here proved to consume less space for the subjected ANN training which has the same structure and bit length, it is shown to have better performance. 相似文献

15.

Optimal utilization of available reconfigurable hardware resources

Kashif Latif Arshad Aziz Athar MahboobAuthor vitae 《Computers & Electrical Engineering》2011,37(6):1043-1057

Field programmable gate arrays (FPGAs) are continuously gaining momentum and becoming essential part of today’s digital systems and applications. The growing use of these devices coupled with increasingly more complex and integrated designs necessitates search for techniques in efficient utilization of their internal resources. Standard HDL coding techniques and synthesis tools implement logic to look up table (LUT) based architecture. The resulting design utilizes more area on the chip and some fast and dedicated areas and resources of the chip remain unutilized. This in turn results in slower clock rates and larger critical path lengths, hence the design remains inefficient in terms of both speed and area. In this paper we present and discuss techniques to effectively utilize the FPGA dedicated resources in order to speed up achievable clock rates and reduce the FPGA area utilization. Various useful HDL constructs are presented that utilize dedicated hardware resources of modern Xilinx FPGAs. Optimization techniques are presented with implementation examples and corresponding quantitative performance evaluation. In most of the cases we have achieved 50% reduction in chip area utilization and simultaneously improved timing results significantly. 相似文献

16.

A parameterized multilevel pattern matching architecture on FPGAs for network intrusion detection and prevention

Tian Song DongSheng Wang ZhiZhong Tang 《中国科学F辑(英文版)》2009,52(6):949-963

Pattern matching is one of the most performance-critical components for the content inspection based applications of network security, such as network intrusion detection and prevention. To keep up with the increasing speed network, this component needs to be accelerated by well designed custom coprocessor. This paper presents a parameterized multilevel pattern matching architecture (MPM) which is used on FPGAs. To achieve less chip area, the architecture is designed based on the idea of selected character decoding (SCD) and multilevel method which are analyzed in detail. This paper also proposes an MPM generator that can generate RTL-level codes of MPM by giving a pattern set and predefined parameters. With the generator, the efficient MPM architecture can be generated and embedded to a total hardware solution. The third contribution is a mathematical model and formula to estimate the chip area for each MPM before it is generated, which is useful for choosing the proper type of FPGAs. One example MPM architecture is implemented by giving 1785 patterns of Snort on Xilinx Virtex 2 Pro FPGA. The results show that this MPM can achieve 4.3 Gbps throughput with 5 stages of pipelines and 0.22 slices per character, about one half chip area of the most area-efficient architecture in literature. Other results are given to show that MPM is also efficient for general random pattern sets. The performance of MPM can be scalable near linearly, potential for more than 100 Gbps throughput. Supported by the National Natural Science Foundation of China (Grant No. 60803002), and the Excellent Young Scholars Research Fund of Beijing Institute of Technology 相似文献

17.

An FPGA based soft multiprocessor for DNS/DNSSEC authoritative server

Rabah SadounAuthor Vitae El Bey Bourennane^{Author Vitae} 《Microprocessors and Microsystems》2011,35(5):473-483

相似文献

18.

Deterministic bit-stream digital neurons

Braendler D. Hendtlass T. O''Donoghue P. 《Neural Networks, IEEE Transactions on》2002,13(6):1514-1525

In this paper, we present the design of a deterministic bit-stream neuron, which makes use of the memory rich architecture of fine-grained field-programmable gate arrays (FPGAs). It is shown that deterministic bit streams provide the same accuracy as much longer stochastic bit streams. As these bit streams are processed serially, this allows neurons to be implemented that are much faster than those that utilize stochastic logic. Furthermore, due to the memory rich architecture of fine-grained FPGAs, these neurons still require only a small amount of logic to implement. The design presented here has been implemented on a Virtex FPGA, which allows a very regular layout facilitating efficient usage of space. This allows for the construction of neural networks large enough to solve complex tasks at a speed comparable to that provided by commercially available neural-network hardware. 相似文献

19.

Evolvable Reasoning Hardware: Its Prototyping and Performance Evaluation

Moritoshi Yasunaga Jung Hwan Kim Ikuo Yoshihara 《Genetic Programming and Evolvable Machines》2001,2(3):211-230

In this paper, we propose evolvable reasoning hardware and its design methodology. In the proposed design methodology, case databases of each reasoning task are transformed into truth tables, which are evolved to extract rules behind the past cases through a genetic algorithm. Circuits for the evolvable reasoning hardware are synthesized from the evolved truth-tables. Parallelism in each task can be embedded directly in the circuits through the direct hardware implementation of the case databases. We developed the evolvable reasoning hardware prototype using Xilinx Virtex FPGA chips and applied it to the English-pronunciation-reasoning (EPR) task. The evolvable reasoning hardware for the EPR task was implemented with 270K gates, achieving an extremely high reasoning speed of less than 300 ns/phoneme. It also achieved a reasoning accuracy of 82.1% which is almost the same accuracy as NETTalk in neural networks and MBRTalk in parallel AI. 相似文献

20.

The route to a defect tolerant LUT through artificial evolution

Asbjoern Djupdal Pauline C. Haddow 《Genetic Programming and Evolvable Machines》2011,12(3):281-303

Evolutionary techniques may be applied to search for specific structures or functions, as specified in the fitness function. This paper addresses the challenge of finding an appropriate fitness function when searching for generic rather than specific structures which, when combined wiacteristic of defect tolerance on the circuit. Production defects for integrated circuits are expected to increase considerably. To avoid a corresponding drop in yield, improved defect tolerance solutions are needed. In the case of Field Programmable Gate Arrays (FPGAs), the pre-designed gate array provides a bridge between production and the application designers. Thus, introduction of defect tolerant techniques to the FPGA itself could provide a defect free gate array to the application designer, despite production defects. The search for defect tolerance presented herein is directed at finding defect tolerant structures for an important building block of FPGAs: Look-Up Tables (LUTs). Two key approaches are presented: (1) applying evolved generic building blocks to a traditional LUT design and (2) evolving the LUT design directly. The results highlight the fact that evolved generic defect tolerant structures can contribute to highly reliable circuit designs at the expense of area usage. Further, they show that applying such a technique, rather than direct evolution, has benefits with respect to evolvability of larger circuits, again at the expense of area usage. 相似文献