1.

Design tradeoff analysis of floating-point adders in FPGAs

Ali Malik Dongdong Chen Younhee Choi Moon Ho Lee Seok-Bum Ko 《Electrical and Computer Engineering, Canadian Journal of》2008,33(3):169-175

With gate counts of ten million, field-programmable gate arrays (FPGAs) are becoming suitable for floating-point computations. Addition is the most complex operation in a floating-point unit and can cause major delay while requiring a significant area. Over the years, the VLSI community has developed many floating-point adder algorithms aimed primarily at reducing the overall latency. An efficient design of the floating-point adder offers major area and performance improvements for FPGAs. Given recent advances in FPGA architecture and area density, latency has become the main focus in attempts to improve performance. This paper studies the implementation of the standard, leading-one predictor (LOP), and far and close datapath (2-path) floating-point addition algorithms in FPGAs. Each algorithm has complex sub-operations which contribute significantly to the overall latency of the design. Each of the sub-operations is researched for different implementations and is then synthesized onto a Xilinx Virtex-II Pro FPGA device. Standard and LOP algorithms are also pipelined into five stages and compared with the Xilinx IP. According to the results, the standard algorithm is the best implementation with respect to area, but has a large overall latency of 27.059 ns while occupying 541 slices. The LOP algorithm reduces latency by 6.5% at the cost of a 38% increase in area compared to the standard algorithm. The 2-path implementation shows a 19% reduction in latency with an added expense of 88% in area compared to the standard algorithm. The five-stage standard pipeline implementation shows a 6.4% improvement in clock speed compared to the Xilinx IP with a 23% smaller area requirement. The five-stage pipelined LOP implementation shows a 22% improvement in clock speed compared to the Xilinx IP at a cost of 15% more area.

2.

Improvements on the design and implementation of DVB-S2 LDPC decoders

K.C. Cinnati Loi Seok-Bum Ko 《Computers & Electrical Engineering》2011,37(6):1137-1146

The architecture of a field-programmable gate-array (FPGA) implementation of a low-density parity-check (LDPC) decoder for the Digital Video Broadcasting – Second Generation via Satellite (DVB-S2) standard is presented. Algorithms are devised to systematically apply the values given in DVB-S2 to implement a memory mapping scheme, which allows for 360 functional units (FUs) to be used in decoding and supports both normal and short frames. A design of a parity-check module (PCM) is presented that verifies the parity-check equations of the LDPC codes. Furthermore, a special characteristic of five of the codes defined in DVB-S2 and their influence on the decoder design is discussed. Two versions of the LDPC decoder are synthesized for two families of FPGAs. The results show that the decoder presented uses fewer hardware resources than a DVB-S2 LDPC decoder found in the current literature that also uses FPGA, while improving the maximum frequency of the decoder.
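As a rough illustration of what a parity-check module verifies, the sketch below tests whether a candidate word satisfies every parity-check equation, that is, whether H·c^T = 0 over GF(2). The toy (7,4) Hamming matrix and the function name `satisfies_parity_checks` are assumptions made for illustration; the DVB-S2 matrices are far larger and the paper's hardware design is not reproduced here.

```python
# Toy parity-check matrix of a (7,4) Hamming code (an assumption; NOT a DVB-S2 matrix).
# Column j holds the binary representation of position j+1, least significant bit in row 0.
H = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def satisfies_parity_checks(matrix, word):
    """Return True if every row of matrix has even overlap with word (H * word^T = 0 mod 2)."""
    return all(
        sum(h & b for h, b in zip(row, word)) % 2 == 0
        for row in matrix
    )


# A valid codeword of the toy code: ones at positions 1, 2 and 3 (1 XOR 2 XOR 3 = 0).
codeword = [1, 1, 1, 0, 0, 0, 0]

# Flip a single bit to model a channel error.
corrupted = codeword.copy()
corrupted[4] ^= 1

print(satisfies_parity_checks(H, codeword))
print(satisfies_parity_checks(H, corrupted))
```

In a decoder, a check of this kind is typically what signals that iteration can stop early once all equations hold.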

3.

Decimal SRT Square Root: Algorithm and Architecture

Amir Kaivani Seok-Bum Ko 《Circuits, Systems, and Signal Processing》2013,32(5):2137-2150

Given the popularity of decimal arithmetic, hardware implementation of decimal operations has been a hot topic of research in recent decades. Besides the four basic operations, the square root can be implemented as an instruction directly in the hardware, which improves the performance of the decimal floating-point unit in the processors. Hardware implementation of decimal square rooters is usually done using either functional or digit-recurrence algorithms. Functional algorithms, entailing multiplication per iteration, seem inadequate to use for decimal square roots, given the high cost of decimal multipliers. On the other hand, digit-recurrence square root algorithms, particularly SRT (this method is named after its creators, Sweeney, Robertson, and Tocher) algorithms, are simple and well suited for decimal arithmetic. This paper, with the intention of reducing the latency of the decimal square root operation while maintaining a reasonable cost, proposes an SRT algorithm and the corresponding hardware architecture to compute the decimal square root. The proposed fixed-point square root design requires n+3 cycles to compute an n-digit root; the synthesis results show an area cost of about 31K NAND2 and a cycle time of 40 FO4. These results reveal a 14% speed advantage of the proposed decimal square root architecture over the fastest previous work (which uses a functional algorithm) with about a quarter of the area.
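To give a feel for what "digit recurrence" means, here is a minimal sketch of the classic schoolbook decimal square root, which produces one root digit per iteration. This is a plain restoring recurrence swapped in for illustration, not the SRT algorithm the paper proposes (SRT uses a redundant digit set and estimate-based digit selection); the function name `digit_recurrence_isqrt` is hypothetical.

```python
def digit_recurrence_isqrt(value):
    """Integer square root of a non-negative integer, one decimal digit per iteration.

    Returns (root, remainder) with root * root + remainder == value.
    """
    digits = str(value)
    if len(digits) % 2:
        digits = "0" + digits  # pad so the digits group into pairs

    root = 0
    remainder = 0
    for i in range(0, len(digits), 2):
        # Bring down the next pair of radicand digits.
        remainder = remainder * 100 + int(digits[i:i + 2])
        # Select the largest digit d with (20 * root + d) * d <= remainder.
        d = 9
        while (20 * root + d) * d > remainder:
            d -= 1
        remainder -= (20 * root + d) * d
        root = root * 10 + d
    return root, remainder


print(digit_recurrence_isqrt(152399025))
```

The digit-selection loop is the expensive part of this naive form; cheaper digit selection is the kind of cost that digit-recurrence hardware designs aim to reduce.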

4.

Combining ESOP minimization with BDD-based decomposition for improved FPGA synthesis

Muma K. Dongdong Chen Younhee Choi Dodds D. Moon Ho Lee Seok-Bum Ko 《Electrical and Computer Engineering, Canadian Journal of》2008,33(3):177-182

This paper proposes a novel method to improve the utilization efficiency and performance of field-programmable gate arrays (FPGAs). The proposed method, ExorBDD, uses a stage of exclusive-sum-of-product (ESOP) minimization, followed by a stage of decomposition using binary decision diagrams (BDDs). For exclusive OR (XOR)-intensive circuits, experiments were conducted on 19 MCNC benchmark parity circuits (ranging from 5 to 25 inputs), as they are the most representative case of XOR-intensive circuits. The results using the proposed approach show significant improvements over Exorcism4, BDS, and commercial tools. On average, the new approach uses only 30.3% as many look-up tables as are used by Xilinx tools (and only 16.4% in comparison to Altera). On average, the new approach has a maximum combinational path delay of 89.2% compared to the delay with Xilinx tools (80.3% compared to Altera). Experiments were also conducted on non-XOR-intensive circuits. These results show that ExorBDD also performs well for arbitrary circuits.

5.

Efficient Realization of Parity Prediction Functions in FPGAs

Seok-Bum Ko Jien-Chung Lo 《Journal of Electronic Testing》2004,20(5):489-499

In this paper, we propose an AND/XOR-based technology mapping method for efficient realization of parity prediction functions in field programmable gate arrays (FPGAs). Due to the fixed size of the programmable blocks in an FPGA, decomposing a circuit into sub-circuits with an appropriate number of inputs can achieve excellent implementation efficiency. Specifically, the proposed technology mapping method is based on the Davio expansion theorem. The AND/XOR nature of the proposed method allows it to operate efficiently on XOR-intensive circuits, such as parity prediction functions. We conduct experiments using the parity prediction functions with respect to MCNC benchmark circuits. With the proposed approach, the number of configurable logic blocks (CLBs) is reduced by 67.6% (compared to speed-optimized results) and 57.7% (compared to area-optimized results), respectively. The total equivalent gate counts are reduced by 65.5%, maximum combinational path delay is reduced by 56.7%, and maximum net delay is reduced by 80.5% compared to conventional methods.
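The Davio expansion mentioned above can be illustrated with a short sketch. The positive Davio expansion rewrites a Boolean function about a variable x as f = f0 XOR (x AND (f0 XOR f1)), where f0 and f1 are the cofactors with x fixed to 0 and 1. The code below only checks this identity on small functions; it is not the paper's mapping method, and the helper names are hypothetical.

```python
from itertools import product


def parity(bits):
    """XOR of all input bits (a toy XOR-intensive function)."""
    result = 0
    for b in bits:
        result ^= b
    return result


def positive_davio(f, i, bits):
    """Evaluate f through its positive Davio expansion about variable i."""
    low = list(bits)
    low[i] = 0
    high = list(bits)
    high[i] = 1
    f0 = f(low)            # cofactor with variable i = 0
    f2 = f0 ^ f(high)      # Boolean difference with respect to variable i
    return f0 ^ (bits[i] & f2)


def expansion_matches(f, n):
    """Check the expansion against f for every input and every variable."""
    return all(
        positive_davio(f, i, bits) == f(bits)
        for bits in product((0, 1), repeat=n)
        for i in range(n)
    )


print(expansion_matches(parity, 4))
```

For parity, the Boolean difference is the constant 1, so each expansion step peels off one variable with a single XOR, which hints at why AND/XOR forms suit such circuits.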

6.

A Novel Decimal Logarithmic Converter Based on First-Order Polynomial Approximation

Dongdong Chen Seok-Bum Ko 《Circuits, Systems, and Signal Processing》2012,31(3):1179-1190

This paper presents a decimal logarithmic converter based on the decimal first-order polynomial (linear) approximation algorithm. The proposed approach is mainly based on a look-up table, followed by a decimal linear approximation step. Compared with a binary-based decimal linear approximation algorithm (Algorithm 1), the proposed algorithm (Algorithm 2) is error-free in the conversion between the decimal and the binary formats. The proposed architecture is implemented in combinational logic using the binary coded decimal (BCD) encoding on a Virtex5 XC5VLX110T FPGA. The comparison shows that Algorithm 2 runs 2.15 times faster than Algorithm 1, at the expense of 1.14 times more area.
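The look-up-table-plus-linear-step idea can be sketched as follows. This is an assumption-laden illustration: it uses binary floating point rather than BCD, and the segment count, table layout, and the name `approx_log10` are hypothetical, since the abstract does not give the paper's table contents or coefficients.

```python
import math

SEGMENTS = 90  # hypothetical: one segment per 0.1 step of the significand in [1, 10)

# Each entry stores the segment start, log10 at the start, and the chord slope.
TABLE = []
for k in range(SEGMENTS):
    x0 = 1 + k / 10
    x1 = x0 + 0.1
    y0 = math.log10(x0)
    slope = (math.log10(x1) - y0) / (x1 - x0)
    TABLE.append((x0, y0, slope))


def approx_log10(x):
    """Approximate log10(x) for 1 <= x < 10 by a table lookup plus one multiply-add."""
    if not 1 <= x < 10:
        raise ValueError("x must lie in [1, 10)")
    k = min(int((x - 1) * 10), SEGMENTS - 1)  # leading digits select the segment
    x0, y0, slope = TABLE[k]
    return y0 + slope * (x - x0)


print(approx_log10(2.5), math.log10(2.5))
```

With 0.1-wide segments the chord error stays below about 6e-4; a finer table trades memory for accuracy.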

7.

Design and verification of an efficient WISHBONE-based network interface for network on chip

K. Swaminathan G. Lakshminarayanan Seok-Bum Ko 《Computers & Electrical Engineering》2014

In this paper, a generic asynchronous First In First Out (FIFO) based, WISHBONE-compatible, plug-and-play Network Interface (NI) for Network on Chip (NoC) is designed and verified. Four different types of encoded asynchronous FIFOs, namely binary, Gray, one-hot and Johnson, are designed and analyzed. It is found that the Gray-code asynchronous FIFO is the best at handling the asynchronous clock domain issues in the NI. The control signals of the WISHBONE bus wrappers from/to asynchronous FIFOs and packing/unpacking modules are asserted concurrently at the same rising edge of the respective router and IP clocks to reduce the latency. The same NI has been utilized for transferring data between synchronous as well as asynchronous clock domains irrespective of clock frequency and phase differences. The proposed NI ensures seamless, high data throughput between the routers and IP cores with minimal latency, and achieves higher throughput, higher speed and a smaller area than the existing design.
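One commonly cited reason Gray-coded pointers suit asynchronous FIFOs is that consecutive values differ in exactly one bit, so a pointer sampled in another clock domain is at most one step stale rather than an arbitrary mix of old and new bits. The sketch below checks that property in software; the pointer range `DEPTH` and the helper names are assumptions, and this is not the paper's hardware design.

```python
def binary_to_gray(n):
    """Convert a non-negative integer to its reflected Gray code."""
    return n ^ (n >> 1)


def gray_to_binary(g):
    """Invert binary_to_gray by XOR-ing together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def bits_changed(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


DEPTH = 16  # hypothetical pointer range (a power of two)

# Bit flips per pointer increment, including the wrap from DEPTH - 1 back to 0.
gray_flips = [
    bits_changed(binary_to_gray(i), binary_to_gray((i + 1) % DEPTH))
    for i in range(DEPTH)
]
binary_flips = [bits_changed(i, (i + 1) % DEPTH) for i in range(DEPTH)]

print(max(gray_flips), max(binary_flips))
```

A plain binary pointer can flip all of its bits at once on wrap-around, which is the case a Gray encoding avoids.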

8.

Factorized multi-scale multi-resolution residual network for single image deraining

Sujit Shivakanth Deivalakshmi S Ko Seok-Bum 《Applied Intelligence》2022,52(7):7582-7598

The performance of vision systems can be affected when used in severe weather conditions such as heavy rain or snow. Rain streak removal is an ill-posed problem as they can...

9.

Novel convolutional neural network architecture for improved pulmonary nodule classification on computed tomography

Wang Yi Zhang Hao Chae Kum Ju Choi Younhee Jin Gong Yong Ko Seok-Bum 《Multidimensional Systems and Signal Processing》2020,31(3):1163-1183

Computed tomography (CT) is widely used to locate pulmonary nodules for preliminary diagnosis of lung cancer. However, due to high visual...

10.

Novel nonlinearity minimized time-to-digital converters with digital calibration technique

Latha P. Sivakumar R. Rao Y. V. Ramana Ko Seok-Bum 《Analog Integrated Circuits and Signal Processing》2022,113(1):9-25

This paper presents a low-nonlinearity, four-channel Gated Ring Oscillator (GRO) based Time-to-Digital Converter (TDC) in Xilinx 28 nm...