1.

FPGA Implementation(s) of a Scalable Encryption Algorithm

Mace F. Standaert F.-X. Quisquater J.-J. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2008,16(2):212-216

SEA is a scalable encryption algorithm targeted for small embedded applications. It was initially designed for software implementations in controllers, smart cards, or processors. In this letter, we investigate its performances in field-programmable gate array (FPGA) devices. For this purpose, a loop architecture of the block cipher is presented. Beyond its low cost performances, a significant advantage of the proposed architecture is its full flexibility for any parameter of the scalable encryption algorithm, taking advantage of generic VHDL coding. The letter also carefully describes the implementation details allowing us to keep small area requirements. Finally, a comparative performance discussion of SEA with the advanced encryption standard Rijndael and (a cipher purposed for efficient FPGA implementations) is proposed. It illustrates the interest of platform/context-oriented block cipher design and, as far as SEA is concerned, its low area requirements and reasonable efficiency.

2.

Low area field-programmable gate array implementation of PRESENT image encryption with key rotation and substitution

Srikanth Parikibandla Sreenivas Alluri 《ETRI Journal》2021,43(6):1113-1129

Lightweight ciphers are increasingly employed in cryptography because of the high demand for secure data transmission in wireless sensor networks, embedded devices, and the Internet of Things. The PRESENT algorithm, an ultra-lightweight block cipher, provides a better solution for secure hardware cryptography with low power consumption and minimal resources. This study generates the key using a key rotation and substitution method, which comprises key rotation, key switching, and binary-coded decimal-based key generation used in image encryption. The key rotation and substitution-based PRESENT architecture is proposed to increase the security level of the data stream and the randomness of the cipher by providing high resistance to attacks. A lookup table is used to design the key scheduling module, thus reducing the area of the architecture. Field-programmable gate array (FPGA) performance is evaluated for the proposed and conventional methods. On a Virtex 6 device, the proposed key rotation and substitution PRESENT architecture occupies 72 lookup tables, 65 flip-flops, and 35 slices, fewer than the existing architecture.

3.

The research and design of reconfigurable computing for Block cipher

Xiaohui Yang Zibin Dai Yongfu Zhang Xuerong Yu 《Journal of Electronics (China)》2008,25(4):503-510

This paper describes a new specialized Reconfigurable Cryptographic for Block ciphers Architecture (RCBA). Application-specific computation pipelines can be configured according to the characteristics of the block cipher processing in RCBA, which delivers high performance for cryptographic applications. RCBA adopts a coarse-grained reconfigurable architecture that mixes the appropriate amount of static configurations with dynamic configurations. RCBA has been implemented based on Altera's FPGA, and representative algorithms of block cipher such as DES, Rijndael and RC6 have been mapped on RCBA architecture successfully. System performance has been analyzed, and from the analysis it is demonstrated that the RCBA architecture can achieve more flexibility and efficiency when compared with other implementations.

4.

High-speed VLSI architectures for the AES algorithm (total citations: 1; self-citations: 0; citations by others: 1)

Xinmiao Zhang Parhi K.K. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2004,12(9):957-967

This paper presents novel high-speed architectures for the hardware implementation of the Advanced Encryption Standard (AES) algorithm. Unlike previous works which rely on look-up tables to implement the SubBytes and InvSubBytes transformations of the AES algorithm, the proposed design employs combinational logic only. As a direct consequence, the unbreakable delay incurred by look-up tables in the conventional approaches is eliminated, and the advantage of subpipelining can be further explored. Furthermore, composite field arithmetic is employed to reduce the area requirements, and different implementations for the inversion in subfield GF(2^4) are compared. In addition, an efficient key expansion architecture suitable for the subpipelined round units is also presented. Using the proposed architecture, a fully subpipelined encryptor with 7 substages in each round unit can achieve a throughput of 21.56 Gbps on a Xilinx XCV1000 e-8 bg560 device in non-feedback modes, which is faster and is 79% more efficient in terms of equivalent throughput/slice than the fastest previous FPGA implementation known to date.
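
As a rough illustration of computing SubBytes without a stored table, the Python sketch below derives each S-box output from a multiplicative inverse in GF(2^8) followed by the AES affine transformation. It is an assumption-laden software model: it works directly in GF(2^8) rather than in the composite field GF((2^4)^2) that the paper uses, and the function names (gf_mul, gf_inv, sbox) are hypothetical, not taken from the paper.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # add (XOR) the current multiple of a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B           # reduce modulo the AES polynomial
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Return a^254, which equals a^-1 for nonzero a; 0 maps to 0 as AES requires."""
    result, base, exponent = 1, a, 254
    while exponent:
        if exponent & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exponent >>= 1
    return result


def rotl8(x: int, n: int) -> int:
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox(x: int) -> int:
    """AES SubBytes for one byte: field inversion, then the affine transform."""
    inv = gf_inv(x)
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63


if __name__ == "__main__":
    print(hex(sbox(0x53)))  # FIPS 197 worked example: 0xed
```

In hardware, the inversion step is what a composite-field design would decompose into GF(2^4) operations; the loop-based exponentiation here only stands in for that logic.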

5.

Differential Side Channel Analysis Attacks on FPGA Implementations of ARIA

ChangKyun Kim Martin Schläffer SangJae Moon 《ETRI Journal》2008,30(2):315-325

In this paper, we first investigate the side channel analysis attack resistance of various FPGA hardware implementations of the ARIA block cipher. The analysis is performed on an FPGA test board dedicated to side channel attacks. Our results show that an unprotected implementation of ARIA allows one to recover the secret key with a low number of power or electromagnetic measurements. We also present a masking countermeasure and analyze its second-order side channel resistance by using various suitable preprocessing functions. Our experimental results clearly confirm that second-order differential side channel analysis attacks also remain a practical threat for masked hardware implementations of ARIA.

6.

ARCHITECTURE MODEL AND RESOURCE GRAPH BUILDING ALGORITHM FOR DETAILED FPGA ARCHITECTURE DESIGN

Li Zhihua; Yang Haigang; Yang Liqun; Li Wei; Huang Juan 《Journal of Electronics (China)》2014,(6):505-512

7.

High-throughput LDPC decoders

Mansour M.M. Shanbhag N.R. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2003,11(6):976-996

A high-throughput memory-efficient decoder architecture for low-density parity-check (LDPC) codes is proposed based on a novel turbo decoding algorithm. The architecture benefits from various optimizations performed at three levels of abstraction in system design, namely LDPC code design, decoding algorithm, and decoder architecture. First, the interconnect complexity problem of current decoder implementations is mitigated by designing architecture-aware LDPC codes having embedded structural regularity features that result in a regular and scalable message-transport network with reduced control overhead. Second, the memory overhead problem in current-day decoders is reduced by more than 75% by employing a new turbo decoding algorithm for LDPC codes that removes the multiple check-to-bit message update bottleneck of the current algorithm. A new merged-schedule message-passing algorithm is also proposed that reduces the memory overhead of the current algorithm for low to moderate-throughput decoders. Moreover, a parallel soft-input-soft-output (SISO) message update mechanism is proposed that implements the recursions of the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm in terms of simple "max-quartet" operations that do not require lookup-tables and incur negligible loss in performance compared to the ideal case. Finally, an efficient programmable architecture coupled with a scalable and dynamic transport network for storing and routing messages is proposed, and a full-decoder architecture is presented. Simulations demonstrate that the proposed architecture attains a throughput of 1.92 Gb/s for a frame length of 2304 bits, and achieves savings of 89.13% and 69.83% in power consumption and silicon area over state-of-the-art, with a reduction of 60.5% in interconnect length.

8.

Efficient Implementations for AES Encryption and Decryption (total citations: 1; self-citations: 0; citations by others: 1)

Rashmi Ramesh Rachh P. V. Ananda Mohan B. S. Anami 《Circuits, Systems, and Signal Processing》2012,31(5):1765-1785

This paper proposes two efficient architectures for hardware implementation of the Advanced Encryption Standard (AES) algorithm. The composite field arithmetic for implementing SubBytes (S-box) and InvSubBytes (Inverse S-box) transformations investigated by several authors is used as the basis for deriving the proposed architectures. The first architecture for encryption is based on an optimized S-box followed by bit-wise implementation of MixColumns and AddRoundKey, and an optimized Inverse S-box followed by bit-wise implementation of InvMixColumns and AddMixRoundKey for decryption. The proposed S-box and Inverse S-box used in this architecture are designed as a cascade of three blocks. In the second proposed architecture, block III of the proposed S-box is combined with the MixColumns and AddRoundKey transformations, forming an integrated unit for encryption. An integrated unit for decryption combining block III of the proposed InvSubBytes with InvMixColumns and AddMixRoundKey is formed on similar lines. The delays of the proposed architectures for VLSI implementation are found to be the shortest compared to the state-of-the-art implementations of AES operating in non-feedback mode. Iterative and fully unrolled sub-pipelined designs including key schedule are implemented using FPGA and ASIC. The proposed designs are efficient in terms of Kgates/Giga-bits per second ratio compared with a few recent state-of-the-art ASIC (0.18-μm CMOS standard cell) based designs and throughput per area (TPA) for FPGA implementations.

9.

Dual-Data Rate Transpose-Memory Architecture Improves the Performance, Power and Area of Signal-Processing Systems

Mohamed El-Hadedy Xinfei Guo Martin Margala Mircea R. Stan Kevin Skadron 《Journal of Signal Processing Systems》2017,88(2):167-184

This paper presents a novel type of high-speed and area-efficient register-based transpose memory architecture enabled by reporting on both edges of the clock. The proposed new architecture, by using the double-edge triggered registers, doubles the throughput and increases the maximum frequency by avoiding some of the combinational circuit used in prior work. The proposed design is evaluated with both FPGA and ASIC flow in 28/32 nm technology. The experimental results show that the proposed memory achieves almost 4X improvement in throughput while consuming 46% less area with the FPGA implementations compared to prior work. For ASIC implementations, it achieves more than 60% area reduction and at least 2X performance improvement while burning 60% less power compared to other register-based designs implemented with the same flow. As an example, a proposed 8×8 transpose memory with 12-bit input/output resolution is able to achieve a throughput of 107.83 Gbps at 647 MHz by taking only 140 slices on a Virtex-7 Xilinx FPGA platform, and achieve a throughput of 88.2 Gbps at 529 MHz by taking 0.024 mm² silicon area for ASIC. The proposed transpose memory is integrated in both 2D-DCT and 2D-IDCT blocks for signal processing applications on the same FPGA platform. The new architecture allows a 3.5X speed-up in performance for the 2D-DCT algorithm, compared to the previous work, while consuming 28% less area, and 2D-IDCT achieves a 3X speed-up while consuming 20% less area.

10.

High Performance Reconfigurable FIR Filter Architecture Using Optimized Multiplier

J. L. Mazher Iqbal S. Varadarajan 《Circuits, Systems, and Signal Processing》2013,32(2):663-682

In mobile communication systems and multimedia applications, the need for efficient reconfigurable digital finite impulse response (FIR) filters has been increasing tremendously because of their advantages of small area, low cost, low power and high speed of operation. This article presents a near-optimum, low-complexity, reconfigurable digital FIR filter architecture based on computation sharing multipliers (CSHM), the constant shift method (CSM) and a modified binary-based common sub-expression elimination (BCSE) method for different word-length filter coefficients. The CSHM identifies common computation steps and reuses them for different multiplications. The proposed reconfigurable FIR filter architecture reduces the adder cost and operates at high speed for low-complexity reconfigurable filtering applications such as channelization, channel equalization, matched filtering, pulse shaping, video convolution functions, signal preconditioning, and various other communication applications. The proposed architecture has been implemented and tested on a Virtex 2 xc2vp2-6fg256 field-programmable gate array (FPGA) with 8-bit, 12-bit, and 16-bit filter coefficients. The proposed novel reconfigurable FIR filter architecture using a dynamically reconfigurable multiplier block offers good area and speed improvement compared to existing reconfigurable FIR filter implementations.

11.

Research on Implementing Encryption Algorithms Based on Partial Reconfiguration Technology

Wang Feng Zhou Xuehai Chen Ai Luo Sai 《Acta Electronica Sinica》2007,35(5):959-963

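To address the performance and resource-usage shortcomings of current reconfigurable computing techniques in encryption applications, this paper proposes a method for implementing encryption algorithms based on partial reconfiguration. The method uses the module-based partial reconfiguration capability of Xilinx FPGAs to implement block ciphers with an involutional structure, and it resolves key problems including the coordination mechanism between modules, the design of communication channels, and the adjustment of execution timing. Comparative experiments on encryption algorithm implementations confirm the effectiveness of the method.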

12.

Novel detector implementations for 3G LTE downlink and uplink

Tuomo Hänninen Janne Janhunen Markku Juntti 《Analog Integrated Circuits and Signal Processing》2014,78(3):645-655

We summarize our recent state-of-the-art programmable and reconfigurable detector and QR decomposition (QRD) implementations targeting 3G long term evolution (LTE) downlink and uplink requirements. The downlink transmission is based on the orthogonal frequency division multiplexing, whereas the uplink transmission uses a single-carrier frequency-division multiple access. The downlink implementations are based on the programmable transport triggered architecture (TTA) which provides a flexible and energy efficient architecture template. In TTA detector implementation, the LTE detection rate requirements up to 20 MHz bandwidth and 4 × 4 antenna system with 64-QAM, are achieved by using 1–6 programmable cores in parallel. Each core runs at 277 MHz clock frequency and consumes 55.5–64.0 mW depending on the detector configuration. The downlink detector is based on the selective spanning with fast enumeration algorithm. The uplink field-programmable gate array (FPGA) detector implementation is targeted for 4 × 4 antenna system and 64-QAM achieving a detection rate requirement for 20 MHz bandwidth. The used FPGA board for uplink implementation is Xilinx Virtex-6 and the implementation has been carried out using Xilinx Vivado high level synthesis tool. Two different detector architectures are implemented. The first one achieves the detection rate requirement with a single processing block running at 231 MHz and the latter one with four blocks in parallel, each running at 247 MHz. The implemented detector is based on the K-best algorithm. A multiple-input multiple-output receiver requires QRD to produce valid inputs for the detector. In addition to detector implementations, QRD is also implemented on both TTA and FPGA. Modified Gram–Schmidt algorithm is used in both QRD implementations.

13.

A Reduced-CP Approach to SC/FDE Block Transmission for Broadband Wireless Communications (total citations: 1; self-citations: 0; citations by others: 1)

Gusmao A. Torres P. Dinis R. Esteves N. 《Communications, IEEE Transactions on》2007,55(4):801-809

For conventional cyclic prefix (CP)-assisted single-carrier/frequency-domain equalization (SC/FDE) implementations, as well as for orthogonal frequency-division multiplexing (OFDM) implementations, the CP length is known to be selected on the basis of the expected maximum delay spread. Next, the data block size can be chosen to be large enough to minimize the CP overhead, yet small enough to make the channel variation over the block negligible. This paper considers the possibility of reducing the overall CP assistance, when transmitting sequences of SC blocks, while avoiding an excessively long fast Fourier transform window for FDE purposes and keeping good FDE performances through low-complexity, noniterative receiver techniques. These techniques, which take advantage of specially designed frame structures, rely on a basic algorithm for decision-directed correction (DDC) of the FDE inputs when the CP is not long enough to cope with the time-dispersive channel effects. More specifically, we present and evaluate a novel class of reduced-CP SC/FDE schemes, which takes advantage of a special frame structure for replacing "useless" CP redundancy by fully useful channel coding redundancy, with the help of the DDC algorithm. When using the DDC-FDE technique with these especially designed frame structures, the impact of previous decisions, which are not error-free, is shown to be rather small, thereby allowing a power-efficiency advantage (in addition to the obvious bandwidth-efficiency advantage) over conventional block transmission implementations under full-length CP. Additionally, the DDC algorithm is also shown to be useful to improve the power efficiency of these conventional implementations.

14.

Software Design and Implementation of an Efficient Kasumi Encryption Algorithm

Li Xiang Xu Tong Xiong Yan 《Communications Technology》2012,45(3):37-40

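The Kasumi block cipher, derived from the MISTY1 encryption algorithm, provides integrity and confidentiality services for third-generation (3G) mobile wireless networks. Many efficient hardware implementations of the algorithm exist, but few efficient software implementations have been proposed. This paper presents an efficient software design and implementation based on packet-level parallelism, optimizes the encryption process by using table lookups for the FI subfunction, and introduces new SSE transpose instructions for fast key generation. Experimental results show that the method is 4 times faster than the protocol implementation and meets the requirements of practical communication deployments.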

15.

Research on a Four-Dimensional Parallel Processing Architecture for Block Ciphers

Wang Shoucheng Li Gongli Yan Yingjian Xu Jinhui 《Acta Electronica Sinica》2017,45(10):2457-2463

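By analyzing the encryption characteristics of block ciphers, this paper divides their parallelism into four dimensions: intra-block same-operation parallelism, intra-block different-operation parallelism, inter-block same-operation parallelism, and inter-block different-operation parallelism. On this basis it proposes FDPM, a four-dimensional parallel processing model for block ciphers based on Amdahl's law. The model can guide the design of block cipher processing architectures and gives overall guidance on allocating architectural resources and exploiting parallelism. Based on FDPM, the paper proposes RCSA, a reconfigurable stream processing architecture for block ciphers, which effectively exploits the parallelism of block cipher processing and improves resource utilization as well as cipher processing performance. Analysis of algorithm mapping results demonstrates the correctness of the FDPM model and the efficiency of the RCSA architecture.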

16.

System-Level Data-Flow Transformation Exploration and Power-Area Trade-offs Demonstrated on Video Codecs

Francky Catthoor Martin Janssen Lode Nachtergaele Hugo De Man 《The Journal of VLSI Signal Processing》1998,18(1):39-50

17.

Design of the Switching Controller for the High-Capacity Non-Blocking Internet Router

《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2009,17(8):1157-1161

The sequential greedy scheduling (SGS) algorithm is a scalable maximal matching algorithm. This algorithm was conceptually proposed and well received since it provides non-blocking in an Internet router with input buffers and a cross-bar, unlike other existing implementations. In this paper, we implement a new design of the SGS algorithm, and determine its exact behaviour, performance and QoS that it provides. We examine different design options and measure the performance of their implementations in terms of their scalability and speed. It will be shown that multiple scheduler modules of a terabit Internet router can be implemented on a low-cost field-programmable gate-array (FPGA) device, and that the processing can be performed within the desired time slot duration. Proper functioning of the implemented scheduler was confirmed through thorough software and hardware testing.

18.

High Throughput, Scalable VLSI Architecture for Block Matching Motion Estimation

You Jaehee Lee Sang Uk 《Journal of Signal Processing Systems》1998,19(1):39-50

A VLSI architecture for block matching motion estimation is described in this paper. The proposed architecture achieves 100% PE utilization and alleviates the I/O bottleneck problem using a small amount of distributed on-chip image memory. The number of processing elements is scalable according to the degree of parallel processing and the throughput requirement. The overall computations are performed in a pipelined manner, and the data fill time for contiguous blocks is eliminated to increase throughput. The VLSI system implementation methodologies and the layouts are also described. Finally, the performance is evaluated and the advantages over other architectures are outlined.

19.

A Flexible Architecture for Modular Arithmetic Hardware Accelerators based on RNS

Samuel Antão Leonel Sousa 《Journal of Signal Processing Systems》2014,76(3):249-259

Modular arithmetic is a building block for a variety of applications potentially supported on embedded systems. An approach to make modular arithmetic more efficient is to identify algorithmic modifications that would enhance the parallelization of the target arithmetic in order to exploit the properties of parallel devices and platforms. The Residue Number System (RNS) introduces data-level parallelism, enabling parallelization even for algorithms based on modular arithmetic with several data dependencies. However, the mapping of generic algorithms to full RNS-based implementations can be complex, and suitable hardware architectures that are scalable and adaptable to different demands are required. This paper proposes and discusses an architecture with scalability features for the parallel implementation of algorithms relying on modular arithmetic fully supported by the RNS. The systematic mapping of a generic modular arithmetic algorithm to the architecture is presented. It can be applied as a high level synthesis step for an Application Specific Integrated Circuit (ASIC) or Field Programmable Gate Array (FPGA) design flow targeting modular arithmetic algorithms. An implementation with the Xilinx Virtex 4 and Altera Stratix II FPGA technologies of the modular exponentiation and Elliptic Curve (EC) point multiplication, used in the Rivest-Shamir-Adleman (RSA) and (EC) cryptographic algorithms, suggests latency results in the same order of magnitude of the fastest hardware implementations of these operations known to date.
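
The data-level parallelism that the RNS offers can be sketched in a few lines of Python: an integer is split into independent residue channels, addition and multiplication act on each channel separately, and the Chinese Remainder Theorem recovers the result. The moduli and function names below are illustrative assumptions, not the parameters or interfaces of the architecture described in the abstract.

```python
from math import prod

# Hypothetical pairwise-coprime moduli chosen only for illustration.
MODULI = (13, 17, 19, 23)
DYNAMIC_RANGE = prod(MODULI)  # values are representable modulo this product


def to_rns(x: int) -> tuple:
    """Split an integer into one residue per channel."""
    return tuple(x % m for m in MODULI)


def rns_add(a: tuple, b: tuple) -> tuple:
    """Channel-wise addition; no carries propagate between channels."""
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))


def rns_mul(a: tuple, b: tuple) -> tuple:
    """Channel-wise multiplication; each channel could run on its own unit."""
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))


def from_rns(residues: tuple) -> int:
    """Reconstruct the integer with the Chinese Remainder Theorem."""
    total = 0
    for r, m in zip(residues, MODULI):
        partial = DYNAMIC_RANGE // m
        # pow(partial, -1, m) is the modular inverse (Python 3.8+).
        total += r * partial * pow(partial, -1, m)
    return total % DYNAMIC_RANGE


if __name__ == "__main__":
    product_rns = rns_mul(to_rns(123), to_rns(456))
    print(from_rns(product_rns))  # equals 123 * 456 while below DYNAMIC_RANGE
```

Results are only exact while they stay below the product of the moduli, which is why practical RNS designs size the moduli set to the operand width.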

20.

Cube Analysis of the Block Cipher CTC

Mu Daoguang Zhang Wenzheng 《Information Security and Communications Privacy》2012,(7):132-135

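The cube attack is a new cryptanalytic method proposed by Dinur and Shamir at EUROCRYPT 2009; it aims to find linear relations among key bits. CTC (Courtois Toy Cipher) is a block cipher designed by N. Courtois for cryptanalysis research, with variable key length, plaintext length, and number of rounds. This paper applies the cube attack to 4-round CTC with a 60-bit key. Under a chosen-plaintext attack, and combined with quadraticity tests, the full key can be recovered, with the key-recovery phase requiring fewer than 2^10 encryptions.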
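
The core step of a cube attack, summing a cipher's output bit over all assignments of a chosen set of public bits to expose a linear relation among key bits, can be sketched on a toy function. The polynomial below is a made-up 3-bit example, not CTC, and the names toy_cipher and cube_sum are hypothetical.

```python
from itertools import product


def toy_cipher(v, k):
    """Toy output bit over GF(2); v holds public bits, k holds secret key bits.

    f = v0*v1*(k0 + k2) + v0*k1 + v1*v2 + k0*k1 + v2   (all arithmetic mod 2)
    """
    return ((v[0] & v[1] & (k[0] ^ k[2]))
            ^ (v[0] & k[1])
            ^ (v[1] & v[2])
            ^ (k[0] & k[1])
            ^ v[2])


def cube_sum(oracle, cube_indices, n_public):
    """XOR the oracle output over every assignment of the cube variables.

    Public bits outside the cube are held at 0. Terms that lack any cube
    variable appear an even number of times and cancel, leaving the superpoly.
    """
    total = 0
    for bits in product((0, 1), repeat=len(cube_indices)):
        v = [0] * n_public
        for index, bit in zip(cube_indices, bits):
            v[index] = bit
        total ^= oracle(v)
    return total


if __name__ == "__main__":
    secret_key = (1, 0, 0)
    # Summing over the cube {v0, v1} leaves the linear superpoly k0 XOR k2.
    print(cube_sum(lambda v: toy_cipher(v, secret_key), (0, 1), 3))
```

Each such cube yields one linear equation in the key bits; an attack on a real cipher collects enough independent equations during preprocessing and then solves the resulting system online.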