1.

Improving the computational efficiency of modular operations for embedded systems

《Journal of Systems Architecture》2014,60(5):440-451

Security protocols such as IPSec, SSL and VPNs, used in many communication systems, employ various cryptographic algorithms to protect data from malicious attacks. Thanks to public-key cryptography, these protocols can communicate securely over a public channel that is exposed to security risks, without first agreeing on a shared key. Public-key cryptosystems such as RSA, Rabin and ElGamal provide security services such as key exchange and key distribution between communicating nodes, and they underpin many authentication protocols. Such cryptosystems usually depend on modular arithmetic operations, including modular multiplication and exponentiation. These are computationally intensive, fundamental operations used heavily in many fields, including cryptography, number theory and finite field arithmetic. This paper analyzes modular arithmetic operations and improves the computation of modular multiplication and exponentiation from an FPGA-based hardware design perspective. Two well-known algorithms, Montgomery modular multiplication and Karatsuba multiplication, are exploited together within our high-speed pipelined hardware architecture. The proposed design is an efficient solution for applications where both area and performance matter. The proposed coprocessor is scalable: it supports different security levels at a cost in performance. We also build a system-on-chip design on Xilinx's latest Zynq-7000 family extensible processing platform to show how the proposed design improves the processing time of modular arithmetic operations for embedded systems.
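
As a rough illustration of the Karatsuba half of this design, the following Python sketch shows the recursive split that replaces four half-size multiplications with three. It is a software model under our own assumptions (non-negative integers, and a hypothetical `cutoff_bits` threshold below which it falls back to native multiplication, standing in for a small hardware multiplier), not the authors' pipelined hardware.

```python
def karatsuba(x: int, y: int, cutoff_bits: int = 64) -> int:
    """Multiply non-negative integers x and y with Karatsuba's method.

    cutoff_bits is an illustrative threshold: at or below it we use native
    multiplication, standing in for a small fixed-width multiplier.
    """
    if x < 0 or y < 0:
        raise ValueError("this sketch handles non-negative operands only")
    if x.bit_length() <= cutoff_bits or y.bit_length() <= cutoff_bits:
        return x * y
    m = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << m) - 1
    x1, x0 = x >> m, x & mask          # x = x1*2^m + x0
    y1, y0 = y >> m, y & mask          # y = y1*2^m + y0
    z2 = karatsuba(x1, y1, cutoff_bits)
    z0 = karatsuba(x0, y0, cutoff_bits)
    # One extra product yields both cross terms: (x1+x0)(y1+y0) - z2 - z0 = x1*y0 + x0*y1
    z1 = karatsuba(x1 + x0, y1 + y0, cutoff_bits) - z2 - z0
    return (z2 << (2 * m)) + (z1 << m) + z0
```

For 1024-bit operands the recursion bottoms out in 64-bit leaf products; in a Montgomery multiplier, such a routine would supply the full-width products that the reduction step then consumes.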

2.

Design and analysis of high-speed 8-bit ALU using 18 nm FinFET technology

Shylashree N. Venkatesh B. Saurab T. M. Srinivasan Tarun Nath Vijay 《Microsystem Technologies》2019,25(6):2349-2359

All modern computational devices contain an ALU. As software grows more complex and shifts steadily toward parallelism, high-speed processors with hardware support for time-consuming operations such as multiplication become increasingly beneficial. Small, compact devices such as IoT devices need to run software such as security software and to take computational load off the cloud. In this paper, a high-speed 8-bit ALU in 18 nm FinFET technology is proposed. The arithmetic and logic unit consists of fast compute units, namely a Kogge-Stone adder and a Dadda multiplier, together with basic logic gates. Each compute unit is optimized for speed while keeping area in check. The Dadda multiplier uses an 8 × 8 architecture rather than the conventional 4 × 4 approach, making this a true 8-bit ALU. Simulation and analysis are performed with Cadence Virtuoso in the Analog Design Environment. The proposed design has a transistor count of 5298, a power consumption of 219 µW and a maximum delay of 166.8 ps. The design is also expected to take at most one clock cycle for any computation.
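
The Kogge-Stone adder mentioned above computes all carries with a logarithmic-depth parallel-prefix network. The following Python sketch models that network on integers, with bitwise operations standing in for the per-bit generate/propagate cells; the function name `kogge_stone_add` is ours, and this is a behavioral illustration, not the authors' transistor-level FinFET design.

```python
from typing import Tuple

def kogge_stone_add(a: int, b: int, width: int = 8) -> Tuple[int, int]:
    """Add two unsigned width-bit integers with a Kogge-Stone prefix carry network.

    Returns (sum mod 2^width, carry_out). Each loop iteration corresponds to one
    prefix level of the adder, so there are ceil(log2(width)) levels.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    g = a & b            # generate: bit i produces a carry on its own
    p = a ^ b            # propagate: bit i passes an incoming carry along
    half_sum = p
    d = 1
    while d < width:
        # Merge each bit's (g, p) with the group that ends d positions lower.
        g |= p & (g << d)
        p &= p << d
        d <<= 1
    # After the prefix levels, g[i] is the carry out of bits i..0 (carry-in is 0).
    total = half_sum ^ (g << 1)
    return total & mask, (g >> (width - 1)) & 1
```

For the 8-bit case the loop runs three times (d = 1, 2, 4), matching the three prefix levels of an 8-bit Kogge-Stone adder.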

3.

Design and implementation of a fast, low-power modular multiplication circuit over finite fields

程桂花, 罗永龙, 齐学梅, 左开中 《计算机时代》2012,(4):21-23

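Finite-field arithmetic is the foundation of cryptography, and modular multiplication is one of its core operations. This paper therefore analyzes the principles and characteristics of modular multiplication, designs a modular multiplication circuit in Verilog HDL, and implements finite-field modular multiplication on an FPGA. The circuit uses a dual-edge register structure; it is small, fast and low-power, and it can perform general-purpose modular multiplication over finite fields, which is of practical value for hardware implementations of encryption algorithms.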

4.

Design of a 32-bit floating-point multiplier based on an improved 4-2 compressor structure

邵磊, 李昆, 张树丹, 于宗光, 徐睿 《微计算机信息》2007,23(9)

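This paper presents the design of a 32-bit floating-point multiplier for high-performance DSPs. By using a tree of 4-2 compressors with modified Booth encoding, it improves speed and reduces power consumption. The multiplier has a regular structure suited to VLSI implementation and completes one 24-bit integer multiplication or one 32-bit floating-point multiplication in a single cycle. The whole design is described at the structural level in Verilog HDL and logically synthesized with a 0.25 µm cell library. One multiplication takes 24.30 ns.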
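
Modified (radix-4) Booth encoding recodes the multiplier into digits in {-2, -1, 0, 1, 2}, which halves the number of partial products fed into the 4-2 compressor tree. A minimal Python sketch of the recoding, assuming a signed two's-complement multiplier of even width `n`; the function names are ours, and the compressor tree is replaced here by a plain sum:

```python
from typing import List

def booth_radix4_digits(y: int, n: int) -> List[int]:
    """Recode an n-bit two's-complement multiplier y (n even) into n//2 radix-4 Booth digits.

    Digit i looks at bits y[2i+1], y[2i], y[2i-1] (with y[-1] = 0) and equals
    -2*y[2i+1] + y[2i] + y[2i-1], so it always lies in {-2, -1, 0, 1, 2}.
    """
    if n % 2:
        raise ValueError("n must be even")
    bits = y & ((1 << n) - 1)      # two's-complement bit pattern of y
    digits = []
    prev = 0                       # y[-1]
    for i in range(0, n, 2):
        b0 = (bits >> i) & 1
        b1 = (bits >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits

def booth_multiply(x: int, y: int, n: int) -> int:
    """Sum the n//2 partial products d_i * x * 4^i; a hardware multiplier would
    reduce these rows with a compressor tree instead of a plain sum."""
    return sum(d * x * 4 ** i for i, d in enumerate(booth_radix4_digits(y, n)))
```

A 16-bit multiplier thus yields 8 partial products instead of 16. An unsigned operand such as a 24-bit mantissa can be handled by zero-extending it to an even width that leaves room for a sign bit (26 bits) before recoding.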

5.

Design of a 32-bit floating-point multiplier based on an improved 4-2 compressor structure

邵磊, 李昆, 张树丹, 于宗光, 徐睿 《微计算机信息》2007,23(3X):224-225,199

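This paper presents the design of a 32-bit floating-point multiplier for high-performance DSPs. By using a tree of 4-2 compressors with modified Booth encoding, it improves speed and reduces power consumption. The multiplier has a regular structure suited to VLSI implementation and completes one 24-bit integer multiplication or one 32-bit floating-point multiplication in a single cycle. The whole design is described at the structural level in Verilog HDL and logically synthesized with a 0.25 µm cell library. One multiplication takes 24.30 ns.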

6.

A high-speed modular multiplier for RSA operations

张文祥, 苏斌 《计算机工程与设计》2006,27(4):676-678,681

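Using two-bit Booth encoding and sign prediction, the Blakley modular multiplication algorithm is analyzed and improved, yielding an algorithm well suited to hardware implementation. Based on this algorithm and combined with the CRT (Chinese remainder theorem), a new systolic array structure is implemented that halves the number of algorithm iterations; a high-speed large-number full adder is also used, which greatly increases modular multiplication speed. In a 0.35 µm CMOS process, with 1024-bit operands, the design runs at a 100 MHz clock frequency, and one 1024-bit digital signature takes 4.0 ms.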
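
For reference, the basic radix-2 Blakley (interleaved) modular multiplication that this work starts from can be sketched in Python as follows. This is the textbook form under the assumption 0 <= b < n; the authors' Booth-encoded, sign-predicted, CRT-based systolic variant is not reproduced, and the function name is ours.

```python
def blakley_modmul(a: int, b: int, n: int) -> int:
    """Compute a*b mod n by scanning a from its most significant bit.

    Invariant: r = (bits of a seen so far) * b mod n, with 0 <= r < n.
    Requires a >= 0 and 0 <= b < n.
    """
    if not 0 <= b < n or a < 0:
        raise ValueError("need a >= 0 and 0 <= b < n")
    r = 0
    for i in reversed(range(a.bit_length())):
        r = 2 * r + ((a >> i) & 1) * b   # r < 2n + n = 3n
        # At most two conditional subtractions restore r < n.
        if r >= n:
            r -= n
        if r >= n:
            r -= n
    return r
```

Each iteration needs only a shift, an addition and comparisons against n, which is why the method maps naturally onto adder-based hardware. This radix-2 loop runs once per bit of a; processing two bits per iteration, as two-bit Booth encoding does, halves the iteration count.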

7.

Power dissipation and area comparison of 512-bit and 1024-bit key AES

Jaeik Cho, Setiawan Soekamtoputra, Ken Choi, Jongsub Moon 《Computers & Mathematics with Applications》2013,65(9):1378-1383

Advanced Encryption Standard (AES) has replaced its predecessor, the Data Encryption Standard (DES), as the most widely used encryption algorithm in many security applications. To date, the AES standard has key-size variants of 128, 192 and 256 bits, where longer keys provide more secure ciphertext. From a hardware perspective, a bigger key size also means larger area and power consumption because more operations must be performed. Some companies that require ultra-high security in their systems may look for a key size beyond 256-bit AES. In this paper, 128-bit and 256-bit AES hardware, as well as two variants of an AES encryption algorithm with 512-bit and 1024-bit keys, are implemented and compared in terms of power consumption and area. The experiments are performed in 45 nm CMOS technology at 1.1 V using Synopsys Design Compiler and ModelSim, and the total power consumption and area results are presented and compared graphically.

8.

A high-performance large-number modular arithmetic unit and its application

陈勇涛, 段成华 《计算机仿真》2009,26(6):339-343

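To speed up public-key cryptosystem implementations, the key is a modular arithmetic unit that supports both large-number modular multiplication and modular addition/subtraction. Existing methods mostly implement these two operations separately, which leads to low hardware throughput. To address this, a pipelined high-performance large-number modular arithmetic unit is proposed. Based on an improved Montgomery modular multiplication algorithm and using pipelining, the modular multiplication circuit is divided into three pipeline stages, and the modular addition/subtraction circuit is merged into the third stage, yielding a unit that can compute modular multiplication and modular addition/subtraction at the same time. Simulation results show that the unit achieves high throughput with low resource usage, making it well suited as the basic hardware arithmetic unit of high-performance public-key cryptosystems.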
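
Modular addition and subtraction are cheap enough to fold into a pipeline stage because, for operands already reduced mod n, a single conditional correction suffices. A minimal Python sketch under that assumption (the function names are ours; hardware would typically compute both candidates in parallel and select one by the sign of the difference, which the code mimics):

```python
def mod_add(a: int, b: int, n: int) -> int:
    """(a + b) mod n for 0 <= a, b < n: compute s and s - n, keep the non-negative one."""
    s = a + b
    t = s - n
    return t if t >= 0 else s

def mod_sub(a: int, b: int, n: int) -> int:
    """(a - b) mod n for 0 <= a, b < n: add n back once if the difference is negative."""
    d = a - b
    return d + n if d < 0 else d
```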

9.

Implementation of extended-precision math algorithms in 32-bit embedded systems

聂胜伟, 陆士强, 程恩惠 《计算机工程》2006,32(23):271-272

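An extended-precision math algorithm for 32-bit embedded systems is proposed. It suits systems that lack hardware support from a numeric coprocessor and in which software floating-point arithmetic cannot meet the system's timing requirements. The algorithm offers high data precision and good extensibility. The paper presents 32-bit multiplication, division and square-root algorithms, as well as 64-bit addition, subtraction and multiplication algorithms.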
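
As an illustration of the kind of building blocks listed above, here is a Python model of 64-bit addition and a 32×32→64-bit multiplication built from 32-bit words, where the multiply uses only 16×16-bit partial products (as on a core without a widening multiply instruction). The word layout and function names are our assumptions, not the paper's code; Python integers are masked to mimic 32-bit registers.

```python
from typing import Tuple

MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF

def add64(a_hi: int, a_lo: int, b_hi: int, b_lo: int) -> Tuple[int, int]:
    """Add two 64-bit values held as (high, low) 32-bit words; wraps modulo 2^64."""
    lo = a_lo + b_lo
    carry = lo >> 32                          # carry out of the low word (0 or 1)
    hi = (a_hi + b_hi + carry) & MASK32
    return hi, lo & MASK32

def mul32x32_64(a: int, b: int) -> Tuple[int, int]:
    """Full 64-bit product of two unsigned 32-bit values as (high, low) words,
    using only 16x16-bit multiplications."""
    a1, a0 = a >> 16, a & MASK16
    b1, b0 = b >> 16, b & MASK16
    p00, p01, p10, p11 = a0 * b0, a0 * b1, a1 * b0, a1 * b1   # each < 2^32
    # Middle column: high half of p00 plus the low halves of the cross products (< 3 * 2^16).
    mid = (p00 >> 16) + (p01 & MASK16) + (p10 & MASK16)
    lo = ((mid & MASK16) << 16) | (p00 & MASK16)
    hi = p11 + (p01 >> 16) + (p10 >> 16) + (mid >> 16)
    return hi & MASK32, lo
```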

10.

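A new fast addition-based modular multiplication algorithm for large numbers  Total citations: 1, self-citations: 0, citations by others: 1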

陈勤, 周律, 张旻 《计算机工程》2007,33(1):167-169

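After a comprehensive study of several commonly used classes of modular multiplication methods, this paper draws on the quotient-estimation idea of estimated-quotient modular multiplication algorithms and exploits the easy computation of reduction modulo 2^n in Montgomery-type algorithms. Using windowed segment processing, it gives a new precomputation method based on the modulus N and then proposes a new fast addition-based algorithm for the modular multiplication AB mod N. With a 1024-bit modulus N and a window width of 6, the new algorithm needs on average only 693 1024-bit additions per AB mod N modular multiplication, which substantially reduces computational complexity compared with current addition-based modular multiplication algorithms.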
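
The general shape of a windowed, addition-only modular multiplication can be sketched as follows. This is a generic left-to-right radix-2^w method with a precomputed table of multiples of B mod N, written under our own assumptions (0 <= B < N, window width `w`, hypothetical function name); it is not the authors' algorithm, whose N-based precomputation and quotient estimation are not reproduced.

```python
def windowed_add_modmul(a: int, b: int, n: int, w: int = 6) -> int:
    """Compute a*b mod n using only additions, doublings and comparisons.

    table[j] = j*b mod n for 0 <= j < 2^w is built by repeated addition; a is
    then scanned w bits at a time, starting from the most significant window.
    """
    if not 0 <= b < n or a < 0:
        raise ValueError("need a >= 0 and 0 <= b < n")
    table = [0] * (1 << w)
    for j in range(1, 1 << w):
        t = table[j - 1] + b
        table[j] = t - n if t >= n else t
    r = 0
    windows = (a.bit_length() + w - 1) // w
    for i in reversed(range(windows)):
        for _ in range(w):               # r = r * 2^w mod n, one doubling at a time
            r <<= 1
            if r >= n:
                r -= n
        r += table[(a >> (i * w)) & ((1 << w) - 1)]
        if r >= n:                       # r < 2n here, so one subtraction suffices
            r -= n
    return r
```

For a k-bit a this costs about k doublings plus k/w table additions per product, on top of the 2^w - 1 additions needed once to fill the table.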

11.

A novel Montgomery-based modular exponentiator architecture

张远洋, 李峥, 杨磊, 张少武 《计算机工程》2007,33(16):211-213

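Large-number modular multiplication is the core operation of many public-key cryptosystems and the bottleneck for improving their efficiency. Based on the Montgomery modular multiplication algorithm, this paper proposes an improved fast modular multiplication algorithm and a corresponding modular exponentiation algorithm. Because a new Booth encoding is used, the number of loop iterations is nearly halved, so performance nearly doubles. The modular exponentiator uses a new carry-save adder (CSA) tree structure that does not need to sum the result of each modular multiplication. Experiments show that at a 97 MHz clock frequency the 1024-bit modular exponentiator achieves a baud rate of 184 Kb/s, making it suitable for designing high-speed public-key cryptographic coprocessors.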
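
A carry-save adder (CSA) reduces three operands to two (a sum word and a carry word) without propagating carries, which is what lets a CSA tree accumulate partial results cheaply and defer the single carry-propagating addition to the end. A small Python model of that idea follows; the helper names are hypothetical and the "tree" is a simple 3:2 reduction loop rather than the paper's structure.

```python
from typing import Iterable, Tuple

def carry_save_add(a: int, b: int, c: int) -> Tuple[int, int]:
    """3:2 compression of non-negative integers: returns (s, carry) with s + carry == a + b + c.

    Every bit position is handled independently, so no carry ripples across the word.
    """
    s = a ^ b ^ c                                  # per-bit sum
    carry = ((a & b) | (a & c) | (b & c)) << 1     # per-bit majority, moved up one position
    return s, carry

def csa_tree_sum(values: Iterable[int]) -> int:
    """Reduce many operands with 3:2 steps, then do one carry-propagate addition."""
    vals = list(values)
    while len(vals) > 2:
        a, b, c = vals.pop(), vals.pop(), vals.pop()
        vals.extend(carry_save_add(a, b, c))
    return sum(vals)   # the only carry-propagating addition
```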

12.

An improved Montgomery large-number modular multiplier

苏斌, 刘宏伟 《计算机工程与应用》2005,41(5):126-128

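This paper analyzes and improves the Montgomery modular multiplication algorithm, adopting a Montgomery variant well suited to hardware implementation. Based on this algorithm, a new systolic array structure is proposed that effectively reduces chip area and increases modular multiplication speed. In a 0.6 µm CMOS process, the VLSI implementation of the multiplier uses 9k equivalent gates and runs at up to 100 MHz; one 1024-bit Montgomery modular multiplication takes about 4295 clock cycles.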
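
The bit-serial form of Montgomery multiplication from which systolic arrays are typically built can be sketched in Python as below. This is the standard radix-2 algorithm, assuming odd n < 2^k and 0 <= a, b < n; the paper's specific hardware-friendly variant and array layout are not reproduced, and the function name is ours.

```python
def montgomery_radix2(a: int, b: int, n: int, k: int) -> int:
    """Return a*b*2^(-k) mod n for odd n < 2^k and 0 <= a, b < n.

    Each step adds a_i*b, then adds n if needed to make the sum even, and halves it,
    so no division by n is ever performed. Throughout the loop, r < 2n.
    """
    if n % 2 == 0 or n >= (1 << k) or not (0 <= a < n and 0 <= b < n):
        raise ValueError("need odd n < 2^k and 0 <= a, b < n")
    r = 0
    for i in range(k):
        r += ((a >> i) & 1) * b
        if r & 1:          # q_i = r mod 2
            r += n
        r >>= 1            # exact division by 2
    return r - n if r >= n else r
```

Operands are usually mapped into the Montgomery domain first (x becomes x*2^k mod n), so that a chain of such products, as in exponentiation, stays in that domain. For 1024-bit operands this radix-2 loop runs 1024 times per product.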

13.

Hardware acceleration of an elliptic curve encryption algorithm over GF(p) based on IMPULSE C

崔强强, 金同标, 朱勇 《计算机应用》2011,31(9):2385-2388

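This paper studies the elliptic curve encryption algorithm over large prime fields and implements it in the IMPULSE C language. In standard projective coordinates, the point addition and point doubling algorithms are parallelized, and compiler features are exploited during programming for further parallelism. With a sensible hardware/software partition of the encryption algorithm, the computationally heavy and complex point multiplication is assigned to hardware and accelerated on a field-programmable gate array (FPGA), while the rest of the encryption protocol runs as software on a conventional CPU; VHDL code is generated for the hardware part. The encryption algorithm is simulated on the desktop with CoDeveloper, and the generated VHDL code is synthesized and simulated with ISE. Finally, the accelerated design is implemented on a Xilinx Virtex-5 xc5vfx70t FPGA development board. FPGA-based experiments show that point multiplication on P-192 takes 2.9 ms at a 133 MHz clock, that hardware resources are reasonably allocated, and that the design offers a parallel-acceleration advantage over existing hand-written HDL code.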
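
To make the point addition, point doubling and point (scalar) multiplication steps concrete, here is a Python sketch in affine coordinates over a toy prime field. The paper works in standard projective coordinates on P-192; the affine formulas, the function names and the small curve y^2 = x^3 + 2x + 2 over GF(17) with base point (5, 1) are our simplifying assumptions for illustration.

```python
from typing import Optional, Tuple

Point = Optional[Tuple[int, int]]   # None represents the point at infinity

def ec_add(P: Point, Q: Point, a: int, p: int) -> Point:
    """Add two points on y^2 = x^3 + a*x + b over GF(p) (b is not needed by the formulas)."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                    # P + (-P) = O; also covers doubling a point with y = 0
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1 % p, -1, p) % p    # tangent slope (doubling)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p         # chord slope (addition)
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)

def scalar_mult(k: int, P: Point, a: int, p: int) -> Point:
    """Left-to-right double-and-add: one doubling per bit of k, one addition per 1 bit."""
    R: Point = None
    for bit in bin(k)[2:]:
        R = ec_add(R, R, a, p)
        if bit == "1":
            R = ec_add(R, P, a, p)
    return R
```

Projective coordinates, as used in the paper, avoid the modular inversion in every addition and doubling, which is the main reason hardware point-multiplication units prefer them.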

14.

Research on multi-core parallelization of Comba and Karatsuba multiplication for large integers

蒋丽娟, 刘芳芳, 赵玉文, 杨超, 蔡颖 《计算机系统应用》2016,25(11):232-236

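Large-integer arithmetic is widely used in public-key encryption algorithms, in high-precision floating-point computation for large-scale scientific computing, and in constructing large eigenvalues, among other areas. However, most of its algorithms have high space and time costs. This is especially true of large-integer multiplication, one of the core operations: once the data reach a certain size, the very long serial computation time becomes a major bottleneck that limits the algorithm's applications. In recent years, with the rapid development of multi-core and many-core chips, fully exploiting an algorithm's inherent parallelism to harness the computing power of parallel processors, and thereby improve performance, has become a research trend. On a general-purpose multi-core parallel computing platform, this paper studies the parallelization of the Comba and Karatsuba fast algorithms for large-integer multiplication and proposes efficient multi-core parallel algorithms. For implementation and performance optimization, multi-level parallelism combining OpenMP and SIMD is used, yielding large performance gains. In performance tests comparing the optimized parallel algorithms with the original serial algorithms, the 8-thread parallel Comba and Karatsuba algorithms achieve speedups of 5.85× and 6.14×, respectively, over their serial counterparts.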
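
Comba multiplication is a product-scanning method: it forms the result one output column at a time. The Python sketch below (32-bit little-endian words, helper names of our own) separates the mutually independent column sums from the sequential carry pass, which is one natural place to look for parallelism; it is not the authors' OpenMP+SIMD code.

```python
from typing import List

W = 32
MASK = (1 << W) - 1

def to_words(x: int, count: int) -> List[int]:
    """Split a non-negative integer into `count` little-endian 32-bit words."""
    return [(x >> (W * i)) & MASK for i in range(count)]

def from_words(words: List[int]) -> int:
    """Reassemble little-endian 32-bit words into an integer."""
    return sum(w << (W * i) for i, w in enumerate(words))

def comba_mul(a: List[int], b: List[int]) -> List[int]:
    """Product-scanning (Comba) multiplication of non-empty word vectors a and b."""
    n, m = len(a), len(b)
    # Column k collects every a[i]*b[j] with i + j == k; columns do not depend on each other.
    cols = [
        sum(a[i] * b[k - i] for i in range(max(0, k - m + 1), min(k, n - 1) + 1))
        for k in range(n + m - 1)
    ]
    out, carry = [], 0
    for c in cols:                 # sequential carry propagation, one column at a time
        carry += c
        out.append(carry & MASK)
        carry >>= W
    out.append(carry)              # fits in one word because the product has n + m words
    return out
```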

15.

Research on an efficient RSA modular exponentiation algorithm  Total citations: 6, self-citations: 2, citations by others: 4

饶进平, 冯登国 《计算机工程与应用》2003,39(9):76-77,121

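The execution efficiency of RSA hardware depends mainly on how efficiently modular exponentiation is implemented. This paper presents a fast modular exponentiation algorithm and its hardware implementation: it uses the Chinese remainder theorem to speed up private-key operations and Barrett modular reduction to avoid division, converting modular exponentiation into three multiplications and one addition. The multiplications are implemented with a Booth multiplier, which greatly shortens the circuit's critical path and significantly improves hardware efficiency.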
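
The two ingredients named above, Barrett reduction and CRT-based private-key operations, can be sketched in Python as follows. The class and function names and the toy RSA key (p = 61, q = 53, e = 17, d = 2753) are illustrative assumptions; the Booth-multiplier hardware is not modeled, and Python's big-integer multiplications stand in for it.

```python
class Barrett:
    """Barrett reduction modulo n (n > 1): x mod n for 0 <= x < 4^k, using multiplies and shifts."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.k = n.bit_length()
        self.mu = (1 << (2 * self.k)) // n      # precomputed once per modulus

    def reduce(self, x: int) -> int:
        q = ((x >> (self.k - 1)) * self.mu) >> (self.k + 1)   # underestimates x // n by at most 2
        r = x - q * self.n                                    # 0 <= r < 3n
        while r >= self.n:                                    # at most two corrections
            r -= self.n
        return r

def modexp_barrett(base: int, exp: int, n: int) -> int:
    """Left-to-right square-and-multiply in which every reduction is a Barrett reduction."""
    if not 0 <= base < n:
        raise ValueError("need 0 <= base < n")
    red = Barrett(n)
    r = 1
    for bit in bin(exp)[2:]:
        r = red.reduce(r * r)
        if bit == "1":
            r = red.reduce(r * base)
    return r

def rsa_decrypt_crt(c: int, p: int, q: int, d: int) -> int:
    """m = c^d mod pq via two half-size exponentiations and Garner recombination."""
    dp, dq = d % (p - 1), d % (q - 1)
    q_inv = pow(q, -1, p)
    m1 = modexp_barrett(c % p, dp, p)
    m2 = modexp_barrett(c % q, dq, q)
    h = (q_inv * (m1 - m2)) % p
    return m2 + h * q
```

The CRT path replaces one exponentiation modulo n with two modulo the half-size primes, and Barrett reduction turns each modular reduction into multiplications, shifts and a few subtractions.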

16.

Design of a 16x16-bit multiply-accumulator based on pipeline reconfiguration

赵倩, 汤乃云, 韩桂泽 《微计算机信息》2006,(35)

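Several implementations of 16x16-bit multiply-accumulators are compared, and a design for a 16x16-bit multiply-accumulator embedded in a microprocessor and based on pipeline reconfiguration is presented. The design can perform multiplication or multiply-accumulate on 16-bit integer or ordinal operands, and it increases operation speed while reducing area. The circuit was simulated with Cadence EDA tools, and the simulation results verify the correctness of the design.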

17.

Optimized implementation of the SM2 Chinese national cryptographic algorithm on low-power embedded platforms

刘赣秦, 李晖, 朱辉, 黄煜坤, 刘兴东 《网络与信息安全学报》2022,8(6):29-38

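With the development of wireless communication technology and the spread of smart terminals, more and more cryptographic algorithms are being deployed in IoT devices to secure communication and data. Among them, the SM2 elliptic curve public-key cryptographic algorithm, proposed by the State Cryptography Administration and developed independently in China, offers high security and short keys; it has been widely deployed in communication systems and is used in key steps such as identity authentication and key agreement. However, because the algorithm involves large-integer arithmetic over finite fields, its computational cost is high, and running it on low-power embedded platforms seriously degrades the user experience. This paper therefore proposes an efficient SM2 implementation for low-power embedded platforms based on ARM-m series processors. Specifically, it uses the Thumb-2 instruction set's support for carry handling and for saving addressing cycles to optimize basic large-integer operations such as modular addition and modular subtraction, and it builds efficient basic arithmetic modules that match the number of registers available on the platform. Exploiting the short cycle counts of the multiply-accumulate instructions on ARM-m series processors, it optimizes the Montgomery multiplication implementation and, combined with the CIOS algorithm, designs an efficient modular multiplication scheme that is no longer restricted to Mersenne primes, which greatly improves the speed and flexibility of modular multiplication. Based on theoretical analysis and experimental tests, it gives a method for choosing the window width of the wNAF sliding-window method for scalar point multiplication on embedded platforms. Experimental results show that the scheme effectively improves the computational efficiency of SM2 on resource-constrained low-power embedded platforms; without precomputation, on Cortex-...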
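
The CIOS (Coarsely Integrated Operand Scanning) form of Montgomery multiplication interleaves, for each word of b, a multiply-accumulate pass over a with one word of reduction. A Python model with 32-bit words follows; the word size, the little-endian layout, the function names and the final comparison done on whole integers are assumptions made for brevity, and nothing here is specific to the SM2 prime or to Thumb-2 instructions.

```python
from typing import List

W = 32
MASK = (1 << W) - 1

def to_words(x: int, s: int) -> List[int]:
    """Little-endian list of s 32-bit words."""
    return [(x >> (W * i)) & MASK for i in range(s)]

def from_words(words: List[int]) -> int:
    """Reassemble little-endian 32-bit words into an integer."""
    return sum(w << (W * i) for i, w in enumerate(words))

def cios_montmul(a: List[int], b: List[int], n: List[int], s: int) -> List[int]:
    """CIOS Montgomery product: returns the words of a*b*2^(-32*s) mod n.

    a, b, n are s-word little-endian vectors with n odd and a, b < n.
    """
    n0_inv = (-pow(n[0], -1, 1 << W)) & MASK   # n'_0 = -n^(-1) mod 2^32
    t = [0] * (s + 2)
    for i in range(s):
        # Multiplication pass: t += a * b[i]
        c = 0
        for j in range(s):
            cs = t[j] + a[j] * b[i] + c        # 32x32 + 32 + 32 -> 64-bit multiply-accumulate
            t[j], c = cs & MASK, cs >> W
        cs = t[s] + c
        t[s], t[s + 1] = cs & MASK, cs >> W
        # Reduction pass: add m*n so the low word becomes zero, then shift down one word
        m = (t[0] * n0_inv) & MASK
        c = (t[0] + m * n[0]) >> W             # low word is zero by construction
        for j in range(1, s):
            cs = t[j] + m * n[j] + c
            t[j - 1], c = cs & MASK, cs >> W
        cs = t[s] + c
        t[s - 1], c = cs & MASK, cs >> W
        t[s] = t[s + 1] + c
        t[s + 1] = 0
    # Final conditional subtraction (done on whole integers here for brevity)
    r = from_words(t[: s + 1])
    modulus = from_words(n)
    if r >= modulus:
        r -= modulus
    return to_words(r, s)
```

The inner `t[j] + a[j] * b[i] + c` step is a 32×32-bit multiply with two 32-bit additions into a 64-bit result, the kind of operation a processor's multiply-accumulate instructions are meant for.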

18.

An 8-bit systolic AES architecture for moderate data rate applications

Sheikh Muhammad Farhan, Shoab A. Khan, Habibullah Jamal 《Microprocessors and Microsystems》2009,33(3):221-231

The complexity of mapping an algorithm to hardware is a function of the controller logic and the data path. Minimizing data path size can yield significant savings in hardware area and power dissipation. This paper presents an implementation of a novel architectural transformation technique for mapping a word-wide algorithm onto a byte-vector serial architecture. The technique divides the input word into several bytes and then traces each byte to extract the architectural transformation. The technique is applied to the Advanced Encryption Standard (AES) algorithm, which is nonlinear in nature. Using this technique, the 32-bit AES algorithm is transformed into a byte-systolic architecture. The novelty of the technique is most pronounced in the MixColumns design, the most complex part of the AES algorithm. The complex matrix multiplication component and the standard transformations of the 32-bit AES algorithm are transformed to support 8-bit operations. The resulting AES architectures reuse the same logic resources for key expansion and for encryption/decryption. The proposed design offers moderate data rates of about 41 Mbps for encryption and 37 Mbps for decryption while utilizing 236 and 280 slices, respectively, on a Xilinx Virtex II xc2v1000-6 FPGA. Comparisons show a significant throughput gain over other 8-bit designs. This makes it a viable data/communication security solution for a variety of embedded and consumer electronics.
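
MixColumns multiplies each 4-byte state column by a fixed matrix over GF(2^8). Because every row of that matrix is a rotation of (02, 03, 01, 01), one byte-wide datapath can produce the four output bytes in turn, which is the kind of logic reuse an 8-bit architecture relies on. A Python sketch (the helper names are ours, not the paper's):

```python
from typing import List

def xtime(x: int) -> int:
    """Multiply a byte by 02 in GF(2^8) with the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    x <<= 1
    return x ^ 0x11B if x & 0x100 else x

def mix_column_bytewise(col: List[int]) -> List[int]:
    """MixColumns on one 4-byte column, computed one output byte per step.

    Output byte i = 02*col[i] ^ 03*col[i+1] ^ col[i+2] ^ col[i+3] (indices mod 4),
    so the same byte-wide logic is reused four times with rotated inputs.
    """
    out = []
    for i in range(4):
        a, b, c, d = col[i], col[(i + 1) % 4], col[(i + 2) % 4], col[(i + 3) % 4]
        out.append(xtime(a) ^ xtime(b) ^ b ^ c ^ d)   # 03*b = 02*b ^ b
    return out
```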

19.

A rho improvement of the modular repeated squaring algorithm

石小平, 姜浩 《计算机应用与软件》2011,28(12)

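Using a cyclic binary method, this paper gives a rho improvement of the modular repeated squaring algorithm for modular exponentiation with large exponents, in order to speed up modular exponentiation. The new algorithm is essentially an exponent-reduction algorithm that effectively reduces the number of modular multiplications in modular repeated squaring. Worked examples show that the new algorithm can greatly increase computation speed.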
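
The baseline that the rho variant improves on, right-to-left modular repeated squaring, is sketched below in Python with a counter of modular multiplications, which is the quantity that exponent reduction aims to shrink. This is only the standard algorithm under a function name of our choosing; the cyclic-binary rho reduction itself is not reproduced, and the counter is added purely for illustration.

```python
from typing import Tuple

def mod_repeated_squaring(a: int, e: int, n: int) -> Tuple[int, int]:
    """Return (a^e mod n, number of modular multiplications used), for e >= 0 and n >= 1.

    Scans e from its least significant bit: `base` runs through a, a^2, a^4, ...
    and is multiplied into the result wherever e has a 1 bit.
    """
    result, base, mults = 1 % n, a % n, 0
    while e > 0:
        if e & 1:
            result = result * base % n
            mults += 1
        e >>= 1
        if e:                        # skip the final, unused squaring
            base = base * base % n
            mults += 1
    return result, mults
```

The count grows with the bit length of e and its number of 1 bits, so replacing e by a smaller equivalent exponent (for example, e reduced modulo the multiplicative order of a, when that order is known) directly cuts the work; that is the kind of saving an exponent-reduction method targets.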

20.

A modular multiplication algorithm with the modulus extended to the range of the integer type

邵荣《计算机应用》2012,32(9):2470-2471

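When the modulus of a modular multiplication exceeds half the bit width of the integer type, arithmetic overflow occurs, and the product cannot be handled without high-precision arithmetic. To address this, a modular multiplication algorithm is proposed that shrinks the product using congruence relations. Each integer is split into two digits and, following the rule for multiplying two-digit numbers, the high-order partial products are reduced via congruences, which avoids arithmetic overflow during the multiplication. Results show that, for modular multiplication based on 64-bit integers, the method extends the modulus to 62 bits.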
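
The underlying problem is that a*b can need up to twice as many bits as the modulus. One generic way to keep every intermediate within an unsigned 64-bit register for moduli below 2^62 is sketched below in Python: a is consumed two bits at a time, and each step is reduced immediately. This radix-4 shift-and-add scheme (function name ours) is a swapped-in illustration, not the paper's two-digit congruence method; the assertions document the 64-bit bound that a C implementation using uint64_t would rely on.

```python
def mulmod_u64(a: int, b: int, m: int) -> int:
    """Compute a*b mod m for 0 < m < 2^62 with every intermediate below 2^64."""
    if not 0 < m < (1 << 62):
        raise ValueError("need 0 < m < 2^62")
    a %= m
    b %= m
    r = 0
    for i in reversed(range((a.bit_length() + 1) // 2)):   # base-4 digits of a, most significant first
        t = r << 2                          # r < 2^62, so t < 2^64
        assert t < (1 << 64)
        r = t % m
        t = r + ((a >> (2 * i)) & 3) * b    # < 2^62 + 3 * 2^62 = 2^64
        assert t < (1 << 64)
        r = t % m
    return r
```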