1.

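Zhong Xianhai, Xu Jinfu, Yan Yingjian 《微电子学与计算机》 (Microelectronics & Computer) 2009,26(2)

A scalable dual-field modular multiplier is implemented based on the multi-precision dual-field Montgomery modular multiplication algorithm. The multiplier's processing element uses a novel three-clock structure in place of the conventional two-clock structure, which shortens the critical-path delay and raises the clock frequency. Synthesized with the SMIC 0.18 μm CMOS standard-cell library, the multiplier reaches a maximum clock frequency of 240 MHz and computes a 256-bit modular multiplication over the finite field GF(p) in only 0.23 μs.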

2.

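A Hardware Algorithm for Montgomery Modular Multiplication and Its Implementation (cited by: 1; self-citations: 0; citations by others: 1)

Fang Yingli, Gao Zhiqiang 《微电子学》 (Microelectronics) 2002,32(4):276-278,282

The original Montgomery algorithm is improved by using a high-radix representation of large numbers, yielding an efficient hardware-oriented algorithm for computing the Montgomery product. Hardware built on this algorithm has low complexity and high processing speed. A 512-bit Montgomery modular multiplier was implemented with CSMC's 0.6 μm CMOS standard-cell library. The multiplier contains about 48,000 equivalent gates, occupies about 3 mm × 3 mm, runs at a maximum clock frequency of 40 MHz, and completes a 512-bit Montgomery modular multiplication in 341 clock cycles.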
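As a software illustration of what a high-radix Montgomery product computes, the following is a minimal sketch of the textbook digit-serial algorithm. It is not the algorithm of the paper above, whose details the abstract does not give; the function name `montgomery_product` and the parameters `k` (bits per digit) and `digits` are assumptions made for this example.

```python
def montgomery_product(a: int, b: int, n: int, k: int, digits: int) -> int:
    """Return a * b * R^-1 mod n, where R = (2**k) ** digits.

    Textbook digit-serial Montgomery product (illustrative sketch only).
    Assumes n is odd, 0 <= a, b < n, and n < R. Requires Python 3.8+.
    """
    base = 1 << k
    mask = base - 1
    # n_prime = -n^-1 mod 2^k, precomputed once per modulus.
    n_prime = (-pow(n, -1, base)) & mask
    t = 0
    for i in range(digits):
        a_i = (a >> (k * i)) & mask          # i-th radix-2^k digit of a
        t += a_i * b                         # accumulate the partial product
        q = ((t & mask) * n_prime) & mask    # digit that clears the low k bits
        t = (t + q * n) >> k                 # exact division by 2^k
    # t < 2n here, so one conditional subtraction completes the reduction.
    return t - n if t >= n else t
```

A larger `k` means fewer loop iterations per product at the cost of wider digit multipliers, which is the general trade-off a high-radix hardware design has to balance.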

3.

Research and Design of a Dual-Field Reconfigurable CIOS Modular Multiplier

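Wang Wei, Yan Yingjian, Li Wei, Li Moran 《微电子学》 (Microelectronics) 2015,45(4):502-506

To speed up modular multiplication in ECC processors and to support length-reconfigurable dual-field (prime field and binary field) modular multiplication, the parallelism of the CIOS modular multiplication algorithm is analyzed and a two-parameter CIOS algorithm suited to parallel hardware acceleration is proposed. A modular multiplication circuit with a six-stage pipeline and multi-word parallel data processing is designed; it supports dual-field modular multiplication of any length up to 1,152 bits. Synthesized, placed, and routed with a 0.18 μm CMOS process library, the circuit reaches a maximum clock frequency of 238 MHz. Compared with the computation times reported in other literature, the modular multiplication time for 160 to 1,024 bits is reduced by 17% to 40%.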
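For readers unfamiliar with CIOS (Coarsely Integrated Operand Scanning), the sketch below shows the standard word-level algorithm for the prime field only. It does not reproduce the two-parameter variant, the binary-field mode, or the pipeline of the paper above; the word size `W = 32` and the helper names `to_words`, `from_words`, `word_inverse`, and `cios` are assumptions for this example.

```python
W = 32                      # assumed word size in bits
MASK = (1 << W) - 1

def to_words(x: int, s: int) -> list:
    """Split x into s little-endian W-bit words."""
    return [(x >> (W * i)) & MASK for i in range(s)]

def from_words(ws: list) -> int:
    """Reassemble little-endian W-bit words into an integer."""
    return sum(w << (W * i) for i, w in enumerate(ws))

def word_inverse(n0: int) -> int:
    """Return -n0^-1 mod 2^W (n0 must be odd). Requires Python 3.8+."""
    return (-pow(n0, -1, 1 << W)) & MASK

def cios(a: list, b: list, n: list, n0_prime: int, s: int) -> int:
    """Return A * B * 2^(-W*s) mod N for s-word operands with A, B < N."""
    t = [0] * (s + 2)
    for i in range(s):
        # Multiplication step: t += a * b[i]
        c = 0
        for j in range(s):
            cs = t[j] + a[j] * b[i] + c
            t[j], c = cs & MASK, cs >> W
        cs = t[s] + c
        t[s], t[s + 1] = cs & MASK, cs >> W
        # Reduction step: add m * n so the low word becomes zero, then shift
        m = (t[0] * n0_prime) & MASK
        c = (t[0] + m * n[0]) >> W
        for j in range(1, s):
            cs = t[j] + m * n[j] + c
            t[j - 1], c = cs & MASK, cs >> W
        cs = t[s] + c
        t[s - 1] = cs & MASK
        t[s] = t[s + 1] + (cs >> W)
    # Final conditional subtraction (done on Python integers for brevity)
    result, n_int = from_words(t[: s + 1]), from_words(n)
    return result - n_int if result >= n_int else result
```

Interleaving the multiplication and reduction steps inside one outer loop keeps the intermediate result at `s + 2` words, which is one reason CIOS maps well onto word-serial datapaths.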

4.

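Implementation of an RSA Cryptographic Coprocessor (cited by: 11; self-citations: 0; citations by others: 11)

Li Shuguo, Zhou Runde, Feng Jianhua, Sun Yihe 《电子学报》 (Acta Electronica Sinica) 2001,29(11):1441-1444

The large area and low speed of cryptographic coprocessors limit the use of the RSA public-key cryptosystem in smart cards. This paper analyzes and improves the Montgomery modular multiplication algorithm and proposes a new high-radix modular multiplier structure suited to smart-card applications. Because the coprocessor uses a parallel pipelined structure with two 32-bit multipliers, it effectively reduces both chip area and the number of clock cycles per modular multiplication compared with a systolic-array structure, making RSA digital signature and authentication feasible in smart cards. Experiments show that, with a 0.35 μm TSMC standard-cell library, the coprocessor performs one 1024-bit modular multiplication in 1216 clock cycles with a design area of 38k gates. At a clock frequency of 5 MHz, encrypting a 1024-bit plaintext takes only 374 ms on average. Compared with similar designs, this design needs the fewest clock cycles per modular multiplication and reduces chip area by one third. These figures surpass those of current e-commerce cryptographic coprocessors, and the design is well suited to smart-card applications.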

5.

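A Fast FPGA-Based Implementation Method for Prime-Field Modular Multiplication

Han Lianbing, Huang Rui, Duan Junhong, Wang Song, Fang Liguo 《信息安全与通信保密》 (Information Security and Communications Privacy) 2013,(9)

Modular multiplication in the prime field is an indispensable basic operation in elliptic curve cryptosystems, and its speed affects the overall performance of elliptic curve algorithms. This paper designs a fast prime-field modular multiplication method that combines windowing and pipelining. The multiplier is implemented in the hardware description language VHDL and optimized to take full advantage of the pipeline. Simulation in Modelsim shows that one modular multiplication completes correctly in only 96 clock cycles. Results on an Altera EP2AGX45 FPGA show that, at a clock frequency of 150 MHz, one 384-bit modular multiplication takes only 0.64 μs.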

6.

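A Pipelined Implementation Method Based on Elliptic Curves (cited by: 2; self-citations: 2; citations by others: 0)

Zhang Xiaopeng, Li Shuguo 《微电子学与计算机》 (Microelectronics & Computer) 2008,25(7)

A pipelined implementation method for elliptic curve cryptography is proposed to address the low efficiency of serial computation. By analyzing the data dependencies in elliptic curve cryptographic operations, a three-stage pipeline is adopted that speeds up elliptic curve cryptography without increasing the area of the modular multiplier. An implementation flow for pipelines suited to the VLSI design of elliptic curve cryptography is also given.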

7.

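A Hardware Implementation Design for Large-Number Modular Exponentiation

Wang Xiaolin, Zhou Yujie 《信息技术》 (Information Technology) 2005,29(10):41-44

A hardware design method for large-number modular exponentiation is proposed. The modular multiplication part is based on an improved radix-2 Montgomery algorithm and uses a systolic-array structure; a new structure with dual-edge-triggered serial computation is proposed, which saves area while still reaching a high clock frequency. The exponentiation part is based on the M-ary algorithm, which reduces the number of modular multiplications required. The speed of this implementation is also compared with that of the common L-R (left-to-right) binary exponentiation algorithm.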
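The M-ary idea mentioned in this abstract can be sketched in software as fixed-window, left-to-right exponentiation. This is a generic textbook version that uses plain modular multiplication rather than the paper's Montgomery systolic array; the function name `mary_modexp` and the default window width `k = 4` are assumptions for this example.

```python
def mary_modexp(base: int, exp: int, mod: int, k: int = 4) -> int:
    """Compute base**exp mod mod with 2^k-ary left-to-right exponentiation."""
    # Precompute base^0 .. base^(2^k - 1) once.
    table = [1 % mod]
    for _ in range((1 << k) - 1):
        table.append(table[-1] * base % mod)

    window_mask = (1 << k) - 1
    # Number of k-bit windows needed to cover the exponent (at least one).
    windows = max(1, -(-exp.bit_length() // k))

    result = 1 % mod
    for w in reversed(range(windows)):       # most significant window first
        for _ in range(k):                   # k squarings per window
            result = result * result % mod
        digit = (exp >> (k * w)) & window_mask
        if digit:                            # one multiplication per nonzero window
            result = result * table[digit] % mod
    return result
```

Compared with left-to-right binary exponentiation, which performs up to one multiplication per exponent bit in addition to the squarings, this scheme performs at most one multiplication per `k`-bit window plus the cost of building the table.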

8.

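An Improved FIPS Algorithm Suited to Pipelined Structures and Its Implementation

Gu Yingke, Bai Guoqiang, Chen Hongyi 《微电子学》 (Microelectronics) 2008,38(5)

This paper analyzes how the speed and area of a VLSI implementation of the FIPS-based multiply-accumulate structure vary with operand width, and proposes an improved FIPS algorithm that resolves the data-stall problem caused by a pipelined datapath. Using the improved algorithm, a modular multiplier with a 128-bit operand width was designed in an SMIC 0.18 μm CMOS process. Compared with a design based on the original algorithm, hardware area increases by about 5% while efficiency improves by about 42%. When this multiplier is used for 1,024-bit RSA, the throughput reaches 1.1 Mbps.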

9.

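Fine-Grained FPGA Mapping of a Large-Number Modular Multiplication Systolic Array

Huang Zhun, Bai Guoqiang, Chen Hongyi 《微电子学与计算机》 (Microelectronics & Computer) 2005,22(7):31-35,41

By analyzing the detailed structure of FPGA configurable logic blocks, a fine-grained FPGA mapping method is proposed and used to implement a large-number modular multiplication systolic array efficiently. While preserving high-speed computation, the resource usage of the systolic array is reduced to one third of the original. A 1024-bit modular multiplier can be implemented in a low-cost 200,000-gate-class FPGA. This implementation performs 20 RSA signatures per second; with a high-performance FPGA, the signing rate can be raised to 40 per second.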

10.

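Design of Core Arithmetic Modules for the ECC Algorithm Based on an Array Structure

Yang Ling, Wang Youren 《微电子学》 (Microelectronics) 2010,40(3)

Elliptic curve cryptography algorithms are complex, computationally expensive, arithmetic-intensive, and data-heavy. To address this, an array processing structure for the hardware implementation of ECC is proposed, and core arithmetic modules (modular multiplication and modular division) over the finite field GF(2^m) are designed, mapping the core algorithms spatially onto the computing structure. The prototype was implemented and verified on a Xilinx Virtex-E series FPGA. Experimental results show that the structure achieves high parallelism and computational efficiency, with significantly higher clock frequency and operation speed; at a clock frequency of 100 MHz, point multiplication averages more than 90 operations per second.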
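To make the GF(2^m) modular multiplication mentioned above concrete, here is a minimal bit-serial sketch in a polynomial basis. The abstract does not state which basis or algorithm the paper's array uses, so this is only an assumed illustration; the function name `gf2m_mul` is hypothetical, and field elements are represented as integers whose bits are polynomial coefficients.

```python
def gf2m_mul(a: int, b: int, poly: int, m: int) -> int:
    """Multiply a and b in GF(2^m), polynomial basis, MSB-first shift-and-add.

    poly is the irreducible polynomial including its x^m term,
    e.g. 0x11B for x^8 + x^4 + x^3 + x + 1. Assumes a, b < 2^m.
    """
    result = 0
    for i in reversed(range(m)):
        result <<= 1                 # multiply the accumulator by x
        if (result >> m) & 1:        # degree reached m: reduce by poly
            result ^= poly
        if (b >> i) & 1:             # add (XOR) a when bit i of b is set
            result ^= a
    return result
```

Addition in GF(2^m) is a carry-free XOR, so each iteration needs only a shift and up to two XORs, which is why such multipliers map naturally onto regular arrays of simple cells.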

11.

New and Improved Architectures for Montgomery Modular Multiplication

M. Sudhakar, R. V. Kamala, M. B. Srinivas 《Mobile Networks and Applications》2007,12(4):281-291

In this paper an improved Montgomery multiplier, based on modified four-to-two carry-save adders (CSAs) to reduce critical path delay, is presented. Instead of implementing four-to-two CSA using two levels of carry-save logic, authors propose a modified four-to-two CSA using only one level of carry-save logic taking advantage of pre-computed input values. Also, a new bit-sliced, unified and scalable Montgomery multiplier architecture, applicable for both RSA and ECC (Elliptic Curve Cryptography), is proposed. In the existing word-based scalable multiplier architectures, some processing elements (PEs) do not perform useful computation during the last pipeline cycle when the precision is not equal to an exact multiple of the word size, like in ECC. This intrinsic limitation requires a few extra clock cycles to operate on operand lengths which are not powers of 2. The proposed architecture eliminates the need for extra clock cycles by reconfiguring the design at bit-level and hence can operate on any operand length, limited only by memory and control constraints. It requires 2∼15% fewer clock cycles than the existing architectures for key lengths of interest in RSA and 11∼18% for binary fields and 10∼14% for prime fields in case of ECC. An FPGA implementation of the proposed architecture shows that it can perform 1,024-bit modular exponentiation in about 15 ms which is better than that by the existing multiplier architectures.

12.

Hardware Elliptic Curve Cryptographic Processor Over GF(p)

《IEEE transactions on circuits and systems. I, Regular papers》2006,53(9):1946-1957

A novel hardware architecture for elliptic curve cryptography (ECC) over GF(p) is introduced. This can perform the main prime field arithmetic functions needed in these cryptosystems including modular inversion and multiplication. This is based on a new unified modular inversion algorithm that offers considerable improvement over previous ECC techniques that use Fermat's Little Theorem for this operation. The processor described uses a full-word multiplier which requires much fewer clock cycles than previous methods, while still maintaining a competitive critical path delay. The benefits of the approach have been demonstrated by utilizing these techniques to create a field-programmable gate array (FPGA) design. This can perform a 256-bit prime field scalar point multiplication in 3.86 ms, the fastest FPGA time reported to date. The ECC architecture described can also perform four different types of modular inversion, making it suitable for use in many different ECC applications.

13.

Cellular-array modular multiplier for fast RSA public-key cryptosystem based on modified Booth's algorithm

Jin-Hua Hong, Cheng-Wen Wu 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2003,11(3):474-484

We propose a radix-4 modular multiplication algorithm based on Montgomery's algorithm, and a fast radix-4 modular exponentiation algorithm for the Rivest, Shamir, and Adleman (RSA) public-key cryptosystem. By modifying Booth's algorithm, a radix-4 cellular-array modular multiplier has been designed and simulated. The radix-4 modular multiplier can be used to implement the RSA cryptosystem. Due to the reduced number of iterations and pipelining, our modular multiplier is four times faster than a direct radix-2 implementation of Montgomery's algorithm. The time to calculate a modular exponentiation is about n^2 clock cycles, where n is the word length, and the clock cycle is roughly the delay time of a full adder. The utilization of the array multiplier is 100% when we interleave consecutive exponentiations. Locality, regularity, and modularity make the proposed architecture suitable for very large scale integration implementation. High-radix modular-array multipliers are also discussed, at both the bit level and digit level. Our analysis shows that, in terms of area-time product, the radix-4 modular multiplier is the best choice.

14.

Customizable elliptic curve cryptosystems

Cheung R.C.C., Telle N.J., Luk W., Cheung P.Y.K. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2005,13(9):1048-1059

This paper presents a method for producing hardware designs for elliptic curve cryptography (ECC) systems over the finite field GF(2^m), using the optimal normal basis for the representation of numbers. Our field multiplier design is based on a parallel architecture containing multiple m-bit serial multipliers; by changing the number of such serial multipliers, designers can obtain implementations with different tradeoffs in speed, size and level of security. A design generator has been developed which can automatically produce a customised ECC hardware design that meets user-defined requirements. To facilitate performance characterization, we have developed a parametric model for estimating the number of cycles for our generic ECC architecture. The resulting hardware implementations are among the fastest reported: for a key size of 270 bits, a point multiplication in a Xilinx XC2V6000 FPGA at 35 MHz can run over 1000 times faster than a software implementation on a Xeon computer at 2.6 GHz.

15.

A 230-MHz half-bit level pipelined multiplier using true single-phase clocking

Somasekhar D., Visvanathan V. 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》1993,1(4):415-422

An 8-bit×8-bit signed two's complement pipelined multiplier megacell implemented in 1.6-μm single-poly, double-metal N-well CMOS is described. It is capable of throughputs of 230,000,000 multiplications/s at a clock frequency of 230 MHz, with a latency of 12 clock cycles. A half-bit level pipelined architecture, and the use of true single-phase clocked circuitry are the key features of this design. Simulation studies indicate that the multiplier dissipates 540 mW at 230 MHz. The multiplier cell has 5176 transistors, with dimensions of 1.5 mm×1.4 mm. This multiplier satisfies the need for very high-throughput multiplier cores required in DSP architectures.

16.

A Scalable Structure for a Multiplier and an Inversion Unit in GF(2^m)

Chanho Lee, Jeongho Lee 《ETRI Journal》2003,25(5):315-320

Elliptic curve cryptography (ECC) offers the highest security per bit among the known public key cryptosystems. The operation of ECC is based on the arithmetic of the finite field. This paper presents the design of a 193‐bit finite field multiplier and an inversion unit based on a normal basis representation in which the inversion and the square operation units are easy to implement. This scalable multiplier can be constructed in a variable structure depending on the performance area trade‐off. We implement it using Verilog HDL and a 0.35 µm CMOS cell library and verify the operation by simulation.

17.

A 2.5-10-GHz clock multiplier unit with 0.22-ps RMS jitter in standard 0.18-μm CMOS

van de Beek R.C.H., Vaucher C.S., Leenaerts D.M.W., Klumperink E.A.M., Nauta B. 《Solid-State Circuits, IEEE Journal of》2004,39(11):1862-1872

This paper demonstrates a low-jitter clock multiplier unit that generates a 10-GHz output clock from a 2.5-GHz reference clock. An integrated 10-GHz LC oscillator is locked to the input clock, using a simple and fast phase detector circuit that overcomes the speed limitation of a conventional tri-state phase frequency detector due to the lack of an internal feedback loop. A frequency detector guarantees PLL locking without degenerating jitter performance. The clock multiplier is implemented in a standard 0.18-μm CMOS process and achieves a jitter generation of 0.22 ps while consuming 100 mW power from a 1.8-V supply.

18.

A DLL-Based Programmable Clock Multiplier in 0.18-μm CMOS With −70 dBc Reference Spur

Maulik P.C., Mercer D.A. 《Solid-State Circuits, IEEE Journal of》2007,42(8):1642-1648

This paper describes a 150-400 MHz programmable clock multiplier which uses a recirculating DLL. The clock multiplier uses a sampling phase detector and employs chopping, autozeroing and various other circuit techniques to reduce static phase offset and crosstalk between the reference and the output clock. The DLL is implemented in 0.18-μm CMOS, consumes 16 mW of power, and achieves 1-5 ps RMS jitter and -70 dBc reference spur level.

19.

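Design of a Switch-and-Shift 32-bit Multiplier

Liu Xueyong, Li Xiaojiang, Ma Chengyan 《电子器件》 (Chinese Journal of Electron Devices) 2008,31(5)

The multiplier is an important and relatively complex part of a CPU's ALU design, occupying a large area and contributing a long delay. Different multipliers can be designed to meet different system requirements. This paper makes a trade-off under the dual constraints of the system clock requirement and area, and proposes a multi-cycle multiplier design based on a switch-and-shift mode of operation. The design is synthesized with DC and simulated with VCS, and the results are compared with the multiplier in SYNOPSYS's design_ware to identify its advantages and disadvantages.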