首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到19条相似文献,搜索用时 187 毫秒
1.
高速双有限域加密协处理器设计   总被引:10,自引:3,他引:7  
文章提出了一种能够同时在有限域GF(P)和GF(2^m)中高速实现椭圆曲线密码算法(ECC)的协处理器。该协处理器能够高速完成椭圆曲线密码算法中各种基本的运算。通过调用这些基本的模运算指令,可以实现各种ECC上的加密算法。该协处理器支持512位以下任意长度的模运算。协处理器工作速度很快,整个协处理器综合采用了多种加速结构和算法并采用了流水线结构设计。根据物理综合的结果,协处理器可以工作在300MHz的频率,运算时间比此前的一些同类芯片快4到10倍左右。  相似文献   

2.
RSA密码协处理器的实现   总被引:11,自引:0,他引:11  
李树国  周润德  冯建华  孙义和 《电子学报》2001,29(11):1441-1444
密码协处理器的面积过大和速度较慢制约了公钥密码体制RSA在智能卡中的应用.文中对Montgomery模乘算法进行了分析和改进,提出了一种新的适合于智能卡应用的高基模乘器结构.由于密码协处理器采用两个32位乘法器的并行流水结构,这与心动阵列结构相比它有效地降低了芯片的面积和模乘的时钟数,从而可在智能卡中实现RSA的数字签名与认证.实验表明:在基于0.35μm TSMC标准单元库工艺下,密码协处理器执行一次1024位模乘需1216个时钟周期,芯片设计面积为38k门.在5MHz的时钟频率下,加密1024位的明文平均仅需374ms.该设计与同类设计相比具有最小的模乘运算时钟周期数,并使芯片的面积降低了1/3.这个指标优于当今电子商务的密码协处理器,适合于智能卡应用.  相似文献   

3.
以RSA算法为例,探讨了公钥密码处理芯片的设计与实现.首先提出了公钥密码芯片实现中的核心问题,即大整数模幂运算算法和大整数模乘运算算法的实现;然后针对RSA算法,提出了Montgomery模乘算法的CIOS方法的一种新的快速硬件并行实现方法,其中采用了加法与乘法并行运算以及多级流水线技术以提高性能,较大地减少了乘法运算时间,提高了模乘器的性能.  相似文献   

4.
大数模幂乘运算的VLSI实现   总被引:5,自引:0,他引:5  
信息加密,数字答乐,身份证等等是信息安全领域的重要内容,只有公钥密友体制才能很好地解决这些问题,大数模幂乘运算是许多公钥密友体制的核心运算,也是运算效率提高的瓶颈。基于Montgomery模乘变换,构造了一种新型的脉动阵列架构模乘运算器。结合简单二进制幂运算算法,采用0.8μm CMOS工艺,成功地设计并制造了256bit模幂乘运算器THM256,电路规模为18677门,芯片面积为17.63mm6  相似文献   

5.
作为由国家密码管理局公布的SM2椭圆曲线公钥密码算法的核心运算,模乘的实现好坏直接决定着整个密码芯片性能的优劣.Montgomery模乘算法是目前最高效也是应用最为广泛的一种模乘算法.本文基于Mont-gomery模乘算法,设计了一种高速,且支持双域(GF(p)素数域和GF(2m )二进制域)的Montgomery模乘器.提出了新的实现结构,以及一种新型的W allace树乘法单元.通过对模块合理的安排和复用,本设计极大的缩小了时间消耗与硬件需求,节省了大量的资源.实现256位双域模乘仅需0.34μs .  相似文献   

6.
高性能可扩展公钥密码协处理器研究与设计   总被引:1,自引:0,他引:1       下载免费PDF全文
黎明  吴丹  戴葵  邹雪城 《电子学报》2011,39(3):665-670
 本文提出了一种高效的点乘调度策略和改进的双域高基Montgomery模乘算法,在此基础上设计了一种新型高性能可扩展公钥密码协处理器体系结构,并采用0.18μm 1P6M标准CMOS工艺实现了该协处理器,以支持RSA和ECC等公钥密码算法的计算加速.该协处理器通过扩展片上高速存储器和使用以基数为处理字长的方法,具有良好的可扩展性和较强的灵活性,支持2048位以内任意大数模幂运算以及576位以内双域任意椭圆曲线标量乘法运算.芯片测试结果表明其具有很好的加速性能,完成一次1024位模幂运算仅需197μs、GF(p)域192位标量乘法运算仅需225μs、GF(2m)域163位标量乘法运算仅需200.7μs.  相似文献   

7.
在公钥密码体制中,都涉及到大数模乘运算,其实现效率将直接影响整个系统的响应速度。将大数模乘运算用专用集成电路快速而又低成本地实现,将有助于电子商务的快速推广。该文针对应用很广的RSA公钥密码算法,提出了一种高基(2H进制)的大数模乘硬件实现方法。这种设计方法通过合理增加部分硬件开销,动态构造并行加法并配用初始化存储数据表提高模乘运算的时空效率。作者已成功地在Altera公司的Stratix-epls10f780c6芯片上实现512比特大数乘法运算,仅需437.5ns,是目前公开文献上FPGA实现速度的10倍左右。  相似文献   

8.
江宝安 《通信技术》2022,(3):346-350
公钥密码是一种非对称加密密码算法,典型的公钥密码算法有RSA、ElGamal和椭圆曲线密码(Elliptic Curves Cryptography,ECC)算法,这些算法都有各自的优缺点,适应不同场合.基于阶为梅森素数的有限域乘法单群,提出了一种新的Diffie-Hellman公钥密码算法.该算法本身基于模2运算,便...  相似文献   

9.
基于Montgomery模乘的RSA算法VLSI实现   总被引:2,自引:1,他引:1  
介绍了一种基于可伸展的Montgomery模乘结构的1024位RSA加解密芯片实现。设计采用的新型心动阵列结构,可以有在有效控制芯片面积的前提下,极大地提高运算频率,从而提高运算速度。经过ModelSim仿真和Design Compiler综合,与当前已发表的RSA芯片设计相比,该设计在面积和速度上均有优势。  相似文献   

10.
RSA是目前公认的在理论和实际应用中最为成熟和完善的一种公钥密码体制,不仅可以进行加密,还可以用来进行数字签名和身份验证,是公钥密码体制的代表。对RSA数据加密算法在数字签名中的应用作了详细的分析,对RSA算法做了全面的讨论,大数模幂乘运算是实现RSA等公钥密码的基本运算,其运行效率决定了RSA公钥密码的性能,主要研究了各种模幂算法的快速实现方法,对其中某些环节做了适当的改进。  相似文献   

11.
A four-processor chip, for use in processor arrays for image computations, is described. The large degree of data parallelism available in image computations allows dense array implementations where all processors operate under the control of a single instruction stream. An instruction decoder shared by the four processors on the chip minimizes the pin count allocated for global control of the processors. The chip incorporates an interface to an external SRAM (static RAM) for memory expansion without glue chips. The full-custom 2-μm CMOS chip contains 56669 transistors and runs instructions at 10 MHz. Five hundred and twelve 16-b processors and 4 Mbyte of distributed external memory fit on two industry standard cards to yield 5-billion instructions per second peak throughout. As image I/O can overlap perfectly with pixel computation, an array containing 128 of these chips can provide more than 600 16-b operations per pixel on 512×512 images at 30 Hz  相似文献   

12.
Methods are presented for developing synthesizable FFT cores. These are based on a modular approach in which parameterized commutator and processor blocks are cascaded to implement the computations required in many important FFT signal flow graphs. In addition, it is shown how the use of a digital serial data organization can be used to produce systems that offer 100% processor utilization along with reductions in storage requirements. The approach has been used to create generators for the automated synthesis of FFT cores that are portable across a broad range of silicon technologies. Resulting chip designs are competitive with ones created using manual methods but with significant reductions in design times  相似文献   

13.
在公钥密码实现中,Montgomery模乘扮演着非常重要的角色。本文研究Montgomery模乘(MMM)的迭代控制结构,给出了进行MMM迭代的输入边界控制条件,以及改进的MMM算法。这种扩展的迭代控制条件适合用于复杂求幂的迭代过程,在其边界控制下可直接进行一些加法、减法及乘法等基本运算,而无须模约化处理。给出的模乘迭代算法具有高度的灵活性,可利用来实现安全高效的RSA、ECC等公钥密码体制。  相似文献   

14.
A high-speed shuffle bus which is able to implement various kinds of communication schemes for VLSI processor arrays is presented. Because of the simple and modular nature of the shuffle bus, the processor arrays can now be easily modularized and equipped with flexible capabilities for both global and neighboring communications. With data swapping at a rate as high as 200 MHz between adjacent registers, the shuffle bus is about 20 times faster than those bidirectional pipelined buses. The shuffle bus may be expanded to accommodate any number of nodes with an arbitrary number of bits in each word. With only a few sets of control patterns, the shuffle bus can be applied to the implementation of a wide variety of interconnection networks, sorting networks, FIFO, and associative memory  相似文献   

15.
In this article, a parallel hardware processor is presented to compute elliptic curve scalar multiplication in polynomial basis representation. The processor is applicable to the operations of scalar multiplication by using a modular arithmetic logic unit (MALU). The MALU consists of two multiplications, one addition, and one squaring. The two multiplications and the addition or squaring can be computed in parallel. The whole computations of scalar multiplication over GF(2163) can be performed in 3 064 cycles. The simulation results based on Xilinx Virtex2 XC2V6000 FPGAs show that the proposed design can compute random GF(2163) elliptic curve scalar multiplication operations in 31.17 μs, and the resource occupies 3 994 registers and 15 527 LUTs, which indicates that the crypto-processor is suitable for high-performance application.  相似文献   

16.
A special purpose computer intended primarily for use as a terminal multiplexor has been constructed. Program storage for the processor is implemented using an Ovonic read-mostly memory. In addition a bipolar scratchpad memory is implemented for data buffering and status information storage. The processor organization and the Ovonic memory design are discussed. The Ovonic memory is modular in 2048-bit increments to 8192 bits. It has an access time of 70 ns and a cycle time of 125 ns. The processor executes three million instructions per second and can support up to 128 mixed terminal devices with minimum interface circuitry. The ability to alter the program store provides needed flexibility to reconfigure the complement of terminal devices.  相似文献   

17.
Modular exponentiation is the cornerstone computation in public-key cryptography systems such as RSA cryptosystems. The operation is time consuming for large operands. This paper describes the characteristics of three architectures designed to implement modular exponentiation using the fast binary method: the first field-programmable gate array (FPGA) prototype has a sequential architecture, the second has a parallel architecture, and the third has a systolic array-based architecture. The paper compares the three prototypes as well as Blum and Paar's implementation using the time /spl times/ area classic factor. All three prototypes implement the modular multiplication using the popular Montgomery algorithm.  相似文献   

18.
A prototype vision chip has been designed that incorporates a 20 × 64 array of processing elements on a 31 μm pitch. Each processor element includes 14 bits of digital memory in addition to seven analogue registers. Digital operands include NOR and NOT with operations of diffusion, subtraction, inversion and squaring available in the analogue domain. The cells of the array can be configured as an asynchronous propagation network allowing operations such as flood filling to occur with times of ~1 μs across the array. Exploiting this feature allows the chip to recognise the difference between closed and open shapes at 30,000 frames per second. The chip is fabricated in 0.18 μm CMOS technology.  相似文献   

19.
This paper presents a novel approach for low-power high-performance inner product processor design. The processor is dynamically reconfigurable for computing inner products of input arrays with four or more combinations of array dimensions and precision. The processor mainly consists of an array of 8×8 or 4×4 small multipliers plus two or three arrays of adders. It requires very simple reconfigurable components. The whole network may be reconfigured by using a few control bits for the desired computations, and the reconfiguration can be done dynamically. The design is regular, modular, and can easily be pipelined, and most parts of the network are symmetric and repeatable. A set of low-power high-performance parallel counters is also proposed for the implementation of the design, which could lead to a significant reduction in worst case power dissipation compared with traditional binary-logic based architectures, while showing superiority in speed, VLSI area, and layout simplicity  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号