Similar literature
1.
We propose a radix-4 modular multiplication algorithm based on Montgomery's algorithm, and a fast radix-4 modular exponentiation algorithm for the Rivest, Shamir, and Adleman (RSA) public-key cryptosystem. By modifying Booth's algorithm, a radix-4 cellular-array modular multiplier has been designed and simulated. The radix-4 modular multiplier can be used to implement the RSA cryptosystem. Owing to the reduced number of iterations and to pipelining, our modular multiplier is four times faster than a direct radix-2 implementation of Montgomery's algorithm. The time to calculate a modular exponentiation is about n^2 clock cycles, where n is the word length, and the clock cycle is roughly the delay time of a full adder. The utilization of the array multiplier is 100% when we interleave consecutive exponentiations. Locality, regularity, and modularity make the proposed architecture suitable for very large scale integration implementation. High-radix modular-array multipliers are also discussed, at both the bit level and digit level. Our analysis shows that, in terms of area-time product, the radix-4 modular multiplier is the best choice.
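As a rough illustration of the Montgomery multiplication that this abstract builds on, the sketch below shows the plain radix-2 (bit-serial) form in Python. It is not the radix-4, Booth-recoded array described in the abstract. The function name `montgomery_multiply`, the parameter `n_bits`, and the choice R = 2^n_bits are assumptions made for this example only.

```python
def montgomery_multiply(a: int, b: int, modulus: int, n_bits: int) -> int:
    """Return a * b * 2^(-n_bits) mod modulus (radix-2 sketch, odd modulus).

    Assumes 0 <= a, b < modulus < 2**n_bits. All names are hypothetical.
    """
    if modulus % 2 == 0:
        raise ValueError("Montgomery reduction needs an odd modulus")
    acc = 0
    for i in range(n_bits):
        # Add the partial product for bit i of a (least significant bit first).
        acc += ((a >> i) & 1) * b
        # If acc is odd, adding the odd modulus makes it even without
        # changing its value mod modulus, so the halving below is exact.
        if acc & 1:
            acc += modulus
        acc >>= 1
    # acc stays below 2 * modulus, so one conditional subtraction suffices.
    return acc - modulus if acc >= modulus else acc
```

The result carries an extra factor of 2^(-n_bits), which is why operands are normally converted into the Montgomery domain before a chain of multiplications and converted back at the end.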

2.
An architecture based on the RSA public key cryptography algorithm is presented. The circuit includes two components, one for modular squaring and one for modular multiplication. Each component is based on the Montgomery algorithm and implements the modular operations using two modified serial-parallel multipliers. A full modular exponentiation is completed every n(n + 3) clock cycles. All circuits are systolic, operate with 100% efficiency and their maximum combinational delay is equal to one gated Full-Adder. Thus, high-speed performance is achieved while the low cell hardware complexity enables an efficient VLSI implementation.

3.
In this paper an improved Montgomery multiplier, based on modified four-to-two carry-save adders (CSAs) to reduce critical path delay, is presented. Instead of implementing four-to-two CSA using two levels of carry-save logic, authors propose a modified four-to-two CSA using only one level of carry-save logic taking advantage of pre-computed input values. Also, a new bit-sliced, unified and scalable Montgomery multiplier architecture, applicable for both RSA and ECC (Elliptic Curve Cryptography), is proposed. In the existing word-based scalable multiplier architectures, some processing elements (PEs) do not perform useful computation during the last pipeline cycle when the precision is not equal to an exact multiple of the word size, like in ECC. This intrinsic limitation requires a few extra clock cycles to operate on operand lengths which are not powers of 2. The proposed architecture eliminates the need for extra clock cycles by reconfiguring the design at bit-level and hence can operate on any operand length, limited only by memory and control constraints. It requires 2∼15% fewer clock cycles than the existing architectures for key lengths of interest in RSA and 11∼18% for binary fields and 10∼14% for prime fields in case of ECC. An FPGA implementation of the proposed architecture shows that it can perform 1,024-bit modular exponentiation in about 15 ms which is better than that by the existing multiplier architectures.
M. B. Srinivas

4.
Bit-level systolic arrays for modular multiplication   Cited by: 4 (self-citations: 0, citations by others: 4)
This paper presents bit-level cellular arrays implementing Blakley's algorithm for multiplication of two n-bit integers modulo another n-bit integer. The semi-systolic version uses 3n(n+3) single-bit carry save adders and 2n copies of 3-bit carry look-ahead logic, and computes a pair of binary numbers (C, S) in 3n clock cycles such that C + S ∈ [0, 2N). The carry look-ahead logic is used to estimate the sign of the partial product, which is needed during the reduction process. The final result in the correct range [0, N) can easily be obtained by computing C + S and C + S − N, and selecting the latter if it is positive; otherwise, the former is selected. We construct a localized process dependence graph of this algorithm, and introduce a systolic array containing 3nw simple adder cells. The latency of the systolic array is 6n + w − 2, where w = n/2. The systolic version does not require broadcast and can be used to efficiently compute several modular multiplications in a pipelined fashion, producing a result in every clock cycle.
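The interleaved shift, add and reduce idea behind Blakley's algorithm can be sketched in a few lines of Python. This is only a word-level illustration: it uses full-precision comparisons where the arrays in the abstract use carry-save pairs (C, S) and sign estimation, and the name `blakley_multiply` and its parameters are assumptions for this example.

```python
def blakley_multiply(a: int, b: int, modulus: int, n_bits: int) -> int:
    """Return (a * b) mod modulus by interleaving shift-add with reduction.

    Assumes 0 <= a < 2**n_bits and 0 <= b < modulus. Names are hypothetical.
    """
    result = 0
    # Scan the bits of a from most significant to least significant.
    for i in reversed(range(n_bits)):
        # Double the running value and add b when bit i of a is set.
        result = 2 * result + ((a >> i) & 1) * b
        # result < 3 * modulus here, so at most two subtractions
        # bring it back into the range [0, modulus).
        if result >= modulus:
            result -= modulus
        if result >= modulus:
            result -= modulus
    return result
```

Because the partial result is reduced after every step, it never grows beyond a small multiple of the modulus, which is what makes a fixed-width hardware array feasible.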

5.
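毛天然 (Mao Tianran), 李树国 (Li Shuguo). 《微电子学》 (Microelectronics), 2006, 36(3): 344-346, 351
A modular multiplier based on the Montgomery algorithm is proposed. Compared with existing architectures, its multi-stage pipelined multiplier structure raises the system clock frequency, and a precomputation unit resolves pipeline stalls, improves parallelism, and reduces the number of clock cycles required. The modular multiplier has an operand length of 233 bits. Synthesis results for the SMIC 0.18 μm worst-case process corner show a maximum critical-path delay of 3.8 ns and a chip area of 2 mm². A single modular multiplication takes only 108 clock cycles, which meets the requirements of ECC cryptosystem applications.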

6.
Let S be a set of n points in the plane. We derive algorithms for approximating S by a step function of size k < n, i.e., by an x-monotone rectilinear polyline with k < n horizontal segments. We use the vertical distance to measure the quality of the approximation, i.e., the maximum distance from a point in S to the horizontal segment directly above or below it. We consider two types of problems: min-ε, where the goal is to minimize the error for a given number of horizontal segments k, and min-#, where the goal is to minimize the number of segments for a given allowed error ε. After O(n) preprocessing time, we solve instances of the latter in O(min{k log n, n}) time per instance. We can then solve the former problem in O(min{n^2, nk log n}) time. Both algorithms require O(n) space. Our second contribution is an approximation algorithm for the min-ε problem that computes a solution within a factor of 3 of the optimal error for k segments, or with at most the same error as the k-optimal but using 2k − 1 segments. Furthermore, experiments on real data show even better results than what is guaranteed by the theoretical bounds. Both approximations run in O(n log n) time and O(n) space.
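To make the min-# problem concrete, here is a simple greedy scan in Python: it extends each horizontal segment for as long as the covered y-values fit inside a band of height 2ε. This is a baseline written for illustration, not the algorithm from the abstract with its O(n) preprocessing and O(min{k log n, n}) query time. The function name `min_segments` and the assumption that all x-coordinates are distinct are choices made for this example.

```python
def min_segments(points, eps):
    """Greedy step function: list of (x_start, x_end, height) with error <= eps.

    Assumes points is a list of (x, y) pairs with distinct x-coordinates.
    The name and the output layout are hypothetical.
    """
    ordered = sorted(points)  # process the points from left to right
    segments = []
    start = 0
    while start < len(ordered):
        low = high = ordered[start][1]
        end = start
        # Extend the segment while every covered y-value still fits
        # inside a band of height 2 * eps.
        while end + 1 < len(ordered):
            y = ordered[end + 1][1]
            new_low, new_high = min(low, y), max(high, y)
            if new_high - new_low > 2 * eps:
                break
            low, high = new_low, new_high
            end += 1
        # Placing the segment at mid-height keeps every covered point
        # within eps of it.
        segments.append((ordered[start][0], ordered[end][0], (low + high) / 2))
        start = end + 1
    return segments
```

For example, `min_segments([(0, 1), (1, 1.5), (2, 5), (3, 5.2), (4, 9)], 0.5)` splits the five points into three segments, the first covering x from 0 to 1 at height 1.25.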

7.
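Implementation of an RSA cryptographic coprocessor   Cited by: 11 (self-citations: 0, citations by others: 11)
李树国 (Li Shuguo), 周润德 (Zhou Runde), 冯建华 (Feng Jianhua), 孙义和 (Sun Yihe). 《电子学报》 (Acta Electronica Sinica), 2001, 29(11): 1441-1444
The large area and low speed of cryptographic coprocessors limit the use of the RSA public-key cryptosystem in smart cards. This paper analyzes and improves the Montgomery modular multiplication algorithm and proposes a new high-radix modular multiplier architecture suited to smart-card applications. Because the coprocessor uses a parallel pipelined structure with two 32-bit multipliers, it effectively reduces both the chip area and the number of clock cycles per modular multiplication compared with a systolic-array structure, so that RSA digital signature and authentication can be implemented in a smart card. Experiments show that, in a 0.35 μm TSMC standard-cell library process, the coprocessor needs 1216 clock cycles for one 1024-bit modular multiplication, and the design area is 38k gates. At a clock frequency of 5 MHz, encrypting a 1024-bit plaintext takes only 374 ms on average. Compared with similar designs, this design has the smallest number of clock cycles per modular multiplication and reduces the chip area by 1/3. These figures are better than those of current e-commerce cryptographic coprocessors, and the design is suitable for smart-card applications.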
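The timing figures in this abstract can be cross-checked with a short calculation. The sketch below takes the three numbers from the abstract (1216 cycles, 5 MHz, 374 ms) and compares the implied number of modular multiplications with the roughly 1.5 × 1024 expected on average for binary exponentiation with a 1024-bit exponent. That the coprocessor uses plain binary exponentiation is an assumption of this example, not something the abstract states.

```python
# Figures quoted in the abstract.
CYCLES_PER_MODMUL = 1216        # one 1024-bit modular multiplication
CLOCK_HZ = 5_000_000            # 5 MHz clock
ENCRYPTION_SECONDS = 0.374      # average time to encrypt 1024 bits

# Time for a single modular multiplication at this clock rate.
seconds_per_modmul = CYCLES_PER_MODMUL / CLOCK_HZ

# Number of modular multiplications that fit into one encryption.
implied_modmuls = ENCRYPTION_SECONDS / seconds_per_modmul

# Assumption: binary (square-and-multiply) exponentiation with a random
# 1024-bit exponent needs about 1.5 multiplications per exponent bit.
expected_modmuls = 1.5 * 1024

print(f"time per modular multiplication: {seconds_per_modmul * 1e6:.1f} us")
print(f"implied multiplications per encryption: {implied_modmuls:.0f}")
print(f"expected for binary exponentiation: {expected_modmuls:.0f}")
```

The implied count (about 1538) is close to 1536, so the quoted encryption time is consistent with the quoted cycle count under that assumption.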

8.
9.
We study the problem of routing in three-dimensional ad hoc networks. We are interested in routing algorithms that guarantee delivery and are k-local, i.e., each intermediate node v’s routing decision only depends on knowledge of the labels of the source and destination nodes, of the subgraph induced by nodes within distance k of v, and of the neighbour of v from which the message was received. We model a three-dimensional ad hoc network by a unit ball graph, where nodes are points in three-dimensional space, and for each node v, there is an edge between v and every node u contained in the unit-radius ball centred at v. The question of whether there is a simple local routing algorithm that guarantees delivery in unit ball graphs has been open for some time. In this paper, we answer this question in the negative: we show that for any fixed k, there can be no k-local routing algorithm that guarantees delivery on all unit ball graphs. This result is in contrast with the two-dimensional case, where 1-local routing algorithms that guarantee delivery are known. Specifically, we show that guaranteed delivery is possible if the nodes of the unit ball graph are contained in a slab of thickness \(1/\sqrt{2}\). However, there is no k-local routing algorithm that guarantees delivery for the class of unit ball graphs contained in thicker slabs, i.e., slabs of thickness \(1/\sqrt{2} + \epsilon\) for some \(\epsilon > 0\). The algorithm for routing in thin slabs derives from a transformation of unit ball graphs contained in thin slabs into quasi unit disc graphs, which yields a 2-local routing algorithm. We also show several results that further elaborate on the relationship between these two classes of graphs.

10.
A generalized digit-serial, least-significant-digit (LSD) first, serial/parallel multiplier architecture is presented, in which the digit width may be any number of bits from 1 to n, where n is the operand size. The multiplier processes both the serial input operand and the double-precision product one digit per clock cycle in an LSD-first, synchronous fashion. The number of clock cycles needed for the complete two's complement double-precision product is 2n divided by the digit width. This generalized architecture creates a continuum of multipliers between traditional bit-serial/parallel multipliers (digit width 1) and fully parallel multipliers (digit width n). Digit-serial/parallel multipliers allow an optimized integrated circuit arithmetic to be designed based on a particular application's area, power, throughput, latency, and numerical precision constraints. This project was partially funded by the UCSD-NSF I/UCR Center on Ultra-High Speed Integrated Circuits and Systems.

11.
In this paper, a fully pipelined linear systolic array based on an adjusted Montgomery's algorithm is presented to perform modular multiplication at extremely high speed. The processing element (PE) consists of only 4 full-adders and 14 flip-flops. The three-stage internally pipelined PE results in a very short critical path with only a one-bit full-adder delay. Thus, it can run at a very high cycle rate. The total execution time for an n-bit modular multiplication is 2n + 11 cycles with only (n/2 + 2) PEs. A modular exponentiation based on it takes (3n + 16.5)n cycles on average. Compared with most published VLSI modular multipliers, the hardware complexity is greatly reduced while keeping very high throughput. Therefore it is a good candidate for the arithmetic units used in many public-key cryptosystems, e.g. RSA and elliptic curve systems, especially for embedded applications concerning information security.

12.
Batch RSA   Cited by: 1 (self-citations: 0, citations by others: 1)
We present a variant of the RSA algorithm called Batch RSA with two important properties:
• The cost per private operation is exponentially smaller than other number-theoretic schemes [9], [23], [22], [11], [13], [12]. In practice, the new variant effectively performs several modular exponentiations at the cost of a single modular exponentiation. This leads to a very fast RSA-like scheme whenever RSA is to be performed at some central site or when pure-RSA encryption (versus hybrid encryption) is to be performed.
• An additional important feature of Batch RSA is the possibility of using a distributed Batch RSA process that isolates the private key from the system, irrespective of the size of the system, the number of sites, or the number of private operations that need to be performed.
A preliminary version of this paper appeared in Advances in Cryptology: Proceedings of Crypto '89, pp. 175–185. This work was performed at U.C., Berkeley, and ARL, Israel.

13.
This paper describes two novel architectures for a unified multiplier and inverter (UMI) in GF(2^m): the UMI merges multiplier and inverter into one unified data-path. As such, the area of the data-path is reduced. We present two options for hyperelliptic curve cryptography (HECC) using UMIs: an FPGA-based high-performance implementation (Type-I) and an ASIC-based lightweight implementation (Type-II). The use of a UMI combined with affine coordinates brings a smaller data-path, smaller memory and faster scalar multiplication. Both implementations use curves defined by h(x) = x and f(x) = x^5 + f_3 x^3 + x^2 + f_0. The high throughput version uses 2316 slices and 2016 bits of block RAM on a Xilinx Virtex-II FPGA, and finishes one scalar multiplication in . The lightweight version uses only 14.5 kGates, and one scalar multiplication takes 450 ms.

14.
This article presents the VLSI design of a configurable RSA public key cryptosystem supporting 512-bit, 1024-bit and 2048-bit operation based on the Montgomery algorithm, achieving clock cycle counts comparable to current relevant works but with a smaller die size. We use the binary method for the modular exponentiation and adopt the Montgomery algorithm for the modular multiplication to simplify computational complexity, which, together with the systolic array concept for circuit design, effectively lowers the die size. The main architecture of the chip consists of four functional blocks, namely input/output modules, registers module, arithmetic module and control module. We applied the concept of systolic array to design the RSA encryption/decryption chip using the VHDL hardware description language and verified it using the TSMC/CIC 0.35 μm 1P4M technology. The die area of the 2048-bit RSA chip without the DFT is 3.9 × 3.9 mm2 (4.58 × 4.58 mm2 with DFT). Its average baud rate can reach 10.84 kbps under a 100 MHz clock.
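The binary method for modular exponentiation mentioned in this abstract is the familiar square-and-multiply loop. A minimal Python sketch follows. It uses Python's built-in `%` operator for each modular multiplication, where the chip in the abstract uses Montgomery multiplication, and the name `binary_modexp` is an assumption for this example.

```python
def binary_modexp(base: int, exponent: int, modulus: int) -> int:
    """Left-to-right square-and-multiply: return base ** exponent mod modulus.

    Assumes exponent >= 0 and modulus >= 1. The name is hypothetical.
    """
    result = 1 % modulus
    # Scan the exponent bits from most significant to least significant.
    for bit in bin(exponent)[2:]:
        # Square once for every exponent bit.
        result = (result * result) % modulus
        # Multiply by the base only when the bit is set.
        if bit == "1":
            result = (result * base) % modulus
    return result
```

An n-bit exponent costs n squarings plus one multiplication per set bit, which is where the common estimate of about 1.5n modular multiplications for a random exponent comes from.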

15.
In this paper, the integer N = p^kq is called a 〈k, 1〉-integer if p and q are odd primes of almost the same size and k is a positive integer. Such integers were previously proposed for various cryptographic applications. The conditional factorization of n-bit 〈k, 1〉-integers based on lattice theory is considered, and it is shown that these integers can be factored in time polynomial in n if the least significant |(2k - 1)n/(3k-1)(k+1)| bits of p are given.

16.
In this paper, simple 1-D and 2-D systolic arrays for realizing the discrete cosine transform (DCT) based on the discrete Fourier transform (DFT) of an input sequence are presented. The proposed arrays are obtained by a simple modified DFT (MDFT) and an inverse DFT (IDFT) version of the Goertzel algorithm combined with Kung's approach. The 1-D array requires N cells and one multiplier, and takes N clock cycles to produce a complete N-point DCT. The 2-D array takes N clock cycles, faster than the 1-D array, but the area complexity is larger. A continuous flow of input data is allowed and no idle time is required between the input sequences.

17.
Modular inverse arithmetic plays an important role in elliptic curve cryptography. Based on an analysis of the Montgomery modular inversion algorithm, this paper presents a new dual-field modular inversion algorithm and proposes a novel scalable and unified architecture for Montgomery inverse hardware in the finite fields GF(p) and GF(2^n). Furthermore, the architecture based on the new modular inversion algorithm has been verified by modeling it in Verilog-HDL and implemented in 0.18 μm CMOS technology. The results indicate that our work has better performance and flexibility than other works.

18.
Based on an analysis of several familiar large-integer modular multiplication algorithms, this paper proposes a new Scalable Hybrid modular multiplication (SHyb) algorithm which has scalable operands, and presents an RSA algorithm model with scalable key size. Theoretical analysis shows that the SHyb algorithm requires m^2n/2 + 2m iterations to complete an mn-bit modular multiplication using an n-bit modular addition hardware circuit. The number of required iterations can be reduced to half that of the scalable Montgomery algorithm. Consequently, the application scope of the RSA cryptosystem is expanded and its operation speed is enhanced by the SHyb algorithm.

19.
This paper presents a combinatorial method of evaluating the effectiveness of linear hybrid cellular automata (LHCA) and linear feedback shift registers (LFSR) as generators for stimulating faults requiring a pair of vectors. We provide a theoretical analysis and empirical comparisons to see why the LHCA are better than the LFSRs as generators for sequential-type faults in a built-in self-test environment. Based on the concept of a partner set, the method derives the number of distinct k-cell substate vectors which have 2^{2k}, 1 ≤ k ≤ [n/2], transition capability for an n-cell LHCA and an n-cell LFSR with maximum length cycles. Simulation studies of the ISCAS85 benchmark circuits provide evidence of the effectiveness of the theoretical metric. This work was supported in part by Research Grants No. 5711 and No. 39409 and a Strategic Grant from the Natural Sciences and Engineering Research Council of Canada and by an equipment loan from the Canadian Microelectronics Corporation. A preliminary version of this paper was partially presented at the IEEE ISCAS'94, May 1994.

20.
The manifestations of ion traps, ion neutralization, and minority carrier generation at the insulator/semiconductor interface (hereafter, interface for brevity) in MIS structures are judged from isothermal dependences of ion depolarization current J and high-frequency capacitance C_s of the depletion layer in the semiconductor on gate potential V_g and the rate of potential change v = dV_g/dt = const. In the general case, even for a single type of mobile ions in the insulator, the dynamic current–voltage characteristics (CVCs) may exhibit three current peaks. The transfer of some nonlocalized (free) ions at the interface through the insulator, depletion of ion traps, and decomposition of neutral ion–electron associates are responsible for the peaks. The sequence and number (down to one) of the peaks depend on the activation energies of the associated processes, value of v, and energy of activation of minority carrier generation. Depending on these parameters, the peaks may appear, disappear, or merge into a broad peak, which may erroneously be identified as a result of the depletion of ion traps that have an energy spectrum. In other words, the CVC with a single peak does not necessarily mean that there exist several types of mobile ions. From the J, C_s = f(T, V_g, n_0, v) families, one can discriminate between purely ionic and electronic phenomena and identify free, neutralized, and/or trapped ions present at the interface (here, T is the temperature and n_0 is the initial total surface concentration of particles (ions) and neutral associates at the interface).
