首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 140 毫秒
1.
An architecture based on the RSA public key cryptography algorithm is presented. The circuit includes two components, one for modular squaring and one for modular multiplication. Each component is based on the Montgomery algorithm and implements the modular operations using two modified serial-parallel multipliers. A full modular exponentiation is completed every n(n + 3) clock cycles. All circuits are systolic, operate with 100% efficiency and their maximum combinational delay is equal to one gated Full-Adder. Thus, high-speed performance is achieved while the low cell hardware complexity enables an efficient VLSI implementation.  相似文献   

2.
Modulo 2 n +1 multiplication plays an important role in the Fermat number transform and residue number systems; the diminished-1 representation of numbers has been found most suitable for representing the elements of the rings. Existing algorithms for modulo (2 n +1) multiplication either use recursive modulo (2 n +1) addition, or a regular binary multiplication integrated with the modulo reduction operation. Although most often adopted for largen, this latter approach requires conversions between the diminished-1 and binary representations. In this paper we propose a parallel fine-grained architecture, based on a Wallace tree, for modulo (2 n +1) multiplication which does not require any conversions; the use of a Wallace tree considerably improves the speed of the multiplier. This new architecture exhibits an extremely modular structure with associated VLSI implementation advantages. The critical path delay and the hardware requirements of the new multiplier are similar to that of a correspondingn×n bit binary multiplier.  相似文献   

3.
Bit-level systolic arrays for modular multiplication   总被引:4,自引:0,他引:4  
This paper presents bit-level cellular arrays implementing Blakley's algorithm for multiplication of twon-bit integers modulo anothern-bit integer. The semi-systolic version uses 3n(n+3) single-bit carry save adders and 2n copies of 3-bit carry look-ahead logic, and computes a pair of binary numbers (C, S) in 3n clock cycles such thatC+S[0, 2N). The carry look-ahead logic is used to estimate the sign of the partial product, which is needed during the reduction process. The final result in the correct range [0,N) can easily be obtained by computingC+S andC+S–N, and selecting the latter if it is positive; otherwise, the former is selected. We construct a localized process dependence graph of this algorithm, and introduce a systolic array containing 3nw simple adder cells. The latency of the systolic array is 6n+w–2, wherew=n/2. The systolic version does not require broadcast and can be used to efficiently compute several modular multiplications in a pipelined fashion, producing a result in every clock cycle.  相似文献   

4.
Modulo 2 n +1 multiplication plays an important role in the Fermat number transform and residue number systems; the diminished-1 representation of numbers has been found most suitable for representing the elements of the rings. Existing algorithms for modulo (2 n +1) multiplication either use recursive modulo (2 n +1) addition, or a regular binary multiplication integrated with the modulo reduction operation. Although most often adopted for largen, this latter approach requires conversions between the diminished-1 and binary representations. In this paper we propose a parallel fine-grained architecture, based on a Wallace tree, for modulo (2 n +1) multiplication which does not require any conversions; the use of a Wallace tree considerably improves the speed of the multiplier. This new architecture exhibits an extremely modular structure with associated VLSI implementation advantages. The critical path delay and the hardware requirements of the new multiplier are similar to that of a correspondingn×n bit binary multiplier.  相似文献   

5.
6.
The solution of the algebraic path problem (APP) for arbitrarily sized graphs by a fixed-size systolic array processor (SAP) is addressed. First, a block algorithm for the APP is obtained. The algorithm is modified to require only two block subproblems or primitives. Two similar SAPS are designed for solving the primitives. Then, the primitive SAPS are integrated into a versatile SAP (VSAP). The proposed VSAP hasp ×p processing elements (PEs), solving the APP of ann-vertex graph inn 3/p 2 +n 2/p + 3p − 2 cycles. With slight modifications in the operations performed by the PEs, the problem is optimally solved inn 3/p 2 + 3p − 2 cycles. This work has been supported by the Ministry of Education and Science of Spain (CAICYT) under contract PA85-0314.  相似文献   

7.
Modular inverse arithmetic plays an important role in elliptic curve cryptography. Based on the analysis of Montgomery modular inversion algorithm, this paper presents a new dual-field modular inversion algorithm, and a novel scalable and unified architecture for Montgomery inverse hardware in finite fields GF(p) and GF(2 n ) is proposed. Furthermore, this architecture based on the new modular inversion algorithm has been verified by modeling it in Verilog-HDL, and accomplished it under 0.18 μm CMOS technology. The result indicates that our work has better performance and flexibility than other works.  相似文献   

8.
We present a new serial-parallel concurrent modular-multiplication algorithm and architecture suitable for standard RSA encryption. In the new scheme, multiplication is performed modulo a multiple of the RSA modulus n, which has a diminished-radix form 2 k -v, where k and v are positive integers and v < n. This design is the first concurrent modular multiplier to use a diminished-radix algorithm and to pipeline concurrent modular-reduction to optimize the clock rate. For a modular multiplier of order ranging from 1 to 10 (number of multiplier bits per clock cycle), a faster clock rate and throughput is possible than with other known designs including those of Brickell, Morita, Sedlak and Golze, and Miyaguchi. Throughput estimates for 512-bit RSA decryption range from 100 kbit/s in a serial mode to 650 kbit/s with a modular multiplier of order 10, at a clock rate of 20 MHz on 1.5 m CMOS.  相似文献   

9.
A prototype filter design is reviewed to underscore the computational problems arising in such designs. A purely systolic-array architecture is presented. This array provides the computational support necessary for filter design. Due to a simple and novel data steering technique the array is capable of carrying out a number of important matrix operations such as factorization, inversion of factors, and matrix-matrix multiplication. Another interesting attribute is the array's ability to maximally overlap computations of multiphase algorithms. In this study we demonstrate the execution of a dense matrix factorization phase and a factor inversion phase on the array with no need for intraphase or interphase I/O. We show that these phases (which are the backbone of an optimal filtering algorithm) are completed in the optimal count of aboutn time units. The array employs 2n nn simple processing elements (PEs) that are active every other time unit. It is shown that the functions of two adjacent PEs can be merged and assigned to a single PE thus maximizing PE utilization. A possible design of a merged PE is given.  相似文献   

10.
王晓林  周玉洁 《信息技术》2005,29(10):41-44
提出了一种实现大数模幂的硬件设计方法。其中的大数模乘部分基于基2的Montgomery改进算法,采用模乘心动阵列结构,提出了一种双边沿触发串行计算的新结构,节约了面积,同时可以达到较高的时钟频率。模幂部分基于M-ary算法,减少了所需模乘运算的次数。并比较了这种实现方法与常见的L-R二进制幂算法的实现方式速度上的改进。  相似文献   

11.
We propose and describe new error control algorithms for redundant residue number systems (RRNSs) and residue number system product codes. These algorithms employ search techniques for obtaining error values from within a set of values (that contains all possible error values). For a given RRNS, the error control algorithms have a computational complexity of t·O(log2 n + log2 m?) comparison operations, where t denotes the error correcting capability, n denotes the number of moduli, and m? denotes the geometric average of moduli. These algorithms avoid most modular operations. We describe a refinement to the proposed algorithms that further avoids the modular operation required in their respective first steps, with an increase of ?log2 n? to their computational complexity. The new algorithms provide significant computational advantages over existing methods.  相似文献   

12.
This paper presents the design of a new multiplier architecture for normal integer multiplication of positive and negative numbers as well as for multiplication in finite fields of order 2n. It has been developed to increase the performance of algorithms for cryptographic and signal processing applications on implementations of the Instruction Systolic Array (ISA) parallel computer model [M. Kunde, H.W. Lang, M. Schimmler, H. Schmeck, H. Schröder, Parallel Computing 7 (1988) 25-39, H.W. Lang, Integration, the VLSI Journal 4 (1986) 65-74]. The multiplier operates least significant bit (LSB)-first for integer multiplication and most significant bit ( )-first for finite field multiplication. It is a modular bit-serial design, which on the one hand can be efficiently implemented in hardware and on the other hand has the advantage that it can handle operands of arbitrary length.  相似文献   

13.
Signal Processing algorithms generally rely heavily on the convolution operation which in turn is multiplication intensive. However, more recently convolution algorithms based on the squaring operation as opposed to the multiplication operation have been developed. In this article we present two ROM based methods for performing the squaring operation modulo 2 n , modulo 2 n −1, or modulo 2 n +1. The performance, cost, and implementation issues of the two methods are analyzed in detail and compared against each other as well as with a traditional ROM based implementation. It is shown that both methods obtain ROM bit savings of 99.99%, for 32-bit word lengths, when compared with traditional techniques. However, one of the methods outperforms the other in all other respects such as overhead costs, of up to 99.48% savings, performance, up to about 20 times faster, and regularity and simplicity of hardware design.  相似文献   

14.
A new on-the-fly conversion algorithm is proposed, and high-speed array multipliers with the on-the-fly conversion are presented. The new on-the-fly conversion logic is used to speed up carry-propagate addition at the last stage of multiplication, and provides constant delay independent of the number of input bits. In this paper, the multiplication architecture and the on-the-fly conversion algorithm are presented and discussed in detail. The proposed architecture has multiplication time of (n + 1)tFA, where n is the number of input bits and tFA is the delay of a full adder. According to our comparative performance evaluation, the proposed architecture has shorter delay and requires less area than the conventional array multiplier with on-the-fly conversion.  相似文献   

15.
This paper focuses on the design and implementation of a fast reconfigurable method for elliptic curve cryptography acceleration in GF(2 m ). The main contribution of this paper is comparing different reconfigurable modular multiplication methods and modular reduction methods for software implementation on Intel IA-32 processors, optimizing point arithmetic to reduce the number of expensive reduction operations through a novel reduction sharing technique, and measuring performance for scalar point multiplication in GF(2 m ) on Intel IA-32 processors. This paper determined that systematic reduction is best for fields defined with trinomials or pentanomials; however, for fields defined with reduction polynomials with large Hamming weight Barrett reduction is best. In GF(2571) for Intel P4 2.8 GHz processor, long multiplication with systematic reduction was 2.18 and 2.26 times faster than long multiplication with Barrett or Montgomery reduction. This paper determined that Montgomery Invariant scalar point multiplication with Systematic reduction in Projective coordinates was the fastest method for single scalar point multiplication for the NIST fields from GF(2163) to GF(2571). For single scalar point multiplication on a reconfigurable elliptic curve cryptography accelerator, we were able to achieve ∼6.1 times speedup using reconfigurable reduction methods with long multiplication, Montgomery’s MSB Invariant method in projective coordinates, and systematic reduction. Further extensions were made to implement fast reconfigurable elliptic curve cryptography for repeated scalar point multiplication on the same base point. We also show that for L > 20 the LSB invariant method combined with affine doubling precomputation outperforms the LSB invariant method combined with López-Dahab doubling precomputation for all reconfigurable reduction polynomial techniques in GF(2571) for Intel IA-32 processors. For L = 1000, the LSB invariant scalar point multiplication method was 13.78 to 34.32% faster than using the fastest Montgomery Invariant scalar point multiplication method on Intel IA-32 processors.  相似文献   

16.
The delayed least-mean-square (DLMS) algorithm is useful for adaptive finite impulse response (FIR) filtering applications where high throughput rates are required. In the paper, a bit-serial bit-level systolic array based on new schemes for multiplication and inner-product computation is presented to implement DLMS adaptive N-tap FIR filters. The architecture is highly regular, modular, and thus well-suited to VLSI implementation. It has an efficiency of 100% and a throughput rate of one filter output per 2B cycles, where B is the word length of input data. In addition, the proposed array uses a small delay of [(4B+N/2+4)/2B] in the filter coefficient adaptation, where [x] is the smallest integer greater than or equal to x. This ensures that the DLMS algorithm can have good performance under proper selection of the step size. Based on a conservative design technique of static complementary metal oxide semiconductor (CMOS) logic, it is shown that the proposed system can be realized in a single chip for most practical applications  相似文献   

17.
Efficient blue organic light-emitting diodes have been developed based on one novel fluorescent beryllium complex bis(2-(2-hydroxyphenyl)-4-methyl-pyridine)beryllium (Be(4-mpp)2). The simple double-layer device based on Be(4-mpp)2 as the EML as well as the ETL not only shows pure and stable blue emission with the CIE coordinates of (0.14, 0.09), but also presents very high EL efficiency in terms of both the peak values (5.4% for EQE and 4.2 lm W−1 for PE) and the EQE value remaining ?4.0% in very wide brightness range (10–10,000 cd m−2) that indicates very good operational stability. They are the highest EL efficiencies ever reported for such saturated and stable OLED (CIE: x < 0.15, y < 0.10) to the best of our knowledge.  相似文献   

18.
Currently, development of suitable cathode materials for zinc‐ion batteries (ZIBs) is plagued by the sluggish kinetics of Zn2+ with multivalent charge in the host structure. Herein, it is demonstrated that interlayer Mn2+‐doped layered vanadium oxide (Mn0.15V2O5·nH2O) composites with narrowed direct bandgap manifest greatly boosted electrochemical performance as zinc‐ion battery cathodes. Specifically, the Mn0.15V2O5·nH2O electrode shows a high specific capacity of 367 mAh g?1 at a current density of 0.1 A g?1 as well as excellent retentive capacities of 153 and 122 mAh g?1 after 8000 cycles at high current densities up to 10 and 20 A g?1, respectively. Even at a low temperature of ?20 °C, a reversible specific capacity of 100 mAh g?1 can be achieved at a current density of 2.0 A g?1 after 3000 cycles. The superior electrochemical performance originates from the synergistic effects between the layered nanostructures and interlayer doping of Mn2+ ions and water molecules, which can enhance the electrons/ions transport kinetics and structural stability during cycling. With the aid of various ex situ characterization technologies and density functional theory calculations, the zinc‐ion storage mechanism can be revealed, which provides fundamental guidelines for developing high‐performance cathodes for ZIBs.  相似文献   

19.
We propose a novel area/time efficient elliptic curve cryptography (ECC) processor architecture which performs all finite field arithmetic operations in the discrete Fourier domain. The proposed architecture utilizes a class of optimal extension fields (OEF) GF(q m ) where the field characteristic is a Mersenne prime q = 2 n  − 1 and m = n. The main advantage of our architecture is that it achieves extension field modular multiplication in the discrete Fourier domain with only a linear number of base field GF(q) multiplications in addition to a quadratic number of simpler operations such as addition and bitwise rotation. We achieve an area between 25k and 50k equivalent gates for the implementations over OEFs of size 169, 289 and 361 bits. With its low area and high speed, the proposed architecture is well suited for ECC in small device environments such as sensor networks. The work at hand presents the first hardware implementation of a frequency domain multiplier suitable for ECC and the first hardware implementation of ECC in the frequency domain.
Berk SunarEmail:
  相似文献   

20.
Modular exponentiation with a large modulus, which is usually accomplished by repeated modular multiplications, has been widely used in public key cryptosystems for secured data communications. To speed up the computation, the Montgomery modular multiplication algorithm is used to relax the process of quotient determination, and the carry-save addition (CSA) is employed to reduce the critical path delay. In this paper, based on the inherent data dependency between the modular multiplication and square operations in the H-algorithm of modular exponentiation, we present a new modular exponentiation architecture with a unified modular multiplication/square module and show how to reduce the number of input operands for the CSA tree by mathematical manipulation. The developed architecture has the following advantages. 1) There is no need to convert the carry-save form of an operand into its binary representation at the end of each modular multiplication. In this way, except the final step to get the result of modular exponentiation, the time-consuming carry propagation can then be eliminated. 2) The number of input operands for the CSA tree is reduced in a very efficient way. 3) The hardware saving is achieved with very limited impact on the original critical path delay when designed with two distinct modular multiplication and square components. Experimental results show that our modular exponentiation design obtains the least hardware complexity compared with the existing work and outperforms them in terms of area-time (AT) complexity as well.   相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号