首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
Bit-level systolic arrays for modular multiplication   总被引:4,自引:0,他引:4  
This paper presents bit-level cellular arrays implementing Blakley's algorithm for multiplication of twon-bit integers modulo anothern-bit integer. The semi-systolic version uses 3n(n+3) single-bit carry save adders and 2n copies of 3-bit carry look-ahead logic, and computes a pair of binary numbers (C, S) in 3n clock cycles such thatC+S[0, 2N). The carry look-ahead logic is used to estimate the sign of the partial product, which is needed during the reduction process. The final result in the correct range [0,N) can easily be obtained by computingC+S andC+S–N, and selecting the latter if it is positive; otherwise, the former is selected. We construct a localized process dependence graph of this algorithm, and introduce a systolic array containing 3nw simple adder cells. The latency of the systolic array is 6n+w–2, wherew=n/2. The systolic version does not require broadcast and can be used to efficiently compute several modular multiplications in a pipelined fashion, producing a result in every clock cycle.  相似文献   

2.
In this paper, we present a design framework for scalable memory-based implementation of the discrete Hartley transform (DHT) using simple and efficient systolic and systolic-like structures for short and prime transform lengths, as well as, for lengths 4 and 8. We have used the proposed short-length structures to construct highly modular architectures for higher transform lengths by a new prime-factor implementation approach. The structures proposed for the prime-factor DHT, interestingly, do not involve any transposition hardware/time. Besides, it is shown here that an N-point DHT can be computed efficiently from two (N/2)-point DHTs of its even- and odd-indexed input subsequences in a recursive manner using a ROM-based multiplication stage. Apart from flexibility of implementation, the proposed structures offer significantly lower area-time complexity compared with the existing structures. The proposed schemes of computation of the DHT can conveniently be scaled not only for higher transform lengths but also according to the hardware constraint or the throughput requirement of the application.  相似文献   

3.
Modular exponentiation is the cornerstone computation in public-key cryptography systems such as RSA cryptosystems. The operation is time consuming for large operands. This paper describes the characteristics of three architectures designed to implement modular exponentiation using the fast binary method: the first field-programmable gate array (FPGA) prototype has a sequential architecture, the second has a parallel architecture, and the third has a systolic array-based architecture. The paper compares the three prototypes as well as Blum and Paar's implementation using the time /spl times/ area classic factor. All three prototypes implement the modular multiplication using the popular Montgomery algorithm.  相似文献   

4.
This paper presents a new modular multiplication algorithm that allows one to implement modular multiplications efficiently. It proposes a systematic approach for maximizing a level of parallelism when performing a modular multiplication. The proposed algorithm effectively integrates three different existing algorithms, a classical modular multiplication based on Barrett reduction, the modular multiplication with Montgomery reduction and the Karatsuba multiplication algorithms in order to reduce the computational complexity and increase the potential of parallel processing. The algorithm is suitable for both hardware implementations and software implementations in a multiprocessor environment. To show the effectiveness of the proposed algorithm, we implement several hardware modular multipliers and compare the area and performance results. We show that a modular multiplier using the proposed algorithm achieves a higher speed comparing to the modular multipliers based on the previously proposed algorithms.  相似文献   

5.
In this paper, we deal with efficient modular multiplication algorithms working with large integers on programmable smart-cards. The current smart-cards don’t contain a library supporting modular arithmetic operations, which is needed in advanced cryptography schemes. Fortunately, the smart-cards contain a crypto co-processor providing a cryptographic support that can help with the acceleration of modular multiplication. The performance of three classical methods for modular multiplication with large integers and one method using a crypto co-processor is analyzed in this paper. The results of our implementation of modular arithmetic operations with the accelerated multiplication can be useful for the construction of advanced cryptographic schemes.  相似文献   

6.
Leong  P.C. Tan  E.C. Tan  P.C. 《Electronics letters》1999,35(14):1140-1141
An efficient iterative modular multiplication algorithm is proposed for software implementation by a small micro-controller without an integer-divide instruction. The algorithm exploits the built-in single-precision multiply instruction and permits the shift operations to be effected with little or no instruction execution while reducing the total number of multi-precision additions required  相似文献   

7.
Pipelined systolic architectures for DLMS adaptive filtering   总被引:6,自引:0,他引:6  
This work reports two new pipelined, systolic architectures for delayed least mean squares (DLMS) adaptive filtering. In contrast to existing systolic architectures, which introduce a tracking delay that increases linearly with filter order, those presented here, do not. They support the same sampling rate as the fastest such architecture reported so far, even when unpipelined. Our designs use significantly less hardware (i.e., multiply-accumulate modules and registers) with minimal control logic requirement on account of the algebraic projection techniques that we employ, implying a net gain in terms of the silicon area utilized and the dynamic power dissipated. Further, one of these architectures introduces only half the adaptation delay that is conventionally used for systolization; the other requires the normal adaptation delay, but compensates by using considerably reduced control logic. The sampling rates supported by our architectures are further increased by pipelining the processor modules to the level of a 42 compressor. This requires only small adaptation and tracking delays, which are independent of filter order, and is possible without requiring a modification of the basic algorithm (in terms of introducing a lookahead in the adaptation), all in contrast with the only pipelined DLMS architecture reported so far. We propose and implement a scheme in our architectures, for computing a normalized step size for delayed adaptation, in the general context of a nonstationary real-time environment. The simulation studies performed with our architectures indicate remarkably improved convergence properties over those of previously reported architectures.  相似文献   

8.
A systematic approach for designing one class of fault-tolerant systolic arrays (FTSAs) with orthogonal interconnects and unidirectional data flow, Orthogonal Unidirectional Systolic Array (OUSA), for multiplication of rectangular matrices is presented in this paper. The method employs space-time redundancy to achieve fault-tolerance. By conducting proposed systematic design procedure, four different systolic arrays of OUSA type are obtained. All the arrays can tolerate single transient errors and majority of multiple errors with high probability. In order to provide high bandwidth in data access, a special hardware called address generator unit, was designed. Hardware complexity and performance gains achieved at higher (system, algorithm and architecture) design levels were analyzed. The obtained results show that with n2 + 2n processing elements the total execution time of the fault-tolerant algorithm is 6n + 3 time units, the hardware overhead due to involving fault-tolerance is in the range from 6.25% down to 0.8%, while time overhead is 50%. In addition, by involving hardware implemented address generation unit we reduce the total execution time of the algorithm almost five times, compared to software address calculations.  相似文献   

9.
10.
Several novel systolic architectures for implementing densely pipelined bit parallel IIR filter sections are presented. The fundamental problem of latency in the feedback loop is overcome by employing redundant arithmetic in combination with bit-level feedback, allowing a basic first-order section to achieve a wordlength-independent latency of only two clock cycles. This is extended to produce a building block from which higher order sections can be constructed. The architecture is then refined by combining the use of both conventional and redundant arithmetic, resulting in two new structures offering substantial hardware savings over the original design. In contrast to alternative techniques, bit-level pipelinability is achieved with no net cost in hardware.This paper is a revised and extended version of a paper by the same authors presented at the second International Conference on Systolic Arrays, San Diego, May 1988 [1].  相似文献   

11.
Based on the analysis of several familiar large integer modular multiplication algorithms, this paper proposes a new Scalable Hybrid modular multiplication (SHyb) algorithm which has scalable operands, and presents an RSA algorithm model with scalable key size. Theoretical analysis shows that SHyb algorithm requires m^2n/2 + 2m iterations to complete an mn-bit modular multiplication with the application of an n-bit modular addition hardware circuit. The number of the required iterations can be reduced to a half of that of the scalable Montgomery algorithm. Consequently, the application scope of the RSA cryptosystem is expanded and its operation speed is enhanced based on SHyb algorithm.  相似文献   

12.
In this paper, time-efficient systolic and semi-systolic architectures for the implementation of multidimensional DFTs are proposed that allow modularity and easy expansibility while keeping throughput independent of the dimension of the DFT. We shall call an array semi-systolic if the input data is to be preloaded into every cell of the array or if the output data can be calculated not only in the boundary cells of the array. DFT algorithms are represented by FORTRAN-like code, in order to explicitly display the suggested rotational transformations in the index space. After the transformation, multidimensional DFT algorithm can be mapped onto a systolic structure. The area-time2 complexity of the proposed design is within logk factor of the lower bound for ak-point DFT (AT 2=(k 2 log2 k)), i.e. equals 0(k 2 log3 k)=0(N 2M log3 N) fork=N 2M point DFT;A is the area complexity andT is the system throughput.  相似文献   

13.
A new efficient modular division algorithm suitable for systolic implementation and its systolic architecture is proposed in this article. With a new exit condition of while loop and a new updating method of a control variable, the new algorithm reduces the average of iteration numbers by more than 14.3% compared to the algorithm proposed by Chen, Bai and Chen. Based on the new algorithm, we design a fast systolic architecture with an optimised core computing cell. Compared to the architecture proposed by Chen, Bai and Chen, our systolic architecture has reduced the critical path delay by about 18% and the total computational time for one modular division by almost 30%, with the cost of about 1% more cells. Moreover, by the addition of a flag signal and three logic gates, the proposed systolic architecture can also perform Montgomery modular multiplication and a fast unified modular divider/multiplier is realised.  相似文献   

14.
大数模幂乘算法的快速实现   总被引:2,自引:0,他引:2  
刘悦  李桂丽  田莹 《信息技术》2003,27(5):25-27
大素数的选取是构造RSA密钥的关键 ,在素数的产生及测试是RSA公钥系统中的一个重要研究课题。描述了公钥密码体制中DSA、RSA等数据加密算法的原理及加密、解密过程 ,分析了各种算法的性能和适用的场合 ,针对上述算法的计算量巨大的问题 ,给出了实现数据加密较好的方法。理论和实验表明 ,该算法用于实现RSA算法 ,新算法的效率有明显的提高  相似文献   

15.
基于Montgomery模乘的RSA算法VLSI实现   总被引:2,自引:1,他引:1  
介绍了一种基于可伸展的Montgomery模乘结构的1024位RSA加解密芯片实现。设计采用的新型心动阵列结构,可以有在有效控制芯片面积的前提下,极大地提高运算频率,从而提高运算速度。经过ModelSim仿真和Design Compiler综合,与当前已发表的RSA芯片设计相比,该设计在面积和速度上均有优势。  相似文献   

16.
向市场推出新手机的时坷要求正在变得越来越紧追。在面市时间日益缩短的趋势之下.产品要想取得成功.就需要具备更多的功能;伴随着手机市场的发展.手机支持的功能也在不断增多:除了基本的话音通信功能之外.照相机,因特网浏览、流视频、MP3,游戏以及PDA功能等都在被越来越多的手机所拥有。  相似文献   

17.
An improved systolic array for the Montgomery modular multiplication algorithm is presented. The recursive equation proposed by Walter [1993] is explicitly transformed into Boolean algebraic equations. By analysing and optimising these equations an improved systolic array can be made 20% faster than that proposed by Walter  相似文献   

18.
Many sequential multipliers for polynomial basis GF(2k) fields have been proposed using the LSbit and MSbit multiplication algorithm. However, all those designs are defined over fixed size GF(2k) fields and sometimes over fixed special form irreducible polynomials (AOL, trinomials, pentanomials). When such architectures are redesigned for arbitrary GF(2k) fields and generic irreducible polynomials, therefore made versatile, they result in high space complexity (gate–latch number), low frequency (high critical path) and high latency designs. In this paper a Montgomery multiplication element (MME) architecture specially designed for arbitrary GF(2k) fields defined over general irreducible polynomials, is proposed, based on an optimized version of the Montgomery multiplication (MM) algorithm for GF(2k) fields. To evaluate the proposed MME and prove the efficiency of the MM algorithm in versatile designing, three distinct versatile Montgomery multiplier architectures are presented using this proposed MME. They achieve small gate–latch number and high clock frequency compared to other sequential versatile designs.  相似文献   

19.
20.
This paper presents a method of using a parity prediction scheme for detecting erroneous outputs in bit-parallel, sequential, and digit-serial Gaussian normal basis (GNB) multipliers over GF(2m). Although all-type NB multipliers have different time and space complexities, our analytical results indicate that all-type GNB multipliers have the same structure if they use parity prediction function. For example, in the field GF(2233), we have estimated that the error detection rate for a sequential multiplier is nearly 100% if a comparison is made as per clock cycle. Our analytical results also show that the area overhead of the proposed digit-serial multiplier with concurrent error detection does not exceed 5%. Several efficient parity prediction techniques will be shown in this work to provide a low overhead solution to concurrent error detection particularly when the cryptography implementations using GF(2m) multiplier require higher reliability and the protection against adversarial attacks.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号