共查询到20条相似文献,搜索用时 31 毫秒
1.
In this paper, a new High-Radix Finite Field multiplication algorithm for GF(2m) is proposed for the first time. The proposed multiplication algorithm can operate in a Digit-serial fashion, and hence can give a trade-off between the speed, the area , the input/output pin limitation, and the low power consumption by simply varying the digit size. A detailed example of a new Radix-16 GF(2m) Digit-Serial multiplication architecture adopting the proposed algorithm illustrates a speed improvement of 75% when compared to conventional Radix-2 bit-serial realization. This is made more significant when it is noted that the speed improvement of 75% was achieved at the expense of only 2.3 times increase in the hardware requirements of the proposed architecture. 相似文献
2.
Chiou-Yng Lee Author Vitae 《Integration, the VLSI Journal》2008,41(1):106-112
This paper presents new time-dependent and time-independent multiplication algorithms over finite fields GF(2m) by employing an interleaved conventional multiplication and a folded technique. The proposed algorithm allows efficient realization of the bit-parallel systolic multipliers. The results show that the proposed time-independent multiplier saves about 54% space complexity as compared to other related multipliers for polynomial and dual bases of GF(2m). The proposed architectures include the features of regularity, modularity and local interconnection. Accordingly, it is well suited for VLSI implementation. 相似文献
3.
Huong Ho 《Journal of Signal Processing Systems》2014,75(3):203-208
In this paper, the design and circuit implementation of a polynomial basis multiplier architecture over Galois Fields GF(2m) is presented. The proposed architecture supports field multiplication of two m-term polynomials where m is a positive integer. Circuit implementations based on this parameterized architecture where m is configurable is suitable for applications in error control coding and cryptography. The proposed architecture offers low latency, polynomial basis multiplication where the irreducible polynomial P(x)?=?x m ?+?p kt .?x kt ?+?…?+?p 1.?x?+?1 with m ≥ kt + 4 is dynamically reconfigurable. Results of the complexity analysis show that the proposed architecture requires less logic resources compared to existing sequential polynomial basis multipliers. In terms of timing performance, the proposed architecture has a latency of m/4, which is the lowest among the multipliers found in literature for GF(2m). 相似文献
4.
Modular inverse arithmetic plays an important role in elliptic curve cryptography. Based on the analysis of Montgomery modular
inversion algorithm, this paper presents a new dual-field modular inversion algorithm, and a novel scalable and unified architecture
for Montgomery inverse hardware in finite fields GF(p) and GF(2
n
) is proposed. Furthermore, this architecture based on the new modular inversion algorithm has been verified by modeling it
in Verilog-HDL, and accomplished it under 0.18 μm CMOS technology. The result indicates that our work has better performance
and flexibility than other works. 相似文献
5.
The maximum a posterior probability (MAP) algorithm has been widely used in Turbo decoding for its outstanding performance. However, it is very challenging to design high-speed MAP decoders because of inherent recursive computations. This paper presents two novel high-speed recursion architectures for MAP-based Turbo decoders. Algorithmic transformation, approximation, and architectural optimization are incorporated in the proposed designs to reduce the critical path. Simulations show that neither of the proposed designs has observable decoding performance loss compared to the true MAP algorithm when applied in Turbo decoding. Synthesis results show that the proposed Radix-2 recursion architecture can achieve comparable processing speed to that of the state-of-the-art recursion (Radix-4) architecture with significantly lower complexity while the proposed Radix-4 architecture is 32% faster than the best existing design 相似文献
6.
We present low area and low power semi-systolic array architectures for polynomial basis multiplication over GF(2m) using Progressive Multiplier Reduction Technique (PMR). These architectures are explored using linear and nonlinear techniques applied to the polynomial multiplication algorithm. The nonlinear techniques allow the designer, to control the processor workload and reduce the inter-processor communications. The semi-systolic architectures obtained have simple structure with local communication. ASIC implementations of our designs and comparable published designs show that the proposed scalable semi-systolic structures have less area complexity (56.8–94.6 %) and power consumption (55.2–84.2 %) except for a scalable design published by the same authors. However, one of the proposed scalable designs outperforms this design in terms of throughput by 73.8 %. This makes the proposed designs suited to embedded applications that require low power consumption and moderate speed. 相似文献
7.
Multiplication-accumulation operations described by
represent the fundamental computation involved in many digital signal processing algorithms. For high speed signal processing,
one obvious approach to realize the above computation in VLSI is to employm discrete multipliers working in parallel. However, a more area efficient approach is offered by the merged multiplication
technique [5]. But the principal drawback of the conventional merged technique is its longer latency than the former discrete
approach. This work proposes a hardware algorithm for merged array multiplication which eliminates this drawback and achieves
significant improvement in latency when compared with the conventional scheme for merged multiplication. The proposed algorithm
utilizes multiple wave front computation as opposed to the traditional approach where computation in an array multiplier is
carried out by a single wave front. The improvement in latency by the proposed approach is greater than 40% (form>2) when compared with a conventional approach to merged multiplication. The consequent cost in the form of additional requirement
of VLSI area is found to be rather small. In this paper, we provide a thorough analytic discussion on the proposed algorithm
and support it by experimental results. 相似文献
8.
A new on-the-fly conversion algorithm is proposed, and high-speed array multipliers with the on-the-fly conversion are presented. The new on-the-fly conversion logic is used to speed up carry-propagate addition at the last stage of multiplication, and provides constant delay independent of the number of input bits. In this paper, the multiplication architecture and the on-the-fly conversion algorithm are presented and discussed in detail. The proposed architecture has multiplication time of (n + 1)tFA, where n is the number of input bits and tFA is the delay of a full adder. According to our comparative performance evaluation, the proposed architecture has shorter delay and requires less area than the conventional array multiplier with on-the-fly conversion. 相似文献
9.
全同态加密(FHE)可以真正从根本上解决云计算时将数据及其操作委托给第三方时的数据安全问题。针对全同态加密中占较大比例的大整数乘法运算优化需求,该文提出一种数论变换乘法蝶形运算的操作数合并算法,利用取模操作的快速算法,分别可将基16和基32运算单元的操作数减少到43.8%和39.1%。在此基础上,设计并实现了数论变换基32运算单元的硬件设计架构,在SMIC 90 nm工艺下的综合结果显示,电路的最高工作频率为600 MHz,面积1.714 mm2。实验结果表明,该优化算法提升了数论变换乘法蝶形运算的计算效率。 相似文献
10.
This contribution focuses on a class of Galois field used to achieve fast finite field arithmetic which we call an Optimal
Extension Field (OEF), first introduced in [3]. We extend this work by presenting an adaptation of Itoh and Tsujii's algorithm
for finite field inversion applied to OEFs. In particular, we use the facts that the action of the Frobenius map in GF (p
m
) can be computed with only m-1 subfield multiplications and that inverses in GF (p) may be computed cheaply using known techniques. As a result, we show that one extension field inversion can be computed
with a logarithmic number of extension field multiplications. In addition, we provide new extension field multiplication formulas
which give a performance increase. Further, we provide an OEF construction algorithm together with tables of Type I and Type
II OEFs along with statistics on the number of pseudo-Mersenne primes and OEFs. We apply this new work to provide implementation
results using these methods to construct elliptic curve cryptosystems on both DEC Alpha workstations and Pentium-class PCs.
These results show that OEFs when used with our new inversion and multiplication algorithms provide a substantial performance
increase over other reported methods.
Received 7 July 1999 and revised 29 March 2000 Online publication 15 September 2000 相似文献
11.
An implementation for a fast public-key cryptosystem 总被引:9,自引:0,他引:9
In this paper we examine the development of a high-speed implementation of a system to perform exponentiation in fields of the form GF(2
n
). For sufficiently large n, this device has applications in public-key cryptography. The selection of representation and observations on the structure of multiplication have led to the development of an architecture which is of low complexity and high speed. A VLSI implementation has being fabricated with measured throughput for exponentiation for cryptographic purposes of approximately 300 kilobits per second. 相似文献
12.
全同态加密(FHE)可以真正从根本上解决云计算时将数据及其操作委托给第三方时的数据安全问题.针对全同态加密中占较大比例的大整数乘法运算优化需求,该文提出一种数论变换乘法蝶形运算的操作数合并算法,利用取模操作的快速算法,分别可将基16和基32运算单元的操作数减少到43.8%和39.1%.在此基础上,设计并实现了数论变换基32运算单元的硬件设计架构,在SMIC 90 nm工艺下的综合结果显示,电路的最高工作频率为600 MHz,面积1.714 mm2.实验结果表明,该优化算法提升了数论变换乘法蝶形运算的计算效率. 相似文献
13.
A new bit-parallel systolic multiplier over GF(2m) under the polynomial basis and normal basis is proposed. This new circuit is constructed by m
2 identical cells, each of which consists of one two-input AND gate, one three-input XOR gate and five 1-bit latches. Especially,
the proposed architecture is without the basis conversion as compared to the well-known multipliers with the redundant representation.
With this proposed multiplier, a parallel-in parallel-out systolic array has also been developed for computing inversion and
division over GF(2m). The proposed architectures are well suited to VLSI systems due to their regular interconnection pattern and modular structure.
相似文献
Che Wun ChiouEmail: |
14.
Vu Quan Nguyen Woo Hyun Son Marek Parfieniuk Luong Tran Nhat Trung Sang Yoon Park 《ETRI Journal》2020,42(3):376-387
Massive computation of the reconstruction algorithm for compressive sensing (CS) has been a major concern for its real‐time application. In this paper, we propose a novel high‐speed architecture for the orthogonal matching pursuit (OMP) algorithm, which is the most frequently used to reconstruct compressively sensed signals. The proposed design offers a very high throughput and includes an innovative pipeline architecture and scheduling algorithm. Least‐squares problem solving, which requires a huge amount of computations in the OMP, is implemented by using systolic arrays with four new processing elements. In addition, a distributed‐arithmetic‐based circuit for matrix multiplication is proposed to counterbalance the area overhead caused by the multi‐stage pipelining. The results of logic synthesis show that the proposed design reconstructs signals nearly 19 times faster while occupying an only 1.06 times larger area than the existing designs for N = 256, M = 64, and m = 16, where N is the number of the original samples, M is the length of the measurement vector, and m is the sparsity level of the signal. 相似文献
15.
16.
This work presents a novel scalable multiplication algorithm for a type-t Gaussian normal basis (GNB) of GF(2m). Utilizing the basic characteristics of MSD-first and LSD-first schemes with d-bit digit size, the GNB multiplication can be decomposed into n(n + 1) Hankel matrix-vector multiplications. where n = (mt + 1)/d. The proposed scalable architectures for computing GNB multiplication comprise of one d × d Hankel multiplier, four registers and one final reduction polynomial circuit. Using the relationship of the basis conversion from the GNB to the normal basis, we also present the modified scalable multiplier which requires only nk Hankel multiplications, where k = mt/2d if m is even or k = (mt − t + 2)/2d if m is odd. The developed scalable multipliers have the feature of scalability. It is shown that, as the selected digit size d ≥ 8, the proposed scalable architectures have significantly lower time-area complexity than existing digit-serial multipliers. Moreover, the proposed architectures have the features of regularity, modularity, and local interconnection ability. Accordingly, they are well suited for VLSI implementation. 相似文献
17.
Modulo 2
n
+1 multiplication plays an important role in the Fermat number transform and residue number systems; the diminished-1 representation of numbers has been found most suitable for representing the elements of the rings. Existing algorithms for modulo (2
n
+1) multiplication either use recursive modulo (2
n
+1) addition, or a regular binary multiplication integrated with the modulo reduction operation. Although most often adopted for largen, this latter approach requires conversions between the diminished-1 and binary representations. In this paper we propose a parallel fine-grained architecture, based on a Wallace tree, for modulo (2
n
+1) multiplication which does not require any conversions; the use of a Wallace tree considerably improves the speed of the multiplier. This new architecture exhibits an extremely modular structure with associated VLSI implementation advantages. The critical path delay and the hardware requirements of the new multiplier are similar to that of a correspondingn×n bit binary multiplier. 相似文献
18.
19.
The finite field is widely used in error-correcting codes and cryptography. Among its important arithmetic operations, multiplication is identified as the most important and complicated. Therefore, a multiplier with concurrent error detection ability is elegantly needed. In this paper, a concurrent error detection scheme is presented for bit-parallel systolic dual basis multiplier over GF(2m) according to the Fenn’s multiplier in [7]. Although, the proposed method increases the space complexity overhead about 27% and the latency overhead about one extra clock cycle as compared to Fenn’s multiplier. Our analysis shows that all single stuck-at faults can be detected concurrently. 相似文献
20.
This paper presents two bit-serial modular multipliers based on the linear feedback shift register using an irreducible all one polynomial (AOP) over GF(2m). First, a new multiplication algorithm and its architecture are proposed for the modular AB multiplication. Then a new algorithm and architecture for the modular AB2 multiplication are derived based on the first multiplier. They have significantly smaller hardware complexity than the previous multipliers because of using the property of AOP. It simplifies the modular reduction compared with the case of using the generalized irreducible polynomial. Since the proposed multipliers have low hardware requirements and regular structures, they are suitable for VLSI implementation. The proposed multipliers can be used as the kernel architecture for the operations of exponentiation, inversion, and division. 相似文献