1.

于斌 黄海 刘志伟 赵石磊 那宁 《电子与信息学报》2022,43(7):1821-1827

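To address signing and verification speeds that fall short of the needs of certain application domains, this paper designs a high-performance hardware architecture for the Ed25519 algorithm. Scalar multiplication uses the window method with a 2-bit window, which reduces the total number of cycles required; optimized point-addition and point-doubling operation sequences raise the hardware utilization of the multiplier; and modular multiplication is built on a fast modular reduction of low computational complexity, which raises overall speed. So that the modulo-L operation can reuse the fast modular reduction from scalar multiplication, the paper proposes a modulo-L algorithm based on Barrett reduction. Optimizing the modular exponentiation in the decompression step shortens the procedure and lets it reuse the modular multiplier. The architecture was implemented in hardware: in TSMC 55 nm CMOS technology it occupies 746×10³ equivalent gates, reaches a maximum frequency of 360 MHz, and performs 9.06×10⁴ public-key generations, 8.82×10⁴ signatures, and 3.99×10⁴ verifications per second.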
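
The Barrett-based reduction modulo L mentioned in the abstract above can be illustrated in software. What follows is a minimal Python sketch, not the paper's hardware algorithm: the constant `L` is the standard Ed25519 group order, while the 512-bit input bound and the names `SHIFT`, `MU`, and `barrett_mod_l` are assumptions chosen for illustration.

```python
# Standard Ed25519 group order (the abstract refers to it only as "L").
L = 2**252 + 27742317777372353535851937790883648493

SHIFT = 512                  # assumed input bound: 0 <= x < 2^512
MU = (1 << SHIFT) // L       # precomputed Barrett constant floor(2^512 / L)


def barrett_mod_l(x: int) -> int:
    """Return x mod L for 0 <= x < 2^512 without dividing by L at run time."""
    if not 0 <= x < (1 << SHIFT):
        raise ValueError("x must satisfy 0 <= x < 2^512")
    q = (x * MU) >> SHIFT    # quotient estimate; never exceeds x // L
    r = x - q * L            # 0 <= r < 2L for inputs in range
    while r >= L:            # correction step (runs at most once here)
        r -= L
    return r
```

Two multiplications and a shift replace the long division, which is why this style of reduction is attractive when a multiplier is already present in the datapath.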

2.

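Design and implementation of a high-speed hardware architecture for the Ed25519 signature verification algorithm

薛一鸣 刘树荣 郭书恒 李岩 胡彩娥 《通信学报》2022,(3):101-112

Because specific scenarios such as blockchain demand high verification speed, a high-speed hardware architecture for Ed25519 signature verification is designed. A multi-scalar point multiplication algorithm based on interleaved NAF is proposed; through precomputation and table lookup it effectively reduces the number of point additions and point doublings. Modular multiplication is implemented with Karatsuba multiplication and a fast reduction method, and point-addition and point-doubling operation sequences that require no modular addition or subtraction are designed, which effectively improves the performance of both operations. For the ...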
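
The idea behind interleaved NAF multi-scalar multiplication can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's design: it uses plain (width-2) NAF with no precomputed tables, and the group is a toy stand-in (integers modulo a hypothetical `N` under addition) rather than Edwards-curve points. The names `naf`, `interleaved_mul`, `add`, and `neg` are illustrative.

```python
def naf(k: int) -> list:
    """Non-adjacent form of k >= 0, least significant digit first."""
    digits = []
    while k > 0:
        if k & 1:
            d = 2 - (k & 3)   # +1 if k = 1 (mod 4), -1 if k = 3 (mod 4)
            k -= d
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits


def interleaved_mul(a, P, b, Q, add, neg, zero):
    """Compute a*P + b*Q with a single shared chain of doublings."""
    na, nb = naf(a), naf(b)
    n = max(len(na), len(nb))
    na += [0] * (n - len(na))
    nb += [0] * (n - len(nb))
    acc = zero
    for da, db in zip(reversed(na), reversed(nb)):
        acc = add(acc, acc)   # one doubling per digit position, shared by both scalars
        for d, pt in ((da, P), (db, Q)):
            if d == 1:
                acc = add(acc, pt)
            elif d == -1:
                acc = add(acc, neg(pt))
    return acc


# Toy stand-in for a curve group: integers modulo N under addition.
N = 1_000_003


def add(x, y):
    return (x + y) % N


def neg(x):
    return (-x) % N


result = interleaved_mul(1234567, 5, 7654321, 11, add, neg, 0)
```

Because no two adjacent NAF digits are nonzero, roughly one third of the digit positions trigger an addition, and sharing the doubling chain means the two scalars cost about as many doublings as one.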

3.

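Research on efficient implementation of the national cryptographic algorithm SM2 for mobile devices

张吉鹏 黄军浩 于璇 刘哲 《电子学报》2023,(12):3437-3443

Optimized implementations of SM2 have been studied thoroughly on the x86-64 architecture, but optimization on ARMv8-A remains insufficient. This work therefore proposes the following optimizations: for multiplication and squaring modulo p and modulo n, Montgomery modular multiplication is optimized by fully exploiting the numerical properties of p and n; for inversion modulo p and modulo n, a faster inversion algorithm based on Fermat's little theorem is derived and implemented; for fixed-point and variable-point scalar multiplication, window algorithms of width 7 and 5, respectively, are implemented; and in the computation of s during signature generation, one multiplication modulo n is replaced by one addition/subtraction modulo n. With these techniques integrated into OpenSSL (3.0.0-beta1), tests on the Huawei Cloud Kunpeng 920 platform show an 8.7-fold improvement in SM2 signing performance and a 3.5-fold improvement in verification performance. On the Raspberry Pi 4, a mobile-device platform, signing performance improves 9.7-fold and verification performance 3.4-fold.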
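
Inversion via Fermat's little theorem, as referred to in the abstract above, computes a^(p-2) mod p. Below is a minimal Python sketch using the SM2 recommended prime; it uses plain left-to-right square-and-multiply rather than the faster chain the paper derives, and the names `P_SM2` and `inv_fermat` are illustrative.

```python
# SM2 recommended prime p = 2^256 - 2^224 - 2^96 + 2^64 - 1.
P_SM2 = 2**256 - 2**224 - 2**96 + 2**64 - 1


def inv_fermat(a: int, p: int = P_SM2) -> int:
    """Inverse of a modulo a prime p, computed as a^(p-2) mod p."""
    a %= p
    if a == 0:
        raise ZeroDivisionError("0 has no inverse modulo p")
    e = p - 2
    result = 1
    for i in reversed(range(e.bit_length())):
        result = result * result % p      # one squaring per exponent bit
        if (e >> i) & 1:
            result = result * a % p       # one multiplication per set bit
    return result
```

The exponent p-2 contains long runs of one bits, so this naive loop performs many multiplications; optimized implementations replace it with a short addition chain, which is the kind of saving the abstract describes.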

4.

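A high-throughput SM2 digital signature computation scheme based on graphics processing units

朱辉 黄煜坤 王枫为 杨晓鹏 李晖 《电子与信息学报》2022,44(12):4274-4283

As secure data transmission becomes widespread and authentication information becomes more fine-grained, signature operations based on public-key cryptography are used ever more frequently, and their processing speed is becoming a bottleneck for many high-concurrency security applications. This paper therefore proposes a high-throughput SM2 digital signature computation scheme based on the graphics processing unit (GPU). First, low-level GPU instructions are used to optimize the basic arithmetic operations, yielding efficient basic operation modules. Next, exploiting the characteristics of the GPU platform, the modular inversion algorithm based on Fermat's little theorem is optimized and the addition chain for the SM2 recommended prime is shortened, greatly accelerating modular inversion. In addition, point doubling and repeated point doubling are used as needed, which avoids warp divergence and effectively reduces the computation required for unknown-point multiplication. Theoretical analysis and experimental results show that the scheme effectively speeds up SM2 signing and verification, achieving a signing throughput of 7.609×10⁷ operations/s and a verification throughput of 3.46×10⁶ operations/s on a single RTX 3090 card.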

5.

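Research on the F_{p^2}-FIOS modular multiplication algorithm for bilinear pairings and its implementation architecture

姜占鹏 孙铭玮 黄海 徐江 刘志伟 白瑞 方舟 曲家兴 《通信学报》2022,(2):100-108

To address the low efficiency of bilinear pairing computation, a quadratic-extension-field finely integrated operand scanning (F_{p^2}-FIOS) modular multiplication algorithm for bilinear pairings is proposed. By optimizing the computation of (AB+CD) mod P over the quadratic extension field, the algorithm effectively reduces the number of modular reductions in modular multiplication. Two hardware architectures, with their scheduling schemes, are designed to meet different application requirements and improve computational efficiency, and a bilinear pairing unit is implemented in TSMC 55 nm technology. Compared with the existing literature, the proposed architecture outperforms comparable modular multiplier designs in single modular multiplication time, clock frequency, and area-time product, and also shows some advantage in the overall Optimal ate pairing computation.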

6.

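Elliptic curve digital signature scheme based on a fast scalar multiplication algorithm

严琳 卢忱 《电子科技》2014,27(4):23-26

Computing the scalar multiple kP is the key to fast ECC implementation and an active topic in ECC research. The paper introduces fast scalar multiplication algorithms based on Montgomery's idea, focuses on the algorithm of 白国强 et al. for computing the multi-scalar multiplication kP+lQ, analyzes its limitations, and improves on it. On this basis a segmented fast scalar multiplication algorithm is designed, and the improved algorithm and the segmented algorithm are applied to ECDSA. Analysis and verification show that the segmented fast scalar multiplication algorithm improves efficiency and is of some value for the fast implementation of ECDSA.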

7.

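Research and design of a high-performance scalable public-key cryptographic coprocessor (total citations: 1; self-citations: 0; citations by others: 1)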

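黎明 吴丹 戴葵 邹雪城 《电子学报》2011,39(3):665-670

This paper proposes an efficient point-multiplication scheduling strategy and an improved dual-field high-radix Montgomery modular multiplication algorithm. On this basis it designs a new high-performance scalable public-key cryptographic coprocessor architecture and implements the coprocessor in a 0.18 μm 1P6M standard CMOS process to accelerate public-key algorithms such as RSA and ECC. By expanding the on-chip high-speed memory and taking the radix as the processing word length, the coprocessor achieves good scalability and flexibility; it supports modular exponentiation of arbitrary large numbers up to 2048 bits and dual-field scalar multiplication on arbitrary elliptic curves up to 576 bits. Chip test results show good acceleration: a 1024-bit modular exponentiation takes only 197 μs, a 192-bit scalar multiplication over GF(p) only 225 μs, and a 163-bit scalar multiplication over GF(2^m) only 200.7 μs.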

8.

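Dual-field modular multiplication and modular inversion algorithms and their hardware implementation (total citations: 2; self-citations: 1; citations by others: 1)

陈光化 朱景明 刘名 曾为民 《电子与信息学报》2010,32(9):2095-2100

Modular multiplication and modular inversion over finite fields are the two core operations of elliptic curve cryptosystems. Building on Blakley's algorithm, this paper proposes a fast radix-4 dual-field modular multiplication algorithm that uses Booth encoding to halve the number of iterations of the original algorithm and uses sign estimation to simplify the reduction step. Building on the extended Euclidean inversion algorithm, it also proposes an efficient modular inversion algorithm that supports both finite fields, avoids large-integer comparisons, and improves the shifting efficiency of each iteration. A unified hardware structure that performs both modular multiplication and modular inversion over the two finite fields is then designed around the characteristics of the two algorithms. Implementation results show that, compared with designs of the same type, the 256-bit unified modular multiplication and inversion circuit increases modular multiplication speed by 68% and modular inversion speed by 17.4% with no increase in circuit area.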
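
For orientation, the original radix-2 Blakley algorithm that the abstract above takes as its starting point can be sketched in a few lines of Python. This is the baseline over integers only, not the paper's radix-4 Booth-encoded dual-field variant; the name `blakley_modmul` is illustrative.

```python
def blakley_modmul(a: int, b: int, n: int) -> int:
    """Radix-2 Blakley interleaved multiplication: a*b mod n for 0 <= a, b < n."""
    if not (0 <= a < n and 0 <= b < n):
        raise ValueError("operands must satisfy 0 <= a, b < n")
    r = 0
    for i in reversed(range(a.bit_length())):
        # Shift the partial result and add b if the current bit of a is set.
        r = 2 * r + ((a >> i) & 1) * b    # r < 3n because r < n and b < n before this step
        # Reduce immediately, so the partial result never grows beyond a few bits over n.
        if r >= n:
            r -= n
        if r >= n:
            r -= n
    return r
```

Each iteration consumes one bit of `a`; processing two bits per iteration, as a radix-4 recoding does, is what halves the iteration count.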

9.

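Traceable attribute-based signature scheme based on the domestic cryptographic algorithm SM9

唐飞 凌国玮 单进勇 《电子与信息学报》2022,44(10):3610-3617

The domestic cryptographic algorithm SM9 is an identity-based cryptographic scheme designed independently in China and has attracted wide attention. To address the low verification efficiency of existing attribute-based signature (ABS) schemes, this paper constructs a new SM9-based ABS scheme that supports tree-structured access policies; verification requires only one bilinear pairing and one exponentiation. The scheme also makes the signer's identity traceable, preventing malicious signers from exploiting the anonymity of attribute-based signatures to sign illegitimately and thereby avoiding the signature abuse that unconditional anonymity permits in traditional schemes. Security analysis shows that the scheme is unforgeable in the random oracle model and resists collusion attacks. Compared with existing traceable ABS schemes, its tracing algorithm is more efficient and its signing and verification costs are lower. Experimental results show that the computational complexity of verification is independent of the policy size and that one verification takes only 2 ms.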

10.

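Parallelization of the ECC scalar multiplication algorithm on dual processors based on MPI

黄剑华 《电子世界》2012,(10):8-9

Drawing on the existing binary scalar multiplication algorithm of elliptic curve cryptosystems in the serial setting, this paper uses parallel computing to improve the efficiency of scalar multiplication in ECC and thereby the overall performance of ECC. It designs a parallel execution model for scalar multiplication on dual processors based on MPI. By analyzing the binary scalar multiplication algorithm and the parallel 2^r scalar multiplication algorithm in ECC, it gives the design and implementation of correspondingly improved scalar multiplication algorithms, which effectively increase the efficiency of scalar multiplication.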

11.

An efficient multi-exponentiation scheme based on modified Booth's method

Yeu-Pong Lai Chin-Chen Chang 《International Journal of Electronics》2013,100(3):221-233

We propose an efficient multi-exponentiation algorithm based on the modified Booth's algorithm and Montgomery's modular multiplication algorithm. The multi-exponentiation algorithm can be used to implement fast modern cryptosystems. Owing to the reduced number of multiplications, this algorithm is about 10% faster than Pekmestzi's algorithms. The proposed algorithm can be implemented in hardware as a small component. The component can then be used to form an efficient modular multi-exponentiation module by combining it with an efficient Montgomery modular multiplication module.

12.

Tripartite modular multiplication

Kazuo Sakiyama Miroslav Knežević Junfeng Fan Bart Preneel Ingrid Verbauwhede 《Integration, the VLSI Journal》2011,44(4):259-269

This paper presents a new modular multiplication algorithm that allows one to implement modular multiplications efficiently. It proposes a systematic approach for maximizing the level of parallelism when performing a modular multiplication. The proposed algorithm effectively integrates three existing algorithms (classical modular multiplication based on Barrett reduction, modular multiplication with Montgomery reduction, and Karatsuba multiplication) in order to reduce the computational complexity and increase the potential for parallel processing. The algorithm is suitable for both hardware implementations and software implementations in a multiprocessor environment. To show the effectiveness of the proposed algorithm, we implement several hardware modular multipliers and compare the area and performance results. We show that a modular multiplier using the proposed algorithm achieves a higher speed than modular multipliers based on previously proposed algorithms.
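
Of the three building blocks named in the abstract above, Karatsuba multiplication is the one that splits the operands, and it can be sketched briefly in Python. This is a generic textbook version for non-negative integers, not the tripartite algorithm itself; the function name and the `threshold_bits` cutoff are assumptions made for illustration.

```python
def karatsuba(x: int, y: int, threshold_bits: int = 64) -> int:
    """Multiply non-negative integers with three half-size products instead of four."""
    if x.bit_length() <= threshold_bits or y.bit_length() <= threshold_bits:
        return x * y                           # small operands: multiply directly
    half = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << half) - 1
    x1, x0 = x >> half, x & mask               # x = x1 * 2^half + x0
    y1, y0 = y >> half, y & mask               # y = y1 * 2^half + y0
    z2 = karatsuba(x1, y1, threshold_bits)     # high product
    z0 = karatsuba(x0, y0, threshold_bits)     # low product
    # The middle term reuses z2 and z0, saving one multiplication.
    z1 = karatsuba(x0 + x1, y0 + y1, threshold_bits) - z2 - z0
    return (z2 << (2 * half)) + (z1 << half) + z0
```

The three sub-products are independent of one another, which is the property that makes the split attractive for parallel hardware or multiprocessor software.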

13.

Two systolic architectures for modular multiplication

Wei-Chang Tsai Shung C.B. Sheng-Jyh Wang 《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2000,8(1):103-107

The authors present two systolic architectures to speed up the computation of modular multiplication in RSA cryptosystems. In the double-layer architecture, the main operation of Montgomery's algorithm is partitioned into two parallel operations after using the precomputation of the quotient bit. In the non-interlaced architecture, we eliminate the one-clock-cycle gap between iterations by pairing off the double-layer architecture. We compare our architectures with some previously proposed Montgomery-based systolic architectures, on the basis of both modular multiplication and modular exponentiation. The comparisons indicate that our architectures offer the highest speed, lower hardware complexity, and lower power consumption.

14.

A New Algorithm for High-Speed Modular Multiplication Design

Ming-Der Shieh Jun-Hong Chen Wen-Ching Lin Hao-Hsuan Wu 《IEEE transactions on circuits and systems. I, Regular papers》2009,56(9):2009-2019

Modular exponentiation in public-key cryptosystems is usually achieved by repeated modular multiplications on large integers. Designing high-speed modular multiplication is thus crucial to speeding up the decryption/encryption process. In this paper, we first explore how to relax the data dependency that exists between multiplication, quotient determination, and modular reduction in the conventional Montgomery modular multiplication algorithm. Then, we propose a new modular multiplication algorithm for high-speed hardware design. The speed improvement is achieved by reducing the critical path delay from the 4-to-2 to 3-to-2 carry-save addition. The resulting time complexity of our development is further decreased by simultaneously performing the multiplication and modular reduction processes. Experimental results show that the developed modular multiplication can operate at speeds higher than those of related work. When the proposed modular multiplication is applied to modular exponentiation, both time and area-time advantages are obtained.
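
The three steps whose data dependency the abstract above discusses (multiplication, quotient determination, modular reduction) are visible in a word-level software model of conventional Montgomery multiplication. The Python sketch below is that conventional form, not the paper's carry-save hardware algorithm; all function names are illustrative, and `pow(n, -1, r)` assumes Python 3.8 or later.

```python
def mont_setup(n: int, k: int) -> int:
    """Return n' = -n^(-1) mod 2^k for an odd modulus n < 2^k."""
    if n % 2 == 0 or n >= (1 << k):
        raise ValueError("n must be odd and smaller than 2^k")
    r = 1 << k
    return (-pow(n, -1, r)) % r


def mont_mul(x: int, y: int, n: int, k: int, n_prime: int) -> int:
    """Montgomery product x*y*2^(-k) mod n for 0 <= x, y < n."""
    mask = (1 << k) - 1
    t = x * y                              # multiplication
    m = ((t & mask) * n_prime) & mask      # quotient determination
    u = (t + m * n) >> k                   # reduction: t + m*n is divisible by 2^k
    return u - n if u >= n else u


def to_mont(a: int, n: int, k: int) -> int:
    """Map a to its Montgomery form a*2^k mod n."""
    return (a << k) % n


def from_mont(x: int, n: int, k: int, n_prime: int) -> int:
    """Map a Montgomery-form value back to the ordinary residue."""
    return mont_mul(x, 1, n, k, n_prime)
```

In this form the quotient `m` cannot be computed until `t` is known, and the reduction cannot start until `m` is known, which is the serial dependency that faster designs try to loosen.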

15.

LFSR multipliers over GF(2^m) defined by all-one polynomial

Hyun-Sung Sung-Woon 《Integration, the VLSI Journal》2007,40(4):473-478

This paper presents two bit-serial modular multipliers based on the linear feedback shift register using an irreducible all-one polynomial (AOP) over GF(2^m). First, a new multiplication algorithm and its architecture are proposed for the modular AB multiplication. Then a new algorithm and architecture for the modular AB² multiplication are derived based on the first multiplier. They have significantly smaller hardware complexity than the previous multipliers because they exploit the property of the AOP, which simplifies the modular reduction compared with the case of a general irreducible polynomial. Since the proposed multipliers have low hardware requirements and regular structures, they are suitable for VLSI implementation. The proposed multipliers can be used as the kernel architecture for the operations of exponentiation, inversion, and division.
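
The AOP property that simplifies reduction can be shown in a small software model. Since (x + 1)(x^m + ... + x + 1) = x^(m+1) + 1, multiplying by x modulo x^(m+1) + 1 is a cyclic rotation of m+1 bits, and one final conditional XOR reduces the result modulo the AOP. The Python sketch below models only this arithmetic, not the paper's LFSR architecture; the name `aop_mul` and the integer bit-vector encoding (bit i holds the coefficient of x^i) are assumptions.

```python
def aop_mul(a: int, b: int, m: int) -> int:
    """Multiply a and b in GF(2^m) defined by the all-one polynomial of degree m.

    Assumes the AOP of degree m is irreducible (for example m = 2, 4, 10, 12)
    and that a and b are m-bit polynomial-basis values.
    """
    n = m + 1
    full = (1 << n) - 1                    # the AOP itself: all n coefficients set
    acc = 0
    for i in range(m):
        if (b >> i) & 1:
            # a * x^i modulo x^n + 1 is a cyclic left rotation by i positions.
            acc ^= ((a << i) | (a >> (n - i))) & full
    if (acc >> m) & 1:
        acc ^= full                        # x^m = x^(m-1) + ... + x + 1 in this field
    return acc
```

No general polynomial division is needed: the only reduction work is the rotation wiring and a single conditional XOR at the end.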

16.

Fast Reconfigurable Elliptic Curve Cryptography Acceleration for GF(2^m) on 32 bit Processors

Aaron E. Cohen Keshab K. Parhi 《Journal of Signal Processing Systems》2010,60(1):31-45

This paper focuses on the design and implementation of a fast reconfigurable method for elliptic curve cryptography acceleration in GF(2^m). The main contributions of this paper are comparing different reconfigurable modular multiplication methods and modular reduction methods for software implementation on Intel IA-32 processors, optimizing point arithmetic to reduce the number of expensive reduction operations through a novel reduction sharing technique, and measuring performance for scalar point multiplication in GF(2^m) on Intel IA-32 processors. This paper determined that systematic reduction is best for fields defined with trinomials or pentanomials; however, for fields defined with reduction polynomials with large Hamming weight, Barrett reduction is best. In GF(2⁵⁷¹) for Intel P4 2.8 GHz processor, long multiplication with systematic reduction was 2.18 and 2.26 times faster than long multiplication with Barrett or Montgomery reduction. This paper determined that Montgomery Invariant scalar point multiplication with Systematic reduction in Projective coordinates was the fastest method for single scalar point multiplication for the NIST fields from GF(2¹⁶³) to GF(2⁵⁷¹). For single scalar point multiplication on a reconfigurable elliptic curve cryptography accelerator, we were able to achieve a ∼6.1 times speedup using reconfigurable reduction methods with long multiplication, Montgomery's MSB Invariant method in projective coordinates, and systematic reduction. Further extensions were made to implement fast reconfigurable elliptic curve cryptography for repeated scalar point multiplication on the same base point. We also show that for L > 20 the LSB invariant method combined with affine doubling precomputation outperforms the LSB invariant method combined with López-Dahab doubling precomputation for all reconfigurable reduction polynomial techniques in GF(2⁵⁷¹) for Intel IA-32 processors. For L = 1000, the LSB invariant scalar point multiplication method was 13.78 to 34.32% faster than using the fastest Montgomery Invariant scalar point multiplication method on Intel IA-32 processors.

17.

Hardware Elliptic Curve Cryptographic Processor Over GF(p)

《IEEE transactions on circuits and systems. I, Regular papers》2006,53(9):1946-1957

A novel hardware architecture for elliptic curve cryptography (ECC) over GF(p) is introduced. This can perform the main prime field arithmetic functions needed in these cryptosystems including modular inversion and multiplication. This is based on a new unified modular inversion algorithm that offers considerable improvement over previous ECC techniques that use Fermat's Little Theorem for this operation. The processor described uses a full-word multiplier which requires far fewer clock cycles than previous methods, while still maintaining a competitive critical path delay. The benefits of the approach have been demonstrated by utilizing these techniques to create a field-programmable gate array (FPGA) design. This can perform a 256-bit prime field scalar point multiplication in 3.86 ms, the fastest FPGA time reported to date. The ECC architecture described can also perform four different types of modular inversion, making it suitable for use in many different ECC applications.

18.

A New Modular Exponentiation Architecture for Efficient Design of RSA Cryptosystem

《Very Large Scale Integration (VLSI) Systems, IEEE Transactions on》2008,16(9):1151-1161

Modular exponentiation with a large modulus, which is usually accomplished by repeated modular multiplications, has been widely used in public key cryptosystems for secured data communications. To speed up the computation, the Montgomery modular multiplication algorithm is used to relax the process of quotient determination, and the carry-save addition (CSA) is employed to reduce the critical path delay. In this paper, based on the inherent data dependency between the modular multiplication and square operations in the H-algorithm of modular exponentiation, we present a new modular exponentiation architecture with a unified modular multiplication/square module and show how to reduce the number of input operands for the CSA tree by mathematical manipulation. The developed architecture has the following advantages. 1) There is no need to convert the carry-save form of an operand into its binary representation at the end of each modular multiplication. In this way, except for the final step to get the result of modular exponentiation, the time-consuming carry propagation can be eliminated. 2) The number of input operands for the CSA tree is reduced in a very efficient way. 3) The hardware saving is achieved with very limited impact on the original critical path delay when designed with two distinct modular multiplication and square components. Experimental results show that our modular exponentiation design obtains the least hardware complexity compared with the existing work and outperforms them in terms of area-time (AT) complexity as well.