Similar literature
20 similar articles found.
1.
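Composite-field multiplication is a basic operation and key module in symmetric cryptographic algorithms. Because it is complex and slow, its implementation performance largely limits the speed of symmetric ciphers. This article studies the characteristics and implementation principles of composite-field multiplication in symmetric ciphers and designs a reconfigurable composite-field multiplication architecture that takes GF(2^8) as the base field and extends to GF((2^8)^k) (k = 1, 2, 3, 4). Through configuration it can flexibly and efficiently perform finite-field multiplication over GF(2^8), GF((2^8)^2), GF((2^8)^3), and GF((2^8)^4). Drawing on processor instruction design methods, the article also designs general-purpose composite-field multiplication and configuration instructions, which can greatly improve the efficiency of composite-field multiplication in symmetric ciphers. Finally, the architecture is simulated and verified, synthesized and placed and routed with a 0.18 μm CMOS standard-cell library, and the synthesis results are evaluated. The results show that the proposed reconfigurable architecture and its dedicated instructions deliver high execution efficiency while remaining flexible, and are of considerable practical value.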
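To make the GF((2^8)^k) structure concrete, here is a minimal software sketch of composite-field multiplication. It is not the article's hardware design. The base-field polynomial (the AES polynomial 0x11B) and the extension modulus passed to `composite_mul` are assumptions for illustration; the abstract does not name the polynomials it uses, and the function names are hypothetical.

```python
# Assumption: base field GF(2^8) uses the AES polynomial x^8 + x^4 + x^3 + x + 1.
AES_POLY = 0x11B


def gf256_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements by shift-and-add, reducing by AES_POLY."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^8) is XOR
        a <<= 1
        if a & 0x100:            # degree reached 8: reduce
            a ^= AES_POLY
        b >>= 1
    return result


def composite_mul(a, b, modulus):
    """Multiply two elements of GF((2^8)^k).

    a, b:     lists of k bytes, lowest-degree coefficient first.
    modulus:  the k low-order coefficients m_0..m_{k-1} of a monic
              degree-k polynomial x^k + m_{k-1} x^(k-1) + ... + m_0 over
              GF(2^8). Hypothetical input: the caller must supply an
              irreducible polynomial for the result to be a field.
    """
    k = len(modulus)
    # Schoolbook polynomial product over GF(2^8).
    prod = [0] * (2 * k - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            prod[i + j] ^= gf256_mul(ai, bj)
    # Reduce using x^k = m_{k-1} x^(k-1) + ... + m_0 (characteristic 2).
    for d in range(2 * k - 2, k - 1, -1):
        c = prod[d]
        if c:
            prod[d] = 0
            for i, m in enumerate(modulus):
                prod[d - k + i] ^= gf256_mul(c, m)
    return prod[:k]
```

Changing the length of `modulus` switches between k = 1, 2, 3, 4, which loosely mirrors the configurability the abstract describes; for k = 1 the routine reduces to a plain GF(2^8) multiplication.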

2.
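Systolic arrays have a regular structure and high throughput, suit matrix multiplication, and are widely used to build high-performance convolution and matrix-multiplication accelerators. In deep-submicron processes, increasing the array size to raise chip performance leads to lower clock frequency and sharply higher power consumption. Using 3D integrated circuit technology, this work therefore proposes 3D-MMA, a double-precision floating-point matrix-multiplication accelerator that maps a planar systolic array onto a 3D integrated circuit. First, a block-mapping scheduling algorithm is designed for the structure to improve matrix-multiplication efficiency. Second, an acceleration system based on 3D-MMA is proposed, a performance model of 3D-MMA is built, and its design space is explored. Finally, the implementation cost is evaluated and compared with existing state-of-the-art accelerators. Experimental results show that, with a memory bandwidth of 160 GB/s and a stack of four 16×16 systolic-array layers, 3D-MMA reaches a peak performance of 3 TFLOPS at 99% efficiency, with a lower implementation cost than a 2D implementation. In the same process, 3D-MMA delivers 1.36 times the performance of a linear-array accelerator and 1.92 times that of a K40 GPU, with a much smaller area. The work explores the advantages of 3D integrated circuits for high-performance matrix-multiplication accelerators and offers a reference for further improving high-performance computing platforms.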
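The abstract does not describe its block-mapping scheduling algorithm, so the following is only a generic blocked (tiled) matrix multiplication in software, of the kind commonly used to feed a fixed-size array. The default tile size of 16 echoes the 16×16 array mentioned above; the function name and loop order are assumptions, not the 3D-MMA schedule.

```python
def blocked_matmul(A, B, tile=16):
    """Compute C = A x B by iterating over tile x tile blocks.

    A is n x k and B is k x m, both as lists of row lists.
    Each (i0, j0, k0) triple is one block product that a
    tile x tile array could process in a single pass.
    """
    n, k, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                # Accumulate one block product into the C block.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = C[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += A[i][kk] * B[kk][j]
                        C[i][j] = acc
    return C
```

Keeping the `k0` loop innermost among the block loops means each C block stays resident while partial sums accumulate, which is one common choice; other orders trade off reuse of A or B instead.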

3.
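Application of pipelined configuration in reconfigurable processors   Cited by: 1 (self-citations: 1, citations by others: 0)
A pipelined configuration technique for reconfigurable processors is proposed that effectively reduces configuration time and speeds up application execution. The reconfigurable processor consists of a general-purpose processor and a coarse-grained reconfigurable array. The array handles the loops that account for most of an application's execution time; these loops are decomposed into rows that execute on the array in a pipelined fashion. The technique was verified on an FPGA verification system, using the integer discrete cosine transform and motion estimation from the H.264 benchmark. Compared with the conventional reconfigurable processors PipeRench and MorphoSys and with TI's TMS320DM642 DSP, it achieves a performance improvement of about 3.5 times.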

4.
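Word-based Montgomery modular multiplication algorithms over prime and binary fields are adopted, and the low efficiency of conventional dual-field modular multipliers on binary fields is analyzed. A word length is first chosen so that the multiplier delays over the two fields are comparable, and the modular multiplier is then given a reconfigurable dual-field design so that it supports operations over both prime and binary fields. Compared with previous designs, the dual-field, dual-radix modular multiplier reduces the clock cycle count by 48% on average.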
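As a rough illustration of word-based Montgomery multiplication, the sketch below handles the prime-field case only and processes one word of the first operand per loop iteration. The function name, the default 8-bit word, and the parameter names are assumptions; the abstract's actual word length and hardware organization are not given. It needs Python 3.8 or later for `pow(n, -1, m)`.

```python
def mont_mul(a, b, n, word_bits=8, num_words=None):
    """Return a * b * R^-1 mod n, with R = 2^(word_bits * num_words).

    Requires n odd and 0 <= a, b < n. One word of `a` is consumed per
    iteration, which is the word-serial structure a hardware multiplier
    would pipeline.
    """
    if num_words is None:
        num_words = -(-n.bit_length() // word_bits)   # ceiling division
    mask = (1 << word_bits) - 1
    # n_prime = -n^-1 mod 2^word_bits, precomputed once per modulus.
    n_prime = (-pow(n, -1, 1 << word_bits)) & mask
    t = 0
    for i in range(num_words):
        a_i = (a >> (i * word_bits)) & mask
        t += a_i * b
        m = ((t & mask) * n_prime) & mask   # makes the low word of t + m*n zero
        t = (t + m * n) >> word_bits        # exact division by 2^word_bits
    return t - n if t >= n else t
```

Over a binary field the same loop structure applies with integer addition replaced by XOR and integer multiplication by carry-less polynomial multiplication, which is why a single datapath can plausibly serve both fields.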

5.
连宜新  陈韬  李伟  南龙梅 《计算机工程》2022,48(2):156-163+172
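To address the resource waste caused by implementing S-boxes and nonlinear Boolean functions (NBF) of symmetric ciphers as separate heterogeneous units in cipher-specific processors, a reconfigurable circuit structure for AES-like S-boxes and NBFs is proposed. Analysis of the existing nonlinear Boolean function module (NBFM) shows that it adapts well to 4-4 and 6-4 S-box circuits but does not support 8-8 S-box circuits well. Based on tower-field decomposition theory, it is shown that different AES-like S-box circuits differ only in the transformation matrices applied before and after the input. Using a mixed-basis method, the AES-like S-box circuit is decomposed into operation modules over GF(16), and bit-level expressions for the modules are derived. Each operation module is then mapped using one of three options: gate-level implementation, adaptation onto the NBFM, or a modified NBFM, yielding a reconfigurable circuit for AES-like S-boxes and NBFs. Experimental results show that, without affecting the original NBF functionality, the method implements a complete AES-like S-box circuit using four NBFMs plus 22.7% of the S-box circuit area.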

6.
Redundant signed digit number systems have been used as a basis for the construction of fast arithmetic circuits for several years. In particular, addition circuits with no carry-ripple effects have been developed using signed binary arithmetic systems. This paper presents a general class of signed binary addition tables and provides a framework for constructing various tables. The existence of an entire class of tables provides a circuit designer with an additional degree of freedom while developing addition circuitry. The choice of the exact form of the addition table can be based on the dominant desired characteristics of the resultant circuit. An example of a circuit derived for area minimization is presented and compared to another signed binary addition circuit that was previously published. Both circuits were optimized and mapped to 20 different CMOS cell libraries. The experimental results indicate an average decrease in area of 26% and an average decrease in dynamic power consumption of 29% with an average increase in delay of only 4.4%.

7.
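To meet the small-area, high-performance needs of embedded devices, a 32-bit synthesizable out-of-order processor based on the open-source RISC-V instruction set is designed. The processor incorporates key techniques such as branch prediction and dependency handling, and supports the RISC-V base integer, multiply/divide, and compressed instruction sets. It uses a three-stage pipeline with in-order single issue, out-of-order execution, and out-of-order write-back, together with a Harvard architecture and the AHB bus protocol, so instructions and data can be accessed in parallel. In...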

8.
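To address the increase in power consumption caused by defects in CMOS/nanowire/molecular hybrid (CMOL) circuits, a fault-tolerant mapping method based on restricted cell usage is proposed. First, a power model of defect pairs is built, and the effect of the mapping patterns of stuck-closed defect pairs on power consumption is analyzed. Next, restricting the use of high-power cells and setting power constraints reduces the power overhead of high-cost mapping patterns. Finally, an improved genetic algorithm performs the fault-tolerant circuit mapping. Experiments on the ISCAS benchmark circuits...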

9.
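To make Galois finite-field multipliers more general and less complex to implement, the multiplication is carried out with a natural-basis algorithm using simple logic gates. A reconfigurable iterative computation structure is proposed that supports multipliers with field length m from 3 to 8, and it is implemented on an FPGA. The results show that the reconfigurable finite-field multiplier meets the multiplication needs of a variety of standard RS codes.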

10.
Algorithms for four-quadrant two's complement multiplication by variables that are integer powers of two follow from a circuit word mathematical approach. Electronic and electro-mechanical implementations yield economical and fast slide multiplier circuits. Alternatives are ranked according to a design-efficiency index: component gate count × propagation delay.

11.
The rapid growth of semiconductor technology has increased the need for embedded, portable Digital Signal Processing (DSP) systems. The multiplier is a crucial part of almost every DSP application, so high-speed DSP requires low-power, high-speed multipliers. The array multiplier is one of the fast multipliers because it has a regular structure and is easy to design. It multiplies unsigned numbers using full adders and half adders, but it depends on previously computed partial sums to produce the final output, which increases the delay. In previous work, Complementary Metal Oxide Semiconductor (CMOS) Carry Look-ahead Adders (CLA) and CMOS power-gating-based CLAs were used to maximize multiplier speed and improve power dissipation with minimum delay. CMOS logic is based on the radix-2 (binary) number system, in which the carry is the major issue in arithmetic operations. A higher-radix number system such as Quaternary Signed Digit (QSD) can perform arithmetic operations without carry. The proposed system is an array multiplier that uses QSD-based CLAs to improve performance. Quaternary devices generally require simpler circuits than binary logic devices to process the same amount of data, so quaternary logic is applied in the CLA to improve adder speed and throughput. In the array multiplier architecture, QSD-based carry look-ahead adders replace the full adders, which yields low power consumption and fast multiplication. The proposed multiplier circuit is simulated with the Tanner EDA tool in 180 nm technology and compared with the power-gating CLA and CLA multipliers in terms of area, Power Delay Product (PDP), and average power.

12.
Recently, cryptographic applications based on finite fields have attracted much attention. The most demanding finite field arithmetic operation is multiplication. This investigation proposes a new multiplication algorithm over GF(2^m) using the dual basis representation. Based on the proposed algorithm, a parallel-in parallel-out systolic multiplier is presented. The architecture is optimized to minimize the silicon area (transistor count). The experimental results reveal that the proposed bit-parallel multiplier saves about 65% in space complexity and 33% in time complexity compared with traditional multipliers for a general polynomial and dual basis of GF(2^m).

13.
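In elliptic curve cryptography (ECC), modular multiplication over the finite field GF(2^m) is the most basic operation, and accelerating it is the key to improving the performance of ECC algorithms. Because many different irreducible polynomials are in widespread use, a general-purpose GF(2^m) modular multiplication accelerator design is proposed. Through instruction scheduling, the accelerator performs modular multiplication over the finite field quickly. Implementation results show that the design fully meets the requirements of applications such as smart cards.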

14.
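To protect reconfigurable scan networks against three security attacks (unauthorized access, and tampering with and sniffing of transmitted data by malicious instruments), a lock-and-isolate security structure is proposed. The structure first places instruments that pose no security threat to one another in the same group and controls isolation signals so that groups are accessed separately, which prevents malicious instruments from tampering with or sniffing transmitted data. It then protects critical instruments with locking segment insertion bits: such a bit opens only when multiple key values at specific positions in the scan network are set to specific values (a sequence of 0s and 1s), which makes unauthorized access harder. In addition, to address the high hardware overhead and routing difficulty caused by a large number of instrument groups, an instrument grouping algorithm is proposed. It builds an undirected graph from the security relationships between instruments and then finds maximal independent sets of the graph, which effectively reduces the number of groups. Experimental results on the ITC'02 benchmark circuits show that, compared with similar methods in the international literature, the proposed structure increases the time needed to open a locking segment insertion bit sevenfold and reduces area, power, and routing by 3.81%, 9.02%, and 4.55% on average, respectively.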

15.
Digital signal processing algorithms often rely heavily on a large number of multiplications, which is both time and power consuming. However, there are many practical solutions to simplify multiplication, like truncated and logarithmic multipliers. These methods consume less time and power but introduce errors. Nevertheless, they can be used in situations where a shorter time delay is more important than accuracy. In digital signal processing, these conditions are often met, especially in video compression and tracking, where integer arithmetic gives satisfactory results. This paper presents a simple and efficient multiplier that can achieve an arbitrary accuracy through an iterative procedure, up to the exact result. The multiplier is based on the same form of number representation as Mitchell's algorithm, but it uses different error correction circuits than those proposed by Mitchell. In this way, the error correction can be done almost in parallel with the basic multiplication (in practice, through pipelining). The hardware solution involves only adders and shifters, so it is economical in gates and power. The error summary for operands ranging from 8 bits to 16 bits indicates a very low relative error percentage with only two iterations. For the hardware implementation assessment, the proposed multiplier is implemented on the Spartan 3 FPGA chip. For 16-bit operands, the time delay estimation indicates that a multiplier with two iterations can run at a clock frequency above 150 MHz, with a maximum relative error below 2%.
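One way to read the iterative procedure is sketched below. It assumes each operand is split into its leading power of two plus a residue, as in Mitchell's representation, and that every correction step applies the same approximation to the product of the residues. The abstract does not spell out this decomposition, so the formula and the names `basic_block` and `iterative_log_mul` are assumptions for illustration.

```python
def basic_block(n1: int, n2: int):
    """Approximate n1 * n2 for positive integers using shifts and adds only.

    With n = 2^k + r (k = index of the leading one bit):
        n1 * n2 = 2^(k1+k2) + r1 * 2^k2 + r2 * 2^k1 + r1 * r2
    The block returns the first three terms and the residues (r1, r2);
    the dropped term r1 * r2 is the approximation error.
    """
    k1 = n1.bit_length() - 1
    k2 = n2.bit_length() - 1
    r1 = n1 - (1 << k1)
    r2 = n2 - (1 << k2)
    approx = (1 << (k1 + k2)) + (r1 << k2) + (r2 << k1)
    return approx, r1, r2


def iterative_log_mul(n1: int, n2: int, corrections: int) -> int:
    """Basic approximation plus up to `corrections` error-correction steps."""
    if n1 == 0 or n2 == 0:
        return 0
    total = 0
    for _ in range(corrections + 1):
        approx, n1, n2 = basic_block(n1, n2)
        total += approx
        if n1 == 0 or n2 == 0:   # no error term left: result is exact
            break
    return total
```

For 13 × 11 = 143, the sketch gives 128 with no correction, 142 with one, and the exact 143 with two. Each residue is at least one bit shorter than its operand, so the error term shrinks every step and the result becomes exact after a bounded number of corrections.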

16.
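To address the long multiplication latency of high-performance RISC-V processors, the radix-4 Booth encoding and Wallace tree structure of the basic multiplier are improved: a radix-4 Booth encoding with sign compensation and a Wallace tree that alternates 3-2 and 4-2 compressors are proposed. The sign-compensated radix-4 Booth encoding reduces the number of partial products and lowers the power consumed by carry toggling in the sign bits. The improved Wallace tree reduces the clock cycles spent accumulating partial products, shortens the multiplier's critical path, and lowers the execution latency of multiply instructions. The functional correctness of the improved multiplier was verified by VCS simulation, and its performance was evaluated by board-level testing. The results show that the multiplier is functionally correct and that, compared with PicoRV32, it cuts the clock cycles needed to execute integer multiply instructions by 88.2%, raises the Dhrystone score by 71.7%, and lowers power consumption by 4.9%.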
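For readers unfamiliar with radix-4 Booth encoding, the sketch below shows the textbook recoding, which halves the number of partial products. It does not model the sign-compensation scheme or the Wallace tree from the abstract; the function names are hypothetical.

```python
def booth_radix4_digits(y: int, bits: int):
    """Recode a `bits`-wide two's-complement multiplier into radix-4 digits.

    Each digit lies in {-2, -1, 0, 1, 2} and is derived from the
    overlapping bit triple (y[2i+1], y[2i], y[2i-1]), with y[-1] = 0.
    Returns bits // 2 digits, least significant first.
    """
    assert bits % 2 == 0, "operand width must be even"
    y &= (1 << bits) - 1              # two's-complement bit pattern
    digits = []
    prev = 0                          # implicit y[-1]
    for i in range(0, bits, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(x: int, y: int, bits: int) -> int:
    """Multiply x by a `bits`-wide signed y by summing Booth partial products."""
    return sum((d * x) << (2 * i)
               for i, d in enumerate(booth_radix4_digits(y, bits)))
```

An 8-bit multiplier yields four partial products instead of eight; for example, -3 recodes to the digits [1, -1, 0, 0], since 1 - 4 = -3. In hardware the final sum would be carried out by a compressor tree rather than a sequential `sum`.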

17.
丁勇 《计算机工程》2009,35(8):169-170
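For scalar multiplication on elliptic curves over GF(p), Ciet introduced an endomorphism φ with characteristic polynomial φ^2 + 2 = 0 and proposed a φ-NAF decomposition of the integer k. Applying the window technique to the φ-NAF decomposition yields the φ-NAF_w decomposition of k, which trades a certain amount of storage for faster computation. Fairly accurate estimates of the length and Hamming density of this decomposition are given.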

18.
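The linear feedback shift register (LFSR) is an important component of stream ciphers. This article presents the design of a reconfigurable LFSR that can be configured, as needed, as any LFSR within a certain length range over GF(2), GF(2^8), GF(2^16), or GF(2^32). A prototype has been implemented on an Altera EP2S60F1020C5 FPGA and runs at clock frequencies of up to 100 MHz. Test results show that the reconfigurable LFSR uses few hardware resources and performs stably.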
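The GF(2) case can be sketched in a few lines. The example below is a plain Fibonacci LFSR whose length and tap positions are run-time parameters, loosely mirroring the configurability described above; it does not cover the GF(2^8), GF(2^16), or GF(2^32) word-oriented cases, and the function names and the 4-bit example taps are assumptions rather than details from the article.

```python
def lfsr_step(state: int, taps, length: int):
    """Advance a Fibonacci LFSR over GF(2) by one step.

    state:  current register contents as a `length`-bit integer.
    taps:   bit positions XORed together to form the feedback bit.
    Returns (new_state, output_bit); the output is the bit shifted out.
    """
    feedback = 0
    for t in taps:
        feedback ^= (state >> t) & 1
    out = state & 1
    state = (state >> 1) | (feedback << (length - 1))
    return state, out


def lfsr_period(seed: int, taps, length: int) -> int:
    """Count steps until the register returns to `seed`.

    Returns 0 if the seed is not revisited within 2^length steps
    (possible when bit 0 is not among the taps).
    """
    state = seed
    for n in range(1, (1 << length) + 1):
        state, _ = lfsr_step(state, taps, length)
        if state == seed:
            return n
    return 0
```

With `length=4` and `taps=(0, 1)`, the recurrence is a[n+4] = a[n] XOR a[n+1], whose characteristic polynomial x^4 + x + 1 is primitive, so any nonzero seed cycles through all 15 nonzero states.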

19.
骆祖莹  闵应骅  杨士元 《计算机学报》2001,24(10):1034-1043
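Excessive average power makes a chip generate more heat, which reduces its reliability and performance and, in severe cases, damages it, so estimating the average power of a circuit accurately and efficiently is very important. Real circuits have delays, and delay-aware power models are computationally expensive, so obtaining average power by simulation is very time-consuming. To produce a reasonably reliable estimate of the average power of VLSI circuits in a shorter time, this paper proposes a theory of circuit power analysis and, based on it, a method for compacting the sequence of input vector pairs for fast simulation of the average power of CMOS circuits. Experimental results on the ISCAS85 and ISCAS89 circuit sets show that the method gives accurate average-power estimates with a significant speedup.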

20.
This study presents an efficient exponentiation architecture for public-key cryptosystems using Montgomery multiplication based on programmable cellular automata (PCA). Multiplication is the key operation in implementing circuits for cryptosystems, as encrypting and decrypting a message requires modular exponentiation, which can be decomposed into multiplications. An efficient multiplication algorithm and a simple architecture are therefore the key to implementing exponentiation. Thus we employ the Montgomery multiplication algorithm and construct a simple architecture based on an irreducible all-one polynomial (AOP) in GF(2^m). By combining the characteristics of the irreducible AOP and PCA, the proposed architecture achieves high regularity and reduced hardware complexity. It can be used efficiently in public-key cryptosystems.
