首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到19条相似文献,搜索用时 328 毫秒
1.
文章分析了主要分组密码算法操作特征以及处理结构的特点,结合可重构处理结构的设计方法,提出一种可重构密码处理结构.设计实现了基于可重构密码处理结构的验证原型.分析结果表明,在验证原型上执行的分组密码算法都可达到较高的性能.  相似文献   

2.
杜怡然  李伟  戴紫彬 《电子学报》2020,48(4):781-789
针对密码算法的高效能实现问题,该文提出了一种基于数据流的粗粒度可重构密码逻辑阵列结构PVHArray.通过研究密码算法运算及控制结构特征,基于可重构阵列结构设计方法,提出了以流水可伸缩的粗粒度可重构运算单元、层次化互连网络和面向周期级的分布式控制网络为主体的粗粒度可重构密码逻辑阵列结构及其参数化模型.为了提升可重构密码逻辑阵列的算法实现效能,该文结合密码算法映射结果,确定模型参数,构建了规模为4×4的高效能PVHArray结构.基于55nm CMOS工艺进行流片验证,芯片面积为12.25mm2,同时,针对该阵列芯片进行密码算法映射.实验结果表明,该文提出高效能PVHArray结构能够有效支持分组、序列以及杂凑密码算法的映射,在密文分组链接(CBC)模式下,相较于可重构密码逻辑阵列REMUS_LPP结构,其单位面积性能提升了约12.9%,单位功耗性能提升了约13.9%.  相似文献   

3.
王寿成  李功丽  严迎建  徐进辉 《电子学报》2017,45(10):2457-2463
通过对分组密码算法加密特征的分析,将分组密码算法的并行性划分为分组内同操作并行性、分组内异操作并行性、分组间同操作并行性和分组间异操作并行性等四维度并行性,并根据此提出了基于Amdahl定律的分组密码四维度并行处理模型FDPM.该模型能够指导分组密码处理架构设计,为架构资源配置和并行性开发提供整体建议.以FDPM为依据,提出了一种面向分组密码的可重构流处理架构RCSA,该架构能够有效开发分组密码处理的并行性,在提高密码处理性能的同时也能提高资源利用率.通过算法映射结果分析,证明了FDPM模型的正确性与RCSA架构的高效性.  相似文献   

4.
计算资源与寄存器资源分配是可重构处理器自动并行映射的重要问题,该文针对可重构分组密码指令集处理器的资源分配问题,建立算子调度参数模型和处理器资源参数模型,研究了分组密码并行调度与资源消耗之间的约束关系;在此基础上提出基于贪婪思维、列表调度和线性扫描的自动映射算法,实现了分组密码在可重构分组密码指令集处理器上的自动映射。通过可用资源变化实验验证算法并行映射的有效性,并对AES-128算法的映射效果做了横向对比验证算法的先进性,所提自动映射算法对分组密码在可重构处理中的并行计算研究有一定的指导意义。  相似文献   

5.
计算资源与寄存器资源分配是可重构处理器自动并行映射的重要问题,该文针对可重构分组密码指令集处理器的资源分配问题,建立算子调度参数模型和处理器资源参数模型,研究了分组密码并行调度与资源消耗之间的约束关系;在此基础上提出基于贪婪思维、列表调度和线性扫描的自动映射算法,实现了分组密码在可重构分组密码指令集处理器上的自动映射.通过可用资源变化实验验证算法并行映射的有效性,并对AES-128算法的映射效果做了横向对比验证算法的先进性,所提自动映射算法对分组密码在可重构处理中的并行计算研究有一定的指导意义.  相似文献   

6.
该文在研究分组密码算法处理特征的基础上,提出了可重构分簇式分组密码处理器架构。在指令的控制下,数据通路可动态地重构为4个32bit簇,2个64bit簇和一个128bit簇,满足了分组密码算法数据处理所需的灵活性。基于分簇结构,提出了由指令显性地分隔电路结构的低功耗优化技术,采用此技术使得整体功耗降低了36.1%。设计并实现了5级流水线以及运算单元内流水结构,处理AES/DES/IDEA算法的速度分别达到了689.6Mbit/s, 400Mbit/s和416.7Mbit/s。  相似文献   

7.
密码专用可编程逻辑阵列(CSPLA)是一种数据流驱动的密码处理结构,该文针对不同规模的阵列结构和密码算法映射实现能效关系的问题,首先以CSPLA的特定硬件结构为基础,以分组密码的高能效实现为切入点,建立基于该结构的分组密码算法映射能效模型并分析影响能效的相关因素,然后进一步根据阵列结构上算法映射的基本过程提出映射算法,最后选取几种典型的分组密码算法分别在不同规模的阵列进行映射实验.结果表明越大的规模并不一定能够带来越高的能效,为取得映射的最佳能效,阵列的规模参数应当与具体的硬件资源限制和密码算法运算需求相匹配,CSPLA规模为4×4~4×6时映射取得最优能效,AES算法最优能效为33.68 Mbps/mW,对比其它密码处理结构,CSPLA具有较优的能效特性.  相似文献   

8.
密码专用可编程逻辑阵列(CSPLA)是一种数据流驱动的密码处理结构,该文针对不同规模的阵列结构和密码算法映射实现能效关系的问题,首先以CSPLA的特定硬件结构为基础,以分组密码的高能效实现为切入点,建立基于该结构的分组密码算法映射能效模型并分析影响能效的相关因素,然后进一步根据阵列结构上算法映射的基本过程提出映射算法,最后选取几种典型的分组密码算法分别在不同规模的阵列进行映射实验。结果表明越大的规模并不一定能够带来越高的能效,为取得映射的最佳能效,阵列的规模参数应当与具体的硬件资源限制和密码算法运算需求相匹配,CSPLA规模为4×4~4×6时映射取得最优能效,AES算法最优能效为33.68 Mbps/mW,对比其它密码处理结构,CSPLA具有较优的能效特性。  相似文献   

9.
针对基于粗粒度可重构阵列结构的分组密码算法映射情况复杂、难以实现统一度量的问题,该文采用多目标决策手段,以性能及功耗参数为决策目标,基于分组密码算法轮运算及粗粒度可重构阵列结构特征约束,提出了一种面向分组密码算法映射的加权度量模型.同时,采用主客观综合分析法,定义了模型权重参数的计算方式,从而通过配置合理的权重参数,以高能效映射算法实现差异化的映射.为了降低决策时间,该文进一步提出了基于二进制编码的枚举搜索算法,实现了最优映射结果搜索与映射矩阵建立的并行,使决策的时间复杂度降至O(2n).实验结果表明,该文提出的加权度量模型能实现高效的分组密码算法映射方案决策,单位面积性能提升了约14.2%,能效提升了约一倍.  相似文献   

10.
冯晓  李伟  戴紫彬  马超  李功丽 《电子学报》2017,45(6):1311-1320
现有的可重构分组密码实现结构中,专用指令处理器吞吐率不高,阵列结构资源利用率低、算法映射过程复杂.为此,设计了分组密码可重构异构多核并行处理架构RAMCA(Reconfigurable Asymmetrical Multi-Core Architecture),分析了典型SP(AES-128)、Feistel(SMS4)、L-M(IDEA)及MISTY(KASUMI)结构算法在RAMCA上的映射过程.在65nm CMOS工艺下完成了逻辑综合和功能仿真.实验表明,RAMCA工作频率可达到1GHz,面积约为1.13mm2,消除工艺影响后,对各分组密码算法的运算速度均高于现有专用指令处理器以及Celator、RCPA和BCORE等阵列结构密码处理系统.  相似文献   

11.
There is increasing research and commercial interest in miniature on-body and implantable devices for continuous real-time biosignal monitoring. A key challenge in realizing this vision is in implementation of biosignal processing algorithms with acceptably low energy consumption. In this article, we investigate implementation of the REACT algorithm for real-time epileptic seizure detection on a Coarse Grained Reconfigurable Array (CGRA) based architecture. Computationally expensive biosignal processing tasks are offloaded from a conventional Digital Signal Processor (DSP) to the CGRA. The CGRA is designed to support low power biosignal processing by means of a systolic architecture, flexible interconnect and low resource usage. The CGRA architecture is shown to provide 38% and 60% improvements in energy consumption and in performance, respectively, for the REACT system, without the use of voltage scaling or increased clock frequency.  相似文献   

12.
文中提出了一种基于RCSIMD体系结构的8192点FFT的并行算法.该并行算法将8192数据分成连续64块,每块128个连续数据(存储在可重构处理元的局部存储器),采用RCSIMD可重构处理阵列完成块倒位序变换,块内只进行逻辑上的倒位序变换(倒位序过程隐含在配置数据中).这种数据存储和倒位序处理方法可以充分利用处理阵列通信网络和处理单元的能力.  相似文献   

13.
姚于斌  毛志刚 《信息技术》2008,32(4):102-104
在现有可重构协处理器设计实现的基础上,根据图像处理算法的特点,提出了一种改进的可重构协处理器结构IRC.在IRC中,采用了一种基于网格的网络结构与行(列)SIMD的工作模式,使得图像处理算法的执行效率得到提高.在典型的图像处理应用中,IRC能够达到通用处理器的7倍以上的速度.  相似文献   

14.
It’s a promising way to improve performance significantly by adding reconfigurable processing unit (RPU) to a general purpose processor. In this paper, a Reconfigurable Multi-Core (RMC) architecture combining general multi-core and reconfigurable logic is proposed. Reconfigurable logic is separated into RPUs logically, which are coupled with general purpose cores as co-processors via a full crossbar switch. An RPU Manager (RPU-M) is also designed to manage RPUs. To verify RMC, a simulation method based on the Simics and Virtex 5 FPGA is adopted, which simplifies the simulation and assures the evaluation accuracy of hardware function cores. Five workloads are selected to test RMC, including 3-DES, AES, SHA2, IDCT and JPEG_ENC. The experimental results show a 3.10 times average speedup over software implementation on the original multi-core, and the data and control communication overhead on RMC is acceptable.  相似文献   

15.
Applications based on the fast Fourier transform (FFT), such as signal and image processing, require high computational power, plus the ability to experiment with algorithms. Reconfigurable hardware devices in the form of field programmable gate arrays (FPGAs) have been proposed as a way of obtaining high performance at an economical price. However, users must program FPGAs at a very low level and have a detailed knowledge of the architecture of the device being used. They do not therefore facilitate easy development of, or experimentation with, signal/image processing algorithms. To try to reconcile the dual requirements of high performance and ease of development, the paper reports on the design and realisation of a high level framework for the implementation of 1D and 2D FFTs for real-time applications. A wide range of FFT algorithms, including radix-2, radix-4, split-radix and fast Hartley transform (FHT) have been implemented under a common framework in order to enable system designers to meet different system requirements. Results show that the parallel implementation of 2D FFT achieves linear speed-up and real-time performance for large matrix sizes. Finally, an FPGA-based parametrisable environment based on 2D FFT is presented as a solution for frequency-domain image filtering application.  相似文献   

16.
针对可重构密码处理器对于不同域上的序列密码算法兼容性差、实现性能低的问题,该文分析了序列密码算法的多级并行性并提出了一种反馈移位寄存器(FSR)的预抽取更新模型。进而基于该模型设计了面向密码阵列架构的可重构反馈移位寄存器运算单元(RFAU),兼容不同有限域上序列密码算法的同时,采取并行抽取和流水处理策略开发了序列密码算法的反馈移位寄存器级并行性,从而有效提升了粗粒度可重构阵列(CGRA)平台上序列密码算法的处理性能。实验结果表明与其他可重构处理器相比,对于有限域(GF)(2)上的序列密码算法,RFAU带来的性能提升为23%~186%;对于GF(2u)域上的序列密码算法,性能提升达约66%~79%,且面积效率提升约64%~91%。  相似文献   

17.
This paper describes a new specialized Reconfigurable Cryptographic for Block ciphers Architecture(RCBA).Application-specific computation pipelines can be configured according to the characteristics of the block cipher processing in RCBA,which delivers high performance for cryptographic applications.RCBA adopts a coarse-grained reconfigurable architecture that mixes the appropriate amount of static configurations with dynamic configurations.RCBA has been implemented based on Altera’s FPGA,and representative algorithms of block cipher such as DES,Rijndael and RC6 have been mapped on RCBA architecture successfully.System performance has been analyzed,and from the analysis it is demonstrated that the RCBA architecture can achieve more flexibility and efficiency when compared with other implementations.  相似文献   

18.
Many architectures have been proposed for rank order and stack filtering. To achieve additional speedup in these structures requires the use of parallel processing techniques such as pipelining and block processing. Pipelining is well understood but few block architectures have been developed for rank order and stack filtering. Block processing is essential for additional speedup when the original architecture has reached the throughput limits caused by the underlying technology. A trivial block structure simply repeats a single input, single output structure to generate a multiple input, multiple output structure. Therefore the architecture can achieve speedups equal to the number of multiple outputs or the block size. However, unlike linear filters, the rank order and stack filter outputs are calculated using comparisons. It is possible to share these comparisons within the block structure and thus substantially reduce the size of the block structure. The authors introduce a systematic method for applying block processing to rank order filters and stack filters. This method takes advantage of shared comparisons within the block structure to generate a block filter with shared substructures whose complexity is reduced by up to one-third compared to the original filter structure times the block size. Furthermore, block processing is important for the generation of low power designs. A block structure can trade its increased speedup for a throughput equal to the original single output architecture but with a significantly lower power requirement. The power reduction in the trivial block structures is limited by the power supply voltage. They demonstrate how block structures with shared substructures allow them to continue decreasing the power consumption beyond the limit imposed by the supply voltage  相似文献   

19.
In this paper, we describe an impulse-based ultra wideband (UWB) radio system for wireless sensor network (WSN) applications. Different architectures have been studied for base station and sensor nodes. The base station node uses coherent UWB architecture because of the high performance and good sensitivity requirements. However, to meet complexity, power and cost constraints, the sensor module uses a novel non-coherent architecture that can autonomously detect the UWB signals. The radio modules include a transceiver block, a baseband processing unit and a power management block. The transceiver block includes a Gaussian pulse generator, a multiplier, an integrator and timing circuits. For long range applications, a wideband low noise amplifier (LNA) is included in the transceiver of the sensor module, whereas in short range applications it is simply eliminated to further reduce the power consumption. In order to verify the proposed system concept, circuit level implementation is studied using 1.5 V 0.18 μm CMOS technology. Finally, the UWB radio modules have been designed for implementation in liquid-crystal-polymer (LCP) based System-on-Package (SoP) technology for low power, low cost and small size integration. A small low cost, double-slotted, Knight’s helm antenna is embedded in the LCP substrate, which shows stable characterization and a return loss better than ?10 dB over the UWB band.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号