首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
提出一种可重构AES硬件架构,对加/解密运算模块和密钥扩展模块进行了可重构设计,使其能够适配128bit、192bit、256bit三种密钥长度的AES算法,并针对列混合模块进行了结构优化。在FPGA上进行了验证与测试,并在0.18μmSMIC工艺下进行了逻辑综合及布局布线。结果表明其核心时钟频率为270MHz,吞吐量达到3.4Gb/s,能够满足高性能的密码处理要求。  相似文献   

2.
In this paper, we present a typical temporal partitioning methodology that temporally partitions a data flow graph on reconfigurable system. Our approach optimizes the communication cost of the design. This aim can be reached by minimizing the transfer of data required between design partitions and the routing cost between FPGA modules. Consequently, our algorithm is composed by two main steps. The first step aims to find a temporal partitioning of the graph. This step gives the optimal solution in term of communication cost. Next, our approach builds the best architecture, on a partially reconfigurable FPGA, that gives the lowest routing cost between modules. The proposed methodology was tested on several examples on the Xilinx Virtex-II pro. The results show significant reduction in the communication cost compared with others famous approaches used in this field.  相似文献   

3.
陈乃金 《计算机应用》2012,32(1):158-162
针对可重构计算硬件任务划分通信成本较小化的问题,提出了一种基于深度优先贪婪搜索划分(DFGSP)算法。首先,从待调度的就绪队列中取出队首任务,在某一硬件面积约束下,按深度优先搜索(DFS)方式扫描一个计算密集型任务转换来的有向无环图(DAG),逐个划入满足要求的节点;然后,一遇到不满足面积要求的任务节点时,就计算当前划分模块间输出边数(可量化为通信成本);最后,跳过当前不满足要求的任务节点,继续搜索该点之后处于就绪状态的节点,当搜索到满足要求的点时,按加入该点后不增加当前划分块间输出边数和尽可能填满可重构运算阵列的原则进行。实验结果表明,与现有的簇划分(CBP)、簇层次敏感两种划分算法相比,提出的算法获得了最小划分模块数和平均跨模块间I/O边数最小的均值,通过实际验证,算法显著地改善了硬件任务的划分效果,而且运行开销没有明显增加。  相似文献   

4.
This paper focuses on the design process for reconfigurable architecture. Our contribution focuses on introducing a new temporal partitioning algorithm. Our algorithm is based on typical mathematic flow to solve the temporal partitioning problem. This algorithm optimizes the transfer of data required between design partitions and the reconfiguration overhead. Results show that our algorithm considerably decreases the communication cost and the latency compared with other well known algorithms.  相似文献   

5.
The most crucial task in real-time processing of steganography algorithms is to reduce the computational delay and increase the throughput of a system. This critical issue is effectively addressed by implementing steganography methods in reconfigurable hardware. In the proposed framework, a new high-speed reconfigurable architectures have been designed for Least Significant Bit (LSB) or multi-bit based image steganography algorithm that suits Field Programmable Gate Arrays (FPGAs) or Application Specific Integrated Circuits (ASICs) implementation. The architectures are designed and instantiated to implement the complete steganography system. The proposed system is competent enough to provide larger throughput, since high degrees of pipelining and parallel operations are incorporated at the module level. The evolved architectures are realized in Xilinx Virtex-II Pro XC2V500FG256-6 FPGA device using Register Transfer Level (RTL) compliant Verilog coding and has the capacity to work in real-time at the rate of 183.48 frames/second. Prior to the FPGA/ASIC implementation, the proposed steganography system is simulated in software to validate the concepts intended to implement. The hardware implemented algorithm is tested by varying embedding bit size as well as the resolution of a cover image. As it is clear from the results presented that the projected framework is superior in speed, area and power consumption compared to other researcher’s method.  相似文献   

6.
This paper presents a detailed architecture and a reconfigurable logic based hardware design of the SCAN algorithm. This architecture can be used to encrypt high resolution images in real-time. Although the SCAN algorithm is a block cipher algorithm with arbitrarily large blocks, the present design is for 64×64 pixel blocks in order to provide real-time image encryption throughput. The architecture was initially targeted at the Xilinx XCV-1000 FPGA, for which design and performance results are presented in the paper.  相似文献   

7.
制造执行系统是实现企业信息集成的关键技术,是企业信息化工程领域的研究热点之一。分析了制造执行系统的定义和功能,并提出基于软件构件的可重构制造执行系统的软件体系结构和实现方法。可重构制造执行系统的开发与运行实践表明,可重构制造执行系统的软件开发效率高、周期短、可扩展性好。  相似文献   

8.
制造执行系统是实现企业信息集成的关键技术,是企业信息化工程领域的研究热点之一。分析了制造执行系统的定义和功能,并提出基于软件构件的可重构制造执行系统的软件体系结构和实现方法。可重构制造执行系统的开发与运行实践表明,可重构制造执行系统的软件开发效率高、周期短、可扩展性好。  相似文献   

9.
近年来,可重构片上系统已成为科学研究及嵌入式应用领域中应对复杂计算需求的有效技术解决方案.针对目前缺少一个从系统级设计到应用实现,统一、综合规划动态重构问题的系统设计流程,以及动态重构过程对系统设计人员不透明等问题,在系统设计层给出了一种过程级软硬件统一编程模型.在此框架内,设计人员通过调用已根据应用特性进行优化的软硬件协同函数,即可利用高级语言完成系统功能描述;在细节设计层提出了基于单位面积加速比的软硬件任务调度算法,实时管理动态可重构资源;在应用实现层,以可重构专用图形加速卡为原型系统,论述动态可重构系统实现中的关键技术.实验及测试结果验证了通过将动态重构问题置于整个系统设计流程中予以考虑,能够达到提升系统开发效率之目的.  相似文献   

10.
为了实现网络入侵检测系统中的精确字符串匹配,本文提出了一种基于叶子-附加和二叉搜索树的字符串匹配算法及其实现架构;首先采用叶子-追加算法来对给定的模式集进行处理,以消除模式之间的重叠。然后采用二叉搜索树算法提取叶子模式及其匹配向量来构建二叉搜索树,并根据每个节点的比较结果,通过左遍历或右遍历来实现字符串的精确匹配;为了进一步提高字符串匹配算法的内存效率,提出了级联二叉搜索树;最后給出了实现精确字符串匹配的总体架构和各个功能模块的架构;实验结果表明,本文提出的设计不仅在内存效率和吞吐量方面优于目前先进的设计技术,而且具有灵活的可扩展性。  相似文献   

11.
随着互联网的数据量呈爆炸式增长,以纯软件方式运行的SM4算法速度慢、CPU占用率高,而基于Verilog/VHDL实现的现场可编程门阵列或专用集成电路存在灵活性差、升级维护困难等问题。为了解决上述问题,提出了一种SM4国密算法的异构可重构计算系统的设计方案,采用高层次综合和异构可重构技术,通过优化数据内存分配与传输、优化循环、矢量化内核以及增加计算单元等方式,设计了SM4算法电子密码本模式和计数器模式的定制计算架构,并将该系统部署在FPGA异构平台。实验结果表明:SM4-ECB和SM4-CTR两种主流工作模式的定制计算架构在Intel Stratix 10 GX2800上,吞吐率分别达到109.48 Gbps和63.73 Gbps,是Intel Xeon E5-2650 V2 CPU上对应模式吞吐率的232.63倍和141.62倍。以此核心模块(包含数据输入、加解密、输出)的整体异构可重构计算系统的性能也分别达到了纯软件方式的4.90倍和3.56倍。该方案不仅实现了针对特定模式进行定制加速,而且可以通过硬件重构灵活支持不同的计算模式,兼顾了系统的灵活性和高效性。  相似文献   

12.
BLAKE算法是基于ChaCha流密码和采用标准HAIFA迭代模式的SHA-3候选算法之一。针对现有BLAKE算法对于循环单元中G函数模型的研究少,且没有考虑硬件代价及实现结果等问题,提出可重构的设计思想,在FPGA上实现了BLAKE算法循环单元的三种G函数模式。在Xilinx Virtex-5 FPGA上的实现结果表明,在不影响性能的前提下,本方案可重构后的面积比分别实现的面积总和减少了58%。  相似文献   

13.
The Particle Swarm Optimization or PSO is a heuristic based on a population of individuals, in which the candidates for a solution of the problem at hand evolve through a simulation process of a social adaptation simplified model. Combining robustness, efficiency and simplicity, PSO has gained great popularity as many successful applications are reported. The algorithm has proven to have several advantages over other algorithms that based on swarm intelligence principles. The use of PSO solving problems that involve computationally demanding functions often results in low performance. In order to accelerate the process, one can proceed with the parallelization of the algorithm and/or map it directly onto hardware. This paper presents a novel massively parallel coprocessor for PSO implemented using reconfigurable hardware. The implementation results show that the proposed architecture is up to 135× and not less than 20× faster in terms of optimization time when compared to the direct software execution of the algorithm. Both the accelerator and the processor used to run the software version are mapped into FPGA reconfigurable hardware.  相似文献   

14.
阐述了可重构技术在密集型计算领域的广阔应用前景.基于该技术的数据加密系统兼具了硬件的效率和软件的灵活性,有着重要的理论与实际意义.在可重构系统基础上设计并实现了数据的加密计算.给出了一种通用的加密模块接口的设计方法,用于实现对加密模块的状态控制,并向接口用户提供一个简单易用,与底层实现无关的接口.在XUP开发平台上,用AES和DES数据加密算法进行了功能验证和性能分析,表明该方法行之有效.  相似文献   

15.
Integer pixel motion estimation (IME) is one crucial module with high complexity in high-definition video encoder. Efficient algorithm and architecture joint design is supposed to tradeoff multiple target parameters including throughput capacity, logic gate, on-chip SRAM size, memory bandwidth, and rate distortion performance. Data organization and on-chip buffer structure are crucial factors for IME architecture design, accounting for multiple target performance tradeoff. In this work, we combine global hierarchical search and local full search to propose hardware efficient IME algorithm, and then propose hardware VLSI architecture with optimized on-chip buffer structure. The major contribution of this work is characterized by: (1) improved hierarchical IME algorithm with presearch and deliberate data organization, (2) multistage on-chip reference pixel buffer structure with high data reuse between integer and fraction pixel motion estimations, (3) highly reused and reconfigurable processing element structure. The optimized data organization and buffer structure achieves nearly 70 % buffer saving with less than average 0.08, 0.12 dB the worst case, PSNR degradation compared with full search based architecture. At the hardware cost of 336 and 382 K logic gate and 20 kB SRAM, the proposed architecture achieves the throughput of 384 and 272 cycles per macroblock, at system frequency of 95 and 264 MHz for 1080p and QFHD @30fps format video coding.  相似文献   

16.
针对RS译码器结构复杂,资源消耗大的问题.提出了一种基于动态可重构技术的RS译码器;该译码器将伴随多项式计算和钱氏搜索算法在同一个可重构模块RSCM中通过动态改变电路结构,以时分复用的方式实现;给出了基于状态机的译码控制器,实现各功能模块的调用;采用VHDL语言实现,在Quartus Ⅱ 7.2环境下进行仿真;结果表明,该译码器能有效降低硬件资源占用率,最高时钟频率达到124MHz.  相似文献   

17.
This paper presents low-complex and novel techniques for designing reconfigurable architectures for multi-standard address generator and interleaver. The emphasis of this work is on hardware re-use, but it also focuses on optimizing the hardware to support multiple standards. A low-cost reconfigurable architecture for address generator and interleaver is proposed which operates in WLAN (802.11a/b/g and 802.11n), WiMAX (802.16e) and 3GPP LTE standards. A simple algorithm and a reconfigurable architecture that eliminates the computationally intricate mod function for LTE, and floor as well as mod function for WLAN/WiMAX, are proposed to reduce the hardware cost as well as implementation complexity. Novel architectures are also proposed to select the increment values for 16-QAM and 64-QAM schemes. A unique configurable subtracting sub-block for each modulation scheme is also presented. Software simulation is carried out to authenticate the functionality of the algorithm. The proposed reconfigurable architectures are realized on FPGA and tested on board. Synthesis results on Spartan-3 FPGA display 66% reduction in FPGA resource utilization and 74% increase in operating frequency compared to the cited address generators. Implementation results on Kintex UltraScale FPGA display a reduction of 34% in resource utilization and 20% in total on-chip power compared to the cited interleavers. This design is also implemented using 45 nm CMOS standard cell technology, and ASIC synthesis results of the reconfigurable address generator exhibit 76.4% improvement in data rate and 52.23% decrease in latency compared to the state-of-the-art address generators. The proposed multimode interleaver also exhibit 60.28% reduction in hardware complexity.  相似文献   

18.
Scalability planning for reconfigurable manufacturing systems   总被引:1,自引:0,他引:1  
Scalability is a key characteristic of reconfigurable manufacturing systems, which allows system throughput capacity to be rapidly and cost-effectively adjusted to abrupt changes in market demand. This paper presents a scalability planning methodology for reconfigurable manufacturing systems that can incrementally scale the system capacity by reconfiguring an existing system. An optimization algorithm based on Genetic Algorithm is developed to determine the most economical way to reconfigure an existing system. Adding or removing machines to match the new throughput requirements and concurrently rebalancing the system for each configuration, accomplishes the system reconfiguration. The proposed approach is validated through a case study of a CNC-based automotive cylinder head machining system.  相似文献   

19.
针对当前哈希函数算法标准和应用需求不同的现状,以及同一系统对安全性可能有着不同的要求,采用可重构的设计思想,在对SHA-1、SHA-256、SHA-512三种哈希函数的不同特征进行深入分析的基础上,总结归纳出统一的处理模型。根据不同的要求,每一种SHA(SHA-1、SHA-256、SHA-512)系列哈希函数都可以单独灵活地执行。使用流水线,并在关键路径进行加法器的优化,提高了算法的吞吐率。并且使用效能比的概念,与M3服务器对比,可重构平台的效能比比通用服务器高很多。  相似文献   

20.
In this paper we present a framework for realizing arbitrary instruction set extensions (IE) that are identified post-silicon. The proposed framework has two components viz., an IE synthesis methodology and the architecture of a reconfigurable data-path for realization of the such IEs. The IE synthesis methodology ensures maximal utilization of resources on the reconfigurable data-path. In this context we present the techniques used to realize IEs for applications that demand high throughput or those that must process data streams. The reconfigurable hardware called HyperCell comprises a reconfigurable execution fabric. The fabric is a collection of interconnected compute units. A typical use case of HyperCell is where it acts as a co-processor with a host and accelerates execution of IEs that are defined post-silicon. We demonstrate the effectiveness of our approach by evaluating the performance of some well-known integer kernels that are realized as IEs on HyperCell. Our methodology for realizing IEs through HyperCells permits overlapping of potentially all memory transactions with computations. We show significant improvement in performance for streaming applications over general purpose processor based solutions, by fully pipelining the data-path.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号