共查询到20条相似文献,搜索用时 140 毫秒
1.
2.
3.
4.
5.
6.
7.
8.
阐述了可重构技术在密集型计算领域的广阔应用前景.基于该技术的数据加密系统兼具了硬件的效率和软件的灵活性,有着重要的理论与实际意义.在可重构系统基础上设计并实现了数据的加密计算.给出了一种通用的加密模块接口的设计方法,用于实现对加密模块的状态控制,并向接口用户提供一个简单易用,与底层实现无关的接口.在XUP开发平台上,用AES和DES数据加密算法进行了功能验证和性能分析,表明该方法行之有效. 相似文献
9.
10.
11.
针对传统谷物粉种类检测速度较慢的问题,基于ZYNQ平台实现随机森林算法辅助微波无损检测技术对谷物粉种类进行高效准确识别。通过对随机森林模型硬件实现的分析研究,提出了一种改进模型参数结构,有效节省了硬件存储资源的消耗。为了缩短算法预测时间并降低系统功耗,在硬件实现时引入提前终止识别机制,在保证准确率不变的前提下避免不必要的决策树预测过程。针对Zedboard开发板,设计一种模型参数存储方案,充分利用片上资源保证系统正常工作。实验结果表明,与传统CPU实现随机森林算法相比,该方案在ZYNQ上运行的实测时间缩短约54.2%,同时没有引起识别精度的损失。 相似文献
12.
针对高效视频编解码标准中后处理CNN算法在通用平台运行时产生的高延时缺点,提出一种基于现场可编程逻辑门阵列(FPGA)的后处理卷积神经网络硬件并行架构。提出的并行架构通过改进输入与输出缓冲的数据并发过程,调整卷积模块整体并行度,加快模块硬件流水。实验结果表明,基于本文所提出的并行架构设计的CNN硬件加速器在Xilinx ZCU102上处理分辨率为176×144视频流,计算性能相当于每秒360.5 GFLOPS,计算速度可满足81.01 FPS,相比时钟频率4 GHz的Intel i7-4790K,计算速度加快了76.67倍,相比NVIDIA GeForce GTX 750Ti加速了32.50倍。在计算能效比方面,本文后处理CNN加速器功耗为12.095 J,能效比是Intel i7-4790K的512.90倍,是NVIDIA GeForce GTX 750Ti的125.78倍。 相似文献
13.
Wearable computers are embedded into the mobile environment of their users. A design challenge for wearable systems is to combine the high performance required for tasks such as video decoding with the low energy consumption required to maximise battery runtimes and the flexibility demanded by the dynamics of the environment and the applications. In this paper, we demonstrate that reconfigurable hardware technology is able to answer this challenge. We present the concept and the prototype implementation of an autonomous wearable unit with reconfigurable modules (WURM). We discuss experiments that show the uses of reconfigurable hardware in WURM: ASICs-on-demand and adaptive interfaces. Finally, we present an experiment with an operating system layer for WURM. 相似文献
14.
在考虑动态部分重构及重构延时等特征的基础上,采用遗传算法及其与爬山算法的融合实现可重构系统软硬件任务的划分,并采用动态优先级调度算法进行划分结果的评价。实验表明,在可重构系统的资源约束等条件下,算法能够有效地实现应用任务图到可重构系统的时空映射。 相似文献
15.
M. FonsAuthor VitaeF. FonsAuthor Vitae E. CantóAuthor Vitae 《Future Generation Computer Systems》2012,28(1):268-286
Nowadays the development of automatic biometrics-based personal recognition systems is a reality in the current technological age. Not only those applications demanding stringent security levels but also many daily use consumer applications request the existence of high performance computational platforms in charge of recognizing the identity of an individual based on the analysis of his/her physiological or behavioural characteristics. The state of the art points out two main open problems in the implementation of such automatic applications: on the one hand, the needed improvement of the reliability level of the existing recognition systems in terms of accuracy, security and real-time performances; on the other hand, the cost reduction of those physical platforms in charge of the processing.This work addresses those limitations of current systems and aims at finding the proper system architecture to develop this kind of high-performance applications at low cost. Because of that, those existing solutions based on expensive multiprocessor systems like HPC (High Performance Computer), GPU (Graphics Processing Unit), or PC (Personal Computer) platforms need to be discarded, and instead of them embedded system solutions based on programmable logic devices are suggested in this work. The programmability performances of FPGA (Field Programmable Gate Array) devices together with the inherent parallelism of hardware design provide the needed flexibility to develop made-to-measure coprocessors in charge of accelerating those time-critical computational tasks. To address the cost of the system, dynamically reconfigurable FPGAs are suggested in this work. The scheduling of the recognition application into a series of mutually exclusive tasks, and the reutilization of those functional resources available in the FPGA by multiplexing different coprocessors in the same area along the application execution time allows reducing the size of the device and therefore its cost at the expense of the reconfiguration overhead.The hardware-software co-design of an AFAS (automatic fingerprint-based authentication system) under two different run-time reconfigurable platforms is presented as the proof of concept of the suggested architecture. The outstanding results achieved in this work pave the way for the implementation of biometric applications by means of run-time reconfigurable FPGAs. 相似文献
16.
ABSTRACT We present a method for implementing hardware intelligent processing accelerator on domestic service robots. These domestic service robots support human life; therefore, they are required to recognize environments using intelligent processing. Moreover, the intelligent processing requires large computational resources. Therefore, standard personal computers (PCs) with robot middleware on the robots do not have enough resources for this intelligent processing. We propose a ‘connective object for middleware to an accelerator (COMTA),’ which is a system that integrates hardware intelligent processing accelerators and robot middleware. Herein, by constructing dedicated architecture digital circuits, field-programmable gate arrays (FPGAs) accelerate intelligent processing. In addition, the system can configure and access applications on hardware accelerators via a robot middleware space; consequently, robotic engineers do not require the knowledge of FPGAs. We conducted an experiment on the proposed system by utilizing a human-following application with image processing, which is commonly applied in the robots. Experimental results demonstrated that the proposed system can be automatically constructed from a single-configuration file on the robot middleware and can execute the application 5.2 times more efficiently than an ordinary PC. 相似文献
17.
18.
BIT试验中VME总线故障注入设备控制单元设计 总被引:1,自引:1,他引:1
针对航空电子设备BIT(机内测试)试验,设计了一种基于FPGA(现场可编程门阵列)的VME总线故障注入设备。该设备的控制单元用于完成故障注入设备的总体控制,是实现故障注入任务的关键。详细分析了VME总线故障注入设备的总体框架,给出了VME总线故障注入设备控制单元的设计方案,包括详细的软、硬件设计方法以及该系统的工作流程,并通过测试工具验证了控制单元设计和功能的正确性。最后,讨论了BIT试验中故障注入技术应用未来研究工作的开展方向。 相似文献
19.
神经网络参数量和运算量的扩大,使得在资源有限的硬件平台上流水线部署神经网络变得更加困难。基于此,提出了一种解决深度学习模型在小型边缘计算平台上部署困难的方法。该方法基于应用于自定义数据集的深度可分离网络模型,在软件端使用迁移学习、敏感度分析和剪枝量化的步骤进行模型压缩,在硬件端分析并设计了适用于有限资源FPGA的流水线硬件加速器。实验结果表明,经过软件端的网络压缩优化,这种量化部署模型具有94.60%的高准确率,16.64 M的较低的单次推理定点数运算量和0.079 M的参数量。此外,经过硬件资源优化后,在国产FPGA开发板上进行流水线部署,推理帧率达到了366 FPS,计算能效为8.57 GOPS/W。这一研究提供了一种在小型边缘计算平台上高性能部署深度学习模型的解决方案。 相似文献
20.
针对现有的哈希算法硬件架构仅实现少量几种算法的问题,设计了一种可实现SM3,MD5,SHA-1以及SHA-2系列共7种哈希算法的可重构IP,以满足同一系统对安全性可选择的需求。通过分析各哈希算法及其运算逻辑的相似性,该设计最大化地重用加法器和寄存器,极大地减少了总的实现面积。此外,该设计灵活可配,可以对内存直接存取。以Altera的Stratix II为FPGA目标器件,其最高频率可达100 MHz,总面积较现有设计减少26.7%以上,且各算法单位面积吞吐率均优于现有设计。 相似文献