首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 203 毫秒
1.
Nowadays the development of automatic biometrics-based personal recognition systems is a reality in the current technological age. Not only those applications demanding stringent security levels but also many daily use consumer applications request the existence of high performance computational platforms in charge of recognizing the identity of an individual based on the analysis of his/her physiological or behavioural characteristics. The state of the art points out two main open problems in the implementation of such automatic applications: on the one hand, the needed improvement of the reliability level of the existing recognition systems in terms of accuracy, security and real-time performances; on the other hand, the cost reduction of those physical platforms in charge of the processing.This work addresses those limitations of current systems and aims at finding the proper system architecture to develop this kind of high-performance applications at low cost. Because of that, those existing solutions based on expensive multiprocessor systems like HPC (High Performance Computer), GPU (Graphics Processing Unit), or PC (Personal Computer) platforms need to be discarded, and instead of them embedded system solutions based on programmable logic devices are suggested in this work. The programmability performances of FPGA (Field Programmable Gate Array) devices together with the inherent parallelism of hardware design provide the needed flexibility to develop made-to-measure coprocessors in charge of accelerating those time-critical computational tasks. To address the cost of the system, dynamically reconfigurable FPGAs are suggested in this work. The scheduling of the recognition application into a series of mutually exclusive tasks, and the reutilization of those functional resources available in the FPGA by multiplexing different coprocessors in the same area along the application execution time allows reducing the size of the device and therefore its cost at the expense of the reconfiguration overhead.The hardware-software co-design of an AFAS (automatic fingerprint-based authentication system) under two different run-time reconfigurable platforms is presented as the proof of concept of the suggested architecture. The outstanding results achieved in this work pave the way for the implementation of biometric applications by means of run-time reconfigurable FPGAs.  相似文献   

2.
There are many design challenges in the hardware-software co-design approach for performance improvement of data-intensive streaming applications with a general-purpose microprocessor and a hardware accelerator. These design challenges are mainly to prevent hardware area fragmentation to increase resource utilization, to reduce hardware reconfiguration cost and to partition and schedule the tasks between the microprocessor and the hardware accelerator efficiently for performance improvement and power savings of the applications.In this paper a modular and block based hardware configuration architecture named memory-aware run-time reconfigurable embedded system (MARTRES) is proposed for efficient resource management and performance improvement of streaming applications. Subsequently we design a task placement algorithm named hierarchical best fit ascending (HBFA) algorithm to prove that MARTRES configuration architecture is very efficient in increased resource utilization and flexible in task mapping and power savings. The time complexity of HBFA algorithm is reduced to O(n) compared to traditional Best Fit (BF) algorithm’s time complexity of O(n2), when the quality of the placement solution by HBFA is better than that of BF algorithm. Finally we design an efficient task partitioning and scheduling algorithm named balanced partitioned and placement-aware partitioning and scheduling algorithm (BPASA). In BPASA we exploit the temporal parallelism in streaming applications to reduce reconfiguration cost of the hardware, while keeping in mind the required throughput of the output data. We balance the exploitation of spatial parallelism and temporal parallelism in streaming applications by considering the reconfiguration cost vs. the data transfer cost. The scheduler refers to the HBFA placement algorithm to check whether contiguous area on FPGA is available before scheduling the task for HW or for SW.  相似文献   

3.
张宇  冯丹 《计算机科学》2010,37(5):274-277
由于应用种类、实时性以及处理效率等要求,高性能嵌入式计算硬件平台需要具备相当的计算能力以及一定的适应性。为此提出了一种基于Xilinx FPGA的动态可重构的片上系统设计方案。系统采用专用硬件来执行计算密集型任务,运用动态可重构技术来支持硬件处理模块功能的动态配置。研究了Xilinx可编程片上系统上的3种硬件加速方案:CPU协处理器、PLB扩展加速器和MPMC扩展加速器。实验数据表明MPMC加速器性能最优。在Vir-tex5 FPGA器件上实现了可动态重构的MPMC加速器,以128位AES加密、解密两个功能模块为例,从硬件资源占用率、重构延时等角度考察了可重构系统的特点。  相似文献   

4.
Hardware/software co-design for particle swarm optimization algorithm   总被引:1,自引:0,他引:1  
This paper presents a hardware/software (HW/SW) co-design approach using SOPC technique and pipeline design method to improve design flexibility and execution performance of particle swarm optimization (PSO) for embedded applications. Based on modular design architecture, a Particle Updating Accelerator module via hardware implementation for updating velocity and position of particles and a Fitness Evaluation module implemented either on a soft-cored processor or Field Programmable Gate Array (FPGA) for evaluating the objective functions are respectively designed to work closely together to carry out the evolution process at different design stages. Thanks to the design flexibility, the proposed approach can tackle various optimization problems of embedded applications without the need for hardware redesign. To further improve the execution performance of the PSO, a hardware random number generator (RNG) is also designed in this paper in addition to a particle re-initialization scheme to promote exploration search during the optimization process. Experimental results have demonstrated that the proposed HW/SW co-design approach for PSO algorithms has good efficiency for obtaining high-quality solutions for embedded applications.  相似文献   

5.
K.  L.  B.  I. 《Computers & Electrical Engineering》2007,33(5-6):324-332
It is a challenge to implement large word length public-key algorithms on embedded systems. Examples are smartcards, RF-ID tags and mobile terminals. This paper presents a HW/SW co-design solution for RSA and Elliptic Curve Cryptography (ECC) over GF(p) on a 12 MHz 8-bit 8051 micro-controller. The hardware coprocessor has a Modular Arithmetic Logic Unit (MALU) of which the digit size (d) is variable. It can be adapted to the speed and bandwidth of the micro-controller to which it is connected. The HW/SW co-design space exploration is based on the GEZEL system-level design environment. It allows the designer to find the best performance-area combination for the digit size. As a case study of an FPGA prototyping, 160-bit ECC over GF(p) (ECC-160p) was implemented on Xilinx Virtex-II PRO (XC2VP30). The results show that one point multiplication takes only 130 ms including all communications between the 8051 and the coprocessor. The performance is 40 times faster than the most optimized SW implementation on a small CPU in literature. This is achieved by the HW/SW co-design exploration in order to find the optimized digit size of the MALU. On the other hand, the design of ECC-160p maintains a high level of flexibility by using coprocessor instructions. Our proposed architecture proves that HW/SW co-design provides a high performance close to ASIC solutions with a flexible feature of SW even on a small CPU.  相似文献   

6.
7.
CPU/FPGA混合架构是可重构计算的普遍结构,为了简化混合架构上FPGA的使用,提出了一种硬件线程方法,并设计了硬件线程的执行机制,以硬件线程的方式使用可重构资源.同时,软硬件线程可以通过共享数据存储方式进行多线程并行执行,将程序中计算密集部分以FPGA上的硬件线程方式执行,而控制密集部分则以CPU上的软件线程方式执行.在Simics仿真软件模拟的混合架构平台上,对DES,MD5SUM和归并排序算法进行软硬件多线程改造后的实验结果表明,平均执行加速比达到了2.30,有效地发挥了CPU/FPGA混合架构的计算性能.  相似文献   

8.
9.
提出了一种专用指令处理器的软硬件协同设计方法,该方法可以在设计的早期阶段对处理器进行系统探索和验证.根据椭圆曲线密码算法的特点,并按照专用指令处理器的设计原则,以椭圆曲线密码运算基本操作及运算存储特性为基础,设计了超长指令字ECC专用指令处理器的指令集结构模型.根据处理器的指令集结构模型,以指令模拟器为基础,搭建了处理器的软硬件协同验证平台,从系统设计、RTL描述和FPGA硬件原型3个不同层次对处理器进行了验证.  相似文献   

10.
This paper describes the development of efficient hardware/software (HW/SW) neuro-fuzzy systems. The model used in this work consists of an adaptive neuro-fuzzy inference system modified for efficient HW/SW implementation. The design of two different on-chip approaches are presented: a high-performance parallel architecture for offline training and a pipelined architecture suitable for online parameter adaptation. Details of important aspects concerning the design of HW/SW solutions are given. The proposed architectures have been implemented using a system-on-a-programmable-chip. The device contains an embedded-processor core and a large field programmable gate array (FPGA). The processor provides flexibility and high precision to implement the learning algorithms, while the FPGA allows the development of high-speed inference architectures for real-time embedded applications.  相似文献   

11.
This paper describes the implementation of a reconfigurable hardware-based genetic algorithm (HGA) accelerator using the hardware-software (HW/SW) co-design methodology. This HGA is coupled with a unique TRNG that extracts random jitters from a phase lock loop (PLL) to ensure proper GA operation. It is then applied and benchmarked with several case studies, which include the optimization of a simple fitness function, a constrained Michalewicz function, and the tuning of parameters in finger-vein biometrics. A HGA solution is necessary in systems that demand high performance during the optimization process. However, implementations that are completely designed in hardware will result in a very rigid architecture, making it difficult to reconfigure the system for use in different applications. This paper aims to solve this issue by proposing a HGA design that provides reconfigurability and flexibility by moving problem-dependent processes into software. The prototyping platform used is an Altera Stratix II EP2S60 FPGA prototyping board with a clock frequency of 50 MHz. The HW/SW co-design technique is applied, and system partitioning is done based on aspects such as system constraints, operational intensity, process sequencing, hardware logic utilization, and reconfigurability. Experimental results show that the proposed HGA outperforms equivalent software implementations compiled with an open-sourced C++ GA component library (GAlib) running on the same prototyping platform by 102 times at most. In the final case study, the application of the proposed HGA in tunable parameter optimization in finger-vein biometrics improved the matching rate, reducing the equal error rate (EER) value from 1.004% down to 0.101%.  相似文献   

12.
13.
软硬件划分是软硬件协同设计的关键环节,划分的结果直接影响目标系统的设计质量。因此,对于一个给定的应用程序,为了使得目标系统快速执行且成本低廉,合理的划分策略十分重要。由于单个任务具有多种不同的硬件实现方式,与传统的单一硬件实现方式的软硬件划分问题相比,多选择的软硬件划分更能客观地反映现实应用。这导致问题的求解更具挑战性,它们已被证明是NP完全问题。基于多核处理器片上系统并针对任务图为二叉树的应用,建立了多选择软硬件划分问题的计算模型,并提出了解决该问题的动态规划算法。实验结果表明,当问题规模适中时,所提动态规划算法能够有效地获得精确解,并展示了算法的计算能力与硬件面积限制之间的关系。  相似文献   

14.
动态部分重配置及其FPGA实现   总被引:2,自引:1,他引:2       下载免费PDF全文
李涛  刘培峰  杨愚鲁 《计算机工程》2006,32(14):224-226
动态部分重配置充分利用了FPGA芯片提供的可重配置功能,提高了FPGA芯片的利用率,减小了FPGA芯片的配置时间,有效地提高了系统的整体性能。该文介绍了动态部分重配置的两种实现方法,并在Spartan-II FPGA上进行了验证。  相似文献   

15.
基于NSGA-II的嵌入式系统软硬件划分方法   总被引:2,自引:0,他引:2  
软硬件划分是软硬件协同设计中的一个关键问题。针对单处理器嵌入式系统,提出将NSGA-II应用于软硬件划分中,该算法一次运行可以获得多个Pareto最优解,为各个目标函数之间权衡分析提供了有效的工具,提高了设计效率。结果表明,通过该划分方法,在满足系统性能要求下,可为复杂嵌入式系统提供多个设计目标的全局优化方案。  相似文献   

16.
This paper presents the development of a neuro-fuzzy agent for ambient-intelligence environments. The agent has been implemented as a system-on-chip (SoC) on a reconfigurable device, i.e., a field-programmable gate array. It is a hardware/software (HW/SW) architecture developed around a MicroBlaze processor (SW partition) and a set of parallel intellectual property cores for neuro-fuzzy modeling (HW partition). The SoC is an autonomous electronic device able to perform real-time control of the environment in a personalized and adaptive way, anticipating the desires and needs of its inhabitants. The scheme used to model the intelligent agent is a particular class of an adaptive neuro-fuzzy inference system with piecewise multilinear behavior. The main characteristics of our model are computational efficiency, scalability, and universal approximation capability. Several online experiments have been performed with data obtained in a real ubiquitous computing environment test bed. Results obtained show that the SoC is able to provide high-performance control and adaptation in a life-long mode while retaining the modeling capabilities of similar agent-based approaches implemented on larger computing machines.  相似文献   

17.
The dynamic partial reconfiguration technology of FPGA has made it possible to adapt system functionalities at run-time to changing environment conditions. However, this new dimension of dynamic hardware reconfigurability has rendered existing CAD tools and platforms incapable of efficiently exploring the design space. As a solution, we proposed a novel UML-based hardware/software co-design platform (UCoP) targeting at dynamically partially reconfigurable network security systems (DPRNSS). Computation-intensive network security functions, implemented as reconfigurable hardware functions, can be configured on-demand into a DPRNSS at run-time. Thus, UCoP not only supports dynamic adaptation to different environment conditions, but also increases hardware resource utilization. UCoP supports design space exploration for reconfigurable systems in three folds. Firstly, it provides reusable models of typical reconfigurable systems that can be customized according to user applications. Secondly, UCoP provides a partially reconfigurable hardware task template, using which users can focus on their hardware designs without going through the full partial reconfiguration flow. Thirdly, UCoP provides direct interactions between UML system models and real reconfigurable hardware modules, thus allowing accurate time measurements. Compared to the existing lower-bound and synthesis-based estimation methods, the accurate time measurements using UCoP at a high abstraction level can more efficiently reduce the system development efforts.  相似文献   

18.
With the development of the design complexity in embedded systems, hardware/software (HW/SW) partitioning becomes a challenging optimization problem in HW/SW co-design. A novel HW/SW partitioning method based on position disturbed particle swarm optimization with invasive weed optimization (PDPSO-IWO) is presented in this paper. It is found by biologists that the ground squirrels produce alarm calls which warn their peers to move away when there is potential predatory threat. Here, we present PDPSO algorithm, in each iteration of which the squirrel behavior of escaping from the global worst particle can be simulated to increase population diversity and avoid local optimum. We also present new initialization and reproduction strategies to improve IWO algorithm for searching a better position, with which the global best position can be updated. Then the search accuracy and the solution quality can be enhanced. PDPSO and improved IWO are synthesized into one single PDPSO-IWO algorithm, which can keep both searching diversification and searching intensification. Furthermore, a hybrid NodeRank (HNodeRank) algorithm is proposed to initialize the population of PDPSO-IWO, and the solution quality can be enhanced further. Since the HW/SW communication cost computing is the most time-consuming process for HW/SW partitioning algorithm, we adopt the GPU parallel technique to accelerate the computing. In this way, the runtime of PDPSO-IWO for large-scale HW/SW partitioning problem can be reduced efficiently. Finally, multiple experiments on benchmarks from state-of-the-art publications and large-scale HW/SW partitioning demonstrate that the proposed algorithm can achieve higher performance than other algorithms.  相似文献   

19.
20.
Reconstructive signal processing algorithms encompass a broad spectrum of computational methods. Fortunately, most of the methods fall into the classes of the matrix algebraic calculations, convolution, or transform type algorithms. These algorithms possess common properties such as regularity, locality and recursiveness. Considering such general class of reconstructive signal processing (SP) techniques, in this paper we propose a new Hardware/Software (HW/SW) co-design paradigm for the implementation of reconstructive SP algorithms via efficient systolic arrays integrated as digital SP coprocessors units. In particular, the selected matrix–matrix and matrix–vector multiplication algorithms are implemented in a systolic computing fashion that meets the real time SP system requirements when employing the developed Hardware/Software Co-Design method oriented at the use of a Xilinx Field Programmable Gate Array (FPGA) XC4VSX35-10ff668.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号