首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
This paper presents the development of a neuro-fuzzy agent for ambient-intelligence environments. The agent has been implemented as a system-on-chip (SoC) on a reconfigurable device, i.e., a field-programmable gate array. It is a hardware/software (HW/SW) architecture developed around a MicroBlaze processor (SW partition) and a set of parallel intellectual property cores for neuro-fuzzy modeling (HW partition). The SoC is an autonomous electronic device able to perform real-time control of the environment in a personalized and adaptive way, anticipating the desires and needs of its inhabitants. The scheme used to model the intelligent agent is a particular class of an adaptive neuro-fuzzy inference system with piecewise multilinear behavior. The main characteristics of our model are computational efficiency, scalability, and universal approximation capability. Several online experiments have been performed with data obtained in a real ubiquitous computing environment test bed. Results obtained show that the SoC is able to provide high-performance control and adaptation in a life-long mode while retaining the modeling capabilities of similar agent-based approaches implemented on larger computing machines.  相似文献   

2.
3.
4.
提出了一种专用指令处理器的软硬件协同设计方法,该方法可以在设计的早期阶段对处理器进行系统探索和验证.根据椭圆曲线密码算法的特点,并按照专用指令处理器的设计原则,以椭圆曲线密码运算基本操作及运算存储特性为基础,设计了超长指令字ECC专用指令处理器的指令集结构模型.根据处理器的指令集结构模型,以指令模拟器为基础,搭建了处理器的软硬件协同验证平台,从系统设计、RTL描述和FPGA硬件原型3个不同层次对处理器进行了验证.  相似文献   

5.
The increasing use of images in miscellaneous applications such as medical image analysis and visual quality inspection has led to growing interest in image processing. However, images are often contaminated with noise which may corrupt any of the following image processing steps. Therefore, noise filtering is often a necessary preprocessing step for the most image processing applications. Thus, in this paper an optimized field-programmable gate array (FPGA) design is proposed to implement the adaptive vector directional distance filter (AVDDF) in hardware/software (HW/SW) codesign context for removing noise from the images in real-time. For that, the high-level synthesis (HLS) flow is used through the Xilinx Vivado HLS tool to reduce the design complexity of the HW part. The SW part is developed based on C/C++ programming language and executed on an advanced reduced instruction set computer (RISC) machines (ARM) Cortex-A53 processor. The communication between the SW and HW parts is achieved using the advanced extensible Interface stream (AXI-stream) interface to increase the data bandwidth. The experiment results on the Xilinx ZCU102 FPGA board show an improvement in processing time of the AVDDF filter by 98% for the HW/SW implementation relative to the SW implementation. This result is given for the same quality of image between the HW/SW and SW implementations in terms of the normalized color difference (NCD) and the peak signal to noise ratio (PSNR).  相似文献   

6.
K.  L.  B.  I. 《Computers & Electrical Engineering》2007,33(5-6):324-332
It is a challenge to implement large word length public-key algorithms on embedded systems. Examples are smartcards, RF-ID tags and mobile terminals. This paper presents a HW/SW co-design solution for RSA and Elliptic Curve Cryptography (ECC) over GF(p) on a 12 MHz 8-bit 8051 micro-controller. The hardware coprocessor has a Modular Arithmetic Logic Unit (MALU) of which the digit size (d) is variable. It can be adapted to the speed and bandwidth of the micro-controller to which it is connected. The HW/SW co-design space exploration is based on the GEZEL system-level design environment. It allows the designer to find the best performance-area combination for the digit size. As a case study of an FPGA prototyping, 160-bit ECC over GF(p) (ECC-160p) was implemented on Xilinx Virtex-II PRO (XC2VP30). The results show that one point multiplication takes only 130 ms including all communications between the 8051 and the coprocessor. The performance is 40 times faster than the most optimized SW implementation on a small CPU in literature. This is achieved by the HW/SW co-design exploration in order to find the optimized digit size of the MALU. On the other hand, the design of ECC-160p maintains a high level of flexibility by using coprocessor instructions. Our proposed architecture proves that HW/SW co-design provides a high performance close to ASIC solutions with a flexible feature of SW even on a small CPU.  相似文献   

7.
This work aims to pave the way for an efficient open system architecture applied to embedded electronic applications to manage the processing of computationally complex algorithms at real-time and low-cost. The target is to define a standard architecture able to enhance the performance-cost trade-off delivered by other alternatives nowadays in the market like general-purpose multi-core processors. Our approach, sustained by hardware/software (HW/SW) co-design and run-time reconfigurable computing, is synthesizable in SRAM-based programmable logic. As proof-of-concept, a run-time partially reconfigurable field-programmable gate array (FPGA) is addressed to carry out a specific application of high-demanding computational power such as an automatic fingerprint authentication system (AFAS). Biometric personal recognition is a good example of compute-intensive algorithm composed of a series of image processing tasks executed in a sequential order. In our pioneer conception, these tasks are partitioned and synthesized first in a series of coprocessors that are then instantiated and executed multiplexed in time on a partially reconfigurable region of the FPGA. The implementation benchmark of the AFAS either as a pure software approach on a PC platform under a dual-core processor (Intel Core 2 Duo T5600 at 1.83 GHz) or as a reconfigurable FPGA co-design (identical algorithm partitioned in HW/SW tasks operating at 50 or 100 MHz on the second smallest device of the Xilinx Virtex-4 LX family) highlights a speed-up of one order of magnitude in favor of the FPGA alternative. These results let point out biometric recognition as a sensible killer application for run-time reconfigurable computing, mainly in terms of efficiently balancing computational power, functional flexibility and cost. Such features, reached through partial reconfiguration, are easily portable today to a broad range of embedded applications with identical system architecture.  相似文献   

8.
The high efficiency video coding (HEVC) standard shows enhanced video compression efficiency at the cost of high performance requirements. To address these requirements different approaches, like algorithmic optimization, parallelization and hardware acceleration can be used leading to a complex design space. In order to find an efficient solution, early design verification and performance evaluation is crucial. Hereby the prevailing methodology is the simulation of the complex HW/SW architecture. Targeting heterogeneous designs, different simulation models have different performance evaluation capabilities making a combined HW/SW co-analysis of the entire system a cumbersome task. To facilitate this co-analysis, we propose a non-intrusive instrumentation methodology for simulation models, which automatically adapts to the model under observation.With the help of this instrumentation methodology we perform the analysis and exploration of different design aspects of a SystemC-based heterogeneous multi-core model of an HEVC intra encoder. In the course of this HW/SW co-analysis various aspects of the parallelization and hardware acceleration of the video coding algorithms are presented and further improved. Due to its cycle accurate nature the developed model is well suited to facilitate various performance evaluations and to drive HW/SW co-optimizations of the explored system, as discussed in this paper.  相似文献   

9.
10.
Hardware/software co-design for particle swarm optimization algorithm   总被引:1,自引:0,他引:1  
This paper presents a hardware/software (HW/SW) co-design approach using SOPC technique and pipeline design method to improve design flexibility and execution performance of particle swarm optimization (PSO) for embedded applications. Based on modular design architecture, a Particle Updating Accelerator module via hardware implementation for updating velocity and position of particles and a Fitness Evaluation module implemented either on a soft-cored processor or Field Programmable Gate Array (FPGA) for evaluating the objective functions are respectively designed to work closely together to carry out the evolution process at different design stages. Thanks to the design flexibility, the proposed approach can tackle various optimization problems of embedded applications without the need for hardware redesign. To further improve the execution performance of the PSO, a hardware random number generator (RNG) is also designed in this paper in addition to a particle re-initialization scheme to promote exploration search during the optimization process. Experimental results have demonstrated that the proposed HW/SW co-design approach for PSO algorithms has good efficiency for obtaining high-quality solutions for embedded applications.  相似文献   

11.
12.
13.
随着硬件系统和软件技术的发展,对片上网络多处理系统的研究进入了交叉研究状态,但是关于实际软件应用与硬件平台结合的研究尚有些不足.本文讲述了基于实际硬件平台的片上网络系统实现.通过FPGA平台实现了一个通用片上网络系统,并通过多任务映射方法将两个多媒体应用程序映射在片上网络系统中,实现了软件多任务与硬件片上网络多处理系统的合理结合.  相似文献   

14.
钟丽  刘彦  余思洋  谢中 《计算机应用》2015,35(5):1412-1416
针对现有的椭圆曲线算法系统级设计中开发周期长,以及不同模块的性能开销指标不明确等问题,提出一种基于电子系统级(ESL)设计的软硬件(HW/SW)协同设计方法.该方法通过分析SM2(ShangMi2)算法原理与实现方式,研究了不同的软硬件划分方案,并采用统一建模语言SystemC对硬件模块进行周期精确级建模.通过模块级与系统级两层验证比较软硬件模块执行周期数,得出最佳性能划分方式.最后结合算法控制流程图(CFG)与数据流程图(DFG)将ESL模型转化为寄存器传输级(RTL)模型进行逻辑综合与比较,得出在180 nm CMOS工艺,50 MHz频率下,当算法性能最佳时,点乘模块执行时间为20 ms,门数83 000,功耗约2.23 mW.实验结果表明所提系统级架构分析对基于椭圆曲线类加密芯片在性能、面积与功耗的评估优势明显且适用性强,基于此算法的嵌入式系统芯片(SoC)可根据性能与资源限制选择合适的结构并加以应用.  相似文献   

15.
The BOAR emulation system is targeted to hardware/software (HW/SW) codevelopment of advanced embedded DSP and telecom systems. The challenge of the BOAR system is efficient customization of programmable hardware, and dedicated partitioning routine to target applications and structures, which allows quite high overall system performance. The system allows multiple configurations for communication between processors and field programmable gate arrays (FPGAs) making the BOAR system an efficient tool for real-time HW/SW coverification. The reprogrammable hardware of the emulation tool is based on four Xilinx 4000-series devices, two Texas TMS320C50 signal processors and one Motorola MC68302 microcontroller. With current devices the BOAR hardware provides approximately 40–70 kgates of logic capacity in DSP applications. The emulation capacity can be expanded by connecting several similar boards in chain. The system has also a versatile internal reprogrammable test environment for test bench development, performance evaluations and design debugging. The logic development environment is based on the Synopsys synthesis tools and an automatic design management software, which performs resource mapping and performance-driven design partitioning between FPGAs. The emulation hardware is currently connected to logic and software development environments via an RS-232C bus. The BOAR emulation system has been found a very efficient platform for real-life prototyping of different types of DSP algorithms and systems, and validating correct functionality of a VHDL macro library.  相似文献   

16.
Sequential Monte Carlo (SMC) represents a principal statistical method for tracking objects in video sequences by on-line estimation of the state of a non-linear dynamic system. The performance of individual stages of the SMC algorithm is usually data-dependent, making the prediction of the performance of a real-time capable system difficult and often leading to grossly overestimated and inefficient system designs. Also, the considerable computational complexity is a major obstacle when implementing SMC methods on purely CPU-based resource constrained embedded systems. In contrast, heterogeneous multi-cores present a more suitable implementation platform. We use hybrid CPU/FPGA systems, as they can efficiently execute both the control-centric sequential as well as the data-parallel parts of an SMC application. However, even with hybrid CPU/FPGA platforms, determining the optimal HW/SW partitioning is challenging in general, and even impossible with a design time approach. Thus, we need self-adaptive architectures and system software layers that are able to react autonomously to varying workloads and changing input data while preserving real-time constraints and area efficiency. In this article, we present a video tracking application modeled on top of a framework for implementing SMC methods on CPU/FPGA-based systems such as modern platform FPGAs. Based on a multithreaded programming model, our framework allows for an easy design space exploration with respect to the HW/SW partitioning. Additionally, the application can adaptively switch between several partitionings during run-time to react to changing input data and performance requirements. Our system utilizes two variants of a add/remove self-adaptation technique for task partitioning inside this framework that achieve soft real-time behavior while trying to minimize the number of active cores. To evaluate its performance and area requirements, we demonstrate the application and the framework on a real-life video tracking case study and show that partial reconfiguration can be effectively and transparently used for realizing adaptive real-time HW/SW systems.  相似文献   

17.
指令集模拟器是计算机体系结构研究和SoC软硬件协同设计的重要工具,模拟器的性能和灵活性是影响设计和验证效率的重要因素。解释型指令集模拟器具有很好的灵活性,在操作系统等涉及到自修改代码的模拟中具有不可替代的作用。该文给出了一个高性能解释型指令集模拟器的设计,它具有很高的模拟精度和很好的灵活性;同时指令集模拟器采用了动态译码缓存等优化技术,使其具有很高的模拟性能。以ARM7指令集模拟器为实例,所提出的优化技术同样适用于其它现心RISC体系结构。  相似文献   

18.
In this paper two different approaches to the design of a reconfigurable Tate pairing hardware accelerator are presented. The first uses macro components based on a large, fixed number of underlying Galois Field arithmetic units in parallel to minimise the computation time. The second is an area efficient approach based on a small, variable number of underlying components. Both architectures are prototyped on an FPGA. Timing results for each architecture with various different design parameters are presented.  相似文献   

19.
基于蓝牙技术的智能机器人传感器网络研究a   总被引:8,自引:0,他引:8  
蓝牙技术是当前最新的短程无线通信技术标准。今以蓝牙技术,构筑了机器人无线传感器网络,取代了电缆连接,给出了系统整体架构和网络模型,同时对系统软硬件设计与工作原理进行了分析。  相似文献   

20.
Cloud vendors are actively adopting FPGAs into their infrastructures for enhancing performance and efficiency. As cloud services continue to evolve, FPGA (field programmable gate array) systems would play an even important role in the future. In this context, FPGA sharing in multi-tenancy scenarios is crucial for the wide adoption of FPGA in the cloud. Recently, many works have been done towards effective FPGA sharing at different layers of the cloud computing stack.In this work, we provide a comprehensive survey of recent works on FPGA sharing. We examine prior art from different aspects and encapsulate relevant proposals on a few key topics. On the one hand, we discuss representative papers on FPGA resource sharing schemes; on the other hand, we also summarize important SW/HW techniques that support effective sharing. Importantly, we further analyze the system design cost behind FPGA sharing. Finally, based on our survey, we identify key opportunities and challenges of FPGA sharing in future cloud scenarios.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号