Similar literature
 20 similar documents found (search time: 15 ms)
1.
This paper describes the development of efficient hardware/software (HW/SW) neuro-fuzzy systems. The model used in this work consists of an adaptive neuro-fuzzy inference system modified for efficient HW/SW implementation. The designs of two different on-chip approaches are presented: a high-performance parallel architecture for offline training and a pipelined architecture suitable for online parameter adaptation. Details of important aspects concerning the design of HW/SW solutions are given. The proposed architectures have been implemented using a system-on-a-programmable-chip. The device contains an embedded-processor core and a large field programmable gate array (FPGA). The processor provides flexibility and high precision to implement the learning algorithms, while the FPGA allows the development of high-speed inference architectures for real-time embedded applications.

2.
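Hardware/software (HW/SW) partitioning is a key step in HW/SW co-design, and the partitioning result directly affects the design quality of the target system. For a given application, a sound partitioning strategy is therefore essential if the target system is to run fast at low cost. Because a single task can have several different hardware implementations, multiple-choice HW/SW partitioning reflects real applications more faithfully than the traditional problem with a single hardware implementation per task. This makes the problem more challenging to solve; it has been proved to be NP-complete. For applications whose task graph is a binary tree and which run on a multi-core system-on-chip, a computing model of the multiple-choice HW/SW partitioning problem is established and a dynamic programming algorithm is proposed to solve it. Experimental results show that, for moderately sized problems, the proposed dynamic programming algorithm obtains exact solutions efficiently, and they illustrate the relationship between the algorithm's computing capability and the hardware area constraint.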
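The multiple-choice formulation in the abstract above can be illustrated with a small dynamic program. This is only a sketch under simplifying assumptions, not the authors' algorithm: it ignores the binary-tree task graph and communication costs, treats area as a non-negative integer, and assumes every task offers a software option with area 0. The names `min_time_partition`, `tasks`, and `area_limit` are hypothetical.

```python
from typing import List, Tuple

# Each task lists its options as (area, time) pairs.
# Assumption: option 0 is the software version, with area 0.
Options = List[Tuple[int, int]]


def min_time_partition(tasks: List[Options], area_limit: int) -> Tuple[int, List[int]]:
    """Return (minimum total time, chosen option index per task)."""
    inf = float("inf")
    # best[a] = minimum total time of the tasks seen so far using at most a area units
    best = [0] * (area_limit + 1)
    choice = []  # choice[t][a] = option picked for task t at area budget a
    for options in tasks:
        new_best = [inf] * (area_limit + 1)
        picked = [-1] * (area_limit + 1)
        for a in range(area_limit + 1):
            for k, (area, time) in enumerate(options):
                if area <= a and best[a - area] + time < new_best[a]:
                    new_best[a] = best[a - area] + time
                    picked[a] = k
        best = new_best
        choice.append(picked)
    if best[area_limit] == inf:
        raise ValueError("no feasible assignment within the area limit")
    # Walk backwards through the table to recover the chosen options.
    plan = []
    a = area_limit
    for t in range(len(tasks) - 1, -1, -1):
        k = choice[t][a]
        plan.append(k)
        a -= tasks[t][k][0]
    plan.reverse()
    return best[area_limit], plan


tasks = [
    [(0, 10), (2, 4), (4, 2)],  # software, small hardware block, large hardware block
    [(0, 8), (3, 3)],
    [(0, 6), (1, 5), (5, 1)],
]
print(min_time_partition(tasks, 5))  # (13, [1, 1, 0])
```

The table has one row per task and one column per area budget, so the running time grows with the product of the number of tasks, the area limit, and the number of options per task. This is consistent with the abstract's remark that exact solutions are practical only for moderately sized problems.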

3.
With the growing design complexity of embedded systems, hardware/software (HW/SW) partitioning has become a challenging optimization problem in HW/SW co-design. A novel HW/SW partitioning method based on position disturbed particle swarm optimization with invasive weed optimization (PDPSO-IWO) is presented in this paper. Biologists have found that ground squirrels produce alarm calls that warn their peers to move away when there is a potential predatory threat. Here, we present the PDPSO algorithm, in each iteration of which the squirrel behavior of escaping from the global worst particle is simulated to increase population diversity and avoid local optima. We also present new initialization and reproduction strategies that improve the IWO algorithm's search for a better position, with which the global best position can be updated, enhancing search accuracy and solution quality. PDPSO and the improved IWO are combined into a single PDPSO-IWO algorithm, which maintains both search diversification and search intensification. Furthermore, a hybrid NodeRank (HNodeRank) algorithm is proposed to initialize the population of PDPSO-IWO, further enhancing solution quality. Since computing the HW/SW communication cost is the most time-consuming step of the HW/SW partitioning algorithm, we adopt GPU parallelism to accelerate it. In this way, the runtime of PDPSO-IWO on large-scale HW/SW partitioning problems is reduced considerably. Finally, multiple experiments on benchmarks from state-of-the-art publications and on large-scale HW/SW partitioning demonstrate that the proposed algorithm achieves higher performance than other algorithms.

4.
K.  L.  B.  I. 《Computers & Electrical Engineering》2007,33(5-6):324-332
Implementing public-key algorithms with large word lengths on embedded systems such as smartcards, RF-ID tags, and mobile terminals is a challenge. This paper presents an HW/SW co-design solution for RSA and Elliptic Curve Cryptography (ECC) over GF(p) on a 12 MHz 8-bit 8051 micro-controller. The hardware coprocessor has a Modular Arithmetic Logic Unit (MALU) whose digit size (d) is variable, so it can be adapted to the speed and bandwidth of the micro-controller to which it is connected. The HW/SW co-design space exploration is based on the GEZEL system-level design environment, which allows the designer to find the best performance-area combination for the digit size. As an FPGA prototyping case study, 160-bit ECC over GF(p) (ECC-160p) was implemented on a Xilinx Virtex-II PRO (XC2VP30). The results show that one point multiplication takes only 130 ms, including all communication between the 8051 and the coprocessor. This is 40 times faster than the most optimized SW implementation on a small CPU reported in the literature, and it is achieved by using HW/SW co-design exploration to find the optimized digit size of the MALU. At the same time, the ECC-160p design maintains a high level of flexibility by using coprocessor instructions. The proposed architecture shows that HW/SW co-design provides performance close to that of ASIC solutions together with the flexibility of SW, even on a small CPU.

5.
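An NSGA-II-based hardware/software partitioning method for embedded systems   Cited by: 2 (self-citations: 0, citations by others: 2)
Hardware/software partitioning is a key problem in hardware/software co-design. For single-processor embedded systems, this paper proposes applying NSGA-II to hardware/software partitioning. A single run of the algorithm yields multiple Pareto-optimal solutions, which provides an effective tool for trade-off analysis among the objective functions and improves design efficiency. The results show that, while meeting the system performance requirements, the partitioning method can provide globally optimized solutions for multiple design objectives in complex embedded systems.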
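The notion of a Pareto-optimal set that NSGA-II returns can be made concrete with a short sketch. The code below only extracts the non-dominated points (the first front) from a list of candidate designs; it is not NSGA-II itself, which adds ranking into further fronts, crowding distance, and genetic operators. The objectives (execution time, hardware area), both minimized, and the sample values are assumptions chosen for illustration.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep the points that no other point dominates (all objectives minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidate partitions as (execution time, hardware area).
candidates = [(10, 0), (6, 3), (7, 3), (4, 5), (4, 8), (2, 9)]
print(pareto_front(candidates))  # [(10, 0), (6, 3), (4, 5), (2, 9)]
```

Each surviving point represents a different time/area trade-off, which is the kind of set a designer would inspect when weighing the objectives against each other.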

6.
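The processing power of hardware, combined with the flexibility and programmability of software, has shifted the architecture of video decoder chips from pure hardware toward hardware/software partitioned designs. As an emerging standard, the AVS video standard poses new challenges for the hardware/software partitioning of decoders. Starting from the algorithms and implementation complexity of the AVS video standard, this paper proposes a hardware/software partitioned architecture for an AVS high-definition video decoder. It decodes AVS high-definition video streams at level 6.0 of the Jizhun (baseline) profile in real time and supports flexible audio/video synchronization, error recovery, buffer management, and system control mechanisms. The architecture has been implemented on the AVS101 chip: the hardware uses a 7-stage macroblock-level synchronous pipeline, and the software tasks run on a RISC processor. At an operating frequency of 148.5 MHz, it decodes and displays NTSC, PAL, 720p (60 frames/s), and up to 1080i (60 fields/s) programs in real time.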

7.
Decoding high-quality video in real time is becoming more and more difficult as resolution increases. In this paper, a novel hardware/software (HW/SW) partitioning with powerful SIMD (single instruction multiple data) instructions is proposed for a real-time AVS video decoder. Since most key functions that require large amounts of computation are optimized with SIMD instead of hardware, the workload is balanced between hardware and software, and the performance of the video decoder is improved. In addition, generality and programmability are maintained. The proposed method is implemented on a 32-bit dual-issue RISC processor with a 256-bit vector extension. Experimental results on AVS conformance test sequences show that the video decoder system supports real-time decoding of AVS 1080p video at 30 fps and improves performance by more than 100 times compared with the original processor without the proposed method. Moreover, this approach could easily be applied to other video decoders, such as H.264 and VC-1.

8.
9.
Efficient heuristic and tabu search for hardware/software partitioning   Cited by: 1 (self-citations: 0, citations by others: 1)
Hardware/software (HW/SW) partitioning is a crucial step in HW/SW codesign that determines which components of the system are implemented in hardware and which in software. It has been proved that the HW/SW partitioning problem is NP-hard. In this paper, we present two approaches to HW/SW partitioning that aim to minimize the hardware cost while taking into account software and communication constraints. The first is a heuristic approach that treats the HW/SW partitioning problem as an extended 0–1 knapsack problem. In the second approach, tabu search is used to further improve the solution obtained from the proposed heuristic algorithm. Experimental results show that the proposed algorithms outperform a recently reported work by up to 28%.
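The two-stage idea in the abstract above (a knapsack-style heuristic followed by tabu search) can be sketched as follows. This is a simplified illustration rather than the paper's algorithm: communication constraints are ignored, every hardware cost is assumed to be positive, the neighborhood is a single-task flip, and all function names, parameters, and sample costs are hypothetical.

```python
from typing import Dict, List


def hw_cost(in_hw: List[bool], hw: List[int]) -> int:
    """Total hardware cost of the tasks mapped to hardware."""
    return sum(h for h, x in zip(hw, in_hw) if x)


def feasible(in_hw: List[bool], sw: List[int], sw_limit: int) -> bool:
    """The software cost of the tasks left in software must fit the budget."""
    return sum(s for s, x in zip(sw, in_hw) if not x) <= sw_limit


def greedy_partition(hw: List[int], sw: List[int], sw_limit: int) -> List[bool]:
    """Move tasks to hardware in order of software cost saved per unit of
    hardware cost (assumes hw[i] > 0) until the software budget is met."""
    in_hw = [False] * len(hw)
    sw_total = sum(sw)
    for i in sorted(range(len(hw)), key=lambda i: sw[i] / hw[i], reverse=True):
        if sw_total <= sw_limit:
            break
        in_hw[i] = True
        sw_total -= sw[i]
    return in_hw


def tabu_refine(start: List[bool], hw: List[int], sw: List[int], sw_limit: int,
                iterations: int = 50, tenure: int = 2) -> List[bool]:
    """Take the best feasible single-flip move each iteration, even if it is
    worse than the current solution, while recently flipped tasks stay tabu."""
    current, best = list(start), list(start)
    tabu_until: Dict[int, int] = {}  # task index -> last iteration it stays tabu
    for it in range(iterations):
        moves = []
        for i in range(len(hw)):
            if tabu_until.get(i, -1) >= it:
                continue
            neighbor = list(current)
            neighbor[i] = not neighbor[i]
            if feasible(neighbor, sw, sw_limit):
                moves.append((hw_cost(neighbor, hw), i, neighbor))
        if not moves:
            continue  # wait for tabu entries to expire
        _, i, current = min(moves)
        tabu_until[i] = it + tenure
        if hw_cost(current, hw) < hw_cost(best, hw):
            best = list(current)
    return best


hw = [2, 5, 6]  # hypothetical hardware cost per task
sw = [3, 6, 6]  # hypothetical software cost per task
start = greedy_partition(hw, sw, sw_limit=9)
refined = tabu_refine(start, hw, sw, sw_limit=9)
print(start, hw_cost(start, hw))      # [True, True, False] 7
print(refined, hw_cost(refined, hw))  # [False, True, False] 5
```

In this toy instance the ratio-based heuristic places two tasks in hardware, and the tabu phase discovers that the second task alone already satisfies the software budget at lower hardware cost.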

10.
The high efficiency video coding (HEVC) standard offers enhanced video compression efficiency at the cost of high performance requirements. To address these requirements, different approaches such as algorithmic optimization, parallelization, and hardware acceleration can be used, which leads to a complex design space. To find an efficient solution, early design verification and performance evaluation are crucial, and the prevailing methodology is simulation of the complex HW/SW architecture. For heterogeneous designs, different simulation models have different performance evaluation capabilities, which makes a combined HW/SW co-analysis of the entire system a cumbersome task. To facilitate this co-analysis, we propose a non-intrusive instrumentation methodology for simulation models that automatically adapts to the model under observation. With the help of this instrumentation methodology, we analyze and explore different design aspects of a SystemC-based heterogeneous multi-core model of an HEVC intra encoder. In the course of this HW/SW co-analysis, various aspects of the parallelization and hardware acceleration of the video coding algorithms are presented and further improved. Because it is cycle accurate, the developed model is well suited to facilitating various performance evaluations and to driving HW/SW co-optimizations of the explored system, as discussed in this paper.

11.
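Hardware/software partitioning is one of the important problems in hardware/software co-design. It involves system modeling, partitioning algorithms, and the evaluation of partitioning schemes, among which the design of the partitioning algorithm is the key point. With the goal of improving system timing performance, a system model is built from the task flow graph, and a priority-based evaluation function is implemented on it. A hardware/software partitioning algorithm that combines search space smoothing with discrete particle swarm optimization is proposed; it resolves the integration of the two techniques and adapts its parameters dynamically according to system information. Experimental results show that the algorithm has a stable time overhead and high solution quality.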

12.
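Design of the 北大众志-SK (PKUnity-SK) system-on-chip for UMPCs   Cited by: 4 (self-citations: 0, citations by others: 4)
Meeting the demands of 3C convergence is the key to the widespread adoption of ultra-mobile personal computers (UMPCs). The 北大众志-SK system-on-chip integrates into a single chip the functions of the many chips distributed across the motherboard of a traditional personal computer: the central processing unit, the northbridge and southbridge chipsets, the display controller, and other input/output control devices. The chip uses 2D/3D extension instructions, a hardware/software cooperative video decoding accelerator, and hardware video encoding and decoding to handle multimedia processing efficiently while reducing the performance demanded of the central processing unit. A multi-level memory architecture inside the chip simplifies the data transfer paths and improves data transfer efficiency, thereby improving system performance. The chip also implements many mainstream input/output interface controllers to meet the everyday needs of personal computer use. The design achieves its goals of high integration, high performance, and low power consumption, and provides a low-cost, low-power, easy-to-use, and easy-to-maintain UMPC solution for education, e-government, personal information processing, and similar fields.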

13.
Hardware/software partitioning is an essential step in hardware/software co-design. For large problems, it is difficult to achieve both good solution quality and short running time. This paper presents an efficient GPU-based parallel tabu search algorithm (GPTS) for HW/SW partitioning. A single GPU kernel for compacting the neighborhood is proposed to reduce, in theory, the number of GPU global memory accesses. A kernel fusion strategy is further proposed to reduce the number of GPU global memory accesses made by GPTS. To further minimize the transfer overhead of GPTS between CPU and GPU, an optimized transfer strategy for GPU-based tabu evaluation is proposed, which accounts for the case in which none of the candidates satisfies the given constraint. Experiments show that GPTS outperforms state-of-the-art tabu search work and is competitive with other methods for HW/SW partitioning. The speedup from the proposed parallelization is significant even on an ordinary GPU platform.

14.
This paper describes the implementation of a reconfigurable hardware-based genetic algorithm (HGA) accelerator using the hardware-software (HW/SW) co-design methodology. This HGA is coupled with a unique TRNG that extracts random jitter from a phase-locked loop (PLL) to ensure proper GA operation. It is then applied and benchmarked in several case studies, which include the optimization of a simple fitness function, a constrained Michalewicz function, and the tuning of parameters in finger-vein biometrics. An HGA solution is necessary in systems that demand high performance during the optimization process. However, implementations that are designed completely in hardware result in a very rigid architecture, making it difficult to reconfigure the system for use in different applications. This paper aims to solve this issue by proposing an HGA design that provides reconfigurability and flexibility by moving problem-dependent processes into software. The prototyping platform is an Altera Stratix II EP2S60 FPGA prototyping board with a clock frequency of 50 MHz. The HW/SW co-design technique is applied, and system partitioning is based on aspects such as system constraints, operational intensity, process sequencing, hardware logic utilization, and reconfigurability. Experimental results show that the proposed HGA outperforms equivalent software implementations, compiled with an open-source C++ GA component library (GAlib) and running on the same prototyping platform, by up to 102 times. In the final case study, applying the proposed HGA to tunable parameter optimization in finger-vein biometrics improved the matching rate, reducing the equal error rate (EER) from 1.004% to 0.101%.

15.
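Wang Pu, Wu Jigang 《计算机科学》 (Computer Science) 2012,39(1):290-294
Hardware/software partitioning is a key step in hardware/software co-design; it determines which components of the system are implemented in software and which in hardware. The hardware/software partitioning problem has been proved to be NP-complete. This paper treats a class of hardware/software partitioning problems as a variant of the 0-1 knapsack problem and constructs a high-quality heuristic solution on the basis of an algorithm for the knapsack problem. In addition, tabu search is used to improve the heuristic solution so that the hardware cost is as small as possible while the software cost and communication cost satisfy given constraints. Experimental results show that the proposed algorithms improve on the latest existing algorithm by up to 28%.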

16.
17.
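Hardware/software partitioning is one of the most important steps in the high-level abstraction stage of embedded system design. In some data-dependent application domains the partitioning environment changes dynamically, so we propose a method for dynamic hardware/software partitioning. The method is based on an evolutionary algorithm named DQCGA. Inspired by the mechanisms of symmetry and complementarity in nature, DQCGA manipulates a pair of complementary probability vectors to adapt to a dynamically changing environment. We systematically carry out the modeling and the definition of the dynamic environment, and then design targeted experiments that compare the method with existing ones. The experimental results demonstrate the feasibility and effectiveness of the method for hardware/software partitioning and show that it performs better than previous methods.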

18.
Tracking systems are important in computer vision, with applications in surveillance, human-computer interaction, etc. Consumer graphics processing units (GPUs) have experienced an extraordinary evolution in both computing performance and programmability, leading to greater use of the GPU for non-rendering applications. In this work we propose a real-time object tracking algorithm, based on the hybridization of particle filtering (PF) and a multi-scale local search (MSLS) algorithm, presented for both CPU and GPU architectures. The developed system provides successful results in precise tracking of single and multiple targets in monocular video, operating in real time at 70 frames per second for 640 × 480 video resolution on the GPU, up to 1,100% faster than the CPU version of the algorithm.

19.
The increasing use of images in miscellaneous applications such as medical image analysis and visual quality inspection has led to growing interest in image processing. However, images are often contaminated with noise, which may corrupt any of the subsequent image processing steps. Noise filtering is therefore often a necessary preprocessing step for most image processing applications. In this paper, an optimized field-programmable gate array (FPGA) design is proposed to implement the adaptive vector directional distance filter (AVDDF) in a hardware/software (HW/SW) codesign context for removing noise from images in real time. To that end, the high-level synthesis (HLS) flow is used through the Xilinx Vivado HLS tool to reduce the design complexity of the HW part. The SW part is developed in C/C++ and executed on an ARM Cortex-A53 processor. Communication between the SW and HW parts uses the Advanced eXtensible Interface stream (AXI-Stream) interface to increase the data bandwidth. Experimental results on the Xilinx ZCU102 FPGA board show a 98% improvement in the processing time of the AVDDF filter for the HW/SW implementation relative to the SW implementation. This result is obtained at the same image quality for the HW/SW and SW implementations in terms of the normalized color difference (NCD) and the peak signal-to-noise ratio (PSNR).

20.
Hardware–software (HW/SW) partitioning divides an application into software and hardware. It is one of the crucial steps in embedded system design. For a given task, hardware with different areas may provide different execution speeds because of the potential for parallel execution in a hardware implementation. Thus, one task may have multiple choices of hardware implementation according to the available hardware area. Existing HW/SW partitioning approaches typically consider only a single hardware implementation, overlooking these multiple choices. This paper presents a computing model for HW/SW partitioning problems with multiple-choice hardware implementations. An efficient heuristic algorithm is proposed to rapidly generate an approximate solution, which is further refined by a tabu search algorithm also customized in this paper. Moreover, a dynamic programming algorithm is proposed to find exact solutions of relatively small problems. Extensive simulation results show that the approximate solutions are very close to the exact ones, and that tabu search refines them to solutions with an error of no more than 1.5% in all cases considered in this paper.
