Similar Documents
20 similar documents found.
1.
Hardware/software (HW/SW) partitioning divides an application into software and hardware parts and is one of the crucial steps in embedded system design. For a given task, hardware implementations with different areas may provide different execution speeds because of the potential for parallel execution in hardware. A task may therefore have multiple hardware implementation choices, depending on the available hardware area. Existing HW/SW partitioning approaches typically consider only a single hardware implementation per task and overlook this multiple-choice aspect. This paper presents a computing model for HW/SW partitioning problems with multiple-choice hardware implementations. An efficient heuristic algorithm is proposed to rapidly generate an approximate solution, which is then refined by a tabu search algorithm also customized in this paper. Moreover, a dynamic programming algorithm is proposed to compute exact solutions for relatively small problems. Extensive simulation results show that the approximate solutions are very close to the exact ones and that tabu search refines them to within an error of 1.5% for all cases considered in this paper.
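To make the multiple-choice aspect concrete, here is a minimal knapsack-style dynamic programming sketch under simplifying assumptions (tasks execute sequentially, software implementations consume no hardware area, communication costs ignored); it illustrates the idea rather than reproducing the paper's algorithm.

```python
# Minimal sketch (not the paper's algorithm): each task has one software
# implementation and several hardware variants with different (time, area);
# minimize total time under an area budget. Assumes sequential execution
# and no communication cost.
import math

def partition_dp(tasks, area_budget):
    """tasks: list of (sw_time, [(hw_time, hw_area), ...]) entries."""
    INF = math.inf
    best = [0.0] + [INF] * area_budget      # best[a] = min time using exactly a area
    for sw_time, hw_options in tasks:
        new_best = [INF] * (area_budget + 1)
        for a in range(area_budget + 1):
            if best[a] == INF:
                continue
            # software implementation: no extra area
            new_best[a] = min(new_best[a], best[a] + sw_time)
            # one of the hardware variants: pay its area, gain its speed
            for hw_time, hw_area in hw_options:
                if a + hw_area <= area_budget:
                    new_best[a + hw_area] = min(new_best[a + hw_area],
                                                best[a] + hw_time)
        best = new_best
    return min(best)

# Two tasks, each with two hardware variants (faster variants need more area).
tasks = [(10.0, [(4.0, 3), (2.0, 5)]),
         (8.0,  [(5.0, 2), (3.0, 4)])]
print(partition_dp(tasks, area_budget=6))   # -> 9.0 (variants (4.0, 3) and (5.0, 2))
```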

2.
Efficient heuristic and tabu search for hardware/software partitioning
Hardware/software (HW/SW) partitioning is a crucial step in HW/SW codesign that determines which components of the system are implemented in hardware and which in software. The HW/SW partitioning problem has been proved to be NP-hard. In this paper, we present two approaches for HW/SW partitioning that aim to minimize the hardware cost while taking software and communication constraints into account. The first is a heuristic approach that treats the HW/SW partitioning problem as an extended 0–1 knapsack problem. In the second approach, tabu search is used to further improve the solution obtained by the proposed heuristic algorithm. Experimental results show that the proposed algorithms outperform a recently reported work by up to 28%.
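As a rough illustration of the knapsack view (the authors' heuristic is more elaborate), the sketch below greedily chooses which nodes to keep in software so as to save the most hardware cost per unit of a software-time budget; all names and numbers are illustrative.

```python
# Greedy sketch of the extended 0-1 knapsack view (illustrative, not the
# paper's heuristic): moving node i to software saves hardware cost h[i]
# but spends software time s[i] out of a budget S. Assumes s[i] > 0.
def knapsack_greedy(h, s, S):
    n = len(h)
    order = sorted(range(n), key=lambda i: h[i] / s[i], reverse=True)
    to_software, used = set(), 0.0
    for i in order:
        if used + s[i] <= S:             # respect the software-time constraint
            to_software.add(i)
            used += s[i]
    hw_cost = sum(h[i] for i in range(n) if i not in to_software)
    return to_software, hw_cost

print(knapsack_greedy(h=[9, 4, 7, 2], s=[3.0, 2.0, 5.0, 1.0], S=6.0))
# -> ({0, 1, 3}, 7): nodes 0, 1, 3 go to software, hardware cost 7 remains
```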

3.
Support Vector Machine (SVM) regression is an important technique in data mining. SVM training is expensive, and its cost is dominated by: (i) the kernel value computation, and (ii) a search operation which finds extreme training data points for adjusting the regression function in every training iteration. Existing training algorithms for SVM regression are not scalable to large datasets because: (i) each training iteration repeatedly performs expensive kernel value computations, which is inefficient and requires holding the whole training dataset in memory; (ii) the search operation used in each training iteration considers the whole search space, which is very expensive. In this article, we significantly improve the scalability and efficiency of SVM regression by exploiting the high performance of Graphics Processing Units (GPUs) and solid state drives (SSDs). Our key ideas are as follows. (i) To reduce the cost of repeated kernel value computations and avoid holding the whole training dataset in the GPU memory, we precompute all the kernel values and store them in the CPU memory extended by the SSD; together with an efficient strategy for reading the precomputed kernel values, reusing precomputed kernel values with efficient retrieval is much faster than computing them on the fly. This also removes the restriction that the training dataset has to fit into the GPU memory, and hence makes our algorithm scalable to large datasets, especially large datasets with very high dimensionality. (ii) To enhance the performance of the frequently used search operation, we design an algorithm that minimizes the search space and the number of accesses to the GPU global memory; this optimized search algorithm also avoids branch divergence (one of the causes of poor performance) among GPU threads to achieve high utilization of GPU resources. Our proposed techniques together form a scalable solution to SVM regression, which we call SIGMA. Our extensive experimental results show that SIGMA is highly efficient and can handle very large datasets which the state-of-the-art GPU-based algorithm cannot handle. On datasets of a size that the state-of-the-art GPU-based algorithm can handle, SIGMA consistently outperforms it by an order of magnitude, achieving up to 86 times speedup.
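The central trick, reusing precomputed kernel values instead of recomputing them every iteration, can be illustrated with a small CPU-side sketch; the shapes, names, and the RBF kernel choice are assumptions, and SIGMA's actual storage spans GPU memory, CPU memory, and the SSD.

```python
# Illustrative sketch of "precompute kernel values once, then reuse them":
# a full RBF kernel matrix is computed up front so each training iteration
# only reads rows instead of recomputing kernel values on the fly.
import numpy as np

def precompute_rbf_kernel(X, gamma):
    sq = np.sum(X * X, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T     # pairwise squared distances
    return np.exp(-gamma * np.maximum(d2, 0.0))

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 50))                     # toy training set
K = precompute_rbf_kernel(X, gamma=0.1)

# Inside a training iteration, kernel values for point i are a cheap lookup:
k_row = K[42]
print(k_row.shape)                                      # (1000,)
```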

4.
Hardware/software partitioning is a key step in HW/SW co-design, and the partitioning result directly affects the design quality of the target system. For a given application, a sound partitioning strategy is therefore essential to make the target system both fast and inexpensive. Because a single task may have several different hardware implementations, multiple-choice HW/SW partitioning reflects real applications more faithfully than the traditional problem with a single hardware implementation per task, but it is also harder to solve; such problems have been proved NP-complete. Targeting multi-core system-on-chip platforms and applications whose task graphs are binary trees, this paper builds a computing model for the multiple-choice HW/SW partitioning problem and proposes a dynamic programming algorithm to solve it. Experimental results show that the proposed dynamic programming algorithm obtains exact solutions efficiently for problems of moderate size, and they illustrate the relationship between the algorithm's computing capability and the hardware area constraint.

5.
With the growing design complexity of embedded systems, hardware/software (HW/SW) partitioning has become a challenging optimization problem in HW/SW co-design. This paper presents a novel HW/SW partitioning method based on position-disturbed particle swarm optimization with invasive weed optimization (PDPSO-IWO). Biologists have observed that ground squirrels produce alarm calls that warn their peers to move away when a predator threatens. In each iteration of the proposed PDPSO algorithm, this escaping behavior is simulated by moving particles away from the global worst particle, which increases population diversity and helps avoid local optima. New initialization and reproduction strategies are also presented to improve the IWO algorithm's search for better positions, with which the global best position can be updated, enhancing search accuracy and solution quality. PDPSO and the improved IWO are combined into a single PDPSO-IWO algorithm that maintains both search diversification and search intensification. Furthermore, a hybrid NodeRank (HNodeRank) algorithm is proposed to initialize the PDPSO-IWO population, further improving solution quality. Since computing the HW/SW communication cost is the most time-consuming part of the partitioning algorithm, GPU parallelization is adopted to accelerate it, which efficiently reduces the runtime of PDPSO-IWO on large-scale HW/SW partitioning problems. Finally, experiments on benchmarks from state-of-the-art publications and on large-scale HW/SW partitioning instances demonstrate that the proposed algorithm achieves higher performance than other algorithms.

6.
New Model and Algorithm for Hardware/Software Partitioning
This paper focuses on algorithmic aspects of hardware/software (HW/SW) partitioning, which seeks a composition of hardware and software components that both satisfies the hardware area constraint and optimizes the execution time. The computational model is extended so that all possible types of communication can be taken into account in the partitioning. A new dynamic programming algorithm is then proposed on the basis of this model, in which the source data of the basic scheduling blocks, rather than the speedup values used in previous work, are directly used to compute the optimal solution. The proposed algorithm runs in O(n·A) time for n code fragments and an available hardware area A. Simulation results show that the proposed algorithm solves the HW/SW partitioning problem without an increase in running time compared with the algorithm cited in the literature.
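As a toy illustration of a dynamic program over code fragments that also charges a communication cost when consecutive fragments land on different sides (a simplified stand-in for the paper's richer communication model), consider the sketch below; all task data are made up.

```python
# Simplified DP sketch (not the paper's exact model): fragments are processed
# in order; placing consecutive fragments on different sides costs comm[i].
# State per step: (hardware area used, side of the previous fragment).
import math

def partition_with_comm(sw, hw, area, comm, A):
    n, INF = len(sw), math.inf
    dp = [[INF, INF] for _ in range(A + 1)]     # dp[a][side], side 0=SW, 1=HW
    dp[0][0] = dp[0][1] = 0.0                   # nothing placed yet
    for i in range(n):
        ndp = [[INF, INF] for _ in range(A + 1)]
        for a in range(A + 1):
            for prev in (0, 1):
                if dp[a][prev] == INF:
                    continue
                switch = comm[i] if i > 0 else 0.0
                # fragment i in software
                c = dp[a][prev] + sw[i] + (switch if prev == 1 else 0.0)
                ndp[a][0] = min(ndp[a][0], c)
                # fragment i in hardware, if the area still fits
                if a + area[i] <= A:
                    c = dp[a][prev] + hw[i] + (switch if prev == 0 else 0.0)
                    ndp[a + area[i]][1] = min(ndp[a + area[i]][1], c)
        dp = ndp
    return min(min(row) for row in dp)

print(partition_with_comm(sw=[10, 6, 8], hw=[3, 2, 4],
                          area=[3, 2, 4], comm=[0, 1, 1], A=5))   # -> 14.0
```

Note that the table holds (A+1)×2 entries per fragment, so this simplified variant also stays within O(n·A) work.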

7.
Hardware/software partitioning and scheduling are key steps in HW/SW co-design and are classic combinatorial optimization problems. This paper proposes an efficient heuristic algorithm for the combined scheduling and partitioning problem. The scheduling algorithm assigns priorities according to each task's out-degree and software execution time: the larger the out-degree, the higher the priority, and among tasks with equal out-degree, the longer the software execution time, the higher the priority. The partitioning algorithm first finds the critical path and then moves the critical-path task with the highest benefit-to-area ratio to hardware. Each iteration updates the schedule length of the current critical path and the remaining hardware area, and the loop continues until the remaining area can no longer accommodate any software task on the critical path, which yields high utilization of the hardware area. Experiments show that the algorithm improves on existing algorithms by up to 38%.
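The critical-path selection loop described above can be sketched roughly as follows, under the assumptions of a DAG of tasks, a schedule length approximated by the longest path, and no communication costs; the function and variable names are illustrative, not the paper's.

```python
# Rough sketch of the "move the best critical-path task to hardware" loop
# (illustrative; the paper also does priority-based scheduling by out-degree).
def longest_path(succ, time):
    """Longest (length, path) through a DAG given successor lists and node times."""
    memo = {}
    def best_from(v):
        if v not in memo:
            opts = [best_from(u) for u in succ[v]]
            if opts:
                length, path = max(opts)
                memo[v] = (time[v] + length, (v,) + path)
            else:
                memo[v] = (time[v], (v,))
        return memo[v]
    return max(best_from(v) for v in range(len(succ)))

def critical_path_greedy(succ, sw, hw, area, A):
    time, on_hw, remaining = list(sw), set(), A     # start with everything in SW
    while True:
        _, path = longest_path(succ, time)
        cand = [v for v in path if v not in on_hw and area[v] <= remaining]
        if not cand:                                # no critical-path SW task fits
            break
        v = max(cand, key=lambda t: (sw[t] - hw[t]) / area[t])   # best benefit/area
        on_hw.add(v)
        remaining -= area[v]
        time[v] = hw[v]
    return on_hw, longest_path(succ, time)[0]

succ = [[2], [2], []]                               # tasks 0 and 1 both feed task 2
print(critical_path_greedy(succ, sw=[8, 5, 9], hw=[3, 2, 4], area=[4, 2, 5], A=9))
# -> ({0, 1}, 12): tasks 0 and 1 move to hardware, schedule length drops to 12
```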

8.
Hardware/software partitioning is one of the important problems in HW/SW co-design; it involves system modeling, partitioning algorithms, and evaluation of partitioning schemes, of which the design of the partitioning algorithm is the key point. Aiming at improving system timing performance, this work builds the system model from a task flow graph, implements a priority-based evaluation function on it, and proposes a partitioning algorithm that combines search-space smoothing with discrete particle swarm optimization, resolving how the two techniques are integrated and adapting the algorithm parameters dynamically according to system information. Experimental results show that the algorithm's running time is stable and the solution quality is high.

9.
王璞  武继刚 《计算机科学》2012,39(1):290-294
Hardware/software partitioning is a key step in HW/SW co-design; it decides which components of a system are implemented in software and which in hardware, and the problem has been proved NP-complete. This work views a class of HW/SW partitioning problems as a variant of the 0-1 knapsack problem and, building on knapsack algorithms, constructs high-quality heuristic solutions for the partitioning problem. A tabu search algorithm is then used to improve the heuristic solutions, making the hardware cost as small as possible while the software cost and communication cost satisfy given constraints. Experimental results show that the proposed algorithm improves on the current state-of-the-art algorithm by up to 28%.

10.
高健  李涛 《计算机工程与设计》2007,28(14):3426-3428
Hardware/software partitioning is one of the key techniques in HW/SW co-design of embedded systems; its central problem is how to balance system performance against cost and achieve the best combination of the two. Targeting architectures with a single CPU and multiple ASICs, this work applies global optimization algorithms, namely a genetic algorithm, tabu search, and simulated annealing, to system-level HW/SW partitioning and compares the effectiveness of the three algorithms.

11.
A hybrid genetic algorithm and tabu search approach to hardware/software partitioning
For the hardware/software partitioning problem of embedded systems, and after comparing the respective strengths and weaknesses of the genetic algorithm (GA) and tabu search (TS), a hybrid genetic/tabu algorithm (GATS) is proposed: the genetic algorithm provides the main framework for parallel search, while tabu search serves as the GA's mutation operator, so that the search of the solution space during mutation is carried out by tabu search. Experimental results show that GATS combines multiple starting points, strong memory, and strong hill-climbing ability, overcoming the GA's weak hill-climbing and TS's single starting point. Comparative experiments against the pure genetic algorithm and pure tabu search confirm that GATS is superior and yields better partitioning results.

12.
张良  徐成  田峥  李涛 《计算机应用》2013,33(7):1898-1902
Hardware/software partitioning is a key step in embedded system design and has been proved to be an NP problem. To address the high computational complexity, slow convergence, and poor global optima of existing algorithms on large task sets, this paper proposes a partitioning method that combines a greedy algorithm with simulated annealing. The HW/SW partitioning problem is first reduced to a variant of the 0-1 knapsack problem, and a greedy algorithm built on knapsack algorithms constructs the initial partitioning; the solution space of the cost function is then divided into suitable regions, a new cost function is designed over those regions, and an improved simulated annealing algorithm performs global optimization starting from the initial partitioning. Experimental results show that, compared with existing similar improved algorithms, the new algorithm improves partitioning quality and running time by up to about 8% and 17%, respectively, demonstrating its efficiency and practicality.
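A bare-bones version of the greedy-initialization-plus-simulated-annealing idea is sketched below, with a generic penalty cost function and cooling schedule; the paper's region-based cost function is not reproduced, and all numbers are illustrative.

```python
# Bare-bones sketch: greedy initial partition refined by simulated annealing.
# Bit i = 1 means fragment i is implemented in hardware; area violations are
# penalized in the cost (illustrative values, not the paper's cost function).
import math, random

def cost(bits, sw, hw, area, A):
    t = sum(hw[i] if b else sw[i] for i, b in enumerate(bits))
    a = sum(area[i] for i, b in enumerate(bits) if b)
    return t + 1000.0 * max(0, a - A)

def greedy_init(sw, hw, area, A):
    bits, used = [0] * len(sw), 0
    for i in sorted(range(len(sw)),                     # best speedup per area first
                    key=lambda i: (sw[i] - hw[i]) / area[i], reverse=True):
        if used + area[i] <= A and sw[i] > hw[i]:
            bits[i], used = 1, used + area[i]
    return bits

def anneal(sw, hw, area, A, T=50.0, alpha=0.95, steps=2000):
    rng = random.Random(0)
    cur = greedy_init(sw, hw, area, A)
    cur_cost = cost(cur, sw, hw, area, A)
    best, best_cost = cur[:], cur_cost
    for _ in range(steps):
        cand = cur[:]
        cand[rng.randrange(len(cand))] ^= 1             # flip one fragment
        c = cost(cand, sw, hw, area, A)
        if c < cur_cost or rng.random() < math.exp((cur_cost - c) / T):
            cur, cur_cost = cand, c
        if cur_cost < best_cost:
            best, best_cost = cur[:], cur_cost
        T = max(alpha * T, 1e-3)
    return best, best_cost

print(anneal(sw=[12, 7, 5, 9], hw=[4, 3, 2, 5], area=[5, 3, 2, 4], A=8))
# typically -> ([1, 1, 0, 0], 21.0): fragments 0 and 1 end up in hardware
```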

13.
System-level hardware/software partitioning via a genetic algorithm
This paper introduces the use of a genetic algorithm to solve the hardware/software partitioning problem. It discusses, in the course of implementing the genetic algorithm, encoding and decoding, the choice of the fitness function, the implementation of the selection, crossover, and mutation operators, and the determination of the convergence criterion, compares these choices with the treatments in the published literature, and finally obtains good results in randomized experiments.
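The sketch below shows one common way to set up the ingredients mentioned above: a binary encoding, a penalty-based fitness, one-point crossover, and bit-flip mutation. The operators and constants are illustrative assumptions, not the paper's exact choices.

```python
# Minimal GA sketch for partitioning (illustrative operators and penalty).
# Chromosome: bit i = 1 means task i is implemented in hardware.
import random

def fitness(bits, sw, hw, area, A):
    t = sum(hw[i] if b else sw[i] for i, b in enumerate(bits))
    a = sum(area[i] for i, b in enumerate(bits) if b)
    penalty = 1000.0 * max(0, a - A)      # penalize area violations
    return -(t + penalty)                 # larger fitness = shorter feasible time

def ga_partition(sw, hw, area, A, pop_size=30, gens=100, pm=0.05):
    n = len(sw)
    rng = random.Random(1)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(gens):
        scored = sorted(pop, key=lambda b: fitness(b, sw, hw, area, A),
                        reverse=True)
        next_pop = scored[:2]                                # elitism
        while len(next_pop) < pop_size:
            p1, p2 = rng.sample(scored[:pop_size // 2], 2)   # truncation selection
            cut = rng.randrange(1, n)                        # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < pm else b for b in child]  # mutation
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=lambda b: fitness(b, sw, hw, area, A))

best = ga_partition(sw=[9, 7, 5, 8], hw=[3, 2, 2, 4], area=[4, 3, 2, 5], A=7)
print(best)   # e.g. [1, 1, 0, 0]: tasks 0 and 1 on hardware for this toy instance
```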

14.
15.
An NSGA-II based hardware/software partitioning method for embedded systems
Hardware/software partitioning is a key problem in HW/SW co-design. For single-processor embedded systems, this work applies NSGA-II to HW/SW partitioning; a single run of the algorithm yields multiple Pareto-optimal solutions, providing an effective tool for trade-off analysis among the objective functions and improving design efficiency. The results show that, while satisfying the system performance requirements, the method provides globally optimized solutions over multiple design objectives for complex embedded systems.
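The core of NSGA-II is non-dominated sorting of candidate partitionings by their objective vectors. A simple quadratic-time variant (not Deb's optimized fast sort, and not the full NSGA-II loop) is sketched below for two objectives such as execution time and hardware area.

```python
# Non-dominated sorting sketch for two minimization objectives
# (execution time, hardware area); candidate data are illustrative.
def dominates(p, q):
    """p dominates q if p is no worse in every objective and better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def non_dominated_fronts(points):
    remaining, fronts = list(range(len(points))), []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

# Each point is (execution_time, hardware_area) of one candidate partitioning.
candidates = [(18, 7), (20, 6), (24, 2), (22, 7), (16, 9)]
print(non_dominated_fronts(candidates))
# -> [[0, 1, 2, 4], [3]]: the first front is the Pareto set offered to the designer
```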

16.
A hardware/software partitioning methodology for improving performance in single-chip systems composed of a processor and Field Programmable Gate Array (FPGA) reconfigurable logic is presented. Speedups are achieved by executing critical software parts on the reconfigurable logic. The methodology considers a hybrid System-on-Chip platform that can model the majority of existing processor-FPGA systems. The partitioning method uses an automated kernel-identification process at the basic-block level to detect critical kernels in applications. Three different instances of the generic platform and two sets of benchmarks are used in the experiments. The analysis of five real-life applications showed that they spend on average 69% of their instruction count in, on average, 11% of their code. The extensive experiments show that for systems built around 32-bit processors the speedups of the five applications range from 1.3 to 3.7 relative to an all-software solution. For a platform built around an 8-bit processor, the performance gains of eight DSP algorithms are considerably greater, with an average speedup of 28.

17.
The BOAR emulation system targets hardware/software (HW/SW) codevelopment of advanced embedded DSP and telecom systems. Its key challenges are efficient customization of the programmable hardware and a dedicated partitioning routine for the target applications and structures, which together allow quite high overall system performance. The system supports multiple configurations for communication between processors and field programmable gate arrays (FPGAs), making BOAR an efficient tool for real-time HW/SW coverification. The reprogrammable hardware of the emulation tool is based on four Xilinx 4000-series devices, two Texas Instruments TMS320C50 signal processors, and one Motorola MC68302 microcontroller. With current devices the BOAR hardware provides approximately 40–70 kgates of logic capacity in DSP applications, and the emulation capacity can be expanded by chaining several similar boards. The system also has a versatile internal reprogrammable test environment for test-bench development, performance evaluation, and design debugging. The logic development environment is based on the Synopsys synthesis tools and automatic design-management software that performs resource mapping and performance-driven design partitioning between FPGAs. The emulation hardware is currently connected to the logic and software development environments via an RS-232C bus. The BOAR emulation system has proved to be a very efficient platform for real-life prototyping of different types of DSP algorithms and systems and for validating the correct functionality of a VHDL macro library.

18.
钟丽  刘彦  余思洋  谢中 《计算机应用》2015,35(5):1412-1416
To address the long development cycles in system-level design of existing elliptic curve algorithms and the unclear performance and overhead figures of individual modules, a hardware/software (HW/SW) co-design method based on electronic system level (ESL) design is proposed. By analyzing the principles and implementation options of the SM2 (ShangMi 2) algorithm, the method studies different HW/SW partitioning schemes and uses SystemC to build cycle-accurate models of the hardware modules. Module-level and system-level verification are used to compare the execution cycle counts of the hardware and software modules and to determine the partitioning with the best performance. Finally, combining the algorithm's control flow graph (CFG) and data flow graph (DFG), the ESL model is converted into a register transfer level (RTL) model for logic synthesis and comparison; under a 180 nm CMOS process at 50 MHz, the best-performing configuration gives a point-multiplication module with an execution time of 20 ms, 83 000 gates, and a power consumption of about 2.23 mW. The experimental results show that the proposed system-level architecture analysis offers clear advantages and broad applicability in evaluating the performance, area, and power of elliptic-curve cryptographic chips, and that an embedded system-on-chip (SoC) based on this algorithm can choose a suitable structure according to its performance and resource constraints.

19.
A naive execution of depth-first search (DFS) on large graphs in a GPU cluster leads to load imbalance among threads and uncoalesced memory accesses, resulting in poor performance. To substantially improve performance on both single-GPU and multi-GPU platforms, the data are rearranged through a series of effective operations before processing. A new technique is proposed for constructing the mapping between threads and data, using prefix sums and binary search to achieve perfect load balance. To reduce communication overhead, a pruning operation is applied to the edge sets that need to be exchanged in each DFS branch. Experimental results show that the algorithm achieves close to the best possible parallelism on a single GPU and minimizes communication overhead in multi-GPU environments; on a GPU cluster it can efficiently perform distributed DFS on graphs containing billions of vertices.
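The prefix-sum plus binary-search mapping can be illustrated with a small CPU-side sketch: given the out-degrees of the vertices in the current frontier, each unit of edge work (standing in for a GPU thread) locates its vertex and local edge index by binary search over the prefix sums, so the work per thread is even regardless of degree skew. The data and function names are illustrative, not taken from the paper.

```python
# CPU-side sketch of the prefix-sum + binary-search thread-to-edge mapping
# used to balance edge work across GPU threads.
from bisect import bisect_right
from itertools import accumulate

def map_threads_to_edges(frontier_degrees):
    """For each unit of edge work, return (vertex index, local edge index)."""
    prefix = list(accumulate(frontier_degrees))          # inclusive prefix sums
    total = prefix[-1] if prefix else 0
    mapping = []
    for tid in range(total):                             # tid plays the GPU thread id
        v = bisect_right(prefix, tid)                    # which vertex owns this edge
        start = prefix[v - 1] if v > 0 else 0
        mapping.append((v, tid - start))                 # local offset inside v's list
    return mapping

# Frontier with very skewed out-degrees (2, 5, 1 outgoing edges):
print(map_threads_to_edges([2, 5, 1]))
# -> [(0, 0), (0, 1), (1, 0), (1, 1), (1, 2), (1, 3), (1, 4), (2, 0)]
```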

20.