首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
An energy-aware online task mapping algorithm in NoC-based system   总被引:1,自引:1,他引:0  
With the development of the semiconductor technology, more processors can be integrated onto a single chip. Network-on-Chip is an efficient communication solution for many-core system. However, enhancing performance with lower energy consumption is still a challenge. One critical issue is mapping applications to NoC. This work proposed an online mapping method, which optimizes task mapping algorithm to reduce communication energy consumption. The communication status of applications at runtime is analyzed first. Then, the algorithm computes the mapping placement dynamically and implements the real-time mapping online. Experimental results based on simulation show that the algorithm proposed in this article can achieve more than 20% communication energy saving compared with first fit mapping and nearest neighbor mapping. The migration cost caused by the remapping process is also considered, and can be calculated at the runtime to estimate the effect of remapping.  相似文献   

2.
Dynamic task mapping for Network-on-Chip based systems   总被引:1,自引:0,他引:1  
Efficiency of Network-on-Chip (NoC) based multi-processor systems largely depends on optimal placement of tasks onto processing elements (PEs). Although number of task mapping heuristics have been proposed in literature, selecting best technique for a given environment remains a challenging problem. Keeping in view the fact that comparisons in original study of each heuristic may have been conducted using different assumptions, environment, and models. In this study, we have conducted a detailed quantitative analysis of selected dynamic task mapping heuristics under same set of assumptions, similar environment, and system models. Comparisons are conducted with varying network load, number of tasks, and network size for constantly running applications. Moreover, we propose an extension to communication-aware packing based nearest neighbor (CPNN) algorithm that attempts to reduce communication overhead among the interdependent tasks. Furthermore, we have conducted formal verification and modeling of proposed technique using high level Petri nets. The experimental results indicate that proposed mapping algorithm reduces communication cost, average hop count, and end-to-end latency as compared to CPNN especially for large mesh NoCs. Moreover, proposed scheme achieves up to 6% energy savings for smaller mesh NoCs. Further, results of formal modeling indicate that proposed model is workable and operates according to specifications.  相似文献   

3.
Networks on Chip (NoC) have emerged as the key paradigm for designing a scalable communication infrastructure for future Systems on Chip (SoC). An important issue in NoC design is how to map an application on this architecture and how to determine the hardware/software partition that satisfies the performance, cost and flexibility requirements. In this paper, we propose an approach that concurrently optimizes the mapping and the partitioning of streaming applications. The proposed approach exploits multiobjective evolutionary algorithms that are fed by execution performances scores corresponding to the evaluated mappings and partitioning ability to pipeline execution of the streaming application. As result, most promising solutions are highlighted for mapping multimedia applications onto a SoC architecture interconnecting 16 nodes through 2D-Mesh and Ring NoC.  相似文献   

4.
针对NoC任务映射问题中时延难以预测和启发式算法效率低的问题,提出一个时延改进模型和近邻随机遗传算法。该模型从宏观的链路负载分布和单个节点的排队时延两方面来构建NoC映射的时延模型,通过引入时延因子、权重系数来刻画不同映射方案对时延性能的影响,避免了NoC通信时延精确建模的难题。提出近邻随机思想来构建遗传算法的初始种群,并且运用该算法实现了面向时延的NoC映射,在达到全局最优的情况下,比经典遗传算法效率提升将近20%。实验结果表明,该算法优于现有的经典遗传算法和随机映射方案。  相似文献   

5.
随着处理器核数的增加,片上互连网络NoC结构日趋复杂,导致片上互连网络功耗所占的比重和功耗分析的难度也在增加。片上互连网络的任务映射,既要保证多处理器核心之间通信的高性能,又要保证耗费尽可能少的功耗和面积,即在有限的功耗和面积开销下获得较高的性能。在进行任务映射时,核心之间的通信距离是减少任务通信功耗的关键。连续且近凸的区域有助于缩短任务的通信距离。分析了一种功耗最优的片上互连网络启发式映射算法(INC),该算法由区域选择算法和节点映射算法组成。对区域选择算法的2个因子进行了改进,使应用总的通信开销最小化且保证后续应用以很小的通信代价进行区域选择。提出了新的基于选择区域的映射算法。它们在动态到达程序映射问题中的实验结果表明,新的区域选择算法和节点映射算法相比于INC,可以减少12.10%的通信功耗,并且带来11.23%的通信延迟优化。  相似文献   

6.
3-D Networks-on-Chip(NoC) emerge as a potent solution to address both the interconnection and design complexity problems facing future Multiprocessor System-on-Chips(MPSoCs).Effective run-time mapping on such 3-D NoC-based MPSoCs can be quite challenging,as the arrival order and task graphs of the target applications are typically not known a priori,which can be further complicated by stringent energy requirements for NoC systems.This paper thus presents an energy-aware run-time incremental mapping algorithm(ERIM) for 3-D NoC which can minimize the energy consumption due to the data communications among processor cores,while reducing the fragmentation effect on the incoming applications to be mapped,and simultaneously satisfying the thermal constraints imposed on each incoming application.Specifically,incoming applications are mapped to cuboid tile regions for lower energy consumption of communication and the minimal routing.Fragment tiles due to system fragmentation can be gleaned for better resource utilization.Extensive experiments have been conducted to evaluate the performance of the proposed algorithm ERIM,and the results are compared against the optimal mapping algorithm(branch-and-bound) and two heuristic algorithms(TB and TL).The experiments show that ERIM outperforms TB and TL methods with significant energy saving(more than 10%),much reduced average response time,and improved system utilization.  相似文献   

7.
针对将计算任务合理地映射到三维片上网络(NoC)的问题,提出了一种基于遗传算法(GA)的改进算法。GA具有快速随机的搜索能力,Prim算法可在加权连通图内得到最小生成树,改进算法结合了两种算法的优势,将计算任务合理地分配到各个网络节点,对于优化三维片上网络功耗和散热等问题具有很高的效率。通过仿真实验,对所提出的基于Prim算法的改进GA与基本GA的3D NoC映射算法进行了对比,仿真结果显示,基于Prim算法的改进GA平均功耗更低,从总体趋势来看,处理单元数量的增加与功耗降低幅度成正相关,在101个处理单元情况下,平均功耗比基本GA降低32%。  相似文献   

8.
Efficient run-time mapping of tasks onto Multiprocessor System-on-Chip (MPSoC) is very challenging especially when new tasks of other applications are also required to be supported at run-time. In this paper, we present a number of communication-aware run-time mapping heuristics for the efficient mapping of multiple applications onto an MPSoC platform in which more than one task can be supported by each processing element (PE). The proposed mapping heuristics examine the available resources prior to recommending the adjacent communicating tasks on to the same PE. In addition, the proposed heuristics give priority to the tasks of an application in close proximity so as to further minimize the communication overhead. Our investigations show that the proposed heuristics are capable of alleviating Network-on-Chip (NoC) congestion bottlenecks when compared to existing alternatives. We map tasks of applications onto an 8 × 8 NoC-based MPSoC to show that our mapping heuristics consistently leads to reduction in the total execution time, energy consumption, average channel load and latency. In particular, we show that energy savings can be up to 44% and average channel load is improved by 10% for some cases.  相似文献   

9.
一种基于遗传算法的片上网络映射算法   总被引:1,自引:0,他引:1       下载免费PDF全文
片上网络NoC以其高可扩展性成为片上多核的互连解决方案。IP核到NoC结点的映射是片上网络设计的重要阶段。映射对芯片的性能和功耗有重要的影响。本文详细阐述了映射算法的研究现状,给出了映射算法的分类方法,并且分析各种方法的特点。最后,给出一种采用顺序表示的基于遗传算法的NoC映射算法。实验结果表明,该映射算法能够取得较好的准确性和较高的效率。  相似文献   

10.
针对采用2D-Torus拓扑结构且支持电压频率岛(VFI)的异步片上网络能耗优化问题,提出了具有可靠性的、基于电压频率岛的划分和分配及片上网络任务映射的能耗优化方法.该方法采用递进优化的方式,根据IP核的动态处理能耗,不同电压频率岛之间的转换能耗和可靠性带来的能耗开销定义了IP核在电压频率岛之间移动的阈值函数,并通过对阈值函数进行判断完成电压频率岛的划分和分配,应用基于三元相关性量子粒子群优化算法完成处理单元到资源节点的映射,在映射中考虑保证系统可靠性的通信开销,对异步片上网络系统的可靠性进行优化.实验结果表明,该算法可以在不过多消耗能耗的情况下显著的改善片上网络系统的可靠性,且可有效降低NOC系统的能耗.  相似文献   

11.
保证QoS的片上网络低能耗映射与路由方法   总被引:3,自引:0,他引:3  
为解决二维mesh片上网络的服务质量和低能耗问题,提出基于最优化搜索的拓扑映射与路由方法Q-LEMR.该方法以降低芯片通信能耗为目标,在保证系统延迟与带宽的服务质量的前提下,自动将给定应用的IP核映射到片上网络结构上,并为通信踪迹定制设计确定的、非死锁的最短路径路由;同时通过加速策略使映射和路由的计算在可接受的时间范围内完成.实验结果表明,Q—LEMR较现有工作平均降低通信能耗28.8%,并满足服务质量要求.  相似文献   

12.
基于蚁群优化算法的NoC映射   总被引:4,自引:0,他引:4  
功耗问题正逐渐成为NoC领域的研究热点,很多研究人员都在研究NoC功耗最小化的设计技术。文章采用一种有效的蚁群优化算法实现了NoC映射:在自动映射处理单元的同时,尽可能地减少了系统的通讯功耗。实验结果表明采用蚁群优化算法可以很快地收敛;针对不同的应用,可以减少25%-70%通讯功耗。  相似文献   

13.
When a number of applications simultaneously running on a many-core chip multiprocessor (CMP) chip connected through network-on-chip (NoC), significant amount of on-chip traffic is one-to-many (multicast) in nature. As a matter of fact, when multiple applications are mapped onto an NoC architecture with applicable traffic isolation constraints, the corresponding sub-networks of these applications are mapped onto actually tend to be irregular. In the literature, multicasting for irregular topologies is supported through either multiple unicasting or broadcasting, which, unfortunately, results in overly high power consumption and/or long network latency. To address this problem, a simple, yet efficient hardware-based multicasting scheme is proposed in this paper. First, an irregular oriented multicast strategy is proposed. Literally, following this strategy, an irregular oriented multicast routing algorithm can be designed based on any regular mesh based multicast routing algorithm. One such algorithm, namely, Alternative Recursive Partitioning Multicasting (AL + RPM), is proposed based on RPM, which was designed for regular mesh topology originally. The basic idea of AL + RPM is to find the output directions following the basic RPM algorithm and then decide to replicate the packets to the original output directions or the alternative (AL) output directions based on the shape of the sub-network. The experiment results show that the proposed multicast AL + RPM algorithm can consume, on average, 14% and 20% less power than bLBDR (a broadcasting-based routing algorithm) and the multiple unicast scheme, respectively. In addition, AL + RPM has much lower network latency than the above two approaches. To incorporate AL + RPM into a baseline router to support multicasting, the area overhead is fairly modest, less than 5.5%.  相似文献   

14.
Network-on-Chip (NoC) has been proposed to overcome the complex on-chip communication problem of System-on-Chip (SoC) design in deep sub-micron. A complete NoC design contains exploration on both hardware and software architectures. The hardware architecture includes the selection of Processing Elements (PEs) with multiple types and their topology. The software architecture contains allocating tasks to PEs, scheduling of tasks and their communications. To find the best hardware design for the target tasks, both hardware and software architectures need to be considered simultaneously. Previous works on NoC design have concentrated on solving only one or two design parameters at a time. In this paper, we propose a hardware–software co-synthesis algorithm for a heterogeneous NoC architecture. The design goal is to minimize energy consumption while meeting the real-time requirements commonly seen in embedded applications. The proposed algorithm is based on Simulated-Annealing (SA). To compare the solution quality and efficiency of the proposed algorithm, we also implement the branch-and-bound and iterative algorithm to solve the hardware–software co-synthesis problem of a heterogeneous NoC. With the given synthetic task sets, the experimental results show that the proposed SA-based algorithm achieves near-optimal solution in a reasonable time, while the branch-and-bound algorithm takes a very long time to find the optimal solution, and the iterative algorithm fails to achieve good solution quality. When applying the co-synthesis algorithms to a real-world application with PE library that has little variation in PE performance and energy consumption, the iterative algorithm achieves solution quality comparable to that of the proposed SA-based algorithm.  相似文献   

15.
Many-core architectures based on network-on-chip (NoC) are scalable and have the ability to meet the increasing performance requirements of complex concurrent applications (real-time video, communications, control, etc.). This paper addresses the mapping problem of hard real-time task sets on NoC-based many-core processors. Our main contribution is a novel static mapping scheme called K-level task splitting (KTS). If a task cannot be allocated on a given core of the NoC because of a too high processing utilization ratio, it gets replicated so that its jobs execute on more than one core without migrating. Synchronization between task replicas could then be ensured by assigning offsets and virtual deadlines to them. KTS’s advantage is that data migration is not required, thus involving no overheads due to migrations. The only requirement is that all core clocks be synchronized within the NoC. In this newly proposed algorithm, the schedulability of each task is determined based upon fundamental results relative to the feasibility analysis of asynchronous real-time task sets. The paper describes the principles of task splitting, our algorithm and its properties. We evaluate the efficiency of KTS, demonstrating that it is a good compromise between existing semi-partitioned schemes (with possible migrations) and fully partitioned approaches.  相似文献   

16.

The rapid increase in the number of cores on chips forced the designers to invent new communication methods such as Network-on-Chip (NoC) paradigm. Advances in integrated circuit fabrications even allowed three-dimensional NoC (3D-NoC) implementations. 3D-NoCs have more advantages than their 2D counterparts such as lower area, higher throughput, better performance, and less energy consumption. However, they lack the design automation algorithms. An important design problem for a given application is mapping it on a 3D-NoC topology. In this paper, we present an integer linear programming (ILP) formulation and a novel heuristic algorithm, called CastNet3D, for application mapping onto mesh-based 3D-NoCs with energy minimization being the objective. The algorithm tries to utilize vertical links for communicating nodes as much as possible. Vertical links are shorter than horizontal ones; therefore, they are faster and consume less energy. We compared CastNet3D against ILP in terms of energy consumption and execution time on several benchmarks. Our results show that CastNet3D obtains close to optimum results in much shorter time frames.

  相似文献   

17.
将一个应用程序部署到给定的片上网络上执行时,需要将应用程序中的每一个子任务都指派给片上网络中的一个节点执行。该问题一般被建模成一组子任务作为顶点的有向无环图,任务在片上网络上的部署过程就等同于一个有向无环图的顶点向一个片上网络拓扑映射的过程。而随着应用程序和片上网络规模的增大,计算一个最优的映射方案是典型的难解问题。为了加速有向无环图到片上网络拓扑的映射过程,提出了有向无环图的归约算法,使归约后的图中的顶点数量尽可能地与给定片上网络中的节点数量相同。提出的图归约算法可以有效地识别出所有可归约子图,这些可归约子图可被归约为单一顶点。新算法的适用范围从嵌套图扩展到了任意图,并且拥有与原算法相同的复杂度量级。还提出了一种并行化的算法思想来加速可归约子图的搜索过程。  相似文献   

18.
Application mapping in 2-D mesh-based Network-on-Chip (NoC) architecture is an optimization problem in which each application task (e.g., processor or memory units) should be mapped one-to-one onto a network element (switch or router) to optimize performance requirements (e.g., communication energy or communication latency) under certain platform constraints (e.g., bandwidth and/or latency). Network-on-Chip is a scheme that establishes links between limited application-specific components within Multi-Processor System-on-Chip (MPSoC), but it has a vital role to ensure the maximum data transfer rate and reduce total number of physical interconnections. Most of the works on heuristic application mapping for mesh-based NoC design aim to minimize both total communication energy and run-time, however they experience the following issues: (i) relatively high CPU time due to linear search for the task and tile mapping combinations, (ii) consumption of relatively high communication energy due to random tile selection when two or more tiles are equivalent in terms of average weighted distance by their adjacent mapped tasks, and (iii) even after constructive application mapping, some of the tasks consume higher communication energy due to their inappropriate placements. In this paper we present a low time-complexity heuristic mapping algorithm of weighted application graph under permissible bandwidth constraint to minimize communication energy of 2-D mesh-based NoC architecture. The experimental results of multimedia benchmarks, as well as randomly generated samples show the low communication energy as well as time-complexity under bandwidth constraints in comparison to the recent heuristic application mapping approaches. In our approach, the communication energy is also close to the optimal solution obtained by Integer Linear Programming (ILP).  相似文献   

19.
In this paper, we describe how our computational model can be used for the problems of processor allocation and task mapping. The intended applications for this model include the dynamic mapping problems of shrinking or spreading an existing mapping when the available pool of processors changes during execution of the problem. The concept of problem edge class and other features of our model are developed to realistically and efficiently support task partitioning and merging for static and dynamic mapping. The model dictates realistic changes in the computation and communication characteristics of a problem when the problem partitioning is modified dynamically. This model forms the basis of our algorithms for shrinking and spreading, and yields realistic results for a variety of problems mapped onto real systems. An emulation program running on a network of workstations under PVM is used to measure execution times for the mapping solutions found by the algorithms. The results indicate that the problem edge class is a crucial consideration for processor allocation and task mapping  相似文献   

20.
Network-on-chip (NoC) communication architectures present promising solutions for scalable communication requests in large system-on-chip (SoC) designs. Intellectual property (IP) core assignment and mapping are two key steps in NoC design, significantly affecting the quality of NoC systems. Both are NP-hard problems, so it is necessary to apply intelligent algorithms. In this paper, we propose improved intelligent algorithms for NoC assignment and mapping to overcome the draw-backs of traditional intelligent algorithms. The aim of our proposed algorithms is to minimize power consumption, time, area, and load balance. This work involves multiple conflicting objectives, so we combine multiple objective optimization with intelligent algorithms. In addition, we design a fault-tolerant routing algorithm and take account of reliability using comprehensive performance indices. The proposed algorithms were implemented on embedded system synthesis benchmarks suite (E3S). Experimental results show the improved algorithms achieve good performance in NoC designs, with high reliability.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号