
1.

Application mapping to mesh NoCs using a Tabu-search based swarm optimization

《Microprocessors and Microsystems》2017

A hybrid optimization scheme is presented that combines Tabu search, communication-volume-based core swapping, and Discrete Particle Swarm Optimization (DPSO) for NoC (Network-on-Chip) mapping. The main goal of the optimization is to map an application core-graph so that the overall communication latency of the NoC is minimal. The target NoC is assumed to have a 2D-mesh topology. DPSO is the main optimization technique: each swarm particle's move is influenced by the global and local bests, previously visited search-space locations, and a deterministic method that reduces the communication volume of the existing mapping. We employ a Tabu list to discourage swarm particles from revisiting already explored regions of the search space and to propose an alternative direction close to the intended movement direction. The methodology is tested on several multimedia applications as well as on large, randomly generated synthetic core-graphs. For larger applications, our hybrid scheme generates higher-quality NoC mapping solutions than existing DPSO-based techniques.
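The cost that such mapping schemes minimize is commonly modeled as communication volume times hop distance on the mesh. The sketch below is a minimal illustration under that assumption and shows only the Tabu-list idea, applied to a plain pairwise-swap local search; it is not the authors' hybrid DPSO scheme, and the names `hop_distance`, `comm_cost`, and `tabu_swap_search` are hypothetical. Tiles are numbered row-major, and the number of cores is assumed to equal the number of tiles.

```python
from itertools import combinations

# Hypothetical helpers: tiles are numbered row-major on a mesh with `cols`
# columns, and mapping[core] = tile index (assumes #cores == #tiles).


def hop_distance(a, b, cols):
    """Manhattan hop count between tiles a and b (XY routing)."""
    ar, ac = divmod(a, cols)
    br, bc = divmod(b, cols)
    return abs(ar - br) + abs(ac - bc)


def comm_cost(mapping, edges, cols):
    """Sum of volume x hops over all core-graph edges {(src, dst): volume}."""
    return sum(vol * hop_distance(mapping[s], mapping[d], cols)
               for (s, d), vol in edges.items())


def tabu_swap_search(mapping, edges, cols, iters=100, tenure=5):
    """Best-neighbor pairwise-swap search with a fixed-length tabu list."""
    current = list(mapping)
    best, best_cost = list(current), comm_cost(current, edges, cols)
    tabu = []  # recently applied swaps, forbidden for `tenure` iterations
    for _ in range(iters):
        candidates = []
        for i, j in combinations(range(len(current)), 2):
            trial = list(current)
            trial[i], trial[j] = trial[j], trial[i]
            cost = comm_cost(trial, edges, cols)
            # aspiration criterion: a tabu move is allowed if it beats the best so far
            if (i, j) not in tabu or cost < best_cost:
                candidates.append((cost, (i, j), trial))
        if not candidates:
            break
        # move to the best admissible neighbor, even if it is worse than `current`
        cost, move, current = min(candidates, key=lambda c: c[0])
        tabu.append(move)
        if len(tabu) > tenure:
            tabu.pop(0)
        if cost < best_cost:
            best, best_cost = list(current), cost
    return best, best_cost
```

For a four-core chain with edge volumes 10, 10, 10 and a light closing edge of volume 1, mapped onto a 2×2 mesh, the search reaches a mapping of cost 31, in which every edge spans a single hop.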

2.

A NoC-based simulator for design and evaluation of deep neural networks

《Microprocessors and Microsystems》2020

The astonishing development in the field of artificial neural networks (ANNs) has brought significant advances in many application domains, such as pattern recognition, image classification, and computer vision. ANNs imitate neuron behavior and make decisions or predictions by learning patterns and features from a given data set. To reach higher accuracy, neural networks are getting deeper, and consequently the computation and storage demands on hardware platforms are steadily increasing. In addition, the massive data communication among neurons makes the interconnection more complex and challenging. To overcome these challenges, ASIC-based DNN accelerators are being designed, which usually incorporate customized processing elements, a fixed interconnection, and large off-chip memory storage. As a result, DNN computation involves many memory accesses due to frequent loading and offloading of data, which significantly increases energy consumption and latency. The rigid architecture and interconnection among processing elements also limit the efficiency of such platforms to specific applications. In recent years, Network-on-Chip-based (NoC-based) DNNs have become an emerging design paradigm, because the NoC interconnection can help reduce off-chip memory accesses while offering better scalability and flexibility. To evaluate NoC-based DNNs in the early design stage, we introduce a cycle-accurate NoC-based DNN simulator called DNNoC-sim. To support the various operations in modern DNN models, such as convolution and pooling, we first propose a DNN flattening technique that converts diverse DNN operations into MAC-like operations. In addition, we propose a DNN slicing method to evaluate large-scale DNN models on a resource-constrained NoC platform. The evaluation results show a significant reduction in off-chip memory accesses compared to the state-of-the-art DNN model. We also analyze the performance and discuss the trade-offs between different design parameters.
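The flattening idea, reducing convolution and pooling to multiply-accumulate (MAC) operations, can be illustrated with the generic im2col-style lowering below. This is a minimal sketch under simple assumptions (single channel, stride 1 for convolution, stride k for pooling, no padding); the abstract does not give DNNoC-sim's actual flattening procedure, and the function names are hypothetical. Max pooling is not a MAC and is left out.

```python
def conv2d_as_macs(image, kernel):
    """Stride-1, no-padding 2-D convolution (cross-correlation form) computed
    purely as multiply-accumulate (MAC) operations over flattened patches."""
    h, w, k = len(image), len(image[0]), len(kernel)
    weights = [kernel[i][j] for i in range(k) for j in range(k)]  # flattened once
    out = []
    for r in range(h - k + 1):
        row = []
        for c in range(w - k + 1):
            # flatten the k x k receptive field into a vector (im2col-style)
            patch = [image[r + i][c + j] for i in range(k) for j in range(k)]
            acc = 0
            for x, wt in zip(patch, weights):
                acc += x * wt  # one MAC
            row.append(acc)
        out.append(row)
    return out


def avg_pool_as_macs(image, k):
    """k x k average pooling with stride k, as MACs with constant weight 1/(k*k)."""
    coeff = 1.0 / (k * k)
    out = []
    for r in range(0, len(image) - k + 1, k):
        row = []
        for c in range(0, len(image[0]) - k + 1, k):
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    acc += image[r + i][c + j] * coeff  # MAC with a constant weight
            row.append(acc)
        out.append(row)
    return out
```

Once every layer is expressed this way, each output element becomes an independent sequence of MACs that can be scheduled on any processing element, which is what makes such a lowering attractive for a NoC of generic PEs.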

3.

An analytical model of broadcast in QoS-aware wormhole-routed NoCs

Mahmoud Moadeli, Wim Vanderbauwhede 《Journal of Systems and Software》2011,84(1):12-20

Networks-on-Chip (NoCs) emerged to address the technological and design issues involved in developing large systems-on-chip (SoCs). Because applications have diverse performance requirements, most NoC architectures offer support for quality of service (QoS). To use the available bandwidth efficiently, they may also implement mechanisms for delivering collective communication operations. This paper presents an analytical model that predicts the average latency of wormhole-routed prioritized broadcast communication in NoCs. The model assumes that the network uses an all-port router scheme and offers differentiated-services-based QoS. To verify the analysis, the model predictions are compared against results obtained from a discrete-event simulator developed using OMNeT++.

4.

Research on Automated Design of NoC Mapping and Communication Parameters Based on Wormhole Switching

Cao Yafei, Wang Dawei, Li Sikun 《Computer Engineering & Science》2010,32(11):111-113

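NoC mapping and communication-parameter design are crucial parts of the NoC design process, and their results directly affect NoC performance, area, and power consumption. This paper treats the NoC mapping problem and the communication-parameter design problem jointly. It first gives a formal definition of the NoC mapping problem and then proposes a wormhole-switching-based method for analyzing NoC latency. Guided by the application's communication latency constraints, the application model is mapped onto the NoC topology and the NoC communication parameters are designed automatically. Experiments show that the proposed latency analysis method is 7% to 13% more accurate than previous methods, and that the resulting mappings and communication-parameter designs are better.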
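As a reference point for wormhole-switching latency analysis, the usual first-order, contention-free estimate is the header flit's per-router delay summed over all hops plus the serialization time of the packet body. The sketch below assumes that simplified model, which ignores contention and blocking (exactly what a more accurate analysis would add), and checks hypothetical flows against latency bounds; all names and numbers are illustrative.

```python
def wormhole_zero_load_latency(hops, router_delay, packet_flits, flits_per_cycle=1):
    """Contention-free latency in cycles: the header flit pays the per-router
    delay at every hop, and the body flits follow in a pipeline behind it."""
    return hops * router_delay + packet_flits / flits_per_cycle


def flows_violating(flows, router_delay, flits_per_cycle=1):
    """flows: list of (name, hops, packet_flits, latency_bound).
    Returns the names of flows whose zero-load latency exceeds their bound."""
    return [name for name, hops, flits, bound in flows
            if wormhole_zero_load_latency(hops, router_delay, flits,
                                          flits_per_cycle) > bound]
```

A mapping step can use such a check to reject placements whose hop counts already violate a flow's latency constraint before any detailed simulation is run.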


6.

Energy efficient heuristic application mapping for 2-D mesh-based network-on-chip

《Microprocessors and Microsystems》2019

Application mapping in 2-D mesh-based Network-on-Chip (NoC) architectures is an optimization problem in which each application task (e.g., processor or memory units) must be mapped one-to-one onto a network element (switch or router) so as to optimize performance requirements (e.g., communication energy or communication latency) under certain platform constraints (e.g., bandwidth and/or latency). Network-on-Chip is a scheme that establishes links between a limited number of application-specific components within a Multi-Processor System-on-Chip (MPSoC), and it plays a vital role in ensuring the maximum data transfer rate and reducing the total number of physical interconnections. Most work on heuristic application mapping for mesh-based NoC design aims to minimize both total communication energy and run time; however, it suffers from the following issues: (i) relatively high CPU time due to a linear search over task and tile mapping combinations, (ii) relatively high communication energy due to random tile selection when two or more tiles are equivalent in terms of average weighted distance to their adjacent mapped tasks, and (iii) even after constructive application mapping, some tasks consume higher communication energy because of inappropriate placement. In this paper, we present a low-time-complexity heuristic algorithm for mapping weighted application graphs under a permissible bandwidth constraint so as to minimize the communication energy of 2-D mesh-based NoC architectures. Experimental results on multimedia benchmarks as well as randomly generated samples show low communication energy and low time complexity under bandwidth constraints compared with recent heuristic application mapping approaches. The communication energy of our approach is also close to the optimal solution obtained by Integer Linear Programming (ILP).
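The sketch below illustrates two ingredients named in this abstract under stated assumptions: the widely used bit-energy model, in which a flow crossing h links passes through h + 1 routers, and a generic constructive heuristic that places tasks in decreasing order of total traffic, each on the free tile with the smallest traffic-weighted distance to its already mapped neighbors. Ties are broken deterministically by lowest tile index instead of randomly. The sketch ignores the bandwidth constraint and is not the paper's algorithm; `e_router`, `e_link`, and all function names are hypothetical.

```python
def manhattan(a, b, cols):
    """Hop distance between row-major tiles a and b on a mesh."""
    ar, ac = divmod(a, cols)
    br, bc = divmod(b, cols)
    return abs(ar - br) + abs(ac - bc)


def comm_energy(mapping, edges, cols, e_router=1.0, e_link=1.0):
    """Bit-energy model: a flow crossing h links passes through h + 1 routers."""
    total = 0.0
    for (s, d), bits in edges.items():
        h = manhattan(mapping[s], mapping[d], cols)
        total += bits * ((h + 1) * e_router + h * e_link)
    return total


def greedy_map(num_tasks, edges, rows, cols):
    """Place tasks in decreasing order of total traffic; each goes to the free
    tile with the smallest traffic-weighted distance to its mapped neighbors."""
    neighbors = {t: [] for t in range(num_tasks)}
    traffic = [0] * num_tasks
    for (s, d), bits in edges.items():
        neighbors[s].append((d, bits))
        neighbors[d].append((s, bits))
        traffic[s] += bits
        traffic[d] += bits
    order = sorted(range(num_tasks), key=lambda t: -traffic[t])
    free = set(range(rows * cols))
    mapping = {}
    for t in order:
        if not mapping:
            tile = (rows // 2) * cols + cols // 2  # heaviest task near the center
        else:
            def weighted_distance(tile):
                return sum(bits * manhattan(tile, mapping[n], cols)
                           for n, bits in neighbors[t] if n in mapping)
            # min() over sorted tiles breaks ties deterministically (lowest index)
            tile = min(sorted(free), key=weighted_distance)
        mapping[t] = tile
        free.remove(tile)
    return mapping
```

Replacing random tie-breaking with a fixed rule makes runs reproducible; the abstract's point (ii) suggests a smarter tie-break can also lower energy, which this sketch does not attempt.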

7.

Performance-driven assignment and mapping for reliable networks-on-chips

Qian-qi Le, Guo-wu Yang, William N. N. Hung, Xiao-yu Song, Fu-you Fan 《Journal of Zhejiang University (Part C, English Edition)》2014,15(11):1009-1020

Network-on-chip (NoC) communication architectures offer promising solutions for scalable communication in large system-on-chip (SoC) designs. Intellectual property (IP) core assignment and mapping are two key steps in NoC design and significantly affect the quality of NoC systems. Both are NP-hard problems, so intelligent algorithms need to be applied. In this paper, we propose improved intelligent algorithms for NoC assignment and mapping that overcome the drawbacks of traditional intelligent algorithms. Our proposed algorithms aim to minimize power consumption, time, and area while balancing load. Because this work involves multiple conflicting objectives, we combine multi-objective optimization with intelligent algorithms. In addition, we design a fault-tolerant routing algorithm and take reliability into account using comprehensive performance indices. The proposed algorithms were implemented on the Embedded System Synthesis Benchmarks Suite (E3S). Experimental results show that the improved algorithms achieve good performance in NoC designs, with high reliability.

8.

A Survey of Latency Optimization Techniques for Non-Uniform Cache Architectures in Multi-Core Processors

Huang Anwen, Gao Jun, Zhang Minxuan 《Journal of Computer Research and Development》2012,(Z1):118-124

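The non-uniform cache architecture (NUCA) offers a new design approach to the "memory wall" problem of chip multiprocessors (CMPs). This survey focuses on NUCA latency-optimization techniques for CMPs. After introducing several typical NUCA models, it analyzes the latency-capacity trade-off between shared and private schemes in large-capacity cache environments and discusses the advantages and disadvantages of data-management mechanisms such as mapping, migration, replication, and search in multi-core settings. Finally, for scalable CMP architectures built on network-on-chip (NoC) interconnects, it discusses and predicts future trends and open challenges in CMP NUCA latency optimization from three perspectives: NUCA model optimization, data management, and coherence maintenance mechanisms.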

9.

Dynamic task mapping for Network-on-Chip based systems

《Journal of Systems Architecture》2015,61(7):293-306

The efficiency of Network-on-Chip (NoC) based multi-processor systems largely depends on the optimal placement of tasks onto processing elements (PEs). Although a number of task mapping heuristics have been proposed in the literature, selecting the best technique for a given environment remains a challenging problem, because the comparisons in each heuristic's original study may have been conducted under different assumptions, environments, and models. In this study, we conduct a detailed quantitative analysis of selected dynamic task mapping heuristics under the same set of assumptions, a similar environment, and the same system models. Comparisons are conducted with varying network load, number of tasks, and network size for constantly running applications. Moreover, we propose an extension to the communication-aware packing based nearest neighbor (CPNN) algorithm that attempts to reduce communication overhead among interdependent tasks. Furthermore, we formally model and verify the proposed technique using high-level Petri nets. The experimental results indicate that the proposed mapping algorithm reduces communication cost, average hop count, and end-to-end latency compared to CPNN, especially for large mesh NoCs. The proposed scheme also achieves up to 6% energy savings for smaller mesh NoCs. Finally, the results of formal modeling indicate that the proposed model is workable and operates according to its specification.
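A plain nearest-neighbor placement, the kind of baseline that communication-aware heuristics such as CPNN build on, can be sketched as follows: when a task is launched, it goes on the free PE with the fewest hops to its parent task's PE. This is a minimal illustration assuming a row-major mesh and a single parent per task; it implements neither CPNN nor the proposed extension, and the function names are hypothetical.

```python
def nearest_free_pe(origin, free, cols):
    """Free PE with the fewest hops from `origin`; ties go to the lowest index."""
    orow, ocol = divmod(origin, cols)
    best = None
    for pe in sorted(free):
        r, c = divmod(pe, cols)
        hops = abs(r - orow) + abs(c - ocol)
        if best is None or hops < best[0]:
            best = (hops, pe)
    return None if best is None else best[1]


def map_on_arrival(arrivals, rows, cols, root_pe):
    """arrivals: (task, parent) pairs in launch order; parent is None for the
    first task. Each child is placed on the free PE nearest its parent."""
    free = set(range(rows * cols))
    placement = {}
    for task, parent in arrivals:
        origin = root_pe if parent is None else placement[parent]
        pe = origin if origin in free else nearest_free_pe(origin, free, cols)
        if pe is None:
            raise RuntimeError(f"no free PE left for task {task!r}")
        placement[task] = pe
        free.discard(pe)
    return placement
```

Keeping communicating tasks a few hops apart is what reduces hop count and end-to-end latency; packing-based variants additionally try to keep the free region contiguous for later applications.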

10.

MMNNN: A tree-based Multicast Mechanism for NoC-based deep Neural Network accelerators

《Microprocessors and Microsystems》2021

Network-on-Chip (NoC) devices have been widely used in multiprocessor systems. In recent years, NoC-based Deep Neural Network (DNN) accelerators have been proposed that connect neural computing devices using NoCs. Such designs dramatically reduce the off-chip memory accesses of these platforms. However, the large number of one-to-many packet transfers significantly degrades performance when traditional unicast channels are used. We propose a multicast mechanism for NoC-based DNN accelerators called Multicast Mechanism for NoC-based Neural Network accelerator (MMNNN). To this end, we propose a tree-based multicast routing algorithm with excellent scalability that minimizes the number of packets in the network. We also propose a router architecture for single-flit packets. Our proposed router transfers flits to multiple destinations in a single process and has no head-of-line blocking, offering higher throughput and lower latency than traditional wormhole router architectures. Simulation results show that our proposed multicast mechanism performs well in terms of classification latency, average packet latency, and energy consumption.
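One common way to build a multicast tree on a mesh is to take the union of the deterministic XY routes from the source to every destination. Because the XY route to any node lying on another XY route from the same source is a prefix of that route, the union is a tree, and each shared link carries the packet only once. The sketch below is an assumption-labeled illustration, not the MMNNN routing algorithm; it compares the links used by such a tree with the link traversals needed by separate unicast packets. Coordinates are (x, y), and the function names are hypothetical.

```python
def xy_path(src, dst):
    """Links on the XY route from src to dst: first along x, then along y."""
    (x, y), (dx, dy) = src, dst
    links = []
    while x != dx:
        nx = x + (1 if dx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dy:
        ny = y + (1 if dy > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def multicast_tree(src, dests):
    """Union of XY routes: each link is used once, however many destinations share it."""
    tree = set()
    for d in dests:
        tree.update(xy_path(src, d))
    return tree


def unicast_link_traversals(src, dests):
    """Total link traversals if one unicast packet is sent per destination."""
    return sum(len(xy_path(src, d)) for d in dests)
```

For a source at (0, 0) and destinations (2, 0), (2, 1), and (2, 2), the tree uses 4 links, while three unicast packets traverse 9 links in total.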

11.

A Novel Two-stage Learning Pipeline for Deep Neural Networks

Chunhui Ding, Zhengwei Hu, Saleem Karmoshi, Ming Zhu 《Neural Processing Letters》2017,46(1):159-169

In this work, a training method based on a two-stage structure is proposed for Deep Neural Networks (DNNs). Local DNN models are trained on all local machines and uploaded to the center together with part of the training data. These local models are integrated into a new DNN model (the combination DNN). Connected to another DNN model (the optimization DNN), the combination DNN forms a global DNN model at the center. This yields greater accuracy than the local DNN models while less data is uploaded, so upload bandwidth is saved and accuracy is maintained. Experiments are conducted on the MNIST, CIFAR-10, and LFW datasets. The results show that, with less training data uploaded, the global model achieves greater accuracy than the local models. The method specifically targets big-data scenarios.

12.

A High-Level Simulation Platform for Network-on-Chip Evaluation Supporting Dual Topologies

Hu Jingjin, Pan Yun, Yan Xiaolang, Andy Motten, Luc Claesen 《Application Research of Computers》2013,30(9):2827-2830

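To enable efficient NoC (network-on-chip) performance evaluation and shorten the development cycle of systems-on-chip, this work studies cycle-accurate NoC simulation methods and proposes a new high-level, high-efficiency simulation platform. Unlike conventional simulators that support only mesh topologies, it supports performance evaluation of both mesh and ring topologies, as well as router designs extended with virtual channels, and it quickly produces network performance results such as latency, throughput, and power consumption. Experimental results show that the platform accurately models NoC functional behavior and quickly obtains simulated performance, providing an efficient method for NoC design verification.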

13.

Energy-efficient task-resource co-allocation and heterogeneous multi-core NoC design in dark silicon era

《Microprocessors and Microsystems》2021

To address power limitations in the dark silicon era for multi-core systems-on-chip and chip multiprocessors, this work proposes run-time task-resource and voltage co-allocation with a reconfigurable network-on-chip (NoC) framework for energy efficiency (higher performance per watt). A distributed resource-manager strategy dynamically reconfigures the voltage/frequency levels of the NoC links and routers and dynamically power-gates resources depending on traffic demands and resource utilization. A mapping heuristic, MinEnergy, is proposed to minimize overall chip power and energy hotspots in large-scale NoCs. We formulate the mapping and configuration problem as a linear optimization model to obtain the optimal solution, and we implement a state-of-the-art Minimum-Path contiguous mapping for comparison. Simulations on real-world benchmarks and platforms demonstrate the effectiveness and efficiency of the proposed schemes: the energy, power, and performance of the proposed dynamic mapping and configuration solution are significantly better (by more than 50%) than those of the minimum-path mapping solution, while its energy and power consumption are more than 90% as good as those of the optimal solution.

14.

Optimized Design of Router Nodes for Networks-on-Chip

Wang Jian, Li Yubai, Peng Qicong 《Journal of Computer Applications》2011,31(3):617-620

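For networks-on-chip (NoCs) built from router nodes with a virtual output queue architecture, a method is proposed for customizing the buffer size of each virtual channel in a router node in order to improve overall NoC communication performance. Under a limited on-chip buffer budget, the effect of each virtual input queue's buffer size on the average latency of data traversing the NoC is analyzed. On this basis, a buffer-allocation method is proposed that assigns buffer resources to the NoC's communication bottlenecks, improving communication performance without increasing buffer overhead. Finally, simulations verify that the optimized router-node design improves NoC performance, and the results are compared with those of a NoC built from unoptimized router nodes.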
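The abstract does not give the allocation rule, so the sketch below uses a simple stand-in: every virtual channel keeps a minimum depth, and the rest of a fixed buffer budget is shared in proportion to measured channel load using integer largest-remainder rounding, so that heavily loaded (bottleneck) channels receive deeper buffers without exceeding the budget. Loads are assumed to be integers (for example, flits per 1000 cycles); the function name and parameters are hypothetical, and the paper's latency-based analysis is not reproduced.

```python
def allocate_buffers(loads, total_slots, min_slots=1):
    """Split `total_slots` buffer slots across virtual channels: every channel
    gets `min_slots`, and the rest is shared in proportion to its integer load
    (largest-remainder rounding, so the allocation sums exactly to the budget)."""
    n = len(loads)
    extra = total_slots - min_slots * n
    if extra < 0:
        raise ValueError("budget is smaller than the per-channel minimum")
    total_load = sum(loads)
    if total_load == 0:  # no traffic information: split evenly
        loads, total_load = [1] * n, n
    quotas = [divmod(extra * load, total_load) for load in loads]
    alloc = [min_slots + q for q, _ in quotas]
    leftover = extra - sum(q for q, _ in quotas)
    # give the remaining slots to the channels with the largest remainders
    for k in sorted(range(n), key=lambda j: -quotas[j][1])[:leftover]:
        alloc[k] += 1
    return alloc
```

Integer arithmetic keeps the total exact; a latency-driven method would instead add slots one at a time wherever the modeled latency drops the most.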

15.

An Efficient Reduction Algorithm for Directed Acyclic Graphs

Hou Rui, Wu Jigang 《Computer Science》2015,42(7):78-84

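When an application is deployed on a given network-on-chip (NoC), each subtask of the application must be assigned to a node of the NoC for execution. This problem is generally modeled as a directed acyclic graph (DAG) whose vertices are the subtasks, so deploying the tasks on the NoC is equivalent to mapping the vertices of the DAG onto the NoC topology. As applications and NoCs grow in size, computing an optimal mapping becomes a typical intractable problem. To accelerate the mapping of a DAG onto an NoC topology, a DAG reduction algorithm is proposed that makes the number of vertices in the reduced graph as close as possible to the number of nodes in the given NoC. The proposed graph reduction algorithm efficiently identifies all reducible subgraphs, each of which can be reduced to a single vertex. The new algorithm extends the scope of applicability from nested graphs to arbitrary graphs while keeping the same order of complexity as the original algorithm. A parallel scheme is also proposed to accelerate the search for reducible subgraphs.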
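The reducible-subgraph detection described here is more general than anything shown below. As a much simpler special case, the sketch contracts series chains: an edge u → v is merged when it is both u's only outgoing edge and v's only incoming edge, which cannot create a cycle, and merging stops once the vertex count reaches the NoC size or no such edge remains. It is a hypothetical illustration, not the authors' algorithm.

```python
from collections import defaultdict


def merge_chains(edges, target):
    """Repeatedly contract an edge u -> v when it is u's only outgoing edge and
    v's only incoming edge, until `target` vertices remain or no such edge exists.
    Returns (members, reduced_edges): members[u] lists the original vertices
    folded into super-vertex u."""
    succ, pred = defaultdict(set), defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)
    nodes = set(succ) | set(pred)
    members = {n: [n] for n in nodes}
    changed = True
    while len(nodes) > target and changed:
        changed = False
        for u in sorted(nodes):
            if len(succ[u]) != 1:
                continue
            v = next(iter(succ[u]))
            if len(pred[v]) != 1:
                continue
            # contract u -> v: u inherits v's successors (cannot create a cycle)
            succ[u] = succ.pop(v, set())
            for w in succ[u]:
                pred[w].discard(v)
                pred[w].add(u)
            pred.pop(v, None)
            members[u].extend(members.pop(v))
            nodes.remove(v)
            changed = True
            break
    reduced = {(u, w) for u in nodes for w in succ[u]}
    return members, reduced
```

On a diamond-shaped DAG this stops early, because its branches join at a vertex with two predecessors; handling such single-entry, single-exit regions is what a full reducible-subgraph search adds.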

16.

Cloud-Edge Collaborative DNN Inference Based on Deep Reinforcement Learning

Liu Xianfeng, Liang Sai, Li Qiang, Zhang Jin 《Computer Engineering》2022,48(11):30-38

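Existing cloud-edge collaborative deep neural network (DNN) inference only uses static partitioning strategies for homogeneous edge devices. It considers neither how changes in network transmission rate, edge-device resources, and cloud-server load affect the optimal partition point of the DNN inference computation, nor the optimal offloading strategy for DNN inference tasks across clusters of heterogeneous edge devices. To address these problems, an adaptive DNN inference partitioning and task offloading algorithm based on deep reinforcement learning is proposed. With minimizing DNN inference latency as the optimization objective, a mathematical model of adaptive DNN inference partitioning and task offloading is established. By defining the state, action space, and reward, the combined DNN inference partitioning and task offloading optimization problem is transformed into an optimal-policy problem in a Markov decision process. Using deep reinforcement learning, near-optimal policies for DNN inference partitioning between edge devices and cloud servers, and for task offloading among heterogeneous edge clusters, are learned from an experience pool in a dynamic environment. Experimental results show that, compared with classic DNN inference algorithms, the proposed algorithm reduces DNN inference latency in heterogeneous dynamic environments by about 28.83% on average, better meeting the low-latency requirements of DNN inference.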
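The quantity the learned policy optimizes can be illustrated for a single, fixed network state: run layers [0, k) on the edge device, send the intermediate data, run the remaining layers in the cloud, and pick the split point k with the lowest total latency. The brute-force sketch below assumes that per-layer latencies and transfer sizes are known in advance; all numbers and names are hypothetical. The deep-reinforcement-learning approach in the abstract is needed precisely because these quantities change at run time and because offloading across heterogeneous devices must also be decided.

```python
def best_split(edge_ms, cloud_ms, transfer_bytes, bandwidth_bytes_per_ms):
    """Layers [0, k) run on the edge device and layers [k, n) in the cloud.
    transfer_bytes[k] is the data sent when splitting before layer k
    (k = n means every layer runs locally and only the result is sent).
    Returns (k, latency_ms) for the split with the lowest end-to-end latency."""
    n = len(edge_ms)
    if len(cloud_ms) != n or len(transfer_bytes) != n + 1:
        raise ValueError("need n edge/cloud latencies and n + 1 transfer sizes")
    best = None
    for k in range(n + 1):
        latency = (sum(edge_ms[:k])                             # local compute
                   + transfer_bytes[k] / bandwidth_bytes_per_ms  # upload
                   + sum(cloud_ms[k:]))                        # cloud compute
        if best is None or latency < best[1]:
            best = (k, latency)
    return best
```

For instance, with edge latencies [5, 5, 5] ms, cloud latencies [1, 1, 1] ms, and transfer sizes [100, 20, 10, 1] bytes, a 10 bytes/ms link favors splitting after the first layer (9 ms), while a 1 byte/ms link favors running everything locally (16 ms).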

17.

From UML specifications to mapping and scheduling of tasks into a NoC, with reliability considerations

《Journal of Systems Architecture》2013,59(7):429-440

This paper describes a technique for mapping and scheduling the tasks of an executable application onto a NoC-based MPSoC, starting from its UML specification. A toolchain transforms the high-level UML specification into a middle-level representation that takes the form of an annotated task graph. This task graph is the input to an optimization engine that carries out the design space exploration. The optimization engine relies on a Population-based Incremental Learning (PBIL) algorithm to map and schedule the tasks onto the NoC. The PBIL algorithm is also proposed for dynamic task mapping in order to deal with failure events at run time. Simulation results are promising and show that the proposed solution performs well as the problem size increases.
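Population-based Incremental Learning keeps a probability vector instead of a persistent population: each generation samples candidate bit strings from the vector and shifts it toward the best sample. The sketch below is the generic binary form of PBIL applied to a toy objective; how task-to-PE assignments and schedules would be encoded as bits is not specified in the abstract, so that part is left as an assumption, and the parameter values are illustrative.

```python
import random


def pbil(fitness, n_bits, pop_size=50, generations=200, lr=0.1, seed=0):
    """Maximize `fitness` over bit strings of length `n_bits` with PBIL.
    Returns (best_string, best_fitness, final_probability_vector)."""
    rng = random.Random(seed)
    prob = [0.5] * n_bits  # probability that each bit is 1
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        population = [[1 if rng.random() < p else 0 for p in prob]
                      for _ in range(pop_size)]
        gen_best = max(population, key=fitness)
        f = fitness(gen_best)
        if f > best_fit:
            best, best_fit = gen_best, f
        # shift the probability vector toward this generation's best sample
        prob = [(1 - lr) * p + lr * b for p, b in zip(prob, gen_best)]
    return best, best_fit, prob
```

Calling `pbil(sum, 8)` maximizes the number of ones in an 8-bit string; for mapping, `sum` would be replaced by a negated cost function evaluated on a decoded assignment.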

18.

Evaluation of energy and buffer aware application mapping for networks-on-chip

Coşkun Çelik, Cüneyt F. Bazlamaçcı 《Microprocessors and Microsystems》2014

Networks-on-Chip (NoC) is a communication paradigm for Systems-on-Chip (SoC). The NoC design flow involves many problems, one of which is the application mapping problem, generally solved in the literature by minimizing communication energy consumption alone. Energy and Buffer Aware Application Mapping (EBAM) is a recently proposed method that treats application mapping as a joint optimization problem, minimizing energy consumption and buffer utilization simultaneously. EBAM avoids possible high input loads on router buffers at the early mapping stage by using a priori traffic characteristics of the application. Self-similarity is already an accepted model in local and wide area networks, and many on-chip applications have also been shown to have self-similar characteristics. EBAM therefore employs self-similar traffic in its joint optimization process, and a genetic algorithm has already been proposed for its solution.

19.

A hardwired NoC infrastructure for embedded systems on FPGAs

Muhammad E.S. Elrabaa, Abdelhafidh Bouhraoua 《Microprocessors and Microsystems》2011,35(2):200-216

A hardwired network-on-chip based on a modified Fat Tree (MFT) topology is proposed as a communication infrastructure for future FPGAs. With extremely simple routing, such an infrastructure would greatly support the ongoing trend of implementing embedded systems with multiple cores on FPGAs. An efficient H-tree-based floorplan that naturally follows the MFT construction methodology was developed. Several instances of the proposed NoC were implemented with various inter-router link progression schemes, combined with a very simple router architecture and an efficient client network interface (CNI). The performance of all these implementations was evaluated using a cycle-accurate simulator for various combinations of NoC sizes and traffic models. A new circuit was also developed for transferring data between clients and the NoC when they operate at different (unrelated) clock frequencies. This circuit allows one data item to be transferred per cycle, and its operation has been verified with gate-level simulations for several ratios of NoC to client clock frequency.

20.

Parameter selection in synchronous and asynchronous deterministic particle swarm optimization for ship hydrodynamics problems

《Applied Soft Computing》2016

Deterministic optimization algorithms are very attractive when the objective function is computationally expensive, so that a statistical analysis of the optimization outcomes becomes too costly. Among deterministic methods, deterministic particle swarm optimization (DPSO) has several attractive characteristics, such as simple heuristics, ease of implementation, and often remarkable effectiveness. The performance of DPSO depends on four main setting parameters: the number of swarm particles, their initialization, the set of coefficients defining the swarm behavior, and (for box-constrained optimization) the method used to handle the box constraints. Here, a parametric study of DPSO is presented, with application to simulation-based design in ship hydrodynamics. The objective is to identify the most promising setup for both synchronous and asynchronous implementations of DPSO. The analysis assumes limited computational resources and a large computational cost per objective function evaluation. It uses 100 analytical test functions (with dimensionality from two to fifty) and three performance criteria, varying the swarm size, initialization, coefficients, and the method for the box constraints, resulting in more than 40,000 optimizations. The most promising setup is applied to the hull-form optimization of a high-speed catamaran for resistance reduction in calm water at fixed speed, using a potential-flow solver.
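The deterministic variant studied here removes the random coefficients of standard PSO, so a run is fully reproducible from its initialization. The sketch below assumes the constriction form v ← χ[v + c1(p_i − x) + c2(g − x)] with illustrative values χ = 0.721 and c1 = c2 = 1.655, a Halton-sequence initialization, and position clipping to handle the box constraints; these are assumptions for illustration, not the setups recommended by the study. With `synchronous=True` the global best is updated once per iteration, after the whole swarm has moved; otherwise it is updated as soon as each particle is evaluated (asynchronous).

```python
def radical_inverse(i, base):
    """Van der Corput radical inverse of the integer i in the given base."""
    inv, f = 0.0, 1.0 / base
    while i > 0:
        i, digit = divmod(i, base)
        inv += digit * f
        f /= base
    return inv


def dpso(f, lo, hi, dim, n_particles=16, iters=200,
         chi=0.721, c1=1.655, c2=1.655, synchronous=True):
    """Deterministic PSO (no random coefficients) minimizing f on [lo, hi]^dim."""
    primes = [2, 3, 5, 7, 11, 13]
    if dim > len(primes):
        raise ValueError("this sketch's Halton initialization supports dim <= 6")
    # deterministic initialization: Halton points spread over the box
    x = [[lo + (hi - lo) * radical_inverse(i + 1, primes[d]) for d in range(dim)]
         for i in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [list(p) for p in x]
    pbest_f = [f(p) for p in x]
    g = min(range(n_particles), key=lambda k: pbest_f[k])
    gbest, gbest_f = list(pbest[g]), pbest_f[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                v[i][d] = chi * (v[i][d]
                                 + c1 * (pbest[i][d] - x[i][d])
                                 + c2 * (gbest[d] - x[i][d]))
                # box constraint handled by clipping the position
                x[i][d] = min(hi, max(lo, x[i][d] + v[i][d]))
            fx = f(x[i])
            if fx < pbest_f[i]:
                pbest[i], pbest_f[i] = list(x[i]), fx
                if not synchronous and fx < gbest_f:
                    gbest, gbest_f = list(x[i]), fx  # asynchronous: update at once
        if synchronous:  # update the global best after the whole swarm has moved
            g = min(range(n_particles), key=lambda k: pbest_f[k])
            if pbest_f[g] < gbest_f:
                gbest, gbest_f = list(pbest[g]), pbest_f[g]
    return gbest, gbest_f
```

Because nothing is random, two calls with the same arguments return identical results, which is what makes a parametric study over swarm size, initialization, coefficients, and box-constraint handling meaningful when each objective evaluation is expensive.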