首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Simulation-based techniques can be used to evaluate whether a particular NoC-based platform configuration is able to meet the timing constraints of an application, but they can only evaluate a finite set of scenarios. In safety-critical applications with hard real-time constraints, this is clearly not sufficient because there is an expectation that the application should be schedulable on that platform in all possible scenarios. This paper presents a particular NoC-based multiprocessor architecture, as well as a number of analytical methods that can be derived from that architecture, aiming to allow designers to check, for a given platform configuration, whether all application tasks and communication messages always meet their hard real-time constraints in every possible scenario. Experiments are presented, showing the use of the proposed methods when evaluating different task mapping and platform topologies.  相似文献   

2.
Biologically-inspired packet switched network on chip (NoC) based hardware spiking neural network (SNN) architectures have been proposed as an embedded computing platform for classification, estimation and control applications. Storage of large synaptic connectivity (SNN topology) information in SNNs require large distributed on-chip memory, which poses serious challenges for compact hardware implementation of such architectures. Based on the structured neural organisation observed in human brain, a modular neural networks (MNN) design strategy partitions complex application tasks into smaller subtasks executing on distinct neural network modules, and integrates intermediate outputs in higher level functions. This paper proposes a hardware modular neural tile (MNT) architecture that reduces the SNN topology memory requirement of NoC-based hardware SNNs by using a combination of fixed and configurable synaptic connections. The proposed MNT contains a 16:16 fully-connected feed-forward SNN structure and integrates in a mesh topology NoC communication infrastructure. The SNN topology memory requirement is 50 % of the monolithic NoC-based hardware SNN implementation. The paper also presents a lookup table based SNN topology memory allocation technique, which further increases the memory utilisation efficiency. Overall the area requirement of the architecture is reduced by an average of 66 % for practical SNN application topologies. The paper presents micro-architecture details of the proposed MNT and digital neuron circuit. The proposed architecture has been validated on a Xilinx Virtex-6 FPGA and synthesised using 65 nm low-power CMOS technology. The evolvable capability of the proposed MNT and its suitability for executing subtasks within a MNN execution architecture is demonstrated by successfully evolving benchmark SNN application tasks representing classification and non-linear control functions. The paper addresses hardware modular SNN design and implementation challenges and contributes to the development of a compact hardware modular SNN architecture suitable for embedded applications  相似文献   

3.
Now that system-on-chip technology is emerging, single-chip multi-processors are becoming feasible. A key problem of designing such systems is the complexity of their on-chip interconnects and memory architecture. It is furthermore unclear at what level software should be integrated. An example of a single-chip multi-processor for real-time (networked) embedded systems is the multi-microprocessor (MμP). Its architecture consists of a scalable number of identical master processors and a configurable set of shared co-processors. Additionally, an on-chip real-time operating system kernel is included to support transparent multi-tasking over the set of master processors. In this paper, we explore the main design issues of the architecture platform on which the MμP is based. In addition, synthesis results are presented for a lightweight configuration of this architecture platform.  相似文献   

4.
Highly regular multi-processor architectures are suitable for inherently highly parallelizable applications such as most of the image processing domain. Systems embedded in a single programmable chip platform (SoPC) allow hardware designers to tailor every aspect of the architecture in order to match the specific application needs. These platforms are now large enough to embed an increasing number of cores, allowing implementation of a multi-processor architecture with an embedded communication network. In this paper we present the parallelization and the embedding of a real time image stabilization algorithm on a SoPC platform. Our overall hardware implementation method is based upon meeting algorithm processing power requirements and communication needs with refinement of a generic parallel architecture model. Actual implementation is done by the choice and parameterization of readily available reconfigurable hardware modules and customizable commercially available IPs (Intellectual Property). We present both software and hardware implementation with performance results on a Xilinx SoPC target.  相似文献   

5.
The astonishing development in the field of artificial neural networks (ANN) has brought significant advancement in many application domains, such as pattern recognition, image classification, and computer vision. ANN imitates neuron behaviors and makes a decision or prediction by learning patterns and features from the given data set. To reach higher accuracies, neural networks are getting deeper, and consequently, the computation and storage demands on hardware platforms are steadily increasing. In addition, the massive data communication among neurons makes the interconnection more complex and challenging. To overcome these challenges, ASIC-based DNN accelerators are being designed which usually incorporate customized processing elements, fixed interconnection, and large off-chip memory storage. As a result, DNN computation involves large memory accesses due to frequent load/off-loading data, which significantly increases the energy consumption and latency. Also, the rigid architecture and interconnection among processing elements limit the efficiency of the platform to specific applications. In recent years, Network-on-Chip-based (NoC-based) DNN becomes an emerging design paradigm because the NoC interconnection can help to reduce the off-chip memory accesses while offers better scalability and flexibility. To evaluate the NoC-based DNN in the early design stage, we introduce a cycle-accurate NoC-based DNN simulator, called DNNoC-sim. To support various operations such as convolution and pooling in the modern DNN models, we first propose a DNN flattening technique to convert diverse DNN operation into MAC-like operations. In addition, we propose a DNN slicing method to evaluate the large-scale DNN models on a resource-constraint NoC platform. The evaluation results show a significant reduction in the off-chip memory accesses compared to the state-of-the-art DNN model. We also analyze the performance and discuss the trade-off between different design parameters.  相似文献   

6.
基于多FPGA的NoC多核处理器验证平台设计   总被引:1,自引:0,他引:1  
为了能够灵活地验证和实现自主设计的基于NoC的多核处理器,缩短NoC多核处理器的设计周期,提出了设计集成4片Virtex-6-550T FPGA的NoC多核处理器原型芯片设计/验证平台.分析和评估了NoC多核处理器的规模以及对FPGA硬件资源的需求,在此基础上给出了集成4片FPGA的开发板详细设计方案,并对各主要模块如互联架构、电源、板级时钟分布、接口技术、存储资源等关键设计要点进行阐述.描述了开发板各个主要模块的测试过程和结果,表明了该设计的可行性.  相似文献   

7.
康炜  张翔  王金伟  苗艳超  马捷 《计算机工程》2008,34(10):256-258
机群系统已成为高性能计算的主流体系结构,机群模拟环境是学习机群操作的重要工具。该文提出一种基于龙芯2E多处理器硬件平台的机群模拟方案——虚拟机群系统(VCS)。该系统在共享内存的多处理器上同时运行多个操作系统并使用内存操作模拟网络通信,实现机群环境的模拟。  相似文献   

8.
Contention in performance-critical shared resources affects performance and quality-of-service (QoS) significantly. While this issue has been studied recently in CMP architectures, the same problem exists in SoC architectures where the challenge is even more severe due to the contention of shared resources between programmable cores and fixed-function IP blocks. In the SoC environment, efficient resource sharing and a guarantee of a certain level of QoS are highly desirable. Researchers have proposed different techniques to support QoS, but most existing works focus on only one individual resource. Coordinated management of multiple QoS-aware shared resources remains an open problem. In this paper, we propose a class-of-service based QoS architecture (CoQoS), which can jointly manage three performance-critical resources (cache, NoC, and memory) in a NoC-based SoC platform. We evaluate the interaction between the QoS-aware allocation of shared resources in a trace-driven platform simulator consisting of detailed NoC and cache/memory models. Our simulations show that the class-of-service based approach provides a low-cost flexible solution for SoCs. We show that assigning the same class-of-service to multiple resources is not as effective as tuning the class-of-service of each resource while observing the joint interactions. This demonstrates the importance of overall QoS support and the coordination of QoS-aware shared resources.  相似文献   

9.
Network-on-chip (NoC) is an emerging interconnect infrastructure to address the scalability limitation of conventional shared bus architecture for many-core system-on-chip (MCSoC). Current field-programmable gate arrays (FPGAs) have over million lookup tables, making it possible to prototype a complete NoC-based MCSoC on a single FPGA device. FPGA prototyping allows rapid system verification and optimum design parameters estimation. However, existing NoC-based MCSoC prototypes are usually adopting simple NoC architectural functionality. These NoC prototypes cannot represent a realistic projection of the state-of-the-art application-specific integrated circuit (ASIC) NoCs as these prototypes have limited overall system performance. This paper presents ProNoC, an integrated tool for rapid prototyping and validation of NoC-based MCSoC projects targeting FPGA devices. ProNoC adopts most advanced NoC features such as the support of virtual channel (VC), virtual network, low latency routing and different routing algorithms. Results show that NoC interconnect in ProNoC outperforms CONNECT, the most recent VC based prototype NoC with lower logic cell utilization, higher maximum operating frequency, higher average saturation throughput, and lower average communication latency. Moreover, ProNoC is equipped with graphical user interface to facilitate the development of MCSoC prototypes on FPGA platforms.  相似文献   

10.
Network-on-Chip (NoC) devices have been widely used in multiprocessor systems. In recent years, NoC-based Deep Neural Network (DNN) accelerators have been proposed to connect neural computing devices using NoCs. Such designs dramatically reduce off-chip memory accesses of these platforms. However, the large number of one-to-many packet transfers significantly degrade performance with traditional unicast channels. We propose a multicast mechanism for a NoC-based DNN accelerator called Multicast Mechanism for NoC-based Neural Network accelerator (MMNNN). To do so, we propose a tree-based multicast routing algorithm with excellent scalability and the ability to minimize the number of packets in the network. We also propose a router architecture for single-flit packets. Our proposed router transfers flits to multiple destinations in a single process and has no head-of-line blocking issue, offering higher throughput and lower latency than traditional wormhole router architectures. Simulation results show that our proposed multicast mechanism offers excellent performance in classification latency, average packet latency, and energy consumption.  相似文献   

11.
随着硬件系统和软件技术的发展,对片上网络多处理系统的研究进入了交叉研究状态,但是关于实际软件应用与硬件平台结合的研究尚有些不足.本文讲述了基于实际硬件平台的片上网络系统实现.通过FPGA平台实现了一个通用片上网络系统,并通过多任务映射方法将两个多媒体应用程序映射在片上网络系统中,实现了软件多任务与硬件片上网络多处理系统的合理结合.  相似文献   

12.
贾杰  秦永元  罗绪涛 《计算机应用》2008,28(5):1355-1358
利用蠕虫算法,结合多处理器并行计算的特点,构成了一种基于总线共享和链路口通信,并可自适应扩展的混合并行处理的结构。从可测性设计、实时操作系统角度对该系统进行了设计和讨论,着重介绍了系统的硬件调试、系统测试以及软件加载方法,为高速实时信号处理硬件平台的设计与开发提出了一种可行的解决方案。  相似文献   

13.
交叉开关是交换芯片和芯片组的核心逻辑。该文设计并实现了多处理器芯片组中的交叉开关,其工作频率在FPGA布局布线后可以达到100 MHz。通过实践采样,对延迟和带宽进行测试,提出性能优化的策略,目前该交叉开关已稳定运行于龙芯2E多处理器系统中。  相似文献   

14.
颜坚  毕硕本  汪大  郭忆 《计算机科学》2013,40(2):16-19,57
改进了周培德的Z3-2算法,提出一种在多核架构下计算平面点集凸壳的并行算法。用“颜氏距离”来数字化平面上点与有向线段的位置关系,减少了计算次数和时间。进一步将原算法中比较耗时的两个过程分别在O(1)的时间复杂度内进行迭代分解,即当原问题规模大于给定阂值时,将原问题分解为若干个独立的子问题,若所得子问题的规模仍大于给定阂值,则再对子问题进行分解;所有子问题被加入并行任务组进行并行求解以充分利用多核处理器的并行计算资源。给出了算法的正确性说明,实验结果也表明本算法稳定高效。  相似文献   

15.
Future embedded systems demand multi-processor designs to meet real-time deadlines. The large number of applications in these systems generates an exponential number of use-cases. The key design automation challenges are designing systems for these use-cases and fast exploration of software and hardware implementation alternatives with accurate performance evaluation of these use-cases. These challenges cannot be overcome by current design methodologies which are semi-automated, time consuming and error prone.In this paper, we present a fully automated design flow to generate communication assist (CA) based multi-processor systems (CA-MPSoC). A worst-case performance model of our CA is proposed so that the performance of the CA-based platform can be analyzed before its implementation. The design flow provides performance estimates and timing guarantees for both hard real-time and soft real-time applications, provided the task to processor mappings are given by the user. The flow automatically generates a super-set hardware that can be used in all use-cases of the applications. The software for each of these use-cases is also generated including the configuration of communication architecture and interfacing with application tasks.CA-MPSoC has been implemented on Xilinx FPGAs for evaluation. Further, it is made available on-line for the benefit of the research community and in this paper, it is used for performance analysis of two real life applications, Sobel and JPEG encoder executing concurrently. The CA-based platform generated by our design flow records a maximum error of 3.4% between analyzed and measured periods. Our tool can also merge use-cases to generate a super-set hardware which accelerates the evaluation of these use-cases. In a case study with six applications, the use-case merging results in a speed up of 18 when compared to the case where each use-case is evaluated individually.  相似文献   

16.
复用NoC测试SoC内嵌IP芯核的测试规划研究   总被引:1,自引:0,他引:1       下载免费PDF全文
测试规划是SoC芯片测试中需要解决的一个重要问题。一种复用片上网络测试内嵌IP芯核的测试规划方法被用于限制测试模式下SoC芯片功耗不超出最大芯片功耗范围,消除测试资源共享所引起的冲突,达到减小测试时间的目的。提出了支持测试规划的无拥塞路由算法和测试扫描链优化配置方法。使用VHDL硬件描述语言实现了在FPGA芯片中可综合的二维Mesh片上网络测试平台,用于片上网络性能参数、路由算法以及基于片上网络的SoC芯片测试方法的分析评估。  相似文献   

17.
王亮  李涛  梁刚 《计算机工程与设计》2011,32(10):3251-3253,3265
针对复杂高速网络环境风险评估面临的性能瓶颈以及实时准确性低的问题,提出了多核构架的网络风险评估模型。该模型利用多核网络处理平台资源,提出了高速并行处理网络流量构架和实时入侵检测均衡算法。同时以人工免疫为基础,模拟了人体记忆细胞对外部抗原的免疫过程,采用克隆选择算法增扩抗体浓度,为实时风险评估提供理论依据。理论分析和仿真实验结果表明,该模型能够定量实时评估高速网络环境风险,能够保证处理性能和评估准确性。  相似文献   

18.
一种新型片上网络互连结构的仿真和实现   总被引:2,自引:0,他引:2  
综合性能、硬件实现等方面考虑,提出一种基于片上网络的互连拓扑结构-层次化路由结构MLR(Multi-Layer Router).该结构通过层次化设计减小网络直径,具有良好的对称性和扩展性.网络建模仿真和硬件实现结果显示,在不同网络负载和不同IP核节点数的情况下,MLR与传统结构相比,在处理网络通信时,对于网络丢包率、通信延迟和网络吞吐量等网络性能参数均有最多50%-70%的提升;同时通过共享路由的方式,减少了超过20%的芯片面积和40%以上的动态功耗,有效降低了互连结构的硬件开销  相似文献   

19.
Traditionally, mechanically steered dishes or analog phased array beamforming systems have been used for radio frequency receivers, where strong directivity and high performance were much more important than low-cost requirements. Real-time controlled digital phased array beamforming could not be realized due to the high computational requirements and the implementation costs. Today, digital hardware has become powerful enough to perform the massive number of operations required for real-time digital beamforming. With the continuously decreasing price per transistor, high performance signal processing has become available by using multi-processor architectures. More and more applications are using beamforming to improve the spatial utilization of communication channels, resulting in many dedicated digital architectures for specific applications. By using a reconfigurable architecture, a single hardware platform can be used for different applications with different processing needs.In this article, we show how a reconfigurable multi-processor system-on-chip based architecture can be used for phased array processing, including an advanced tracking mechanism to continuously receive signals with a mobile satellite receiver. An adaptive beamformer for DVB-S satellite reception is presented that uses an Extended Constant Modulus Algorithm to track satellites. The receiver consists of 8 antennas and is mapped on three reconfigurable Montium TP processors. With a scenario based on a phased array antenna mounted on the roof of a car, we show that the adaptive steering algorithm is robust in dynamic scenarios and correctly demodulates the received signal.  相似文献   

20.
陈泽玮  杨茂林  雷航  廖勇  谢玮 《计算机应用》2017,37(5):1270-1275
近年来,随着实时调度研究的快速发展,可调度性实验的复杂性随之增加,然而,由于缺乏标准化、模块化的可调度性实验工具,研究者往往需要耗费大量时间进行实验;此外,由于实验源码不能公开获得,使得实验结果难以验证,实验代码难以重用与扩展。针对可调度性实验重复工作量大、难以验证的问题,提出一种可调度性实验基础框架。该框架通过随机分布产生任务系统集合,并测试其可调度性,基于该框架设计并实现了一个新的可调度性实验开源平台——SET-MRTS。该平台采用模块化架构设计了任务模块、处理器模块、共享资源模块、算法库、配置解析模块以及输出模块。实验结果表明,SET-MRTS支持单/多处理器实时调度算法和实时同步协议分析,能够正确地进行可调度性对比实验,输出直观的实验结果,并且支持算法库的扩充,与算法库中已实现的算法进行对比实验,具有良好的兼容性与可扩展性。SET-MRTS是第一个支持完整实验流程,包括算法实现、参数配置、结果统计、图表绘制等的可调度性实验开源平台。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号