Similar Documents
20 similar documents found.
1.
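Task binding and scheduling are key problems in the many-core software synthesis process. Because many-core platforms are diverse and idiosyncratic, binding and scheduling algorithms must be designed with the characteristics of both the task set and the physical platform in mind. For homogeneous many-core processors with a 2D-Torus topology, this paper proposes a task binding and scheduling scheme based on the BAMSE approximation algorithm. The scheme binds dependent (non-independent) task sets with communication overhead to physical cores, and experiments evaluate how well the improved BAMSE algorithm performs task binding and scheduling on the 2D-Torus many-core platform.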
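The abstract does not describe BAMSE itself, so the following is only a sketch of the kind of objective a task-to-core mapper works against, assuming communication cost is modeled as traffic volume times hop distance on the torus. The names `torus_hops`, `mapping_cost`, and `exhaustive_mapping` and the three-task graph are hypothetical. The exhaustive search is a different technique from BAMSE and is feasible only for tiny instances: the number of placements grows factorially, which is why heuristics or approximation algorithms are needed for real task sets.

```python
from itertools import permutations

def torus_hops(a, b, width, height):
    """Minimal hop count between cores a=(x, y) and b=(x, y) on a width x height 2D torus."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    # Wraparound links let a packet travel either way around each ring.
    return min(dx, width - dx) + min(dy, height - dy)

def mapping_cost(edges, mapping, width, height):
    """Sum of volume * hop distance over all task-graph edges.

    edges:   list of (src_task, dst_task, volume) tuples (hypothetical task graph)
    mapping: dict task -> (x, y) core coordinate
    """
    return sum(vol * torus_hops(mapping[s], mapping[d], width, height)
               for s, d, vol in edges)

def exhaustive_mapping(tasks, edges, width, height):
    """Brute-force optimum (one task per core); only usable for very small inputs."""
    cores = [(x, y) for y in range(height) for x in range(width)]
    best = None
    for placement in permutations(cores, len(tasks)):
        mapping = dict(zip(tasks, placement))
        cost = mapping_cost(edges, mapping, width, height)
        if best is None or cost < best[0]:
            best = (cost, mapping)
    return best

if __name__ == "__main__":
    edges = [("A", "B", 10), ("B", "C", 4), ("A", "C", 1)]
    print(mapping_cost(edges, {"A": (0, 0), "B": (3, 0), "C": (3, 3)}, 4, 4))  # 16
    print(exhaustive_mapping(["A", "B", "C"], edges, 4, 4)[0])                 # 16
```

On a 4x4 torus the wraparound makes (0, 0) and (3, 0) neighbors, which is why this hand-picked placement already reaches the optimum.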

2.
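By analyzing the characteristics and implementation structures of block ciphers built on the Feistel structure and on substitution-permutation (SP) networks, this paper explains why block ciphers lend themselves to reconfigurable implementation, and proposes building a high-performance cryptographic coprocessor in a many-core fashion. Although single-core performance is moderately reduced, this architecture makes the fullest use of the chip's circuit capacity, and practice shows that it has a clear overall performance advantage over single-core and multi-core reconfigurable implementations of block ciphers. Finally, the paper points out the problems many-core designs bring, such as programming complexity and difficult resource allocation, and notes that overall performance grows sublinearly with the number of cores.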
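One property of the Feistel structure that favors a reusable hardware datapath is that decryption runs the same round structure with the round keys in reverse order, and the round function F does not need to be invertible. The sketch below shows only that property; the 32-bit block, the toy `round_function`, and the round keys are hypothetical and not taken from any real cipher or from the paper.

```python
MASK16 = 0xFFFF

def round_function(half, key):
    """Toy F-function on a 16-bit half (hypothetical; not from any real cipher).
    A Feistel network does not require F to be invertible."""
    return ((half * 0x9E37) ^ key) & MASK16

def feistel(block, round_keys):
    """Pass a 32-bit block through len(round_keys) Feistel rounds.

    Each round maps (L, R) -> (R, L ^ F(R, k)). The final swap means that calling
    this same function with the round keys reversed performs decryption.
    """
    left, right = (block >> 16) & MASK16, block & MASK16
    for key in round_keys:
        left, right = right, left ^ round_function(right, key)
    return (right << 16) | left

if __name__ == "__main__":
    keys = [0x1234, 0xBEEF, 0x0F0F, 0xA5A5]  # hypothetical round keys
    plaintext = 0xCAFEBABE
    ciphertext = feistel(plaintext, keys)
    recovered = feistel(ciphertext, list(reversed(keys)))
    print(hex(ciphertext), recovered == plaintext)  # second value is True
```

Because encryption and decryption share one routine, a reconfigurable core only needs a different key schedule order, not a different datapath.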

3.
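Multi-core processors make the structure of parallel systems increasingly complex; they have become the mainstream of processor design and the dominant processing platform for a wide range of communication and media applications. The communication architecture is one of the core technologies of a multi-core system, and the efficiency of inter-core communication is a key factor in multi-core processor performance. There are currently three main communication architectures: the bus, the crossbar network, and the network-on-chip. A bus is relatively easy to design, consumes little hardware, and is inexpensive; a crossbar is suitable for building large-capacity...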

4.
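As the number of processor cores integrated on a single chip grows, inter-core communication becomes increasingly important for applications running on multi-core processors. Based on an analysis of the tasks running on multi-core processors, the cores are divided into two classes according to the function of the tasks they run: control cores and compute cores. Building on this classification, a new inter-core communication model is proposed that provides three different communication channels. Using these three channels, the I/O portion of an application is migrated from the compute cores to the control cores to...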

5.
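In a many-core system, parallel tasks must be mapped onto processors before execution; this process is called task mapping. Because task mapping algorithms have a large impact on chip performance, they have become a research hotspot in recent years. This paper surveys existing task mapping algorithms by system architecture (e.g., 2D and 3D many-core systems) and by optimization objective (e.g., communication overhead, power consumption, and temperature), and outlines future trends in task mapping algorithms.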

6.
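Targeting the widely used DaVinci heterogeneous multi-core processors, this paper studies the communication and cooperation mechanisms among their ARM core, DSP core, and video coprocessors. After analyzing the principles of inter-core communication in multi-core processors, it examines the inter-core communication techniques of the TMS320DM816x family of DaVinci heterogeneous multi-core processors and describes in detail the on-chip inter-core interconnect and the implementation of the inter-core communication software. Finally, a multi-channel high-definition audio/video application system is built on the SysLink low-level communication module to validate the inter-core communication. The system fully exploits the performance of each processing core and achieves efficient cooperation among the cores.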

7.
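As the number of processor cores integrated on a single chip grows, inter-core communication becomes increasingly important for applications running on multi-core processors. Based on an analysis of the tasks running on multi-core processors, the cores are divided into two classes according to the function of the tasks they run: control cores and compute cores. Building on this classification, a new inter-core communication model is proposed that provides three different communication channels. Using these three channels, the I/O portion of an application is migrated from the compute cores to the control cores to improve multi-core utilization. Experimental results show that this approach effectively improves inter-core cooperation and inter-core communication efficiency and raises processor utilization.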

8.
《现代电子技术》(Modern Electronics Technique), 2016, (16): 83-87
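To suit the characteristics of multi-core processors, this paper proposes a new heterogeneous multi-core DSP processor architecture. The main processor is a general-purpose processor that acts as a control-intensive core for system management and control; eight DSPs act as compute-intensive cores for fusion computation over large volumes of data. The NoC interconnect among the eight DSPs is designed in detail. First, a 2×4 2D Torus structure is adopted and a single router node is designed, including the packet format, routing, and arbitration. Next, the router nodes are encoded, the routing algorithm is designed, and the routing direction of each node is determined. The structure offers the high local communication bandwidth of a bus, while the scalability of the NoC and the parallelism of NoC communication among the DSPs make the system easy to scale and able to meet the demands of bulk data transfer. Finally, simulation experiments verify the effectiveness of the design, laying a solid technical foundation for the subsequent design and implementation of multi-core processors.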
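The abstract does not give the routing algorithm, so the following is only a sketch of one common choice for a small torus NoC: dimension-order (X then Y) routing that takes the shorter way around each ring. It assumes the 2×4 torus has 2 rows and 4 columns with nodes addressed as (x, y); `torus_step` and `xy_route` are hypothetical names.

```python
def torus_step(cur, dst, size):
    """Direction along one ring: +1, -1, or 0, taking the shorter way around (ties go +1)."""
    if cur == dst:
        return 0
    forward = (dst - cur) % size
    return 1 if forward <= size - forward else -1

def xy_route(src, dst, cols=4, rows=2):
    """Dimension-order route on a cols x rows torus: resolve X first, then Y.

    Returns the list of (x, y) nodes visited, including src and dst.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x = (x + torus_step(x, dst[0], cols)) % cols
        path.append((x, y))
    while y != dst[1]:
        y = (y + torus_step(y, dst[1], rows)) % rows
        path.append((x, y))
    return path

if __name__ == "__main__":
    print(xy_route((0, 0), (3, 1)))  # [(0, 0), (3, 0), (3, 1)]
```

On a 4×2 torus no route needs more than 3 hops (at most 2 in X and 1 in Y). Note that dimension-order routing alone is not deadlock-free on a torus because the wraparound links form cycles; practical designs usually add virtual channels to break them.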

9.
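To address the difficulty of raising single-core clock frequencies and the steadily growing power consumption of processors, this paper proposes a design for a new heterogeneous multi-core processor. The architecture adds a B-Cache structure and a C-Core controller, which prevent the pipeline from being flushed after branch mispredictions and thereby improve the execution efficiency of the whole processor.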

10.
《电信技术》(Telecommunications Technology), 2007, (7): 125-126
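Multi-core technology offers higher processor performance, more efficient power utilization, and a smaller physical footprint, giving it many advantages that single-core processors cannot match, and it has become the direction of future processor development. Note, however, that realizing the performance potential of a multi-core processor depends closely on how much parallelism the specific application exposes. Network equipment offers comparatively high parallelism, so it is expected to be one of the first fields in which multi-core processor applications achieve a breakthrough.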

11.
A promising solution to reliability challenges in nano-scale fabrication technologies is self-test and reconfiguration. In this direction, we propose an autonomous test mechanism for online detection of permanent faults in many-core processors. Several hardware test components are incorporated in the many-core architecture. Some of these components distribute software-based self-test routines among the processing cores and make each test routine accessible for a limited amount of time. A processing core that has an idle slot executes the test routine; otherwise it skips it without loss of test continuity. Several components of the proposed test architecture monitor the behavior of the processing cores during execution of test routines, detect faulty cores, and allow them to be removed from the system. We use an extended form of Petri net modeling to model and analyze the proposed test mechanism and to tune the test architecture so that it preserves test quality while keeping the overall test time under control. Our experimental results show that the test time and hardware overhead of the proposed test mechanism are low and its performance overhead is zero. Furthermore, the proposed test architecture scales efficiently to many-core processors with a large number of processing cores.

12.
Today’s many-core processors are manufactured in inherently unreliable technologies. The highly defect-prone technologies used to produce many-core processors are a direct consequence of feature-size shrinkage in today’s CMOS (complementary metal-oxide-semiconductor) technology. Because of these reliability problems, fault tolerance has become one of the major challenges for many-core processors. Various fault-tolerance techniques can be applied to reduce the probability of many-core processor failure. The most preferable and promising techniques are those that are easy to implement and have minimal cost while providing a high level of processor fault tolerance. One promising technique for detecting faulty cores, and thus for taking the first step toward many-core processor fault tolerance, is mutual testing among processor cores. Mutual testing can be performed either in a random manner or according to a deterministic scheduling policy. This paper deals with random execution of mutual tests. The effectiveness of such testing can be evaluated through modeling; we show how Stochastic Petri Nets can be used for this purpose and obtain results that can be useful for developing and implementing testing procedures in many-core processors.
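The paper evaluates random mutual testing with Stochastic Petri Nets. As a rough cross-check of the same idea, here is a Monte Carlo simulation instead (a different technique), under hypothetical assumptions stated in the code: each round, every fault-free core tests one uniformly random other core, a fault-free tester always detects a faulty target, and verdicts from faulty testers are ignored. `rounds_to_detect` and `mean_rounds` are illustrative names, not from the paper.

```python
import random

def rounds_to_detect(num_cores, faulty, rng, max_rounds=10_000):
    """Return the round in which the last faulty core is first tested by a fault-free core.

    Assumptions (hypothetical): every round, each fault-free core tests one other core
    chosen uniformly at random; a fault-free tester always detects a faulty target;
    verdicts from faulty testers are ignored. Returns None if not done by max_rounds.
    """
    undetected = set(faulty)
    for rnd in range(1, max_rounds + 1):
        for tester in range(num_cores):
            if tester in faulty:
                continue
            target = rng.randrange(num_cores - 1)
            if target >= tester:
                target += 1  # shift past the tester so a core never tests itself
            undetected.discard(target)
        if not undetected:
            return rnd
    return None

def mean_rounds(num_cores, faulty, trials=1000, seed=1):
    """Monte Carlo estimate of the expected number of rounds to detect all faulty cores."""
    rng = random.Random(seed)
    samples = [rounds_to_detect(num_cores, faulty, rng) for _ in range(trials)]
    if any(s is None for s in samples):
        raise RuntimeError("detection did not finish within max_rounds")
    return sum(samples) / trials

if __name__ == "__main__":
    print(round(mean_rounds(16, {3}), 2))
```

For a single faulty core among N cores, each round detects it with probability p = 1 - (1 - 1/(N-1))^(N-1), so the estimate should approach 1/p, about 1.55 for N = 16.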

13.
With an increasing number of processors forming many-core chip multiprocessors (CMPs), emerging systems need an easily scalable, high-performance, low-power intra-chip communication infrastructure. In CMPs with hundreds of processing elements, 3D integration can be used to shorten the long wires that form communication links. In this paper, we propose a Clos network-on-chip (CNOC) combined with 3D integration as a viable network topology for many-core CMPs. The primary benefits of 3D CNOC are scalability and a clear upper bound on power dissipation. We present the architectural and physical design of 3D CNOC and compare its performance with several other topologies. Comparisons among several topologies (fat tree, flattened butterfly, mesh, and Clos) show that the power consumption of a 3D CNOC increases only minimally, relative to the other topologies, as the network size is scaled from 64 to 512 nodes. Furthermore, in a 512-node system, 3D CNOC consumes about 15% less average power than any other topology. We also compare 3D partitioning strategies for these topologies and discuss their effect on wire delay and the number of through-silicon vias.
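To make the Clos topology concrete, here is a sketch of the classic sizing rules for a symmetric three-stage Clos network C(n, m, r): r ingress switches of size n×m, m middle switches of size r×r, and r egress switches of size m×n. Clos (1953) showed it is strict-sense nonblocking when m ≥ 2n - 1, and the Slepian-Duguid theorem gives rearrangeable nonblocking when m ≥ n. The parameters below (512 endpoints as 32 switches of 16 ports) are hypothetical, and an on-chip CNOC is built from routers, so crosspoint count is only a rough cost proxy, not the paper's power model.

```python
def clos_crosspoints(n, m, r):
    """Crosspoints in a symmetric 3-stage Clos network C(n, m, r).

    Ingress: r switches of n x m; middle: m switches of r x r; egress: r switches of m x n.
    """
    return 2 * r * n * m + m * r * r

def is_strict_nonblocking(n, m):
    """Clos (1953): strict-sense nonblocking iff m >= 2n - 1."""
    return m >= 2 * n - 1

def is_rearrangeable(n, m):
    """Slepian-Duguid: rearrangeably nonblocking iff m >= n."""
    return m >= n

if __name__ == "__main__":
    n, r = 16, 32                      # hypothetical: 512 endpoints = 32 switches x 16 ports
    endpoints = n * r
    print("crossbar:", endpoints ** 2)                                # 262144
    print("strict Clos:", clos_crosspoints(n, 2 * n - 1, r))          # 63488
    print("rearrangeable Clos:", clos_crosspoints(n, n, r))           # 32768
```

Even the strict-sense nonblocking configuration needs roughly a quarter of the crosspoints of a flat 512×512 crossbar, which is the usual scalability argument for Clos networks.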

14.
Many-core processors are good candidates for speeding up video coding because the many-core architecture can exploit the parallelism of these applications more efficiently. Lock methods are important in many-core architectures to ensure correct program execution and communication between on-chip threads. The efficiency of the lock method is critical to the overall performance of an on-chip many-core processor. In this paper, we propose two types of hardware locks for on-chip many-core architectures: a centralized lock and a distributed lock. First, we design the architectures that implement the centralized and distributed hardware locks. Then, we evaluate the performance of the two hardware locks and a software lock with quantitative micro-benchmarks on the many-core processor simulator Godson-T. The experimental results show that the locks with dedicated hardware support outperform the software lock, and that the distributed hardware lock is more scalable than the centralized hardware lock.

15.
This paper is a brief introduction to a new class of computers, the reconfigurable massively parallel computer. Its most distinguishing feature is that it uses the reconfigurability of the interconnection network to establish a network topology well matched to the algorithm's communication graph, so that higher efficiency can be achieved, and to remove faulty processors from the network, so that the system keeps operating without interruption at the same or slightly degraded efficiency. Several existing reconfigurable single instruction multiple data (SIMD) parallel architectures and their reconfiguration mechanisms are described, the effectiveness of algorithm mapping through reconfiguration is demonstrated, and fault-tolerant schemes based on reconfiguration are discussed.

16.
The reconfiguration management scheme changes a logical topology in response to changing traffic patterns in the higher layer of a network or to the congestion level on the logical topology. In this paper, we formulate a reconfiguration scheme with a shared-buffer-constrained cost model based on required quality-of-service (QoS) constraints, reconfiguration penalty cost, and buffer gain cost through traffic aggregation. The proposed scheme maximizes the derived expected reward-cost function while guaranteeing the QoS required by each flow. Simulation results show that our reconfiguration scheme significantly outperforms the conventional one while requiring only limited physical resources.

17.
Microelectronics Reliability, 2015, 55(11): 2439-2452
In this paper, the design space exploration problem is concerned with finding the best composition of different Non-Uniform Cache Access (NUCA) specifications in many-core processors. The single-objective and multi-objective exploration problems are intended to meet the desired level of reliability without violating the performance and energy constraints. The main objective is to find the best choice for each cache specification which can minimize the vulnerability of L1 and L2 caches in NUCA architectures. The design space consists of 72 implementations, made up of combinations of different structures in the current NUCA specifications (cache organization, write policy, coherence protocol, inclusiveness, replacement policy, and network topology). Moreover, the effects of design implementations on reliability (as the main objective), performance, cache energy consumption, and interconnection traffic (as the constraints) have been investigated.

18.
尤志强 (You Zhiqiang), 彭福慧 (Peng Fuhui), 邝继顺 (Kuang Jishun), 张大方 (Zhang Dafang). 《电子学报》(Acta Electronica Sinica), 2011, 39(11): 2663-2669
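With advances in integrated circuit fabrication, multi-core and many-core systems have become the trend in systems-on-chip. The drawbacks of the traditional two-dimensional mesh (2D-mesh) topology, namely low communication efficiency, high power consumption, and long latency, are becoming increasingly evident. This paper first analyzes and compares the performance of several common topologies in multi-core and many-core settings, and then adopts the butterfly fat tree (BFT) topology, which has low wiring complexity and good performance, to address the design and test problems of systems-on-chip. Subsequently, this paper...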

19.
Fault tolerance in VLSI/WSI FFT arrays becomes relevant when defects and run-time faults are significant, owing to the large dimensions of processors and arrays. Both restructuring to overcome end-of-production defects and reconfiguration to overcome run-time faults are then required, to achieve the dual goals of higher yield and higher reliability. When the basic FFT network is the two-dimensional array that directly corresponds to the FFT flow graph, the usual structural-redundancy techniques for reconfiguring two-dimensional arrays are not well suited, since the limited locality of this network leads to a significant area increase from the augmented interconnection structure. In this paper, time redundancy is proposed as a viable alternative for the two-dimensional FFT array. Two solutions are presented, one based on inter-stage reconfiguration and the other on intra-stage reconfiguration; both survive multiple faults with a limited increase in network complexity and very small hard-core sections. As is usual for time-redundancy methods, both approaches yield a processing speed equal to half that of an ideal, fault-free device. Reliability and survival ratios under multiple faults are evaluated for both cases, also taking into account the area increments needed for fault tolerance. The reliability evaluations allow a direct comparison of the two solutions.
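The two-dimensional FFT array mirrors the FFT flow graph: for an N-point transform there are log2(N) columns (stages), each with N/2 butterflies. As a software model of that correspondence only (the paper's fault-tolerant reconfiguration and time redundancy are not modeled), here is a standard iterative radix-2 decimation-in-time FFT that also reports how many butterflies each stage executes. The names `bit_reverse` and `fft_flow_graph` are illustrative.

```python
import cmath

def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def fft_flow_graph(x):
    """Iterative radix-2 decimation-in-time FFT.

    Each pass of the outer loop is one column (stage) of the FFT flow graph and each
    butterfly is one node in that column, so an N-point transform is a
    log2(N)-by-N/2 grid of butterflies. Returns (spectrum, butterflies_per_stage).
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1
    a = [complex(x[bit_reverse(i, bits)]) for i in range(n)]  # bit-reversed input order
    per_stage = []
    size = 2
    while size <= n:
        half = size // 2
        count = 0
        for start in range(0, n, size):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / size)  # twiddle factor
                u, t = a[start + k], w * a[start + k + half]
                a[start + k], a[start + k + half] = u + t, u - t
                count += 1
        per_stage.append(count)
        size *= 2
    return a, per_stage

if __name__ == "__main__":
    _, stages = fft_flow_graph([1, 0, 0, 0, 0, 0, 0, 0])
    print(stages)  # [4, 4, 4]
```

If each butterfly is a physical processing element, a fault in any one of them breaks a column of the graph, which is why the abstract's inter-stage and intra-stage reconfiguration schemes are organized around stages.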

20.

Because they do not depend on any fixed infrastructure, ad hoc wireless sensor networks, a kind of decentralized, self-organizing wireless network, are easy to deploy, highly survivable, and flexible to network, which gives them promising applications in both military and civilian UAV use. This paper proposes a new slot-adaptive 4D network clustering algorithm based on UAV autonomous formation and reconfiguration to address problems of UAV ad hoc networks such as disordered networking, poor network reconstruction performance, and high energy consumption. The algorithm can optimize the topology of the UAV network. We build the network topology and generate the clustered network with the slot-adaptive 4D network clustering algorithm in Matlab. Four states based on realistic UAV combat scenarios are simulated and analyzed. The simulation results validate the feasibility of the slot-adaptive 4D network clustering algorithm: the clustering structure it generates is robust, and the algorithm is suitable for UAV group operations.



Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.), ICP license 京ICP备09084417号