首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到19条相似文献,搜索用时 125 毫秒
1.
《电子技术应用》2016,(1):19-21
多核同时多线程处理器(SMT_PAAG)是用于图形、图像及数字信号处理的一种多核处理器。基于这种处理器提出了一种硬件线程调度器,该调度器采用同时多线程技术,最多可同时执行四个线程,支持八个线程阻塞模式下的快速上下文切换。这样避免了因阻塞带来的等待问题,能够有效提高处理器的工作效率和资源利用率。通过在处理器上运行图形处理算法进行性能评测。结果表明,SMT-PAAG处理器通过挖掘指令级并行和线程级并行,将处理器的性能提高了69.25%。  相似文献   

2.
同时多线程(SMT,Simultaneous Multithreading)处理器通过每个周期同时运行来自多个线程的指令来提高性能.同时执行的线程在共享资源的同时也在竞争资源.如果一个发生L2 cache失效的线程长时间占用共享资源,那么会导致其他线程运行速度减慢,甚至会因为缺少资源而停顿下来,从而降低了SMT处理器的总体性能.为了减小L2 cache失效给SMT处理器性能带来的负面影响,许多取指策略被提了出来,DWarn就是其中比较有效的一种.本文在DWarn的基础上进行改进,提出了DWarn+取指策略.模拟结果表明,当同时运行的线程数目不超过4时,无论使用IPC作为度量标准还是使用Hmean作为度量标准,DWarn+都要明显优于DWarn;当同时运行的线程数目大于4时,DWarn+相对于DWarn的提高主要体现在存储器访问密集的工作负载上,而对于所有类型工作负载,DWarn+相对于DWarn的平均提高非常有限.  相似文献   

3.
任建  安虹  路放  梁博 《计算机科学》2006,33(3):239-243
同时多线程处理器(SMT)每个周期能够从多个线程中发射指令执行,从而大大地提高了超标量微处理器的指令吞吐量,但多个线程的同时执行也带来了许多硬件资源的共享冲突问题.其中,多个线程共享分支预测硬件的方案会对分支预测精度产生较大的影响.研究SMT处理器中分支处理方案对于处理器整体性能的影响,对于指导SMT处理器的设计是十分重要的.本文利用SMT处理器模拟器,针对各线程运行独立应用的SMT结构实验评估了几种著名的分支预测方案;给出了在单线程和多线程情况下,分支预测方案对分支预测精度和处理器整体性能的影响的分析;总结出在这样的SMT结构中,各线程拥有独立的预测器是一种较好的选择,并且由于各独立预测器可以采用小而简单的结构,所以不会带来太多的硬件开销.  相似文献   

4.
同时多线程处理器允许多个线程同时执行,一方面提高了处理器的性能,另一方面也为通过线程冗余执行来容错提供了支持.冗余多线程结构将线程复制成两份,二者独立执行,并比较结果,从而实现检错或者容错.冗余多线程结构主要采用ICOUNT调度策略来解决线程间资源共享问题.然而这种策略有可能造成"饥饿"现象,并降低处理器吞吐率.提出一...  相似文献   

5.
随着线延迟的逐渐增加,指令调度技术作为一种可以有效减少处理器片上通信的技术日益重要。本文介绍一种分片式处理器结构上基于加权路径的指令调度算法,该算法利用已经放置好的指令——锚指令信息精确计算路径长度,再用指令所在路径长度作为权值对指令进行调度。实验结果表明,本算法实现的调度器IPC比已有的两种TRIPS调度算法的IPC分别提高了21%和3%。  相似文献   

6.
刘清华  吴悦  杨洪斌 《计算机工程与设计》2011,32(6):2116-2118,2123
为提高多核处理器系统的调度效率,充分发挥多核处理器的性能,提出了一种新的线程调度算法。该方法利用遗传算法的快速随机全局搜索能力,生成蚁群算法所需的信息素分布,利用蚁群算法的正反馈性,并将其应用到CMP的线程调度中,以提高线程调度的效率。通过两种算法的结合,弥补了遗传算法随着求解范围增大而效率降低,蚁群算法需要信息素浓度的增加才能提高效率的不足,更好地发挥它们的优势,提升求解速度。实验结果表明,该算法能够很好地降低任务的执行时间,充分发挥多核处理器系统的优势。  相似文献   

7.
针对具有独立DVFS的多核处理器系统,提出了一种K线程低能耗模型的并行任务调度优化算法(Tasks Optimization based on Energy-Effectiveness Model,TO-EEM)。与传统的并行任务节能调度相比,该算法的主要目标是不仅通过降低处理器频率来减少处理器瞬时功耗,而且结合并行任务间的同步互斥所造成的线程阻塞情况,合理分配线程资源来减少线程同步时间,优化并行性能;保证任务在一定的并行加速比性能前提下,提高资源利用率,减少能耗,达到程序能耗和性能之间的折衷。文中进行了大量模拟实验,结果证明提出的任务优化模型算法节能效果明显,能有效降低处理器的功耗,并始终保持线性加速比。  相似文献   

8.
针对多核环境中操作系统的线程调度问题,提出一种基于线程流水线的线程调度策略。基于片上多线程处理器,借鉴流水线技术的并行优势,引入线程流水线的概念。通过确定线程特征指标,计算线程流水线的聚合度及对应线程的吻合度,从而完成线程调度,并在此基础上对其进行嵌入式方向的优化。模拟真实环境的实验结果表明,与基于静态优先级的调度策略相比,该策略消耗时间较少。  相似文献   

9.
异构多核处理器通常由高性能的大核和低能耗的小核组成,在其上进行合理的线程调度可以有效地提高资源利用率,节省能耗。之前论文提出的大小核上的公平性调度并没有考虑核上有不同频率/电压状态的情况,而现在支持DVFS调节的处理器越来越普遍,因此很有必要将线程间公平度的计算进行扩展和改进。提出在每个核有若干种不同的DVFS状态时异构多核处理器上线程公平度的计算方法,对已有的性能预测模型进行改进,采用自适应算法调整模型中的系数,并在此基础上提出了一种调度策略,维持各线程之间的公平度和处理器功率满足提前设定的阈值,同时选取能效最优化的配置,实现减小应用运行能耗的目的。实验结果表明,与所提出的调度策略相比,采用static、DVFS-only、swap-only三种调度方法时,在总的运行时间几乎相同的情况下,平均要多产生20%以上能耗,对于有些应用甚至达到了50%。  相似文献   

10.
基于粒子群算法的多核处理器线程调度研究   总被引:1,自引:1,他引:0  
为有效解决多核处理器的线程调度问题,提出了一种基于粒子群算法框架上的线程调度算法.该算法依据设计的调度模型,在线程DAG图上通过复制不在同一处理器上且存在相关性的线程,生成相互独立的子DAG图,并采用改进的粒子群优化算法对其进行合理调度,由此提高线程调度效率.仿真实现了该算法,并通过实验数据验证了该算法的优越性.  相似文献   

11.
Multiple-context processors provide register resources that allow rapid context switching between several threads as a means of tolerating long communication and synchronization latencies. When scheduling threads on such a processor, we must first decide which threads should have their state loaded into the multiple contexts, and second, which loaded thread is to execute instructions at any given time. In this paper we show that both decisions are important, and that incorrect choices can lead to serious performance degradation. We propose thread prioritization as a means of guiding both levels of scheduling. Each thread has a priority that can change dynamically, and that the scheduler uses to allocate as many computation resources as possible to critical threads. We briefly describe its implementation, and we show simulation performance results for a number of simple benchmarks in which synchronization performance is critical.  相似文献   

12.
General-purpose graphics processing unit (GPGPU) plays an important role in massive parallel computing nowadays. A GPGPU core typically holds thousands of threads, where hardware threads are organized into warps. With the single instruction multiple thread (SIMT) pipeline, GPGPU can achieve high performance. But threads taking different branches in the same warp violate SIMD style and cause branch divergence. To support this, a hardware stack is used to sequentially execute all branches. Hence branch divergence leads to performance degradation. This article represents the PDOM (post dominator) stack as a binary tree, and each leaf corresponds to a branch target. We propose a new PDOM stack called PDOM-ASI, which can schedule all the tree leaves. The new stack can hide more long operation latencies with more schedulable warps without the problem of warp over-subdivision. Besides, a multi-level warp scheduling policy is proposed, which lets part of the warps run ahead and creates more opportunities to hide the latencies. The simulation results show that our policies achieve 10.5% performance improvements over baseline policies with only 1.33% hardware area overhead.  相似文献   

13.
Simultaneous Multi-Threading (SMT) provides a technique to improve resource utilization ability by sharing key data-path components among multiple independent threads. When critical resources are shared by multiple threads, effective use of these resources proves to be the most important factor in fully exploiting the system potential. Transient behaviors of various threads in terms of their execution parallelism can easily affect utilization efficiency of these shared resources. To commit more resources to threads that are more active allows for better resource utilization and thus higher throughput. In this paper, we propose a real-time dynamic scheduler for the SMT which dispatches instructions from threads based on thread-activeness information gathered in real time and dynamically adjusts dispatching priorities among threads accordingly. An extensive simulation shows a significant gain in system throughput by this technique. The performance of the proposed dispatching technique is evaluated on different workload mixtures created based on instruction-level parallelism available in each thread. An average of 6.5% and maximum of 15% performance improvement is observed with the proposed dispatching technique.  相似文献   

14.
在使用浸入边界-格子玻尔兹曼方法(IB-LBM)求解流场时,为了得出比较精确的结果,往往需要规模较大、较密集的流场网格,这就会造成模拟过程时间长的问题。为了提高模拟的效率,利用IB-LBM局部计算的特点,结合OpenMP中三种不同的任务调度方式,给出了IB-LBM的并行优化方法。在并行优化中混合使用三种任务调度方式,以弥补单一任务调度造成的负载不均衡问题;将IB-LBM进行结构化分解,测试每一结构部分的最优调度方式,根据实验结果选择最优的调度组合方式,而在不同线程数下,最优的组合方式是不同的。优化结果通过并行加速比来检验,可以得出:在线程数较少的情况下,加速比趋近于理想状态;在线程数较多的情况下,虽然线程开辟和销毁的额外时间消耗对性能的优化产生了影响,模型的并行性能仍有了很大的提升。流场的模拟结果显示,在进行并行优化后, IB-LBM对流固耦合问题模拟的准确性并没有受到影响。  相似文献   

15.
Memory access scheduling is an effective manner to improve performance of Chip Multi-Processors (CMPs) by taking advantage of the timing characteristics of a DRAM. A memory access scheduler can subdivide resources utilization (banks and rows) to increase throughput by accessing different DRAM banks in parallel. However, different threads running on different cores may exhibit different performance. One thread may experience starvation while the others are serviced normally. Therefore, designing a scheduler which reduces the unfairness in the DRAM system, while also improving system throughput on a variety of workloads and systems, is necessary. In this paper, a distributed fair DRAM scheduling for two-dimensional mesh network-on-chips (NoCs), called DFDS, is presented. The key design points in DFDS are: (i) assessing the total waiting cycles of a memory request in NoC and considering it as a metric in arbitration. For this purpose waiting cycles of a memory request are put in an additional flit in a packet and are updated while traversing the NoC, and (ii) proposing a semi-dynamic virtual channel allocation to provide in-order memory requests to memory controllers (MCs). Consequently, we use a simple scheduling algorithm in MCs, instead of complex algorithms. To validate our approach, we apply synthetic and real workload from Parsec benchmark suite. The results show effectiveness of our approach, as we reduce the waiting time of memory requests by up to 15%.  相似文献   

16.
When the critical path of a communication session between end points includes the actions of operating system kernels, there are attendant overheads. Along with other factors, such as functionality and flexibility, such overheads motivate and favor the implementation of communication protocols in user space. When implemented with threads, such protocols may hold the key to optimal communication performance and functionality. Based on implementations of reliable user‐space protocols supported by a threads framework, we focus on our experiences with internal threads' scheduling techniques and their potential impact on performance. We present scheduling strategies that enable threads to do both application‐level and communication‐related processing. With experiments performed on a Sun SPARC‐5 LAN environment, we show how different scheduling strategies yield different levels of application‐processing efficiency, communication latency and packet‐loss. This work forms part of a larger study on the implementation of multiple thread‐based protocols in a single address space, and the benefits of coupling protocols with applications. Copyright © 2005 John Wiley & Sons, Ltd.  相似文献   

17.
Simultaneous Multi-Threading (SMT) has been a very popular design in improving resource utilization by sharing key datapath components among multiple independent threads. However, allowing any of the threads to overwhelm these shared resources not only leads to unfair thread processing but may also result in severely degraded overall performance. How to prevent idling threads from clogging the critical resources in the pipeline becomes a must in sustaining desired system performance. In this paper, we show that, if one can manage to recall instructions of idling threads from the shared Issue Queue (IQ), the system performance is easily enhanced by a significant margin, with up to 20% for some benchmark mixes. An even more noteworthy feature about this technique is that the ensuing hardware overhead is very insignificant and it can also be coupled with other advanced techniques employed in other stages of the SMT pipeline for potentially additive benefits.  相似文献   

18.
Although nonuniform memory access architecture provides better scalability for multicore systems, cores accessing memory on remote nodes take longer than those accessing on local nodes. Remote memory access accompanied by contention for internode interconnection degrades performance. Properly mapping threads to cores and data accessed to their nodes can substantially improve performance and energy efficiency. However, an operating system kernel's load-balancing activity may migrate threads across nodes, which thus messes up the thread mapping. Besides, subsequent data mapping behavior pays for the cost of page migration to reduce remote memory access. Once unsuitable threads are migrated, it is detrimental to system performance. This paper focuses on improving the kernel's internode load balancing on nonuniform memory access systems. We develop a memory-aware kernel mechanism and policies to reduce remote memory access incurred by internode thread migration. The Linux kernel's load balancing mechanism is modified to incorporate selection policies in the internode thread migration, and the kernel is modified to track the amount of memory used by each thread on each node. With this information, well-designed policies can then choose suitable threads for internode migration. The purpose is to avoid migrating a thread that might incur relatively more remote memory access and page migration. The experimental results show that with our mechanism and the proposed selection policies, the system performance is substantially increased when compared with the unmodified Linux kernel that does not consider memory usage and always migrates the first-fit thread in the runqueue that can be migrated to the target central processing unit.  相似文献   

19.
多核多线程结构线程调度策略研究   总被引:1,自引:0,他引:1  
片上多核多线程(CMT)结构兼具了片上多处理(CMP)和同时多线程(sMT)结构的优势,支持片上所有处于执行状态的线程每周期并行执行,导致核内与核间硬件资源共享和争用问题。该文在阐述CMT结构的资源共享特征并简要介绍SMT线程调度发展状况的基础上,主要围绕以减少资源争用为目标的线程调度策略和资源划分机制等热点,分析其研究现状,论述已有策略在处理这些问题上的优缺点,并探讨了可能的研究发展方向。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号