Similar literature
 18 similar articles found
1.
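The performance of a multi-core processor is closely tied to its system software. The operating system is the interface between the processor and applications, and plays a crucial role in exploiting processor features and improving application performance. The compiler is closely coupled to the processor architecture: it must generate binary code that the processor supports, and it must also use processor features to generate efficient code, so its quality directly affects overall system performance. To improve the real-world performance of Loongson 3A systems, a series of optimizations was made in the operating system and the compiler, guided by the microarchitectural characteristics of the Loongson 3A. These include a CC-NUMA multi-core operating system implementation, an operating-system L2 cache locking mechanism, shared L2 cache allocation driven by operating-system scheduling, auto-vectorizing compilation, and compilation with prefetch support. Experimental results show that adding support for processor features to system software exploits the advantages of the architecture and substantially benefits system performance. These optimization techniques may also serve as a reference for optimizing other processors.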

2.
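As modern applications demand ever higher performance, processor clock frequencies have risen steadily. Because of power and semiconductor process limits, raising single-core clock frequency alone can no longer sustain "Moore's Law", and homogeneous multi-core processors emerged in response. A homogeneous multi-core processor integrates multiple peer, structurally identical general-purpose cores on one chip, meeting the needs of higher system performance, load balancing, and processor fault tolerance at minimal cost. A parallel architecture needs matching software to multiply its performance gains. This paper studies and implements multi-core task scheduling at the operating-system level in response to these changes in processor structure. The system adopts a hybrid scheduling strategy: clusters are scheduled independently of one another, while scheduling within each cluster is unified. The principles and implementation of multi-core scheduling are analyzed in detail in terms of scheduling mode, scheduling algorithm, allocation algorithm, and scheduling points. Finally, simulation experiments demonstrate the functional correctness and the schedulability of the algorithm.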

3.
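An Embedded Real-Time Operating System Architecture Design for Heterogeneous Multi-core Processors   Total citations: 3, self-citations: 1, citations by others: 2
Because the architecture of heterogeneous multi-core processors differs greatly from that of multiprocessor systems and homogeneous multi-core processors, the distributed operating system structure used for multiprocessor systems and the master-slave structure used for homogeneous multi-core systems cannot solve the real-time scheduling and efficiency problems of heterogeneous multi-core processors. After studying the characteristics and development trends of heterogeneous multi-core processors, this paper proposes a multi-master real-time operating system architecture suited to them. The architecture brings the multi-master mode of communication buses into the multi-core operating system and designs the operating system model with a symmetric structure and a component-based approach, so that every core in the processor can act as a master core and manage resources and tasks in real time. This improves system performance and removes the bottleneck of master-slave operating systems, in which the master core can no longer meet system performance requirements as the number of cores grows. This single architectural model can be configured flexibly to suit processor cores with different structures and functional requirements, reducing the difficulty of operating system development.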

4.
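This paper first describes scheduling in the Windows NT operating system. It then addresses the shortcomings of that scheduling by introducing a new processor scheduling method: processor inheritance scheduling. If future versions of Windows NT adopt this framework, the flexibility of the system would improve greatly.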

5.
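With CMT (chip multithreading) technology, Sun will continue to strengthen the parallel multithreaded processing capability of its processors and pair them with the Solaris operating system, providing better performance support for the advancement of virtualization technologies and applications.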

6.
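Compared with symmetric multi-core processors, asymmetric multi-core processors offer higher efficiency and will become the mainstream architecture for future parallel operating systems. For the problem of scheduling parallel tasks in an operating system on asymmetric multi-core processors, existing studies assume that all core frequencies are constant, lack theoretical analysis, and do not consider the efficiency or generality of the algorithm. To address this, the paper first builds a nonlinear programming model and derives scheduling principles that jointly account for the synchronization characteristics of parallel tasks, core asymmetry, and core load. Based on these principles, it then proposes an integrated scheduling algorithm that improves efficiency by combining thread scheduling with dynamic voltage and frequency scaling, and achieves generality through a parameter adjustment mechanism. The proposed algorithm is the first scheduling algorithm to combine thread scheduling with dynamic voltage and frequency scaling on asymmetric multi-core processors. Experiments on a real platform show that the algorithm is applicable to a variety of environments and that its efficiency is 24% to 50% higher than that of comparable algorithms.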

7.
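Analysis of VCPU Scheduling Algorithms in Xen   Total citations: 1, self-citations: 0, citations by others: 1
To reduce the performance overhead of virtual machines in virtualized environments and make virtualization more efficient, this paper analyzes the credit-based scheduling algorithm in Xen, taking into account the requirements of virtual processors during virtual machine scheduling. The algorithm has clear advantages for processor-intensive applications, multiprocessor scheduling, and QoS control. To address the performance problems of current scheduling algorithms on multiprocessors and under new virtual machine monitor structures, optimizations such as spinlock priority and processor binding are proposed. Examples show that these measures improve the scheduling efficiency of virtual processors.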

8.
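Thanks to their excellent performance, general-purpose processors are increasingly used in real-time control systems. However, their architecture makes instruction execution time non-deterministic, so real-time systems built on them suffer from scheduling jitter. Scheduling jitter is one of the important performance metrics of a real-time control system. This paper discusses techniques for measuring and analyzing jitter and introduces compensation techniques for reducing it.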

9.
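Superscalar processors and very long instruction word (VLIW) processors can both execute multiple instructions per cycle, and each uses a different instruction scheduling approach to achieve this. Superscalar processors schedule instructions dynamically, while VLIW processors execute statically scheduled instructions. This paper presents a quantitative performance comparison between several superscalar processor architectures and a VLIW processor architecture developed at the University of California. It outlines the architectures of several superscalar processors and of a VLIW processor designed to exploit percolation scheduling for parallelism, and analyzes their performance. The motivation for this comparison ...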

10.
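Chip multithreading processors bring high throughput and parallel high performance to modern commercial workloads, but they also pose challenges for the design and performance optimization of operating systems and software. To address this, a fully customizable multithreaded testing method with integrated workloads is designed. Chip multithreading processors are tested under a variety of workload configurations, the effect of different scheduling approaches on performance is analyzed, and optimization ideas for operating-system multithreaded scheduling are proposed.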

11.
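Research on Thread Scheduling Strategies for Multi-core Multithreaded Architectures   Total citations: 1, self-citations: 0, citations by others: 1
The chip multithreading (CMT) architecture combines the advantages of chip multiprocessing (CMP) and simultaneous multithreading (SMT). It allows all on-chip threads in the running state to execute in parallel every cycle, which leads to sharing of and contention for hardware resources both within and between cores. After describing the resource-sharing characteristics of CMT architectures and briefly reviewing the development of SMT thread scheduling, this paper focuses on thread scheduling strategies and resource partitioning mechanisms that aim to reduce resource contention. It analyzes the state of research, discusses the strengths and weaknesses of existing strategies in handling these problems, and explores possible directions for future research.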

12.
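To address operating-system thread scheduling in multi-core environments, a thread scheduling strategy based on thread pipelines is proposed. Building on chip multithreading processors and drawing on the parallelism advantages of pipelining, the concept of a thread pipeline is introduced. Thread scheduling is performed by determining characteristic metrics for each thread and computing the aggregation degree of each thread pipeline and the matching degree of the corresponding threads. The strategy is then optimized for embedded use. Experiments simulating a real environment show that the strategy takes less time than a scheduling strategy based on static priorities.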

13.
The current trend of research on multithreading processors is toward chip multithreading (CMT), which exploits thread-level parallelism (TLP) and improves the performance of software built on traditional threading components, e.g., Pthread. There exist commercially available processors that support simultaneous multithreading (SMT) on multicore processors. But they are basically based on the conventional sequential execution model, and execute multiple threads in parallel under the control of an OS that handles interrupts. Moreover, there exist few languages or programming techniques to utilize multicore processors effectively. We are taking another approach to develop a multithreading processor, which is dedicated to TLP. Our processor, named Fuce, is based on continuation-based multithreading. A thread is defined as a block of sequentially ordered instructions which are executed without interruption. Every thread execution is triggered only by an event called a continuation. This paper first introduces the continuation-based multithread execution model and its processor architecture, then gives multithreaded programming techniques and the continuation-based multithreading language system CML. Last, the performance of the Fuce processor is evaluated by means of clock-level software simulation.

14.
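The RISC-V instruction set architecture is permanently open source, has a compact and efficient instruction set and a modular processor microarchitecture, and is highly extensible. It is increasingly used in cloud computing, edge computing, in-vehicle intelligent computing, and other fields. Its vector extension unit can greatly improve computational efficiency and reduce unnecessary hardware overhead. With further hardware advances such as stronger processor computing capability and wider registers, vector units have become a common technique in processor chip architectures for improving processor performance. The vector control module is the core control unit of the vector unit; its timing relationships are complex and its specification is hard to describe. Targeting these characteristics, this paper optimizes the design verification flow and builds an efficient verification platform, using functional coverage and code coverage to drive and quantify verification progress. Verifying the RISC-V vector control module in this way effectively improves its reliability, lowers tape-out risk, and lightens the burden on subsystem-level and system-level verification so that those stages can focus on interconnect, interaction response, and interface verification.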

15.
Continuous improvements in semiconductor fabrication density are supporting new classes of System-on-a-Chip (SoC) architectures that combine extensive processing logic/processor with high-density memory. Such architectures are generally called Processor-in-Memory (PIM) or Intelligent Memory (I-RAM) and can support high-performance computing by reducing the performance gap between the processor and the memory. The PIM architecture combines various processors in a single system. These processors are characterized by their computation and memory-access capabilities. Therefore, a novel strategy must be developed to identify their capabilities and dispatch the most appropriate jobs to them in order to exploit them fully. Accordingly, this study presents an automatic source-to-source parallelizing system, called statement-analysis-grouping-evaluation (SAGE), to exploit the advantages of PIM architectures. Unlike conventional iteration-based parallelizing systems, SAGE adopts statement-based analyzing approaches. This study addresses the configuration of a PIM architecture with one host processor (i.e., the main processor in state-of-the-art computer systems) and one memory processor (i.e., the computing logic integrated with the memory). The strategy of the SAGE system, in which the original program is decomposed into blocks and a feasible execution schedule is produced for the host and memory processors, is investigated as well. The experimental results for real benchmarks are also discussed.

16.
Throughput computing is based on chip multithreading processor design technology. In CMT technology, maximizing the amount of work accomplished per unit of time or other relevant resource, rather than minimizing the time needed to complete a given task or set of tasks, defines performance. By CMT standards, the best processor accomplishes the most work per second of time, per watt of expended power, per square millimeter of die area, and so on (that is, it operates most efficiently). The processor described is a member of Sun's first generation of CMT processors designed to efficiently execute network-facing workloads. Network-facing systems primarily service network clients and are often grouped together under the label "Web servers". The processor's dual-thread execution capability, compact die size, and minimal power consumption combine to produce high throughput performance per watt, per transistor, and per square millimeter of die area. Given the short design cycle Sun needed to create the processor, the result is a compelling early proof of the value of throughput computing.

17.
Computation in the Context of Transport Triggered Architectures   Total citations: 1, self-citations: 0, citations by others: 1
Processors used in embedded systems have specific requirements which are not always met by off-the-shelf processors. A templated processor architecture, which can easily be tuned towards a certain application (domain), offers a solution. The transport triggered architecture (TTA) template presented in this paper has a number of properties that make it very suitable for embedded system design. Key to its success is to give the compiler more control; it has to schedule all data transports within the processor. This paper highlights two important TTA-related issues. First, a new code generation method for TTAs is discussed; it integrates scheduling and register allocation, thereby avoiding the notorious phase ordering problem between these two steps. Secondly, we discuss how to tune the instruction repertoire for an embedded processor. A tool is described which automatically detects frequent patterns of operations. These patterns can then be implemented on special function units.

18.
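To meet the small-area, low-power design requirements of embedded applications, a microprocessor supporting the RISC-V instruction set architecture is designed. The system uses a two-stage pipeline and implements the RV32IMAC instruction set. The processor uses the AHB bus as its on-chip interconnect, which makes it easy to attach external IP cores for functional extension. The logic of the microprocessor was verified in the VCS environment, and simulation results show that it runs correctly and stably. Area, power, and performance were compared with the Hummingbird E203 processor and the ARM Cortex-M series processors: the design is 6% smaller in area than the Hummingbird E203, and its power and performance are comparable to those of the Cortex-M0. The analysis shows that the processor is well suited to small-area, low-power embedded applications.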
