Similar literature
Found 20 similar documents; search took 31 ms
1.
With the proliferation of multi-core systems, parallel programming poses a difficult challenge, and current solutions are far from efficient. To ease parallel programming, we propose a scheduler, part of a master–slave RTOS, that efficiently manages the parallel programs running on a multi-core system. We also propose an efficient protocol that serves as the interface between the operating system and application programs. This interface protocol runs on a dedicated control subnet to cut the synchronization overhead between parallel tasks, which has been recognized as one of the most severe limits on performance in multi-core parallel systems. Experimental results, obtained from register-transfer-level simulations of various benchmark parallel programs, show that the proposed protocol and control subnet improve system efficiency by up to 33.5%. Because the protocol is designed to be compatible with a minimum subset of the message-passing interface (MPI) functions, it scales well with the number of cores.

2.
Spatial locality of task execution is becoming important on future hardware platforms as the number of cores steadily increases. The large number of cores requires an intelligent power manager, and the high chip and core density requires increased thermal awareness to avoid hotspots on the chip. This paper presents a lightweight task migration mechanism designed explicitly for distributed operating systems running on many-core platforms. Because the distributed OS runs one scheduler on each core, tasks are migrated between OS kernels within the same shared-memory platform. The benefits of task migration, such as performance and energy efficiency, are achieved by relocating running tasks to the most appropriate cores while keeping the migration overhead sufficiently low. We investigate the overhead of migrating tasks on a distributed OS running both on a bus-based platform and on a many-core NoC. With these measurements, we can predict the task migration overhead and pinpoint emerging bottlenecks. With the presented mechanism, we intend to improve the dynamism of power and performance characteristics in distributed many-core operating systems.

3.
This paper reports on a parallel implementation of a general 3D multi-block CFD code. The parallelization is achieved using three strategies. First, it is done on dual-processor PC clusters running Windows NT. A multi-thread programming model is adopted for the multi-block code, where one thread corresponds to one block. Shared memory is used to exchange inner boundaries between neighboring blocks (threads) on the same node, while WinSockets are employed for those on different nodes. Second, the parallelization is extended to the UNIX operating system. MPI is applied for all message passing between processors, including those on the same node. Third, Pthreads (POSIX threads), a standardized application interface for threads, are adopted to take advantage of the shared-memory feature of the SMP nodes, while MPI is applied only for message passing between processors on different nodes. In all the strategies, a static load-balancing method is employed to distribute computational work equitably to specified nodes. The parameters of the present code are studied in detail to help explain the speedup results. Two examples are provided to show the speedup and load balancing of the parallel calculation. A detailed comparison is made to evaluate the efficiency of the different strategies.

4.
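Heterogeneous many-core architectures offer a very high performance-to-power ratio and have become an important direction in supercomputer architecture. However, the more complex parallelism and memory hierarchies of many-core systems make programming and optimization very challenging. Research on parallel programming techniques for many-core systems is therefore important both for lowering the programming difficulty of parallel applications on domestic (Chinese) many-core systems and for improving parallel program performance. This paper proposes a multi-mode parallel programming model with a unified architecture, comprising a heterogeneous-fusion accelerated computing model and an autonomous computing model programmed in a homogeneous style. Based on this model, the Parallel C language is designed, which can effectively describe the heterogeneous parallelism of domestic many-core systems. Compared with the MPI+X usage pattern on other many-core systems, programming and system optimization both take a global view, and the language has distinctive features in multi-level locality description, one-sided messages, and compatibility with existing multi-core applications. A Parallel C compilation system is built on Open64. It fully supports both the accelerated and the autonomous computing models, and it proposes and implements optimizations including data layout with automatic DMA, compiler-directed thread proxying, and topology-aware collective communication. Test data from micro-benchmarks and real applications on the Sunway TaihuLight computer system show that the Parallel C language and compilation system have good performance and scalability and can effectively support large applications.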

5.
FairThreads introduces fair threads, which are executed cooperatively when linked to a scheduler and preemptively otherwise. Constructs exist for programming the dynamic linking and unlinking of threads during execution. Users can profit from cooperative scheduling when threads are linked. For example, data accessed only by the threads linked to the same scheduler does not need to be protected by locks. Users can also profit from the preemptive scheduling provided by the operating system (OS) when threads are unlinked, for example to deal with blocking I/O. In the cooperative context, for the threads linked to the same scheduler, FairThreads makes it possible to use broadcast events. Broadcasting is a powerful, abstract, and modular means of communication. Event broadcasting is made possible by the specific way threads are scheduled by the scheduler to which they are linked (the ‘fair’ strategy). FairThreads also gives a way to deal with some limitations of the OS. Automata are special threads, coded as state machines, which do not need the allocation of a native thread and which execute efficiently. Automata also give a means to deal with the limited number of native threads available when large numbers of concurrent tasks are needed, for example in simulations. Copyright © 2005 John Wiley & Sons, Ltd.
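
The lock-free claim for cooperatively scheduled threads can be illustrated with a minimal sketch. It does not use the FairThreads API: Python generators stand in for linked threads, and the names `run_round_robin` and `incrementer` are hypothetical. Each task runs until it explicitly yields, so a read-modify-write on shared data cannot be interleaved with another task.

```python
from collections import deque


def run_round_robin(tasks):
    """Run generator-based tasks in turn until all of them have finished."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)           # run the task up to its next yield
        except StopIteration:
            continue             # task finished; do not reschedule it
        ready.append(task)       # task yielded; give the others a turn


def incrementer(shared, steps):
    """Increment a shared counter, yielding control after each update."""
    for _ in range(steps):
        value = shared["count"]       # read-modify-write without a lock:
        shared["count"] = value + 1   # no other task runs before the yield
        yield


shared = {"count": 0}
run_round_robin([incrementer(shared, 1000) for _ in range(4)])
print(shared["count"])  # 4000: no update is lost
```

With preemptive threads the same read-modify-write would need a lock, which is the trade-off the abstract describes between linked and unlinked threads.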

6.
Parallel Computing, 2013, 39(10): 549-566
Embedded SoC designs are embracing the many-core paradigm to deliver the performance required to run an ever-increasing number of applications in parallel. Networks-on-Chip (NoC) are considered a convenient technology for implementing many-core embedded platforms. The complex and non-uniform traffic flows generated when multiple parallel applications run simultaneously call for Quality-of-Service (QoS) extensions in the NoC, but to exploit such services efficiently it is necessary to expose them to the software in an easy-to-use yet efficient manner. In this paper we present an integrated hardware/software approach for delivering QoS on top of a hybrid OpenMP-MPI parallel programming model. Our experimental results show the effectiveness of our proposal over a broad range of benchmarks and application mappings, demonstrating the ability to manage parallelism under QoS requirements effortlessly from the programming model.

7.
In this work, we provide an experimental comparison between Global-EDF and Partitioned-EDF, considering the run-time overhead of a real-time operating system (RTOS). Recent works have confirmed that OS implementation aspects, such as the choice of scheduling data structures and interrupt handling mechanisms, impact real-time schedulability as much as scheduling-theoretic aspects. However, these studies used real-time patches applied to a general-purpose OS. By measuring the run-time overhead of an RTOS designed from scratch, we show how close the schedulability ratio of task sets is to the theoretical hard real-time schedulability tests. Moreover, we show how a well-designed object-oriented RTOS allows code reuse of scheduling components (e.g., thread, scheduling criteria, and schedulers) and easy real-time scheduling extensions. We compare our RTOS to a real-time patch for Linux in terms of the schedulability ratio of several generated task sets. In some cases, Global-EDF considering the overhead of the RTOS is superior to Partitioned-EDF considering the overhead of the patched Linux, which clearly shows how different OSs impact hard real-time schedulers.
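
How run-time overhead can change a schedulability verdict can be sketched as follows. This is not the paper's test or task-set generator. It assumes implicit-deadline periodic tasks, for which a single preemptive core is EDF-schedulable exactly when total utilization is at most 1, and it uses first-fit partitioning. The function name `first_fit_partition` and the flat per-task `overhead` term are hypothetical simplifications.

```python
def first_fit_partition(utilizations, num_cores, overhead=0.0):
    """Assign tasks to cores first-fit; return per-core task lists or None.

    Each task's utilization is inflated by `overhead` (a hypothetical flat
    run-time cost) before the per-core EDF bound U <= 1 is checked.
    """
    cores = [[] for _ in range(num_cores)]
    loads = [0.0] * num_cores
    for index, utilization in enumerate(utilizations):
        inflated = utilization + overhead
        for core in range(num_cores):
            # Small tolerance so that sums such as 0.5 + 0.5 still pass.
            if loads[core] + inflated <= 1.0 + 1e-9:
                cores[core].append(index)
                loads[core] += inflated
                break
        else:
            return None  # no core can accept this task
    return cores


# Total utilization 1.8 is below 2 cores, yet no partition exists.
print(first_fit_partition([0.6, 0.6, 0.6], 2))
# Four half-loaded tasks fit on 2 cores only while overhead is ignored.
print(first_fit_partition([0.5, 0.5, 0.5, 0.5], 2))
print(first_fit_partition([0.5, 0.5, 0.5, 0.5], 2, overhead=0.01))
```

The first call shows the bin-packing limit of partitioning, and the last two show a task set that passes the theoretical test but fails once even a small overhead is charged to each task.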

8.
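Design and implementation of the message mechanism in a distributed real-time operating system   Citations: 1 total, 1 self, 0 by others
With the rapid development of digital signal processing technology, a high-performance distributed real-time operating system, the Tengfei (腾飞) distributed real-time operating system (TF-RTOS), was developed in-house for parallel digital signal processing (DSP) applications to meet user needs. The message mechanism, used for communication between threads, is an important part of an operating system. During the development of TF-RTOS, a direct message-passing mechanism was designed and implemented in terms of four aspects: message command packets, message queues, the message-passing process, and message primitives. The mechanism simplifies inter-thread communication, enhances system functionality, and improves system performance.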

9.
PARC++ is a system that supports object-oriented parallel programming in C++. PARC++ provides the user with a set of predefined C++ classes that can easily be used for the construction of parallel C++ programs. With the help of PARC++ objects, the programmer is able to create and start new processes (threads), to synchronize their activities (Blocklock, Monitor) and to manage communication via message passing (Mailbox). PARC++ is written in C++ and currently runs on top of the EMEX operating system on a FORCE machine with 11 processing elements and an EDS (European Declarative System) with 28 processing elements. The paper also contains information about the run-time system model, the implementation and some performance measurements.

10.
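Application of multithreaded parallel computing techniques in environmental and chemical computation   Citations: 1 total, 0 self, 1 by others
Simulating the evolution of atmospheric pollution, pattern-recognition computation, and similar tasks increasingly and urgently require advanced, efficient parallel processing, interactive techniques, multithreading, and visual computing. Yet learning and mastering these specialized computer technologies often leaves researchers in environmental science, chemistry, and related fields confused and unsure where to start. Written from the perspective of non-computer specialists, this paper introduces the basic concepts, programming techniques, and implementation methods of multithreading and, in the context of atmospheric environment and chemical computation, gives a fairly complete example of interactive, parallel, visualized multithreaded computation.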

11.
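In message-passing programming for parallel computing, communication between processors takes a large amount of time, so reducing communication overhead is critical. Noting that network traffic contains many small messages, this paper applies data aggregation and proposes a scheme to reduce the communication overhead of a parallel program for the vibrating-string problem. It derives a formula for the setting at which performance is optimal and evaluates it experimentally. The experimental results show that the scheme effectively reduces communication overhead in parallel computing and that it can also be applied to other parallel computing problems.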
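
Why aggregating small messages helps can be sketched with a simple latency-plus-bandwidth cost model. This is an assumption for illustration, not the formula derived in the paper, which is not reproduced in the abstract. The function `transfer_time` and the link parameters are hypothetical.

```python
import math


def transfer_time(num_messages, message_bytes, batch, latency, per_byte):
    """Total send time when `batch` messages are merged into each send.

    Every send pays a fixed startup `latency`; every byte costs `per_byte`.
    """
    sends = math.ceil(num_messages / batch)
    total_bytes = num_messages * message_bytes
    return sends * latency + total_bytes * per_byte


# Hypothetical link: 50 microseconds startup cost, 1 nanosecond per byte.
latency, per_byte = 50e-6, 1e-9
unbatched = transfer_time(10_000, 64, 1, latency, per_byte)
batched = transfer_time(10_000, 64, 100, latency, per_byte)
print(unbatched, batched)
```

In this model the byte cost is unchanged and only the number of startup costs shrinks, so larger batches always win. A real optimum, such as the one the paper derives, would also have to account for costs this sketch omits, for example the time a receiver waits while a batch fills.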

12.
This article focuses on the effect of both process topology and load balancing on various programming models for SMP clusters and iterative algorithms. More specifically, we consider nested-loop algorithms with constant flow dependencies that can be parallelized on SMP clusters with the aid of the tiling transformation. We investigate three parallel programming models: a popular monolithic message-passing implementation and two hybrid ones that employ both message passing and multi-threading. We conclude that the selection of an appropriate mapping topology for the mesh of processes has a significant effect on overall performance, and we provide an algorithm for specifying such an efficient topology according to the iteration space and data dependencies of the algorithm. We also propose static load-balancing techniques for distributing computation between threads, which diminish the disadvantage of the master thread assuming all inter-process communication due to limitations often imposed by the message-passing library. Both improvements are implemented as compile-time optimizations and are evaluated experimentally. An overall comparison of these parallel programming styles on SMP clusters, based on micro-kernel experimental evaluation, is also provided.

13.
14.
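Analysis of embedded real-time operating systems   Citations: 5 total, 16 self, 5 by others
Zhang Kefei (张克非). Computer Engineering and Design (计算机工程与设计), 2005, 26(8): 2020-2022, 2063
A real-time multitasking operating system (RTOS) is the foundation and development platform for embedded application software. Building on an analysis of real-time operating systems from the perspectives of functionality and performance models, the paper describes the implementation of preemptive task scheduling, interrupt-disable time, and interrupt latency. Analyzing the system-call facility of Linux is one of several good entry points for studying the Linux kernel source code.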

15.
Hua Zhang, Joohan Lee, Ratan Guha. Software, 2008, 38(10): 1049-1071
Clusters composed of symmetric multiprocessor (SMP) machines and heterogeneous machines have become increasingly popular for high‐performance computing. Message‐passing libraries, such as the message‐passing interface (MPI) and parallel virtual machine (PVM), are de facto parallel programming libraries for clusters that usually consist of homogeneous, uni‐processor machines. For SMP machines, MPI is combined with multithreading libraries such as POSIX Threads and OpenMP to take advantage of the architecture. In addition to existing parallel programming libraries for the C/C++ and FORTRAN programming languages, the Java programming language presents itself as another alternative with its object‐oriented framework, platform‐neutral byte code, and ever‐increasing performance. This paper presents a new parallel programming model and a library, VCluster, which implements this model. VCluster is based on migrating virtual threads instead of processes to support clusters of SMP machines more efficiently. The implementation uses thread migration, which can be used in dynamic load balancing. VCluster was developed in pure Java, utilizing the portability of Java to support clusters of heterogeneous machines. Several applications are developed to illustrate the use of this library and to compare the usability and performance of VCluster with other approaches. Copyright © 2007 John Wiley & Sons, Ltd.

16.
17.
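This paper describes design patterns for a concurrent server and proxy threads in an embedded system environment. The system is based on the μC/OS-II operating system and the LWIP protocol stack. It can create and reclaim threads dynamically under the RTOS and supports concurrent network connections. The proxy-thread and delegating-thread design pattern can greatly save scarce memory and avoid data-sharing errors. The design pattern is effective for embedded systems and is also a useful reference for other network designs.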

18.
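Parallel programming environments for distributed parallel systems   Citations: 1 total, 0 self, 1 by others
In distributed parallel computer systems there is no shared memory to support data exchange between processors, so data communication between processors in a parallel computation must be implemented by message passing. As the tool through which programmers use a parallel computer system, the parallel programming environment plays an important role in the development and adoption of parallel processing technology and parallel computer systems. This paper analyzes the basic issues of parallel programming environments for distributed, message-passing-based parallel computer systems and introduces several typical parallel programming environments.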

19.
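SMP Cluster: how to exploit two-level parallelism   Citations: 3 total, 1 self, 3 by others
Starting from the underlying Linux operating system, this paper examines two different mechanisms for implementing parallelism within an SMP system: the thread model (and the OpenMP model), representing the shared-memory model, and the MPI model, representing the message-passing model. By analyzing how the inter-node and intra-node levels of parallelism should be combined, it concludes that, considering both efficiency and ease of use, programs on a Linux SMP cluster should be written directly with an MPI implementation that communicates through shared memory.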

20.