Similar Documents
20 similar documents retrieved.
1.
2.
In the standard kernel organization on a bus-based multiprocessor, all processors share the code and data of the operating system; explicit synchronization is used to control access to kernel data structures. Distributed-memory multicomputers use an alternative approach, in which each instance of the kernel performs local operations directly and uses remote invocation to perform remote operations. Either approach to interkernel communication can be used in a large-scale shared-memory multiprocessor. In this paper we discuss the issues and architectural features that must be considered when choosing between remote memory access and remote invocation. We focus in particular on experience with the Psyche multiprocessor operating system on the BBN Butterfly Plus. We find that the Butterfly architecture is biased towards the use of remote invocation for kernel operations that perform a significant number of memory references, and that current architectural trends are likely to increase this bias in future machines. This conclusion suggests that straightforward parallelization of existing kernels (e.g., by using semaphores to protect shared data) is unlikely in the future to yield acceptable performance. We note, however, that remote memory access is useful for small, frequently-executed operations, and is likely to remain so.
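To make the contrast concrete, the following C sketch shows the two interkernel communication styles for a hypothetical "enqueue a task on another node's run queue" operation. It is an illustration only, not Psyche or Butterfly kernel code, and every name in it (node_t, op_request_t, the two enqueue functions) is invented.

```c
#include <pthread.h>

typedef struct task {
    struct task *next;
} task_t;

typedef struct op_request {
    task_t            *task;
    struct op_request *next;
} op_request_t;

typedef struct node {
    pthread_mutex_t rq_lock;    /* protects this node's run queue         */
    task_t         *run_queue;  /* head of the node's ready list          */
    pthread_mutex_t mbox_lock;  /* protects the remote-invocation mailbox */
    op_request_t   *mailbox;    /* requests waiting for the owning kernel */
} node_t;

/* Remote memory access: lock the remote node's queue and update it directly,
 * paying a remote memory reference for every load and store under the lock. */
void enqueue_by_remote_access(node_t *remote, task_t *t)
{
    pthread_mutex_lock(&remote->rq_lock);
    t->next = remote->run_queue;
    remote->run_queue = t;
    pthread_mutex_unlock(&remote->rq_lock);
}

/* Remote invocation: post a request with a single remote pointer update; the
 * owning node's kernel later performs the enqueue on its own local memory.
 * (A real kernel would add an interprocessor interrupt or polling to notify it.) */
void enqueue_by_remote_invocation(node_t *remote, op_request_t *req)
{
    pthread_mutex_lock(&remote->mbox_lock);
    req->next = remote->mailbox;
    remote->mailbox = req;
    pthread_mutex_unlock(&remote->mbox_lock);
}
```

The trade-off described above falls out directly: the first variant pays a remote reference for every access it makes under the lock, so the more memory references the operation performs, the more attractive the invocation-based variant becomes.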

3.
Many interesting applications of concurrent logic languages require the ability to initiate, monitor, and control computations. Two approaches to the implementation of these control functions have been proposed: one based on kernel support and the other on program transformation. The efficiency of the two approaches has not previously been compared. This paper presents an implementation scheme based on kernel support, applicable to both uniprocessor and multiprocessor architectures. Experimental studies on a uniprocessor show the scheme to be more efficient than equivalent program transformations. Analysis of a multiprocessor implementation shows that the scheme imposes little communication and computation overhead.

4.
Predictability is of paramount concern for hard real-time systems. In one approach to predictability, every aspect of a real-time system and every primitive provided by the underlying operating system must be bounded and predictable in order to achieve overall predictability. In this paper, we describe several concurrency control synchronization mechanisms developed for a next generation multiprocessor real-time kernel, the Spring Kernel. The important features of these mechanisms include semaphore support for mutual exclusion with linear waiting and bounded resource usage, termed strong semaphores. Three more efficient strong-semaphore solutions are proposed in this paper. Two of them are based on the main theorem of the paper, the Deferred Bus theorem; these two solutions can be implemented in either hardware or software. The third solution, a pure software solution, is an extension of the existing Burns' algorithm. A performance comparison and a complexity analysis in terms of time, space and bus traffic are presented. This work is part of the Spring Project directed by Prof. Krithi Ramamritham and Prof. John A. Stankovic at the University of Massachusetts and is funded in part by the Office of Naval Research under contract N00014-85-K-0398 and by the National Science Foundation under grant DCR-8500332.
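As a rough illustration of mutual exclusion with linear (FIFO) waiting, here is a generic C11 ticket-lock sketch. It is not one of the paper's strong-semaphore solutions (neither the Deferred Bus constructions nor the Burns-based one), and it makes no attempt to bound bus traffic, which is a central concern of those solutions.

```c
/* FIFO ("strong") mutual exclusion via a ticket lock: waiters enter in the
 * order in which they took tickets, so waiting is bounded and linear. */
#include <stdatomic.h>

typedef struct {
    atomic_uint next_ticket;   /* ticket dispenser                 */
    atomic_uint now_serving;   /* ticket currently holding the CS  */
} ticket_lock_t;

void ticket_lock_init(ticket_lock_t *l)
{
    atomic_init(&l->next_ticket, 0);
    atomic_init(&l->now_serving, 0);
}

void ticket_lock_acquire(ticket_lock_t *l)
{
    unsigned my = atomic_fetch_add(&l->next_ticket, 1);  /* take a ticket   */
    while (atomic_load(&l->now_serving) != my)           /* spin until it   */
        ;                                                /* is our turn     */
}

void ticket_lock_release(ticket_lock_t *l)
{
    atomic_fetch_add(&l->now_serving, 1);                /* admit next waiter */
}
```

The naive spin on now_serving generates continuous bus traffic, which is exactly the overhead the paper's hardware and software solutions are designed to control.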

5.
The Data Locality of Work Stealing
This paper studies the data locality of the work-stealing scheduling algorithm on hardware-controlled shared-memory machines, where movement of data to and from the cache is solely controlled by the hardware. We present lower and upper bounds on the number of cache misses when using work stealing, and introduce a locality-guided work-stealing algorithm together with its experimental validation. As a lower bound, we show that a work-stealing application that exhibits good data locality on a uniprocessor may exhibit poor data locality on a multiprocessor. In particular, we show a family of multithreaded computations G_n whose members perform Θ(n) operations (work) and incur a constant number of cache misses on a uniprocessor, while even on two processors the total number of cache misses soars to Ω(n). On the other hand, we show a tight upper bound on the number of cache misses that nested-parallel computations, a large and important class of computations, incur due to multiprocessing. In particular, for nested-parallel computations we show that on P processors a multiprocessor execution incurs an expected O(C⌈m/s⌉PT∞) more misses than the uniprocessor execution. Here m is the execution time of an instruction incurring a cache miss, s is the steal time, C is the size of the cache, and T∞ is the number of nodes on the longest chain of dependencies. Based on this we give strong execution-time bounds for nested-parallel computations using work stealing. For the second part of our results, we present a locality-guided work-stealing algorithm that improves the data locality of multithreaded computations by allowing a thread to have an affinity for a processor. Our initial experiments on iterative data-parallel applications show that the algorithm matches the performance of static partitioning under traditional workloads but improves performance by up to 50% over static partitioning under multiprogrammed workloads. Furthermore, locality-guided work stealing improves the performance of work stealing by up to 80%.
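Written out, the upper bound quoted above takes the following form, where Q_P and Q_1 denote the multiprocessor and uniprocessor cache-miss counts (symbols introduced here only for readability); this merely restates the abstract's bound.

```latex
% Expected extra cache misses of a nested-parallel computation on P processors,
% restating the bound quoted in the abstract (Q_1 = uniprocessor miss count).
Q_P \;\le\; Q_1 \;+\; O\!\left( C \left\lceil \tfrac{m}{s} \right\rceil P \, T_{\infty} \right)
\qquad \text{in expectation.}
```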

6.
《Parallel Computing》1997,22(13):1837-1851
The PAPS (Performance Analysis of Parallel Systems) toolset is a testbed for model-based performance prediction of message-passing parallel applications executed on private-memory multiprocessor computer systems. PAPS allows the execution behavior of the computer hardware and operating-system software resources to be described at a very detailed level. This enables very accurate performance prediction of parallel applications even in the case of substantial performance degradation due to contention for shared resources. In this paper the fundamental design principles and implementation methodologies for the development of the PAPS toolset are presented, and the PAPS parallel-system specification formalisms are described. A simplified performance study of a parallel Gaussian elimination application on the nCUBE 2 multiprocessor system demonstrates the use of the tool.

7.
Dynamic scheduling of hard real-time tasks and real-time threads
The authors investigate the dynamic scheduling of tasks with well-defined timing constraints. They present a dynamic uniprocessor scheduling algorithm with an O(n log n) worst-case complexity. The preemptive scheduling performed by the algorithm is shown to be more efficient than that of other known algorithms. Furthermore, tasks may be related by precedence constraints, and they may have arbitrary deadlines and start times (which need not equal their arrival times). An experimental evaluation of the algorithm compares its average-case behavior to the worst case. An analytic model used to explain the experimental results is validated with actual system measurements. The dynamic scheduling algorithm is the basis of a real-time multiprocessor operating system kernel developed in conjunction with this research. Specifically, this algorithm is used at the lowest, threads-based layer of the kernel whenever threads are created.
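For intuition about the O(n log n) bound, the sketch below dispatches by earliest deadline using a binary min-heap, so each arrival and completion costs O(log n). It is a generic illustration with invented names, and it ignores the precedence constraints, start times, and resource handling that the authors' algorithm supports.

```c
/* Earliest-deadline dispatch with a binary min-heap keyed on deadline.
 * Illustrative only: callers must respect MAX_TASKS and not pop when empty. */
#include <stddef.h>

#define MAX_TASKS 1024

typedef struct { int id; long deadline; } task_t;

typedef struct { task_t heap[MAX_TASKS]; size_t n; } ready_queue_t;

static void swap(task_t *a, task_t *b) { task_t t = *a; *a = *b; *b = t; }

/* O(log n): insert a newly arrived task. */
void rq_insert(ready_queue_t *q, task_t t)
{
    size_t i = q->n++;
    q->heap[i] = t;
    while (i > 0 && q->heap[(i - 1) / 2].deadline > q->heap[i].deadline) {
        swap(&q->heap[i], &q->heap[(i - 1) / 2]);
        i = (i - 1) / 2;
    }
}

/* O(log n): remove and return the task with the earliest deadline. */
task_t rq_pop_earliest(ready_queue_t *q)
{
    task_t top = q->heap[0];
    q->heap[0] = q->heap[--q->n];
    size_t i = 0;
    for (;;) {
        size_t l = 2 * i + 1, r = 2 * i + 2, m = i;
        if (l < q->n && q->heap[l].deadline < q->heap[m].deadline) m = l;
        if (r < q->n && q->heap[r].deadline < q->heap[m].deadline) m = r;
        if (m == i) break;
        swap(&q->heap[i], &q->heap[m]);
        i = m;
    }
    return top;
}
```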

8.
The complexity of characterizing both parallel hardware and software makes it very difficult to explain and predict the performance of parallel programs for real industrial CFD applications. A performance model based on a generalized Amdahl's formulation has been developed and applied to a flow solver. The present formulation allows us to explain the behavior of a typical explicit multiblock CFD solver when the program is run on a distributed-memory multiprocessor system. Using this approach, it is possible to gain insight into the performance limitations of this class of parallel solvers by considering the impact of larger and larger numbers of processors on problems of fixed scale.
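One hypothetical way to write such a generalized Amdahl-style model is shown below, with an explicit overhead term added to the classical serial/parallel split; the symbols f and T_comm are assumptions of this illustration, and the paper's actual formulation for the multiblock solver may differ.

```latex
% A generic "generalized Amdahl" speedup model: f is the parallelizable
% fraction of the single-processor time T(1), and T_comm(P) lumps together
% communication and contention overhead on P processors (both symbols are
% assumptions of this sketch, not the paper's notation).
S(P) \;=\; \frac{T(1)}{T(P)} \;\approx\;
\frac{1}{\,(1 - f) \;+\; \dfrac{f}{P} \;+\; \dfrac{T_{\mathrm{comm}}(P)}{T(1)}\,}
```

With a model of this shape, the speedup on a fixed-size problem flattens or even degrades once the overhead term grows faster than the parallel term shrinks, which is the behavior the paper sets out to explain.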

9.
Implications of classical scheduling results for real-time systems
Knowledge of complexity, fundamental limits, and performance bounds, which are well known for many scheduling problems, helps real-time designers choose a good design and algorithm and avoid poor ones. The scheduling problem has so many dimensions that it has no accepted taxonomy. We divide scheduling theory between uniprocessor and multiprocessor results. In the uniprocessor section, we begin with independent tasks and then consider shared resources and overload. In the multiprocessor section, we divide the work between static and dynamic algorithms.
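As one concrete example of the classical uniprocessor results such a survey draws on, the C helper below checks the Liu and Layland utilization bound for rate-monotonic scheduling of independent periodic tasks (a sufficient but not necessary condition); the function and variable names are ours.

```c
/* Liu & Layland (1973) sufficient schedulability test for rate-monotonic
 * scheduling of n independent periodic tasks on one processor:
 *     sum(C_i / T_i)  <=  n * (2^(1/n) - 1)
 * Passing the test guarantees schedulability; failing it is inconclusive. */
#include <math.h>
#include <stdbool.h>
#include <stdio.h>

bool rm_utilization_test(const double *wcet, const double *period, int n)
{
    double u = 0.0;
    for (int i = 0; i < n; i++)
        u += wcet[i] / period[i];                /* per-task utilizations */
    return u <= n * (pow(2.0, 1.0 / n) - 1.0);   /* Liu-Layland bound     */
}

int main(void)
{
    double c[] = {1.0, 2.0, 3.0};     /* worst-case execution times */
    double t[] = {4.0, 8.0, 16.0};    /* periods                    */
    printf("schedulable by the bound: %s\n",
           rm_utilization_test(c, t, 3) ? "yes" : "no");
    return 0;
}
```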

10.
邓良  曾庆凯 《软件学报》2016,27(5):1309-1324
In modern operating systems the kernel runs at the highest privilege level, managing the underlying hardware and providing system services to the applications above it; security-sensitive applications are therefore easily attacked by an untrusted kernel beneath them. This paper proposes AppFort, a method for protecting applications on an untrusted operating system kernel. To address the high overhead of existing approaches, AppFort combines an x86 hardware mechanism (operand address length), kernel code integrity protection, and kernel control-flow integrity protection to intercept and verify the hardware operations and software behavior of the untrusted kernel, thereby efficiently guaranteeing the memory, control-flow, and file I/O security of applications. Experimental results show that AppFort's overhead is extremely small, a clear performance improvement over existing work.

11.
An evolution is happening in the way that operating systems support the real-time requirements of their applications. The need to run real-time applications such as multimedia in the same environment as complex non-real-time servers and applications has motivated much interest in restructuring existing operating systems. Many issues related to thread scheduling and synchronization have been investigated. However, little consideration has been given to the flexibility and modularity required in the support of application-level scheduling needs, although it is well known that application requirements are diverse. In this paper, we describe a real-time scheduling abstraction which provides modularity and flexibility to the scheduling support of operating systems. Our design has been implemented using the Mach 3.0 kernel and a locally developed multiprocessor kernel (the r-kernel) as development platforms.
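To make the notion of a modular scheduling abstraction concrete, here is a hypothetical C operations-table interface in which each policy plugs in its own enqueue/pick_next/tick routines; it illustrates the kind of flexibility described, not the actual abstraction implemented on Mach 3.0 or the r-kernel.

```c
/* A hypothetical pluggable-scheduler interface: each policy supplies an
 * operations table, and the kernel dispatches through it, so applications
 * or subsystems can select the policy that matches their needs. */
#include <stddef.h>

typedef struct thread thread_t;          /* opaque to the policies       */

typedef struct sched_ops {
    const char *name;
    void      (*enqueue)(thread_t *t);   /* make t runnable              */
    thread_t *(*pick_next)(void);        /* choose the next thread       */
    void      (*tick)(thread_t *cur);    /* periodic timer notification  */
} sched_ops_t;

static const sched_ops_t *active_policy; /* e.g. fixed-priority, EDF, ... */

void sched_set_policy(const sched_ops_t *ops) { active_policy = ops; }

thread_t *sched_dispatch(void)
{
    return active_policy ? active_policy->pick_next() : NULL;
}
```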

12.
Islam  N. 《Computer》1997,30(2):69-78
Today's applications have exploded in their diversity, but most operating systems are still general-purpose and inefficient. One of the benefits of using an OO approach is the ability to modify very small details of an operating system, which makes it easy to tailor the system to the application. My experience indicates that optimizing an operating system for the general case can result in mediocre performance for specialized applications, especially parallel applications. Therefore, I envision a customizable operating system built from components that will allow an optimal match between application behavior and hardware architecture. I propose an object-oriented operating system in which design frameworks support alternative implementations of key systems-software services.

13.
In the early 1990s, researchers at Sandia National Laboratories and the University of New Mexico began development of customized system software for massively parallel 'capability' computing platforms. These lightweight kernels have proven to be essential for delivering the full power of the underlying hardware to applications. This claim is underscored by the success of several supercomputers, including the Intel Paragon, Intel Accelerated Strategic Computing Initiative Red, and the Cray XT series of systems, each having established a new standard for high-performance computing upon introduction. In this paper, we describe our approach to lightweight compute node kernel design and discuss the design principles that have guided several generations of implementation and deployment. A broad strategy of operating system specialization has led to a focus on user-level resource management, deterministic behavior, and scalable system services. The relative importance of each of these areas has changed over the years in response to changes in applications and hardware and system architecture. We detail our approach and the associated principles, describe how our application of these principles has changed over time, and provide design and performance comparisons to contemporaneous supercomputing operating systems.

14.
15.
Elmwood is an object-oriented, multiprocessor operating system designed and implemented during a graduate seminar. It consists of a minimal kernel and a collection of user-implemented services. The kernel provides two major abstractions: objects, which consist of code and data, and processes, which represent asynchronous activity. Objects, like programs, are passive. To operate on an abstraction or to request a service, processes invoke an entry procedure defined by the corresponding object. Objects implement their own protection and synchronization policies using minimal kernel mechanisms. We describe the Elmwood kernel interface, an implementation on the BBN Butterfly parallel processor, and our experiences in developing a multiprocessor operating system under rigid time constraints. These experiences illustrate several general lessons regarding kernel design and trade-offs for implementation expedience.
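A minimal sketch of the abstraction described, assuming a simple mutex as the object's chosen synchronization policy: the object bundles data with an entry procedure, and processes operate on it only through that procedure. All names are invented; this is not Elmwood's actual interface.

```c
/* Sketch of an Elmwood-style passive object: data plus entry procedures,
 * with synchronization chosen and enforced by the object itself. */
#include <pthread.h>

typedef struct counter_object {
    pthread_mutex_t lock;   /* the object's own synchronization policy     */
    long            value;  /* object data                                 */
} counter_object_t;         /* initialize lock with pthread_mutex_init()   */

/* Entry procedure: processes operate on the object only through calls like
 * this one, never by touching its data directly. */
long counter_increment(counter_object_t *obj, long delta)
{
    pthread_mutex_lock(&obj->lock);
    obj->value += delta;
    long result = obj->value;
    pthread_mutex_unlock(&obj->lock);
    return result;
}
```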

16.
17.
Marr  D.T. Natarajan  S. Thakkar  S. Zucker  R. 《Computer》1996,29(11):47-53
In the past, multiprocessor systems have taken a year longer than uniprocessor systems to introduce because of the need to develop and validate the additional functionality required for multiprocessor systems. Manufacturers, however, don't want to wait this long to release products using the latest multiprocessor systems. Our challenge in designing Intel's newest microprocessor, the Pentium Pro processor, was to eliminate the lag time. We wanted to accomplish this by introducing systems where the multiprocessor functionality was already an integral part of the main processor and the chipset. Specifically, we wanted to introduce the uniprocessor and the multiprocessor systems at the same time. We thus had to make sure even before first silicon that the multiprocessor system would work. In the past, multiprocessor system validation has generally taken place after first silicon, because the external logic has usually been developed after the processor functionality was integrated into the processor and chipset. Intel developed an extensive test methodology for functional and performance validation.

18.
The development of computing systems with large numbers of processors has been motivated primarily by the need to solve large, complex problems more quickly than is possible with uniprocessor systems. Traditionally, multiprocessor systems have been uniprogrammed, i.e., dedicated to the execution of a single set of related processes, since this approach provides the fastest response for an individual program once it begins execution. However, if the goal of a multiprocessor system is to minimize average response time or to maximize throughput, then multiprogramming must be considered. In this paper, a model of a simple multiprocessor system with a two-program workload is reviewed; the model is then applied to an Intel iPSC/2 hypercube multiprocessor with a workload consisting of parallel wavefront algorithms for solving triangular systems of linear equations. Throughputs predicted by the model are compared with throughputs obtained experimentally from an actual system. The results provide validation for the model and indicate that significant performance improvements for multiprocessor systems are possible through multiprogramming.

19.
This paper describes techniques for porting UNIX source code from uniprocessor systems to shared-memory multiprocessor systems. Three aspects are covered: basic porting techniques, techniques for improving efficiency, and the introduction of new concepts. Building on research results and experience from abroad, we present our own views on this problem.

20.
Hierarchical scheduling has been proposed as a scheduling technique to achieve aggregate resource partitioning among related groups of threads and applications in uniprocessor and packet scheduling environments. Existing hierarchical schedulers are not easily extensible to multiprocessor environments because (1) they do not incorporate the inherent parallelism of a multiprocessor system when partitioning resources and (2) they can result in unbounded unfairness or starvation if applied to a multiprocessor system in a naive manner. In this paper, we present hierarchical multiprocessor scheduling (H-SMP), a novel hierarchical CPU scheduling algorithm designed for a symmetric multiprocessor (SMP) platform. The novelty of this algorithm lies in its combination of space and time multiplexing to achieve the desired bandwidth partition among the nodes of the hierarchical scheduling tree. This algorithm is also characterized by its ability to incorporate existing proportional-share algorithms as auxiliary schedulers to achieve efficient hierarchical CPU partitioning. In addition, we present a generalized weight feasibility constraint that specifies the limit on the achievable CPU bandwidth partitioning in a multiprocessor hierarchical framework and propose a hierarchical weight readjustment algorithm designed to transparently satisfy this feasibility constraint. We evaluate the properties of H-SMP using hierarchical surplus fair scheduling (H-SFS), an instantiation of H-SMP that employs surplus fair scheduling (SFS) as an auxiliary algorithm. This evaluation is carried out through a simulation study showing that H-SFS provides better fairness properties in multiprocessor environments than existing algorithms and their naive extensions.
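As a hedged illustration of what a weight feasibility constraint can look like on a P-CPU machine, the check below assumes a simplified reading in which a child node with t_i runnable threads can consume at most t_i/P of the machine, so its weight fraction must not exceed that cap. The paper's generalized constraint and its readjustment algorithm operate over the whole scheduling tree and may differ in detail; all names here are invented.

```c
/* Check a simplified weight feasibility condition for the children of one
 * node in a hierarchical scheduling tree on a P-CPU SMP: a child with t_i
 * runnable threads cannot usefully receive more than t_i/P of the machine,
 * so its requested weight fraction must stay within that cap.  Returning
 * false signals that the weights need readjustment. */
#include <stdbool.h>

bool weights_feasible(const double *w, const int *threads, int n, int P)
{
    double total = 0.0;
    for (int i = 0; i < n; i++)
        total += w[i];
    for (int i = 0; i < n; i++) {
        double share = w[i] / total;            /* requested CPU fraction    */
        double cap   = (double)threads[i] / P;  /* what t_i threads can use  */
        if (cap > 1.0) cap = 1.0;
        if (share > cap)
            return false;                       /* infeasible partition      */
    }
    return true;
}
```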
