首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 62 毫秒
1.
Parallel execution of application programs on a multiprocessor system may lead to performance degradation if the workload of a parallel region is not large enough to amortize the overheads associated with the parallel execution. Furthermore, if too many processes are running on the system in a multiprogrammed environment, the performance of the parallel application may degrade due to resource contention. This work proposes a comprehensive dynamic processor allocation scheme that takes both program behavior and system load into consideration when dynamically allocating processors. This mechanism was implemented on the Solaris operating system to dynamically control the execution of parallel C and Java application programs. Performance results show the effectiveness of this scheme in dynamically adapting to the current execution environment and program behavior, and that it outperforms a conventional time‐shared system. Copyright © 2002 John Wiley & Sons, Ltd.  相似文献   

2.
Parallel execution is normally used to decrease the amount of time required to run a program. However, the parallel execution may require far more space than that required by the sequential execution. Worse yet, the parallel space requirement may be very much more difficult to predict than the sequential space requirement because there are more factors to consider. These include essentially nondeterministic factors that can influence scheduling, which in turn may dramatically influence space requirements. We survey some scheduling algorithms that attempt to place bounds on the amount of time and space used during parallel execution. We also outline a direction for future research. This direction takes us into the area of functional programming, where the declarative nature of the languages can help the programmer to produce correct parallel programs, a feat that can be difficult with procedural languages. Currently the high-level nature of functional languages can make it difficult for the programmer to understand the operational behavior of the program. We look at some of the problems in this area, with the goal of achieving a programming environment that supports correct, efficient parallel programs.  相似文献   

3.
一个基于图像代数的并行图像处理环境   总被引:3,自引:0,他引:3  
系统可用性和应用程序可移植性差是许多现有的并行图像处理计算结构难以获得实际应用的重要原因,基于Ritter提出的图像代数理论,研究、实现了一个并行图像处理环境.用户在并行计算结构上进行程序设计时,只需用图像处理环境提供的图像代数运算描述算法即可,处理环境能够根据用户算法的描述,依据一个时间开销模型,自动从并行实现函数库中提取出最优或近似最优的并行代码完成算法的运行,算法的并行实现和并行计算结构的硬件细节对用户透明。  相似文献   

4.
Debuggers play an important role in developing parallel applications. They are used to control the state of many processes, to present distributed information in a concise and clear way, to observe the execution behavior, and to detect and locate programming errors. More sophisticated debugging systems also try to improve understanding of global execution behavior and intricate details of a program. In this paper we describe the design and implementation of SPiDER, which is an interactive source‐level debugging system for both regular and irregular High‐Performance Fortran (HPF) programs. SPiDER combines a base debugging system for message‐passing programs with a high‐level debugger that interfaces with an HPF compiler. SPiDER, in addition to conventional debugging functionality, allows a single process of a parallel program to be expected or the entire program to be examined from a global point of view. A sophisticated visualization system has been developed and included in SPiDER to visualize data distributions, data‐to‐processor mapping relationships, and array values. SPiDER enables a programmer to dynamically change data distributions as well as array values. For arrays whose distribution can change during program execution, an animated replay displays the distribution sequence together with the associated source code location. Array values can be stored at individual execution points and compared against each other to examine execution behavior (e.g. convergence behavior of a numerical algorithm). Finally, SPiDER also offers limited support to evaluate the performance of parallel programs through a graphical load diagram. SPiDER has been fully implemented and is currently being used for the development of various real‐world applications. Several experiments are presented that demonstrate the usefulness of SPiDER. Copyright © 2002 John Wiley & Sons, Ltd.  相似文献   

5.
This paper considers a model of a parallel program that can efficiently be interpreted on an instrumental computer, providing a means for a sufficiently accurate prediction of the actual time needed for execution of the parallel program on a given parallel computational system. The model is designed for parallel programs with explicit message passing written in Java with calls to the MPI library and is a part of the ParJava environment. The model is obtained by transforming the program control tree, which, for Java programs, can be constructed by modifying the abstract syntax tree. The communication functions are simulated on the basis of the LogGP model, which makes it possible to take into account specific features of the distributed computational system.  相似文献   

6.
We consider the time-dependent demands for data movement that a parallel program makes on the architecture that executes it. The result is an architecture-independent metric that represents the temporal behavior of data-movement requirements. Programs are described as series of computations and data movements, and while message passing is not ruled out, we focus on explicit parallel programs using a fixed number of processes in a distributed shared-memory environment. Operations are assumed to be explicitly allocated to processors when the metric is applied, which might correspond to intermediate code in a parallelizing compiler. The metric is called the interprocess read (IR) temporal metric. A key to developing an architecture-independent temporal metric is modeling program execution time in an architecture-independent way. This is possible because well-synchronized parallel programs make coordinated progress above a certain level of granularity. Our execution time characterization takes into account barrier synchronization and critical sections. We illustrate the metric using instruction count on simple code fragments and then from multiprocessor program traces (Splash benchmarks). Results of running the benchmarks on simulated network architectures show that the IR metric for the time scale of network response predicts performance better than whole program measures.  相似文献   

7.
We describe a compiler and run-time system that allow data-parallel programs to execute on a network of heterogeneous UNIX workstations. The programming language supported is Dataparallel C, a SIMD language with virtual processors and a global name space. This parallel programming environment allows the user to take advantage of the power of multiple workstations without adding any message-passing calls to the source program. Because the performance of Individual workstations in a multi-user environment may change during the execution of a Dataparallel C program, the run-time system automatically performs dynamic load balancing. We present experimental results that demonstrate the usefulness of dynamic load-balancing In a multi-user environment These results suggest that initially allocating the same amount of work to each processor and letting the dynamic load balancing algorithm adjust the load during program execution yields very good performance. Hence neither the compiler nor the run-time system need a priori knowledge of the speeds of the machines that will participate in a program execution.  相似文献   

8.
A new technique for estimating and understanding the speed improvement that can result from executing a program on a parallel computer is described. The technique requires no additional programming and minimal effort by a program's author. The analysis begins by tracing a sequential program. A parallelism analyzer uses information from the trace to simulate parallel execution of the program. In addition to predicting parallel performance, the parallelism analyzer measures many aspects of a program's dynamic behavior. Measurements of six substantial programs are presented. These results indicate that the three symbolic programs differ substantially from the numeric programs and, as a consequence, cannot be automatically parallelized with the same compilation techniques  相似文献   

9.
GOP is a graph‐oriented programming model which aims at providing high‐level abstractions for configuring and programming cooperative parallel processes. With GOP, the programmer can configure the logical structure of a parallel/distributed program by constructing a logical graph to represent the communication and synchronization between the local programs in a distributed processing environment. This paper describes a visual programming environment, called VisualGOP, for the design, coding, and execution of GOP programs. VisualGOP applies visual techniques to provide the programmer with automated and intelligent assistance throughout the program design and construction process. It provides a graphical interface with support for interactive graph drawing and editing, visual programming functions and automation facilities for program mapping and execution. VisualGOP is a generic programming environment independent of programming languages and platforms. GOP programs constructed under VisualGOP can run in heterogeneous parallel/distributed systems. Copyright © 2005 John Wiley & Sons, Ltd.  相似文献   

10.
The tension between software development costs and efficiency is especially high when considering parallel programs intended to run on a variety of architectures. In the domain of shared memory architectures and explicitly parallel programs, the authors have addressed this problem by defining a programming structure that eases the development of effectively portable programs. On each target multiprocessor, an effectively portable program runs almost as efficiently as a program fine-tuned for that machine. Additionally, its software development cost is close to that of a single program that is portable across the targets. Using this model, programs are defined in terms of data structure and partitioning-scheduling abstractions. Low software development cost is attained by writing source programs in terms of abstract interfaces and thereby requiring minimal modification to port; high performance is attained by matching (often dynamically) the interfaces to implementations that are most appropriate to the execution environment. The authors include results of a prototype used to evaluate the benefits and costs of this approach  相似文献   

11.
A noninterference monitoring and replay mechanism using the recorded execution history of a program to control the replay of the program behavior and guarantee the reproduction of its errors is presented. Based on this approach, a noninterference monitoring architecture has been developed to collect the program execution data of a target real-time software system without affecting its execution. A replay mechanism designed to control the reproduction of the program behavior as well as the examination of the states of the target system and its behavior is presented. The monitoring system has been implemented using a Motorola 68000 computer in a Unix system environment. An example is used to illustrate how the mechanism detects timing errors of real-time software systems  相似文献   

12.
The construction of large software systems is always achieved through assembly of independently written components — program modules. For these software components to work together, they must share a common set of data types and principles for representing structured data such as arrays of values and files. This common set of tools for creating and operating on data objects is provided by the infrastructure of the computer system: the hardware, operating system and runtime code. Because the nature and properties of these tools are crucial for correct operation of software components and their inter-operation, it is essential to have a precise specification that may be used for verifying correctness of application software on one hand, and to verify correctness of system behavior on the other. We call such a specification a program execution model (PXM). It is evident that the properties of the PXM implemented by a computer system can have serious impact on the ability of application programmers to practice modular software construction. This paper discusses the concept of program execution models and presents a set of principles that a PXM must satisfy to provide a sound basis for modular software construction. Because parallel program execution on computer systems with many processing units is an essential part of contemporary computing environments, the expression of parallelism and modular software construction using components involving parallel operations is included in this treatment. The conclusion is that it is possible to build computer systems that implement a PXM within which any parallel program may be used, unmodified, as a component for building more substantial parallel programs.  相似文献   

13.
The amount of memory required by a parallel program may be spectacularly larger than the memory required by an equivalent sequential program, particularly for programs that use recursion extensively. Since most parallel programs are nondeterministic in behavior, even when computing a deterministic result, parallel memory requirements may vary from run to run, even with the same data. Hence, parallel memory requirements may be both large (relative to memory requirements of an equivalent sequential program) and unpredictable. Assume that each parallel program has an underlying sequential execution order that may be used as a basis for predicting parallel memory requirements. We propose a simple restriction that is sufficient to ensure that any program that will run in n units of memory sequentially can run in mn units of memory on m processors, using a scheduling algorithm that is always within a factor of two of being optimal with respect to time. Any program can be transformed into one that satisfies the restriction, but some potential parallelism may be lost in the transformation. Alternatively, it is possible to define a parallel programming language in which only programs satisfying the restriction can be written  相似文献   

14.
Charlotte: Metacomputing on the Web   总被引:3,自引:0,他引:3  
Parallel computing on local area networks is generally based on mechanisms that specifically target the properties of the local area network environment. However, these mechanisms do not effectively extend to wide area networks due to issues such as heterogeneity, security, and administrative boundaries. We present a system which enables application programmers to write parallel programs in Java and allows Java-capable browsers to execute parallel tasks. It comprises a virtual machine model which isolates the program from the execution environment, and a runtime system realizing this virtual machine on the Web. Load balancing and fault masking are transparently provided by the runtime system.  相似文献   

15.
This paper deals with the construction and use of simple synthetic programs that model the behavior of more complex, real parallel programs. Synthetic programs can be used in many ways: to construct an easily ported suite of benchmark programs, to experiment with alternate parallel implementations of a program without actually writing them, and to predict the behavior and performance of an algorithm on a new or hypothetical machine. Synthetic programs are constructed easily from scratch and from existing programs and can even be constructed using nothing but information obtained from traces of the real program's execution.  相似文献   

16.
Adaptive data partitioning (ADP) which reduces the execution time of parallel programs by reducing interprocessor communication for iterative parallel loops is discussed. It is shown that ADP can be integrated into a communication-reducing back end for existing parallelizing compilers or as part of a machine-specific partitioner for parallel programs. A multiprocessor model to analyze program execution factors that lead to interprocessor communication and a model for the iterative parallel loop to quantify communication patterns within a program are defined. A vector notation is chosen to quantify communication across a global data set. Communication parameters are computed by examining the indexes of array accesses and are adjusted to reflect the underlying system architecture by compensating for cache line sizes. These values are used to generate rectangular and hexagonal partitions that reduce interprocessor communication  相似文献   

17.
本文在并行系统模拟环境中,采集了一个迭代类并行程序实例的运行时间数据,据此,分析了影响程序运行时间的主要因素,建立了一个并行程序运行时间推算模型,从而可以在迭代次数,输入数据规模,以及并行系统的配置等三个方向上对程序运行时间进行预测,实验数据表明,该模型是相当精确的,可以为我们节省大量的模拟时间。  相似文献   

18.
P-GRADE provides a high-level graphical environment to develop parallel applications transparently both for parallel systems and the Grid. P-GRADE supports the interactive execution of parallel programs as well as the creation of a Condor, Condor-G or Globus job to execute parallel programs in the Grid. In P-GRADE, the user can generate either PVM or MPI code according to the underlying Grid where the parallel application should be executed. PVM applications generated by P-GRADE can migrate between different Grid sites and as a result P-GRADE guarantees reliable, fault-tolerant parallel program execution in the Grid. The GRM/PROVE performance monitoring and visualisation toolset has been extended towards the Grid and connected to a general Grid monitor (Mercury) developed in the EU GridLab project. Using the Mercury/GRM/PROVE Grid application monitoring infrastructure any parallel application launched by P-GRADE can be remotely monitored and analysed at run time even if the application migrates among Grid sites. P-GRADE supports workflow definition and co-ordinated multi-job execution for the Grid. Such workflow management can provide parallel execution at both inter-job and intra-job level. Automatic checkpoint mechanism for parallel programs supports the migration of parallel jobs inside the workflow providing a fault-tolerant workflow execution mechanism. The paper describes all of these features of P-GRADE and their implementation concepts.  相似文献   

19.
Sidle系统是运行在SUN工作站网络上的一组实用程序,利用空闲的处理机资源进行大粒度的并行计算.同其它远程执行设备相比,它能支持程序内部并行和嵌套的远程执行,允许一个服务员机接受多个远程执行任务.本文介绍了这些特点和透明性的实现方法.  相似文献   

20.
IPS, a performance measurement system for parallel and distributed programs, is currently running on its second implementation. IPS's model of parallel programs uses knowledge about the semantics of a program's structure to provide two important features. First, IPS provides a large amount of performance data about the execution of a parallel program, and this information is organized so that access to it is easy and intuitive. Secondly, IPS provides performance analysis techniques that help to guide the programmer automatically to the location of program bottlenecks. The first implementation of IPS was a testbed for the basic design concepts, providing experience with a hierarchical program and measurement model, interactive program analysis, and automatic guidance techniques. It was built on the Charlotte distributed operating system. The second implementation, IPS-2, extends the basic system with new instrumentation techniques, an interactive and graphical user interface, and new automatic guidance analysis techniques. This implementation runs on 4.3BSD UNIX systems, on the VAX, DECstation, Sun 4, and Sequent Symmetry multiprocessor  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号