首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
The programming of efficient parallel software typically requires extensive experimentation with program prototypes. To facilitate such experimentation, any programming system that supports rapid prototyping of parallel programs should provide high-level language primitives with which programs can be explicitly, statically, or dynamically tuned with respect to performance and reliability. Such language primitives should be able to refer conveniently to the information about the executing program and the parallel hardware required for tuning. Such information may include monitoring data about the current or previous program or even hints regarding appropriate tuning decisions. Language primitives and an associated programming system for program tuning are presented. The primitives and system have been implemented, and have been tested with several parallel applications on a network of Unix workstations.<>  相似文献   

2.
IPS, a performance measurement system for parallel and distributed programs, is currently running on its second implementation. IPS's model of parallel programs uses knowledge about the semantics of a program's structure to provide two important features. First, IPS provides a large amount of performance data about the execution of a parallel program, and this information is organized so that access to it is easy and intuitive. Secondly, IPS provides performance analysis techniques that help to guide the programmer automatically to the location of program bottlenecks. The first implementation of IPS was a testbed for the basic design concepts, providing experience with a hierarchical program and measurement model, interactive program analysis, and automatic guidance techniques. It was built on the Charlotte distributed operating system. The second implementation, IPS-2, extends the basic system with new instrumentation techniques, an interactive and graphical user interface, and new automatic guidance analysis techniques. This implementation runs on 4.3BSD UNIX systems, on the VAX, DECstation, Sun 4, and Sequent Symmetry multiprocessor  相似文献   

3.
D.A.  P.D. 《Performance Evaluation》2005,60(1-4):165-187
We present a new performance modeling system for message-passing parallel programs that is based around a Performance Evaluating Virtual Parallel Machine (PEVPM). We explain how to develop PEVPM models for message-passing programs using a performance directive language that describes a program’s serial segments of computation and message-passing events. This is a novel bottom-up approach to performance modeling, which aims to accurately model when processing and message-passing occur during program execution. The times at which these events occur are dynamic, because they are affected by network contention and data dependencies, so we use a virtual machine to simulate program execution. This simulation is done by executing models of the PEVPM performance directives rather than executing the code itself, so it is very fast. The simulation is still very accurate because enough information is stored by the PEVPM to dynamically create detailed models of processing and communication events. Another novel feature of our approach is that the communication times are sampled from probability distributions that describe the performance variability exhibited by communication subject to contention. These performance distributions can be empirically measured using a highly accurate message-passing benchmark that we have developed. This approach provides a Monte Carlo analysis that can give very accurate results for the average and the variance (or even the probability distribution) of program execution time. In this paper, we introduce the ideas underpinning the PEVPM technique, describe the syntax of the performance modeling language and the virtual machine that supports it, and present some results, for example, parallel programs to show the power and accuracy of the methodology.  相似文献   

4.
Novel approaches are presented for designing performance measurement systems for parallel and distributed programs. The first approach involves unifying performance information into a single, regular structure that reflects the structure of programs under measurement. The authors define a hierarchical model for the execution of parallel and distributed programs as a framework for the performance measurement. A complete different levels of detail in the hierarchy. The second approach is based on the development of automatic guidance techniques that can direct users to the location of performance problems in the program. Guidance information from such techniques supplies facts about problems in the program and provides possible answers for the further improvement of program efficiency. A performance measurement system, called IPS, has been developed as a prototype of the authors' model and design. Some of the test results from IPS are also discussed  相似文献   

5.
Observing the activities of a complex parallel computer system is no small feat, and relating these observations to program behavior is even harder. In this paper, we present a general measurement approach that is applicable to a large class of scalable programs and machines, specifically SPMD and data-parallel programs executing on distributed memory computer systems. The combined instrumentation and visualization paradigm, called VISTA, is based on our experiences in programming and monitoring applications running on an nCUBE 2 computer and a MasPar MP-1 computer. The key is that performance data are treated similarly to any distributed data in the context of the programming models and presented via a hierarchy of multiple views. Because of the data-parallel mapping of program onto machine, we can view the performance as it relates to each processor, processor cluster, or the processor ensemble and as it relates to the data structures of the program. We illustrate the utility of VISTA by example.  相似文献   

6.
Parallel languages allow the programmer to express parallelism at a high level. The management of parallelism and the generation of interprocessor communication is left to the compiler and the runtime system. This approach to parallel programming is particularly attractive if a suitable widely accepted parallel language is available. High Performance Fortran (HPF) has emerged as the first popular machine independent parallel language, and remarkable progress has been made towards compiling HPF efficiently. However, the performance of HPF programs is often poor and unpredictable, and obtaining adequate performance is a major stumbling block that must be overcome if HPF is to gain widespread acceptance. The programmer is often in the dark about how to improve the performance of an HPF program since poor performance can be attributed to a variety of reasons, including poor choice of algorithm, limited use of parallelism, or an inefficient data mapping. This paper presents a profiling tool that allows the programmer to identify the regions of the program that execute inefficiently, and to focus on the potential causes of poor performance. The central idea is to distinguish the code that is executing efficiently from the code that is executing poorly. Efficient code uses all processors of a parallel system to make progress, while inefficient code causes processors to wait, execute replicated code, idle, communicate, or perform compiler bookkeeping. We designate the latter code as non-scalable, since adding more processors generally does not lead to improved performance for such code. By analogy, the former code is called scalable. The tool presented here separates a program into scalable and non-scalable components and identifies the causes of non-scalability of different components. We show that compiler information is the key to dividing the execution times into logical categories that are meaningful to the programmer. We present the design and implementation of a profiler that is integrated with Fx, a compiler for a variant of HPF. The paper includes two examples that demonstrate how the data reported by the profiler are used to identify and resolve performance bugs in parallel programs. © 1997 John Wiley & Sons, Ltd.  相似文献   

7.
With the variety of computer architectures available today, it is often difficult to determine which particular type of architecture will provide the best performance on a given application program. In fact, one type of architecture may be well suited to executing one section of a program while another architecture may be better suited to executing another section of the same program. One potentially promising approach for exploiting the best features of different computer architectures is to partition an application program to simultaneously execute on two or more types of machines interconnected with a high-speed communication network. A fundamental difficulty with this heterogeneous computing, however, is the problem of determining how to partition the application program across the interconnected machines. The goal of this paper is to show how a programmer or a compiler can use a model of a heterogeneous system to determine the machine on which each subtask should be executed. This technique is illustrated with a simple model that relates the relative performance of two heterogeneous machines to the communication time required to transfer partial results across their interconnection network. Experiments with a Connection Machine CM-200 demonstrate how to apply this model to partition two different application programs across the sequential front-end processor and the parallel back-end array.  相似文献   

8.
A new technique for estimating and understanding the speed improvement that can result from executing a program on a parallel computer is described. The technique requires no additional programming and minimal effort by a program's author. The analysis begins by tracing a sequential program. A parallelism analyzer uses information from the trace to simulate parallel execution of the program. In addition to predicting parallel performance, the parallelism analyzer measures many aspects of a program's dynamic behavior. Measurements of six substantial programs are presented. These results indicate that the three symbolic programs differ substantially from the numeric programs and, as a consequence, cannot be automatically parallelized with the same compilation techniques  相似文献   

9.
Writing and debugging distributed programs can be difficult. When a program is working, it can be difficult to achieve reasonable execution performance. A major cause of these difficulties is a lack of tools for the programmer. We use a model of distributed computation and measurement to implement a program monitoring system for programs running on the Berkeley UNIX 4.2BSD operating system. The model of distributed computation describes the activities of the processes within a distributed program in terms of computation (internal events) and communication (external events). The measurement model focuses on external events and separates the detection of external events, event record selection and data analysis. The implementation of the measurement tools involved changes to the Berkeley UNIX kernel, and the addition of daemon processes to allow the monitoring activity to take place across machine boundaries. A user interface has also been implemented.  相似文献   

10.
We describe a system that simplifies the process of debugging programs produced by computer-aided parallelization tools. The system uses relative debugging techniques to compare serial and parallel executions in order to show where the computations begin to differ. If the original serial code is correct, errors due to parallelization will be isolated by the comparison.One of the primary goals of the system is to minimize the effort required of the user. To that end, the debugging system uses information produced by the parallelization tool to drive the comparison process. In particular, the debugging system relies on the parallelization tool to provide information about where variables may have been modified and how arrays are distributed across multiple processes. User effort is also reduced through the use of dynamic instrumentation. This allows us to modify the program execution without changing the way the user builds the executable. The use of dynamic instrumentation also permits us to compare the executions in a fine-grained fashion and only involve the debugger when a difference has been detected. This reduces the overhead of executing instrumentation.  相似文献   

11.
Advances in high performance computing, communications and user interfaces enable developers to construct increasingly interactive high performance applications. The Falcon system presented in this paper supports such interactivity by providing runtime libraries, tools and user interfaces that permit the on-line monitoring and steering of large-scale parallel codes. The principal aspects of Falcon described in this paper are its abstractions and tools for capture and analysis of application-specific program information, performed on-line, with controlled latencies and scalable to parallel machines of substantial size. In addition, Falcon provides support for the on-line graphical display of monitoring information, and it allows programs to be steered during their execution, by human users or algorithmically. This paper presents our basic research motivation, outlines the Falcon system's functionality, and includes a detailed evaluation of its performance characteristics in light of its principal contributions. Falcon's functionality and performance evaluation are driven by our experiences with large-scale parallel applications being developed with end users in physics and in atmospheric sciences. The sample application highlighted in this paper is a molecular dynamics simulation program (MD) used by physicists to study the statistical mechanics of liquids. © 1998 John Wiley & Sons, Ltd.  相似文献   

12.
张斌 《微机发展》1998,8(2):10-13
分析讨论了在分布式系统环境下分布式程序的功能行为和性能方法 ;指出事件驱动监视是一种非常适于分析并行和分布式系统上程序的技术。通过分析 Transputer多处理机上运行的并行光线跟踪程序 ,说明了性能监视所起的巨大作用。  相似文献   

13.
Spirn  J. 《Computer》1976,9(11):14-20
To analyze or simulate an operating/computer system, one must construct a model of the programs executing within the system. Many recent analyses, particularly queueing models, have used mathematically convenient program models. For example, in one popular model the times between page faults are assumed to be exponentially distributed and independent. However, such a program model is inaccurate, which may cause the analysis to be unrepresentative of the real world. Simulation models of systems, on the other hand, have relied largely on traces of actual programs. Such traces are undoubtedly more accurate than simple mathematical models, but they have several drawbacks. They are expensive to generate, they may not be truly representative of typical programs, and they may contain more detail than is necessary for accurate system modeling. Moreover, it is difficult to extrapolate their behavior to other similar programs.  相似文献   

14.
Data flow analysis has been used by compilers in diverse contexts, from optimization to register allocation. Traditional analysis of sequential programs has centered on scalar variables. More recently, several researchers have investigated analysis of array sections for optimizations on modern architectures. This information has been used to distribute data, optimize data movement and vectorize or parallelize programs. As multiprocessors become more common-place. we believe there will be considerable interest in explicitly parallel programming languages. In this paper, we extend traditional analysis to array section analysis for parallel languages which include additional control and synchronization structures.

We show how to compute array section data flow information, i.e., Communication Sets, for a class of parallel programs. To illustrate its use, we show how this information can be applied in compile-time program partitioning. Information about array accesses can also be used to improve estimates of program execution time used to direct runtime thread scheduling.  相似文献   

15.
Programmers have always been curious about what their programs are doing while it is executing, especially when the behavior is not what they are expecting. Since program execution is intricate and involved, visualization has long been used to provide the programmer with appropriate insights into program execution. This paper looks at the evolution of on-line visual representations of executing programs, showing how they have moved from concrete representations of relatively small programs to abstract representations of larger systems. Based on this examination, we describe the challenges implicit in future execution visualizations and methodologies that can meet these challenges.  相似文献   

16.
The data parallel meta language (DPML) and its associated Fortran source code rewriter (DP77) support architecture independent, high performance climate and weather prediction models. The language allows the data domains over which a program operates, the communication patterns required between elements of those data domains, and some or all of the calculations of a program to be expressed at a very high level. DPML uses explicit data parallelism to express the inherent parallelism of the models, with the result that programs are easily compilable into target machine code. DP77 uses information from the DPML program to translate Fortran routines into the host specific Fortran form required for their parallel execution within the model. This paper describes the general strategy behind the development of DPML, discusses its language features using examples drawn from climate modelling, and provides details of the mechanism it uses for incorporating Fortran into data parallel programs. Encouraging results are reported for DPML versions of the standard weather benchmark models executing on vector, SIMD, and MIMD (shared memory) machines. While the paper is set within the framework of climate modelling, the technique has obvious wider implications.  相似文献   

17.
We introduce a middleware infrastructure that provides software services for developing and deploying high-performance parallel programming models and distributed applications on clusters and networked heterogeneous systems. This middleware infrastructure utilizes distributed agents residing on the participating machines and communicating with one another to perform the required functions. An intensive study of the parallel programming models in Java has helped identify the common requirements for a runtime support environment, which we used to define the middleware functionality. A Java-based prototype, based on this architecture, has been developed along with a Java object-passing interface (JOPI) class library. Since this system is written completely in Java, it is portable and allows executing programs in parallel across multiple heterogeneous platforms. With the middleware infrastructure, users need not deal with the mechanisms of deploying and loading user classes on the heterogeneous system. Moreover, details of scheduling, controlling, monitoring, and executing user jobs are hidden, while the management of system resources is made transparent to the user. Such uniform services are essential for facilitating the development and deployment of scalable high-performance Java applications on clusters and heterogeneous systems. An initial deployment of a parallel Java programming model over a heterogeneous, distributed system shows good performance results. In addition, a framework for the agents' startup mechanism and organization is introduced to provide scalable deployment and communication among the agents.  相似文献   

18.
一种基于GIS的网络层次化地图模型及实现算法   总被引:3,自引:0,他引:3  
王恺  杨峰  毕经平 《计算机工程》2005,31(6):12-15,86
大型计算机网络地域分布广,网元数目众多,传统网络管理系统以网络拓扑这种虚拟空间的方式实现网络的监控管理,没有充分利用网元的地理位置信息,网络监控和管理存在不便之处.地理信息系统(GIS)的应用能够赋予网络监控管理系统清晰直观、易于监控和管理的特性.通过建立一种基于GIS的网络层次化地图模型,系统解决了网络拓扑与GIS地图有效结合这一问题,实现了在GIS地图中网络拓扑与运行状态信息的层次化管理,给出了网络GIS地图的生成、维护的非递归算法.模型和算法的有效性和完备性在大型网络性能监测与分析系统NIPMAS中得到了实际验证.  相似文献   

19.
《Parallel Computing》1988,7(1):11-24
We present a single-program-multiple-data computational model which we have implemented in the EPEX system to run in parallel mode FORTRAN scientific application programs. The computational model assumes a shared memory organization and is based on the scheme that all processes executing a program in parallel remain in existence for the entire execution; however, the tasks to be executed by each process are determined dynamically during execution by the use of appropriate synchronizing constructs that are imbedded in the program. We have demonstrated the applicability of the model in the parallelization of several applications. We discuss parallelization features of these applications and performance issues such as overhead, speedup, efficiency.  相似文献   

20.
This paper describes how Crystal, a language based on familiar mathematical notation and lambda calculus, addresses the issues of programmability and performance for parallel supercomputers. Some scientifc programmers and theoreticians may ask, “What is new about Crystal?” or “How is it different from existing functional languages?” The answers lie in its model of parallel computation and a theory of parallel program optimization, and we examine this in the text to follow. We illustrate the power of our approach with benchmarks of compiled parallel code from Crystal source. The target machines are hypercube multiprocessors with distributed memory, on which it is considered difficult for functional programs to achieve high efficiency.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号