Similar literature
 19 similar documents found
1.
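A PVM-based method for developing network parallel programs in a microcomputer environment   Total citations: 1, self-citations: 0, citations by others: 1
Parallel Virtual Machine (PVM) is a general-purpose environment for developing network parallel programs. It lets networked supercomputers, massively parallel machines, workstations, and microcomputers be used as one large parallel machine for developing parallel algorithms or running parallel systems. This paper introduces the basics of PVM and its latest developments, discusses a PVM-based method for developing network parallel programs, and closes with a concrete example.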

2.
张延园  刘敏 《微机发展》1997,7(5):17-19
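During parallel program development, factors such as load imbalance, excessive communication overhead, and synchronization waits often degrade system performance. To overcome these problems, timely performance analysis of the parallel program is essential. Although [1], [2], and [3] studied performance analysis of parallel programs, none of them achieved a global analysis of a parallel program. Starting from an analysis of the run-time states of parallel programs, the authors designed and developed a performance analysis system. It automatically extracts real data describing the program's execution, derives the program's performance metrics from these data, and presents the causes of degraded run-time performance in intuitive graphical form.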

3.
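A dynamic load balancing algorithm for networks of workstations   Total citations: 3, self-citations: 0, citations by others: 3
Most problems in mathematics and scientific computing can exploit their parallelism through data-parallel programs. In a network-of-workstations environment, however, load fluctuates widely, so load balancing is an important factor in efficiency. This paper proposes a dynamic load balancing algorithm that lets a data-parallel program adjust its load dynamically at run time, and presents experimental results for the algorithm.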
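The abstract does not say how the load is adjusted, so the following is only a minimal sketch of one simple policy for a data-parallel program: redistribute the data items in proportion to each workstation's recently measured processing rate. The function name `rebalance` and the rate values are hypothetical and are not taken from the paper.

```python
def rebalance(total_items, rates):
    """Split total_items among workers in proportion to their measured rates.

    rates[i] is a hypothetical measurement, e.g. items processed per second
    by worker i during the previous iteration. Returns one item count per
    worker; the counts always sum to total_items.
    """
    if total_items < 0:
        raise ValueError("total_items must be non-negative")
    if not rates or any(r <= 0 for r in rates):
        raise ValueError("rates must be a non-empty list of positive numbers")

    total_rate = sum(rates)
    exact = [total_items * r / total_rate for r in rates]
    shares = [int(x) for x in exact]  # round every share down first

    # Hand the leftover items to the workers with the largest fractional parts.
    leftover = total_items - sum(shares)
    by_fraction = sorted(range(len(rates)), key=lambda i: shares[i] - exact[i])
    for i in by_fraction[:leftover]:
        shares[i] += 1
    return shares


if __name__ == "__main__":
    # A worker that is twice as fast receives twice as many items.
    print(rebalance(100, [1.0, 1.0, 2.0]))
```

A program would call such a function between iterations, after timing each worker, and move only the difference between the old and new shares.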

4.
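To make Petri net techniques applicable to verifying the correctness and performance of MPI parallel programs, this paper proposes an algorithm that builds the Petri net of an MPI parallel program using the shared composition operation on Petri nets. It analyzes the structure and message-passing process of MPI parallel programs on distributed parallel processing systems and gives Petri nets for the basic statements and message-passing functions of a parallel program. It then generalizes shared composition from two Petri nets to the multiple Petri nets of a parallel program, and states and proves the generalization theorem. An algorithm for constructing the Petri net of an MPI parallel program by shared composition is presented, together with an application example on a message-passing parallel system. Experimental results show that shared composition is an effective way to build Petri net models of MPI parallel programs.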

5.
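A parallel cluster is an integrated hardware and software environment for developing and running parallel programs. To improve the system's openness, compatibility, and portability, its communication layer needs an interface to TCP/IP networks. This paper discusses how to implement that interface, moving from the problem statement through analysis to the solution approach. The discussion concentrates on the network interface layer of the TCP/IP layered protocol model. The paper is a useful reference not only for interfacing parallel clusters with TCP/IP networks but also for interfacing other special-purpose multi-machine systems with general-purpose networks.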

6.
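TDCE: a distributed parallel computing system based on Tspaces   Total citations: 1, self-citations: 0, citations by others: 1
Tspaces is a new kind of network middleware. It gives processes in a networked environment a powerful shared-storage mechanism for handling communication and synchronization among them. On top of Tspaces, the authors built TDCE, a parallel computing system for cluster environments. TDCE supports parallel programs in the SPMD model. Experimental results show that TDCE can build a distributed computing platform with little system configuration and management overhead and that it provides effective support for developing and running parallel programs. Comparative test results for the system and MPI are presented and analyzed.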
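To make the idea of a shared space used for both communication and synchronization concrete, here is a minimal in-process sketch of a tuple space. It is not the Tspaces API: the class `TupleSpace`, its method names `write`, `read`, and `take`, and the convention that `None` in a template matches any value are all assumptions made for illustration, and threads stand in for networked processes.

```python
import threading


class TupleSpace:
    """A tiny in-memory tuple space shared by threads (illustrative only)."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def write(self, tup):
        """Add a tuple and wake any waiting readers or takers."""
        with self._cond:
            self._tuples.append(tuple(tup))
            self._cond.notify_all()

    @staticmethod
    def _matches(template, tup):
        # None in the template acts as a wildcard field.
        return len(template) == len(tup) and all(
            t is None or t == v for t, v in zip(template, tup)
        )

    def _find(self, template):
        for tup in self._tuples:
            if self._matches(template, tup):
                return tup
        return None

    def read(self, template, timeout=None):
        """Block until a matching tuple exists; return it without removing it."""
        with self._cond:
            found = self._cond.wait_for(
                lambda: self._find(template) is not None, timeout
            )
            return self._find(template) if found else None

    def take(self, template, timeout=None):
        """Block until a matching tuple exists; remove and return it."""
        with self._cond:
            found = self._cond.wait_for(
                lambda: self._find(template) is not None, timeout
            )
            if not found:
                return None
            tup = self._find(template)
            self._tuples.remove(tup)
            return tup


def square_worker(space):
    """Take one task tuple, compute its square, and write back a result tuple."""
    _, n = space.take(("task", None))
    space.write(("result", n, n * n))


if __name__ == "__main__":
    space = TupleSpace()
    worker = threading.Thread(target=square_worker, args=(space,))
    worker.start()
    space.write(("task", 7))
    print(space.take(("result", 7, None)))
    worker.join()
```

Because `take` blocks until a matching tuple appears, the same operation serves as both message delivery and synchronization, which is the property the abstract attributes to the shared space.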

7.
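Research on an MPI-based dynamic load balancing algorithm   Total citations: 1, self-citations: 1, citations by others: 0
MPI is currently the most important parallel programming tool on cluster systems; it uses message passing for communication between parallel programs. Achieving load balance matters greatly in MPI parallel program design because it reduces run time and improves the performance of MPI parallel programs. To address dynamic load balancing on homogeneous clusters, the paper proposes a method, implemented within MPI parallel programs, that migrates tasks between nodes according to each node's load.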

8.
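In multi-core processor chips, distributed shared memory (DSM) provides a unified, globally addressed memory space but introduces the overhead of translating virtual addresses to physical addresses, which hurts performance. We observe that the attribute (private or shared) of the data being processed does not stay fixed while a parallel program executes. Different data in a parallel program have different attributes, and even the same data may have different attributes in different execution phases. This paper first describes in detail a hybrid distributed shared memory space that supports globally addressed virtual-address access for shared data and fast physical-address access for private data. It then proposes a runtime partitioning technique for this hybrid space. The technique adjusts and partitions the distributed shared memory space at run time according to the attributes of the data in the parallel program. When data are private, physical-address access speeds up data access; when data are shared, virtual-address access is retained. This reduces address-translation overhead over the whole run of the parallel program and improves system performance. Experiments with real applications show that the runtime-partitioned hybrid space outperforms a conventional distributed shared memory space, and that the size of the improvement depends on the network size, the computation size, the mapping of the parallel program, and similar factors. In our experiments, the performance improvement ranged from 6.98% to 13.14%.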

9.
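I. Writing multi-user shared programs. Once a Windows NT network environment has been set up, it gives users the external environment for multi-user work under Windows NT networking. To work in that environment, users still need to set up multi-user shared programs, database files, and various application software. Running several user programs on the network is in effect parallel programming: several programs can run on the network system at the same time, which can speed up program execution. 1. File opening modes. When several users are running programs, a database file can be opened in one of two modes: exclusive or shared. A database file opened in exclusive mode can be used only by the program (user) that opened it; until it is closed, other programs (users) cannot access it. Shared mode…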

10.
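Current status and development trends of the message-passing interfaces PVM and MPI   Total citations: 11, self-citations: 0, citations by others: 11
PVM and MPI are currently the two most influential message-passing parallel computing environments internationally; both run on MPPs and on networks of workstations. Because their design backgrounds and emphases differ, the two interfaces share common ground yet each has its own characteristics. The paper describes and compares them in terms of performance and development trends, as a reference for parallel program developers choosing a parallel computing environment.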

11.
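Because parallel applications often run at low efficiency, helping programmers improve performance has become an important problem in high-performance computing. This paper presents an MPI-based performance evaluation tool. While the application runs, the tool collects system load information, traces program flow, groups processors according to the available hardware resources, and displays the load information and program flow together in graphical form. Programmers can use it to monitor a running parallel application, analyze the relationship between the algorithm's execution and system load, locate performance bottlenecks, uncover the application's potential, and ultimately improve its performance.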

12.
Parallel processing systems using networks of workstations are being used to provide an alternative to expensive parallel processors. Scheduling of tasks on these networks is an important and practical problem that must be addressed. Although CPU load is an important parameter to many of the proposed scheduling schemes, no quantitative analysis of CPU load and its precise relation to the run time of application programs has to date been presented. The work in this paper describes the experimental analysis of one common load measure, the UNIX load average, and its relationship to the run time of computation-bound parallel programs. Data was gathered using a test application program designed to mimic common applications, performing long bursts of computation with occasional interprocess data exchange over the network. The resulting execution times and measured load averages were then analyzed using regression analysis to detect load-run time trends. This paper describes the test program and the experiments, then details the results of the data analysis. A technique is then presented for the evaluation of the load-run time relationship for a computation-bound program on a network of workstations.
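As a rough illustration of the kind of regression analysis described, the sketch below fits a straight line, run time = slope × load average + intercept, by ordinary least squares. The linear form is an assumption, since the abstract does not state which model was fitted, and the sample numbers are synthetic values invented for the example, not measurements from the paper.

```python
def fit_line(loads, run_times):
    """Ordinary least-squares fit of run_time = slope * load + intercept."""
    n = len(loads)
    if n != len(run_times) or n < 2:
        raise ValueError("need at least two (load, run_time) pairs")

    mean_x = sum(loads) / n
    mean_y = sum(run_times) / n
    sxx = sum((x - mean_x) ** 2 for x in loads)
    if sxx == 0:
        raise ValueError("all load values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(loads, run_times))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Synthetic data: load averages and run times in seconds (made up).
    loads = [0.0, 1.0, 2.0, 3.0]
    run_times = [10.0, 20.0, 30.0, 40.0]
    slope, intercept = fit_line(loads, run_times)
    print(f"run_time ~ {slope:.2f} * load + {intercept:.2f}")
```

On a UNIX system the load average itself can be sampled with `os.getloadavg()`, which returns the 1-, 5-, and 15-minute averages.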

13.
Exploiting cache locality of parallel programs at runtime is a complementary approach to compiler optimization. This is particularly important for applications with dynamic memory access patterns. We propose a memory-layout oriented technique to exploit cache locality of parallel loops at runtime on Symmetric Multiprocessor (SMP) systems. Guided by application-dependent and targeted architecture-dependent hints, our system, called Cacheminer, reorganizes and partitions a parallel loop using the memory-access space of its execution. Through effective runtime transformations, our system maximizes the data reuse in each partitioned data region assigned to a cache, and minimizes the data sharing among the partitioned data regions assigned to all caches. The executions of tasks in the partitions are scheduled in an adaptive and locality-preserving way to minimize the execution time of programs by trading off load balance and locality. We have implemented the Cacheminer runtime library on two commercial SMP servers and a SimCS simulated SMP. Our simulation and measurement results show that our runtime approach can achieve performance comparable to compiler optimizations for programs with regular computation and memory-access patterns, whose load balance and cache locality can be well optimized by tiling and other program transformations. For applications with irregular computation and dynamic memory access patterns, however, our experimental results show that our approach significantly improves memory performance. These types of programs are usually hard to optimize with static compiler optimizations.

14.
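High-performance cluster computing is attracting growing attention. A cluster is typically a group of heterogeneous computer systems connected by a network. A key problem in cluster operation is keeping the load balanced. Most current load balancing systems support only homogeneous cluster environments and balance at the granularity of whole jobs, which is too coarse, so they are poorly suited to balancing the parallel tasks inside a parallel program. This paper proposes a development framework for parallel programs that uses mobile agent technology to provide dynamic task migration and gives programmers a simple development interface that greatly simplifies their work. The system was developed with Java and the Aglet platform. Experiments show that the system is flexible and effective.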

15.
A load balancing framework for adaptive and asynchronous applications   Total citations: 1, self-citations: 0, citations by others: 1
We describe the design of a flexible load balancing framework and runtime software system for supporting the development of adaptive applications on distributed-memory parallel computers. The runtime system supports a global namespace, transparent object migration, automatic message forwarding and routing, and automatic load balancing. These features can be used at the discretion of the application developer in order to simplify program development and to eliminate complex bookkeeping associated with mobile data objects. An evaluation of this system in the context of a three-dimensional tetrahedral advancing front parallel mesh generator shows that overall runtime improvements of 15 percent compared to common stop-and-repartition load balancing methods, 30 percent compared to explicit intrusive load balancing methods, and 42 percent compared to no load balancing are possible on large processor configurations. At the same time, the overheads attributable to the runtime system are a fraction of 1 percent of the total runtime. The parallel advancing front method is a coarse-grained and highly adaptive application and therefore exercises all of the features of the runtime system.

16.
In this article, we study the effects of network topology and load balancing on the performance of a new parallel algorithm for solving triangular systems of linear equations on distributed-memory message-passing multiprocessors. The proposed algorithm employs novel runtime data mapping and workload redistribution methods on a communication network which is configured as a toroidal mesh. A fully parameterized theoretical model is used to predict communication behaviors of the proposed algorithm relevant to load balancing, and the analytical performance results correctly determine the optimal dimensions of the toroidal mesh, which vary with the problem size, the number of available processors, and the hardware parameters of the machine. Further enhancement to the proposed algorithm is then achieved through redistributing the arithmetic workload at runtime. Our FORTRAN implementation of the proposed algorithm as well as its enhanced version has been tested on an Intel iPSC/2 hypercube, and the same code is also suitable for executing the algorithm on the iPSC/860 hypercube and the Intel Paragon mesh multiprocessor. The actual timing results support our theoretical findings, and they both confirm the very significant impact a network topology chosen at runtime can have on the computational load distribution, the communication behaviors and the overall performance of parallel algorithms.

17.
In this paper we propose a methodology for developing system-wide energy consumption models for servers, based on the analysis of performance counters. It makes it possible to estimate the power usage of a machine under any load at runtime. By clustering applications we extract groups of programs with similar characteristics, which allows us to create more specialized and accurate power usage models. Decision trees make it possible to automatically select the appropriate model for the current system load. Training and test sets of programs were used to test the estimates. The presented models are accurate within an error of 4%, as verified on servers from different vendors, including the latest pre-production one.

18.
Using runtime information of load distributions and processor affinity, the authors propose an adaptive scheduling algorithm and its variations from different control mechanisms. The proposed algorithm applies different degrees of aggressiveness to adjust loop scheduling granularities, aiming at improving the execution performance of parallel loops by making scheduling decisions that match the real workload distributions at runtime. They experimentally compared the performance of the algorithm and its variations with several existing scheduling algorithms on two parallel machines: the KSR-1 and the Convex Exemplar. The kernel application programs used for performance evaluation were carefully selected for different classes of parallel loops. The results show that using runtime information to adaptively adjust scheduling granularity is an effective way to handle loops with a wide range of load distributions when no prior knowledge of the execution can be used. The overhead caused by collecting runtime information is insignificant in comparison with the performance improvement. The experiments show that the adaptive algorithm and its five variations outperformed the existing scheduling algorithms.
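The abstract does not give the adaptive algorithm itself. To show what "scheduling granularity" means for a parallel loop, the sketch below computes the chunk sizes produced by guided self-scheduling, a well-known existing scheme in which each idle worker takes the ceiling of remaining iterations divided by the number of workers. It is offered as an example of the kind of scheme such work is compared against, not as the authors' method; the function name `guided_chunks` is hypothetical.

```python
def guided_chunks(n_iterations, n_workers):
    """Return the successive chunk sizes handed out by guided self-scheduling.

    Each request receives ceil(remaining / n_workers) iterations, so chunks
    start large (low scheduling overhead) and shrink toward 1 (better balance).
    """
    if n_iterations < 0 or n_workers < 1:
        raise ValueError("need n_iterations >= 0 and n_workers >= 1")

    chunks = []
    remaining = n_iterations
    while remaining > 0:
        size = -(-remaining // n_workers)  # integer ceiling division
        chunks.append(size)
        remaining -= size
    return chunks


if __name__ == "__main__":
    print(guided_chunks(100, 4))
```

An adaptive scheduler in the spirit of the abstract would additionally change how aggressively the chunk size shrinks, using load information measured while the loop runs.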

19.
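Practical methods for measuring the performance of PC clusters and their parallel programs   Total citations: 4, self-citations: 0, citations by others: 4
Advances in microprocessors, network technology, and parallel programming environments have created opportunities for developing and applying cluster systems, especially PC clusters, which suit conditions in China. The low cost of PC clusters comes at the price of demanding parallel programming, and user applications often run far slower than hoped. It is therefore crucial to measure an application's effective speed at the user level and to provide information for improving the program design and raising efficiency. This paper analyzes several ways of using the embedded MPE facility to measure parallel program performance in an MPI programming environment.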
