Similar literature
 Found 20 similar documents; search took 31 ms
1.
Features of an explicitly parallel programming language targeted for reconfigurable parallel processing systems, where the machine's N processing elements (PEs) are capable of operating in both the SIMD and SPMD modes of parallelism, are described. The SPMD (single program-multiple data) mode of parallelism is a subset of the MIMD mode where all processors execute the same program. By providing all aspects of the language with an SIMD mode version and an SPMD mode version that are syntactically and semantically equivalent, the language facilitates experimentation with and exploitation of hybrid SIMD/SPMD machines. Language constructs (and their implementations) for data management, data-dependent control-flow, and PE-address-dependent control-flow are presented. These constructs are based on experience gained from programming a parallel machine prototype and are being incorporated into a compiler under development. Much of the research presented is applicable to general SIMD machines and MIMD machines.

2.
Obtaining efficient execution of parallel programs in workstation networks is a difficult problem for the user. Unlike dedicated parallel computer resources, network resources are shared, heterogeneous, vary in availability, and offer communication performance that is still an order of magnitude slower than parallel computer interconnection networks. Prophet, a system that automatically schedules data parallel SPMD programs in workstation networks for the user, has been developed. Prophet uses application and resource information to select the appropriate type and number of workstations, divide the application into component tasks and data across these workstations, and assign tasks to workstations. This system has been integrated into the Mentat parallel processing system developed at the University of Virginia. A suite of scientific Mentat applications has been scheduled using Prophet on a heterogeneous workstation network. The results are promising and demonstrate that scheduling SPMD applications can be automated with good performance. Copyright © 1999 John Wiley & Sons, Ltd.

3.
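A task scheduling algorithm for structured parallel control mechanisms   Total citations: 4, self-citations: 0, citations by others: 4
Reducing program execution time is the primary goal of parallel processing, and an effective task allocation algorithm is the key to achieving it, especially on cluster systems. This paper studies task scheduling for structured parallel control mechanisms on cluster systems and, based on the principles of greedy algorithms, granularity control, and feedback-based dispatching, proposes a near-optimal task scheduling algorithm, SSA (sub-optimal scheduling algorithm). Experimental results show that, in a cluster environment, the parallel computing performance of this algorithm improves on that of the other algorithms compared.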

4.
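A practical parallel computation model   Total citations: 11, self-citations: 0, citations by others: 11
For the currently popular workstation cluster environments and various parallel machine systems, this paper proposes a practical parallel computation model: NHBL (Nondedicated Heterogeneous Barrier LogGP), a nondedicated, heterogeneous, synchronous model based on LogGP. It aims to reflect how heterogeneous and nondedicated NOW computing environments affect the design and analysis of parallel algorithms. The NHBL model is then used to analyze the cost of the PSRS algorithm on NHPCC-Cluster, the workstation cluster of the National High Performance Computing Center (Hefei), and on the Dawning-1000 (曙光-1000) MPP, and the analysis is validated against measured results.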
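The abstract above does not give NHBL's formulas, so the following is only a minimal sketch of the underlying LogGP point-to-point cost that NHBL builds on, not of NHBL itself. It assumes the standard LogGP parameters (L, o, g, G, P); the class name `LogGP`, the function `message_time`, and any parameter values are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogGP:
    """Standard LogGP parameters (illustrative container, not NHBL)."""
    L: float  # network latency for a short message
    o: float  # CPU overhead to send or to receive one message
    g: float  # minimum gap between consecutive short messages
    G: float  # gap per byte for long messages
    P: int    # number of processors


def message_time(m: LogGP, k: int) -> float:
    """End-to-end time for one k-byte message under plain LogGP.

    Send overhead, then (k - 1) per-byte gaps, then the network
    latency, then the receive overhead: o + (k - 1) * G + L + o.
    """
    if k < 1:
        raise ValueError("message size must be at least 1 byte")
    return m.o + (k - 1) * m.G + m.L + m.o
```

A heterogeneous, nondedicated model would presumably make these parameters depend on the node and its current load; that extension is not attempted here.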

5.
Coordinating Parallel Processes on Networks of Workstations   Total citations: 1, self-citations: 0, citations by others: 1
The network of workstations (NOW) we consider for scheduling is heterogeneous and nondedicated, where computing power varies among the workstations and local and parallel jobs may interact with each other in execution. An effective NOW scheduling scheme needs sufficient information about system heterogeneity and job interactions. We use the measured power weight of each workstation to quantify the differences of computing capability in the system. Without a processing power usage agreement between parallel jobs and local user jobs in a workstation, job interactions are unpredictable, and performance of either type of jobs may not be guaranteed. Using the quantified and deterministic system information, we design a scheduling scheme called self-coordinated local scheduling on a heterogeneous NOW. Based on a power usage agreement between local and parallel jobs, this scheme coordinates parallel processes independently in each workstation based on the coscheduling principle. We discuss its implementation on Unix System V Release 4 (SVR4). Our simulation results on a heterogeneous NOW show the effectiveness of the self-coordinated local scheduling scheme.
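The abstract above mentions a measured power weight per workstation without defining it. A minimal sketch follows, assuming the power weight is a workstation's measured speed on a given application divided by the speed of the fastest workstation; this definition, the function name `power_weights`, and the host names in the usage note are assumptions, not taken from the abstract.

```python
def power_weights(speeds: dict[str, float]) -> dict[str, float]:
    """Normalise measured speeds so the fastest workstation has weight 1.0.

    `speeds` maps a workstation name to a measured speed for one
    application (for example, work units completed per second).
    """
    if not speeds:
        raise ValueError("at least one workstation is required")
    if min(speeds.values()) <= 0:
        raise ValueError("measured speeds must be positive")
    fastest = max(speeds.values())
    return {name: speed / fastest for name, speed in speeds.items()}
```

For example, `power_weights({"ws1": 100.0, "ws2": 50.0})` gives `ws1` a weight of 1.0 and `ws2` a weight of 0.5.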

6.
In recent years, networks of workstations/PCs (so-called NOWs) have become appealing vehicles for cost-effective parallel computing. Due to the commodity nature of workstations and networking equipment, LAN environments are gradually becoming heterogeneous. The diverse sources of heterogeneity in NOW systems pose a challenge to the design of efficient communication algorithms for this class of systems. In this paper, we propose efficient algorithms for multiple multicast on heterogeneous NOW systems, focusing on heterogeneity in processing speeds of workstations/PCs. Multiple multicast is an important operation in many scientific and industrial applications. Multicast on heterogeneous systems has not been investigated until recently. Our work distinguishes itself from others in two aspects: (1) In contrast to the blocking communication model used in prior works, we model communication in a heterogeneous cluster more accurately by a non-blocking communication model, and design multicast algorithms that can fully take advantage of non-blocking communication. (2) While prior works focus on the single multicast problem, we propose efficient algorithms for general, multiple multicast (of which single multicast is a special case) on heterogeneous NOW systems. To our knowledge, our work is the earliest effort that addresses multiple multicast for heterogeneous NOW systems. These algorithms are evaluated using a network simulator for heterogeneous NOW systems. Our experimental results on a system of up to 64 nodes show that some of the algorithms outperform others in many cases. The best algorithm achieves a completion time that is within 2.5 times the lower bound.

7.
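Zhang Yan, Sun Shixin. Computer Applications (计算机应用), 2000, 20(10): 29-32
With the emergence of high-speed network technologies such as ATM, networks of workstations (NOW) have become a major platform for parallel processing. Because of their high communication latency, some fine-grained parallel algorithms implemented on parallel machines are no longer suitable for this environment. It is therefore necessary to repartition the tasks of such algorithms and study their parallel implementation in a network environment. On this basis, this paper proposes a new task partitioning strategy for the QR decomposition of matrices and derives a coarse-grained parallel algorithm from it. Experimental results show that the designed parallel algorithm achieves a relatively high speedup in a network parallel computing environment.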

8.
Networks of workstations (NOW) are receiving increased attention as a viable platform for high performance parallel computations. Heterogeneity and time-sharing are two characteristics that distinguish the NOW systems from conventional multiprocessor/multicomputer systems which are homogeneous and dedicated. It is important to have a practical model for users to predict the execution times of large-scale parallel applications on nondedicated heterogeneous NOW. Another objective of this study is to provide insight into the dynamic performance of parallel computing and into the effects of program structures and system factors on such a platform. In this paper, we study performance predictions for parallel computing on nondedicated heterogeneous networks of workstations. Our approach is based on a two-level model. On the top level, a semideterministic task graph is used to capture the parallel execution behavior including the variances of communication and synchronization. On the bottom level, a discrete time model is used to quantify effects from NOW systems. An iterative process is used to determine the interactive effects between network contention and task execution. We validate the prediction model using experiments on a nondedicated heterogeneous NOW. The maximum differences between predicted results and measured results were less than 10% in most cases and 15% in the worst cases.

9.
Network of workstations (NOW) platforms put together with off-the-shelf workstations and networking hardware have become a cost-effective, scalable, and flexible platform for video processing applications. Still, one has to manually schedule an algorithm onto the available processors of the NOW to make efficient use of the resources. This approach is time-consuming and impractical for a video processing system that must perform a variety of different algorithms, with new algorithms being constantly developed. Improved support for program development is absolutely necessary before the full benefits of parallel architectures can be realized for video processing applications. Toward this goal, an automatic compile-time scheduler has been developed to schedule input tasks of video processing applications with precedence constraints onto available processors. The scheduler exploits both spatial (parallelism) and temporal (pipelining) concurrency to make the best use of machine resources. Two important scheduling problems are addressed. First, given a task graph and a desired throughput, a schedule is constructed to achieve the desired throughput with the minimum number of processors. Second, given a task graph and a finite set of available resources, a schedule is constructed such that the throughput is maximized while meeting the resource constraints. Results from simulations show that the scheduler and proposed optimization techniques effectively tackle these problems by maximizing processor utilization. A code generator has been developed to generate parallel programs automatically. The tools developed in this paper make it much easier for a programmer to develop video processing applications on these parallel architectures.

10.
The network of workstations (NOW) we consider for parallel computing is heterogeneous and nondedicated (time-sharing), where computing power varies among the workstations, and multiple jobs may interact with each other in execution. We address three performance issues in this paper. First, we examine the effects of heterogeneity on co-scheduling and local scheduling policies for parallel computing. Through experimentation and quantitative comparisons, we discuss features and requirements of scheduling policies on heterogeneous NOW. Second, the heterogeneity and non-dedication of NOW introduce new performance factors into parallel computing, which make traditional performance metrics for parallel computing on homogeneous platforms unsuitable. We conducted a collection of experimental measurements to show the performance impact on parallel computing. Finally, using network latencies, we experimentally evaluate the parallel computing scalability on NOW. The objective of this study is to provide insights into unique performance bottlenecks and potentials of networks of workstations.

11.
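A global computation and data partitioning method based on data space fusion   Total citations: 2, self-citations: 1, citations by others: 2
Xia Jun, Yang Xuejun. Journal of Software (软件学报), 2004, 15(9): 1311-1327
Computation and data partitioning is an important factor affecting the execution performance of parallel programs on distributed-memory multiprocessors, and it is also a focus of parallelizing-compiler optimization. To address this problem, a theoretical framework for data space fusion is proposed, and based on it an effective global computation and data partitioning method is given for solving computation and data partitioning problems in distributed-memory computing environments. The method exploits the parallelism of the computation space as far as possible, uses data fusion to optimize data distribution, and can search for an optimized global computation and data partition. It also combines naturally with data replication and with the alignment of offset constants, so that data communication volume is kept as small as possible. Experimental results demonstrate the effectiveness of the proposed method.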

12.
In a network of high performance workstations, many workstations are underutilized by their owners. The problem of using these idle cycles for solving computationally intensive tasks by executing a large task on many workstations has been addressed before, and algorithms with O(N^2) time and O(N) space for choosing the optimal subset of workstations out of N workstations were presented. We improve these algorithms to reduce the running time to O(N log N), while keeping the space requirement the same. The proposed algorithms are particularly useful for SPMD parallelism, where computation is the same for all workstations and the data space is partitioned between the workstations.

13.
Abstract

Heterogeneous networks of workstations and/or personal computers (NOW) are increasingly used as a powerful platform for the execution of parallel applications. When applications previously developed for traditional parallel machines (homogeneous and dedicated) are ported to NOWs, performance worsens owing in part to less efficient communications but more often to unbalancing.

In this paper, we address the problem of efficiently porting to heterogeneous NOWs data-parallel applications originally developed using the SPMD paradigm for homogeneous parallel systems with a regular topology such as a ring.

To achieve good performance, the computation time on the various machines composing the NOW must be as balanced as possible. This can be obtained in two ways: by using a heterogeneous data partition strategy with a single process per node, or by splitting data homogeneously among processes and assigning to each node a number of processes proportional to its computing power. The first method is however more difficult, since some modifications in the code are always needed, whereas the second approach requires very few changes.

We carry out a simplified but reliable analysis, and propose a simple model able to simulate performance in the various situations. Two test cases, matrix multiplication and computation of long-range interactions, are considered, obtaining a good agreement between simulated and experimental results.

Our analysis shows that an efficient porting of regular homogeneous data-parallel applications on heterogeneous NOWs is possible. Particularly, the approach based on multiple processes per node turns out to be a straightforward and effective way for achieving very satisfying performance in almost all situations, even dealing with highly heterogeneous systems.
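The two balancing strategies named in this abstract (a heterogeneous data partition with one process per node, or equal-sized pieces with a process count proportional to computing power) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names `partition_rows` and `processes_per_node`, the use of relative computing power as a plain list of positive numbers, and the largest-remainder rounding are all hypothetical choices.

```python
def partition_rows(n_rows: int, powers: list[float]) -> list[int]:
    """Strategy 1: one process per node, rows proportional to node power.

    Uses largest-remainder rounding so the shares sum exactly to n_rows.
    """
    if not powers or min(powers) <= 0:
        raise ValueError("powers must be a non-empty list of positive numbers")
    total = sum(powers)
    exact = [n_rows * p / total for p in powers]
    shares = [int(x) for x in exact]  # floor, since every value is >= 0
    leftover = n_rows - sum(shares)
    # Give the remaining rows to the nodes with the largest fractional parts.
    order = sorted(range(len(powers)),
                   key=lambda i: exact[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


def processes_per_node(powers: list[float], unit: float) -> list[int]:
    """Strategy 2: equal-sized pieces, process count proportional to power.

    `unit` is the computing power served by one process; every node
    receives at least one process.
    """
    if unit <= 0:
        raise ValueError("unit must be positive")
    return [max(1, round(p / unit)) for p in powers]
```

With relative powers `[1, 2, 2]`, the first strategy splits 100 rows as 20, 40, and 40, while the second with `unit=1` starts 1, 2, and 2 processes that each handle an equal share.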

14.
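Parallel implementation of MPEG-4 video coding   Total citations: 4, self-citations: 2, citations by others: 4
This paper explores, in several ways, the parallelism available in the MPEG-4 video verification model in order to achieve real-time processing of MPEG-4 video. It first briefly introduces several existing scheduling strategies, such as module-parallel scheduling, group scheduling, GOV-based scheduling, and VOP-based scheduling. For data partitioning, it describes several existing schemes, including the traditional data partitioning scheme, the near-square scheme, and the shape-adaptive scheme.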

15.
This paper introduces queuing network models for the performance analysis of SPMD applications executed on general-purpose parallel architectures such as MIMD and clusters of workstations. The models are based on the pattern of computation, communication, and I/O operations of typical parallel applications. Analysis of the models leads to the definition of speedup surfaces which capture the relative influence of processors and I/O parallelism and show the effects of different hardware and software components on the performance. Since the parameters of the models correspond to measurable program and hardware characteristics, the models can be used to anticipate the performance behavior of a parallel application as a function of the target architecture (i.e., number of processors, number of disks, I/O topology, etc).
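To make the idea of a speedup surface over processors and disks concrete, here is a deliberately simple deterministic toy, which is not the queuing network model of the paper: it assumes compute time scales with 1/p, I/O time scales with 1/d, communication time is a fixed cost, and contention is ignored. All names (`exec_time`, `speedup_surface`) and parameters are hypothetical.

```python
def exec_time(t_comp: float, t_io: float, t_comm: float,
              p: int, d: int) -> float:
    """Toy execution time on p processors and d disks (no contention)."""
    if p < 1 or d < 1:
        raise ValueError("p and d must be at least 1")
    return t_comp / p + t_io / d + t_comm


def speedup_surface(t_comp: float, t_io: float, t_comm: float,
                    procs: list[int], disks: list[int]) -> dict:
    """Speedup relative to one processor and one disk for each (p, d)."""
    base = exec_time(t_comp, t_io, t_comm, 1, 1)
    return {(p, d): base / exec_time(t_comp, t_io, t_comm, p, d)
            for p in procs for d in disks}
```

Even this toy shows the qualitative effect the abstract describes: adding processors alone leaves the I/O term untouched, so the speedup flattens unless I/O parallelism grows as well.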

16.
17.
18.
Small organisations can now have access to high raw processing power using networks of workstations (NOW) as parallel computing platforms. Software Distributed Shared Memory (Software DSM) packages have been developed to facilitate the programming of such systems. However, because of the high interprocess latencies in a NOW, the performance of a software DSM application is more susceptible to the partitioning of the problem than might be expected. This paper presents an approach for a tool to visualise the execution of a program in a way that highlights performance bottlenecks. The tool associates identified bottlenecks with the corresponding source code lines in order to determine what piece of code is the cause of poor performance. The visualisation technique is demonstrated in two case studies. They clearly show that the visualisation is indeed useful and provides an effective way to acquire an understanding of what characterises an application's sharing behaviour.

19.
20.
Introduces queuing network models for the performance analysis of SPMD (single-program, multiple-data) applications executed on general-purpose parallel architectures such as MIMD (multiple-instruction, multiple-data) and clusters of workstations. The models are based on the pattern of computation, communication and I/O operations of typical parallel applications. Analysis of the models leads to the definition of speedup surfaces which capture the relative influence of processors and I/O parallelism and show the effects of different hardware and software components on the performance. Since the parameters of the models correspond to measurable program and hardware characteristics, the models can be used to anticipate the performance behavior of a parallel application as a function of the target architecture (i.e. the number of processors, number of disks, I/O topology, etc.).
