共查询到20条相似文献,搜索用时 15 毫秒
1.
Coordinating Parallel Processes on Networks of Workstations 总被引:1,自引:0,他引:1
The network of workstations (NOW) we consider for scheduling is heterogeneous and nondedicated, where computing power varies among the workstations and local and parallel jobs may interact with each other in execution. An effective NOW scheduling scheme needs sufficient information about system heterogeneity and job interactions. We use the measured power weight of each workstation to quantify the differences of computing capability in the system. Without a processing power usage agreement between parallel jobs and local user jobs in a workstation, job interactions are unpredictable, and performance of either type of jobs may not be guaranteed. Using the quantified and deterministic system information, we design a scheduling scheme calledself-coordinated local schedulingon a heterogeneous NOW. Based on a power usage agreement between local and parallel jobs, this scheme coordinates parallel processes independently in each workstation based on the coscheduling principle. We discuss its implementation on Unix System V Release 4 (SVR4). Our simulation results on a heterogeneous NOW show the effectiveness of the self-coordinated local scheduling scheme. 相似文献
2.
Stergios V. Anastasiadis Kenneth C. Sevcik 《Journal of Parallel and Distributed Computing》1997,43(2):364
Parallel applications can be executed using the idle computing capacity of workstation clusters. However, it remains unclear how to schedule the processors among different applications most effectively. Processor scheduling algorithms that were successful for shared-memory machines have proven to be inadequate for distributed memory environments due to the high costs of remote memory accesses and redistributing data. We investigate how knowledge of system load and application characteristics can be used in scheduling decisions. We propose a new algorithm based on adaptive equipartitioning, which, by properly exploiting both the information types above, performs better than other nonpreemptive scheduling rules, and nearly as well as idealized versions of preemptive rules (with free preemption). We conclude that the new algorithm is suitable for use in scheduling parallel applications on networks of workstations. 相似文献
3.
Broadcasting on Networks of Workstations 总被引:1,自引:0,他引:1
Broadcasting and multicasting are fundamental operations. In this work we develop algorithms for performing broadcast and
multicast in clusters of workstations. In this model, sending a message to a machine in the same cluster takes 1 time unit,
and sending a message to a machine in a different cluster takes C(≥1) time units. The clusters may have arbitrary sizes. Lowekamp and Beguelin proposed heuristics for this model, but their
algorithms may produce broadcast times that are arbitrarily worse than optimal. We develop the first constant factor approximation
algorithms for this model. Algorithm LCF (Largest Cluster First) for the basic model is simple, efficient and has a worst
case approximation guarantee of 2. We then extend these models to more complex models where we remove the assumption that
an unbounded amount of communication may happen using the global network. The algorithms for these models build on the LCF
method developed for the basic problem. Finally, we develop broadcasting algorithms for the postal model where the sending
node does not block for C time units when the message is in transit. 相似文献
4.
工作站网络环境中三对角方程组并行求解 总被引:3,自引:0,他引:3
此文考虑工作站网络环境中三对角方程组的有效并行求解,其中每台处理机只拥有原方程组的部分等式信息,并提出适合于分布主存并行计算的并行LU分解算法,同时给出算法的计算与通讯的复杂性分析;并从理论及实验两方面阐述了缩减系统求解策略是影响算法在多机系统中求解效率的重要因素;所有算法由PVM软件系统,支持实现并在工作站网络环境中测试。 相似文献
5.
一种基于MPI和工作站群集的并行计算 总被引:1,自引:0,他引:1
本文主要分析了消息传递的模型及其实现的一种方式MPI,构造了一个四节点基于MPI的工作站群集并实现了求π的并行计算,最后给出性能分析和负载平衡分析. 相似文献
6.
考虑工作站网络(NOWs)中三对角线性方程组的并行求解,基于最小秩解耦算法与分布治之并行计算模式,并行最小秩解耦算法(PMRD)。它在计算过程中保持原矩阵的结构特征,数值稳定性高,本文给出算法的数值特征分析以及计算与通讯复杂性分析并与Mehrmann分治算比较,所有算法由PVM软件系统实现并在工作站网络中测试。 相似文献
7.
《Journal of Parallel and Distributed Computing》1997,40(1):131-137
Circulating active barrier (CAB) is a new low-cost, high-performance hardware mechanism for synchronizing multiple processing elements (PEs) in networks of workstations at fine-grained programmed barriers. CAB is significantly less complex than other hardware barrier synchronization mechanisms with equivalent performance, using only a single conductor, such as a wire or copper run on a printed-circuit board, to circulate barrier packets between PEs. When a PE checks in at a barrier, the CAB hardware will decrement the count associated with that barrier in a bit-serial fashion as a barrier packet passes through, and then will monitor the packets until all PEs have checked in at the barrier. The ring has no clocked sequential logic in the serial loop. A cluster controller (CC) generates packets for active barriers, removes packets when no longer needed, and resets counters when all PEs have seen the zero-count. A hierarchy of PEs can be achieved by connecting the CCs in intercluster rings. When using conservative timing assumptions, the expected synchronization times with optimal clustering are shown to be under 1 μs for as many as 4096 PEs in multiprocessor workstations or 1024 single-processor workstations. The ideal number of clusters for a two-dimensional hierarchy ofNPEs is shown to be [N(D+G)/(I+G)]1/2, whereGis the gate propagation delay,Dis the inter-PE delay, andIis the intercluster transmission time. CAB allows rapid, contention-free check-in and proceed-from- barrier and is applicable to a wide variety of system architectures and topologies. 相似文献
8.
《Journal of Systems Architecture》2000,46(8):641-653
The paper is motivated by efficiency considerations about porting mathematical software from Massively Parallel Processors (MPPs) to Networks of Workstations (NOWs). Heterogeneity of the network is the major obstacle to efficient porting: it can be overcome with a specialized system, Programming envIronment for Network of COmputers (PINCO), for monitoring available computational power at different nodes, both statically and dynamically. The structure and functionalities of PINCO are outlined, and a significant porting example, matrix multiplication, is presented. 相似文献
9.
矩阵相乘Cannon并行算法在工作站机群上的实现 总被引:6,自引:1,他引:6
矩阵相乘Cannon并行算法是一个基于分布式存储多处理机模型的并行数据算法,文章研究了它在工作站机群上的实现。在满足负载平衡和减少网络间数据传输的条件下,主要探讨了子任务在工作站上的优化分配策略,最后给出了在PVM并行编程环境下的具体实现方法。 相似文献
10.
In this paper, we present an adaptive version of the parallel Distributive Join (DJ) algorithm that we proposed in [5]. The adaptive parallel DJ algorithm can handle the data skew in operand relations efficiently. We implemented the original and adaptive parallel DJ algorithms on a network of Alpha workstations using the Parallel Virtual Machine (PVM). We analyzed the performance of the algorithms, and compared it with that of the parallel Hybrid-Hash (HH) join algorithms. Our results show that the parallel DJ algorithms perform comparably with the parallel HH join algorithms over the entire range of the number of processors used and for different join selectivities. A significant advantage of the parallel DJ algorithms is that they can easily support non-equijoin operations. 相似文献
11.
《Journal of Parallel and Distributed Computing》2001,61(11):1665-1679
Networks of workstations (NOWs) provide an economical platform for high performance parallel computing. Such networks may comprise a variety of different types of workstations and network devices. This paper addresses the problem of efficient multicast in a heterogeneous communication model. Although the problem of finding optimal multicast schedules is known to be NP-complete in this model, a greedy algorithm has been shown experimentally to find good solutions in practice. In this paper we show that the greedy algorithm finds provably near-optimal schedules in polynomial time and that optimal schedules can be found in polynomial time when the number of distinct types of workstations is bounded by a constant. Specifically, this paper presents three results. First, when there are n workstations of some constant k distinct types, the greedy algorithm is shown to find schedules that complete at most a constant additive term later than optimal. Second, an algorithm is given that finds optimal schedules in time O(n2k). Finally, it is shown that for the general problem, the greedy algorithm finds solutions that complete the multicast in at most twice the optimal time. 相似文献
12.
Networks of workstations are an emerging architectural paradigm for high-performance parallel and distributed systems. Exploiting networks of workstations for massive data management poses exciting challenges. We consider here the problem of managing record-structured data in such an environment. For example, managing collections of HTML documents on a cluster of WWW servers is an important application for which our approach provides support. The records are accessed by a dynamically growing set of clients based on a search key (e.g., a URL). To scale up the throughput of client accesses with approximately constant response time, the records and thus also their access load are dynamically redistributed across a growing set of workstations. The paper addresses two problems of realistic workloads: skewed access frequencies to the records and evolving access patterns where previously cold records may become hot and vice versa. Our solution incorporates load tracking at different levels of granularity and automatically chooses the appropriate granularity for dynamic data migrations. Experimental results based on a detailed simulation model show that our method is indeed successful in providing scalable cost/performance and explicitly controlling its level. 相似文献
13.
Adam Beguelin Erik Seligman Peter Stephan 《Journal of Parallel and Distributed Computing》1997,43(2):2078
We have explored methods for checkpointing and restarting processes within the distributed object migration environment (Dome), a C++ library of data parallel objects that are automatically distributed over heterogeneous networks of workstations (NOWs). System level checkpointing methods, although transparent to the user, were rejected because they lack support for heterogeneity. We have implemented application level checkpointing which places the checkpoint and restart mechanisms within Dome's C++ objects. Application level checkpointing has been implemented with a library-based technique for the programmer and a more transparent preprocessor-based technique. Dome's implementation of checkpointing successfully checkpoints and restarts processes on different numbers of machines and different architectures. Results from executing Dome programs across a NOW with realistic failure rates have been experimentally determined and are compared with theoretical results. The overhead of checkpointing is found to be low, while providing substantial decreases in expected runtime on realistic systems. 相似文献
14.
矩阵相乘Cannon并行算法在工作站机群上的实现 总被引:2,自引:0,他引:2
矩阵相乘Cannon并行算法是一个基于分布式存储多处理机模型的并行数值算法,本文研究了它在工作站机群上的实现。在满足负载平衡和减少网络间数据传输的条件下,主要探讨了子任务在工作站上的优化分析策略,最后给出了在pvm并行编程环境下的具体实现方法。 相似文献
15.
Parallel Computing on an Ethernet Cluster of Workstations: Opportunities and Constraints 总被引:1,自引:0,他引:1
Parallel computing on clusters of workstations is receiving much attention from the research community. Unfortunately, many aspects of parallel computing over this parallel computing engine is not very well understood. Some of these issues include the workstation architectures, the network protocols, the communication-to-computation ratio, the load balancing strategies, and the data partitioning schemes. The aim of this paper is to assess the strengths and limitations of a cluster of workstations by capturing the effects of the above issues. This has been achieved by evaluating the performance of this computing environment in the execution of a parallel ray tracing application through analytical modeling and extensive experimentation. We were successful in illustrating the effect of major factors on the performance and scalability of a cluster of workstations connected by an Ethernet network. Moreover, our analytical model was accurate enough to agree closely with the experimental results. Thus, we feel that such an investigation would be helpful in understanding the strengths and weaknesses of an Ethernet cluster of workstation in the execution of parallel applications. 相似文献
16.
Networks of workstations are poised to become the primary computing infrastructure for science and engineering. NOWs may dramatically improve virtual memory and file system performance; achieve cheap, highly available, and scalable file storage: and provide multiple CPUs for parallel computing. Hurdles that remain include efficient communication hardware and software, global coordination of multiple workstation operating systems, and enterprise-scale network file systems. Our 100-node NOW prototype aims to demonstrate practical solutions to these challenges 相似文献
17.
基于群机系统的并行程序的最大加速比计算 总被引:1,自引:0,他引:1
加速比是并行程序的重要指标之一。在大多数并行系统中,在数据规 模确定的情况下,程序的加速比随节点工作站的增加而增加,但是大多数群机 系统的节点工作站是共享物理传输介质的,这使得许多并行程序的加速比在节 点机数目超过某一个值之后会随着节,点机的增加而减少。本文通过对群机系统 上并行程序执行时间的分析,论述了在数据规模确定的情况下,程序能够获得 的最大加速比和最短的计算时间,以及获得这个加速比和计算时间的节点机个 数。 相似文献
18.
The low cost and availability of clusters of workstations have lead researchers to re-explore distributed computing using independent workstations. This approach may provide better cost/performance than tightly coupled multiprocessors. In practice, this approach often utilizes wasted cycles to run parallel jobs. In this paper we address the feasibility and limitation of such a nondedicated parallel processing environment assuming workstation processes have priority over parallel tasks. We develop a simple analytical model to predict parallel job response times. Our model provides insight into how significantly workstation owner interference degrades parallel program performance. It forms a foundation for task partitioning and scheduling in a nondedicated network environment. A new term, task ratio, which relates the parallel task demand to the mean service demand of nonparallel workstation processes, is introduced. We propose that task ratio is a useful metric for determining how a parallel applications should be partitioned and scheduled in order to make efficient use of a nondedicated distributed system. 相似文献
19.
This paper describes WOW, a distributed system that combines virtual machine, overlay networking and peer-to-peer techniques
to create scalable wide-area networks of virtual workstations for high-throughput computing. The system is architected to:
facilitate the addition of nodes to a pool of resources through the use of system virtual machines (VMs) and self-organizing
virtual network links; to maintain IP connectivity even if VMs migrate across network domains; and to present to end-users
and applications an environment that is functionally identical to a local-area network or cluster of workstations. We describe
IPOP, a network virtualization technique that builds upon a novel, extensible user-level decentralized technique to discover,
establish and maintain overlay links to tunnel IP packets over different transports (including UDP and TCP) and across firewalls.
We evaluate latency and bandwidth overheads of IPOP and also time taken for a new node to become fully-routable over the virtual
network. We also report on several experiments conducted on a testbed WOW deployment with 118 P2P router nodes over PlanetLab
and 33 VMware-based VM nodes distributed across six firewalled domains. Experiments show that the testbed delivers good performance
for two unmodified, representative benchmarks drawn from the life-sciences domain. We also demonstrate that the system is
capable of seamlessly maintaining connectivity at the virtual IP layer for typical client/server applications (NFS, SSH, PBS)
when VMs migrate across a WAN. 相似文献
20.
This paper presents efficient algorithms for broadcasting on heterogeneous switch-based networks of workstations (HNOW) by two partitioned sub-networks. In an HNOW, many multiple speed types of workstations have different send and receive overheads. Previous research has found that routing by two sub-networks in a NOW can significantly increase system’s performance (Proc. 10th International Conference on Computer Communications and Networks, pp. 68–73, 2001). Similarly, EBS and VBBS (Proc. 8th IEEE International Symposium on Computer and Communication, pp. 1277–1284, (2003)), designed by applying the concept of fastest nodes first, can be executed in O(nlog(n)) time, where n is the number of workstations. This paper proposes two schemes TWO-EBS and TWO-VBBS for broadcasting in an HNOW. These two schemes divide an HNOW into two sub-networks that are routed concurrently and combine EBS and VBBS to broadcast in an HNOW. Based on simulation results, TWO-VBBS outperforms EBS, VBBS, VBBSWF (Proc. 8th IEEE International Symposium on Computer and Communication, pp. 1277–1284, (2003)), the postorder recursive doubling (Proc. Merged IPPS/SPDP Conference, pp. 358–364, (1998)), and the optimal scheduling tree (Proc. Parallel and Distributed Processing Symposium, Proc. 15th International (2001)) generated by dynamic programming in an HNOW. 相似文献