1.

Wu Lifa (吴礼发), Xie Li (谢立) 《Journal of Computer Research and Development (计算机研究与发展)》1998,35(11):1042-1047

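Networks of workstations (NOW) are receiving growing attention as a new parallel computing architecture. This paper discusses group communication in NOWs built on heterogeneous networks and proposes a hierarchical group communication model that handles group communication in heterogeneous network environments well. The idea is to divide the nodes taking part in a group communication operation, according to the subnet each one sits in, into base computing domains. Running these base computing domains in parallel lowers communication latency without adding much network traffic, and the most suitable group communication implementation can be chosen within each subnet. This approach makes the whole network process-oriented …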
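The partitioning step described in this abstract can be illustrated with a short sketch. It is not the authors' implementation: the function names `partition_by_subnet` and `pick_leaders`, the "address/prefix" input format, and the rule that the first node of each domain acts as its leader are all assumptions made for illustration.

```python
import ipaddress
from collections import defaultdict


def partition_by_subnet(interfaces):
    """Group nodes into base computing domains, one per subnet.

    `interfaces` is a list of "address/prefix" strings (an assumed input
    format). Returns {subnet: [node addresses]} in first-seen order.
    """
    domains = defaultdict(list)
    for iface in interfaces:
        parsed = ipaddress.ip_interface(iface)
        domains[str(parsed.network)].append(str(parsed.ip))
    return dict(domains)


def pick_leaders(domains):
    """Pick one leader per domain (assumption: simply the first member).

    A root would send once to each leader, and the leaders would then
    forward inside their own subnets in parallel.
    """
    return {subnet: members[0] for subnet, members in domains.items()}


nodes = ["10.0.1.5/24", "10.0.1.9/24", "10.0.2.7/24"]
domains = partition_by_subnet(nodes)
leaders = pick_leaders(domains)
```

With this layout, only one message per subnet crosses subnet boundaries, which is one way to read the abstract's claim about lower latency without much extra traffic.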

2.

A Practical Parallel Computing Model  Total citations: 11, self-citations: 0, cited by others: 11

Ji Yongchang (计永昶), Ding Weiqun (丁卫群), Chen Guoliang (陈国良), An Hong (安虹) 《Chinese Journal of Computers (计算机学报)》2001,24(4):437-441

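For today's popular workstation cluster environments and the various kinds of parallel machines, this paper proposes a practical parallel computing model: NHBL (Nondedicated Heterogeneous Barrier LogGP), a nondedicated, heterogeneous, synchronous model based on LogGP. It aims to reflect how the heterogeneity and non-dedication of NOW computing environments affect the design and analysis of parallel algorithms. The NHBL model is then used to analyze the cost of the PSRS algorithm on NHPCC-Cluster, the workstation cluster of the National High Performance Computing Center (Hefei), and on the Dawning-1000 MPP, and the analysis is validated against measured results.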

3.

Performance bottlenecks and potentials of parallel computing on networks of workstations

Yong Yan, Xing Du, Xiaodong Zhang, Chenxi Zhang 《International journal of systems science》2013,44(11):1045-1056

The network of workstations (NOW) we consider for parallel computing is heterogeneous and nondedicated (time-sharing), where computing power varies among the workstations, and multiple jobs may interact with each other in execution. We address three performance issues in this paper. First, we examine the effects of heterogeneity on co-scheduling and local scheduling policies for parallel computing. Through experimentation and quantitative comparisons, we discuss features and requirements of scheduling policies on heterogeneous NOW. Second, the heterogeneity and non-dedication of NOW introduce new performance factors into parallel computing, which make traditional performance metrics for parallel computing under homogeneous platforms not suitable. We conducted a collection of experimental measurements to show the performance impact on parallel computing. Finally, using network latencies we experimentally evaluate the parallel computing scalability on NOW. Our objective in this study is to provide insights into unique performance bottlenecks and potentials of networks of workstations.

4.

Communication Issues in Parallel Computing on Local Area Networks

Liu Fang (刘芳), Weng Huiyu (翁惠玉) 《Computer Engineering and Applications (计算机工程与应用)》1999,35(8):87-89

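LAN-based networks of workstations (NOW) are attracting attention for their enormous computing potential, good price/performance ratio, scalability, and flexible architecture. Building a general-purpose NOW system with good parallel performance requires solving many problems, such as high communication overhead between workstations, load balancing, heterogeneity, and fault tolerance. This article discusses in some detail the communication problems that arise when implementing NOW systems and reviews work on them in China and abroad.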

5.

An Effective and Practical Performance Prediction Model for Parallel Computing on Nondedicated Heterogeneous NOW

《Journal of Parallel and Distributed Computing》1996,38(1):63-80

Networks of workstations (NOW) are receiving increased attention as a viable platform for high performance parallel computations. Heterogeneity and time-sharing are two characteristics that distinguish the NOW systems from conventional multiprocessor/multicomputer systems which are homogeneous and dedicated. It is important to have a practical model for users to predict the execution times of large-scale parallel applications on nondedicated heterogeneous NOW. Another objective of this study is to provide insight into the dynamic performance of parallel computing and into the effects of program structures and system factors on such a platform. In this paper, we study performance predictions for parallel computing on nondedicated heterogeneous networks of workstations. Our approach is based on a two-level model. On the top level, a semideterministic task graph is used to capture the parallel execution behavior including the variances of communication and synchronization. On the bottom level, a discrete time model is used to quantify effects from NOW systems. An iterative process is used to determine the interactive effects between network contention and task execution. We validate the prediction model using experiments on a nondedicated heterogeneous NOW. The maximum differences between predicted results and measured results were less than 10% in most cases and 15% in the worst cases.

6.

A Task Scheduling Algorithm for Structured Parallel Control Mechanisms  Total citations: 4, self-citations: 0, cited by others: 4

Zhang Hongli (张宏莉), Fang Binxing (方滨兴), Hu Mingzeng (胡铭曾) 《Journal of Software (软件学报)》2001,12(5):706-710

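Shortening program execution time is the primary goal of parallel processing, and an effective task allocation algorithm is the key to achieving it, all the more so on cluster systems. This paper studies task scheduling for structured parallel control mechanisms on cluster systems and, based on the principles of greedy selection, granularity control, and feedback-driven dispatching, proposes a near-optimal task scheduling algorithm, SSA (sub-optimal scheduling algorithm). Experimental results show that, in a cluster environment, the algorithm delivers better parallel computing performance than the other algorithms it was compared with.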

7.

A Parallel Algorithm for Matrix QR Decomposition in Network Parallel Computing

Zhang Yan (张艳), Sun Shixin (孙世新) 《Journal of Computer Applications (计算机应用)》2000,20(10):29-32

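With the emergence of high-speed network technologies such as ATM, network parallel computing systems (NOW) have become a major platform for parallel processing. Because of their high communication latency, some fine-grained parallel algorithms implemented on parallel machines are no longer suitable for this environment. It is therefore necessary to repartition the tasks of such algorithms and to study their parallel implementation in a network environment. On this basis, the paper proposes a new task partitioning strategy for matrix QR decomposition and derives a coarse-grained parallel algorithm from it. Experimental results show that the resulting parallel algorithm achieves a high speedup in a network parallel computing environment.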

8.

Architectural support for efficient multicasting in irregular networks

Sivaram R., Kesavan R., Panda D.K., Stunkel C.B. 《Parallel and Distributed Systems, IEEE Transactions on》2001,12(5):489-513

Parallel computing on networks of workstations is fast becoming a cost-effective high-performance computing alternative to MPPs. Such a computing environment typically consists of processing nodes interconnected through a switch-based irregular network. Many of the problems that were solved for regular networks have to be solved anew for these systems. One such problem is that of efficient multicast communication. In this paper, we propose two broad categories of schemes for efficient multicasting in such irregular networks: network interface-based (NI-based) and switch-based. The NI-based multicasting schemes use the network interface of intermediate destinations for absorbing and retransmitting messages to other destinations in the multicast tree. In contrast, the switch-based multicasting schemes use hardware support for packet replication at the switches of the network and a concept known as multidestination routing to convey a multicast message from one source to multiple destinations. We first present alternative schemes for efficient multipacket forwarding at the NI and derive an optimal k-binomial multicast tree for multipacket NI-based multicast. We then propose two switch-based multicasting schemes that differ in the power of the encoding scheme and the complexity of the decoding logic at the switches. These multicasting schemes use path-based multidestination worms that can cover all nodes connected to switches along a valid unicast path and tree-based multidestination worms that can cover entire destination sets in a single phase using one worm, respectively. For each scheme, we describe the associated header encoding and decoding operation, the method for deriving multidestination worms that cover arbitrary multicast destination sets, and the multicasting scheme using the derived multidestination worms.

9.

A case for NOW (Networks of Workstations)

Anderson T.E., Culler D.E., Patterson D. 《Micro, IEEE》1995,15(1):54-64

Networks of workstations are poised to become the primary computing infrastructure for science and engineering. NOWs may dramatically improve virtual memory and file system performance; achieve cheap, highly available, and scalable file storage; and provide multiple CPUs for parallel computing. Hurdles that remain include efficient communication hardware and software, global coordination of multiple workstation operating systems, and enterprise-scale network file systems. Our 100-node NOW prototype aims to demonstrate practical solutions to these challenges.

10.

PORTING REGULAR APPLICATIONS ON HETEROGENEOUS WORKSTATION NETWORKS: PERFORMANCE ANALYSIS AND MODELING

《International Journal of Parallel, Emergent and Distributed Systems》2012,27(3):205-226

Abstract

Heterogeneous networks of workstations and/or personal computers (NOW) are increasingly used as a powerful platform for the execution of parallel applications. When applications previously developed for traditional parallel machines (homogeneous and dedicated) are ported to NOWs, performance worsens owing in part to less efficient communications but more often to unbalancing.

In this paper, we address the problem of the efficient porting to heterogeneous NOWs of data-parallel applications originally developed using the SPMD paradigm for homogeneous parallel systems with regular topology like ring.

To achieve good performance, the computation time on the various machines composing the NOW must be as balanced as possible. This can be obtained in two ways: by using a heterogeneous data partition strategy with a single process per node, or by splitting data homogeneously among processes and assigning to each node a number of processes proportional to its computing power. The first method is however more difficult, since some modifications in the code are always needed, whereas the second approach requires very few changes.

We carry out a simplified but reliable analysis, and propose a simple model able to simulate performance in the various situations. Two test cases, matrix multiplication and computation of long-range interactions, are considered, obtaining a good agreement between simulated and experimental results.

Our analysis shows that an efficient porting of regular homogeneous data-parallel applications on heterogeneous NOWs is possible. Particularly, the approach based on multiple processes per node turns out to be a straightforward and effective way for achieving very satisfying performance in almost all situations, even dealing with highly heterogeneous systems.
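The multiple-processes-per-node approach in this abstract amounts to dividing a fixed number of equally sized processes among nodes in proportion to their computing power. The sketch below shows one way to do that; the function name `allocate_processes` and the largest-remainder rounding rule are assumptions for illustration, since the abstract does not say how fractional shares are rounded.

```python
from fractions import Fraction


def allocate_processes(powers, total_processes):
    """Give each node a process count proportional to its computing power.

    `powers` holds relative, non-negative computing powers (hypothetical
    measurements). Rounding uses the largest-remainder method so that the
    counts always sum to `total_processes`.
    """
    total_power = sum(Fraction(p) for p in powers)
    # Exact proportional share of each node, before rounding.
    quotas = [Fraction(p) * total_processes / total_power for p in powers]
    counts = [int(q) for q in quotas]  # round down first
    leftover = total_processes - sum(counts)
    # Hand the remaining processes to the nodes with the largest fractions.
    by_fraction = sorted(
        range(len(powers)), key=lambda i: quotas[i] - counts[i], reverse=True
    )
    for i in by_fraction[:leftover]:
        counts[i] += 1
    return counts


print(allocate_processes([1, 2, 4], 7))
```

Because every process handles the same amount of data, the application code itself stays unchanged; only the mapping of processes to nodes differs, which matches the abstract's remark that this approach needs very few changes.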

11.

Dynamic scheduling techniques for heterogeneous computing systems

Babak Hamidzadeh, Yacine Atif, David J. Lilja 《Concurrency and Computation》1995,7(7):633-652

There has been a recent increase of interest in heterogeneous computing systems, due partly to the fact that a single parallel architecture may not be adequate for exploiting all of a program's available parallelism. In some cases, heterogeneous systems have been shown to produce higher performance for lower cost than a single large machine. However, there has been only limited work on developing techniques and frameworks for partitioning and scheduling applications across the components of a heterogeneous system. In this paper we propose a general model for describing and evaluating heterogeneous systems that considers the degree of uniformity in the processing elements and the communication channels as a measure of the heterogeneity in the system. We also propose a class of dynamic scheduling algorithms for a heterogeneous computing system interconnected with an arbitrary communication network. These algorithms execute a novel optimization technique to dynamically compute schedules based on the potentially non-uniform computation and communication costs on the processors of a heterogeneous system. A unique aspect of these algorithms is that they easily adapt to different task granularities, to dynamically varying processor and system loads, and to systems with varying degrees of heterogeneity. Our simulations are designed to facilitate the evaluation of different scheduling algorithms under varying degrees of heterogeneity. The results show improved performance for our algorithms compared to the performance resulting from existing scheduling techniques.

12.

Coordinating Parallel Processes on Networks of Workstations  Total citations: 1, self-citations: 0, cited by others: 1

Xing Du, Xiaodong Zhang 《Journal of Parallel and Distributed Computing》1997,46(2):186

The network of workstations (NOW) we consider for scheduling is heterogeneous and nondedicated, where computing power varies among the workstations and local and parallel jobs may interact with each other in execution. An effective NOW scheduling scheme needs sufficient information about system heterogeneity and job interactions. We use the measured power weight of each workstation to quantify the differences of computing capability in the system. Without a processing power usage agreement between parallel jobs and local user jobs in a workstation, job interactions are unpredictable, and performance of either type of job may not be guaranteed. Using the quantified and deterministic system information, we design a scheduling scheme called self-coordinated local scheduling on a heterogeneous NOW. Based on a power usage agreement between local and parallel jobs, this scheme coordinates parallel processes independently in each workstation based on the coscheduling principle. We discuss its implementation on Unix System V Release 4 (SVR4). Our simulation results on a heterogeneous NOW show the effectiveness of the self-coordinated local scheduling scheme.

13.

Usefulness of adaptive load sharing for parallel processing on networks of workstations

Sheldon Clarke, Sivarama P. Dandamudi 《Concurrency and Computation》1999,11(8):387-405

Networks of workstations (NOWs) can be used for parallel processing by using public domain software like PVM. However, NOW-based parallel processing suffers from node heterogeneity, background load variations, and a high-latency, low-bandwidth communication network. Previous studies on load sharing in NOW-based systems have indicated that, for applications using the work-pile model, a simple load sharing scheme in which the master process gives a fixed amount of work to the slave processes performs as well as any other, more complex scheme. In this paper, we propose a new adaptive load sharing scheme and evaluate its performance using a Pentium-based NOW machine. The communication network used in the system consists of the standard 10 Mbps Ethernet and the 100 Mbps fast Ethernet. We use both these networks to study their impact on the performance of our new policy. The results presented here indicate that the new policy is useful for computation-intensive applications. Copyright © 1999 John Wiley & Sons, Ltd.
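The fixed-chunk work-pile baseline mentioned in this abstract can be illustrated with a small simulation. This is a sketch under stated assumptions, not the paper's adaptive policy: the function name `simulate_work_pile`, the constant worker speeds, and the omission of communication cost and background load are all simplifications introduced here.

```python
import heapq


def simulate_work_pile(total_units, chunk_size, speeds):
    """Simulate a master handing fixed-size chunks to whichever worker is free.

    `speeds` lists work units per second for each worker (hypothetical
    values). Returns (makespan, units_done_per_worker).
    """
    # Priority queue of (time the worker becomes free, worker id).
    free_at = [(0.0, w) for w in range(len(speeds))]
    heapq.heapify(free_at)
    done = [0] * len(speeds)
    remaining = total_units
    makespan = 0.0
    while remaining > 0:
        now, worker = heapq.heappop(free_at)  # next idle worker
        chunk = min(chunk_size, remaining)    # last chunk may be smaller
        remaining -= chunk
        finish = now + chunk / speeds[worker]
        done[worker] += chunk
        makespan = max(makespan, finish)
        heapq.heappush(free_at, (finish, worker))
    return makespan, done


makespan, done = simulate_work_pile(total_units=12, chunk_size=2, speeds=[1.0, 2.0])
print(makespan, done)  # prints: 4.0 [4, 8]
```

Even with a fixed chunk size, the worker that is twice as fast ends up with twice the work, because it returns to the pile more often. That self-balancing effect is one plausible reading of why the simple scheme performs well in the studies the abstract cites.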

14.

On Multicast Algorithms for Heterogeneous Networks of Workstations

《Journal of Parallel and Distributed Computing》2001,61(11):1665-1679

Networks of workstations (NOWs) provide an economical platform for high performance parallel computing. Such networks may comprise a variety of different types of workstations and network devices. This paper addresses the problem of efficient multicast in a heterogeneous communication model. Although the problem of finding optimal multicast schedules is known to be NP-complete in this model, a greedy algorithm has been shown experimentally to find good solutions in practice. In this paper we show that the greedy algorithm finds provably near-optimal schedules in polynomial time and that optimal schedules can be found in polynomial time when the number of distinct types of workstations is bounded by a constant. Specifically, this paper presents three results. First, when there are n workstations of some constant k distinct types, the greedy algorithm is shown to find schedules that complete at most a constant additive term later than optimal. Second, an algorithm is given that finds optimal schedules in time O(n^{2k}). Finally, it is shown that for the general problem, the greedy algorithm finds solutions that complete the multicast in at most twice the optimal time.

15.

Efficient fault-tolerant routing in multihop optical WDM networks

Hong Shen, Chin F., Yi Pan 《Parallel and Distributed Systems, IEEE Transactions on》1999,10(10):1012-1025

This paper addresses the problem of efficient routing in unreliable multihop optical networks supported by Wavelength Division Multiplexing (WDM). We first define a new cost model for routing in (optical) WDM networks that is more general than the existing models. Our model takes into consideration not only the cost of wavelength access and conversion but also the delay for queuing signals arriving at different input channels that share the same output channel at the same node. We then propose a set of efficient algorithms in a reliable WDM network on the new cost model for each of the three most important communication patterns: multiple point-to-point routing, multicast, and multiple multicast. Finally, we show how to obtain a set of efficient algorithms in an unreliable WDM network with up to f faulty optical channels and wavelength conversion gates. Our strategy is to first enhance the physical paths constructed by the algorithms for reliable networks to ensure success of fault-tolerant routing, and then to route among the enhanced paths to establish a set of fault-free physical routes to complete the corresponding routing request for each of the communication patterns.

16.

Multicast communication in wormhole-routed 2D torus networks with hamiltonian cycle model

Neng-Chung Wang, Yi-Ping Hung 《Journal of Systems Architecture》2009,55(1):70-78

In this paper, we propose an efficient multipath multicast routing algorithm in wormhole-routed 2D torus networks. We first introduce a Hamiltonian cycle model for exploiting the feature of torus networks. Based on this model, we find a Hamiltonian cycle in torus networks. Then, an efficient multipath multicast routing algorithm with Hamiltonian cycle model (multipath-HCM) is presented. The proposed multipath multicast routing algorithm utilizes communication channels more uniformly in order to reduce the path length of the routing messages, making the multicasting more efficient. Simulation results show that the multicast latency of the proposed multipath-HCM routing algorithm is superior to that of fixed and dual-path routing algorithms.

17.

NC++: An Object-Oriented Distributed Programming Language Based on NOW

Gu Qing (顾庆), Xie Li (谢立), Chen Daoxu (陈道蓄), Wu Yinghong (吴迎红), Sun Zhongxiu (孙钟秀) 《Journal of Software (软件学报)》2001,12(2):183-189

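This paper presents NC++ (NOW C++), a distributed programming language based on networks of workstations (NOW). It is an extension of the DC++ language. NC++ provides a complete programming environment, including the NC++ precompiler, a graphical programming interface, a multicast communication mechanism, and a testing system. It refines the group management and inter-process communication mechanisms and proposes a distributed shared memory (DSM) mechanism, based on a belief inference network, for managing C++ public variables. Practice shows that NC++ preserves the performance of distributed programs while keeping programming convenient.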

18.

Multiple multicast with minimized node contention on wormhole k-ary n-cube networks

Kesavan R., Panda D.K. 《Parallel and Distributed Systems, IEEE Transactions on》1999,10(4):371-393

This paper presents a new approach to minimize node contention while performing multiple multicast/broadcast on wormhole k-ary n-cube networks with overlapped destination sets. The existing multicast algorithms in the literature deliver poor performance under multiple multicast because these algorithms have been designed with only single multicast in mind. The new algorithms introduced in this paper do not use any global knowledge about the respective destination sets of the concurrent multicasts. Instead, only local information and a source-specific partitioning approach are used. For systems supporting unicast message-passing, a new SPUmesh (Source-Partitioned Umesh) algorithm is proposed and is shown to be superior to the conventional Umesh algorithm for multiple multicast. Two different algorithms, SQHL (Source-Quadrant Hierarchical Leader) and SCHL (Source-Centered Hierarchical Leader), are proposed for systems with multidestination message-passing and shown to be superior to the HL scheme. All of these algorithms perform 1) 5-10 times faster than the existing algorithms under multiple multicast and 2) as fast as existing algorithms under single multicast. Furthermore, the SCHL scheme demonstrates that the latency of multiple multicast can, in fact, be reduced as the degree of multicast increases beyond a certain number. Thus, these algorithms demonstrate significant potential to be used for designing fast and scalable collective communication libraries on current and future generation wormhole systems.

19.

Fault-Tolerant Matrix Operations for Networks of Workstations Using Diskless Checkpointing

James S. Plank, Youngbae Kim, Jack J. Dongarra 《Journal of Parallel and Distributed Computing》1997,43(2):427

Networks of workstations (NOWs) offer a cost-effective platform for high-performance, long-running parallel computations. However, these computations must be able to tolerate the changing and often faulty nature of NOW environments. We present high-performance implementations of several fault-tolerant algorithms for distributed scientific computing. The fault-tolerance is based on diskless checkpointing, a paradigm that uses processor redundancy rather than stable storage as the fault-tolerant medium. These algorithms are able to run on clusters of workstations that change over time due to failure, load, or availability. As long as there are at least n processors in the cluster, and failures occur singly, the computation will complete in an efficient manner. We discuss the details of how the algorithms are tuned for fault-tolerance and present the performance results on a PVM network of Sun workstations connected by a fast, switched ethernet.
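The idea of keeping checkpoint data on a redundant processor instead of on disk, and of surviving failures that occur one at a time, can be sketched as follows. The abstract does not say how the redundant processor encodes the checkpoint, so bitwise XOR parity is assumed here purely for illustration; the function names and the byte-string representation of processor state are likewise hypothetical.

```python
from functools import reduce


def xor_bytes(a, b):
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(states):
    """Checkpoint: the extra processor stores the XOR of all node states.

    `states` is a list of equal-length byte strings, one per compute node.
    """
    return reduce(xor_bytes, states)


def recover(states, parity, failed):
    """Rebuild the state of the single failed node from the survivors.

    XOR-ing the parity with every surviving state cancels them out and
    leaves exactly the lost state.
    """
    survivors = [s for i, s in enumerate(states) if i != failed]
    return reduce(xor_bytes, survivors, parity)


states = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = make_parity(states)
restored = recover(states, parity, failed=1)
```

A single parity block can rebuild only one lost state, which is consistent with the abstract's condition that failures occur singly.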

20.

Parallel application performance on shared high performance reconfigurable computing resources

Melissa C. Gregory D. 《Performance Evaluation》2005,60(1-4):107-125

The use of a network of shared, heterogeneous workstations each harboring a reconfigurable computing (RC) system offers high performance users an inexpensive platform for a wide range of computationally demanding problems. However, effectively using the full potential of these systems can be challenging without the knowledge of the system's performance characteristics. While some performance models exist for shared, heterogeneous workstations, none thus far account for the addition of RC systems. Our analytic performance model includes the effects of the reconfigurable device, application load imbalance, background user load, basic message passing communication, and processor heterogeneity. The methodology proves to be accurate in characterizing these effects for applications running on shared, homogeneous, and heterogeneous HPRC resources. The model error in all cases was found to be less than 5% for application runtimes greater than 30 s, and less than 15% for runtimes less than 30 s.