Related Literature
20 similar documents found.
1.
Using a workstation cluster for parallel program development requires consideration of various factors to optimise the mapping of the algorithm to the characteristics of the environment. In this paper we present a new analysis and verification of well-known ideas in parallel programming research of specific importance to both the use and design of workstation cluster computing systems. We define a new performance measure related to memory resource utilisation and show how redundant memory usage can lead to poor memory utilisation of the cluster. We also present analytical and experimental evidence that the pool-of-tasks paradigm can lead to significantly improved speedup over series–parallel algorithms, especially when considering equivalent computational and communication requirements. The effect of load balancing on the series–parallel and pool-of-tasks algorithms is examined, and our analysis and experimental results confirm not only that the pool-of-tasks algorithms are more robust to load imbalances but also that the effect of the imbalance is mitigated when more workstations are used. © 1998 John Wiley & Sons, Ltd.
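To make the pool-of-tasks idea concrete, here is a minimal sketch (not code from the paper; the worker function and pool size are illustrative) in which idle workers repeatedly pull tasks from a shared pool, so a slow or heavily loaded node simply claims fewer tasks instead of holding up a statically assigned series–parallel partition:

    from multiprocessing import Pool

    def work(item):
        # stand-in for the per-task computation; in a workstation cluster this
        # would execute on a remote node rather than a local process
        return item * item

    if __name__ == "__main__":
        tasks = range(1000)
        with Pool(processes=4) as pool:   # 4 stands in for the number of workstations
            # imap_unordered hands tasks out on demand (pool-of-tasks);
            # a single up-front chunked split would mimic the series-parallel scheme
            results = list(pool.imap_unordered(work, tasks, chunksize=10))
        print(sum(results))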

2.
Parallel computing on interconnected workstations is becoming a viable and attractive proposition due to the rapid growth in speeds of interconnection networks and processors. In the case of workstation clusters, there is always a considerable amount of unused computing capacity available in the network. However, heterogeneity in architectures and operating systems, load variations on machines, variations in machine availability, and failure susceptibility of networks and workstations complicate the situation for the programmer. In this context, new programming paradigms that reduce the burden involved in programming for distribution, load adaptability, heterogeneity and fault tolerance gain importance. This paper identifies the issues involved in parallel computing on a network of workstations. The anonymous remote computing (ARC) paradigm is proposed to address the issues specific to parallel programming on workstation systems. ARC differs from the conventional communicating process model by treating a program as one single entity consisting of several loosely coupled remote instruction blocks instead of treating it as a collection of processes. The ARC approach results in distribution transparency and heterogeneity transparency. At the same time, it provides fault tolerance and load adaptability to parallel programs on workstations. ARC is developed in a two-tiered architecture consisting of high level language constructs and low level ARC primitives. The paper describes an implementation of the ARC kernel supporting ARC primitives.

3.
Hamdi, Mounir; Pan, Yi; Hamidzadeh, B.; Lim, F. M. The Journal of Supercomputing, 1999, 13(2): 111-132.
Parallel computing on clusters of workstations is receiving much attention from the research community. Unfortunately, many aspects of parallel computing on this platform are not well understood. Some of these issues include the workstation architectures, the network protocols, the communication-to-computation ratio, the load balancing strategies, and the data partitioning schemes. The aim of this paper is to assess the strengths and limitations of a cluster of workstations by capturing the effects of the above issues. This has been achieved by evaluating the performance of this computing environment in the execution of a parallel ray tracing application through analytical modeling and extensive experimentation. We were successful in illustrating the effect of major factors on the performance and scalability of a cluster of workstations connected by an Ethernet network. Moreover, our analytical model was accurate enough to agree closely with the experimental results. Thus, we feel that such an investigation would be helpful in understanding the strengths and weaknesses of an Ethernet cluster of workstations in the execution of parallel applications.

4.
Many important parallel applications are data parallel, and may be efficiently implemented on a workstation cluster by allocating each workstation a contiguous partition of the data domain. Implementation on non-dedicated clusters, however, is complicated by the possibility of changes in workstation availability. For example, a personal workstation may be reclaimed by its primary user for interactive use. In such situations, a node must be removed from the collection of workstations forming the “virtual parallel machine” allocated to the application, and data redistributed accordingly. Conversely, workstations may become available to join the virtual parallel machine. This paper identifies fundamental characteristics of efficient policies for data redistribution following addition/removal of workstations from the cluster. The following conclusions are obtained based on mathematical analysis and simulations: (a) allocating data to a new node from the center of the data domain substantially reduces data migration costs compared to allocation from the edge; (b) addition in groups is beneficial compared to repeated single additions; and (c) even a large number of incremental adjustments of the data domain partitions, owing to successive additions/removals of nodes, do not appear to substantially degrade partition quality compared to that obtained by partitioning from scratch. We believe that these observations can be fruitfully incorporated in the design of workstation cluster support systems for data parallel computing.
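Conclusion (a) can be illustrated with a small hedged sketch (not the paper's code; a one-dimensional contiguous-partition model with made-up sizes) that counts how many data elements change owner when a new node's share is carved out of the middle of the domain versus at one edge:

    def layout(parts, n):
        # parts: ordered node ids, each given an equal contiguous share of n data elements
        share, owners = n // len(parts), []
        for node in parts:
            owners.extend([node] * share)
        return owners

    def migration(old, new):
        # number of data elements whose owning node changes between the two layouts
        return sum(a != b for a, b in zip(old, new))

    n, p = 7200, 8                    # chosen so both 8 and 9 nodes divide the domain evenly
    old = layout(list(range(p)), n)
    edge = layout(["new"] + list(range(p)), n)                               # new node at the edge
    mid = layout(list(range(p // 2)) + ["new"] + list(range(p // 2, p)), n)  # new node in the middle
    print(migration(old, edge), migration(old, mid))  # edge insertion migrates noticeably more data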

5.
LAN-connected workstations are a heterogeneous environment, where each workstation provides time-varying computing power, and thus dynamic load balancing mechanisms are necessary for parallel applications to run efficiently. Parallel basic linear algebra subprograms (BLAS) have recently shown promise as a means of taking advantage of parallel computing in solving scientific problems. Most existing parallel algorithms of BLAS are designed for conventional parallel computers; they do not take the particular characteristics of LAN-connected workstations into consideration. This paper presents a parallelizing method of Level 3 BLAS for LAN-connected workstations. The parallelizing method performs dynamic load balancing through column-blocking data distribution. The experimental results indicate that this dynamic load balancing mechanism leads to a more efficient parallel Level 3 BLAS for LAN-connected workstations.
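The column-blocking distribution can be pictured with a short sketch (not the paper's implementation; the speed-weighting rule and numbers are illustrative): each workstation receives a contiguous block of columns of B whose width is proportional to its currently observed speed, so faster or less loaded machines get wider blocks.

    import numpy as np

    def column_blocks(n_cols, speeds):
        # split column indices into contiguous blocks proportional to each node's speed
        weights = np.asarray(speeds, dtype=float)
        cuts = np.floor(np.cumsum(weights) / weights.sum() * n_cols).astype(int)
        starts = np.concatenate(([0], cuts[:-1]))
        return list(zip(starts, cuts))

    A = np.random.rand(512, 512)
    B = np.random.rand(512, 512)
    speeds = [1.0, 0.5, 2.0]                  # recently measured relative node speeds (made up)
    C = np.empty_like(A)
    for lo, hi in column_blocks(B.shape[1], speeds):
        C[:, lo:hi] = A @ B[:, lo:hi]         # each block would be computed on a different node
    assert np.allclose(C, A @ B)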

6.
The author analyzes workstation usage patterns in order to understand opportunities for exploiting idle capacity. This study is based on traces of users' workstation activity in a university environment. It identifies two areas where enhancements can be made. One area is the ability of a manager of the shared capacity of a workstation cluster to schedule jobs with deadline constraints. This opportunity is the result of the ability to make good predictions of the time-varying amount of capacity that is available for sharing. A prediction strategy is developed that is shown to have only a small amount of error. For the second area of enhancement, it is shown that it is feasible to allocate partitions of workstations for specific periods. This aids those users who on occasion need exclusive access to several machines. The author examines the profile of periods during which exclusive access to partitions can be given, the rate at which owners preempt users of partitions, and the distribution of interpreemption intervals.

7.
For environments that use idle machines on a network for distributed parallel computing on non-dedicated clusters, this work studies which metrics can be used to discover idle processors on complex general-purpose networks, and how to measure processor load changes in real time so that processors can be scheduled and allocated. Building on a study of the load indices used in existing distributed systems and dedicated clusters, a composite load index suited to network cluster computing environments is proposed; its role and implementation in the system are discussed in detail, and a reasonable update period for the load index is derived through extensive testing and analysis experiments.

8.
Coordinating Parallel Processes on Networks of Workstations
The network of workstations (NOW) we consider for scheduling is heterogeneous and nondedicated, where computing power varies among the workstations and local and parallel jobs may interact with each other in execution. An effective NOW scheduling scheme needs sufficient information about system heterogeneity and job interactions. We use the measured power weight of each workstation to quantify the differences of computing capability in the system. Without a processing power usage agreement between parallel jobs and local user jobs in a workstation, job interactions are unpredictable, and performance of either type of job may not be guaranteed. Using the quantified and deterministic system information, we design a scheduling scheme called self-coordinated local scheduling on a heterogeneous NOW. Based on a power usage agreement between local and parallel jobs, this scheme coordinates parallel processes independently in each workstation based on the coscheduling principle. We discuss its implementation on Unix System V Release 4 (SVR4). Our simulation results on a heterogeneous NOW show the effectiveness of the self-coordinated local scheduling scheme.
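One plausible reading of the power weight (an assumption on our part, not necessarily the paper's exact definition) is the measured speed of each workstation normalized by the fastest machine, W_i = S_i / max_j S_j, so every weight lies in (0, 1] and the fastest node has weight 1; parallel work can then be divided in proportion to the W_i.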

9.
The low cost and availability of clusters of workstations have led researchers to re-explore distributed computing using independent workstations. This approach may provide better cost/performance than tightly coupled multiprocessors. In practice, this approach often utilizes wasted cycles to run parallel jobs. In this paper we address the feasibility and limitations of such a nondedicated parallel processing environment, assuming workstation processes have priority over parallel tasks. We develop a simple analytical model to predict parallel job response times. Our model provides insight into how significantly workstation owner interference degrades parallel program performance. It forms a foundation for task partitioning and scheduling in a nondedicated network environment. A new term, task ratio, which relates the parallel task demand to the mean service demand of nonparallel workstation processes, is introduced. We propose that the task ratio is a useful metric for determining how a parallel application should be partitioned and scheduled in order to make efficient use of a nondedicated distributed system.
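The task ratio can be written, under a hedged reading of the abstract rather than the paper's exact formulation, as r = D_task / E[D_local]: the service demand of a parallel task divided by the mean service demand of the nonparallel workstation processes. A small r suggests tasks are fine-grained enough to fit into idle periods, while a large r means owner activity will noticeably delay them, which argues for finer partitioning or different placement.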

10.
A User-Space Reduced Protocol Based on Myrinet
董春雷, 郑纬民. 软件学报 (Journal of Software), 1999, 10(3): 299-303.
The communication subsystem is the main factor affecting the overall performance of workstation cluster systems. After analyzing and comparing the performance of three commonly used networks, this paper points out that upper-layer protocol processing is the principal bottleneck limiting workstation cluster performance. A high-performance user-level reduced communication protocol, RCP (reduced communication protocol), was implemented on a cluster of eight Sun SPARC workstations connected by 640 Mbps Myrinet. By trimming redundant protocol functions, reducing the number of data copies, and operating directly on hardware buffers, RCP achieves low latency and high efficiency. Its round-trip latency is far lower than that of TCP/IP (200 μs vs. 1,540 μs).

11.
As the prices of commodity workstations go down, clusters of workstations have started to emerge as a viable economic solution for scalable computing. Recent advances in networking technology have made it possible to obtain high-bandwidth connections between applications. However, the interconnect latency between workstation nodes in a cluster remains a serious concern and can prove to be the limiting factor in workstation performance. In this paper, we present the CNI, or cluster network interface, which achieves the twin goals of low latency and high bandwidth. In addition, CNI efficiently supports multiple programming paradigms for programming generality. This is done by functionally coupling the network interface more closely to the CPU without violating the constraints of a standard workstation architecture. CNI results in performance gains for applications, substantially reducing communication overhead and delay.

12.
Scheduling Cooperative Tasks on Networks of Workstations
齐红, 鞠九滨. 软件学报 (Journal of Software), 1998, 9(1): 14-17.
In cooperative-mode parallel computing on workstation cluster systems, task scheduling largely determines the performance of the parallel computation. This paper presents a task scheduling model and algorithm that take into account inter-task synchronization time, communication time, data loading time, and result collection time in cooperative-mode parallel computing. Based on this scheduling model, a set of workstations with the shortest parallel execution time can be selected, thereby obtaining better parallel performance.
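An illustrative reconstruction of such a model (not the paper's exact equation) predicts, for a candidate set S of workstations,

    T(S) = T_load(S) + max_{i in S} ( T_comp,i + T_sync,i + T_comm,i ) + T_collect(S)

and the scheduler then selects the subset S that minimizes T(S), trading the shorter per-node computation of a larger set against its higher synchronization, communication, loading, and collection costs.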

13.
Networks of workstations are poised to become the primary computing infrastructure for science and engineering. NOWs may dramatically improve virtual memory and file system performance; achieve cheap, highly available, and scalable file storage; and provide multiple CPUs for parallel computing. Hurdles that remain include efficient communication hardware and software, global coordination of multiple workstation operating systems, and enterprise-scale network file systems. Our 100-node NOW prototype aims to demonstrate practical solutions to these challenges.

14.
On examining offices, one can notice a rapidly increasing usage of decentralized computing power combined with digital communication capabilities incorporated in personal workstations for supporting a wide variety of tasks. Since an appropriate technology is now commercially available, these tasks will comprise the preparation and interchange of documents containing text, graphics, facsimiles, data, digitized speech annotations, etc. Currently existing document communication services such as Telefax and Teletex are presented. Important techniques for the interchange of mixed text-image documents are outlined, i.e. suitable facsimile rasters and a Document Architecture Model applicable to the presentation layer of the Open Systems Interconnection Reference Model. Data volumes, transmission times and buffer sizes are analysed. The features of a future standardized Mixed Mode Teletex Option are discussed, and finally an experimental text-image workstation is described.

15.
Parallel applications can be executed using the idle computing capacity of workstation clusters. However, it remains unclear how to schedule the processors among different applications most effectively. Processor scheduling algorithms that were successful for shared-memory machines have proven to be inadequate for distributed memory environments due to the high costs of remote memory accesses and redistributing data. We investigate how knowledge of system load and application characteristics can be used in scheduling decisions. We propose a new algorithm based on adaptive equipartitioning, which, by properly exploiting both the information types above, performs better than other nonpreemptive scheduling rules, and nearly as well as idealized versions of preemptive rules (with free preemption). We conclude that the new algorithm is suitable for use in scheduling parallel applications on networks of workstations.
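A minimal sketch of the equipartitioning idea (names and structure are illustrative, not taken from the paper; the adaptive element shown here is capping each job at the parallelism it can actually use and redistributing the leftover processors):

    def equipartition(p_total, jobs):
        # jobs: {job_id: max useful processors}; returns {job_id: allocated processors}
        alloc = {j: 0 for j in jobs}
        free, active = p_total, set(jobs)
        while free > 0 and active:
            share = max(free // len(active), 1)
            for j in sorted(active):
                give = min(share, jobs[j] - alloc[j], free)
                alloc[j] += give
                free -= give
                if alloc[j] == jobs[j]:
                    active.discard(j)      # job cannot use more processors
                if free == 0:
                    break
        return alloc

    print(equipartition(16, {"a": 10, "b": 2, "c": 8}))   # -> {'a': 7, 'b': 2, 'c': 7}

The allocation would be recomputed whenever a job arrives or departs, or when workstation availability changes.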

16.
A Knowledge Based Design Methodology for manufacturing assembly lines
In assembly line design, the problem of balancing has received most attention from past researchers, and a number of algorithms have been devised for the analysis of single, multi- and mixed-product assembly lines [Int. J. Prod. Res. 27 (1989) 637]. In many cases, such algorithms seek a solution for the particular situation under consideration and therefore have very little flexibility for generic application to assembly line design. Real-life practical design issues include stochastic operation times, parallel workstation requirements, feasibility for workstation combining, and parallel line implementations, all of which are ignored in many analyses. This paper presents a Knowledge Based Design Methodology (KBDM) for automated and manual assembly lines, which can be applied equally well to single, multi- and mixed-product assembly lines with either deterministic operation times or stochastic operation times. The methodology starts from a suitable assembly system selection and thereafter decides suitable cycle times, parallel workstation requirements, and parallel line implementation for the type of assembly system being selected. An economical number of workstations is decided with the aid of workstation combining options, depending upon the factual information provided. The end result is the detailed design of a manufacturing assembly line. A case study from a practical assembly line is presented to illustrate how the KBDM works.

17.
Computing the Maximum Speedup of Parallel Programs on Cluster Systems
Speedup is one of the most important metrics of a parallel program. In most parallel systems, for a fixed problem size, a program's speedup increases as worker nodes are added. However, in most cluster systems the node workstations share the physical transmission medium, so for many parallel programs the speedup begins to decrease once the number of nodes exceeds a certain value. By analyzing the execution time of parallel programs on cluster systems, this paper derives, for a fixed problem size, the maximum achievable speedup and the shortest computation time, as well as the number of nodes at which they are attained.
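A hedged reconstruction of the analysis (the notation is ours, not the paper's): if the computational work W divides evenly over p nodes while each node's communication volume C is serialized on the shared medium, the execution time is roughly T(p) = W/p + C*p. This is minimized at p* = sqrt(W/C), giving the shortest computation time T(p*) = 2*sqrt(W*C) and a maximum speedup of about S_max = T(1)/T(p*) = (W + C) / (2*sqrt(W*C)); adding nodes beyond p* lets the shared-medium term dominate, and the speedup falls, matching the behavior described above.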

18.
One of the major challenges facing cloud computing is to accurately predict future resource usage to provision data centers for future demands. Cloud resources are constantly in a state of flux, making it difficult for forecasting algorithms to produce accurate predictions for short time scales (i.e., 5 minutes to 1 hour). This motivates the research presented in this paper, which compares nonlinear and linear forecasting methods with a sequence prediction algorithm known as a recurrent neural network to predict CPU utilization and network bandwidth usage for live migration. Experimental results demonstrate that a multi-time-ahead prediction algorithm reduces bandwidth consumption during critical times and improves overall efficiency of a data center.

19.
We propose a new approach, called cluster-based search (CBS), for scheduling large task graphs in parallel on a heterogeneous cluster of workstations connected by a high-speed network (e.g., using an ATM switch at OC-3 speed). The CBS algorithm uses a parallel random neighborhood search which works by refining multiple different initial schedules simultaneously using different workstations. The workstations communicate periodically to exchange their best solutions found thus far in order to direct the search to more promising regions in the search space. Heterogeneity of machines is exploited by the biased partitioning of the search space. The parallel random neighborhood search is fault-tolerant in that the workload of a failed workstation is automatically redistributed to other workstations so that the search can continue. We have implemented the CBS algorithm as a core function of our on-going development of SSI middleware for a Sun workstation cluster.

20.
The performance and proliferation of workstations continues to increase at a rapid rate. However, the practical utilization of workstation networks for parallel computing is still in its infancy. This is due to the relative immaturity of programming tools, low bandwidth networks such as Ethernet, and high message latencies. However, programming tools are becoming more mature and network bandwidths are increasing rapidly. Hence, networks of commodity workstations may prove to be practical for certain classes of parallel applications. This paper describes our experiences with two applications parallelized on a network of Sun workstations. The first application is from Shell's petroleum engineering department. This program quantitatively derives rock and porefill composition from well-log data, using a compute-intensive iterative optimization procedure. The second application is time filtering, which is a fundamental operation performed on seismic traces. Through our experiments we identify the limits of networked parallel computing based on the current state of network technology. We also provide a discussion on the possible impact of future high speed networks on networked parallel computing.
