1.

Anonymous remote computing: a paradigm for parallel programming on interconnected workstations

Joshi R.K. Ram D.J. 《IEEE transactions on pattern analysis and machine intelligence》1999,25(1):75-90

Parallel computing on interconnected workstations is becoming a viable and attractive proposition due to the rapid growth in speeds of interconnection networks and processors. In the case of workstation clusters, there is always a considerable amount of unused computing capacity available in the network. However, heterogeneity in architectures and operating systems, load variations on machines, variations in machine availability, and failure susceptibility of networks and workstations complicate the situation for the programmer. In this context, new programming paradigms that reduce the burden involved in programming for distribution, load adaptability, heterogeneity and fault tolerance gain importance. This paper identifies the issues involved in parallel computing on a network of workstations. The anonymous remote computing (ARC) paradigm is proposed to address the issues specific to parallel programming on workstation systems. ARC differs from the conventional communicating process model by treating a program as one single entity consisting of several loosely coupled remote instruction blocks instead of treating it as a collection of processes. The ARC approach results in distribution transparency and heterogeneity transparency. At the same time, it provides fault tolerance and load adaptability to parallel programs on workstations. ARC is developed in a two-tiered architecture consisting of high level language constructs and low level ARC primitives. The paper describes an implementation of the ARC kernel supporting ARC primitives

2.

Molecular dynamics simulation on a network of workstations using a machine-independent parallel programming language.

M A Shifman A Windemuth K Schulten P L Miller 《Computers and biomedical research》1992,25(2):168-180

Molecular dynamics simulations investigate local and global motion in molecules. Several parallel computing approaches have been taken to attack the most computationally expensive phase of molecular simulations, the evaluation of long range interactions. This paper reviews these approaches and develops a straightforward but effective algorithm using the machine-independent parallel programming language, Linda. The algorithm was run both on a shared memory parallel computer and on a network of high performance Unix workstations. Performance benchmarks were performed on both systems using two proteins. This algorithm offers a portable cost-effective alternative for molecular dynamics simulations. In view of the increasing numbers of networked workstations, this approach could help make molecular dynamics simulations more easily accessible to the research community.
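
To make the Linda coordination style mentioned in this abstract more concrete, here is a minimal sketch of a tuple space with a master/worker pattern. It is not the paper's algorithm: the `TupleSpace` class, the `in_` method (named so because `in` is a Python keyword), the `"task"`/`"result"` tags and the squaring workload are all hypothetical, and matching on the first field only is a simplification of Linda's pattern matching.

```python
import threading


class TupleSpace:
    """Toy in-process tuple space with Linda-like out/in operations."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, *tup):
        # Deposit a tuple and wake any threads waiting for a match.
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def in_(self, tag):
        # Block until a tuple whose first field equals `tag` exists,
        # then remove it from the space and return it.
        with self._cond:
            while True:
                for i, tup in enumerate(self._tuples):
                    if tup[0] == tag:
                        return self._tuples.pop(i)
                self._cond.wait()


def worker(space):
    # Repeatedly withdraw a task, compute, and deposit the result.
    while True:
        _, n = space.in_("task")
        if n is None:  # sentinel: no more work
            return
        space.out("result", n, n * n)


def run(values, n_workers=3):
    # Master: deposit one task per value, then collect one result per task.
    space = TupleSpace()
    threads = [threading.Thread(target=worker, args=(space,))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for v in values:
        space.out("task", v)
    results = {}
    for _ in values:
        _, n, squared = space.in_("result")
        results[n] = squared
    for _ in threads:
        space.out("task", None)  # one sentinel per worker
    for t in threads:
        t.join()
    return results
```

Because workers pull tasks from the shared space rather than being assigned them, faster or less loaded workers naturally take more tasks, which is one reason this style suits networks of workstations.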

3.

TreadMarks: shared memory computing on networks of workstations (total citations: 2; self-citations: 0; citations by others: 2)

Amza C. Cox A.L. Dwarkadas S. Keleher P. Honghui Lu Rajamony R. Weimin Yu Zwaenepoel W. 《Computer》1996,29(2):18-28

Shared memory facilitates the transition from sequential to parallel processing. Since most data structures can be retained, simply adding synchronization achieves correct, efficient programs for many applications. We discuss our experience with parallel computing on networks of workstations using the TreadMarks distributed shared memory system. DSM allows processes to assume a globally shared virtual memory even though they execute on nodes that do not physically share memory. We illustrate a DSM system consisting of N networked workstations, each with its own memory. The DSM software provides the abstraction of a globally shared memory, in which each processor can access any data item without the programmer having to worry about where the data is or how to obtain its value

4.

Parallel computing in networks of workstations with Paralex

Davoli R. Giachini L.-A. Babaoglu O. Amoroso A. Alvisi L. 《Parallel and Distributed Systems, IEEE Transactions on》1996,7(4):371-384

Modern distributed systems consisting of powerful workstations and high-speed interconnection networks are an economical alternative to special-purpose supercomputers. The technical issues that need to be addressed in exploiting the parallelism inherent in a distributed system include heterogeneity, high-latency communication, fault tolerance and dynamic load balancing. Current software systems for parallel programming provide little or no automatic support towards these issues and require users to be experts in fault-tolerant distributed computing. The Paralex system is aimed at exploring the extent to which the parallel application programmer can be liberated from the complexities of distributed systems. Paralex is a complete programming environment and makes extensive use of graphics to define, edit, execute, and debug parallel scientific applications. All of the necessary code for distributing the computation across a network and replicating it to achieve fault tolerance and dynamic load balancing is automatically generated by the system. In this paper we give an overview of Paralex and present our experiences with a prototype implementation

5.

Parallel programming for multimedia applications (total citations: 2; self-citations: 2; citations by others: 0)

Hari Kalva Aleksandar Colic Adriana Garcia Borko Furht 《Multimedia Tools and Applications》2011,51(2):801-818

Computing capabilities are continuing to increase with the availability of multi core and many core processors. The wide availability of multi core processors has made parallel programming possible for end user applications running on desktops, workstations, and mobile devices. While parallel hardware has become common, software that exploits parallel capabilities is just beginning to take hold. Multimedia applications, with their data parallel nature and large computing requirements, will benefit significantly from parallel programming. In this paper an overview of parallel programming is presented and languages and tools for parallel programming such as OpenMP and CUDA are introduced within the scope of multimedia applications.

6.

Performance of the NAS Parallel Benchmarks on PVM-Based Networks

《Journal of Parallel and Distributed Computing》1995,26(1):61-71

The NAS parallel benchmarks are a set of applications that embody the key characteristics of typical processing in computational aerodynamics. Five of these, the kernel benchmarks, have been implemented on the PVM system, a software system for network-based concurrent computing, with a view to determining the efficacy of networked environments for high-performance computational aerodynamics applications. We present results of porting and executing the NPB kernels in three different cluster environments using low- to medium-powered workstations on Ethernet and two types of FDDI networks. Our results indicate that mediocre to good performance could be obtained despite the communications-intensive nature of the applications. In most cases, we were able to achieve performance levels within an order of magnitude of a Cray Y/MP-1 on eight-workstation clusters via optimizations to the PVM infrastructure alone, i.e., with little or no algorithmic modifications. However, our results also indicate that further improvements are possible and that network-based computing has the potential to be a viable technology for high-performance scientific computing.

7.

Parallel Computing on an Ethernet Cluster of Workstations: Opportunities and Constraints (total citations: 1; self-citations: 0; citations by others: 1)

Hamdi Mounir Pan Yi Hamidzadeh B. Lim F. M. 《The Journal of supercomputing》1999,13(2):111-132

Parallel computing on clusters of workstations is receiving much attention from the research community. Unfortunately, many aspects of parallel computing on this platform are not very well understood. These include the workstation architectures, the network protocols, the communication-to-computation ratio, the load balancing strategies, and the data partitioning schemes. The aim of this paper is to assess the strengths and limitations of a cluster of workstations by capturing the effects of the above issues. This has been achieved by evaluating the performance of this computing environment in the execution of a parallel ray tracing application through analytical modeling and extensive experimentation. We were successful in illustrating the effect of major factors on the performance and scalability of a cluster of workstations connected by an Ethernet network. Moreover, our analytical model was accurate enough to agree closely with the experimental results. Thus, we feel that such an investigation would be helpful in understanding the strengths and weaknesses of an Ethernet cluster of workstations in the execution of parallel applications.

8.

Prophet: automated scheduling of SPMD programs in workstation networks

Jon B. Weissman 《Concurrency and Computation》1999,11(6):301-321

Obtaining efficient execution of parallel programs in workstation networks is a difficult problem for the user. Unlike dedicated parallel computer resources, network resources are shared, heterogeneous, vary in availability, and offer communication performance that is still an order of magnitude slower than parallel computer interconnection networks. Prophet, a system that automatically schedules data parallel SPMD programs in workstation networks for the user, has been developed. Prophet uses application and resource information to select the appropriate type and number of workstations, divide the application into component tasks and data across these workstations, and assign tasks to workstations. This system has been integrated into the Mentat parallel processing system developed at the University of Virginia. A suite of scientific Mentat applications has been scheduled using Prophet on a heterogeneous workstation network. The results are promising and demonstrate that scheduling SPMD applications can be automated with good performance. Copyright © 1999 John Wiley & Sons, Ltd.

9.

Performance Tuning Software DSM Applications using Visualisation

Brorsson Mats Kral Martin 《The Journal of supercomputing》1999,13(3):249-265

Small organisations can now have access to high raw processing power using networks of workstations (NOW) as parallel computing platforms. Software Distributed Shared Memory (Software DSM) packages have been developed to facilitate the programming of such systems. However, because of the high interprocess latencies in a NOW, the performance of a software DSM application is more susceptible to the partitioning of the problem than might be expected. This paper presents an approach for a tool to visualise the execution of a program in a way that highlights performance bottlenecks. The tool associates identified bottlenecks with the corresponding source code lines in order to determine what piece of code is the cause of poor performance. The visualisation technique is demonstrated in two case studies. They clearly show that the visualisation is indeed useful and provides an effective way to acquire an understanding of what characterises an application's sharing behaviour.

10.

Piranha scheduling: Strategies and their implementation

Nicholas Carriero David Gelernter Marc Jourdenais David Kaminsky 《International journal of parallel programming》1995,23(1):5-33

Piranha is an execution model for Linda developed at Yale⁽¹⁾ to reclaim idle cycles from networked workstations for use in executing parallel programs. Piranha has proven to be an effective system for harnessing large amounts of computing power. Most Piranha research to this point has concentrated on efficiently executing a single application at a time. In this paper we evaluate strategies for scheduling multiple Piranha applications. We examine methods for predicting idle periods and the effectiveness of scheduling strategies that make use of these predictions. We present a prototype scheduler for the Piranha system implemented using the process trellis software architecture for networks of workstations. This work was supported by AASERT Grant F49620-92-J-0240, AFOSR-91-0098 and NASA Training Grant NGT-50719.

11.

Efficient scheduling of MPI applications on networks of workstations

M.A.R. Dantas E.J. Zaluska 《Future Generation Computer Systems》1998,13(6):489-499

The availability of a large number of workstations connected through a network can represent an attractive option for high-performance computing for many applications. The message-passing interface (MPI) software environment is an effort from many organisations to define a de facto message-passing standard. In other words, the original specification was not designed as a comprehensive parallel programming environment and some researchers agree that the standard should be kept as simple and clean as possible. Nevertheless, a software environment such as MPI should have some scheduling mechanism for the effective submission of parallel applications on networks of workstations. This paper presents an alternative lightweight approach called Selective-MPI (S-MPI), which was designed to enhance the efficiency of the scheduling of applications on an MPI implementation environment.

12.

Parallel processing of chemical information in a local area network - I. HYDRA: Concept, configuration, and implementation of parallel applications

《Computers & chemistry》1996,20(4):431-438

Sophisticated software packages put an increasing demand on computer hardware. In local area networks, computationally intensive programs can lower the performance of individual workstations to an unacceptable level. However, utilizing the computing power of all hosts in such networks in a coarse-grained sense offers the potential to achieve considerable improvements in execution speed within reasonable cost limits. Since conventional workstations are not designed to be used in a parallel configuration, the program HYDRA is developed to control and synchronize parallel processing in a local area network. Part I of this paper focuses on the technical aspects of HYDRA, i.e. configuration and implementation. The second and third parts describe two applications of the HYDRA package in the field of chemistry: using parallel genetic algorithms for the conformational analysis of nucleic acids, and parallel cross-validation of artificial neural networks.

13.

An Effective and Practical Performance Prediction Model for Parallel Computing on Nondedicated Heterogeneous NOW

《Journal of Parallel and Distributed Computing》1996,38(1):63-80

Networks of workstations (NOW) are receiving increased attention as a viable platform for high performance parallel computations. Heterogeneity and time-sharing are two characteristics that distinguish the NOW systems from conventional multiprocessor/multicomputer systems which are homogeneous and dedicated. It is important to have a practical model for users to predict the execution times of large-scale parallel applications on nondedicated heterogeneous NOW. Another objective of this study is to provide insight into the dynamic performance of parallel computing and into the effects of program structures and system factors on such a platform. In this paper, we study performance predictions for parallel computing on nondedicated heterogeneous networks of workstations. Our approach is based on a two-level model. On the top level, a semideterministic task graph is used to capture the parallel execution behavior including the variances of communication and synchronization. On the bottom level, a discrete time model is used to quantify effects from NOW systems. An iterative process is used to determine the interactive effects between network contention and task execution. We validate the prediction model using experiments on a nondedicated heterogeneous NOW. The maximum differences between predicted results and measured results were less than 10% in most cases and 15% in the worst cases.

14.

Dynamic load-balancing of image processing applications on clusters of workstations

《Parallel Computing》1997,22(11):1477-1492

Cluster-based computing, which exploits the aggregate power of a network of workstations, has drawn increasing attention from the parallel processing community. The main problem with this computing environment is the permanently changing workload of individual workstations which makes the efficiency and the execution time of parallel applications unpredictable. In this paper, we introduce an efficient load balancing scheme which aims at dynamically balancing the workload of data parallel applications in this computing environment. Simulation and experimental studies of our load balancing strategy are performed under various load situations and it is shown that it can effectively balance the workload among the workstations involved. Further, it was shown that a significant improvement in computing performance can be achieved when using our load balancing strategy as compared to the case where no load balancing is applied, particularly under a heavily loaded system.
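
The abstract does not spell out the balancing scheme, so the following is only a generic sketch of the idea of rebalancing a data parallel workload as workstation loads change. It assumes a hypothetical image-processing job divided by rows and a simple heuristic in which each workstation's share is proportional to 1 / (1 + load); the function name `partition_rows` and that weighting are illustrative choices, not the authors' method.

```python
def partition_rows(total_rows, loads):
    """Split total_rows among workstations in proportion to 1 / (1 + load).

    `loads` holds one non-negative load figure per workstation
    (for example, a recent load average). Returns a list of row counts.
    """
    weights = [1.0 / (1.0 + load) for load in loads]
    total_weight = sum(weights)
    # Round each proportional share down to a whole number of rows.
    shares = [int(total_rows * w / total_weight) for w in weights]
    # Hand the rows lost to rounding, one each, to the least loaded machines.
    leftover = total_rows - sum(shares)
    by_weight = sorted(range(len(loads)), key=lambda i: weights[i], reverse=True)
    for i in by_weight[:leftover]:
        shares[i] += 1
    return shares
```

A dynamic balancer in this spirit would call `partition_rows` again with fresh load readings before each processing round, so that a workstation that becomes busy receives fewer rows in the next round.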

15.

Development and performance analysis of real‐world applications for distributed and parallel architectures

T. Fahringer P. Blaha A. Hssinger J. Luitz E. Mehofer H. Moritsch B. Scholz 《Concurrency and Computation》2001,13(10):841-868

Several large real‐world applications have been developed for distributed and parallel architectures. We examine two different program development approaches. First, the usage of a high‐level programming paradigm which reduces the time to create a parallel program dramatically but sometimes at the cost of a reduced performance; a source‐to‐source compiler has been employed to automatically compile programs written in a high‐level programming paradigm into message passing codes. Second, a manual program development by using a low‐level programming paradigm, such as message passing, enables the programmer to fully exploit a given architecture at the cost of a time‐consuming and error‐prone effort. Performance tools play a central role in supporting the performance‐oriented development of applications for distributed and parallel architectures. SCALA, a portable instrumentation, measurement, and post‐execution performance analysis system for distributed and parallel programs, has been used to analyze and to guide the application development, by selectively instrumenting and measuring the code versions, by comparing performance information of several program executions, by computing a variety of important performance metrics, by detecting performance bottlenecks, and by relating performance information back to the input program. We show several experiments of SCALA when applied to real‐world applications. These experiments are conducted for a NEC Cenju‐4 distributed‐memory machine and a cluster of heterogeneous workstations and networks. Copyright © 2001 John Wiley & Sons, Ltd.

16.

Design and implementation of a Linux high-performance computing cluster

LI Hong-mei 《数字社区&智能家居》2008,(14)

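As computer and network hardware has become commoditized and standardized, PCs and workstations have grown steadily more powerful and cheaper, while the open-source Linux kernel and cluster-tool middleware have matured and stabilized. As a result, high-performance computing clusters have developed into the mainstream high-performance computing platform, gradually replacing dedicated, expensive supercomputers for prototyping, debugging, and running large-scale parallel applications. Research on the rapid deployment, reliability, and manageability of high-performance computing based on PCs or workstations is of far-reaching significance for the use of such clusters in scientific research and engineering computing and for promoting the application of high-performance computing technology. Taking an OSCAR cluster as an example, this paper deploys a five-node cluster environment and runs simple parallel test examples.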

17.

Analysis of Load Average and its Relationship to Program Run Time on Networks of Workstations

Trevor E. Meyer James A. Davis Jennifer L. Davidson 《Journal of Parallel and Distributed Computing》1997,44(2):141

Parallel processing systems using networks of workstations are being used to provide an alternative to expensive parallel processors. Scheduling of tasks on these networks is an important and practical problem that must be addressed. Although CPU load is an important parameter to many of the proposed scheduling schemes, no quantitative analysis of CPU load and its precise relation to the run time of application programs has to date been presented. The work in this paper describes the experimental analysis of one common load measure, the UNIX load average, and its relationship to the run time of computation-bound parallel programs. Data was gathered using a test application program designed to mimic common applications, performing long bursts of computation with occasional interprocess data exchange over the network. The resulting execution times and measured load averages were then analyzed using regression analysis to detect load-run time trends. This paper describes the test program and the experiments, then details the results of the data analysis. A technique is then presented for the evaluation of the load-run time relationship for a computation-bound program on a network of workstations.
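
As an illustration of the kind of regression analysis this abstract mentions, the sketch below fits a straight line relating run time to load average by ordinary least squares. The linear form, the function name `fit_line`, and the sample numbers in the usage note are assumptions made for illustration; the abstract does not state which regression model the authors used, and no values here come from the paper.

```python
def fit_line(loads, run_times):
    """Ordinary least-squares fit of run_time = intercept + slope * load.

    `loads` and `run_times` are equal-length sequences of measurements,
    with at least two distinct load values. Returns (intercept, slope).
    """
    n = len(loads)
    mean_x = sum(loads) / n
    mean_y = sum(run_times) / n
    # Sum of squared deviations of the loads, and of the cross products.
    sxx = sum((x - mean_x) ** 2 for x in loads)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(loads, run_times))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope
```

With made-up placeholder measurements such as loads `[0, 1, 2, 3]` and run times `[10, 20, 30, 40]` seconds, `fit_line` returns an intercept of 10 and a slope of 10, which would read as 10 extra seconds of run time per unit of load average. On Unix systems the standard-library call `os.getloadavg()` returns the 1-, 5- and 15-minute load averages and could supply the load samples.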

18.

Integrating task parallelism in data parallel languages for parallel programming on NOWs

K. J. Binu D. Janaki Ram 《Concurrency and Computation》2000,12(13):1291-1315

A number of high‐level parallel programming platforms for networks of workstations (NOWs) have been developed in recent times. Most of these platforms target the exploitation of data parallelism in applications. They do not allow expressibility of applications as a collection of tasks along with their precedence relationships. As a result, the control or task parallelism in an application cannot be expressed or exploited. The current work aims at integrating the notion of task parallelism and precedence relationships among constituent tasks into such high‐level data parallel platforms for NOWs. Our model of integration provides for arbitrary nesting of data and task parallel modules. Also, the precedence relationships are clearly reflected in the program structure. The model relieves the programmer of the need to design applications for non‐determinism in the order of completion of constituent tasks. The design of the runtime support as well as system‐level book keeping is discussed. The model is general enough to be applied to a wide range of data parallel platforms. A specific case of integrating the model into anonymous remote computing (ARC), a data parallel programming platform, is presented. The performance related aspects are also discussed. Copyright © 2000 John Wiley & Sons, Ltd.

19.

An ATM-based high-speed communication mechanism supporting parallel processing

吴礼发 谢立 孙钟秀 《计算机学报》1998,21(7):586-594

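With the development of high-speed networks (such as ATM) and the continuing improvement in workstation performance, networks of workstations (NOW) are attracting growing attention as a new parallel computing architecture. Traditional transport protocols and message-passing systems cannot fully exploit the transmission capacity of high-speed networks. This paper proposes HPMPA, an ATM-based high-speed communication mechanism that supports parallel processing. In HPMPA, the reliable end-to-end transport protocol HSTP provides high-speed, reliable data transfer for parallel applications, while the unreliable end-to-end transport protocol UTP provides an unreliable high-speed datagram service. Based on a hybrid tree structure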

20.

Performance bottlenecks and potentials of parallel computing on networks of workstations

YONG YAN XING DU XIAODONG ZHANG CHENXI ZHANG 《International journal of systems science》2013,44(11):1045-1056

The network of workstations (NOW) we consider for parallel computing is heterogeneous and nondedicated (time-sharing), where computing power varies among the workstations, and multiple jobs may interact with each other in execution. We address three performance issues in this paper. First, we examine the effects of heterogeneity on co-scheduling and local scheduling policies for parallel computing. Through experimentation and quantitative comparisons, we discuss features and requirements of scheduling policies on heterogeneous NOW. Second, the heterogeneity and non-dedication of NOW introduce new performance factors into parallel computing, which make traditional performance metrics for parallel computing on homogeneous platforms unsuitable. We conducted a collection of experimental measurements to show the performance impact on parallel computing. Finally, using network latencies we experimentally evaluate the parallel computing scalability on NOW. Our objective in this study is to provide insights into unique performance bottlenecks and potentials of networks of workstations.