Similar Articles
1.
The performance and proliferation of workstations continue to increase at a rapid rate. However, the practical use of workstation networks for parallel computing is still in its infancy, owing to the relative immaturity of programming tools, low-bandwidth networks such as Ethernet, and high message latencies. Programming tools are maturing, though, and network bandwidths are increasing rapidly, so networks of commodity workstations may prove practical for certain classes of parallel applications. This paper describes our experiences with two applications parallelized on a network of Sun workstations. The first application, from Shell's petroleum engineering department, quantitatively derives rock and pore-fill composition from well-log data using a compute-intensive iterative optimization procedure. The second application is time filtering, a fundamental operation performed on seismic traces. Through our experiments we identify the limits of networked parallel computing given the current state of network technology, and we discuss the possible impact of future high-speed networks on networked parallel computing.

2.
Collaborative Work Based on Multimedia Graphics Terminals   (Cited 4 times: 0 self-citations, 4 by others)
A multimedia X terminal is a network terminal whose interface is a multimedia-enabled X Window System. Multimedia X client programs run on a network host (a high-performance workstation) while rendering the various media on the terminal, with performance comparable to using the host directly. The CACW architecture based on multimedia X terminals fully exploits the network transparency of X terminals by concentrating the communication-heavy user modules on a single high-performance host. This greatly improves the performance of multimedia collaborative work systems and avoids the problem in traditional collaborative systems where bandwidth limitations prevent high multimedia performance.

3.
Obtaining efficient execution of parallel programs in workstation networks is a difficult problem for the user. Unlike dedicated parallel computer resources, network resources are shared, heterogeneous, vary in availability, and offer communication performance that is still an order of magnitude slower than parallel computer interconnection networks. Prophet, a system that automatically schedules data parallel SPMD programs in workstation networks for the user, has been developed. Prophet uses application and resource information to select the appropriate type and number of workstations, partition the application's component tasks and data across these workstations, and assign tasks to workstations. This system has been integrated into the Mentat parallel processing system developed at the University of Virginia. A suite of scientific Mentat applications has been scheduled using Prophet on a heterogeneous workstation network. The results are promising and demonstrate that scheduling SPMD applications can be automated with good performance. Copyright © 1999 John Wiley & Sons, Ltd.
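The scheduling idea can be sketched as a cost-model search. The model below is a hypothetical illustration, not Prophet's actual heuristics: it assumes compute time divides evenly across workers while coordination overhead grows linearly with their number.

```python
def best_worker_count(total_work, per_worker_overhead, candidates):
    """Pick the worker count minimizing a simple completion-time estimate.

    Illustrative cost model only (not Prophet's): compute time shrinks
    as work is divided, while communication/coordination overhead grows
    with the number of workers.
    """
    def estimated_time(p):
        return total_work / p + per_worker_overhead * p

    return min(candidates, key=estimated_time)
```

With 100 units of work and 1 unit of overhead per worker, the estimate is minimized at 10 workers: adding more workers past that point costs more in coordination than it saves in computation.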

4.
The low cost and availability of clusters of workstations have led researchers to re-explore distributed computing using independent workstations. This approach may provide better cost/performance than tightly coupled multiprocessors; in practice, it often harvests wasted cycles to run parallel jobs. In this paper we address the feasibility and limitations of such a nondedicated parallel processing environment, assuming workstation processes have priority over parallel tasks. We develop a simple analytical model to predict parallel job response times. Our model provides insight into how significantly workstation-owner interference degrades parallel program performance, and it forms a foundation for task partitioning and scheduling in a nondedicated network environment. A new term, task ratio, which relates the parallel task demand to the mean service demand of nonparallel workstation processes, is introduced. We propose that the task ratio is a useful metric for determining how a parallel application should be partitioned and scheduled in order to make efficient use of a nondedicated distributed system.
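The intuition behind the task ratio and owner interference can be sketched as follows. This is a simplified processor-sharing assumption, not the authors' actual analytical model:

```python
def task_ratio(parallel_task_demand, mean_owner_demand):
    # Ratio of the parallel task's CPU demand to the mean service
    # demand of the workstation owner's (higher-priority) processes.
    return parallel_task_demand / mean_owner_demand


def response_time_with_interference(task_demand, owner_utilization):
    # Owner processes preempt the parallel task, so the task
    # effectively runs on the CPU fraction they leave idle.
    if not 0 <= owner_utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return task_demand / (1.0 - owner_utilization)
```

Under this toy model, a task needing 10 seconds of CPU on a workstation whose owner keeps it 50% busy takes about 20 seconds to finish.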

5.
6.
The availability of a large number of workstations connected through a network can be an attractive option for high-performance computing in many applications. The message-passing interface (MPI) software environment is an effort by many organisations to define a de facto message-passing standard. The original specification, however, was not designed as a comprehensive parallel programming environment, and some researchers agree that the standard should be kept as simple and clean as possible. Nevertheless, a software environment such as MPI should provide some scheduling mechanism for the effective submission of parallel applications on networks of workstations. This paper presents a lightweight alternative called Selective-MPI (S-MPI), designed to enhance the efficiency of scheduling applications on an MPI implementation environment.

7.
Parallel computing on interconnected workstations is becoming a viable and attractive proposition due to the rapid growth in speeds of interconnection networks and processors. In workstation clusters, there is always a considerable amount of unused computing capacity available in the network. However, heterogeneity in architectures and operating systems, load variations on machines, variations in machine availability, and the failure susceptibility of networks and workstations complicate the situation for the programmer. In this context, new programming paradigms that reduce the burden of programming for distribution, load adaptability, heterogeneity and fault tolerance gain importance. This paper identifies the issues involved in parallel computing on a network of workstations. The anonymous remote computing (ARC) paradigm is proposed to address the issues specific to parallel programming on workstation systems. ARC differs from the conventional communicating-process model by treating a program as one single entity consisting of several loosely coupled remote instruction blocks instead of treating it as a collection of processes. The ARC approach results in distribution transparency and heterogeneity transparency. At the same time, it provides fault tolerance and load adaptability to parallel programs on workstations. ARC is developed in a two-tiered architecture consisting of high-level language constructs and low-level ARC primitives. The paper describes an implementation of the ARC kernel supporting the ARC primitives.

8.
Workstation clusters provide significant aggregate amounts of resources, including processing power and main memory. In this paper we explore the collective use of main memory in a workstation cluster to boost the performance of applications that require more memory than a single workstation can provide. We describe the design, simulation, implementation, and evaluation of a pager that uses the main memory of remote workstations in a workstation cluster as a faster-than-disk paging device and provides reliability in case of single-workstation failures and adaptivity to network and disk load variations. Our pager has been implemented as a block device driver linked to the Digital UNIX operating system, without any modifications to the kernel code. Using several test applications we measure the performance of remote memory paging over an Ethernet interconnection network and find it to be up to twice as fast as traditional disk paging. We also evaluate the performance of various reliability policies and demonstrate their feasibility even over low-bandwidth networks such as Ethernet. We conclude that the benefits of reliable remote memory paging in workstation clusters are significant today and are likely to increase in the near future.
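The appeal of remote paging follows from simple cost arithmetic: a network round trip can be far cheaper than a disk seek. The sketch below is illustrative only; the parameter values in the test are assumptions, not measurements from the paper:

```python
def remote_page_time(page_bytes, net_latency_s, net_bandwidth_bps):
    # Fetch a page from another workstation's memory:
    # one network latency plus the transfer time.
    return net_latency_s + page_bytes / net_bandwidth_bps


def disk_page_time(page_bytes, positioning_s, transfer_rate_bps):
    # Fetch the same page from the local paging disk:
    # seek + rotational positioning plus the transfer time.
    return positioning_s + page_bytes / transfer_rate_bps
```

For a 4 KB page, a 1 ms network latency at 10 MB/s easily beats a 10 ms disk positioning delay, which is why memory borrowed over even a modest LAN can outperform local disk paging.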

9.
Hamdi  Mounir  Pan  Yi  Hamidzadeh  B.  Lim  F. M. 《The Journal of supercomputing》1999,13(2):111-132
Parallel computing on clusters of workstations is receiving much attention from the research community. Unfortunately, many aspects of parallel computing on this platform are not well understood. These issues include the workstation architectures, the network protocols, the communication-to-computation ratio, the load balancing strategies, and the data partitioning schemes. The aim of this paper is to assess the strengths and limitations of a cluster of workstations by capturing the effects of the above issues. This has been achieved by evaluating the performance of this computing environment in the execution of a parallel ray tracing application through analytical modeling and extensive experimentation. We were successful in illustrating the effect of major factors on the performance and scalability of a cluster of workstations connected by an Ethernet network. Moreover, our analytical model was accurate enough to agree closely with the experimental results. Thus, we feel that such an investigation is helpful in understanding the strengths and weaknesses of an Ethernet cluster of workstations in the execution of parallel applications.
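A common analytical form for clusters on a shared Ethernet segment captures why scalability is limited: computation divides across workstations, but messages serialize on the shared medium, so communication time does not shrink. This is a generic sketch, not the authors' actual model:

```python
def cluster_speedup(n_workstations, comm_to_comp_ratio):
    # Normalized sequential compute time of 1.0; communication cost
    # is a fixed fraction of it and serializes on the shared Ethernet,
    # so it stays constant as workstations are added.
    t_comp = 1.0
    t_comm = comm_to_comp_ratio * t_comp
    return t_comp / (t_comp / n_workstations + t_comm)
```

With a communication-to-computation ratio of 0.05, eight workstations give a speedup of about 5.7, and no number of workstations can push the speedup past 1/0.05 = 20.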

10.
Network-Interface-Based Virtual-to-Physical Address Translation for User-Level Communication   (Cited 1 time: 0 self-citations, 1 by others)
User-level communication allows applications to access the network interface directly, reducing the software overhead of communication operations. Efficient virtual-to-physical address translation is key to supporting user-level communication. This paper proposes an address translation mechanism based on an address translation table: translation is performed entirely on the network interface controller, requires no operating-system involvement, and needs no miss handling. Using this mechanism, we implemented CNI, a PCI-X-based interconnect communication subsystem for cluster systems. Measurements show a minimum one-sided latency of 2.4 μs and a peak bandwidth of 850 MB/s.

11.
A Reduced User-Space Protocol over Myrinet   (Cited 5 times: 0 self-citations, 5 by others)
Dong Chunlei, Zheng Weimin. 《软件学报》 (Journal of Software), 1999, 10(3): 299-303
The communication subsystem is a major factor in the overall performance of workstation clusters. After analyzing and comparing the performance of three common networks, the paper identifies upper-layer protocol processing as the main performance bottleneck in workstation clusters. A high-performance reduced user-level communication protocol, RCP (reduced communication protocol), was implemented on a cluster of eight Sun SPARC workstations connected by 640 Mbps Myrinet. By trimming redundant protocol functions, reducing the number of data copies, and operating directly on hardware buffers, RCP achieves low latency and high efficiency. Its round-trip latency is far lower than that of TCP/IP (200 μs vs. 1,540 μs).

12.
This paper describes an implementation of an adaptive finite element program for coupled fluid-structure problems on a network of workstations. A pool-of-tasks programming paradigm suited to a heterogeneous distributed workstation environment is presented, and the issues of load balancing and fault recovery are explored. Numerical results for this distributed programming paradigm are presented and compared with sequential and parallel programming models.

13.
A Design Method for Distributed Parallel Computing on a Network of Linux Workstations   (Cited 1 time: 0 self-citations, 1 by others)
Distributed parallel computing on networks of workstations is an important direction in the development of parallel computing, and Linux is currently a promising operating system with support for concurrency and openness. This paper presents a programming method for distributed parallel computing on a network of Linux workstations, based on CORBA technology and multi-process programming. Tests on example programs show that the method has practical value.

14.
Graphics processor units (GPU) that are originally designed for graphics rendering have emerged as massively-parallel “co-processors” to the central processing unit (CPU). Small-footprint multi-GPU workstations with hundreds of processing elements can accelerate compute-intensive simulation science applications substantially. In this study, we describe the implementation of an incompressible flow Navier–Stokes solver for multi-GPU workstation platforms. A shared-memory parallel code with identical numerical methods is also developed for multi-core CPUs to provide a fair comparison between CPUs and GPUs. Specifically, we adopt NVIDIA’s Compute Unified Device Architecture (CUDA) programming model to implement the discretized form of the governing equations on a single GPU. Pthreads are then used to enable communication across multiple GPUs on a workstation. We use separate CUDA kernels to implement the projection algorithm to solve the incompressible fluid flow equations. Kernels are implemented on different memory spaces on the GPU depending on their arithmetic intensity. The memory hierarchy specific implementation produces significantly faster performance. We present a systematic analysis of speedup and scaling using two generations of NVIDIA GPU architectures and provide a comparison of single and double precision computational performance on the GPU. Using a quad-GPU platform for single precision computations, we observe two orders of magnitude speedup relative to a serial CPU implementation. Our results demonstrate that multi-GPU workstations can serve as a cost-effective small-footprint parallel computing platform to accelerate computational fluid dynamics (CFD) simulations substantially.

15.
Networks of workstations and high-performance microcomputers have rarely been used for running parallel applications because, although they have significant aggregate computing power, they lack support for efficient message-passing and shared-memory communication. In this paper we present Telegraphos, a distributed system that provides efficient message-passing and shared-memory support on top of a workstation cluster. We focus on the network interface of Telegraphos, which provides a variety of shared-memory operations such as remote read, remote write, remote atomic operations, and DMA, all launched from user level without any intervention of the operating system. Telegraphos I, the Telegraphos prototype, has been implemented. Emphasis was placed on rapid prototyping, so the technology used was conservative: FPGAs, SRAMs, and TTL buffers.

16.
Distributed applications executing on clustered environments typically share resources (computers and network links) with other applications. In such systems, application execution may be retarded by the competition for these shared resources. In this paper, we define a model that calculates the slowdown imposed on applications in time-shared multi-user clusters. Our model focuses on three kinds of slowdown: local slowdown, which synthesizes the effect of contention for the CPU in a single workstation; communication slowdown, which synthesizes the effect of contention for the workstations and network links on communication costs; and aggregate slowdown, which determines the effect of contention on a parallel task caused by other applications executing on the entire cluster, i.e., on the nodes used by the parallel application. We verify empirically that this model provides an accurate estimate of application performance for a set of compute-intensive parallel applications on different clusters with a variety of emulated loads.
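Two of the three slowdown kinds can be illustrated with a toy model, a deliberate simplification rather than the paper's definitions: local slowdown from equal-priority CPU sharing, and aggregate slowdown as the bottleneck over the nodes a parallel task uses.

```python
def local_slowdown(competing_processes):
    # Round-robin CPU sharing: with c equal-priority competitors,
    # the task receives 1/(1 + c) of the CPU, so it runs (1 + c)x
    # slower than on an idle workstation.
    return 1.0 + competing_processes


def aggregate_slowdown(per_node_slowdowns):
    # A synchronous parallel application advances at the pace of
    # its most contended node.
    return max(per_node_slowdowns)
```

So a parallel task spread over three nodes with local slowdowns of 1.0, 2.5, and 1.5 is held to the 2.5x pace of its busiest node.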

17.
Users of parallel machines need a good grasp of how different communication patterns and styles affect the performance of message-passing applications. LogGP is a simple performance model that reflects the most important parameters required to estimate the communication performance of parallel computers. The message passing interface (MPI) standard provides new opportunities for developing high-performance parallel and distributed applications. In this paper, we use LogGP as a conceptual framework for evaluating the performance of MPI communications on three platforms: the Cray Research T3D, the Convex Exemplar 1600SP, and a network of workstations (NOW). We develop a simple set of communication benchmarks to extract the LogGP parameters. Our objective is to compare the performance of MPI communication on several platforms and to identify a performance model suitable for MPI performance characterization. In particular, two problems are addressed: how LogGP quantifies MPI performance and what extra features are required for modeling MPI, and how MPI performance compares across the three computing platforms.
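LogGP gives a closed-form estimate for the one-way time of a single message: sender overhead, a per-byte Gap for each byte beyond the first, the wire latency, and receiver overhead. A minimal sketch using the standard interpretation of the parameters (not the paper's fitted values):

```python
def loggp_message_time(k_bytes, L, o, G):
    """One-way time for a single k-byte message under LogGP.

    L: network latency, o: per-message send/receive overhead,
    G: gap per byte (inverse of per-byte bandwidth for long messages).
    The gap parameter g between consecutive short messages is omitted
    since only one message is modeled here.
    """
    return o + (k_bytes - 1) * G + L + o
```

For long messages the (k - 1) * G term dominates, recovering the familiar bandwidth-bound regime; for 1-byte messages the time reduces to L + 2o.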

18.
This paper presents NC++ (NOW C++), a distributed programming language based on networks of workstations (NOW). It is an extension of the DC++ language. NC++ provides a complete programming environment, including an NC++ precompiler, a visual programming interface, a multicast communication mechanism, and a testing system. It improves the group management and inter-process communication mechanisms, and proposes a distributed shared memory (DSM) mechanism based on a belief-reasoning network to manage shared C++ variables. Practice shows that NC++ preserves the performance of distributed programs while keeping programming convenient.

19.
A DBMS kernel architecture is proposed for improved database support of engineering applications running on a cluster of workstations. In this approach, part of the DBMS code (an application-specific layer) is allocated close to the corresponding application on a workstation, while the kernel code executes on a central server. Empirical performance results from DB-based engineering applications are reported to justify the chosen DBMS architecture. The paper focuses on design issues of the application layer, including server coupling, the processing model and the application interface. Moreover, a transaction model for long-term database work in a coupled workstation-server environment is investigated in detail.

20.
Steenkiste  P.A. 《Computer》1994,27(3):47-57
Optical fiber has made it possible to build networks with link speeds of over a gigabit per second; however, these networks are pushing end-systems to their limits. For high-speed networks (100 Mbits per second and up), network throughput is typically limited by software overhead on the sending and receiving hosts. Minimizing this overhead improves application-level latency and throughput and reduces the number of cycles that applications lose to communication overhead. Several factors influence communication overhead: communication protocols, the application programming interface (API), and the network interface hardware architecture. The author describes how these factors influence communication performance and under what conditions hardware support on the network adapter can reduce overhead. He first describes the organization of a typical network interface and discusses performance considerations for interfaces to high-speed networks. He then discusses software optimizations that apply to simple network adapters and shows how more powerful adapters can improve performance on high-speed networks.

