Similar literature
 Found 20 similar documents; search took 46 ms
1.
Zippy: A Framework for Computation and Visualization on a GPU Cluster   Total citations: 1, self-citations: 0, citations by others: 1
Due to its high performance/cost ratio, a GPU cluster is an attractive platform for large scale general‐purpose computation and visualization applications. However, the programming model for high performance general‐purpose computation on GPU clusters remains a complex problem. In this paper, we introduce the Zippy framework, a general and scalable solution to this problem. It abstracts the GPU cluster programming with a two‐level parallelism hierarchy and a non‐uniform memory access (NUMA) model. Zippy preserves the advantages of both message passing and shared‐memory models. It employs global arrays (GA) to simplify the communication, synchronization, and collaboration among multiple GPUs. Moreover, it exposes data locality to the programmer for optimal performance and scalability. We present three example applications developed with Zippy: sort‐last volume rendering, Marching Cubes isosurface extraction and rendering, and lattice Boltzmann flow simulation with online visualization. They demonstrate that Zippy can ease the development and integration of parallel visualization, graphics, and computation modules on a GPU cluster.

2.
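Distributed object technology, as a sound solution for software development and system integration in distributed heterogeneous environments, is receiving increasing attention in performance-sensitive distributed computing. To meet the scalability requirements of performance-sensitive applications, this paper proposes a dynamically scalable asynchronous message model based on distributed objects. The paper focuses on key technical issues such as when to apply the scaling strategy and how far an object group should be expanded.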

3.
In the rollback recovery of large‐scale long‐running applications in a distributed environment, pessimistic message logging protocols enable failed processes to recover independently, though at the expense of logging every message synchronously during fault‐free execution. In contrast, coordinated checkpointing protocols avoid message logging, but they are poor in scalability with a sharply increased coordinating overhead as the system grows. With the aim of achieving efficient rollback recovery by trading off logging overhead and coordinating overhead, this paper suggests a partitioning of the system into clusters, and then presents a scheme to implement the conversion between these overheads. Using the proposed conversion, coordination can be introduced to reduce the unbearable logging overhead found in some systems, whereas proper logging can be employed to alleviate the unacceptable coordinating overhead in others. Furthermore, heuristics are introduced to address the issue of how to partition the system into clusters in order to speed up the recovery process and to improve recovery efficiency. Performance evaluation results indicate that our scheme can lower the overall system overhead effectively. Copyright © 2008 John Wiley & Sons, Ltd.

4.
Replacing traditional operating systems communication implementations with customized implementations increases the performance of parallel and distributed applications. This paper describes the design and implementation of customizable message passing systems. The customized message passing systems are generated using application-specific information such as the profile of an application's communication pattern. FFT, Simplex, and Cholesky are used as example parallel applications. The message passing system has also been customized for different types of distributed system services including a distributed scheduling facility. The customized message passing system likewise improves the performance of these facilities and enhances their scalability. As a practical concern, as there are a large number of possible optimizations, object-oriented frameworks are employed to organize the implementations and to facilitate the choice of optimizations.

5.
This paper presents a Java implementation of the recently published MPI 3.0 nonblocking message passing collectives in order to analyze and assess the feasibility of taking advantage of these operations in shared memory systems using Java. Nonblocking collectives aim to exploit the overlapping between computation and communication for collective operations to increase scalability of message passing codes, as it has been carried out for nonblocking point‐to‐point primitives. This scalability has become crucial not only for clusters but also for shared memory systems because of the current trend of increasing the number of cores per chip, which is leading to the generalization of multi‐core and many‐core processors. Message passing libraries based on remote direct memory access, thread‐based progression, or implementing pure multi‐threading shared memory support could potentially benefit from the lack of imposed synchronization by nonblocking collectives. But, although the distributed memory scenario has been well studied, the shared memory one has not been tackled yet. Hence, nonblocking collectives support has been included in FastMPJ, a Message Passing in Java (MPJ) implementation, and evaluated on a representative shared memory system, obtaining significant improvements because of overlapping and lack of implicit synchronization, and with barely any overhead imposed over common blocking operations. Copyright © 2014 John Wiley & Sons, Ltd.

6.
Since its release, the Java programming language has attracted considerable attention from the high‐performance computing (HPC) community because of its portability, high programming productivity, and built‐in multithreading and networking support. As a consequence, several initiatives have been taken to develop a high‐performance Java message‐passing library to program distributed memory architectures, such as clusters. The performance of Java message‐passing applications relies heavily on the communications performance. Thus, the design and implementation of low‐level communication devices that support message‐passing libraries is an important research issue in Java for HPC. MPJ Express is our Java message‐passing implementation for developing high‐performance parallel Java applications. Its public release currently contains three communication devices: the first one is built using the Java New Input/Output (NIO) package for the TCP/IP; the second one is specifically designed for the Myrinet Express library on Myrinet; and the third one supports thread‐based shared memory communications. Although these devices have been successfully deployed in many production environments, previous performance evaluations of MPJ Express suggest that the buffering layer, tightly coupled with these devices, incurs a certain degree of copying overhead, which represents one of the main performance penalties. This paper presents a more efficient Java message‐passing communications device, based on Java Input/Output sockets, that avoids this buffering overhead. Moreover, this device implements several strategies, both in the communication protocol and in the HPC hardware support, which optimizes Java message‐passing communications. 
In order to evaluate its benefits, this paper analyzes the performance of this device comparatively with other Java and native message‐passing libraries on various high‐speed networks, such as Gigabit Ethernet, Scalable Coherent Interface, Myrinet, and InfiniBand, as well as on a shared memory multicore scenario. The reported communication overhead reduction encourages the upcoming incorporation of this device in MPJ Express ( http://mpj‐express.org ). Copyright © 2011 John Wiley & Sons, Ltd.

7.
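A Parallel Operating System Based on Multiple Virtual Spaces and Multiple Mapping Technology   Total citations: 3, self-citations: 0, citations by others: 3
陈左宁 (Chen Zuoning), 金怡濂 (Jin Yilian). 《软件学报》 (Journal of Software), 2001, 12(10): 1562-1568
Scalability is a major challenge in the design of high-performance computer systems, and NUMA (non-uniform memory architecture) was proposed precisely to address the scalability of shared-memory systems. Research and practice show that the scalability of a complete system is closely related to the structure of its operating system. Typical multiprocessor operating systems usually adopt one of two structures: a single shared kernel, or multiple kernels that cooperate through messages. The analysis concludes that neither structure is well suited to scalable parallel machines, NUMA machines in particular. To address these problems, a new structural design is proposed: multiple virtual spaces with multiple mapping, combined with active messages. Test and operational results show that this structure successfully solves the system's scalability problem.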

8.
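Design and Implementation of a Reliable and Scalable Group Communication System   Total citations: 2, self-citations: 0, citations by others: 2
Group communication systems are a very important component of distributed collaborative systems that support consistency and fault tolerance. To meet the needs of large-scale collaborative applications, this paper designs and implements SGCS, a reliable and scalable group communication system, by combining gossip-based protocols with deterministic protocols. The system mainly comprises a reliable message transport service and a group membership management service. Three elements give the system good scalability, reliability, and flexibility: a gossip-based reliable multicast protocol combined with deterministic message recovery, flow control, and ordering protocols; a gossip-based failure detection protocol combined with a deterministic view agreement protocol; and an optimistic virtual synchrony mechanism.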

9.
Multi-agent systems are widely used to address large-scale distributed combinatorial applications in the real world. One such application is meeting scheduling (MS), which is defined by a variety of features. The MS problem is naturally distributed and especially subject to many alterations. In addition, this problem is characterized by the presence of users’ preferences that turn it into a search for an optimal rather than a feasible solution. However, in real-world applications users usually have conflicting preferences, which makes the solving process an NP-hard problem. Most research efforts in the literature, adopting agent-based technologies, tackle the MS problem as a static problem. They often share some common properties: allowing the relaxation of any user's time restriction, not dealing with achieving any level of consistency among meetings to enhance the efficiency of the solving process, not tackling the consequences of the dynamic environment, and especially not addressing the real difficulty of distributed systems which is the complexity of message passing operations. In an attempt to facilitate and streamline the process of scheduling meetings in any organization, the main contribution of this work is a new scalable agent-based approach for any dynamic MS problem (that we called MSRAC, for Meeting Scheduling with Reinforcement of Arc Consistency). In this approach we authorize only the relaxation of users’ preferences while maintaining arc-consistency on the problem. The underlying protocol can efficiently reach the optimal solution (satisfying some predefined optimality criteria) whenever possible, using only minimum localized asynchronous communications. This purpose is achieved with minimal message passing while trying to preserve at most the privacy of involved users. 
Detailed experimental results on randomly generated MS problems show that MSRAC is scalable and it leads to speed up over other approaches, especially for large problems with strong constraints.
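
As a rough illustration of what "maintaining arc-consistency" means in the abstract above, the sketch below runs the textbook, centralized AC-3 procedure on a toy meeting-scheduling instance. It is not the MSRAC protocol, which is distributed and agent-based; the meeting names, time slots, and the `allowed` predicate are assumptions made up for this example.

```python
from collections import deque


def revise(domains, x, y, allowed):
    """Drop values of x that have no compatible value left in y's domain."""
    removed = False
    for vx in list(domains[x]):  # iterate over a copy while removing
        if not any(allowed(x, vx, y, vy) for vy in domains[y]):
            domains[x].remove(vx)
            removed = True
    return removed


def ac3(domains, neighbors, allowed):
    """Generic AC-3: returns False if some domain becomes empty."""
    queue = deque((x, y) for x in neighbors for y in neighbors[x])
    while queue:
        x, y = queue.popleft()
        if revise(domains, x, y, allowed):
            if not domains[x]:
                return False  # no consistent slot remains for x
            # x's domain shrank, so arcs pointing at x must be rechecked
            for z in neighbors[x]:
                if z != y:
                    queue.append((z, x))
    return True


# Hypothetical instance: meetings A and B share an attendee,
# so they may not occupy the same hourly slot.
def different_slot(x, vx, y, vy):
    return vx != vy


domains = {"A": {9}, "B": {9, 10}}
neighbors = {"A": ["B"], "B": ["A"]}
consistent = ac3(domains, neighbors, different_slot)
```

In this toy instance, slot 9 is pruned from meeting B because meeting A can only take place at 9. A distributed approach would have to reach the same pruning through messages between agents rather than through a shared queue.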

10.
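This paper proposes a new, message-centric method for building distributed control programs. A distributed control program consists of independent distribution-point programs that cooperate through messages. Because the points are independent, each can be designed separately, and message passing among the points, i.e., the message architecture, connects them into a complete application. The paper proposes a rule-based design method for independent points, a message architecture represented by message flow graphs, and a content-based message passing mechanism. The design and simulated implementation of a vehicle driving monitoring system show that the method reduces the complexity of designing and building the system and better ensures program correctness. Systems built this way also have a flexible architecture and good scalability.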

11.
Many multi-agent applications based on mobile agents require message propagation among group of agents. A fast and scalable group communication mechanism can considerably improve performance of these applications. Unfortunately, most of the existing approaches do not scale well and disseminate messages slowly when the number of agents grows. In this paper, we propose Sama, a new group communication mechanism, to speed up message delivery for a group of mobile agents on a heterogeneous internetwork. The main contribution of Sama is distribution and parallelization of message propagation in an efficient way to achieve scalability and high-speed of message delivery to group members. Sama uses message dispatcher objects (MDOs), which are stationary agents on each host, to propagate messages concurrently. The proposed mechanism is independent of agent locations and transparently delivers messages to the group using constant number of remote messages. It also transparently recovers from host failures. We also present a Hop-Ring protocol that considerably improves the performance of message dissemination in Sama. Our experimental results show that message propagation in Sama is significantly fast compared to the previously proposed methods.

12.
We propose an approach to image segmentation that views it as one of pixel classification using simple features defined over the local neighborhood. We use a support vector machine for pixel classification, making the approach automatically adaptable to a large number of image segmentation applications. Since our approach utilizes only local information for classification, both training and application of the image segmentor can be done on a distributed computing platform. This makes our approach scalable to larger images than the ones tested. This article describes the methodology in detail and tests its efficacy against 5 other comparable segmentation methods on 2 well‐known image segmentation databases. Hence, we present the results together with the analysis that support the following conclusions: (i) the approach is as effective, and often better than its studied competitors; (ii) the approach suffers from very little overfitting and hence generalizes well to unseen images; (iii) the trained image segmentation program can be run on a distributed computing environment, resulting in linear scalability characteristics. The overall message of this paper is that using a strong classifier with simple pixel‐centered features gives as good or better segmentation results than some sophisticated competitors and does so in a computationally scalable fashion.

13.
14.
The implementation of the GESIMA mesoscale atmospheric model on message passing, distributed memory parallel computers is presented. Particular emphasis is given to the parallelization of the conjugate gradient solver using pre-conditioning by an incomplete LU factorization. Performance results are presented for the Cray T3D and Cray T3E systems, which show good scalability over a range of problem sizes and numbers of processors.

15.
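SFT: A Consistent Checkpointing Algorithm with Short Freeze Time   Total citations: 1, self-citations: 0, citations by others: 1
This paper introduces SFT, a consistent checkpointing algorithm based on message logging. SFT provides fault tolerance for distributed systems and has the advantages of no domino effect, short freeze time, low overhead, and a simple restart algorithm. SFT's IPC mechanism is built on PVM and guarantees in-order message delivery, and its message send and receive operations are atomic. In addition, process id values in the IPC mechanism are encoded independently of the host machine, so a process can continue communicating with other processes even after it migrates from a failed machine to another machine. To increase the parallelism of checkpoint operations, SFT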

16.
This paper introduces hybrid address spaces as a fundamental design methodology for implementing scalable runtime systems on many-core architectures without hardware support for cache coherence. We use hybrid address spaces for an implementation of MapReduce, a programming model for large-scale data processing, and the implementation of a remote memory access (RMA) model. Both implementations are available on the Intel SCC and are portable to similar architectures. We present the design and implementation of HyMR, a MapReduce runtime system whereby different stages and the synchronization operations between them alternate between a distributed memory address space and a shared memory address space, to improve performance and scalability. We compare HyMR to a reference implementation and we find that HyMR improves performance by a factor of 1.71× over a set of representative MapReduce benchmarks. We also compare HyMR with Phoenix++, a state-of-the-art implementation for systems with hardware-managed cache coherence in terms of scalability and sustained to peak data processing bandwidth, where HyMR demonstrates improvements of a factor of 3.1× and 3.2× respectively. We further evaluate our hybrid remote memory access (HyRMA) programming model and assess its performance to be superior to that of message passing.

17.
Many distributed algorithms require knowledge of the causal relationships between events. Examples include optimistic recovery protocols, distributed debugging systems, and causal distributed shared memory. Determining causal relationships can be difficult, however, because there is no global clock and local clocks cannot be perfectly synchronized. Vector time is a useful abstraction for capturing the causal relationships between events and, unlike Lamport's logical clocks, allows identification of concurrent events. Some drawbacks of vector time include transmission and logging overhead, since the size of a vector clock is linear in the number of processes. This paper presents a technique to reduce these overheads for applications that dynamically create and destroy processes and log event information with attached vector timestamps. The reduction in logging overhead comes at the expense of a more complicated timestamp comparison protocol and more sophisticated data structures for maintaining vector time. Distributed process recovery mechanisms and debugging systems that require “on-the-fly” causality information can benefit directly from the proposed technique.
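
To make the vector-time idea in the abstract above concrete, here is a minimal sketch of plain vector clocks, with one counter per process. It shows only the baseline abstraction whose size grows with the number of processes, not the overhead-reduction technique the paper proposes; the process names `p1` and `p2` and all function names are assumptions chosen for illustration.

```python
from typing import Dict

VectorClock = Dict[str, int]  # process id -> number of events seen from it


def tick(clock: VectorClock, pid: str) -> VectorClock:
    """Local event or send: advance this process's own component."""
    new = dict(clock)
    new[pid] = new.get(pid, 0) + 1
    return new


def receive(local: VectorClock, received: VectorClock, pid: str) -> VectorClock:
    """Receive: component-wise maximum, then count the receive event itself."""
    merged = {k: max(local.get(k, 0), received.get(k, 0))
              for k in set(local) | set(received)}
    return tick(merged, pid)


def _leq(a: VectorClock, b: VectorClock) -> bool:
    """True if every component of a is <= the matching component of b."""
    return all(a.get(k, 0) <= b.get(k, 0) for k in set(a) | set(b))


def happened_before(a: VectorClock, b: VectorClock) -> bool:
    """a causally precedes b: a <= b component-wise and a differs from b."""
    return _leq(a, b) and not _leq(b, a)


def concurrent(a: VectorClock, b: VectorClock) -> bool:
    """Neither event causally precedes the other."""
    return not _leq(a, b) and not _leq(b, a)


send_evt = tick({}, "p1")                     # p1 sends a message
local_evt = tick({}, "p2")                    # p2 does unrelated local work
recv_evt = receive(local_evt, send_evt, "p2") # p2 receives p1's message
```

Here `send_evt` and `local_evt` are concurrent, while both causally precede `recv_evt`. A single scalar Lamport timestamp could order these events but could not report that the first two are concurrent, which is the distinction the abstract draws.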

18.
Data replication, as an essential service for MANETs, is used to increase data availability by creating local or nearly located copies of frequently used items, reduce communication overhead, achieve fault-tolerance and load balancing. Data replication protocols proposed for MANETs are often prone to scalability problems due to their definitions or underlying routing protocols they are based on. In particular, they exhibit poor performance when the network size is scaled up. However, scalability is an important criterion for several MANET applications. We propose a scalable and reactive data replication approach, named SCALAR, combined with a low-cost data lookup protocol. SCALAR is a virtual backbone based solution, in which the network nodes construct a connected dominating set based on network topology graph. To the best of our knowledge, SCALAR is the first work applying virtual backbone structure to operate a data lookup and replication process in MANETs. Theoretical message-complexity analysis of the proposed protocols is given. Extensive simulations are performed to analyze and compare the behavior of SCALAR, and it is shown to outperform the other solutions in terms of data accessibility, message overhead and query deepness. It is also demonstrated as an efficient solution for high-density, high-load, large-scale mobile ad hoc networks.

19.
Data‐driven programming models such as many‐task computing (MTC) have been prevalent for running data‐intensive scientific applications. MTC applies over‐decomposition to enable distributed scheduling. To achieve extreme scalability, MTC proposes a fully distributed task scheduling architecture that employs as many schedulers as the compute nodes to make scheduling decisions. Achieving distributed load balancing and best exploiting data locality are two important goals for the best performance of distributed scheduling of data‐intensive applications. Our previous research proposed a data‐aware work‐stealing technique to optimize both load balancing and data locality by using both dedicated and shared task ready queues in each scheduler. Tasks were organized in queues based on the input data size and location. Distributed key‐value store was applied to manage task metadata. We implemented the technique in MATRIX, a distributed MTC task execution framework. In this work, we devise an analytical suboptimal upper bound of the proposed technique, compare MATRIX with other scheduling systems, and explore the scalability of the technique at extreme scales. Results show that the technique is not only scalable but can achieve performance within 15% of the suboptimal solution. Copyright © 2015 John Wiley & Sons, Ltd.

20.
Future chip-multiprocessors (CMP) will integrate many cores interconnected with a high-bandwidth and low-latency scalable network-on-chip (NoC). However, the potential that this approach offers at the transport level needs to be paired with an analogous paradigm shift at the higher levels. In particular, the standard shared-memory programming model fails to address the requirements of scalability of the many-core era. Fast data exchange among the cores and low-latency synchronization are desirable but hard to achieve in practice due to the memory hierarchy. The message-passing paradigm permits instead direct data communication and synchronization between the cores. The shared-memory could still be used for the instruction fetch. Hence, we propose a hybrid approach that combines shared-memory and message passing in a single general-purpose CMP architecture that allows efficient execution of applications developed with both parallel programming approaches. Cores fetch instructions from a hierarchical memory and exchange their data through the same memory, for compatibility with existing software, or directly through the fast NoC. We developed a fast SystemC based cycle-accurate simulator for design space explorations that we used to evaluate the performance with real benchmarks. The various components have been RTL coded and mapped to a CMOS 45 nm technology to build a silicon area model that we used to select the best architectural configurations.
