Similar literature
1.
With the increasing uniprocessor and symmetric multiprocessor computational power available today, interprocessor communication has become an important factor that limits the performance of clusters of workstations/multiprocessors. Many factors, including communication hardware overhead, communication software overhead, and user environment overhead (multithreading, multiuser), affect the performance of the communication subsystems in such systems. A significant portion of the software communication overhead comes from a number of message copying operations. Ideally, it is desirable to have a true zero‐copy protocol where the message is moved directly from the send buffer in its user space to the receive buffer in the destination without any intermediate buffering. However, because message‐passing applications at the send side do not know the final receive buffer addresses, early arrival messages have to be buffered in a temporary area. In this paper, we show that there is a message reception communication locality in message‐passing applications. We have utilized this communication locality and devised different message predictors at the receiver sides of communications. In essence, these message predictors can be efficiently used to drain the network and cache the incoming messages even if the corresponding receive calls have not yet been posted. The performance of these predictors, in terms of hit ratio, on some parallel applications is quite promising and suggests that prediction has the potential to eliminate most of the remaining message copies. We also show that the proposed predictors are not sensitive to the starting message reception call, and that they perform better than (or at least equal to) our previously proposed predictors. Copyright © 2002 John Wiley & Sons, Ltd.
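
To make the notion of a hit ratio concrete, the following is a minimal sketch of one possible receiver-side predictor. It is not one of the predictors from the paper, whose designs the abstract does not describe. The sketch assumes a hypothetical "successor table" heuristic: the receiver remembers which sender followed each sender last time and predicts that the same succession will repeat. The function name and the example sequence are illustrative only.

```python
from typing import Dict, Hashable, Optional, Sequence


def successor_hit_ratio(senders: Sequence[Hashable]) -> float:
    """Hit ratio of a hypothetical successor-table predictor.

    For each reception after the first, predict the sender that followed
    the previous sender the last time it was seen. A missing prediction
    counts as a miss.
    """
    if len(senders) < 2:
        return 0.0
    successor: Dict[Hashable, Hashable] = {}  # sender -> sender seen right after it
    hits = 0
    prev: Optional[Hashable] = None
    for i, current in enumerate(senders):
        if i > 0:
            if successor.get(prev) == current:
                hits += 1
            successor[prev] = current  # learn (or update) the succession
        prev = current
    return hits / (len(senders) - 1)


# A periodic reception pattern: after one warm-up period every guess hits.
ratio = successor_hit_ratio([1, 2, 3, 1, 2, 3, 1, 2, 3])
```

On a strictly periodic pattern such as the one above, the only misses occur during the first period, which is one way to picture the reception locality the abstract refers to.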

2.
In many real applications, for example, those with frequent and irregular communication patterns or those using large messages, network contention and contention for message processing resources can be a significant part of the total execution time. This paper presents a new cost model, called LoGPC, that extends the LogP and LogGP models to account for the impact of network contention and network interface DMA behavior on the performance of message passing programs. We validate LoGPC by analyzing three applications implemented with Active Messages on the MIT Alewife multiprocessor. Our analysis shows that network contention accounts for up to 50 percent of the total execution time. In addition, we show that the impact of communication locality on the communication costs is at most a factor of two on Alewife. Finally, we use the model to identify trade-offs between synchronous and asynchronous message passing styles.
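
As background for the models named above, here is a minimal sketch of the contention-free LogGP cost of a single message, which is the baseline that LoGPC extends. It uses the standard LogGP expression o + (k - 1)G + L + o for a k-byte message. The contention and DMA terms that LoGPC adds are not given in the abstract and are deliberately not modeled here; the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogGPParams:
    L: float  # network latency
    o: float  # processor overhead per send or receive
    g: float  # minimum gap between consecutive messages (unused for one message)
    G: float  # gap per byte for long messages


def message_time(p: LogGPParams, k: int) -> float:
    """Contention-free end-to-end time of one k-byte message under LogGP."""
    if k < 1:
        raise ValueError("message must contain at least one byte")
    # send overhead + per-byte gaps + wire latency + receive overhead
    return p.o + (k - 1) * p.G + p.L + p.o


params = LogGPParams(L=10.0, o=2.0, g=4.0, G=0.5)
short = message_time(params, 1)    # 2 + 0 + 10 + 2
long_ = message_time(params, 101)  # 2 + 50 + 10 + 2
```

Any measured time above this baseline would, in the spirit of the abstract, be a candidate for attribution to contention.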

3.
Low-latency communication over ATM networks using active messages
von Eicken  T. Basu  A. Buch  V. 《Micro, IEEE》1995,15(1):46-53
Today's communication architectures for parallel machines reduce communication overheads and latencies by over an order of magnitude. However, carrying over these techniques to workstation clusters connected by an ATM network presents major design challenges. We discuss the differences in communication characteristics between workstation clusters built from standard hardware and software components and state-of-the-art multiprocessors, and then evaluate a prototype implementation of an active message communication layer. Application round-trip latencies of about 50 microseconds for small messages roughly compare to a similar implementation on the Thinking Machines CM-5 multiprocessor.

4.
A location-transparent reliable message delivery mechanism for mobile agents (MA)
杨娟  李建国 《计算机应用》2004,24(3):25-26,30
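Communication in mobile agent systems is mostly implemented with RMI plus a message delivery mechanism. Building on improvements to the three existing mainstream message delivery mechanisms, this paper proposes a new message forwarding strategy, the Resource Distributed Model (RDM). RDM provides an addressing strategy that resembles a combination of the home-agent approach and path-based forwarding, so that messages are guaranteed to reach their target. The RDM-based Mobile Service (MS) forwards a message quickly once the target has been located. MS reduces the number of messages held in the message buffering component, and several MS instances can perform addressing concurrently to speed up message delivery.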

5.
Zhang  X. 《Micro, IEEE》1991,11(2)
A series of experiments and analyses on five types of hypercube and grid-topology multicomputers, carried out to evaluate interprocessor communication performance, is described. The effects on the system of communication speed, message routing, interprocessor connectivity, and message-passing software/hardware protocols were studied. The experimental results clearly show the difference in interprocessor communication performance between the first-generation multicomputer systems and the second-generation distributed multiprocessor systems. The traditional store-and-forward technique for interprocessor communication greatly limits the communication speed among the processors. In addition, the processors of the first-generation systems are not very powerful, which is another major reason communication proceeds slowly in these systems. It is seen that the wormhole routing model greatly reduces communication latency and is not sensitive to the distance involved in passing messages.
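
The distance insensitivity of wormhole routing can be illustrated with a first-order, contention-free latency model. This is a common textbook approximation, not the measurement methodology of the paper: store-and-forward retransmits the whole message at every hop, while wormhole routing pays a per-hop cost only for the header flit. The function names, the unit bandwidth, and the 4-byte flit size are assumptions chosen for the example.

```python
def store_and_forward_latency(msg_bytes: float, hops: int, bandwidth: float) -> float:
    """Whole message is received and re-sent at each hop."""
    return hops * (msg_bytes / bandwidth)


def wormhole_latency(msg_bytes: float, flit_bytes: float, hops: int,
                     bandwidth: float) -> float:
    """Header flit pays the per-hop cost; the body is pipelined behind it."""
    return hops * (flit_bytes / bandwidth) + msg_bytes / bandwidth


# 1 KiB message, 4-byte flits, unit bandwidth: compare 1 hop with 8 hops.
saf_near = store_and_forward_latency(1024, 1, 1.0)
saf_far = store_and_forward_latency(1024, 8, 1.0)
wh_near = wormhole_latency(1024, 4, 1, 1.0)
wh_far = wormhole_latency(1024, 4, 8, 1.0)
```

Under these assumptions the store-and-forward latency grows by a factor of eight between one and eight hops, whereas the wormhole latency grows by under three percent.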

6.
In this paper, we propose a new multicomputer node architecture, the DI-multicomputer, which uses packet routing on a uniform point-to-point interconnect for both local memory access and internode communication. This is achieved by integrating a router into each processor chip and eliminating the memory bus interface. Since communication resources such as pins and wires are allocated dynamically via packet routing, the DI-multicomputer is able to maximize the available communication resources, providing much higher performance for both intranode and internode communication. Multi-packet handling mechanisms are used to implement a high performance memory interface based on packet routing. The DI-multicomputer network interface provides efficient communication for both short and long messages, decoupling the processor from the transmission overhead for long messages while achieving minimum latency for short messages. Trace-driven simulations based on a suite of message passing applications show that the communication mechanisms of the DI-multicomputer can achieve up to four times speedup when compared to existing architectures.

7.
This paper presents a novel approach to reducing the communication costs incurred when performing multiple multicasts on wormhole routed two-dimensional mesh multiprocessor systems. Both unicast and path-based implementations of multicasting incur communication costs due to the inherent message passing and contention for network resources. The start-up time dominates the transmission time when the data volume is small. However, in the presence of multiple multicasts when the data volume is very large, the communication delays due to message blocking and resource contention become very significant. Because of this, we present a hybrid static-dynamic technique to reduce the communication costs incurred when performing multiple multicasts on wormhole routed direct networks. This technique requires a focus on ordering and routing information for the individual message transmissions. At compile time, each message is assigned a priority using the recently developed collision graph model. Then at runtime these priorities are used to arbitrate the message transmissions. As a base, dimension-ordered routing is used. However, to further reduce the communication costs, some messages will be rerouted. This technique is useful either as a stand-alone algorithm or as a procedure embedded in existing algorithms. Furthermore, the techniques can be applied to higher dimension direct networks. For a single multicast, our work performs as well as conventional methods. For multiple multicasts, results show that our approach provides significant improvement over baseline techniques.
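
The base routing scheme mentioned above can be sketched as follows. This is a minimal illustration of dimension-ordered (XY) routing on a two-dimensional mesh together with a check for directed links that two messages would both use, which is where blocking can arise. It does not implement the paper's collision graph model, priorities, or rerouting; the function names and coordinates are illustrative.

```python
from typing import List, Set, Tuple

Node = Tuple[int, int]
Link = Tuple[Node, Node]


def xy_route(src: Node, dst: Node) -> List[Node]:
    """Dimension-ordered route: correct the X coordinate first, then Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


def links(path: List[Node]) -> List[Link]:
    """Directed links traversed by a path."""
    return list(zip(path, path[1:]))


def shared_links(a: List[Node], b: List[Node]) -> Set[Link]:
    """Directed links used by both paths, i.e. potential contention points."""
    return set(links(a)) & set(links(b))


route_a = xy_route((0, 0), (2, 1))
route_b = xy_route((1, 0), (2, 2))
conflicts = shared_links(route_a, route_b)
```

In this example the two messages have different sources and destinations but still compete for two links, which is the kind of conflict that ordering or rerouting would aim to avoid.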

8.
In irregular all-to-all communication, messages are exchanged between every pair of processors. The message sizes vary from processor to processor and are known only at run time. This is a fundamental communication primitive in parallelizing irregularly structured scientific computations. Our algorithm reduces the total number of message start-ups. It also reduces node contention by smoothing out the lengths of the messages communicated. As compared to the earlier approaches, our algorithm provides deterministic performance and also reduces the buffer space at the nodes during message passing. The performance of the algorithm is characterised using a simple communication model of high-performance computing (HPC) platforms. We show the implementation on T3D and SP2 using C and the message passing interface standard. These can be easily ported to other HPC platforms. The results show the effectiveness of the proposed technique as well as the interplay among the machine size, the variance in message length, and the network interface.

9.
The hypercube multiprocessor is a popular architecture in parallel computing environments. Recently, computer security and privacy issues have gained significance. This paper considers the security issues of a network of processors connected over a hypercube topology. We demonstrate that a covert channel can be established by exploiting the underlying message communication mechanism of the hypercube, even when the access-control denies such communication. This can occur because node-to-node communication in a hypercube may require multiple hops and two or more disjoint message communications may actually be transmitted along common links. Congestion (and the resulting delay) in such shared links can provide the basis for a covert channel. We introduce security considerations for a multiprocessor by focussing on the covert channel issue in hypercube message communication. A security model for the hypercube routing function is presented. Based on noninterference, we develop sufficient conditions for the routing mechanism to be free of covert channels. Two secure hypercube message routing approaches are proposed for store-and-forward communication strategy. The first approach (Virtual Channel) achieves security by fixed bandwidth partitioning of links, for which the price is paid in delay performance. The second approach (Bypass) prioritizes lower security class messages, for which delay of higher class messages is sacrificed. Performance (i.e., cost of security) of these two approaches is shown using simulation. Finally, a time-out feature is introduced to the Bypass approach, which disallows potential starvation of higher class messages at the expense of limited bandwidth covert channel. Maximum covert channel bandwidth (in terms of the time-out parameter) is analyzed.
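
The observation that two communications with disjoint endpoints can share a link is easy to reproduce. The sketch below assumes e-cube (lowest-dimension-first) routing, a common hypercube routing function; the abstract does not say which routing function the paper analyzes, so this choice and the function names are assumptions made for illustration.

```python
from typing import List, Set, Tuple


def ecube_route(src: int, dst: int, dim: int) -> List[int]:
    """Route in a dim-dimensional hypercube, fixing differing bits from bit 0 up."""
    path = [src]
    cur = src
    for bit in range(dim):
        if (cur ^ dst) & (1 << bit):
            cur ^= 1 << bit  # cross the link along this dimension
            path.append(cur)
    return path


def shared_links(a: List[int], b: List[int]) -> Set[Tuple[int, int]]:
    """Directed links traversed by both routes."""
    return set(zip(a, a[1:])) & set(zip(b, b[1:]))


# Message A goes 000 -> 111, message B goes 001 -> 011 in a 3-cube.
route_a = ecube_route(0b000, 0b111, 3)
route_b = ecube_route(0b001, 0b011, 3)
common = shared_links(route_a, route_b)
```

Here the two messages have no endpoint in common, yet both cross the link from node 001 to node 011, so delay on that link caused by one message is observable by the other.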

10.
Distributed-memory parallel systems rely on explicit message exchange for communication, but the communication operations they support can differ in many aspects. One key difference is the way messages are generated or consumed. With systolic communication, a message is transmitted as it is generated. For example, the result computed by the multiplier is sent directly to the communication subsystem for transmission to another node. With memory communication, the complete message is generated and stored in memory, and then transmitted to its destination. Since sender and receiver nodes are individually controlled, they can use different communication styles. One example of memory communication is message passing: both the sender and receiver buffer the message in memory. These two communication styles place different demands on processor design. This article illustrates each style's effect on processor resources for some key application kernels. We are targeting the iWarp system because it supports both communication styles. Two parallel-program generators, one for each communication style, automatically map the sample programs.

11.
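This paper compares shared-memory multi-microprocessor systems with message-passing multi-microprocessor systems, proposes a general memory mailbox mechanism, uses this communication mechanism to construct a model of a multi-i860 microprocessor system, and analyzes the performance of that system model.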

12.
A linear scaling parallel clustering algorithm implementation and its application to very large datasets for cluster analysis is reported. WaveCluster is a novel clustering approach based on wavelet transforms. Although this approach can detect clusters of arbitrary shape efficiently, it requires a considerable amount of time to collect results for large multi-dimensional datasets. We propose a parallel implementation of the WaveCluster algorithm based on the message passing model for a distributed-memory multiprocessor system. In the proposed method, communication among processors and memory requirements are kept to a minimum to achieve high efficiency. We have conducted experiments on a dense dataset and a sparse dataset to measure the algorithm's behavior appropriately. Our experimental results demonstrate that the parallel WaveCluster algorithm achieves high speedup and scales linearly with the increasing number of processors.

13.
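Efficient message communication is a key factor in improving the performance of distributed-memory parallel computers. Point-to-point and broadcast communication are two commonly used message communication methods, whereas multicasting sends a message from one source node to an arbitrary number of destination nodes at the same time. Multicasting is more general than either point-to-point or broadcast communication and meets the needs of many practical applications. For the two-dimensional mesh structure of the PAR95 parallel computer, this paper proposes a multicast message communication method based on network decomposition and compares its performance with that of multicast implemented as multiple point-to-point transfers.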

14.
Trace visualization is a viable approach for gaining insight into the behavior of complex distributed real-time systems. Grasp is a versatile trace visualization toolset. Its flexible plugin infrastructure allows for easy extension with custom visualization and analysis techniques for automatic trace verification. This paper presents its visualization capabilities for hierarchical multiprocessor systems, including partitioned and global multiprocessor scheduling with migrating tasks and jobs, communication between jobs via shared memory and message passing, and hierarchical scheduling in combination with multiprocessor scheduling. For tracing distributed systems with asynchronous local clocks, Grasp also supports the synchronization of traces from different processors during the visualization and analysis.

15.
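Thanks to their efficiency and flexibility, active messages are gradually becoming an important communication mechanism on large-scale parallel machines. The active message idea can be used to implement well-optimized communication layers on a variety of architectures. This paper analyzes the essence of active messages and their implementation on traditional message-passing, remote procedure call, message-driven, and direct memory access systems, and compares how different system organizations affect the implementation techniques.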

16.
TTEthernet is a cross-industry communication standard that supports the integration of predictable time-triggered communication and event-triggered standard Ethernet traffic. This paper explores the possibility of extending the firmware of Commercial-Off-The-Shelf (COTS) routers in order to support TTEthernet. Thereby, we can achieve a significant cost reduction, upgrade existing infrastructures and make field-failure rates of COTS devices available for certification. Based on a generic model of a COTS router, we introduce four methods for extending a COTS router with support for time-triggered and event-triggered message exchanges. The extended COTS router redirects time-triggered messages within pre-planned time intervals, while also processing event-triggered messages when no time-triggered messages are scheduled. We achieve temporal predictability and low jitter by minimizing the effect of event-triggered messages on the timing of time-triggered messages. Furthermore, experimental results from a prototype implementation provide insight into the performance differences between a COTS router and dedicated hardware.

17.
Many multi-agent applications based on mobile agents require message propagation among a group of agents. A fast and scalable group communication mechanism can considerably improve the performance of these applications. Unfortunately, most of the existing approaches do not scale well and disseminate messages slowly when the number of agents grows. In this paper, we propose Sama, a new group communication mechanism, to speed up message delivery for a group of mobile agents on a heterogeneous internetwork. The main contribution of Sama is the efficient distribution and parallelization of message propagation to achieve scalable, high-speed message delivery to group members. Sama uses message dispatcher objects (MDOs), which are stationary agents on each host, to propagate messages concurrently. The proposed mechanism is independent of agent locations and transparently delivers messages to the group using a constant number of remote messages. It also transparently recovers from host failures. We also present a Hop-Ring protocol that considerably improves the performance of message dissemination in Sama. Our experimental results show that message propagation in Sama is significantly faster than in previously proposed methods.

18.
《Computer Languages》1996,22(2-3):181-192
An effective resolution multiprocessor can be built from distributed processing, logic programming, and interface elements. Widely used, portable components can be modularly composed into a portable parallel system that displays good resistance to premature obsolescence by software evolution. A virtual multiprocessor offering common message passing and configuration services integrates a distributed mesh of sequential resolution engines. Users configure and control the resolution engines and virtual multiprocessor through a GUI using an embedded command language to drive its facilities. Prolog programs either explicitly control parallel execution through message passing or rely on program transformation techniques to extract parallelism implicitly.

19.
Design of a multiprocessor MPEG2 parallel decoding system
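The MPEG2 standard for compressing moving pictures and associated audio is the core algorithm of many video service applications. Implementing MPEG2 decompression on a parallel system that combines software with multiple processors is flexible enough to serve the playback functions of many kinds of MPEG2 products and avoids the limitations of hardware decoder chips. Moreover, as personal computers become more widespread and more powerful, such an adapter-card solution can give personal computers more MPEG2 service functions, and it also makes it easier to research and test updated algorithms for the MPEG2 family of standards. This paper analyzes the requirements that MPEG2 decoding places on the implementing system, in particular the computational load of each part of the decompression process and the requirements for data transfer and processing. Based on these data and using several TMS320C40 parallel processing system boards, the paper experiments with and analyzes data partitioning of the MPEG2 input bitstream, storage control and communication for parallel decoding, and the complexity of the decoding algorithm, and derives the corresponding design choices and data. Finally, it proposes a design for an MPEG2 parallel decoding system.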

20.
The problem of scheduling directed acyclic task flow graphs to multiprocessor systems using point-to-point networks is examined. An environment where the application has a strict throughput requirement is assumed. Pipelined parallelism is used to meet the throughput requirement. Communication and computation are completely overlapped. Each task and message has a periodic rate and deadline equal to the throughput requirement. A heuristic procedure based on preclustering, recursive mincut bipartitioning, and iterative improvement is proposed to reduce the maximum contention due to communication in the network, increasing the likelihood that messages meet their deadlines. The task assignment procedure takes into account the topology of the multiprocessor system and the distance between communicating tasks.
