1.
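In massively parallel processing systems, the shared-memory and message-passing communication models each have their own limitations. This paper proposes a new design strategy for improving the performance of shared-memory systems: a user-level shared-memory protocol. Two typical applications were tested on SimDSM, a simulator of distributed shared memory systems based on x86 processors, and the experimental results show that the protocol performs significantly better than the traditional protocol.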
2.
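On Linux clusters built from high-performance PCs, the traditional network interface architecture introduces large software processing overhead and cannot meet the communication bandwidth, latency, and inter-process synchronization requirements of virtual shared memory parallel applications. Compared with the traditional network interface architecture, the user-level network interface standard Virtual Interface Architecture (VIA) has clear advantages in software protocol overhead, the degree of operating system intervention on the critical communication path, the degree of overlap between communication and computation, and support for zero-copy transfers. A performance comparison of a virtual shared memory system running over a traditional network interface and over a VIA interface shows that adopting the VIA network interface architecture can effectively improve the performance and scalability of the virtual shared memory system.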
3.
4.
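A distributed shared memory system builds a logically shared memory model on top of distributed memory. This paper proposes a system framework that implements distributed shared memory at the operating system level and describes its implementation on the Linux operating system. The system provides a simple calling interface and is tightly integrated with the Linux memory management framework. Adopting a suitable DSM consistency protocol improves overall performance.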
5.
6.
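BlueOcean is a large-scale distributed storage system based on object storage technology; this paper describes the design of its client software in detail. The client is developed on the user-space file system framework FUSE, which keeps the client general-purpose while reducing development and maintenance complexity. The client implements the commonly used POSIX interfaces, so the vast majority of applications can run on it transparently. An efficient caching mechanism reduces the communication overhead of metadata access and lowers read and write latency, effectively improving the performance of the BlueOcean storage system.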
7.
8.
9.
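VIA (Virtual Interface Architecture) is an industry standard for user-level cluster communication. This work extends VIA and introduces it into the network storage domain, implements a kernel-level VIA, and tests it on the Linux platform. Compared with user-level VIA, kernel-level VIA improves communication performance between storage system nodes and servers across the board. In particular, for small packets of 512 bytes or less, latency is reduced by at least 30%.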
10.
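The SCI (IEEE 1596-1992) high-speed interconnect protocol is used in many critical domains because of its very low latency. The SCI software reflective memory network is an SCI-based software distributed shared memory system, used mainly for data sharing in cluster parallel computing and real-time systems. Nodes share data by reading and writing a shared memory space that is physically distributed but logically unique. When any node writes data, the data is propagated along a logical topology to the physical memory of all nodes. Because this logical topology directly affects the write latency of the network, the paper proposes a dynamic, low-latency optimal-tree logical topology and, on that basis, designs and implements RFM, an SCI software reflective memory communication library that offers low latency and is easy to program. Experiments show that the proposed optimal-tree topology greatly reduces the write latency of the network and improves its communication performance.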
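To illustrate why the propagation topology matters for write latency, the sketch below counts the sequential forwarding hops a write needs to reach every node under a chain versus a k-ary tree. The abstract does not say how the optimal tree is built, so this is not the RFM algorithm: the function names `chain_hops` and `kary_tree_hops`, the uniform per-hop cost, and the assumption that a node forwards to all its children in parallel are hypothetical simplifications.

```python
def chain_hops(n_nodes: int) -> int:
    """Hops until the last node sees the write when nodes forward in a line."""
    return max(n_nodes - 1, 0)


def kary_tree_hops(n_nodes: int, fanout: int) -> int:
    """Depth of a complete k-ary tree with the writer at the root.

    Assumption: every hop costs the same and a node sends to all of its
    children at once, so latency is proportional to tree depth.
    """
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    depth = 0
    covered = 1      # nodes reached so far (the writer itself)
    level_size = 1   # number of nodes on the current deepest level
    while covered < n_nodes:
        level_size *= fanout
        covered += level_size
        depth += 1
    return depth


if __name__ == "__main__":
    for k in (1, 2, 4):
        print(f"fanout={k}: {kary_tree_hops(16, k)} hops for 16 nodes")
    print(f"chain: {chain_hops(16)} hops for 16 nodes")
```

Under these assumptions a wider tree shortens the longest path, which is the effect a topology-aware design tries to exploit; a real system would also have to account for per-send cost at each node.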
11.
12.
Liss L., Birk Y., Schuster A. IEEE Transactions on Parallel and Distributed Systems, 2005, 16(9): 830-840
The InfiniBand (IB) system area network (SAN) enables applications to access hardware directly from user level, reducing the overhead of user-kernel crossings during data transfer. However, distributed applications that exhibit close coupling between network and OS services may benefit from accessing IB from the kernel through IB's native verbs interface, which permits tight integration of these services. We assess this approach using a sequential-consistency distributed shared memory (DSM) system as an example. We first develop primitives that abstract the low-level communication and kernel details, and efficiently serve the application's communication, memory, and scheduling needs. Next, we combine the primitives to form a kernel DSM protocol. The approach is evaluated using our full-fledged Linux kernel DSM implementation over InfiniBand. We show that overheads are reduced substantially, and overall application performance is improved in terms of both absolute execution time and scalability relative to an entirely user-level implementation.
13.
Journal of Parallel and Distributed Computing, 1999, 58(2): 260-300
High-level parallel programming models supporting dynamic fine-grained threads in a global object space are becoming increasingly popular for expressing irregular applications based on sophisticated adaptive algorithms and pointer-based data structures. However, implementing these multithreaded computations on scalable parallel machines poses significant challenges, particularly with respect to object caching. Object caching techniques must be able to tolerate unresponsive processors and protocol handler occupancy delays. This paper examines whether these challenges can be offset by leveraging responsive general-purpose communication architectural features (such as remote memory access and atomic operations), possibly compensating for the lack of more sophisticated hardware primitives by relying upon increased involvement of the run-time system and the compiler. A detailed performance analysis of four irregular applications, using the Illinois Concert System on the Cray T3D and the SGI Origin 2000, finds that existing software distributed shared memory (DSM) systems are capable of delivering good performance only in the presence of a high level of responsive communication architecture support (specifically, support for remote atomic operations). Recognizing that this situation stems from the synchronous request–reply nature of DSM protocols, we present a composable object caching framework, called view caching, which exploits knowledge of application data access semantics to construct custom protocols that require reduced processor synchronization. View caching protocols are more tolerant of responsiveness and occupancy delays and are able to exploit even lower-level responsive communication primitives (such as nonatomic remote memory accesses) for a performance benefit.
14.
A Distributed Shared Memory (DSM) system provides a distributed application with a shared virtual address space. This article proposes a design for implementing the DSM communication layer on top of the Virtual Interface Architecture (VIA), an industry standard for user-level networking protocols on high-speed clusters. User-level communication protocols operate in user mode, thus removing the operating system kernel's overhead from the critical communication path, and significantly diminishing communication overhead as a result. We analyze VIA's facilities and limitations in order to ascertain which implementation trade-offs can be best applied to our development of an efficient communication substrate optimized for DSM requirements. We then implement a multithreaded version of the Home-based Lazy Release Consistency (HLRC) protocol on top of this substrate. In addition, we compare the performance of this HLRC protocol with that of the Sequential Consistency (SC) protocol in which a Multi View (MV) memory mapping technique was used. This technique enables fine-grained access to shared memory, while still relying on the virtual memory hardware to track memory accesses. We perform an 'apples-to-apples' comparison on the same testbed environment and benchmark suite, and investigate the effectiveness and scalability of both protocols. Copyright © 2005 John Wiley & Sons, Ltd.
15.
1 Introduction. A software distributed shared memory (DSM) system, or shared virtual memory (SVM) system, provides the abstraction of a single shared space on top of the physically distributed memories of a network of workstations. It has been extensively studied in the past decade since it combines the programmability of shared memory systems and the scalability of distributed systems [1]. However, a performance gap between software DSM systems and message passing platforms remains, which greatly hinders the adoption of software DSM systems. In genera…
16.
Distributed shared memory (DSM) systems provide a simple programming paradigm for networks of workstations, which are gaining popularity due to their cost-effective high computing power. However, DSM systems usually exhibit poor performance due to the large communication delay between the nodes, and many different memory consistency models have been proposed to mask the network delay. In this paper, we propose an asynchronous protocol for the release consistent memory model, which we call an Asynchronous Release Consistency (ARC) protocol. Unlike other protocols where the communication adheres to the synchronous request/receive paradigm, the ARC protocol is asynchronous, such that the necessary pages are broadcast before they are requested. Hence, the network delay can be reduced by proper prefetching of necessary pages. We have also compared the performance of the ARC protocol with the lazy release protocol by running standard benchmark programs, and the experimental results showed that the ARC protocol achieves a performance improvement of up to 29%.
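The difference between fetching pages on demand and broadcasting them ahead of use can be shown with a toy stall-time model. This is only a sketch under stated assumptions, not the ARC protocol itself: the function names `on_demand_stall` and `pushed_stall`, the fixed one-way `network_delay`, and the idea that a pushed page always arrives before it is accessed are all hypothetical simplifications.

```python
from typing import Iterable


def on_demand_stall(pages_needed: Iterable[int], network_delay: float) -> float:
    """Synchronous request/reply: each distinct page costs one round trip."""
    return len(set(pages_needed)) * 2 * network_delay


def pushed_stall(pages_needed: Iterable[int],
                 pages_pushed: Iterable[int],
                 network_delay: float) -> float:
    """Pages broadcast ahead of use cost nothing at access time.

    Assumption: a pushed page has fully arrived before the reader touches
    it, so only pages that were not pushed still pay a round trip.
    """
    missing = set(pages_needed) - set(pages_pushed)
    return len(missing) * 2 * network_delay


if __name__ == "__main__":
    needed = [1, 2, 3]
    print("on demand:", on_demand_stall(needed, 5.0))
    print("two of three pushed:", pushed_stall(needed, [1, 2], 5.0))
```

The model ignores the bandwidth spent on pages that are broadcast but never used, which is the main cost a real prefetching protocol has to weigh against the saved round trips.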
17.
Recent distributed shared memory (DSM) systems provide increasing support for the sharing of objects rather than portions of memory. However, like earlier DSM systems, these distributed shared object (DSO) systems still force developers to use a single protocol, or a small set of given protocols, for the sharing of application objects. This limitation prevents applications from optimizing their communication behaviour and results in unnecessary overhead. A current general trend in software systems development is towards customizable systems: for example, frameworks, reflection, and aspect-oriented programming all aim to give the developer greater flexibility and control over the functionality and performance of their code. This paper describes a novel object-oriented framework that defines a DSM system in terms of a consistency model and an underlying coherency protocol. Different consistency models and coherency protocols can be used within a single application because they can be customized, by the application programmer, on a per-object basis. This allows application-specific semantics to be exploited at a very fine level of granularity, with a resulting improvement in performance. The framework is implemented in Java, and the speed-up obtained by a number of applications that use the framework is reported. Copyright © 2002 John Wiley & Sons, Ltd.
18.
19.
Abandah G.A., Davidson E.S. IEEE Transactions on Parallel and Distributed Systems, 1998, 9(2): 206-216
In a distributed shared memory (DSM) multiprocessor, the processors cooperate in solving a parallel application by accessing the shared memory. The latency of a memory access depends on several factors, including the distance to the nearest valid data copy, data sharing conditions, and traffic of other processors. To provide a better understanding of DSM performance and to support application tuning and compiler development for DSM systems, this paper extends microbenchmarking techniques to characterize the important aspects of a DSM system. We present an experiment-based methodology for characterizing the memory, communication, scheduling, and synchronization performance, and apply it to the Convex SPP1000. We present carefully designed microbenchmarks to characterize the performance of the local and remote memory, producer-consumer communication involving two or more processors, and the effects on performance when multiple processors contend for utilization of the distributed memory and the interconnection network.