Similar literature
 19 similar documents found (search time: 203 ms)
1.
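In massively parallel processing systems, the shared-memory and message-passing communication models each have their own limitations. This paper proposes a new design strategy for improving the performance of shared-memory systems: a user-level shared-memory protocol. Two typical applications were tested on SimDSM, a simulator of distributed shared memory systems based on x86 processors, and the experimental results show that performance is significantly better than with traditional protocols.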

2.
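Shi Gang, Yin Hongda, Hu Mingchang, Hu Weiwu. Chinese Journal of Computers, 2003, 26(12): 1621-1628
On Linux clusters built from high-performance PCs, the traditional network interface architecture introduces heavy software processing overhead and cannot meet the communication bandwidth, latency, and inter-process synchronization requirements of parallel applications running on virtual shared memory. Compared with the traditional network interface architecture, the user-level network interface standard known as the Virtual Interface Architecture (VIA) has clear advantages in software protocol overhead, the degree of operating system intervention on the critical communication path, the overlap of communication with computation, and support for zero-copy transfers. A performance comparison of a virtual shared memory system running over a traditional network interface and over a VIA interface shows that the VIA network interface architecture effectively improves the performance and scalability of virtual shared memory systems.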

3.
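A comparison of message passing and shared memory on the Dawning 1000A   Times cited: 12 (self-citations: 2; by others: 12)
Distributed shared memory is easy to program but is often considered inefficient, and this is especially true of distributed shared memory systems implemented entirely in software (also called virtual shared memory systems). Taking the typical message-passing system PVM and the distributed shared memory system JIAJIA as examples, this paper discusses the characteristics of these two parallel programming environments and compares the performance of the two systems on the Dawning 1000A using seven applications. The experimental results show that JIAJIA's performance is comparable to PVM's, while parallel programming on JIAJIA is much simpler than on PVM.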

4.
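Li Peng, Wang Lei. Computer Engineering, 2006, 32(4): 58-60
A distributed shared memory system builds a logically shared memory model on top of distributed memory. This paper proposes a system framework that implements distributed shared memory at the operating system level and describes its implementation on Linux. The system provides a simple calling interface and is tightly integrated with the Linux memory management framework. Overall performance is improved by adopting a suitable DSM consistency protocol.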

5.
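Implementation of kernel-level VIA   Times cited: 2 (self-citations: 0; by others: 2)
VIA is an industry standard for user-level cluster communication. This paper extends VIA, introduces it into the network storage domain, implements a kernel-level VIA, and tests it on the Linux platform. Compared with user-level VIA, kernel-level VIA improves communication performance between storage system nodes and servers across the board; in particular, for small packets under 512 bytes, latency is reduced by at least 30%.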

6.
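BlueOcean is a large-scale distributed storage system based on object storage technology; this paper describes the design of its client software in detail. The client is built on the user-space file system framework FUSE, which keeps the client general-purpose while reducing the complexity of development and maintenance. The client implements the commonly used POSIX interfaces and can run the vast majority of applications transparently. An efficient caching mechanism reduces communication overhead during metadata access and lowers read and write latency, effectively improving the performance of the BlueOcean storage system.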

7.
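Yin Hongda, Shi Gang, Hu Mingchang. Computer Engineering, 2005, 31(11): 190-192
In system area network environments the network hardware performs very well, but traditional communication libraries carry a great deal of unnecessary software overhead that substantially degrades communication performance. Allowing user processes to access the network device directly and reducing memory copies during sending and receiving avoids the overhead introduced by the operating system, achieving user-level communication with lower latency and higher bandwidth. A performance analysis of the user-level communication library shows that it performs better.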

8.
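LPCA does not consider active attacks in which storage nodes forge shares. To remedy this shortcoming, an improved LPCA scheme is designed using a one-way trapdoor function. It effectively resists attacks in which storage nodes compromised by an active attacker supply tampered or forged secret shares to users, causing them to recover incorrect data or to fail to recover the data at all. The scheme makes up for this weakness of LPCA without adding significant space, computation, or communication overhead to the storage system, and it improves the survivability of distributed storage systems. It can also be used to improve any distributed storage scheme that uses secret sharing to achieve data separation.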

9.
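Huang Haodan, Feng Dan. Journal of Computer Applications, 2005, 25(3): 732-733
VIA (Virtual Interface Architecture) is an industry standard for user-level cluster communication. This paper extends VIA, introduces it into the network storage domain, implements a kernel-level VIA, and tests it on the Linux platform. Compared with user-level VIA, kernel-level VIA improves communication performance between storage system nodes and servers across the board. In particular, for small packets under 512 bytes, latency is reduced by at least 30%.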

10.
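The SCI (IEEE 1596-1992) high-speed interconnect protocol is used in many critical fields because of its very low latency. The SCI software reflective memory network is a software distributed shared memory system based on SCI, used mainly for data sharing in cluster parallel computing and real-time systems. Nodes share data by reading and writing a shared memory space that is physically distributed but logically unique. When any node writes data, the data is propagated along a logical topology to the physical memory of all nodes. This logical topology directly affects the write latency of the network, so a dynamic, low-latency optimal-tree logical topology is proposed, and on that basis RFM, a low-latency and easy-to-program SCI software reflective memory communication library, is designed and implemented. Experiments show that the proposed optimal-tree logical topology greatly reduces the write latency of the network and improves its communication performance.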
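
The dependence of write latency on the logical topology can be illustrated with a small sketch. It is not the optimal-tree construction from the paper; it only compares two hypothetical topologies (a chain and a k-ary tree) and assumes that the number of forwarding hops from the writer to the farthest node is a reasonable proxy for write latency. All function names here are illustrative.

```python
from collections import deque


def chain_topology(n):
    """Hypothetical chain: node i forwards a write to node i + 1."""
    return {i: ([i + 1] if i + 1 < n else []) for i in range(n)}


def kary_tree_topology(n, k=2):
    """Hypothetical k-ary tree: node i forwards to nodes k*i+1 .. k*i+k."""
    return {
        i: [c for c in range(k * i + 1, k * i + k + 1) if c < n]
        for i in range(n)
    }


def max_hops(topology, writer=0):
    """Breadth-first search from the writer.

    Returns the largest number of forwarding hops needed to reach any
    node, used here as an assumed stand-in for worst-case write latency.
    """
    depth = {writer: 0}
    queue = deque([writer])
    while queue:
        node = queue.popleft()
        for child in topology[node]:
            if child not in depth:
                depth[child] = depth[node] + 1
                queue.append(child)
    return max(depth.values())


if __name__ == "__main__":
    n = 16
    print("chain:", max_hops(chain_topology(n)))
    print("binary tree:", max_hops(kary_tree_topology(n, 2)))
```

Under this model a chain of n nodes needs n - 1 hops in the worst case, while a tree needs a number of hops that grows only logarithmically with n, which is consistent with the abstract's claim that the choice of logical topology matters for write latency.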

11.
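Data pre-sending in distributed shared memory systems   Times cited: 3 (self-citations: 0; by others: 3)
The latency of remote data access has become the greatest obstacle to the development of distributed shared memory (DSM) systems. It directly affects the efficiency of DSM systems, especially those implemented in software. To understand and analyze data behavior in DSM systems, this paper proposes a new structural model of distributed shared memory and, on that basis, a technique called "data pre-sending". The technique aims to narrow the semantic gap between data at different levels of the system, reduce the number of communications in the DSM, and improve tolerance of remote access latency. The paper describes the principle and implementation of data pre-sending. Tests on the prototype system …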

12.
The InfiniBand (IB) system area network (SAN) enables applications to access hardware directly from user level, reducing the overhead of user-kernel crossings during data transfer. However, distributed applications that exhibit close coupling between network and OS services may benefit from accessing IB from the kernel through IB's native verbs interface, which permits tight integration of these services. We assess this approach using a sequential-consistency distributed shared memory (DSM) system as an example. We first develop primitives that abstract the low-level communication and kernel details, and efficiently serve the application's communication, memory, and scheduling needs. Next, we combine the primitives to form a kernel DSM protocol. The approach is evaluated using our full-fledged Linux kernel DSM implementation over InfiniBand. We show that overheads are reduced substantially, and overall application performance is improved in terms of both absolute execution time and scalability relative to an entirely user-level implementation.

13.
High-level parallel programming models supporting dynamic fine-grained threads in a global object space are becoming increasingly popular for expressing irregular applications based on sophisticated adaptive algorithms and pointer-based data structures. However, implementing these multithreaded computations on scalable parallel machines poses significant challenges, particularly with respect to object caching. Object caching techniques must be able to tolerate unresponsive processors and protocol handler occupancy delays. This paper examines whether these challenges can be offset by leveraging responsive general-purpose communication architectural features (such as remote memory access and atomic operations), possibly compensating for the lack of more sophisticated hardware primitives by relying upon increased involvement of the run-time system and the compiler. A detailed performance analysis of four irregular applications, using the Illinois Concert System on the Cray T3D and the SGI Origin 2000, finds that existing software distributed shared memory (DSM) systems are capable of delivering good performance only in the presence of a high level of responsive communication architecture support (specifically, support for remote atomic operations). Recognizing that this situation stems from the synchronous request-reply nature of DSM protocols, we present a composable object caching framework, called view caching, which exploits knowledge of application data access semantics to construct custom protocols that require reduced processor synchronization. View caching protocols are more tolerant of responsiveness and occupancy delays and are able to exploit even lower-level responsive communication primitives (such as nonatomic remote memory accesses) for a performance benefit.

14.
A Distributed Shared Memory (DSM) system provides a distributed application with a shared virtual address space. This article proposes a design for implementing the DSM communication layer on top of the Virtual Interface Architecture (VIA), an industry standard for user-level networking protocols on high-speed clusters. User-level communication protocols operate in user mode, thus removing the operating system kernel's overhead from the critical communication path, and significantly diminishing communication overhead as a result. We analyze VIA's facilities and limitations in order to ascertain which implementation trade-offs can be best applied to our development of an efficient communication substrate optimized for DSM requirements. We then implement a multithreaded version of the Home-based Lazy Release Consistency (HLRC) protocol on top of this substrate. In addition, we compare the performance of this HLRC protocol with that of the Sequential Consistency (SC) protocol in which a Multi View (MV) memory mapping technique was used. This technique enables fine-grained access to shared memory, while still relying on the virtual memory hardware to track memory accesses. We perform an 'apples-to-apples' comparison on the same testbed environment and benchmark suite, and investigate the effectiveness and scalability of both protocols. Copyright © 2005 John Wiley & Sons, Ltd.

15.
1 Introduction. A software distributed shared memory (DSM) system, or shared virtual memory (SVM) system, provides the abstraction of a single shared space on top of the physically distributed memories of a network of workstations. It has been studied extensively over the past decade because it combines the programmability of shared memory systems with the scalability of distributed systems [1]. However, a performance gap between software DSM systems and message passing platforms remains, which greatly hinders the wider adoption of software DSM systems. In genera…

16.
Distributed shared memory (DSM) systems provide a simple programming paradigm for networks of workstations, which are gaining popularity due to their cost-effective high computing power. However, DSM systems usually exhibit poor performance due to the large communication delay between the nodes, and many different memory consistency models have been proposed to mask the network delay. In this paper, we propose an asynchronous protocol for the release consistent memory model, which we call an Asynchronous Release Consistency (ARC) protocol. Unlike other protocols where the communication adheres to the synchronous request/receive paradigm, the ARC protocol is asynchronous, such that the necessary pages are broadcast before they are requested. Hence, the network delay can be reduced by proper prefetching of necessary pages. We have also compared the performance of the ARC protocol with the lazy release protocol by running standard benchmark programs, and the experimental results showed that the ARC protocol achieves a performance improvement of up to 29%.
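
The difference between fetching pages on demand and having them delivered before they are requested can be sketched with a toy model. This is not the ARC protocol itself: it ignores the cost of the broadcast, consistency actions, and timing, and simply counts how many page accesses would have to wait for a remote fetch. The function name and its parameters are hypothetical.

```python
def count_stalls(accesses, pushed_pages=()):
    """Count accesses that must wait for a remote page fetch.

    accesses: sequence of page ids read by one node after a release.
    pushed_pages: page ids assumed to have been delivered to the node
        before it asks for them (empty for a purely on-demand scheme).
    """
    local = set(pushed_pages)
    stalls = 0
    for page in accesses:
        if page not in local:
            stalls += 1      # synchronous request/reply: the node waits
            local.add(page)  # the fetched page is now held locally
    return stalls


if __name__ == "__main__":
    accesses = [3, 4, 3, 5, 4]
    print("on demand:", count_stalls(accesses))
    print("all pages pushed:", count_stalls(accesses, pushed_pages={3, 4, 5}))
```

In this model every distinct page costs one stall under on-demand fetching, and each page that was pushed in advance removes one stall, which is the effect the abstract attributes to prefetching.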

17.
Recent distributed shared memory (DSM) systems provide increasingly more support for the sharing of objects rather than portions of memory. However, like earlier DSM systems, these distributed shared object (DSO) systems still force developers to use a single protocol, or a small set of given protocols, for the sharing of application objects. This limitation prevents applications from optimizing their communication behaviour and results in unnecessary overhead. A current general trend in software systems development is towards customizable systems: for example, frameworks, reflection, and aspect-oriented programming all aim to give developers greater flexibility and control over the functionality and performance of their code. This paper describes a novel object-oriented framework that defines a DSM system in terms of a consistency model and an underlying coherency protocol. Different consistency models and coherency protocols can be used within a single application because they can be customized, by the application programmer, on a per-object basis. This allows application-specific semantics to be exploited at a very fine level of granularity, with a resulting improvement in performance. The framework is implemented in Java, and the speed-up obtained by a number of applications that use the framework is reported. Copyright © 2002 John Wiley & Sons, Ltd.

18.
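A shared virtual memory system based on a novel cache coherence protocol   Times cited: 11 (self-citations: 2; by others: 9)
This paper introduces JIAJIA, a shared virtual memory system based on a novel cache coherence protocol. Compared with other representative shared virtual memory systems, JIAJIA adopts a NUMA-based organization that can combine the physical address spaces of multiple machines into a larger shared virtual address space. In addition, JIAJIA implements a novel lock-based coherence protocol that maintains coherence through write notices attached to locks, thereby avoiding the storage overhead and system complexity that directories introduce in traditional directory protocols. Using some widely used …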
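
The idea of maintaining coherence through write notices attached to locks can be sketched roughly as follows. This is a toy model and not JIAJIA's actual protocol or API: the `Lock` and `Node` classes are hypothetical, pages are plain integers, no data is transferred, and write notices are never discarded.

```python
class Lock:
    """A lock that carries write notices: page id -> name of last writer."""

    def __init__(self):
        self.write_notices = {}


class Node:
    """A node with a set of locally valid pages (toy model, no page data)."""

    def __init__(self, name, cached_pages=()):
        self.name = name
        self.valid = set(cached_pages)  # pages with a valid local copy
        self.dirty = set()              # pages written since last release

    def acquire(self, lock):
        # Invalidate local copies of pages that another node modified
        # under this lock, as recorded in the lock's write notices.
        stale = {
            page
            for page, writer in lock.write_notices.items()
            if writer != self.name
        }
        self.valid -= stale

    def write(self, page):
        self.valid.add(page)
        self.dirty.add(page)

    def release(self, lock):
        # Attach a write notice for every page modified in the critical section.
        for page in self.dirty:
            lock.write_notices[page] = self.name
        self.dirty.clear()


if __name__ == "__main__":
    lock = Lock()
    a = Node("a")
    b = Node("b", cached_pages={1, 2, 3})
    a.acquire(lock)
    a.write(1)
    a.write(2)
    a.release(lock)
    b.acquire(lock)
    print(sorted(b.valid))
```

Because the information about modified pages travels with the lock, the next acquirer learns which of its cached pages are stale without consulting a separate directory, which is the property the abstract highlights.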

19.
In a distributed shared memory (DSM) multiprocessor, the processors cooperate in solving a parallel application by accessing the shared memory. The latency of a memory access depends on several factors, including the distance to the nearest valid data copy, data sharing conditions, and traffic of other processors. To provide a better understanding of DSM performance and to support application tuning and compiler development for DSM systems, this paper extends microbenchmarking techniques to characterize the important aspects of a DSM system. We present an experiment-based methodology for characterizing the memory, communication, scheduling, and synchronization performance, and apply it to the Convex SPP1000. We present carefully designed microbenchmarks to characterize the performance of the local and remote memory, producer-consumer communication involving two or more processors, and the effects on performance when multiple processors contend for utilization of the distributed memory and the interconnection network.
