首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
随着数据与系统规模的不断扩大,网络传输成为了键值存储系统的性能瓶颈。同时,远程直接内存访问(RDMA)技术能够支持高带宽和低时延的数据传输,为键值存储系统设计提供了新的思路。结合高性能网络中的RDMA技术,设计并实现了高性能、低CPU负载的键值存储系统Chequer;结合RDMA原语的特性,重新设计了键值存储系统的基本操作工作流程;并设计了基于线性探测的共享hash表,解决客户端缓存失效的问题以及提高hash命中率来减少客户端的读取轮数,进一步提高了系统的性能。在小规模集群上实现了Chequer系统,并通过实验验证了其性能。  相似文献   

2.
相比于传统内存,持久性内存具有容量大和非易失的特点,这为构建大规模键值存储系统提供了新的机遇.然而,在多核服务器架构下设计持久性内存键值系统面临着诸多挑战,包括并发控制带来的CPU缓存抖动、对持久性内存有限写带宽的消耗和竞争以及持久性内存高延迟带来的线程冲突加剧.提出一种多核友好的持久性内存键值系统(multicore-friendly persistent memory key-value store,MPKV),通过设计高效并发控制方法和减少对持久性内存的写操作,充分提高多核并发性能.为避免锁资源带来的额外持久性内存写带宽消耗,MPKV引入了易失性锁管理机制,将写锁资源从索引中分离,在DRAM(dynamic RAM)中单独维护它们.为保证崩溃一致性和提高并发查询性能,MPKV引入了2阶段原子写机制,利用CPU提供的原子写操作指令将系统从一个一致性状态原子地切换到另一个一致性状态,并支持了无锁查询.基于易失性锁管理机制,MPKV还提出一种并发写消除机制,以提高更新操作之间的并发效率.当出现2个冲突的更新操作时,并发写消除机制让其中一个操作直接返回,不做任何持久性内存的分配与写操作.实验显示,MPKV相比于pmemkv具有更良好的性能以及多核扩展性.其中,在18线程环境下,MPKV的吞吐达到pmemkv的1.7~6.2倍.  相似文献   

3.
This paper presents the design and implementation of a protocol offload engine that processes TCP/IP and remote direct memory access (RDMA) protocols by means of hardware/software coprocessing. In the offload engine, time-consuming operations such as TCP/IP header generation are implemented as hardware to improve performance. The software performs control operations and RDMA header generation. In the experiments and analyses, it is proved that the hardware can provide satisfactory performance to process all operations at speeds of over 1 Gbps. Our engine can offload most protocol processing overheads – up to 95% to 100% – from the host CPU. Finally, although the embedded processors operate with a 300 MHz clock that is seven times slower than the clock of the host CPU, our engine shows maximum bandwidths of 673 Mbps for TCP/IP and 551 Mbps for RDMA on a gigabit Ethernet network.  相似文献   

4.
董豪宇  陈康 《计算机应用》2005,40(9):2577-2585
针对在使用高速存储硬件时常规网络文件系统会被软件开销影响整体性能的问题,提出了利用存储性能开发套件(SPDK)搭建文件系统的方法,并在此基础上实现了一个网络文件系统RUFS的原型。该系统通过键值存储模拟文件系统的目录树结构以及对文件系统的元数据进行管理,通过SPDK存储文件的内容。另外,利用远程直接内存访问(RDMA)技术对外提供文件系统服务。RUFS相较于NFS+ext4,在4 KB随机访问上,读写吞吐性能分别提高了202.2%和738.9%,读写平均延迟分别降低了74.4%和97.2%;在4 MB顺序访问上,读写吞吐性能分别提高了153.1%和44.0%。在大部分元数据操作上,RUFS相比NFS+ext4也有显著优势,特别是文件夹创建操作,RUFS的吞吐性能提高了约5 693.8%。该系统能够充分发挥高速网络和高速存储设备的性能优势,为用户提供延时更低、吞吐性能更好的文件系统服务。  相似文献   

5.
大规模非结构化数据的爆炸式增长给传统关系型数据库带来了极大的挑战.基于日志结构合并树(log-structured merge tree,LSM-tree)的键值存储系统已被广泛应用,并起到重要的作用,原因在于基于LSM-tree的键值存储能够将随机写转化为顺序写,从而提升性能.然而,LSM-tree键值存储也存在一些...  相似文献   

6.
董豪宇  陈康 《计算机应用》2020,40(9):2577-2585
针对在使用高速存储硬件时常规网络文件系统会被软件开销影响整体性能的问题,提出了利用存储性能开发套件(SPDK)搭建文件系统的方法,并在此基础上实现了一个网络文件系统RUFS的原型。该系统通过键值存储模拟文件系统的目录树结构以及对文件系统的元数据进行管理,通过SPDK存储文件的内容。另外,利用远程直接内存访问(RDMA)技术对外提供文件系统服务。RUFS相较于NFS+ext4,在4 KB随机访问上,读写吞吐性能分别提高了202.2%和738.9%,读写平均延迟分别降低了74.4%和97.2%;在4 MB顺序访问上,读写吞吐性能分别提高了153.1%和44.0%。在大部分元数据操作上,RUFS相比NFS+ext4也有显著优势,特别是文件夹创建操作,RUFS的吞吐性能提高了约5 693.8%。该系统能够充分发挥高速网络和高速存储设备的性能优势,为用户提供延时更低、吞吐性能更好的文件系统服务。  相似文献   

7.
分布式键值存储将数据复制到多个存储服务器的本地引擎中,并通过一致性协议保证各副本数据的一致性。其中,以日志结构合并树为核心数据结构的实现方式最为常见。然而,面向通用业务模式设计的日志结构合并树,并不适合一致性逻辑的特殊业务模式,会引发增删改性能的降低,并在全量修复过程中造成空间放大。针对上述问题,该文提出了一种新型本地引擎 PheonixLSM,通过增加增删改操作和回刷操作的约束,消除了分布式键值存储增删改流程中的双写问题,提升了引擎性能。通过重构日志结构合并树底层的 SST 文件布局,支持删除实时回收空间,消除了全量修复时的额外空间放大。实验结果显示,与原生本地引擎相比,使用 PheonixLSM 的分布式键值存储系统,增删改性能提升 90.7%,全量修复的空间放大从 65.6% 降至 6.4%,并减少了 72.3% 的修复时间。  相似文献   

8.
The ubiquity of location enabled devices has resulted in a wide proliferation of location based applications and services. To handle the growing scale, database management systems driving such location based services (LBS) must cope with high insert rates for location updates of millions of devices, while supporting efficient real-time analysis on latest location. Traditional DBMSs, equipped with multi-dimensional index structures, can efficiently handle spatio-temporal data. However, popular open-source relational database systems are overwhelmed by the high insertion rates, real-time querying requirements, and terabytes of data that these systems must handle. On the other hand, key-value stores can effectively support large scale operation, but do not natively provide multi-attribute accesses needed to support the rich querying functionality essential for the LBSs. We present the design and implementation of $\mathcal {MD}$ -HBase, a scalable data management infrastructure for LBSs that bridges this gap between scale and functionality. Our approach leverages a multi-dimensional index structure layered over a key-value store. The underlying key-value store allows the system to sustain high insert throughput and large data volumes, while ensuring fault-tolerance, and high availability. On the other hand, the index layer allows efficient multi-dimensional query processing. Our optimized query processing technique accesses only the index and storage level entries that intersect with the query region, thus ensuring efficient query processing. We present the design of $\mathcal {MD}$ -HBase that demonstrates how two standard index structures—the K-d tree and the Quad tree—can be layered over a range partitioned key-value store to provide scalable multi-dimensional data infrastructure. Our prototype implementation using HBase, a standard open-source key-value store, can handle hundreds of thousands of inserts per second using a modest 16 node cluster, while efficiently processing multi-dimensional range queries and nearest neighbor queries in real-time with response times as low as few hundreds of milliseconds.  相似文献   

9.
持久性内存技术与远程直接内存访问(remote direct memory access,RDMA)技术的发展,为高效分布式系统的设计提供了新的思路.然而,现有的基于RDMA的分布式系统没有充分利用RDMA的多播能力,难以解决1对多传输场景下的多拷贝文件数据传输问题,严重影响了系统性能.针对此问题,提出一种基于RDMA多播机制的分布式持久性内存文件系统(RDMA multicast transmission based distributed persistent memory file system,MTFS),通过低延迟多播通信机制充分利用RDMA多播能力,将数据高效传输到多个数据节点,从而避免了多拷贝传输操作带来的高延迟.为提升传输操作灵活性,MTFS设计了多模式多播远程过程调用(remote procedure call,RPC)机制,实现了RPC请求自适应识别,并通过优化返回机制将部分传输操作移出关键路径,进一步提升传输效率.同时MTFS提供了轻量级一致性保障机制,通过设计故障恢复功能、数据校验系统、重传策略与窗口机制,当节点出现崩溃时进行快速恢复,并在传输出现错误时实现数据精准检测与纠正,保证了数据的可靠性和一致性.实验证明,MTFS在各测试集上相比现有系统GlusterFS吞吐量提升了10.2~219倍.在Redis数据库的工作负载下,MTFS相比于NOVA取得了最高10.7%的性能提升,并在多线程测试中取得了良好的可扩展性.  相似文献   

10.
RDMA网络具有高带宽,低延时,低CPU负载的特点,广泛应用于数据密集型任务中,例如深度学习,高性能计算,数据分析等.RDMA的实现需要软硬件支持,在云环境下,RDMA虚拟化方案有助于多用户共享RDMA网络传输的高性能,同时实现对RDMA网络的统一管理和控制.本文调研了近年来的RDMA虚拟化解决方案,覆盖了虚拟机和容器环境;然后将这些解决方案进行分类和比较;最后,对RDMA虚拟化中存在的问题和未来的发展做出了总结和展望.  相似文献   

11.
针对开放式远程实验平台的高并发、实时性、可靠性和安全性需求,设计并实现了一种高并发访问的远程实验通信方案。该方案能够实时连接大规模传感器网络,具有实验仪器与用户并发访问、实时可靠消息传输、网络安全控制和仪器安全保障功能,能有效提高远程实验的通信性能。在模拟大规模用户实时并发访问的情况下进行性能测试,测试结果表明该方案不仅能满足实验通信的并发需求,还能有效确保消息的实时可靠传输和安全控制,具有较高的应用价值。  相似文献   

12.
本文描述了一个基于PCI-X总线高速通讯卡的通讯软件接口实现技术,该接口通过虚拟硬件资源,实现了保护的用户级通讯操作,提供报文传输和RDMA两种数据传输方式,实现进程间数据的零拷贝传输,同时基于该接口还支持IP报文的传输。测试中该接口在单链路上实现501MB/s,双链路上实现1002MB/s的带宽,在基于Socket接口的测试中,实现了384MB/s的通讯带宽。  相似文献   

13.
An elastic and highly available data store is a key component of many cloud applications. Existing data stores with strong consistency guarantees are designed and optimized for small updates, key-value access, and (if supported) small range queries over a predefined key column. This raises performance and availability problems for applications which inherently require large updates, non-key access, and large range queries. This paper presents a solution to these problems: Crescando/RB; a distributed, scan-based, main memory, relational data store (single table) with robust performance and high availability. The system addresses a real, large-scale industry use case: the Amadeus travel management system. This paper focuses on the distribution layer of Crescando/RB, the problem and theory behind it, the rationale underlying key design decisions, and the novel multicast protocol and replication framework it is composed of. Highlighting the key features of the distribution layer, we present experimental results showing that even under permanent node failures and large-scale data repartitioning, Crescando/RB remains fully available and capable of sustaining a heavy query and update load.  相似文献   

14.
For its low-latency, high bandwidth, and low CPU utilization, Remote Direct Memory Access (RDMA) has established itself as an effective data movement technology in many networking environments. However, the transport protocols of grid run-time systems, such as GridFTP in Globus, are not yet capable of utilizing RDMA. In this study, we examine the architecture of GridFTP for the feasibility of enabling RDMA. An RDMA-capable XIO (RXIO) framework is designed and implemented to extend its XIO system and match the characteristics of RDMA. Our experimental results demonstrate that RDMA can significantly improve the performance of GridFTP, reducing the latency by 32% and increasing the bandwidth by more than three times. In achieving such performance improvements, RDMA dramatically cuts down CPU utilization of GridFTP clients and servers. These results demonstrate that RXIO can effectively exploit the benefits of RDMA for GridFTP. It offers a good prototype to further leverage GridFTP on wide-area RDMA networks.  相似文献   

15.
内存键值存储系统中索引方法决定了系统的时间性能和空间开销,是改进和优化的关键因素。哈希索引提供了O(1)时间复杂度的访问操作,但会产生存储冲突,引起访问性能下降。为此,提出了一种基于位图的键值存储哈希优化方法,可以避免存储冲突提升访问性能。该方法将共前缀的键哈希到同一个块,减少键存储空间;在块内使用层次位图结构,全域位图表示所有键的后缀部分来避免存储冲突,摘要位图支持快速定位和范围查询加速。实验结果表明,优化后的哈希索引在多种负载上均能取得较高吞吐量并具有良好的并发性能,同时内存占用较现有方案大大降低。  相似文献   

16.
High Performance RDMA-Based MPI Implementation over InfiniBand   总被引:5,自引:0,他引:5  
Although InfiniBand Architecture is relatively new in the high performance computing area, it offers many features which help us to improve the performance of communication subsystems. One of these features is Remote Direct Memory Access (RDMA) operations. In this paper, we propose a new design of MPI over InfiniBand which brings the benefit of RDMA to not only large messages, but also small and control messages. We also achieve better scalability by exploiting application communication pattern and combining send/receive operations with RDMA operations. Our RDMA-based MPI implementation achieves a latency of 6.8 sec for small messages and a peak bandwidth of 871 million bytes/sec. Performance evaluation shows that for small messages, our RDMA-based design can reduce the latency by 24%, increase the bandwidth by over 104%, and reduce the host overhead by up to 22% compared with the original design. For large data transfers, we improve performance by reducing the time for transferring control messages. We have also shown that our new design is beneficial to MPI collective communication and NAS Parallel Benchmarks.  相似文献   

17.
Android应用的开发中对存储的访问非常频繁,但是Android各个版本对存储的支持比较混乱,部分版本甚至没有公开的API支持对扩展存储访问.对Android的内置、外置存储设备进行研究后,提出将存储分为内部存储、外部存储及扩展存储三个类型.分析了各个存储的类型的特点及访问方式,重点讨论了扩展存储的访问方式,提出利用JAVA的反射机制来获取Android平台的扩展存储目录,解决了Android不同版本对存储进行访问的兼容性问题.通过分析工具分析了反射机制在此应用中的效率问题,并在不同Android版本的设备上进行了测试.  相似文献   

18.
CCL (checkpointing and communication library) is a software layer in support of optimistic parallel discrete event simulation (PDES) on myrinet-based COTS clusters. Beyond classical low latency message delivery functionalities, this library implements CPU offloaded, non-blocking (asynchronous) checkpointing functionalities based on data transfer capabilities provided by a programmable DMA engine on board of myrinet network cards. These functionalities are unique since optimistic simulation systems conventionally rely on checkpointing implemented as a synchronous, CPU-based data copy. Releases of CCL up to v2.4 only support monoprogrammed non-blocking checkpoints. This forces re-synchronization between CPU and DMA activities, which is a potential source of overhead, each time a new checkpoint request must be issued at the simulation application level while the last issued one is still being carried out by the DMA engine. In this paper we present a redesigned release of CCL (v3.0) that, exploiting hardware capabilities of more advanced myrinet clusters, supports multiprogrammed non-blocking checkpoints. The multiprogrammed approach allows higher degree of concurrency between checkpointing and other simulation specific operations carried out by the CPU, with benefits on performance. We also report the results of the experimental evaluation of those benefits for the case of a Personal Communication System (PCS) simulation application, selected as a real world test-bed.  相似文献   

19.
由于分层结构的约束,基于日志结构合并(LSM)树的RocksDB键值存储系统面临着读取性能低下的问题。一种有效的解决方法是对热点数据进行主动缓存,但其面临两个挑战:一是如何在数据分布持续动态变化时对热点数据进行预测,二是如何将主动缓存机制与RocksDB存储结构衔接起来。针对这些挑战,基于预测分析技术,构建了由数据采集、系统交互、系统测试等部分组成的面向RocksDB键值系统的主动缓存框架,能够将热点数据缓存在LSM树的较低层级中;并对数据访问模式进行建模,设计并实现了基于增量学习的热点数据预测分析方法,能够有效减少存储介质的I/O访问次数。实验结果表明该机制能有效提升RocksDB在不同动态工作负载下的数据读取性能。  相似文献   

20.
With the advent of the era of cloud computing and big data, in order to cope with vast amounts of data, a number of key-value databases have emerged. These systems provide the ability of large scale data storage and effective data operations based on primary keys, but they do not efficiently support the range and k-Nearest Neighbor (kNN) queries on multi-dimensional datasets. In this paper, we introduce, SPIKE, a sliced Pyramid-based index system for key-value data stores. SPIKE bridges the gap between the data scale and querying functionality for highly available, scalable distributed key-value data stores. We first present SP-Index, the kernel indexing scheme. The SP-Index is designed as a two-level index mechanism consisting of a sliced pyramid space partition index and a distributed B-Tree index. On the basis of SP-Index, we have designed and implemented SPIKE on Cassandra, which provides efficient multi-dimensional complex query processing. We have conducted a set of comprehensive experiments with three types of datasets including synthetic datasets, TPC-H benchmark datasets and a real-world dataset. The experiment results show that SPIKE can efficiently handle multi-dimensional complex queries on large-scale key-value datasets. Evaluation results in comparison with existing systems demonstrates that SPIKE outperforms the comparing work including the original Pyramid, MySQL Cluster and CCIndex by dozens of times in complex query processing.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号