Similar literature
1.
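With the emergence of new non-volatile storage media, the overhead of the software I/O stack has become the performance bottleneck of storage systems. This paper first describes in detail the software layers of the traditional disk-based I/O stack and the general path a request takes through it. After analyzing the problems the traditional I/O stack faces in storage systems built from new non-volatile media such as flash and phase change memory (PCM), it gives a detailed introduction to NVMe, a high-performance host controller interface designed specifically for PCIe solid state drives (SSDs), and to the I/O stack and request flow built on that interface. Finally, for next-generation storage media such as PCM, resistive random access memory (RRAM), and spin-transfer torque magnetic random access memory (STT-MRAM), it analyzes in detail the performance problems the I/O stack introduces in areas such as interrupt usage and file system permission checking, and points out issues that future I/O stack designs must consider.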

2.
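A Survey of Storage Systems and Technologies Based on Phase Change Memory   Cited by: 2 (self-citations: 0, citations by others: 2)
As the performance gap between processors and memory keeps widening, the "memory wall" problem is becoming increasingly prominent. Yet the integration density of conventional DRAM devices is approaching its limit and energy consumption has also become a bottleneck, so designing a sound and effective storage architecture to address the memory wall is a challenge that must be faced. In recent years, new storage devices represented by phase change memory (PCM) have attracted wide attention from researchers in China and abroad because of their high integration density and low power consumption. In particular, because PCM is non-volatile and byte-addressable, it has the characteristics of both main memory and external storage; under its influence the boundary between main memory and external storage is blurring, which will bring major changes to future storage architectures. This survey focuses on architectures that build main memory from PCM and analyzes the key issues involved, including write optimization, wear leveling, hardware error correction, bad block reuse, and software optimization. It then discusses research on applying PCM in external storage systems and its impact on external storage architecture and system design. Finally, it gives an outlook on research into PCM applications in storage systems.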

3.
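Xiong Anping, Liu Jinjin, Zou Yang. Computer Engineering and Design (《计算机工程与设计》), 2012, 33(7): 2678-2682, 2689
Object storage file systems split large data files into pieces stored across multiple storage nodes to obtain better parallel I/O performance and raise system throughput. The storage strategies of existing object storage file systems do not fully account for dynamic changes in the load of the storage objects themselves, which hinders improving system resource utilization. To address this, the paper takes into account real-time changes in the space and I/O load of storage objects, proposes a simple, flexible, and efficient load-balancing storage strategy, and implements it. Experimental results show that the strategy effectively improves the resource utilization and throughput of the object storage system and ensures efficient read/write performance for the object storage file system.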

4.
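Computational fluid dynamics (CFD) is one of the important application areas of high-performance computing, and its computations involve a large amount of data access. In large-scale parallel computing, the performance of serial I/O does not match the computing capability, and I/O becomes the performance bottleneck. Parallel I/O is one of the main ways to solve this problem. For HOSTA (high-order simulator for aerodynamics), a real multi-block structured-grid parallel CFD program, parallel I/O of its main data was implemented based on the HDF5 (hierarchical data format v5) storage format and its parallel I/O programming interface. Performance was tested with real CFD cases on a high-performance computer system with six I/O server nodes. For a delta wing case, parallel I/O achieved a speedup of 21.27 over serial I/O and a peak I/O throughput of 5.81 GBps, and improved overall program performance by more than 10%; for a simple airfoil case with a larger grid, parallel I/O reached a peak I/O throughput of 6.72 GBps.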

5.
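This paper first studies how to provide fair and effective I/O service to each client in a distributed cluster storage system, and then proposes a distributed I/O congestion control strategy for large-scale cluster file systems. Under congestion control, when the server is lightly loaded, a single client can issue more I/O requests to the server in parallel, maximizing the utilization of network and server resources as well as I/O throughput; when the server is heavily loaded, a throttling mechanism is used to limit...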

6.
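A Survey of Storage Technologies Based on Phase Change Memory   Cited by: 1 (self-citations: 0, citations by others: 1)
Data-centric big data technology brings both opportunities and challenges to computer storage systems. Conventional memory based on dynamic random access memory (DRAM) devices faces problems such as system stability and data reliability as process dimensions shrink to 2X nm and below. Phase change memory (PCM) offers non-volatility, high storage density, low power consumption, and resistance to radiation interference, with read/write performance close to that of DRAM; it is the non-volatile memory most likely to replace DRAM in the future and provides new solutions for the research and design of storage systems. Building on a summary of the development and research status of PCM devices, this paper compares and analyzes the ways PCM is applied at the system level and the problems it faces, studies PCM-based main memory and external storage technologies, analyzes the solutions proposed so far for PCM lifetime, write performance, latency, and power consumption, points out the strengths and shortcomings of existing solutions, and discusses future research directions, providing a reference for the field's further development.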

7.
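With advances in semiconductor technology, the on-chip caches integrated into processors keep growing, and the leakage power of conventional memory devices is becoming increasingly severe; designing energy-efficient on-chip memory architectures has become an important challenge. To address these problems, researchers in China and abroad have explored many new non-volatile memory technologies, which offer desirable properties such as non-volatility, low power consumption, and high storage density. To explore how caches can be built from four new non-volatile memory (NVM) technologies, namely spin-transfer torque RAM (STT-RAM), phase change memory (PCM), resistive RAM (RRAM), and domain-wall memory (DWM), this paper compares their physical characteristics with those of conventional memory devices, discusses the advantages, disadvantages, and applicability of building caches from them, classifies and summarizes the optimization methods and strategies for such caches, and analyzes the key optimization techniques that target the high write power, limited write endurance, and long write latency of new NVM. Finally, it discusses possible research directions for new NVM devices in future cache optimization.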

8.
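Computer Engineering (《计算机工程》), 2017, (1): 1-7
I/O-intensive virtual machines need to perform inter-domain communication frequently. To address the low efficiency and high latency of existing inter-domain communication between virtual machines, an optimization method for communication between the user domain and the driver domain based on dual ring buffers is proposed. Dual ring buffers shared with the driver domain are set up in the user domain, and the virtual machine monitor controls the driver domain's access rights according to an I/O task table, reducing the overhead of processor mode switches and memory mapping. Experimental results show that, compared with the original inter-domain communication mechanism, the mechanism using this optimization has higher throughput and lower latency, substantially improving inter-domain communication performance between the user domain and the driver domain.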
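As a rough illustration of the dual ring buffer idea in this abstract, the sketch below pairs two fixed-size rings, one carrying requests from the user domain to the driver domain and one carrying responses back. It is a single-process model only: the class names `RingBuffer` and `DualRingChannel`, the `serve_pending` helper, and the one-ring-per-direction layout are assumptions made for illustration, and the shared memory mapping, the I/O task table, and the access control performed by the virtual machine monitor are not modeled.

```python
from typing import Any, Callable, List, Optional


class RingBuffer:
    """Fixed-size FIFO ring; a hypothetical stand-in for one shared ring."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self._slots: List[Any] = [None] * capacity
        self._head = 0  # count of items consumed so far
        self._tail = 0  # count of items produced so far

    def __len__(self) -> int:
        return self._tail - self._head

    def is_full(self) -> bool:
        return len(self) == len(self._slots)

    def push(self, item: Any) -> bool:
        """Store an item; return False (and store nothing) if the ring is full."""
        if self.is_full():
            return False
        self._slots[self._tail % len(self._slots)] = item
        self._tail += 1
        return True

    def pop(self) -> Optional[Any]:
        """Return the oldest item, or None if the ring is empty."""
        if len(self) == 0:
            return None
        item = self._slots[self._head % len(self._slots)]
        self._head += 1
        return item


class DualRingChannel:
    """Assumed layout: one ring per direction between the two domains."""

    def __init__(self, capacity: int) -> None:
        self.requests = RingBuffer(capacity)   # user domain -> driver domain
        self.responses = RingBuffer(capacity)  # driver domain -> user domain

    def serve_pending(self, handler: Callable[[Any], Any]) -> int:
        """Driver-side step: answer queued requests until either ring blocks."""
        served = 0
        while len(self.requests) > 0 and not self.responses.is_full():
            self.responses.push(handler(self.requests.pop()))
            served += 1
        return served
```

Keeping a separate ring per direction means each ring has exactly one producer and one consumer, which is what usually lets real shared-memory rings avoid locks; whether the paper's design works this way is not stated in the abstract.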

9.
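In recent years researchers have studied parallel I/O in high-performance computing in depth, but this work mainly targets MPP systems, and there has been little research on parallel I/O in cluster computer systems. Studying parallel I/O systems for cluster computing is therefore an important research topic. This paper studies the efficiency of parallel I/O transfer scheduling in cluster computing and designs a file transfer scheduler that can achieve the fastest file transfer and the greatest use of node resources, and that significantly improves the throughput and response time of I/O nodes. Tests and experiments on large amounts of data demonstrate the scheduler's effectiveness and applicability.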

10.
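Research on Parallel Access Methods for Geographic Raster Data   Cited by: 1 (self-citations: 0, citations by others: 1)
In processing massive geographic raster data, data I/O performance is key to the overall performance of processing programs. Research results on I/O optimization for geographic raster data are still very limited. Through an in-depth analysis of data I/O patterns in parallel programs, combined with the characteristics of the logical and physical models of raster data, a parallel I/O framework for geographic raster data is proposed; based on the message passing model, four parallel access methods are implemented. Experiments show that the parallel access methods outperform the traditional serial access method and the time-sharing multi-process access method. The results can improve the I/O access efficiency of parallel raster processing programs and thereby their overall parallel performance.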

11.
The emerging Phase Change Memory (PCM) is considered one of the most promising candidates to replace DRAM as main memory due to its better scalability and non-volatility. With multi-bit storage capability, Multiple-Level-Cell (MLC) PCM outperforms Single-Level-Cell (SLC) PCM in density. However, high write latency has been a performance bottleneck for MLC PCM for two reasons: first, MLC PCM has a much longer programming time; second, the write latencies of different cell state transitions vary significantly. When cells are written concurrently in burst mode, the write latency of a burst is determined by its worst-case state transitions. To improve the write throughput of MLC PCM based main memory, this paper proposes a Write Reconstruction (WR) scheme. WR reconstructs multiple burst writes targeting the same memory row so that the worst-case cells are grouped together in some of the writes. With this approach, the write latency of the other writes is reduced. WR incurs low implementation overhead and shows significant efficiency. Experimental results show that WR reduces write latency by 18.1% on average, with negligible power overhead.
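The effect described in this abstract, where a burst is only as fast as its slowest cell, can be illustrated with a toy model. The sketch below is not the paper's WR algorithm: it assumes each cell write has a known latency in arbitrary units, that a burst costs the maximum latency of its cells, and that cells may be regrouped freely across bursts targeting the same row, which ignores any placement constraints a real memory controller would have. The function names are hypothetical.

```python
from typing import List


def total_latency(bursts: List[List[int]]) -> int:
    """Sum of burst latencies, where each burst costs its slowest cell."""
    return sum(max(burst) for burst in bursts)


def regroup_by_latency(bursts: List[List[int]]) -> List[List[int]]:
    """Toy regrouping: pack the slowest cells into the same bursts.

    Sorting every cell latency in descending order and cutting the list
    into bursts of the original size concentrates worst-case cells in the
    first bursts, so the remaining bursts contain only fast cells.
    """
    if not bursts:
        return []
    size = len(bursts[0])
    if size == 0 or any(len(burst) != size for burst in bursts):
        raise ValueError("all bursts must have the same non-zero size")
    cells = sorted((c for burst in bursts for c in burst), reverse=True)
    return [cells[i:i + size] for i in range(0, len(cells), size)]


# Four bursts of four cells; each contains one slow transition (latency 4).
original = [[4, 1, 1, 1], [1, 4, 1, 1], [1, 1, 4, 1], [1, 1, 1, 4]]
regrouped = regroup_by_latency(original)
```

In this example every original burst is held back by one slow cell, so the four bursts cost 4 + 4 + 4 + 4 = 16 units, while the regrouped bursts cost 4 + 1 + 1 + 1 = 7 units. The numbers are made up for the illustration and say nothing about the 18.1% figure reported in the abstract.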

12.
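Traditional network file systems struggle to meet the I/O demands of high-performance computing systems. The parallel network file system PNFS can effectively address the scalability, availability, and performance problems of traditional network file systems. The paper first designs the PNFS architecture, separating the metadata server from the storage servers and eliminating the I/O bottleneck caused by a centralized server structure. It then tests the performance of a PNFS prototype and compares the results with those of NFS in the same environment. The results show that PNFS gives clients the ability to access file data in parallel, with higher I/O read/write bandwidth and lower access latency, and achieves linear scaling of client I/O bandwidth with the number of storage servers, so it can better meet the I/O demands of high-performance computing.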

13.
This paper presents further results on the design and implementation of various optimizations based on our earlier work of developing a parallel pipelined model for computationally intensive applications that have multiple processing tasks. Performance evaluation of this model was done by using a real-time airborne radar application that employs a Space-Time Adaptive Processing (STAP) algorithm. This paper focuses on the following four issues: (1) The tradeoffs between increasing the throughput and reducing the latency are examined in more detail when allocating processors among different processing tasks. (2) A multi-threaded design is incorporated into the pipeline model and implemented on a massively parallel computer with symmetric multi-processor nodes, which shows enhanced performance. (3) The disk I/O is incorporated into the parallel pipeline to study its effect on performance, for which two I/O task designs have been implemented: embedding I/O in the pipeline or having a separate I/O task. By using a double buffering approach together with asynchronous I/O, the overall pipeline performance scales well as the number of processors increases. (4) From the comparison of the two I/O implementations, it is discovered that the latency may be improved when merging multiple tasks into a single task. The effect of reorganizing the task structure of the pipeline is discussed in detail. All the performance results shown in this work demonstrate the linear scalability the parallel pipeline model can achieve using a production radar application. Although this paper focuses on the implementation of the parallel pipeline model and uses the results from a STAP application to support the claims of the discovered properties for this pipeline, this model is also applicable to many other types of applications with similar computational characteristics.

14.
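Ye Xiaobin, Yang Shuqiang. Computer Engineering (《计算机工程》), 2000, 26(3): 57-58, 76
Parallel I/O is one of the effective ways to improve the performance of parallel database systems based on a shared-nothing architecture. It provides high-bandwidth I/O through parallel disk service and parallelized network transfer. This paper designs and implements parallel I/O for a shared-nothing parallel database system and discusses several key issues in designing parallel I/O and the techniques used to implement it.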

15.
Previous studies indicate that I/O could become a performance bottleneck in commodity PC-based cluster Web servers. Current local native file systems do not work well for expensive file I/Os, while specialized file systems are limited in portability. In this paper, we present a lightweight, collaborative temporary file system (CTFS) to improve disk I/O performance for clustered Web servers. CTFS employs several techniques to achieve high performance, good scalability and portability: (1) a lightweight local temporal file system at each node, (2) using Remote Direct Memory Access (RDMA) to improve intra-cluster communication performance, and (3) a location-aware summary cache for scalable file-to-server lookup. Comprehensive trace-driven simulation experiments show that CTFS achieves up to 37% better system throughput and reduces total disk I/O latency by up to 47% compared with a local asynchronous FFS solution.

16.
Network contention hotspots can limit network throughput for parallel disk I/O, even when the interconnection network appears to be sufficiently provisioned. We studied I/O hotspots in mesh networks as a function of the spatial layout of an application's compute nodes relative to the I/O nodes. Our analytical modeling and dynamic simulations show that when I/O nodes are configured on one side of a two-dimensional mesh, realizable I/O throughput is at best bounded by four times the network bandwidth per link. Maximal performance depends on the spatial layout of jobs, and cannot be further improved by adding I/O nodes. Applying these results, we devised a new parallel layout allocation strategy (PLAS) which minimizes I/O hotspots, and approaches the theoretical best case for parallel I/O throughput. Our I/O performance analysis and processor allocation strategy are applicable to a wide range of contemporary and emerging high-performance computing systems.

17.
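A paravirtualized network model is proposed to optimize the performance of inter-domain communication between virtual machines. A communication channel is established through shared memory to break the isolation barrier between virtual machines and reduce the number of copies during data transfer. An implementation based on the programming interface of the kernel-based virtual machine (KVM) paravirtualization framework can simplify the emulation of device I/O and reduce the root/non-root mode switches required for emulating privileged instructions, ...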

18.
A scalable backplane topology which allows a practically unlimited number of modules with identical interfaces is presented. Short, buffered, point-to-point connections overcome clock skew problems. Synchronized, pipelined data transfer operations ensure high throughput and reasonably low latency times for fine-grain parallel algorithms. A simple bus interface logic without any special hardware configuration guarantees a cheap implementation with standard FPGAs. The measured performance in our FPGA based prototype with 32 bit wide data bus shows a throughput of 160 Mbytes/s for each module with 75 ns latency time between modules.
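The figures quoted in this abstract can be related with a little arithmetic. The sketch below assumes that "Mbytes" means 10^6 bytes and that each transfer moves exactly one 32-bit word; neither assumption is stated in the abstract, and the implied transfer rate is an inference rather than a reported clock frequency. The function names are hypothetical.

```python
BUS_WIDTH_BITS = 32              # data bus width from the abstract
THROUGHPUT_BYTES_PER_S = 160e6   # 160 Mbytes/s, assuming 1 Mbyte = 10**6 bytes
LATENCY_NS = 75.0                # module-to-module latency from the abstract


def transfers_per_second(throughput_bytes_per_s: float, bus_width_bits: int) -> float:
    """Word transfers per second if every transfer moves one full bus word."""
    bytes_per_transfer = bus_width_bits // 8
    return throughput_bytes_per_s / bytes_per_transfer


def transfer_period_ns(throughput_bytes_per_s: float, bus_width_bits: int) -> float:
    """Time per word transfer in nanoseconds under the same assumption."""
    return 1e9 / transfers_per_second(throughput_bytes_per_s, bus_width_bits)


rate = transfers_per_second(THROUGHPUT_BYTES_PER_S, BUS_WIDTH_BITS)
period = transfer_period_ns(THROUGHPUT_BYTES_PER_S, BUS_WIDTH_BITS)
latency_in_transfers = LATENCY_NS / period
```

Under these assumptions the bus moves 40 million words per second, one word every 25 ns, so the 75 ns latency between modules corresponds to three word-transfer periods.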

19.
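Parallel I/O systems have multiple access modes, each with its own access characteristics and range of applicability. To obtain system performance under different modes, parallel I/O testing often has to combine several micro-benchmarks. This requires users not only to understand the characteristics of parallel I/O in depth but also to be familiar with the inputs and outputs of the various parallel I/O micro-benchmarks. This paper proposes and implements Jetter, a parallel I/O test tool. It classifies parallel I/O interfaces by interface type, access mode, and process-file relationship; it can test the performance of an I/O system under these modes and also simplifies the testing work. Practical use of Jetter shows that parallel I/O systems support different modes unevenly, with differences of more than two orders of magnitude at most; these test conclusions help users develop high-quality parallel programs.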
