Similar Documents
20 similar documents found (search time: 15 ms)
1.
Phase change memory (PCM), with its non-volatility, fast read speed, and low static power, has become a focus of main memory research. However, the current lack of available PCM devices means that PCM-based algorithm research cannot be effectively validated. This paper therefore proposes simulating and validating PCM algorithms with a main memory simulator. We first survey the characteristics of existing main memory simulators and show that they cannot fully meet the practical needs of current main memory research; on this basis, we design and build a hybrid main memory simulator based on DRAM and PCM. Experimental comparison with existing simulators shows that the proposed hybrid simulator effectively models a hybrid DRAM/PCM memory architecture, supports simulation of different forms of hybrid main memory systems, and is highly configurable. Finally, a usage example demonstrates the ease of use of the simulator's programming interface.

2.
In this paper, we describe and evaluate three possible architectures for using 3D-DRAMs and PCMs in the processor memory hierarchy. We explore: (i) using 3D-DRAM as main memory with PCM as backing store; (ii) using 3D-DRAM as the last level cache and PCM as the main memory; and (iii) using both 3D-DRAM and PCM as main memory. In each of these configurations, since the proposed memories are significantly faster than today's off-chip 2D DRAMs for main memory and magnetic hard drives for secondary storage, we introduce hardware assistance to speed up virtual-to-physical address translation. We use Simics, a full-system simulator, and benchmarks from both SPEC and OLTP suites to evaluate our designs. We use CACTI to obtain energy and latency values for our configurations, and we measure energy consumed and execution performance for the selected benchmarks. Our studies lead to the following conclusions. The best performance is obtained when 3D-DRAM is used as the last level cache (LLC) and PCM as the main memory; however, this organization performs poorly in terms of energy consumed. Using 3D-DRAM together with PCM as main memory is the best choice in terms of energy consumed. In terms of write-backs, 3D-DRAM as LLC causes fewer writes to PCM than the other organization. These experiments can be extended to explore specific memory organizations, the 3D-DRAM capacity needed as LLC or main memory, and how hybrid PCM/DRAM memory should be used in specific application contexts.

3.
With the growth of big data applications, the volume of data to be processed is increasing rapidly. To process data in time and respond quickly to customers, enterprises are widely deploying in-memory computing systems such as Apache Spark. However, terabyte-scale memory not only raises server cost but also drives up power consumption. Because the power consumption and capacity density of DRAM are limited by process-technology bottlenecks and cannot satisfy the fast-growing memory demand of in-memory computing, researchers are turning to emerging non-volatile memory (NVM). Heterogeneous memory composed of DRAM and NVM offers low cost, low power, and high capacity density, but since NVM has worse read/write performance, how to place data appropriately across heterogeneous memory is a key research problem. This paper systematically analyzes the memory access characteristics of Spark applications and, drawing on the memory usage behavior of OpenJDK, proposes a programming framework that manages data placement between DRAM and NVM. By simply calling the interfaces provided here, application developers can place data appropriately in heterogeneous memory. With only 20%~25% of the DRAM plus a large amount of NVM, the framework achieves about 90% of the performance of an all-DRAM configuration of equal capacity, so it can exploit heterogeneous memory to meet the ever-growing scale of in-memory computing while improving the performance/price ratio severalfold over DRAM alone.
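To make the placement interface concrete, here is a minimal C sketch of the kind of hint-based allocation API such a framework could expose. All names (hm_alloc, hm_hint_t) are hypothetical; the paper's actual framework is built on OpenJDK interfaces and is not reproduced here.

```c
/* Hypothetical sketch of a hint-based heterogeneous-memory allocator.
 * Both pools are stand-ins backed by malloc; a real implementation
 * would draw from separately mapped DRAM and NVM regions. */
#include <stdio.h>
#include <stdlib.h>

typedef enum { HM_HOT, HM_COLD } hm_hint_t;   /* expected access intensity */

void *hm_alloc(size_t size, hm_hint_t hint)
{
    if (hint == HM_HOT)
        return malloc(size);      /* DRAM pool: fast, small, costly */
    return malloc(size);          /* NVM pool: slower, large, cheap */
}

int main(void)
{
    double *shuffle_buf = hm_alloc(1 << 20, HM_HOT);   /* hot: keep in DRAM */
    char   *cached_rdd  = hm_alloc(1 << 26, HM_COLD);  /* cold: park in NVM */
    printf("hot=%p cold=%p\n", (void *)shuffle_buf, (void *)cached_rdd);
    free(shuffle_buf);
    free(cached_rdd);
    return 0;
}
```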

4.
The problem of PCM lifetime maximization has been well studied. Non-volatile memory devices have emerged as a replacement for traditional DRAM, but they still have significant limitations in endurance and in the high power cost of write operations. A number of designs have been proposed to maximize PCM lifetime by caching main memory in the available DRAM, yet they fall short in reducing power consumption and increasing memory utilization. To improve power reduction and lifetime maximization, a categorical model is presented in this paper. The proposed method categorizes processes according to their memory access activity, and each categorized process is allocated to the part of the hybrid memory that encourages maximum reads and minimum writes in PCM. The proposed method increases PCM lifetime compared with other methods.
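A minimal sketch of the categorical idea, assuming an illustrative write-ratio threshold (the paper's actual categorization criteria are not reproduced): write-heavy processes are mapped to DRAM so that PCM sees mostly reads.

```c
/* Classify a process by observed read/write activity, then map
 * write-heavy working sets to DRAM and read-mostly ones to PCM. */
#include <stdio.h>

typedef enum { PLACE_DRAM, PLACE_PCM } placement_t;

placement_t categorize(unsigned long reads, unsigned long writes)
{
    /* Write-intensive processes go to DRAM to spare PCM endurance;
     * read-dominated processes can live in PCM. Threshold is assumed. */
    double write_ratio = (double)writes / (double)(reads + writes + 1);
    return (write_ratio > 0.25) ? PLACE_DRAM : PLACE_PCM;
}

int main(void)
{
    printf("%s\n", categorize(900, 100) == PLACE_PCM ? "PCM" : "DRAM");
    printf("%s\n", categorize(100, 900) == PLACE_PCM ? "PCM" : "DRAM");
    return 0;
}
```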

5.
With the emergence of big data applications, computer systems need larger memory capacity to meet the timeliness requirements of big data processing. Hybrid memory systems that combine emerging non-volatile memory (NVM) with traditional dynamic random access memory (DRAM) offer large capacity and low power, and have therefore attracted wide attention. Big data applications also face the performance bottleneck of excessive translation lookaside buffer (TLB) miss rates. Huge pages can effectively reduce TLB misses; however, supporting huge pages in hybrid memory suffers from the high cost of huge-page migration. We therefore design a hierarchical hybrid memory system that supports huge pages and a large-capacity cache: DRAM and NVM are managed at 4KB and 2MB page granularity respectively, with direct mapping between DRAM and NVM. We design an access-frequency-based filtering mechanism for DRAM cache data to relieve bandwidth pressure, and propose a dynamic hotness-threshold adjustment policy driven by runtime memory information that adapts flexibly to changes in application access behavior. Experiments show average performance improvements of 69.9% and 15.2% over an all-NVM system with huge pages and the caching hot page (CHOP) system respectively, with only an 8.8% average performance gap from an all-DRAM system with huge pages.
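A minimal sketch of a dynamic hotness threshold in this spirit: only pages accessed above the threshold are admitted into the DRAM cache, and the threshold adapts to runtime bandwidth feedback. The adjustment rule and constants are illustrative assumptions.

```c
/* Frequency-based DRAM-cache admission with a self-adjusting threshold. */
#include <stdio.h>

static unsigned hot_threshold = 4;     /* accesses per epoch to qualify */

int admit_to_dram_cache(unsigned access_count)
{
    return access_count >= hot_threshold;   /* filter cold blocks */
}

void end_of_epoch(double dram_bandwidth_util)
{
    /* If migration traffic saturates DRAM bandwidth, be pickier;
     * if DRAM is underused, admit more blocks. */
    if (dram_bandwidth_util > 0.90 && hot_threshold < 64)
        hot_threshold *= 2;
    else if (dram_bandwidth_util < 0.50 && hot_threshold > 1)
        hot_threshold /= 2;
}

int main(void)
{
    printf("admit? %d\n", admit_to_dram_cache(6));  /* 1: above threshold */
    end_of_epoch(0.95);
    printf("new threshold = %u\n", hot_threshold);  /* raised to 8 */
    return 0;
}
```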

6.
When massive data requests hit a heterogeneous memory system, pages migrate frequently back and forth between dynamic random access memory (DRAM) and non-volatile memory (NVM). However, migration policies designed for conventional memory pages cannot keep up with rapid changes in page hotness: a "cold" page migrated from DRAM to NVM may turn "hot" again within a short time, producing many redundant migrations. Existing work focuses only on pages currently being migrated, ignoring pages waiting for or having completed migration, and uses inconsistent criteria for judging hotness, so redundant migrations abound. We therefore propose VC-HMM, a page migration mechanism for heterogeneous memory based on a DRAM victim cache. Using phase change memory (PCM), the most mature NVM technology, VC-HMM inserts a small victim cache built from DRAM between DRAM and PCM and migrates pages that turn cold in main-memory DRAM into the victim cache, avoiding the redundant migrations caused by pages turning hot again shortly afterwards. In addition, some pages migrated back to PCM need not be written back, reducing writes to PCM cells and extending PCM lifetime. VC-HMM can also set migration parameters adaptively for different workloads, making migration more reasonable. Experimental results show that, compared with other migration strategies (CoinMigrator, MQRA, THMigrator), VC-HMM reduces PCM writes by at least 62.97% on average, average access latency by 22.72%, repeated migrations by 38.37%, and system energy consumption by 3.40%.
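A minimal sketch of the victim-cache idea, with simplified structures: pages evicted from main-memory DRAM are staged in a small victim cache, so a page that re-heats quickly is recovered without a redundant PCM migration.

```c
/* Toy FIFO victim cache between DRAM main memory and PCM. */
#include <stdio.h>
#include <string.h>

#define VC_SLOTS 4

static long victim_cache[VC_SLOTS];   /* staged page numbers; -1 = empty */

void vc_init(void) { memset(victim_cache, 0xFF, sizeof victim_cache); }

/* On eviction from DRAM: stage the page instead of writing it to PCM. */
void vc_insert(long page)
{
    static int next;
    if (victim_cache[next] != -1)     /* oldest entry finally goes to PCM */
        printf("page %ld demoted to PCM\n", victim_cache[next]);
    victim_cache[next] = page;
    next = (next + 1) % VC_SLOTS;
}

/* On access: a hit means the page re-heated before reaching PCM. */
int vc_lookup(long page)
{
    for (int i = 0; i < VC_SLOTS; i++)
        if (victim_cache[i] == page) { victim_cache[i] = -1; return 1; }
    return 0;
}

int main(void)
{
    vc_init();
    vc_insert(42);                         /* page 42 turns cold, staged  */
    printf("re-hit: %d\n", vc_lookup(42)); /* 1: redundant migration avoided */
    return 0;
}
```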

7.

As embedded memory technology evolves, traditional static random access memory (SRAM) technology has reached the end of its development. As manufacturing process technology scales down, a next-generation memory technology is badly needed because of the exponentially increasing leakage current of SRAM. Non-volatile memories such as STT-MRAM (spin-transfer torque magnetic random access memory) and PCM (phase change memory) are good candidates for replacing SRAM in embedded memory systems, with advantageous characteristics in power consumption, leakage power, density, and latency. Nonetheless, non-volatile memories have two major problems that hinder their use as next-generation memory. First, the lifetime of a non-volatile memory cell is limited by the number of write operations. Second, a write operation costs more latency and power than a read of the same size. This study describes a compiler optimization technique to overcome these disadvantages of the non-volatile component in hybrid cache memories. Specifically, to minimize the number of write operations to non-volatile memory, we present a data replacement technique that considers the locations of register spill data. A large portion of memory accesses is produced by the spill data of the register allocator in an optimizing compiler, and such spill data can be partially removed using a recalculation method. We therefore implemented an optimization that rearranges data placement with recalculation to minimize write instructions to the non-volatile memory. Our experimental results show that the proposed technique reduces the average number of spill codes by 20% and improves energy consumption by 20.2% on average.
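A hand-written sketch of the recalculation (rematerialization) idea, not actual compiler output: instead of spilling a value to an NVM-backed line and reloading it, the value is recomputed from operands still in registers, eliminating the NVM write.

```c
/* Spill vs. recalculation, shown at the source level for illustration. */
#include <stdio.h>

int kernel(int a, int b)
{
    int t = a * b + 7;    /* live across a high-register-pressure region  */
    /* spill version:  store t -> NVM cache ... reload t  (1 extra write) */
    /* remat version:  drop t .............. t = a*b + 7  (0 extra writes) */
    int other = a + b;    /* stand-in for the pressure region             */
    t = a * b + 7;        /* recalculated instead of reloaded             */
    return t + other;
}

int main(void)
{
    printf("%d\n", kernel(3, 4));   /* 19 + 7 = 26 */
    return 0;
}
```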


8.
Emerging byte-addressable non-volatile memory (NVM) technologies offer higher density and lower cost than DRAM, at the expense of lower performance and limited write endurance. There have been many studies on hybrid NVM/DRAM memory management in a single physical server, but how to manage hybrid memories efficiently in a distributed environment remains an open problem. This paper proposes Alloy, a memory resource abstraction and data placement strategy for an RDMA-enabled distributed hybrid memory pool (DHMP). Alloy provides simple APIs for applications to use DRAM or NVM resources in the DHMP without being aware of the hardware details of the DHMP. We propose a hotness-aware data placement scheme that combines hot-data migration, data replication, and write merging to improve application performance and reduce DRAM cost. We evaluate Alloy with several micro-benchmark workloads and public benchmark workloads. Experimental results show that Alloy can reduce DRAM usage in the DHMP by up to 95% while reducing total memory access time by up to 57% compared with state-of-the-art approaches.
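A minimal sketch of a hotness-aware placement decision in this spirit. The thresholds and the three-way action split are illustrative assumptions, not Alloy's API.

```c
/* Hot read-mostly data is replicated into DRAM; hot write-heavy data is
 * migrated with writes merged in DRAM before reaching NVM; cold data
 * stays in NVM. */
#include <stdio.h>

typedef enum { STAY_NVM, REPLICATE_TO_DRAM, MIGRATE_MERGE_WRITES } action_t;

action_t place(unsigned reads, unsigned writes)
{
    unsigned total = reads + writes;
    if (total < 100)        return STAY_NVM;            /* cold             */
    if (writes * 4 < reads) return REPLICATE_TO_DRAM;   /* hot, read-mostly */
    return MIGRATE_MERGE_WRITES;                        /* hot, write-heavy */
}

int main(void)
{
    /* prints 0 1 2 for cold, read-mostly-hot, write-heavy-hot */
    printf("%d %d %d\n", place(10, 5), place(900, 50), place(300, 300));
    return 0;
}
```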

9.
Phase change memory (PCM) is an emerging storage technology. Compared with dynamic random access memory, PCM offers high scalability and low power, and is therefore regarded as one of the most promising next-generation memory technologies. One challenge PCM faces is that each cell endures only a limited number of writes, so improving PCM endurance is an urgent problem. This paper proposes a matrix wear-leveling method for PCM based on algebraic mapping, which performs wear leveling separately within each row and each column. Through two-level address mapping along the row and column dimensions, any logical block can be remapped within some column address space to any row, and simultaneously remapped within some row address space to any column. A simulation system was designed and implemented to verify the method, with detailed tests of functional correctness and attack resistance. Matrix wear leveling effectively achieves resistance to non-uniform writes and malicious write attacks while keeping the extra writes introduced by wear leveling low.
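A minimal sketch of two-level algebraic remapping over a matrix of PCM blocks. The rotate-by-key mapping below is one simple algebraic map chosen for illustration, not the paper's exact function.

```c
/* Split the logical address into (row, col) and remap each dimension
 * by a modular rotation whose keys change over time to spread wear. */
#include <stdio.h>

#define ROWS 8
#define COLS 8

unsigned row_key = 3, col_key = 5;   /* updated over time to spread wear */

unsigned remap(unsigned logical)
{
    unsigned r = logical / COLS, c = logical % COLS;
    unsigned pr = (r + col_key) % ROWS;  /* column-dimension remap: new row */
    unsigned pc = (c + row_key) % COLS;  /* row-dimension remap: new column */
    return pr * COLS + pc;
}

int main(void)
{
    /* Any logical block lands in a different row and column as keys move. */
    printf("logical 0 -> physical %u\n", remap(0));   /* 3*8+5 = 29 */
    printf("logical 9 -> physical %u\n", remap(9));   /* 4*8+6 = 38 */
    return 0;
}
```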

10.
Phase change memory (PCM) is a promising candidate to replace DRAM as main memory, thanks to better scalability and lower static power than DRAM. However, PCM also has drawbacks, such as long write latency and high write power. Moreover, the parallelism of PCM write commands is restricted by instantaneous power constraints, which degrades write bandwidth and overall performance. PCM write power is asymmetric: writing a zero consumes more power than writing a one. In this paper, we propose a new scheduling policy, write power asymmetry scheduling (WPAS), that exploits this asymmetry. WPAS improves the parallelism of PCM write commands without violating the power constraint. Evaluation results show that WPAS improves performance by up to 35.5%, and by 18.5% on average; effective read latency is reduced by up to 33.0%, and by 17.1% on average.
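A minimal sketch of the scheduling idea: estimate each write's instantaneous power from its bit pattern (zeros cost more than ones) and greedily issue writes while they fit under the power budget. The cost constants and the greedy policy are illustrative assumptions.

```c
/* Power-aware write issue using per-command zero/one bit counts. */
#include <stdio.h>
#include <stdint.h>

#define ZERO_COST 2    /* relative power units per zero bit written */
#define ONE_COST  1    /* per one bit written */
#define BUDGET    200  /* instantaneous power budget */

unsigned write_cost(uint64_t data)
{
    unsigned ones = 0;
    for (uint64_t d = data; d; d &= d - 1) ones++;   /* popcount */
    return ones * ONE_COST + (64 - ones) * ZERO_COST;
}

int main(void)
{
    uint64_t queue[] = { 0xFFFFFFFFFFFFFFFFull, 0x0ull,
                         0xF0F0F0F0F0F0F0F0ull };
    unsigned used = 0, issued = 0;
    for (int i = 0; i < 3; i++) {          /* greedy: issue while it fits */
        unsigned c = write_cost(queue[i]);
        if (used + c <= BUDGET) { used += c; issued++; }
    }
    printf("issued %u writes, power %u/%u\n", issued, used, BUDGET);
    return 0;
}
```

Costing each command by its zero-bit count is what lets the controller pack more low-power writes into one scheduling window, which is where the bandwidth gain comes from.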

11.
张震, 付印金, 胡谷雨. 《计算机应用》, 2018, 38(8): 2230-2235
Phase change memory (PCM), with its low-power advantage, is expected to become a next-generation main memory, but its limited endurance is a major obstacle to wide adoption. Existing dynamic random access memory (DRAM) caching techniques and wear leveling extend PCM lifetime from two angles, reducing the number of PCM writes and evening out the write distribution respectively; however, the former ignores the read/write tendency of data when writing it back, and the latter suffers from problems of data-swap granularity, space overhead, and randomness under workloads with strong spatial locality. We therefore design a new hybrid memory architecture: a caching policy that distinguishes the read/write tendency of data by combining the least recently used (LRU) algorithm with the least frequently used with aging (LFU-Aging) algorithm, plus a dynamic wear-leveling algorithm based on Bloom filters (BF) for working sets with strong spatial locality, which effectively reduces redundant writes while achieving inter-group wear leveling at low space overhead. Experimental results show that the policy reduces PCM writes by 13.4%~38.6% and effectively evens out the write distribution for more than 90% of groups.
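A minimal sketch of a read/write-tendency-aware eviction choice, reduced to comparing two candidates; the paper's actual policy combines LRU with LFU-Aging, which is not reproduced here.

```c
/* Prefer to evict read-tendency pages (cheap to serve from PCM) and
 * keep write-tendency pages in DRAM to absorb writes. */
#include <stdio.h>

struct page { int id; unsigned reads, writes; };

/* Returns the candidate to evict: the one with the weaker write tendency. */
struct page *pick_victim(struct page *a, struct page *b)
{
    double wa = (double)a->writes / (a->reads + a->writes + 1);
    double wb = (double)b->writes / (b->reads + b->writes + 1);
    return (wa < wb) ? a : b;
}

int main(void)
{
    struct page p1 = { 1, 90, 10 };   /* read-tendency  */
    struct page p2 = { 2, 10, 90 };   /* write-tendency */
    printf("evict page %d\n", pick_victim(&p1, &p2)->id);  /* page 1 */
    return 0;
}
```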

12.
A Survey of Storage Systems and Technologies Based on Phase Change Memory
As the performance gap between processor and memory keeps widening, the "memory wall" problem grows ever more acute, while the density of traditional DRAM devices is approaching its limit and their power consumption has become a bottleneck; designing a solid and effective memory architecture to overcome the memory wall is now an unavoidable challenge. In recent years, emerging memory devices represented by phase change memory (PCM) have attracted wide attention from researchers at home and abroad for their high density and low power. In particular, PCM's non-volatility and byte addressability give it the characteristics of both main memory and external storage; under its influence, the boundary between the two is blurring, which will bring major changes to future storage architectures. This survey focuses on PCM-based main memory structures and analyzes key issues in building main memory from PCM, including write optimization, wear leveling, hardware error correction, bad-block reuse, and software optimization; it then discusses applied research on PCM in external storage systems and its impact on external storage architecture and system design. Finally, prospects for applied research on PCM in storage systems are given.

13.
Network continuous-media applications are emerging at a great pace. Cache memories have long been recognized as a key resource (along with network bandwidth) whose intelligent exploitation can ensure high performance for such applications. Cache memories exist at continuous-media servers and their proxy servers in the network; within a server, they form a hierarchy (at the host, at the storage devices, and at intermediate multi-device controllers). Our research is concerned with how best to exploit these resources in the context of continuous-media servers, in particular the available cache memories at the drive, disk array controller, and host levels. Our results determine under which circumstances and system configurations it is preferable to devote the available memory to traditional caching (a.k.a. data sharing) techniques as opposed to prefetching techniques. In addition, we show how to configure the available memory for optimal performance and optimal cost. Our results show that prefetching techniques are preferable for small caches (such as those expected at the drive level), while for very large caches (such as those employed at the host level) caching techniques are preferable; for intermediate cache sizes (such as those at multi-device controllers) a combination of both strategies should be employed.
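A tiny sketch that encodes this qualitative finding as a configuration knob; the size boundaries are illustrative assumptions, not values from the paper.

```c
/* Map cache size at each hierarchy level to a memory strategy. */
#include <stdio.h>

const char *memory_strategy(unsigned long cache_mb)
{
    if (cache_mb < 16)    return "prefetching";          /* drive level */
    if (cache_mb > 1024)  return "caching/data sharing"; /* host level  */
    return "hybrid: caching + prefetching";  /* multi-device controller */
}

int main(void)
{
    printf("%s\n", memory_strategy(8));
    printf("%s\n", memory_strategy(256));
    printf("%s\n", memory_strategy(4096));
    return 0;
}
```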

14.
Refresh power of dynamic random-access memory (DRAM) has become a critical problem due to the large memory capacity of servers and mobile devices, making it imperative to reduce the refresh rate. However, current methods of refresh-rate improvement have limitations such as large area/performance overhead, low system availability, and lack of support at the memory controller. We propose a novel scheme comprising three essential functions: (1) an adaptive refresh method that adjusts the refresh period of each DRAM chip, (2) a runtime method of retention-time profiling that operates inside DRAM chips during idle time, thereby improving availability, and (3) a dual-row activation method that improves weak-cell retention time at a very small area cost. The proposed scheme allows each DRAM chip to refresh with its own period without external support. Experiments based on real DRAM chip measurements show that the proposed methods can increase the refresh period by 4.5 times at 58 °C by adjusting it in a temperature-aware manner, while incurring overheads of only 1.05% and 0.02% in DRAM chip area and power consumption, respectively. Below 58 °C, our method improves the refresh period by 12.5% compared with two state-of-the-art methods, AVATAR and in-DRAM ECC. In various memory configurations with SPEC benchmarks, our method improves energy-delay product by 19.7% over the baseline, and by 15.4% and 12.4% over AVATAR and in-DRAM ECC, respectively.
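A minimal sketch of temperature-aware refresh-period selection, assuming the common rule of thumb that DRAM retention roughly halves per 10 °C; the paper's exact profiling and adjustment logic are not reproduced.

```c
/* Pick a per-chip refresh period from its profiled baseline and its
 * current temperature. */
#include <stdio.h>

/* base: profiled safe refresh period (ms) at base_temp for this chip */
unsigned refresh_period_ms(unsigned base, int base_temp, int temp_now)
{
    unsigned period = base;
    for (int t = temp_now; t > base_temp; t -= 10)    /* hotter: halve  */
        period /= 2;
    for (int t = temp_now; t <= base_temp - 10; t += 10)
        period *= 2;                                  /* cooler: double */
    return period ? period : 1;
}

int main(void)
{
    printf("%u ms at 58C\n", refresh_period_ms(256, 58, 58));  /* 256 */
    printf("%u ms at 78C\n", refresh_period_ms(256, 58, 78));  /* 64  */
    printf("%u ms at 48C\n", refresh_period_ms(256, 58, 48));  /* 512 */
    return 0;
}
```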

15.
Big Data requires a shift in traditional computing architecture, and in-memory computing is a new paradigm for Big Data analytics. However, DRAM-based main memory is neither cost-effective nor energy-effective. This work combines a flash-based solid state drive (SSD) and DRAM to build a hybrid memory that meets both requirements. As the latency of SSD is much higher than that of DRAM, the hybrid architecture must ensure that most requests are served by DRAM rather than by the SSD. Accordingly, we take two measures to raise the DRAM hit ratio. First, the hybrid memory employs an adaptive prefetching mechanism to ensure that data are already in DRAM before they are demanded. Second, the DRAM employs a novel replacement policy that gives higher priority to replacing data that are easy to prefetch, since such data can be served by prefetching when demanded again; conversely, data that are hard to prefetch are protected in DRAM. Both the prefetching mechanism and the replacement policy rely on file access patterns, so we propose a novel pattern recognition method that improves on the LZ data compression algorithm to detect access patterns. We evaluate our proposals via a prototype and trace-driven simulations. Experimental results demonstrate that the hybrid memory can effectively more than double the capacity of DRAM alone.
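A minimal sketch of the replacement bias described above: among candidates, evict a block covered by a detected access pattern first, since prefetching can bring it back ahead of demand. The scoring is an illustrative assumption.

```c
/* Prefetch-friendliness-aware victim selection. */
#include <stdio.h>

struct block { int id; int pattern_covered; };  /* 1 = matched a pattern */

struct block *pick_victim(struct block *cands, int n)
{
    /* Evict a pattern-covered (easy-to-prefetch) block if one exists. */
    for (int i = 0; i < n; i++)
        if (cands[i].pattern_covered) return &cands[i];
    return &cands[0];            /* otherwise fall back to first (LRU) */
}

int main(void)
{
    struct block c[] = { {1, 0}, {2, 1}, {3, 0} };
    printf("evict block %d\n", pick_victim(c, 3)->id);  /* block 2 */
    return 0;
}
```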

16.
The ever-increasing memory demands of many scientific applications and the complexity of today's shared computational resources still require the occasional use of virtual memory, network memory, or even out-of-core implementations, with well-known drawbacks in performance and usability. In Mills et al. (Adapting to memory pressure from within scientific applications on multiprogrammed COWs. In: International Parallel and Distributed Processing Symposium, IPDPS, Santa Fe, NM, 2004), we introduced a basic framework for a runtime, user-level library, MMlib, in which DRAM is treated as a dynamic-size cache for large memory objects residing on local disk. Application developers can specify and access these objects through MMlib, enabling their application to execute optimally under variable memory availability, using as much DRAM as fluctuating memory levels allow. In this paper, we first extend our earlier MMlib prototype from a proof of concept to a usable, robust, and flexible library. We present a general framework that enables fully customizable memory malleability in a wide variety of scientific applications. We provide several necessary enhancements to the environment-sensing capabilities of MMlib, and introduce a remote memory capability based on MPI communication of cached memory blocks between 'compute nodes' and designated memory servers. The increasing speed of interconnection networks makes a remote memory approach attractive, especially at the large granularity present in large scientific applications. We show experimental results from three important scientific applications that require the general MMlib framework. The memory-adaptive versions perform nearly optimally under constant memory pressure and execute harmoniously with other applications competing for memory, without thrashing the memory system. Under constant memory pressure, we observe execution-time improvements by factors of three to five over relying solely on the virtual memory system. With remote memory employed, these factors are even larger and significantly better than other, system-level remote memory implementations.
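A hypothetical sketch of the usage style the abstract describes: a large object is registered with the library and accessed through it, so the library controls how much of it resides in DRAM. All names are invented for illustration; the real MMlib API is not reproduced here.

```c
/* Stand-in "managed object": blocks live on disk, a window in DRAM. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct mm_object { FILE *backing; size_t block_size; void *window; };

struct mm_object *mm_register(const char *path, size_t block_size)
{
    struct mm_object *o = malloc(sizeof *o);
    o->backing = fopen(path, "w+b");
    if (!o->backing) { perror("fopen"); exit(1); }
    o->block_size = block_size;
    o->window = malloc(block_size);   /* one cached block, for brevity */
    return o;
}

/* Fetch block i through the library instead of addressing it directly. */
void *mm_get_block(struct mm_object *o, long i)
{
    memset(o->window, 0, o->block_size);   /* default: zero-filled */
    fseek(o->backing, i * (long)o->block_size, SEEK_SET);
    if (fread(o->window, 1, o->block_size, o->backing) == 0) {
        /* block not yet written on disk: caller sees zeros */
    }
    return o->window;
}

int main(void)
{
    struct mm_object *o = mm_register("matrix.dat", 4096);
    double *row = mm_get_block(o, 0);       /* served from disk via DRAM */
    printf("row[0] = %f\n", row[0]);
    fclose(o->backing);
    free(o->window);
    free(o);
    return 0;
}
```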

17.
The Journal of Supercomputing - This research is to design an effective prefetching method required for hybrid main memory systems consisting of dynamic random-access memory (DRAM) and phase-change...

18.
In large-scale streaming media services, cache management is a critical problem. Especially with the advent of the IA-64 architecture, physical memory can be greatly enlarged, making cache management policies ever more important. Many cache management algorithms exist, among which interval caching is generally considered one of the most effective. However, previous interval-based algorithms mostly ignore the popularity of media objects, which hurts cache utilization. By studying the popularity characteristics of media objects, and with a view to exploiting the large memory of IA-64 systems, this paper proposes a popularity-based interval caching policy. A performance analysis model is also introduced to analyze the algorithm. The analysis shows that the proposed policy performs better than the traditional interval caching policy.
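A minimal sketch of a popularity-weighted caching decision: intervals (the gaps between consecutive streams of the same object) are ranked by popularity per byte, so hot objects keep their intervals in memory. The scoring rule is an illustrative assumption.

```c
/* Rank intervals by popularity per cached byte. */
#include <stdio.h>

struct interval { int object_id; unsigned popularity; unsigned bytes; };

double score(const struct interval *iv)   /* higher = more worth caching */
{
    return (double)iv->popularity / (double)iv->bytes;
}

int main(void)
{
    struct interval a = { 1, 500, 1 << 20 };   /* popular object  */
    struct interval b = { 2,  20, 1 << 20 };   /* unpopular object */
    printf("cache object %d first\n",
           score(&a) >= score(&b) ? a.object_id : b.object_id);
    return 0;
}
```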

19.
Hybrid memory systems composed of dynamic random access memory (DRAM) and non-volatile memory (NVM) often exploit page migration technologies to take full advantage of the different memory media. Most previous proposals migrate data at the granularity of 4 KB pages, wasting memory bandwidth and DRAM resources. In this paper, we propose Mocha, a non-hierarchical architecture that organizes DRAM and NVM in a flat physical address space but manages them as a cache/memory hierarchy. Since the commercial NVM device, the Intel Optane DC Persistent Memory Module (DCPMM), actually accesses the physical media at a granularity of 256 bytes (an Optane block), we manage the DRAM cache at this 256-byte size to match Optane's behavior. This design not only enables fine-grained data migration and management for the DRAM cache but also avoids write amplification for Intel Optane DCPMM. We also create an Indirect Address Cache (IAC) in the Hybrid Memory Controller (HMC) and propose a reverse address mapping table in the DRAM to speed up address translation and cache replacement. Moreover, we exploit a utility-based caching mechanism to filter cold blocks in the NVM and further improve the efficiency of the DRAM cache. We implement Mocha in an architectural simulator. Experimental results show that Mocha improves application performance by 8.2% on average (up to 24.6%), and reduces energy consumption by 6.9% and data migration traffic by 25.9% on average, compared with a typical hybrid memory architecture, HSCC.
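A minimal sketch of managing a DRAM cache at Optane-block (256 B) granularity with a direct-mapped tag array; the sizes and the direct-mapped choice are illustrative assumptions, not Mocha's design details.

```c
/* The NVM physical address is split into 256 B block indices; a tag
 * array in DRAM decides hit or miss per block. */
#include <stdio.h>
#include <stdint.h>

#define BLOCK_SHIFT 8            /* 256-byte Optane block */
#define SETS (1u << 10)          /* 1024-entry toy tag array */

static uint64_t tags[SETS];      /* 0 = invalid */

int dram_cache_lookup(uint64_t nvm_addr)
{
    uint64_t block = nvm_addr >> BLOCK_SHIFT;
    uint32_t set = block & (SETS - 1);
    if (tags[set] == block + 1) return 1;   /* hit (+1 avoids zero tag) */
    tags[set] = block + 1;                  /* miss: fill this block    */
    return 0;
}

int main(void)
{
    printf("first access: %s\n", dram_cache_lookup(0x12345) ? "hit" : "miss");
    printf("re-access:    %s\n", dram_cache_lookup(0x12345) ? "hit" : "miss");
    return 0;
}
```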

20.
In scientific computing, data sizes grow rapidly as the required precision of numerical simulation rises. The traditional approach of using DRAM as main memory is hard to scale in capacity because of cost, and persistent memory technology, which has attracted growing attention in recent years, promises to solve this problem. Persistent memory sits between DRAM and SSD: compared with DRAM it offers larger capacity at a better price, but also lower performance. To assess the application performance of persistent memory, we evaluate Intel persistent memory on an important area of scientific computing, computational fluid dynamics (CFD). In the experiments, the persistent memory is used in the easiest-to-adopt Memory Mode, which requires no source code changes; the test programs cover memory benchmarks and three common CFD algorithms. The results show that, in Memory Mode, introducing persistent memory incurs some performance loss for the various CFD algorithms relative to a pure-DRAM configuration, and the loss grows with data size; on the other hand, deploying persistent memory allows a single server to support numerical simulations at very large data scales.
