Similar Literature (20 results found)
1.
Big-data applications demand ever-larger memory capacity, and in such workloads the problems of conventional DRAM-based main memory have become increasingly severe. Computer architects have therefore begun to consider replacing conventional DRAM with non-volatile memory (NVM). As a non-volatile medium, NVM needs no dynamic refresh and thus avoids the associated energy consumption; in addition, its read performance is close to that of DRAM, and individual NVM cells scale well in capacity. However, integrating NVM into existing computer systems as main memory raises security problems. With conventional DRAM, data are lost automatically once power is removed, i.e., data do not reside in the medium for long; with NVM as a non-volatile medium, data persist for a comparatively long time. An attacker who gains access to an NVM module can scan its contents and obtain the data held in memory, a security problem defined as the data "recovery vulnerability". In data-center environments built on NVM modules, how to use NVM fully and efficiently while guaranteeing its security has therefore become an urgent problem. This paper surveys recent research hotspots and progress from the security perspective of NVM. It first summarizes the main security problems NVM faces, such as data theft, integrity violations, data consistency and crash recovery, and the system performance degradation introduced by encryption and integrity-protection techniques. It then describes in detail combined counter-mode encryption, Merkle-tree extensions for integrity protection, data-consistency and crash-recovery techniques, and related optimizations. Finally, it concludes and discusses problems that deserve further attention in future NVM research.
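For readers unfamiliar with the counter-mode encryption this survey covers: the memory controller encrypts a per-line counter rather than the data itself, producing a one-time pad that is XORed with the cache line; the counter is bumped on every write so pads never repeat. Below is a minimal sketch of the idea, not any specific paper's design, with a toy mixing function standing in for AES:

```c
#include <stdint.h>

#define LINE_SIZE 64                 /* bytes per protected memory line */

/* Split counter: one shared major counter plus a small per-line minor
 * counter, as in combined/split counter-mode schemes. When a minor
 * counter overflows, the major counter is incremented and every line
 * under it must be re-encrypted. */
struct line_counter { uint64_t major; uint8_t minor; };

/* Toy keyed mixing function standing in for AES in this sketch;
 * a real design would compute AES_K(address || counter). */
static uint64_t prf(uint64_t key, uint64_t x) {
    x += key + 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

/* One-time pad = PRF(key, address, counter); ciphertext = plaintext XOR
 * pad. Decryption is the identical operation, so a read waits only for
 * the small, cacheable counter, not a serialized decryption of the data. */
static void ctr_crypt_line(uint64_t key, uint64_t line_addr,
                           const struct line_counter *ctr,
                           const uint8_t in[LINE_SIZE], uint8_t out[LINE_SIZE]) {
    for (int i = 0; i < LINE_SIZE; i += 8) {
        uint64_t seed = line_addr ^ (ctr->major << 8) ^ ctr->minor
                      ^ ((uint64_t)i << 56);        /* block offset in line */
        uint64_t pad = prf(key, seed);
        for (int j = 0; j < 8; j++)
            out[i + j] = in[i + j] ^ (uint8_t)(pad >> (8 * j));
    }
}
```

The Merkle-tree extensions the survey discusses then protect the integrity of these counters themselves, since a replayed counter would allow pad reuse.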

2.
Hybrid memory systems composed of dynamic random access memory (DRAM) and non-volatile memory (NVM) often exploit page migration technologies to take full advantage of the different memory media. Most previous proposals migrate data at the granularity of a 4 KB page, and thus waste memory bandwidth and DRAM resources. In this paper, we propose Mocha, a non-hierarchical architecture that organizes DRAM and NVM in a flat physical address space but manages them as a cache/memory hierarchy. Since the commercial NVM device, the Intel Optane DC Persistent Memory Module (DCPMM), actually accesses the physical media at a granularity of 256 bytes (an Optane block), we manage the DRAM cache at this 256-byte size to match Optane's behavior. This design not only enables fine-grained data migration and management in the DRAM cache, but also avoids write amplification on the Intel Optane DCPMM. We also place an Indirect Address Cache (IAC) in the Hybrid Memory Controller (HMC) and propose a reverse address-mapping table in DRAM to speed up address translation and cache replacement. Moreover, we exploit a utility-based caching mechanism to filter cold blocks in the NVM and further improve the efficiency of the DRAM cache. We implement Mocha in an architectural simulator. Experimental results show that Mocha improves application performance by 8.2% on average (up to 24.6%) and reduces energy consumption by 6.9% and data migration traffic by 25.9% on average, compared with a typical hybrid memory architecture, HSCC.
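To make the granularity choice concrete, here is a hedged sketch of a direct-mapped tag check for a DRAM cache managed in 256-byte Optane blocks; the structure names and cache size are illustrative, not Mocha's actual implementation:

```c
#include <stdint.h>
#include <stdbool.h>

#define BLOCK_SIZE 256u          /* one Intel Optane DCPMM internal block */
#define NUM_SETS   (1u << 16)    /* illustrative DRAM-cache size: 16 MB   */

/* Per-set metadata for a direct-mapped DRAM cache of 256 B blocks. */
struct meta { uint64_t tag; bool valid; bool dirty; };
static struct meta dram_tags[NUM_SETS];

static inline uint64_t block_of(uint64_t addr) { return addr / BLOCK_SIZE; }

/* Returns true on hit. On a miss the caller migrates one 256 B block,
 * not a 4 KB page, which is what keeps migration traffic low. */
static bool dram_cache_lookup(uint64_t nvm_addr, uint64_t *set_out) {
    uint64_t blk = block_of(nvm_addr);
    uint64_t set = blk % NUM_SETS;
    *set_out = set;
    return dram_tags[set].valid && dram_tags[set].tag == blk;
}

static void dram_cache_fill(uint64_t nvm_addr) {
    uint64_t set;
    if (dram_cache_lookup(nvm_addr, &set)) return;   /* already cached   */
    /* eviction: if dirty, write the old 256 B block back to NVM here    */
    dram_tags[set].tag   = block_of(nvm_addr);
    dram_tags[set].valid = true;
    dram_tags[set].dirty = false;
}
```

In the real design, the IAC and the reverse address-mapping table keep these lookups and evictions off the critical path of address translation.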

3.
Journal of Computer Science and Technology - New non-volatile memory (NVM) technologies are expected to replace main memory DRAM (dynamic random access memory) in the near future. NAND flash...

4.
Emerging byte-addressable non-volatile memory (NVM) technologies offer higher density and lower cost than DRAM, at the expense of lower performance and limited write endurance. There have been many studies on hybrid NVM/DRAM memory management within a single physical server, but how to manage hybrid memories efficiently in a distributed environment remains an open problem. This paper proposes Alloy, a memory resource abstraction and data placement strategy for an RDMA-enabled distributed hybrid memory pool (DHMP). Alloy provides simple APIs for applications to utilize the DRAM or NVM resources of the DHMP without being aware of its hardware details. We propose a hotness-aware data placement scheme that combines hot-data migration, data replication and write merging to improve application performance and reduce the cost of DRAM. We evaluate Alloy with several micro-benchmarks and public benchmark workloads. Experimental results show that Alloy reduces DRAM usage in the DHMP by up to 95% while reducing total memory access time by up to 57%, compared with state-of-the-art approaches.
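To make the hotness-aware placement idea concrete, here is a hedged sketch of the kind of decision rule such a scheme might use; the thresholds and the three-way classification are assumptions for illustration, not Alloy's published policy:

```c
#include <stdint.h>

/* Per-object access statistics tracked by the runtime (illustrative). */
struct obj_stats { uint64_t reads, writes; };

enum placement { NVM_ONLY, DRAM_MIGRATE, DRAM_REPLICATE };

/* - Cold objects stay in NVM, saving scarce DRAM.
 * - Hot, write-heavy objects migrate to DRAM so writes are absorbed
 *   and merged there before reaching NVM.
 * - Hot, read-mostly objects are replicated: the NVM copy stays
 *   authoritative while reads are served from DRAM. */
static enum placement place(const struct obj_stats *s) {
    uint64_t total = s->reads + s->writes;
    if (total < 1024)                   /* hotness threshold (assumed)  */
        return NVM_ONLY;
    if (s->writes * 4 > total)          /* >25% writes: write-heavy     */
        return DRAM_MIGRATE;
    return DRAM_REPLICATE;
}
```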

5.
With the rapid development of computer technology, the scale of data applications keeps growing, and every industry places ever higher demands on data access speed. In-memory databases were proposed to meet this demand, but conventional DRAM main memory cannot be integrated and scaled at large scale because of density and energy constraints. Meanwhile, non-volatile memory (NVM) compensates for DRAM's shortcomings with its high performance, high density and low energy consumption. Combining DRAM and NVM...

6.
Non-volatile memory (NVM) is a new class of storage media that has emerged in recent years. On the one hand, like conventional volatile memory, it offers low access latency and byte addressability; on the other hand, unlike volatile memory, the data it stores are not lost on power failure, and it additionally offers higher density and lower energy overhead. These properties make NVM a promising candidate for large-scale deployment in future computer systems...

7.
When massive data requests access a heterogeneous memory system, pages migrate frequently back and forth between dynamic random access memory (DRAM) and non-volatile memory (NVM). However, migration policies designed for conventional memory pages have difficulty adapting to rapid dynamic changes in page "hotness" and "coldness", which causes pages migrated from DRAM to NVM...

8.
Emerging non-volatile memory (NVM) is considered the most promising candidate to replace DRAM in future main memory designs for mobile devices. NVM-based main memory exhibits attractive features such as byte addressability, low standby power, high density and near-DRAM performance. However, its non-volatile nature makes NVM vulnerable to attacks by malicious programs. Though several data encryption techniques have been proposed to solve this problem, they do not consider the limited resources of mobile systems. To address this issue, we propose an energy-efficient encryption mechanism, named MobiLock, to effectively enhance the security of NVM-based main memory in mobile systems. The basic idea is to improve encryption and decryption performance through caching and concurrency mechanisms. We first develop a cache mechanism that caches the encrypted intermediate data (i.e., the PAD) whose plaintexts are updated frequently, accelerating decryption and reducing PAD recomputation. We then propose a concurrency mechanism that reads the ciphertext in NVM and computes the PAD simultaneously, reducing decryption latency. The evaluation results show that our technique effectively reduces both encryption energy consumption and decryption latency.
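Assuming "PAD" here refers to the one-time pad of a counter-mode scheme, the caching idea can be sketched as follows; the cache organization and the compute_pad stand-in are illustrative, not MobiLock's implementation:

```c
#include <stdint.h>
#include <stdbool.h>

#define PAD_CACHE_ENTRIES 256  /* small cache, sized for a mobile SoC (assumed) */

/* A cached one-time pad, valid for one (line address, counter) pair. */
struct pad_entry { uint64_t line_addr, counter, pad; bool valid; };
static struct pad_entry pad_cache[PAD_CACHE_ENTRIES];

/* Stand-in for the AES-based PAD computation (the slow step being avoided). */
static uint64_t compute_pad(uint64_t line_addr, uint64_t counter) {
    uint64_t x = line_addr ^ (counter * 0x9E3779B97F4A7C15ULL);
    x ^= x >> 33; x *= 0xFF51AFD7ED558CCDULL;
    return x ^ (x >> 33);
}

/* Decryption becomes a cache probe plus an XOR when the PAD of a
 * frequently rewritten line is already cached; only misses pay for
 * recomputation, which the full design further overlaps with reading
 * the ciphertext from NVM. */
static uint64_t get_pad(uint64_t line_addr, uint64_t counter) {
    struct pad_entry *e = &pad_cache[line_addr % PAD_CACHE_ENTRIES];
    if (e->valid && e->line_addr == line_addr && e->counter == counter)
        return e->pad;                               /* hit: no AES work */
    uint64_t pad = compute_pad(line_addr, counter);  /* miss: recompute  */
    *e = (struct pad_entry){ line_addr, counter, pad, true };
    return pad;
}
```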

9.
The I/O software stack is an important factor affecting the performance of NVM storage systems. To address NVM's asymmetric read/write speed and limited write endurance, a request management strategy fusing synchronous and asynchronous handling is designed: large writes are managed asynchronously, while read requests and small writes are still handled synchronously. To address the high address-translation overhead when different cores of a multi-core processor access the storage system, a multi-core-oriented address-translation caching strategy is designed to reduce translation time. Finally, a prototype of a highly concurrent NVM storage system (CNVMS) is implemented and tested with general-purpose benchmark tools under random, sequential and mixed read/write workloads as well as real application workloads. Experimental results show that, compared with PMBD, the proposed strategy improves read/write speed by 1%-22% and IOPS by 9%-15%, verifying that CNVMS effectively improves the I/O performance and request-handling speed of NVM storage systems.
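The sync/async split reduces to a simple dispatch predicate; the 16 KB cutoff below is an assumed value for illustration, since the actual threshold is not given here:

```c
#include <stddef.h>
#include <stdbool.h>

#define ASYNC_WRITE_THRESHOLD (16u * 1024u)  /* assumed cutoff, in bytes */

enum io_op { IO_READ, IO_WRITE };

/* Fused synchronous/asynchronous request management: reads and small
 * writes complete synchronously, since NVM serves them faster than the
 * cost of queueing them; only bulk writes are queued and completed
 * asynchronously, hiding NVM's slower, endurance-limited write path. */
static bool should_go_async(enum io_op op, size_t nbytes) {
    if (op == IO_READ)
        return false;                        /* reads: always synchronous */
    return nbytes >= ASYNC_WRITE_THRESHOLD;  /* large writes: asynchronous */
}
```

The per-core address-translation cache plays the complementary role, avoiding repeated translation work when many cores access the store concurrently.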

10.
Compared with conventional dynamic random access memory (DRAM), emerging non-volatile memory technologies provide better density and energy efficiency. However, current NVM devices typically suffer from high write power, long write latency and low write endurance. In this paper, we study the task allocation problem for a hybrid main memory architecture with both DRAM and PRAM, in order to balance system performance against the energy consumption of the memory subsystem by assigning a memory device to each individual task. For an embedded system with a static set of periodic tasks, we design an integer linear programming (ILP) based offline adaptive space allocation (offline-ASA) algorithm to obtain the optimal task allocation. Furthermore, we propose an online adaptive space allocation (online-ASA) algorithm for dynamic task sets whose arrivals are not known in advance. Experimental results show that our proposed schemes achieve 27.01% energy saving on average, at an additional performance cost of 13.6%.
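As a rough illustration of what such an ILP can look like (a simplified model, not necessarily the paper's exact formulation): let binary x_i = 1 place task i's data in DRAM and 0 in PRAM, with s_i its footprint, E and t its energy and execution time on each device, and d_i its deadline.

```latex
\begin{align*}
\min_{x}\quad & \sum_{i=1}^{n}\Bigl[x_i\,E_i^{\mathrm{DRAM}} + (1-x_i)\,E_i^{\mathrm{PRAM}}\Bigr]\\
\text{s.t.}\quad & \sum_{i=1}^{n} x_i\,s_i \le C_{\mathrm{DRAM}},\qquad
                   \sum_{i=1}^{n}(1-x_i)\,s_i \le C_{\mathrm{PRAM}},\\
& x_i\,t_i^{\mathrm{DRAM}} + (1-x_i)\,t_i^{\mathrm{PRAM}} \le d_i,\qquad
  x_i\in\{0,1\}.
\end{align*}
```

The online variant must make the same trade-off greedily as tasks arrive, without knowledge of the full task set.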

11.
Big Data requires a shift in traditional computing architecture, and in-memory computing is a new paradigm for Big Data analytics. However, DRAM-based main memory is neither cost-effective nor energy-effective. This work combines a flash-based solid-state drive (SSD) and DRAM to build a hybrid memory that meets both requirements. As the latency of SSD is much higher than that of DRAM, the hybrid architecture should guarantee that most requests are served by DRAM rather than by SSD. Accordingly, we take two measures to enhance the hit ratio of DRAM. First, the hybrid memory employs an adaptive prefetching mechanism to ensure that data are already in DRAM before they are demanded. Second, the DRAM employs a novel replacement policy that preferentially evicts data that are easy to prefetch, since such data can be served by prefetching if they are demanded again; data that are hard to prefetch are protected in DRAM. The prefetching mechanism and replacement policy rely on the access patterns of files, so we propose a novel pattern recognition method that improves the LZ data compression algorithm to detect access patterns. We evaluate our proposals via a prototype and trace-driven simulations. Experimental results demonstrate that the hybrid memory effectively extends DRAM capacity by more than a factor of two.
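The replacement idea, evict what the prefetcher can cheaply bring back, can be sketched as an LRU variant; the metadata fields and weighting below are illustrative assumptions, not the paper's exact policy:

```c
#include <stdint.h>
#include <stdbool.h>

/* Per-page DRAM metadata (illustrative). */
struct page_meta {
    uint64_t last_access;    /* logical timestamp for LRU ordering      */
    bool     prefetchable;   /* pattern detector recognized this page   */
};

/* Among candidate victims, prefer pages whose access pattern the
 * prefetcher recognizes: they can be refetched cheaply from SSD on
 * demand, while hard-to-prefetch pages are protected in DRAM. */
static int pick_victim(const struct page_meta *pages, int n) {
    int victim = -1;
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < n; i++) {
        /* Non-prefetchable pages get a large age bonus (assumed weight),
         * making them look recently used and thus unlikely victims. */
        uint64_t score = pages[i].last_access
                       + (pages[i].prefetchable ? 0 : 1u << 20);
        if (score < best) { best = score; victim = i; }
    }
    return victim;   /* lowest score = oldest, prefetch-friendly page */
}
```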

12.
Non-volatile memory (NVM) features byte addressability, persistence, high storage density and low read/write latency, making it the technology of choice for overcoming the limited capacity of DRAM (dynamic random access memory). With the introduction of NVM into database systems, traditional logging techniques must be adapted to NVM's characteristics. This paper first surveys existing NVM-oriented logging research, and then proposes NVRC (Non-Volatile Record-updating with Cacheline), a database logging scheme that restricts writes to NVM as much as possible. NVRC combines out-of-place and in-place updating: on top of out-of-place "shadow records", it introduces a "cacheline in-place update" policy and selects the appropriate update strategy via cost analysis, thereby reducing NVM writes. Experiments on YCSB workloads, with DRAM emulating NVM, compare NVRC against traditional WAL (Write-Ahead Log) and the NVM-aware PCMLx (PCMLoggingx). The results show that under uniform updates NVRC reduces NVM writes by 54% and 17% compared with WAL and PCMLx respectively, while improving update performance by 59% and 10% respectively.
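A hedged sketch of the cost-based choice between the two update paths; the paper's actual cost model is more detailed than this byte-comparison heuristic:

```c
#include <stdint.h>
#include <string.h>

#define CACHELINE 64u

/* Count how many cachelines of the record actually changed. */
static unsigned dirty_cachelines(const uint8_t *old, const uint8_t *upd,
                                 size_t len) {
    unsigned dirty = 0;
    for (size_t off = 0; off < len; off += CACHELINE) {
        size_t n = len - off < CACHELINE ? len - off : CACHELINE;
        if (memcmp(old + off, upd + off, n) != 0) dirty++;
    }
    return dirty;
}

/* If the change is confined to a few cachelines, update them in place
 * and flush just those lines; otherwise write a fresh shadow record
 * out of place. The "under half the lines" rule is an assumed stand-in
 * for NVRC's cost analysis. */
static int use_inplace_update(const uint8_t *old, const uint8_t *upd,
                              size_t len) {
    unsigned total = (unsigned)((len + CACHELINE - 1) / CACHELINE);
    return dirty_cachelines(old, upd, len) * 2 < total;
}
```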

13.
This paper compares data distribution methodologies for scaling the performance of OpenMP on NUMA architectures. We investigate the performance of automatic page placement algorithms implemented in the operating system, runtime algorithms based on dynamic page migration, runtime algorithms based on loop scheduling transformations and manual data distribution. These techniques present the programmer with trade-offs between performance and programming effort. Automatic page placement algorithms are transparent to the programmer, but may compromise memory access locality. Dynamic page migration algorithms are also transparent, but require careful engineering and tuned implementations to be effective. Manual data distribution requires substantial programming effort and architecture-specific extensions to the API, but may localize memory accesses in a nearly optimal manner. Loop scheduling transformations may or may not require intervention from the programmer, but conform better to an architecture-agnostic programming paradigm like OpenMP. We identify the conditions under which runtime data distribution algorithms can optimize memory access locality in OpenMP. We also present two novel runtime data distribution techniques, one based on memory access traces and another based on affinity scheduling of parallel loops. These techniques can be used to effectively replace manual data distribution in regular applications. The results provide a proof of concept that it is possible to scale a portable shared-memory programming model up to more than 100 processors, without modifying the API and without exposing architectural details to the programmer.
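As a small example of how page placement interacts with OpenMP on systems using the common first-touch policy (one of the automatic placement schemes discussed above): initializing data under the same static schedule as the compute loop places each page on the node of the thread that later uses it, with no API extension. A minimal self-contained sketch, compiled with -fopenmp:

```c
#include <stdio.h>
#include <stdlib.h>

#define N (1L << 22)   /* 4 M doubles per array, illustrative size */

int main(void) {
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    if (!a || !b) return 1;

    /* First touch: each page lands on the NUMA node of the thread that
     * first writes it, so this loop *is* the data distribution. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++) { a[i] = 0.0; b[i] = (double)i; }

    /* Same static schedule => same thread-to-iteration mapping, so each
     * thread now touches pages resident on its own node. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++) a[i] += 2.0 * b[i];

    printf("%f\n", a[N - 1]);
    free(a); free(b);
    return 0;
}
```

This is the portable baseline that the paper's trace-based and affinity-scheduling techniques aim to improve on for irregular applications.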

14.
邱杰凡, 华宗汉, 范菁, 刘磊. 软件学报 (Journal of Software), 2022, 33(2): 751-769
In the multi-core era, "memory access interference" among co-running programs across the shared memory hierarchy is a major factor limiting overall system performance and quality of service. Even though memory resources are now relatively abundant, how to optimize the performance of the memory hierarchy, reduce access interference and manage memory resources efficiently remains a research hotspot in computer architecture. To study this problem in depth, this paper surveys a series of advanced methods that apply the "page coloring" memory partitioning technique across the entire memory hierarchy (including the cache, memory channels and DRAM banks) to eliminate access interference among parallel co-running programs. Taking DRAM banks, channels, the cache and non-volatile memory (NVM) as its entry points, the discussion proceeds level by level: first, applying page coloring to partition DRAM banks and channels among co-running programs, eliminating inter-program access conflicts; then, applying page coloring to "vertical" cooperative partitioning of the cache and DRAM, which eliminates interference across multiple levels of the hierarchy at once; and finally, applying page coloring to hybrid memory systems containing NVM, improving program efficiency and overall system effectiveness. Experimental results show that the surveyed partitioning methods improve overall system performance (by 5%-15% on average) and quality of service (QoS), and effectively reduce system energy consumption. By reviewing...
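To illustrate the mechanism itself: a page's "color" is the value of the physical-address bits it shares with the index of the resource being partitioned, so the OS can partition the LLC or DRAM banks simply by choosing which frames it hands to which program. The bit positions below are assumed for illustration; real ones depend on the platform's cache geometry and DRAM address mapping:

```c
#include <stdint.h>

/* With a 4 KB page (address bits 0-11 fixed), any index bits above
 * bit 11 are controlled by the choice of physical page frame; that
 * choice is the page's "color". Positions here are illustrative. */
#define LLC_SET_LO 12   /* assumed: LLC set-index bits 12..16 */
#define LLC_SET_HI 16
#define BANK_LO    17   /* assumed: DRAM bank bits 17..19     */
#define BANK_HI    19

static inline uint64_t bits(uint64_t x, int lo, int hi) {
    return (x >> lo) & ((1ULL << (hi - lo + 1)) - 1);
}

/* An OS allocator that gives program A only frames of one color set
 * and program B another partitions the LLC (or the banks) between
 * them, eliminating the inter-program interference discussed above. */
static inline uint64_t cache_color(uint64_t paddr) {
    return bits(paddr, LLC_SET_LO, LLC_SET_HI);
}
static inline uint64_t bank_color(uint64_t paddr) {
    return bits(paddr, BANK_LO, BANK_HI);
}
```

"Vertical" partitioning then means choosing frames whose cache color and bank color are constrained simultaneously.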

15.
Non-volatile memory (NVM) brings new opportunities for data storage and management, but it also requires existing index structures to be redesigned for NVM's characteristics. Focusing on NVM's access characteristics, this work studies performance optimizations for tree-index access, persistence and range queries on NVM, and proposes HART, a heterogeneous index with an upper and a lower layer. The index combines the characteristics of the B+-tree and the radix tree...

16.
Memory affinity has become a key element in achieving scalable performance on multi-core platforms. Mechanisms such as thread scheduling, page allocation and cache prefetching are commonly employed to enhance memory affinity, i.e., to keep data close to the cores that access it. Software transactional memory (STM) applications in particular exhibit irregular memory access behavior that makes it harder to determine which data will be needed by each core, and when. Additionally, existing STM runtime systems are decoupled from issues such as thread and memory management. In this paper, we therefore propose a skeleton-driven mechanism to improve memory affinity for STM applications that fit the worklist pattern, employing a two-level approach. First, it addresses memory affinity at the DRAM level by automatically selecting page allocation policies; then it employs data-prefetching helper threads to improve affinity at the cache level. It relies on a skeleton framework to exploit the application pattern and provide automatic memory page allocation and cache prefetching. Our experimental results on the STAMP benchmark suite show that the proposed mechanism achieves performance improvements of up to 46%, with an average of 11%, over a baseline version on two NUMA multi-core machines.
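The helper-thread half of the approach can be sketched as below; the worklist layout, the prefetch distance and the yielding policy are illustrative assumptions, not the skeleton framework's API:

```c
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

#define N_ITEMS  4096
#define DISTANCE 8            /* how far the helper runs ahead (assumed) */

struct item { char payload[64]; };
static struct item items[N_ITEMS];          /* the shared worklist       */
static volatile size_t head;                /* next item a worker takes  */

/* The helper shadows the workers' progress through the worklist and
 * touches the items they will reach shortly, so STM transactions find
 * their data already cache-resident. It exits once the list is drained. */
static void *prefetch_helper(void *arg) {
    (void)arg;
    size_t cursor = 0;
    while (cursor < N_ITEMS) {
        size_t target = head + DISTANCE;    /* workers advance `head`    */
        if (target > N_ITEMS) target = N_ITEMS;
        for (; cursor < target; cursor++)
            __builtin_prefetch(&items[cursor], 0, 3);  /* read, keep hot */
        sched_yield();                      /* don't steal worker cycles */
    }
    return NULL;
}
```

A worker pool would run alongside, with the helper started via pthread_create(&tid, NULL, prefetch_helper, NULL) and, in the framework, pinned near the workers it serves.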

17.
With the emergence of 3D-stacking technology, DRAM can be stacked on chips to build a DRAM last-level cache (LLC). Compared with static random-access memory (SRAM), DRAM is larger but slower. Much existing work has been devoted to improving workload performance using SRAM and stacked DRAM together, ranging from SRAM structure improvements to optimized cache tag and data access. In contrast, little attention has been paid to designing an LLC scheduling scheme for multi-programmed workloads with different memory footprints. Motivated by this, we propose a self-adaptive LLC scheduling scheme that utilizes SRAM and 3D-stacked DRAM efficiently to achieve better workload performance. The scheme employs (1) an evaluation unit, which probes and evaluates cache information while programs execute; and (2) an implementation unit, which self-adaptively chooses SRAM or DRAM. To make the scheduling scheme work correctly, we also develop a data migration policy. Extensive experiments show that our method improves multi-programmed workload performance by up to 30% compared with state-of-the-art methods.
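A hedged sketch of the evaluation/implementation split: sample each program's footprint and SRAM hit rate, then steer it to whichever LLC suits it. All thresholds here are assumed, not the paper's calibrated values:

```c
#include <stdint.h>

/* Per-program counters gathered by the evaluation unit (illustrative). */
struct probe { uint64_t accesses, sram_hits, footprint_bytes; };

enum llc_choice { USE_SRAM, USE_DRAM };

#define SRAM_CAPACITY (4u << 20)   /* assumed: 4 MB SRAM LLC */

/* Programs whose working set fits the small, fast SRAM keep using it;
 * programs that thrash SRAM are steered to the large stacked-DRAM
 * cache instead, trading latency for capacity. */
static enum llc_choice choose_llc(const struct probe *p) {
    if (p->footprint_bytes <= SRAM_CAPACITY)
        return USE_SRAM;
    if (p->accesses && p->sram_hits * 2 < p->accesses)  /* <50% hit rate */
        return USE_DRAM;
    return USE_SRAM;
}
```

The implementation unit would then invoke the data migration policy to move resident blocks whenever a program's choice changes.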

18.
Modern servers require large main memories, which so far have been enabled by increases in DRAM density. With DRAM's scalability nearing its limit, phase-change memory (PCM) is being considered as an alternative technology. PCM is denser, more scalable and consumes less idle power than DRAM, while exhibiting byte addressability and access times in the nanosecond range. Still, PCM is slower than DRAM and has limited endurance. These characteristics have prompted the study of hybrid memory systems, combining a small amount of DRAM with a large amount of PCM. In this paper, we leverage hybrid memories to improve the performance of cooperative memory caches in server clusters. Our approach entails a novel policy that exploits popularity information in placing objects across servers and memory technologies. Our results show that (1) DRAM-only and PCM-only memory systems do not perform well in all cases; and (2) when managed properly, hybrid memories always exhibit the best or close-to-best performance, with significant gains in many cases, without increasing energy consumption.
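In its simplest reading, popularity-informed placement tiers objects by popularity rank; a hedged sketch with illustrative thresholds (the paper's policy also decides placement across servers, which is omitted here):

```c
#include <stdint.h>

enum tier { TIER_DRAM, TIER_PCM, TIER_DISK };

/* The most popular objects go to the small, fast DRAM; the long tail
 * of moderately popular objects goes to the large PCM; everything
 * else stays on storage. Rank thresholds are illustrative. */
static enum tier place_object(uint64_t popularity_rank,
                              uint64_t dram_slots, uint64_t pcm_slots) {
    if (popularity_rank < dram_slots) return TIER_DRAM;
    if (popularity_rank < dram_slots + pcm_slots) return TIER_PCM;
    return TIER_DISK;
}
```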

19.
On-chip distributed memory systems have become an attractive solution for the massively parallel memory accesses of future many-core processors. However, the growing number of on-chip cores and memory controllers inevitably introduces many remote memory accesses, which generate a large amount of on-chip traffic and put great pressure on the interconnect. This paper optimizes on-chip memory access traffic via runtime thread migration. We first analyze memory access behavior in multi-threaded applications and find that access targets and volumes are similar over short periods, which makes runtime prediction feasible; over long periods, however, the access targets exhibit great mobility, motivating us to dynamically move threads towards their data. Based on these observations, we propose a novel low-cost, distributed thread migration algorithm that adjusts thread placement in chains based on benefit estimation. We present the workflow in detail, including the triggering and arbitration of migration requests and the procedure for determining migration chains. Simulation results show that our algorithm achieves a system performance speedup of 11.5% and reduces average memory access latency by 11.0%. It finds a few but effective thread migrations that optimize on-chip memory access traffic with acceptable hardware and runtime overheads.
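The benefit estimation can be sketched as hop-weighted traffic costs; the mesh topology and the threshold handling below are illustrative assumptions, not the paper's hardware design:

```c
#include <stdint.h>
#include <stdlib.h>

#define MESH_W  4
#define N_NODES (MESH_W * MESH_W)   /* illustrative 4x4 mesh */

/* Manhattan hop distance between two tiles of the mesh. */
static int hops(int a, int b) {
    return abs(a % MESH_W - b % MESH_W) + abs(a / MESH_W - b / MESH_W);
}

/* Predicted on-chip traffic if the thread ran at `core`: accesses to
 * each memory node weighted by hop distance. `accesses[]` comes from
 * the recent window, which the paper observes predicts the near future. */
static long traffic(const uint64_t accesses[N_NODES], int core) {
    long cost = 0;
    for (int m = 0; m < N_NODES; m++)
        cost += (long)accesses[m] * hops(core, m);
    return cost;
}

/* Migrate only if some core beats the current placement by more than
 * `threshold`, which stands in for the cost of the migration itself. */
static int best_core(const uint64_t accesses[N_NODES], int current,
                     long threshold) {
    int best = current;
    long to_beat = traffic(accesses, current) - threshold;
    for (int c = 0; c < N_NODES; c++) {
        long cost = traffic(accesses, c);
        if (cost < to_beat) { to_beat = cost; best = c; }
    }
    return best;   /* == current if no candidate clears the threshold */
}
```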

20.
An MPI Parallel Programming Model for CBEA On-Chip Multi-Cores Supporting Multiple Memory Access Techniques
Existing programming models for the CBEA (Cell Broadband Engine Architecture) mostly focus on stream-processing-like "bulk data transfer" applications, and perform poorly on traditional irregular memory access applications. Based on the Cell architecture, this paper proposes an MPI parallel programming model that supports both bulk data transfer and irregular access applications: communication is decomposed onto the PPE (PowerPC Processing Element), widening the model's applicability, and under a unified memory access interface, runtime memory access profiling information guides the selection and optimization of access methods to improve computational efficiency. Experimental results show that the proposed model supports multiple memory access modes, achieves good parallel speedup, and delivers roughly a 30%-50% performance improvement over comparable techniques.
