1.

Cached DRAM for ILP processor memory access latency reduction

Zhang Z. Zhu Z. Zhang X. 《Micro, IEEE》2001,21(4):22-32

Cached DRAM adds a small cache onto a DRAM chip to reduce average DRAM access latency. The authors compare cached DRAM with other advanced DRAM techniques for reducing memory access latency in instruction-level-parallelism processors.

2.

A hybrid memory built by SSD and DRAM to support in-memory Big Data analytics

Zhiguang Chen Yutong Lu Nong Xiao Fang Liu 《Knowledge and Information Systems》2014,41(2):335-354

Big Data requires a shift in traditional computing architecture, and in-memory computing is a new paradigm for Big Data analytics. However, DRAM-based main memory is neither cost-effective nor energy-efficient. This work combines a flash-based solid state drive (SSD) with DRAM to build a hybrid memory that meets both requirements. Because SSD latency is much higher than DRAM latency, the hybrid architecture should guarantee that most requests are served by DRAM rather than by the SSD. Accordingly, we take two measures to raise the DRAM hit ratio. First, the hybrid memory employs an adaptive prefetching mechanism so that data are already in DRAM before they are demanded. Second, the DRAM employs a novel replacement policy that preferentially evicts data that are easy to prefetch, because such data can be served by prefetching when they are demanded again. Conversely, data that are hard to prefetch are kept in DRAM. Both the prefetching mechanism and the replacement policy rely on file access patterns, so we propose a novel pattern recognition method that improves on the LZ data compression algorithm to detect these patterns. We evaluate our proposals via a prototype and trace-driven simulations. Experimental results demonstrate that the hybrid memory can extend the DRAM by more than a factor of two.
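The replacement idea in this abstract (evict data that are easy to prefetch first, keep data that are hard to prefetch) can be illustrated with a small sketch. This is not the authors' implementation: the class name `PrefetchAwareCache`, the boolean `prefetchable` flag, and the LRU fallback are all assumptions made for illustration, and the abstract does not say how prefetchability is scored.

```python
from collections import OrderedDict


class PrefetchAwareCache:
    """Toy DRAM cache that prefers to evict blocks that are easy to prefetch.

    Hypothetical sketch: the 'prefetchable' flag stands in for whatever
    pattern-based estimate a real system would compute.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        # key -> prefetchable flag; order runs from least to most recently used
        self.entries = OrderedDict()

    def access(self, key, prefetchable):
        """Touch a block; return True on a hit, False on a miss."""
        if key in self.entries:
            self.entries.move_to_end(key)      # refresh recency
            self.entries[key] = prefetchable   # update the flag in place
            return True
        if len(self.entries) >= self.capacity:
            self._evict()
        self.entries[key] = prefetchable
        return False

    def _evict(self):
        """Evict the LRU prefetchable block, or the plain LRU block if none."""
        victim = next((k for k, p in self.entries.items() if p), None)
        if victim is None:
            victim, _ = self.entries.popitem(last=False)
        else:
            del self.entries[victim]
        return victim


cache = PrefetchAwareCache(capacity=2)
cache.access("a", prefetchable=False)  # hard to prefetch
cache.access("b", prefetchable=True)   # easy to prefetch
cache.access("c", prefetchable=False)  # cache is full: "b" is evicted, not "a"
```

Plain LRU would have evicted "a" here; the sketch keeps it because a later miss on "a" could not be hidden by prefetching.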

3.

Unimem: Runtime Data Management on Non-Volatile Memory-Based Heterogeneous Main Memory for High Performance Computing

Kai Wu Dong Li 《Journal of Computer Science and Technology》2021,36(1):90-109

Non-volatile memory (NVM) provides a scalable and power-efficient solution to replace dynamic random access memory (DRAM) as main memory. However, because of the relatively high latency and low bandwidth of NVM, NVM is often paired with DRAM to build a heterogeneous memory system (HMS). As a result, data objects of the application must be carefully placed in NVM and DRAM for the best performance. In this paper, we introduce a lightweight runtime solution that automatically and transparently manages data placement on HMS without requiring hardware modifications or disruptive changes to applications. Leveraging online profiling and performance models, the runtime solution characterizes memory access patterns associated with data objects and minimizes unnecessary data movement. Our runtime solution effectively bridges the performance gap between NVM and DRAM. We demonstrate that using NVM to replace the majority of DRAM can be a feasible solution for future HPC systems with the assistance of software-based data management.

4.

Advances in DRAM interfaces

Kumanoya M. Ogawa T. Inoue K. 《Micro, IEEE》1995,15(6):30-36

New advanced architectures in DRAM interfaces seek to close the ever-widening performance gap between DRAM and microprocessor and to break the bandwidth bottleneck in graphics systems. We present an overview of five of these interfaces: EDO, SDRAM, RDRAM, CDRAM, and 3D-RAM. EDO will soon replace conventional DRAM, and SDRAM will partly take over in 66-MHz and higher frequency systems. Other interfaces will initially find target markets that exploit their unique features, and then seek wider market acceptance. Eventually, advances in DRAM will contribute to the trend toward a system on a chip.

5.

The cache DRAM architecture: a DRAM with an on-chip cache memory

Hidaka H. Matsuda Y. Asakura M. Fujishima K. 《Micro, IEEE》1990,10(2):14-25

A DRAM (dynamic RAM) with an on-chip cache, called the cache DRAM, has been proposed and fabricated. It is a hierarchical RAM containing a 1-Mb DRAM for the main memory and an 8-kb SRAM (static RAM) for cache memory. It uses a 1.2-μm CMOS technology. Suitable for no-wait-state memory access in low-end workstations and personal computers, the chip also serves high-end systems as a secondary cache scheme. It is shown how the cache DRAM bridges the gap in speed between high-performance microprocessor units and existing DRAMs. The cache DRAM concept is explained, and its architecture is presented. The error checking and correction scheme used to improve the cache DRAM's reliability is described. Performance results for an experimental device are reported.

6.

A hybrid memory architecture supporting fine-grained data migration

Ye CHI Jianhui YUE Xiaofei LIAO Haikun LIU Hai JIN 《Frontiers of Computer Science》2024,18(2):182103

Hybrid memory systems composed of dynamic random access memory (DRAM) and non-volatile memory (NVM) often exploit page migration to take full advantage of the different memory media. Most previous proposals migrate data at a granularity of 4 KB pages, and thus waste memory bandwidth and DRAM resources. In this paper, we propose Mocha, a non-hierarchical architecture that physically organizes DRAM and NVM in a flat address space but manages them as a cache/memory hierarchy. Since the commercial NVM device, Intel Optane DC Persistent Memory Modules (DCPMM), actually accesses the physical media at a granularity of 256 bytes (an Optane block), we manage the DRAM cache at a 256-byte granularity to match this feature of Optane. This design not only enables fine-grained data migration and management for the DRAM cache, but also avoids write amplification for Intel Optane DCPMM. We also create an Indirect Address Cache (IAC) in the Hybrid Memory Controller (HMC) and propose a reverse address mapping table in the DRAM to speed up address translation and cache replacement. Moreover, we exploit a utility-based caching mechanism to filter cold blocks in the NVM and further improve the efficiency of the DRAM cache. We implement Mocha in an architectural simulator. Experimental results show that Mocha can improve application performance by 8.2% on average (up to 24.6%), and reduce energy consumption by 6.9% and data migration traffic by 25.9% on average, compared with a typical hybrid memory architecture, HSCC.
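The bandwidth argument for 256-byte blocks over 4 KB pages can be checked with a little arithmetic. The sketch below uses only the two sizes stated in the abstract. The function names `split_address` and `migration_bytes` are hypothetical, and nothing here reflects Mocha's actual address mapping, which the abstract does not describe.

```python
PAGE_SIZE = 4096   # bytes, the conventional migration granularity
BLOCK_SIZE = 256   # bytes, the Optane block size given in the abstract
BLOCKS_PER_PAGE = PAGE_SIZE // BLOCK_SIZE  # 16 blocks per page


def split_address(addr):
    """Split a byte address into (page number, block index in page, byte offset)."""
    page, in_page = divmod(addr, PAGE_SIZE)
    block, offset = divmod(in_page, BLOCK_SIZE)
    return page, block, offset


def migration_bytes(touched_addrs, granularity):
    """Bytes moved if every touched address drags in its whole aligned unit."""
    units = {addr // granularity for addr in touched_addrs}
    return len(units) * granularity


# Three accesses that all fall inside the first 4 KB page.
touched = [0, 8, 300]
page_traffic = migration_bytes(touched, PAGE_SIZE)    # one whole page
block_traffic = migration_bytes(touched, BLOCK_SIZE)  # two 256-byte blocks
```

For this access pattern, page-granularity migration moves 4096 bytes while block-granularity migration moves 512 bytes, which is the kind of saving the abstract attributes to fine-grained migration.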

7.

Toward multi-programmed workloads with different memory footprints: a self-adaptive last level cache scheduling scheme

Jingyu Zhang Minyi Guo Chentao Wu Yuanyi Chen 《Science China Information Sciences》2018,61(1):012105

With the emergence of 3D-stacking technology, dynamic random-access memory (DRAM) can be stacked on chips to build a DRAM last level cache (LLC). Compared with static random-access memory (SRAM), DRAM is larger but slower. In existing research, much work has been devoted to improving workload performance using SRAM and stacked DRAM together, ranging from SRAM structure improvement to optimizing cache tag and data access. In contrast, little attention has been paid to designing an LLC scheduling scheme for multi-programmed workloads with different memory footprints. Motivated by this, we propose a self-adaptive LLC scheduling scheme, which allows us to utilize SRAM and 3D-stacked DRAM efficiently, achieving better workload performance. This scheduling scheme employs (1) an evaluation unit, which probes and evaluates the cache information while programs are executing; and (2) an implementation unit, which self-adaptively chooses SRAM or DRAM. To make the scheduling scheme work correctly, we develop a data migration policy. We conduct extensive experiments to evaluate the performance of our proposed scheme. Experimental results show that our method can improve multi-programmed workload performance by up to 30% compared with state-of-the-art methods.

8.

Universal Test Interface for embedded-DRAM testing

Miyano S. Sato K. Numata K. 《Design & Test of Computers, IEEE》1999,16(1):53-58

Because the configurations of embedded DRAM macros vary for each product, designers normally must customize the test circuitry for each product. The authors have developed circuitry (Universal Test Interface) that unifies testing regardless of the DRAM configuration and the number of macros on a chip. The Universal Test Interface alleviates the contradiction inherent in embedded DRAM testing.

9.

HSCS: a hybrid shared cache scheduling scheme for multiprogrammed workloads

Jingyu ZHANG Chentao WU Dingyu YANG Yuanyi CHEN Xiaodong MENG Liting XU Minyi GUO 《Frontiers of Computer Science》2018,12(6):1090-1104

The traditional dynamic random-access memory (DRAM) storage medium can be integrated on chips via modern emerging 3D-stacking technology to architect a DRAM shared cache in multicore systems. Compared with static random-access memory (SRAM), DRAM is larger but slower. In the existing research, a lot of work has been devoted to improving the workload performance using SRAM and stacked DRAM together in shared cache systems, ranging from SRAM structure improvement to optimizing cache tags and data access. However, little attention has been paid to designing a shared cache scheduling scheme for multiprogrammed workloads with different memory footprints in multicore systems. Motivated by this, we propose a hybrid shared cache scheduling scheme that allows a multicore system to utilize SRAM and 3D-stacked DRAM efficiently, thus achieving better workload performance. This scheduling scheme employs (1) a cache monitor, which is used to collect cache statistics; (2) a cache evaluator, which is used to evaluate the cache information during the process of programs being executed; and (3) a cache switcher, which is used to self-adaptively choose SRAM or DRAM shared cache modules. A cache data migration policy is naturally developed to guarantee that the scheduling scheme works correctly. Extensive experiments are conducted to evaluate the workload performance of our proposed scheme. The experimental results showed that our method can improve the multiprogrammed workload performance by up to 25% compared with state-of-the-art methods (including conventional and DRAM cache systems).

10.

Reducing DRAM refresh power consumption by runtime profiling of retention time and dual-row activation

《Microprocessors and Microsystems》2020

Refresh power of dynamic random-access memory (DRAM) has become a critical problem due to the large memory capacity of servers and mobile devices. It is imperative to reduce refresh rate in order to reduce refresh power consumption. However, current methods of refresh rate improvement have limitations such as large area/performance overhead, low system availability, and lack of support at the memory controller. We propose a novel scheme which comprises three essential functions: (1) an adaptive refresh method that adjusts refresh period on each DRAM chip, (2) a runtime method of retention-time profiling that operates inside DRAM chips during idle time thereby improving availability, and (3) a dual-row activation method which improves weak cell retention time at a very small area cost. The proposed scheme allows each DRAM chip to refresh with its own refresh period without requiring the external support. Experiments based on real DRAM chip measurements show that the proposed methods can increase refresh period by 4.5 times at 58 °C by adjusting refresh period in a temperature-aware manner while incurring only a small overhead of 1.05% and 0.02% in DRAM chip area and power consumption, respectively. Below 58 °C, our method improves the refresh period by 12.5% compared with two state-of-the-art methods, AVATAR and in-DRAM ECC. In various memory configurations with SPEC benchmarks, our method outperforms the existing ones in terms of energy-delay product by 19.7% compared with the baseline, and by 15.4% and 12.4%, with respect to AVATAR and in-DRAM ECC, respectively.

11.

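A hybrid main memory simulator based on DRAM and PCM

Zhang Dezhi Wan Shouhong Yue Lihua 《Computer Systems & Applications》2017,26(9):16-23

Phase change memory (PCM) has become a hot topic in main memory research because of its non-volatility, high read speed, and low static power consumption. However, the current lack of available PCM devices means that PCM-based algorithms cannot be validated effectively. This paper therefore proposes using a main memory simulator to simulate and validate PCM algorithms. It first reviews the characteristics of existing main memory simulators and points out that they cannot fully meet the practical needs of current main memory research; on this basis, it proposes and builds a hybrid main memory simulator based on DRAM and PCM. Experimental comparisons with existing simulators show that the proposed simulator can effectively simulate hybrid DRAM and PCM storage architectures, supports different forms of hybrid main memory system simulation, and is highly configurable. Finally, a usage example illustrates the ease of use of the simulator's programming interface.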

12.

A case for studying DRAM issues at the system level

Jacob B. 《Micro, IEEE》2003,23(4):44-56

The widening gap between today's processor and memory speeds makes DRAM subsystem design an increasingly important part of computer system design. If the DRAM research community would follow the microprocessor community's lead by leaning more heavily on architecture- and system-level solutions in addition to technology-level solutions to achieve higher performance, the gap might begin to close.

13.

Embedded DRAM development: Technology, physical design, and application issues

《Design & Test of Computers, IEEE》2001,18(3):7-15

Embedded DRAM provides advantages from a system point of view, along with many technical challenges. This article presents an overview of circuit, technology, and physical design methodology issues in embedded DRAM development and application.

14.

A hybrid fuzzy and neural approach for DRAM price forecasting

T. Chen 《Computers in Industry》2011,62(2):196-204

The trend in the price of dynamic random access memory (DRAM) is a very important prosperity index in the semiconductor industry. To further enhance the performance of DRAM price forecasting, a hybrid fuzzy and neural approach is proposed in this study. In the proposed approach, multiple experts construct their own fuzzy multiple linear regression models from various viewpoints to forecast the price of a DRAM product. Each fuzzy multiple linear regression model can be converted into two equivalent nonlinear programming problems to be solved. To aggregate these fuzzy price forecasts, a two-step aggregation mechanism is applied. In the first step, fuzzy intersection is applied to aggregate the fuzzy price forecasts into a polygon-shaped fuzzy number, in order to improve the precision. After that, a back propagation network is constructed to defuzzify the polygon-shaped fuzzy number and to generate a representative/crisp value, so as to enhance the accuracy. A real example is used to evaluate the effectiveness of the proposed methodology. According to experimental results, the proposed methodology improved the precision and accuracy of DRAM price forecasting by 66% and 43%, respectively.
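The first aggregation step can be illustrated with a minimal sketch of fuzzy intersection. It assumes each expert's forecast is a triangular fuzzy number and that intersection is the pointwise minimum of membership functions, which is the standard min operator. The abstract does not specify either choice, and the helper names `triangular` and `fuzzy_intersection` and the example numbers are hypothetical.

```python
def triangular(a, b, c):
    """Membership function of a triangular fuzzy number with a < b < c."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)   # rising edge
        return (c - x) / (c - b)       # falling edge
    return mu


def fuzzy_intersection(*memberships):
    """Pointwise minimum of several membership functions."""
    return lambda x: min(mu(x) for mu in memberships)


# Two hypothetical expert forecasts of the same price.
expert_1 = triangular(1.0, 2.0, 4.0)
expert_2 = triangular(2.0, 3.0, 5.0)
combined = fuzzy_intersection(expert_1, expert_2)
```

In this example `combined` is nonzero only between 2.0 and 4.0, a narrower range than either forecast alone, and its piecewise-linear shape is polygonal, not triangular. That narrowing is one way to read the abstract's claim that intersection improves precision.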

15.

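MCM8L4000A: application of a 4M byte low-power DRAM in 80C198 systems

Ma Wenjiang Lin Jiarui 《Microprocessors》1995,(1):31-34

This paper introduces the features of the MCM8L4000A, a large-capacity DRAM recently released by Motorola, and studies its application in 80C198 single-chip microcomputer systems. It also presents an effective method for extending the addressing capability of a single-chip microcomputer and reducing the power consumption of such systems.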

16.

3D DRAM Design and Application to 3D Multicore Systems

Hongbin Sun Jibang Liu Anigundi R.S. Nanning Zheng Jian-Qiang Lu Rose K. Tong Zhang 《Design & Test of Computers, IEEE》2009,26(5):36-47

From a system architecture perspective, 3D technology can satisfy the high memory bandwidth demands that future multicore/manycore architectures require. This article presents a 3D DRAM architecture design and the potential for using 3D DRAM stacking for both L2 cache and main memory in 3D multicore architecture.

17.

Discussing DRAM and CMOS Scaling with Inventor Bob Dennard

《Design & Test of Computers, IEEE》2008,25(2):188-191

This is one of a series of ongoing interviews in IEEE Design & Test with well-known engineers in the electronics industry. In this interview, Ken Wagner talks with IBM Fellow Bob Dennard, the inventor of DRAM. In addition to his foundational work in DRAM, Dennard is also well-known for his work in CMOS process scaling.

18.

Hyper switching memory utilization on hybrid main memory for improved task execution and reduced power consumption

《Microprocessors and Microsystems》2020

The problem of maximizing PCM lifetime has been well studied. The arrival of non-volatile memory devices has replaced the traditional DRAM. Still, the DRAM has many limitations in endurance and high-power write operations. Similarly, a number of designs have been discussed earlier to maximize the lifetime of PCM by caching main memory in the available DRAM. Still, they could not achieve the desired reduction in power consumption or increase in memory utilization. To reduce power consumption and maximize lifetime, a categorical model is presented in this paper. The proposed method categorizes processes according to their memory access activity. Each categorized process is allocated to the part of the hybrid memory that encourages maximum reads and minimum writes in PCM. The proposed method increases the lifetime of PCM compared with other methods.
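The categorize-then-place idea can be sketched as follows. The abstract gives no categories, metrics, or thresholds, so everything below is an assumption for illustration: two categories, a write ratio as the activity measure, and a hypothetical `write_threshold` of 0.3.

```python
def classify_process(reads, writes, write_threshold=0.3):
    """Label a process by its share of writes (hypothetical two-way split)."""
    total = reads + writes
    if total == 0:
        return "read-dominant"  # no activity yet: treat as harmless to PCM
    if writes / total > write_threshold:
        return "write-dominant"
    return "read-dominant"


def place(category):
    """Send write-heavy processes to DRAM so PCM sees mostly reads."""
    return "DRAM" if category == "write-dominant" else "PCM"


# Hypothetical access counts for two processes.
logger_target = place(classify_process(reads=40, writes=60))
viewer_target = place(classify_process(reads=90, writes=10))
```

Keeping write-heavy processes out of PCM is what limits wear in this sketch; a real allocator would also have to respect DRAM capacity, which is ignored here.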

19.

Memory Subsystems in High-End Routers

Wang Feng Hamdi Mounir 《Micro, IEEE》2009,29(3):52-63

As Internet routers scale to support next-generation networks, their memory subsystems must also scale. Several solutions combine static RAM and dynamic RAM buffering but still have major scaling limitations. Using a parallel architecture and distributed memory-management algorithms with hybrid SRAM/DRAM improves buffering performance. The parallel hybrid SRAM/DRAM memory system is also work conserving, which is particularly important under light traffic conditions.

20.

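A heterogeneous memory page migration mechanism based on a DRAM victim cache

Pei Songwen Qian Yihuan Ye Xiaochun Liu Haikun Kong Linghe 《Journal of Computer Research and Development》2022,59(3):568-581

When massive numbers of data requests access a heterogeneous memory system, heterogeneous memory pages migrate back and forth frequently between dynamic random access memory (DRAM) and non-volatile memory (NVM). However, migration policies designed for traditional memory pages struggle to adapt to rapid, dynamic changes in how "cold" or "hot" a page is, which causes pages migrated from DRAM to N...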