期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Multimedia LSI accelerator with embedded DRAM

Sase I. Shimizu N. Yoshikawa T. 《Micro, IEEE》1997,17(6):49-54

To succeed in the graphics controllers market, it is important to take advantage of embedded DRAMs, which provide low power consumption, low electromagnetic interference (EMI), smaller board space, and frame memory flexibility (capacity, access speed, and bandwidth). These capabilities benefit portable PC applications in which board space and power consumption are serious considerations. The MSM7680 accelerator is ideally suited for a compact multimedia system because of its smaller, embedded DRAM capacity. The MSM7680 provides high performance and/or a one-chip solution for many graphics display systems. It also integrates the frame buffer with graphics controller functions such as a 2D drawing engine, MPEG-1 decoder, digital/analog converter for RGB analog output, and a clock generator phase-locked loop. The MSM7680 multimedia accelerator uses a 256-bit data bus with embedded 1.25-Mbyte DRAM to increase the rate of data transfers and decrease power consumption 相似文献

2.

H.264编码器存储带宽分析及DRAM控制器设计

下载免费PDF全文

胡红旗许家栋孙景楠《计算机工程与应用》2009,45(14):141-144

在分析H.264/AVC编码过程中存储器带宽需求的基础上,提出一种DRAM控制器结构,并实现了几种不同调度策略的DRAM控制器结构设计。实现了令牌环、固定优先级和抢占式等三种结构,结合已有的存储空间映射方法,通过减少换行及Bank切换过程中的冗余周期,进一步提高存储器的带宽利用率。实验结果表明,提出的三种存储器结构中抢占式调度具有最高的宽利用率,可满足150 MHz时钟频率条件下HDTV1080P实时编码的应用。相似文献

3.

Memory access schedule minimization for embedded systems

Jingtong Hu Chun Jason Xue Wei-Che Tseng Qingfeng Zhuge Yingchao Zhao Edwin H.-M. Sha 《Journal of Systems Architecture》2012,58(1):48-59

The growing gap between microprocessor speed and DRAM speed is a major problem that computer designers are facing. In order to narrow the gap, it is necessary to improve DRAM’s speed and throughput. To achieve this goal, this paper proposes techniques to take advantage of the characteristics of the 3-stage access of contemporary DRAM chips by grouping the accesses of the same row together and interleaving the execution of memory accesses from different banks. A family of Bubble Filling Scheduling (BFS) algorithms are proposed in this paper to minimize memory access schedule length and improve memory access time for embedded systems.When the memory access trace is known in some application-specific embedded systems, this information can be fully utilized to generate efficient memory access schedules. The offline BFS algorithm can generate schedules which are 47.49% shorter than in-order scheduling and 8.51% shorter than existing burst scheduling on average. When memory accesses are received by the single memory controller in real time, the memory accesses have to be scheduled as they come. The online BFS algorithm in this paper serves this purpose and generates schedules which are 58.47% shorter than in-order scheduling and 4.73% shorter than burst scheduling on average. To improve the memory throughput and further reduce the memory access schedule, an architecture with dual memory controllers is proposed. According to the experimental results, the dual controller algorithm can generate schedules which are 62.89% shorter than in-order scheduling, 14.23% shorter than burst scheduling, and 10.07% shorter than single controller BFS algorithms on average. 相似文献

4.

Reducing DRAM refresh power consumption by runtime profiling of retention time and dual-row activation

《Microprocessors and Microsystems》2020

Refresh power of dynamic random-access memory (DRAM) has become a critical problem due to the large memory capacity of servers and mobile devices. It is imperative to reduce refresh rate in order to reduce refresh power consumption. However, current methods of refresh rate improvement have limitations such as large area/performance overhead, low system availability, and lack of support at the memory controller. We propose a novel scheme which comprises three essential functions: (1) an adaptive refresh method that adjusts refresh period on each DRAM chip, (2) a runtime method of retention-time profiling that operates inside DRAM chips during idle time thereby improving availability, and (3) a dual-row activation method which improves weak cell retention time at a very small area cost. The proposed scheme allows each DRAM chip to refresh with its own refresh period without requiring the external support. Experiments based on real DRAM chip measurements show that the proposed methods can increase refresh period by 4.5 times at 58 °C by adjusting refresh period in a temperature-aware manner while incurring only a small overhead of 1.05% and 0.02% in DRAM chip area and power consumption, respectively. Below 58 °C, our method improves the refresh period by 12.5% compared with two state-of-the-art methods, AVATAR and in-DRAM ECC. In various memory configurations with SPEC benchmarks, our method outperforms the existing ones in terms of energy-delay product by 19.7% compared with the baseline, and by 15.4% and 12.4%, with respect to AVATAR and in-DRAM ECC, respectively. 相似文献

5.

面向实时流处理的多核多线程处理器访存队列

田杭沛高德远樊晓桠朱怡安《计算机研究与发展》2009,46(10)

针对多核多线程处理器中乱序访存影响计算实时性的问题,在对典型访存队列进行研究的基础上提出了一种新的访存队列构建模型及其硬件结构.该模型采用窗口优化算法控制最差情况下的访存延迟,保证访存的实时性,同时又利用优化的乱序调度策略减少访存延迟.实验证明,该访存队列可控制最大访存延迟,与顺序访存相比,存储器具备更高的带宽,与传统的乱序访存相比较,可以充分满足计算的实时性需求,而存储器有效带宽基本不受影响,解决了多核多线程处理器承担实时流计算的基础难题. 相似文献

6.

A 510‐kb SOG‐DRAM for mobile displays with embedded frame memories

H. Haga Y. Nonaka Y. Kamon T. Otose D. Sasaki Y. Kitagishi T. Matsuzaki Y. Sato H. Asada 《Journal of the Society for Information Display》2006,14(4):339-344

Abstract— A system‐on‐glass (SOG) dynamic random access memory (DRAM), which enables the implementation of frame‐memory‐integrated displays, has been developed. A dynamic one‐transistor‐one‐capacitor memory cell, which has a data retention time of over 16.6 msec, and a compression/decompression (CODEC) circuit were developed to reduce the layout area and power. The CODEC enables an 18‐bit/pixel color display, while reducing the memory capacity from 18 to 12 bits/pixel. A frame‐memory macro was created by combining the SOG‐DRAM with an embedded controller that enables independent access for writing and reading. Its operation was verified by chip measurement and demonstration as a frame‐memory operation of 262k‐color QCIF+ displays. The work reported in this paper was the first step to creating a Zero‐Chip Display with an integrated frame memory, and it proved the concept was feasible. 相似文献

7.

新型存储控制器的研究设计 总被引：1，自引：0，他引：1

黄可望《计算机工程与设计》2006,27(6):1065-1068

随着存储器的不断发展及系统功能的不断增强，系统对存储控制器提出了更高的要求。为此，需要开发成本低、效率高、应用广泛的新型存储器控制器。本新型存储控制器是基于FPGA的设计方案采用自顶向下（TOP-DOWN）的设计思想，遵循FPGA的设计流程实现的。新型存储控制器设计了3类接口：存储器接口、MPU／MCU接口、USB接口。存储控制器通过存储器接口来控制存储器；MPU／MCU接口可以控制存储器接口和存储控制器的状态；USB接口可以连接PC和带有USB接口的设备。此新型存储控制器可用于工业、PC、数字设备、信息家电等多个领域，有广泛的技术性和实用性。相似文献

8.

基于VHDL和CPLD的协调控制器系统总线桥

袁朝辉朱伟徐鹏《微处理机》2005,26(4):79-81,85

本文介绍了一种使用VHDL语言设计、在一片EMP7128S160 CPLD芯片上实现的总线桥。该总线桥是某型协调控制器系统的通讯核心，实现了系统下层的背板总线、背板总线管理器与上位机EPP并口之间的协议转换以及通讯仲裁功能。该桥为系统下层提供了高速的实时数据传输通道，也为上位机访问下层系统提供了完全的访问通道，使整套控制器兼具高性能、易调试两大特点。相似文献

9.

基于FPGA内存数据保护技术的设计与实现

下载免费PDF全文

李仁刚任智新王江为阚宏伟张闯公维锋《计算机工程与应用》2020,56(13):72-76

为了提高内存数据的可靠性,内存保护技术正广泛应用在高端容错计算机中。为此,提出了以现场可编程门阵列（FPGA）为控制器实现一拖二的内存热备份技术,对内存数据进行高效的保护。分析FPGA内部接口IP的延时后,提出了采用FPGA原语实现双倍数据速率（DDR）数据的采集与处理方法。搭建了以镁光的同步动态随机存取存储器（SDRAM）颗粒为控制对象的仿真模型,验证了该方法的有效性。阐述了以赛灵思公司的FPGA芯片做主控器,实现内存热备份功能的应用实例。该方法不仅可有效保护内存数据,由于FPGA的可编程性,使计算机系统具备了在线扩展（容量）、在线升级内存的功能,可以满足特殊行业不宕机、实时容错的要求。相似文献

10.

基于主存访问相关解决等技术的高带宽主存控制器设计

白锋程旭《计算机工程与应用》2003,39(26):125-128

随着现代处理器和缓存技术的发展,当代计算机系统的性能日益受到主存系统的制约,对主存带宽的需求将越来越大。论文提出主存访问相关解决、主存访问动态调度和地址重映射三项技术,利用主存访问自身的特性(局部性)、同步DRAM自身的物理特性(操作的并行性)和二者之间的关系(地址映射),设计了新型、高带宽主存控制器,有效地提高了主存系统的带宽。相似文献

11.

基于DRAM和PCM的混合主存模拟器

张德志万寿红岳丽华《计算机系统应用》2017,26(9):16-23

相变存储器（PCM）由于其非易失性、高读取速度以及低静态功耗等优点,已成为主存研究领域的热点.然而,目前缺乏可用的PCM设备,这使得基于PCM的算法研究得不到有效验证.因此,本文提出了利用主存模拟器仿真并验证PCM算法的思路.本文首先介绍了现有主存模拟器的特点,并指出其并不能完全满足当前主存研究的实际需求,在此基础上提出并构建了一个基于DRAM和PCM的混合主存模拟器.与现有模拟器的实验比较结果表明,本文设计的混合主存模拟器能够有效地模拟DRAM和PCM混合存储架构,并能够支持不同形式的混合主存系统模拟,具有高可配置性.最后,论文通过一个使用示例说明了混合主存模拟器编程接口的易用性. 相似文献

12.

PC机主存储器组成结构分析与设计

米根锁王瑞峰《微机发展》2006,16(10):205-206

从DRAM的发展及应用特点出发,针对使用DRAM构成计算机主存时应解决的主存空间及寻址、多体交叉访问构成并行主存结构、动态刷新等问题,以采用DRAM控制器W4006AF构成80386微机主存的设计为例,对主存的构成及工作原理进行了详细分析,对于分析和设计计算机主存具有很好的参考价值。相似文献

13.

The message-driven processor: a multicomputer processing node withefficient mechanisms

Dally W.J. Fiske J.A.S. Keen J.S. Lethin R.A. Noakes M.D. Nuth P.R. Davison R.E. Fyler G.A. 《Micro, IEEE》1992,12(2):23-39

The message-driven processor (MDP), a 36-b, 1.1-million transistor, VLSI microcomputer, specialized to operate efficiently in a multicomputer, is described. The MDP chip includes a processor, a 4096-word by 36-b memory, and a network port. An on-chip memory controller with error checking and correction (ECC) permits local memory to be expanded to one million words by adding external DRAM chips. The MDP incorporates primitive mechanisms for communication, synchronization, and naming which support most proposed parallel programming models. The MDP system architecture, instruction set architecture, network architecture, implementation, and software are discussed 相似文献

14.

Cached DRAM for ILP processor memory access latency reduction

Zhang Z. Zhu Z. Zhang X. 《Micro, IEEE》2001,21(4):22-32

Cached DRAM adds a small cache onto a DRAM chip to reduce average DRAM access latency. The authors compare cached DRAM with other advanced DRAM techniques for reducing memory access latency in instruction-level-parallelism processors 相似文献

15.

Unimem: Runtime Data Management on Non-Volatile Memory-Based Heterogeneous Main Memory for High Performance Computing

下载免费PDF全文

Kai Wu Dong Li 《计算机科学技术学报》2021,36(1):90-109

Non-volatile memory(NVM)provides a scalable and power-efficient solution to replace dynamic random access memory(DRAM)as main memory.However,because of the relatively high latency and low bandwidth of NVM,NVM is often paired with DRAM to build a heterogeneous memory system(HMS).As a result,data objects of the application must be carefully placed to NVM and DRAM for the best performance.In this paper,we introduce a lightweight runtime solution that automatically and transparently manages data placement on HMS without the requirement of hardware modifications and disruptive change to applications.Leveraging online profiling and performance models,the runtime solution characterizes memory access patterns associated with data objects,and minimizes unnecessary data movement.Our runtime solution effectively bridges the performance gap between NVM and DRAM.We demonstrate that using NVM to replace the majority of DRAM can be a feasible solution for future HPC systems with the assistance of a software-based data management. 相似文献

16.

A hybrid memory built by SSD and DRAM to support in-memory Big Data analytics

Zhiguang Chen Yutong Lu Nong Xiao Fang Liu 《Knowledge and Information Systems》2014,41(2):335-354

Big Data requires a shift in traditional computing architecture. The in-memory computing is a new paradigm for Big Data analytics. However, DRAM-based main memory is neither cost-effective nor energy-effective. This work combines flash-based solid state drive (SSD) and DRAM together to build a hybrid memory, which meets both of the two requirements. As the latency of SSD is much higher than that of DRAM, the hybrid architecture should guarantee that most requests are served by DRAM rather than by SSD. Accordingly, we take two measures to enhance the hit ratio of DRAM. First, the hybrid memory employs an adaptive prefetching mechanism to guarantee that data have already been prepared in DRAM before they are demanded. Second, the DRAM employs a novel replacement policy to give higher priority to replace data that are easy to be prefetched because these data can be served by prefetching once they are demanded once again. On the contrary, the data that are hard to be prefetched are protected by DRAM. The prefetching mechanism and replacement policy employed by the hybrid memory rely on access patterns of files. So, we propose a novel pattern recognition method by improving the LZ data compression algorithm to detect access patterns. We evaluate our proposals via prototype and trace-driven simulations. Experimental results demonstrate that the hybrid memory is able to extend the DRAM by more than twice. 相似文献

17.

A hybrid memory architecture supporting fine-grained data migration

Ye CHI Jianhui YUE Xiaofei LIAO Haikun LIU Hai JIN 《Frontiers of Computer Science》2024,18(2):182103

Hybrid memory systems composed of dynamic random access memory (DRAM) and Non-volatile memory (NVM) often exploit page migration technologies to fully take the advantages of different memory media. Most previous proposals usually migrate data at a granularity of 4 KB pages, and thus waste memory bandwidth and DRAM resource. In this paper, we propose Mocha, a non-hierarchical architecture that organizes DRAM and NVM in a flat address space physically, but manages them in a cache/memory hierarchy. Since the commercial NVM device–Intel Optane DC Persistent Memory Modules (DCPMM) actually access the physical media at a granularity of 256 bytes (an Optane block), we manage the DRAM cache at the 256-byte size to adapt to this feature of Optane. This design not only enables fine-grained data migration and management for the DRAM cache, but also avoids write amplification for Intel Optane DCPMM. We also create an Indirect Address Cache (IAC) in Hybrid Memory Controller (HMC) and propose a reverse address mapping table in the DRAM to speed up address translation and cache replacement. Moreover, we exploit a utility-based caching mechanism to filter cold blocks in the NVM, and further improve the efficiency of the DRAM cache. We implement Mocha in an architectural simulator. Experimental results show that Mocha can improve application performance by 8.2% on average (up to 24.6%), reduce 6.9% energy consumption and 25.9% data migration traffic on average, compared with a typical hybrid memory architecture–HSCC. 相似文献

18.

PARBLO: Page-Allocation-Based DRAM Row Buffer Locality Optimization 总被引：1，自引：1，他引：0

下载免费PDF全文

Wei Mi 《计算机科学技术学报》2009,24(6):1086-1097

DRAM row buffer conflicts can increase memory access latency significantly. This paper presents a new page-allocation-based optimization that works seamlessly together with some existing hardware and software optimizations to eliminate significantly more row buffer conflicts. Validation in simulation using a set of selected scientific and engineering benchmarks against a few representative memory controller optimizations shows that our method can reduce row buffer miss rates by up to 76% (with an average of 37.4%). This reduction in row buffer miss rates will be translated into performance speedups by up to 15% (with an average of 5%). 相似文献

19.

Toward multi-programmed workloads with different memory footprints: a self-adaptive last level cache scheduling scheme

Jingyu Zhang Minyi Guo Chentao Wu Yuanyi Chen 《中国科学:信息科学(英文版)》2018,61(1):012105

With the emerging of 3D-stacking technology, the dynamic random-access memory (DRAM) can be stacked on chips to architect the DRAM last level cache (LLC). Compared with static randomaccess memory (SRAM), DRAM is larger but slower. In the existing research papers, a lot of work has been devoted to improving the workload performance using SRAM and stacked DRAM together, ranging from SRAM structure improvement, to optimizing cache tag and data access. Instead, little attention has been paid to designing an LLC scheduling scheme for multi-programmed workloads with different memory footprints. Motivated by this, we propose a self-adaptive LLC scheduling scheme, which allows us to utilize SRAM and 3D-stacked DRAM efficiently, achieving better workload performance. This scheduling scheme employs (1) an evaluation unit, which is used to probe and evaluate the cache information during the process of programs being executed; and (2) an implementation unit, which is used to self-adaptively choose SRAM or DRAM. To make the scheduling scheme work correctly, we develop a data migration policy. We conduct extensive experiments to evaluate the performance of our proposed scheme. Experimental results show that our method can improve the multi-programmed workload performance by up to 30% compared with the state-of-the-art methods. 相似文献

20.

CIMS中智能设备通讯控制器的研究

杨忠王又钧《计算机应用》1995,15(6):52-54

ＰＣ系列机面向智能测控设备的通讯控制器（通讯卡）是很多应用系统中的关键环节，它起着上、下两级ＣＰＵ之间的桥梁作用。在工业集散系统和计算机集成制造系统（ＣＩＭＳ）中，它负责管理工作站和下位机（现场智能测控设备）之间的信息传输。本文介绍了用８０９８单片机作ＣＰＵ实现的智能通讯控制器，重点分析了双端口存贮器ＩＤＴ７１３０作为ＫＣＰＵ之间信息交换的实现方式，设计新颖紧凑实用。相似文献