1.

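刘彬 彭蔓蔓 《计算机应用研究》2007,24(10):267-268,285

Proposes a design method based on a separated-comparison cache. The key technique is a fully associative cache that stores the low four bits of the original tag, together with a separated tag comparator, so that high performance and low energy consumption are obtained at the same time. SPEC95 simulation results show that, relative to a conventional four-way set-associative cache, the separated-comparison cache saves 13% of the access time and 45% to 60% of the energy consumption.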
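The abstract gives no circuit details, so the following is only a minimal sketch of the general idea of comparing tags in two stages: a cheap compare on the low four tag bits first, then a compare of the remaining bits only for the ways that survive. The function `lookup`, the list `set_tags`, and the way a set is modeled are illustrative assumptions, not the authors' design.

```python
LOW_BITS = 4                       # partial-tag width, taken from the abstract
LOW_MASK = (1 << LOW_BITS) - 1


def lookup(set_tags, tag):
    """Return (hit way or None, number of ways that needed a high-bit compare).

    set_tags is a hypothetical model of one cache set: a list holding the
    stored tag of each way, or None for an empty way.
    """
    # Stage 1: compare only the low four tag bits of every way.
    candidates = [way for way, stored in enumerate(set_tags)
                  if stored is not None and (stored & LOW_MASK) == (tag & LOW_MASK)]
    # Stage 2: compare the remaining high bits, but only for the candidates.
    for way in candidates:
        if set_tags[way] >> LOW_BITS == tag >> LOW_BITS:
            return way, len(candidates)
    return None, len(candidates)


example_set = [0x12, 0x34, 0x52, None]
hit_way, compared = lookup(example_set, 0x52)
```

In this example two of the four ways share the low nibble `0x2`, so only those two reach the second stage; a lookup whose low bits match no way is rejected without any high-bit compare.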

2.

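A low-power dynamically reconfigurable cache scheme  Cited by: 1 (self-citations: 0, citations by others: 1)

赵欢 苏小昆 李仁发 《计算机应用》2009,29(5):1446-1451

Processor power consumption is a major concern in embedded systems, and studies show that the cache memory accounts for 30% to 60% of total processor power in such systems. The paper therefore proposes Tournament cache, a low-power, dynamically reconfigurable cache scheme. It adds three counters and one register to the conventional cache structure and, while the program runs, adjusts the cache associativity according to the counter statistics, switching among 1, 2, and 4 ways to match the needs of different program phases and thereby reduce system power. Experimental results show that the scheme saves more than 40% of the energy of a conventional four-way set-associative cache, with almost negligible performance loss.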

3.

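Memory replacement mechanism and its implementation

A·Chaib 胡铭曾 《计算机工程与设计》2000,21(6):24-27

Proposes an algorithm that addresses cache coherence and synchronization in PACT01, a new processor architecture that incorporates a dynamically programmable gate array (DPGA), and that also addresses fast context switching and fast user-level operations for multithreading support. The memory replacement mechanism is a hardware method for handling cache coherence and for replacing data from local or remote memory into the cache on a cache miss. Conflicts arise when parallel threads write to and read from the same location, or read or write the same location. The paper chooses an associative mapping policy together with the least recently used (LRU) algorithm, which on a cache miss replaces the least recently used reference block; to implement LRU, a counter is provided for each block.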
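The abstract states only that LRU is implemented with one counter per block. One common way to do that is with age counters, sketched below for a small fully associative cache. The class name `CounterLRUCache` and the exact update rule (age every block, reset the touched one) are assumptions for illustration, not details confirmed by the paper.

```python
class CounterLRUCache:
    """Fully associative cache with one age counter per block (illustrative)."""

    def __init__(self, num_blocks):
        self.tags = [None] * num_blocks   # stored tag per block, None if empty
        self.age = [0] * num_blocks       # larger value = used longer ago

    def access(self, tag):
        """Return True on a hit, False on a miss (the block is filled on a miss)."""
        if tag in self.tags:
            slot = self.tags.index(tag)
            hit = True
        else:
            if None in self.tags:
                slot = self.tags.index(None)          # use an empty block first
            else:
                # Evict the block with the largest age, i.e. the LRU block.
                slot = max(range(len(self.tags)), key=lambda i: self.age[i])
            self.tags[slot] = tag
            hit = False
        # Age every block, then mark the touched block as most recently used.
        self.age = [a + 1 for a in self.age]
        self.age[slot] = 0
        return hit


cache = CounterLRUCache(2)
history = [cache.access(t) for t in ["A", "B", "A", "C"]]
```

After the sequence A, B, A, C on a two-block cache, block B is the one evicted, because A was touched more recently than B when C arrived.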

4.

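Research on a low-power dynamically reconfigurable cache algorithm  Cited by: 1 (self-citations: 0, citations by others: 1)

任小西 刘清 《计算机应用研究》2013,30(2):414-416

The dynamically reconfigurable cache algorithm monitors changes of program phase through the instruction time count and decides capacity adjustments accordingly. Within a program phase, a state machine uses the average access time to make a preliminary judgment about cache accesses and then determines the cache structure for the current phase from that judgment. Experimental results show that the algorithm reduces power consumption by 61% compared with a conventional four-way set-associative cache, with a performance loss of only about 2%. Compared with existing algorithms, both power and performance are further improved.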

5.

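A low-power cache design based on pre-comparison

彭瑞华 付宇卓 《微计算机信息》2007,23(29):244-246

Presents a cache structure that uses a pre-comparison method. Pre-comparing a segment of the tag avoids accesses to irrelevant tag and data segments and so reduces access power. An inverted clock is introduced to optimize the access timing, so that the average access latency is less than one cycle. Experiments show that, with the hit rate maintained, the memory-access optimization behaves consistently across the test programs, and the power advantage grows as associativity increases. Compared with a prediction-based structure, power is reduced by 28.5% on average at 8-way associativity.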

6.

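TM-CAM: an efficient soft-error-tolerant associative memory

孙岩 黎铁军 王发源 张民选 《计算机工程与科学》2014,36(4):584-588

Associative memories are among the components of integrated circuits most sensitive to soft errors, but their structure rules out traditional fault-tolerance methods such as error-protection codes. The paper proposes TM-CAM, a soft-error-tolerant associative memory structure. Using a ternary match-line mechanism and a carefully designed ternary sense amplifier, it can detect any single-bit error in the associative memory, and the structure is simple and efficient. An access algorithm for TM-CAM is also proposed on top of this structure. Experiments show that TM-CAM effectively mitigates soft errors in associative memories at very small overhead.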

7.

Analysis and implementation of a cycle-accurate power model for SPM

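胡志刚 赵庆福 蒋湘涛 《计算机工程与应用》2010,46(2):63-65

Power consumption is one of the bottlenecks limiting the development of embedded devices. In embedded systems, scratch-pad memory (SPM) is used in place of the cache to reduce the overall power of the embedded processor. The paper proposes a cycle-accurate SPM power model. The model extends the SimpleScalar simulator to simulate the program's SPM accesses during execution and obtain the circuit input states, then computes SPM power with the cycle-accurate SPM power model integrated into the simulator. It overcomes the poor scalability of circuit-level models: by configuring the relevant parameters in SimpleScalar, it can simulate the power of SPMs of different sizes and structures. Experiments show that the model simulates SPM power accurately (error no more than 10%). It offers some guidance for low-power SPM design and optimization.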

8.

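A low-power, high-performance separated-comparison cache scheme

刘彬 彭蔓蔓 《计算机应用研究》2007,24(10):267-268

Proposes a design method based on a separated-comparison cache. The key technique is a fully associative cache that stores the low four bits of the original tag, together with a separated tag comparator, so that high performance and low energy consumption are obtained at the same time. SPEC95 simulation results show that, relative to a conventional four-way set-associative cache, the separated-comparison cache saves 13% of the access time and 45% to 60% of the energy consumption.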

9.

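Research on leakage power optimization techniques for set-associative caches

张承义 张民选 邢座程 《小型微型计算机系统》2007,28(2):372-375

As integrated-circuit manufacturing enters the ultra-deep-submicron era, leakage power accounts for a growing share of total microprocessor power. Beyond developing new low-leakage processes and circuit techniques, controlling and optimizing leakage power at the architecture level has become an active research topic in industry. The cache occupies the largest area in a microprocessor and is therefore the first component to target for leakage control and optimization. The paper proposes an LRU-assist algorithm that reuses the existing LRU information; without affecting processor performance, the average cache turn-off ratio reaches 53%, greatly reducing leakage power.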

10.

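Memory replacement mechanism and its implementation for PACT01: a new multithreading-oriented processor incorporating a DPGA

Chaib A 胡铭曾 《计算机工程与设计》2000,21(6):24-27

Proposes an algorithm that addresses cache coherence and synchronization in PACT01, a new processor architecture that incorporates a dynamically programmable gate array (DPGA), and that also addresses fast context switching and fast user-level operations for multithreading support. The memory replacement mechanism is a hardware method for handling cache coherence and for replacing data from local or remote memory into the cache on a cache miss. Conflicts arise when parallel threads write to and read from the same location, or read or write the same location. The paper chooses an associative mapping policy together with the least recently used (LRU) algorithm, which on a cache miss replaces the least recently used reference block; to implement LRU, a counter is provided for each block.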

11.

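An analysis of key techniques for network memory sharing under Linux

黄丽娟《电脑学习》2011,(2):42-43

Network memory sharing draws mainly on traditional grid computing and cluster memory-sharing techniques. The paper discusses its key techniques from three aspects: dynamic function interception, organization and management of cached data, and asynchronous writing of cached data.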

12.

A new cache architecture based on temporal and spatial locality  Cited by: 5 (self-citations: 0, citations by others: 5)

Jung-Hoon Jang-Soo Shin-Dug 《Journal of Systems Architecture》2000,46(15):1451-1467

A data cache system is designed as a low-power, high-performance cache structure for embedded processors. A direct-mapped cache is a favorite choice for short cycle time, but it suffers from a high miss rate. The proposed dual data cache is an approach to improve the miss ratio of a direct-mapped cache without affecting its access time. The proposed cache system can exploit temporal and spatial locality effectively by maximizing the effective cache memory space for any given cache size. It consists of two caches: a direct-mapped cache with a small block size and a fully associative spatial buffer with a large block size. Temporal locality is exploited by selectively caching candidate small blocks in the direct-mapped cache. Spatial locality is exploited aggressively by fetching multiple neighboring small blocks whenever a cache miss occurs. According to the results of comparison and analysis, similar performance can be achieved with a cache four times smaller than the conventional direct-mapped cache. It is also shown that the power consumption of the proposed cache can be reduced by around 4% compared with the victim cache configuration.

13.

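Design of a DMA controller for a PCIe-based multi-channel transmission system

李胜蓝 姜宏旭 符炜剑 陈姣 《计算机应用》2017,37(3):691-694

To avoid the impact on transmission bandwidth of PIO write latency and of excessive interaction between the host and the embedded processing system during PCIe transfers, a direct memory access (DMA) controller based on a command-buffering mechanism is designed to improve bandwidth utilization. A command buffer inside the FPGA lets the DMA controller cache data-transfer requests from the PC, and the FPGA accesses the PC's memory space dynamically according to its own needs, which makes transfers more flexible. In addition, a dynamic-splicing DMA scheduling method is proposed that merges access requests to adjacent memory regions, further reducing the number of host-hardware interactions and of interrupts generated. In the transfer-rate tests, DMA writes reach up to 1631 MB/s and DMA reads up to 1582 MB/s, and the peak bandwidth reaches 85.4% of the theoretical PCIe bus bandwidth; compared with the conventional PIO-based DMA transfer method, DMA read bandwidth improves by 58% and DMA write bandwidth by 36%. The experimental results show that the design effectively improves DMA transfer efficiency and clearly outperforms the PIO approach.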
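The idea of merging access requests to adjacent memory regions can be illustrated with a small software sketch. The request format `(address, length)`, the function name `coalesce`, and the choice to sort the buffered requests before merging are all assumptions made for illustration; the abstract does not describe how the controller orders or matches requests in hardware.

```python
def coalesce(requests):
    """Merge (address, length) requests whose regions are exactly adjacent.

    Hypothetical model of a command buffer: requests are sorted by address,
    and a request is appended to the previous one when it starts exactly
    where the previous one ends.
    """
    merged = []
    for addr, length in sorted(requests):
        if merged and merged[-1][0] + merged[-1][1] == addr:
            prev_addr, prev_len = merged[-1]
            merged[-1] = (prev_addr, prev_len + length)   # extend previous transfer
        else:
            merged.append((addr, length))                 # start a new transfer
    return merged


pending = [(0x1000, 256), (0x1100, 256), (0x3000, 128), (0x1200, 512)]
transfers = coalesce(pending)
```

Here four buffered requests collapse into two transfers, so the host and the hardware would interact twice instead of four times.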

14.

Filtering directory lookups in CMPs

A. Bosque V. Viñals P. Ibáñez J.M. Llabería 《Microprocessors and Microsystems》2011,35(8):695-707

Coherence protocols consume a significant fraction of power to determine which coherence action to perform. Specifically, on CMPs with a shared cache and a directory-based coherence protocol implemented as a duplicate of the local cache tags, we have observed that a large fraction of directory lookups cause a miss, because the block looked up is not allocated in any local cache. To reduce the number of directory lookups and therefore the power consumption, we propose to add a filter before the directory access. We introduce two filter implementations. In the first one, filtering information is explicitly kept in the shared cache for every block. In the second one, filtering information is decoupled from the shared cache organization, so the filter size does not depend on the shared cache size. We evaluate our filters in a CMP with 8 in-order processors with 4 threads each and a memory hierarchy with write-through local caches and a shared cache. We show that, for SPLASH2 benchmarks, the proposed filters reduce the number of directory lookups performed by 60% while power consumption is reduced by ~28%. For Specweb2005, the number of directory lookups performed is reduced by 68% (44%), while directory power consumption is reduced by 19% (9%) using the first (second) filter implementation.

15.

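Research on a cache-aided NOMA power allocation scheme in software-defined vehicular networks

顾金媛 章国安 张鸿来 《计算机应用研究》2022,39(8)

Because of the growing demand for rich multimedia services, vehicular networks must provide massive device connectivity while meeting requirements for high spectral efficiency and low latency. Software-defined networking (SDN), caching, and non-orthogonal multiple access (NOMA) are regarded as promising technologies for addressing these key challenges. For software-defined vehicular networks, a cache-aided NOMA power allocation scheme is proposed. First, because vehicles in a vehicular network are always moving at high speed, a new cluster-head selection algorithm is proposed; incoming road traffic is predicted with the help of SDN, enabling adaptive vehicle clustering. Second, a cache-aided NOMA scheme is introduced in which each vehicle caches and requests files using the NOMA principle during the file-caching phase. Third, for a scenario in which two cluster-head vehicles communicate under double Nakagami-m fading, an optimal power allocation strategy is proposed: the optimization problem is formulated as finding the best power profile for each vehicle so as to maximize the probability that each vehicle successfully decodes its target file. Finally, numerical simulation and theoretical analysis show that the proposed cache-aided NOMA power allocation scheme clearly outperforms conventional NOMA and cache-aided OMA.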

16.

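A low-power-oriented cache design method for multi-core processors

方娟 郭媚 杜文娟 雷鼎 《计算机应用》2013,33(9):2404-2409

For the shared level-2 cache (L2 cache) of multi-core processors, a low-power-oriented cache design scheme (LPD) is proposed. LPD reduces cache power while keeping system performance good through three algorithms: a low-power hybrid partitioning algorithm for the shared cache (LPHP), a reconfigurable cache algorithm (CRA), and a way-prediction algorithm based on cache partitioning (WPP-L2). LPHP and CRA dynamically turn off idle cache columns while the program runs, saving the power of accessing them. WPP-L2 uses way prediction to supply the predicted way before the cache access: on a correct prediction the access completes with the shortest latency and the least power; on a misprediction, the cache partitioning policy is used to reduce the extra power overhead caused by the misprediction. Validation with SPEC2000 benchmarks shows that, compared with a conventional shared L2 cache using least recently used (LRU) replacement, the three algorithms have a slight effect on program execution time but save 20.5%, 17%, and 64.6% of average L2 cache access power respectively, and even improve system throughput. The experiments show that the proposed method significantly reduces multi-core processor power while maintaining system performance.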
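To make the way-prediction idea concrete, here is a minimal sketch of a generic most-recently-used (MRU) way predictor for a single cache set. It is not WPP-L2 itself: the partitioning-aware handling of mispredictions described in the abstract is omitted, and the class `WayPredictedSet` and its probe counter are hypothetical names used only to show why a correct prediction costs one way read instead of all of them.

```python
class WayPredictedSet:
    """One set of a set-associative cache with an MRU way predictor (illustrative)."""

    def __init__(self, ways):
        self.tags = [None] * ways
        self.predicted = 0      # the MRU way doubles as the prediction
        self.probes = 0         # total number of ways read so far

    def lookup(self, tag):
        """Return the hit way, or None on a miss."""
        # First read only the predicted way.
        self.probes += 1
        if self.tags[self.predicted] == tag:
            return self.predicted
        # Misprediction: read all remaining ways.
        self.probes += len(self.tags) - 1
        for way, stored in enumerate(self.tags):
            if way != self.predicted and stored == tag:
                self.predicted = way        # remember the way that hit
                return way
        return None


cache_set = WayPredictedSet(4)
cache_set.tags = [10, 20, 30, 40]
first = cache_set.lookup(30)    # mispredicted: all 4 ways are read
second = cache_set.lookup(30)   # predicted correctly: 1 way is read
```

A repeated access to the same block reads one way on the second lookup, whereas a conventional four-way set would read four ways on every access.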

17.

Reducing cache and TLB power by exploiting memory region and privilege level semantics

《Journal of Systems Architecture》2013,59(6):279-295

The L1 cache in today’s high-performance processors accesses all ways of a selected set in parallel. This constitutes a major source of energy inefficiency: at most one of the N fetched blocks can be useful in an N-way set-associative cache. The other N-1 cache lines will all be tag mismatches and are subsequently discarded. We propose to eliminate unnecessary associative fetches by exploiting certain software semantics in cache design, thus reducing dynamic power consumption. Specifically, we use memory region information to eliminate unnecessary fetches in the data cache, and ring level information to optimize fetches in the instruction cache. We present a design that is performance-neutral, transparent to applications, and incurs a space overhead of a mere 0.41% of the L1 cache. We show significantly reduced cache lookups with benchmarks including SPEC CPU, SPECjbb, SPECjAppServer, PARSEC, and Apache. For example, for SPEC CPU 2006, the proposed mechanism helps to reduce cache block fetches from the data and instruction caches by an average of 29% and 53% respectively, resulting in power savings of 17% and 35% in the caches, compared to the aggressively clock-gated baselines.

18.

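Replacement policies for instruction caches

邢二保 周兴铭 《计算机学报》1993,16(6):424-430

The paper analyzes the replacement policy and organization of instruction caches by theoretical analysis and program simulation. Using a loop model of programs to study cache replacement policy and organization, it finds that random replacement outperforms LRU and FIFO and that, under certain conditions, direct-mapped and set-associative organizations outperform fully associative mapping. Analysis of instruction-trace simulation results shows that the loop model explains cache behavior well.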
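One well-known case in which a loop favors random replacement is a loop that is slightly larger than the cache. The sketch below simulates that case for a fully associative cache; the function `simulate`, the cache size of 4 blocks, and the 5-block loop are illustrative assumptions rather than the paper's experimental setup.

```python
import random
from collections import OrderedDict


def simulate(policy, capacity, trace, seed=0):
    """Count hits for a fully associative cache under 'lru', 'fifo' or 'random'."""
    rng = random.Random(seed)
    cache = OrderedDict()                   # keys ordered oldest first
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            if policy == "lru":
                cache.move_to_end(block)    # a hit refreshes recency under LRU only
            continue
        if len(cache) >= capacity:
            if policy == "random":
                del cache[rng.choice(list(cache))]
            else:
                cache.popitem(last=False)   # evict the oldest entry (LRU or FIFO order)
        cache[block] = True
    return hits


loop = list(range(5)) * 100                 # a 5-block loop executed 100 times
results = {p: simulate(p, 4, loop) for p in ("lru", "fifo", "random")}
```

With a 4-block cache, LRU and FIFO always evict exactly the block the loop needs next, so they never hit, while random replacement keeps some useful blocks by chance and obtains a nonzero number of hits.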

19.

A Lock-Based Cache Coherence Protocol for Scope Consistency  Cited by: 5 (self-citations: 2, citations by others: 5)

Hu Weiwu Shi Weisong Tang Zhimin Li Ming 《计算机科学技术学报》1998,13(2):97-109

Directory protocols are widely adopted to maintain cache coherence of distributed shared memory multiprocessors. Although scalable to a certain extent, directory protocols are complex enough to prevent them from being used in very large scale multiprocessors with tens of thousands of nodes. This paper proposes a lock-based cache coherence protocol for scope consistency. It does not rely on directory information to maintain cache coherence. Instead, cache coherence is maintained by requiring the releasing processor of a lock to store all write notices generated in the associated critical section to the lock, and the acquiring processor invalidates or updates its locally cached data copies according to the write notices of the lock. To evaluate the performance of the lock-based cache coherence protocol, a software DSM system named JIAJIA is built on a network of workstations. Besides the lock-based cache coherence protocol, JIAJIA is also characterized by its shared memory organization scheme, which combines the physical memories of multiple workstations to form a large shared space. Performance measurements with the SPLASH2 program suite and NAS benchmarks indicate that, compared to recent SVM systems such as CVM, higher speedup is achieved by JIAJIA. Besides, JIAJIA can solve large scale problems that cannot be solved by other SVM systems due to memory size limitation.
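The release and acquire steps described in the abstract can be sketched in a few lines. This is a single-threaded toy model, not JIAJIA's implementation: the classes `Lock` and `Processor`, the page-granularity dictionary used as shared memory, the choice of invalidation rather than update, and the write-back of dirty pages at release time are all assumptions made for illustration.

```python
class Lock:
    def __init__(self):
        self.write_notices = set()      # pages modified under this lock


class Processor:
    def __init__(self, memory):
        self.memory = memory            # shared backing copies: page -> value
        self.cache = {}                 # locally cached pages
        self.dirty = set()              # pages written in the current critical section

    def acquire(self, lock):
        # Invalidate local copies named by the lock's write notices.
        for page in lock.write_notices:
            self.cache.pop(page, None)

    def read(self, page):
        if page not in self.cache:      # miss: fetch the backing copy
            self.cache[page] = self.memory[page]
        return self.cache[page]

    def write(self, page, value):
        self.cache[page] = value
        self.dirty.add(page)

    def release(self, lock):
        # Write modifications back (assumed) and store the write notices in the lock.
        for page in self.dirty:
            self.memory[page] = self.cache[page]
        lock.write_notices |= self.dirty
        self.dirty = set()


memory = {"x": 0}
p1, p2 = Processor(memory), Processor(memory)
lock = Lock()
p2.read("x")                            # p2 now holds a cached copy of x
p1.acquire(lock); p1.write("x", 42); p1.release(lock)
stale = p2.read("x")                    # no acquire yet: p2 still sees its old copy
p2.acquire(lock)
fresh = p2.read("x")                    # the write notice invalidated the old copy
```

No directory is consulted anywhere: the only coherence state is the set of write notices attached to the lock, and a processor sees another's writes only after acquiring the same lock. In this toy model the notices accumulate indefinitely; a real system would need some way to discard ones that every processor has already seen.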