Similar Documents
20 similar documents found (search time: 187 ms)
1.
A cache is a small, fast memory array that sits between main memory and the CPU core and holds the contents of the main-memory blocks the processor has referenced most recently. To improve system performance, the CPU reads data from the cache whenever possible, lessening the memory-access bottleneck that slow main memory imposes on the CPU core.
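To make the mechanism concrete, here is a minimal sketch in C of a direct-mapped cache lookup, showing how an address splits into a tag and a set index; the line size and set count are illustrative assumptions, not values from the abstract.

    #include <stdbool.h>
    #include <stdint.h>

    #define LINE_BYTES 64    /* assumed cache-line size */
    #define NUM_SETS   256   /* assumed number of lines */

    struct cache_line {
        bool     valid;
        uint32_t tag;
        uint8_t  data[LINE_BYTES];
    };

    static struct cache_line cache[NUM_SETS];

    /* Returns true on a hit; on a miss the caller would fetch the whole
     * block from the slow main memory and refill the selected line. */
    bool cache_lookup(uint32_t addr)
    {
        uint32_t index = (addr / LINE_BYTES) % NUM_SETS;  /* which line  */
        uint32_t tag   = addr / (LINE_BYTES * NUM_SETS);  /* which block */
        return cache[index].valid && cache[index].tag == tag;
    }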

2.
The cache is a high-speed buffer memory located between the CPU and main memory; it is an indispensable component of modern computers, and its performance directly affects the machine's overall efficiency. Using software to simulate the hardware, and starting from the cache's basic working principle, this paper analyzes the cache hit ratio under different operating policies.
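In the spirit of the paper's software-simulates-hardware approach, here is a toy trace-driven simulator (a sketch, not the paper's program): it replays one hexadecimal address per input line against a set-associative cache with LRU replacement and prints the hit ratio. The geometry (4-way, 64 sets, 32-byte lines) is an assumption; varying it, or the replacement policy, is how such a study compares strategies.

    #include <stdio.h>
    #include <stdint.h>

    #define WAYS 4
    #define SETS 64
    #define LINE 32

    static uint32_t      tags[SETS][WAYS];
    static int           valid[SETS][WAYS];
    static unsigned long stamp[SETS][WAYS];  /* last-used time per way */
    static unsigned long now;

    static int access_cache(uint32_t addr)
    {
        uint32_t set = (addr / LINE) % SETS;
        uint32_t tag = addr / (LINE * SETS);
        int victim = 0;

        now++;
        for (int w = 0; w < WAYS; w++) {
            if (valid[set][w] && tags[set][w] == tag) {
                stamp[set][w] = now;           /* hit: mark most recent */
                return 1;
            }
            if (stamp[set][w] < stamp[set][victim])
                victim = w;                    /* track the LRU way */
        }
        valid[set][victim] = 1;                /* miss: refill LRU way */
        tags[set][victim]  = tag;
        stamp[set][victim] = now;
        return 0;
    }

    int main(void)
    {
        unsigned int addr;
        long hits = 0, total = 0;
        while (scanf("%x", &addr) == 1) {      /* one hex address per line */
            hits += access_cache((uint32_t)addr);
            total++;
        }
        if (total > 0)
            printf("hit ratio: %.2f%%\n", 100.0 * hits / total);
        return 0;
    }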

3.
I. Introduction. As the range of computer applications keeps widening, programs demand ever faster processing, and CPU clock frequencies have grown from a few tens of megahertz to over two hundred megahertz. Because main memory (RAM) remains comparatively slow, however, the CPU cannot reach its intended running speed, which lowers the performance of the whole machine; the advent of cache technology has substantially raised overall speed and performance. II. Overview of cache memory. The cache sits between the CPU and main memory, is built from the same type of semiconductor integrated circuits as the CPU, transfers data markedly faster than RAM, and can be accessed directly by the CPU. Not only does the cache work with the CPU…

4.
Effective Use of On-Chip Memory in Embedded Devices   (Cited by: 1; self-citations: 0, other citations: 1)
Introduction: As CPU speeds rise rapidly, the speed gap between the CPU and off-chip memory keeps widening. The usual way of matching the CPU to external memory is a cache or on-chip memory; the on-chip storage of a microprocessor typically consists of an instruction cache, a data cache, and/or on-chip (scratchpad) memory.

5.
To raise the operating efficiency of a cryptographic embedded processor, this paper presents a Harvard-architecture cache design comprising an instruction cache (iCache) and a data cache (dCache). The tag memory and the instruction/data memories are built from dual-port RAM at low hardware cost, and the control flows of the iCache and dCache are described. In the implementation, the iCache is configured at 4 KB and the dCache at 8 KB, and both are integrated into the cryptographic embedded processor. FPGA verification shows that the design meets the processor's application requirements, and performance analysis shows that using the cache is at least 5.26 times faster than having the processor access main memory directly.

6.
IT Terms at a Click
This issue continues our introduction of CPU-related terms. Cache: because CPUs keep getting faster, wait states often have to be inserted when they work with the slower dynamic memory (DRAM), making it hard for the CPU to show its full performance. To solve this problem, a fast static memory (SRAM) is placed in the transfer path to hold the data and instructions the CPU uses most often, which greatly raises the data transfer rate; this is the cache, which is further divided into level-1 and level-2 caches. Level-1 cache (L1 Cache): also called the internal or on-die cache, it is integrated inside the CPU and temporarily holds data while the CPU processes it. The L1 cache generally runs at the same speed as the CPU core and is the fastest of all the caches. The L1 cache's…

7.
In large-scale parallel, distributed shared-memory multiprocessor systems, minimizing remote-access latency is the key to raising overall system performance. This paper proposes a router-cache structure and describes in detail a coherence protocol based on the router cache; the protocol is effective at reducing the system's remote-access latency and increasing its effective bandwidth.

8.
杨朝辉, 王立松. 《计算机科学》, 2011, 38(10): 161-165
As the gap between main-memory speed and modern processor speed widens, access to main memory has become the new bottleneck, and cache behavior matters all the more to main-memory database systems. Indexing is a key part of main-memory database design. Building on the CST-tree, this paper applies prefetching to speed up lookups and proposes a cache-optimized index structure, the prefetching T-tree (pT-tree). The pT-tree uses prefetching to effectively create [nodes] larger than a normal data trans…
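The core trick behind such cache-conscious indexes is to overlap memory latency with computation: issue prefetches for all the cache lines of a (deliberately wide) node as soon as the search reaches it, before the key comparisons begin. A minimal sketch in C using the GCC/Clang __builtin_prefetch hint; the node layout and fanout are illustrative assumptions, not the pT-tree's actual structure.

    #include <stddef.h>

    #define FANOUT 16   /* assumed: a node spans several 64-byte lines */

    struct node {
        int          nkeys;
        int          keys[FANOUT];
        struct node *child[FANOUT + 1];
    };

    const struct node *search(const struct node *n, int key)
    {
        while (n != NULL) {
            /* Prefetch every cache line of this wide node up front, so the
             * memory fetches overlap the comparisons below. */
            for (size_t off = 0; off < sizeof *n; off += 64)
                __builtin_prefetch((const char *)n + off);

            int i = 0;
            while (i < n->nkeys && key > n->keys[i])
                i++;
            if (i < n->nkeys && key == n->keys[i])
                return n;                 /* key found in this node */
            n = n->child[i];              /* descend */
        }
        return NULL;                      /* key not present */
    }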

9.
Design and Implementation of a Shared Cache for Multiprocessors   (Cited by: 1; self-citations: 0, other citations: 1)
As a small, fast memory between the central processing unit (CPU) and main memory, the cache balances and matches the data-processing speeds of the two, helping to improve overall system performance. Multiprocessor (SMP) systems cache both shared and private data, and a cache coherence protocol maintains consistency of the data shared among the processors. This paper describes a shared-cache design for a 64-bit multi-core processor, including how multiprocessor cache coherence is implemented and the fully custom back-end implementation.
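The abstract does not name the protocol used, so purely for illustration, here is a sketch in C of the per-line state transitions of MESI, the textbook invalidation-based coherence protocol, in response to local accesses and snooped bus events.

    /* Illustrative MESI transitions for one cache line; write-backs and
     * bus signalling are omitted. Not the paper's actual protocol. */
    enum mesi  { MODIFIED, EXCLUSIVE, SHARED, INVALID };
    enum event { LOCAL_READ, LOCAL_WRITE, BUS_READ, BUS_WRITE };

    /* `other_sharers` tells a reader whether another cache holds the line. */
    enum mesi next_state(enum mesi s, enum event e, int other_sharers)
    {
        switch (e) {
        case LOCAL_READ:                  /* fill as E or S on a miss */
            return (s == INVALID) ? (other_sharers ? SHARED : EXCLUSIVE) : s;
        case LOCAL_WRITE:                 /* gain ownership; peers invalidate */
            return MODIFIED;
        case BUS_READ:                    /* a peer reads: downgrade M/E to S */
            return (s == MODIFIED || s == EXCLUSIVE) ? SHARED : s;
        case BUS_WRITE:                   /* a peer writes: our copy dies */
            return INVALID;
        }
        return s;
    }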

10.
Implementing Process-to-CPU Binding in SMP Linux   (Cited by: 1; self-citations: 0, other citations: 1)
In current SMP systems, a process may run on any CPU. Based on an analysis of the Linux scheduling mechanism, this paper presents a method for binding a process to a particular CPU, and a CPU to a particular process, so as to raise the hit ratio of each CPU's internal cache and thereby improve the system's operating efficiency.
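The paper achieves the binding by modifying the scheduler itself; for comparison, later Linux kernels expose the same effect from user space through the sched_setaffinity(2) system call. A minimal sketch that pins the calling process to CPU 0 so its working set stays in that CPU's cache:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t mask;
        CPU_ZERO(&mask);
        CPU_SET(0, &mask);                  /* allow CPU 0 only */

        /* pid 0 means "the calling process" */
        if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
            perror("sched_setaffinity");
            return 1;
        }
        printf("now running on CPU %d\n", sched_getcpu());
        /* ... cache-sensitive work now stays on one CPU's cache ... */
        return 0;
    }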

11.
This paper presents a low-power instruction cache: a line buffer is inserted between the CPU and the level-1 instruction cache to reduce the number of CPU accesses to the instruction cache and so lower its power consumption. In addition, a refill-control unit is added to the line-buffer controller so that, on an instruction-cache miss, the instruction from off-chip memory can be delivered directly to the CPU, minimizing the fetch delay the miss would otherwise cause. Verification shows that the design not only reduces power consumption but also improves the instruction cache's performance.
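A behavioral sketch of the line-buffer idea in C: a fetch that falls in the same line as the previous fetch is served from the small buffer, so the larger, power-hungry instruction cache is not accessed at all. The line size and the icache_fetch_line helper are assumptions, not the paper's design.

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define LINE_BYTES 32   /* assumed line size */

    static uint8_t  line_buf[LINE_BYTES];
    static uint32_t line_base;
    static bool     line_valid;

    /* Assumed helper: one (power-hungry) instruction-cache access that
     * delivers a whole line. */
    extern void icache_fetch_line(uint32_t base, uint8_t *dst);

    uint32_t fetch(uint32_t pc)
    {
        uint32_t base = pc & ~(uint32_t)(LINE_BYTES - 1);

        if (!line_valid || base != line_base) {   /* line-buffer miss */
            icache_fetch_line(base, line_buf);
            line_base  = base;
            line_valid = true;
        }
        uint32_t insn;                            /* line-buffer hit path */
        memcpy(&insn, &line_buf[pc - base], sizeof insn);
        return insn;
    }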

12.
Lee Minsuk, Min Sang Lyul, Shin Heonshik, Kim Chong Sang, Park Chang Yun. 《Real-Time Systems》, 1997, 13(1): 47-65
Cache memories have been used extensively to bridge the speed gap between high-speed processors and relatively slow main memory. However, they are not widely used in real-time systems because of their unpredictable performance. This paper proposes an instruction prefetching scheme called threaded prefetching as an alternative to instruction caching in real-time systems. In the proposed scheme, an instruction block pointer called a thread is assigned to each instruction memory block and made to point to the next block on the worst case execution path, as determined by compile-time analysis. The thread is never updated during program execution, which guarantees predictability. This paper also compares the worst case performance of various previous instruction prefetching schemes with that of the proposed threaded prefetching. By analyzing several benchmark programs, we show that the worst case performance of the proposed scheme is significantly better than that of previous instruction prefetching schemes. The results also show that, when the block size is large enough, the worst case performance of the proposed threaded prefetching scheme is almost as good as that of an instruction cache with a 100% hit ratio.
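A sketch of the scheme's data structure and driver loop in C; the names are illustrative, but the key property matches the abstract: each block's thread pointer is fixed by compile-time worst-case-path analysis and never updated at run time, so prefetch behavior stays predictable.

    #include <stddef.h>

    struct iblock {
        /* ... the instructions of this memory block ... */
        struct iblock *thread;   /* next block on the worst-case path;
                                    assigned at compile time, never changed */
    };

    /* Assumed helpers: start an asynchronous fetch of a block, and execute
     * a block, returning the block control actually transfers to. */
    extern void           prefetch_block(const struct iblock *b);
    extern struct iblock *execute_block(struct iblock *b);

    void run(struct iblock *b)
    {
        while (b != NULL) {
            if (b->thread != NULL)
                prefetch_block(b->thread);  /* guess the worst-case successor */
            b = execute_block(b);           /* actual successor may differ */
        }
    }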

13.
Dependences among loads and stores whose addresses are unknown hinder the extraction of instruction-level parallelism during the execution of a sequential program. Such ambiguous memory dependences can be overcome by memory dependence speculation, which enables a load or store to be executed speculatively before the addresses of all preceding loads and stores are known. Furthermore, multiple speculative stores to a memory location create multiple speculative versions of that location, and program order among the versions must be tracked to maintain sequential semantics. A previously proposed approach, the Address Resolution Buffer (ARB), uses a centralized buffer to support speculative versions. Our proposal, the Speculative Versioning Cache (SVC), uses distributed caches to eliminate the latency and bandwidth problems of the ARB. The SVC conceptually unifies cache coherence and speculative versioning by using an organization similar to snooping bus-based coherent caches. Our evaluation for the Multiscalar architecture shows that hit latency is an important factor affecting performance, and that private-cache solutions trade off hit rate for hit latency.

14.
A New Architectural Concept: Virtual Registers and Parallel Instruction-Processing Units   (Cited by: 4; self-citations: 1, other citations: 3)
As programs demanded ever larger address spaces, researchers proposed virtual memory, which frees the address space a program accesses from the limits of physical memory. With the development of register-oriented RISC techniques and the growing importance of instruction scheduling in multiple-issue architectures, we propose the new concept of virtual registers, which frees the register space from the size of the physical register file, aids instruction scheduling and register renaming, and raises instruction-level parallelism (ILP). In addition, modern RISC processors have concentrated on increasing the execution parallelism of the data-processing units while neglecting the processing of the instructions held in memory.

15.
尹旭峰, 苑士华, 胡纪滨. 《计算机工程》, 2011, 37(12): 262-264, 267
This paper introduces the memory management unit (MMU) and caches of the ARM microprocessor S3C2440A and designs an experiment to measure the average execution speed of program instructions in SDRAM and SRAM at different CPU clock frequencies with the caches disabled or enabled, then analyzes and processes the data. The results show that enabling the caches has a considerable effect on the average instruction execution speed.
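A host-side sketch in C of the kind of measurement the experiment performs: time a fixed piece of work and derive the average cost per iteration. On the bare-metal S3C2440A board the timestamps would come from a hardware timer rather than clock_gettime; that call is a portability assumption here.

    #include <stdio.h>
    #include <time.h>

    #define ITERS 10000000L

    int main(void)
    {
        struct timespec t0, t1;
        volatile long sink = 0;   /* keeps the loop from being optimized away */

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (long i = 0; i < ITERS; i++)
            sink += i;            /* the fixed instruction sequence under test */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double ns = (t1.tv_sec - t0.tv_sec) * 1e9
                  + (t1.tv_nsec - t0.tv_nsec);
        printf("average %.2f ns per iteration (sink=%ld)\n",
               ns / ITERS, (long)sink);
        return 0;
    }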

16.
胡练达, 张激. 《计算机工程》, 2014, (1): 287-290, 294
The existing GTK+ on DirectFB graphics stack makes too little use of hardware acceleration, so graphics systems developed on domestic embedded platforms perform poorly. This paper therefore proposes performance optimizations for the graphics system. The memory-allocation policy for graphics widgets is optimized to reduce the gap between the CPU's access speeds to video memory and to main memory, and a low-level extension method for drawing commands raises the execution efficiency of extended primitives such as ellipse fill and polygon fill. Test data show that, with hardware acceleration enabled, the optimized allocation policy makes CPU drawing commands several times to several tens of times faster, and the low-level extension method achieves roughly 5 times the hardware speedup of the traditional application-layer extension approach.

17.
The power consumed by memory systems accounts for 45% of the total power consumed by an embedded system, and a memory access consumes 10 times more power than a cache access. Thus, increasing the cache hit rate can effectively reduce the power consumption of the memory system and improve system performance. In this study, we increased the cache hit rate and reduced the cache-access power consumption by developing a new cache architecture known as the single linked cache (SLC), which stores frequently executed instructions. By adding a new link field, the SLC combines the low power consumption and low access delay of a direct-mapped cache with a cache hit rate close to that of a two-way set-associative cache. In addition, we developed a second design, multiple linked caches (MLC), to further reduce the power consumed by each cache access and to avoid unnecessary cache accesses when the requested data is absent from the cache. In MLC, the linked cache is split into several small linked caches that store frequently executed instructions, reducing the power consumed per access. To avoid unnecessary cache accesses when a requested instruction is not in the linked caches, the addresses of frequently executed blocks are recorded in the branch target buffer (BTB); by consulting the BTB, the processor can fetch the requested instruction directly from memory if it is not in the cache. In our simulations, the method outperformed selective compression, a traditional cache, and a filter cache in terms of cache hit rate, power consumption, and execution time.
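Purely as an illustration of the link-field idea (the real SLC's organization differs), here is a toy lookup in C that probes the direct-mapped slot first and, on a tag mismatch, follows that slot's link to one alternate slot, approximating two-way associativity without a second parallel tag compare.

    #include <stdbool.h>
    #include <stdint.h>

    #define SLOTS 256   /* assumed cache size */

    struct slot {
        bool     valid;
        uint32_t tag;
        int      link;   /* index of an alternate slot, or -1 if none */
    };

    static struct slot slc[SLOTS];

    bool slc_lookup(uint32_t index, uint32_t tag)
    {
        const struct slot *s = &slc[index % SLOTS];
        if (s->valid && s->tag == tag)
            return true;                            /* primary hit */
        if (s->link >= 0) {
            const struct slot *alt = &slc[s->link];
            return alt->valid && alt->tag == tag;   /* hit via link field */
        }
        return false;                               /* miss */
    }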

18.
In DSP chips with a parallel VLIW architecture, the mismatch between CPU processing speed and off-chip data-access speed stalls the CPU and limits the performance of the DSP system. To address this problem, this paper proposes, based on the cache's structure, a new data-layout algorithm suited to video processing on the DSP CPU, and applies it successfully to the optimization of an MPEG-4 encoder on the Trimedia PNX1301. Encoding results show that the method effectively reduces cache misses and the time spent on off-chip data accesses; under identical conditions, it raises encoding performance by about 2 frames per second (CIF format).
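The abstract does not give the algorithm itself; a generic sketch of this kind of rearrangement in C regroups a row-major CIF frame into macroblock-sized tiles so that each macroblock's pixels are contiguous in memory, and hence in cache lines. The tile layout is an assumption, not the paper's algorithm.

    #include <stdint.h>

    #define W    352   /* CIF width  */
    #define H    288   /* CIF height */
    #define TILE 16    /* macroblock edge */

    /* dst must hold W * H bytes; pixels are regrouped tile by tile so that
     * one macroblock occupies one contiguous 256-byte run. */
    void tile_frame(uint8_t src[H][W], uint8_t *dst)
    {
        for (int ty = 0; ty < H; ty += TILE)
            for (int tx = 0; tx < W; tx += TILE)
                for (int y = 0; y < TILE; y++)
                    for (int x = 0; x < TILE; x++)
                        *dst++ = src[ty + y][tx + x];
    }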

19.
The cache is one of the effective measures by which high-performance microprocessors bridge the speed gap between the CPU and memory. In a shared-memory multiprocessor environment, shared data is spread across the on-chip caches of several processors, and maintaining coherence among the caches becomes critical. This paper discusses the design and implementation of the cache of the 32-bit embedded microprocessor 龙腾R2 (Longteng R2), the way its cache coherence supports multiprocessor environments, and the implementation results.

20.
GPUs are widely used in modern high-performance computing systems. To reduce the burden on GPU programmers, the operating system and GPU hardware provide strong support for shared virtual memory, which enables the GPU and CPU to share the same virtual address space. Unfortunately, the current SIMT execution model of GPUs poses great challenges for virtual-to-physical address translation on the GPU side, mainly because of the huge number of virtual addresses generated simultaneously and their poor locality; the resulting excess of TLB accesses raises the TLB miss ratio. As an attractive solution, the Page Walk Cache (PWC) has received wide attention for its ability to reduce the memory accesses caused by TLB misses. However, the current PWC mechanism suffers from heavy redundancy, which significantly limits its efficiency. In this paper, we first investigate the causes of this issue by evaluating the performance of PWC on typical GPU benchmarks. We find that the repeated L4 and L3 indices of virtual addresses create redundancy in the PWC, and that the low locality of L2 indices causes its low hit ratio. Based on these observations, we propose a new PWC structure, the Compressed Page Walk Cache (CPWC), to remove the redundancy in the current PWC. CPWC can be organized in either direct-mapped or set-associative mode. Experimental results show that CPWC stores 3 times as many page table entries as TPC, improves the L2-index hit ratio by 38.3% over PWC, and reduces page-table memory accesses by 26.9%; the average number of memory accesses per TLB miss drops to 1.13. Overall, the average IPC improves by 25.3%.
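A toy model in C of the baseline mechanism: a four-level page walk that probes a small page walk cache for the upper levels before touching memory, so a PWC hit skips that level's memory access. This models a generic PWC, not the proposed CPWC, and every name here is illustrative.

    #include <stdbool.h>
    #include <stdint.h>

    #define PWC_ENTRIES 16

    struct pwc_entry {
        bool     valid;
        int      level;        /* 4, 3 or 2 (the upper levels) */
        uint64_t vpn_prefix;   /* virtual-address bits down to this level */
        uint64_t next_table;   /* physical address of the next-level table */
    };

    static struct pwc_entry pwc[PWC_ENTRIES];

    /* Assumed helper: one real memory access into a page table. */
    extern uint64_t mem_read_pte(uint64_t table_pa, unsigned idx);

    static bool pwc_probe(int level, uint64_t prefix, uint64_t *next)
    {
        for (int i = 0; i < PWC_ENTRIES; i++)
            if (pwc[i].valid && pwc[i].level == level &&
                pwc[i].vpn_prefix == prefix) {
                *next = pwc[i].next_table;   /* hit: skip one memory access */
                return true;
            }
        return false;
    }

    uint64_t walk(uint64_t va, uint64_t root_table)
    {
        uint64_t table = root_table;
        for (int level = 4; level >= 1; level--) {
            unsigned shift  = 12 + 9 * (unsigned)(level - 1);
            unsigned idx    = (va >> shift) & 0x1FF;
            uint64_t prefix = va >> shift;
            uint64_t next;

            if (level > 1 && pwc_probe(level, prefix, &next)) {
                table = next;                    /* PWC hit */
                continue;
            }
            table = mem_read_pte(table, idx);    /* PWC miss: go to memory */
            /* (a real walker would also insert this result into the PWC) */
        }
        return table;   /* the leaf PTE */
    }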
