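In a distributed shared buffer built on message passing in a distributed computing environment, cache efficiency is the bottleneck of algorithm performance. This paper implements a parallel volume rendering algorithm on a distributed shared buffer. In data space, fast octree classification improves the spatial locality of the cache; in image space, a Hilbert pixel traversal order improves its temporal locality. Experiments on the Dawning 1000 (曙光1000) and on a network of SGI workstations both show that the algorithm greatly reduces network data traffic, markedly improves cache efficiency, and greatly shortens rendering time.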
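The Hilbert pixel traversal mentioned in this abstract can be illustrated with the standard index-to-coordinate conversion for a Hilbert curve. This is a minimal sketch, not the paper's implementation: the function names `hilbert_d2xy` and `hilbert_pixel_order` are hypothetical, and the image is assumed to be square with a power-of-two side length.

```python
def hilbert_d2xy(n, d):
    """Map a distance d along the Hilbert curve to (x, y) on an n x n grid.

    Assumes n is a power of two and 0 <= d < n * n.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the quadrant so sub-curves join end to end.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_pixel_order(n):
    """Return all pixels of an n x n image in Hilbert-curve order."""
    return [hilbert_d2xy(n, d) for d in range(n * n)]


# Consecutive pixels in this order are always direct neighbours, so rays cast
# through them tend to touch the same volume data shortly after one another.
order = hilbert_pixel_order(4)
```

Compared with row-by-row scanning, this order never jumps from the end of one row to the start of the next, which is the property that keeps recently fetched data useful.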

7.

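Eliminating False-Sharing Cache Line Thrashing under Simple Access Patterns

Jin Guohua, Chen Fujie 《Chinese Journal of Computers》1994,17(6):435-445

In parallel processing systems with multi-level memory that use local caches and a write-invalidate cache coherence protocol, cache line thrashing caused by true and false sharing occurs frequently. Because this movement of data back and forth between the caches of different processors seriously degrades parallel machine performance, it has drawn wide attention in the computing community, and finding a simple and effective solution has become a key issue in research on such systems. To eliminate the thrashing caused by true sharing, we have already proposed a set of ...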

8.

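Research and Evaluation of Cache Coherence Protocols  Total citations: 3, self-citations: 0, citations by others: 3

Sun Yudong, Sun Qiangnan 《Computer Engineering and Applications》1995,31(5):53-56

Cache coherence is an important topic in the design of tightly coupled multiprocessor systems. To improve memory access efficiency, each processor usually has its own high-speed cache. This gives rise to the cache coherence problem: shared data must be kept consistent among the caches and between the caches and main memory. Many cache coherence protocols have been proposed for this purpose. This paper analyzes several types of coherence protocols and evaluates them through software simulation and performance measurement.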

9.

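A Brief Analysis of Caches for Redundant Arrays of Inexpensive Disks (RAID)  Total citations: 3, self-citations: 0, citations by others: 3

Liu Guilan, Zhu Tianlong, Zhou Xueren, Fen Huizhen 《新电脑》1995,(6)

Redundant array of inexpensive disks (RAID) technology has set off a wave of research and development. Research on disk cache technology was already widespread in the 1970s, but dedicated studies of disk array caches remain rare in the literature in China and abroad. This paper argues the need for introducing a cache into disk arrays, surveys the state of disk array cache technology in China and abroad, raises several key problems in disk array cache research, and sets out the authors' views.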

10.

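The VHDL Language and Its Applications  Total citations: 1, self-citations: 0, citations by others: 1

Zhou Caibao, Liu Yingxue 《Computer Engineering》1998,24(10):51-53,79

Introduces VHDL, a standard input/output tool currently used in the hardware design of digital electronic systems, and, drawing on the design of the Dcache of a mini-supercomputer, implements the control of the MSC-Dcache in VHDL.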

11.

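Design of a Hardware Emulator for a High-Performance RISC Microprocessor  Total citations: 2, self-citations: 0, citations by others: 2

Liu Zhenyu, Qi Jiayue 《Journal of Computer Research and Development》2004,41(8):1436-1441

In microprocessor design, system-level hardware/software co-simulation requires that the design be verified at system level on a hardware emulator before back-end design begins. To this end, an FPGA-based hardware emulator was designed for a 32-bit pipelined RISC microprocessor. The main features of the design are: a memory management unit (MMU) that provides virtual address management; on-chip caches, comprising an instruction cache (I-Cache) and a data cache (D-Cache); a standard SYSAD interface; an on-chip multiply/divide unit (MDU); and precise exception handling. The design is implemented on a XILINX xc2v2000 and runs at 30 MHz.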

12.

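A High-Speed Instruction Set Simulator Using a Dynamic Decode Cache

Sang Shengtian, Wang Jinxiang, Zhao Xinshu 《Computer Engineering》2006,32(18):248-250

An instruction set simulator is an important tool for computer architecture research and SoC hardware/software co-design, and its performance and flexibility strongly affect design and verification efficiency. Interpretive instruction set simulators are highly flexible and are irreplaceable when simulating software that involves self-modifying code, such as operating systems. This paper presents the design of a high-performance interpretive instruction set simulator that offers high simulation accuracy and good flexibility. It also applies optimizations such as a dynamic decode cache to achieve high simulation speed. An ARM7 instruction set simulator serves as the example, and the proposed optimizations apply equally to other modern RISC architectures.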

13.

Performance of One's Complement Caches

Qing Yang Sridar Adina T. Sun 《Journal of Parallel and Distributed Computing》1998,48(2):143

On-chip caches to reduce average memory access latency are commonplace in today's commercial microprocessors. These on-chip caches generally have low associativity and small cache sizes. Cache line conflicts are the main source of cache misses, which are critical for overall system performance. This paper introduces an innovative design for on-chip data caches of microprocessors, called one's complement cache. While binary complement numbers have been successfully used in designing arithmetic units, to the best of our knowledge, no one has ever considered using such complement numbers in cache memory designs. This paper will show that such complement numbers help greatly in reducing cache misses in a data cache, thereby improving data cache performance. By parallel computation of cache addresses and memory addresses, the new design does not increase the critical hit time of cache accesses. Cache misses caused by line interference are reduced by evenly distributing data items referenced by program loops across all sets in a cache. Even distribution of data in the cache is achieved by making the number of sets in the cache a prime or an odd number, so that the chance of related data being mapped to a same set is small. Trace-driven simulations are used to evaluate the performance of the new design. Performance results on benchmarks show that the new design improves cache performance significantly with negligible additional hardware cost.
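The effect of an odd or prime set count described in this abstract can be illustrated with a small experiment. The sketch below is not taken from the paper: it assumes a set count of the form 2^k - 1 (127, which is prime, for k = 7) and computes the index by adding k-bit address fields with end-around carry, which is how one's complement addition yields a remainder modulo 2^k - 1. The function names are hypothetical.

```python
def index_pow2(line_addr, k):
    """Conventional indexing: the low k bits select one of 2**k sets."""
    return line_addr & ((1 << k) - 1)


def index_ones_complement(line_addr, k):
    """Index into 2**k - 1 sets by summing k-bit fields with end-around carry.

    Because 2**k is congruent to 1 modulo 2**k - 1, the folded sum equals
    line_addr % (2**k - 1).
    """
    mask = (1 << k) - 1
    acc = 0
    while line_addr:
        acc += line_addr & mask
        line_addr >>= k
        # End-around carry: fold the overflow bit back into the low bits.
        acc = (acc & mask) + (acc >> k)
    # An all-ones result is the second representation of zero.
    return 0 if acc == mask else acc


# A loop that touches every 128th cache line, e.g. walking down a column
# of a matrix whose rows span 128 lines.
K = 7
strided_lines = [i * 128 for i in range(64)]

sets_used_pow2 = {index_pow2(a, K) for a in strided_lines}
sets_used_ones = {index_ones_complement(a, K) for a in strided_lines}
```

With 128 sets every strided access lands in the same set, so `sets_used_pow2` has one element, while with 127 sets the same accesses spread over 64 different sets.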

14.

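CMFSim: A Highly Configurable and Extensible Cache Microarchitecture Functional Simulator

Song Shuangyang, Zhao Shan, Yang Qiusong 《Computer Systems & Applications》2017,26(10):36-43

As an effective way to improve the efficiency with which the CPU reads and stores data and to bridge the speed gap with main memory, the CPU cache exploits the locality of data use by temporarily holding the most recently or most frequently used data, and it plays a decisive role in CPU performance. The cache microarchitecture is in turn the key factor determining cache performance. However, the caches of modern advanced CPUs have extremely complex structures, with design choices along several dimensions such as multiple policies, multiple hardware algorithms, and multiple levels. Designing and validating them directly in hardware is both time-consuming and costly, so a cache microarchitecture simulator models and simulates the hardware microarchitecture in software. Designing a well-structured cache and evaluating different microarchitectures is work of far-reaching significance. Starting from the hardware structure, this paper designs and implements CMFSim (Cache microarchitecture functional simulator), a multi-level, highly configurable, and highly extensible cache microarchitecture functional simulator. It implements common cache policies and hardware algorithms and can simulate cache functionality under a given configuration, so that the relationship between configuration parameters and cache performance can be analyzed.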
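To make the idea of a configurable cache functional simulator concrete, the following is a minimal single-level sketch with LRU replacement. It is not CMFSim's code or interface: the class name `SetAssociativeCache` and its parameters are assumptions chosen for illustration, and only hit/miss behaviour is modelled.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Functional model of one cache level with LRU replacement."""

    def __init__(self, num_sets, ways, line_size):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # One ordered dict per set; insertion order tracks recency of use.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Simulate one access to byte address addr; return True on a hit."""
        line = addr // self.line_size
        index = line % self.num_sets
        tag = line // self.num_sets
        cache_set = self.sets[index]
        if tag in cache_set:
            cache_set.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False)  # evict the least recently used line
        cache_set[tag] = True
        return False


def miss_rate(trace, num_sets, ways, line_size=64):
    """Replay an address trace under one configuration and report the miss rate."""
    cache = SetAssociativeCache(num_sets, ways, line_size)
    for addr in trace:
        cache.access(addr)
    return cache.misses / len(trace)
```

Replaying the same trace through `miss_rate` with different `num_sets` and `ways` values is the kind of configuration-versus-performance comparison the abstract describes.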

15.

The design of the 88000 RISC family

Melear C. 《Micro, IEEE》1989,9(2):26-38

The design and implementation of the RISC (reduced-instruction-set computer) 88000 system in high-speed, complementary metal-oxide semiconductor (HCMOS) technology is described. The total system consists of the 88100 processor and two 88200 cache memory management units (CMMUs). The various features and components of the 88000 are discussed.

16.

Path and cache conscious prefetching (PCCP)

Zhen He Alonso Marquez 《The VLDB Journal The International Journal on Very Large Data Bases》2007,16(2):235-249

Main memory cache performance continues to play an important role in determining the overall performance of object-oriented, object-relational and XML databases. An effective method of improving main memory cache performance is to prefetch or pre-load pages in advance of their use, in anticipation of main memory cache misses. In this paper we describe a framework for creating prefetching algorithms with the novel features of path and cache consciousness. Path consciousness refers to the use of short sequences of object references at key points in the reference trace to identify paths of navigation. Cache consciousness refers to the use of historical page access knowledge to guess which pages are likely to be main memory cache resident most of the time, and then treating these pages as if they do not exist for the purposes of prefetching. We have conducted a number of experiments comparing our approach against four highly competitive prefetching algorithms. The results show that our approach outperforms existing prefetching techniques in some situations while performing worse in others. We provide guidelines as to when our algorithm should be used and when others may be more desirable.

17.

Improving Data Prefetching Efficacy in Multimedia Applications

Cucchiara Rita Prati Andrea Piccardi Massimo 《Multimedia Tools and Applications》2003,20(2):159-178

The workload of multimedia applications has a strong impact on cache memory performance, since the locality of memory references embedded in multimedia programs differs from that of traditional programs. In many cases, standard cache memory organization achieves poorer performance when used for multimedia. A widely-explored approach to improve cache performance is hardware prefetching, which allows the pre-loading of data in the cache before they are referenced. However, existing hardware prefetching approaches are unable to exploit the potential improvement in performance, since they are not tailored to multimedia locality. In this paper we propose novel effective approaches to hardware prefetching to be used in image processing programs for multimedia. Experimental results are reported for a suite of multimedia image processing programs including MPEG-2 decoding and encoding, convolution, thresholding, and edge chain coding.

18.

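Design of a 32-bit Embedded RISC Microprocessor  Total citations: 8, self-citations: 0, citations by others: 8

Zhang Shengbing, Fan Xiaoya, Gao Deyuan 《Journal of Computer Research and Development》2000,37(6):758-763

The NRS4000 is a 32-bit embedded RISC microprocessor designed by the Aviation Microelectronics Center of Northwestern Polytechnical University. It is fully compatible with Intel's 80960KA at the instruction set level, is independently owned intellectual property, and comprises about 300,000 equivalent gates. Its microarchitecture is built around a RISC core, and the paper proposes a design scheme based on core RISC micro-operations that is simple, general, and flexible and that makes it possible for the processor to exploit finer-grained parallelism. Combined with advanced techniques such as multiple execution units, pipelined execution, and out-of-order execution, this allows the NRS4000 both to achieve, with the 80 ...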

19.

An Application-Driven Study of Multicast Communication for Write Invalidation

Hsiao Hung-Chang King Chung-Ta 《The Journal of supercomputing》2001,18(3):279-304

In distributed shared-memory (DSM) multiprocessors, a write operation requires multiple messages to invalidate the nodes which share and cache the memory block being written. The consequent write stall time impedes the performance of such systems. An effective means of achieving efficient invalidation is to employ multicast messages to reach the sharing nodes. This study evaluates two multicast-based invalidation schemes, dual-path and pruning, by performing application-driven simulation. Under the experimental settings used herein, multicasts improve invalidation traffic for four of the six evaluated real applications. The remaining two applications are computationally intensive, and multicast-based invalidation is less effective. However, since multicasts encourage bursty communication, our results indicate that they help relieve network congestion during these periods. Dual-path performs slightly better than pruning, because it is less sensitive to routing delay in the routers. Our results further demonstrate that cache size is an important design parameter for multicast-based invalidation, and is highly effective for DSM multiprocessors with larger caches.