Found 18 similar documents (search time: 140 ms)
1.
2.
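Chip multiprocessors (CMPs) have become the direction of processor development, and the focus of processor design has shifted to the interconnection network and the memory hierarchy. A key problem is how to keep the caches at every level of every processor coherent. Traditional shared-memory multiprocessors solve this problem with a cache coherence protocol. Compared with traditional multiprocessor architectures, a CMP has higher on-chip interconnect bandwidth and speed, which places new demands on the cache coherence protocol and also offers new opportunities for improvement. Traditional bus-based snooping protocols scale poorly and generate too many unnecessary broadcasts and snoops, while directory protocols suffer from long miss indirection latency, high complexity, and difficult verification. A ring interconnect scales better than a bus, and its implementation complexity is far lower than that of the packet-switched point-to-point networks that directory protocols usually use. This work applies a ring-based snooping protocol to a CMP. It exploits the ordering property of the ring to eliminate the retries that conflicts cause in the original protocol, removing possible starvation, deadlock, and livelock, improving the stability of the protocol, and reducing message traffic and power consumption. It also exploits the short on-chip interconnect latency to propagate snoop results together with snoop requests, so that each processor can use the snoop results to snoop requests selectively, which reduces unnecessary snoop operations and lowers power consumption.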
3.
4.
5.
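While building a cluster-bus multiprocessor system based on the MIPS R10000 processor, the authors found that the external conflict resolution scheme given in the R10000 user's manual applies only to uniprocessor or multiprocessor systems that use a dedicated EA. The paper therefore introduces the system configurations of the R10000 processor and the coherency of its system interface, analyzes the limitations of the external conflict resolution scheme given in the R10000 user's manual, and, building on that scheme, studies external conflicts in multiprocessor systems that use a cluster bus. It presents an external conflict resolution scheme that the cluster coordinator can adopt.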
6.
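This paper introduces the bus transactions and the cache attributes of memory regions in P6-bus-based multiprocessor systems. It discusses the hardware snooping mechanism of the P6 bus and the MESI state transitions used by the Pentium III processor. Finally, it examines how the processors and the P6 bus cooperate to guarantee cache coherence across the whole system.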
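As a rough illustration of the MESI state transitions mentioned in this abstract, the following Python sketch models a single cache line under the textbook MESI rules. It is not taken from the paper and does not claim to reproduce the exact Pentium III or P6 bus behavior; the names `State`, `on_processor_access`, and `on_snooped_access` are hypothetical.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


def on_processor_access(state, is_write, others_have_copy):
    """Next state of the local line after this processor's own read or write."""
    if is_write:
        # Any local write ends in Modified; from S or I the other copies
        # are invalidated on the bus first (not modeled here).
        return State.M
    if state is State.I:
        # Read miss: Shared if another cache reports a copy, else Exclusive.
        return State.S if others_have_copy else State.E
    return state  # read hit: state is unchanged


def on_snooped_access(state, is_write):
    """Next state of the local line after snooping another processor's access."""
    if state is State.I:
        return State.I
    if is_write:
        return State.I  # a remote write invalidates the local copy
    # Remote read: M (after writing the data back) and E both drop to Shared.
    return State.S
```

For example, `on_processor_access(State.I, False, False)` yields `State.E`, and a later snooped read moves that line to `State.S`.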
7.
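Organizing a multi-level cache sensibly is an effective way to reduce memory access latency. This paper proposes a structure for the cache unit of a 32-bit superscalar microprocessor, discusses key techniques in the design of the L1 and L2 caches, and describes the implementation of the cache coherence protocol, meeting the design requirements of the "Longtium" (龙腾) R2 microprocessor chip. The chip is implemented in a 0.18 μm CMOS process, with a die area within 4.1 mm × 4.1 mm, a core frequency above 233 MHz, and power consumption below 1.5 W.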
8.
9.
10.
11.
A multiple-bus architecture called a multi-multi is presented. The architecture is designed to handle several dimensions with a moderate number of processors per bus. It provides scaling to a large number of processors in a system. A key characteristic of the architecture is the large amount of bandwidth it provides. Each node in the architecture contains a microprocessor, memory, and a cache. The cache-coherence protocol for the multi-multi architecture combines features of snooping cache schemes, to provide consistency on individual buses, with features of directory schemes, to provide consistency between buses. The snooping cache component can take advantage of the low-latency communication possible on shared buses for efficiency, yet the complete protocol will support many more processors than a single bus can. The resulting protocol naturally extends cache coherence from a multi to a multi-multi. Cache and directory states are described. Concepts that allow efficient performance, namely, local sharing, root node, and bus addresses in the directory, are discussed.
12.
In multiprocessor systems-on-chip (MPSoCs) that use snoop-based cache coherency protocols, a miss in the data cache triggers the broadcast of a coherency request to all the remote caches, to keep all data coherent. However, the majority of these requests are unnecessary because remote caches do not have the matching blocks and so their tag lookups fail. Both the coherency requests and the tag lookups corresponding to a remote miss consume unnecessary energy. We propose an architecture-level technique for snoop energy reduction, called broadcast filtering, which prevents unnecessary coherency requests from being broadcast to remote caches, and thus reduces snoop energy consumption by both the cache and bus. Broadcast filtering is implemented using a snooping cache and a split bus. The snooping cache checks if a block that cannot be obtained locally exists in remote caches before broadcasting a coherency request. If no remote cache has the matching block, there is no broadcast; and if broadcasting is necessary, the split bus allows coherency requests to be broadcast selectively to the remote caches which have matching blocks. Experimental results show a 90% reduction in cache lookups, a 60% reduction in bus usage, and a 40% reduction in snoop energy consumption, at a small cost in performance. An analysis based on the energy model shows that the broadcast filtering technique can reduce energy consumption per cache coherency operation by up to 55%.
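The filtering decision described in this abstract can be sketched in a few lines of Python. This is a minimal software model under stated assumptions, not the authors' hardware design: the class `BroadcastFilter`, its methods, and the per-block holder table are hypothetical stand-ins for the snooping cache and the split bus.

```python
class BroadcastFilter:
    """Toy model: tracks which remote caches hold each block."""

    def __init__(self, num_caches):
        self.num_caches = num_caches
        # Block address -> set of cache ids holding it (assumed bookkeeping,
        # standing in for the snooping cache from the abstract).
        self.holders = {}

    def record_fill(self, cache_id, block):
        self.holders.setdefault(block, set()).add(cache_id)

    def record_evict(self, cache_id, block):
        owners = self.holders.get(block)
        if owners:
            owners.discard(cache_id)
            if not owners:
                del self.holders[block]

    def targets_on_miss(self, requester, block):
        """Remote caches to snoop for a miss; an empty list means no broadcast."""
        return sorted(self.holders.get(block, set()) - {requester})

    def unfiltered_targets(self, requester):
        """Baseline snooping: every remote cache receives the request."""
        return [c for c in range(self.num_caches) if c != requester]
```

With four caches, if caches 1 and 3 hold block `0x40`, a miss by cache 0 on that block is sent only to caches 1 and 3, and a miss on a block that nobody holds triggers no broadcast, whereas the unfiltered baseline always snoops caches 1, 2, and 3.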
13.
In symmetric multiprocessors (SMPs), the cache coherence overhead and the speed of the shared buses limit the address/snoop bandwidth needed to broadcast transactions to all processors. As a solution, a scalable address subnetwork called symmetric multiprocessor network (SYMNET) is proposed in which address requests and snoop responses of SMPs are implemented optically. SYMNET not only uses passive optical interconnects that increase the speed of the proposed network, but also pipelines address requests at a much faster rate than electronics. This increases the address bandwidth for snooping, but cache coherence can no longer be maintained with the usual snooping protocols. A modified coherence protocol, coherence in SYMNET (COSYM), is introduced to solve the coherence problem. COSYM was evaluated with a subset of Splash-2 benchmarks and compared with the electrical bus-based MOESI protocol. The simulation studies have shown a 5-66 percent improvement in execution time for COSYM as compared to MOESI for various applications. Simulations have also shown that the average latency for a transaction to complete using the COSYM protocol was 5-78 percent better than with the MOESI protocol. It is also seen that SYMNET can scale up to hundreds of processors while still using fast snooping-based cache coherence protocols, and additional performance gains may be attained with further improvement in optical device technology.
14.
15.
Edenfield R.W. Gallup M.G. Ledbetter W.B. Jr. McGarity R.C. Quintana E.E. Reininger R.A. 《Micro, IEEE》1990,10(3):22-35
For pt.1 see ibid., February (1990). The memory subsystem, the external bus, chip and board testing, and design-verification methods for the 68040, a third-generation, full-32-bit microprocessor in the Motorola 68000 family, are discussed. The internal caches and memory management are examined at length. The external bus protocol, arbitration, snooping, and timing specifications are addressed. The MOVE16 instruction, which moves a cache line from one address (which may reside in the data cache) to another address outside the cache, is described. User testing, based on dedicated test logic that is fully compliant with the IEEE 1149.1 standard, and factory testing, for which the processor employs structured design techniques for random logic and special test modes for embedded arrays, are examined. The use of top-down design and a hierarchical method of design verification is discussed.
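16.
"Longtium R2" (龙腾R2) is a 32-bit embedded microprocessor designed independently by Northwestern Polytechnical University and compatible with the PowerPC instruction set. To improve human-computer interaction in the Longtium R2 SoC system, a method is proposed that uses the keyboard controller in the Longtium R2 to communicate with a keyboard under the PS/2 keyboard communication protocol. The PS/2 keyboard communication protocol is analyzed first. The structure of the keyboard driver and the principle of loading it in the VxWorks BSP (board support package) are then given, and VxWorks is ported to the Longtium R2 prototype verification platform. Long-term use and testing show that the VxWorks operating system runs stably on the FPGA verification platform after porting.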
17.
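Building a verification environment for a multiprocessor bus interface unit  Cited by: 1 (self-citations: 1, citations by others: 0)
Ever-shrinking design and verification cycles pose a great challenge to chip verification engineers. To improve verification efficiency, chip verification methods and the construction of verification environments were studied in depth. Taking the bus interface unit of the "Longtium R2" (龙腾R2) microprocessor as an example, the paper describes in detail the flow for building an object-oriented, self-checking verification environment with functional coverage feedback. Experiments show that the improved verification environment clearly outperforms the previous one in both verification efficiency and coverage of functional points.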
18.
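AltiVec is a short-vector technology that Motorola adopted to provide multimedia processing on its PowerPC-architecture general-purpose processors. The Longtium R microprocessor is a high-performance 32-bit PowerPC-architecture microprocessor developed independently by the Aviation Microelectronics Center of Northwestern Polytechnical University. The paper proposes a dual-issue short-vector scheduling mechanism that supports AltiVec and is implemented with the Tomasulo algorithm, studies the issue policy for short-vector instructions and the design of the rename registers and reservation stations, and reports simulations. The results show that the dual-issue short-vector unit reaches an average IPC of 1.2, improving the efficiency of parallel instruction execution.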