首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到18条相似文献,搜索用时 140 毫秒
1.
龙腾R2微处理器是西北工业大学航空微电子中心设计的采用PowerPC体系结构,具有自主知识产权的RISC微处理器。为了扩展其多处理器的功能,采用总线侦听的方法来维护多处理器环境下的cache一致性。首先介绍了共享总线侦听技术以及侦听协议,然后详细介绍了龙腾R2微处理器的总线侦听部件的实现方案,对几类cache一致性的实现方案以及性能进行了评析。FPGA实验结果表明,总线侦听部件能高效而准确地保证多处理器系统的cache一致性。  相似文献   

2.
片上多核处理器(CMP)已经成为处理器发展的方向,处理器设计的重点也转到了互连网络和存储层次结构方面,其中的一个关键问题是如何维护各处理器各级缓存(Cache)的一致性,该问题在传统的共享存储多处理器中使用Cache一致性协议来解决,而CMP相对于传统的多处理器结构具有更高的片上互连带宽和速度,给Cache一致协议提出了新的要求,也提供了新的改进机会.传统的总线侦听协议存在可扩展性不足和不必要的广播、侦听过多的缺点,而目录协议则存在失效间接延时大和复杂度高、验证困难等问题.环形连接的可扩展性好于总线结构,而其实现复杂度也远小于通常目录协议所使用的包交换点到点网络.将基于环的侦听协议应用于CMP;并考虑利用环的顺序性取消原有协议中冲突引起的重发操作,消除可能的饥饿、死锁和活锁等情况,增加协议的稳定性,同时减少消息流量和功耗;利用片上互连延时短的特点,将侦听结果和侦听请求同时传播,使得处理器可以根据侦听结果来对侦听请求进行选择性的侦听操作,可减少不必要的侦听操作,降低功耗.  相似文献   

3.
实时嵌入式多处理器操作系统实现框架   总被引:1,自引:1,他引:1  
描述了针对实时嵌入式应用的多处理器操作系统的实现框架,简单介绍了常用的多处理器系统所采用的架构,提出了另外一种多处理器系统的实现架构,并讨论了与之相关的通信、全局资源管理、调度、cache一致性和异构性的问题。  相似文献   

4.
以共享总线的多处理机系统为例,本文介绍了在共享总线系统中用于解决Cache问题的侦听总线一致性协议,并基于总线侦听Cache一致性协议的优点和协议区分状态的原因,给出了一个评价协议好坏的角度:总线的流量和存储器访问的有效时间,最后给出了基于总线侦听Cache一致性协议算法与实现.  相似文献   

5.
在基于MIPS R10000处理器构建采用簇总线的多处理器系统时,发现R10000用户手册给出的外部冲突解决方案只适用于采用专用EA的单或多处理器系统.鉴于此,介绍了R10000处理器的系统配置和系统接口的一致性,分析了R10000用户手册所给出的外部冲突解决方案的局限性,并基于该外部冲突解决方案,对采用簇总线的多处理器系统中的外部冲突进行了研究,给出了簇协调器可以采用的一个外部冲突解决方案.  相似文献   

6.
本文介绍了基于P6总线的多处理器系统的总线事务和存储区的Cache属性.讨论了P6忌线的硬件监听机制,Pentium Ⅲ处理器所采用的MESI状态转换.最后研究了多处理器和P6总线如何相互配合以保证整个系统的Cache一致性。  相似文献   

7.
合理地组织一个多级的高速缓冲存储器(Cache)是一种有效的减少存储器访问延迟的方法。论文提出了一种设计32位超标量微处理器Cache单元的结构,讨论了一级Cache、二级Cache设计中的关键技术,介绍了Cache一致性协议的实现,满足了“龙腾”R2微处理器芯片的设计要求。整个芯片采用0.18umCMOS工艺实现,芯片面积在4.1mm×4.1mm之内,微处理器核心频率超过233MHz,功耗小于1.5W。  相似文献   

8.
张瀛  黄巍  马群生  李三立 《计算机学报》1998,21(Z1):230-236
并行处理技术是现代计算机领域提供超级计算的关键技术,在设计紧耦合并行处理系统时围绕着处理器和存储器为中心产生了多种不同的并行处理体系结构.本文介绍了以i80860XPCPU为基础的MP860并行系统:一种异构型层次式的多处理器系统.该系统设计中具有32个i80860XP处理器,其峰值速度可达32亿次/秒.本文具体讨论了多处理器总线问题、存储器Cache一致性问题,以及MP860并行系统提供的并行编程环境.  相似文献   

9.
在分析现有体系结构级低功耗cache设计方案的基础上,提出了一种混合cache低功耗设计策略,通过在常规混合cache结构上增加一标志域来区分cache某组中的指令和数据,限制了处理器每次访问的路数,从而达到低功耗的效果。详细阐明了该方法的原理和硬件实现,并将其应用到自主研发的龙腾C2微处理器上。实验结果表明,该方法不损耗cache性能,面积牺牲仅1.45%,总功耗降低了23.1%。  相似文献   

10.
在分析现有体系结构级低功耗cache设计方案的基础上,提出了一种混合cache低功耗设计策略,通过在常规混合cache结构上增加一标志域来区分cache某组中的指令和数据,限制了处理器每次访问的路数,从而达到低功耗的效果。详细阐明了该方法的原理和硬件实现,并将其应用到自主研发的龙腾C2微处理器上。实验结果表明,该方法不损耗cache性能,面积牺牲仅1.45%,总功耗降低了23.1%。  相似文献   

11.
Carlton  M. Despain  A. 《Computer》1990,23(6):80-83
A multiple-bus architecture called a multi-multi is presented. The architecture is designed to handle several dimensions with a moderate number of processors per bus. It provides scaling to a large number of processors in a system. A key characteristic of the architecture is the large amount of bandwidth it provides. Each node in the architecture contains a microprocessor, memory, and a cache. The cache-coherence protocol for the multi-multi architecture combines features of snooping cache schemes, to provide consistency on individual buses, with features of directory schemes, to provide consistency between buses. The snooping cache component can take advantage of the low-latency communication possible on shared buses for efficiency, yet the complete protocol will support many more processors than a single bus can. The resulting protocol naturally extends cache coherence from a multi to a multi-multi. Cache and directory states are described. Concepts that allow efficient performance, namely, local sharing, root node, and bus addresses in the directory, are discussed  相似文献   

12.
In multiprocessor system-on-a-chips (MPSoCs) that use snoop-based cache coherency protocols, a miss in the data cache triggers the broadcast of coherency request to all the remote caches, to keep all data coherent. However, the majority of these requests are unnecessary because remote caches do not have the matching blocks and so their tag lookups fail. Both the coherency requests and the tag lookups corresponding to a remote miss consume unnecessary energy.We propose an architecture-level technique for snoop energy reduction, called broadcast filtering, which prevents unnecessary coherency requests from being broadcast to remote caches, and thus reduces snoop energy consumption by both the cache and bus. Broadcast filtering is implemented using a snooping cache and a split bus. The snooping cache checks if a block that cannot be obtained locally exists in remote caches before broadcasting a coherency request. If no remote cache has the matching block, there is no broadcast; and if broadcasting is necessary, the split bus allows coherency requests to be broadcast selectively to the remote caches which have matching blocks.Experimental results show a reduction by 90% of cache lookups, by 60% of bus usage, and by 40% of snoop energy consumption, at a small cost in reduced performance. An analysis result based on the energy model shows the broadcast filtering technique can reduce by up to 55% of energy consumption per cache coherency operation.  相似文献   

13.
In symmetric multiprocessors (SMPs), the cache coherence overhead and the speed of the shared buses limit the address/snoop bandwidth needed to broadcast transactions to all processors. As a solution, a scalable address subnetwork called symmetric multiprocessor network (SYMNET) is proposed in which address requests and snoop responses of SMPs are implemented optically. SYMNET not only uses passive optical interconnects that increases the speed of the proposed network, but also pipelines address requests at a much faster rate than electronics. This increases the address bandwidth for snooping, but the preservation of cache coherence can no longer be maintained with the usual snooping protocols. A modified coherence protocol, coherence in SYMNET (COSYM), is introduced to solve the coherence problem. COSYM was evaluated with a subset of Splash-2 benchmarks and compared with the electrical bus-based MOESI protocol. The simulation studies have shown a 5-66 percent improvement in execution time for COSYM as compared to MOESI for various applications. Simulations have also shown that the average latency for a transaction to complete using COSYM protocol was 5-78 percent better than the MOESI protocol. It is also seen that SYMNET can scale up to hundreds of processors while still using fast snooping-based cache coherence protocols, and additional performance gains may be attained with further improvement in optical device technology.  相似文献   

14.
"龙腾R2"微处理器精确中断优化实现   总被引:1,自引:0,他引:1  
介绍了"龙腾R2"微处理器精确中断的实现方法,详细讨论了备份缓冲区精确中断优化方法和中断指令缓冲区中断响应机制.在"龙腾R2"微处理器上的实验结果表明,采用备份缓冲区和中断指令缓冲区的精确中断方法在不影响微处理器速度的情况下,中断响应速度是原来的3.5倍,中断返回速度是原来的2.6倍.  相似文献   

15.
For pt.1 see ibid., February (1990). The memory subsystem, the external bus, chip and board testing, and design-verification methods for the 68040, a third-generation, full-32-bit microprocessor in the Motorola 68000 family, are discussed. The internal caches and memory management are examined at length. The external bus protocol, arbitration, snooping, and timing specifications are addressed. The MOVE16 instruction, which moves a cache line from one address (which may reside in the data cache) to another address outside the cache is described. User testing, based on dedicated test logic that is fully compliant with the IEEE 1149.1 standard, and factory testing, for which the processor employs structured design techniques for random logic and special test modes for embedded arrays, are examined. The use of top-down design and a hierarchical method of design verification is discussed  相似文献   

16.
"龙腾R2"是西北工业大学自主设计的与PowerPC指令集兼容的32位嵌入式微处理器,为了提高"龙腾R2"SOC系统中人机交互能力,提出了利用"龙腾R2"中的键盘控制器在PS2键盘通信协议下与键盘进行通信的方法;首先分析了PS2键盘通信协议,然后给出了键盘驱动程序的结构以及在VxWorks的BSP(板级支持包)中如何对键盘驱动程序加载的原理,然后将VxWorks移植到龙腾R2微处理器原型验证平台中;经过长时间的应用和测试表明,在移植后的FPGA验证平台上,VxWorks操作系统可以稳定运行。  相似文献   

17.
一种多处理器总线接口部件的验证环境的搭建   总被引:1,自引:1,他引:0  
设计和验证周期的不断紧缩,给芯片验证工作者带来了很大的挑战;为了提高验证效率,对芯片的验证方法和验证环境的搭建进行了深入地研究;以"龙腾R2"微处理器总线接口部件为例,详细阐述一种面向对象的功能覆盖率反馈以及自检查验证环境的搭建流程;实验表明,改进后的验证环境在验证效率以及功能点覆盖面方面都明显优于改进前的验证环境。  相似文献   

18.
Alti Vec技术是Motorola为了在其PowerPC架构的通用处理器上实现多媒体处理功能而采用的短向量技术,Longtium R微处理器是西北工业大学航空微电子中心自主研发的高性能32位PowerPC架构微处理器;提出了一种利用Tomasulo算法实现支持Alti Vec技术的短向量双发射调度机制,研究了该短向量的发射策略,重命名寄存器和保留站的设计等,并进行了仿真;结果显示,该双发射短向量单元的IPC平均可达1.2,提高了指令的并行执行效率。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号