Similar literature
 19 similar documents found (search time: 93 ms)
1.
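Longteng R2 (龙腾R2) is a RISC microprocessor with independent intellectual property rights, designed by the Aviation Microelectronics Center of Northwestern Polytechnical University and based on the PowerPC architecture. To extend its multiprocessor capability, bus snooping is used to maintain cache coherence in a multiprocessor environment. The paper first introduces shared-bus snooping techniques and snooping protocols, then describes in detail the implementation of the bus snooping unit of the Longteng R2 microprocessor, and evaluates several cache coherence implementation schemes and their performance. FPGA experiments show that the bus snooping unit maintains cache coherence in the multiprocessor system efficiently and accurately.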

2.
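A redundant fault-tolerant multiprocessor system based on a shared bus   Total citations: 1, self-citations: 0, citations by others: 1
The paper defines a multiprocessor system based entirely on local processors, discusses the conditions for implementing it, proposes a shared-bus structure, and establishes an information-exchange mechanism between processor domains based on fixed address windows, yielding a masterless multiprocessor system. According to the abstract, this solves the problem of redundant backup for equipment in high-end applications, provides flexible system-level backup modes at arbitrary levels, and offers the highest processor mounting density and information-processing capability.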

3.
4.
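The paper defines a multiprocessor system based entirely on local processors, discusses the conditions for implementing it, proposes a shared-bus structure, and establishes an information-exchange mechanism between processor domains based on fixed address windows, yielding a masterless multiprocessor system. According to the abstract, this solves the problem of redundant backup for equipment in high-end applications, provides flexible system-level backup modes at arbitrary levels, and offers the highest processor mounting density and information-processing capability.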

5.
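The paper explores improving overall processor performance by optimizing the design of the bus interface unit. The optimizations focus on reducing the number of memory accesses made by the processor and on lowering the bus load. Simulation and verification results show that these methods are feasible and effective.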

6.
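This paper introduces the bus transactions of P6-bus-based multiprocessor systems and the cache attributes of memory regions. It discusses the hardware snooping mechanism of the P6 bus and the MESI state transitions used by the Pentium III processor. Finally, it examines how the processors and the P6 bus cooperate to guarantee cache coherence across the whole system.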
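The MESI transitions mentioned in this abstract can be illustrated with a minimal sketch of the textbook protocol for a single cache line. It is not taken from the paper and omits all P6-specific bus signalling; the names `State`, `on_local_access`, and `on_snooped_access` are hypothetical.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


def on_local_access(state, is_write, others_have_copy):
    """Next state of a line after the local processor reads or writes it.

    Textbook MESI only; `others_have_copy` stands in for the snoop
    response that a real bus would supply on a read miss.
    """
    if is_write:
        # Any local write ends in Modified; other copies are invalidated first.
        return State.M
    if state is State.I:
        # Read miss: Shared if another cache holds the line, else Exclusive.
        return State.S if others_have_copy else State.E
    # Read hit in M, E, or S leaves the state unchanged.
    return state


def on_snooped_access(state, is_write):
    """Next state after observing another processor's access to this line."""
    if state is State.I:
        return State.I
    if is_write:
        # A remote write invalidates the local copy.
        return State.I
    # A remote read demotes M (after write-back), E, and S to Shared.
    return State.S
```

For example, a line loaded by a read miss with no other sharers becomes Exclusive, and a later read snooped from another processor demotes it to Shared.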

7.
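Taking a shared-bus multiprocessor system as an example, this paper introduces the bus-snooping coherence protocols used to solve the cache coherence problem in shared-bus systems. Based on the advantages of bus-snooping cache coherence protocols and the reasons such protocols distinguish between states, it proposes two criteria for evaluating a protocol: bus traffic and effective memory access time. Finally, it presents an algorithm for, and an implementation of, a bus-snooping cache coherence protocol.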

8.
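Advances in SoC technology have made it possible to integrate multiple heterogeneous processors on a single chip, and this structure has become an important way to improve microprocessor performance. As in traditional multiprocessor systems, cache coherence is the first problem an on-chip heterogeneous multiprocessor system must solve. Building on an analysis of the cache coherence problem, this paper addresses the integration of processors that use different snooping protocols, converting between coherence protocols at the cost of some simple additional hardware. Managing this conversion inside the multiprocessor chip package guarantees data consistency in a heterogeneous multiprocessor system.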

9.
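Design and implementation of a multiprocessor shared cache   Total citations: 1, self-citations: 0, citations by others: 1
A cache is a small, fast memory between the central processing unit (CPU) and main memory; it balances and matches the data-processing speeds of the two and helps improve overall system performance. Multiprocessors (SMP) support caching of both shared and private data, and a cache coherence protocol resolves the data consistency problems that arise when multiple processors share data. The paper describes a shared cache design for a 64-bit multi-core processor, including how multiprocessor cache coherence is achieved and the full-custom back-end implementation.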

10.
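A bus-based multiprocessor shared-memory mechanism   Total citations: 3, self-citations: 1, citations by others: 3
Bus-based distributed multiprocessor architectures are a common hardware architecture for high-performance routers. While developing the major "863" project "High-Performance Secure Router", the computer science department of Tsinghua University implemented a multiprocessor shared-memory mechanism on a PowerPC multiprocessor platform based on the CompactPCI bus. This shared-memory mechanism (the SM mechanism) implements a set of core objects, including SM memory, SM semaphores, SM message queues, and SM task control blocks. The paper describes the design and implementation of the SM mechanism in detail and gives performance test results.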

11.
This paper presents a new cache consistency scheme for hierarchically structured shared-memory multiprocessors. The scheme is simple, fast and efficient, and it does not require a large amount of state information to be maintained. The scheme exploits the broadcast capability of these systems, but limits the extent of the broadcasts by means of a novel filtering mechanism. As a specific example, it is shown how the proposed cache consistency scheme can be implemented on the Hector multiprocessor architecture. Using trace-driven simulations, we demonstrate that the scheme is scalable and performs well for common applications.

12.
13.
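Fairness is a key optimization problem: when a system lacks fairness, problems such as thread starvation and priority inversion arise. Taking fairness optimization as the research goal, the paper analyzes current criteria for evaluating the fairness of shared cache partitioning, identifies shortcomings in their evaluation parameters and partitioning policies, and proposes a new shared cache partitioning scheme. It introduces a new multithreaded fairness metric and improves an existing fair partitioning policy, thereby improving fairness among concurrently running threads. Experimental results show that the scheme significantly improves system fairness and also increases system throughput.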

14.
When using a shared memory multiprocessor, the programmer faces the issue of selecting the portable programming model which will provide the best performance. Even if they restrict their choice to the standard programming environments (MPI and OpenMP), they have to select a programming approach among MPI and the variety of OpenMP programming styles. To help the programmer in their decision, we compare MPI with three OpenMP programming styles (loop level, loop level with large parallel sections, SPMD) using a subset of the NAS benchmark (CG, MG, FT, LU), two dataset sizes (A and B), and two shared memory multiprocessors (IBM SP3 NightHawk II, SGI Origin 3800). We have developed the first SPMD OpenMP version of the NAS benchmark and gathered other OpenMP versions from independent sources (PBN, SDSC and RWCP). Experimental results demonstrate that OpenMP provides competitive performance compared with MPI for a large set of experimental conditions. Not surprisingly, the two best OpenMP versions are those requiring the strongest programming effort. MPI still provides the best performance under some conditions. We present breakdowns of the execution times and measurements of hardware performance counters to explain the performance differences. Copyright © 2005 John Wiley & Sons, Ltd.

15.
A survey of cache coherence schemes for multiprocessors   Total citations: 1, self-citations: 0, citations by others: 1
Stenstrom, P. Computer, 1990, 23(6): 12-24
Schemes for cache coherence that exhibit various degrees of hardware complexity, ranging from protocols that maintain coherence in hardware, to software policies that prevent the existence of copies of shared, writable data, are surveyed. Some examples of the use of shared data are examined. These examples help point out a number of performance issues. Hardware protocols are considered. It is seen that consistency can be maintained efficiently, although in some cases with considerable hardware complexity, especially for multiprocessors with many processors. Software schemes are investigated as an alternative capable of reducing the hardware cost.

16.
This paper proposes a novel leakage management technique for applications with producer-consumer sharing patterns. Although previous research has proposed leakage management techniques by turning off inactive cache blocks, these techniques can be further improved by exploiting the various run-time characteristics of target applications in CMPs. By exploiting particular access sequences observed in producer-consumer sharing patterns and the spatial locality of shared buffers, our technique enables a more aggressive turn-off of L2 cache blocks of these buffers. Experimental results using a CMP simulator show that our proposed technique reduces the energy consumption of on-chip L2 caches, a shared bus, and off-chip memory by up to 31.3% over the existing cache leakage power management techniques with no significant performance loss.

17.
We analyze two important problems that arise in shared-memory multiprocessor systems. The stale data problem involves ensuring that data items in local memory of individual processors are current, independent of writes done by other processors. False sharing occurs when two processors have copies of the same shared data block but update different portions of the block. The false sharing problem involves guaranteeing that subsequent writes are properly combined. In modern architectures these problems are usually solved in hardware, by exploiting mechanisms for hardware controlled cache consistency. This leads to more expensive and nonscalable designs. Therefore, we are concentrating on software methods for ensuring cache consistency that would allow for affordable and scalable multiprocessing systems. Unfortunately, providing software control is nontrivial, both for the compiler writer and for the application programmer. For this reason we are developing a debugging environment that will facilitate the development of compiler-based techniques and will help the programmer to tune his or her application using explicit cache management mechanisms. We extend the notion of a race condition for IBM Shared Memory System POWER/4, taking into consideration its noncoherent caches, and propose techniques for detection of false sharing problems. Identification of the stale data problem is discussed as well, and solutions are suggested.
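The definition of false sharing given in this abstract (two processors updating different portions of the same block) can be made concrete with a small sketch. It is only an illustration of the definition, not the detection technique the paper proposes. The function name `find_false_sharing`, the `(processor_id, byte_address)` trace format, and the 64-byte default block size are all assumptions.

```python
from collections import defaultdict


def find_false_sharing(writes, block_size=64):
    """Return block numbers written by 2+ processors at disjoint addresses.

    `writes` is an iterable of (processor_id, byte_address) pairs.
    A block is flagged only when no single address in it is written by
    more than one processor, i.e. the sharing is purely at block level.
    """
    # block number -> processor id -> set of byte addresses written
    by_block = defaultdict(lambda: defaultdict(set))
    for proc, addr in writes:
        by_block[addr // block_size][proc].add(addr)

    flagged = []
    for block, per_proc in sorted(by_block.items()):
        if len(per_proc) < 2:
            continue  # only one writer, so there is no sharing at all
        procs = sorted(per_proc)
        disjoint = all(
            per_proc[a].isdisjoint(per_proc[b])
            for i, a in enumerate(procs)
            for b in procs[i + 1:]
        )
        if disjoint:
            flagged.append(block)
    return flagged
```

With the trace `[(0, 0), (1, 8), (0, 64), (1, 64)]`, block 0 is flagged because the two processors write different bytes of it, while block 1 is not because both write the same address (true sharing).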

18.
Advances in semiconductor technology allow more and more processing cores to be packed on a single die, and scalable directory-based protocols are needed to maintain cache coherence. Most currently available directory-based protocols are designed for mesh-based topologies and suffer from delay and scalability problems. A cluster-based coherence protocol is a better option than a flat directory-based protocol, but the problems of the mesh-based topology still exist. A tree-based topology, on the other hand, requires fewer hops than a mesh-based topology. In this paper we give a hierarchical cache coherence protocol built on a tree-based topology. We divide the processing cores into clusters, and each cluster shares a higher-level cache. At the next level we form clusters of caches connected to yet another higher-level cache. This continues up to the top-level cache/memory. We give various architectural placements that can benefit from the protocol, a hop-count comparison, and memory overhead requirements. Finally, we formally verify the protocol using the Murφ tool.

19.
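To solve the distributed data consistency problem in bus networks, a distributed consistency algorithm for bus networks is proposed. The algorithm uses a Mod operation to convert multi-node arbitration into arbitration by a single node, which reduces the number of messages required and lowers system load. Message reuse reduces the number of message types the algorithm needs and shortens response latency. Theoretical performance analysis and simulation experiments show that the algorithm has lower message complexity and time complexity than traditional algorithms.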
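The abstract does not say how the Mod operation selects the arbitrating node. One plausible reading, offered purely as an assumption, is that every node computes the same arbiter from a data item identifier modulo the node count, so no negotiation messages are needed to agree on it. The function `arbiter_for` and its parameters are hypothetical.

```python
def arbiter_for(item_id: int, num_nodes: int) -> int:
    """Pick the single node that arbitrates updates to `item_id`.

    Assumed scheme (not confirmed by the source): nodes are numbered
    0..num_nodes-1 and each computes item_id mod num_nodes locally,
    so all nodes reach the same answer without exchanging messages.
    """
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    return item_id % num_nodes
```

Under this assumption, every node in a four-node network independently maps item 10 to node 2, which is what turns multi-node arbitration into a decision by one node.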
