Similar literature
1.
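Taking the design of coherence protocols for tiled many-core processors as its main thread, this paper surveys recent research in China and abroad on cache coherence for many-core processors. It describes how different NUCA organizations affect coherence protocols, analyzes and compares the characteristics and shortcomings of several traditional directory-based coherence protocols, and summarizes the design ideas and characteristics of several recent coherence protocols that target many-core architectures. Finally, it identifies several key design directions for cache coherence protocols that are application-adaptive and scalable.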

2.
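Chip multiprocessors (CMPs) have become the direction of processor development, and the focus of processor design has shifted to the interconnection network and the memory hierarchy. A key problem is how to keep the caches at every level of every processor coherent. Traditional shared-memory multiprocessors solve this problem with a cache coherence protocol, but a CMP offers higher on-chip interconnect bandwidth and speed than a traditional multiprocessor, which places new requirements on the cache coherence protocol and also creates new opportunities for improvement. Traditional bus snooping protocols scale poorly and generate unnecessary broadcasts and excessive snoops, whereas directory protocols suffer from long indirection latency on misses, high complexity, and difficult verification. A ring interconnect scales better than a bus, and its implementation complexity is far lower than that of the packet-switched point-to-point networks that directory protocols usually use. This work applies a ring-based snooping protocol to CMPs. It exploits the ordering property of the ring to eliminate the retries that conflicts cause in the original protocol, which removes possible starvation, deadlock, and livelock, improves the stability of the protocol, and reduces message traffic and power consumption. It also exploits the short on-chip interconnect latency to propagate snoop results together with snoop requests, so that a processor can snoop selectively on the basis of those results, which avoids unnecessary snoop operations and lowers power consumption.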

3.
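For a fully connected network topology, this paper presents the detailed design of a broadcast-based cache coherence protocol. Requests are no longer relayed through a third party as in directory protocols; instead they are sent directly to all nodes, and the holder of the most recent copy responds. The protocol is analyzed and proved, a model is built, and the model checker NuSMV is used to verify the correctness of the protocol.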
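
The request path described in this abstract (broadcast to every node, with the holder of the most recent copy answering) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the protocol from the paper: the class names `Node` and `FullyConnectedSystem`, the single owner flag per cached block, and the write-invalidate policy are all hypothetical simplifications chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    # address -> (value, is_owner); the owner holds the most recent copy.
    cache: dict = field(default_factory=dict)

    def snoop(self, addr):
        """Answer a broadcast read only if this node owns the latest copy."""
        entry = self.cache.get(addr)
        if entry is not None and entry[1]:
            return entry[0]
        return None


class FullyConnectedSystem:
    """Hypothetical model: every node can message every other node directly."""

    def __init__(self, n_nodes, memory):
        self.nodes = [Node(i) for i in range(n_nodes)]
        self.memory = dict(memory)

    def write(self, writer_id, addr, value):
        # Assumed write-invalidate policy: all other nodes drop their copies.
        for node in self.nodes:
            if node.node_id != writer_id:
                node.cache.pop(addr, None)
        self.nodes[writer_id].cache[addr] = (value, True)

    def read(self, reader_id, addr):
        reader = self.nodes[reader_id]
        if addr in reader.cache:
            return reader.cache[addr][0]
        # Broadcast the request to all other nodes; no directory hop is involved.
        value = None
        for node in self.nodes:
            if node.node_id != reader_id:
                reply = node.snoop(addr)
                if reply is not None:
                    value = reply
        if value is None:
            value = self.memory[addr]  # no cached owner, so memory responds
        reader.cache[addr] = (value, False)  # shared, non-owner copy
        return value
```

In this sketch a read that misses locally is answered by whichever node owns the block, even when main memory still holds a stale value, which is the behavior the abstract contrasts with relaying requests through a directory.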

4.
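Virtutech Simics, which is used to simulate a variety of computer systems and instruction sets, provides only a simple, sequential, flat snooping cache coherence protocol model that supports the MESI protocol, which limits the number of parallel processors that can be simulated. This paper applies a distributed directory-based cache coherence protocol model to Simics and presents Simics-based simulation results for the distributed coherence protocol. The results confirm that the distributed protocol lowers the total number of events and reduces the number of events in the network. The paper proposes a simple distributed directory-based cache coherence protocol and thereby addresses the scalability limitation of Simics.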

5.
In glueless shared-memory multiprocessors where cache coherence is usually maintained using a directory-based protocol, the fast access to the on-chip components (caches and network router, among others) contrasts with the much slower main memory. Unfortunately, directory-based protocols need to obtain the sharing status of every memory block before coherence actions can be performed. This information has traditionally been stored in main memory, and therefore these cache coherence protocols are far from being optimal. In this work, we propose two alternative designs for the last-level private cache of glueless shared-memory multiprocessors: the lightweight directory and the SGluM cache. Our proposals completely remove directory information from main memory and store it in the home node’s L2 cache, thus reducing both the number of accesses to main memory and the directory memory overhead. The main characteristics of the lightweight directory are its simplicity and the significant improvement in the execution time for most applications. Its drawback, however, is that the performance of some particular applications could be degraded. On the other hand, the SGluM cache offers more modest improvements in execution time for all the applications by adding some extra structures that cope with the cases in which the lightweight directory fails.

6.
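The Longteng R2 (龙腾R2) microprocessor is a RISC microprocessor with independent intellectual property rights, designed by the Aviation Microelectronics Center of Northwestern Polytechnical University on the PowerPC architecture. To extend it with multiprocessor support, bus snooping is used to maintain cache coherence in a multiprocessor environment. The paper first introduces shared-bus snooping and snooping protocols, then describes in detail the implementation of the bus snooping unit of the Longteng R2 microprocessor, and evaluates several cache coherence implementation schemes and their performance. FPGA experiments show that the bus snooping unit maintains cache coherence in a multiprocessor system efficiently and correctly.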

7.
We develop a specification methodology that documents and specifies a cache coherence protocol in eight tables: the states, events, actions, and transitions of the cache and memory controllers. We then use this methodology to specify a detailed, modern three-state broadcast snooping protocol with an unordered data network and an ordered address network that allows arbitrary skew. We also present a detailed specification of a new protocol called multicast snooping (Bilir et al., 1999) and, in doing so, we better illustrate the utility of the table-based specification methodology. Finally, we demonstrate a technique for verification of the multicast snooping protocol, through the sketch of a manual proof that the specification satisfies a sequentially consistent memory model.
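
To make the idea of a table-based specification concrete, the sketch below encodes a cache controller as a lookup table from (state, event) to (action, next state). It is an illustration under assumptions, not the tables from the paper: it uses a textbook MSI protocol with stable states only, and the event and action names (`other_GetS`, `issue_GetM`, and so on) are hypothetical. The protocol the abstract describes also has to cope with an unordered data network, which would need transient states that are omitted here.

```python
# Assumed textbook MSI cache controller, stable states only.
# (current_state, event) -> (action, next_state)
CACHE_TRANSITIONS = {
    ("I", "load"):       ("issue_GetS", "S"),
    ("I", "store"):      ("issue_GetM", "M"),
    ("I", "other_GetS"): ("none", "I"),
    ("I", "other_GetM"): ("none", "I"),
    ("S", "load"):       ("hit", "S"),
    ("S", "store"):      ("issue_GetM", "M"),
    ("S", "other_GetS"): ("none", "S"),
    ("S", "other_GetM"): ("invalidate", "I"),
    ("M", "load"):       ("hit", "M"),
    ("M", "store"):      ("hit", "M"),
    ("M", "other_GetS"): ("send_data", "S"),
    ("M", "other_GetM"): ("send_data", "I"),
}


def step(state, event):
    """Look up one transition; an entry missing from the table is an error."""
    try:
        return CACHE_TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"unspecified transition: {state!r} on {event!r}") from None


def run(events, state="I"):
    """Apply a sequence of events and return the final state plus a trace."""
    trace = []
    for event in events:
        action, state = step(state, event)
        trace.append((event, action, state))
    return state, trace
```

Keeping the protocol in one table means that completeness can be checked mechanically: any (state, event) pair absent from the table is reported by `step` instead of being silently ignored.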

8.
We propose implementing cache coherence protocols within the network, demonstrating how an in-network implementation of the MSI directory-based protocol allows for in-transit optimizations of read and write delay. Our results show 15% and 24% savings on average in memory access latency for SPLASH-2 parallel benchmarks running on a 4×4 and a 16×16 multiprocessor respectively.

9.
Carlton, M., Despain, A. Computer, 1990, 23(6): 80-83
A multiple-bus architecture called a multi-multi is presented. The architecture is designed to handle several dimensions with a moderate number of processors per bus. It provides scaling to a large number of processors in a system. A key characteristic of the architecture is the large amount of bandwidth it provides. Each node in the architecture contains a microprocessor, memory, and a cache. The cache-coherence protocol for the multi-multi architecture combines features of snooping cache schemes, to provide consistency on individual buses, with features of directory schemes, to provide consistency between buses. The snooping cache component can take advantage of the low-latency communication possible on shared buses for efficiency, yet the complete protocol will support many more processors than a single bus can. The resulting protocol naturally extends cache coherence from a multi to a multi-multi. Cache and directory states are described. Concepts that allow efficient performance, namely, local sharing, root node, and bus addresses in the directory, are discussed.

10.
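As multicore processors keep growing in scale and inter-core communication mechanisms become more complex, maintaining cache coherence becomes more difficult. Starting from the background of the cache coherence problem in multicore processors, this paper analyzes the implementation mechanisms of snooping, directory, Token, and Hammer protocols and their advantages and disadvantages in multicore environments. It then analyzes and discusses in detail the development trends and technical challenges of future multicore cache coherence protocol design from three perspectives: co-design of coherence protocols and on-chip interconnects, protocol optimization strategies for low-power applications, and verification and fault tolerance of cache coherence protocols.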

11.
In symmetric multiprocessors (SMPs), the cache coherence overhead and the speed of the shared buses limit the address/snoop bandwidth needed to broadcast transactions to all processors. As a solution, a scalable address subnetwork called symmetric multiprocessor network (SYMNET) is proposed in which address requests and snoop responses of SMPs are implemented optically. SYMNET not only uses passive optical interconnects that increase the speed of the proposed network, but also pipelines address requests at a much faster rate than electronics. This increases the address bandwidth for snooping, but cache coherence can no longer be maintained with the usual snooping protocols. A modified coherence protocol, coherence in SYMNET (COSYM), is introduced to solve the coherence problem. COSYM was evaluated with a subset of Splash-2 benchmarks and compared with the electrical bus-based MOESI protocol. The simulation studies have shown a 5-66 percent improvement in execution time for COSYM as compared to MOESI for various applications. Simulations have also shown that the average latency for a transaction to complete using the COSYM protocol was 5-78 percent better than with the MOESI protocol. It is also seen that SYMNET can scale up to hundreds of processors while still using fast snooping-based cache coherence protocols, and additional performance gains may be attained with further improvement in optical device technology.

12.
The Multiprocessor Priority Ceiling Protocol (MPCP) is a classic suspension-based real-time locking protocol for partitioned fixed-priority (P-FP) scheduling. However, existing blocking time analysis is pessimistic under P-FP + MPCP scheduling, which negatively impacts the schedulability of real-time tasks. In this paper, we model each task as an alternating sequence of normal and critical sections, and use both the best-case execution time (BCET) and the worst-case execution time (WCET) to describe the execution requirement for each section. Based on this model, a novel analysis is proposed to bound shared resource requests. This analysis uses BCET to derive the lower bound on the inter-arrival time for shared resource requests, and uses WCET to obtain the upper bound on the execution time of a task on critical sections during an arbitrary time interval of Δt. Based on this analysis, improved blocking analysis and its associated worst-case response time (WCRT) analysis are proposed for P-FP + MPCP scheduling. Schedulability experiments indicate that the proposed method outperforms the existing methods and improves the schedulability significantly.

13.
Mobile agents are able to migrate among machines to achieve their tasks. This feature is attractive for designing, implementing, and maintaining distributed systems because both client-side and server-side programming can be implemented in one mobile agent. However, agent migration increases data traffic. In this paper, we propose program code caching to reduce the data traffic caused by mobile agent migrations. A mobile agent consists of many program codes that define the task executed on each machine to which it migrates; thus, a mobile agent migration involves the transfer of its program codes. Our method therefore reduces the number of program code transfers by using a program code cache. We have implemented our method on a mobile agent framework called Maglog and conducted experiments on a meeting scheduling system.

14.
This work compares commercial fast data transport approaches over a 10 Gbit/s WAN (wide area network). Common solutions, such as FTP (file transport protocol) based on the TCP/IP stack, are being increasingly replaced by modern protocols based on more efficient stacks. To assess the capabilities of current applications for fast data transport, the following commercial solutions were investigated: Velocity, a data transport application of Bit Speed LLC; TIXstream, a data transport application of Tixel GmbH; FileCatalyst Direct, a data transport application of Unlimi-Tech Software Inc; Catapult Server, a data transport application of XDT PTY LTD; and ExpeDat, a commercial data transport solution of Data Expedition, Inc. The goal of this work is to test the solutions under equal network conditions and thus compare the transmission performance of recent proprietary alternatives to FTP/TCP within 10 Gbit/s networks where there are high latencies and packet loss in WANs. This research focuses on a comparison of approaches using intuitive parameters such as data rate and duration of transmission. The comparison has revealed that, of all investigated solutions, TIXstream achieves maximum link utilization in the presence of lightweight impairments. The most stable results were achieved using FC Direct. ExpeDat shows the most accurate output.

15.
This paper proposes a novel routing protocol enriched with an assignment mechanism that enables efficient data flow coordination among communication nodes with heterogeneous spectrum availability in distributed cognitive radio networks. Efficient operation of the routing protocol, in terms of establishing as many routing paths as possible with minimum delay, is obtained by using a signaling mechanism that was developed on the basis of a simulation scenario. This scenario includes a number of secondary communication nodes operating over TVWS (television white spaces) under the "spectrum of commons" regulation regime. The proposed routing protocol is validated by conducting experimental simulations and obtaining performance evaluation results. Simulation results verified the efficiency of the proposed routing protocol in minimizing routing delays among secondary communication nodes and identified fields for further research.

16.
In this paper we present a cache coherence protocol for multistage interconnection network (MIN)-based multiprocessors with two distinct private caches: private-blocks caches (PCache) containing blocks private to a process and shared-blocks caches (SCache) containing data accessible by all processes. The architecture is extended by a coherence control bus connecting all shared-block cache controllers. Timing problems due to variable transit delays through the MIN are dealt with by introducing transient states in the proposed cache coherence protocol. The impact of the coherence protocol on system performance is evaluated through a performance study of three phases. Assuming homogeneity of all nodes, a single-node queuing model (phase 3) is developed to analyze system performance. This model is solved for processor and coherence bus utilizations using the mean value analysis (MVA) technique with shared-blocks steady state probabilities (phase 1) and communication delays (phase 2) as input parameters. The performance of our system is compared to that of a system with an equivalent-sized unified cache and with a multiprocessor implementing a directory-based coherence protocol. System performance measures are verified through simulation.

17.
An innovative dynamically reconfigurable radio-over-fiber (RoF) network equipped with an intelligent medium access control (MAC) protocol is proposed to provide broadband access to train passengers in high-speed railway mobile applications. The proposed RoF network architecture is based on a reconfigurable control station and a remote access unit (RAU) that is equipped with a fixed filter and a tunable filter. The proposed hybrid frequency-division multiplexing/time-division multiple access (FDM/TDMA) based MAC protocol realizes failure detection and recovery as well as dynamic wavelength allocation to remote access units. Simulation results show that, with the proposed MAC protocol, the control station can detect and recover from failures, and that dynamic wavelength allocation increases wavelength resource utilization to maintain network performance.

18.
Bus-based multiprocessors constitute a cost-effective class of shared-memory multiprocessors. Private caches are the key to an efficient utilization of the shared bus, and most such systems use a write-invalidate cache-coherence protocol to keep the caches coherent. Two important factors that limit the performance of the system are cache misses that lead to long-latency reads and bus congestion because of read misses and coherence traffic. While hybrid write-invalidate/write-update snooping protocols lead to fewer read misses than write-invalidate protocols, previous studies have shown them to be incapable of providing consistent performance improvements because of heavily increased coherence traffic. In this paper, we analyze how the deficiencies of hybrid snooping protocols can be dramatically reduced by using write caches and read snarfing (also called read-broadcast) under release consistency. Our performance evaluation is based on program-driven simulation and a set of five scientific applications with different sharing behaviors including migratory sharing as well as producer–consumer sharing. We show that one of the evaluated hybrid protocols, extended with write caches as well as read snarfing, manages to reduce the number of coherence misses by between 83 and 93% as compared to a write-invalidate protocol for all five applications in this study. In addition, the number of bus transactions is reduced substantially. However, we also show that read snarfing and hybrid snooping protocols might lead to higher cache occupancy because of increased sharing. Because of the small implementation cost of the hybrid protocol and the two extensions, we believe the combination to be an effective approach to boosting the performance of bus-based multiprocessors.

19.
A Lock-Based Cache Coherence Protocol for Scope Consistency
Directory protocols are widely adopted to maintain cache coherence in distributed shared memory multiprocessors. Although scalable to a certain extent, directory protocols are complex enough to prevent them from being used in very large scale multiprocessors with tens of thousands of nodes. This paper proposes a lock-based cache coherence protocol for scope consistency. It does not rely on directory information to maintain cache coherence. Instead, cache coherence is maintained by requiring the releasing processor of a lock to store all write notices generated in the associated critical section to the lock, while the acquiring processor invalidates or updates its locally cached data copies according to the write notices of the lock. To evaluate the performance of the lock-based cache coherence protocol, a software DSM system named JIAJIA is built on a network of workstations. Besides the lock-based cache coherence protocol, JIAJIA is also characterized by its shared memory organization scheme, which combines the physical memories of multiple workstations to form a large shared space. Performance measurements with the SPLASH2 program suite and NAS benchmarks indicate that, compared to recent SVM systems such as CVM, higher speedup is achieved by JIAJIA. Besides, JIAJIA can solve large scale problems that cannot be solved by other SVM systems due to memory size limitation.
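
The release and acquire steps described in this abstract can be sketched as a toy simulation. This is a minimal illustration under assumptions and is not JIAJIA's implementation: the `Lock` and `Processor` classes, the page-granularity write notices, the shared `home` dictionary standing in for the up-to-date copies, and the choice to invalidate (not update) on acquire are all hypothetical. The sketch also assumes that writes happen only between an acquire and the matching release.

```python
class Lock:
    """A lock that carries the write notices of its critical sections."""

    def __init__(self):
        self.write_notices = set()  # ids of pages modified under this lock


class Processor:
    def __init__(self, name, home):
        self.name = name
        self.home = home    # assumed stand-in for the home copies of pages
        self.cache = {}     # page id -> locally cached value
        self.dirty = set()  # pages written in the current critical section

    def acquire(self, lock):
        # Invalidate cached copies named by the lock's write notices.
        # Locally dirty pages are kept, which is a simplification.
        for page in lock.write_notices - self.dirty:
            self.cache.pop(page, None)

    def read(self, page):
        if page not in self.cache:
            self.cache[page] = self.home[page]  # fetch on a miss
        return self.cache[page]

    def write(self, page, value):
        self.cache[page] = value
        self.dirty.add(page)

    def release(self, lock):
        # Flush modified pages, then store their write notices in the lock.
        for page in self.dirty:
            self.home[page] = self.cache[page]
        lock.write_notices |= self.dirty
        self.dirty.clear()
```

No directory records which processors cache a page. A stale copy on another processor is only discarded when that processor next acquires the same lock, which matches the scope-based visibility the abstract describes.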

20.
The core of current-generation high-performance multiprocessor systems is out-of-order execution processors with aggressive branch prediction. Despite their relatively high branch prediction accuracy, these processors still execute many memory instructions down mispredicted paths. Previous work that focused on uniprocessors showed that these wrong-path (WP) memory references may pollute the caches and increase the amount of cache and memory traffic. On the positive side, however, they may prefetch data into the caches for memory references on the correct path. While computer architects have thoroughly studied the impact of WP effects in uniprocessor systems, there is no comparable work for multiprocessor systems. In this paper, we explore the effects of WP memory references on the memory system behavior of shared-memory multiprocessor (SMP) systems for both broadcast and directory-based cache coherence. Our results show that these WP memory references can increase the amount of cache-to-cache transfers by 32%, invalidations by 8% and 20% for broadcast and directory-based SMPs, respectively, and the number of writebacks by up to 67% for both systems. In addition to the extra coherence traffic, WP memory references also increase the number of cache line state transitions by 21% and 32% for broadcast and directory-based SMPs, respectively. In order to reduce the performance impact of these WP memory references, we introduce two simple mechanisms (filtering WP blocks that are not likely to be used, and WP-aware cache replacement) that yield speedups of up to 37%.
