Similar Documents
20 similar documents found (search time: 15 ms)
1.
The 龙腾R2 (LongTeng R2) microprocessor is a RISC microprocessor with independent intellectual property rights, designed by the Aviation Microelectronics Center of Northwestern Polytechnical University and based on the PowerPC architecture. To extend it for multiprocessor use, bus snooping is employed to maintain cache coherence in a multiprocessor environment. This paper first introduces shared-bus snooping techniques and snooping protocols, then describes the implementation of the bus snooping unit of the 龙腾R2 microprocessor in detail, and evaluates several cache coherence implementation schemes and their performance. FPGA experiments show that the bus snooping unit maintains cache coherence of a multiprocessor system efficiently and correctly.
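The abstract does not give the details of the snooping protocol itself, so as a hypothetical illustration (not the 龙腾R2 design), the sketch below models how one cache line controller might react to local requests and to transactions snooped on a shared bus under a simple MESI-style write-invalidate protocol. All class, method, and state names are invented.

```python
# Hypothetical sketch of a MESI-style bus-snooping cache line controller.
# Names and structure are illustrative only; they are not taken from the paper.

M, E, S, I = "Modified", "Exclusive", "Shared", "Invalid"

class SnoopingLine:
    def __init__(self):
        self.state = I

    # --- requests from the local processor core ---
    def local_read(self, others_have_copy):
        if self.state == I:
            # miss: issue BusRd; load Shared if another cache holds it, else Exclusive
            self.state = S if others_have_copy else E
        return self.state

    def local_write(self):
        if self.state in (I, S):
            pass  # issue BusRdX / BusUpgr so other copies get invalidated
        self.state = M
        return self.state

    # --- transactions observed (snooped) on the shared bus ---
    def snoop_bus_read(self):
        if self.state in (M, E):
            # supply the block (write back if Modified) and drop to Shared
            self.state = S

    def snoop_bus_read_exclusive(self):
        # another cache wants to write: invalidate the local copy
        if self.state == M:
            pass  # write the dirty block back first
        self.state = I
```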

2.
Cache coherence enforcement and memory latency reduction and hiding are very important and challenging problems in the design of large-scale distributed shared-memory (DSM) multiprocessors. We propose an integrated approach to solve these problems through a compiler-directed cache coherence scheme called the Cache Coherence with Data Prefetching (CCDP) scheme. The CCDP scheme enforces cache coherence by prefetching the potentially stale references in a parallel program. It also prefetches the non-stale references to hide their memory latencies. To optimize the performance of the CCDP scheme, some prefetch hardware support is provided to efficiently handle these two forms of data prefetching operations. We also developed the compiler techniques utilized by the CCDP scheme for stale reference detection, prefetch target analysis, and prefetch scheduling. We evaluated the performance of the CCDP scheme via execution-driven simulations of several numerical applications from the SPEC CFP95 and the Perfect benchmark suites. The simulation results show that the CCDP scheme provides significant performance improvements for the applications studied, comparable to those obtained with a full-map hardware cache coherence scheme.
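As a rough illustration of the prefetch-scheduling step (this is not the actual CCDP compiler pass, and the operation names and the prefetch distance are invented), the sketch below takes references that an analysis has already classified as potentially stale or non-stale and emits prefetches hoisted a fixed distance ahead, with stale references fetched past the local cache.

```python
# Illustrative sketch: given references classified by a compiler analysis as
# potentially stale or non-stale, emit prefetch operations.  A "stale" prefetch
# must bypass the local cache and fetch fresh data; a non-stale prefetch may
# hit in the cache and only hides latency.

def schedule_prefetches(references, distance=8):
    """references: list of (address, is_potentially_stale) in program order."""
    schedule = []
    for i, (addr, stale) in enumerate(references):
        issue_slot = max(0, i - distance)          # hoist the prefetch `distance` refs early
        kind = "PREFETCH_FROM_MEMORY" if stale else "PREFETCH_CACHEABLE"
        schedule.append((issue_slot, kind, addr))
    return sorted(schedule)

# Example: reference 0x100 is potentially stale, 0x200 is not.
print(schedule_prefetches([(0x100, True), (0x200, False)], distance=1))
```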

3.
In this paper, we propose a compiler-directed cache coherence scheme which makes use of data prefetching to enforce cache coherence in large-scale distributed shared-memory (DSM) systems. The Cache Coherence With Data Prefetching (CCDP) scheme uses compiler analyses to identify potentially stale and nonstale data references in a parallel program and enforces cache coherence by prefetching the potentially stale references. In this manner, the CCDP scheme brings up-to-date data into the caches to avoid stale references and also hides the latency of these memory accesses. Furthermore, the scheme also prefetches the nonstale references to hide their memory latencies. To evaluate the performance impact of the CCDP scheme on a real system, we applied the scheme on five applications from the SPEC CFP95 and CFP92 benchmark suites, and executed the resulting codes on the Cray T3D. The experimental results indicate that for all of the applications studied, our scheme provides significant performance improvements by caching shared data and using data prefetching to enforce cache coherence and to hide memory latency.

4.
Object/relational mapping commonly uses caching to improve processing performance, and maintaining consistency between cached data and server data is a key issue affecting system reliability. This paper proposes a strategy in which consistency maintenance is initiated by the middle tier. Taking into account factors such as data storage granularity, update frequency, and the volume of updated data, the strategy maintains consistency using either TTL expiration or on-demand requests, keeping cached data consistent with the server. Experimental results show that the strategy effectively reduces network data transfer overhead, lowers network load, and guarantees data validity.
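A minimal sketch of the two consistency modes mentioned above, with invented names and an arbitrary TTL: entries that change rarely are trusted until a TTL expires, while entries flagged for on-demand validation are checked against the server version before being returned. This is an illustration of the general pattern, not the paper's implementation.

```python
# Minimal sketch of TTL-based vs. on-demand consistency maintenance in a
# middle-tier cache.  All names and the selection of the mode are invented.
import time

class MidTierCache:
    def __init__(self, fetch, get_server_version, ttl=30.0):
        self.fetch = fetch                        # loads (value, version) from the server
        self.get_server_version = get_server_version
        self.ttl = ttl
        self.entries = {}                         # key -> (value, version, loaded_at)

    def get(self, key, validate_on_demand=False):
        entry = self.entries.get(key)
        if entry is not None:
            value, version, loaded_at = entry
            if validate_on_demand:
                if version == self.get_server_version(key):   # ask the server first
                    return value
            elif time.time() - loaded_at < self.ttl:          # trust the entry until TTL expires
                return value
        value, version = self.fetch(key)                      # refresh from the server
        self.entries[key] = (value, version, time.time())
        return value
```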

5.
We analyze two important problems that arise in shared-memory multiprocessor systems. The stale data problem involves ensuring that data items in local memory of individual processors are current, independent of writes done by other processors. False sharing occurs when two processors have copies of the same shared data block but update different portions of the block. The false sharing problem involves guaranteeing that subsequent writes are properly combined. In modern architectures these problems are usually solved in hardware, by exploiting mechanisms for hardware controlled cache consistency. This leads to more expensive and nonscalable designs. Therefore, we are concentrating on software methods for ensuring cache consistency that would allow for affordable and scalable multiprocessing systems. Unfortunately, providing software control is nontrivial, both for the compiler writer and for the application programmer. For this reason we are developing a debugging environment that will facilitate the development of compiler-based techniques and will help the programmer to tune his or her application using explicit cache management mechanisms. We extend the notion of a race condition for IBM Shared Memory System POWER/4, taking into consideration its noncoherent caches, and propose techniques for detection of false sharing problems. Identification of the stale data problem is discussed as well, and solutions are suggested.
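As a hedged illustration of what detecting false sharing can look like (not the paper's technique), the sketch below scans per-processor write traces and flags cache blocks that several processors wrote while touching disjoint bytes, which is the situation described above. The block size and trace format are assumptions.

```python
# Sketch: flag candidate false sharing from per-processor write traces
# (byte addresses written between two synchronization points).
from collections import defaultdict

BLOCK = 64  # assumed cache-block size in bytes

def false_sharing_candidates(writes_by_cpu):
    """writes_by_cpu: {cpu_id: iterable of byte addresses written}."""
    offsets = defaultdict(lambda: defaultdict(set))   # block -> cpu -> offsets written
    for cpu, addrs in writes_by_cpu.items():
        for a in addrs:
            offsets[a // BLOCK][cpu].add(a % BLOCK)
    flagged = []
    for block, per_cpu in offsets.items():
        if len(per_cpu) < 2:
            continue
        cpus = list(per_cpu)
        # different processors wrote the block but touched disjoint bytes
        if all(per_cpu[a].isdisjoint(per_cpu[b])
               for i, a in enumerate(cpus) for b in cpus[i + 1:]):
            flagged.append(block)
    return flagged

print(false_sharing_candidates({0: [0x1000], 1: [0x1008]}))  # same block, different words
```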

6.
This paper proposes a modular ontology caching scheme that shortens the average time of ontology reasoning. It specifies what is cached and how cache consistency is checked; knowledge update information is propagated in an interrupt-driven, incremental fashion, which reduces communication volume and improves update efficiency.

7.
Bus-based multiprocessors constitute a cost-effective class of shared-memory multiprocessors. Private caches are the key to an efficient utilization of the shared bus, and most such systems use a write-invalidate cache-coherence protocol to keep the caches coherent. Two important factors that limit the performance of the system are cache misses that lead to long-latency reads and bus congestion because of read misses and coherence traffic. While hybrid write-invalidate/write-update snooping protocols lead to fewer read misses than write-invalidate protocols, previous studies have shown them to be incapable of providing consistent performance improvements because of heavily increased coherence traffic. In this paper, we analyze how the deficiencies of hybrid snooping protocols can be dramatically reduced by using write caches and read snarfing (also called read-broadcast) under release consistency. Our performance evaluation is based on program-driven simulation and a set of five scientific applications with different sharing behaviors including migratory sharing as well as producer–consumer sharing. We show that one of the evaluated hybrid protocols, extended with write caches as well as read snarfing, manages to reduce the number of coherence misses by between 83 and 93% as compared to a write-invalidate protocol for all five applications in this study. In addition, the number of bus transactions is reduced substantially. However, we also show that read snarfing and hybrid snooping protocols might lead to higher cache occupancy because of increased sharing. Because of the small implementation cost of the hybrid protocol and the two extensions, we believe the combination to be an effective approach to boosting the performance of bus-based multiprocessors.
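To make two of the mechanisms above concrete, the sketch below models, with invented names and an arbitrary threshold, a competitive hybrid copy that takes snooped updates but self-invalidates after several remote updates with no local use, and read snarfing, where a cache fills an invalid line from a bus read response intended for another cache. It is an illustration of the ideas, not the evaluated protocol.

```python
# Illustrative sketch of a competitive hybrid update/invalidate copy plus
# read snarfing; names, structure, and the threshold are invented.

COMPETITIVE_THRESHOLD = 4   # remote updates tolerated before self-invalidation

class HybridCopy:
    def __init__(self, data):
        self.valid, self.data = True, data
        self.remote_updates = 0

    def local_access(self):
        self.remote_updates = 0          # local use resets the competitive counter
        return self.data if self.valid else None

    def snoop_bus_update(self, new_data):
        if not self.valid:
            return
        self.data = new_data             # write-update part: take the new value
        self.remote_updates += 1
        if self.remote_updates >= COMPETITIVE_THRESHOLD:
            self.valid = False           # write-invalidate part: give up the copy

    def snoop_bus_read_response(self, new_data):
        if not self.valid:               # read snarfing: grab data flowing on the bus
            self.valid, self.data, self.remote_updates = True, new_data, 0
```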

8.
We propose and analyze an adaptive per-user per-object cache consistency management (APPCCM) scheme for mobile data access in wireless mesh networks. APPCCM supports strong data consistency semantics through integrated cache consistency and mobility management. The objective of APPCCM is to minimize the overall network cost incurred due to data query/update processing, cache consistency management, and mobility management. In APPCCM, data objects can be adaptively cached at the mesh clients directly or at mesh routers dynamically selected by APPCCM. APPCCM is adaptive, per-user and per-object as the decision regarding where to cache a data object accessed by a mesh client is made dynamically, depending on the mesh client’s mobility and data query/update characteristics, and the network’s conditions. We develop analytical models for evaluating the performance of APPCCM and devise a computational procedure for dynamically calculating the overall network cost incurred. We demonstrate via both model-based analysis and simulation validation that APPCCM outperforms non-adaptive cache consistency management schemes that always cache data objects at the mesh client, or at the mesh client’s current serving mesh router for mobile data access in wireless mesh networks.

9.
In this paper, we investigate a proxy-based integrated cache consistency and mobility management scheme for supporting client–server applications in Mobile IP systems with the objective to minimize the overall network traffic generated. Our cache consistency management scheme is based on a stateful strategy by which cache invalidation messages are asynchronously sent by the server to a mobile host (MH) whenever data objects cached at the MH have been updated. We use a per-user proxy to buffer invalidation messages to allow the MH to disconnect arbitrarily and to reduce the number of uplink requests when the MH is reconnected. Moreover, the user proxy takes the responsibility of mobility management to further reduce the network traffic. We investigate a design by which the MH’s proxy serves as a gateway foreign agent (GFA) as in the MIP Regional Registration protocol to keep track of the address of the MH in a region, with the proxy migrating with the MH when the MH crosses a regional area. We identify the optimal regional area size under which the overall network traffic cost, due to cache consistency management, mobility management, and query requests/replies, is minimized. The integrated cache consistency and mobility management scheme is demonstrated to outperform MIPv6, no-proxy and/or no-cache schemes, as well as a decoupled scheme that optimally but separately manages mobility and service activities in Mobile IPv6 environments.
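The buffering role of the per-user proxy can be sketched as below; the class and method names are invented and this is only an illustration of the stateful invalidation idea, not the paper's protocol. While the MH is disconnected the proxy accumulates invalidation reports and forwards one batched report on reconnection, so the MH can drop stale objects without extra uplink requests.

```python
# Sketch of a per-user proxy that buffers server invalidations for a
# disconnected mobile host (MH); names are invented.

class UserProxy:
    def __init__(self):
        self.mh_connected = True
        self.pending_invalidations = set()

    def on_server_invalidation(self, object_id, send_to_mh):
        if self.mh_connected:
            send_to_mh({object_id})                  # asynchronous, stateful invalidation
        else:
            self.pending_invalidations.add(object_id)

    def on_mh_disconnect(self):
        self.mh_connected = False

    def on_mh_reconnect(self, send_to_mh):
        self.mh_connected = True
        if self.pending_invalidations:
            send_to_mh(self.pending_invalidations)   # one batched report on reconnection
            self.pending_invalidations.clear()
```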

10.
Design and Implementation of a Memory Model Simulator
Memory consistency and cache coherence are two of the most critical problems in shared-memory parallel computers. This work studies them quantitatively through simulation, designing and implementing a memory model simulator, MMS. Based on MMS, the behavior of various memory consistency models is simulated under different parallel machine architecture models; different memory consistency models are compared for different kinds of computational problems and the experimental results are analyzed; and several cache coherence protocols are implemented and their performance compared.

11.
张璞 (Zhang Pu). 计算机工程 (Computer Engineering), 2008, 34(22): 46-48
To address cache consistency in multi-tier J2EE applications, this paper proposes a scheme that maintains cache consistency using the Oracle Database Change Notification (DCN) mechanism. Taking a distribution resource planning system as an example, a Java caching system is used in the middle tier to store frequently accessed database result sets, while the DCN mechanism maintains cache consistency at the data tier; the relevant implementation techniques are discussed. Application results show that the scheme is effective and feasible.
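The sketch below shows the general pattern only: a middle-tier result-set cache that drops entries when the data tier reports a change on an underlying table. It does not use the real Oracle DCN API, and all names are invented; the callback would be wired to whatever change-notification mechanism the database provides.

```python
# Generic sketch of a mid-tier result-set cache invalidated by database
# change notifications; names are invented, not the Oracle DCN API.

class ResultSetCache:
    def __init__(self, run_query):
        self.run_query = run_query
        self.cache = {}            # sql -> rows
        self.by_table = {}         # table name -> set of cached sql statements

    def query(self, sql, tables):
        if sql not in self.cache:
            self.cache[sql] = self.run_query(sql)
            for t in tables:
                self.by_table.setdefault(t, set()).add(sql)
        return self.cache[sql]

    def on_change_notification(self, table):
        """Called when the database reports a change on `table` (e.g. via DCN)."""
        for sql in self.by_table.pop(table, set()):
            self.cache.pop(sql, None)      # drop every result set built on that table
```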

12.
In wireless mobile environments, the main purpose of caching is to reduce consumption of wireless bandwidth and to save battery energy; however, the roaming and frequent disconnection of mobile terminals raise a series of new problems for the consistency of cached content. Targeting currently deployed GPRS networks, this paper proposes a system framework that caches data at two levels, at the user side (the mobile terminal) and at a Validation Server (VS) added to the GPRS backbone, together with a strong cache consistency strategy. The framework simplifies the complexity of maintaining cache consistency in wireless mobile environments, effectively reduces wireless bandwidth consumption and the load on the database server, supports arbitrarily long disconnection of the mobile terminal and roaming within a Public Land Mobile Network (PLMN), and is highly practical.

13.
A Mobile-Agent-Based Cache Invalidation Scheme for Mobile Computing Environments
1 Introduction. Caching is an important technique in distributed computing environments; it can improve overall system performance (e.g., query response time and throughput). The network environment of mobile computing is a special kind of distributed environment. Compared with traditional distributed systems, it has distinctive characteristics: mobility, frequent disconnection, bandwidth diversity, scalability, weak reliability, asymmetric network communication, limited battery power, and so on. These characteristics make caching especially important in mobile computing environments, because caching effectively reduces bandwidth demand and saves energy on mobile computers.

14.
Implementing cache coherence efficiently in shared-memory systems is a key and difficult problem in architecture design. Existing directory-based protocols are hard to implement, complex to verify, and incur large storage overhead. Targeting many-core processors, this paper proposes a synchronization-based cache coherence protocol supported by hardware structures. The scheme uses no directory; instead, coherence information is represented with Bloom filters, and cache coherence is maintained at the synchronization points of parallel programs. Compared with existing directory-based coherence protocols, the scheme reduces implementation and verification complexity. Evaluation with the SPLASH-2 benchmark suite shows that the synchronization-based protocol achieves performance comparable to a directory-based protocol.
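A hedged sketch of the general idea, not the paper's hardware design: each core records the blocks it has written in a small Bloom filter, and at a synchronization point other cores self-invalidate any cached block the filter may contain. The filter size and hash functions below are arbitrary choices.

```python
# Sketch of Bloom-filter write signatures used for self-invalidation at
# synchronization points; sizes and hashes are arbitrary.

FILTER_BITS = 256

def _hashes(block_addr):
    return (hash(("h1", block_addr)) % FILTER_BITS,
            hash(("h2", block_addr)) % FILTER_BITS)

class WriteSignature:
    def __init__(self):
        self.bits = [False] * FILTER_BITS

    def record_write(self, block_addr):
        for h in _hashes(block_addr):
            self.bits[h] = True

    def may_contain(self, block_addr):
        return all(self.bits[h] for h in _hashes(block_addr))

def self_invalidate_at_sync(local_cache_blocks, remote_signature):
    # conservative: a Bloom filter can give false positives, never false negatives
    return {b for b in local_cache_blocks if not remote_signature.may_contain(b)}
```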

15.
As the portion of the inter-domain network covered by RPKI keeps growing, data synchronization and consistency problems in real deployments, operational mistakes, and the risk of abuse of power by authorities have become major obstacles to full RPKI deployment. This paper proposes an RPKI cache update conflict detection mechanism based on de facto ownership. The mechanism uses a reverse RTR protocol together with the hierarchical RPKI data distribution architecture to collect and synchronize de facto route origin information, and detects conflicting RPKI cache updates by comparing that information against the cache update data, protecting the authenticity and validity of the RPKI cache. Finally, the data synchronization efficiency and detection performance of the mechanism are compared with other schemes; experimental results show that the proposed scheme has a certain advantage in detection.
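A heavily simplified sketch of the comparison step follows; the field names and the exact-prefix matching rule are invented simplifications, not the mechanism's actual matching logic. An incoming cache update is flagged when it asserts an origin AS for a prefix that differs from the origin observed in live inter-domain routing.

```python
# Simplified sketch: flag cache updates that contradict observed ("de facto")
# route origin information.  Field names and matching are invented.

def find_conflicting_updates(cache_updates, observed_origins):
    """
    cache_updates:    iterable of (prefix, origin_asn) pairs derived from updates
    observed_origins: {prefix: origin_asn} collected from live routing data
    """
    conflicts = []
    for prefix, asn in cache_updates:
        seen = observed_origins.get(prefix)
        if seen is not None and seen != asn:
            conflicts.append((prefix, asn, seen))   # update contradicts observed origin
    return conflicts

print(find_conflicting_updates([("203.0.113.0/24", 64500)],
                               {"203.0.113.0/24": 64501}))
```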

16.
Teller, P.J. Computer, 1990, 23(6): 26-36
Nine solutions to the cache consistency problem for shared-memory multiprocessors with multiple translation-lookaside buffers (TLBs) are described. A TLB's function is defined, and it is shown how TLB inconsistency arises in uniprocessor and multiprocessor architectures. The problem of TLB consistency is solved in a uniprocessor and in multiprocessors with a shared bus, virtual-address caches, and hardware cache consistency. Solutions that can be implemented in multiprocessors with more general interconnection networks and without hardware cache consistency are presented.
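As an illustration of the problem space (not one of the nine solutions specifically), the sketch below shows the classic software approach often called TLB shootdown: when a processor changes a page table entry, every processor that may have cached the old mapping invalidates its TLB entry before the change takes effect. Names and structure are invented.

```python
# Sketch of a TLB-shootdown-style invalidation on a page table update.

class Processor:
    def __init__(self):
        self.tlb = {}      # virtual page -> physical frame

    def invalidate_tlb_entry(self, vpage):
        self.tlb.pop(vpage, None)

def update_mapping(page_table, processors, vpage, new_frame):
    page_table[vpage] = new_frame
    for p in processors:               # broadcast the invalidation
        p.invalidate_tlb_entry(vpage)
    # only after all processors have invalidated may the old frame be reused
```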

17.
In glueless shared-memory multiprocessors where cache coherence is usually maintained using a directory-based protocol, the fast access to the on-chip components (caches and network router, among others) contrasts with the much slower main memory. Unfortunately, directory-based protocols need to obtain the sharing status of every memory block before coherence actions can be performed. This information has traditionally been stored in main memory, and therefore these cache coherence protocols are far from being optimal. In this work, we propose two alternative designs for the last-level private cache of glueless shared-memory multiprocessors: the lightweight directory and the SGluM cache. Our proposals completely remove directory information from main memory and store it in the home node’s L2 cache, thus reducing both the number of accesses to main memory and the directory memory overhead. The main characteristics of the lightweight directory are its simplicity and the significant improvement in the execution time for most applications. Its drawback, however, is that the performance of some particular applications could be degraded. On the other hand, the SGluM cache offers more modest improvements in execution time for all the applications by adding some extra structures that cope with the cases in which the lightweight directory fails.

18.
As multicore processors scale up, the average distance from the core requesting data to the data's home node grows accordingly, and the uneven distribution of accesses across the distributed shared cache banks creates network hotspots. Both effects increase the latency of L1 cache misses. To address this, every four cores are grouped together and a proximity data probe is designed within each group. By determining whether a miss can be served from a neighboring core's L1 cache, the probe exploits the inter-core locality of data accesses when parallel programs run on a multicore processor. In addition, the cache coherence protocol is optimized for the new structure. Experiments show that this on-chip memory optimization improves system performance, reduces on-chip network traffic, and saves energy.
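The miss path described above can be sketched roughly as follows; the group size matches the abstract, but the function and data structures are invented for illustration and are not the paper's hardware design. On an L1 miss, the probe first checks the L1 caches of the other cores in the same four-core cluster before forwarding the request to the block's home bank.

```python
# Sketch of an L1 miss first probing neighboring L1 caches in a 4-core group.

GROUP_SIZE = 4

def handle_l1_miss(core_id, block, l1_caches, fetch_from_home):
    group_start = (core_id // GROUP_SIZE) * GROUP_SIZE
    for neighbor in range(group_start, group_start + GROUP_SIZE):
        if neighbor == core_id:
            continue
        data = l1_caches[neighbor].get(block)
        if data is not None:
            return data, "neighbor-L1"          # served by a nearby core, short hop
    return fetch_from_home(block), "home-L2"    # fall back to the home node
```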

19.
In tiled Chip Multiprocessors (CMPs) last-level cache (LLC) banks are usually shared but distributed among the tiles. A static mapping of cache blocks to the LLC banks leads to poor efficiency since a block may be mapped away from the tiles actually accessing it. Dynamic policies either rely on the static mapping of blocks to a set of banks (D-NUCA) or rely on the OS to dynamically load pages to statically mapped addresses (first-touch).  相似文献   
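A minimal sketch of the static mapping the abstract refers to, under assumed parameters (bank count and block size are arbitrary): the LLC bank owning a block is derived from address bits, regardless of which tile actually uses the data, which is why a block can end up mapped far from its users.

```python
# Sketch of static, address-interleaved mapping of blocks to LLC banks.

NUM_BANKS = 16
BLOCK_BITS = 6          # assumed 64-byte blocks

def home_bank(address):
    block = address >> BLOCK_BITS
    return block % NUM_BANKS        # bank chosen by address bits, not by usage

# A first-touch policy would instead record, per OS page, the tile that first
# accessed it and place that page's blocks in that tile's bank.
```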

20.
Web object caching is an important means of reducing traffic to web servers and access latency. Although introducing web caches greatly relieves server load, reduces network congestion, and lowers client access latency, it also introduces a cache consistency problem: the data a client obtains may not be the latest version. By analyzing existing cache consistency policies, this paper proposes a strong cache consistency algorithm suitable for the web.
