期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

曹燊李勇坚《计算机系统应用》2015,24(11):173-178

提出了一种通过查找缓存一致性协议不变量来验证带参协议正确性的新方法.缓存一致性协议验证的难点在于必须证明协议对于任意大小的带参系统都成立.我们通过寻找不变量和协议规则之间的对应关系来计算辅助不变量,从而帮助推导验证缓存一致性协议.我们设计实现了一个不变量查找工具并将该工具应用到German协议上计算它们的辅助不变量并成功地验证了协议的安全性质. 相似文献

2.

基于边际增益的二级缓存动态分配策略

姚念民刁莹韩永《计算机工程》2013,(12):27-30

现有的ULC机制可有效减少多级缓存的数据冗余,并解决存储服务器端缓存访问的局部性较弱问题,但在存储服务器连接多个应用服务器的情况下,现有ULC在分配缓存容量时不能使存储服务器端缓存资源的边际收益最大化。为此,提出一种多应用共享缓存的二级缓存动态分配策略MG—ULC。该策略以ULC机制为基础,给出以边际增益为考虑因素的缓存分配的理论依据,并根据各应用的访问模式在二级缓存的边际增益动态分配缓存容量。实验结果表明,随着各应用服务器访问模式的变化,MG—ULC能比ULC更合理地分配二级缓存,从而达到更高的缓存利用率。相似文献

3.

基于Ceph存储系统的小文件存储优化方案

陈法河柴小丽《计算机系统应用》2022,31(2):108-113

针对Ceph存储系统面对小文件存储时存在元数据服务器性能瓶颈、文件读取效率低等问题.本文从小文件之间固有的数据关联性出发,通过轻量级模式匹配算法,提取出关联特征并以此为依据对小文件进行合并,提高了合并文件之间的合理性,并在文件读取时将同一合并文件内的小文件存入客户端缓存来提高缓存读取命中率,经过实验验证本文的方案有效的提高了小文件的访问效率. 相似文献

4.

A Cache coherence protocol for MIN-based multiprocessors

Mazin S. Yousif Chita R. Das Matthew J. Thazhuthaveetil 《The Journal of supercomputing》1994,8(2):163-185

In this paper we present a cache coherence protocol formultistage interconnection network (MIN)-based multiprocessors with two distinct private caches:privateblocks caches (PCache) containing blocks private to a process andshared-blocks caches (SCache) containing data accessible by all processes. The architecture is extended by a coherence control bus connecting all shared-block cache controllers. Timing problems due to variable transit delays through the MIN are dealt with by introducingTransient states in the proposed cache coherence protocol. The impact of the coherence protocol on system performance is evaluated through a performance study of three phases. Assuming homogeneity of all nodes, a single-node queuing model (phase 3) is developed to analyze system performance. This model is solved for processor and coherence bus utilizations using the mean value analysis (MVA) technique with shared-blocks steady state probabilities (phase 1) and communication delays (phase 2) as input parameters. The performance of our system is compared to that of a system with an equivalent-sized unified cache and with a multiprocessor implementing a directory-based coherence protocol. System performance measures are verified through simulation. 相似文献

5.

Godson-T缓存一致性协议的Murphi建模和验证

周琰《计算机系统应用》2013,22(10):124-128

Godson-T缓存一致性协议是用于Godson-T众核处理器的缓存一致性协议．在Godson-T协议中,缓存一致性协议和存储一致性模型存在紧密的紧耦合关系,分析协议的一致性时发现该协议满足的缓存一致性不是强一致性,不满足传统意义上缓存透明的一致性要求．我们选取了Murphi模型检测工具作为我们建模的语言和验证工具．在对Godson-T缓存一致性协议建模的时候,由于协议的上述特点,我们需要对处理器核结点,高速缓存和内存作为一个整体建模,并成功地验证了协议的相关性质．相似文献

6.

一种基于对象存储系统的元数据缓存实现方法 总被引：1，自引：0，他引：1

周功业吴伟杰陈进才《计算机科学》2007,34(10):146-148

对象存储系统中元数据访问速度是影响文件系统性能的关键因素之一。提出了一种在客户端实现元数据缓存的方法,并用元数据操作协议保证缓存一致性,基于Hash的LFU-DA算法提高缓存查找效率。实验表明该方法减少了系统平均服务响应时间,提高了系统的I／O性能。相似文献

7.

Improving Metadata Caching Efficiency for Data Deduplication via In-RAM Metadata Utilization

下载免费PDF全文

Bing Zhou Jiang-Tao Wen 《计算机科学技术学报》2016,31(4):805-819

We describe a data deduplication system for backup storage of PC disk images, named in-RAM metadata utilizing deduplication (IR-MUD). In-RAM hash granularity adaptation and miniLZO based data compression are firstly proposed to reduce the in-RAM metadata size and thereby reduce the space overheads required by the in-RAM metadata caches. Secondly, an in-RAM metadata write cache, as opposed to the traditional metadata read cache, is proposed for further reducing metadata-related disk I/O operations and improving deduplication throughput. During deduplication, the metadata write cache is managed following the LRU caching policy. For each manifest that is hit in the metadata write cache, an expensive manifest reloading operation from the disk is avoided. After deduplication, all the manifests in the metadata write cache are cleared and stored on the disk. Our experimental results using 1.5 TB real-world disk image dataset show that 1) IR-MUD achieved about 95% size reduction for the deduplication metadata, with a small time overhead introduced, 2) when the metadata write cache was not utilized, with the same RAM space size for the metadata read cache, IR-MUD achieved a 400% higher RAM hit ratio and a 50% higher deduplication throughput, as compared with the classic Sparse Indexing deduplication system where no metadata utilization approaches are utilized, and 3) when the metadata write cache was utilized and enough RAM space was available, IR-MUD achieved a 500% higher RAM hit ratio compared with Sparse Indexing and a 70% higher deduplication throughput compared with IR-MUD with only a single metadata read cache. The in-RAM metadata harnessing and metadata write caching approaches of IR-MUD can be applied in most parallel deduplication systems for improving metadata caching efficiency. 相似文献

8.

An SSD-based accelerator for directory parsing in storage systems containing massive files

Zhiguang Chen Nong Xiao Fang Liu 《Peer-to-Peer Networking and Applications》2013,6(4):397-408

Data explosion introduces new challenges to storage systems. In a file system for big data, a large number of directories and files exist, which are usually organized in a large tree. Parsing directories in a large tree is difficult. In this paper, we propose an accelerator, which helps file systems to fetch the metadata of files rapidly. Contributions of this work include two aspects. First, we propose an accelerator for directory parsing. The accelerator is actually an SSD-based (Solid State Drive-based) cache, which keeps the metadata of frequently or recently accessed files and directories. When a file is demanded, the accelerator attempts to obtain its metadata directly from SSD. If the metadata is kept in SSD, the file system can rapidly obtain the metadata. However, if the metadata is not in SSD, the accelerator consumes a long time to access SSD, but to no avail. In order to avoid non-beneficial SSD accesses, the accelerator predicts whether the metadata is kept by SSD before issuing a read request. Only if the metadata has a high probability of being kept in SSD, the accelerator issues a request to the SSD. The second contribution of this paper is a new bloom filter used to predict whether a piece of data is kept in SSD. Bloom filter is a space-efficient data structure supporting membership query. But, the standard bloom filter cannot support element deletion. Whereas, our accelerator is a cache, which evicts items periodically. The standard bloom filter is not suitable for our accelerator. In this work, we designed a new bloom filter with low overhead, which supports element deletion. The new bloom filter perfectly suits the proposed accelerator. With the prediction of our bloom filter, the accelerator can accelerate the process of directory parsing with nearly no negative impact. We evaluated the accelerator by using a prototype. Experimental results demonstrate that, the accelerator can speed up the directory parsing process by nearly four times compared with a file system without an accelerator. 相似文献

9.

面向多核NUCA共享数据竞争问题的Bank一致性技术

下载免费PDF全文

吴俊杰潘晓辉《计算机工程与科学》2009,31(11)

非一致Cache体系结构(NUCA)几乎已经成为未来片上大容量cache的发展方向。多核处理器的NUCA结构中,多个处理器核对共享数据的竞争访问,可能导致数据经常处于中部的cache Bank,增加NUCA的访问延迟。本文提出支持数据副本的Bank一致性技术,通过有选择地在NUCA中为访问的处理器核创建不同的数据副本,Bank一致性技术能够缓解多核处理器对共享数据的竞争问题。本文详细地介绍了Bank一致性协议的设计方法。最后,使用全系统模拟器对8个NPB基准测试程序进行了详细评测。实验结果表明,Bank一致性技术能够有效缓解多核处理器中共享数据的竞争访问问题。相比不支持Bank一致性技术的CMP-DNUCA结构,本文的方法能将系统IPC性能平均提升5.95%。相似文献

10.

Terrain simplification simplified: a general framework for view-dependent out-of-core visualization 总被引：11，自引：0，他引：11

Lindstrom P. Pascucci V. 《IEEE transactions on visualization and computer graphics》2002,8(3):239-254

We describe a general framework for out-of-core rendering and management of massive terrain surfaces. The two key components of this framework are: view-dependent refinement of the terrain mesh and a simple scheme for organizing the terrain data to improve coherence and reduce the number of paging events from external storage to main memory. Similar to several previously proposed methods for view-dependent refinement, we recursively subdivide a triangle mesh defined over regularly gridded data using longest-edge bisection. As part of this single, per-frame refinement pass, we perform triangle stripping, view frustum culling, and smooth blending of geometry using geomorphing. Meanwhile, our refinement framework supports a large class of error metrics, is highly competitive in terms of rendering performance, and is surprisingly simple to implement. Independent of our refinement algorithm, we also describe several data layout techniques for providing coherent access to the terrain data. By reordering the data in a manner that is more consistent with our recursive access pattern, we show that visualization of gigabyte-size data sets can be realized even on low-end, commodity PCs without the need for complicated and explicit data paging techniques. Rather, by virtue of dramatic improvements in multilevel cache coherence, we rely on the built-in paging mechanisms of the operating system to perform this task. The end result is a straightforward, simple-to-implement, pointerless indexing scheme that dramatically improves the data locality and paging performance over conventional matrix-based layouts. 相似文献

11.

Low-level implementation of the SISC protocol for thread-level speculation on a multi-core architecture

《Parallel Computing》2017

Chip Multiprocessors (CMP) have emerged during last decades as a very attractive solution in using the ever-increasing on-chip transistor count. However, classical parallelization techniques failed to fully exploit parallelization from existing sequential applications due to false data dependencies. This paper focuses on the Thread-level Speculation (TLS) technique, an alternative way to exploit the transistor budget in a CMP. With TLS, even possibly data dependent threads can run in parallel as long as the semantics of the sequential execution is preserved. A special hardware support monitors the actual data dependencies between threads at run time and, if they are violated, misspeculation effects are undone usually through replay. This kind of system is known as speculative CMP. However, the TLS mechanism requires complex protocols that integrate cache coherence and speculation to maintain program order among multiple versions of data. Current TLS protocol evaluations are usually inadequate because they are not done low-level enough. A realistic evaluation of speculative CMPs requires either to be performed on a real hardware or very detailed cycle-accurate simulator models.In this paper we are particularly focused on a low-level evaluation of the write-invalidate TLS protocol Speculation Integrated with Snoopy Coherence (SISC) protocol proposed in [1]. This evaluation relies on cycle-level simulation environment with detailed cycle-level cache memories, cache controller and system bus. On top of this, a speculative four core architecture is simulated and three new modules (Scheduler, Squash Arbiter and Supplier Arbiter) are provided to support low-level implementation of the SISC protocol. The overall cost of the SISC protocol is evaluated by means of CACTI tool for the three different domains: the access latency cost, the area cost, and the power cost. The evaluation goal was to keep the cache access time to remain below cycle latency as well as the area and power overheads below an acceptable budget overhead. The SISC protocol has been compared against regular MESI-based architecture in both 32-bit and 64-bit versions. We kept the cache access time below the cycle latency, and we managed to keep both data cache area and static power overheads respectively below 32% and 35%. 相似文献

12.

片上多核处理器Cache一致性协议优化研究综述

胡森森计卫星王一拙陈旭付文飞石峰《软件学报》2017,28(4):1027-1047

现代晶体管技术在单芯片上集成多个处理器已经成为现实.近年来,随着多核处理器集成核数的不断增加,高速缓存的一致性问题凸显出来,已成为多核处理器的性能瓶颈之一,亟待解决.本文介绍了片上多核处理器一致性问题的由来.总结了多核时代高速缓存一致性协议设计的关键问题,综述了近年来学术界对一致性的研究.从程序访存行为模式、目录组织结构、一致性粒度、一致性协议流量、目录协议的可扩展性等方面,阐述了近年来缓存一致性协议性能优化的方向.对目前片上多核处理器缓存一致性协议设计中存在的问题进行了讨论,并指出了未来进一步研究的方向. 相似文献

13.

基于层次结构的元数据动态管理方法的研究

刘群冯丹《计算机研究与发展》2009,46(Z2)

在大规模分布式存储系统中,元数据高性能服务和扩展性已成为一个重要的研究热点.在元数据服务器(metadata server,MDS)中,将元数据分解为目录对象和文件对象.目录对象为定位性元数据,提供文件所在位置和访问控制;文件对象为描述性元数据,描述文件的数据特性.每个MDS负责所有目录对象和自身的文件对象,同时,以目录对象ID和文件名为关键字的Hash值作为局部元数据查找表的索引,通过Bloom Filter算法将每个MDS的局部元数据查找表压缩成一个摘要,这样既可利用MDS中Cache,提高Cache的命中率,减少磁盘I/O次数,动态扩展MDS,又能够实现快速的元数据查找. 相似文献

14.

Cache-Based Synchronization in Shared Memory Multiprocessors

《Journal of Parallel and Distributed Computing》1996,32(1):11-27

In shared memory multiprocessors, efficient synchronization is imperative to assure good performance. There are two aspects to the “cost” of a synchronization operation: the first is the waiting time at synchronization points, and the second is the intrinsic overhead in performing the operation. The overhead has two components. The first component deals with contention resolution for synchronization operation among competing processors. The second component deals with the shared data accesses that the processor has to perform once it enters a synchronization region. We present a mechanism to reduce the overhead of performing synchronization operations in a cache-based shared memory multiprocessor. The mechanism is based on the intuitive notion that parallel programs invariably use synchronization operations to govern the access to shared data. Traditional multiprocessor cache protocols treat synchronization accesses the same way as normal read/write memory accesses, leading to inefficiencies in performing synchronization operations which ultimately limit the scalability of such systems. The key idea in our approach is to combine synchronization with the coherence maintenance for the cached data. Each cache line maintains states for synchronization as well as for cache coherence, and the cache protocol ensures the correctness of the synchronization operations and the coherence of the data at these synchronization points. To assess the performance gain due to the proposed mechanism, simulation studies are performed using a workload model that represents a dynamic scheduling paradigm which forms the core of several parallel programs. Results from simulation studies show that the proposed cache-based synchronization performs significantly better than traditional cache coherence approaches. 相似文献

15.

A fully persistent and consistent read/write cache using flash-based general SSDs for desktop workloads

《Information Systems》2016

The flash-based SSD is used as a tiered cache between RAM and HDD. Conventional schemes do not utilize the nonvolatile feature of SSD and cannot cache write requests. Writes are a significant, or often dominant, fraction of storage workloads. To cache write requests, the SSD cache should persistently and consistently manage its data and metadata, and guarantee no data loss even after a crash. Persistent cache management may require frequent metadata changes and causes high overhead. Some researchers insist that a nonvolatile persistent cache requires new additional primitives that are not supported by general SSDs in the market. We proposed a fully persistent read/write cache, which improves both read and write performance, does not require any special primitive, has a low overhead, guarantees the integrity of the cache metadata and the consistency of the cached data, even during a crash or power failure, and is able to recover the flash cache quickly without any data loss. We implemented the persistent read/write cache as a block device driver in Linux. Our scheme aims at virtual desktop infra servers. So the evaluation was performed with massive, real desktop traces of five users for ten days. The evaluation shows that our scheme outperforms an LRU version of SSD cache by 50% and the read-only version of our scheme by 37%, on average, for all experiments. This paper describes most of the parts of our scheme in detail. Detailed pseudo-codes are included in the Appendix. 相似文献

16.

Formal automatic verification of cache coherence in multiprocessors with relaxed memory models

Fong Pong Dubois M. 《Parallel and Distributed Systems, IEEE Transactions on》2000,11(9):989-1006

State-based, formal methods have been successfully applied to the automatic verification of cache coherence in sequentially consistent systems. However, coherence in shared memory multiprocessors under a relaxed memory model is much more complex to verify automatically. With relaxed memory models, incoming invalidations and outgoing updates can be delayed in each cache while processors are allowed to race ahead. This buffering of memory accesses considerably increases the amount of state in each cache and the complexity of protocol interactions. Moreover, because caches can hold inconsistent copies of the same data for long periods of time, coherence cannot be verified by simply checking that cached copies are identical at all times. This paper makes two major contributions. First, we demonstrate how to model and verify cache coherence under a relaxed memory model in the context of state-based verification methods. Frameworks for modeling the hardware and for generating correct memory access sequences driving the hardware model are developed. We also show correctness properties which must be verified on the hardware model. Second, we demonstrate a successful application of a state-based verification tool called SSM for the verification of the delayed protocol, an aggressive protocol for relaxed memory models. SSM is based on an abstraction technique preserving the properties to verify. We show that with classical, explicit approaches the verification of cache coherence is realistically unfeasible because of the state space explosion problem, whereas SSM is able to verify protocols both at both behavioral and message-passing levels. 相似文献

17.

DP&TB: a coherence filtering protocol for many-core chip multiprocessors

Fengkai Yuan Zhenzhou Ji 《The Journal of supercomputing》2013,66(1):249-261

Future many-core chip multiprocessors (CMPs) will integrate hundreds of processor cores on chip. Two cache coherence protocols are the mainstream applied to current CMPs. The token-based protocol (Token) provides high performance, but it generates a prohibitive amount of network traffic, which translates into excessive power consumption. The directory-based protocol (Directory) reduces network traffic, yet trades off with the storage overhead of the directory as well as entails comparatively low performance caused by indirection limiting its applicability for many-core CMPs. In this work, we present DP&TB, a novel cache coherence protocol particularly suited to future many-core CMPs. In DP&TB, cache coherence is maintained at the granularity of a page, facilitating to filter out either unnecessary coherence inspections for blocks inside private pages or network traffic for blocks inside shared pages. We employ Directory to detect private and shared pages and Token to maintain the coherence of the blocks inside shared pages. DP&TB inherits the merit of Directory and Token and overcome their problems. Experimental results show that DP&TB comprehensively beyond Directory and Token with improvement by 9.1 % in performance over Token and by 13.8 % in network traffic over Directory. In addition, the storage overhead of DP&TB is less than half of that of Directory. Our proposal can fulfill the requirement of many-core CMPs to achieve high performance, power and area efficiency. 相似文献

18.

一种性能优化的小文件存储访问策略的研究 总被引：1，自引：0，他引：1

赵跃龙谢晓玲蔡咏才王国华刘霖《计算机研究与发展》2012,49(7):1579-1586

在分布式文件系统中,小文件的管理一般存在访问性能较差和存储空间浪费较大等缺点.为了解决这些问题,提出了一种性能优化的小文件存储访问(SFSA)策略.SFSA将逻辑上连续的数据尽可能存储在物理磁盘的连续空间,使用Cache充当元数据服务器的角色并通过简化的文件信息节点提高Cache利用率,提高了小文件访问性能;写数据时聚合更新数据及其文件夹域中的相关数据为一次I/O请求写入,减少了文件碎片数量,提高了存储空间利用率;文件传输时利用局部性原理,提前发送批量的高访问率的小文件,降低了建立网络连接开销,提升了文件传输性能.理论分析和实验证明,SFSA的设计思想和方法能有效地优化小文件的存储访问性能. 相似文献

19.

Model Checking Data Consistency for Cache Coherence Protocols 总被引：1，自引：0，他引：1

下载免费PDF全文

Hong Pan Hui-Min Lin and Yi Lv 《计算机科学技术学报》2006,21(5):765-775

A method for automatic verification of cache coherence protocols is presented, in which cache coherence protocols are modeled as concurrent value-passing processes, and control and data consistency requirement are described as formulas in first-orderμ-calculus. A model checker is employed to check if the protocol under investigation satisfies the required properties. Using this method a data consistency error has been revealed in a well-known cache coherence protocol. The error has been corrected, and the revised protocol has been shown free from data consistency error for any data domain size, by appealing to data independence technique. 相似文献

20.

Adaptive and scalable load balancing for metadata server cluster in cloud-scale file systems

Quanqing XU Rajesh Vellore ARUMUGAM Khai Leong YONG Yonggang WEN Yew-Soon ONG Weiya XI 《Frontiers of Computer Science》2015,9(6):904

Big data is an emerging term in the storage industry, and it is data analytics on big storage, i.e., Cloud-scale storage. In Cloud-scale (or EB-scale) file systems, load balancing in request workloads across a metadata server cluster is critical for avoiding performance bottlenecks and improving quality of services.Many good approaches have been proposed for load balancing in distributed file systems. Some of them pay attention to global namespace balancing, making metadata distribution across metadata servers as uniform as possible. However, they do not work well in skew request distributions, which impair load balancing but simultaneously increase the effectiveness of caching and replication. In this paper, we propose Cloud Cache (C²), an adaptive and scalable load balancing scheme for metadata server cluster in EB-scale file systems. It combines adaptive cache diffusion and replication scheme to cope with the request load balancing problem, and it can be integrated into existing distributed metadata management approaches to efficiently improve their load balancing performance. C² runs as follows: 1) to run adaptive cache diffusion first, if a node is overloaded, loadshedding will be used; otherwise, load-stealing will be used; and 2) to run adaptive replication scheme second, if there is a very popular metadata item (or at least two items) causing a node be overloaded, adaptive replication scheme will be used, in which the very popular item is not split into several nodes using adaptive cache diffusion because of its knapsack property. By conducting performance evaluation in trace-driven simulations, experimental results demonstrate the efficiency and scalability of C². 相似文献