Similar Documents
Found 20 similar documents (search time: 171 ms)
1.
As the gap between main-memory speed and modern processor speed continues to widen, access to main memory has become the new bottleneck, and cache behavior has grown more important for main-memory database systems. Indexing is a key part of main-memory database system design. Building on the CST-tree, this work applies prefetching to improve search performance and proposes a cache-optimized index structure, the prefetching T-tree (pT-tree). The pT-tree uses prefetching to efficiently create index nodes larger than the natural data-transfer unit, which lowers the height of the CST-tree and reduces the cache misses incurred when traversing from a parent node to a child node. Experimental results show that the pT-tree improves search performance over the B+-tree, T-tree, CST-tree, and CSB+-tree.
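As a rough illustration of this style of prefetching index, the C sketch below prefetches every cache line of an oversized node before searching it, so the lines' misses are serviced in parallel rather than one by one. The node layout, key count, and helper names are assumptions for illustration, not the paper's pT-tree implementation.

```c
#include <stddef.h>

#define CACHE_LINE 64
#define KEYS_PER_NODE 30              /* assumed: node spans several cache lines */

typedef struct Node {
    int nkeys;
    int keys[KEYS_PER_NODE];
    struct Node *child[KEYS_PER_NODE + 1];
} Node;

/* Prefetch every cache line of a node before searching it, so the
 * memory fetches overlap instead of causing one miss per line. */
static void prefetch_node(const Node *n) {
    const char *p = (const char *)n;
    for (size_t off = 0; off < sizeof(Node); off += CACHE_LINE)
        __builtin_prefetch(p + off, 0 /* read */, 3 /* high locality */);
}

const Node *search(const Node *n, int key) {
    while (n) {
        prefetch_node(n);             /* hide the latency of the wide node */
        int i = 0;
        while (i < n->nkeys && key > n->keys[i]) i++;
        if (i < n->nkeys && key == n->keys[i]) return n;
        n = n->child[i];
    }
    return NULL;
}
```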

2.
Continuing advances in memory technology have made main-memory multimedia databases feasible. Research shows that the performance of main-memory multimedia database systems suffers heavily from processor cache misses, and cache-conscious in-memory indexes are an effective means of improving retrieval efficiency. To address the SA-Tree's unsuitability for main-memory access, this work proposes a variant of it, the CSA-Tree. The CSA-Tree uses PCA dimensionality reduction to represent the nodes at each level of the tree with a different number of dimensions, which improves cache-space utilization and reduces CPU load, thereby improving index query efficiency. Extensive experiments show that the CSA-Tree delivers good high-dimensional retrieval performance in a main-memory environment.

3.
The performance of main-memory multimedia database systems suffers heavily from processor cache misses, and cache-conscious in-memory indexes are an effective means of improving retrieval efficiency. To address the SA-Tree's unsuitability for main-memory access, a variant of it, the CSA-Tree, is proposed. The CSA-Tree uses PCA dimensionality reduction to represent the nodes at each level of the tree with a different number of dimensions, which improves cache-space utilization and reduces CPU load, thereby improving index query efficiency. Extensive experiments show that the CSA-Tree delivers good high-dimensional retrieval performance in a main-memory environment.

4.
Existing main-memory indexes improve cache consciousness and reduce cache misses through pointer elimination and prefetching, but they do not make effective use of modern CPUs and memory capacity. To further raise an index structure's utilization of memory and CPU, this work proposes the DCST-tree index structure. It compresses the keys within a node, improving memory- and cache-space utilization, reducing memory accesses, and raising the cache hit rate. It also partitions nodes, increasing node capacity and fanout and lowering the height of the tree. Experimental results show that the scheme achieves better space utilization and cache consciousness than existing main-memory index mechanisms, together with better query processing performance.
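A hedged sketch of the kind of in-node key compression described: if all keys in a node share their high-order bits, storing the shared part once lets twice as many key suffixes fit per cache line, raising the effective fanout. This is illustrative prefix compression under assumed 64-bit keys, not the DCST-tree's actual scheme.

```c
#include <stdint.h>

/* Illustrative prefix compression for fixed-width integer keys: keys
 * in a node share their high 32 bits, so the node stores them once
 * and keeps only a 32-bit suffix per key. */
#define NODE_KEYS 14

typedef struct CompressedNode {
    uint32_t hi;                     /* shared high bits of all keys */
    uint32_t suffix[NODE_KEYS];      /* per-key low bits             */
    int      nkeys;
} CompressedNode;

static int node_find(const CompressedNode *n, uint64_t key) {
    if ((uint32_t)(key >> 32) != n->hi)
        return -1;                   /* key falls outside this node's range */
    uint32_t s = (uint32_t)key;
    for (int i = 0; i < n->nkeys; i++)
        if (n->suffix[i] == s) return i;
    return -1;
}
```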

5.
Implementation of a Main-Memory R-Tree Index Based on Realms (cited 1 time: 1 self-citation, 0 by others)
李萍 《计算机应用》2003,23(5):94-97
To fully exploit the advantages of main-memory database technology and improve system performance, spatial indexes are needed, and the indexes themselves should reside in main memory. The R-tree family is currently a research focus in spatial data indexing: it is dynamic, simple to build and maintain, and a convenient base for algorithmic improvements on top of the basic R-tree. SADBS, the Realms-based spatial-analysis database management system developed in this work, implements a main-memory R-tree index with creation, insertion, deletion, update, and query operations.

6.
Indexing Methods for Spatial Objects in Main Memory (cited 11 times: 0 self-citations, 0 by others)
刘东  李琦 《环境遥感》1996,11(4):302-308
Spatial indexing bears on the overall performance of spatial databases and geographic information systems. With the price of main memory falling rapidly, main-memory spatial databases have become practical, and they need spatial indexes suited to main memory. This paper designs two main-memory spatial indexes, a main-memory grid index and a main-memory F-tree index, and compares their performance. In most application settings, the F-tree spatial index performs better.
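For intuition, here is a minimal C sketch of a main-memory grid index of the kind compared above: space is divided into fixed cells and each cell keeps a bucket of object ids, so a point query touches exactly one bucket. The cell count and coordinate handling are illustrative assumptions, not the paper's design.

```c
#include <stdlib.h>

#define GRID 64   /* assumed: space is cut into GRID x GRID cells */

typedef struct Entry { int id; struct Entry *next; } Entry;
static Entry *cells[GRID][GRID];

/* Map a coordinate in [0, extent) to its grid cell. */
static void cell_of(double x, double y, double extent, int *cx, int *cy) {
    *cx = (int)(x / extent * GRID); if (*cx >= GRID) *cx = GRID - 1;
    *cy = (int)(y / extent * GRID); if (*cy >= GRID) *cy = GRID - 1;
}

void grid_insert(int id, double x, double y, double extent) {
    int cx, cy;
    cell_of(x, y, extent, &cx, &cy);
    Entry *e = malloc(sizeof *e);    /* prepend to the cell's bucket */
    e->id = id;
    e->next = cells[cx][cy];
    cells[cx][cy] = e;
}
```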

7.
Based on a simple frequent-pattern compression encoding, this paper designs a novel on-chip compressed cache hierarchy in which both the L1 data cache and the L2 cache hold data in compressed form, but with different layouts. The L1 data cache layout can trigger partial cache-line prefetching while avoiding the extra cache pollution and bandwidth waste that ordinary prefetching can cause, and it requires no prefetch-buffer overhead. Experimental results show that, compared with a conventional cache hierarchy, the design significantly increases the effective capacity of the L1 data cache and the L2 cache without raising L1 data-cache access latency: on average it adds 33% effective L1 data-cache capacity, cuts the L1 data-cache miss rate by 21%, and speeds up program execution by 13%.
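As a software illustration of frequent-pattern-style compression (the paper's encoding is a hardware design, and its exact pattern set is not given here), the sketch below classifies a 32-bit word into a few common patterns that admit short codes; the pattern set is an assumption drawn from the general frequent-pattern-compression idea.

```c
#include <stdint.h>

/* Classify one 32-bit word: frequently occurring patterns (zero,
 * sign-extended small values) get short codes, everything else stays
 * uncompressed.  Illustrative pattern set, not the paper's encoding. */
enum Pattern { P_ZERO, P_SIGNED_BYTE, P_SIGNED_HALF, P_UNCOMPRESSED };

enum Pattern classify(uint32_t w) {
    int32_t s = (int32_t)w;
    if (w == 0)                     return P_ZERO;          /* 0-bit payload  */
    if (s >= -128   && s < 128)     return P_SIGNED_BYTE;   /* 8-bit payload  */
    if (s >= -32768 && s < 32768)   return P_SIGNED_HALF;   /* 16-bit payload */
    return P_UNCOMPRESSED;                                  /* 32-bit payload */
}
```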

8.
1. Introduction. As computer applications keep expanding, programs demand ever higher processing speed, and CPU clock rates have grown from tens of megahertz to over two hundred megahertz. Because main-memory RAM remains comparatively slow, however, the CPU cannot run at its intended speed, which lowers whole-system performance; cache technology has considerably improved overall speed and performance. 2. Overview of cache memory. The cache sits between the CPU and main memory, is built from the same type of semiconductor integrated circuits as the CPU, transfers data markedly faster than RAM, and can be accessed directly by the CPU. The cache not only works with the CPU …

9.
Caches are already in general use in mainframes and super-minicomputers. Their main role is to make up for the limited speed of main memory, easing the bottleneck between a fast processor and a large main memory. As microprocessors gain speed and functionality, microcomputer workstations now approach super-minicomputers in performance, so equipping a workstation with a cache is a natural development. This paper presents a cache design for an MC68010-based workstation and analyzes the performance improvement the cache brings. To aid understanding, it first briefly reviews the basic principles involved in cache design.

10.
Because the fast CPU and the relatively slow main memory are mismatched in speed, the CPU must insert 1 to n wait cycles when accessing main memory. To ease this conflict, popular motherboards adopt a multi-level memory hierarchy and provide a cache socket for optional use. Under which circumstances, then, should a cache be added or omitted?

11.
This paper analyzes and summarizes the main current techniques for improving the cache consciousness of tree indexes in in-memory databases, then designs and implements a cache-conscious AVL tree, the CC-AVL tree. The CC-AVL tree makes sensible use of the cache-line size and stores parent and child nodes contiguously in memory, so that one cache line holds both the node about to be visited and its left child, while the CPU's data-prefetch support is used to fetch the right child. The CC-AVL tree is therefore more cache-conscious than a plain AVL tree. It has been applied in HSQL, an embedded in-memory database running in an IP switch.
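A minimal sketch of the layout and prefetch discipline as described, assuming each node is allocated adjacent to its left child so the two share a cache line, while the right child is requested with a software prefetch; the field layout is an illustrative assumption.

```c
/* CC-AVL-style search sketch: the left child is expected to sit in
 * the same cache line as its parent, and the right child is
 * prefetched before the comparison decides the direction. */
typedef struct AvlNode {
    int key;
    struct AvlNode *left;    /* allocated adjacent to this node */
    struct AvlNode *right;   /* fetched via software prefetch   */
} AvlNode;

const AvlNode *cc_avl_find(const AvlNode *n, int key) {
    while (n) {
        __builtin_prefetch(n->right, 0, 1);   /* hide the right-child miss */
        if (key == n->key) return n;
        n = (key < n->key) ? n->left          /* likely already cached */
                           : n->right;
    }
    return 0;
}
```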

12.
Sequential pattern mining can uncover users' access patterns hidden in Web logs and can be used in a Web prefetching model to predict the Web objects about to be accessed. Most current sequential-pattern mining relies on Apriori-style breadth-first algorithms. This paper proposes a bitmap-based depth-first mining algorithm: it adopts a depth-first strategy over a trie data structure and uses bitmaps to store and compute the support of each sequence, so frequent sequences can be mined quickly. The algorithm is applied in a Web prefetching model, and experiments under an integrated prefetching-and-caching setup show good performance.
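For intuition, bitmap support computation reduces to bitwise AND plus popcount, as in the simplified sketch below. It counts co-occurrence within sessions and omits the ordering checks a real sequential-pattern miner must also apply; sizes are illustrative assumptions.

```c
#include <stdint.h>

#define SESSIONS 256              /* assumed: one bit per Web session */
#define WORDS (SESSIONS / 64)

typedef struct { uint64_t bits[WORDS]; } Bitmap;

/* Support of a two-item pattern: sessions containing both items. */
int support(const Bitmap *a, const Bitmap *b) {
    int count = 0;
    for (int i = 0; i < WORDS; i++)
        count += __builtin_popcountll(a->bits[i] & b->bits[i]);
    return count;
}
```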

13.
Helper-threaded prefetching based on chip multiprocessors is a well-known approach to reducing memory latency and has been explored for linked-data-structure accesses. However, conventional helper-threaded prefetching often suffers from useless prefetches and cache thrashing, which limit its effectiveness. In this paper, we first analyze the shortcomings of conventional helper-threaded prefetching for linked data structures. We then propose an improved scheme, Skip Helper Threaded Prefetching, for hotspots with two-level data traversals. Our solution profiles the application and balances delinquent loads between the main thread and the prefetching thread according to the characteristics of the operations in its hotspots. Evaluations show that the proposed solution improves average performance by 8.9% (-O2) and 8.5% (-O3) over conventional helper-threaded prefetching that greedily prefetches all delinquent loads. We also compare our proposal with active threaded prefetching, which synchronizes with the main thread via semaphores, and find that ours delivers better performance for the targeted applications.
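A minimal sketch of the general helper-threaded idea (not the paper's Skip scheme): a helper thread chases the same linked list ahead of the main thread, absorbing the cache misses so the main thread finds the nodes warm. The synchronization and run-ahead throttling that the paper is precisely about balancing are omitted; names are illustrative.

```c
#include <pthread.h>

typedef struct LNode { int payload; struct LNode *next; } LNode;

/* Helper thread: traverse the list and touch each node with a
 * prefetch, taking the misses on behalf of the main thread. */
static void *helper(void *arg) {
    for (LNode *n = arg; n; n = n->next)
        __builtin_prefetch(n, 0, 1);
    return 0;
}

long sum_with_helper(LNode *head) {
    pthread_t t;
    pthread_create(&t, 0, helper, head);
    long s = 0;
    for (LNode *n = head; n; n = n->next)
        s += n->payload;               /* main thread does the real work */
    pthread_join(t, 0);
    return s;
}
```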

14.
Latency on indirect memory accesses often hurts application performance, and software prefetching is an effective remedy. The Chinese-designed Sunway platform supports software and hardware prefetching for regular access patterns, but its GCC compiler has no way to insert prefetches automatically for indirect memory-access patterns. To close this gap, a complete indirect-prefetch optimization pass was developed on top of Sunway GCC; it uses depth-first search to find indirect memory references that depend on a loop induction variable and generates suitable software prefetches for them. On a set of memory-bound benchmarks, the automatic prefetching pass achieves an average speedup of 1.16x on the SW1621 processor.
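Written out by hand, the indirect pattern such a pass targets and the prefetch it would insert look like the sketch below; the load `data[idx[i]]` depends on the index array, so the prefetch must also go through `idx`. `PREFETCH_DIST` is an illustrative tuning knob, not the pass's actual heuristic.

```c
#define PREFETCH_DIST 16   /* assumed run-ahead distance */

double sum_indirect(const double *data, const int *idx, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++) {
        if (i + PREFETCH_DIST < n)     /* prefetch the future indirect target */
            __builtin_prefetch(&data[idx[i + PREFETCH_DIST]], 0, 1);
        s += data[idx[i]];             /* indirect access A[B[i]] */
    }
    return s;
}
```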

15.
Parallel applications suffer from I/O latency. Pre-execution I/O prefetching is effective at hiding it: a dedicated pre-execution thread is created to fetch data for the main thread in advance. However, existing pre-execution prefetching work pays no attention to the relationship between the main thread and the prefetching thread; it simply pre-executes the I/O accesses as early as possible, without carefully coordinating them with the operations of the main thread. This drawback induces a series of adverse effects: it diminishes the parallelism between computation and I/O, delays the I/O accesses of the main threads, and aggravates I/O resource competition across the whole system. In this paper, we propose a new method that overcomes this drawback by scheduling the I/O operations among the main threads and the pre-execution prefetching threads. Extensive experiments on four popular benchmarks from the parallel-I/O performance area demonstrate the benefits of the proposed approach.
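A minimal sketch of an uncoordinated pre-execution prefetching thread using POSIX read-ahead hints; the scheduling between main and prefetching threads that the paper proposes is exactly what this naive version lacks. The `Plan` structure and field names are illustrative assumptions.

```c
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <pthread.h>
#include <stddef.h>

/* Naive pre-execution prefetcher: walk the offsets the main thread
 * will read and ask the kernel to read them ahead, as early as
 * possible and with no coordination with the main thread. */
typedef struct { int fd; const off_t *offs; int n; size_t len; } Plan;

static void *io_prefetcher(void *arg) {
    Plan *p = arg;
    for (int i = 0; i < p->n; i++)
        posix_fadvise(p->fd, p->offs[i], (off_t)p->len, POSIX_FADV_WILLNEED);
    return 0;
}
```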

16.
This paper proposes using a user-level memory thread (ULMT) for correlation prefetching. In this approach, a user thread runs on a general-purpose processor in main memory, either in the memory-controller chip or in a DRAM chip. The thread performs correlation prefetching in software, sending the prefetched data into the L2 cache of the main processor. The approach requires minimal hardware beyond the memory processor: the correlation table is a software data structure residing in main memory, and the main processor needs only a few modifications to its L2 cache so that it can accept incoming prefetches. The approach also has wide applicability, since it can prefetch effectively even for irregular applications, and it is very flexible, since the prefetching algorithm can be customized by the user on a per-application basis. Our simulation results show that, with a new design of the correlation table and prefetching algorithm, the scheme delivers good results: nine mostly irregular applications show an average speedup of 1.32. Furthermore, the scheme works well in combination with a conventional processor-side sequential prefetcher, raising the average speedup to 1.46. Finally, by exploiting the customizability of the prefetching algorithm, we increase the average speedup to 1.53.
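For intuition, here is a small C sketch of a software correlation table of the kind described: it learns which miss addresses historically follow which, and on a miss prefetches the recorded successors. The table shape, hashing, and update policy are illustrative assumptions, not the paper's design.

```c
#include <stdint.h>

#define TABLE_SZ   4096   /* assumed table size       */
#define SUCCESSORS 2      /* assumed successors/entry */

typedef struct { uintptr_t miss; uintptr_t next[SUCCESSORS]; } CorrEntry;
static CorrEntry table_[TABLE_SZ];

static unsigned slot(uintptr_t addr) { return (addr >> 6) % TABLE_SZ; }

void on_miss(uintptr_t addr, uintptr_t prev_miss) {
    /* Learn: record addr as the newest successor of the previous miss. */
    CorrEntry *e = &table_[slot(prev_miss)];
    e->miss = prev_miss;
    e->next[1] = e->next[0];
    e->next[0] = addr;

    /* Predict: prefetch the recorded successors of the current miss. */
    CorrEntry *p = &table_[slot(addr)];
    if (p->miss == addr)
        for (int i = 0; i < SUCCESSORS; i++)
            if (p->next[i]) __builtin_prefetch((void *)p->next[i], 0, 1);
}
```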

17.
Because of its low latency, byte-addressability, non-volatility, and high density, persistent memory (PM) is expected to be used to build high-performance storage systems. However, PM also has drawbacks such as limited endurance, which challenge traditional index structures such as the B+ tree. The B+ tree was originally designed for dynamic random access memory (DRAM)-based or disk-based systems and suffers from high write amplification, which is detrimental to a PM-based system. This paper proposes WO-tree, a write-optimized B+ tree for PM. WO-tree adopts an unordered write mechanism for leaf nodes, which avoids the many write operations otherwise needed to keep entries sorted within a leaf. When a leaf node splits, WO-tree performs the cache-line flush only after all write operations have completed, reducing frequent data flushes. WO-tree adopts a partial logging mechanism and writes the log only for leaf nodes: inner nodes detect data inconsistency during read operations and recover the data from leaf-node information, significantly reducing logging overhead. Furthermore, WO-tree uses lock-free search for inner nodes, which reduces locking overhead under concurrent operation. We evaluate WO-tree using the Yahoo! Cloud Serving Benchmark (YCSB) workloads. Compared with the traditional B+ tree, wB-tree, and Fast-Fair, WO-tree insertions reduce the number of cache-line flushes by 84.7%, 22.2%, and 30.8%, respectively, and reduce execution time by 84.3%, 27.3%, and 44.7%, respectively.
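A hedged sketch of an unordered persistent-memory leaf insert in the spirit described: the entry is appended to any free slot (no shifting to keep order), persisted with cache-line flushes, and committed by flipping a validity bitmap. The leaf layout and bitmap commit are illustrative assumptions, not WO-tree's actual format.

```c
#include <emmintrin.h>   /* _mm_clflush, _mm_sfence */
#include <stdint.h>

#define LEAF_SLOTS 14    /* assumed leaf capacity */

typedef struct Leaf {
    uint16_t bitmap;              /* which slots hold valid entries */
    uint64_t key[LEAF_SLOTS];
    uint64_t val[LEAF_SLOTS];
} Leaf;

int leaf_insert(Leaf *leaf, uint64_t k, uint64_t v) {
    for (int i = 0; i < LEAF_SLOTS; i++) {
        if (!(leaf->bitmap & (1u << i))) {
            leaf->key[i] = k;                 /* write into a free slot */
            leaf->val[i] = v;
            _mm_clflush(&leaf->key[i]);       /* persist the entry first */
            _mm_clflush(&leaf->val[i]);
            _mm_sfence();
            leaf->bitmap |= (1u << i);        /* then commit atomically  */
            _mm_clflush(&leaf->bitmap);       /* via the validity bitmap */
            _mm_sfence();
            return 1;
        }
    }
    return 0;   /* leaf full: caller must split */
}
```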

18.
Due to cluster resource competition and task-scheduling policy, some map tasks are assigned to nodes without their input data, which causes significant data-access delay. Data locality is becoming one of the most critical factors affecting the performance of MapReduce clusters. Since machines in MapReduce clusters have large, often underutilized memory capacities, prefetching input data into memory is an effective way to improve data locality. However, deciding what and when to prefetch still poses serious challenges to cluster designers. To use prefetching effectively, we have built HPSO (High Performance Scheduling Optimizer), a prefetching-service-based task scheduler that improves data locality for MapReduce jobs. The basic idea is to predict the most appropriate nodes for future map tasks based on the currently pending tasks, and then preload the needed data into memory without delaying the launch of new tasks. We have implemented HPSO in Hadoop-1.1.2. The experimental results show that the method reduces the number of map tasks that incur remote-data delay and improves the performance of Hadoop clusters.

19.
As the speed gap between main memory and modern processors continues to widen, cache behavior becomes more important for main-memory database systems (MMDBs). Indexing is a key component of MMDBs. Unfortunately, the predominant indexes, B+-trees and T-trees, have been shown to use the cache poorly, which has triggered the development of many cache-conscious indexes such as CSB+-trees and pB+-trees. Most of these cache-conscious indexes are variants of conventional B+-trees with better cache performance. In this paper, we develop a novel J+-tree index, inspired by the Judy structure (an associative-array data structure), and propose a more cache-optimized index, the Prefetching J+-tree (pJ+-tree), which applies prefetching to the J+-tree to accelerate range-scan operations. The J+-tree stores all keys in its leaf nodes and keeps references to the leaf nodes in a Judy structure, so the J+-tree not only retains the advantages of Judy (such as fast single-value search) but also outperforms it in other respects; for example, J+-trees achieve better range-query performance than Judy. The pJ+-tree exploits prefetching to further improve the cache behavior of J+-trees and yields a speedup of 2.0 on range scans. Our extensive experimental study shows that, compared with B+-trees, CSB+-trees, pB+-trees, and T-trees, pJ+-trees provide better performance in both time (search, scan, update) and space.
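As a rough illustration of prefetching applied to range scans, the sketch below prefetches the next leaf in the chain while the current one is being consumed, overlapping the next-leaf miss with useful work. The leaf layout is an illustrative assumption, not the pJ+-tree's structure.

```c
#define LEAF_KEYS 32   /* assumed leaf capacity */

typedef struct ScanLeaf {
    int nkeys;
    long key[LEAF_KEYS];          /* kept sorted within the leaf */
    struct ScanLeaf *next;
} ScanLeaf;

long range_sum(const ScanLeaf *leaf, long lo, long hi) {
    long s = 0;
    for (; leaf; leaf = leaf->next) {
        __builtin_prefetch(leaf->next, 0, 1);   /* overlap next-leaf fetch */
        for (int i = 0; i < leaf->nkeys; i++)
            if (leaf->key[i] >= lo && leaf->key[i] <= hi)
                s += leaf->key[i];
        if (leaf->nkeys && leaf->key[leaf->nkeys - 1] > hi)
            break;                              /* past the range: stop */
    }
    return s;
}
```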

20.
In this paper we propose algorithms for generating frequent itemsets by successively constructing the nodes of a lexicographic tree of itemsets. We discuss different strategies for generating and traversing the lexicographic tree, such as breadth-first search, depth-first search, or a combination of the two, which provide different trade-offs in I/O, memory, and computation time. We use the hierarchical structure of the lexicographic tree to successively project transactions at each node and apply matrix counting to this reduced set of transactions to find frequent itemsets. We tested the algorithm on both real and synthetic data. Our implementation of the tree-projection method is up to one order of magnitude faster than other recent techniques in the literature. The algorithm has a well-structured data-access pattern that provides data locality and reuse of data across multiple cache levels. We also discuss methods for parallelizing the TreeProjection algorithm.
