共查询到20条相似文献,搜索用时 15 毫秒
1.
Bin LIN Shanshan LI Xiangke LIAO Jing ZHANG Xiaodong LIU 《Frontiers of Computer Science in China》2014,(2):175-183
Deduplication technology has been increasingly used to reduce storage costs. Though it has been successfully applied to backup and archival systems, existing techniques can hardly be deployed in primary storage systems due to the associated latency cost of detecting duplicated data, where every unit has to be checked against a substantially large fin- gerprint index before it is written. In this paper we introduce Leach, for inline primary storage, a self-learning in-memory fingerprints cache to reduce the writing cost in deduplica- tion system. Leach is motivated by the characteristics of real- world I/O workloads: highly data skew exist in the access patterns of duplicated data. Leach adopts a splay tree to or- ganize the on-disk fingerprint index, automatically learns the access patterns and maintains hot working sets in cache mem- ory, with a goal to service a majority of duplicated data de- tection. Leveraging the working set property, Leach provides optimization to reduce the cost of splay operations on the fin- gerprint index and cache updates. In comprehensive experi- ments on several real-world datasets, Leach outperforms con- ventional LRU (least recently used) cache policy by reducing the number of cache misses, and significantly improves write performance without great impact to cache hits. 相似文献
2.
Bin LIN Shanshan LI Xiangke LIAO Jing ZHANG Xiaodong LIU 《Frontiers of Computer Science》2014,8(2):175-183
Deduplication technology has been increasingly used to reduce storage costs. Though it has been successfully applied to backup and archival systems, existing techniques can hardly be deployed in primary storage systems due to the associated latency cost of detecting duplicated data, where every unit has to be checked against a substantially large fingerprint index before it is written. In this paper we introduce Leach, for inline primary storage, a self-learning in-memory fingerprints cache to reduce the writing cost in deduplication system. Leach is motivated by the characteristics of real-world I/O workloads: highly data skew exist in the access patterns of duplicated data. Leach adopts a splay tree to organize the on-disk fingerprint index, automatically learns the access patterns and maintains hot working sets in cachememory, with a goal to service a majority of duplicated data detection. Leveraging the working set property, Leach provides optimization to reduce the cost of splay operations on the fingerprint index and cache updates. In comprehensive experiments on several real-world datasets, Leach outperforms conventional LRU (least recently used) cache policy by reducing the number of cache misses, and significantly improves write performance without great impact to cache hits. 相似文献
3.
旅行商问题中巡回路径的数据结构对局部启发式算法的效率起着非常关键的作用。巡回路径的数据结构必须能够查询一条回路中每个城市的相对顺序,并且能够将一条回路中的部分城市逆序。分析了数组表示法、伸展树表示法和两级树表示法表示巡回路径时各种基本操作的实现过程及时间复杂度。数组表示法能够在常数时间内确定一条回路中每个城市的相对顺序,但是最坏情况下完成逆序操作需要Ω(n)时间,不适用于大规模的旅行商问题。伸展树表示法执行查询和更新操作的平摊时间复杂度是O(logn),适用于极大规模的旅行商问题。而两级树表示法在最坏情况下每一个更新操作的时间复杂度是O(n^0.5),适用于大规模的旅行商问题。 相似文献
4.
Splay trees are self-organizing binary search trees that were introduced by Sleator and Tarjan [J. ACM 32 (1985) 652-686]. In this paper we present a randomized variant of these trees. The new algorithm for reorganizing the tree is both simple and easy to implement. We prove that our randomized splaying scheme has the same asymptotic performance as the original deterministic scheme but improves constants in the expected running time. This is interesting in practice because the search time in splay trees is typically higher than the search time in skip lists and AVL-trees. We present a detailed experimental study of our algorithm. On request sequences generated by fixed probability distributions, we can achieve improvements of up to 25% over deterministic splaying. On request sequences that exhibit high locality of reference, the improvements are minor. 相似文献
5.
Ding -Zhu Du 《Algorithmica》1995,13(4):381-386
We disprove a conjecture of Shor and Smith on a greedy heuristic for the Steiner minimum tree by showing that the length ratio between the Steiner minimum tree and the greedy tree constructed by their method for the same set of points can be arbitrarily close to3/2. We also propose a new conjecture.Supported in part by the National Science Foundation under Grant CCR-9208913. 相似文献
6.
7.
The algorithm proposed by Chang and lyengar to perfectly balance binary search trees has been modified to not only balance but also thread binary search trees. Threads are constructed in the same sequence as normal pointers during the balancing process. No extra workspace is necessary, and the running time is also linear for the modified algorithm. Such produced tree structure has minimal average path length for fast information retrieval, and threads to facilitate more flexible and efficient traversing schemes. Maintenance and manipulation of the data structure are discussed and relevant algorithms given. 相似文献
8.
Representation techniques are important issues when designing successful evolutionary algorithms. Within this field the use of neutrality plays an important role. We examine the use of bit-wise neutrality introduced by Poli and López (2007) from a theoretical point of view and show that this mechanism only enhances mutation-based evolutionary algorithms if not the same number of genotypic bits for each phenotypic bit is used. Using different numbers of genotypic bits for the bits in the phenome we point out by rigorous runtime analyses that it may reduce the optimization time significantly. 相似文献
9.
10.
Splay and randomized search trees (RSTs) are self‐balancing binary tree structures with little or no space overhead compared to a standard binary search tree (BST). Both trees are intended for use in applications where node accesses are skewed, for example in gathering the distinct words in a large text collection for index construction. We investigate the efficiency of these trees for such vocabulary accumulation. Surprisingly, unmodified splaying and RSTs are on average around 25% slower than using a standard binary tree. We investigate heuristics to limit splay tree reorganization costs and show their effectiveness in practice. In particular, a periodic rotation scheme improves the speed of splaying by 27%, while other proposed heuristics are less effective. We also report the performance of efficient bit‐wise hashing and red–blacktrees for comparison. Copyright © 2001 John Wiley & Sons, Ltd. 相似文献
11.
M. Drmota 《Algorithmica》2001,29(1-2):89-119
By using analytic tools it is shown that the expected value of the heightH
n
of binary search trees of sizen is asymptotically given by EH
n
=c logn+
(log logn) and its variance by VH
n
=
((log logn)2), wherec=4.31107 …. The same bounds have been obtained by Devroye and Reed [3] by completely different methods.
Dedicated to Philippe Flajolet on the occasion of his 50th birthday
This research was supported by the Austrian Science Foundation FWF, Grant P10187-MAT.
Online publication September 22, 2000. 相似文献
12.
13.
It is shown that Bern's probabilistic asymptotic results on rectilinear Steiner trees remain valid for the model that there are exactlyN nodes uniformly distributed in a square of side N. 相似文献
14.
基于XQuery语义缓存的异构数据集成系统的查询处理 总被引:1,自引:0,他引:1
提出了一种在Mediator-Wrapper结构中嵌入基于XQuery语义缓存的异构数据集成系统的查询处理方案,主要包括查询包含关系判定、查询分解和重写.同时提出利用树型同态算法解决XQuery查询语义包含关系的判断问题和Web环境下的缓存替换策略,旨在提高信息集成系统的查询性能. 相似文献
15.
近几年来,互联网上出现了一类称为Mashup的新型应用,它使最终用户能够个性化地聚合和操作分布在互联网上的数据源.然而,关于Mashup在动态环境下运行时的性能研究还比较缺乏.为此,利用缓存技术提出了Mashup运行时的性能优化方法——POMO.POMO具有以下3个主要创新点:首先,POMO通过算子序列的缓存点的成本和收益模型实现了动态缓存点选取;其次,POMO通过缓存点的B+树索引实现了缓存点重用;第三,POMO通过两阶段切换数据传输协议实现了缓存点更新.实验分析结果表明:POMO减少了Mashup在动态环境下的运行成本,提高了Mashup运行时的性能 相似文献
16.
Adaptivity in sorting algorithms is sometimes gained at the expense of practicality. We give experimental results showing that Splaysort — sorting by repeated insertion into a Splay tree — is a surprisingly efficient method for in-memory sorting. Splaysort appears to be adaptive with respect to all accepted measures of presortedness, and it outperforms Quicksort for sequences with modest amounts of existing order. Although Splaysort has a linear space overhead, there are many applications for which this is reasonable. In these situations Splaysort is an attractive alternative to traditional comparison-based sorting algorithms such as Heapsort, Mergesort, and Quicksort. 相似文献
17.
H. Sarojadevi S. K. Nandy S. Balakrishnan 《International journal of parallel programming》2004,32(5):415-446
Emerging multiprocessor architectures such as chip multiprocessors, embedded architectures, and massively parallel architectures, demand faster, more efficient, and more scalable cache coherence schemes. In devising more cost-efficient schemes, formal insights into a system model is deemed useful. We, in this paper, build formalisms for execution in cache based Distributed shared-memory multiprocessors (DSM) obeying Release Consistency model, and derive conditions for cache coherence. A cost-efficient cache coherence scheme without directories is designed. Our approach relies on processor directed coherence actions, which are early in nature. The scheme exploits sharing information provided by a programmer-centric framework. Per-processor coherence buffers (CB) are employed to impose coherence on live shared variables between consecutive release points in the execution. Simulation of 8 entry 4-way associative CB based system achieves a speedup of 1.07–4.31 over full-map 3-hop directory scheme for six of the SPLASH-2 benchmarks. 相似文献
18.
19.
Robert F. Sproull 《Algorithmica》1991,6(1):579-589
This note presents a simplification and generalization of an algorithm for searchingk-dimensional trees for nearest neighbors reported by Friedmanet al [3]. If the distance between records is measured usingL
2
, the Euclidean norm, the data structure used by the algorithm to determine the bounds of the search space can be simplified to a single number. Moreover, because distance measurements inL
2
are rotationally invariant, the algorithm can be generalized to allow a partition plane to have an arbitrary orientation, rather than insisting that it be perpendicular to a coordinate axis, as in the original algorithm. When ak-dimensional tree is built, this plane can be found from the principal eigenvector of the covariance matrix of the records to be partitioned. These techniques and others yield variants ofk-dimensional trees customized for specific applications.It is wrong to assume thatk-dimensional trees guarantee that a nearest-neighbor query completes in logarithmic expected time. For smallk, logarithmic behavior is observed on all but tiny trees. However, for largerk, logarithmic behavior is achievable only with extremely large numbers of records. Fork = 16, a search of ak-dimensional tree of 76,000 records examines almost every record. 相似文献