Self‐adjusting trees in practice for large text collections期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

Self‐adjusting trees in practice for large text collections

Authors:	Hugh E Williams Justin Zobel Steffen Heinz

Abstract:	Splay and randomized search trees (RSTs) are self‐balancing binary tree structures with little or no space overhead compared to a standard binary search tree (BST). Both trees are intended for use in applications where node accesses are skewed, for example in gathering the distinct words in a large text collection for index construction. We investigate the efficiency of these trees for such vocabulary accumulation. Surprisingly, unmodified splaying and RSTs are on average around 25% slower than using a standard binary tree. We investigate heuristics to limit splay tree reorganization costs and show their effectiveness in practice. In particular, a periodic rotation scheme improves the speed of splaying by 27%, while other proposed heuristics are less effective. We also report the performance of efficient bit‐wise hashing and red–blacktrees for comparison. Copyright © 2001 John Wiley & Sons, Ltd.

Keywords:	self‐adjusting binary trees splay trees randomized search trees treaps vocabulary accumulation