Given a text
T[1..
u] over an alphabet of size
σ, the
full-text search problem consists in finding the
occ occurrences of a given pattern
P[1..
m] in
T. In
indexed text searching we build an index on
T to improve the search time, yet increasing the space requirement. The current trend in indexed text searching is that of
compressed full-text self-indices, which replace the text with a more space-efficient representation of it, at the same time providing indexed access to the text. Thus, we can provide efficient access within compressed space. The Lempel-Ziv index (LZ-index) of Navarro is a compressed full-text self-index able to represent
T using 4
uH k (
T)+
o(
ulog?
σ) bits of space, where
H k (
T) denotes the
k-th order empirical entropy of
T, for any
k=
o(log?
σ u). This space is about four times the compressed text size. The index can locate all the
occ occurrences of a pattern
P in
T in
O(
m 3log?
σ+(
m+
occ)log?
u) worst-case time. Although this index has proven very competitive in practice, the
O(
m 3log?
σ) term can be excessive for long patterns. Also, the factor 4 in its space complexity makes it larger than other state-of-the-art alternatives. In this paper we present stronger Lempel-Ziv based indices (LZ-indices), improving the overall performance of the original LZ-index. We achieve indices requiring (2+
ε)
uH k (
T)+
o(
ulog?
σ) bits of space, for any constant
ε>0, which makes them the smallest existing LZ-indices. We simultaneously improve the search time to
O(
m 2+(
m+
occ)log?
u), which makes our indices very competitive with state-of-the-art alternatives. Our indices support displaying any text substring of length
? in optimal
O(
?/log?
σ u) time. In addition, we show how the space can be squeezed to (1+
ε)
uH k (
T)+
o(
ulog?
σ) to obtain a structure with
O(
m 2) average search time for
m≥2log?
σ u. Alternatively, the search time of LZ-indices can be improved to
O((
m+
occ)log?
u) with (3+
ε)
uH k (
T)+
o(
ulog?
σ) bits of space, which is much less than the space needed by other Lempel-Ziv-based indices achieving the same search time. Overall our indices stand out as a very attractive alternative for space-efficient indexed text searching.
相似文献