String Indexing for Patterns with Wildcards |
| |
Authors: | Philip Bille Inge Li Gørtz Hjalte Wedel Vildhøj Søren Vind |
| |
Affiliation: | 1. DTU Compute, Technical University of Denmark, Lyngby, Denmark
|
| |
Abstract: | We consider the problem of indexing a string t of length n to report the occurrences of a query pattern p containing m characters and j wildcards. Let occ be the number of occurrences of p in t, and σ the size of the alphabet. We obtain the following results. - A linear space index with query time O(m+σ j loglogn+occ). This significantly improves the previously best known linear space index by Lam et al. (in Proc. 18th ISAAC, pp. 846–857, 2007]), which requires query time Θ(jn) in the worst case.
- An index with query time O(m+j+occ) using space \(O(\sigma^{k^{2}} n \log^{k} \log n)\) , where k is the maximum number of wildcards allowed in the pattern. This is the first non-trivial bound with this query time.
- A time-space trade-off, generalizing the index by Cole et al. (in Proc. 36th STOC, pp. 91–100, 2004]).
We also show that these indexes can be generalized to allow variable length gaps in the pattern. Our results are obtained using a novel combination of well-known and new techniques, which could be of independent interest. |
| |
Keywords: | |
本文献已被 SpringerLink 等数据库收录! |
|