Cache-oblivious index for approximate string matching |
| |
Authors: | Wing-Kai Hon Tak-Wah Lam |
| |
Affiliation: | a Department of Computer Science, National Tsing Hua University, Taiwanb Department of Computer Science, The University of Hong Kong, Hong Kongc Department of Computer Science, Louisiana State University, LA, USAd Department of Electrical Engineering and Computer Science, Kansas University, KS, USA |
| |
Abstract: | This paper revisits the problem of indexing a text for approximate string matching. Specifically, given a text T of length n and a positive integer k, we want to construct an index of T such that for any input pattern P, we can find all its k-error matches in T efficiently. This problem is well-studied in the internal-memory setting. Here, we extend some of these recent results to external-memory solutions, which are also cache-oblivious. Our first index occupies O((nlogkn)/B) disk pages and finds all k-error matches with O((|P|+occ)/B+logknloglogBn) I/Os, where B denotes the number of words in a disk page. To the best of our knowledge, this index is the first external-memory data structure that does not require Ω(|P|+occ+poly(logn)) I/Os. The second index reduces the space to O((nlogn)/B) disk pages, and the I/O complexity is O((|P|+occ)/B+logk(k+1)nloglogn). |
| |
Keywords: | String matching Indexing Approximate queries Cache-oblivious I/O model |
本文献已被 ScienceDirect 等数据库收录! |
|