Approximate Matching of Run-Length Compressed Strings |
| |
Authors: | M?kinen Ukkonen and Navarro |
| |
Affiliation: | (1) Department of Computer Science, P.O. Box 26 (Teollisuuskatu 23), FIN-00014 University of Helsinki, Finland. vmakinen@cs.helsinki.fi, ukkonen@cs.helsinki.fi. Supported by the Academy of Finland under Grant 22584., FI;(2) Center for Web Research, Department of Computer Science, University of Chile, Blanco Encalada 2120, Santiago, Chile. gnavarro@dcc.uchile.cl. Supported by Millenium Nucleus Center for Web Research, Grant P01-029-F, Mideplan, Chile., CL |
| |
Abstract: | Abstract. We focus on the problem of approximate matching of strings that have been compressed using run-length encoding. Previous
studies have concentrated on the problem of computing the longest common subsequence (LCS) between two strings of length m and n , compressed to m' and n' runs. We extend an existing algorithm for the LCS to the Levenshtein distance achieving O(m'n+n'm) complexity. Furthermore, we extend this algorithm to a weighted edit distance model, where the weights of the three basic
edit operations can be chosen arbitrarily. This approach also gives an algorithm for approximate searching of a pattern of
m letters (m' runs) in a text of n letters (n' runs) in O(mm'n') time. Then we propose improvements for a greedy algorithm for the LCS, and conjecture that the improved algorithm has O(m'n') expected case complexity. Experimental results are provided to support the conjecture. |
| |
Keywords: | , Compressed pattern matching, Run-length encoding, Levenshtein distance, Longest common subsequence, Weighted edit,,,,,distance, |
本文献已被 SpringerLink 等数据库收录! |
|