首页 | 本学科首页   官方微博 | 高级检索  
     


Approximate Matching of Run-Length Compressed Strings
Authors:Mäkinen  Ukkonen  Navarro
Affiliation:Department of Computer Science, P.O. Box 26 (Teollisuuskatu 23), FIN-00014 University of Helsinki, vmakinen@cs.helsinki.fi, ukkonen@cs.helsinki.fi. Supported by the Academy of Finland under Grant 22584., Finland
Center for Web Research, Department of Computer Science, University of Chile, Blanco Encalada 2120, Santiago, Chile. gnavarro@dcc.uchile.cl. Supported by Millenium Nucleus Center for Web Research, Grant P01-029-F, Mideplan, Chile
Abstract:We focus on the problem of approximate matching of strings that have been compressed using run-length encoding. Previous studies have concentrated on the problem of computing the longest common subsequence (LCS) between two strings of length m and n , compressed to m' and n' runs. We extend an existing algorithm for the LCS to the Levenshtein distance achieving O(m'n+n'm) complexity. Furthermore, we extend this algorithm to a weighted edit distance model, where the weights of the three basic edit operations can be chosen arbitrarily. This approach also gives an algorithm for approximate searching of a pattern of m letters (m' runs) in a text of n letters (n' runs) in O(mm'n') time. Then we propose improvements for a greedy algorithm for the LCS, and conjecture that the improved algorithm has O(m'n') expected case complexity. Experimental results are provided to support the conjecture.
Keywords:
本文献已被 SpringerLink 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号