Approximate Matching of Run-Length Compressed Strings期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

Approximate Matching of Run-Length Compressed Strings

Authors:	Mäkinen Ukkonen Navarro

Affiliation:	Department of Computer Science, P.O. Box 26 (Teollisuuskatu 23), FIN-00014 University of Helsinki, vmakinen@cs.helsinki.fi, ukkonen@cs.helsinki.fi. Supported by the Academy of Finland under Grant 22584., Finland Center for Web Research, Department of Computer Science, University of Chile, Blanco Encalada 2120, Santiago, Chile. gnavarro@dcc.uchile.cl. Supported by Millenium Nucleus Center for Web Research, Grant P01-029-F, Mideplan, Chile

Abstract:	We focus on the problem of approximate matching of strings that have been compressed using run-length encoding. Previous studies have concentrated on the problem of computing the longest common subsequence (LCS) between two strings of length m and n , compressed to m' and n' runs. We extend an existing algorithm for the LCS to the Levenshtein distance achieving O(m'n+n'm) complexity. Furthermore, we extend this algorithm to a weighted edit distance model, where the weights of the three basic edit operations can be chosen arbitrarily. This approach also gives an algorithm for approximate searching of a pattern of m letters (m' runs) in a text of n letters (n' runs) in O(mm'n') time. Then we propose improvements for a greedy algorithm for the LCS, and conjecture that the improved algorithm has O(m'n') expected case complexity. Experimental results are provided to support the conjecture.

Keywords:
本文献已被 SpringerLink 等数据库收录！

设为首页 | 免责声明 | 关于勤云 | 加入收藏