Approximate regular expression pattern matching with concave gap penalties |
| |
Authors: | J. R. Knight E. W. Myers |
| |
Affiliation: | (1) Department of Computer Science, University of Arizona, Gould-Simpson Building 721, 85721 Tucson, AZ, USA |
| |
Abstract: | Given a sequenceA of lengthM and a regular expressionR of lengthP, an approximate regular expression pattern-matching algorithm computes the score of the optimal alignment betweenA and one of the sequencesB exactly matched byR. An alignment between sequencesA=a1a2 ... aM andB=b1b2... bN is a list of ordered pairs, (i1,j1), (i2j2), ..., (it,jtt) such that ik < ik+1 and jk < jk+1. In this case the alignmentaligns symbols aik and bjk, and leaves blocks of unaligned symbols, orgaps, between them. A scoring schemeS associates costs for each aligned symbol pair and each gap. The alignment's score is the sum of the associated costs, and an optimal alignment is one of minimal score. There are a variety of schemes for scoring alignments. In a concave gap penalty scoring schemeS={, w}, a function (a, b) gives the score of each aligned pair of symbolsa andb, and aconcave function w(k) gives the score of a gap of lengthk. A function w is concave if and only if it has the property that, for allk > 1, w(k + 1) –w(k) w(k) –w(k –1). In this paper we present an O(MP(logM + log2P)) algorithm for approximate regular expression matching for an arbitrary and any concavew.This work was supported in part by the National Institute of Health under Grant RO1 LM04960. |
| |
Keywords: | Pattern matching Regular expressions Concave gap penalties Approximate matching |
本文献已被 SpringerLink 等数据库收录! |
|