Improving the worst-case performance of the Hunt-Szymanski strategy for the longest common subsequence of two strings |
| |
Affiliation: | 1. School of Tianyou, East China Jiaotong University, Nanchang, 330013, China;2. School of Electrical and Automation Engineering, East China Jiaotong University, Nanchang, 330013, China;1. Mathematics Institute, Zeeman Building, University of Warwick, Coventry CV4 7AL, UK;2. Departamento de Ingeniería Matemática, Facultad de Ciencias Físicas y Matemáticas, Universidad de Concepción, Chile;3. Departamento de Ingeniería Matemática y Centro de Modelamiento Matemático (CNRS IRL 2807), Universidad de Chile, Beauchef 851, Santiago, Chile;1. School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei Street, 210094 Nanjing, China;2. School of Transportation Science and Engineering, Beihang University, 37 Xueyuan Road, Haidian District, 100191 Beijing, China;3. School of Social Science, Nanjing Vocational University of Industry Technology, 210046 Nanjing, China |
| |
Abstract: | Among the algorithms set up to date for finding the longest common subsequence of two strings, the one by Hunt and Szymanski exhibits the best known performance in favorable cases, but can be worse than any straightforward algorithm for a large variety of inputs. The new algorithm presented here pursues a schedule of primitive operations quite close to the one inherent to the Hunt-Szymanski strategy, but with substantially enhanced efficiency. In fact, the new algorithm improves on the former in two important respects. First, its worst case is never worse than linear in the product nm of the lengths of the two input strings. Second, its time bound does not always grow with the cardinality r of the set R of all pairs of matching positions of the input strings. Rather, it depends on the cardinality d of a specific subset of R, whose elements are called here dominant matches, and are elsewhere referred to as minimal candidates. This second improvement also appears of significance, since it seems that whenever r gets too close to mn, this forces d to be linear in m. The new algorithm requires standard preprocessing, and makes use of finger-trees. In a forthcoming paper, it will be shown among other things that the same performance can be achieved with simpler and handier auxiliary data structures. |
| |
Keywords: | |
本文献已被 ScienceDirect 等数据库收录! |
|