Speeding up two string-matching algorithms
Authors: M. Crochemore (1), A. Czumaj (2), L. Gasieniec (2), S. Jarominek (2), T. Lecroq (1), W. Plandowski (2), W. Rytter (2)
Affiliation: (1) LITP, Institut Blaise Pascal, Université Paris 7, 2 Place Jussieu, 75251 Paris Cedex 05, France; (2) Institute of Informatics, Warsaw University, ul. Banacha 2, 00-913 Warsaw 59, Poland
Abstract: We show how to speed up two string-matching algorithms: the Boyer-Moore algorithm (BM algorithm), and its version called here the reverse factor algorithm (RF algorithm). The RF algorithm is based on factor graphs for the reverse of the pattern. The main feature of both algorithms is that they scan the text right-to-left from the supposed right position of the pattern. The BM algorithm scans as long as the scanned segment (factor) is a suffix of the pattern. The RF algorithm scans as long as the segment is a factor of the pattern. Both algorithms then shift the pattern, forget the history, and start again. The RF algorithm usually makes bigger shifts than BM, but is quadratic in the worst case. We show that it is enough to remember the last matched segment (represented by two pointers to the text) to speed up the RF algorithm considerably (to make a linear number of inspections of text symbols, with a small coefficient), and to speed up the BM algorithm (to make at most 2n comparisons). Only constant additional memory is needed for the search phase. We give alternative versions of an accelerated RF algorithm: the first one is based on combinatorial properties of primitive words, and the other two use the power of suffix trees extensively. The paper demonstrates techniques for transforming algorithms, and also shows interesting new applications of data structures representing all subwords of the pattern in compact form.
Acknowledgements: The work by M. Crochemore and T. Lecroq was partially supported by PRC "Mathématiques-Informatique"; M. Crochemore was also partially supported by NATO Grant CRG 900293, and the work by A. Czumaj, L. Gasieniec, S. Jarominek, W. Plandowski, and W. Rytter was supported by KBN of the Polish Ministry of Education.
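To make the scanning rule in the abstract concrete, here is a minimal Python sketch of the basic (non-accelerated) RF idea: scan the window right-to-left while the scanned segment is still a factor of the pattern, then shift according to the longest scanned segment that is a prefix of the pattern. It is an illustration under stated assumptions, not the authors' implementation: the function name `rf_search` is hypothetical, and a plain set of substrings stands in for the factor graph of the reversed pattern, which costs quadratic preprocessing space and would not be used in practice. The sketch also forgets the history after each shift, so it has the quadratic worst case the abstract mentions and does not include the "remember the last matched segment" speed-up.

```python
def rf_search(pattern, text):
    """Return all start positions of pattern in text (basic RF-style scan).

    Assumption: a set of all substrings replaces the factor graph
    (suffix automaton) of the reversed pattern used in the paper.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # Stand-in data structures (illustrative only, O(m^2) space).
    factors = {pattern[a:b] for a in range(m) for b in range(a + 1, m + 1)}
    prefixes = {pattern[:k] for k in range(1, m + 1)}

    matches = []
    i = 0  # left end of the current window text[i : i + m]
    while i <= n - m:
        j = m      # scanned segment is text[i + j : i + m]
        shift = m  # default: no prefix of the pattern ends the window
        # Extend the segment leftwards while it remains a factor.
        while j > 0 and text[i + j - 1:i + m] in factors:
            j -= 1
            if text[i + j:i + m] in prefixes:
                if j == 0:
                    matches.append(i)  # whole window equals the pattern
                else:
                    shift = j          # longer prefix found: smaller safe shift
        i += shift
    return matches
```

For example, `rf_search("aba", "abababa")` reports the overlapping occurrences at positions 0, 2 and 4. The shift is safe because any occurrence starting inside the window would make a suffix of the window both a factor and a prefix of the pattern, and the scan records every such suffix before it stops.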
Keywords: Analysis of algorithms; Pattern matching; String matching; Suffix tree; Suffix automaton; Combinatorial problems; Periods; Text processing; Data retrieval
This article is indexed in SpringerLink and other databases.