首页 | 本学科首页   官方微博 | 高级检索  
     


New Techniques for Regular Expression Searching
Authors:Gonzalo Navarro  Mathieu Raffinot
Affiliation:(1) Department of Computer Science, University of Chile, Blanco Encalada 2120, Santiago, Chile;(2) Equipe Génome et Informatique, Tour Evry 2, 523 place des terrasses de l’Agora, 91034 Evry, France
Abstract:We present two new techniques for regular expression searching and use them to derive faster practical algorithms. Based on the specific properties of Glushkov’s nondeterministic finite automaton construction algorithm, we show how to encode a deterministic finite automaton (DFA) using O(m2m) bits, where m is the number of characters, excluding operator symbols, in the regular expression. This compares favorably against the worst case of O(m2m|Σ|) bits needed by a classical DFA representation (where Σ is the alphabet) and O(m22m) bits needed by the Wu and Manber approach implemented in Agrep. We also present a new way to search for regular expressions, which is able to skip text characters. The idea is to determine the minimum length ℓ of a string matching the regular expression, manipulate the original automaton so that it recognizes all the reverse prefixes of length up to ℓ of the strings originally accepted, and use it to skip text characters as done for exact string matching in previous work. We combine these techniques into two algorithms, one able and one unable to skip text characters. The algorithms are simple to implement, and our experiments show that they permit fast searching for regular expressions, normally faster than any existing algorithm.
Keywords:Glushkov automaton  Compact DFA representation  Bit-parallelism  BDM  Reverse factors
本文献已被 SpringerLink 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号