A faster algorithm for matching a set of patterns with variable length don't cares |
| |
Authors: | Meng Zhang Liang Hu |
| |
Affiliation: | a College of Computer Science and Technology, Jilin University, Changchun, China b Department of Computer Science, Jilin Business and Technology College, Changchun, China |
| |
Abstract: | We present a simple and faster solution to the problem of matching a set of patterns with variable length don't cares. Given an alphabet Σ, a pattern p is a word p1@p2?@pm, where pi is a string over Σ called a keyword and @∉Σ is a symbol called a variable length don't care (VLDC) symbol. Pattern p matches a text t if t=u0p1u1…um−1pmum for some u0,…,um∈Σ∗. The problem addressed in this paper is: given a set of patterns P and a text t, determine whether one of the patterns of P matches t.Kucherov and Rusinowitch (1997) [9] presented an algorithm that solves the problem in time O((|t|+|P|)log|P|), where |P| is the total length of keywords in every pattern of P. We give a new algorithm based on Aho-Corasick automaton. It uses the solutions of Dynamic Marked Ancestor Problem of Chan et al. (2007) [5]. The algorithm takes O((|t|+‖P‖)logκ/loglogκ) time, where ‖P‖ is the total number of keywords in every pattern of P, and κ is the number of distinct keywords in P. The algorithm is faster and simpler than the previous approach. |
| |
Keywords: | Algorithms String matching Variable length don't cares |
本文献已被 ScienceDirect 等数据库收录! |
|