首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 78 毫秒
1.
《国际计算机数学杂志》2012,89(9):1097-1106
The problem of converting a regular expression to nondeterministic finite automaton (NFA) is a fundamental problem that has been well studied. However, the two basic construction algorithms: (1) Thompson, (2) McNaughton–Yamada and Glushkov, both have disadvantages. In this article: first, a ‘smart’ parsing algorithm is developed which constructs a parse tree with at most (3l???1) nodes form a regular expression with l literals; second, we propose an algorithm that works on the resulting NFA from Thompson's construction, eliminating as many auxiliary states as possible while maintaining Thompson's properties. It is shown that the resulting NFA is minimized. This means that no auxiliary states can be eliminated without violating the defining properties of Thompson NFA. The time and space requirements for the above algorithms are linear with respect to the length of the regular expression.  相似文献   

2.
Howard Chivers 《Software》2017,47(5):669-688
The jsre regular expression library was designed to provide fast matching of complex expressions over large input streams using user‐selectable character encodings. An established design approach was used: a simulated non‐deterministic automaton (NFA) implemented as a virtual machine, avoiding exponential cost functions in either space or time. A deterministic automaton (DFA) was chosen as a general dispatching mechanism for Unicode character classes, and this also provided the opportunity to use compact DFAs in various optimization strategies. The result was the development of a regular expression Preview which provides a summary of all the matches possible from a given point in a regular expression in a form that can be implemented as a compact DFA and can be used to further improve the performance of the standard NFA simulation algorithm. This paper formally defines a preview and describes and evaluates several optimizations using this construct. They provide significant speed improvements accrued from fast scanning of anchor positions, avoiding retesting of repeated strings in unanchored searches and efficient searching of multiple alternate expressions which in the case of keyword searching has a time complexity which is logarithmic in the number of words to be searched. Copyright © 2016 John Wiley & Sons, Ltd.  相似文献   

3.
Signature-based intrusion detection is required to inspect network traffic at wire-speed. Matching packet payloads against patterns specified with regular expression is a computation intensive task. Hence, the design of hardware accelerator to speed up regular expression matching has been an active research area. A systematic approach to detect regular expression is based on finite automaton. The space-time trade-off between deterministic finite automaton (DFA) and non-deterministic finite automaton (NFA) is well-known. DFA can offer constant throughput but it may suffer from the state explosion problem. Hence, implementation of DFA for large pattern sets on embedded device with limited on-chip memory may not be viable. NFA requires linear space but the throughput can be very low. Implementations of NFA with hardwired circuits can overcome the speed deficiency by exploiting the massive parallelism offered by dedicated hardware circuitries, but this approach does not support efficient dynamic updates. In this paper, we shall present a memory-based architecture for the implementation of NFA to speed up regular expression matching for signature-based intrusion detection. The proposed method supports dynamic updates and offers constant throughput so that it can be used to supplement the existing DFA-based methods in handling large pattern sets.  相似文献   

4.
The aim of this paper is to extend the family of Glushkov automata. This is achieved by designing new operators, the so-called multi-tilde-bar operators, that allow us to compute Glushkov functions for the associated extended expressions. Conversely an extended Glushkov automaton with states can be converted into an extended expression with occurrences of symbols. It leads to a characterization in terms of graphs of the family of extended Glushkov automata. Moreover, extended expressions are shown to be superpolynomially more succinct than standard expressions.  相似文献   

5.
Several methods have been developed to construct λ-free automata that represent a regular expression. Among the most widely known are the position automaton (Glushkov), the partial derivatives automaton (Antimirov) and the follow automaton (Ilie and Yu). All these automata can be obtained with quadratic time complexity, thus, the comparison criterion is usually the size of the resulting automaton. The methods that obtain the smallest automata (although, for general expressions, they are not comparable), are the follow and the partial derivatives methods. In this paper, we propose another method to obtain a λ-free automaton from a regular expression. The number of states of the automata we obtain is bounded above by the size of both the partial derivatives automaton and of the follow automaton. Our algorithm also runs with the same time complexity of these methods.  相似文献   

6.

We present an extension to multiplicities of a classical algorithm for computing a boolean automaton from a regular expression. The Glushkov construction computes an automaton with n +1 states from a regular expression with n occurences of letters. We give an extension of the Glushkov algorithm for the multiplicity case in a non commutative semiring. Next, we give four equivalent extended step by step algorithms.  相似文献   

7.
8.
This article describes an improvement of the brute force determinization algorithm in the case of homogeneous nondeterministic finite automata (NFAs), as well as its application to pattern matching. Brute force determinization with limited memory may provide a partially determinized automaton, but its bounded complexity makes it a safe procedure contrary to the classical subset construction. Actually, our algorithm is inspired by both recent results of Champarnaud concerning the subset automaton of a homogeneous NFA and the algorithm recently designed by Navarro and Raffinot to implement the brute force determinization of the Glushkov NFA of a regular pattern. Our algorithm significantly improves Navarro–Raffinot's one since it has an average exponentially smaller memory requirement for a given level of determinization, which, considering a bounded memory, implies a quadratically smaller parsing time. This algorithm has been implemented in CCP software (http://www.univ-rouen.fr/LIFAR/aia/ccp.html). Tests have been carried out in the field of text processing and biology. Experimental results are reported.  相似文献   

9.
A reactive automaton has extra links whose role is to change the behaviour of the automaton. We show that these links do not increase the expressiveness of finite automata but that they can be used to reduce dramatically their state number both in the deterministic case and the non-deterministic case.Typical examples of regular expressions associated with deterministic automata of exponential size according to the length of the expression show that reactive links provide an alternative representation of total linear size for the language.  相似文献   

10.
One-Unambiguous Regular Languages   总被引:2,自引:0,他引:2  
The ISO standard for the Standard Generalized Markup Language (SGML) provides a syntactic meta-language for the definition of textual markup systems. In the standard, the right-hand sides of productions are based on regular expressions, although only regular expressions that denote words unambiguously, in the sense of the ISO standard, are allowed. In general, a word that is denoted by a regular expression is witnessed by a sequence of occurrences of symbols in the regular expression that match the word. In an unambiguous regular expression as defined by Booket al.(1971,IEEE Trans. Comput.C-20(2), 149–153), each word has at most one witness. But the SGML standard also requires that a witness be computed incrementally from the word with a one-symbol lookahead; we call such regular expressions 1-unambiguous. A regular language is a 1-unambiguouslanguage if it is denoted by some 1-unambiguous regular expression. We give a Kleene theorem for 1-unambiguous languages and characterize 1-unambiguous regular languages in terms of structural properties of the minimal deterministic automata that recognize them. As a result we are able to prove the decidability of whether a given regular expression denotes a 1-unambiguous language; if it does, then we can construct an equivalent 1-unambiguous regular expression in worst-case optimal time.  相似文献   

11.
Regular trees can be defined by two types of rational expressions. For these two types we solve the star-height problem, i.e., we show how to construct a rational expression of minimal star-height from the minimal graph of the given tree (i.e., the analogue of the minimal deterministic automation for regular languages). In one case, the minimal starheight is the rank (in the sense of Eggan) of the minimal graph. There corresponds a characterization of the star-height of a prefix-free regular language w.r.t. rational expressions of a special kind (called deterministic) as the rank of its minimal deterministic automaton considered as a graph.  相似文献   

12.
We revisit the problem of deciding by means of a finite automaton whether a string is uniquely decodable from its bigram counts. An efficient algorithm for constructing a polynomial-size Nondeterministic Finite Automaton (NFA) that decides unique decodability is given. This NFA may be simulated efficiently in time and space. Conversely, we show that the minimum deterministic finite automaton for deciding unique decodability has exponentially many states in alphabet size, and compute the correct order of magnitude of the exponent.  相似文献   

13.
一种基于深度报文检测的FSM状态表压缩技术   总被引:6,自引:0,他引:6  
针对深度报文检测中正则表达式模式匹配的状态表爆炸问题,提出并实现了一种集合交割的预编码方法(SI-precode),在正则表达式转换成DFA前对所有输入符号进行预编码,通过压缩输入,减少FSM中输入符号的种类,从而压缩状态转移表的空间.证明了预编码生成的状态机的正确性及其与原状态机的同态性.采用L7-filter模式进行实验表明SI-precode不仅提高了正则表达式的编译速度,针对单模式状态机,其状态转移表空间比不进行预编码压缩了87%~97%,50个模式的多模式状态机可压缩59%.预编码在软硬件结合体系结构下进行协议识别时不会对性能造成影响;对纯软件结构性能降低2%~4%.  相似文献   

14.
A mutating finite automaton (MFA) is a nondeterministic finite automaton (NFA) that changes its morphology over discrete time by a sequence of mutations. This results in a sequence of NFAs, the initial NFA, and one mutated NFA for each mutation. Some application domains, including model-based diagnosis of discrete-event systems in artificial intelligence and model-based testing in software engineering, require temporal determinization of MFAs. Determinizing an MFA temporally means generating a deterministic finite automaton (DFA) that is equivalent to the mutated NFA as soon as a mutation occurs. Since, in computation time, the classical Subset Construction determinization algorithm may be less than optimal when applied to MFAs, a conservative algorithm is proposed, called Subset Restructuring, which, instead of constructing the new DFA from scratch based on the mutated NFA, generates the new DFA by updating the previous DFA based on the mutation occurred. Subset Restructuring is sound and complete, thereby yielding the same DFA generated by Subset Construction. Results from massive experimentation indicate the viability of Subset Restructuring, especially so when large MFAs change by small mutations.  相似文献   

15.
We present two new techniques for regular expression searching and use them to derive faster practical algorithms. Based on the specific properties of Glushkov’s nondeterministic finite automaton construction algorithm, we show how to encode a deterministic finite automaton (DFA) using O(m2m) bits, where m is the number of characters, excluding operator symbols, in the regular expression. This compares favorably against the worst case of O(m2m|Σ|) bits needed by a classical DFA representation (where Σ is the alphabet) and O(m22m) bits needed by the Wu and Manber approach implemented in Agrep. We also present a new way to search for regular expressions, which is able to skip text characters. The idea is to determine the minimum length ℓ of a string matching the regular expression, manipulate the original automaton so that it recognizes all the reverse prefixes of length up to ℓ of the strings originally accepted, and use it to skip text characters as done for exact string matching in previous work. We combine these techniques into two algorithms, one able and one unable to skip text characters. The algorithms are simple to implement, and our experiments show that they permit fast searching for regular expressions, normally faster than any existing algorithm.  相似文献   

16.
Summary The amount of nondeterminism in a nondeterministic finite automaton (NFA) is measured by counting the minimal number of guessing points a string w has to pass through on its way to an accepting state. NFA's with more nondeterminism can achieve greater savings in the number of states over their deterministic counterparts than NFA's with less nondeterminism. On the other hand, for some nontrivial infinite regular languages a deterministic finite automaton (DFA) can already be quite succinct in the sense that NFA's need as many states (and even context-free grammars need as many nonterminals) as the minimal DFA has states.This research was supported in part by the National Science Foundation under Grant No. MCS 76-10076  相似文献   

17.
确定性有限自动机(Det­erministic Finite Automata, DFA)匹配速度远快于非确定性有限状态自动机(Non-deterministic Finite state Autom­ata, NFA),但大量正则表达式转换为DFA时会引起状态爆炸而占用巨大的存储空间。首先定义膨胀系数(Expansion Coefficient, EC)来描述正则表达式的膨胀特性,然后在膨胀系数这一概念基础上,提出一种高效的分组算法--IGA(Improved Grouping Algorithm)算法对正则表达式进行有效分组,将容易引起状态爆炸的正则表达式相互隔离,从而节省存储空间。实验结果表明,与原有算法相比,在相同分组数目时IGA算法平均能够减少25%的状态数。  相似文献   

18.
The theory of formal string languages and of formal tree languages are both important parts of the theory of formal languages. Regular tree languages are recognized by finite tree automata. Trees in their postfix notation can be seen as strings. This paper presents a simple transformation from any given (bottom-up) finite tree automaton recognizing a regular tree language to a deterministic pushdown automaton accepting the same tree language in postfix notation. The resulting deterministic pushdown automaton can be implemented easily by an existing parser generator because it is constructed for an LR(0) grammar, and its size directly corresponds to the size of the deterministic finite tree automaton. The class of regular tree languages in postfix notation is a proper subclass of deterministic context-free string languages. Moreover, the class of tree languages which are in their postfix notation deterministic context-free string languages is a proper superclass of the class of regular tree languages.  相似文献   

19.
Quite often, trivial problems stated for deterministic finite automata (DFA) are surprisingly difficult for the non-deterministic case (NFA). In any non-minimal DFA for a given regular language, we can find two equivalent states which can be “merged” without changing the accepted language. This is not the case for NFA, where we can have non-minimal automata with no “mergible” states. In this paper, we prove a very basic result for NFA, that for a given regular language, any NFA of size greater than a computable constant must contain mergible states. Even more, we parameterized this constant in order to guarantee groups of an arbitrary number of mergible states.  相似文献   

20.
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号