1.

付哲李军《计算机工程与应用》2018,54(20):1-13

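Deep inspection plays an important role in maintaining network security and guaranteeing quality of service. Regular expression matching, the core technology of high-performance deep inspection, has significant research value and practical importance. As network traffic keeps growing, rule counts keep rising, and network architectures become increasingly flexible and dynamic, existing regular expression matching algorithms face challenges in matching speed, memory footprint, and update capability. This paper introduces the research background of regular expression matching algorithms and surveys influential academic work from four perspectives: space compression, matching acceleration, novel automaton design, and rule splitting and grouping. Using evaluations based on real network traffic, it compares several classic matching algorithms on different rule sets in terms of matching speed, memory usage, and preprocessing time, recommends efficient regular expression matching algorithms for different requirement scenarios, and outlines future directions for high-performance regular expression matching.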

2.

Performance Analysis of FPGA-Implemented Regular Expression Matching

唐球姜磊谭建龙刘金刚《小型微型计算机系统》2012,33(11)

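Identifying the key factors that affect the performance of FPGA-implemented regular expression matching is a prerequisite for optimizing it. First, performance test results for the individual L7-Filter rules are analyzed to identify three characteristics that distinguish rules achieving low clock frequencies from those achieving high clock frequencies. Then, special regular expression test models built by concatenating multiple character classes are designed to verify how strongly these three characteristics affect the performance of FPGA-based regular expression automata. The conclusions are as follows: the clock frequency of an FPGA-based regular expression automaton drops rapidly as the character class width grows and drops slowly as the number of concatenated character classes grows; and the star (*) or question mark (?) repetition syntax affects the clock frequency of character class rules more than the plus (+) repetition syntax does. Finally, the character-class conclusions are generalized to the broader case of alternation (|) over a large number of characters.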

3.

Efficient Hardware Implementation of Regular Expression Matching (total citations: 2; self-citations: 1; citations by others: 1)

孙志刚张子文《计算机工程与科学》2009,31(10)

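Regular expressions are easy to write and highly expressive, and are widely used in deep packet content inspection. However, because the processing is complex, software-based regular expression matching struggles to keep up with packet content inspection under heavy traffic. This paper first studies the multi-pattern deterministic finite automaton (MPDFA) method for regular expression matching and, based on it, proposes a micro-engine architecture for matching packets against regular expressions in hardware. Finally, it presents the authors' deep packet content inspection implementation on an Altera Cyclone II FPGA, whose core is four regular expression matching micro-engines. Tests show that processing with the four micro-engines in parallel achieves line-rate content inspection of packets on a Gigabit Ethernet interface.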

4.

A Regular Expression Matching Algorithm Based on a Generalized Suffix Tree Combined with Filter Factors

林婧何震瀛《计算机应用与软件》2022,39(1):266-270,286

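To improve the efficiency of matching regular expressions against a text collection, a regular expression matching technique that combines a generalized suffix tree with filter factors is proposed. A generalized suffix tree is built from the given text collection; locating the filter factors in the tree yields a set of valid candidate matches; the sequence information of the filter factors is used to filter the candidate set further; and the strings remaining in the candidate set are then verified to obtain the matching results. Experiments on real datasets show that the algorithm effectively improves regular expression matching performance.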
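
The filter-then-verify pipeline in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the filter factors are supplied by hand rather than extracted from the pattern, and a linear substring scan stands in for the generalized suffix tree lookup. The names `factors_in_order` and `filtered_search` are hypothetical.

```python
import re


def factors_in_order(text, factors):
    """Return True if every factor occurs in text, left to right, without overlap."""
    pos = 0
    for factor in factors:
        idx = text.find(factor, pos)
        if idx < 0:
            return False
        pos = idx + len(factor)
    return True


def filtered_search(pattern, factors, texts):
    """Filter texts by required literal factors, then verify survivors with the regex."""
    regex = re.compile(pattern)
    candidates = [t for t in texts if factors_in_order(t, factors)]
    return [t for t in candidates if regex.search(t)]


# Assumed sample data, chosen to exercise each stage of the pipeline.
texts = [
    "GET /index.php?id=42 HTTP/1.1",   # contains both factors in order and matches
    "POST /index.php?id=42 HTTP/1.1",  # lacks the factor "GET /"
    ".php?id=1 then GET /x",           # has both factors, but in the wrong order
    "GET /a-b.php?id=7",               # passes the filter, fails verification
]
factors = ["GET /", ".php?id="]
matches = filtered_search(r"GET /\w+\.php\?id=\d+", factors, texts)
```

Only strings that survive the cheap factor check reach the regular expression engine, which is where the saving would come from when most of the collection does not contain the factors.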

5.

A Regular Expression Grouping Algorithm Based on the Expansion Coefficient

王美阳唐学文杨正益《计算机与现代化》2014,(7):129-132

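A deterministic finite automaton (DFA) matches far faster than a non-deterministic finite automaton (NFA), but converting a large number of regular expressions into a DFA causes state explosion and consumes a huge amount of memory. The paper first defines an expansion coefficient (EC) to describe the expansion characteristics of regular expressions and then, building on this concept, proposes an efficient grouping algorithm, IGA (Improved Grouping Algorithm), which groups regular expressions so that those prone to causing state explosion are isolated from one another, thereby saving memory. Experimental results show that, compared with the original algorithm, IGA reduces the number of states by 25% on average for the same number of groups.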
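
The general idea of keeping explosion-prone expressions apart can be illustrated with a generic greedy grouping loop. The abstract does not define the expansion coefficient or the IGA procedure, so this sketch does not reproduce either: `greedy_group` is a hypothetical helper, and the `both_have_dot_star` predicate is an assumed stand-in heuristic for "these two expressions would blow up if compiled into one DFA".

```python
def greedy_group(patterns, interacts, max_conflicts=0):
    """Place each pattern in the first group where it has few enough conflicts."""
    groups = []
    for p in patterns:
        for group in groups:
            conflicts = sum(1 for q in group if interacts(p, q))
            if conflicts <= max_conflicts:
                group.append(p)
                break
        else:
            groups.append([p])  # no existing group is compatible: open a new one
    return groups


def both_have_dot_star(p, q):
    """Assumed heuristic: two patterns that both contain '.*' are kept apart."""
    return ".*" in p and ".*" in q


patterns = ["abc.*def", "GET /index", "foo.*bar", "^SSH-"]
groups = greedy_group(patterns, both_have_dot_star)
```

Each resulting group would then be compiled into its own DFA, trading one large automaton for several smaller ones.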

6.

An Efficient Grouping Algorithm for Regular Expressions

肖武德《计算机安全》2010,(4):57-59

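Thanks to their expressive power and simplicity, regular expressions (RE) are replacing explicit strings as the preferred way to describe patterns. In network applications, regular expression matching based on a DFA (deterministic finite automaton) is commonly used in systems for real-time network traffic processing, virus detection, and the like. As the number of regular expressions keeps growing, the DFA's memory footprint balloons, which sharply lowers the cache hit rate and ultimately degrades matching performance. The paper proposes an efficient regular expression grouping algorithm that greatly reduces the memory a DFA requires by grouping the regular expressions sensibly. It also tentatively proposes some metrics for evaluating regular expression grouping algorithms.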

7.

A Fine-Grained Parallel DFA Matching Method for Deep Packet Inspection

刘兴奎邵宗有刘新春孙凝晖《计算机研究与发展》2014,(5)

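A deterministic finite automaton (DFA) is an effective way to implement regular expression matching, but DFA state transitions are serial, so matching is slow and struggles to meet the performance requirements of deep packet inspection (DPI) in high-speed backbone networks. The paper proposes a fine-grained method for parallelizing state transitions, called LBDFA (Loopback DFA), which raises matching speed by parallelizing consecutive transitions on Loopback states. In addition, a Bloom filter is used to eliminate temporary deviations during these parallel transitions, further increasing the potential for parallelism. Tests on the L7-filter and Snort rule sets show that LBDFA can meet regular expression matching requirements above 10 Gbps.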
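
To see why table-driven DFA matching is serial, and what a transition that loops back to the same state looks like, consider the following sketch. It is not LBDFA: it only builds a small KMP-style DFA for one literal pattern and counts how many input symbols leave the state unchanged. The function names and the sample alphabet are assumptions made for illustration.

```python
def build_substring_dfa(pattern, alphabet):
    """Build a KMP-style DFA whose last state means 'pattern has been seen'."""
    m = len(pattern)
    dfa = [dict.fromkeys(alphabet, 0) for _ in range(m + 1)]
    dfa[0][pattern[0]] = 1
    x = 0  # state the DFA would be in after reading pattern[1:j]
    for j in range(1, m):
        for c in alphabet:
            dfa[j][c] = dfa[x][c]      # mismatch: behave like the restart state
        dfa[j][pattern[j]] = j + 1     # match: advance
        x = dfa[x][pattern[j]]
    for c in alphabet:
        dfa[m][c] = m                  # the accepting state is absorbing
    return dfa


def run_dfa(dfa, text):
    """Serial matching loop; returns (matched, number of self-loop steps)."""
    state, loopbacks = 0, 0
    for ch in text:
        nxt = dfa[state][ch]           # needs the previous state: inherently serial
        if nxt == state:
            loopbacks += 1
        state = nxt
    return state == len(dfa) - 1, loopbacks


dfa = build_substring_dfa("abc", "abcx")
result = run_dfa(dfa, "xxxxabcx")
```

In this toy input, five of the eight steps stay in the same state. Runs of such steps are the kind of work that a loopback-oriented scheme would aim to process in bulk rather than one symbol at a time.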

8.

A Compact Regular Expression Matching Algorithm Based on Predefined Classes

麦涛涛潘晓中王亚奇苏阳《计算机应用》2017,37(2):397-401

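To address the storage and throughput challenges facing current hardware regular expression matching algorithms, a compressed automaton matching algorithm with predefined classes (Pre-Class CFA) is proposed, building on the extended finite automaton (XFA) regular expression matching algorithm. Through predefined classes, the algorithm can match character classes in regular expressions and can also match special character sets by setting priorities; on top of XFA's elimination of the deterministic finite automaton (DFA) state explosion problem, it further reduces the number of transition edges. In addition, based on the characteristics of field-programmable gate arrays (FPGAs) and of the transition edges, the algorithm includes a transition-edge access method built on parallel read-only memory (ROM) structures, which allows multiple transition edges of the same state to be read and matched in parallel. The algorithm was tested on the low-to-mid-range FPGA platform ALTERA DE2-70; the system throughput in the experiments was 1.3 Gb/s, sufficient for intrusion detection and spam filtering on gigabit networks.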

9.

Accelerating Application-Layer Protocol Identification Based on Regular Expressions

范慧萍宣蕾陈曙晖黄高平《计算机研究与发展》2008,45(Z1):438

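In today's networks, traditional port-based protocol identification is increasingly unable to meet requirements. This work uses regular expressions for protocol identification and optimizes both matching correctness and speed. Converting the NFA matching engine into a DFA matching engine not only reduces the number of states but also increases matching speed. Three matching modes are proposed, tested, and compared, and are combined with a One-Pass scanning algorithm. Tests on the DARPA dataset confirm that, after acceleration, matching correctness is higher than that of L7-filter and matching speed reaches 6.5 times that of L7-filter.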
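
Payload-based protocol identification with regular expressions can be sketched with Python's standard `re` module. The signatures below are simplified, illustrative patterns and are not the L7-filter rules; combining them into one alternation with named groups is only a rough analogue of scanning each payload once, not the paper's One-Pass algorithm. `SIGNATURES`, `COMBINED`, and `identify` are hypothetical names.

```python
import re

# Illustrative signatures only; these are NOT the L7-filter rules.
SIGNATURES = {
    "http": rb"(?:GET|POST|HEAD) \S+ HTTP/1\.[01]",
    "ssh": rb"SSH-\d\.\d+",
    "smtp": rb"220 \S+ E?SMTP",
}

# One combined pattern, so each payload is scanned a single time.
COMBINED = re.compile(
    b"|".join(
        b"(?P<" + name.encode("ascii") + b">" + sig + b")"
        for name, sig in SIGNATURES.items()
    )
)


def identify(payload):
    """Return the name of the first signature matching the start of payload."""
    m = COMBINED.match(payload)
    return m.lastgroup if m else None


labels = [
    identify(b"GET /index.html HTTP/1.1\r\n"),
    identify(b"SSH-2.0-OpenSSH_8.9"),
    identify(b"\x16\x03\x01\x02\x00"),
]
```

The inner alternations use non-capturing groups so that `lastgroup` always reports the named protocol group.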

10.

Performance Analysis of Regular Expression Matching Engines

邓凯元姜磊《计算机与现代化》2011,(7):105-107,110

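Regular expressions are highly expressive, and regular expression matching is used very widely in computing. Several regular expression matching engines now exist, and in practice different engines perform differently depending on the rule set and the regular expression syntax involved. This paper benchmarks four commonly used regular expression matching engines (PCRE, Greta, Boost, and RE2), reports their matching speed under different regular expression syntax, and analyzes in depth which engine suits which environment. The results can guide the choice of a regular expression library in practical system design.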

11.

A Regular Expression Grouping Algorithm for Deep Packet Inspection

杜江王希《电脑与微电子技术》2012,(12):3-5,9

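Current deep packet inspection algorithms usually need to convert regular expressions into an NFA or a DFA. However, as network bandwidth keeps increasing, the memory occupied by NFA and DFA states grows ever larger, severely testing the system's storage capacity. To address this problem, a grouping algorithm based on the affinity (相性) between regular expressions is proposed for grouping the expressions. Experiments show that the algorithm reduces the number of NFA and DFA states and improves matching efficiency.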

12.

A memory-based NFA regular expression match engine for signature-based intrusion detection

Derek Pao Nga Lam Or Ray C.C. Cheung 《Computer Communications》2013,36(10-11):1255-1267

Signature-based intrusion detection is required to inspect network traffic at wire-speed. Matching packet payloads against patterns specified with regular expression is a computation intensive task. Hence, the design of hardware accelerator to speed up regular expression matching has been an active research area. A systematic approach to detect regular expression is based on finite automaton. The space-time trade-off between deterministic finite automaton (DFA) and non-deterministic finite automaton (NFA) is well-known. DFA can offer constant throughput but it may suffer from the state explosion problem. Hence, implementation of DFA for large pattern sets on embedded device with limited on-chip memory may not be viable. NFA requires linear space but the throughput can be very low. Implementations of NFA with hardwired circuits can overcome the speed deficiency by exploiting the massive parallelism offered by dedicated hardware circuitries, but this approach does not support efficient dynamic updates. In this paper, we shall present a memory-based architecture for the implementation of NFA to speed up regular expression matching for signature-based intrusion detection. The proposed method supports dynamic updates and offers constant throughput so that it can be used to supplement the existing DFA-based methods in handling large pattern sets.

13.

Minimized Thompson NFA

《International Journal of Computer Mathematics》2012,89(9):1097-1106

The problem of converting a regular expression to nondeterministic finite automaton (NFA) is a fundamental problem that has been well studied. However, the two basic construction algorithms: (1) Thompson, (2) McNaughton–Yamada and Glushkov, both have disadvantages. In this article: first, a ‘smart’ parsing algorithm is developed which constructs a parse tree with at most (3l − 1) nodes from a regular expression with l literals; second, we propose an algorithm that works on the resulting NFA from Thompson's construction, eliminating as many auxiliary states as possible while maintaining Thompson's properties. It is shown that the resulting NFA is minimized. This means that no auxiliary states can be eliminated without violating the defining properties of Thompson NFA. The time and space requirements for the above algorithms are linear with respect to the length of the regular expression.
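
For orientation, a Thompson-style construction can be sketched as below. This is an assumption-laden illustration, not the article's parsing or minimization algorithm: the input is a hand-written tuple AST rather than a parsed expression, and concatenation glues two fragments with an ε-edge instead of merging states as the textbook construction does, so state counts differ from the classical figures. The name `thompson` and the AST node names are hypothetical.

```python
import itertools


def thompson(ast):
    """Build a Thompson-style epsilon-NFA from a tuple AST.

    AST nodes: ("lit", ch), ("cat", x, y), ("alt", x, y), ("star", x).
    Returns (start, accept, sym_edges, eps_edges, n_states).
    """
    counter = itertools.count()
    sym_edges, eps_edges = [], []

    def build(node):
        kind = node[0]
        if kind == "lit":
            s, t = next(counter), next(counter)
            sym_edges.append((s, node[1], t))
            return s, t
        if kind == "cat":
            s1, t1 = build(node[1])
            s2, t2 = build(node[2])
            eps_edges.append((t1, s2))  # glue the two fragments together
            return s1, t2
        if kind == "alt":
            s1, t1 = build(node[1])
            s2, t2 = build(node[2])
            s, t = next(counter), next(counter)  # two auxiliary states
            eps_edges.extend([(s, s1), (s, s2), (t1, t), (t2, t)])
            return s, t
        if kind == "star":
            s1, t1 = build(node[1])
            s, t = next(counter), next(counter)  # two auxiliary states
            eps_edges.extend([(s, s1), (t1, s1), (s, t), (t1, t)])
            return s, t
        raise ValueError(f"unknown node kind: {kind!r}")

    start, accept = build(ast)
    return start, accept, sym_edges, eps_edges, next(counter)


def lit(ch):
    return ("lit", ch)


# AST for (a|b)*abb
ast = ("cat", ("cat", ("cat", ("star", ("alt", lit("a"), lit("b"))), lit("a")), lit("b")), lit("b"))
start, accept, sym_edges, eps_edges, n_states = thompson(ast)
```

Every AST node adds at most two states, so the automaton grows linearly with the expression. The states introduced by `alt` and `star` carry only ε-edges, which makes them the kind of auxiliary state a reduction step would target.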

14.

Optimising unicode regular expression evaluation with previews

Howard Chivers 《Software》2017,47(5):669-688

The jsre regular expression library was designed to provide fast matching of complex expressions over large input streams using user‐selectable character encodings. An established design approach was used: a simulated non‐deterministic automaton (NFA) implemented as a virtual machine, avoiding exponential cost functions in either space or time. A deterministic automaton (DFA) was chosen as a general dispatching mechanism for Unicode character classes, and this also provided the opportunity to use compact DFAs in various optimization strategies. The result was the development of a regular expression Preview which provides a summary of all the matches possible from a given point in a regular expression in a form that can be implemented as a compact DFA and can be used to further improve the performance of the standard NFA simulation algorithm. This paper formally defines a preview and describes and evaluates several optimizations using this construct. They provide significant speed improvements accrued from fast scanning of anchor positions, avoiding retesting of repeated strings in unanchored searches and efficient searching of multiple alternate expressions which in the case of keyword searching has a time complexity which is logarithmic in the number of words to be searched. Copyright © 2016 John Wiley & Sons, Ltd.

15.

Mergible states in large NFA

《Theoretical computer science》2005,330(1):23-34

Quite often, trivial problems stated for deterministic finite automata (DFA) are surprisingly difficult for the non-deterministic case (NFA). In any non-minimal DFA for a given regular language, we can find two equivalent states which can be “merged” without changing the accepted language. This is not the case for NFA, where we can have non-minimal automata with no “mergible” states. In this paper, we prove a very basic result for NFA, that for a given regular language, any NFA of size greater than a computable constant must contain mergible states. Even more, we parameterized this constant in order to guarantee groups of an arbitrary number of mergible states.
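
The DFA half of this statement is easy to make concrete. The sketch below uses Moore-style partition refinement to find groups of equivalent (mergeable) states in a complete DFA; it illustrates only the deterministic case and is not the paper's NFA result. The function name, the dictionary encoding of the transition function, and the sample automaton are assumptions.

```python
def equivalence_classes(states, alphabet, delta, accepting):
    """Group the states of a complete DFA that accept exactly the same suffixes."""
    # Start from the accepting / non-accepting split and refine until stable.
    block = {s: int(s in accepting) for s in states}
    while True:
        ids, refined = {}, {}
        for s in states:
            signature = (block[s], tuple(block[delta[s, c]] for c in alphabet))
            refined[s] = ids.setdefault(signature, len(ids))
        if len(set(refined.values())) == len(set(block.values())):
            break  # no block was split, so the partition is stable
        block = refined
    groups = {}
    for s in states:
        groups.setdefault(refined[s], []).append(s)
    return sorted(groups.values())


# Non-minimal DFA for "strings over {a, b} that end in a"; state 2 duplicates state 0.
delta = {
    (0, "a"): 1, (0, "b"): 2,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 0,
}
classes = equivalence_classes([0, 1, 2], "ab", delta, accepting={1})
```

Any class with more than one member can be collapsed into a single state without changing the accepted language, which is exactly the property the abstract says can fail for NFA.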

16.

Transition complexity of language operations

Michael Domaratzki Kai Salomaa 《Theoretical computer science》2007

17.

A Direct Method for Converting a Class of NFAs to DFAs

程元斌《计算机系统应用》2012,21(10):109-113

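Determinizing NFAs is of great theoretical and practical importance. To date, the subset construction has generally been used to convert an NFA (non-deterministic finite automaton) into a DFA (deterministic finite automaton), but this method requires introducing the empty input ε and the ε-closure of a state subset I, and its computation is relatively tedious. Moreover, during determinization the NFA state set is subject to repeated ε-closure computations and to repeated computations caused by evaluating non-ε transitions. This paper describes a method for converting a class of NFAs directly into DFAs. The method does not need to introduce the empty input ε: an equivalent DFA state diagram or state transition table can be derived directly from the original NFA state diagram or state transition table, and every state is a single state rather than a set of states, which makes software and hardware implementation and testing easier.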
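
For reference, the classical subset construction with ε-closures, which this abstract treats as the baseline, can be sketched as follows. This is the textbook procedure and not the direct conversion method the paper proposes; the function names, the dictionary encoding of the automaton, and the sample NFA for `a?b` are assumptions.

```python
def epsilon_closure(states, eps):
    """All NFA states reachable from `states` using only epsilon-edges."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def subset_construction(start, accepting, delta, eps, alphabet):
    """delta maps (state, symbol) -> set of states; eps maps state -> set of states."""
    d_start = epsilon_closure({start}, eps)
    d_delta, worklist, seen = {}, [d_start], {d_start}
    while worklist:
        subset = worklist.pop()
        for sym in alphabet:
            moved = set()
            for s in subset:
                moved |= delta.get((s, sym), set())
            target = epsilon_closure(moved, eps)  # closure is recomputed per subset
            d_delta[subset, sym] = target
            if target not in seen:
                seen.add(target)
                worklist.append(target)
    d_accepting = {q for q in seen if q & accepting}
    return d_start, d_accepting, d_delta, seen


def accepts(d_start, d_accepting, d_delta, text):
    state = d_start
    for ch in text:
        state = d_delta[state, ch]
    return state in d_accepting


# epsilon-NFA for a?b: 0 --eps--> 1, 0 --a--> 1, 1 --b--> 2
d_start, d_acc, d_delta, d_states = subset_construction(
    start=0, accepting={2}, delta={(0, "a"): {1}, (1, "b"): {2}}, eps={0: {1}}, alphabet="ab"
)
```

Each DFA state here is a set of NFA states (including the empty set as a dead state), and the closure is recomputed for every new subset, which is the overhead the abstract describes.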

18.

Concise representations of regular languages by degree and probabilistic finite automata

Chandra M. R. Kintala Kong-Yee Pun Detlef Wotschke 《Theory of Computing Systems》1993,26(4):379-395

Meyer and Fischer [MF] proved that nondeterministic finite automata (NFA) can be exponentially more concise than deterministic finite automata (DFA) in their representations of regular languages. Several variants of that basic finite state machine model are now being used to analyze parallelism and to build real-time software systems [HL+]. Even though these variants can sometimes represent regular languages in a more concise manner than NFA, the underlying models fundamentally differ from NFA in how they operate. Degree automata [W] (DA), however, differ from NFA only in their acceptance criteria and accept only regular languages. We show here that DA are also exponentially more concise than NFA on some sequences of regular languages. We also show that the conciseness of probabilistic automata [R] with isolated cutpoints can be unbounded over DA and, concurrently, i.e., over the same sequence of languages, those DA can be exponentially more concise than NFA. Detlef Wotschke was supported in part by Deutsche Forschungsgemeinschaft under Grant No. Wo 334/2-1 and by Stiftung Volkswagenwerk under Grant No. II/62 325.
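
The exponential gap between NFA and DFA can be observed on a standard textbook witness, which is not necessarily the language family used in the cited proofs: strings over {a, b} whose k-th symbol from the end is a. An NFA needs only k + 1 states, while determinizing it reaches 2^k distinct state sets. The sketch below counts those sets, encoding each set of NFA states as a bitmask; `count_reachable_dfa_states` is a hypothetical name.

```python
from collections import deque


def count_reachable_dfa_states(k):
    """Count the state sets reached when determinizing the (k+1)-state NFA for
    'the k-th symbol from the end is a' over the alphabet {a, b}."""
    full = (1 << (k + 1)) - 1  # bit i set means NFA state i is active
    start = 1                  # only NFA state 0 is active
    seen = {start}
    queue = deque([start])
    while queue:
        active = queue.popleft()
        for ch in "ab":
            nxt = 1                             # state 0 loops on every symbol
            nxt |= ((active & ~1) << 1) & full  # states 1..k-1 advance by one
            if ch == "a":
                nxt |= 2                        # state 0 guesses this 'a' is k-th from the end
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)


counts = [count_reachable_dfa_states(k) for k in range(1, 9)]
```

The DFA has to remember which of the last k symbols were a, and that is 2^k distinct pieces of information, so the count doubles with each increment of k while the NFA gains a single state.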