首页 | 本学科首页   官方微博 | 高级检索  
     

基于多字符DFA的高速正则表达式匹配算法
引用本文:贺炜,郭云飞,莫涵,扈红超.基于多字符DFA的高速正则表达式匹配算法[J].计算机应用,2013,33(8):2370-2374.
作者姓名:贺炜  郭云飞  莫涵  扈红超
作者单位:国家数字交换系统工程技术研究中心,郑州 450002
基金项目:国家科技支撑计划项目;国家863计划项目
摘    要:基于确定性有限自动机(DFA)的传统正则表达式匹配方法存在单周期处理单字符的速度瓶颈。为提升处理速率,提出一种单周期处理多字符的匹配算法MC-DFA,该算法基于DFA实现,支持匹配位置的精确定位。MC-DFA将传统DFA中的单字符跳转合并为多字符跳转,实现了单周期处理多个输入字符。通过状态转移矩阵二阶压缩算法,MC-DFA分别对矩阵行内以及行间冗余进行消除,减少了内存使用。300条规则下,单周期处理8字符时,MC-DFA吞吐率能够达到7.88Gb/s,内存占用小于6MB,预处理时间为19.24s。实验结果表明,MC-DFA能够有效提升系统吞吐率,并且保证内存占用在可接受范围之内,性能优于现有正则表达式匹配算法。

关 键 词:正则表达式    高速    多字符    精确定位    矩阵压缩
收稿时间:2013-02-20
修稿时间:2013-03-12

Multi-character DFA-based high speed regular expression matching algorithm
HE Wei GUO Yunfei MO Han HU Hongchao.Multi-character DFA-based high speed regular expression matching algorithm[J].journal of Computer Applications,2013,33(8):2370-2374.
Authors:HE Wei GUO Yunfei MO Han HU Hongchao
Affiliation:China National Digital Switching System Engineering and Technological R&D Center, Zhengzhou Henan 450002, China
Abstract:Traditional Deterministic Finite Automata (DFA) based regular expression matching can only process one character per cycle, which is the speed bottleneck. A new algorithm named Multi-Character DFA (MC-DFA) was proposed for high throughput matching and precise positioning. It combined the one character transition in traditional DFA together to handle multi-character processing per cycle. A new transition matrix compress algorithm was also proposed to reduce the redundancy introduced by MC-DFA. The result demonstrates that MC-DFA can improve the throughput efficiently while requiring acceptable memory. For a set of 300 regexes, 8C-DFA obtains a throughput of 7.88Gb/s, memory usage less than 6MB and 19.24s preprocessing time, better than traditional methods.
Keywords:regular expression                                                                                                                          high throughput                                                                                                                          multi-character                                                                                                                          precise positioning                                                                                                                          matrix compress
点击此处可从《计算机应用》浏览原始摘要信息
点击此处可从《计算机应用》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号