首页 | 本学科首页   官方微博 | 高级检索  
     

HybridFA:一种基于统计的AC自动机空间优化技术
引用本文:熊 刚,何慧敏,于 静,刘燕兵,郭 莉.HybridFA:一种基于统计的AC自动机空间优化技术[J].通信学报,2015,36(7):31-39.
作者姓名:熊 刚  何慧敏  于 静  刘燕兵  郭 莉
作者单位:1. 中国科学院 信息工程研究所,北京 100093;2. 中国移动(深圳)有限公司,深圳 518031
基金项目:中国科学院战略性科技先导专项基金资助项目(XDA06030602);国家高技术研究发展计划(“863”计划)基金资助项目(2011AA010703);国家自然科学基金青年基金资助项目(61202477)
摘    要:针对高级Aho-Corasick (AC)自动机为提高串匹配速度而造成的空间浪费问题,研究发现数据流对自动机节点的访问规律,据此提出基于数据访问特征的混合自动机构建算法HybridFA。分别研究了基于访问频率、访问层次以及结合上述2种特征对AC自动机的部分节点实现完全化的算法。在Snort、ClamAV、URL等真实数据集上的实验结果表明,HybridFA算法的存储空间低于高级AC自动机的5%。此外,结合访问频率和访问层次的改进算法在保证匹配速度的同时具有更强的数据适应性。

关 键 词:多模式串匹配  空间优化  高级AC自动机  统计策略  节点完全化
收稿时间:7/8/2014 12:00:00 AM

HybridFA: a memory reduction technique for the AC automata based on statistics
Gang XIONG,Hui-min HE,Jing YU,Yan-bing LIU,Li GUO.HybridFA: a memory reduction technique for the AC automata based on statistics[J].Journal on Communications,2015,36(7):31-39.
Authors:Gang XIONG  Hui-min HE  Jing YU  Yan-bing LIU  Li GUO
Affiliation:1. Institute of Information Engineering,Chinese Academy of Science,Beijing 100093,China;2. China Mobile Group,Guangdong Co.Ltd.,Shenzhen Branch,Shenzhen 518031,China
Abstract:Despite the fast speed in multiple string matching tasks, the advanced Aho-Corasick(AC) automata wastes storage memory to a great extent. Study indicated that the automata states have specific statistical access characteristics in practice. Accordingly, a series of algorithms based on statistical characteristics for building hybrid finite automata, named HybridFA, are proposed. This work completes partial states of the AC automata according to different features, including access frequency, state hierarchy, and combined characteristics respectively. Experimental results on the real-world datasets like Snort, ClamAV, and URL show that the storage space of HybridAC is reduced to less than 5% of the space cost by the advanced AC automata. Furthermore, HybridFA based on combined characteristics achieves the superior performance on matching speed and robustness comparing to other proposed algorithms.
Keywords:multiple string matching  memory reduction  advanced AC automata  statistical strategy  state completing
点击此处可从《通信学报》浏览原始摘要信息
点击此处可从《通信学报》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号