1.

杨毅夫刘燕兵刘萍郭莉《计算机工程》2009,35(21):39-41

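Automata are a common data structure in string matching algorithms, and storing them compactly saves space. This paper surveys common compact storage methods for automata, analyzes their principles, time efficiency, space efficiency, strengths and weaknesses, and describes how each method relates to data sparsity. The basic AC algorithm is implemented with compact storage, and experiments on random and real data show that the approach is effective.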
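
As an illustration of why sparsity matters, the following minimal sketch builds a basic Aho-Corasick (AC) automaton whose transitions are stored as one small dictionary per state instead of a full table with one entry per alphabet symbol. This is only one simple form of sparse storage and is not claimed to be any of the methods surveyed in the paper; the names `build_ac` and `ac_search` are hypothetical, and patterns are assumed to be non-empty.

```python
from collections import deque


def build_ac(patterns):
    """Build goto/fail/output tables; goto is sparse (one dict per state)."""
    goto, fail, out = [{}], [0], [[]]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(p)
    # Breadth-first pass: depth-1 states keep fail = 0 (the root).
    queue = deque(goto[0].values())
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            # Inherit the patterns recognised at the failure target.
            out[t] = out[t] + out[fail[t]]
    return goto, fail, out


def ac_search(text, patterns):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, out = build_ac(patterns)
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]          # follow failure links on a mismatch
        s = goto[s].get(ch, 0)
        for p in out[s]:
            hits.append((i - len(p) + 1, p))
    return hits
```

With dictionaries, memory grows with the number of transitions that actually exist, at the cost of a hash lookup per step; a dense table makes the opposite trade.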

2.

A fast string matching algorithm based on finite automata

陈倩《计算机技术与发展》2009,19(1)

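String matching is one of the basic operations on strings, so designing an efficient algorithm for it is worthwhile. Based on finite automaton theory and an analysis of the classic K.M.P. algorithm, the paper proposes a fast string matching algorithm. The algorithm matches by means of the automaton's state transition table, which avoids backtracking along failure links while scanning the text and thereby speeds up the algorithm. Theoretical analysis and experimental results both show that, when the text is long and partial-match failures in the pattern trigger many failure-link fallbacks, the algorithm is clearly faster than K.M.P. In terms of space complexity, however, it requires more storage.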
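
The trade-off described here can be sketched with the textbook construction of a full transition table for a single pattern: each text character costs exactly one table lookup and no failure link is followed at match time, but the table holds one entry per (state, symbol) pair. This is a generic sketch, not the paper's own algorithm; `build_dfa` and `dfa_find_all` are hypothetical names, and the table is restricted to symbols that occur in the pattern (any other symbol resets to state 0).

```python
def build_dfa(pattern):
    """Return a list of dicts: dfa[state][symbol] -> next state."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m = len(pattern)
    alphabet = set(pattern)
    dfa = [dict() for _ in range(m + 1)]
    dfa[0][pattern[0]] = 1
    x = 0  # state reached after reading pattern[1:j] (the "restart" state)
    for j in range(1, m + 1):
        for c in alphabet:
            dfa[j][c] = dfa[x].get(c, 0)      # mismatch: behave like restart state
        if j < m:
            dfa[j][pattern[j]] = j + 1        # match: advance one state
            x = dfa[x].get(pattern[j], 0)
    return dfa


def dfa_find_all(text, pattern):
    """Return start indices of all (possibly overlapping) occurrences."""
    dfa = build_dfa(pattern)
    m = len(pattern)
    state, hits = 0, []
    for i, ch in enumerate(text):
        state = dfa[state].get(ch, 0)         # one lookup per text character
        if state == m:
            hits.append(i - m + 1)
    return hits
```

The table has (m + 1) rows with one entry per pattern symbol, which is the extra storage the abstract refers to.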

3.

A fast string matching algorithm for modern web search engines

吴楠朱怀宏夏黎春《计算机与现代化》2003,(11):7-8,11

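Proposes a fast multi-pattern string matching algorithm based on deterministic finite automata that skips as many characters of the searched text as possible during matching. This property gives strong software support for compound-condition queries in modern web search engines. Experiments show that it can effectively improve search engine performance.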

4.

Application of automaton algorithms to inexact text matching

叶建伟季振洲胡铭曾《计算机工程与应用》2004,40(6):135-136,154

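From a practical standpoint, the paper studies the application of finite automaton algorithms to inexact text matching. It proposes one approach to constructing an automaton for exact matching of Chinese text, two approaches to constructing automata for Chinese homophone matching, and a method that uses automaton principles to remove the interference of irrelevant characters in text matching. The three automaton algorithms were implemented and tested, and performance data are given for each.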

5.

A hardware-oriented multi-pattern string matching algorithm and its chained implementation

张小山赵国鸿王勇军《计算机工程》2006,32(15):242-244

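Describes a simple and effective hardware-oriented multi-pattern string matching algorithm that is easy to implement in hardware. The main idea is to exploit hardware parallelism: every character of every pattern is compared with the incoming input character at the same time, and the match information from the previous round is then used iteratively to produce the result of the current round. Based on this algorithm, a chained matching structure was designed and implemented in logic on an FPGA chip, and the design is evaluated against the experimental results.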

6.

A multi-pattern string matching algorithm based on storage optimization

刘燕兵刘萍谭建龙郭莉《计算机研究与发展》2009,46(10):1768-1776

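Multi-pattern string matching is one of the core technologies of network content filtering systems. The automaton's storage footprint and cache performance are key factors in the speed of multi-pattern string matching. As the pattern set grows, the automaton's huge storage overhead causes a sharp drop in the performance of existing algorithms. Starting from the idea of compressing storage to raise the cache hit rate, the paper proposes an optimization of the classic SBOM algorithm: it replaces the Factor Oracle structure in SBOM with a Suffix Tree, prunes the Suffix Tree down to nearly linear space complexity, and then represents it as a double-array Trie to compress storage. Compared with SBOM, the improved algorithm not only saves storage effectively but also significantly increases matching speed, which makes it well suited to online high-speed matching.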

7.

A fast string matching algorithm for XML content filtering

刘萍谭建龙《中文信息学报》2005,19(2):21-28

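This paper proposes XMatch, an algorithm for fast string matching on XML text. When matching patterns that carry path information against XML text, the structured nature of XML means that traditional string matching algorithms cannot be applied directly and effectively, and most existing XML content filtering methods are event-driven processes based on SAX parsing, which are generally inefficient. XMatch analyzes the structure (schema) of the XML text and, combining it with the path information of the patterns, builds a finite state automaton that serves as the scanning automaton. The algorithm also supports matching patterns whose path information contains circular references. XMatch is easy to extend and can support string matching on general structured text. Experimental results show that it is markedly more efficient than the SAX event-driven approach.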

8.

Fuzzy string matching algorithms and their applications

苗兰芳杨传斌《小型微型计算机系统》1996,17(10):72-76

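This paper analyzes several types of fuzzy strings and how they are represented, gives a fuzzy matching algorithm together with its C source code, and describes an example application in a text editor.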

9.

PARA-AC: a high-performance matching algorithm based on the AC automaton

熊仁都;杨嘉佳;朱广宇;唐球;隋然《电子技术应用》2020,(11):87-90+95

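Because of its low matching performance, the original AC automaton cannot meet the demand for real-time matching of large-scale signature strings in today's big-data environments. To address this, the paper proposes a multithreaded acceleration algorithm for multi-pattern string matching, called PARA-AC (Parallel Aho-Corasick automaton). The algorithm cuts the string to be matched into several substrings plus several sets of boundary characters around the cut points, and feeds both into a thread pool for matching, thereby parallelizing the processing. Experimental results show that PARA-AC significantly increases matching speed, reaching about 13.91 times that of the original AC automaton.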
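
The chunk-plus-boundary idea can be sketched as follows. The abstract does not say how the boundary character sets are built, so the window used here (the longest pattern length minus one character on each side of a cut) is an assumption, as are the names `find_in` and `parallel_search`. For brevity each task uses `str.find` rather than an AC automaton, and in CPython threads mainly illustrate the partitioning logic rather than deliver a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def find_in(text, patterns, lo, hi):
    """All (start, pattern) occurrences lying entirely inside text[lo:hi]."""
    hits = []
    for p in patterns:
        pos = text.find(p, lo, hi)
        while pos != -1:
            hits.append((pos, p))
            pos = text.find(p, pos + 1, hi)
    return hits


def parallel_search(text, patterns, n_chunks=4):
    patterns = [p for p in patterns if p]
    if not text or not patterns:
        return []
    longest = max(len(p) for p in patterns)
    size = -(-len(text) // n_chunks)               # ceiling division
    cuts = list(range(size, len(text), size))
    bounds = [0] + cuts + [len(text)]
    # One task per chunk, plus one small window around every cut point.
    tasks = [(a, b, None) for a, b in zip(bounds, bounds[1:])]
    tasks += [(max(0, c - longest + 1), min(len(text), c + longest - 1), c)
              for c in cuts]

    def run(task):
        lo, hi, cut = task
        hits = find_in(text, patterns, lo, hi)
        if cut is not None:
            # Keep only matches that actually straddle the cut.
            hits = [(pos, p) for pos, p in hits if pos < cut < pos + len(p)]
        return hits

    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(run, tasks))
    return sorted({h for part in parts for h in part})
```

A match that lies inside one chunk is reported by that chunk's task; a match that crosses a cut is reported only by the boundary task, so no occurrence is lost or counted twice.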

10.

A new adaptive multi-string matching algorithm

宋云龙际珍李峰刘振海《计算机工程与应用》2009,45(6):98-100

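Which multi-string matching algorithm is most effective depends on the number of keywords, the shortest keyword length, and the alphabet size. The newly proposed Adapted Multiple Strings Matching Algorithm (AMSM) corrects the imprecise jump computation of the Oracle tree in the SBOM algorithm, and adopts the block-jump strategy of the WuManber algorithm together with a comparison strategy based on a compressed Oracle tree. This improves performance and makes the algorithm applicable across these settings as a general-purpose multi-string (multi-pattern) matching algorithm.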

11.

Giant complete automaton for uncertain multiple string matching and its high speed construction algorithm

HU Yue GAO QingShi GUO Li & WANG PeiFeng School of Information Engineering University of Science Technology Beijing Beijing China 《中国科学:信息科学(英文版)》2011,(8):1562-1571

Multiple string matching is often completed under the presence of U- or V-uncertain-strings, or combinations thereof. Recognizing large numbers of strings with U-, V-, and U-V-uncertain-strings, including the interleaving of two or more uncertain strings, is important to thoroughly gathering useful information and detecting harmful information. This paper proposes a complete automaton and its high-speed construction algorithm for large-scale U-, V-, and U-V-uncertain multiple strings, including two or more uncertai...

12.

An improved multi-pattern string matching algorithm

蔡晓妍戴冠中杨黎斌《计算机应用》2007,27(6):1415-1417

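Building on the classic AC multi-pattern string matching algorithm and combining it with the advantages of the BMH algorithm, the paper proposes a fast multi-pattern string matching algorithm. In general the algorithm does not need to examine every character of the target text; it skips as many characters as possible before the actual comparison, which reduces the number of character comparisons and speeds up matching. It performs well with both long and short patterns. Experiments show that with short patterns it needs only 50% to 30% of the time taken by the AC algorithm, and with long patterns 26.7% to 15.2%.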

13.

An aggressive algorithm for multiple string matching (cited by: 1; self-citations: 0; citations by others: 1)

Liuling Dai 《Information Processing Letters》2009,109(11):553-559

A new algorithm based on the Wu-Manber algorithm for multiple string matching is presented in this paper. The algorithm eliminates the functional overlap of the tables HASH and SHIFT, and computes the shift distances in an aggressive manner. After each test, the algorithm examines the character next to the scan window to maximize the shift distance. This idea is consistent with that of the quick-search (QS) algorithm. Experimental results on four alphabets show that the new algorithm is more efficient than Wu-Manber and other recent algorithms, particularly on short pattern sets and large alphabets.
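
The quick-search idea mentioned in the abstract can be sketched for a single pattern: after each attempt, the character immediately to the right of the window decides how far the window moves. This is the classic single-pattern quick-search rule, shown only to illustrate that idea; it is not the multi-pattern algorithm of the paper, and `quick_search` is a hypothetical name.

```python
def quick_search(text, pattern):
    """Return start indices of all occurrences of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    # Shift for a symbol = distance from its rightmost occurrence in the
    # pattern to one past the pattern end (later indices overwrite earlier).
    shift = {ch: m - i for i, ch in enumerate(pattern)}
    hits, pos = [], 0
    while pos + m <= n:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        if pos + m == n:
            break                      # no character to the right of the window
        # Symbols absent from the pattern allow the maximal shift m + 1.
        pos += shift.get(text[pos + m], m + 1)
    return hits
```

Because the shift depends on a character outside the window, it can reach m + 1, one more than a shift based on the last window character.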

14.

On building minimal automaton for subset matching queries

Kimmo Fredriksson 《Information Processing Letters》2010,110(24):1093-1098

We address the problem of building an index for a set D of n strings, where each string location is a subset of some finite integer alphabet of size σ, so that we can answer efficiently if a given simple query string (where each string location is a single symbol) p occurs in the set. That is, we need to efficiently find a string d∈D such that p[i]∈d[i] for every i. We show how to build such an index in O(n log_{σ/Δ}(σ) log(n)) average time, where Δ is the average size of the subsets. Our methods have applications e.g. in computational biology (haplotype inference) and music information retrieval.

15.

Efficient parameterized string matching

Kimmo Fredriksson Maxim Mozgovoy 《Information Processing Letters》2006,100(3):91-96

In parameterized string matching the pattern P matches a substring t of the text T if there exists a bijective mapping from the symbols of P to the symbols of t. We give simple and practical algorithms for finding all such pattern occurrences in sublinear time on average. The algorithms work for single and multiple patterns.
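
The matching condition itself can be made concrete with a naive check that tries every window and verifies that a one-to-one symbol mapping exists. This runs in O(nm) time and is only a definition-level sketch, not one of the sublinear average-time algorithms from the paper; `p_match` and `find_p_matches` are hypothetical names, and every symbol is treated as a parameter symbol.

```python
def p_match(pattern, window):
    """True if a bijection maps pattern symbols onto window symbols."""
    if len(pattern) != len(window):
        return False
    fwd, bwd = {}, {}
    for a, b in zip(pattern, window):
        # setdefault records the first pairing and returns it afterwards,
        # so any conflicting later pairing is detected in either direction.
        if fwd.setdefault(a, b) != b or bwd.setdefault(b, a) != a:
            return False
    return True


def find_p_matches(text, pattern):
    """Start indices of all windows of text that parameterized-match pattern."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1)
            if p_match(pattern, text[i:i + m])]
```

For example, the pattern `aba` matches the window `xyx` (a to x, b to y) but not `xzz`, because `a` would have to map to both `x` and `z`.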

16.

A fast matching algorithm for large-scale signature sets (cited by: 1; self-citations: 0; citations by others: 1)

李雪薛一波王春露《计算机工程与应用》2007,43(34):168-170

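Proposes SRS, a fast matching algorithm for large-scale signature sets. The algorithm performs very well: with a signature set of 100,000 entries, its matching speed is more than 10 times that of classic algorithms. It is applicable to content filtering, antivirus, anti-spam, SMS filtering, network intrusion detection and prevention, and many other fields.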

17.

Approximate string matching with suffix automata

Esko Ukkonen Derick Wood 《Algorithmica》1993,10(5):353-364

The approximate string matching problem is, given a text string, a pattern string, and an integer k, to find in the text all approximate occurrences of the pattern. An approximate occurrence means a substring of the text with edit distance at most k from the pattern. We give a new O(kn) algorithm for this problem, where n is the length of the text. The algorithm is based on the suffix automaton with failure transitions and on the diagonalwise monotonicity of the edit distance table. Some experiments showing that the algorithm has a small overhead are reported.
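
For reference, the problem statement can be made concrete with the standard dynamic-programming baseline, which fills the edit distance table column by column in O(mn) time for a pattern of length m. This is not the O(kn) suffix-automaton algorithm of the paper; it only shows what an approximate occurrence is. The name `approx_find` is hypothetical, and the function reports the index of the last text character of each occurrence.

```python
def approx_find(text, pattern, k):
    """End indices j such that some substring of text ending at j is
    within edit distance k of pattern."""
    m = len(pattern)
    # col[i] = min edit distance between pattern[:i] and a substring of the
    # text ending at the current position (initially the empty string).
    col = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text):
        prev_diag = col[0]             # table value at (i - 1, j - 1)
        # col[0] stays 0: an occurrence may start at any text position.
        for i in range(1, m + 1):
            cur = min(prev_diag + (pattern[i - 1] != ch),   # match / substitute
                      col[i] + 1,                            # skip a text char
                      col[i - 1] + 1)                        # skip a pattern char
            prev_diag = col[i]
            col[i] = cur
        if col[m] <= k:
            ends.append(j)
    return ends
```

With k = 0 the function reduces to exact matching and reports where each exact occurrence ends.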

18.

An improved BM string matching algorithm

李韦男虞慧群《计算机工程与应用》2014,50(16):104-108

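Classic string matching algorithms all essentially compare characters sequentially, either from left to right or from right to left. They are inefficient when the text contains many substrings that share a prefix or suffix with the pattern, and the maximum right shift of the pattern is the pattern length. The improved algorithm matches the string by bisection, which effectively avoids the pointless comparisons caused by such substrings. The shift distance of the pattern is computed with an improved bad-character rule, which increases the shift. Experimental results show that the improved algorithm effectively reduces the number of comparisons and the number of shifts, and so improves efficiency.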

19.

A lower-variance randomized algorithm for approximate string matching

Mikhail J. Atallah Elena Grigorescu Yi Wu 《Information Processing Letters》2013,113(18):690-692

20.

Fast string matching for multiple searches

Peter Fenwick 《Software》2001,31(9):815-833

We present a string matching or pattern matching method which is especially useful when a single block of text must be searched repeatedly for different patterns. The method combines linking the text according to digrams, searching on the least-frequent digram, and probing selected characters as a preliminary filter before full pattern comparison. Tests on real alphabetic data show that the number of character comparisons may be decreased by two orders of magnitude compared with Knuth-Morris-Pratt and similar searching, but with an initialization overhead comparable to five to ten conventional searches. Copyright © 2001 John Wiley & Sons, Ltd.