共查询到20条相似文献,搜索用时 0 毫秒
1.
Thierry Lecroq 《Information Processing Letters》2007,102(6):229-235
String matching is the problem of finding all the occurrences of a pattern in a text. We propose a very fast new family of string matching algorithms based on hashing q-grams. The new algorithms are the fastest on many cases, in particular, on small size alphabets. 相似文献
2.
q-gram matching is used for approximate substring matching problems in a wide range of application areas, including intrusion detection. In this paper, we present a tree-based model to perform fast linear time q-gram matching. All q-grams present in the text are stored in a tree structure similar to trie. We use a tree redundancy pruning algorithm to reduce the size of the tree without losing any information. We also use suffix links for fast q-gram search during query matching. We compare our work with the Rabin-Karp-based hash-table technique, commonly used for multiple q-gram search. We present results of experiments on system call sequence data used for intrusion detection. 相似文献
3.
G. De V. Smit 《Software》1982,12(1):57-66
Three string matching algorithms—straightforward, Knuth-Morris-Pratt and Boyer-Moor—re examined and their time complexities discussed. A comparison of their actual average behaviour is made, based on empirical data presented. It is shown that the Boyel-Moore algorithm is extremely efficient in most cases and that, contrary to the impression one might get from the analytical results, the Knuth-Morris-Pratt algorithm is not significantly better on the average than the straightforward algorithm. 相似文献
4.
Dr. K. -L. Chung 《Computing》1995,54(2):119-125
Given a run-length coded text of length 2n and a run-length coded pattern of length 2m,m?n commonly, this paper first presents anO(n+m) time sequential algorithm for string matching, then presents anO(1) time parallel algorithm on a two-dimensionalm×n mesh with a reconfigurable bus system. 相似文献
5.
This paper proposes new algorithms for fixed-length approximate string matching and approximate circular string matching under the Hamming distance. Fixed-length approximate string matching and approximate circular string matching are special cases of approximate string matching and have numerous direct applications in bioinformatics and text searching. Firstly, a counter-vector-mismatches (CVM) algorithm is proposed to solve fixed-length approximate string matching with k-mismatches. The development of CVM algorithm is based on the parallel summation of counters located in the same machine word. Secondly, a parallel counter-vector-mismatches (PCVM) algorithm is proposed to accelerate CVM algorithm in parallel. The PCVM algorithm is integrated into two-level parallelisms that exploit not only word-level parallelism but also data parallelism via parallel environments such as multi-core processors and graphics processing units (GPUs). In the particular case of adopting GPUs, a shared-mem parallel counter-vector-mismatches (PCVMsmem) scheme can be implemented from PCVM algorithm. The PCVMsmem scheme can exploit the memory model of GPUs to optimize performance of PCVM algorithm. Finally, this paper shows several methods to adopt CVM and PCVM algorithms in case the input pattern is in circular structure. In the experiments with real DNA packages, our proposed algorithms and scheme work greatly faster than previous bit-vector-mismatches and parallel bit-vector-mismatches algorithms. 相似文献
6.
《国际计算机数学杂志》2012,89(4):411-434
In this paper we present a short survey and experimental results for well known sequential string matching algorithms. We consider algorithms based on different approaches including classical, suffix automata, bit-parallelism and hashing. We put special emphasis on algorithms recently presented such as Shift-Or and BNDM algorithms. We compare these algorithms in terms of the number of character comparisons and the running time for four different types of text: binary alphabet, alphabet of size 8, English alphabet and DNA alphabet. 相似文献
7.
8.
Yang Yufen Lin Tao Zhao Liping Zhou Kailun Wang Shuhui 《Multimedia Tools and Applications》2022,81(2):2043-2063
Multimedia Tools and Applications - Screen contents have become a popular image type driven by the growing market for transferring display screen between devices, especially mobile devices. Due to... 相似文献
9.
Panagiotis D. Michailidis Konstantinos G. Margaritis 《Journal of Parallel and Distributed Computing》2007
Approximate string matching problem is a common and often repeated task in information retrieval and bioinformatics. This paper proposes a generic design of a programmable array processor architecture for a wide variety of approximate string matching algorithms to gain high performance at low cost. Further, we describe the architecture of the array and the architecture of the cell in detail in order to efficiently implement for both the preprocessing and searching phases of most string matching algorithms. Further, the architecture performs approximate string matching for complex patterns that contain don’t care, complement and classes symbols. We also simulate and evaluate the proposed architecture on a field programmable gate array (FPGA) device using the JHDL tool for synthesis and the Xilinx Foundation tools for mapping, placement, and routing. Finally, our programmable implementation achieves about 8–340 times faster execution than a desktop computer with a Pentium 4 3.5 GHz for all algorithms when the length of the pattern is 1024. 相似文献
10.
11.
Light-based string matching 总被引:1,自引:1,他引:0
Mihai Oltean 《Natural computing》2009,8(1):121-132
String matching is a very important problem in computer science. The problem consists in finding all the occurrences of a pattern P of length m in a text T of length n. We describe a special device which can do string matching by performing n−m + 1 text-to-pattern comparisons. The proposed device uses light and optical filters for performing computations. Two physical implementations are proposed. One of them uses colored glass and the other one uses polarizing filters. The strengths and the weaknesses of each method are deeply discussed. 相似文献
12.
Any string-matching algorithm requires at least linear time and a constant number of local storage locations. We design and analyze an algorithm which realizes both asymptotic bounds simultaneously. This can be viewed as completely eliminating the need for the tabulated “failure function” in the linear-time algorithm of Knuth, Morris, and Pratt. It makes possible a completely general implementation as a Fortran subroutine or even as a six-head finite automaton. 相似文献
13.
Bertrand Meyer 《Information Processing Letters》1985,21(5):219-227
The problem studied in this paper is to search a given text for occurrences of certain strings, in the particular case where the set of strings may change as the search proceeds.A well-known algorithm by Aho and Corasick applies to the simpler case when the set of strings is known beforehand and does not change. This algorithm builds a transition diagram (finite automaton) from the strings, and uses it as a guide to traverse the text. The search can then be done in linear time.We show how this algorithm can be modified to allow incremental diagram construction, so that new keywords may be entered at any time during the search. The incremental algorithm presented essentially retains the time and space complexities of the non-incremental one. 相似文献
14.
Approximate string matching is an important operation in information systems because an input string is often an inexact match to the strings already stored. Commonly known accurate methods are computationally expensive as they compare the input string to every entry in the stored dictionary. This paper describes a two-stage process. The first uses a very compact n-gram table to preselect sets of roughly similar strings. The second stage compares these with the input string using an accurate method to give an accurately matched set of strings. A new similarity measure based on the Levenshtein metric is defined for this comparison. The resulting method is both computationally fast and storage-efficient. 相似文献
15.
In parameterized string matching the pattern P matches a substring t of the text T if there exist a bijective mapping from the symbols of P to the symbols of t. We give simple and practical algorithms for finding all such pattern occurrences in sublinear time on average. The algorithms work for a single and multiple patterns. 相似文献
16.
Sebastian Deorowicz 《Information Processing Letters》2006,100(1):14-20
Finding the longest common subsequence (LCS) of two given sequences A=a0a1…am−1 and B=b0b1…bn−1 is an important and well studied problem. We consider its generalization, transposition-invariant LCS (LCTS), which has recently arisen in the field of music information retrieval. In LCTS, we look for the LCS between the sequences A+t=(a0+t)(a1+t)…(am−1+t) and B where t is any integer. We introduce a family of algorithms (motivated by the Hunt-Szymanski scheme for LCS), improving the currently best known complexity from O(mnloglogσ) to O(Dloglogσ+mn), where σ is the alphabet size and D?mn is the total number of dominant matches for all transpositions. Then, we demonstrate experimentally that some of our algorithms outperform the best ones from literature. 相似文献
17.
Tries for approximate string matching 总被引:1,自引:0,他引:1
Tries offer text searches with costs which are independent of the size of the document being searched, and so are important for large documents requiring spelling checkers, case insensitivity, and limited approximate regular secondary storage. Approximate searches, in which the search pattern differs from the document by k substitutions, transpositions, insertions or deletions, have hitherto been carried out only at costs linear in the size of the document. We present a trie based method whose cost is independent of document size. Our experiments show that this new method significantly outperforms the nearest competitor for k=0 and k=1, which are arguably the most important cases. The linear cost (in k) of the other methods begins to catch up, for our small files, only at k=2. For larger files, complexity arguments indicate that tries will outperform the linear methods for larger values of k. The indexes combine suffixes and so are compact in storage. When the text itself does not need to be stored, as in a spelling checker, we even obtain negative overhead: 50% compression. We discuss a variety of applications and extensions, including best match (for spelling checkers), case insensitivity, and limited approximate regular expression matching 相似文献
18.
The goal of scaled permuted string matching is to find all occurrences of a pattern in a text, in all possible scales and permutations. Given a text of length n and a pattern of length m we present an O(n) algorithm. 相似文献
19.
Kimmo Fredriksson 《Information Processing Letters》2003,87(4):201-204
Given a text T[1…n] and a pattern P[1…m] over some alphabet Σ of size σ, we want to find all the (exact) occurrences of P in T. The well-known shift-or algorithm solves this problem in time O(n⌈m/w⌉), where w is the number of bits in machine word, using bit-parallelism. We show how to extend the bit-parallelism in another direction, using super-alphabets. This gives a speed-up by a factor s, where s is the number of characters processed simultaneously. The algorithm is implemented, and we show that it works well in practice too. The result is the fastest known algorithm for exact string matching for short patterns and small alphabets. 相似文献
20.
Branislav ?urian 《Information Processing Letters》2010,110(4):148-152
We present improved variations of the BNDM algorithm for exact string matching. At each alignment our bit-parallel algorithms process a q-gram before testing the state variable. In addition we apply reading a 2-gram in one instruction. Our point of view is practical efficiency of algorithms. Our experiments show that the new variations are faster than earlier algorithms in many cases. 相似文献