Similar Literature
20 similar documents found (search time: 125 ms)
1.
Matching with don't-cares and a small number of mismatches (cited 1 time: 0 self-citations, 1 by others)
In matching with don't-cares and k mismatches we are given a pattern of length m and a text of length n, both of which may contain don't-cares (a symbol that matches all symbols), and the goal is to find all locations in the text that match the pattern with at most k mismatches, where k is a parameter. We present new algorithms that solve this problem using a combination of convolutions and a dynamic programming procedure. We give randomized and deterministic solutions that run in time O(nk² log m) and O(nk³ log m), respectively, and are faster than the most efficient extant methods for small values of k. Our deterministic algorithm is the first to obtain an O(polylog(k)·n log m) running time.
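As a point of reference for the problem being solved, the following is a naive O(nm) sketch (not the authors' convolution-and-dynamic-programming algorithm); the '?' wildcard symbol and the function name are illustrative assumptions.

def k_mismatch_with_dont_cares(text, pattern, k, wildcard='?'):
    # All positions i where pattern matches text[i:i+m] with at most k
    # mismatches; the wildcard matches any symbol. Naive O(n*m) reference.
    n, m = len(text), len(pattern)
    matches = []
    for i in range(n - m + 1):
        mismatches = 0
        for j in range(m):
            a, b = pattern[j], text[i + j]
            if a != wildcard and b != wildcard and a != b:
                mismatches += 1
                if mismatches > k:
                    break
        if mismatches <= k:
            matches.append(i)
    return matches

print(k_mismatch_with_dont_cares("abracadabra", "a?ra", 1))   # [0, 7]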

2.
Simple deterministic wildcard matching (cited 2 times: 0 self-citations, 2 by others)
We present a simple and fast deterministic solution to the string matching with don't cares problem. The task is to determine all positions in a text where a pattern occurs, allowing both pattern and text to contain single character wildcards. Our algorithm takes O(n log m) time for a text of length n and a pattern of length m, and in our view the algorithm is conceptually simpler than previous approaches.
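The core convolution idea behind fast wildcard matching can be sketched as follows: encode the wildcard as 0 and every other symbol as a positive integer, so that position i is a match exactly when the sum over j of p_j·t_{i+j}·(p_j − t_{i+j})² is zero; each of the three resulting cross-correlations can be computed by FFT. This is a hedged illustration using numpy/scipy (lowercase text assumed), not necessarily the paper's exact formulation.

import numpy as np
from scipy.signal import fftconvolve

def wildcard_match(text, pattern, wildcard='?'):
    # Encode wildcard as 0, letters as positive integers (lowercase assumed).
    enc = lambda s: np.array([0 if c == wildcard else ord(c) - ord('a') + 1 for c in s], dtype=float)
    t, p = enc(text), enc(pattern)
    n, m = len(t), len(p)
    pr = p[::-1]                       # reversed pattern turns convolution into correlation
    corr = lambda a, b: fftconvolve(a, b)[m - 1:n]
    # sum_j p_j^3 t_{i+j} - 2 p_j^2 t_{i+j}^2 + p_j t_{i+j}^3 == 0  <=>  match at i
    scores = corr(t, pr ** 3) - 2 * corr(t ** 2, pr ** 2) + corr(t ** 3, pr)
    return [i for i, s in enumerate(scores) if abs(s) < 1e-6]

print(wildcard_match("abcabcab", "a?c"))   # [0, 3]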

3.
We consider the combination of function and permuted matching, each of which has fast solutions in their own right. Given a pattern p of length m and a text t of length n, a function match at position i of the text is a mapping f from Σ_p to Σ_t with the property that f(p_j) = t_{i+j−1} for all j. We show that the problem of determining, for each substring of the text, whether any permutation of the pattern has a function match is in general NP-complete. However, where the mapping is also injective, so-called parameterised matching, the problem can be solved efficiently in O(n log |Σ_p|) time. We then give a 1/2-approximation for a Hamming distance based optimisation variant by reduction to multiple knapsack with colour constraints.
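For concreteness, checking a plain (non-permuted) function match at a single text position only requires building the mapping greedily and checking consistency; the sketch below is an illustrative helper, not the paper's algorithm for the permuted or approximate variants.

def function_match_at(text, pattern, i):
    # Is there a mapping f with f(pattern[j]) == text[i+j] for all j?
    f = {}
    for j, a in enumerate(pattern):
        b = text[i + j]
        if f.setdefault(a, b) != b:    # symbol a already forced to a different image
            return False
    return True                        # parameterised matching would also require f injective

print(function_match_at("xyxabc", "aba", 0))   # True: f(a)=x, f(b)=y
print(function_match_at("xyxabc", "aab", 0))   # False: 'a' cannot map to both x and y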

4.
We present a simple and faster solution to the problem of matching a set of patterns with variable length don't cares. Given an alphabet Σ, a pattern p is a word p_1@p_2@…@p_m, where each p_i is a string over Σ called a keyword and @ ∉ Σ is a symbol called a variable length don't care (VLDC) symbol. Pattern p matches a text t if t = u_0 p_1 u_1 … u_{m−1} p_m u_m for some u_0, …, u_m ∈ Σ*. The problem addressed in this paper is: given a set of patterns P and a text t, determine whether one of the patterns of P matches t. Kucherov and Rusinowitch (1997) [9] presented an algorithm that solves the problem in time O((|t|+|P|) log |P|), where |P| is the total length of keywords in every pattern of P. We give a new algorithm based on the Aho-Corasick automaton. It uses the solutions of the Dynamic Marked Ancestor Problem of Chan et al. (2007) [5]. The algorithm takes O((|t|+‖P‖) log κ / log log κ) time, where ‖P‖ is the total number of keywords in every pattern of P, and κ is the number of distinct keywords in P. The algorithm is faster and simpler than the previous approach.
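For a single VLDC pattern, the matching condition itself can be tested greedily by placing each keyword at its earliest occurrence after the previous one; the following sketch (assuming '@' as the VLDC symbol) illustrates the problem definition rather than the Aho-Corasick/marked-ancestor algorithm above.

def vldc_match(pattern, text):
    # pattern is k1@k2@...@km; '@' matches any (possibly empty) string.
    # Greedy earliest placement of the keywords is correct for a single pattern.
    keywords = [w for w in pattern.split('@') if w]
    pos = 0
    for w in keywords:
        hit = text.find(w, pos)
        if hit < 0:
            return False
        pos = hit + len(w)
    return True

print(vldc_match("ab@cd@e", "xxabyycdzze"))   # True
print(vldc_match("ab@cd@e", "cdabe"))         # False (keywords out of order)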

5.
Given q+1 strings (a text t of length n and q patterns m_1, …, m_q) and a natural number w, the multiple serial episode matching problem consists in finding the number of size-w windows of text t which contain patterns m_1, …, m_q as subsequences, i.e., for each m_i, if m_i = p_1, …, p_k, the letters p_1, …, p_k occur in the window, in the same order as in m_i, but not necessarily consecutively (they may be interleaved with other letters). Our main contribution here is an algorithm solving this problem on-line in time O(nq) with an MP-RAM model (which is a RAM model equipped with extra operations).
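The quantity being counted can be stated directly as follows; this naive sketch tests every window against every pattern and is far slower than the O(nq) on-line algorithm, but it makes the definition concrete.

def is_subsequence(p, s):
    it = iter(s)
    return all(c in it for c in p)     # 'c in it' consumes the iterator up to c

def count_episode_windows(text, patterns, w):
    # Number of length-w windows containing every pattern as a subsequence.
    return sum(all(is_subsequence(p, text[i:i + w]) for p in patterns)
               for i in range(len(text) - w + 1))

print(count_episode_windows("abcabc", ["ac", "b"], 3))   # 2 (windows "abc" at 0 and 3)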

6.
Given a binary string of length n, we give a representation of its suffix array that takes O(nt(lg n)^(1/t)) bits of space such that given i, 1 ≤ i ≤ n, the ith entry in the suffix array of the string can be retrieved in O(t) time, for any parameter 1 ≤ t ≤ lg lg n. For t = lg lg n, this gives a compressed suffix array representation of Grossi and Vitter [Proc. Symp. on Theory Comput., 2000, pp. 397-406]. For t = O(1/ε), this gives the best known (in terms of space) compressed suffix array representation with constant query time. From this representation one can construct a suffix tree structure for a text of length n that uses o(n lg n) bits of space and can be used to find all the k occurrences of a given pattern of length m in O(m/lg n + k) time. No such structure was known earlier.
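To make the retrieval interface concrete, here is a plain, uncompressed suffix array built by sorting suffixes, together with a binary-search pattern lookup (Python 3.10+ is assumed for the bisect key argument); the paper's contribution is storing the same SA entries in compressed form with O(t) access, which this sketch does not attempt.

import bisect

def suffix_array(s):
    # Toy O(n^2 log n) construction: sort suffix start positions lexicographically.
    return sorted(range(len(s)), key=lambda i: s[i:])

def occurrences(s, sa, pattern):
    # Binary search over the sorted suffixes (needs Python 3.10+ for the key argument).
    key = lambda i: s[i:i + len(pattern)]
    lo = bisect.bisect_left(sa, pattern, key=key)
    hi = bisect.bisect_right(sa, pattern, key=key)
    return sorted(sa[lo:hi])

s = "banana"
sa = suffix_array(s)                  # [5, 3, 1, 0, 4, 2]
print(occurrences(s, sa, "ana"))      # [1, 3]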

7.
We present a new simple algorithm that constructs an Aho-Corasick automaton for a set of patterns P of total length n in O(n) time and space for integer alphabets. Processing a text of size m over an alphabet Σ with the automaton costs O(m log |Σ| + k), where there are k occurrences of patterns in the text. A new, efficient implementation of nodes in the Aho-Corasick automaton is introduced, which works for suffix trees as well.
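A compact, dictionary-based Aho-Corasick sketch (BFS-built failure links; not the specialised node representation introduced in the paper) illustrating the construction and the O(m + k)-style scan:

from collections import deque

def build_aho_corasick(patterns):
    # Trie nodes as dicts; fail links built breadth-first; out[v] lists pattern indices.
    goto, fail, out = [{}], [0], [[]]
    for idx, p in enumerate(patterns):
        s = 0
        for c in p:
            if c not in goto[s]:
                goto.append({}); fail.append(0); out.append([])
                goto[s][c] = len(goto) - 1
            s = goto[s][c]
        out[s].append(idx)
    q = deque(goto[0].values())
    while q:
        s = q.popleft()
        for c, t in goto[s].items():
            q.append(t)
            f = fail[s]
            while f and c not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(c, 0)
            out[t] += out[fail[t]]
    return goto, fail, out

def ac_search(text, patterns):
    goto, fail, out = build_aho_corasick(patterns)
    s, hits = 0, []
    for i, c in enumerate(text):
        while s and c not in goto[s]:
            s = fail[s]
        s = goto[s].get(c, 0)
        hits += [(i, patterns[j]) for j in out[s]]
    return hits

print(ac_search("ushers", ["he", "she", "his", "hers"]))
# [(3, 'she'), (3, 'he'), (5, 'hers')]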

8.
We present a randomized parallel list ranking algorithm for distributed memory multiprocessors, using a BSP type model. We first describe a simple version which requires, with high probability, log(3p) + log ln(n) = Õ(log p + log log n) communication rounds (h-relations with h = Õ(n/p)) and Õ(n/p) local computation. We then outline an improved version that requires, with high probability, only r ≤ (4k+6) log((2/3)p) + 8 = Õ(k log p) communication rounds, where k = min{i ≥ 0 | ln^(i+1)(n) ≤ ((2/3)p)^(2^(i+1))}. Note that k is an extremely small number. For n and p ≥ 4, the value of k is at most 2. Hence, for a given number of processors p, the number of communication rounds required is, for all practical purposes, independent of n. For n ≤ 1,500,000 and 4 ≤ p ≤ 2048, the number of communication rounds in our algorithm is bounded, with high probability, by 78, but the actual number of communication rounds observed so far is 25 in the worst case. For n ≤ 100^(10^100) and 4 ≤ p ≤ 2048, the number of communication rounds in our algorithm is bounded, with high probability, by 118; and we conjecture that the actual number of communication rounds required will not exceed 50. Our algorithm has a considerably smaller number of communication rounds than the list ranking algorithm used in Reid-Miller's empirical study of parallel list ranking on the Cray C-90 (1). To our knowledge, Reid-Miller's algorithm (1) was the fastest list ranking implementation so far. Therefore, we expect that our result will have considerable practical relevance.

9.
The input of the Edge Multicut problem consists of an undirected graph G and pairs of terminals {s_1,t_1}, …, {s_m,t_m}; the task is to remove a minimum set of edges such that s_i and t_i are disconnected for every 1 ≤ i ≤ m. The parameterized complexity of the problem, parameterized by the maximum number k of edges that are allowed to be removed, is currently open. The main result of the paper is a parameterized 2-approximation algorithm: in time f(k)·n^(O(1)), we can either find a solution of size 2k or correctly conclude that there is no solution of size k. The proposed algorithm is based on a transformation of the Edge Multicut problem into a variant of the parameterized Max-2SAT problem, where the parameter is related to the number of clauses that are not satisfied. It follows from previous results that the latter problem can be 2-approximated in fixed-parameter time; on the other hand, we show here that it is W[1]-hard. Thus the additional contribution of the present paper is introducing the first natural W[1]-hard problem that is constant-ratio fixed-parameter approximable.

10.
We present an efficient algorithm for finding all approximate occurrences of a given pattern p of length m in a text t of length n, allowing for translocations of equal length adjacent factors and inversions of factors. The algorithm is based on an efficient filtering method and has an O(nm·max(α,β))-time complexity in the worst case and O(max(α,β,σ))-space complexity, where α and β are, respectively, the maximum length of the factors involved in any translocation and inversion, and σ is the alphabet size. Moreover, we show that our algorithm has an O(n) average time complexity, whenever , for ε>0. Experiments show that the proposed algorithm achieves very good results in practical cases.

11.
We present an efficient parameterized algorithm for the (k,t)-set packing problem, in which we are looking for a collection of k disjoint sets whose union consists of t elements. The complexity of the algorithm is O(2^(O(t)) nN log N). For the special case of sets of bounded size, this improves the O(k^(ck) n) algorithm of Jia et al. [J. Algorithms 50 (1) (2004) 106].
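A brute-force statement of the (k,t)-set packing question, exponential in k and intended only to pin down the definition (the parameterized algorithm above is of course far more efficient):

from itertools import combinations

def kt_set_packing(sets, k, t):
    # k sets, pairwise disjoint, union of exactly t elements (brute force).
    for combo in combinations(sets, k):
        if sum(len(s) for s in combo) == t and len(set().union(*combo)) == t:
            return combo
    return None

print(kt_set_packing([{1, 2}, {2, 3}, {3, 4}, {5}], 2, 4))   # ({1, 2}, {3, 4})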

12.
13.
The goal of scaled permuted string matching is to find all occurrences of a pattern in a text, in all possible scales and permutations. Given a text of length n and a pattern of length m, we present an O(n) algorithm.

14.
15.
Dr. K.-L. Chung, Computing, 1995, 54(2):119-125
Given a run-length coded text of length 2n and a run-length coded pattern of length 2m, m ≤ n commonly, this paper first presents an O(n+m) time sequential algorithm for string matching, then presents an O(1) time parallel algorithm on a two-dimensional m×n mesh with a reconfigurable bus system.

16.
Let X and Y be two strings of lengths n and m, respectively, and k and l, respectively, be the numbers of runs in their corresponding run-length encoded forms. We propose a simple algorithm for computing the longest common subsequence of two given strings X and Y in O(kl + min{p_1, p_2}) time, where p_1 and p_2 denote the numbers of elements in the bottom and right boundaries of the matched blocks, respectively. It improves the previously known time bound O(min{nl, km}) and outperforms the time bounds O(kl log kl) or O((k+l+q) log(k+l+q)) for some cases, where q denotes the number of matched blocks.
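For reference, the classical O(nm) dynamic program computes the same quantity on the uncompressed strings; the run-length-based algorithm above improves on this baseline when the inputs compress well.

def lcs_length(x, y):
    # Classical O(len(x)*len(y)) dynamic program on the uncompressed strings.
    n, m = len(x), len(y)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = (dp[i - 1][j - 1] + 1 if x[i - 1] == y[j - 1]
                        else max(dp[i - 1][j], dp[i][j - 1]))
    return dp[n][m]

print(lcs_length("aaabbbccc", "aabbcc"))   # 6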

17.
Information Sciences, 1987, 43(3):169-184
We give an algorithm for two-dimensional pattern matching in the presence of errors. We find that the complexity of our algorithm is O(k n_1 n_2 log n_2 + n_1² n_2 + k n_1 m_1 m_2), where the pattern is an n_1 × n_2 array, the text is an m_1 × m_2 array, and k is the number of mismatches allowed.
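The problem itself can be stated with a naive check that counts mismatches at every placement of the pattern array; this sketch is only the specification, not the algorithm analysed above.

def matches_2d(text, pattern, k):
    # All top-left corners (r, c) where the n1 x n2 pattern occurs in the
    # m1 x m2 text with at most k mismatches (naive check).
    n1, n2 = len(pattern), len(pattern[0])
    m1, m2 = len(text), len(text[0])
    hits = []
    for r in range(m1 - n1 + 1):
        for c in range(m2 - n2 + 1):
            mism = sum(pattern[i][j] != text[r + i][c + j]
                       for i in range(n1) for j in range(n2))
            if mism <= k:
                hits.append((r, c))
    return hits

print(matches_2d(["abab", "cdcd", "abab"], ["ab", "cd"], 0))   # [(0, 0), (0, 2)]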

18.
Consider a text string of length n, a pattern string of length m, and a match vector of length n which declares each location in the text to be either a mismatch (the pattern does not occur beginning at that location in the text) or a potential match (the pattern may occur beginning at that location in the text). Some of the potential matches could be false, i.e., the pattern may not occur beginning at some location in the text declared to be a potential match. We investigate the complexity of two problems in this context, namely, checking if there is any false match, and identifying all the false matches in the match vector. We present an algorithm on the CRCW PRAM that checks if there exists a false match in O(1) time using O(n) processors. This algorithm does not require preprocessing the pattern. Therefore, checking for false matches is provably simpler than string matching, since string matching takes time on the CRCW PRAM. We use this simple algorithm to convert the Karp-Rabin Monte Carlo type string-matching algorithm into a Las Vegas type algorithm without asymptotic loss in complexity. We also present an efficient algorithm for identifying all the false matches and, as a consequence, show that string-matching algorithms take time even given the flexibility to output a few false matches.
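Sequentially, the verification task is straightforward, which is what makes the constant-time CRCW PRAM result interesting; the sketch below simply re-checks each candidate ("potential match") position and reports the false ones.

def false_matches(text, pattern, candidates):
    # Candidate positions that do not actually match the pattern.
    m = len(pattern)
    return [i for i in candidates if text[i:i + m] != pattern]

print(false_matches("abcabd", "abc", [0, 3]))   # [3]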

19.
We present an algorithm to find a proper fraction in simplest reduced terms between two reduced proper fractions. A proper fraction is a rational number m/n with m<n and n>1. A fraction m/n is simpler than p/q if m ≤ p and n ≤ q, with at least one inequality strict. The algorithm operates by walking a Farey tree in maximum steps down each branch. Through Monte Carlo simulation, we find that the present algorithm finds a simpler interpolation of two fractions than using the Euclidean-Convergent [D.W. Matula, P. Kornerup, Foundations of finite precision rational arithmetic, Computing 2 (Suppl.) (1980) 85-111] walk of a Farey tree and terminating at the first fraction satisfying the bound. Analysis shows that the new algorithms, with very high probability, will find an interpolation that is simpler than at least one of the bounds, and thus take less storage space than at least one of the bounds.
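A standard way to find a simplest fraction strictly between two fractions is a Stern-Brocot (Farey tree) walk that takes maximal steps via integer division; the sketch below follows that well-known scheme and is offered only as an illustration, not as the paper's exact procedure or its exact notion of "simpler".

from fractions import Fraction

def simplest_between(lo, hi):
    # Smallest-denominator fraction strictly between lo and hi (lo < hi),
    # found by walking the Stern-Brocot tree with maximal integer-division steps.
    lo, hi = Fraction(lo), Fraction(hi)
    m = lo.numerator // lo.denominator        # floor(lo)
    if Fraction(m + 1) < hi:                  # an integer lies strictly inside
        return Fraction(m + 1)
    f_lo, f_hi = lo - m, hi - m               # both bounds now share integer part m
    if f_lo == 0:                             # interval (0, f_hi): answer is 1/q
        return m + Fraction(1, f_hi.denominator // f_hi.numerator + 1)
    # Recurse on reciprocals of the fractional parts (the interval flips).
    return m + 1 / simplest_between(1 / f_hi, 1 / f_lo)

print(simplest_between(Fraction(2, 7), Fraction(1, 3)))   # 3/10 (the mediant)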

20.
Design and Analysis of a Parallel String Matching Algorithm for Distributed Memory (cited 7 times: 0 self-citations, 7 by others)
陈国良 (Chen Guoliang), 林洁 (Lin Jie), 顾乃杰 (Gu Naijie). 《软件学报》 (Journal of Software), 2000, 11(6):771-778
Research on parallel string matching algorithms has largely concentrated on the PRAM (parallel random access machine) model; work on other, more realistic models is comparatively scarce. By applying the technique of parallelizing an optimal sequential algorithm and exploiting the periodicity of the pattern, this paper cleverly parallelizes an improved KMP (Knuth-Morris-Pratt) algorithm and proposes a simple, efficient, and highly scalable distributed string matching algorithm whose computation complexity is O(n/p + m) and whose communication complexity is O(u log p).
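The serial building block being parallelized is the classical KMP algorithm; for reference, a short sketch of its failure function and scan (the standard sequential KMP, not the distributed algorithm of the paper):

def kmp_search(text, pattern):
    # Classical Knuth-Morris-Pratt: failure function, then one left-to-right scan.
    m = len(pattern)
    fail = [0] * m                    # fail[j]: length of longest proper border of pattern[:j+1]
    k = 0
    for j in range(1, m):
        while k and pattern[j] != pattern[k]:
            k = fail[k - 1]
        if pattern[j] == pattern[k]:
            k += 1
        fail[j] = k
    hits, k = [], 0
    for i, c in enumerate(text):
        while k and c != pattern[k]:
            k = fail[k - 1]
        if c == pattern[k]:
            k += 1
        if k == m:                    # full match ending at position i
            hits.append(i - m + 1)
            k = fail[k - 1]
    return hits

print(kmp_search("ababcababa", "aba"))   # [0, 5, 7]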
