首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
We present a simple and faster solution to the problem of matching a set of patterns with variable length don't cares. Given an alphabet Σ, a pattern p is a word p1@p2?@pm, where pi is a string over Σ called a keyword and @∉Σ is a symbol called a variable length don't care (VLDC) symbol. Pattern p matches a text t if t=u0p1u1um−1pmum for some u0,…,umΣ. The problem addressed in this paper is: given a set of patterns P and a text t, determine whether one of the patterns of P matches t.Kucherov and Rusinowitch (1997) [9] presented an algorithm that solves the problem in time O((|t|+|P|)log|P|), where |P| is the total length of keywords in every pattern of P. We give a new algorithm based on Aho-Corasick automaton. It uses the solutions of Dynamic Marked Ancestor Problem of Chan et al. (2007) [5]. The algorithm takes O((|t|+‖P‖)logκ/loglogκ) time, where ‖P‖ is the total number of keywords in every pattern of P, and κ is the number of distinct keywords in P. The algorithm is faster and simpler than the previous approach.  相似文献   

2.
3.
We present a filtering based algorithm for the k-mismatch pattern matching problem with don't cares. Given a text t of length n and a pattern p of length m with don't care symbols in either p or t (but not both), and a bound k, our algorithm finds all the places that the pattern matches the text with at most k mismatches. The algorithm is deterministic and runs in Θ(nm1/3k1/3log2/3m) time.  相似文献   

4.
A string S∈ΣmSΣm can be viewed as a set of pairs S={(σi,i):i∈{0,…,m−1}}S={(σi,i):i{0,,m1}}. We consider approximate pattern matching problems arising from the setting where errors are introduced to the location component (ii), rather than the more traditional setting, where errors are introduced into the content itself (σiσi). In this paper, we consider the case where bits of ii may be erroneously flipped, either in a consistent or transient manner. We formally define the corresponding approximate pattern matching problems, and provide efficient algorithms for their resolution, while introducing some novel techniques.  相似文献   

5.
6.
7.
Summary Rational patterns are used to specify recognizable tree languages. It is shown that, given a rational patternp and a treet, one can decide inOp¦·¦t¦) steps whether there is some match ofp int. Problems of this kind generalized to forests or nets are shown to be NP-complete.  相似文献   

8.
Existing methods for getting the locally best matched alignments between a pair of biological sequences require O(N2) computational steps and O(N2) storage, where N is the average sequence length. An improved method is presented with which the storage requirement is greatly reduced, while the computational steps remain O(N2). Only a small number of additional steps are required to display any common sub-sequences with similarity scores greater than a given threshold. The aligments found by the algorithm are optimal in the sense that their scores are locally maximal, where each score is a sum of weights given to individual matches/replacements, insertions and deletions involved in the alignment. The algorithm was implemented in C programming language on a personal computer. Data area of 64 kbytes on random access memory and a few hundred kbytes on a disk is sufficient for comparing two protein or nucleic acid sequences of 2500 residues. The programs are particularly valuable when used in combination with fast sequence search programs.  相似文献   

9.
It has long been known that pattern matching in the Hamming distance metric can be done in time, where n is the length of the text, m is the length of the pattern, and Σ is the alphabet. The classic algorithm for this is due to Abrahamson and Kosaraju. This paper considers the following generalization, motivated by the situation where the entries in the text and pattern are analog, or distorted by additive noise, or imprecisely given for some other reason: in any alignment of the pattern with the text, two aligned symbols a and b contribute +1 to the similarity score if they differ by no more than a given threshold θ, otherwise they contribute zero. We give an time algorithm for this more general version of the problem; the classic Hamming distance matching problem is the special case of θ=0.  相似文献   

10.
The problem of pattern matching with wildcards is to find all the occurrences of a pattern of length m in a text of length n over a finite alphabet Σ (both the text and the pattern are allowed to contain wildcards). Based on the prime number encoding scheme (Chaim Linhart, Ron Shamir, Faster pattern matching with character classes using prime number encoding, J. Comput. Syst. Sci. 75 (3) (2009) 155-162), we present a new integer encoding and an efficient fast Fourier transforms based algorithm for this problem. The algorithm takes time to search the pattern in the text by computing one convolution. For matching with wildcards, our encoding uses fewer prime numbers and has shorter code words comparing with the prime number encoding. We use at most 2lg|Σ| prime numbers to encode the symbols while in the prime number encoding |Σ| prime numbers are required. This number reduces to 1.5lg|Σ| when |Σ|>40. The code word used in the algorithm is at most 2⌊lg|Σ|⌋⌈lg(5m)⌉ bits while in the prime encoding it is at least bits. We also show that the length of words can be further reduced by increasing the number of convolutions computed.  相似文献   

11.
12.
G. M. Landau  U. Vishkin 《Algorithmica》1994,12(4-5):375-408
For motivation purpose, imagine the followingcontinuous pattern-matching problem. Two continuous pictures, each consisting of unicolor regions, are given; one picture is called thescene and the other thepattern. The problem is to find all occurrences of the pattern in the scene.As a step toward efficient algorithmic handling of the continuous pattern-matching problem by computers, where discretized representations are involved, we consider pattern-matching problems where the pattern and the text are specified either in terms of the continuous properties, or through other exemplar digitized images—a variety of alternative specifications is considered.From the perspective of areas such as computer vision or image processing, our problem definitions identify an important gap in the fundamental theory of image formation and image processing—how to determine, even in the absence of noise, if a digitized image of a scene could contain an image of a given pattern. This is done using carefulaxiomatization. Such a digitized-based approach may lead toward building on the theory of string-matching algorithms (in one, or higher, dimensions) for the benefit of algorithmic pattern matching in image processing.This paper is the journal version of [LV2].Partially supported by NSF Grants CCR-8908286 and CCR-9305873 and the New York State Science and Technology Foundation, Center for Advanced Technology in Telecommunications, Polytechnic University, Brooklyn, NY, USA.Partially supported by NSF Grants CCR-8906949 and CCR-9111348.  相似文献   

13.
We propose a non-transform image compression scheme based on approximate 1D pattern matching, called pattern matching image compression (PMIC). The main idea behind it is a lossy extension of the Lempel-Ziv data compression scheme in which one searches for the longest prefix of an uncompressed image that approximately occurs in the already processed image. This main algorithm is enhanced with several new features such as searching for reverse approximate matching, recognizing sub-strings in images that are additively shifted versions of each other, introducing a variable and adaptive maximum distortion level D, and so forth. These enhancements are crucial to the overall quality of our scheme and their efficient implementation leads to algorithmic issues of interest in their own right. Both algorithmic and experimental results are presented. Our scheme turns out to be competitive with JPEG and wavelet compression for good quality graphical images. We also review related theoretical results  相似文献   

14.
现有方法在对数据属性进行分析处理时,容易造成属性中关键信息的丢失,导致了较高的误报率和漏报率.基于这种现象,提出了一种特殊的两阶段聚类方法和基于模糊轮廓树的模式匹配方法.由描述属性建立模糊轮廓树,在此基础上,再对行为属性实施聚类.最后,得到一系列的某一类型下的若干行为模式.同时,这棵模糊轮廓树又具有判定树的功能.在检测阶段,根据模糊轮廓树,确定一个模糊类别,在该模糊类别内再实施精确的匹配操作.该方法有效保护了属性中蕴含的描述信息和行为信息,避免其关键信息丢失,降低了入侵检测的误报率和漏报率.  相似文献   

15.
Pattern matching with wildcards is a challenging topic in many domains, such as bioinformatics and information retrieval. This paper focuses on the problem with gap-length constraints and the one-off condition (The one-off condition means that each character can be used at most once in all occurrences of a pattern in the sequence). It is difficult to achieve the optimal solution. We propose a graph structure WON-Net (WON-Net is a graph structure. It stands for a network with the weighted centralization measure based on each node’s centrality-degree. Its details are given in Definition 4.1) to obtain all candidate matching solutions and then design the WOW (WOW stands for pattern matching with wildcards based on WON-Net) algorithm with the weighted centralization measure based on nodes’ centrality-degrees. We also propose an adjustment mechanism to balance the optimal solutions and the running time. We also define a new variant of WOW as WOW-δ. Theoretical analysis and experiments demonstrate that WOW and WOW-δ are more effective than their peers. Besides, the algorithms demonstrate an advantage on running time by parallel processing.  相似文献   

16.
Pattern recognition systems in the artificial intelligence field have been based on the assumption that components of the system should be invoked not by directly calling them, but by running data across their sensors and having the invocation take place when a defined pattern is found. Most systems programming, however, has taken a completely different route; the emphasis there is on structures of execution and data types. A new approach to systems programming languages where a bridge is made between structure and pattern-directed invocation is described. The syntax of pattern-directed invocation, along with the semantics of a support structure for message receiving and sending necessary for pattern-directed invocation of routines is presented. Finally, some conclusions are made about pattern-directed invocation, and further areas of work are stated.  相似文献   

17.
LFC is a functional language based on recursive functions defined in context-free languages.In this paper,a new pattern matching algorithm for LFC is presented,which can represent a sequence of patterns as an integer by an encoding method.It is a rather simple method and produces efficient case-expressions for pattern matching definitions of LFC.The algorithm can also be used for other functional languages,but for nested patterns it may become complicated and further studies are needed.  相似文献   

18.
针对跳频通信在宽带噪声干扰和梳状干扰下误比特性能较差的问题,基于“信道即消息”的思想,提出了一种新的短波跳频通信技术——图样匹配跳频.以Williard码、Walsh码和Gold码为例,分析了几种图样码的特性及对系统抗干扰能力和信息速率的影响.提出了图样码的选取要求,结合系统的两种帧同步策略分别给出了图样码型,对FH/MFsK和图样匹配跳频系统进行了仿真对比.仿真结果表明,图样匹配跳频系统具有更强的抗干扰能力,在恶意干扰严重影响常规跳频的误码性能时,图样匹配跳频可以显著提升通信的可靠性.  相似文献   

19.
We explain in detail how to estimate mean values and assess statistical errors for arbitrary functions of elementary observables in Monte Carlo simulations. The method is to estimate and sum the relevant autocorrelation functions, which is argued to produce more certain error estimates than binning techniques and hence to help toward a better exploitation of expensive simulations. An effective integrated autocorrelation time is computed which is suitable to benchmark efficiencies of simulation algorithms with regard to specific observables of interest. A Matlab code is offered for download that implements the method. It can also combine independent runs (replica) allowing to judge their consistency.  相似文献   

20.
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号