首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
We present improved variations of the BNDM algorithm for exact string matching. At each alignment our bit-parallel algorithms process a q-gram before testing the state variable. In addition we apply reading a 2-gram in one instruction. Our point of view is practical efficiency of algorithms. Our experiments show that the new variations are faster than earlier algorithms in many cases.  相似文献   

2.
In parameterized string matching the pattern P matches a substring t of the text T if there exist a bijective mapping from the symbols of P to the symbols of t. We give simple and practical algorithms for finding all such pattern occurrences in sublinear time on average. The algorithms work for a single and multiple patterns.  相似文献   

3.
Light-based string matching   总被引:1,自引:1,他引:0  
String matching is a very important problem in computer science. The problem consists in finding all the occurrences of a pattern P of length m in a text T of length n. We describe a special device which can do string matching by performing nm + 1 text-to-pattern comparisons. The proposed device uses light and optical filters for performing computations. Two physical implementations are proposed. One of them uses colored glass and the other one uses polarizing filters. The strengths and the weaknesses of each method are deeply discussed.  相似文献   

4.
Efficient string matching with wildcards and length constraints   总被引:1,自引:2,他引:1  
This paper defines a challenging problem of pattern matching between a pattern P and a text T, with wildcards and length constraints, and designs an efficient algorithm to return each pattern occurrence in an online manner. In this pattern matching problem, the user can specify the constraints on the number of wildcards between each two consecutive letters of P and the constraints on the length of each matching substring in T. We design a complete algorithm, SAIL that returns each matching substring of P in T as soon as it appears in T in an O(n+klmg) time with an O(lm) space overhead, where n is the length of T, k is the frequency of P's last letter occurring in T, l is the user-specified maximum length for each matching substring, m is the length of P, and g is the maximum difference between the user-specified maximum and minimum numbers of wildcards allowed between two consecutive letters in P.SAIL stands for string matching with wildcards and length constraints. Gong Chen received the B.Eng. degree from the Beijing University of Technology, China, and the M.Sc. degree from the University of Vermont, USA, both in computer science. He is currently a graduate student in the Department of Statistics at the University of California, Los Angeles, USA. His research interests include data mining, statistical learning, machine learning, algorithm analysis and design, and database management. Xindong Wu is a professor and the chair of the Department of Computer Science at the University of Vermont. He holds a Ph.D. in Artificial Intelligence from the University of Edinburgh, Britain. His research interests include data mining, knowledge-based systems, and Web information exploration. He has published extensively in these areas in various journals and conferences, including IEEE TKDE, TPAMI, ACM TOIS, IJCAI, AAAI, ICML, KDD, ICDM and WWW, as well as 12 books and conference proceedings. Dr. Wu is the Editor-in-Chief of the IEEE Transactions on Knowledge and Data Engineering (by the IEEE Computer Society), the founder and current Steering Committee Chair of the IEEE International Conference on Data Mining (ICDM),an Honorary Editor-in-Chief of Knowledge and Information Systems (by Springer), and a Series Editor of the Springer Book Series on Advanced Information and Knowledge Processing (AI&KP). He is the 2004 ACM SIGKDD Service Award winner. Xingquan Zhu received his Ph.D degree in Computer Science from Fudan University, Shanghai, China, in 2001. He spent 4 months with Microsoft Research Asia, Beijing, China, where he was working on content-based image retrieval with relevance feedback. From 2001 to 2002, he was a postdoctoral associate in the Department of Computer Science at Purdue University, West Lafayette, IN. He is currently a research assistant professor in the Department of Computer Science, the University of Vermont, Burlington, VT. His research interests include data mining, machine learning, data quality, multimedia computing, and information retrieval. Since 2000, Dr. Zhu has published extensively, including over 50 refereed papers in various journals and conference proceedings. Abdullah N. Arslan got his Ph.D. degree in Computer Science in 2002 from the University of California at Santa Barbara. Upon his graduation he joined the Department of Computer Science at the University of Vermont as an assistant professor. He has been with the computer science faculty there since then. Dr. Arslan's main research interests are on algorithms on strings, computational biology and bioinformatics. Dr. Arslan earned his Master's degree in Computer Science in 1996 from the University of North Texas, Denton, Texas and his Bachelor's degree in Computer Engineering in 1990 from the Middle East Technical University, Ankara, Turkey. He worked as a programmer for the Central Bank of Turkey between 1991 and 1994. Yu He received her B.E. degree in Information Engineering from Zhejiang University, China, in 2001. She is currently a graduate student in the Department of Computer Science at the University of Vermont. Her research interests include data mining, bioinformatics and pattern recognition.  相似文献   

5.
Simple deterministic wildcard matching   总被引:2,自引:0,他引:2  
We present a simple and fast deterministic solution to the string matching with don't cares problem. The task is to determine all positions in a text where a pattern occurs, allowing both pattern and text to contain single character wildcards. Our algorithm takes O(nlogm) time for a text of length n and a pattern of length m and in our view the algorithm is conceptually simpler than previous approaches.  相似文献   

6.
Approximate pattern matching algorithms have become an important tool in computer assisted music analysis and information retrieval. The number of different problem formulations has greatly expanded in recent years, not least because of the subjective nature of measuring musical similarity. From an algorithmic perspective, the complexity of each problem depends crucially on the exact definition of the difference between two strings. We present an overview of advances in approximate string matching in this field focusing on new measures of approximation.  相似文献   

7.
8.
String matching is the problem of finding all the occurrences of a pattern in a text. We propose a very fast new family of string matching algorithms based on hashing q-grams. The new algorithms are the fastest on many cases, in particular, on small size alphabets.  相似文献   

9.
G. De V. Smit 《Software》1982,12(1):57-66
Three string matching algorithms—straightforward, Knuth-Morris-Pratt and Boyer-Moor—re examined and their time complexities discussed. A comparison of their actual average behaviour is made, based on empirical data presented. It is shown that the Boyel-Moore algorithm is extremely efficient in most cases and that, contrary to the impression one might get from the analytical results, the Knuth-Morris-Pratt algorithm is not significantly better on the average than the straightforward algorithm.  相似文献   

10.
Experimental results are given for the application of a new n-gram algorithm to substring searching in DNA strings. The results confirm theoretical predictions of expected running times based on the assumption that the data are drawn from a stationary ergodic source. They also confirm that the algorithms tested are the most efficient known for searches involving larger patterns.  相似文献   

11.
The goal of scaled permuted string matching is to find all occurrences of a pattern in a text, in all possible scales and permutations. Given a text of length n and a pattern of length m we present an O(n) algorithm.  相似文献   

12.
An aggressive algorithm for multiple string matching   总被引:1,自引:0,他引:1  
A new algorithm based on the Wu-Manber algorithm for multiple string matching is presented in this paper. The algorithm eliminates the functional overlap of the table HASH and SHIFT, and computes the shift distances in an aggressive manner. After each test, the algorithm examines the character next to the scan window to maximize the shift distance. This idea is consistent with that of the quick-search (QS) algorithm. Experimental results on four alphabets show that the new algorithm is more efficient than Wu-Manber and other recent algorithms, particularly on short pattern sets and large alphabet.  相似文献   

13.
The problem studied in this paper is to search a given text for occurrences of certain strings, in the particular case where the set of strings may change as the search proceeds.A well-known algorithm by Aho and Corasick applies to the simpler case when the set of strings is known beforehand and does not change. This algorithm builds a transition diagram (finite automaton) from the strings, and uses it as a guide to traverse the text. The search can then be done in linear time.We show how this algorithm can be modified to allow incremental diagram construction, so that new keywords may be entered at any time during the search. The incremental algorithm presented essentially retains the time and space complexities of the non-incremental one.  相似文献   

14.
We present a simple and faster solution to the problem of matching a set of patterns with variable length don't cares. Given an alphabet Σ, a pattern p is a word p1@p2?@pm, where pi is a string over Σ called a keyword and @∉Σ is a symbol called a variable length don't care (VLDC) symbol. Pattern p matches a text t if t=u0p1u1um−1pmum for some u0,…,umΣ. The problem addressed in this paper is: given a set of patterns P and a text t, determine whether one of the patterns of P matches t.Kucherov and Rusinowitch (1997) [9] presented an algorithm that solves the problem in time O((|t|+|P|)log|P|), where |P| is the total length of keywords in every pattern of P. We give a new algorithm based on Aho-Corasick automaton. It uses the solutions of Dynamic Marked Ancestor Problem of Chan et al. (2007) [5]. The algorithm takes O((|t|+‖P‖)logκ/loglogκ) time, where ‖P‖ is the total number of keywords in every pattern of P, and κ is the number of distinct keywords in P. The algorithm is faster and simpler than the previous approach.  相似文献   

15.
This paper revisits the problem of indexing a text for approximate string matching. Specifically, given a text T of length n and a positive integer k, we want to construct an index of T such that for any input pattern P, we can find all its k-error matches in T efficiently. This problem is well-studied in the internal-memory setting. Here, we extend some of these recent results to external-memory solutions, which are also cache-oblivious. Our first index occupies O((nlogkn)/B) disk pages and finds all k-error matches with O((|P|+occ)/B+logknloglogBn) I/Os, where B denotes the number of words in a disk page. To the best of our knowledge, this index is the first external-memory data structure that does not require Ω(|P|+occ+poly(logn)) I/Os. The second index reduces the space to O((nlogn)/B) disk pages, and the I/O complexity is O((|P|+occ)/B+logk(k+1)nloglogn).  相似文献   

16.
The paper shows how to compute exactly expectations, standard deviations, and cumulative probabilities of the searching times of string-matching algorithms based on the use of automata. This is derived from a methodology based on viewing the underlying Markov chains as exponential families and applying recent results on them.  相似文献   

17.
In this paper, we present a new approach to measuring the similarity between 3D-curves. Our approach allows the possibility of using strings, where each element is a vector rather than just a symbol. We present two different approaches for representing 3D-curves. One possibility is to represent a 3D-curve as two 2D-curves, one being the projection of the 3D-curve in the XY-plane, and the other one in the YZ-plane. For the case that we need geometric rotation invariance, we have used a second approach to the symbolic representation of the 3D-curve using the curvature and the tension as their symbolic representation. We validate this approach through experiments using synthetic and digitalized data.  相似文献   

18.
19.
We propose a new algorithm for computing the edit distance of an uncompressed string against a run-length-encoded string. For an uncompressed string of length n and a compressed string with M runs, the algorithm computes their edit distance in time O(Mn). This result directly implies an O(min{mN,Mn}) time algorithm for strings of lengths m and n with M and N runs, respectively. It improves the previous best known time bound O(mN+Mn).  相似文献   

20.
Applications of approximate string matching to 2D shape recognition   总被引:7,自引:0,他引:7  
H Bunke  U Bü  hler 《Pattern recognition》1993,26(12):1797-1812
A new method for the recognition of arbitrary two-dimensional (2D) shapes is described. It is based on string edit distance computation. The recognition method is invariant under translation, rotation, scaling and partial occlusion. A set of experiments are described demonstrating the robustness and reliability of the proposed approach.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号