首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 62 毫秒
1.
We revisit various string indexing problems with range reporting features, namely, position-restricted substring searching, indexing substrings with gaps, and indexing substrings with intervals. We obtain the following main results.
  • We give efficient reductions for each of the above problems to a new problem, which we call substring range reporting. Hence, we unify the previous work by showing that we may restrict our attention to a single problem rather than studying each of the above problems individually.
  • We show how to solve substring range reporting with optimal query time and little space. Combined with our reductions this leads to significantly improved time-space trade-offs for the above problems. In particular, for each problem we obtain the first solutions with optimal time query and O(nlog O(1) n) space, where n is the length of the indexed string.
  • We show that our techniques for substring range reporting generalize to substring range counting and substring range emptiness variants. We also obtain non-trivial time-space trade-offs for these problems.
Our bounds for substring range reporting are based on a novel combination of suffix trees and range reporting data structures. The reductions are simple and general and may apply to other combinations of string indexing with range reporting.  相似文献   

2.
3.
Given a string x of length n and an integer constant λ, the λ-Cover Problem is defined to be the identification of all the sets of λ substrings each of equal length that cover x. This problem can be solved by a general algorithm in O(n2) time for constant alphabet size. We also generalize the λ-Cover Problem, whereby a set of λ substrings of different lengths are considered, which can be computed using the general algorithm in O(n2) time.  相似文献   

4.
A scattered context grammar erases nonterminals in a generalized k-limited way in a successful derivation, where k is a positive integer, if in every sentential form of a derivation, each of its substrings consisting of nonterminals from which the grammar derives empty strings is of length k or less. This paper demonstrates that if a scattered context grammar generates its sentences in this way, it can be converted to a scattered context grammar without erasing productions; in general, however, this is not possible.  相似文献   

5.
A string-based negative selection algorithm is an immune-inspired classifier that infers a partitioning of a string space Σ? into “normal” and “anomalous” partitions from a training set S containing only samples from the “normal” partition. The algorithm generates a set of patterns, called “detectors”, to cover regions of the string space containing none of the training samples. Strings that match at least one of these detectors are then classified as “anomalous”. A major problem with existing implementations of this approach is that the detector generating step needs exponential time in the worst case. Here we show that for the two most widely used kinds of detectors, the r-chunk and r-contiguous detectors based on partial matching to substrings of length r, negative selection can be implemented more efficiently by avoiding generating detectors altogether: for each detector type, training set SΣ? and parameter r? one can construct an automaton whose acceptance behaviour is equivalent to the algorithm’s classification outcome. The resulting runtime is O(|S|?r|Σ|) for constructing the automaton in the training phase and O(?) for classifying a string.  相似文献   

6.
The Transtaxor is a computer program which correlates a given problem definition or overall strategy, expressed as a general input syntax, with a given set of circumstances, expressed as a specific input “situation string” to produce an output formula or course of action. The transtaxor is written once for a given computer and thereafter becomes applicable to any problem which can be expressed in terms of a syntax. In order to extend the methodology to a wide variety of problems which can be expressed qualitatively, the usual syntax definitions have been extended to provide for the inclusion of relations, the division of the syntax into levels, and the paralleling of these levels through “developable agendums”. A simple but powerful “transcribing language” incorporates the “action-directing language” into the processing. The concept of “delineation” facilitates the generation of the basic parts, which are developed by the transtaxor, with the aid of a list-processing language, into progressively higher levels of the syntax until the final goal is produced. Examples are given of the application of the methodology to automatic-programming-language translation and information retrieval.  相似文献   

7.
8.
A recent trend in stringology explores the possibility of utilizing text compression to speed up similarity computation between strings. In this line of investigation, run-length encoding is one of the earliest studied compression schemes. Despite its simple coding nature, the only positive result before this work is the computation of the in-del distance (dual of longest common subsequence), which requires O(mnlogmn) time, where m and n denote the number of runs of the input strings. The worst-case time complexity of computing the edit distance between two run-length encoded strings still depends on the uncompressed string lengths. In this paper, we break the foundational gap by providing its first “fully compressed” algorithm whose running time depends solely on the compressed string lengths. Specifically, given two strings, compressed into m and n runs, mn, we present an O(mn 2)-time algorithm for computing the edit distance of the strings. Our approach also yields the first fully compressed solution to approximate matching of a pattern of m runs in a text of n runs in O(mn 2) time.  相似文献   

9.
Given a rewriting system G (its alphabet, the set of productions and the axiom) one can define the language of G by
  1. taking out of all strings generated by G only those which are over a distinguished subalphabet of G, or
  2. translating the set of all strings generated by G by a fixed homomorphism.
The “trade-offs” between these two mechanisms for defining languages are discussed for both “parallel” rewriting systems from the developmental systems hierarchy and “sequential” rewriting systems from the Chomsky hierarchy.  相似文献   

10.
Fast Algorithms for max independent set   总被引:1,自引:0,他引:1  
We first propose a method, called “bottom-up method” that, informally, “propagates” improvement of the worst-case complexity for “sparse” instances to “denser” ones and we show an easy though non-trivial application of it to the min set cover problem. We then tackle max independent set. Here, we propagate improvements of worst-case complexity from graphs of average degree?d to graphs of average degree greater than?d. Indeed, using algorithms for max independent set in graphs of average degree 3, we successively solve max independent set in graphs of average degree 4, 5 and?6. Then, we combine the bottom-up technique with measure and conquer techniques to get improved running times for graphs of maximum degree?5 and?6 but also for general graphs. The computation bounds obtained for max independent set are?O ?(1.1571 n ), O ?(1.1895 n ) and?O ?(1.2050 n ), for graphs of maximum (or more generally average) degree?4, 5 and?6 respectively, and?O ?(1.2114 n ) for general graphs. These results improve upon the best known results for these cases for polynomial space algorithms.  相似文献   

11.
An arc-annotated string is a string of characters, called bases, augmented with a set of pairs, called arcs, each connecting two bases. Given arc-annotated strings P and Q the arc-preserving subsequence problem is to determine if P can be obtained from Q by deleting bases from Q. Whenever a base is deleted any arc with an endpoint in that base is also deleted. Arc-annotated strings where the arcs are “nested” are a natural model of RNA molecules that captures both the primary and secondary structure of these. The arc-preserving subsequence problem for nested arc-annotated strings is basic primitive for investigating the function of RNA molecules. Gramm et al. (ACM Trans. Algorithms 2(1): 44–65, 2006) gave an algorithm for this problem using O(nm) time and space, where m and n are the lengths of P and Q, respectively. In this paper we present a new algorithm using O(nm) time and O(n+m) space, thereby matching the previous time bound while significantly reducing the space from a quadratic term to linear. This is essential to process large RNA molecules where the space is likely to be a bottleneck. To obtain our result we introduce several novel ideas which may be of independent interest for related problems on arc-annotated strings.  相似文献   

12.
We consider on-line construction of the suffix tree for a parameterized string, where we always have the suffix tree of the input string read so far. This situation often arises from source code management systems where, for example, a source code repository is gradually increasing in its size as users commit new codes into the repository day by day. We present an on-line algorithm which constructs a parameterized suffix tree in randomized O(n) time, where n is the length of the input string. Our algorithm is the first randomized linear time algorithm for the on-line construction problem.  相似文献   

13.
A homogeneous production has its left-hand side formed by a non-empty string of identical nonterminals. A phrase-structure grammar is homogeneous if each of its productions is homogeneous. The present paper discusses the reduction of homogeneous grammars with respect to the number of non-context-free productions. More specifically, it demonstrates that for every phrase-structure grammar, there exists an equivalent homogeneous grammar that has only three non-context-free productions of the form 00→ε, 11→ε, and 22→ε.  相似文献   

14.
Sequencing by Hybridization (SBH) is a method for reconstructing an unknown DNA string based on obtaining, through hybridization experiments, whether certain short strings appear in the target string. Following Margaritis and Skiena (1995) [12], we study the SBH in rounds problem: The goal is to reconstruct an unknown string A (over a fixed alphabet) using queries of the form “does the string S appear in A?” for some query string S. The queries are performed in rounds, where the queries in each round depend on the answers to the queries in the previous rounds. We show that almost all strings of length n can be reconstructed in log1n rounds with O(n) queries per round. We also consider a variant of the problem in which for each substring query S, the answer is whether S appears once in the string A, appears at least twice in A, or does not appear in A. For this problem, we show that almost all strings can be reconstructed in 2 rounds of O(n) queries. Our results improve the previous results of Margaritis and Skiena (1995) [12] and Frieze and Halldórsson (2002) [8].  相似文献   

15.
The semantics of PROLOG programs is usually given in terms of the model theory of first-order logic. However, this does not adequately characterizethe computational behavior of PROLOG programs. PROLOG implementations typically use a sequential evaluation strategy based on the textual order of clauses and literals in a program, as well as nonlogical features like cut. In this work we develop a denotational semantics that captures thecomputational behavior of PROLOG. We present a semantics for “cut-free” PROLOG, which is then extended to PROLOG with cut. For each case we develop a congruence proof that relates the semantics to a standard operational interpreter. As an application of our denotational semantics, we show the correctness of some standard “folk” theorems regarding transformations on PROLOG programs.  相似文献   

16.
A suitably weighted Index Tree such as a B-tree or a Suffix Tree can be easily adapted to store, for a given string x and for all substrings w of x, the number of distinct instances of w along x. The storage needed is seen to be linear in the length of x: moreover, the whole statistics can itself be derived in linear time, off-line of a RAM. If the substring w has nontrivial periods, however, the number of distinct instances might differ from that of distinct non-overlapping occurrences along x. It is shown here that O(n log n) storage units—n standing for the length of x—are sufficient to organize this second kind of statistics, in such a way that the maximum number of nonoverlapping instances for arbitrary w along x can be retrieved in a number of character comparisons not exceeding the length of w.  相似文献   

17.
This paper is concerned with letter-rewriting systems in which context-free rewriting productions are equipped with context conditions. Such a production, π = Aα can be used to rewrite an occurrence of A in a string x only if x satisfies the context conditions attached to π. In particular, we are concerned with situations where the context conditions of several productions are satisfied by a given occurrence of A. Deciding which of these productions can be applied to rewrite this occurrence is done by (once a priori) fixed priorities among all context conditions.  相似文献   

18.
This paper deals with a new method of coding unlabeled trees, the resulting code being a string of integers. For a tree onn vertices both the coding and the decoding algorithm work in time 0(n). Givenn the “mean length” of code strings is shorter than with other existing integer string codes. The new coding method is used to derive an algorithm for generating all unlabeled trees of a given size without repetition.  相似文献   

19.
Demand-paging systems are characterized as stochastic control processes, and optimal page replacement decisions are determined by means of dynamic programming. This approach is distinguished from others by its utilization of page structure information, which may be either supplied a priori or else dynamically learned. The main result is an optimal realizable solution for a general class of replacement problems. The resulting algorithm subsumes others (including “A0”) as special cases.  相似文献   

20.
The nested model is an extension of the traditional, “flat” relational model in which relations can also have relation-valued entries. Its “default” query language, the nested algebra, is rather weak, unfortunately, since it is only a conservative extension of the traditional, flat relational algebra, and thus can express only a small fraction of the polynomial-time queries. Therefore, it was proposed to extend the nested algebra with a fixpoint construct, but the resulting language turned out to be too powerful: many inherently exponential queries could also be expressed. Two polynomial-time restrictions of the fixpoint closure of the nested algebra were proposed: the restricted fixpoint closure (by Gyssens and Van Gucht) and the bounded fixpoint closure (by Suciu). Here, we prove two results. First we show that both restrictions are equivalent in expressive power. The proof technique relies on known encodings of nested relations into flat ones, and on a novel technique, called type substitution, by which we reduce the equivalence of the two restrictions to its obvious counterpart in the flat relational model. Second we prove that both the bounded fixpoint queries and the restricted fixpoint queries admit normal forms, in which the fixpoint occurs exactly once. The proof technique relies on a novel encoding method of nested relations into flat ones.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号