首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
We consider on-line construction of the suffix tree for a parameterized string, where we always have the suffix tree of the input string read so far. This situation often arises from source code management systems where, for example, a source code repository is gradually increasing in its size as users commit new codes into the repository day by day. We present an on-line algorithm which constructs a parameterized suffix tree in randomized O(n) time, where n is the length of the input string. Our algorithm is the first randomized linear time algorithm for the on-line construction problem.  相似文献   

2.
Given a graph, finding an optimal vertex ranking and constructing a minimum height elimination tree are two related problems. However, an optimal vertex ranking does not by itself provide enough information to construct an elimination tree of minimum height. On the other hand, an optimal vertex ranking can readily be found directly from an elimination tree of minimum height. On n-vertex trees, the optimal vertex ranking problem already has a linear-time algorithm in the literature. However, there is no linear-time algorithm for the problem of finding a minimum height elimination tree. A naive algorithm for this problem requires O(nlogn) time. In this paper, we propose a linear-time algorithm for constructing a minimum height elimination tree of a tree.  相似文献   

3.
Practical methods for constructing suffix trees   总被引:7,自引:0,他引:7  
Sequence datasets are ubiquitous in modern life-science applications, and querying sequences is a common and critical operation in many of these applications. The suffix tree is a versatile data structure that can be used to evaluate a wide variety of queries on sequence datasets, including evaluating exact and approximate string matches, and finding repeat patterns. However, methods for constructing suffix trees are often very time-consuming, especially for suffix trees that are large and do not fit in the available main memory. Even when the suffix tree fits in memory, it turns out that the processor cache behavior of theoretically optimal suffix tree construction methods is poor, resulting in poor performance. Currently, there are a large number of algorithms for constructing suffix trees, but the practical tradeoffs in using these algorithms for different scenarios are not well characterized. In this paper, we explore suffix tree construction algorithms over a wide spectrum of data sources and sizes. First, we show that on modern processors, a cache-efficient algorithm with O(n2) worst-case complexity outperforms popular linear time algorithms like Ukkonen and McCreight, even for in-memory construction. For larger datasets, the disk I/O requirement quickly becomes the bottleneck in each algorithm's performance. To address this problem, we describe two approaches. First, we present a buffer management strategy for the O(n2) algorithm. The resulting new algorithm, which we call “Top Down Disk-based” (TDD), scales to sizes much larger than have been previously described in literature. This approach far outperforms the best known disk-based construction methods. Second, we present a new disk-based suffix tree construction algorithm that is based on a sort-merge paradigm, and show that for constructing very large suffix trees with very little resources, this algorithm is more efficient than TDD.  相似文献   

4.
On-line construction of suffix trees   总被引:47,自引:0,他引:47  
E. Ukkonen 《Algorithmica》1995,14(3):249-260
An on-line algorithm is presented for constructing the suffix tree for a given string in time linear in the length of the string. The new algorithm has the desirable property of processing the string symbol by symbol from left to right. It always has the suffix tree for the scanned part of the string ready. The method is developed as a linear-time version of a very simple algorithm for (quadratic size) suffixtries. Regardless of its quadratic worst case this latter algorithm can be a good practical method when the string is not too long. Another variation of this method is shown to give, in a natural way, the well-known algorithms for constructing suffix automata (DAWGs).This research was supported by the Academy of Finland and by the Alexander von Humboldt Foundation (Germany).  相似文献   

5.
The two-dimensional suffix tree of an n × n square matrix A is a compacted trie that represents all square submatrices of A[11]. For the off-line case, i.e., A is given in advance to the algorithm, it is known how to build it in optimal time, for any type of alphabet size [11],[18]. Motivated by applications in Image Compression[22[, Giancarlo and Guaiana [14] considered the on-line version of the two-dimensional suffix tree and presented an O(n2 log2 n)-time algorithm, which we refer to as GG. That algorithm is a nontrivial generalization of Ukkonen's on-line algorithm for standard suffix trees [23]. The main contribution in this paper is an O(log n) factor improvement in the time complexity of the GG algorithm, making it optimal for unbounded alphabets [9]. Moreover, the ideas presented here also lead to a major simplification of the GG algorithm. Technically, we are able to preserve most of the structure of the original GG algorithm, by reducing a computational bottleneck to a primitive operation, i.e., comparison of Lcharacters, which is here implemented in constant time rather than O(log n) time as in GG. However, preserving that structure comes at a price. Indeed, in order to make everything work, we need a careful reorganization of another fundamental algorithm: Weiner's algorithm for the construction of standard suffix trees [24]. Specifically, here we provide a version of that algorithm which takes linear time and works on-line and concurrently over a set of strings.  相似文献   

6.
The disk dimension of a planar graph G is the least number k for which G embeds in the plane minus k open disks, with every vertex on the boundary of some disk. Useful properties of graphs with a given disk dimension are derived, leading to an algorithm to obtain an outerplanar subgraph of a graph with disk dimension k by removing at most 2k−2 vertices. This reduction is used to obtain linear-time exact and approximation algorithms on graphs with fixed disk dimension. In particular, a linear-time approximation algorithm is presented for the pathwidth problem.  相似文献   

7.
The two-dimensional suffix tree of a matrix A is a compacted tree that represents all square submatrices of A. We present the first complete version of a deterministic linear-time algorithm to construct the two-dimensional suffix tree by applying a divide-and-conquer approach.  相似文献   

8.
Suffix trees and suffix arrays are fundamental full-text index data structures to solve problems occurring in string processing. Since suffix trees and suffix arrays have different capabilities, some problems are solved more efficiently using suffix trees and others are solved more efficiently using suffix arrays. We consider efficient index data structures with the capabilities of both suffix trees and suffix arrays without requiring much space. When the size of an alphabet is small, enhanced suffix arrays are such index data structures. However, when the size of an alphabet is large, enhanced suffix arrays lose the power of suffix trees. Pattern searching in an enhanced suffix array takes O(m|Σ|) time while pattern searching in a suffix tree takes O(mlog |Σ|) time where m is the length of a pattern and Σ is an alphabet. In this paper, we present linearized suffix trees which are efficient index data structures with the capabilities of both suffix trees and suffix arrays even when the size of an alphabet is large. A linearized suffix tree has all the functionalities of the enhanced suffix array and supports the pattern search in O(mlog |Σ|) time. In a different point of view, it can be considered a practical implementation of the suffix tree supporting O(mlog |Σ|)-time pattern search. In addition, we also present two efficient algorithms for computing suffix links on the enhanced suffix array and the linearized suffix tree. These are the first algorithms that run in O(n) time without using the range minima query. Our experimental results show that our algorithms are faster than the previous algorithms.  相似文献   

9.
Phillips  Stein  Torng  Wein 《Algorithmica》2008,32(2):163-200
Abstract. We consider two fundamental problems in dynamic scheduling: scheduling to meet deadlines in a preemptive multiprocessor setting, and scheduling to provide good response time in a number of scheduling environments. When viewed from the perspective of traditional worst-case analysis, no good on-line algorithms exist for these problems, and for some variants no good off-line algorithms exist unless P = NP . We study these problems using a relaxed notion of competitive analysis, introduced by Kalyanasundaram and Pruhs, in which the on-line algorithm is allowed more resources than the optimal off-line algorithm to which it is compared. Using this approach, we establish that several well-known on-line algorithms, that have poor performance from an absolute worst-case perspective, are optimal for the problems in question when allowed moderately more resources. For optimization of average flow time, these are the first results of any sort, for any NP -hard version of the problem, that indicate that it might be possible to design good approximation algorithms.  相似文献   

10.
We review the linear-time suffix tree constructions by Weiner, McCreight, and Ukkonen. We use the terminology of the most recent algorithm, Ukkonen's on-line construction, to explain its historic predecessors. This reveals relationships much closer than one would expect, since the three algorithms are based on rather different intuitive ideas. Moreover, it completely explains the differences between these algorithms in terms of simplicity, efficiency, and implementation complexity. Received February 12, 1995; revised January 28, 1996.  相似文献   

11.
The suffix tree is a key data structure for biological sequence analysis, since it permits efficient solutions to many string-based problems. Constructing large suffix trees is challenging because of high memory overheads and poor memory locality. Even though efficient suffix tree construction algorithms exist, their run-time is still very high for long DNA sequences such as whole human chromosomes. In this paper, we are using a hierarchical grid system as a computational platform in order to reduce this run-time significantly. To achieve an efficient mapping onto this type of architecture we introduce a parallel suffix tree construction algorithm that makes use of a new data structure called the common prefix suffix tree. Using this algorithm together with a dynamic load balancing strategy we show that our distributed grid implementation leads to significant run-time savings.  相似文献   

12.
Ovidiu Daescu 《Algorithmica》2004,38(1):131-143
In this paper we give bounds on the complexity of some algorithms for approximating 2-D and 3-D polygonal paths with the infinite beam measure of error. While the time/space complexities of the algorithms known for other error measures are well understood, path approximation with infinite beam measure seems to be harder due to the complexity of some geometric structures that arise in the known approaches. Our results answer some open problems left in previous work. We also present a more careful analysis of the time complexity of the general iterative algorithm for infinite beam measure and show that it could perform much better in practice than in the worst case. We then propose a new approach for path approximation that consists of a breadth first traversal of the path approximation graph, without constructing the graph. This approach can be applied to any error criterion in any constant dimension. The running time of the new algorithm depends on the size of the output: if the optimal approximating path has m vertices, the algorithm performs F(m) iterations rather than n–1 in the previous approaches, where F(m) \le n–1 is the number of vertices of the path approximation graph that can be reached with at most m–2 edges. This is the first output sensitive path approximation algorithm.  相似文献   

13.
Optimal Time-Critical Scheduling via Resource Augmentation   总被引:1,自引:0,他引:1  
Phillips  Stein  Torng  Wein 《Algorithmica》2002,32(2):163-200
We consider two fundamental problems in dynamic scheduling: scheduling to meet deadlines in a preemptive multiprocessor setting, and scheduling to provide good response time in a number of scheduling environments. When viewed from the perspective of traditional worst-case analysis, no good on-line algorithms exist for these problems, and for some variants no good off-line algorithms exist unless P = NP . We study these problems using a relaxed notion of competitive analysis, introduced by Kalyanasundaram and Pruhs, in which the on-line algorithm is allowed more resources than the optimal off-line algorithm to which it is compared. Using this approach, we establish that several well-known on-line algorithms, that have poor performance from an absolute worst-case perspective, are optimal for the problems in question when allowed moderately more resources. For optimization of average flow time, these are the first results of any sort, for any NP -hard version of the problem, that indicate that it might be possible to design good approximation algorithms.  相似文献   

14.
《国际计算机数学杂志》2012,89(10):2118-2141
A graph is clique-perfect if the maximum size of a clique-independent set (a set of pairwise disjoint maximal cliques) and the minimum size of a clique-transversal set (a set of vertices meeting every maximal clique) coincide for each induced subgraph. A graph is balanced if its clique-matrix contains no square submatrix of odd size with exactly two ones per row and column. In this work, we give linear-time recognition algorithms and minimal forbidden induced subgraph characterizations of clique-perfectness and balancedness of P4-tidy graphs and a linear-time algorithm for computing a maximum clique-independent set and a minimum clique-transversal set for any P4-tidy graph. We also give a minimal forbidden induced subgraph characterization and a linear-time recognition algorithm for balancedness of paw-free graphs. Finally, we show that clique-perfectness of diamond-free graphs can be decided in polynomial time by showing that a diamond-free graph is clique-perfect if and only if it is balanced.  相似文献   

15.
This paper presents a new distributed self-stabilizing algorithm for the weakly connected minimal dominating set problem. It assumes a self-stabilizing algorithm to compute a breadth-first tree. Using an unfair distributed scheduler the algorithm stabilizes in at most O(nmA) moves, where A is the number of moves to construct a breadth-first tree. All previously known algorithms required an exponential number of moves.  相似文献   

16.
In this paper, the on-line motion planning of articulated robots in dynamic environment is investigated. We propose a practical on-line robot motion planning approach that is based upon pre-computing the global configuration space (C-space) connectivity with respect to all possible obstacle positions. The proposed motion planner consists of an off-line stage and an on-line stage. In the off-line stage, the obstacles in the C-space (C-obstacle) with respect to the obstacle positions in the workspace are computed, which are then stored using a hierarchical data structure with non-uniform 2m trees. In the on-line stage, the real obstacle cells in the workspace are identified and the corresponding 2m trees from the pre-computed database are superposed to construct the real-time C-space. The collision-free path is then searched in this C-space by using the A* algorithm under a multi-resolution strategy which has excellent computational efficiency. In this approach, the most time-consuming operation is performed in the off-line stage, while the on-line computing only need to deal with the real-time obstacles occurring in the dynamic environment. The minimized on-line computational cost makes it feasible for real-time on-line motion planning. The validity and efficiency of this approach is demonstrated using manipulator prototypes with 5 and 7 degree-of-freedom.  相似文献   

17.
Culberson and Rudnicki [J.C. Culberson, P. Rudnicki, A fast algorithm for constructing trees from distance matrices, Inform. Process. Lett. 30 (4) (1989) 215-220] gave an algorithm that reconstructs a degree d restricted tree from its distance matrix. According to their analysis, it runs in time O(dnlogdn) for topological trees. However, this turns out to be false; we show that the algorithm takes time in the topological case, giving tight examples.  相似文献   

18.
A new measure for the study of on-line algorithms   总被引:3,自引:0,他引:3  
An accepted measure for the performance of an on-line algorithm is the competitive ratio introduced by Sleator and Tarjan. This measure is well motivated and has led to the development of a mathematical theory for on-line algorithms.We investigate the behavior of this measure with respect to memory needs and benefits of lookahead and find some counterintuitive features. We present lower bounds on the size of memory devoted to recording the past. It is also observed that the competitive ratio reflects no improvement in the performance of an on-line algorithm due to any (finite) amount of lookahead.We offer an alternative measure that exhibits a different and, in some respects, more intuitive behavior. In particular, we demonstrate the use of our new measure by analyzing the tradeoff between the amortized cost of on-line algorithms for the paging problem and the amount of lookahead available to them. We also derive on-line algorithms for theK-server problem on any bounded metric space, which, relative to the new measure, are optimal among all on-line algorithms (up to a factor of 2) and are within a factor of 2K from the optimal off-line performance.  相似文献   

19.
Matching run-length coded strings (RLCSs) is very important in the field of pattern recognition. This paper considers the design of vectorized matching algorithms that operate directly on RLCSs. We first modify the algorithm of Karp and Rabin (1987) to design a linear-time randomized matching algorithm for RLCSs. Following this algorithm, two new and fast vectorized algorithms are presented. The first one is off-line and the second one is on-line. Some experiments are carried out on a CRAY X-MP EA/ 116se vector supercomputer to demonstrate the good performance of our vectorized algorithms.  相似文献   

20.
We introduce a new generalization of the on-line coloring game. We define the concept of bounded family for on-line t-relaxed colorings. This extends the concept of on-line competitive coloring algorithms to t-relaxed colorings. We characterize the trees T for which the family of T-free graphs is bounded and show that the corresponding bounding function is linear.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号