共查询到20条相似文献,搜索用时 15 毫秒
1.
We discuss an intrinsic generalization of the suffix tree, designed to index a string of length n which has a natural partitioning into m multicharacter substrings or words . This word suffix tree represents only the m suffixes that start at word boundaries. These boundaries are determined by delimiters , whose definition depends on the application. Since traditional suffix tree construction algorithms rely heavily on the fact that all suffixes are inserted, construction of a word suffix tree is nontrivial, in particular when only O(m) construction space is allowed. We solve this problem, presenting an algorithm with O(n) expected running time. In general, construction cost is Ω(n) due to the need of scanning the entire input. In applications that require strict node ordering, an additional cost of sorting O(m') characters arises, where m' is the number of distinct words. In either case, this is a significant improvement over previously known solutions. Furthermore, when the alphabet is small, we may assume that the n characters in the input string occupy o(n) machine words. We illustrate that this can allow a word suffix tree to be built in sublinear time. Received September 2, 1997; revised December 10, 1997. 相似文献
2.
3.
4.
5.
6.
Wing-Kai Hon Tak-Wah Lam Kunihiko Sadakane Wing-Kin Sung Siu-Ming Yiu 《Algorithmica》2007,48(1):23-36
With the first human DNA being decoded into a sequence of about 2.8 billion characters, much biological research has been
centered on analyzing this sequence. Theoretically speaking, it is now feasible to accommodate an index for human DNA in the
main memory so that any pattern can be located efficiently. This is due to the recent breakthrough on compressed suffix arrays,
which reduces the space requirement from O(n log n) bits to O(n) bits. However, constructing compressed suffix arrays is still
not an easy task because we still have to compute suffix arrays first and need a working memory of O(n log n) bits (i.e.,
more than 13 gigabytes for human DNA). This paper initiates the study of constructing compressed suffix arrays directly from
the text. The main contribution is a construction algorithm that uses only O(n) bits of working memory, and the time complexity
is O(n log n). Our construction algorithm is also time and space efficient for texts with large alphabets such as Chinese
or Japanese. Precisely, when the alphabet size is |Σ|, the working space is O(n log |Σ|) bits, and the time complexity remains
O(n log n), which is independent of |Σ|. 相似文献
7.
后缀树的重要性可以为多年来学术界对它总是有新的发现而印证.它的结构简单,但可以在线性的时间里解决许多复杂的问题,被大量的使用在字符串及树的模式匹配中,对于XML标准,有很多基于关系库和对象库的索引技术和查询方案被提出来,我们试图给出一种基于后缀树进行路径导航的查询机制:用后缀树构造XML路径字典加速路径查询评价速度,我们提出可以在线地建立一个trie树的后缀树,讨论了XML路径字典中的后缀树建树算法,阐述了整个索引方案和查询机制,并探讨了包括RPE在内的它所支持的各种查询操作,XML路径字典被用于加快路径查询的评价速度. 相似文献
8.
Suffix trees are the fundamental data structure of combinatorial pattern
matching on words. Suffix trees have been used in order to give optimal
solutions to a great variety of problems on static words, but for practical
situations, such as in a text editor, where the incremental changes of
the text make dynamic updating of the corresponding suffix trees necessary, this
data structure alone has not been used with success. We prove that, for dynamic
modifications of order O(1) of words of length n, any suffix tree updating
algorithm, such as the ones proposed by McCreight, requires O(n) worst-case
running time, as for the full reconstruction of the suffix tree. Consequently,
we argue that this data structure alone is not appropriate for the solution
of combinatorial problems on words that change dynamically. 相似文献
9.
We provide a uniform framework for the study of index data structures for a two-dimensional matrixTEXT[1:n, 1:n] whose entries are drawn from an ordered alphabetΣ. An index forTEXTcan be informally seen as the two-dimensional analog of the suffix tree for a string. It allows on-line searches and statistics to be performed onTEXTby representing compactly theΘ(n3) square submatrices ofTEXTin optimalO(n2) space. We identify 4n−1families of indices forTEXT, each containing ∏ni=1 (2i−1)! isomorphic data structures. We also develop techniques leading to a single algorithm that efficiently builds any index in any family inO(n2 log n) time andO(n2) space. Such an algorithm improves in various respects the algorithms for the construction of the PAT tree and the Lsuffix tree. The framework and the algorithm easily generalize tod>2 dimensions. Moreover, as part of our algorithm, we provide new algorithmic tools that yield a space-efficient implementation of the “naming scheme” of R. Karpet al.(in“Proceedings, Fourth Symposium on Theory of Computing,” pp. 125–136) for strings and matrices. 相似文献
10.
带后缀的三字词约占三字词总数的17.5%。笔者通过对120万字语料的统计和内省共获得71种能构成三字词的后缀,并分析了这些三字词的组合规律及前缀等上下文特征。运用这些知识,对65万字语料中带后缀的三字词进行识别,精确率和召回率由原来的85.2%和86.6%分别提高到86.6%和99.7%。 相似文献
11.
12.
We are interested in the expressiveness of constraints represented by general first order formulae, with equality as unique relation symbol and function symbols taken from an infinite set F. The chosen domain is the set of trees whose nodes, in possibly infinite number, are labelled by elements of F. The operation linked to each element f of F is the mapping (a
1,..., a
n
) b, where b is the tree whose initial node is labelled f and whose sequence of daughters is a
1,..., a
n
.We first consider tree constraints involving long alternated sequences of quantifiers .... We show how to express winning positions of two-person games with such constraints and apply our results to two examples.We then construct a family of strongly expressive tree constraints, inspired by a constructive proof of a complexity result by Pawel Mielniczuk. This family involves the huge number (k), obtained by top down evaluating a power tower of 2's, of height k. By a tree constraint of size proportional to k, it is then possible to define a tree having exactly (k) nodes or to express the multiplication table computed by a Prolog machine executing up to (k) instructions.By replacing the Prolog machine with a Turing machine we show the quasi-universality of tree constraints, that is to say, the ability to concisely describe trees which the most powerful machine will never have time to compute. We also rediscover the following result of Sergei Vorobyov: the complexity of an algorithm, deciding whether a tree constraint without free variables is true, cannot be bounded above by a function obtained from finite composition of simple functions including exponentiation.Finally, taking advantage of the fact that we have at our disposal an algorithm for solving such constraints in all their generalities, we produce a set of benchmarks for separating feasible examples from purely speculative ones. Among others we notice that it is possible to solve a constraint of 5000 symbols involving 160 alternating quantifiers. 相似文献
13.
维吾尔语单词的构形词缀按照一定的规则连接到词干。维吾尔语的黏着言特点和构形词缀连接规则使得可以构造维吾尔语构形词缀的有限状态自动机。该文将详细介绍维吾尔语形容词构形词缀有限自动机的构造步骤。 相似文献
14.
The two-dimensional suffix tree of an n × n square matrix A is a compacted trie that represents all square submatrices of
A[11]. For the off-line case, i.e., A is given in advance to the algorithm, it is known how to build it in optimal time, for
any type of alphabet size [11],[18]. Motivated by applications in Image Compression[22[, Giancarlo and Guaiana [14] considered
the on-line version of the two-dimensional suffix tree and presented an O(n2 log2 n)-time algorithm, which we refer to as GG. That algorithm is a nontrivial generalization of Ukkonen's on-line algorithm
for standard suffix trees [23]. The main contribution in this paper is an O(log n) factor improvement in the time complexity
of the GG algorithm, making it optimal for unbounded alphabets [9]. Moreover, the ideas presented here also lead to a major
simplification of the GG algorithm. Technically, we are able to preserve most of the structure of the original GG algorithm,
by reducing a computational bottleneck to a primitive operation, i.e., comparison of Lcharacters, which is here implemented
in constant time rather than O(log n) time as in GG. However, preserving that structure comes at a price. Indeed, in order
to make everything work, we need a careful reorganization of another fundamental algorithm: Weiner's algorithm for the construction
of standard suffix trees [24]. Specifically, here we provide a version of that algorithm which takes linear time and works
on-line and concurrently over a set of strings. 相似文献
15.
16.
Grebennikov V. I. Deputatova E. A. Kalikhman D. M. Kalikhman L. Ya. Skorobogatov V. V. 《Journal of Computer and Systems Sciences International》2021,60(2):248-270
Journal of Computer and Systems Sciences International - In this paper, we consider a compensation-type quartz pendulum accelerometer, a feature of which is the use of a digital feedback amplifier... 相似文献
17.
基于H.264/AVC编码框架,提出一种支持空域随机访问功能的多视点视频编码方法,以满足快速虚拟视点绘制的需求.首先采用灵活的GOP时空划分来构建独立时空体,从而在每个时空体内通过限制帧间预测范围来增强解码的独立性,实现压缩域内某一帧特定区域数据的快速定位;进而通过调整预测空间来改善空域随机访问和整帧编码性能之间的关系.实验结果表明,该方法不仅在整帧压缩效率和快速空域随机访问灵活性方面达到了很好的平衡,而且能够通过传输部分码流的方式节省传输带宽. 相似文献
18.
In the paper, a new method for modelling trees at medium detail is presented. The method is based on a volumetric representation of trees, generated by an iterated function system (IFS). Alleviating the modeling restrictions of fractal techniques, extensions to the standard IFS are introduced. Practical aspects of modeling and rendering of trees, such as data structures and bounding volumes, are discussed. The advantages of the new method are described at the end together with some results. 相似文献
19.
20.
由于当前的基于DHT的P2P系统在语言搜索方面都有很大的限制,因此建立一种恰当的既具有语言能力又有伸缩性的语言覆盖P2P网络是一种挑战.文中提出一种介于DHT和支持关键字序列查找的语言覆盖之间的中间层DST覆盖网,通过DHT获取并返回给DST覆盖网相应的处理和索引数组,由DST实现关键字序列查找.分析表明它的时间复杂度与关键字序列的长度成线性关系,实验证明在P2P网络上使用基于DST的搜索获得一个确切的文本的查找具有快速性、负载平衡和可用性. 相似文献