首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
We discuss an intrinsic generalization of the suffix tree, designed to index a string of length n which has a natural partitioning into m multicharacter substrings or words . This word suffix tree represents only the m suffixes that start at word boundaries. These boundaries are determined by delimiters , whose definition depends on the application. Since traditional suffix tree construction algorithms rely heavily on the fact that all suffixes are inserted, construction of a word suffix tree is nontrivial, in particular when only O(m) construction space is allowed. We solve this problem, presenting an algorithm with O(n) expected running time. In general, construction cost is Ω(n) due to the need of scanning the entire input. In applications that require strict node ordering, an additional cost of sorting O(m') characters arises, where m' is the number of distinct words. In either case, this is a significant improvement over previously known solutions. Furthermore, when the alphabet is small, we may assume that the n characters in the input string occupy o(n) machine words. We illustrate that this can allow a word suffix tree to be built in sublinear time. Received September 2, 1997; revised December 10, 1997.  相似文献   

2.
3.
后缀树是处理字符串的一个优秀算法。利用图像化设计可使后缀树更加清晰。按照递推的思路,建立前i个字符对应的后缀树,通过插入第i+1个字符的方式,建立前i+1个字符对应的后缀树。由于字符串的任意子串都可以表示为某个后缀的前缀,因此可以设定当前节点为根节点。父节点取子节点中贡献最大的节点,同时,记录其对应的字符串。  相似文献   

4.
搜索引擎(Search Eng ine)技术是在网络数据成指数级增加的情况下出现的新技术。然而现在的搜索引擎在检索时都采用的是倒排文件,从后缀数据技术出发探讨了压缩后缀数组(Com pressed Su ffix A rray)技术在搜索引擎技术中的应用,从而大大提高了搜索引擎的性能。  相似文献   

5.
6.
With the first human DNA being decoded into a sequence of about 2.8 billion characters, much biological research has been centered on analyzing this sequence. Theoretically speaking, it is now feasible to accommodate an index for human DNA in the main memory so that any pattern can be located efficiently. This is due to the recent breakthrough on compressed suffix arrays, which reduces the space requirement from O(n log n) bits to O(n) bits. However, constructing compressed suffix arrays is still not an easy task because we still have to compute suffix arrays first and need a working memory of O(n log n) bits (i.e., more than 13 gigabytes for human DNA). This paper initiates the study of constructing compressed suffix arrays directly from the text. The main contribution is a construction algorithm that uses only O(n) bits of working memory, and the time complexity is O(n log n). Our construction algorithm is also time and space efficient for texts with large alphabets such as Chinese or Japanese. Precisely, when the alphabet size is |Σ|, the working space is O(n log |Σ|) bits, and the time complexity remains O(n log n), which is independent of |Σ|.  相似文献   

7.
后缀树的重要性可以为多年来学术界对它总是有新的发现而印证.它的结构简单,但可以在线性的时间里解决许多复杂的问题,被大量的使用在字符串及树的模式匹配中,对于XML标准,有很多基于关系库和对象库的索引技术和查询方案被提出来,我们试图给出一种基于后缀树进行路径导航的查询机制:用后缀树构造XML路径字典加速路径查询评价速度,我们提出可以在线地建立一个trie树的后缀树,讨论了XML路径字典中的后缀树建树算法,阐述了整个索引方案和查询机制,并探讨了包括RPE在内的它所支持的各种查询操作,XML路径字典被用于加快路径查询的评价速度.  相似文献   

8.
Suffix trees are the fundamental data structure of combinatorial pattern matching on words. Suffix trees have been used in order to give optimal solutions to a great variety of problems on static words, but for practical situations, such as in a text editor, where the incremental changes of the text make dynamic updating of the corresponding suffix trees necessary, this data structure alone has not been used with success. We prove that, for dynamic modifications of order O(1) of words of length n, any suffix tree updating algorithm, such as the ones proposed by McCreight, requires O(n) worst-case running time, as for the full reconstruction of the suffix tree. Consequently, we argue that this data structure alone is not appropriate for the solution of combinatorial problems on words that change dynamically.  相似文献   

9.
We provide a uniform framework for the study of index data structures for a two-dimensional matrixTEXT[1:n, 1:n] whose entries are drawn from an ordered alphabetΣ. An index forTEXTcan be informally seen as the two-dimensional analog of the suffix tree for a string. It allows on-line searches and statistics to be performed onTEXTby representing compactly theΘ(n3) square submatrices ofTEXTin optimalO(n2) space. We identify 4n−1families of indices forTEXT, each containing ∏ni=1 (2i−1)! isomorphic data structures. We also develop techniques leading to a single algorithm that efficiently builds any index in any family inO(n2 log n) time andO(n2) space. Such an algorithm improves in various respects the algorithms for the construction of the PAT tree and the Lsuffix tree. The framework and the algorithm easily generalize tod>2 dimensions. Moreover, as part of our algorithm, we provide new algorithmic tools that yield a space-efficient implementation of the “naming scheme” of R. Karpet al.(in“Proceedings, Fourth Symposium on Theory of Computing,” pp. 125–136) for strings and matrices.  相似文献   

10.
带后缀的三字词约占三字词总数的17.5%。笔者通过对120万字语料的统计和内省共获得71种能构成三字词的后缀,并分析了这些三字词的组合规律及前缀等上下文特征。运用这些知识,对65万字语料中带后缀的三字词进行识别,精确率和召回率由原来的85.2%和86.6%分别提高到86.6%和99.7%。  相似文献   

11.
12.
We are interested in the expressiveness of constraints represented by general first order formulae, with equality as unique relation symbol and function symbols taken from an infinite set F. The chosen domain is the set of trees whose nodes, in possibly infinite number, are labelled by elements of F. The operation linked to each element f of F is the mapping (a 1,..., a n ) b, where b is the tree whose initial node is labelled f and whose sequence of daughters is a 1,..., a n .We first consider tree constraints involving long alternated sequences of quantifiers .... We show how to express winning positions of two-person games with such constraints and apply our results to two examples.We then construct a family of strongly expressive tree constraints, inspired by a constructive proof of a complexity result by Pawel Mielniczuk. This family involves the huge number (k), obtained by top down evaluating a power tower of 2's, of height k. By a tree constraint of size proportional to k, it is then possible to define a tree having exactly (k) nodes or to express the multiplication table computed by a Prolog machine executing up to (k) instructions.By replacing the Prolog machine with a Turing machine we show the quasi-universality of tree constraints, that is to say, the ability to concisely describe trees which the most powerful machine will never have time to compute. We also rediscover the following result of Sergei Vorobyov: the complexity of an algorithm, deciding whether a tree constraint without free variables is true, cannot be bounded above by a function obtained from finite composition of simple functions including exponentiation.Finally, taking advantage of the fact that we have at our disposal an algorithm for solving such constraints in all their generalities, we produce a set of benchmarks for separating feasible examples from purely speculative ones. Among others we notice that it is possible to solve a constraint of 5000 symbols involving 160 alternating quantifiers.  相似文献   

13.
维吾尔语单词的构形词缀按照一定的规则连接到词干。维吾尔语的黏着言特点和构形词缀连接规则使得可以构造维吾尔语构形词缀的有限状态自动机。该文将详细介绍维吾尔语形容词构形词缀有限自动机的构造步骤。  相似文献   

14.
The two-dimensional suffix tree of an n × n square matrix A is a compacted trie that represents all square submatrices of A[11]. For the off-line case, i.e., A is given in advance to the algorithm, it is known how to build it in optimal time, for any type of alphabet size [11],[18]. Motivated by applications in Image Compression[22[, Giancarlo and Guaiana [14] considered the on-line version of the two-dimensional suffix tree and presented an O(n2 log2 n)-time algorithm, which we refer to as GG. That algorithm is a nontrivial generalization of Ukkonen's on-line algorithm for standard suffix trees [23]. The main contribution in this paper is an O(log n) factor improvement in the time complexity of the GG algorithm, making it optimal for unbounded alphabets [9]. Moreover, the ideas presented here also lead to a major simplification of the GG algorithm. Technically, we are able to preserve most of the structure of the original GG algorithm, by reducing a computational bottleneck to a primitive operation, i.e., comparison of Lcharacters, which is here implemented in constant time rather than O(log n) time as in GG. However, preserving that structure comes at a price. Indeed, in order to make everything work, we need a careful reorganization of another fundamental algorithm: Weiner's algorithm for the construction of standard suffix trees [24]. Specifically, here we provide a version of that algorithm which takes linear time and works on-line and concurrently over a set of strings.  相似文献   

15.
程序员对源代码的拷贝、粘贴及修改活动会导致软件中出现大量克隆代码,增加软件开发和维护的成本。为解决该问题,提出一种新的克隆检测方法。利用基于后缀数组的算法查找重复的 Token 子串,进而检测出克隆代码,开发相应的克隆检测工具SaCD,用其检测29款C语言开源软件。实验结果表明,SaCD能快速有效地检测软件中的Type-1和Type-2语句克隆,其检测速度比传统的克隆检测工具CCFinderx快了近20倍。  相似文献   

16.
Journal of Computer and Systems Sciences International - In this paper, we consider a compensation-type quartz pendulum accelerometer, a feature of which is the use of a digital feedback amplifier...  相似文献   

17.
基于H.264/AVC编码框架,提出一种支持空域随机访问功能的多视点视频编码方法,以满足快速虚拟视点绘制的需求.首先采用灵活的GOP时空划分来构建独立时空体,从而在每个时空体内通过限制帧间预测范围来增强解码的独立性,实现压缩域内某一帧特定区域数据的快速定位;进而通过调整预测空间来改善空域随机访问和整帧编码性能之间的关系.实验结果表明,该方法不仅在整帧压缩效率和快速空域随机访问灵活性方面达到了很好的平衡,而且能够通过传输部分码流的方式节省传输带宽.  相似文献   

18.
In the paper, a new method for modelling trees at medium detail is presented. The method is based on a volumetric representation of trees, generated by an iterated function system (IFS). Alleviating the modeling restrictions of fractal techniques, extensions to the standard IFS are introduced. Practical aspects of modeling and rendering of trees, such as data structures and bounding volumes, are discussed. The advantages of the new method are described at the end together with some results.  相似文献   

19.
20.
由于当前的基于DHT的P2P系统在语言搜索方面都有很大的限制,因此建立一种恰当的既具有语言能力又有伸缩性的语言覆盖P2P网络是一种挑战.文中提出一种介于DHT和支持关键字序列查找的语言覆盖之间的中间层DST覆盖网,通过DHT获取并返回给DST覆盖网相应的处理和索引数组,由DST实现关键字序列查找.分析表明它的时间复杂度与关键字序列的长度成线性关系,实验证明在P2P网络上使用基于DST的搜索获得一个确切的文本的查找具有快速性、负载平衡和可用性.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号