首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到18条相似文献,搜索用时 62 毫秒
1.
后缀树的结构简单,但可以在线性的时间里解决许多复杂的问题,被大量的使用在字符串及树的模式匹配中.  相似文献   

2.
为用后缀树聚类算法对维吾尔文网页进行聚类,通过分析可扩展后缀树和维吾尔文的特点设计了维吾尔文后缀树构造算法。实验结果证明该方法能够在线性的时间范围内构造维吾尔文后缀树,并用它来对维吾尔文网页进行聚类。  相似文献   

3.
后缀树是处理字符串的一个优秀算法。利用图像化设计可使后缀树更加清晰。按照递推的思路,建立前i个字符对应的后缀树,通过插入第i+1个字符的方式,建立前i+1个字符对应的后缀树。由于字符串的任意子串都可以表示为某个后缀的前缀,因此可以设定当前节点为根节点。父节点取子节点中贡献最大的节点,同时,记录其对应的字符串。  相似文献   

4.
一种适合于GPU计算的并行后缀数组构造算法   总被引:1,自引:0,他引:1  
后缀数组广泛应用于序列分析、字符串匹配和文本压缩,近年来,有关后缀数组构造和应用算法的不断探索构成了计算机科学中一个非常活跃的研究领域.在对现有串行算法进行了分析和对比之后,提出了一种新的、简洁的适合于GPU计算的并行后缀数组倍增构造算法,以排序方法替代传统的分组策略,不但能独立完成后缀数组的并行构造,还可与现存的串行倍增算法结合使用,以达到最高的执行效率.实验结果表明该算法在解决实际应用问题时,具有易于实现、执行速度快和可扩展性强等优点,尤其在处理小字符集的数据时效率更高.  相似文献   

5.
提出一种基于后缀表示的构建系统发生树的蚁群算法(SR-PTC),该算法用蚂蚁访问物种集合以形成一个对应最优系统发生树的后缀表示序列。为构成一个合法的系统发生树的后缀表示,蚂蚁对内部结点的选择要受到限制,分别为叶结点和内部结点设置两个不同的选择概率,并用赌轮盘选择方法来决定两种结点的选择。另外,在信息素更新时,加入当前树的评价值来影响蚂蚁的运动方向。实验结果表明,此方法能得到较为准确的拓扑结构,在物种数目较小时可以较快地得到结果。  相似文献   

6.
DNA序列中基于适应性后缀树的重复体识别算法   总被引:1,自引:0,他引:1  
现有的在DNA序列中识别重复体的算法多数是基于比对的,对识别速度和吞吐量有很大的限制.针对这个问题文中根据一个平衡重复体的长度和频率的定义,提出了一种基于Ukkonen后缀树的快速识别重复体的RepSeeker算法.算法采用最低限制频率,最大程度地扩展了重复体的长度,同时为了进一步地提高RepSeeker算法的效率,对Ukkonen的后缀树构造算法进行了适应性改进,在构造时加入RepSeeker算法所需的结点信息并将叶子结点和分支结点加以区分,从而使得RepSeeker算法能通过直接读取结点信息来求得子串频率和子串位置.这种改进较大地提高了RepSeeker算法的性能,而且空间开销不大.实验中使用了NCBI中的9条典型DNA序列作为测试数据,并对后缀树改进前后的重复体识别算法做了比较分析.结果表明,RepSeeker在没有损失精度的情况下缩短了算法的运行时间.实验结果与理论上的分析一致.  相似文献   

7.
文章首先介绍了PDBMS采用的Hash-Round-Robin(HRR)数据划分方法以及基于该划分方法的并行RDBn树,最后着重、详细地给出了基于该树的并行Join算法,分析了该算法的效率。  相似文献   

8.
入侵检测中基于后缀树的多模式匹配算法   总被引:1,自引:0,他引:1  
针对模式匹配算法已经成为误用入侵检测系统性能瓶颈的现状,提出了一种新的基于后缀树的多模式匹配算法FSM(Fast String Matching Algorithm).该算法构建了一个后缀自动机,匹配中应用了好后缀启发机制进行启发跳跃.将改算法在Snort2.4.3中实现,实验结果表明,在耗费一定空间的基础上,Snort的时间性能有了较大提高.  相似文献   

9.
基于并行B+-树的并行Join算法的设计、分析与实现   总被引:1,自引:0,他引:1  
B^+-树是一种有效的数据库存储结构,被普遍应用于各种关系数据库系统。把B^+-树并行化,使之用于并行数据库系统显然是一项很有意义的重要工作。本文研究了适用于并行数据库的并行B^+-树存储结构,提出两类基于并行B^+-树工并行Join算法。理论和实验结果表明,这些算法效率高基其它并行Join算法。  相似文献   

10.
本文分析了博弈树并行搜索算法中的主变量分裂算法的同步开锁问题。  相似文献   

11.
Suffix trees are the fundamental data structure of combinatorial pattern matching on words. Suffix trees have been used in order to give optimal solutions to a great variety of problems on static words, but for practical situations, such as in a text editor, where the incremental changes of the text make dynamic updating of the corresponding suffix trees necessary, this data structure alone has not been used with success. We prove that, for dynamic modifications of order O(1) of words of length n, any suffix tree updating algorithm, such as the ones proposed by McCreight, requires O(n) worst-case running time, as for the full reconstruction of the suffix tree. Consequently, we argue that this data structure alone is not appropriate for the solution of combinatorial problems on words that change dynamically.  相似文献   

12.
We discuss an intrinsic generalization of the suffix tree, designed to index a string of length n which has a natural partitioning into m multicharacter substrings or words . This word suffix tree represents only the m suffixes that start at word boundaries. These boundaries are determined by delimiters , whose definition depends on the application. Since traditional suffix tree construction algorithms rely heavily on the fact that all suffixes are inserted, construction of a word suffix tree is nontrivial, in particular when only O(m) construction space is allowed. We solve this problem, presenting an algorithm with O(n) expected running time. In general, construction cost is Ω(n) due to the need of scanning the entire input. In applications that require strict node ordering, an additional cost of sorting O(m') characters arises, where m' is the number of distinct words. In either case, this is a significant improvement over previously known solutions. Furthermore, when the alphabet is small, we may assume that the n characters in the input string occupy o(n) machine words. We illustrate that this can allow a word suffix tree to be built in sublinear time. Received September 2, 1997; revised December 10, 1997.  相似文献   

13.
Engineering a Lightweight Suffix Array Construction Algorithm   总被引:6,自引:0,他引:6  
In this paper we describe a new algorithm for building the suffix array of a string. This task is equivalent to the problem of lexicographically sorting all the suffixes of the input string. Our algorithm is based on a new approach called deep–shallow sorting: we use a shallow sorter for the suffixes with a short common prefix, and a deep sorter for the suffixes with a long common prefix. All the known algorithms for building the suffix array either require a large amount of space or are inefficient when the input string contains many repeated substrings. Our algorithm has been designed to overcome this dichotomy. Our algorithm is lightweight in the sense that it uses very small space in addition to the space required by the suffix array itself. At the same time our algorithm is fast even when the input contains many repetitions: this has been shown by extensive experiments with inputs of size up to 110 Mb. The source code of our algorithm, as well as a C library providing a simple API, is available under the GNU GPL.  相似文献   

14.
后缀数组构建算法的时间和空间开销是它在实际应用中的瓶颈。该文介绍了两种较好的构建算法,对它们的性能作了评估和分析,指出了各自的适用范围,给出并比较了两种算法在不同情况下的实验结果。  相似文献   

15.
由于在生物序列分析、文本索引、网络入侵检测等领域的应用需求,带有通配符的模式匹配问题一直是研究 的热点。针对已有的研究工作中通配符和长度约束具有较强的局限性问题,研究带有灵活通配符的模式匹配问题,其 中通配符可以在模式的任意两子串间出现且可以指定灵活的长度约束。采用非线性数据结构—后缀树,设计了求 解模式所有解的完备算法PAS"I'。预处理阶段采用在线增量式算法构建具有文本先验知识的后缀树,搜索阶段结合 动态规划的思想,逐个匹配模式中字符,最终得到完备解。在基因序列上的实验表明,PAST比其他算法具有更好的 时间性能。  相似文献   

16.
Finding motifs in biological sequences is one of the most intriguing problems for string algorithm designers due to, on the one hand, the numerous applications of this problem in molecular biology and, on the other hand, the challenging aspects of the computational problem. Indeed, when dealing with biological sequences it is necessary to work with approximations (that is, to identify fragments that are not necessarily identical, but just similar, according to a given similarity notion), and this complicates the problem. Existing algorithms run in time linear with respect to the input size. Nevertheless, the output size can be very large due to the approximation (namely exponential in the approximation degree). This often makes the output unreadable, as well as slowing down the inference itself. A high degree of redundancy has been detected in the set of motifs that satisfy traditional requirements, even for exact motifs. Moreover, it has been observed many times that only a subset of these motifs, namely the maximal motifs, could be enough to provide the information of all of them. In this paper, we aim at removing such redundancy. We extend some notions of maximality already defined for exact motifs to the case of approximate motifs with Hamming distance, and we give a characterization of maximal motifs on the suffix tree. Given that this data structure is used by a whole class of motif extraction tools, we show how these tools can be modified to include the maximality requirement without changing the asymptotical complexity.  相似文献   

17.
Maa? 《Algorithmica》2008,37(1):43-74
Affix trees are a generalization of suffix trees that are based on the inherent duality of suffix trees induced by the suffix links. An algorithm is presented that constructs affix trees on-line by expanding the underlying string in both directions and that is the first known algorithm to do this with linear time complexity.  相似文献   

18.
后缀树聚类算法在元搜索引擎中的应用   总被引:2,自引:0,他引:2  
元搜索引擎结果覆盖面广,易于维护,实现简单,能够提供比较全面的结果给用户。后缀树聚类算法(STC)充分考虑了文本集合的语言学特征,并引入了短语特性,从而产生了较好的聚类效果。本文将后缀树聚类算法应用到元搜索引擎中,从而增强了结果的可浏览性,提高了搜索的精度。实验结果表明,STC算法在查准率和时间性能方面都高于传统的聚类算法。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号