Similar Documents
20 similar documents found.
1.
As trees are used in a wide variety of application areas, the comparison of trees arises in many guises. Here we consider two generalizations of classical tree pattern matching, which consists of determining whether one tree is isomorphic to a subgraph of another. For the embedding problems of subgraph isomorphism and topological embedding, we present algorithms for determining a largest tree embeddable in two trees T and T' (a largest subtree) and a smallest tree in which each of T and T' can be embedded (a smallest supertree). Both subtrees and supertrees can be used in a variety of applications. For example, when each of the two trees contains partial information about a data set, such as the evolution of a set of species, the subtree or supertree corresponds to a structuring of the data that is consistent with both original trees. The size of a subtree or supertree of two trees can also be used to measure the similarity between two arrangements of data, whether images, documents, or RNA secondary structures. In this paper we present a general paradigm for sequential and parallel subtree and supertree algorithms for subgraph isomorphism and topological embedding. Our sequential algorithms run in time $O(n^{2.5}\log n)$ and our parallel algorithms in time $O(\log^3 n)$ on a randomized CREW PRAM using a polynomial number of processors. In addition, we give better algorithms for these problems when the underlying trees are ordered, that is, when the children of each node have a left-to-right ordering associated with them. In particular, we obtain $O(n^2)$-time sequential algorithms and $O(\log^3 n)$-time deterministic parallel algorithms on CREW PRAMs for both embeddings. Received July 17, 1995; revised May 25, 1996, and December 10, 1996.
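The general subtree problem above is harder than a short example can show. As a much simpler reference point, the sketch below handles only a restricted case for ordered, unlabeled trees. It finds the largest common subtree that must contain both roots and must map children in left-to-right order. This is a top-down mapping, in the spirit of Yang's "simple tree matching". It is not the paper's algorithm, and the `Node` class and `common_topdown_size` function are illustrative names.

```python
from typing import List


class Node:
    """An ordered, unlabeled tree node; children are kept left to right."""

    def __init__(self, *children: "Node") -> None:
        self.children: List["Node"] = list(children)


def common_topdown_size(a: Node, b: Node) -> int:
    """Size of the largest ordered tree that embeds top-down in both a and b.

    Roots are matched to roots and children are matched in left-to-right order,
    so the children step is an LCS-style dynamic program over the two child lists.
    """
    m, n = len(a.children), len(b.children)
    # best[i][j]: largest total over the first i children of a and the first j of b
    best = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            paired = best[i - 1][j - 1] + common_topdown_size(
                a.children[i - 1], b.children[j - 1]
            )
            best[i][j] = max(best[i - 1][j], best[i][j - 1], paired)
    return 1 + best[m][n]  # +1 for the matched roots


# T has a leaf then a node with one child; T2 has them in the opposite order.
T = Node(Node(), Node(Node()))
T2 = Node(Node(Node()), Node())
print(common_topdown_size(T, T2))  # 3
```

Each pair of nodes is visited at most once, because a pair is reached only from the pair of its parents. The total work is therefore bounded by the sum of child-list products. The sketch does not address subtrees that avoid the roots, unordered children, topological embedding, or supertrees.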

2.
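Because dynamic databases change over time, a new algorithm for mining frequent subtrees in dynamic databases is proposed. It introduces the concepts of tree transition probability, subtree expected support, and subtree dynamic support, and it defines a support-calculation method and a subtree search space for dynamic databases, thereby addressing the problem of mining frequent subtrees from data that changes dynamically. As the subtree search proceeds, the algorithm applies pruning formulas and a hybrid data structure, which effectively reduce the subtree search space and speed up subtree isomorphism checking. Experimental results show that the new algorithm is effective and feasible, with good runtime efficiency.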

3.
ESPM: A Frequent Subtree Mining Algorithm   Times cited: 13 (self-citations: 2; by others: 13)
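With the development of the Internet, frequent pattern mining has extended from frequent itemsets to structured data such as trees and graphs. Mining over these structures has been applied to more complex domains such as bioinformatics, web logs, and XML documents. A novel algorithm, ESPM, is proposed for mining frequent subtrees in ordered labeled trees. Unlike previous work, it defers tree isomorphism checking to a late stage of the algorithm, reducing the time cost of the overall mining process. Experiments on both synthetic and real datasets demonstrate the superiority of ESPM over other algorithms. Some possible improvements are also proposed.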
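The abstract does not give ESPM's internal representation. As a hedged illustration of why isomorphism checks between ordered labeled trees can be made cheap, the sketch below uses the common preorder-with-backtrack string encoding. TreeMiner, for example, uses -1 as its backtrack symbol. The `(label, children)` tuples, the `encode` function, and the `$` marker are assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical representation: (label, [children...]), children kept left to right.
Tree = Tuple[str, List["Tree"]]

BACKTRACK = "$"  # assumed marker; must never be used as a node label


def encode(t: Tree) -> List[str]:
    """Preorder labels, with BACKTRACK emitted each time we return from a child."""
    label, children = t
    out = [label]
    for child in children:
        out.extend(encode(child))
        out.append(BACKTRACK)
    return out


def same_ordered_tree(t1: Tree, t2: Tree) -> bool:
    """Two ordered labeled trees are isomorphic iff their encodings are equal."""
    return encode(t1) == encode(t2)


t = ("A", [("B", []), ("C", [("D", [])])])
print(" ".join(encode(t)))  # A B $ C D $ $
```

The encoding uniquely determines the ordered tree, so an isomorphism test reduces to a linear-time sequence comparison. This is one reason such checks can be postponed and batched.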

4.
Discovering Frequent Agreement Subtrees from Phylogenetic Data   Times cited: 1 (self-citations: 0; by others: 1)
We study a new data mining problem concerning the discovery of frequent agreement subtrees (FASTs) from a set of phylogenetic trees. A phylogenetic tree, or phylogeny, is an unordered tree in which the order among siblings is unimportant. Furthermore, each leaf in the tree has a label representing a taxon (species or organism) name, whereas internal nodes are unlabeled. The tree may have a root, representing the common ancestor of all species in the tree, or may be unrooted. An unrooted phylogeny arises due to the lack of sufficient evidence to infer a common ancestor of the taxa in the tree. The FAST problem addressed here is a natural extension of the maximum agreement subtree (MAST) problem widely studied in the computational phylogenetics community. The paper establishes a framework for tackling the FAST problem for both rooted and unrooted phylogenetic trees using data mining techniques. We first develop a novel canonical form for rooted trees together with a phylogeny-aware tree expansion scheme for generating candidate subtrees level by level. Then, we present an efficient algorithm to find all FASTs in a given set of rooted trees, through an Apriori-like approach. We show the correctness and completeness of the proposed method. Finally, we discuss the extensions of the techniques to unrooted trees. Experimental results demonstrate that the proposed methods work well, and are capable of finding interesting patterns in both synthetic data and real phylogenetic trees.
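The following sketch illustrates the notion of an agreement subtree on rooted, leaf-labelled trees. It restricts each tree to a chosen taxon set, contracts the nodes left with a single child, and compares the results through an order-independent canonical string. It is not the paper's canonical form or its Apriori-like expansion. The nested-tuple representation and all function names are hypothetical, and taxon names are assumed not to contain "(", ")" or ",".

```python
from collections import Counter
from typing import FrozenSet, Iterable, Optional, Tuple, Union

# Hypothetical representation: a leaf is a taxon name; an internal node is a tuple of children.
Phylo = Union[str, Tuple["Phylo", ...]]


def leaf_set(t: Phylo) -> FrozenSet[str]:
    if isinstance(t, str):
        return frozenset([t])
    return frozenset().union(*(leaf_set(c) for c in t))


def restrict(t: Phylo, taxa: FrozenSet[str]) -> Optional[Phylo]:
    """Keep only leaves in taxa, dropping empty subtrees and contracting unary nodes."""
    if isinstance(t, str):
        return t if t in taxa else None
    kept = [r for r in (restrict(c, taxa) for c in t) if r is not None]
    if not kept:
        return None
    if len(kept) == 1:
        return kept[0]  # an internal node left with one child is contracted away
    return tuple(kept)


def canonical(t: Phylo) -> str:
    """Order-independent string: sibling order is irrelevant in a phylogeny."""
    if isinstance(t, str):
        return t
    return "(" + ",".join(sorted(canonical(c) for c in t)) + ")"


def agreement_support(trees: Iterable[Phylo], taxa: Iterable[str]) -> Counter:
    """For a non-empty taxon set, count how many trees induce each topology on it."""
    wanted = frozenset(taxa)
    counts: Counter = Counter()
    for t in trees:
        if wanted <= leaf_set(t):  # the tree must contain every chosen taxon
            counts[canonical(restrict(t, wanted))] += 1
    return counts


t1 = (("a", "b"), "c")
t2 = (("a", "b"), ("c", "d"))
t3 = (("a", "c"), "b")
print(agreement_support([t1, t2, t3], {"a", "b", "c"}))
# Counter({'((a,b),c)': 2, '((a,c),b)': 1})
```

A topology whose count reaches the chosen minimum support would then be a frequent agreement subtree on that taxon set. Enumerating taxon sets efficiently is the hard part, and that is what the paper addresses.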

5.
A Frequent Subtree Mining Algorithm Based on Projection Encoding   Times cited: 2 (self-citations: 0; by others: 2)
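Frequent subtree mining is widely applied in web mining, bioinformatics, XML data mining, and other fields. A new algorithm, PETreeMiner, is proposed. It mines frequent subtrees with prefix projection, a candidate-free technique borrowed from sequence mining. A scope attribute is added to each node in the tree's preorder traversal sequence, and nodes are encoded during projection, so that every frequent subsequence mined corresponds directly to a frequent subtree. Experimental results show that the algorithm outperforms other algorithms.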
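The abstract does not define the scope attribute precisely. A common definition, used for example in TreeMiner, takes a node's scope to be the preorder position of its last descendant. With that definition, ancestor-descendant tests become two integer comparisons, which is what lets a flat preorder sequence stand in for the tree. The sketch below assumes that definition and a hypothetical `(label, children)` representation. PETreeMiner's own encoding may differ.

```python
from typing import List, Tuple

Tree = Tuple[str, List["Tree"]]  # hypothetical (label, children) representation


def preorder_with_scope(t: Tree) -> List[Tuple[str, int]]:
    """Return [(label, scope), ...] in preorder.

    A node's scope is the preorder position of its last descendant (itself for
    a leaf), so node j lies strictly below node i iff i < j <= scope of i.
    """
    out: List[Tuple[str, int]] = []

    def visit(node: Tree) -> int:
        label, children = node
        pos = len(out)
        out.append((label, pos))  # placeholder scope, fixed after the children
        last = pos
        for child in children:
            last = visit(child)
        out[pos] = (label, last)
        return last

    visit(t)
    return out


def is_descendant(seq: List[Tuple[str, int]], i: int, j: int) -> bool:
    return i < j <= seq[i][1]


t = ("A", [("B", [("C", [])]), ("D", [])])
seq = preorder_with_scope(t)
print(seq)  # [('A', 3), ('B', 2), ('C', 2), ('D', 3)]
```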

6.
Zhu Yingwen, Ji Genlin. Computer Science, 2007, 34(12): 175-179
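An efficient algorithm for mining maximal frequent embedded subtrees, CMPETreeMiner, is proposed. The algorithm stores each tree as a preorder traversal sequence augmented with each node's scope attribute. It uses pseudo-projection to project frequent subsequences and encodes each node in the projected sequences. While mining the encoded frequent subsequences, it prunes them efficiently and obtains the maximal frequent embedded subtrees without generating all frequent embedded subtrees. Experimental results show that CMPETreeMiner is efficient and feasible.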
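Pseudo-projection, in its usual PrefixSpan-style form, represents a projected database as pointers into the original sequences rather than as copied suffixes. The sketch below shows that idea at the plain sequence level. CMPETreeMiner applies projection to preorder sequences with scopes and per-node encodings, and it adds maximality pruning. None of that is modelled here. The function names and data layout are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# A projected database is a list of pointers (sequence id, start offset);
# suffixes are never copied, which is the point of pseudo-projection.
Projection = List[Tuple[int, int]]


def project(db: Sequence[Sequence[str]], proj: Projection, item: str) -> Projection:
    """Advance each pointer just past the first occurrence of item in its suffix."""
    out: Projection = []
    for sid, start in proj:
        seq = db[sid]
        for k in range(start, len(seq)):
            if seq[k] == item:
                out.append((sid, k + 1))
                break
    return out


def item_support(db: Sequence[Sequence[str]], proj: Projection) -> Dict[str, int]:
    """Number of projected suffixes containing each item (counted once per suffix)."""
    counts: Dict[str, int] = {}
    for sid, start in proj:
        seq = db[sid]
        for item in {seq[k] for k in range(start, len(seq))}:
            counts[item] = counts.get(item, 0) + 1
    return counts


db = [["A", "B", "C"], ["A", "C"], ["B", "C"]]
full = [(sid, 0) for sid in range(len(db))]
after_a = project(db, full, "A")
print(after_a)  # [(0, 1), (1, 1)]
```

Repeatedly projecting on each frequent item found by `item_support` grows frequent prefixes depth-first, and memory use stays proportional to the number of pointers.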

7.
Counting Objects   Times cited: 1 (self-citations: 0; by others: 1)

8.
Counting Beans     
Grier, David Alan. Computer, 2007, 40(11): 8-10
Our current body politic is looking for a more robust voting mechanism that is secured by technology rather than by the competence and integrity of bean counters.

9.
The counting method is a simple and efficient method for processing linear recursive Datalog queries. Its time complexity is bounded by $O(n \cdot e)$, where n and e denote the numbers of nodes and edges, respectively, in the graph representing the input relations. In this paper, the concepts of heritage appearance function and heritage selection function are introduced, and an evaluation algorithm based on computing these functions in topological order is developed. This new algorithm requires only linear time in the case of non-cyclic data.
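For orientation, the sketch below applies the classic counting method to the textbook same-generation program. It is not the heritage-function algorithm this paper proposes. The program is sg(X, Y) :- flat(X, Y) and sg(X, Y) :- up(X, U), sg(U, V), down(V, Y). The method first records how many up-steps separate each node from the query constant, then comes back down the same number of steps. The relation names and function names are illustrative, and the up relation is assumed acyclic, which is the non-cyclic case the abstract mentions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Pair = Tuple[Hashable, Hashable]


def _adjacency(pairs: Iterable[Pair]) -> Dict[Hashable, Set[Hashable]]:
    adj: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for x, y in pairs:
        adj[x].add(y)
    return adj


def same_generation(a, up, flat, down) -> Set[Hashable]:
    """Answer sg(a, Y) with the counting method; assumes up is acyclic."""
    up_m, flat_m, down_m = _adjacency(up), _adjacency(flat), _adjacency(down)
    # Phase 1: levels[i] holds the nodes reachable from a by exactly i up-steps.
    levels: List[Set[Hashable]] = [{a}]
    while True:
        nxt = {u for x in levels[-1] for u in up_m[x]}
        if not nxt:
            break
        levels.append(nxt)
    # Phase 2: from the top level down, cross with flat, then take one down-step per level.
    answers: Set[Hashable] = set()
    for i in range(len(levels) - 1, -1, -1):
        answers = {y for v in answers for y in down_m[v]} | {
            v for u in levels[i] for v in flat_m[u]
        }
    return answers


print(sorted(same_generation("a", [("a", "p")], [("p", "q"), ("a", "c")], [("q", "b")])))
# ['b', 'c']
```

On cyclic up data the level loop above would never terminate, which is why the classic method needs special handling there.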

10.
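Researchers in China and abroad have proposed many frequent subtree mining algorithms, all of which use a fixed minimum support. In general, short subtrees with high support are interesting, whereas long subtrees may be interesting even when their support is relatively low. This requires the minimum support to decrease as the number of nodes in the tree increases during mining. SCCMTreeMiner, an algorithm for fast mining of closed and maximal frequent induced subtrees under a variable support constraint, is proposed. It enumerates candidate subtrees by rightmost extension and improves mining efficiency with two new pruning methods; during mining, the minimum support decreases as the number of nodes in the tree increases. Experimental results show that, compared with CMTreeMiner, SCCMTreeMiner greatly reduces both the number of subtrees generated and the execution time.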
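The sketch below combines the two ingredients the abstract names, in a generic form. The first is rightmost extension, shown on a preorder (depth, label) encoding of ordered trees. The second is a size-dependent minimum support. The specific threshold function and its parameters are hypothetical, since the abstract does not give SCCMTreeMiner's constraint or its pruning rules. For unordered trees, a canonical-form check would also be needed to avoid duplicates, and that check is omitted here.

```python
from typing import Iterable, List, Tuple

# Hypothetical encoding: an ordered tree as preorder (depth, label) pairs, root at depth 0.
Pattern = List[Tuple[int, str]]


def rightmost_extensions(p: Pattern, labels: Iterable[str]) -> List[Pattern]:
    """Every pattern obtained by adding one node as the new last child of a node
    on the rightmost path. The rightmost path holds one node at each depth
    0..depth(last), so the new node's depth ranges over 1..depth(last) + 1."""
    labs = list(labels)
    last_depth = p[-1][0]
    return [p + [(d, lab)] for d in range(1, last_depth + 2) for lab in labs]


def decreasing_minsup(size: int, base: float = 0.5, step: float = 0.1, floor: float = 0.1) -> float:
    """Hypothetical constraint: the required fraction of trees shrinks with pattern size."""
    return max(floor, base - step * (size - 1))


p = [(0, "A"), (1, "B")]
for ext in rightmost_extensions(p, ["X"]):
    print(ext)
# [(0, 'A'), (1, 'B'), (1, 'X')]
# [(0, 'A'), (1, 'B'), (2, 'X')]
```

A candidate of size k would be kept when its support fraction is at least `decreasing_minsup(k)`. Because the threshold falls with size, a frequent pattern can have an infrequent subpattern. Plain Apriori-style pruning is therefore unsafe under such a constraint, which is why dedicated pruning methods are needed.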

11.
This paper discusses how to count and generate strings that are "distinct" in two senses: p-distinct and b-distinct. Two strings x on alphabet A and x' on alphabet A' are said to be p-distinct iff they represent distinct "patterns"; that is, iff there exists no one-to-one mapping from A to A' that transforms x into x'. Thus aab and baa are p-distinct, while aab and ddc are p-equivalent. On the other hand, x and x' are said to be b-distinct iff they give rise to distinct border (failure function) arrays: thus aab, with border array 010, is b-distinct from aba, with border array 001. The number of p-distinct (resp. b-distinct) strings of length n formed using exactly k different letters is the [k,n] entry in an infinite array p' (resp. b'). Column sums p[n] and b[n] of these arrays give the number of distinct strings of length n. We present algorithms to compute, in constant time per string, all p-distinct (resp. b-distinct) strings of length n formed using exactly k letters, and we also show how to compute all elements p'[k,n] and b'[k,n]. These ideas and results have applications to the efficient generation of appropriate test data sets for many string algorithms. Received December 21, 1995; revised April 28, 1997.
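Both equivalences can be tested directly. p-equivalence is tested by renaming letters in order of first occurrence, and b-equivalence by comparing KMP failure functions. The generator below yields one representative per p-equivalence class as a restricted growth string over 'a', 'b', and so on. Counting its output gives p'[k,n], which equals the Stirling number of the second kind S(n,k). This is a plain recursive sketch with illustrative names, not the paper's constant-time-per-string algorithm. It does not generate b-distinct strings, and it assumes k ≤ 26.

```python
from typing import Dict, Iterator, List


def border_array(s: str) -> List[int]:
    """KMP failure function: b[i] = length of the longest proper border of s[:i+1]."""
    b = [0] * len(s)
    k = 0
    for i in range(1, len(s)):
        while k > 0 and s[i] != s[k]:
            k = b[k - 1]
        if s[i] == s[k]:
            k += 1
        b[i] = k
    return b


def p_canonical(s: str) -> List[int]:
    """Rename letters by first occurrence; equal results iff the strings are p-equivalent."""
    first: Dict[str, int] = {}
    return [first.setdefault(c, len(first)) for c in s]


def p_distinct_strings(n: int, k: int) -> Iterator[str]:
    """One representative per p-equivalence class of length n with exactly k letters."""
    letters = "abcdefghijklmnopqrstuvwxyz"

    def grow(prefix: str, used: int) -> Iterator[str]:
        if len(prefix) == n:
            if used == k:
                yield prefix
            return
        remaining = n - len(prefix)
        # Reuse an old letter only if enough positions remain to introduce the missing ones.
        if k - used < remaining:
            for j in range(used):
                yield from grow(prefix + letters[j], used)
        if used < k:
            yield from grow(prefix + letters[used], used + 1)

    return grow("", 0)


print(border_array("aab"), border_array("aba"))  # [0, 1, 0] [0, 0, 1]
print(list(p_distinct_strings(3, 2)))  # ['aab', 'aba', 'abb']
```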

12.
In this paper, we review the concept of contextual probability, the resulting notion of neighbourhood counting and the various specialisations of this notion which result in new functions for measuring similarity, such as all common subsequences. We also provide new results on the generalisation of the all common subsequences similarity. Contextual probability was originally proposed as an alternative way of reasoning. It was later found to be an alternative way of estimating probability, and it led to the introduction of the neighbourhood counting notion. This notion was then found to be a generic similarity metric that can be applied to different types of data.
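The exact normalisation used in the paper is not stated here. One natural reading of an "all common subsequences" count is the number of pairs of index subsequences, one from each string, that spell the same string, with the empty pair included. The sketch below computes that quantity with an inclusion-exclusion dynamic program. Treat the definition and the function name as assumptions.

```python
def all_common_subsequences(x: str, y: str) -> int:
    """Number of pairs (I, J) of index subsequences of x and y spelling the same
    string, the empty pair included."""
    m, n = len(x), len(y)
    # N[i][j] counts such pairs for the prefixes x[:i] and y[:j]; empty prefixes give 1.
    N = [[1] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # Pairs avoiding x[i-1] or avoiding y[j-1], by inclusion-exclusion.
            N[i][j] = N[i - 1][j] + N[i][j - 1] - N[i - 1][j - 1]
            if x[i - 1] == y[j - 1]:
                # Pairs whose last matched positions are exactly x[i-1] and y[j-1].
                N[i][j] += N[i - 1][j - 1]
    return N[m][n]


print(all_common_subsequences("ab", "ab"))  # 4: empty, "a", "b", "ab"
```

Because this count grows quickly with string length, a similarity built on it would usually be normalised, for example by the geometric mean of each string's count against itself. That choice is left open here.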

15.
Some deterministic and probabilistic methods are presented for counting and estimating the number of points on curves over finite fields, and on their projections. The classical question of estimating the size of the image of a univariate polynomial is a special case. For curves given by sparse polynomials, the counting problem is #P-complete via probabilistic parsimonious Turing reductions.
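As a baseline for the quantities discussed above, the sketch below counts affine points on a plane curve f(x, y) = 0 over a prime field F_p, and the image size of a univariate polynomial, by exhaustive evaluation. This costs O(p^2) and O(p) evaluations respectively, so it only suits small p. It is not one of the paper's methods, and the function names are illustrative. The polynomial is passed as a Python callable, and p is assumed prime.

```python
from typing import Callable


def count_affine_points(f: Callable[[int, int], int], p: int) -> int:
    """Number of (x, y) in F_p x F_p with f(x, y) = 0 mod p."""
    return sum(1 for x in range(p) for y in range(p) if f(x, y) % p == 0)


def image_size(g: Callable[[int], int], p: int) -> int:
    """Size of the image {g(x) mod p : x in F_p}."""
    return len({g(x) % p for x in range(p)})


print(count_affine_points(lambda x, y: x * x + y * y - 1, 7))  # 8
print(image_size(lambda x: x * x, 7))  # 4
```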

16.
Counting up risk     
Infosecurity, 2007, 4(1): 10-13

17.
Abstract. Counting functions can be defined syntactically or semantically, depending on whether they count the number of witnesses in a non-deterministic or in a deterministic computation on the input. In the Turing-machine-based model these two ways of defining counting were proven to be equivalent for many important complexity classes. In the circuit-based model this was done for #P, but for low-level complexity classes such as $\#\mathrm{AC}^0$ and $\#\mathrm{NC}^1$ only the syntactical definitions were considered. We give appropriate semantical definitions for these two classes and prove them to be equivalent to the syntactical ones. We also consider semantically defined probabilistic complexity classes corresponding to $\mathrm{AC}^0$ and $\mathrm{NC}^1$ and prove that, in the case of unbounded error, they are identical to their syntactical counterparts.
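To make the circuit-side syntactic definition concrete, the sketch below evaluates a Boolean circuit over the natural numbers, with OR as sum, AND as product, and literals as 0/1. The result counts the accepting subtrees (proof trees) of the circuit unfolded into a formula. It is nonzero exactly when the Boolean circuit accepts. This illustrates the standard syntactic notion only, not the semantic definitions the paper introduces. The tuple encoding of gates is hypothetical, with negations only at the inputs.

```python
from math import prod
from typing import Sequence, Tuple

# Hypothetical encoding (negations only at the inputs):
#   ("var", i), ("neg", i), ("and", g1, g2, ...), ("or", g1, g2, ...)
Gate = Tuple


def count_proof_trees(g: Gate, x: Sequence[int]) -> int:
    kind = g[0]
    if kind == "var":
        return x[g[1]]
    if kind == "neg":
        return 1 - x[g[1]]
    values = [count_proof_trees(child, x) for child in g[1:]]
    if kind == "or":
        return sum(values)  # pick one accepting child subtree
    if kind == "and":
        return prod(values)  # pick an accepting subtree for every child
    raise ValueError(f"unknown gate type {kind!r}")


same = ("or", ("var", 0), ("var", 1))
c = ("and", same, same)
print(count_proof_trees(c, [1, 1]), count_proof_trees(c, [1, 0]))  # 4 1
```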

19.
Hough, Geoff. ITNOW, 2006, 48(6): 6-7
