Similar literature
1.
2.
A critical issue in classification tree design, obtaining right-sized trees (i.e., trees that neither underfit nor overfit the data), is addressed. Instead of stopping rules to halt partitioning, the approach of growing a large tree with pure terminal nodes and selectively pruning it back is used. A new efficient iterative method is proposed to grow and prune classification trees. This method divides the data sample into two subsets and iteratively grows a tree with one subset and prunes it with the other subset, successively interchanging the roles of the two subsets. The convergence and other properties of the algorithm are established. Theoretical and practical considerations suggest that the iterative tree growing and pruning algorithm should perform better and require less computation than other widely used tree growing and pruning algorithms. Numerical results on a waveform recognition problem are presented to support this view.

3.
Discovering Frequent Agreement Subtrees from Phylogenetic Data (total citations: 1; self-citations: 0; citations by others: 1)
We study a new data mining problem concerning the discovery of frequent agreement subtrees (FASTs) from a set of phylogenetic trees. A phylogenetic tree, or phylogeny, is an unordered tree in which the order among siblings is unimportant. Furthermore, each leaf in the tree has a label representing a taxon (species or organism) name, whereas internal nodes are unlabeled. The tree may have a root, representing the common ancestor of all species in the tree, or may be unrooted. An unrooted phylogeny arises due to the lack of sufficient evidence to infer a common ancestor of the taxa in the tree. The FAST problem addressed here is a natural extension of the maximum agreement subtree (MAST) problem widely studied in the computational phylogenetics community. The paper establishes a framework for tackling the FAST problem for both rooted and unrooted phylogenetic trees using data mining techniques. We first develop a novel canonical form for rooted trees together with a phylogeny-aware tree expansion scheme for generating candidate subtrees level by level. Then, we present an efficient algorithm to find all FASTs in a given set of rooted trees, through an Apriori-like approach. We show the correctness and completeness of the proposed method. Finally, we discuss the extensions of the techniques to unrooted trees. Experimental results demonstrate that the proposed methods work well, and are capable of finding interesting patterns in both synthetic data and real phylogenetic trees.

4.
This paper addresses the task of identification of nonlinear dynamic systems from measured data. The discrete-time variant of this task is commonly reformulated as a regression problem. As tree ensembles have proven to be a successful predictive modeling approach, we investigate the use of tree ensembles for solving the regression problem. While different variants of tree ensembles have been proposed and used, they are mostly limited to using regression trees as base models. We introduce ensembles of fuzzified model trees with split attribute randomization and evaluate them for nonlinear dynamic system identification. Models of dynamic systems which are built for control purposes are usually evaluated by a more stringent evaluation procedure using the output, i.e., simulation error. Taking this into account, we perform ensemble pruning to optimize the output error of the tree ensemble models. The proposed Model-Tree Ensemble method is empirically evaluated by using input–output data disturbed by noise. It is compared to representative state-of-the-art approaches, on one synthetic dataset with artificially introduced noise and one real-world noisy dataset. The evaluation shows that the method is suitable for modeling dynamic systems and produces models with comparable output error performance to the other approaches. Also, the method is resilient to noise, as its performance does not deteriorate even when up to 20% of noise is added.

5.

Pruning is an effective technique for improving the generalization performance of decision trees. However, most of the existing methods are time-consuming or unsuitable for small datasets. In this paper, a new pruning algorithm based on the structural risk of a leaf node is proposed. The structural risk is measured by the product of the accuracy and the volume (PAV) of the leaf node. Comparison experiments against the Cost-Complexity Pruning using cross-validation (CCP-CV) algorithm on some benchmark datasets show that PAV pruning largely reduces the time cost of CCP-CV, while the test accuracy of PAV pruning is close to that of CCP-CV.


6.
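Because high-dimensional data usually contain redundancy and noise, a covering model built directly on such data cannot fully reflect the distribution of the data, which degrades classifier performance. To address this, a multi-tree ensemble classification method based on pruned random subspaces is proposed. The method first generates multiple random subspaces and builds an independent minimum spanning tree covering model on each subspace. Next, the classification models built on the subspaces are pruned: the generated one-class classifiers are pruned according to an evaluation criterion (the AUC value). Finally, the remaining classifiers are fused into a single ensemble classifier by mean combination. Experimental results show that, compared with other direct covering classification models and the bagging algorithm, the multi-tree ensemble covering classifier achieves higher classification accuracy.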

7.
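Given that dynamic databases change over time, a new algorithm for mining frequent subtrees in dynamic databases is proposed. It introduces the concepts of tree transition probability, expected subtree support, and dynamic subtree support, and proposes a support computation method and a subtree search space for dynamic databases, thereby addressing frequent subtree mining under dynamically changing data. As the subtree search proceeds, the algorithm defines a pruning formula and a hybrid data structure, which effectively reduce the subtree search space and speed up isomorphism checking of frequent subtrees. Experimental results show that the new algorithm is effective and feasible and has good running efficiency.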

8.
This paper focuses on space efficient representations of rooted trees that permit basic navigation in constant time. While most of the previous work has focused on binary trees, we turn our attention to trees of higher degree. We consider both cardinal trees (or k-ary tries), where each node has k slots, labelled {1,...,k}, each of which may have a reference to a child, and ordinal trees, where the children of each node are simply ordered. Our representations use a number of bits close to the information theoretic lower bound and support operations in constant time. For ordinal trees we support the operations of finding the degree, parent, ith child, and subtree size. For cardinal trees the structure also supports finding the child labelled i of a given node apart from the ordinal tree operations. These representations also provide a mapping from the n nodes of the tree onto the integers {1, ..., n}, giving unique labels to the nodes of the tree. This labelling can be used to store satellite information with the nodes efficiently.
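
To make the idea of a near-minimal encoding of an ordinal tree concrete, here is a minimal sketch using a balanced-parentheses string, which spends two symbols (bits) per node. This is an assumption chosen for illustration: the abstract does not say which encoding the paper uses, and the linear scan below is not constant time. Practical succinct structures add small auxiliary indexes to answer such queries quickly. The names `encode` and `subtree_size` are hypothetical.

```python
def encode(tree):
    """Return the balanced-parentheses string of an ordinal tree.

    A tree is a list of child trees, so a leaf is the empty list [].
    """
    return "(" + "".join(encode(child) for child in tree) + ")"


def subtree_size(bp, i):
    """Number of nodes in the subtree whose opening parenthesis is at index i.

    Illustrative linear scan; a real succinct structure would use an index.
    """
    depth = 0
    for j in range(i, len(bp)):
        depth += 1 if bp[j] == "(" else -1
        if depth == 0:
            # Each node contributes exactly one "(" and one ")".
            return (j - i + 1) // 2
    raise ValueError("unbalanced parentheses")


# Root with two children: the first has two leaf children, the second is a leaf.
tree = [[[], []], []]
bp = encode(tree)
```

The encoded string has length 2n for n nodes, and the position of a node's opening parenthesis in the string gives a natural numbering of the nodes.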

9.
In a balloon drawing of a tree, all the children under the same parent are placed on the circumference of the circle centered at their parent, and the radius of the circle centered at each node along any path from the root reflects the number of descendants associated with the node. Among various styles of tree drawings reported in the literature, the balloon drawing enjoys a desirable feature of displaying tree structures in a rather balanced fashion. For each internal node in a balloon drawing, the ray from the node to each of its children divides the wedge accommodating the subtree rooted at the child into two sub-wedges. Depending on whether the two sub-wedge angles are required to be identical or not, a balloon drawing can further be divided into two types: even sub-wedge and uneven sub-wedge types. In the most general case, for any internal node in the tree there are two dimensions of freedom that affect the quality of a balloon drawing: (1) altering the order in which the children of the node appear in the drawing, and (2) for the subtree rooted at each child of the node, flipping the two sub-wedges of the subtree. In this paper, we give a comprehensive complexity analysis for optimizing balloon drawings of rooted trees with respect to angular resolution, aspect ratio and standard deviation of angles under various drawing cases depending on whether the tree is of even or uneven sub-wedge type and whether (1) and (2) above are allowed. It turns out that some are NP-complete while others can be solved in polynomial time. We also derive approximation algorithms for those that are intractable in general.

10.
A k-tree core of a tree network is a subtree with exactly k leaves that minimizes the total distance from vertices to the subtree. A k-tree center of a tree network is a subtree with exactly k leaves that minimizes the distance from the farthest vertex to the subtree. In this paper, two efficient parallel algorithms are proposed for finding a k-tree core and a k-tree center of a tree network, respectively. Both the proposed algorithms perform on the EREW PRAM in O(log n log n) time using O(n) work (time-processor product). Besides being efficient on the EREW PRAM, in the sequential case, our algorithm for finding a k-tree core of a tree network improves the two algorithms previously proposed

11.
Mining frequent tree patterns has many applications in different areas such as XML data, bioinformatics and World Wide Web. The crucial step in frequent pattern mining is frequency counting, which involves a matching operator to find occurrences (instances) of a tree pattern in a given collection of trees. A widely used matching operator for tree-structured data is subtree homeomorphism, where an edge in the tree pattern is mapped onto an ancestor-descendant relationship in the given tree. Tree patterns that are frequent under subtree homeomorphism are usually called embedded patterns. In this paper, we present an efficient algorithm for subtree homeomorphism with application to frequent pattern mining. We propose a compact data-structure, called occ, which stores only information about the rightmost paths of occurrences and hence can encode and represent several occurrences of a tree pattern. We then define efficient join operations on the occ data-structure, which help us count occurrences of tree patterns according to occurrences of their proper subtrees. Based on the proposed subtree homeomorphism method, we develop an effective pattern mining algorithm, called TPMiner. We evaluate the efficiency of TPMiner on several real-world and synthetic datasets. Our extensive experiments confirm that TPMiner always outperforms well-known existing algorithms, and in several cases the improvement with respect to existing algorithms is significant.

12.
Most decision‐tree induction algorithms use a local greedy strategy, where a leaf is always split on the best attribute according to a given attribute‐selection criterion. A more accurate model could possibly be found by looking ahead for alternative subtrees. However, some researchers argue that look‐ahead should not be used due to a negative effect (called “decision‐tree pathology”) on decision‐tree accuracy. This paper presents a new look‐ahead heuristic for decision‐tree induction. The proposed method is called look‐ahead J48 (LA‐J48) as it is based on J48, the Weka implementation of the popular C4.5 algorithm. At each tree node, the LA‐J48 algorithm applies the look‐ahead procedure of bounded depth only to attributes that are not statistically distinguishable from the best attribute chosen by the greedy approach of C4.5. A bootstrap process is used for estimating the standard deviation of splitting criteria with unknown probability distribution. Based on a separate validation set, the attribute producing the most accurate subtree is chosen for the next step of the algorithm. In experiments on 20 benchmark data sets, the proposed look‐ahead method outperforms the greedy J48 algorithm with the gain ratio and the gini index splitting criteria, thus avoiding the look‐ahead pathology of decision‐tree induction.

13.
We consider the following problem: given ordered labeled trees S and T, can S be obtained from T by deleting nodes? Deletion of the root node u of a subtree with children means replacing the subtree by the trees . For the tree inclusion problem, there can generally be exponentially many ways to obtain the included tree. P. Kilpeläinen and H. Mannila [5,7] gave an algorithm based on dynamic programming requiring time and space in the worst case and also on the average for solving this problem. We give an algorithm whose idea is similar to that of [5,7] but which improves the previous one and on the average breaks the barrier. Received: 4 November 1996 / 2 March 2001
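
The problem statement itself can be checked on small inputs with a direct recursive search. The sketch below is only an illustration of the definition of ordered tree inclusion. It is neither the algorithm of this paper nor the dynamic program of [5,7], and it can take exponential time in the worst case. The tuple representation and the names `node` and `included` are hypothetical.

```python
from functools import lru_cache


def node(label, *children):
    """Build a tree as (label, children); a forest is a tuple of trees."""
    return (label, children)


@lru_cache(maxsize=None)
def included(s, t):
    """True if forest s can be obtained from forest t by deleting nodes."""
    if not s:
        return True   # delete every remaining node of t
    if not t:
        return False  # nodes of s remain but nothing is left to match
    (t_label, t_children), t_rest = t[0], t[1:]
    # Option 1: delete the first root of t; its children take its place.
    if included(s, t_children + t_rest):
        return True
    # Option 2: keep that root, so it must match the first root of s.
    (s_label, s_children), s_rest = s[0], s[1:]
    return (s_label == t_label
            and included(s_children, t_children)
            and included(s_rest, t_rest))


T = (node("a", node("b", node("c")), node("d")),)
S_yes = (node("a", node("c"), node("d")),)  # delete b from T
S_no = (node("a", node("d"), node("c")),)   # sibling order differs
```

Here `included(S_yes, T)` holds because deleting the node labelled b promotes c to a child of a, while `included(S_no, T)` fails because deletions cannot reorder siblings.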

14.
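A new MPLS-based multicast method, branch-on-demand multicast, is proposed. The method adopts a new way of maintaining the multicast tree: only routers at branching nodes of the tree and routers with group members on their local links need to store information about the multicast tree and take part in its maintenance. The other routers on the tree simply forward multicast packets using ordinary unicast routing and need not maintain any multicast tree information. Network simulation experiments and a performance comparison with other algorithms show that the method effectively improves the scalability of IP multicast and reduces forwarding state.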

15.
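Scribe is a classic topic-based publish/subscribe system that distributes events to subscribers through distributed multicast trees. Scribe must maintain these trees periodically, which causes a large amount of redundant event delivery and high maintenance cost. An enhanced topic-based publish/subscribe system built on Scribe, called EScribe, is proposed. EScribe uses Bloom filters to store the subscription information of Pastry leaf nodes and dynamically adjusts next-hop routing. The interval at which a node maintains its subtree grows with the node's level in the multicast tree. Experimental results show that EScribe greatly reduces the number of redundant event deliveries and the size of the multicast trees, and also markedly reduces the cost of event distribution and multicast tree maintenance.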
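
For readers unfamiliar with Bloom filters, the following is a minimal, generic sketch of the data structure, using topic names as the stored items. It is not EScribe's implementation: the bit-array size, the number of hash functions, and the SHA-256-based hashing are assumptions chosen for illustration, and the class name `BloomFilter` is hypothetical.

```python
import hashlib


class BloomFilter:
    """Compact set membership with possible false positives, no false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


subscriptions = BloomFilter()
subscriptions.add("sports")
subscriptions.add("weather")
```

A membership test such as `"sports" in subscriptions` is always true for an added topic. For a topic that was never added it is usually false but may occasionally be true, which is the price paid for the compact representation.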

16.
Wilber’s logarithmic lower bound, concerning off-line binary search tree access costs, is generalized to encompass binary trees that are not constrained to satisfy the search tree property. Rotation operations in this extended model can be preceded by subtree swaps. A separation between the power of processing with search trees versus unordered trees is demonstrated.

17.
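Theory and experiments show that ensemble classifiers with a larger margin distribution on the training set generalize better. This paper introduces the concept of margin into ensemble pruning and uses it to guide the design of ensemble pruning methods. On this basis, a metric (MBM) is constructed to evaluate the importance of a base classifier relative to the ensemble, and a greedy ensemble selection method (MBMEP) is then proposed to reduce the ensemble size and improve its classification accuracy. Experiments on 30 randomly selected UCI data sets show that, compared with several other advanced greedy ensemble selection algorithms, the sub-ensembles selected by MBMEP generalize better.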
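
As background for the margin concept, here is a small sketch of the commonly used voting margin of an ensemble on one training sample: the fraction of votes for the true class minus the largest fraction for any other class. This is an assumption for illustration only. The abstract does not define the MBM metric itself, so the function below (with the hypothetical name `margin`) should not be read as the paper's measure.

```python
from collections import Counter


def margin(votes, true_label):
    """Voting margin in [-1, 1] of an ensemble on a single sample.

    votes: list of class labels predicted by the base classifiers.
    """
    counts = Counter(votes)
    correct = counts.get(true_label, 0)
    # Largest vote count among the wrong classes (0 if there are none).
    wrong = max(
        (c for label, c in counts.items() if label != true_label),
        default=0,
    )
    return (correct - wrong) / len(votes)
```

A positive margin means the majority vote is correct on that sample, and values near 1 mean the base classifiers agree strongly on the true class.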

18.
Moen, S. Software, IEEE, 1990, 7(4): 21-28
A tree-drawing algorithm that addresses the weaknesses of current approaches to constructing graphical user interfaces is presented. Present algorithms either do not let you draw tree nodes of varying shapes and sizes or they draw such trees in a way that does not produce trees as compact as they could be, which is especially important when diagramming a large system. Also, they cannot reuse layout information when the tree changes, so after every change the layout must be recomputed and the tree redrawn. The main difference between these traditional approaches and the author's approach is that his algorithm is more geometric. Unlike other algorithms, it uses an explicit representation of node and subtree contours, and it stores every contour as a polygon. It has three advantages over traditional algorithms. It allows one to draw trees with nodes of any polygonal shape compactly. The data structure supports insert and delete operations on subtrees. It is simple to implement, yet flexible.

19.
A Decision Tree Classification Method Based on Rough Set Theory (total citations: 1; self-citations: 0; citations by others: 1)
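Decision trees are a commonly used classification method in data mining. This paper proposes a decision tree method based on rough sets, which uses the rough-set approximation accuracy to select the root node of the decision tree, with branches generated by classification. The method is computationally simple and easy to understand. The paper also proposes simplifying the decision tree with pessimistic pruning to improve its predictive and classification ability. An example shows that the proposed methods are simple and effective.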
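
The approximation accuracy mentioned above can be illustrated with the standard rough-set definition: for a target class X and the partition induced by one attribute, accuracy is the size of the lower approximation divided by the size of the upper approximation. The sketch below computes this for a single attribute and a single decision class. How the paper combines such values to choose the root node is not stated in the abstract, so that step is left out. The function name and the toy data are hypothetical.

```python
from collections import defaultdict


def approximation_accuracy(rows, attribute, decision, target):
    """|lower approximation| / |upper approximation| of the target class."""
    blocks = defaultdict(list)
    for row in rows:
        # Rows with the same attribute value are indiscernible.
        blocks[row[attribute]].append(row)
    lower = upper = 0
    for block in blocks.values():
        hits = sum(1 for row in block if row[decision] == target)
        if hits == len(block):
            lower += len(block)  # block lies entirely inside the target class
        if hits > 0:
            upper += len(block)  # block overlaps the target class
    return lower / upper if upper else 0.0


rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
```

On this toy table the attribute `outlook` separates the class `play = yes` exactly (accuracy 1.0), while `windy` gives no certain information about it (accuracy 0.0), so an accuracy-based criterion would prefer `outlook`.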

20.
A CSP search algorithm, like FC or MAC, explores a search tree during its run. Every node of the search tree can be associated with a CSP created by the refined domains of unassigned variables. If the algorithm detects that the CSP associated with a node is insoluble, the node becomes a dead-end. A strategy of pruning “by analogy” states that the current node of the search tree can be discarded if the CSP associated with it is “more constrained” than a CSP associated with some dead-end node. In this paper we present a method of pruning based on the above strategy. The information about the CSPs associated with dead-end nodes is kept in the structures called responsibility sets and kernels. We term the method that uses these structures for pruning RKP, which is an abbreviation of Responsibility set, Kernel, Propagation. We combine the pruning method with algorithms FC and MAC. We call the resulting solvers FC-RKP and MAC-RKP, respectively. Experimental evaluation shows that MAC-RKP outperforms MAC-CBJ on random CSPs and on random graph coloring problems. The RKP-method also has theoretical interest. We show that under certain restrictions FC-RKP simulates FC-CBJ. It follows from the fact that intelligent backtracking implicitly uses the strategy of pruning “by analogy.”
