Similar Documents
20 similar documents found (search time: 15 ms)
1.
We introduce a new algorithm building an optimal dyadic decision tree (ODT). The method combines guaranteed performance in the learning-theoretical sense with optimal search from the algorithmic point of view. Furthermore, it inherits the explanatory power of tree approaches while improving performance over classical approaches such as CART/C4.5, as shown by experiments on artificial and benchmark data.

2.
Omnivariate decision trees (total citations: 3; self-citations: 0; other citations: 3)
Univariate decision trees at each decision node consider the value of only one feature, leading to axis-aligned splits. In a linear multivariate decision tree, each decision node divides the input space into two with a hyperplane. In a nonlinear multivariate tree, a multilayer perceptron at each node divides the input space arbitrarily, at the expense of increased complexity and a higher risk of overfitting. We propose omnivariate trees, where a decision node may be univariate, linear, or nonlinear depending on the outcome of comparative statistical tests on accuracy, thus automatically matching the complexity of the node to the subproblem defined by the data reaching that node. Such an architecture frees the designer from choosing the appropriate node type, performing model selection automatically at each node. Our simulation results indicate that such a decision tree induction method generalizes better than trees with the same type of node everywhere and induces small trees.
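The per-node choice among univariate, linear, and nonlinear splits can be caricatured as follows. This sketch substitutes a fixed accuracy tolerance for the paper's comparative statistical tests; the candidate names, accuracies, and tolerance are illustrative only:

```python
def choose_node_type(accuracies, tolerance=0.01):
    """Pick the simplest candidate node type whose validation accuracy is
    within `tolerance` of the best one. Candidates are tried simplest first,
    so complexity only increases when it buys a clear accuracy gain."""
    best = max(accuracies.values())
    for name in ("univariate", "linear", "nonlinear"):  # simplest first
        if accuracies[name] >= best - tolerance:
            return name

# When accuracies are close, the simplest node type wins the tie.
print(choose_node_type({"univariate": 0.90, "linear": 0.905, "nonlinear": 0.91}))
```

A real implementation would replace the tolerance with a paired statistical test (e.g. comparing cross-validated accuracies), which is what the abstract describes.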

3.
Classifiability-based omnivariate decision trees (total citations: 1; self-citations: 0; other citations: 1)
Top-down induction of decision trees is a simple and powerful method of pattern classification. In a decision tree, each node partitions the available patterns into two or more sets. New nodes are created to handle each of the resulting partitions and the process continues. A node is considered terminal if it satisfies some stopping criterion (for example, purity, i.e., all patterns at the node are from a single class). Decision trees may be univariate, linear multivariate, or nonlinear multivariate depending on whether a single attribute, a linear function of all the attributes, or a nonlinear function of all the attributes is used for the partitioning at each node of the decision tree. Though nonlinear multivariate decision trees are the most powerful, they are the most susceptible to the risk of overfitting. In this paper, we propose to perform model selection at each decision node to build omnivariate decision trees. The model selection is done using a novel classifiability measure that captures the possible sources of misclassification with relative ease and is able to accurately reflect the complexity of the subproblem at each node. The proposed approach is fast and does not suffer from as high a computational burden as that incurred by typical model selection algorithms. Empirical results over 26 data sets indicate that our approach is faster and achieves better classification accuracy than statistical model selection algorithms.

4.
Basak J. Neural Computation, 2004, 16(9): 1959-1981
Decision trees and neural networks are widely used tools for pattern classification. Decision trees provide a highly localized representation, whereas neural networks provide a distributed but compact representation of the decision space. Decision trees cannot be induced in online mode and are not adaptive to a changing environment, whereas neural networks are inherently capable of online learning and adaptivity. Here we provide a classification scheme called online adaptive decision trees (OADT), which is a tree-structured network like a decision tree yet capable of online learning like a neural network. A new objective measure is derived for supervised learning with OADT. Experimental results validate the effectiveness of the proposed classification scheme. Also, on certain real-life data sets, we find that OADT performs better than two widely used models: the hierarchical mixture of experts and the multilayer perceptron.

5.
We consider one of the central problems of machine learning: the regression restoration problem. A qualitatively new regression decision tree (RDT) is proposed, based on the concept of a full decision tree (FDT). A similar decision tree (DT) construction was earlier tested successfully on the problem of classification by precedents, whose statement is close to the problem considered here. The results of testing the full RDT (FRDT) model on real data are presented.
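A regression tree node typically picks the split that minimizes the residual sum of squares in the two children. The following is a minimal single-feature sketch of that standard criterion, not the FRDT construction itself (which the abstract does not detail):

```python
def best_split(xs, ys):
    """Exhaustively choose the threshold on a single feature that minimizes
    the total squared error around the two resulting leaf means."""
    def sse(vals):
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    pairs = sorted(zip(xs, ys))
    best_t, best_err = None, float("inf")
    for i in range(1, len(pairs)):
        t = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint candidate threshold
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        err = sse(left) + sse(right)
        if err < best_err:
            best_t, best_err = t, err
    return best_t, best_err

# Two flat plateaus: the obvious split at 6.5 gives zero residual error.
print(best_split([1, 2, 3, 10, 11, 12], [1, 1, 1, 5, 5, 5]))
```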

6.
Knowledge, 1999, 12(5-6): 269-275
An algorithm for decision-tree induction is presented in which attribute selection is based on the evidence-gathering strategies used by doctors in sequential diagnosis. Since the attribute selected by the algorithm at a given node is often the best attribute according to Quinlan's information gain criterion, the decision tree it induces is often identical to the ID3 tree when the number of attributes is small. In problem-solving applications of the induced decision tree, an advantage of the approach is that the relevance of a selected attribute or test can be explained in strategic terms. An implementation of the algorithm in an environment providing integrated support for incremental learning, problem solving, and explanation is presented.
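For reference, Quinlan's information gain criterion mentioned above is the entropy of the class labels minus the expected entropy after splitting on an attribute. A small self-contained sketch:

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(attr_values, labels):
    """Quinlan's information gain: entropy reduction obtained by
    partitioning the labels according to one attribute's values."""
    n = len(labels)
    remainder = 0.0
    for v in set(attr_values):
        subset = [lab for a, lab in zip(attr_values, labels) if a == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

# A perfectly predictive attribute yields gain 1.0 on balanced binary labels.
print(info_gain(['a', 'a', 'b', 'b'], ['+', '+', '-', '-']))
```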

7.
This paper presents a new architecture of a fuzzy decision tree based on fuzzy rules, the fuzzy rule-based decision tree (FRDT), and provides a learning algorithm. In contrast with "traditional" axis-parallel decision trees, in which only a single feature (variable) is taken into account at each node, a node of the proposed decision tree involves a fuzzy rule over multiple features. Fuzzy rules are employed to produce leaves of high purity. Using multiple features per node helps minimize the size of the trees. The growth of the FRDT is realized by expanding an additional node composed of a mixture of data coming from different classes, which is the only non-leaf node of each layer. This gives rise to a new geometric structure endowed with linguistic terms, quite different from the "traditional" oblique decision trees endowed with hyperplanes as decision functions. A series of numeric studies is reported using data coming from UCI machine learning data sets. The comparison is carried out with regard to "traditional" decision trees such as C4.5, LADtree, BFTree, SimpleCart, and NBTree. The results of statistical tests have shown that the proposed FRDT exhibits the best performance in terms of both accuracy and the size of the produced trees.

8.
Integrated sensing and processing decision trees (total citations: 2; self-citations: 0; other citations: 2)
We introduce a methodology for adaptive sequential sensing and processing in a classification setting. Our objective for sensor optimization is the back-end performance metric, in this case the misclassification rate. Our methodology, which we dub Integrated Sensing and Processing Decision Trees (ISPDT), optimizes adaptive sequential sensing for scenarios in which sensor and/or throughput constraints dictate that only a small subset of all measurable attributes can be measured at any one time. Our decision trees optimize misclassification rate by invoking a local dimensionality-reduction-based partitioning metric in the early stages, focusing on classification only in the leaves of the tree. We present the ISPDT methodology and illustrative theoretical, simulation, and experimental results.

9.
10.
Efficient incremental induction of decision trees (total citations: 2; self-citations: 0; other citations: 2)
This paper proposes a method to improve ID5R, an incremental TDIDT algorithm. The new method evaluates the quality of the attributes selected at the nodes of a decision tree and estimates a minimum number of steps during which these attributes are guaranteed to remain selected. This reduces overhead during incremental learning. The method is supported by theoretical analysis and experimental results.

11.
12.
A dynamic programming algorithm for constructing optimal dyadic decision trees was recently introduced, analyzed, and shown to be very effective for low-dimensional data sets. This paper enhances and extends this algorithm by: introducing an adaptive grid search for the regularization parameter that guarantees optimal solutions for all relevant tree sizes; replacing the dynamic programming algorithm with a memoized recursive algorithm whose run time is substantially smaller for most regularization parameter values on the grid; and incorporating new data structures and data pre-processing steps that provide significant run-time improvements in practice.

13.
Genetically optimized fuzzy decision trees (total citations: 1; self-citations: 0; other citations: 1)
In this study, we are concerned with genetically optimized fuzzy decision trees (G-DTs). Decision trees are fundamental architectures of machine learning, pattern recognition, and system modeling. Starting with the generic decision tree with discrete or interval-valued attributes, we develop its fuzzy-set-based generalization. In this generalized structure, we admit attribute values represented by membership functions. Such fuzzy decision trees are constructed in the setting of genetic optimization. The underlying genetic algorithm optimizes the parameters of the fuzzy sets associated with the individual nodes, where they play the role of fuzzy "switches" by distributing the flow of processing completed within the tree. We discuss various forms of the fitness function that help capture the essence of the problem at hand (which could be either of a classification nature, when dealing with discrete outputs, or regression-like, when handling a continuous output variable). We quantify the nature of the generalization of the tree by studying the optimally adjusted spreads of the membership functions located at the nodes of the decision tree. A series of experiments exploiting synthetic and machine learning data is used to illustrate the performance of the G-DTs.
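The fuzzy "switch" idea above can be sketched as soft routing: each child of a node receives a share of an instance proportional to its membership value. Gaussian membership functions are an assumption here (the abstract only says the genetic algorithm tunes membership-function parameters such as spreads):

```python
from math import exp

def routing_weights(x, centers, spread):
    """Normalized Gaussian memberships at a node: each child receives a
    share of the processing flow proportional to its membership of x.
    `centers` and `spread` stand in for the parameters a genetic
    algorithm would optimize."""
    ms = [exp(-((x - c) ** 2) / (2 * spread ** 2)) for c in centers]
    total = sum(ms)
    return [m / total for m in ms]

# An input equidistant from both centers is routed half-and-half.
print(routing_weights(0.0, [-1.0, 1.0], 1.0))
```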

14.
Test strategies for cost-sensitive decision trees (total citations: 2; self-citations: 0; other citations: 2)
In medical diagnosis, doctors must often determine what medical tests (e.g., X-ray and blood tests) should be ordered for a patient to minimize the total cost of medical tests and misdiagnosis. In this paper, we design cost-sensitive machine learning algorithms to model this learning and diagnosis process. Medical tests are like attributes in machine learning whose values may be obtained at a cost (attribute cost), and misdiagnoses are like misclassifications which may also incur a cost (misclassification cost). We first propose a lazy decision tree learning algorithm that minimizes the sum of attribute costs and misclassification costs. Then, we design several novel "test strategies" that can request to obtain values of unknown attributes at a cost (similar to doctors' ordering of medical tests at a cost) in order to minimize the total cost for test examples (new patients). These test strategies correspond to different situations in real-world diagnoses. We empirically evaluate these test strategies, and show that they are effective and outperform previous methods. Our results can be readily applied to real-world diagnosis tasks. A case study on heart disease is given throughout the paper.
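The cost trade-off the abstract describes, ordering a test only when its cost plus the expected post-test misclassification cost undercuts acting on the prior alone, can be sketched with hypothetical numbers. This is an illustration of the accounting, not the paper's lazy-tree algorithm or its test strategies:

```python
def expected_misclassification_cost(p_pos, fp_cost, fn_cost):
    """Cost of the cheaper of the two blanket predictions given P(positive)."""
    return min(p_pos * fn_cost,         # always predict negative: miss positives
               (1 - p_pos) * fp_cost)   # always predict positive: false alarms

def worth_ordering(test_cost, p_pos, posteriors, fp_cost, fn_cost):
    """Order a medical test only if its attribute cost plus the expected
    posterior misclassification cost beats deciding from the prior alone.
    `posteriors` maps each test outcome to (P(outcome), P(positive | outcome))."""
    without = expected_misclassification_cost(p_pos, fp_cost, fn_cost)
    with_test = test_cost + sum(
        p_out * expected_misclassification_cost(p, fp_cost, fn_cost)
        for p_out, p in posteriors.values())
    return with_test < without

# A perfectly informative test worth 10 easily beats a 50/50 prior
# when each kind of misdiagnosis costs 100.
print(worth_ordering(10, 0.5, {'pos': (0.5, 1.0), 'neg': (0.5, 0.0)}, 100, 100))
```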

15.
Knowledge and Information Systems - Crowdsourcing systems provide an efficient way to collect labeled data by employing non-expert crowd workers. In practice, each instance obtains a multiple noisy...

16.
Constrained cascade generalization of decision trees (total citations: 1; self-citations: 0; other citations: 1)
While decision tree techniques have been widely used in classification applications, a shortcoming of many decision tree inducers is that they do not learn intermediate concepts, i.e., at each node, only one of the original features is involved in the branching decision. Combining other classification methods, which learn intermediate concepts, with decision tree inducers can produce more flexible decision boundaries that separate different classes, potentially improving classification accuracy. We propose a generic algorithm for cascade generalization of decision tree inducers with the maximum cascading depth as a parameter to constrain the degree of cascading. Cascading methods proposed in the past, i.e., loose coupling and tight coupling, are strictly special cases of this new algorithm. We have empirically evaluated the proposed algorithm using logistic regression and C4.5 as base inducers on 32 UCI data sets and found that neither loose coupling nor tight coupling is always the best cascading strategy and that the maximum cascading depth in the proposed algorithm can be tuned for better classification accuracy. We have also empirically compared the proposed algorithm and ensemble methods such as bagging and boosting and found that the proposed algorithm performs marginally better than bagging and boosting on the average.
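One cascading step appends a base learner's output to the feature vector, so the downstream tree can branch on that intermediate concept. A minimal sketch with a fixed logistic model standing in for a trained base inducer (the weights below are illustrative, not learned):

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def cascade_features(x, weights, bias):
    """One cascading step: append the base learner's class-probability
    estimate (here a fixed logistic model) to the original feature vector.
    A tree induced on the result can then split on this intermediate concept."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return list(x) + [sigmoid(z)]

# With zero weights the appended probability is exactly 0.5.
print(cascade_features([1.0, 2.0], [0.0, 0.0], 0.0))
```

The constrained algorithm of the abstract would repeat this step up to a maximum cascading depth; that control loop is omitted here.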

17.
This article proposes a study of inductive Genetic Programming with Decision Trees (GPDT). The theoretical underpinning is an approach to the development of fitness functions for improving the search guidance. The approach relies on analysis of the global fitness landscape structure with a statistical correlation measure. The basic idea is that the fitness landscape could be made informative enough to enable efficient search navigation. We demonstrate that by a careful design of the fitness function the global landscape becomes smoother, its correlation increases, and facilitates the search. Another claim is that the fitness function has not only to mitigate navigation difficulties, but also to guarantee maintenance of decision trees with low syntactic complexity and high predictive accuracy.
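One common statistical correlation measure for fitness landscapes is fitness-distance correlation, the Pearson correlation between fitness values and distances to a known optimum; whether this is the article's exact measure is an assumption, but the computation is standard:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences,
    e.g. fitness values and distances to the optimum across sampled trees."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

# Fitness rising linearly with distance gives correlation +1.
print(pearson([1, 2, 3], [2, 4, 6]))
```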

18.
Recent work in feature-based classification has focused on nonparametric techniques that can classify instances even when the underlying feature distributions are unknown. The inference algorithms for training these techniques, however, are designed to maximize the accuracy of the classifier, with all errors weighted equally. In many applications, certain errors are far more costly than others, and the need arises for nonparametric classification techniques that can be trained to optimize task-specific cost functions. This correspondence reviews the linear machine decision tree (LMDT) algorithm for inducing multivariate decision trees, and shows how LMDT can be altered to induce decision trees that minimize arbitrary misclassification cost functions (MCFs). Demonstrations of pixel classification in outdoor scenes show how MCFs can optimize the performance of embedded classifiers within the context of larger image understanding systems.
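An arbitrary misclassification cost function of the kind mentioned above can be evaluated over a confusion matrix and a cost matrix; the matrices below are illustrative:

```python
def total_cost(confusion, cost_matrix):
    """Total misclassification cost: confusion[i][j] counts instances of
    true class i predicted as class j; cost_matrix[i][j] prices that error
    (conventionally 0 on the diagonal, and asymmetric when one error type
    is costlier than the other)."""
    return sum(confusion[i][j] * cost_matrix[i][j]
               for i in range(len(confusion))
               for j in range(len(confusion[i])))

# Missing class 1 (cost 10) dominates: 10*1 false alarms + 5*10 misses = 60.
print(total_cost([[90, 10], [5, 95]], [[0, 1], [10, 0]]))
```

Training to minimize such a function, rather than plain error rate, is what distinguishes the cost-sensitive variant of LMDT described in the abstract.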

19.
Intelligent data analysis has gained increasing attention in business and industry environments. Many applications are looking not only for solutions that can automate and de-skill the data analysis process, but also for methods that can deal with vague information and deliver comprehensible models. Under this consideration, we present an automatic data analysis platform; in particular, we investigate fuzzy decision trees as a method of intelligent data analysis for classification problems. We present the whole process, from fuzzy tree learning and missing-value handling to fuzzy rule generation and pruning. To select the test attributes of the fuzzy trees, we use a generalized Shannon entropy. We discuss the problems connected with this generalization arising from fuzzy logic and propose some amendments. We give a theoretical comparison of the fuzzy rules learned by fuzzy decision trees with some other methods, and compare our classifiers to other well-known classification methods based on experimental results. Moreover, we show a real-world application of our approach to the quality control of car surfaces.

20.
Intelligent Data Analysis, 1998, 2(1-4): 303-310
Decision tree induction is a prominent learning method, typically yielding quick results with competitive predictive performance. However, it is not unusual to find other automated learning methods that exceed the predictive performance of a decision tree on the same application. To achieve near-optimal classification results, resampling techniques can be employed to generate multiple decision-tree solutions. These decision trees are applied individually and their predictions combined by voting. The potential for exceptionally strong performance is counterbalanced by the substantial increase in computing time needed to induce many decision trees. We describe estimators of predictive performance for voted decision trees induced from bootstrap (bagged) or adaptive (boosted) resampling. The estimates are found by examining the performance of a single tree and its pruned subtrees over a single training set and a large test set. Using publicly available collections of data, we show that these estimates are usually quite accurate, with occasional weaker estimates. The great advantage of these estimates is that they reveal the predictive potential of voted decision trees before applying expensive computational procedures.
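Bagging in the sense above draws bootstrap replicates of the training set, induces one tree per replicate, and votes the trees' answers. The tree induction itself is omitted here; this sketches only the resampling and voting scaffolding:

```python
import random

def bootstrap(data, rng):
    """One bootstrap replicate: sample len(data) items with replacement."""
    return [rng.choice(data) for _ in data]

def vote(predictions):
    """Majority vote over the individual trees' answers for one instance."""
    return max(set(predictions), key=predictions.count)

# Three hypothetical trees disagree 2-to-1; the ensemble answers 1.
print(vote([1, 1, 0]))

rng = random.Random(0)
print(bootstrap([1, 2, 3, 4], rng))  # same length, items drawn with replacement
```

The estimators the abstract describes would predict the voted ensemble's accuracy from a single tree and its pruned subtrees, avoiding the cost of inducing the whole ensemble first.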


Copyright©北京勤云科技发展有限公司  京ICP备09084417号