Similar Documents
10 similar documents found.
1.
We propose an extension of an entropy-based heuristic for constructing a decision tree from a large database with many numeric attributes. When it comes to handling numeric attributes, conventional methods are inefficient if any numeric attributes are strongly correlated. Our approach offers one solution to this problem. For each pair of strongly correlated numeric attributes, we compute a two-dimensional association rule with respect to these attributes and the objective attribute of the decision tree. In particular, we consider a family ℛ of grid regions in the plane associated with the pair of attributes. For each R ∈ ℛ, the data can be split into two classes: data inside R and data outside R. We compute the region R_opt ∈ ℛ that minimizes the entropy of the splitting, and add the splitting associated with R_opt (for each pair of strongly correlated attributes) to the set of candidate tests in an entropy-based heuristic. We give efficient algorithms for the cases in which ℛ is (1) x-monotone connected regions, (2) based-monotone regions, (3) rectangles, and (4) rectilinear convex regions. The algorithm has been implemented as a subsystem of SONAR (System for Optimized Numeric Association Rules), developed by the authors. We have confirmed that the optimal region can be computed efficiently, and diverse experiments show that our approach can create compact trees whose accuracy is comparable with or better than that of conventional trees. More importantly, we can capture non-linear correlations among numeric attributes that could not be found without our region splitting.
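To make the entropy-minimizing region search concrete, here is a brute-force sketch of case (3), rectangles: two attributes are quantized onto a grid and every axis-aligned rectangle is scored by the weighted class entropy of its inside/outside split. The paper's algorithms are far more efficient than this; all function names and the quantile-based gridding are my own illustrative choices.

```python
import numpy as np
from itertools import combinations

def split_entropy(n_in, pos_in, n_total, pos_total):
    """Weighted average binary-class entropy of the inside/outside split."""
    def ent(n, p):
        if n == 0 or p == 0 or p == n:
            return 0.0
        q = p / n
        return -(q * np.log2(q) + (1 - q) * np.log2(1 - q))
    n_out, pos_out = n_total - n_in, pos_total - pos_in
    return (n_in * ent(n_in, pos_in) + n_out * ent(n_out, pos_out)) / n_total

def best_rectangle(x, y, labels, n_bins=10):
    """Exhaustive search over axis-aligned rectangles on an n_bins grid.

    x, y: numeric attribute arrays; labels: 0/1 numpy array (the objective
    attribute). Returns (entropy, (x_lo, x_hi, y_lo, y_hi)).
    """
    xe = np.quantile(x, np.linspace(0, 1, n_bins + 1))  # grid edges
    ye = np.quantile(y, np.linspace(0, 1, n_bins + 1))
    n_total, pos_total = len(labels), int(labels.sum())
    best = (float("inf"), None)
    for x0, x1 in combinations(range(n_bins + 1), 2):
        for y0, y1 in combinations(range(n_bins + 1), 2):
            inside = ((x >= xe[x0]) & (x <= xe[x1]) &
                      (y >= ye[y0]) & (y <= ye[y1]))
            e = split_entropy(int(inside.sum()), int(labels[inside].sum()),
                              n_total, pos_total)
            if e < best[0]:
                best = (e, (xe[x0], xe[x1], ye[y0], ye[y1]))
    return best
```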

2.
FRCT: fuzzy-rough classification trees
Using fuzzy-rough hybrids, we have proposed a measure to quantify the functional dependency of decision attribute(s) on condition attribute(s) within fuzzy data. We have shown that the proposed measure of dependency degree is a generalization of the measure proposed by Pawlak for crisp data. In this paper, this new measure of dependency degree is encapsulated into the decision tree generation mechanism to produce fuzzy-rough classification trees (FRCT): efficient, top-down, multi-class decision tree structures geared to solving classification problems from feature-based learning examples. The FRCT generation algorithm has been applied to 16 real-world benchmark datasets and experimentally compared with the five fuzzy decision tree generation algorithms reported so far, as well as the rough decomposition tree algorithm, in terms of number of rules, average training time, and classification accuracy. Experimental results show that the proposed FRCT algorithm outperforms the existing fuzzy decision tree generation techniques and the rough decomposition tree induction algorithm.
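As background for the measure this paper generalizes: Pawlak's crisp dependency degree γ(C, D) is the fraction of records whose condition-attribute equivalence class determines the decision uniquely. A minimal sketch of that crisp baseline follows (the fuzzy-rough generalization proposed in the paper is not reproduced here):

```python
from collections import defaultdict

def pawlak_dependency(records, cond_attrs, dec_attrs):
    """Pawlak's crisp dependency degree gamma(C, D) = |POS_C(D)| / |U|.

    records: list of dicts mapping attribute name -> value.
    An equivalence class of the condition attributes lies in the positive
    region iff all of its records agree on the decision attributes.
    """
    classes = defaultdict(list)
    for r in records:
        classes[tuple(r[a] for a in cond_attrs)].append(r)
    pos = sum(len(group) for group in classes.values()
              if len({tuple(r[a] for a in dec_attrs) for r in group}) == 1)
    return pos / len(records)

# Example: the decision depends fully on "outlook" here, so gamma = 1.0.
data = [{"outlook": "sunny", "play": "no"},
        {"outlook": "rain", "play": "yes"},
        {"outlook": "sunny", "play": "no"}]
print(pawlak_dependency(data, ["outlook"], ["play"]))
```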

Dr. Rajen Bhatt   obtained his B.E. and M.E., both in Control and Instrumentation, from S.S. Engineering College, Bhavnagar, and Delhi College of Engineering, New Delhi, in 1999 and 2002, respectively. He obtained his Ph.D. from the Department of Electrical Engineering, Indian Institute of Technology Delhi, India, in 2006. He was actively engaged in the development of a multimedia course on Control Engineering under the National Programme on Technology Enhanced Learning (NPTEL). He is a regular reviewer for international journals such as Pattern Recognition, Information Sciences, Pattern Analysis and Applications, and IEEE Transactions on Systems, Man, and Cybernetics. Since June 2005 he has been working with the Imaging team of Samsung India Software Centre as a Lead Engineer, and he also serves as a member of the Patent Review Committee at Samsung. He has published several research papers in reputed journals and conferences. His current research interests are pattern classification and regression, soft computing, data mining, patents and trademarks, and information technology for education. He also has expertise in industry-standard software project management. Dr. M. Gopal   obtained his B.Tech. (Electrical), M.Tech. (Control Systems), and Ph.D. (Control Systems) degrees, all from the Birla Institute of Technology and Science, Pilani, in 1968, 1970, and 1976, respectively. He has been in teaching and research for the last three and a half decades, associated with NIT Jaipur, BITS Pilani, IIT Bombay, City University London, University Technology Malaysia, and IIT Delhi. Since January 1986 he has been a Professor in the Electrical Engineering Department, Indian Institute of Technology Delhi. He has published six books in the area of control engineering, and a video course on Control Engineering including complete presentations and student questionnaires. He has also published an interactive, web-compatible multimedia course on Control Engineering under the National Programme on Technology Enhanced Learning (NPTEL). He has published several research papers in refereed journals and conferences. His current research interests include machine learning, soft computing technologies, intelligent control, and e-learning.

3.
Logistic Model Trees
Tree induction methods and linear models are popular techniques for supervised learning tasks, both for the prediction of nominal classes and numeric values. For predicting numeric quantities, there has been work on combining these two schemes into model trees, i.e., trees that contain linear regression functions at the leaves. In this paper, we present an algorithm that adapts this idea for classification problems, using logistic regression instead of linear regression. We use a stagewise fitting process to construct the logistic regression models that can select relevant attributes in the data in a natural way, and show how this approach can be used to build the logistic regression models at the leaves by incrementally refining those constructed at higher levels in the tree. We compare the performance of our algorithm to several other state-of-the-art learning schemes on 36 benchmark UCI datasets, and show that it produces accurate and compact classifiers. Editor: Johannes Fürnkranz. This is an extended version of a paper that appeared in the Proceedings of the 14th European Conference on Machine Learning (Landwehr et al., 2003).
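A minimal two-class sketch of the stagewise fitting idea, assuming labels in {0, 1}: each boosting-style iteration fits a single-attribute least-squares regression to a working response, which is how relevant attributes get selected naturally. This illustrates the stagewise process only; the authors' algorithm additionally refines the models down the tree, which is not shown here.

```python
import numpy as np

def stagewise_logistic(X, y, n_iter=10):
    """Stagewise two-class logistic fitting (LogitBoost-style) with
    'simple' one-attribute linear base learners. y must be in {0, 1}.
    Returns the fitted stages as (attribute, intercept, slope) triples."""
    n, d = X.shape
    F = np.zeros(n)                       # additive model (half log-odds)
    stages = []
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-2.0 * F))
        w = np.clip(p * (1 - p), 1e-8, None)   # working weights
        z = (y - p) / w                        # working response
        best = None
        for j in range(d):                # attribute selection by best fit
            xj, sw = X[:, j], w.sum()
            xm, zm = (w * xj).sum() / sw, (w * z).sum() / sw
            denom = (w * (xj - xm) ** 2).sum() + 1e-12
            b = (w * (xj - xm) * (z - zm)).sum() / denom
            a = zm - b * xm
            sse = (w * (z - (a + b * xj)) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, a, b)
        _, j, a, b = best
        F += 0.5 * (a + b * X[:, j])      # standard LogitBoost update
        stages.append((j, a, b))
    return stages
```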

4.
The application of the CD3 decision tree induction algorithm to telecommunications customer call data to obtain classification rules is described. CD3 is robust against drift in the underlying rules over time (concept drift): it both detects drift and protects the induction process from its effects. Specifically, the task is to mine customer details and call records to determine whether the profile of customers registering for a friends-and-family service is changing over time, and to maintain a rule set profiling such customers. CD3 and the rationale behind it are described, and experimental results on customer data are presented.
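The abstract does not spell out CD3's drift-handling mechanism, so the sketch below shows only a generic error-monitoring drift check in the same spirit: flag drift when the error rate of the current rule set on recent data rises significantly above its historical rate. This is an illustrative stand-in, not CD3 itself.

```python
import math

def drift_flag(errors_old, errors_recent, threshold=2.0):
    """Flag concept drift when the recent error rate exceeds the
    historical rate by more than `threshold` standard errors
    (a two-proportion z-style test).

    errors_old, errors_recent: sequences of 0/1 per-prediction mistakes.
    """
    n_o, n_r = len(errors_old), len(errors_recent)
    p_o = sum(errors_old) / n_o
    p_r = sum(errors_recent) / n_r
    se = math.sqrt(p_o * (1 - p_o) / n_o + p_r * (1 - p_r) / n_r)
    return (p_r - p_o) / max(se, 1e-12) > threshold
```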

5.
Predicting Nearly As Well As the Best Pruning of a Decision Tree
Many algorithms for inferring a decision tree from data involve a two-phase process: First, a very large decision tree is grown which typically ends up over-fitting the data. To reduce over-fitting, in the second phase, the tree is pruned using one of a number of available methods. The final tree is then output and used for classification on test data. In this paper, we suggest an alternative approach to the pruning phase. Using a given unpruned decision tree, we present a new method of making predictions on test data, and we prove that our algorithm's performance will not be much worse (in a precise technical sense) than the predictions made by the best reasonably small pruning of the given decision tree. Thus, our procedure is guaranteed to be competitive (in terms of the quality of its predictions) with any pruning algorithm. We prove that our procedure is very efficient and highly robust. Our method can be viewed as a synthesis of two previously studied techniques. First, we apply Cesa-Bianchi et al.'s (1993) results on predicting using expert advice (where we view each pruning as an expert) to obtain an algorithm that has provably low prediction loss, but that is computationally infeasible. Next, we generalize and apply a method developed by Buntine (1990, 1992) and Willems, Shtarkov and Tjalkens (1993, 1995) to derive a very efficient implementation of this procedure.
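The expert-advice core of this construction can be sketched generically: treat each pruning as an expert and decay each expert's weight exponentially with its loss. The paper's actual contribution is doing this efficiently over the exponentially many prunings of a tree (via the Buntine / Willems-Shtarkov-Tjalkens recursions), which this naive sketch does not attempt.

```python
import math

def hedge_update(weights, expert_preds, truth, eta=0.5):
    """Multiplicative-weights update: each expert's weight decays
    exponentially with its 0/1 loss on the latest example."""
    return [w * math.exp(-eta * (p != truth))
            for w, p in zip(weights, expert_preds)]

def hedge_predict(weights, expert_preds):
    """Weighted vote over the experts' {0, 1} predictions."""
    pos = sum(w for w, p in zip(weights, expert_preds) if p == 1)
    return 1 if pos >= sum(weights) / 2 else 0
```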

6.
Trading Accuracy for Simplicity in Decision Trees
Bohanec, Marko; Bratko, Ivan. Machine Learning, 1994, 15(3): 223-250.
When communicating concepts, it is often convenient or even necessary to define a concept approximately. A simple, although only approximately accurate concept definition may be more useful than a completely accurate definition which involves a lot of detail. This paper addresses the problem: given a completely accurate, but complex, definition of a concept, simplify the definition, possibly at the expense of accuracy, so that the simplified definition still corresponds to the concept sufficiently well. Concepts are represented by decision trees, and the method of simplification is tree pruning. Given a decision tree that accurately specifies a concept, the problem is to find a smallest pruned tree that still represents the concept within some specified accuracy. A pruning algorithm is presented that finds an optimal solution by generating a dense sequence of pruned trees, decreasing in size, such that each tree has the highest accuracy among all the possible pruned trees of the same size. An efficient implementation of the algorithm, based on dynamic programming, is presented and empirically compared with three progressive pruning algorithms using both artificial and real-world data. An interesting empirical finding is that the real-world data generally allow significantly greater simplification at equal loss of accuracy.
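The dynamic program behind such an optimal pruning sequence can be sketched as follows: for every subtree, compute a table mapping pruned-tree size (in leaves) to the maximum number of correctly classified training examples, merging children's tables knapsack-style. This is a hypothetical reconstruction consistent with the abstract, not the authors' code; the node layout is my own.

```python
def prune_tables(node):
    """For each subtree, a table: number of leaves -> maximum count of
    correctly classified training examples over all prunings of that size.

    node: dict with 'majority' (count of majority-class examples reaching
    the node) and an optional 'children' list of child nodes.
    """
    table = {1: node["majority"]}            # prune here: a single leaf
    if node.get("children"):
        combined = {0: 0}
        for child in node["children"]:
            child_table = prune_tables(child)
            merged = {}
            for k1, c1 in combined.items():  # knapsack-style merge
                for k2, c2 in child_table.items():
                    k = k1 + k2
                    if merged.get(k, -1) < c1 + c2:
                        merged[k] = c1 + c2
            combined = merged
        for k, c in combined.items():
            if table.get(k, -1) < c:
                table[k] = c
    return table

# Tiny example: keeping both children (2 leaves) beats pruning to 1 leaf.
tree = {"majority": 40, "children": [{"majority": 30}, {"majority": 25}]}
print(prune_tables(tree))   # {1: 40, 2: 55}
```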

7.
A Further Comparison of Splitting Rules for Decision-Tree Induction
One approach to learning classification rules from examples is to build decision trees. A review and comparison paper by Mingers (1989) looked at the first stage of tree building, which uses a splitting rule to grow trees with a greedy recursive partitioning algorithm. That paper considered a number of different measures and experimentally examined their behavior on four domains. The main conclusion was that a random splitting rule does not significantly decrease classification accuracy. This note suggests an alternative experimental method and presents additional results on further domains. Our results indicate that random splitting leads to increased error. These results are at variance with those presented by Mingers.
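For concreteness, here is a minimal sketch of the two splitting rules being contrasted: a greedy information-gain rule and a random rule that picks an attribute uniformly. The function names and data layout (rows of categorical attribute values) are my own assumptions.

```python
import math
import random
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Information gain of splitting on categorical attribute index `attr`."""
    n, base = len(labels), entropy(labels)
    by_value = {}
    for row, lab in zip(rows, labels):
        by_value.setdefault(row[attr], []).append(lab)
    return base - sum(len(part) / n * entropy(part)
                      for part in by_value.values())

def choose_attr(rows, labels, rule="gain"):
    """Pick the splitting attribute greedily by gain, or at random."""
    attrs = list(range(len(rows[0])))
    if rule == "random":                 # the random baseline under test
        return random.choice(attrs)
    return max(attrs, key=lambda a: info_gain(rows, labels, a))
```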

8.
Technical Note: Naive Bayes for Regression
Frank, Eibe; Trigg, Leonard; Holmes, Geoffrey; Witten, Ian H. Machine Learning, 2000, 41(1): 5-25.
Despite its simplicity, the naive Bayes learning scheme performs well on most classification tasks, and is often significantly more accurate than more sophisticated methods. Although the probability estimates that it produces can be inaccurate, it often assigns maximum probability to the correct class. This suggests that its good performance might be restricted to situations where the output is categorical. It is therefore interesting to see how it performs in domains where the predicted value is numeric, because in this case, predictions are more sensitive to inaccurate probability estimates. This paper shows how to apply the naive Bayes methodology to numeric prediction (i.e., regression) tasks by modeling the probability distribution of the target value with kernel density estimators, and compares it to linear regression, locally weighted linear regression, and a method that produces model trees (decision trees with linear regression functions at the leaves). Although we exhibit an artificial dataset for which naive Bayes is the method of choice, on real-world datasets it is almost uniformly worse than locally weighted linear regression and model trees. The comparison with linear regression depends on the error measure: for one measure naive Bayes performs similarly, while for another it is worse. We also show that standard naive Bayes applied to regression problems by discretizing the target value performs similarly badly. We then present empirical evidence that isolates naive Bayes' independence assumption as the culprit for its poor performance in the regression setting. These results indicate that the simplistic statistical assumption that naive Bayes makes is indeed more restrictive for regression than for classification.
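A minimal sketch of the kernel-density naive Bayes regression idea, assuming Gaussian kernels, one shared bandwidth h, and a grid search over candidate target values; the paper's estimator details (bandwidth selection, error measures) are not reproduced, and the function name is my own.

```python
import numpy as np

def nb_kde_predict(X_train, y_train, x_test, h=0.2, grid_size=200):
    """Naive-Bayes regression with kernel density estimates.

    Approximates p(y | x) up to a constant as p(y) * prod_i p(x_i | y),
    where p(y) and p(x_i | y) are Gaussian kernel estimates built from
    the training data; the prediction is the posterior mean over a grid
    of candidate target values.
    """
    ys = np.linspace(y_train.min(), y_train.max(), grid_size)

    def gauss(u):
        return np.exp(-0.5 * (u / h) ** 2)

    # Kernel weight of each training example at each candidate y: (grid, n)
    ky = gauss(ys[:, None] - y_train[None, :])
    p_y = ky.sum(axis=1)                       # unnormalized KDE of p(y)

    log_post = np.log(p_y + 1e-300)
    for i in range(X_train.shape[1]):
        kx = gauss(x_test[i] - X_train[:, i])  # (n,)
        # p(x_i | y) ~ sum_j K(x_i - x_ji) K(y - y_j) / sum_j K(y - y_j)
        p_xi_given_y = (ky * kx[None, :]).sum(axis=1) / (p_y + 1e-300)
        log_post += np.log(p_xi_given_y + 1e-300)

    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    return float((ys * post).sum())            # posterior mean

# Toy check on y = x1^2 + noise; prediction should land near 0.5^2 = 0.25.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (300, 2))
y = X[:, 0] ** 2 + 0.05 * rng.normal(size=300)
print(nb_kde_predict(X, y, np.array([0.5, 0.0])))
```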

9.
An Empirical Comparison of Selection Measures for Decision-Tree Induction
Mingers, John. Machine Learning, 1989, 3(4): 319-342.
One approach to induction is to develop a decision tree from a set of examples. When used with noisy rather than deterministic data, the method involves three main stages: creating a complete tree able to classify all the examples, pruning this tree to give statistical reliability, and processing the pruned tree to improve understandability. This paper is concerned with the first stage, tree creation, which relies on a measure of the goodness of a split, that is, how well the attributes discriminate between classes. Problems encountered at this stage include missing data and multi-valued attributes. The paper considers a number of different measures and experimentally examines their behavior in four domains. The results show that the choice of measure affects the size of a tree but not its accuracy, which remains the same even when attributes are selected randomly.
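To make "measures for goodness of split" concrete, the sketch below evaluates a single candidate split under three common selection measures (information gain, gain ratio, Gini reduction). This is illustrative only; it does not reproduce the exact set of measures Mingers compared.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_measures(parent, parts):
    """Score one candidate split under several selection measures.

    parent: list of class labels before the split;
    parts: list of label lists, one per branch after the split.
    """
    n = len(parent)

    def weighted(f):
        return sum(len(p) / n * f(p) for p in parts)

    gain = entropy(parent) - weighted(entropy)
    split_info = -sum(len(p) / n * math.log2(len(p) / n)
                      for p in parts if p)
    return {
        "information gain": gain,
        "gain ratio": gain / split_info if split_info else 0.0,
        "gini reduction": gini(parent) - weighted(gini),
    }

print(split_measures(["a"] * 5 + ["b"] * 5, [["a"] * 4 + ["b"], ["b"] * 4 + ["a"]]))
```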

10.
Learning decision tree for ranking
The decision tree is one of the most effective and widely used methods for classification. However, many real-world applications require instances to be ranked by the probability of class membership. The area under the receiver operating characteristic curve (AUC) has recently been used as a measure of the ranking performance of learning algorithms. In this paper, we present two novel class probability estimation algorithms to improve the ranking performance of decision trees. Instead of estimating the probability of class membership using simple voting at the leaf into which the test instance falls, our algorithms use similarity-weighted voting and naive Bayes. We design empirical experiments to verify that our new algorithms significantly outperform the recent decision tree ranking algorithm C4.4 in terms of AUC.
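A minimal sketch of the similarity-weighted voting idea at a leaf, using a simple attribute-overlap similarity and Laplace-style smoothing; the paper's exact weighting scheme and its naive Bayes variant are not reproduced, and the function name is my own.

```python
from collections import Counter

def leaf_probability(leaf_instances, leaf_labels, test_instance, classes):
    """Class probability estimates at a leaf via similarity-weighted voting.

    Instead of raw vote counting, each training instance in the leaf votes
    with weight equal to the fraction of attribute values it shares with
    the test instance. Laplace-style smoothing keeps the estimates away
    from 0 and 1, which matters for AUC-based ranking.
    """
    votes = Counter()
    for inst, lab in zip(leaf_instances, leaf_labels):
        sim = sum(a == b for a, b in zip(inst, test_instance)) / len(inst)
        votes[lab] += sim
    total = sum(votes.values())
    k = len(classes)
    return {c: (votes[c] + 1) / (total + k) for c in classes}

# Example: two attributes, a leaf holding three instances.
print(leaf_probability([("x", 1), ("x", 2), ("y", 1)],
                       ["pos", "pos", "neg"],
                       ("x", 1), ["pos", "neg"]))
```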

