Similar Documents
10 similar documents found
1.
Inducing oblique decision trees with evolutionary algorithms (cited: 2; self-citations: 0; other citations: 2)
This paper illustrates the application of evolutionary algorithms (EAs) to the problem of oblique decision-tree (DT) induction. The objectives are to demonstrate that EAs can find classifiers whose accuracy is competitive with other oblique-tree construction methods, and that, at least in some cases, this can be accomplished in a shorter time. We performed experiments with a (1+1) evolution strategy and a simple genetic algorithm on public-domain and artificial data sets, and compared the results with three other oblique DT algorithms and one axis-parallel DT algorithm. The empirical results suggest that the EAs quickly find competitive classifiers, and that EAs scale up better than traditional methods with the dimensionality of the domain and the number of training instances. In addition, we show that classification accuracy improves when the trees obtained with the EAs are combined in ensembles, and that sometimes it is possible to build an ensemble of evolutionary trees in less time than a single traditional oblique tree.
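As a rough illustration of the underlying idea (not the authors' implementation), the sketch below uses a (1+1) evolution strategy to search for a single oblique split w·x <= b that minimizes weighted Gini impurity. The fitness function, mutation scale, and iteration budget are assumptions chosen for brevity; a full induction algorithm would apply this recursively to grow a tree.

```python
# Minimal sketch: a (1+1) evolution strategy evolving one oblique (hyperplane) split.
import numpy as np

def gini(labels):
    """Gini impurity of a label array."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_impurity(X, y, w, b):
    """Weighted Gini impurity of the two children induced by the hyperplane w·x = b."""
    left = X @ w <= b
    n = len(y)
    return (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / n

def evolve_oblique_split(X, y, generations=500, sigma=0.1, seed=None):
    """(1+1) ES: mutate the hyperplane with Gaussian noise, keep the child if it improves."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    w, b = rng.normal(size=d), 0.0
    best = split_impurity(X, y, w, b)
    for _ in range(generations):
        w_new = w + rng.normal(scale=sigma, size=d)
        b_new = b + rng.normal(scale=sigma)
        f = split_impurity(X, y, w_new, b_new)
        if f <= best:  # accept an equal or better offspring
            w, b, best = w_new, b_new, f
    return w, b, best
```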

2.
Hybrid Bayesian estimation tree learning with discrete and fuzzy labels (cited: 1; self-citations: 1; other citations: 0)
The decision tree (DT) is one of the classical machine learning models, valued for its simplicity and effectiveness in applications. Compared to the DT model, however, probability estimation trees (PETs) give a better estimate of class probabilities. To obtain good probability estimates we usually need large trees, which are not desirable with respect to model transparency. The linguistic decision tree (LDT) is a PET model based on label semantics: fuzzy labels are used for building the tree, and each branch is associated with a probability distribution over classes. If there is no overlap between neighboring fuzzy labels, these fuzzy labels become discrete labels, and an LDT with discrete labels becomes a special case of the PET model. In this paper, two hybrid models combining the naive Bayes classifier and PETs are proposed in order to build a model with good performance without losing too much transparency. The first model uses naive Bayes estimation given a PET, and the second uses a set of small-sized PETs as estimators by assuming independence between these trees. Empirical studies on discrete and fuzzy labels show that the first model outperforms the PET model at shallow depths, and that the second model performs equivalently to the naive Bayes classifier and the PET model.
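The abstract does not spell out the construction, but one hedged way to picture the first hybrid (naive Bayes estimation given a tree) is to route each instance to a leaf of a shallow tree and let a naive Bayes model fitted on that leaf's training data supply the class probabilities. Everything below (the sklearn estimators, depth limit, and class handling) is an illustrative assumption, not the paper's label-semantics formulation.

```python
# Sketch: shallow tree for partitioning, naive Bayes per leaf for probability estimation.
# X and y are assumed to be NumPy arrays.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

class TreeNBHybrid:
    def __init__(self, max_depth=2):
        self.tree = DecisionTreeClassifier(max_depth=max_depth)
        self.leaf_models = {}

    def fit(self, X, y):
        self.tree.fit(X, y)
        self.classes_ = self.tree.classes_
        leaves = self.tree.apply(X)
        for leaf in np.unique(leaves):
            mask = leaves == leaf
            self.leaf_models[leaf] = GaussianNB().fit(X[mask], y[mask])
        return self

    def predict_proba(self, X):
        leaves = self.tree.apply(X)
        proba = np.zeros((len(X), len(self.classes_)))
        for i, leaf in enumerate(leaves):
            nb = self.leaf_models[leaf]
            p = nb.predict_proba(X[i:i + 1])[0]
            # a leaf's naive Bayes model may have seen only a subset of the classes
            for cls, pj in zip(nb.classes_, p):
                proba[i, np.searchsorted(self.classes_, cls)] = pj
        return proba
```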

3.
A new family of algorithms, called Cline, that provides a number of methods to construct and use multivariate decision trees is presented. We report experimental results for two types of data: synthetic data, used to visualize the behavior of the algorithms, and eight publicly available data sets. The new methods have been tested against 23 other decision-tree construction algorithms on benchmark data sets. The empirical results indicate that our approach achieves better classification accuracy than the other algorithms.

4.
Scour below spillways can endanger the stability of dams, so determining the scour depth downstream of spillways is of vital importance. Recently, soft-computing models and, in particular, artificial neural networks (ANNs) have been used for scour depth prediction. However, ANNs are not as comprehensible and easy to use as empirical formulas for estimating scour depth. Therefore, in this study, two decision-tree methods, based on model trees and on classification and regression trees, were employed for predicting the scour depth downstream of free overfall spillways. The advantage of model trees and classification and regression trees over ANNs is that these models are able to provide practical prediction equations. A comparison is made between the results obtained in the present study and those obtained using empirical formulas. The statistical measures indicate that the proposed soft-computing approaches outperform the empirical formulas, and the results indicate that model trees were more accurate than classification and regression trees for estimating scour depth.
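A minimal sketch of the contrast the study draws, using synthetic stand-in data rather than the actual scour measurements: a CART-style regression tree predicts a constant per leaf, while a model-tree-style variant replaces each leaf constant with a linear prediction equation. All variable names and the data-generating function are illustrative assumptions.

```python
# Sketch: CART regression tree vs. a simple "model tree"-style variant (linear model per leaf).
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 3))            # stand-ins for hydraulic inputs (e.g. head, discharge)
y = 2.0 * X[:, 0] + np.sin(4 * X[:, 1]) + 0.1 * rng.normal(size=300)

cart = DecisionTreeRegressor(max_depth=3, min_samples_leaf=10).fit(X, y)

# "Model tree": reuse the CART partition but fit a linear equation inside each leaf.
leaves = cart.apply(X)
leaf_lm = {leaf: LinearRegression().fit(X[leaves == leaf], y[leaves == leaf])
           for leaf in np.unique(leaves)}
y_mt = np.array([leaf_lm[leaf].predict(x.reshape(1, -1))[0]
                 for x, leaf in zip(X, leaves)])

print("CART RMSE      :", mean_squared_error(y, cart.predict(X)) ** 0.5)
print("Model-tree RMSE:", mean_squared_error(y, y_mt) ** 0.5)
```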

5.
Active Sampling for Class Probability Estimation and Ranking (cited: 1; self-citations: 0; other citations: 1)
In many cost-sensitive environments, class probability estimates are used by decision makers to evaluate the expected utility of a set of alternatives. Supervised learning can be used to build class probability estimates; however, it is often very costly to obtain training data with class labels. Active learning acquires data incrementally, identifying at each phase especially useful additional data for labeling, and can be used to economize on the examples needed for learning. We outline the critical features of an active learner and present a sampling-based active learning method for estimating class probabilities and class-based rankings. BOOTSTRAP-LV identifies particularly informative new data for learning based on the variance in probability estimates, and uses weighted sampling to account for a potential example's informative value for the rest of the input space. We show empirically that the method reduces the number of data items that must be obtained and labeled, across a wide variety of domains. We investigate the contribution of the components of the algorithm and show that each provides valuable information to help identify informative examples. We also compare BOOTSTRAP-LV with UNCERTAINTY SAMPLING, an existing active learning method designed to maximize classification accuracy. The results show that BOOTSTRAP-LV uses fewer examples to reach a given estimation accuracy, and they provide insights into the behavior of the algorithms. Finally, we experiment with another new active sampling algorithm that draws from both UNCERTAINTY SAMPLING and BOOTSTRAP-LV, and show that it is significantly more competitive with BOOTSTRAP-LV than UNCERTAINTY SAMPLING is. The analysis suggests more general implications for improving existing active sampling algorithms for classification.
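A hedged sketch of the variance-based scoring idea described here: score each unlabeled example by the variance of its class-probability estimates across models trained on bootstrap resamples of the labeled data, then draw the next examples to label with probability proportional to that score. The base learner, the weighting scheme, and the binary-label assumption below are simplifications for illustration, not the published BOOTSTRAP-LV algorithm.

```python
# Sketch: bootstrap-variance scoring and weighted selection for active sampling.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bootstrap_variance_scores(X_lab, y_lab, X_pool, n_boot=20, seed=None):
    """Variance of P(class=1) estimates over bootstrap-trained trees (labels assumed {0, 1})."""
    rng = np.random.default_rng(seed)
    n = len(y_lab)
    probs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)                      # bootstrap resample
        clf = DecisionTreeClassifier(min_samples_leaf=5).fit(X_lab[idx], y_lab[idx])
        p = clf.predict_proba(X_pool)
        col = np.where(clf.classes_ == 1)[0]                  # resample may miss class 1
        probs.append(p[:, col[0]] if len(col) else np.zeros(len(X_pool)))
    return np.var(np.stack(probs), axis=0)

def select_for_labeling(scores, k, seed=None):
    """Weighted sampling without replacement, proportional to the variance scores."""
    rng = np.random.default_rng(seed)
    total = scores.sum()
    weights = scores / total if total > 0 else np.full(len(scores), 1.0 / len(scores))
    return rng.choice(len(scores), size=k, replace=False, p=weights)
```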

6.
Classification techniques for metric-based software development (cited: 1; self-citations: 0; other citations: 1)
Managing software development and maintenance projects requires predictions about which components of the software system are likely to have a high error rate or to need high development effort. The value of any classification is determined by the accuracy and cost of such predictions. The paper investigates whether fuzzy classification applied to criticality prediction provides better results than other classification techniques that have been introduced in this area. Five techniques for identifying error-prone software components are compared, namely Pareto classification, crisp classification trees, factor-based discriminant analysis, neural networks, and fuzzy classification. The comparison is illustrated with experimental results from the development of industrial real-time projects. A module quality model, defined with respect to changes, provides both quality of fit (according to past data) and predictive accuracy (according to ongoing projects). Fuzzy classification showed the best results in terms of overall predictive accuracy.

7.
Accurate estimation of software project effort is crucial for successful management and control of a software project. Recently, multiple additive regression trees (MART) has been proposed as a novel advance in data mining that extends and improves the classification and regression trees (CART) model using stochastic gradient boosting. This paper empirically evaluates the potential of MART as a novel software effort estimation model when compared with recently published models, in terms of accuracy. The comparison is based on a well-known and respected NASA software project dataset. The results indicate that improved estimation accuracy of software project effort has been achieved using MART when compared with linear regression, radial basis function neural networks, and support vector regression models.
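MART is, in modern terms, stochastic gradient boosting of regression trees, so a generic gradient-boosting regressor conveys the technique. The data below are synthetic stand-ins, not the NASA dataset, and the error measure is illustrative only; feature names and hyperparameters are assumptions.

```python
# Sketch: gradient-boosted regression trees (the technique behind MART) vs. linear regression.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(1)
X = rng.uniform(1, 100, size=(200, 2))          # stand-ins for size (KLOC) and a methodology score
effort = 3.0 * X[:, 0] ** 1.05 - 0.5 * X[:, 1] + rng.normal(scale=10, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X, effort, test_size=0.3, random_state=0)

mart = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05,
                                 subsample=0.7)  # subsample < 1 makes the boosting stochastic
lin = LinearRegression()

for name, model in [("boosted trees", mart), ("linear regression", lin)]:
    model.fit(X_tr, y_tr)
    print(name, "MAE:", round(mean_absolute_error(y_te, model.predict(X_te)), 1))
```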

8.
A compact and accurate model for classification (cited: 6; self-citations: 0; other citations: 0)
We describe and evaluate an information-theoretic algorithm for data-driven induction of classification models based on a minimal subset of the available features. The relationship between the input (predictive) features and the target (classification) attribute is modeled by a tree-like structure termed an information network (IN). Unlike other decision-tree models, the information network uses the same input attribute across all nodes of a given layer (level). The input attributes are selected incrementally by the algorithm to maximize the global decrease in the conditional entropy of the target attribute. We use a pre-pruning approach: when no attribute causes a statistically significant decrease in the entropy, the network construction is stopped. The algorithm is shown empirically to produce much more compact models than other methods of decision-tree learning while preserving nearly the same level of classification accuracy.
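A high-level sketch of the selection rule described here: choose the attribute that gives the largest decrease in the conditional entropy of the target (equivalently, the highest mutual information) and pre-prune when a likelihood-ratio test finds no statistically significant decrease. The layered information-network structure itself is not reproduced; the function names and significance threshold are assumptions.

```python
# Sketch: entropy-decrease attribute selection with a significance-based stopping test.
import numpy as np
from scipy.stats import chi2_contingency

def mutual_information(x, y):
    """I(X;Y) in bits between two discrete 1-D arrays."""
    xv, yv = np.unique(x), np.unique(y)
    table = np.array([[np.sum((x == a) & (y == b)) for b in yv] for a in xv], dtype=float)
    p = table / table.sum()
    px, py = p.sum(axis=1, keepdims=True), p.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(p > 0, p * np.log2(p / (px * py)), 0.0)
    return terms.sum()

def select_attribute(X_cols, y, alpha=0.05):
    """Index of the attribute with the largest significant entropy decrease, or None (pre-prune)."""
    best, best_mi = None, 0.0
    for j, col in enumerate(X_cols):
        table = np.array([[np.sum((col == a) & (y == b)) for b in np.unique(y)]
                          for a in np.unique(col)])
        pval = chi2_contingency(table, lambda_="log-likelihood")[1]  # likelihood-ratio test
        mi = mutual_information(col, y)
        if pval < alpha and mi > best_mi:
            best, best_mi = j, mi
    return best
```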

9.
Decision tree algorithms based on rough sets are prone to ineffective feature selection because of granulation conflicts and noise. This paper proposes an attribute purity measure and combines it with attribute dependency to construct a decision tree induction algorithm. A statistical ensemble strategy is used to establish attribute purity, which expresses how well the decision classification is discerned with respect to the conditional classification, and the measure is used for the corresponding attribute (feature) selection. The homogeneity and heterogeneity of attribute purity and attribute dependency are analyzed, and the rough-set-based decision tree algorithm is improved by selecting nodes first by attribute dependency and then by attribute purity. Analysis of a decision-table example and comparative experiments on data both demonstrate the effectiveness of the proposed algorithm and its improvement over existing methods.
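For context, the sketch below computes the standard rough-set dependency degree gamma_C(D) = |POS_C(D)| / |U|, which the proposed method combines with attribute purity; the purity measure itself is specific to the paper and is not reproduced here. An object lies in the positive region when every object in its condition-attribute equivalence class shares the same decision value.

```python
# Sketch: rough-set dependency degree of a decision attribute on a set of condition attributes.
from collections import defaultdict

def dependency_degree(rows, condition_attrs, decision_attr):
    """rows: list of dicts mapping attribute name -> value."""
    blocks = defaultdict(list)                    # equivalence classes under the condition attributes
    for r in rows:
        key = tuple(r[a] for a in condition_attrs)
        blocks[key].append(r[decision_attr])
    positive = sum(len(decisions) for decisions in blocks.values()
                   if len(set(decisions)) == 1)   # consistent blocks form the positive region
    return positive / len(rows)

# Example decision table: gamma is 1.0 because {outlook, windy} fully determines the decision.
table = [
    {"outlook": "sunny", "windy": "no",  "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain",  "windy": "no",  "play": "yes"},
    {"outlook": "rain",  "windy": "yes", "play": "no"},
]
print(dependency_degree(table, ["outlook", "windy"], "play"))   # -> 1.0
```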

10.
房立, 黄泽宇. 《微机发展》, 2006, 16(8): 106-109
The key to constructing a decision tree classifier is the choice of the splitting attribute. By analyzing three criteria for selecting the splitting attribute, namely information gain and gain ratio, the Gini index, and the Goodman-Kruskal association index, this paper proposes a method that improves the classical C4.5 decision tree classifier: a decision tree classification model that selects the splitting attribute competitively. It combines the three selection criteria and chooses the best splitting attribute through a competition mechanism. Experimental results show that, in most cases, this makes it possible to obtain a smaller decision tree without sacrificing classification accuracy.
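The abstract does not define the competition mechanism, so the sketch below makes an explicit assumption: each criterion (information gain, gain ratio, Gini decrease) votes for the attribute it ranks first, and the attribute with the most votes wins. The Goodman-Kruskal criterion and the full C4.5 machinery are not reproduced; X_cols and y are assumed to be NumPy arrays of discrete values.

```python
# Sketch: several split criteria "competing" by voting for the best splitting attribute.
import numpy as np
from collections import Counter

def entropy(y):
    p = np.array(list(Counter(y).values()), dtype=float)
    p /= p.sum()
    return -np.sum(p * np.log2(p))

def gini(y):
    p = np.array(list(Counter(y).values()), dtype=float)
    p /= p.sum()
    return 1.0 - np.sum(p ** 2)

def split_scores(x, y):
    """Information gain, gain ratio, and Gini decrease for splitting on discrete attribute x."""
    n = len(y)
    children = [(np.sum(x == v), y[x == v]) for v in np.unique(x)]
    weighted_h = sum(m / n * entropy(cy) for m, cy in children)
    weighted_g = sum(m / n * gini(cy) for m, cy in children)
    gain = entropy(y) - weighted_h
    split_info = entropy(x)                       # intrinsic information of the split itself
    gain_ratio = gain / split_info if split_info > 0 else 0.0
    return gain, gain_ratio, gini(y) - weighted_g

def competitive_select(X_cols, y):
    scores = np.array([split_scores(col, y) for col in X_cols])   # shape: (attributes, 3 criteria)
    winners = scores.argmax(axis=0)               # best attribute under each criterion
    return Counter(winners).most_common(1)[0][0]  # attribute chosen by the most criteria
```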

