Similar Literature (20 records)
1.
Real-life datasets are often imbalanced, that is, there are significantly more training samples available for some classes than for others, and consequently the conventional aim of maximizing overall classification accuracy is not appropriate when dealing with such problems. Various approaches have been introduced in the literature to deal with imbalanced datasets, and are typically based on oversampling, undersampling or cost-sensitive classification. In this paper, we introduce an effective ensemble of cost-sensitive decision trees for imbalanced classification. Base classifiers are constructed according to a given cost matrix, but are trained on random feature subspaces to ensure sufficient diversity of the ensemble members. We employ an evolutionary algorithm for simultaneous classifier selection and assignment of committee member weights for the fusion process. Our proposed algorithm is evaluated on a variety of benchmark datasets, and is confirmed to lead to improved recognition of the minority class, to be capable of outperforming other state-of-the-art algorithms, and hence to represent a useful and effective approach for dealing with imbalanced datasets.
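The fusion step described above (committee members voting with evolved weights) can be sketched in a few lines. This is a minimal illustration, not the paper's algorithm: the base classifiers, their weights, and the labels are hypothetical stand-ins, and the evolutionary weight search is replaced by fixed weights.

```python
# Weighted-vote fusion of committee members' predictions (illustrative
# sketch; weights would come from the paper's evolutionary search).

def weighted_vote(predictions, weights):
    """Fuse per-classifier class predictions using committee weights."""
    scores = {}
    for label, w in zip(predictions, weights):
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)

# Three committee members predict a label for one sample; the third,
# more trusted member outweighs the first two combined.
print(weighted_vote(["majority", "majority", "minority"], [0.2, 0.2, 0.7]))
```

A cost-sensitive variant would additionally scale each vote by the misclassification cost of the competing label.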

2.
Fully polarimetric synthetic aperture radar (PolSAR) Earth observations have shown great potential for mapping and monitoring agro-environmental systems. Numerous polarimetric features can be extracted from these complex observations, which may improve the accuracy of land-cover classification and object characterization. This article employed two well-known decision tree ensembles, i.e. bagged trees (BT) and random forests (RF), for land-cover mapping from PolSAR imagery. Moreover, two fast modified decision tree ensembles were proposed, namely the balanced filter-based forest (BFF) and the cost-sensitive filter-based forest (CFF). These algorithms, designed on the idea of RF, use fast filter-based feature selection algorithms and two extended majority voting schemes. They are also able to embed solutions to the imbalanced-data problem into their structures. Three different PolSAR datasets, all with imbalanced data, were used to evaluate the efficiency of the proposed algorithms. The results indicated that all the tree ensembles have higher efficiency and reliability than an individual decision tree (DT). Moreover, both proposed tree ensembles obtained higher mean overall accuracy (0.5–14% higher), producer’s accuracy (0.5–10% higher), and user’s accuracy (0.5–9% higher) than the classical tree ensembles, i.e. BT and RF. They were also much faster (e.g. 2–10 times) and more stable than their competitors on these three datasets. In addition, unlike BT and RF, which obtained higher accuracy only in large ensembles (i.e. with a high number of DTs), BFF and CFF can also be efficient and reliable in smaller ensembles. Furthermore, the extended majority voting techniques could outperform classical majority voting for decision fusion.

3.
A Direct Sum Theorem holds in a model of computation when, for every problem, solving k input instances together is k times as expensive as solving one. We show that Direct Sum Theorems hold in the models of deterministic and randomized decision trees for all relations. We also note that a near-optimal Direct Sum Theorem holds for quantum decision trees for Boolean functions.

4.
Obtaining an indication of confidence of predictions is desirable for many data mining applications. Predictions complemented with confidence levels can inform on the certainty or extent of reliability that may be associated with the prediction. This can be useful in varied application contexts where model outputs form the basis for potentially costly decisions, and in general across risk-sensitive applications. The conformal prediction framework presents a novel approach for obtaining valid confidence measures associated with predictions from machine learning algorithms. Confidence levels are obtained from the underlying algorithm, using a non-conformity measure which indicates how ‘atypical’ a given example set is. The non-conformity measure is key to determining the usefulness and efficiency of the approach. This paper considers inductive conformal prediction in the context of random tree ensembles like random forests, which have been noted to perform favorably across problems. Focusing on classification tasks, and considering realistic data contexts including class imbalance, we develop non-conformity measures for assessing the confidence of predicted class labels from random forests. We examine the performance of these measures on multiple data sets. Results demonstrate the usefulness and validity of the measures, their relative differences, and highlight the effectiveness of conformal prediction random forests for obtaining predictions with associated confidence.
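The core of the inductive conformal step described above can be sketched as follows: a candidate label's p-value is the fraction of calibration non-conformity scores at least as large as the test example's score. The scores below are invented for illustration; in a random-forest setting a non-conformity score might be, say, one minus the forest's vote fraction for the candidate label.

```python
# Inductive conformal p-value from calibration non-conformity scores
# (sketch; the scores themselves are hypothetical placeholders).

def p_value(calibration_scores, test_score):
    """Fraction of calibration scores >= test score, smoothed by +1."""
    n = len(calibration_scores)
    ge = sum(1 for s in calibration_scores if s >= test_score)
    return (ge + 1) / (n + 1)

cal = [0.1, 0.2, 0.3, 0.8, 0.9]   # non-conformity of calibration examples
print(p_value(cal, 0.25))          # moderately typical example
```

A prediction set at significance level ε then contains every label whose p-value exceeds ε, which is what makes the confidence measure valid.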

5.
A classifier ensemble is a set of classifiers whose individual decisions are combined to classify new examples. Classifiers that can represent complex decision boundaries tend to be accurate. Kernel functions can also represent complex decision boundaries. In this paper, we study the usefulness of kernel features for decision tree ensembles, as they can improve the representational power of individual classifiers. We first propose decision tree ensembles based on kernel features and find that the performance of these ensembles is strongly dependent on the kernel parameters: the selected kernel and the dimension of the kernel feature space. To overcome this problem, we present another approach to creating ensembles that combines the existing ensemble methods with the kernel machine philosophy. In this approach, kernel features are created and concatenated with the original features. The classifiers of an ensemble are trained on these extended feature spaces. Experimental results suggest that the approach is quite robust to the selection of parameters. Experiments also show that different ensemble methods (Random Subspace, Bagging, Adaboost.M1 and Random Forests) can be improved by using this approach.

6.
Hybrid decision tree

7.
In this paper, we present a new algorithm for learning oblique decision trees. Most current decision tree algorithms rely on impurity measures to assess the goodness of hyperplanes at each node while learning a decision tree in a top-down fashion. These impurity measures do not properly capture the geometric structure in the data. Motivated by this, our algorithm uses a strategy for assessing the hyperplanes in such a way that the geometric structure in the data is taken into account. At each node of the decision tree, we find the clustering hyperplanes for both classes and use their angle bisectors as the split rule at that node. We show through empirical studies that this idea leads to small decision trees and better performance. We also present some analysis to show that the angle bisectors of clustering hyperplanes that we use as the split rules at each node are solutions of an interesting optimization problem, and hence argue that this is a principled method of learning a decision tree.
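The angle-bisector construction above has a simple closed form: given two hyperplanes w1·x + b1 = 0 and w2·x + b2 = 0, their bisectors are obtained by adding and subtracting the unit-norm forms of the two hyperplanes. A minimal 2-D sketch (the hyperplanes here are toy stand-ins for the paper's clustering hyperplanes):

```python
import math

# Angle bisectors of two lines w·x + b = 0 in the plane: normalize
# each (w, b) by ||w||, then take the sum and difference.

def angle_bisectors(w1, b1, w2, b2):
    n1, n2 = math.hypot(*w1), math.hypot(*w2)
    u1 = [c / n1 for c in w1] + [b1 / n1]
    u2 = [c / n2 for c in w2] + [b2 / n2]
    plus  = [a + b for a, b in zip(u1, u2)]   # coefficients (w, b)
    minus = [a - b for a, b in zip(u1, u2)]
    return plus, minus

# Bisectors of the lines x = 0 and y = 0 are x + y = 0 and x - y = 0.
plus, minus = angle_bisectors([1.0, 0.0], 0.0, [0.0, 1.0], 0.0)
print(plus, minus)   # [1.0, 1.0, 0.0] and [1.0, -1.0, 0.0]
```

Whichever bisector yields the lower impurity would then serve as the split at that node.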

8.
A novel class of ensembles of linear decision rules is introduced which includes majority voting-based ensembles as a particular case. Based on this general framework, new results are given that state the ability of a subclass to discriminate between two infinite subsets A and B in R^n, thus generalizing Mazurov’s theorem for two finite sets.
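The majority-voting special case mentioned above can be sketched directly: each linear rule votes sign(w·x + b), and the ensemble assigns a point to whichever side collects more votes. The rules below are invented for illustration.

```python
# Majority vote over linear decision rules (illustrative sketch;
# the three rules are hypothetical, not from the paper).

def linear_vote(rules, x):
    """Each rule (w, b) votes sign(w·x + b); return the majority side."""
    votes = 0
    for w, b in rules:
        s = sum(wi * xi for wi, xi in zip(w, x)) + b
        votes += 1 if s > 0 else -1
    return 1 if votes > 0 else -1

rules = [([1.0, 0.0], -0.5), ([0.0, 1.0], -0.5), ([1.0, 1.0], -2.0)]
print(linear_vote(rules, [1.0, 1.0]))   # two of three rules vote +1
```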

9.
We investigate the cooperative effects of a single finite chain of monomers near an attractive substrate by first constructing a conformational pseudo-phase diagram based on the thermal fluctuations of energetic and structural quantities. Then, the adsorption transition is analyzed in more detail. This is conveniently done by a microcanonical analysis of densities of states obtained by extensive multicanonical Monte Carlo simulations. For short chains and strong surface attraction, the microcanonical entropy turns out to be a convex function of energy in the transition regime. This is a characteristic physical effect and deserves a careful consideration in analyses of cooperative macrostate transitions in finite systems.

10.
Ranking with decision tree
Ranking problems have recently become an important research topic in the joint field of machine learning and information retrieval. This paper presents a new splitting rule that introduces a metric, i.e., an impurity measure, to construct decision trees for ranking tasks. We provide a theoretical basis and some intuitive explanations for the splitting rule. Our approach is also meaningful for collaborative filtering in the sense of dealing with categorical data and selecting relevant features. Experiments illustrate our ranking approach; the results show that our algorithm outperforms both perceptron-based ranking and classification tree algorithms in terms of accuracy as well as speed.

11.
We study the quantum version of a decision tree classifier to fill the gap between quantum computation and machine learning. The quantum entropy impurity criterion, which is used to determine which node should be split, is presented in the paper. By using the quantum fidelity measure between two quantum states, we cluster the training data into subclasses so that the quantum decision tree can manipulate quantum states. We also propose algorithms for constructing the quantum decision tree and for searching over the tree for the target class of a new quantum object.
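For pure states, the fidelity measure mentioned above reduces to the squared magnitude of the inner product, |⟨a|b⟩|². A toy sketch with two-amplitude state vectors (the amplitudes are made-up values, not from the paper):

```python
# Fidelity |<a|b>|^2 between two pure states given as amplitude lists.

def fidelity(a, b):
    inner = sum(x.conjugate() * y for x, y in zip(a, b))
    return abs(inner) ** 2

s = 2 ** -0.5
print(fidelity([1, 0], [1, 0]))   # identical states: fidelity 1
print(fidelity([1, 0], [s, s]))   # equal superposition vs |0>: fidelity 1/2
```

Clustering training states by pairwise fidelity is then analogous to clustering classical samples by similarity.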

12.
Construction and study of a multivariate decision tree
Univariate decision tree algorithms produce large trees with complex rules that are hard to interpret, whereas multivariate decision trees are an effective data mining method for classification; the key to their construction is selecting, based on the correlations among attributes, a suitable combination of attributes to form a new attribute at each node. Combining the knowledge-dependency measure from rough set theory with the notion of dispersion of the condition attribute set in an information system, this paper proposes a multivariate decision tree construction algorithm (RD). Experiments on several UCI datasets show that the proposed algorithm achieves somewhat better classification performance than the classical ID3 algorithm and a kernel-based multivariate decision tree.

13.
System implementation and pruning optimization of decision tree algorithms
Decision trees are a method for in-depth analysis of classification problems. In practice, the trees generated by these algorithms are often large and complex, making them hard for users to understand; this tells us that, while emphasizing classification accuracy, research on tree pruning must also be strengthened. Using the program implementation of a decision tree algorithm as an example, this paper further discusses the issues that may arise when pruning and optimizing a tree, with the aim of giving decision tree researchers a thorough and clear view of simplification techniques.

14.
Efficiency and scalability are the most important issues in multi-relational data mining; the main bottleneck for algorithm efficiency lies in the hypothesis space, and user guidance on classification can greatly help the system complete the classification task and reduce the time it spends exploring on its own. To address these issues, an improved multi-relational decision tree algorithm is proposed, which applies tuple propagation via virtual joins together with the proposed background-attribute transfer technique. The improved algorithm is justified theoretically, and comparative experiments are conducted against the original multi-relational decision tree algorithm. The experiments show that, once the improved tree's search reaches the background-attribute transfer threshold, its efficiency is comparatively high and, as the number of attributes increases (or …

15.
We constructed a decision analysis model based on data in the medical literature to estimate the possible outcomes of thrombolytic therapy in patients 50 to 80 years old with possible myocardial infarction. We used the model to test the most likely effects of treatment (determined by averaging the values in reports of large studies) and the worst effects reported so far. The program begins by asking the patient's age, the hours from the onset of pain, and the probability of acute myocardial infarction. It then provides an opportunity to perform sensitivity analyses by changing the values for these variables and for the probability of death in the absence of thrombolytic therapy, as well as for the probability of major stroke and hemorrhage. The counterintuitive findings observed with this program are that the benefits of thrombolytic therapy increase with age and that young patients derive surprisingly little benefit from it.
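The kind of expected-value comparison such a model performs can be sketched in a few lines. All probabilities below are invented placeholders, not the article's data: the point is only that survival with and without therapy can be compared, and that sensitivity analysis amounts to re-running the comparison with different inputs.

```python
# Toy decision-analysis comparison of treat vs. no-treat survival
# (all numbers are hypothetical placeholders, not the article's data).

def survival(p_mi, p_death_mi, mortality_reduction, p_fatal_bleed):
    # Death can come from the infarction (reduced by therapy) or,
    # under therapy, from a treatment-induced fatal bleed.
    p_death_treated = p_mi * p_death_mi * (1 - mortality_reduction) + p_fatal_bleed
    p_death_untreated = p_mi * p_death_mi
    return 1 - p_death_treated, 1 - p_death_untreated

treated, untreated = survival(p_mi=0.8, p_death_mi=0.12,
                              mortality_reduction=0.25, p_fatal_bleed=0.005)
print(round(treated - untreated, 4))   # net survival benefit of therapy
```

Varying `p_death_mi` with age in such a model is what drives the article's counterintuitive finding: higher baseline mortality means more absolute benefit to offset the fixed bleeding risk.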

16.
Univariate decision trees are classifiers currently used in many data mining applications. This classifier discovers partitions of the input space via hyperplanes that are orthogonal to the axes of attributes, producing a model that can be understood by human experts. One disadvantage of univariate decision trees is that they produce complex and inaccurate models when decision boundaries are not orthogonal to axes. In this paper we introduce the Fisher’s Tree, a classifier that takes advantage of the dimensionality reduction of Fisher’s linear discriminant and uses the decomposition strategy of decision trees to come up with an oblique decision tree. Our proposal generates an artificial attribute that is used to split the data in a recursive way. The Fisher’s decision tree induces oblique trees whose accuracy, size, number of leaves and training time are competitive with respect to other decision trees reported in the literature. We use more than ten publicly available data sets to demonstrate the effectiveness of our method.
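The artificial attribute described above is the projection of each sample onto the Fisher direction w = Sw⁻¹(m1 − m2), where Sw is the pooled within-class scatter and m1, m2 the class means. A self-contained 2-D sketch with a toy two-class dataset (the data and the restriction to two dimensions are illustrative assumptions):

```python
# Fisher's linear discriminant direction in 2-D: w = Sw^{-1} (m1 - m2).

def fisher_direction(class1, class2):
    def mean(pts):
        n = len(pts)
        return [sum(p[i] for p in pts) / n for i in range(2)]
    def scatter(pts, m):
        s = [[0.0, 0.0], [0.0, 0.0]]
        for p in pts:
            d = [p[0] - m[0], p[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
        return s
    m1, m2 = mean(class1), mean(class2)
    s1, s2 = scatter(class1, m1), scatter(class2, m2)
    sw = [[s1[i][j] + s2[i][j] for j in range(2)] for i in range(2)]
    # invert the 2x2 within-class scatter matrix
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m2[0], m1[1] - m2[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

c1 = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (3.0, 2.0)]
c2 = [(6.0, 6.0), (7.0, 7.0), (6.0, 7.0), (7.0, 6.0)]
print(fisher_direction(c1, c2))   # → [-2.0, -2.0]
```

Projecting each sample onto this direction yields the one-dimensional artificial attribute on which an ordinary threshold split can then be found.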

17.
Learning decision tree for ranking
Decision trees are one of the most effective and widely used methods for classification. However, many real-world applications require instances to be ranked by the probability of class membership. The area under the receiver operating characteristic curve, simply AUC, has recently been used as a measure of the ranking performance of learning algorithms. In this paper, we present two novel class probability estimation algorithms to improve the ranking performance of decision trees. Instead of estimating the probability of class membership using simple voting at the leaf into which the test instance falls, our algorithms use similarity-weighted voting and naive Bayes. We design empirical experiments to verify that our new algorithms significantly outperform the recent decision tree ranking algorithm C4.4 in terms of AUC.
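The AUC measure used above to compare rankers has a direct pairwise reading: it is the probability that a randomly chosen positive instance is scored higher than a randomly chosen negative one, with ties counting half. A minimal sketch (the scores are toy values standing in for a tree's class-probability estimates):

```python
# AUC computed from pairwise positive-vs-negative score comparisons.

def auc(scores, labels):
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(auc([0.9, 0.8, 0.4, 0.3], [1, 1, 0, 0]))   # perfect ranking: 1.0
print(auc([0.9, 0.3, 0.8, 0.4], [1, 1, 0, 0]))   # one positive misranked: 0.5
```

This pairwise view explains why smoothing leaf probability estimates (as the paper's similarity-weighted voting and naive Bayes variants do) can raise AUC even when classification accuracy is unchanged.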

18.
To remedy the inaccuracy of setting parameter values empirically in fuzzy decision tree algorithms, and after analyzing the characteristics of the algorithm's main parameters, an adaptive fuzzy decision tree algorithm is proposed that uses particle swarm optimization to set parameter values intelligently. Experiments show that, compared with a fuzzy decision tree whose parameters are set by experience, the trees generated by the adaptive algorithm perform markedly better. Finally, the interactions among the key parameters are analyzed from the experimental data.

19.
Two pre-pruning algorithms for decision trees
Qu Junfeng, Zhu Li, Hu Bin. 《计算机应用》 (Journal of Computer Applications), 2006, 26(3): 670-672
A decision tree can be pruned either while it is being grown or afterwards; the former is called pre-pruning. Each node of a decision tree corresponds to a set of examples. By analyzing the number of examples in that set, or its purity, this paper proposes the support-based pre-pruning algorithm PDTBS and the purity-based pre-pruning algorithm PDTBP. To achieve pruning, PDTBS blocks the expansion of nodes with small example sets, while PDTBP blocks the expansion of nodes whose example sets have high purity. Analysis shows both algorithms run in linear time, and experiments on UCI data show that PDTBS and PDTBP can prune decision trees substantially while keeping the loss in classification accuracy minimal.
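The support-based idea (as described for PDTBS) can be sketched as a stopping condition inside tree growth: a node whose example set is smaller than a support threshold becomes a leaf labelled with its majority class instead of being split. The splitting rule below (always on attribute 0) is a toy stand-in for a real impurity-based split, and the data is invented.

```python
from collections import Counter

# Tree growth with support-based pre-pruning: nodes with fewer than
# min_support examples are collapsed to majority-class leaves.

def build(samples, min_support, depth=0):
    labels = [y for _, y in samples]
    majority = Counter(labels).most_common(1)[0][0]
    if len(samples) < min_support or len(set(labels)) == 1 or depth >= 2:
        return majority                       # pruned to a leaf
    left  = [s for s in samples if s[0][0] == 0]
    right = [s for s in samples if s[0][0] == 1]
    if not left or not right:
        return majority
    return (build(left, min_support, depth + 1),
            build(right, min_support, depth + 1))

data = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "b"), ((1, 1), "a")]
print(build(data, min_support=3))   # both children fall below support
```

A purity-based variant (the PDTBP idea) would instead stop when the majority class already accounts for, say, 95% of the node's examples.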

20.
A new decision tree construction method
Decision trees are an important data mining tool, but constructing an optimal decision tree is an NP-complete problem. This paper proposes a decision tree construction method based on association rule mining. It first defines high-confidence approximately exact rules and gives an algorithm for mining them; new attributes are then generated from these rules, and methods for evaluating the generated attributes are discussed; finally, the generated attributes and the data's original attributes are used together to construct the decision tree. Experimental results show the new construction method achieves higher accuracy.


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号