Similar literature
 20 similar documents found (search time: 31 ms)
1.
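A decision tree generation algorithm for incomplete information systems   Total citations: 1, self-citations: 1, citations by others: 0
Decision trees are an effective data mining method for classifying instances. When handling missing values in incomplete information systems, most existing decision tree algorithms rely on guessing techniques. Without altering the missing values, this paper uses the concept of maximal consistent blocks to define the decision support degree of the condition attributes for the decision attribute in an incomplete decision table, and uses it as heuristic information for attribute selection. It also proposes IDTBDS, a decision tree generation algorithm for incomplete information systems, which not only obtains a rule set quickly but also achieves high accuracy.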

2.
Decision trees have been widely used in data mining and machine learning as a comprehensible knowledge representation. While ant colony optimization (ACO) algorithms have been successfully applied to extract classification rules, decision tree induction with ACO algorithms remains an almost unexplored research area. In this paper we propose a novel ACO algorithm to induce decision trees, combining commonly used strategies from both traditional decision tree induction algorithms and ACO. The proposed algorithm is compared against three decision tree induction algorithms, namely C4.5, CART and cACDT, on 22 publicly available data sets. The results show that the predictive accuracy of the proposed algorithm is statistically significantly higher than the accuracy of both C4.5 and CART, which are well-known conventional algorithms for decision tree induction, and the accuracy of the ACO-based cACDT decision tree algorithm.

3.
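Sun Juan (孙娟), Wang Xizhao (王熙照), Computer Engineering (《计算机工程》), 2006, 32(12): 210-211, 231
Decision tree induction is one of the most effective tools for solving classification problems in machine learning. Because decision tree algorithms have inherent shortcomings, corresponding simplification is needed to improve prediction accuracy. Fuzzy decision tree algorithms are an improvement on decision tree algorithms and come closer to the way humans think. Through experiments, the article analyzes the similarities and differences among fuzzy decision trees, rule simplification, and fuzzy rule simplification, and between fuzzy decision trees and fuzzy pre-pruning algorithms. It compares tree size, training accuracy, and test accuracy, analyzes the performance of fuzzy decision trees, and offers some useful clues for improving the algorithm.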

4.
Decision trees are popular representations of Boolean functions. We show that, given an alternative representation of a Boolean function f, say as a read-once branching program, one can find a decision tree T which approximates f to any desired degree of accuracy. Moreover, the size of the decision tree is at most that of the smallest decision tree which can represent f, and this construction can be obtained in quasi-polynomial time. We also extend this result to the case where one has access only to a source of random evaluations of the Boolean function f instead of a complete representation. In this case, we show that a similar approximation can be obtained with any specified amount of confidence (as opposed to the absolute certainty of the former case). This latter result implies proper PAC-learnability of decision trees under the uniform distribution without using membership queries.

5.
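Research on decision tree classification techniques   Total citations: 28, self-citations: 1, citations by others: 28
Luan Lihua (栾丽华), Ji Genlin (吉根林), Computer Engineering (《计算机工程》), 2004, 30(9): 94-96, 105
Decision tree classification is an important data classification technique. ID3, C4.5, and EC4.5 are commonly used algorithms for building decision trees, but newer decision tree classification algorithms have so far received little study in China. To address this, drawing on an extensive review of the literature, the paper studies newer algorithms such as CART, SLIQ, SPRINT, and PUBLIC, explains the basic ideas of the various decision tree classification algorithms, and analyzes and compares their main characteristics, as a reference for researchers in data classification.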

6.
Decision tree-based classification is a popular approach for pattern recognition and data mining. Most decision tree induction methods assume that the training data are present at one central location. Given the growth in distributed databases at geographically dispersed locations, methods for decision tree induction in distributed settings are gaining importance. This paper describes one such method that generates compact trees using multifeature splits in place of the single-feature splits generated by most existing methods for distributed data. Our method is based on Fisher's linear discriminant function, and is capable of dealing with multiple classes in the data. For homogeneously distributed data, the decision trees produced by our method are identical to decision trees generated using Fisher's linear discriminant function with centrally stored data. For heterogeneously distributed data, a certain approximation is involved with a small change in performance with respect to the tree generated with centrally stored data. Experimental results for several well-known datasets are presented and compared with decision trees generated using Fisher's linear discriminant function with centrally stored data.

7.
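Random forests build decision trees by sampling features on top of bootstrap sampling, lowering the correlation between trees at the cost of individual tree accuracy and thereby improving predictive accuracy. When the data set is large, however, the correlation between trees remains high, and random forest performance suffers. To address this, an improved algorithm based on out-of-bag prediction is proposed, which raises the predictive performance of the random forest by improving the accuracy of its decision trees. The out-of-bag predictions of the random forest are combined with the original features and the random forest is retrained, which effectively reduces the trees' VC dimension, empirical risk, and generalization risk and improves their accuracy, ultimately improving the forest's predictive performance. More accurate trees, however, tend to make similar predictions, which raises the correlation between trees and hurts the forest's final predictions. For this reason, a space-expansion algorithm generates different features for different trees, lowering the correlation between trees without significantly reducing their accuracy. Experimental results show that on 32 data sets the algorithm's average accuracy is 1.7% higher than that of the original random forest, and under the corrected paired t-test it significantly outperforms the original random forest on 19 of them.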

8.
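Decision tree classification and its application to land cover classification   Total citations: 24, self-citations: 1, citations by others: 24
Given the strong potential of decision tree classification algorithms for remote sensing image classification, three different decision tree algorithms (UDT, MDT, and HDT) are examined. The structure and theory of the decision tree algorithms are described first; the algorithms are then used in a remote sensing land cover classification experiment, and the results are compared with those of traditional statistical classifiers. The study shows that decision tree classification has many advantages: it is relatively simple and explicit, and its classification structure is intuitive. In addition, compared with conventional classifiers, which assume that the data follow a fixed probability distribution and then estimate parameters on that basis, decision trees are strictly nonparametric and are more flexible and robust with respect to the spatial characteristics of the input data and the class labels.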

9.
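An improved version of the ID3 algorithm   Total citations: 33, self-citations: 5, citations by others: 33
Decision trees are an important method in inductive learning and data mining, commonly used to build classifiers and predictive models. ID3 is a core decision tree algorithm. To address ID3's bias toward attributes with many values, the article introduces a user interest degree to improve ID3 and compares the algorithm before and after the improvement experimentally. The experiments show that the improved algorithm is effective.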
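The bias that this abstract attributes to ID3 can be illustrated with a minimal Python sketch of information gain, the split criterion ID3 uses. The toy rows and the attribute names (`id`, `windy`, `play`) are hypothetical and are not taken from the paper; the user-interest correction itself is not shown, because the abstract does not define it.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


# Hypothetical toy data: "id" is unique per row, "windy" has only two values.
rows = [
    {"id": 1, "windy": "no", "play": "yes"},
    {"id": 2, "windy": "no", "play": "yes"},
    {"id": 3, "windy": "yes", "play": "no"},
    {"id": 4, "windy": "yes", "play": "yes"},
]

gain_id = information_gain(rows, "id", "play")
gain_windy = information_gain(rows, "windy", "play")
print(f"gain(id)={gain_id:.3f}  gain(windy)={gain_windy:.3f}")
```

Because every `id` value isolates a single row, splitting on it removes all remaining entropy, so its gain equals the full label entropy and exceeds that of `windy`, even though `id` is useless for prediction. This is the behavior that corrections such as the one described above aim to counteract.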

10.
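Random forests (RF) are robust to noise, achieve high prediction accuracy, and can handle high-dimensional data, and are therefore widely used in machine learning. The model decision tree (MDT) is an accelerated decision tree algorithm; although it improves the training efficiency of decision tree algorithms, its accuracy declines as the scale of the impure pseudo-leaf nodes increases. To address this, a model decision forest (MDF) algorithm is proposed to improve the classification accuracy of model decision trees. MDF uses MDT as the base classifier and, following the idea of random forests, generates multiple model decision trees. The algorithm first obtains different sample subsets through rotation matrices, then trains a different model decision tree on each subset, combines the trees by voting, and finally outputs the classification given by the resulting model decision forest. Experiments on standard data sets show that the proposed model decision forest is clearly more accurate than the model decision tree algorithm, and that MDF achieves good accuracy even with few trees, avoiding the increase in time complexity that comes with adding trees.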
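The voting step described in this abstract can be sketched in a few lines of Python. This is only an illustration of combining base classifiers by majority vote: the three threshold rules are hypothetical stand-ins for trained trees, and neither the rotation-matrix sampling nor the MDT training from the paper is reproduced.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted most often; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]


def forest_predict(trees, sample):
    """Ask every tree for a label and combine the answers by voting."""
    return majority_vote([tree(sample) for tree in trees])


# Hypothetical stand-ins for trained trees: each is a simple threshold rule.
trees = [
    lambda x: "A" if x[0] > 0.5 else "B",
    lambda x: "A" if x[1] > 0.5 else "B",
    lambda x: "A" if x[0] + x[1] > 1.2 else "B",
]

label = forest_predict(trees, (0.9, 0.2))
print(label)
```

For the sample `(0.9, 0.2)` the first rule votes `A` and the other two vote `B`, so the ensemble answers `B` even though one member disagrees.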

11.
Intelligent Data Analysis, 1998, 2(1-4): 303-310
Decision tree induction is a prominent learning method, typically yielding quick results with competitive predictive performance. However, it is not unusual to find other automated learning methods that exceed the predictive performance of a decision tree on the same application. To achieve near-optimal classification results, resampling techniques can be employed to generate multiple decision-tree solutions. These decision trees are individually applied and their answers voted. The potential for exceptionally strong performance is counterbalanced by the substantial increase in computing time to induce many decision trees. We describe estimators of predictive performance for voted decision trees induced from bootstrap (bagged) or adaptive (boosted) resampling. The estimates are found by examining the performance of a single tree and its pruned subtrees over a single training set and a large test set. Using publicly available collections of data, we show that these estimates are usually quite accurate, with occasional weaker estimates. The great advantage of these estimates is that they reveal the predictive potential of voted decision trees prior to applying expensive computational procedures.

12.
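A study of a rough-set-based algorithm for determining the simplest decision tree   Total citations: 6, self-citations: 2, citations by others: 6
Decision trees are an effective data mining method for classification and may be deterministic or nondeterministic. The traditional approach generates a decision tree by computing information entropy, which is computationally expensive. Rough set (RS) methods have recently been used to compute information entropy, but they have limitations. This paper points out those limitations, gives an effective attribute selection algorithm, and establishes criteria for identifying the simplest deterministic and nondeterministic decision trees, together with a general algorithm for generating them.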

13.
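Li Minghui (李明辉), Software (《软件》), 2012, (7): 85-86
Decision tree algorithms from data mining are of considerable value in banking. Applied to banking, decision tree techniques can analyze a customer's background information to predict the category to which the customer belongs, so that an appropriate business strategy can be adopted. This improves the bank's service quality, develops customer resources, and prevents customer attrition, while also saving resources and obtaining greater returns from minimal investment. In the lending business, determining whether a borrower is risky, whether a loan plan is feasible, and how to classify customers according to the bank's actual needs are all problems that decision tree algorithms can address.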

14.
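An improved decision tree algorithm   Total citations: 2, self-citations: 0, citations by others: 2
Decision trees are an important method in inductive learning and data mining, used mainly for classification and prediction. ID3 is the most widely used decision tree algorithm. After explaining the basic idea of decision trees in data mining, the paper discusses ID3's bias toward attributes with many values and introduces an irrelevance degree to improve ID3. Analysis of the experimental results shows that the improved algorithm yields more reasonable and more effective rules.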

15.
We study the possibility of constructing decision trees with evolutionary algorithms in order to increase their predictive accuracy. We present a self-adapting evolutionary algorithm for the induction of decision trees and describe the principle of decision making based on multiple evolutionarily induced decision trees (a decision forest). The developed model is used as a fault-prediction approach to foresee dangerous software modules, whose identification can greatly enhance the reliability of software.

16.
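A study of current land cover in Beijing using decision tree classification   Total citations: 19, self-citations: 1, citations by others: 18
Using multiband TM imagery (bands 1 to 7) as the data source, decision tree classification is applied to study current land cover in Beijing. The paper discusses how to use the decision tree method to separate basic land cover types layer by layer, including grassland, forest, water, bare land, residential areas, and roads, and further examines how to distinguish urban bare land from rural bare land. The classification accuracy reaches 93.3%. The study shows that decision tree classification has many advantages: it is relatively simple and explicit, and its classification structure is intuitive.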

17.
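Decision trees are a cluster analysis method based on a divide-and-conquer strategy, and the key to building one is choosing suitable attributes. Traditional decision trees are usually built from the standpoint of maximizing information entropy, which does not discriminate well enough between the classification abilities of attributes. To address this shortcoming of traditional decision tree generation algorithms, this paper proposes a decision tree generation algorithm based on the Mahalanobis distance, which uses that distance to distinguish the classification abilities of different subsets of feature attributes. Experimental results show that the metric-based decision tree outperforms the traditional decision tree.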
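For readers unfamiliar with the metric named in this abstract, the following Python sketch computes the Mahalanobis distance in two dimensions. It only illustrates the distance itself; how the paper turns it into an attribute-selection criterion is not stated in the abstract, and the function name and sample points are hypothetical.

```python
from math import sqrt


def mahalanobis_2d(point, samples):
    """Mahalanobis distance from point to the mean of a set of 2-D samples."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    # Entries of the sample covariance matrix (divide by n - 1).
    sxx = sum((x - mx) ** 2 for x, _ in samples) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in samples) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in samples) / (n - 1)
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mx, point[1] - my
    # d^2 = [dx dy] * inverse(cov) * [dx dy]^T, with the 2x2 inverse written out.
    squared = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return sqrt(squared)


# Hypothetical samples whose two features are positively correlated.
samples = [(0, 0), (1, 1), (2, 2), (1, 0), (1, 2)]

d_along = mahalanobis_2d((2, 2), samples)   # lies along the correlation direction
d_across = mahalanobis_2d((2, 0), samples)  # lies across it
print(d_along, d_across)
```

Both query points are at the same Euclidean distance from the sample mean `(1, 1)`, yet the point lying across the direction of correlation is farther in Mahalanobis terms, because the metric rescales each direction by how much the data vary along it.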

18.
With the advantages of being easy to understand and efficient to compute, the decision tree method has long been one of the most popular classifiers. Decision trees constructed with existing approaches, however, tend to be huge and complex, and consequently are difficult to use in practical applications. In this study, we deal with the problem of tree complexity by allowing users to specify the number of leaf nodes, and then construct a decision tree that allows maximum classification accuracy with the given number of leaf nodes. A new algorithm, the Size Constrained Decision Tree (SCDT), is proposed with which to construct a decision tree, paying close attention to how to efficiently use the limited number of leaf nodes. Experimental results show that the SCDT method can successfully generate a simpler decision tree and offers better accuracy.

19.
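Forest classification in Yunnan Province based on the decision tree method   Total citations: 2, self-citations: 0, citations by others: 2
Forest classification is important for understanding the structure and function of forest ecosystems. Because Yunnan Province has complex terrain and forest types, the province's Landsat TM imagery is first divided into 16 regions corresponding to its 16 administrative divisions. Using a set of 18 variables made up of TM bands 1 to 5 and 7 together with vegetation indices, the tasseled cap transformation, the principal component transformation, and a DEM, the variation in the mean spectral values of the training samples and the relationship between spectral value and frequency are tabulated. An intersection formula is used to compute the best boundary points between classes, from which a decision tree is built. All forest types in each region are separated one by one, and the results are merged into a distribution map of broadleaf, coniferous, and mixed coniferous-broadleaf forest in Yunnan Province. Finally, the results are compared with those of the maximum likelihood method of supervised classification. The supervised classification has an overall accuracy of 74.39% and a Kappa coefficient of 0.63, whereas the decision tree method has an overall accuracy of 86.61% and a Kappa coefficient of 0.80. This indicates that the decision tree method can extract Yunnan's forest types with high accuracy and can thus provide baseline data for studies such as the retrieval of forest leaf area index and biomass in the region.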
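The two quality measures reported in this abstract, overall accuracy and the Kappa coefficient, are both derived from a confusion matrix. The Python sketch below shows the standard definitions. The matrix is hypothetical and is not the paper's data, so it does not reproduce the figures quoted above.

```python
def overall_accuracy(matrix):
    """Fraction of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def kappa(matrix):
    """Cohen's kappa: agreement beyond what the row and column totals give by chance."""
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(len(matrix))
    ) / (total * total)
    return (observed - expected) / (1 - expected)


# Hypothetical confusion matrix (rows: reference class, columns: predicted class)
# for three classes, e.g. broadleaf, coniferous, and mixed forest.
confusion = [
    [40, 5, 5],
    [4, 30, 6],
    [6, 4, 50],
]

print(f"overall accuracy = {overall_accuracy(confusion):.4f}")
print(f"kappa            = {kappa(confusion):.4f}")
```

Kappa is lower than overall accuracy whenever some agreement is expected by chance, which is why the abstract reports both numbers for each classifier.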

20.
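A rough-set-based algorithm for determining decision trees   Total citations: 5, self-citations: 0, citations by others: 5
Decision trees are an effective data mining method for classification. Branching attributes are usually selected by computing information entropy, which is computationally heavy and complex. Using the concept of the relative positive region from rough set theory, the article finds an equivalent alternative to information entropy: with simple set operations alone, the corresponding deterministic or nondeterministic decision tree can be obtained for consistent and inconsistent decision tables, and classification rules can be derived from it.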
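The relative positive region mentioned in this abstract can be computed with plain set operations, as the following Python sketch shows. It uses the standard rough set definition (rows whose condition-equivalence class maps to a single decision value); the table, the attribute names, and the function name are hypothetical, and the paper's full tree-building procedure is not reproduced.

```python
from collections import defaultdict


def positive_region(rows, condition_attrs, decision_attr):
    """Indices of rows whose condition-equivalence class has a single decision value."""
    blocks = defaultdict(list)
    for index, row in enumerate(rows):
        key = tuple(row[a] for a in condition_attrs)
        blocks[key].append(index)
    region = set()
    for members in blocks.values():
        decisions = {rows[i][decision_attr] for i in members}
        if len(decisions) == 1:
            region.update(members)
    return region


# Hypothetical decision table. The last two rows agree on every condition
# attribute but differ in the decision, so the table is inconsistent.
table = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "no", "play": "no"},
]

pos_outlook = positive_region(table, ["outlook"], "play")
pos_windy = positive_region(table, ["windy"], "play")
pos_all = positive_region(table, ["outlook", "windy"], "play")
print(pos_outlook, pos_windy, pos_all)
```

An attribute whose positive region covers more rows classifies more of the table unambiguously, which is one plausible way to rank candidate branching attributes without computing entropy. In this inconsistent table, even the full attribute set leaves the two conflicting rows outside the positive region, which corresponds to the nondeterministic case the abstract mentions.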
