Similar Documents
1.
A Decision Tree Algorithm Hybridized with Neural Networks   Total citations: 7, self-citations: 0, citations by others: 7
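In most cases, neural networks achieve higher accuracy than decision trees and regression algorithms because they can fit more complex models. Since a decision tree usually branches on only one variable at a time, its decision regions can only be hyper-rectangles; these are simpler than the regions a neural network can form, so its accuracy cannot match that of a neural network. However, neural networks need comparatively long training times, and their models are less intuitively understandable than those of decision trees, Naive-Bayes, and similar methods. After comparing how the two kinds of algorithms handle complex models, this paper proposes a new algorithm, NNTree, a hybrid of decision trees and neural networks. Its internal nodes contain univariate splits, as in an ordinary decision tree, but its leaf nodes contain neural network classifiers. The method exploits the efficiency of decision trees on large datasets, retains their comprehensibility, and improves the learning performance of neural networks, while making the classifier substantially more accurate than either algorithm alone, especially when testing complex models on larger datasets.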
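A minimal sketch of the NNTree structure, assuming one univariate split at a hand-chosen threshold and a single-layer perceptron standing in for each leaf's neural network (the abstract does not specify the leaf network, the split criterion, or the training procedure; `PerceptronLeaf` and `NNTreeStump` are hypothetical names):

```python
from typing import List, Sequence


class PerceptronLeaf:
    """Leaf classifier: a single-layer perceptron standing in for a neural network."""

    def __init__(self, n_features: int, epochs: int = 50, lr: float = 0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.epochs = epochs
        self.lr = lr

    def predict(self, x: Sequence[float]) -> int:
        s = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        return 1 if s > 0 else 0

    def fit(self, X: List[Sequence[float]], y: List[int]) -> "PerceptronLeaf":
        for _ in range(self.epochs):
            for x, t in zip(X, y):
                err = t - self.predict(x)  # -1, 0 or +1
                if err:
                    self.w = [wi + self.lr * err * xi for wi, xi in zip(self.w, x)]
                    self.b += self.lr * err
        return self


class NNTreeStump:
    """One univariate split, as in an ordinary tree, with a perceptron in each leaf."""

    def __init__(self, feature: int, threshold: float):
        self.feature = feature
        self.threshold = threshold

    def fit(self, X: List[Sequence[float]], y: List[int]) -> "NNTreeStump":
        n = len(X[0])
        left = [(x, t) for x, t in zip(X, y) if x[self.feature] <= self.threshold]
        right = [(x, t) for x, t in zip(X, y) if x[self.feature] > self.threshold]
        # Each leaf network is trained only on the examples routed to it.
        self.left = PerceptronLeaf(n).fit([x for x, _ in left], [t for _, t in left])
        self.right = PerceptronLeaf(n).fit([x for x, _ in right], [t for _, t in right])
        return self

    def predict(self, x: Sequence[float]) -> int:
        leaf = self.left if x[self.feature] <= self.threshold else self.right
        return leaf.predict(x)
```

On XOR-like data neither a single split nor a single perceptron suffices, but the split makes each half linearly separable, so the leaf networks can finish the job.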

2.
Decision trees have been widely used in data mining and machine learning as a comprehensible knowledge representation. While ant colony optimization (ACO) algorithms have been successfully applied to extract classification rules, decision tree induction with ACO algorithms remains an almost unexplored research area. In this paper we propose a novel ACO algorithm to induce decision trees, combining commonly used strategies from both traditional decision tree induction algorithms and ACO. The proposed algorithm is compared against three decision tree induction algorithms, namely C4.5, CART and cACDT, on 22 publicly available data sets. The results show that the predictive accuracy of the proposed algorithm is statistically significantly higher than that of C4.5 and CART, two well-known conventional decision tree induction algorithms, and than that of the ACO-based cACDT decision tree algorithm.
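In ACO-based tree induction, an ant typically chooses the attribute for each node with probability proportional to its pheromone level and a heuristic value. The sketch below shows that standard ACO transition rule plus a simple evaporate-and-reinforce pheromone update, assuming information gain as the heuristic and exponents `alpha`/`beta`; the abstract does not give the paper's exact rules, so all names and parameters here are hypothetical.

```python
import random
from typing import Dict, Iterable


def selection_probabilities(pheromone: Dict[str, float], heuristic: Dict[str, float],
                            alpha: float = 1.0, beta: float = 1.0) -> Dict[str, float]:
    """Standard ACO rule: P(a) is proportional to tau(a)**alpha * eta(a)**beta."""
    scores = {a: (pheromone[a] ** alpha) * (heuristic[a] ** beta) for a in pheromone}
    total = sum(scores.values())
    return {a: s / total for a, s in scores.items()}


def choose_attribute(pheromone: Dict[str, float], heuristic: Dict[str, float],
                     rng: random.Random, alpha: float = 1.0, beta: float = 1.0) -> str:
    """Roulette-wheel sampling of one attribute according to the probabilities above."""
    probs = selection_probabilities(pheromone, heuristic, alpha, beta)
    r = rng.random()
    cumulative = 0.0
    for attr, p in probs.items():
        cumulative += p
        if r < cumulative:
            return attr
    return attr  # guard against floating-point round-off


def update_pheromone(pheromone: Dict[str, float], used_attributes: Iterable[str],
                     quality: float, evaporation: float = 0.1) -> None:
    """Evaporate every trail, then reinforce the attributes used by a good tree."""
    for a in pheromone:
        pheromone[a] *= 1.0 - evaporation
    for a in used_attributes:
        pheromone[a] += quality
```

Pheromone lets attributes that appeared in high-quality trees be chosen more often in later iterations, while the heuristic keeps the locally informative attributes attractive from the start.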

3.
This paper presents a new fuzzy decision tree architecture based on fuzzy rules, the fuzzy rule based decision tree (FRDT), together with a learning algorithm. In contrast with “traditional” axis-parallel decision trees, in which only a single feature (variable) is considered at each node, each node of the proposed tree involves a fuzzy rule over multiple features. Fuzzy rules are employed to produce leaves of high purity, and using multiple features per node helps minimize the size of the tree. The FRDT grows by expanding one additional node that contains a mixture of data from different classes; this is the only non-leaf node in each layer. The result is a new geometric structure endowed with linguistic terms, quite different from “traditional” oblique decision trees, which use hyperplanes as decision functions. A series of numeric studies is reported using UCI machine learning data sets, comparing FRDT with “traditional” decision trees such as C4.5, LADtree, BFTree, SimpleCart, and NBTree. Statistical tests show that FRDT exhibits the best performance in terms of both accuracy and the size of the produced trees.

4.
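Current decision tree algorithms seldom consider how noise in the training set affects the model, and traditional memory-resident algorithms have difficulty processing massive data. To address both problems, this paper proposes IP-C4.5, an imprecise-probability C4.5 algorithm built on the Hadoop platform. When training a model, IP-C4.5 treats the training set used to build the tree as unreliable and uses an information gain ratio based on imprecise probabilities as the split-attribute selection criterion, which reduces the influence of training-set noise on the model. On Hadoop, IP-C4.5 is implemented as a MapReduce program by splitting the input files, which strengthens its ability to process massive data. Comparative experiments against C4.5 and the complete credal decision tree (CCDT) algorithm show that IP-C4.5 achieves higher accuracy when the training data are noisy, performing especially well when the noise level exceeds 10%, and that the parallel Hadoop-based IP-C4.5 can handle massive data.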
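Credal decision trees such as CCDT commonly score splits with the maximum (upper) entropy of a credal set built with the imprecise Dirichlet model (IDM). The sketch below computes that upper entropy by water-filling: the extra mass `s` is spread over the least frequent classes until they are as level as possible. It assumes the IDM with `s = 1`, a usual choice in the credal-tree literature; the abstract does not say whether IP-C4.5 uses exactly this credal set, so treat it as an illustration of the idea, not the paper's criterion.

```python
import math
from typing import List


def max_entropy_idm(counts: List[int], s: float = 1.0) -> List[float]:
    """Max-entropy distribution in the IDM credal set
    {(n_i + s * t_i) / (N + s) : t in the probability simplex}.
    Water-filling: lift the smallest counts to a common level using total extra mass s."""
    n = len(counts)
    order = sorted(counts)
    remaining = s
    level = order[0]
    k = 1  # number of classes currently sitting at the lowest level
    while k <= n:
        next_value = order[k] if k < n else math.inf
        cost = (next_value - level) * k  # mass needed to lift k classes to next_value
        if cost >= remaining:
            level += remaining / k
            break
        remaining -= cost
        level = next_value
        k += 1
    total = sum(counts) + s
    return [max(c, level) / total for c in counts]


def entropy(p: List[float]) -> float:
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def upper_entropy(counts: List[int], s: float = 1.0) -> float:
    return entropy(max_entropy_idm(counts, s))


def imprecise_info_gain(parent: List[int], children: List[List[int]], s: float = 1.0) -> float:
    """Information gain computed with upper entropies instead of plug-in entropies."""
    n = sum(parent)
    return upper_entropy(parent, s) - sum(sum(ch) / n * upper_entropy(ch, s) for ch in children)
```

Even a pure child such as `[4, 0]` keeps a non-zero upper entropy, so splits supported by few (possibly noisy) examples earn less gain than under plain C4.5.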

5.
A decision tree approach was applied and validated for analysis of landslide susceptibility using a geographic information system (GIS). The study area was the Pyeongchang area in Gangwon Province, Korea, where many landslides occurred in 2006 and where the 2018 Winter Olympics are to be held. Spatial data, such as landslides, topography, and geology, were detected, collected, and compiled in a database using remote sensing and GIS. The 3994 recorded landslide locations were randomly split 50/50 for training and validation of the models. A decision tree model, which is a type of data-mining classification model, was applied and decision trees were constructed using the chi-squared (χ2) automatic interaction detector (CHAID) and the quick, unbiased, and efficient statistical tree (QUEST) algorithms. Also, as a reference, a frequency-ratio model was applied using the same database. The relationships between the detected landslide locations and their factors were identified and quantified by frequency-ratio and decision tree models. The relationships were used as factor ratings in the overlay analysis to create landslide susceptibility indices and maps. Then, the resulting landslide-susceptibility maps were validated using area-under-the-curve (AUC) analysis with the landslide area data that had not been used for training the model. The decision tree models using the CHAID and QUEST algorithms had accuracies of 81.56% and 80.91%, respectively, which were somewhat better than the results for the frequency-ratio model (80.15%). These results indicate that decision tree models using the CHAID and QUEST algorithms can be useful for landslide susceptibility analysis.
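The frequency-ratio reference model reduces to one ratio per factor class: the share of landslide cells falling in that class divided by the share of all cells in that class, so values above 1 mean a higher-than-average landslide density. Summing the ratios of the classes a cell falls into is a common way to form a susceptibility index. A minimal sketch with made-up cell counts (function names and numbers are illustrative, not taken from the study):

```python
from typing import Dict


def frequency_ratio(landslide_cells: Dict[str, int], class_cells: Dict[str, int]) -> Dict[str, float]:
    """FR(class) = (landslide cells in class / all landslide cells) / (cells in class / all cells)."""
    total_landslides = sum(landslide_cells.values())
    total_cells = sum(class_cells.values())
    return {
        c: (landslide_cells.get(c, 0) / total_landslides) / (class_cells[c] / total_cells)
        for c in class_cells
    }


def susceptibility_index(cell_classes: Dict[str, str], fr_tables: Dict[str, Dict[str, float]]) -> float:
    """Sum the frequency ratios of the classes a cell falls into, one per factor layer."""
    return sum(fr_tables[factor][cls] for factor, cls in cell_classes.items())


# Hypothetical slope classes for a single factor layer.
slope_fr = frequency_ratio(
    landslide_cells={"0-15": 50, "15-30": 300, ">30": 650},
    class_cells={"0-15": 40000, "15-30": 35000, ">30": 25000},
)
```

Here the steepest class holds 65% of the landslides but only 25% of the area, giving a ratio of 2.6.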

6.
Constructing Decision Trees with Genetic Algorithms   Total citations: 20, self-citations: 1, citations by others: 20
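C4.5 is an inductive learning algorithm that learns from a set of examples to form rules in the form of a decision tree. Because C4.5 uses a local search strategy, the decision tree it produces is not necessarily optimal. Genetic algorithms are general-purpose global search algorithms that simulate natural evolution. This paper discusses a method for constructing decision trees with genetic algorithms.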

7.
Decision trees are well-known and established models for classification and regression. In this paper, we focus on the estimation and the minimization of the misclassification rate of decision tree classifiers. We apply Lidstone’s Law of Succession for the estimation of the class probabilities and error rates. In our work, we take into account not only the expected values of the error rate, which has been the norm in existing research, but also the corresponding reliability (measured by standard deviations) of the error rate. Based on this estimation, we propose an efficient pruning algorithm, called k-norm pruning, that has a clear theoretical interpretation, is easily implemented, and does not require a validation set. Our experiments show that our proposed pruning algorithm produces accurate trees quickly, and compares very favorably with two other well-known pruning algorithms, CCP of CART and EBP of C4.5. Editor: Hendrik Blockeel.
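Lidstone's law of succession smooths a node's class-frequency estimate by adding a pseudo-count λ to every class: p_c = (n_c + λ) / (N + Kλ) for K classes. The sketch below shows that estimate, the resulting expected error of a leaf that predicts its most probable class, and a simplified prune-or-keep test that compares expected errors only. The paper's k-norm pruning also uses the standard deviation of the error rate, which is not reproduced here; `lam` and the helper names are illustrative.

```python
from typing import List


def lidstone_probabilities(counts: List[int], lam: float = 1.0) -> List[float]:
    """p_c = (n_c + lam) / (N + K * lam); lam = 1 gives Laplace's rule of succession."""
    n = sum(counts)
    k = len(counts)
    return [(c + lam) / (n + k * lam) for c in counts]


def leaf_error_estimate(counts: List[int], lam: float = 1.0) -> float:
    """Estimated misclassification rate of a leaf that predicts its most probable class."""
    return 1.0 - max(lidstone_probabilities(counts, lam))


def should_prune(parent_counts: List[int], children_counts: List[List[int]], lam: float = 1.0) -> bool:
    """Simplified rule (expected values only): collapse the subtree into a leaf
    if that does not increase the estimated error."""
    n = sum(parent_counts)
    subtree_error = sum(sum(ch) / n * leaf_error_estimate(ch, lam) for ch in children_counts)
    return leaf_error_estimate(parent_counts, lam) <= subtree_error
```

Because the pseudo-counts penalize small leaves, a split into tiny, unchanged-looking children is pruned without needing a validation set.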

8.
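Currently, traditional machine learning algorithms are mainly used for prediction on small-scale datasets, but a single traditional model often falls short of the expected accuracy and cannot balance multiple evaluation metrics. This paper therefore focuses on small-scale datasets, fuses three kinds of models (decision trees, logistic regression, and support vector machines) into a multi-model fusion algorithm, and analyzes its effectiveness on small-scale datasets. First, the principles of decision trees, logistic regression, and support vector machines are briefly described. Second, the three models are used as base learners and trained separately; their outputs serve as the inputs of a next-stage model whose parameters are optimized iteratively by maximum likelihood estimation, which completes the fusion. Finally, the dataset is analyzed and preprocessed, and the fused model is compared experimentally with the single models. The results show that the multi-model fusion algorithm clearly improves precision, recall, and accuracy.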
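The fusion step described here is a form of stacking: the base learners' outputs become features for a second-stage model fitted by maximum likelihood. The sketch below fits a logistic-regression meta-learner by gradient ascent on the mean log-likelihood, assuming each base learner (decision tree, logistic regression, SVM) has already produced a score in [0, 1] for every sample. The abstract does not name the second-stage model, so logistic regression, the learning rate, and all names are assumptions.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_learner(base_outputs: List[Sequence[float]], y: List[int],
                     lr: float = 1.0, epochs: int = 5000) -> List[float]:
    """Maximum-likelihood logistic regression on base-learner outputs.
    Returns weights [bias, w_1, ..., w_m], one weight per base learner."""
    m = len(base_outputs[0])
    w = [0.0] * (m + 1)
    n = len(y)
    for _ in range(epochs):
        grad = [0.0] * (m + 1)
        for x, t in zip(base_outputs, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))
            grad[0] += t - p  # derivative of the log-likelihood w.r.t. the bias
            for j, xj in enumerate(x):
                grad[j + 1] += (t - p) * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]  # gradient-ascent step
    return w


def predict_meta(w: List[float], x: Sequence[float]) -> float:
    """Fused probability of the positive class for one sample's base-learner outputs."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))
```

In practice the base-learner outputs used to fit the meta-learner should come from held-out folds, otherwise the second stage learns to trust overfitted base models.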

9.
张坤 (Zhang Kun), 穆志纯 (Mu Zhichun), 常晓辉 (Chang Xiaohui). 《控制工程》 (Control Engineering of China), 2008, 15(1): 103-106
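Decision tree algorithms train quickly and produce easily interpreted results, but in practice their classification accuracy often fails to meet business requirements. To improve the accuracy of decision trees, the C4.5 algorithm is improved by drawing on the strengths of the LogitBoost algorithm: LogitBoost is applied at the leaf nodes of the decision tree to build additive regression models, yielding a new model tree algorithm, LCTree. Experiments and comparative analysis on 11 UCI datasets show that LCTree is more effective than the other algorithms tested. When applied to modeling a telecom customer churn early-warning system, the algorithm effectively analyzes customer characteristics and accurately predicts which customers will churn.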

10.
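Random forests (RF) are robust to noise, achieve high prediction accuracy, and can handle high-dimensional data, so they are widely used in machine learning. The model decision tree (MDT) is an accelerated decision tree algorithm; although it improves training efficiency, its accuracy declines as the number of impure pseudo-leaf nodes grows. To address this, a model decision forest algorithm (MDF) is proposed to improve the classification accuracy of the model decision tree. MDF uses MDT as its base classifier and, following the idea of random forests, generates multiple model decision trees. The algorithm first obtains different sample subsets via rotation matrices, then trains a different model decision tree on each subset, combines the trees by voting, and finally outputs the classification given by the resulting model decision forest. Experiments on standard datasets show that the proposed model decision forest is clearly more accurate than the model decision tree, and that MDF achieves good accuracy even with few trees, avoiding the higher time complexity that comes with adding more trees.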
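The rotate-train-vote pipeline can be illustrated with a much simpler base learner. The sketch below assumes 2-D numeric features, 2×2 rotation matrices, and a one-split decision stump standing in for the model decision tree (MDT itself is not reproduced). Each stump is trained on a rotated copy of the data and the forest predicts by majority vote; `fit_stump`, `fit_forest`, and the other names are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def rotate(p: Point, theta: float) -> Point:
    """Apply the rotation matrix [[cos, -sin], [sin, cos]] to a point."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])


def fit_stump(X: List[Point], y: List[int]) -> Callable[[Point], int]:
    """Exhaustive search for the (feature, threshold, polarity) with fewest training errors."""
    best = None
    for f in (0, 1):
        for thr in sorted({x[f] for x in X}):
            for pol in (0, 1):
                errors = sum((pol if x[f] <= thr else 1 - pol) != t for x, t in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, f, thr, pol)
    _, f, thr, pol = best
    return lambda x: pol if x[f] <= thr else 1 - pol


def random_angles(n: int, seed: int = 0) -> List[float]:
    """Draw n rotation angles uniformly from [0, 2*pi)."""
    rng = random.Random(seed)
    return [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]


def fit_forest(X: List[Point], y: List[int], angles: List[float]):
    """One stump per angle, each trained on the data rotated by that angle."""
    return [(theta, fit_stump([rotate(x, theta) for x in X], y)) for theta in angles]


def predict_forest(forest, x: Point) -> int:
    """Rotate the query the same way for each member, then take a majority vote."""
    votes = Counter(stump(rotate(x, theta)) for theta, stump in forest)
    return votes.most_common(1)[0][0]
```

Rotations give axis-parallel learners different views of the same data, which is what makes their errors less correlated and the vote worthwhile; `fit_forest(X, y, random_angles(25))` would be a typical call.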

11.
Most methods that generate decision trees for a specific problem use examples of data instances in the tree-generation process. This article proposes a method called RBDT‐1 (rule‐based decision tree) for learning a decision tree from a set of decision rules that cover the data instances, rather than from the data instances themselves. The goal is to create, on demand, a short and accurate decision tree from a stable or dynamically changing set of rules. The rules could be generated by an expert, by an inductive rule learning program that induces decision rules from examples of decision instances (such as AQ‐type rule induction programs), or extracted from a tree generated by another method, such as ID3 or C4.5. In terms of tree complexity (the number of nodes and leaves in the decision tree), RBDT‐1 compares favorably with AQDT‐1 and AQDT‐2, which also create decision trees from rules. RBDT‐1 also compares favorably with ID3 and is as effective as C4.5; both are well‐known methods that generate decision trees from data examples. Experiments show that the classification accuracies of the decision trees produced by all compared methods are indistinguishable.

12.
Artificial neural networks (ANNs) are a powerful and widely used pattern recognition technique. However, they remain "black boxes" giving no explanation for the decisions they make. This paper presents a new algorithm for extracting a logistic model tree (LMT) from a neural network, which gives a symbolic representation of the knowledge hidden within the ANN. Landwehr's LMTs are based on standard decision trees, but the terminal nodes are replaced with logistic regression functions. This paper reports the results of an empirical evaluation that compares the new decision tree extraction algorithm with Quinlan's C4.5 and ExTree. The evaluation used 12 standard benchmark datasets from the University of California, Irvine machine-learning repository. The results of this evaluation demonstrate that the new algorithm produces decision trees that have higher accuracy and higher fidelity than decision trees created by both C4.5 and ExTree.

13.
A Decision Tree Construction Method Based on Dispersion Degree   Total citations: 1, self-citations: 0, citations by others: 1
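When a decision tree is constructed, attribute selection affects the classification accuracy of the tree. This paper discusses the limitations of methods based on information entropy and of the WMR method, and introduces the concept of the dispersion degree of the condition attribute set in an information system. Using this concept to select splitting attributes during tree construction, it designs DSD, a decision tree construction algorithm based on dispersion degree, which overcomes the limitations of the WMR method in practical applications. Experiments on UCI datasets show that trees built by this method are about as accurate as those built by entropy-based methods, while its time complexity is lower.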

14.
A fuzzy decision tree is constructed by allowing the possibility of partial membership of a point in the nodes that make up the tree structure. This extension of its expressive capabilities transforms the decision tree into a powerful functional approximant that incorporates features of connectionist methods, while remaining easily interpretable. Fuzzification is achieved by superimposing a fuzzy structure over the skeleton of a CART decision tree. A training rule for fuzzy trees, similar to backpropagation in neural networks, is designed. This rule corresponds to a global optimization algorithm that fixes the parameters of the fuzzy splits. The method developed for the automatic generation of fuzzy decision trees is applied to both classification and regression problems. In regression problems, it is seen that the continuity constraint imposed by the function representation of the fuzzy tree leads to substantial improvements in the quality of the regression and limits the tendency to overfitting. In classification, fuzzification provides a means of uncovering the structure of the probability distribution for the classification errors in attribute space. This allows the identification of regions for which the error rate of the tree is significantly lower than the average error rate, sometimes even below the Bayes misclassification rate.
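The key change from a crisp CART split is that a point belongs partly to both children. A minimal sketch of one fuzzy split, assuming a logistic (sigmoid) membership function with a hypothetical width parameter `sigma`; the tree output is the membership-weighted average of the two subtrees, which makes it a continuous function of the input. The paper's exact membership function and its backpropagation-like training rule are not reproduced.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class FuzzyLeaf:
    value: float


@dataclass
class FuzzySplit:
    feature: int
    threshold: float
    sigma: float  # width of the fuzzy transition; sigma -> 0 recovers a crisp split
    left: "Node"
    right: "Node"


Node = Union[FuzzyLeaf, FuzzySplit]


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids overflow for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def membership_right(x: Sequence[float], node: FuzzySplit) -> float:
    """Degree (0..1) to which x belongs to the right child."""
    return _sigmoid((x[node.feature] - node.threshold) / node.sigma)


def predict(node: Node, x: Sequence[float]) -> float:
    """Membership-weighted average over both subtrees (a crisp tree follows only one)."""
    if isinstance(node, FuzzyLeaf):
        return node.value
    mu = membership_right(x, node)
    return (1.0 - mu) * predict(node.left, x) + mu * predict(node.right, x)
```

Because the output is differentiable in `threshold` and `sigma`, those split parameters can be tuned by gradient descent, which is the kind of global training the abstract describes.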

15.
16.
From Entropy-Mean Decisions to Sample-Distribution Decisions   Total citations: 15, self-citations: 0, citations by others: 15
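To study the decision accuracy of inductive learning, this paper analyzes the shortcomings of the C4.5 algorithm and the underlying reasons for the disputes and compromises between the standard algorithm and sub-algorithms. Starting from the estimation of the probability distribution of the training samples, it presents a simple and novel decision tree algorithm. Experiments on UCI data show that, compared with C4.5, the method not only achieves better decision accuracy but also computes faster.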

17.
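Food safety decision-making is an important part of food safety research. To analyze food safety conditions, a new decision tree construction method that incorporates rule confidence is proposed, based on the variable precision rough set model. The method improves on traditional weighted decision tree generation algorithms: it uses the weighted mean variable-precision roughness as the attribute selection criterion and replaces approximation accuracy with variable-precision approximation accuracy. This lets it remove noisy and redundant data from the database and ignore some contradictory data, so that tree construction can tolerate partially conflicting decision rules. The algorithm simplifies tree generation, broadens its range of application, and helps interpret the generated rules. Validation results show that the algorithm is effective and feasible.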
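The variable precision rough set model relaxes the lower approximation: an equivalence class counts as belonging to a concept X when at least a fraction β of its objects are in X, which is what lets the method tolerate some contradictory records. A minimal sketch of the β-lower and β-upper approximations and the resulting approximation accuracy, using the formulation with 0.5 < β ≤ 1 (β = 1 gives classical rough sets). Function names are illustrative, and the paper's weighted mean roughness criterion is not reproduced.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def equivalence_classes(objects: Dict[str, Tuple], attributes: List[int]) -> List[Set[str]]:
    """Group objects that are indiscernible on the chosen condition attributes."""
    groups = defaultdict(set)
    for name, values in objects.items():
        groups[tuple(values[a] for a in attributes)].add(name)
    return list(groups.values())


def vprs_approximations(classes: List[Set[str]], X: Set[str], beta: float = 0.8):
    """beta-lower: classes with |E & X| / |E| >= beta; beta-upper: ratio > 1 - beta."""
    lower: Set[str] = set()
    upper: Set[str] = set()
    for E in classes:
        ratio = len(E & X) / len(E)
        if ratio >= beta:
            lower |= E
        if ratio > 1 - beta:
            upper |= E
    return lower, upper


def approximation_accuracy(classes: List[Set[str]], X: Set[str], beta: float = 0.8) -> float:
    """|beta-lower| / |beta-upper|; 1.0 when the upper approximation is empty."""
    lower, upper = vprs_approximations(classes, X, beta)
    return len(lower) / len(upper) if upper else 1.0
```

With one mislabeled record in a class of five, classical rough sets (β = 1) drop the whole class from the lower approximation, while β = 0.8 keeps it.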

18.
NeC4.5: neural ensemble based C4.5   Total citations: 5, self-citations: 0, citations by others: 5
Decision trees offer good comprehensibility, while neural network ensembles offer strong generalization ability. These merits are integrated into a novel decision tree algorithm, NeC4.5. The algorithm first trains a neural network ensemble. The trained ensemble is then used to generate a new training set by replacing the desired class labels of the original training examples with those output by the ensemble. Some extra training examples are also generated from the trained ensemble and added to the new training set. Finally, a C4.5 decision tree is grown from the new training set. Since its learning results are decision trees, NeC4.5 is more comprehensible than a neural network ensemble. Moreover, experiments show that the generalization ability of NeC4.5 decision trees can be better than that of C4.5 decision trees.
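The data-generation step of NeC4.5 can be sketched independently of the learners involved. Below, the trained ensemble is assumed to be a list of callables that return class labels and combine by plurality vote, and extra examples are drawn uniformly within each feature's observed range; that sampling scheme is an assumption, since the abstract does not say how the extra examples are generated. The resulting set would then be passed to any C4.5 implementation.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Classifier = Callable[[Sequence[float]], int]


def ensemble_predict(ensemble: List[Classifier], x: Sequence[float]) -> int:
    """Plurality vote of the ensemble members."""
    return Counter(member(x) for member in ensemble).most_common(1)[0][0]


def build_nec45_training_set(X: List[Sequence[float]], ensemble: List[Classifier],
                             n_extra: int, seed: int = 0) -> List[Tuple[Sequence[float], int]]:
    """Relabel the original inputs with the ensemble's outputs, then append
    extra points sampled inside the observed feature ranges, also ensemble-labeled."""
    data = [(x, ensemble_predict(ensemble, x)) for x in X]  # original labels are discarded
    rng = random.Random(seed)
    lows = [min(col) for col in zip(*X)]
    highs = [max(col) for col in zip(*X)]
    for _ in range(n_extra):
        x = tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs))
        data.append((x, ensemble_predict(ensemble, x)))
    return data
```

Relabeling smooths away noise the ensemble has learned to ignore, and the extra points let the tree see more of the ensemble's decision boundary than the original sample covers.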

19.
Research on Decision Tree Classification Techniques   Total citations: 28, self-citations: 1, citations by others: 28
栾丽华 (Luan Lihua), 吉根林 (Ji Genlin). 《计算机工程》 (Computer Engineering), 2004, 30(9): 94-96, 105
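Decision tree classification is an important data classification technique. ID3, C4.5, and EC4.5 are commonly used algorithms for building decision trees, but there has so far been little research in China on newer decision tree classification algorithms. Therefore, based on a review of a large body of literature, this paper studies newer algorithms such as CART, SLIQ, SPRINT, and PUBLIC, explains the basic idea of each decision tree classification algorithm, and analyzes and compares their main characteristics as a reference for researchers in data classification.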
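Among the algorithms surveyed, CART, SLIQ, and SPRINT evaluate binary splits with the Gini index rather than the entropy-based criteria of ID3 and C4.5. A minimal sketch of node Gini impurity and the size-weighted impurity of a split, with illustrative class counts:

```python
from typing import List


def gini(counts: List[int]) -> float:
    """Gini impurity 1 - sum(p_c ** 2) of a node with the given class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts) if n else 0.0


def gini_split(children: List[List[int]]) -> float:
    """Size-weighted Gini impurity of a split; the split with the lowest value is chosen."""
    total = sum(sum(ch) for ch in children)
    return sum(sum(ch) / total * gini(ch) for ch in children)
```

For a parent with counts [5, 5] (impurity 0.5), a split into [4, 0] and [1, 5] lowers the weighted impurity to 1/6.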

20.
A Decision Tree Algorithm Based on Attribute Weighting   Total citations: 1, self-citations: 0, citations by others: 1
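ID3 and C4.5 are simple and effective decision tree classification algorithms, but they are inaccurate when applied to complex decision problems. This paper proposes a new attribute-weighted decision tree algorithm. Based on rough set theory, it builds the decision tree by weighting attributes according to how strongly each influences the decision, which improves the accuracy of the decisions. The weights mark the importance of each attribute and can be learned from the training data. Experimental results show that the algorithm clearly improves decision accuracy.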
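One way to derive rough-set attribute weights is the dependency degree γ: the fraction of objects whose decision is fully determined by the attribute's value. The sketch below multiplies ID3-style information gain by such a weight. Using γ as the weight and multiplying it into the gain are assumptions made for illustration; the abstract does not give the paper's weighting formula.

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def info_gain(column: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """ID3 information gain of splitting on one attribute column."""
    groups = defaultdict(list)
    for v, y in zip(column, labels):
        groups[v].append(y)
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())


def dependency_degree(column: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Rough-set gamma: share of objects in equivalence classes with a single decision value."""
    decisions = defaultdict(set)
    sizes = Counter(column)
    for v, y in zip(column, labels):
        decisions[v].add(y)
    positive = sum(sizes[v] for v, ys in decisions.items() if len(ys) == 1)
    return positive / len(labels)


def weighted_gain(column: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Hypothetical criterion: information gain scaled by the attribute's dependency degree."""
    return dependency_degree(column, labels) * info_gain(column, labels)
```

An attribute that splits off only a small pure region (γ = 0.25 in the test data) keeps some gain but is down-weighted relative to one that determines the decision everywhere.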
