1.

刘星毅《计算机技术与发展》2008,18(5):70-72

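Classification is a core problem in data mining and machine learning. To maximize classification accuracy, the choice of the splitting attribute at each node is critical when building a decision tree. Common methods for selecting the splitting attribute include information-entropy methods and Gini-coefficient methods. The paper analyzes the strengths and weaknesses of the commonly used entropy-based selection method, proposes a method for selecting the splitting attribute based on the chi-square test, and illustrates the advantages of the proposed algorithm with a real example and simulation experiments. The experimental results show that the proposed algorithm achieves a lower classification error rate than entropy-based methods.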
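The abstract above does not give the paper's exact criterion, so the following is only a minimal sketch of the general idea: score each candidate attribute by the Pearson chi-square statistic of its value-by-class contingency table and split on the highest-scoring one. The function names `chi_square_score` and `best_attribute` and the dict-based row format are assumptions made for illustration.

```python
from collections import Counter


def chi_square_score(values, labels):
    """Pearson chi-square statistic of the (attribute value x class) table."""
    n = len(labels)
    value_counts = Counter(values)
    class_counts = Counter(labels)
    observed = Counter(zip(values, labels))  # missing cells count as 0
    stat = 0.0
    for v, nv in value_counts.items():
        for c, nc in class_counts.items():
            expected = nv * nc / n  # cell count expected under independence
            stat += (observed[(v, c)] - expected) ** 2 / expected
    return stat


def best_attribute(rows, labels):
    """rows: list of dicts mapping attribute name -> categorical value.

    Returns the attribute with the largest chi-square statistic
    (hypothetical selection rule, not taken from the paper).
    """
    return max(rows[0], key=lambda a: chi_square_score([r[a] for r in rows], labels))
```

Raw statistics from attributes with different numbers of values have different degrees of freedom, so a practical implementation would more likely compare p-values than the statistics themselves.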

2.

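A New Method for Selecting Splitting Attributes

刘星毅《计算机应用》2009,29(3):839-842

Cost-sensitive decision trees usually consider test cost and misclassification cost, and the most critical step in classification is selecting the splitting attribute at each node. The paper analyzes the strengths and weaknesses of the splitting-attribute selection methods commonly used for cost-sensitive decision tree classification, and proposes a selection method that combines information content with test cost while reducing misclassification cost as far as possible. Experiments on UCI data sets show that the method outperforms existing methods in every respect.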

3.

A Splitting-Attribute Selection Method Based on Cost-Performance Ratio

《计算机应用》2009,29(3)


4.

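An Improved Method for Selecting Classification Attributes in Decision Trees (total citations: 2; self-citations: 0; citations by others: 2)

王苗 柴瑞敏《计算机工程与应用》2010,46(8):127-129

The paper analyzes the basic principle and implementation steps of the ID3 algorithm and the strengths and weaknesses of two existing improved classification algorithms. To address ID3's bias toward attributes with many values and the shortcomings of the two existing improvements in classification time and accuracy, it proposes a new scheme for selecting the classification attribute and optimizes it mathematically. Experiments show that the optimized scheme overcomes ID3's multi-value bias and outperforms both ID3 and the two existing improved algorithms in classification time and accuracy.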

5.

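An Attribute Splitting Method Based on the Gini Index in Decision Trees

陈云樱 吴积钦 徐可佳《计算机技术与发展》2004,14(5)

Decision trees are an important algorithm in data mining. The paper first introduces the idea behind decision tree construction and the problem of splitting multi-valued attributes during construction. The Gini index is one method for splitting multi-valued attributes; the paper explains in detail how it works as an impurity-based splitting criterion and gives a worked example in which Gini splitting is carried out in two different ways. Finally, drawing on the Chinese and international literature, it compares the Gini index with other methods such as information gain and the χ² statistic to show its advantages and disadvantages for splitting multi-valued attributes.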

6.

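An Attribute Splitting Method Based on the Gini Index in Decision Trees (total citations: 2; self-citations: 0; citations by others: 2)

陈云樱 吴积钦 徐可佳《微机发展》2004,14(5):66-68

Decision trees are an important algorithm in data mining. The paper first introduces the idea behind decision tree construction and the problem of splitting multi-valued attributes during construction. The Gini index is one method for splitting multi-valued attributes; the paper explains in detail how it works as an impurity-based splitting criterion and gives a worked example in which Gini splitting is carried out in two different ways. Finally, drawing on the Chinese and international literature, it compares the Gini index with other methods such as information gain and the χ² statistic to show its advantages and disadvantages for splitting multi-valued attributes.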
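As a rough illustration of the Gini index as an impurity-based splitting criterion, the sketch below computes the impurity of a set of class labels and the weighted impurity after splitting on a categorical attribute. The function names are illustrative assumptions, and the worked example from the paper is not reproduced here.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_c^2) of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_split(values, labels):
    """Weighted Gini impurity after a multiway split on a categorical attribute.

    Each distinct entry of `values` forms one branch; lower is better.
    """
    n = len(labels)
    branches = {}
    for v, y in zip(values, labels):
        branches.setdefault(v, []).append(y)
    return sum(len(branch) / n * gini(branch) for branch in branches.values())
```

A binary split of a multi-valued attribute can be scored with the same function by first mapping each attribute value to one of two group identifiers, for example `gini_split([v in {"a", "b"} for v in values], labels)`.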

7.

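A Decision-Tree-Based Multi-Attribute Classification Method (total citations: 2; self-citations: 0; citations by others: 2)

赖邦传 陈晓红《计算机工程》2005,31(5):88-89,226

By analyzing the relationships among object attributes, the paper builds attribute lists, reduces them to the effective attributes, and uses grouped counting to gather the class distribution of attribute values. On this basis it proposes a two-stage, decision-tree-based multi-attribute classification algorithm that can effectively improve the accuracy of the discovered classification rules. The concrete algorithm is given at the end.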

8.

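Improvement of the Attribute Selection Criterion for Decision Trees (total citations: 1; self-citations: 0; citations by others: 1)

谢妞妞 刘於勋《计算机工程与应用》2010,46(34):115-118

Decision tree algorithms are a research hotspot in data mining; they are commonly used to build classifiers and predictive models and are widely applied in practice. The paper focuses on the classic ID3 decision tree algorithm, analyzes its strengths and weaknesses, and proposes a new attribute selection criterion based on Taylor's formula and the Maclaurin formula. By simplifying the computation of information entropy, the improved algorithm raises classification accuracy, shortens tree construction time, and reduces computational cost. Experiments confirm the effectiveness and correctness of the improved algorithm.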

9.

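A Method for Building Decision Trees Using Rough Set Theory

刘旭敏 黄厚宽 徐维祥《通讯和计算机》2005,2(8):37-40

Classification is an important problem in data mining, and many popular classifiers build decision trees to produce class models. The paper introduces the idea behind data mining algorithms that construct a decision tree by comparing information gain or entropy, presents a method for constructing decision trees with rough set theory, and illustrates the tree-building process with an example from surface modeling. Compared with ID3, the method can reduce the complexity of the decision tree, optimize its structure, and mine better rule information.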

10.

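A Decision Tree Algorithm Based on Attribute Weighting (total citations: 1; self-citations: 0; citations by others: 1)

张琼声 陈晓伟 李春华 刘童璇《微计算机应用》2010,31(1):58-63

ID3 and C4.5 are simple and effective decision tree classification algorithms, but their accuracy is poor on complex decision problems. The paper proposes a new attribute-weighted decision tree algorithm: drawing on rough set theory, it builds the tree by weighting attributes according to how strongly each influences the decision, which improves the accuracy of the decision results. The weights mark the importance of the attributes and can be learned from the training data. Experimental results show that the algorithm clearly improves the accuracy of the decision results.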

11.

Implementation and Evaluation of Decision Trees with Range and Region Splitting

Yasuhiko Morimoto Takeshi Fukuda Shinichi Morishita Takeshi Tokuyama 《Constraints》1997,2(3-4):401-427

We propose an extension of an entropy-based heuristic for constructing a decision tree from a large database with many numeric attributes. When it comes to handling numeric attributes, conventional methods are inefficient if any numeric attributes are strongly correlated. Our approach offers one solution to this problem. For each pair of numeric attributes with strong correlation, we compute a two-dimensional association rule with respect to these attributes and the objective attribute of the decision tree. In particular, we consider a family ℛ of grid-regions in the plane associated with the pair of attributes. For R ∈ ℛ, the data can be split into two classes: data inside R and data outside R. We compute the region R_opt ∈ ℛ that minimizes the entropy of the splitting, and add the splitting associated with R_opt (for each pair of strongly correlated attributes) to the set of candidate tests in an entropy-based heuristic. We give efficient algorithms for cases in which ℛ is (1) x-monotone connected regions, (2) based-monotone regions, (3) rectangles, and (4) rectilinear convex regions. The algorithm has been implemented as a subsystem of SONAR (System for Optimized Numeric Association Rules) developed by the authors. We have confirmed that we can compute the optimal region efficiently. And diverse experiments show that our approach can create compact trees whose accuracy is comparable with or better than that of conventional trees. More importantly, we can grasp non-linear correlation among numeric attributes which could not be found without our region splitting.

12.

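Implementation of a Decision Tree Algorithm Based on Information Entropy (total citations: 5; self-citations: 0; citations by others: 5)

孙细明 张晓鹏《计算机与数字工程》2005,33(11):94-95,121

Starting from classification techniques in data mining, the paper introduces the ID3 algorithm and summarizes it briefly, and discusses selecting the test attribute using an information-gain measure. It replaces the conventional recordset with a filtered recordset from the MFC Class Wizard and implements optimal attribute selection and the ID3 algorithm in C++.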

13.

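A Method for Combining Multiple Classifiers with Meta Decision Trees

何丽 韩文秀《计算机工程》2005,31(12):18-19,80

Classifier fusion has become a new research area in machine learning. The paper introduces a new method that uses meta decision trees (MDTs) to combine multiple classifiers, and explains MDTs, meta-attributes, and the stacking framework for combining multiple classifiers with MDTs.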

14.

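Research on a Dynamic Decision Tree Algorithm (total citations: 1; self-citations: 0; citations by others: 1)

尹阿东 谢霖铨 龙誉 杨立东《计算机工程与应用》2004,40(33):103-105,132

Building on incremental decision tree algorithms, the paper proposes a decremental decision tree algorithm that can handle changing data sets, and states and proves three basic theorems that guarantee its reliability. It then combines the traditional incremental algorithm with the proposed decremental algorithm into a dynamic decision tree algorithm, which handles decision tree construction on dynamic data sets that both grow and shrink. The dynamic decision tree algorithm also advances the development of online rule extraction.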

15.

Induction of Decision Trees (total citations: 390; self-citations: 5; citations by others: 390)

Quinlan J.R. 《Machine Learning》1986,1(1):81-106

The technology for building knowledge-based systems by inductive inference from examples has been demonstrated successfully in several practical applications. This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, and it describes one such system, ID3, in detail. Results from recent studies show ways in which the methodology can be modified to deal with information that is noisy and/or incomplete. A reported shortcoming of the basic algorithm is discussed and two means of overcoming it are compared. The paper concludes with illustrations of current research directions.
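ID3 is commonly described as choosing, at each node, the attribute with the largest information gain. The sketch below illustrates that measure only; it is not code from the paper, and the function names are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting
    on a categorical attribute (one branch per distinct value)."""
    n = len(labels)
    branches = {}
    for v, y in zip(values, labels):
        branches.setdefault(v, []).append(y)
    remainder = sum(len(b) / n * entropy(b) for b in branches.values())
    return entropy(labels) - remainder
```

An attribute whose values separate the classes perfectly recovers the full entropy of the labels as gain, while an attribute independent of the class yields a gain of zero.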

16.

Coding Decision Trees (total citations: 4; self-citations: 0; citations by others: 4)

Wallace C.S. Patrick J.D. 《Machine Learning》1993,11(1):7-22


17.

Partitioning Nominal Attributes in Decision Trees

Don Coppersmith Se June Hong Jonathan R.M. Hosking 《Data mining and knowledge discovery》1999,3(2):197-217

To find the optimal branching of a nominal attribute at a node in an L-ary decision tree, one is often forced to search over all possible L-ary partitions for the one that yields the minimum impurity measure. For binary trees (L = 2) when there are just two classes a short-cut search is possible that is linear in n, the number of distinct values of the attribute. For the general case in which the number of classes, k, may be greater than two, Burshtein et al. have shown that the optimal partition satisfies a condition that involves the existence of C(L, 2) (L choose 2) hyperplanes in the class probability space. We derive a property of the optimal partition for concave impurity measures (including in particular the Gini and entropy impurity measures) in terms of the existence of L vectors in the dual of the class probability space, which implies the earlier condition. Unfortunately, these insights still do not offer a practical search method when n and k are large, even for binary trees. We therefore present a new heuristic search algorithm to find a good partition. It is based on ordering the attribute's values according to their principal component scores in the class probability space, and is linear in n. We demonstrate the effectiveness of the new method through Monte Carlo simulation experiments and compare its performance against other heuristic methods.
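The two-class, binary-tree short-cut mentioned in the abstract can be sketched as follows: order the attribute's values by their proportion of the positive class, then evaluate only the n − 1 splits that cut this ordering in two. This is a sketch of that well-known short-cut under the Gini impurity, not of the paper's principal-component heuristic; the function names and the `positive` parameter are illustrative assumptions.

```python
from collections import defaultdict


def gini_two_class(pos, total):
    """Two-class Gini impurity 2p(1-p), given positive and total counts."""
    if total == 0:
        return 0.0
    p = pos / total
    return 2.0 * p * (1.0 - p)


def best_binary_partition(values, labels, positive):
    """Return (left_value_set, weighted_gini) for the best binary split.

    Only the n - 1 splits of the values ordered by positive-class
    proportion are scored. Returns (None, inf) if there is one value.
    """
    pos = defaultdict(int)
    tot = defaultdict(int)
    for v, y in zip(values, labels):
        tot[v] += 1
        pos[v] += (y == positive)
    ordered = sorted(tot, key=lambda v: pos[v] / tot[v])
    n, n_pos = len(labels), sum(pos.values())
    best = (None, float("inf"))
    left_tot = left_pos = 0
    for i, v in enumerate(ordered[:-1]):
        left_tot += tot[v]
        left_pos += pos[v]
        right_tot, right_pos = n - left_tot, n_pos - left_pos
        score = (left_tot * gini_two_class(left_pos, left_tot)
                 + right_tot * gini_two_class(right_pos, right_tot)) / n
        if score < best[1]:
            best = (set(ordered[: i + 1]), score)
    return best
```

The scan over candidate splits is linear in the number of distinct values; the initial sort in this sketch adds a logarithmic factor.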

18.

C-Net: A Method for Generating Non-deterministic and Dynamic Multivariate Decision Trees

H. A. Abbass M. Towsey G. Finn 《Knowledge and Information Systems》2001,3(2):184-197

Despite the fact that artificial neural networks (ANNs) are universal function approximators, their black box nature (that is, their lack of direct interpretability or expressive power) limits their utility. In contrast, univariate decision trees (UDTs) have expressive power, although usually they are not as accurate as ANNs. We propose an improvement, C-Net, for both the expressiveness of ANNs and the accuracy of UDTs by consolidating both technologies for generating multivariate decision trees (MDTs). In addition, we introduce a new concept, recurrent decision trees, where C-Net uses recurrent neural networks to generate an MDT with a recurrent feature. That is, a memory is associated with each node in the tree with a recursive condition which replaces the conventional linear one. Furthermore, we show empirically that, in our test cases, our proposed method achieves a balance of comprehensibility and accuracy intermediate between ANNs and UDTs. MDTs are found to be intermediate since they are more expressive than ANNs and more accurate than UDTs. Moreover, in all cases MDTs are more compact (i.e., smaller tree size) than UDTs. Received 27 January 2000 / Revised 30 May 2000 / Accepted in revised form 30 October 2000

19.

Multivariate Decision Trees (total citations: 24; self-citations: 0; citations by others: 24)

Brodley Carla E. Utgoff Paul E. 《Machine Learning》1995,19(1):45-77

Unlike a univariate decision tree, a multivariate decision tree is not restricted to splits of the instance space that are orthogonal to the features' axes. This article addresses several issues for constructing multivariate decision trees: representing a multivariate test, including symbolic and numeric features, learning the coefficients of a multivariate test, selecting the features to include in a test, and pruning of multivariate decision trees. We present several new methods for forming multivariate decision trees and compare them with several well-known methods. We compare the different methods across a variety of learning tasks, in order to assess each method's ability to find concise, accurate decision trees. The results demonstrate that some multivariate methods are in general more effective than others (in the context of our experimental assumptions). In addition, the experiments confirm that allowing multivariate tests generally improves the accuracy of the resulting decision tree over a univariate tree.