Similar Documents
10 similar documents retrieved
1.
While many constructive induction algorithms focus on generating new binary attributes, this paper explores novel methods of constructing nominal and numeric attributes. We propose a new constructive operator, X-of-N. An X-of-N representation is a set containing one or more attribute-value pairs. For a given instance, the value of an X-of-N representation corresponds to the number of its attribute-value pairs that are true of the instance. A single X-of-N representation can directly and simply represent any concept that can be represented by a single conjunctive, a single disjunctive, or a single M-of-N representation commonly used for constructive induction, whereas the reverse is not true. In this paper, we describe a constructive decision tree learning algorithm, called XofN. When building decision trees, this algorithm creates one X-of-N representation, either as a nominal attribute or as a numeric attribute, at each decision node. The construction of X-of-N representations is carried out by greedily searching the space defined by all the attribute-value pairs of a domain. Experimental results reveal that constructing X-of-N attributes can significantly improve the performance of decision tree learning in both artificial and natural domains, in terms of higher prediction accuracy and lower theory complexity. The results also show the performance advantages of constructing X-of-N attributes over constructing conjunctive, disjunctive, or M-of-N representations for decision tree learning.
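A minimal sketch of how an X-of-N attribute evaluates on an instance, assuming instances are plain attribute-value mappings; the function and data below are illustrative and are not the XofN system itself:

```python
# Evaluate an X-of-N attribute: its value for an instance is the number of the
# set's attribute-value pairs that are true of that instance (illustrative sketch).

def x_of_n_value(x_of_n, instance):
    """x_of_n: iterable of (attribute, value) pairs; instance: dict attribute -> value."""
    return sum(1 for attr, val in x_of_n if instance.get(attr) == val)

# Example: the X-of-N set {(color, red), (shape, round), (size, small)}
x_of_n = [("color", "red"), ("shape", "round"), ("size", "small")]
instance = {"color": "red", "shape": "square", "size": "small"}
print(x_of_n_value(x_of_n, instance))  # -> 2: two of the three pairs hold for this instance
```

The resulting count can then be treated as a nominal or numeric attribute and tested at a decision node.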

2.
The onset of a bull market draws crowds of retail investors into the stock market. To protect investors' interests and help them invest rationally, this paper proposes analyzing stock trading data from a technical perspective. A binary decision tree is used to mine the massive volume of trading data, and the classification rules obtained from the tree can, on the whole, predict the trend of an individual stock over a period of time, effectively helping investors make rational investment decisions.
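A hedged sketch of the general approach the abstract describes, using scikit-learn's binary (CART-style) decision tree on synthetic trading records; the feature names and the labelling rule are assumptions made only for illustration:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
# Toy trading records: [5-day return, 10-day return, volume ratio, volatility]
X = rng.normal(size=(500, 4))
# Hypothetical label: 1 if the stock rises over the following period, 0 otherwise
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=500) > 0).astype(int)

clf = DecisionTreeClassifier(max_depth=3, criterion="gini").fit(X, y)
# Each root-to-leaf path reads as a classification rule for the trend of a single stock
print(export_text(clf, feature_names=["ret_5d", "ret_10d", "vol_ratio", "volatility"]))
```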

3.
Discretization of continuous attributes in a decision table means partitioning a continuous attribute into a number of intervals and assigning a discrete value to each interval. This paper proposes a new algorithm for discretizing continuous attributes in decision tables. It first uses decision strength to measure the significance of the condition attributes and sorts them in ascending order of significance; it then examines, in that order, all cut points of each condition attribute and removes the redundant ones, thereby discretizing the condition attributes. The algorithm is easy to understand and computationally simple, with a time complexity of O(3kn²).
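A hedged sketch of the cut-point removal step, assuming the ordering of the condition attributes (which the paper derives from decision strength) is already given; here a cut point is treated as redundant when dropping it keeps the discretized decision table consistent:

```python
import bisect

def discretize_value(value, cuts):
    """Map a continuous value to the index of the interval defined by sorted cut points."""
    return bisect.bisect_right(cuts, value)

def is_consistent(objects, cuts_per_attr):
    """Consistent: objects with identical discretized condition attributes share one decision."""
    seen = {}
    for attrs, decision in objects:
        key = tuple(discretize_value(v, cuts_per_attr[i]) for i, v in enumerate(attrs))
        if key in seen and seen[key] != decision:
            return False
        seen[key] = decision
    return True

def remove_redundant_cuts(objects, cuts_per_attr, attr_order):
    """Scan attributes in the given order and drop every cut point whose removal
    keeps the discretized decision table consistent."""
    for a in attr_order:
        for c in list(cuts_per_attr[a]):
            trial = [sorted(set(cs) - ({c} if i == a else set()))
                     for i, cs in enumerate(cuts_per_attr)]
            if is_consistent(objects, trial):
                cuts_per_attr = trial
    return cuts_per_attr

# Example: one continuous condition attribute, two decision classes
objects = [((0.1,), "no"), ((0.4,), "no"), ((0.6,), "yes"), ((0.9,), "yes")]
print(remove_redundant_cuts(objects, [[0.25, 0.5, 0.75]], attr_order=[0]))  # -> [[0.5]]
```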

4.
In a decision table, the confidence and object coverage of decision rules are important measures of decision-making ability. Building on the rough entropy of knowledge, this paper proposes the concept of decision entropy and defines an attribute significance measure from it; the decision entropy of a condition-attribute subset is then used to measure its importance for decision classification, and a decision tree is constructed recursively in a top-down manner; finally, the decision tree is traversed to simplify the extracted decision rules. The advantages of this method are that no attribute reduction is performed before building the tree and extracting rules, the computation is intuitive, and the time complexity is relatively low. Results of a worked example show that the method yields more concise and effective decision rules.
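A hedged sketch of the top-down selection loop, using the conditional entropy of the decision given an attribute as a stand-in significance measure; the paper's decision entropy is defined from the rough entropy of knowledge and is not reproduced here:

```python
from collections import Counter, defaultdict
from math import log2

def conditional_entropy(objects, attr_subset):
    """H(decision | partition induced by attr_subset), objects = list of (attrs, decision)."""
    blocks = defaultdict(list)
    for attrs, decision in objects:
        blocks[tuple(attrs[a] for a in attr_subset)].append(decision)
    n, h = len(objects), 0.0
    for block in blocks.values():
        p_block = len(block) / n
        h -= p_block * sum((c / len(block)) * log2(c / len(block))
                           for c in Counter(block).values())
    return h

def build_tree(objects, attrs):
    """Top-down recursion: stop on pure nodes, otherwise split on the attribute
    whose (stand-in) decision entropy is lowest."""
    if len({d for _, d in objects}) == 1 or not attrs:
        return Counter(d for _, d in objects).most_common(1)[0][0]
    best = min(attrs, key=lambda a: conditional_entropy(objects, [a]))
    children = defaultdict(list)
    for values, decision in objects:
        children[values[best]].append((values, decision))
    rest = [a for a in attrs if a != best]
    return {(best, v): build_tree(objs, rest) for v, objs in children.items()}

# Toy decision table: objects are (condition attribute values, decision)
table = [(("sunny", "hot"), "no"), (("sunny", "mild"), "no"),
         (("rain", "mild"), "yes"), (("rain", "hot"), "yes")]
print(build_tree(table, attrs=[0, 1]))  # splits on attribute 0 and stops at pure nodes
```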

5.
A Decision Tree Rule Extraction Method Based on Decision Entropy
In a decision table, the confidence and object coverage of decision rules are important measures of decision-making ability. Building on the rough entropy of knowledge, this paper proposes the concept of decision entropy and defines an attribute significance measure from it; the decision entropy of a condition-attribute subset is then used to measure its importance for decision classification, and a decision tree is constructed recursively in a top-down manner; finally, the decision tree is traversed to simplify the extracted decision rules. The advantages of this method are that no attribute reduction is performed before building the tree and extracting rules, the computation is intuitive, and the time complexity is relatively low. Results of a worked example show that the method yields more concise and effective decision rules.

6.
Research on the ID3 Decision Tree Learning Algorithm
ID3 is the core algorithm of decision tree learning. This paper describes the decision tree representation and the ID3 decision tree learning algorithm in detail, paying particular attention to the rule for selecting the decision attribute. A learning example is used to show, step by step, how the algorithm selects the first decision attribute, and the algorithm is then discussed; in general, the ID3 algorithm can find the optimal decision tree.
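A short sketch of the selection rule ID3 uses, information gain, applied to a toy training set to pick the first (root) decision attribute:

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attr):
    """Gain(S, A) = H(S) - sum over values v of (|S_v| / |S|) * H(S_v)."""
    n, remainder = len(examples), 0.0
    for v in set(ex[attr] for ex in examples):
        subset = [lab for ex, lab in zip(examples, labels) if ex[attr] == v]
        remainder += (len(subset) / n) * entropy(subset)
    return entropy(labels) - remainder

# Toy training set for choosing the root attribute
examples = [{"outlook": "sunny", "windy": "no"}, {"outlook": "sunny", "windy": "yes"},
            {"outlook": "rain", "windy": "no"}, {"outlook": "rain", "windy": "yes"}]
labels = ["no", "no", "yes", "yes"]
print(max(examples[0], key=lambda a: information_gain(examples, labels, a)))  # -> outlook
```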

7.
Many algorithms for inferring a decision tree from data involve a two-phase process: First, a very large decision tree is grown which typically ends up over-fitting the data. To reduce over-fitting, in the second phase, the tree is pruned using one of a number of available methods. The final tree is then output and used for classification on test data. In this paper, we suggest an alternative approach to the pruning phase. Using a given unpruned decision tree, we present a new method of making predictions on test data, and we prove that our algorithm's performance will not be much worse (in a precise technical sense) than the predictions made by the best reasonably small pruning of the given decision tree. Thus, our procedure is guaranteed to be competitive (in terms of the quality of its predictions) with any pruning algorithm. We prove that our procedure is very efficient and highly robust. Our method can be viewed as a synthesis of two previously studied techniques. First, we apply Cesa-Bianchi et al.'s (1993) results on predicting using expert advice (where we view each pruning as an expert) to obtain an algorithm that has provably low prediction loss, but that is computationally infeasible. Next, we generalize and apply a method developed by Buntine (1990, 1992) and Willems, Shtarkov and Tjalkens (1993, 1995) to derive a very efficient implementation of this procedure.
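A hedged sketch of how the predictions of all prunings can be mixed without enumerating them: every node of the unpruned tree keeps a weight on stopping there (acting as a leaf of a pruning) versus deferring to its subtree, and the mixture is computed along the instance's path. Only fixed prior weights are shown; the algorithm in the paper additionally updates these weights online from observed losses:

```python
class Node:
    def __init__(self, class_dist, attr=None, children=None):
        self.class_dist = class_dist    # class -> probability estimate at this node
        self.attr = attr                # attribute tested here (None for a true leaf)
        self.children = children or {}  # attribute value -> child Node

def pruning_weighted_prediction(node, instance, stop_prob=0.5):
    """Mixture over all prunings along the path of `instance` through the unpruned tree."""
    if node.attr is None or instance.get(node.attr) not in node.children:
        return node.class_dist          # a true leaf: all remaining prior mass stops here
    deeper = pruning_weighted_prediction(node.children[instance[node.attr]], instance, stop_prob)
    return {c: stop_prob * node.class_dist.get(c, 0.0) + (1 - stop_prob) * deeper.get(c, 0.0)
            for c in set(node.class_dist) | set(deeper)}

# Tiny unpruned tree: the root tests "outlook"; its children are leaves
root = Node({"yes": 0.5, "no": 0.5}, attr="outlook",
            children={"sunny": Node({"yes": 0.1, "no": 0.9}),
                      "rain": Node({"yes": 0.8, "no": 0.2})})
print(pruning_weighted_prediction(root, {"outlook": "sunny"}))
# roughly yes 0.3, no 0.7: half the prior mass stops at the root, half reaches the "sunny" leaf
```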

8.
To find the optimal branching of a nominal attribute at a node in an L-ary decision tree, one is often forced to search over all possible L-ary partitions for the one that yields the minimum impurity measure. For binary trees (L = 2) when there are just two classes, a short-cut search is possible that is linear in n, the number of distinct values of the attribute. For the general case in which the number of classes, k, may be greater than two, Burshtein et al. have shown that the optimal partition satisfies a condition that involves the existence of 2^L hyperplanes in the class probability space. We derive a property of the optimal partition for concave impurity measures (including in particular the Gini and entropy impurity measures) in terms of the existence of L vectors in the dual of the class probability space, which implies the earlier condition. Unfortunately, these insights still do not offer a practical search method when n and k are large, even for binary trees. We therefore present a new heuristic search algorithm to find a good partition. It is based on ordering the attribute's values according to their principal component scores in the class probability space, and is linear in n. We demonstrate the effectiveness of the new method through Monte Carlo simulation experiments and compare its performance against other heuristic methods.
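A hedged sketch of the ordering heuristic for a binary split (L = 2): represent each value of the nominal attribute by its class-probability vector, order the values by their first-principal-component score, and evaluate only the n - 1 contiguous splits of that ordering under the Gini impurity; the exact scoring details are assumptions:

```python
import numpy as np

def gini(counts):
    n = counts.sum()
    return 1.0 - ((counts / n) ** 2).sum() if n > 0 else 0.0

def best_binary_split(value_class_counts):
    """value_class_counts: dict mapping attribute value -> array of class counts."""
    values = list(value_class_counts)
    counts = np.array([value_class_counts[v] for v in values], dtype=float)
    probs = counts / counts.sum(axis=1, keepdims=True)        # class-probability vector per value
    centered = probs - probs.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)   # vt[0] = first principal direction
    order = np.argsort(centered @ vt[0])                      # order values by PC1 score
    best = None
    for i in range(1, len(values)):                           # only the n - 1 contiguous splits
        left, right = counts[order[:i]].sum(axis=0), counts[order[i:]].sum(axis=0)
        w = (left.sum() * gini(left) + right.sum() * gini(right)) / counts.sum()
        if best is None or w < best[0]:
            best = (w, {values[j] for j in order[:i]})
    return best  # (weighted Gini impurity, set of values sent to the left branch)

# Example: four attribute values, three classes
print(best_binary_split({"a": np.array([8, 1, 1]), "b": np.array([7, 2, 1]),
                         "c": np.array([1, 8, 1]), "d": np.array([1, 1, 8])}))
```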

9.
Zhu Yun, Wu Wei. 《计算机工程与应用》 (Computer Engineering and Applications), 2004, 40(21): 90-91, 156
This paper presents a method for building an image classification model with a decision tree, using a joint image feature composed of average color, representative color, and contour-line distribution. The method uses a modified decision tree to improve the efficiency of decision tree classification learning. Experimental data show that the effect is significant: usage efficiency is improved markedly while the classification performance is affected only slightly.
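A hedged sketch of the feature-to-classifier pipeline, using only an average-color feature and a standard scikit-learn decision tree; the representative-color and contour-distribution features, and the paper's modified tree, are not reproduced:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def average_color(image):
    """image: H x W x 3 array; returns the mean R, G, B values as a 3-vector."""
    return image.reshape(-1, 3).mean(axis=0)

rng = np.random.default_rng(1)
# Toy dataset: class 0 images skew red, class 1 images skew blue
images = [rng.integers(0, 256, size=(32, 32, 3)).astype(float) for _ in range(40)]
labels = []
for i, img in enumerate(images):
    channel, label = (0, 0) if i % 2 == 0 else (2, 1)
    img[..., channel] += 60.0    # boost the red or the blue channel
    labels.append(label)

X = np.array([average_color(img) for img in images])
clf = DecisionTreeClassifier(max_depth=2).fit(X, labels)
print(clf.score(X, labels))      # training accuracy of the color-feature tree
```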

10.
The entire construction of a fuzzy decision tree based on the Min-Ambiguity heuristic is carried out on the basis of a given significance-level parameter, and the choice of this parameter value has a major influence on the performance of the fuzzy decision tree. The experimental study in this paper shows that, within a certain interval of values, gradually increasing the parameter value makes the fuzzy decision tree shrink in size while its test accuracy is maintained or improved, until the optimal parameter value is reached, at which point the tree attains the best test accuracy with the smallest size. Increasing the parameter further, beyond that interval, leads to over-pruning and lowers the tree's test accuracy. Finally, comparison experiments on the same data with a crisp decision tree system (the C4.5 system), before and after post-pruning, further confirm that within this interval the effect of gradually increasing the parameter on fuzzy decision tree performance is equivalent to post-pruning of a crisp decision tree.
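A hedged sketch of one common reading of the significance level, as an alpha-cut that discards low-membership evidence so that larger values stop tree growth earlier; whether this matches the paper's exact mechanism is an assumption made only to illustrate the pre-pruning effect:

```python
# Toy fuzzy node: each branch lists the membership degrees of the examples reaching it.
def surviving_branches(memberships, alpha):
    """Keep only branches whose strongest membership reaches the significance level alpha."""
    return [branch for branch, degrees in memberships.items() if max(degrees) >= alpha]

memberships = {"low": [0.9, 0.7, 0.2], "medium": [0.4, 0.35, 0.3], "high": [0.1, 0.15, 0.05]}
for alpha in (0.0, 0.2, 0.5):
    print(alpha, surviving_branches(memberships, alpha))
# Raising alpha prunes more branches; past some point useful branches are cut too (over-pruning).
```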
