Similar Documents
20 similar documents found (search time: 453 ms)
1.
Univariate decision trees have difficulty reflecting interactions among the attributes of an information system, so the trees they produce tend to be large. Multivariate decision trees capture inter-attribute relationships better and yield very compact trees, but the resulting trees are hard to interpret. Addressing the characteristics of both, a hybrid-variable decision tree construction method based on knowledge roughness is proposed, which selects classification attributes with smaller knowledge roughness. Experimental results show this to be a simple and highly efficient tree-generation method.
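The knowledge-roughness criterion this entry describes can be sketched as follows. This is a minimal illustration assuming the decision table is a list of attribute-value dicts; the function names are mine, not the paper's:

```python
from collections import defaultdict

def partition(rows, attrs):
    """Equivalence classes of the indiscernibility relation:
    group row indices by their values on the given attributes."""
    blocks = defaultdict(set)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())

def knowledge_roughness(rows, cond_attrs, decision_attr):
    """Average roughness of the decision classes w.r.t. cond_attrs:
    1 - |lower approximation| / |upper approximation|.
    Smaller values mean cond_attrs describe the decision more exactly,
    so an attribute with smaller roughness is preferred as the split."""
    cond_blocks = partition(rows, cond_attrs)
    dec_classes = partition(rows, [decision_attr])
    total = 0.0
    for x in dec_classes:
        lower = sum(len(b) for b in cond_blocks if b <= x)
        upper = sum(len(b) for b in cond_blocks if b & x)
        total += (1 - lower / upper) if upper else 0.0
    return total / len(dec_classes)
```

An attribute that determines the decision exactly has roughness 0; one whose classes mix all decisions has roughness 1.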

2.
Univariate decision trees have difficulty reflecting interactions among the attributes of an information system, so the trees they produce tend to be large. Multivariate decision trees capture inter-attribute relationships better and yield very compact trees, but the resulting trees are hard to interpret. Addressing the characteristics of both, a hybrid-variable decision tree construction method based on knowledge roughness is proposed, which selects classification attributes with smaller knowledge roughness. Experimental results show this to be a simple and highly efficient tree-generation method.

3.
A Decision Tree Construction Method Based on Dispersion (Cited by: 1; self-citations: 0; citations by others: 1)
During decision tree construction, attribute selection affects the tree's classification accuracy. This paper discusses the limitations of the information-entropy method and the WMR method, and introduces the concept of the dispersion of a condition attribute set in an information system. Using this concept to choose splitting attributes, a dispersion-based decision tree construction algorithm, DSD, is designed; DSD overcomes the practical limitations of the WMR method. Experiments on UCI datasets show that the trees it builds match the accuracy of the information-entropy method, with better time complexity.
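The information-entropy baseline that DSD is compared against is ID3-style information gain, which can be sketched as follows (a minimal illustration; the names and row format are mine, not the paper's):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, target):
    """ID3-style gain: target entropy minus the size-weighted
    entropy of the target after splitting on attr."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for v in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == v]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder
```

An attribute that separates the classes perfectly has gain equal to the target entropy; an attribute that tells nothing has gain 0.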

4.
A well-chosen neural network structure can greatly improve a network's processing capability and convergence speed, so network construction methods have long been a research focus. Combining the data-analysis power of rough set theory with the ability of decision trees to partition numeric attributes, this paper proposes RCBNN, a new neural network construction method based on rough sets and decision trees. Experiments show that networks built with this method are easy to construct, comprehensible, fast to converge, and compact.

5.
刘栋  宋国杰 《计算机应用》2011,31(5):1374-1377
To classify multidimensional time series and obtain comprehensible classification rules, the concept of temporal entropy and a method for constructing it are introduced, and the decision tree model is extended in two respects: attribute selection and attribute-value partitioning. Two algorithms for building decision tree models for multidimensional time-series classification are given. Finally, the process decision tree is tested on real mobile-customer churn data, demonstrating the feasibility of the method.

6.
To address the construction complexity and limited classification accuracy of C4.5 decision trees, an improved construction algorithm based on variable precision rough sets is proposed. The algorithm uses approximate classification quality as the heuristic for selecting a node's splitting attribute; compared with the information gain ratio, this criterion more accurately characterizes an attribute's overall contribution to classification and also offers some robustness to noise. For the special case where two or more attributes have equal approximate classification quality, the paper further shows how to select the optimal classification attr…
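The approximate classification quality used as the heuristic above comes from the variable precision rough set model; a hedged sketch under my own conventions (the function name, input format, and majority-inclusion threshold are illustrative, not the paper's):

```python
from collections import Counter

def beta_classification_quality(blocks, decision, beta=0.8):
    """Fraction of objects lying in condition classes whose dominant
    decision value reaches the inclusion threshold beta -- the
    variable-precision analogue of classification quality.
    blocks: condition equivalence classes as sets of object indices.
    decision: decision value of each object, indexed by position."""
    total = sum(len(b) for b in blocks)
    covered = 0
    for b in blocks:
        dominant = Counter(decision[i] for i in b).most_common(1)[0][1]
        if dominant / len(b) >= beta:
            covered += len(b)
    return covered / total
```

Lowering beta admits noisier condition classes into the positive region, which is how the model tolerates mislabeled records.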

7.
In a decision table, the confidence and object coverage of decision rules are key indicators of decision-making capability. Building on knowledge rough entropy, the concept of decision entropy is proposed and its attribute significance defined; the decision entropy of a condition attribute subset is then used to measure its importance to the decision classification, and a decision tree is built recursively top-down. Finally, the tree is traversed and the extracted decision rules simplified. The advantage of this method is that no attribute reduction is performed before tree construction and rule extraction; computation is straightforward and time complexity low. A worked example shows that the method yields simpler, more effective decision rules.

8.
A Decision Tree Rule Extraction Method Based on Decision Entropy (Cited by: 2; self-citations: 0; citations by others: 2)
In a decision table, the confidence and object coverage of decision rules are key indicators of decision-making capability. Building on knowledge rough entropy, the concept of decision entropy is proposed and its attribute significance defined; the decision entropy of a condition attribute subset is then used to measure its importance to the decision classification, and a decision tree is built recursively top-down. Finally, the tree is traversed and the extracted decision rules simplified. The advantage of this method is that no attribute reduction is performed before tree construction and rule extraction; computation is straightforward and time complexity low. A worked example shows that the method yields simpler, more effective decision rules.
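One common reading of a "decision entropy" attribute-importance measure is the size-weighted entropy of the decision attribute inside each condition class. The sketch below is my reconstruction under that assumption, not the paper's exact definition:

```python
import math
from collections import Counter

def decision_entropy(blocks, decision):
    """Size-weighted Shannon entropy of the decision attribute within
    each condition class. 0 means the attribute subset determines the
    decision exactly, so smaller is better when choosing a split.
    blocks: condition equivalence classes as sets of object indices.
    decision: decision value of each object, indexed by position."""
    n = sum(len(b) for b in blocks)
    h = 0.0
    for b in blocks:
        counts = Counter(decision[i] for i in b)
        h -= len(b) / n * sum(
            c / len(b) * math.log2(c / len(b)) for c in counts.values()
        )
    return h
```

A top-down builder would evaluate this for each candidate attribute at a node and split on the one with the smallest value.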

9.
A Novel Decision Tree Construction Method (Cited by: 1; self-citations: 0; citations by others: 1)
Decision trees are an important data-mining tool, but constructing an optimal decision tree is NP-complete. This paper proposes a decision tree construction method based on association-rule mining. It first defines high-confidence approximately exact rules and gives an algorithm for mining them; new attributes are generated from these rules, and methods for evaluating the generated attributes are discussed. The new attributes are then used together with the data's original attributes to build the tree. Experimental results show that the new construction method achieves higher accuracy.

10.
Construction and Study of a Multivariate Decision Tree (Cited by: 3; self-citations: 0; citations by others: 3)
Univariate decision tree algorithms produce large trees with complex, hard-to-understand rules, whereas multivariate decision trees are an effective data-mining method for classification; the key to constructing one is selecting, based on inter-attribute correlation, a suitable combination of attributes to form a new attribute at each node. Combining the knowledge-dependency measure from rough set theory with the dispersion of a condition attribute set in an information system, this paper proposes a multivariate decision tree construction algorithm (RD). Experiments on several UCI datasets show that its classification performance improves somewhat over both the traditional ID3 algorithm and kernel-based multivariate decision trees.

11.
An Improved Decision Tree Method Based on Variable Precision Rough Sets (Cited by: 1; self-citations: 0; citations by others: 1)
Based on variable precision rough set theory, a new method for constructing decision trees with confidence-annotated rules is proposed. The method uses the size of the β-boundary region as the criterion for selecting classification attributes and redefines the confidence of leaf nodes. Experiments show that the method effectively improves classification efficiency and yields trees that are easier to understand.

12.
Food-safety decision-making is an important part of food-safety research. To analyze food-safety conditions, a new decision tree construction method incorporating rule confidence is proposed, based on the variable precision rough set model. The method improves on the traditional weighted decision tree algorithm: it uses weighted-average variable-precision roughness as the attribute-selection criterion and replaces approximation accuracy with variable-precision approximation accuracy. This removes noisy and redundant data from the database and tolerates partially contradictory records, so that conflicting decision rules can coexist during tree construction. The algorithm simplifies the tree-generation process, broadens its applicability, and helps interpret the generated rules. Validation results show the algorithm to be effective and feasible.

13.
Support vector machine (SVM) is a state-of-the-art classification tool with good accuracy due to its ability to generate nonlinear models. However, the nonlinear models generated are typically regarded as incomprehensible black-box models. This lack of explanatory ability is a serious problem for practical SVM applications that require comprehensibility. Therefore, this study applies a C5 decision tree (DT) to extract rules from SVM results. In addition, a metaheuristic algorithm is employed for feature selection. Both SVM and C5 DT require expensive computation, and applying the two algorithms simultaneously to high-dimensional data increases the computational cost; this study therefore applies the artificial bee colony (ABC) optimization algorithm to select the important features. The proposed ABC–SVM–DT algorithm extracts comprehensible rules from SVMs, with ABC performing feature selection and parameter optimization before SVM–DT. The proposed algorithm is evaluated on eight datasets to demonstrate its effectiveness. The results show that ABC–SVM–DT simultaneously improves classification accuracy and reduces the complexity of the final decision tree, compared with the genetic algorithm and particle swarm optimization.

14.
A Decision Tree Generation Algorithm Based on Decision Support Degree (Cited by: 2; self-citations: 0; citations by others: 2)
Starting from the observation that condition attributes support the decision to differing degrees, the concept of decision support degree is introduced, and a decision tree generation algorithm using it as heuristic information is proposed. Experimental analysis shows that, compared with traditional decision tree generation algorithms, this algorithm improves the tree's structure and effectively raises classification accuracy.

15.
16.
The decision-tree (DT) algorithm is a very popular and efficient data-mining technique. It is non-parametric and computationally fast, and besides forming interpretable classification rules, it can select features on its own. In this article, the feature-selection ability of DT and the impacts of feature selection/extraction on DT with different training sample sizes were studied using AVIRIS hyperspectral data. DT was compared with three other feature-selection methods; the results indicated that DT was an unstable feature selector, and that the number of features it selected was strongly related to the sample size. Trees derived with and without feature selection/extraction were compared. The impacts of feature selection on DT appeared mainly as a significant increase in the number of tree nodes (14.13–23.81%) and a moderate increase in tree accuracy (3.5–4.8%). Feature extraction, such as Non-parametric Weighted Feature Extraction (NWFE) and Decision Boundary Feature Extraction (DBFE), enhanced tree accuracy more markedly (4.78–6.15%) while also decreasing the number of tree nodes (6.89–16.81%). When the training sample size was small, feature selection/extraction increased accuracy more dramatically (6.90–15.66%) without increasing the number of tree nodes.

17.
Classification is an important problem in data mining, and many popular classifiers build decision trees to produce class models. This paper presents the data-mining idea of constructing a decision tree by comparing information gain or entropy, gives a method for constructing decision trees with rough set theory, and illustrates the tree-generation process with an example from surface modeling. Compared with the ID3 method, this approach reduces the complexity of the decision tree, optimizes its structure, and mines better rule information.

18.
Decision Tree Based Neural Networks (Cited by: 5; self-citations: 0; citations by others: 5)
Traditional artificial neural network models determine a suitable network structure by trial and error and initialize parameters randomly, making training inefficient and results unstable. The entropy network is a three-layer feedforward network built on a decision tree. Building on the entropy network, this paper proposes a decision-tree-based neural network design method (DTBNN). DTBNN provides a way to set reasonable initial values for the network parameters, and treats the structure derived from the decision tree only as the entropy network's initial structure: in practice, neurons and connection weights are added as the application requires to improve performance. Theoretical analysis and experimental results demonstrate the soundness of the approach.

19.
In this research, a hybrid model is developed by integrating a case-based data clustering method and a fuzzy decision tree for medical data classification. Two datasets from the UCI Machine Learning Repository, the liver disorders dataset and Breast Cancer Wisconsin (Diagnosis), are employed for benchmark testing. Initially a case-based clustering method is applied to preprocess the dataset so that more homogeneous data within each cluster is attained. A fuzzy decision tree is then applied to the data in each cluster, and genetic algorithms (GAs) are further applied to construct a decision-making system based on the selected features and identified diseases. Finally, a set of fuzzy decision rules is generated for each cluster. As a result, the FDT model can accurately respond to test data using the inductions derived from the case-based fuzzy decision tree. The average forecasting accuracy of the CBFDT model is 98.4% for breast cancer and 81.6% for liver disorders, the highest among the models compared. The hybrid model produces not only accurate but also comprehensible decision rules that could help medical doctors draw effective conclusions in medical diagnosis.

20.
Using Decision Trees for Agent Modeling: Improving Prediction Performance (Cited by: 2; self-citations: 0; citations by others: 2)
A modeling system may be required to predict an agent's future actions under constraints of inadequate or contradictory relevant historical evidence. This can result in low prediction accuracy, or otherwise low prediction rates, leaving a set of cases for which no predictions are made. A previous study that explored techniques for improving prediction rates in the context of modeling students' subtraction skills using Feature Based Modeling showed a tradeoff between prediction rate and prediction accuracy. This paper presents research that aims to improve prediction rates without affecting prediction accuracy. The FBM-C4.5 agent modeling system was used in this research; however, the techniques explored are applicable to any Feature Based Modeling system, and the most effective technique developed is applicable to most agent modeling systems. The default FBM-C4.5 system models agents' competencies with a set of decision trees, trained on all historical data. Each tree predicts one particular aspect of the agent's action. Predictions from multiple trees are compared for consensus, and FBM-C4.5 makes no prediction when predictions from different trees contradict one another. This strategy trades off reduced prediction rates for increased accuracy. To make predictions in the absence of consensus, three techniques were evaluated: voting, a tree quality measure, and a leaf quality measure. An alternative technique that merges multiple decision trees into a single tree has the advantage of producing models that are more comprehensible. However, all of these techniques demonstrated the previously encountered trade-off between prediction rate and prediction accuracy, albeit less pronounced. It was hypothesized that models built on more recent observations would outperform models built on earlier observations; experimental results support this hypothesis.
A Dual-model system, which takes this temporal factor into account, has been evaluated. This fifth approach achieved a significant improvement in prediction rate without significantly affecting prediction accuracy.  相似文献   
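The default consensus strategy described above (predict only when all trees agree, otherwise abstain) can be sketched like so. This is illustrative, not the FBM-C4.5 code; the trees are stood in for by plain callables:

```python
def consensus_predict(trees, observation):
    """Return the shared prediction when every tree agrees,
    or None (abstain) when any two trees contradict each other."""
    predictions = [tree(observation) for tree in trees]
    first = predictions[0]
    return first if all(p == first for p in predictions) else None

def prediction_rate(trees, observations):
    """Fraction of observations for which a prediction was made --
    the 'prediction rate' that is traded off against accuracy."""
    made = sum(consensus_predict(trees, o) is not None for o in observations)
    return made / len(observations)
```

Abstaining on disagreement lowers the prediction rate but filters out exactly the cases where the evidence is contradictory, which is why accuracy rises.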

