Similar literature
 Found 20 similar documents (search time: 31 ms)
1.
Knowledge discovery refers to identifying hidden and valid patterns in data, and it can be used to build knowledge inference systems. The decision tree is one such successful technique for supervised learning and for extracting knowledge or rules. This paper aims to develop a decision tree model to predict the occurrence of diabetes. Traditional decision tree algorithms have a problem with crisp boundaries; much better decision rules can be identified from clinical data sets by using fuzzy decision boundaries. The key step in constructing a decision tree is the identification of split points, and in this work the best split points are identified using the Gini index. The authors propose a method that reduces the number of Gini index calculations by identifying false split points, and they use a Gaussian fuzzy function because clinical data sets are not crisp. Because the efficiency of a decision tree depends on factors such as the number of nodes and the length of the tree, pruning plays a key role. The modified Gini index-Gaussian fuzzy decision tree algorithm is proposed and tested for accuracy on the Pima Indian Diabetes (PID) clinical data set. The authors report that this algorithm outperforms other decision tree algorithms.
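The two ingredients named in this abstract, Gini-based split selection and a Gaussian fuzzy membership function, can be illustrated with a minimal sketch. It is not the authors' algorithm: it evaluates every midpoint between distinct sorted values and does not reproduce their false-split-point pruning. The function names and the `center`/`sigma` parameters are assumptions made for illustration.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_gini) for one numeric attribute.

    Every midpoint between consecutive distinct sorted values is tried;
    no candidate pruning is applied (illustrative simplification).
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        v0, v1 = pairs[i - 1][0], pairs[i][0]
        if v0 == v1:
            continue  # no threshold fits between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((v0 + v1) / 2.0, score)
    return best

def gaussian_membership(x, center, sigma):
    """Gaussian fuzzy membership in (0, 1]; equals 1 at x == center."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))

# Hypothetical toy data: one attribute, two classes.
threshold, score = best_split([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1])
# Soften the crisp threshold: degree to which x = 7.0 is "near" the split.
nearness = gaussian_membership(7.0, center=threshold, sigma=2.0)
```

A crisp tree would send every sample strictly to one side of `threshold`; a fuzzy variant can use a membership value such as `nearness` to weight samples close to the boundary, which is the motivation the abstract gives for fuzzy decision boundaries.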

2.
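Design and application of the SLIQ decision tree algorithm in data mining   Total citations: 4 (self-citations: 0; citations by others: 4)
This paper analyzes SLIQ, an algorithm that uses the Gini index for attribute selection, and discusses feasible ways to improve its efficiency. The algorithm is applied to a generation bidding decision system for the electricity market; mining the bidding capability of power producers yields knowledge of practical significance for their decisions.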

3.
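Decision tree algorithms in machine learning play an important role in data classification, but the classification performance of the information-gain-based ID3 algorithm and the Gini-index-based CART algorithm can still be improved. This work constructs an adaptive ensemble measure of information gain and the Gini index and designs an effective decision tree algorithm that improves on both ID3 and CART. It analyzes the heterogeneity and independence of the information-theoretic representation of information gain and the algebraic representation of the Gini index, and uses a knowledge-based weighted linear combination to establish the information gain...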
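The idea of a weighted linear combination of information gain and a Gini-based measure can be sketched as follows. The abstract is truncated before it explains how the knowledge-based weights are derived, so the fixed weight `alpha` below is purely a placeholder assumption, as are the function names.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def impurity_decrease(parent, children, impurity):
    """Parent impurity minus the size-weighted impurity of the children."""
    n = len(parent)
    return impurity(parent) - sum(len(ch) / n * impurity(ch) for ch in children)

def combined_score(parent, children, alpha=0.5):
    """alpha * information gain + (1 - alpha) * Gini decrease.

    alpha is a hypothetical fixed weight; the paper's adaptive,
    knowledge-based weighting is not reproduced here.
    """
    info_gain = impurity_decrease(parent, children, entropy)
    gini_gain = impurity_decrease(parent, children, gini)
    return alpha * info_gain + (1.0 - alpha) * gini_gain

# Hypothetical split of four samples into two pure children.
score = combined_score([0, 0, 1, 1], [[0, 0], [1, 1]], alpha=0.5)
```

With `alpha=1.0` the score reduces to the ID3 criterion and with `alpha=0.0` to the Gini decrease used by CART, so the single parameter interpolates between the two base algorithms.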

4.
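Research on decision tree classification techniques   Total citations: 28 (self-citations: 1; citations by others: 28)
Luan Lihua (栾丽华), Ji Genlin (吉根林). Computer Engineering (《计算机工程》), 2004, 30(9): 94-96, 105
Decision tree classification is an important data classification technique. ID3, C4.5, and EC4.5 are commonly used algorithms for building decision trees, but some newer decision tree classification algorithms have received little study in China. Drawing on an extensive review of the literature, this paper studies newer algorithms such as CART, SLIQ, SPRINT, and PUBLIC, explains the basic ideas of the various decision tree classification algorithms, and analyzes and compares their main characteristics, as a reference for researchers in data classification.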

5.
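Based on the characteristics of medical image data, a new data mining method combining rough sets and decision trees is proposed. The method discretizes medical image features with a rough-set discretization method based on attribute importance, applies rough sets to reduce the attributes and obtain low-dimensional training data, and then uses the SLIQ decision tree algorithm to generate decision rules. Experiments show that combining rough set theory with SLIQ preserves the internal characteristics of the original data while eliminating redundant features that are irrelevant or only weakly relevant to classification, thereby improving classification accuracy and efficiency.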

6.
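To improve prediction accuracy and reduce error at the same level of privacy protection, DiffPETs, a differentially private algorithm based on ExtraTrees, is proposed. During decision tree construction, a score is computed for each feature according to the chosen criterion, the exponential mechanism is used to select the highest-scoring feature, and the Laplace mechanism adds noise at the leaf nodes, so that the algorithm provides ε-differential privacy. DiffPETs is applied to decision tree classification and regression analysis. For classification trees, the Gini index is chosen as the utility function of the exponential mechanism and its sensitivity is given; for regression trees, the variance is used as the utility function and its sensitivity is given. Experimental results show that, compared with differentially private decision tree classification and regression algorithms, DiffPETs effectively reduces prediction error.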
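The two standard differential privacy building blocks mentioned here, the exponential mechanism for choosing a feature and Laplace noise on leaf counts, can be sketched as below. This is a generic illustration, not the DiffPETs implementation: the utility scores, the sensitivity values, and the way the privacy budget is split between the two steps are all assumptions.

```python
import math
import random

def exponential_mechanism(scores, epsilon, sensitivity, rng):
    """Sample an index with probability proportional to
    exp(epsilon * score / (2 * sensitivity)); higher scores are better.

    Subtracting the maximum score only rescales the weights and avoids
    overflow in exp().
    """
    top = max(scores)
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity)) for s in scores]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_leaf_counts(counts, epsilon, rng):
    """Add Laplace(1 / epsilon) noise to each class count in a leaf.

    Assumes each count has sensitivity 1 (one record changes one count by 1).
    """
    return [c + laplace_noise(1.0 / epsilon, rng) for c in counts]

rng = random.Random(0)
# Hypothetical utilities, e.g. negated Gini index so that higher is better.
feature = exponential_mechanism([-0.48, -0.10, -0.35], epsilon=1.0,
                                sensitivity=1.0, rng=rng)
leaf = noisy_leaf_counts([12, 3], epsilon=1.0, rng=rng)
```

Note that the exponential mechanism samples rather than taking a strict maximum: the best-scoring feature is only the most likely choice, and a smaller `epsilon` flattens the distribution toward uniform.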

7.
8.
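Data mining based on the SLIQ classification algorithm and its application in enterprise CRM   Total citations: 4 (self-citations: 0; citations by others: 4)
The main stages of the SLIQ algorithm (preprocessing, computing the best split, and performing the split) are studied, together with the design and implementation of the algorithm. Finally, SLIQ is applied at the sales company of Jianshe Industry Group (建设工业集团) and integrated with the customer relationship management system to provide support and a basis for company decision-making.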

9.
Genetically optimized fuzzy decision trees.   Total citations: 1 (self-citations: 0; citations by others: 1)
In this study, we are concerned with genetically optimized fuzzy decision trees (G-DTs). Decision trees are fundamental architectures of machine learning, pattern recognition, and system modeling. Starting with the generic decision tree with discrete or interval-valued attributes, we develop its fuzzy set-based generalization. In this generalized structure, we admit attribute values that are represented by membership functions. Such fuzzy decision trees are constructed in the setting of genetic optimization. The underlying genetic algorithm optimizes the parameters of the fuzzy sets associated with the individual nodes, where they play the role of fuzzy "switches" by distributing the flow of processing within the tree. We discuss various forms of the fitness function that help capture the essence of the problem at hand (which could be of a classification nature when dealing with discrete outputs, or regression-like when handling a continuous output variable). We quantify the nature of the generalization of the tree by studying the optimally adjusted spreads of the membership functions located at the nodes of the decision tree. A series of experiments exploiting synthetic and machine learning data is used to illustrate the performance of the G-DTs.

10.
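To raise the public's level of forestry knowledge and promote the dissemination of knowledge about tree species, this work starts from the branch-and-leaf retrieval knowledge in the branch-and-leaf key to the trees and shrubs of Beijing. Images of branch and leaf features, species knowledge, and species images were collected; the retrieval knowledge was expressed and organized using production rules; a chained parent representation model (链式双亲表示模型) of the retrieval knowledge was constructed; a knowledge base for tree and shrub identification was established; and an inference algorithm for tree and shrub identification was designed. On this basis, an expert system for identifying the trees and shrubs of Beijing was developed, implementing the storage of expert knowledge, the identification inference algorithm, and the identification of tree and shrub species. A running example shows that the inference algorithm built on production-rule knowledge representation can identify tree and shrub species accurately.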

11.
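To address the shortcomings of existing decision tree algorithms on continuous data, such as information loss and poor classification performance, a neighborhood decision tree (NDT) construction algorithm is proposed. First, variable-precision neighborhood equivalence granules on neighborhood decision information systems are mined and their properties are discussed. Then, a neighborhood Gini index measure is constructed from the variable-precision neighborhood equivalence granules to measure the uncertainty of neighborhood decision information systems. Finally, the neighborhood Gini index measure is used to derive the selection condition for tree nodes, and with variable-precision neighborhood equivalence granules as the tree...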

12.
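To address the "data rich but knowledge poor" phenomenon in telecom CRM, data mining techniques and the SLIQ decision tree construction algorithm are used to build a decision tree model. Within the CRM system, customers are classified according to age, city, and gender, which is of significant value for predicting customer types, preventing customer churn, and winning new customers.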

13.
Learning from data streams is a challenging task that demands a learning algorithm with several high-quality features. In addition to the space complexity and speed requirements of processing a huge volume of data arriving at high speed, the learning algorithm must strike a good balance between stability and plasticity. This paper presents a new approach to inducing incremental decision trees on streaming data. In this approach, the internal nodes contain trainable split tests. In contrast with traditional decision trees, in which a single attribute is selected as the split test, each internal node of the proposed approach contains a trainable function based on multiple attributes, which not only provides the flexibility needed in the stream context but also improves stability. Based on this approach, we propose the evolving fuzzy min–max decision tree (EFMMDT) learning algorithm, in which each internal node of the decision tree contains an evolving fuzzy min–max neural network. EFMMDT splits the instance space non-linearly based on multiple attributes, which results in much smaller and shallower decision trees. Extensive experiments reveal that the proposed algorithm achieves much better precision than state-of-the-art decision tree learning algorithms on the benchmark data streams, especially in the presence of concept drift.

14.
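Wang Yahui (王雅辉), Qian Yuhua (钱宇华), Liu Guoqing (刘郭庆). Journal of Computer Applications (《计算机应用》), 2021, 41(10): 2785-2792
Traditional decision tree algorithms face two problems when applied to ordinal classification tasks. First, they do not incorporate order relations and therefore cannot learn or extract the ordinal structure in a data set. Second, real life abounds in fuzzy rather than precise knowledge, yet traditional decision tree algorithms cannot handle data whose attribute values are fuzzy. To address these problems, an ordinal decision tree algorithm based on fuzzy dominance complementary mutual information is proposed. First, dominance sets are used to represent the order relations in the data, and fuzzy sets are introduced into the computation of dominance sets to form fuzzy dominance sets. Fuzzy dominance sets not only reflect the ordinal information in the data but also capture imprecise knowledge automatically. Then, complementary mutual information is generalized on the basis of fuzzy dominance sets, yielding fuzzy dominance complementary mutual information. Finally, the ordinal decision tree algorithm is designed with fuzzy dominance complementary mutual information as its heuristic. Experimental results on 5 synthetic data sets and 9 real-world data sets show that the proposed algorithm achieves lower classification error than classical decision tree algorithms on ordinal classification tasks.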

15.
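Cloud computing offers an efficient solution for storing and analyzing massive data, which gives it important theoretical significance and practical value for research on data mining algorithms. The SLIQ algorithm finds the best split point by traversing the candidates one by one and computing a scalability index (伸缩性指标) for each; this is too time-consuming, and the algorithm becomes very inefficient as the data volume grows. This paper studies decision rule mining algorithms in a cloud computing environment, introduces the MapReduce programming model, and, with the aim of parallelizing SLIQ mining in the cloud, describes how the improved SLIQ algorithm is applied on the MapReduce programming model.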

16.
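Traditional decision trees find decision boundaries by recursively partitioning the feature space, producing a "hard" partition. For big data and complex pattern problems, however, such crisp decision boundaries reduce the generalization ability of the tree. To enable decision tree algorithms to acquire imprecise knowledge automatically, fuzzy theory is introduced into the decision tree and neural networks are used as leaf nodes during tree construction, yielding an improved fuzzy decision tree algorithm based on neural networks. In the neural network fuzzy decision tree, classifier learning has two stages. In the first stage, an uncertainty-reducing heuristic partitions the data, and growth of the fuzzy decision tree stops when the partitioning ability of a node falls below the truth-degree threshold ε. In the second stage, neural networks at the leaf nodes of the fuzzy decision tree perform classification with generalization ability. Experimental results show that, compared with traditional classification learning algorithms, the algorithm achieves high accuracy and can determine the size of the decision tree through structural self-adaptation on classification problems involving big data and complex patterns.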

17.
18.
Abstract: In generating a suitable fuzzy classifier system, significant effort is often placed on the determination and fine tuning of the fuzzy sets. However, in such systems little thought is given to the way in which membership functions are combined within the fuzzy rules. Often, traditional fuzzy inference strategies are used, which provide no control over how strongly or weakly the inference is applied within these rules. Furthermore, such strategies allow no interaction between grades of membership. A number of theoretical fuzzy inference operators have been proposed for both regression and classification problems, but they have not been investigated in the context of real-world applications. In this paper we propose a novel genetic algorithm framework for optimizing the strength of fuzzy inference operators concurrently with the tuning of membership functions for a given fuzzy classifier system. Each fuzzy system is generated using two well-established decision tree algorithms: C4.5 and CHAID. This enables both classification and regression problems to be addressed within the framework. Each solution generated by the genetic algorithm produces a set of fuzzy membership functions and also determines how strongly the inference is applied within each fuzzy rule. We investigate several theoretically proven fuzzy inference techniques (T-norms) in the context of both classification and regression problems. The proposed methodology is applied to a number of real-world data sets in order to determine the effects of the simultaneous tuning of membership functions and inference parameters on the accuracy and robustness of fuzzy classifiers.
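To make concrete how a T-norm combines membership grades within a fuzzy rule, and how a parameter can control the "strength" of that combination, here is a minimal sketch. The minimum, product, and Łukasiewicz T-norms are standard; the Yager family is used only as one well-known example of a parameterized T-norm, since the abstract does not say which operators the authors tune.

```python
def t_min(a, b):
    """Minimum (Goedel) T-norm."""
    return min(a, b)

def t_product(a, b):
    """Product T-norm."""
    return a * b

def t_lukasiewicz(a, b):
    """Lukasiewicz T-norm."""
    return max(0.0, a + b - 1.0)

def t_yager(a, b, p):
    """Yager T-norm with parameter p > 0.

    p = 1 gives the Lukasiewicz T-norm; larger p approaches the minimum.
    """
    return max(0.0, 1.0 - ((1.0 - a) ** p + (1.0 - b) ** p) ** (1.0 / p))

def rule_strength(memberships, t_norm):
    """Fold a T-norm over the antecedent membership grades of one rule.

    1.0 is the identity element of every T-norm, so it is a safe start.
    """
    strength = 1.0
    for m in memberships:
        strength = t_norm(strength, m)
    return strength

# Hypothetical antecedent grades for a rule with two conditions.
grades = [0.7, 0.8]
weak = rule_strength(grades, t_lukasiewicz)
tuned = rule_strength(grades, lambda a, b: t_yager(a, b, p=3.0))
strong = rule_strength(grades, t_min)
```

For the same grades, the three results are ordered `weak <= tuned <= strong`, which shows how a single parameter such as `p` could be exposed to an optimizer alongside the membership function parameters.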

19.
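Yang Yang (杨杨), Zhao Zheng (赵政). Journal of Computer Applications (《计算机应用》), 2006, 26(10): 2457-2459
Because the databases of public crisis emergency response systems contain large, heterogeneous data whose records are difficult to classify, a fuzzy decision tree algorithm whose parameters are selected by a genetic algorithm is proposed. It improves both the accuracy of decision tree classification and the interpretability of the resulting rules. The classifier was applied to an actual public security system database; on the basis of classifying the existing records, effective rules were obtained that helped police officers make fast and accurate predictions and judgments about ongoing critical incidents.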

20.
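Causality, Bayesian networks, and cognitive maps   Total citations: 22 (self-citations: 0; citations by others: 22)
Liu Zhiqiang (刘志强). Acta Automatica Sinica (《自动化学报》), 2001, 27(4): 552-566
Causality plays an important role in prediction and reasoning. Bayesian networks have been used to build diagnostic and decision systems. In recent years, fuzzy cognitive maps have attracted attention; they provide another theoretical framework for structured knowledge and causal reasoning. This paper briefly introduces Bayesian networks and cognitive maps and the application of their inference methods in intelligent systems.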
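Inference in a fuzzy cognitive map is commonly described as repeatedly propagating concept activations through a signed, weighted causal graph and squashing the result. The sketch below follows that common formulation with a sigmoid squashing function; it is a generic illustration under those assumptions and is not taken from the paper. The concept names and weights are hypothetical.

```python
import math

def sigmoid(x, lam=1.0):
    """Logistic squashing function mapping any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-lam * x))

def fcm_step(state, weights):
    """One synchronous update of all concept activations.

    weights[i][j] is the causal influence of concept i on concept j,
    with positive values for promotion and negative for inhibition.
    """
    n = len(state)
    return [sigmoid(sum(state[i] * weights[i][j] for i in range(n)))
            for j in range(n)]

def fcm_run(state, weights, max_steps=100, tol=1e-6):
    """Iterate until activations change by less than tol or steps run out."""
    for _ in range(max_steps):
        nxt = fcm_step(state, weights)
        if max(abs(a - b) for a, b in zip(nxt, state)) < tol:
            return nxt
        state = nxt
    return state

# Hypothetical three-concept map: rain -> wet road -> accident risk.
weights = [
    [0.0, 0.8, 0.0],
    [0.0, 0.0, 0.6],
    [0.0, 0.0, 0.0],
]
final = fcm_run([1.0, 0.0, 0.0], weights)
```

Unlike a Bayesian network, the map may contain feedback loops, so the iteration can settle at a fixed point, cycle, or fail to converge within `max_steps`; the function simply returns the last state in the latter cases.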
