Similar Literature
 20 similar documents found (search time: 15 ms)
1.
An analysis of a procedure to build decision trees based on imprecise probabilities and uncertainty measures, called CDT, is presented. We compare this procedure with the classic ones based on Shannon entropy for precise probabilities. We find that the handling of imprecision is a key part of obtaining improvements in the method's performance, as has been shown for class-noise problems in classification. We present a new procedure for building decision trees that extends the imprecision in the CDT procedure to the processing of all input variables. We show, via an experimental study on data sets with general noise (noise in all the input variables), that this new procedure builds smaller trees and gives better results than the original CDT and classic decision trees.

2.
Data analysis techniques can be applied to discover important relations among features. This is the main objective of the Information Root Node Variation (IRNV) technique, a method to extract knowledge from data via decision trees. The decision trees used by the original method were built using classic split criteria. The performance of new split criteria based on imprecise probabilities and uncertainty measures, called credal split criteria, differs significantly from that of the classic criteria. This paper extends the IRNV method using two credal split criteria: one based on a parametric mathematical model, and the other on a non-parametric model. The performance of the method is analyzed using a case study of traffic accident data to identify patterns related to the severity of an accident. We found that a larger number of rules is generated, significantly supplementing the information obtained using the classic split criteria.

3.
A K-nearest neighbours method based on imprecise probabilities
K-nearest neighbours algorithms are among the most popular classification methods, due to their simplicity and good performance. Over the years, several extensions of the initial method have been proposed. In this paper, we propose a K-nearest neighbours approach that uses the theory of imprecise probabilities, and more specifically lower previsions. We show that the proposed approach has several assets: it can handle uncertain data in a very generic way, and decision rules developed within this theory allow us to deal with conflicting information between neighbours, or with the absence of neighbours close to the instance to classify. We show that the results of the basic k-NN and weighted k-NN methods can be retrieved by the proposed approach. We end with some experiments on classical data sets.
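As a rough illustration of the cautious decision-making this abstract describes (not the paper's lower-prevision machinery), a toy k-NN can return a *set* of labels when the neighbours conflict instead of forcing a single choice. The function name and the `conflict_ratio` threshold below are invented for this sketch:

```python
import math
from collections import Counter

def knn_set_prediction(train, query, k=3, conflict_ratio=0.75):
    """Toy k-NN that returns a set of plausible labels.

    `train` is a list of (features, label) pairs.  If no label reaches
    `conflict_ratio` of the k votes, every voted label is returned --
    a crude stand-in for the cautious (set-valued) decision rules that
    imprecise-probability theories provide.
    """
    neighbours = sorted(train, key=lambda t: math.dist(t[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    top_label, top_votes = votes.most_common(1)[0]
    if top_votes / k >= conflict_ratio:
        return {top_label}          # confident: a single label
    return set(votes)               # conflicting neighbours: a label set
```

With conflicting neighbours the caller receives `{"a", "b"}` rather than an arbitrary tie-break, mirroring the idea that conflict should yield a less committal answer.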

4.
Our interest is in the fusion of information from multiple sources when the information provided by the individual sources is expressed in terms of an imprecise uncertainty measure. We observe that the Dempster-Shafer belief structure provides a framework for the representation of a wide class of imprecise uncertainty measures. We then discuss the fusion of multiple Dempster-Shafer belief structures using the Dempster rule, and note the problems that can arise with this fusion method because of the normalization required in the face of conflicting focal elements. We then suggest some alternative approaches to fusing multiple belief structures that avoid the need for normalization.

5.
In the area of classification, C4.5 is a well-known algorithm widely used to design decision trees. In this algorithm, a pruning process is carried out to address the problem of over-fitting. A modification of C4.5, called Credal-C4.5, is presented in this paper. The new procedure uses a mathematical theory based on imprecise probabilities and uncertainty measures: Credal-C4.5 estimates the probabilities of the features and the class variable via imprecise probabilities, and it uses a new split criterion, called the Imprecise Information Gain Ratio, which applies uncertainty measures on convex sets of probability distributions (credal sets). In this manner, Credal-C4.5 builds trees for solving classification problems under the assumption that the training set is not fully reliable. We carried out several experimental studies comparing this new procedure with others and reached the following principal conclusion: on data with class noise, Credal-C4.5 obtains smaller trees and better performance than classic C4.5.
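The core quantity behind credal split criteria like the one above is the maximum entropy over a credal set, commonly obtained from the imprecise Dirichlet model (IDM). A minimal sketch, assuming the standard greedy "level the smallest counts" procedure and an IDM parameter `s` (the function name and tolerances are ours, not the paper's):

```python
import math

def idm_upper_entropy(counts, s=1.0):
    """Maximum Shannon entropy over the IDM credal set
    {p : p_i in [n_i/(N+s), (n_i+s)/(N+s)]} for class counts n_i.

    The extra mass s is spread greedily over the smallest counts,
    levelling them up step by step, which yields the most uniform
    distribution the credal set allows.
    """
    vals = [float(c) for c in counts]
    total = sum(vals) + s
    mass = s
    while mass > 1e-12:
        m = min(vals)
        low = [i for i, v in enumerate(vals) if abs(v - m) < 1e-12]
        bigger = [v for v in vals if v > m + 1e-12]
        # raise all current minima to the next level, or spend what is left
        need = ((min(bigger) - m) * len(low)) if bigger else mass
        step = min(mass, need) / len(low)
        for i in low:
            vals[i] += step
        mass -= min(mass, need)
    probs = [v / total for v in vals]
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

Note that a seemingly pure node with counts `[4, 0]` has precise entropy 0 but a strictly positive upper entropy here, which is exactly the penalty on unreliable small samples that makes credal trees more conservative than classic C4.5.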

6.
This paper presents a new architecture of a fuzzy decision tree based on fuzzy rules, the fuzzy rule based decision tree (FRDT), and provides a learning algorithm. In contrast with "traditional" axis-parallel decision trees, in which only a single feature (variable) is taken into account at each node, each node of the proposed tree involves a fuzzy rule spanning multiple features. Fuzzy rules are employed to produce leaves of high purity, and using multiple features per node helps minimize the size of the trees. The growth of the FRDT is realized by expanding an additional node composed of a mixture of data coming from different classes, which is the only non-leaf node of each layer. This gives rise to a new geometric structure endowed with linguistic terms, quite different from the "traditional" oblique decision trees endowed with hyperplanes as decision functions. A series of numeric studies is reported using data from UCI machine learning data sets. The comparison is carried out against "traditional" decision trees such as C4.5, LADtree, BFTree, SimpleCart, and NBTree. The results of statistical tests show that the proposed FRDT exhibits the best performance in terms of both accuracy and the size of the produced trees.

7.
Decision tree ensembles based on representative data
To obtain better decision tree ensembles, a method based on representative data is proposed from a data-oriented perspective, building on theoretical analysis. The method uses the Partitioning Around Medoids (PAM) algorithm to extract a representative training set from the original training set, trains multiple decision tree classifiers on it, and builds a decision tree ensemble from these classifiers. The method aims to select as little representative data as possible while training as good an ensemble as possible. Experimental results show that, using less representative data, the method achieves higher ensemble accuracy than Bagging and Boosting.

8.
Cost-sensitive decision trees based on relative waiting time
The relative waiting-time cost is first introduced and, together with the test cost, referred to as the tangible cost; split attributes are selected by the principle of maximally reducing the intangible cost (i.e., the misclassification cost) per unit of tangible cost. A relative-waiting-time cost-sensitive decision tree is then built by combining sequential and batch test strategies. Experimental results show that the method outperforms existing algorithms both in the reduction of misclassification cost and in the amount of tangible cost required, and the experiments also show that taking relative waiting time into account is necessary when building cost-sensitive decision trees.

9.
Decision-tree analysis of insurance customer churn
Retaining and attracting customers is key to an insurance company's competitiveness, yet insurers currently analyze customer churn only roughly or by experience. Applying attribute-oriented induction and the C4.5 decision-tree algorithm to basic customer information identifies the characteristics of churning customers and can help insurers improve customer relationships in a targeted way.

10.
A decision tree construction method based on dispersion
In decision tree construction, attribute selection affects the classification accuracy of the resulting tree. This paper discusses the limitations of the information-entropy-based method and the WMR method, and introduces the concept of the dispersion of the condition attribute set of an information system. Using this concept to select split attributes during tree construction, a dispersion-based decision tree algorithm, DSD, is designed; DSD overcomes the practical limitations of the WMR method. Experiments on UCI data sets show that the trees it builds are close in accuracy to those of the entropy-based method, while its time complexity is better.

11.
A decision tree generation algorithm based on decision support degree
Starting from the observation that condition attributes support the decision to different degrees, the concept of decision support degree is introduced, and a decision tree generation algorithm using it as heuristic information is proposed. Experimental analysis shows that, compared with traditional decision tree algorithms, this algorithm improves the structure of the tree and effectively raises classification accuracy.

12.
White (1986) and Snow (1991) have presented approximate characterizations of the posterior probabilities when the priors and conditionals are specified through linear constraint systems. This paper extends their results by developing alternative linear inequality approximations for posterior probabilities, with a particular focus on partial information expressed in terms of (i) bounds on the components of probability vectors and (ii) bounds on ratios of probabilities.
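For intuition on the kind of bounds involved, consider the simplest case: a two-hypothesis Bayes update with an interval prior and precise likelihoods. The posterior is monotone in the prior, so the interval endpoints are extremal (a toy sketch, not White's or Snow's construction; all identifiers are ours):

```python
def posterior_bounds(p_low, p_high, lik_h, lik_not_h):
    """Bounds on P(H | x) when only p = P(H) in [p_low, p_high] is known
    and the likelihoods P(x | H) = lik_h, P(x | not H) = lik_not_h are
    precise.

    P(H | x) = p * lik_h / (p * lik_h + (1 - p) * lik_not_h) is
    increasing in p, so evaluating at the interval endpoints gives the
    lower and upper posterior.
    """
    def post(p):
        num = p * lik_h
        return num / (num + (1.0 - p) * lik_not_h)
    return post(p_low), post(p_high)
```

With richer constraint systems (bounds on components or on ratios, as in the abstract) the extrema are no longer at two endpoints, which is what motivates the linear inequality approximations studied in the paper.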

13.
The major difficulty in large-vocabulary sign recognition lies in the huge search space caused by the large number of recognized classes; reducing recognition time without loss of accuracy is a challenging issue. In this paper, a fuzzy decision tree with heterogeneous classifiers is proposed for large-vocabulary sign language recognition. As each sign feature discriminates gestures differently, corresponding classifiers are used for hierarchical decisions on sign language attributes. A one-/two-handed classifier and a hand-shape classifier, both with little computational cost, are first used to progressively eliminate many impossible candidates; then a self-organizing feature map / hidden Markov model (SOFM/HMM) classifier, in which the SOFM serves as an implicit signer-independent feature extractor for the continuous HMM, is employed as a special component of the fuzzy decision tree to obtain the final results at the last non-leaf nodes, which contain only a few candidates. Experimental results on a large vocabulary of 5113 signs show that the proposed method reduces the recognition time by a factor of 11 and also improves the recognition rate by about 0.95% over a single SOFM/HMM.

14.
To address the problem of the "incomplete Self set" faced by computer-immunity-based intrusion detection systems, a master-slave Self-set construction algorithm based on decision trees is designed. The decision tree is introduced into the traditional negative selection algorithm: candidate detectors that survive immune-tolerance elimination are reclassified by the decision tree, and the candidate detectors that satisfy the preset conditions form a "slave Self set", realizing dynamic extension of the Self set. Finally, "matching contradiction" is used to eliminate unqualified elements from the slave Self set. Experimental analysis shows that the algorithm is effective and improves detector recognition performance.

15.
An improved decision tree algorithm based on rough sets and attribute-value clustering
Combining rough set theory with attribute-value clustering, decision trees are optimized according to three optimization principles. First, the reduction facility of rough set theory is used to compute the relative core, and a relative reduct is found with information entropy as the heuristic, which keeps the paths of the generated tree short and reduces the number of nodes. Second, when selecting a feature attribute, under the premise of maximal information-entropy gain, attribute values are clustered according to the dissimilarity distance between them so that their distribution approaches a unimodal one. Experiments on UCI data show that the number of nodes and the depth of the decision tree are greatly reduced.

16.
A growing number of studies in marketing and distribution channels has shown that the balance of power between manufacturers and retailers is shifting. In this paper, we consider a pricing decision problem in which two manufacturers compete to distribute differentiated but substitutable products through a common retailer under different power structures. The manufacturing costs, sales costs and demands are characterized by uncertain variables. Uncertainty theory and game-theory-based modeling approaches are employed to formulate the pricing decision problem with three different power structures in an uncertain environment, and the optimal pricing decisions on wholesale prices and retail markups are derived under the three possible scenarios. Numerical experiments examine the effects of the power structures on the equilibrium prices and profits. We find that if the sales cost is high, consumers enjoy lower prices when facing a powerful retailer, and the super retailer can also make the supply chain more efficient.

17.
《Knowledge》2007,20(8):695-702
This paper presents a new approach for inducing decision trees based on the Variable Precision Rough Set Model. The approach is aimed at handling uncertain information during decision tree induction, and it generalizes the rough-set-based approach to decision tree construction by allowing misclassification to some extent when classifying objects. Two concepts, the variable precision explicit region and the variable precision implicit region, are introduced, together with the process for inducing decision trees. The authors discuss the differences between the rough-set-based approaches and the fundamental entropy-based method, and report a comparison of the presented approach with the rough-set-based approach and the fundamental entropy-based method on data sets from the UCI Machine Learning Repository.
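The variable precision explicit region mentioned above can be illustrated with the classic β-positive region of variable precision rough sets: an equivalence block of the condition partition is admitted once its inclusion degree in a decision class reaches β, tolerating up to 1 − β misclassification. An illustrative sketch (identifiers are ours, not the paper's):

```python
def vp_explicit_region(blocks, decision_class, beta=0.8):
    """Variable-precision (beta-)positive region.

    `blocks` are the equivalence classes of the condition partition as
    sets of object ids.  A block joins the region when the fraction of
    its objects inside `decision_class` is at least beta, i.e. its
    objects can be classified into that class with at most (1 - beta)
    misclassification.
    """
    region = set()
    for block in blocks:
        inclusion = len(block & decision_class) / len(block)
        if inclusion >= beta:
            region |= block
    return region
```

With β = 1 this collapses to the classical (exact) positive region; lowering β is what lets the induction tolerate noisy or uncertain labels.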

18.
Induction of multiple fuzzy decision trees based on rough set technique
The integration of fuzzy sets and rough sets can lead to a hybrid soft-computing technique which has been applied successfully to many fields, such as machine learning, pattern recognition and image processing. The key to this soft-computing technique is how to construct and make use of fuzzy attribute reducts in fuzzy rough set theory. Given a fuzzy information system, we may find many fuzzy attribute reducts, each of which can make a different contribution to decision-making. If only one of them, perhaps the most important one, is selected to induce decision rules, useful information hidden in the other reducts is unavoidably lost. To make full use of the information provided by every individual fuzzy attribute reduct, this paper presents a novel induction of multiple fuzzy decision trees based on rough set technique. The induction consists of three stages. First, several fuzzy attribute reducts are found by a similarity-based approach; then a fuzzy decision tree is generated for each fuzzy attribute reduct according to the fuzzy ID3 algorithm. The fuzzy integral is finally used as a fusion tool to integrate the generated trees, combining the outputs of the multiple fuzzy decision trees into the final decision result. An illustration is given of the proposed fusion scheme. A numerical experiment on real data indicates that the proposed multiple-tree induction is superior to single-tree induction based on an individual reduct or on the entire feature set for learning problems with many attributes.

19.
20.
We give an overview of two approaches to probability theory where lower and upper probabilities, rather than probabilities, are used: Walley's behavioural theory of imprecise probabilities, and Shafer and Vovk's game-theoretic account of probability. We show that the two theories are more closely related than would be suspected at first sight, and we establish a correspondence between them that (i) has an interesting interpretation, and (ii) allows us to freely import results from one theory into the other. Our approach leads to an account of probability trees and random processes in the framework of Walley's theory. We indicate how our results can be used to reduce the computational complexity of dealing with imprecision in probability trees, and we prove an interesting and quite general version of the weak law of large numbers.
