Similar Articles
20 similar articles found (search time: 0 ms)
1.
Huang Qinghua, Zhang Fan, Li Xuelong. Multimedia Tools and Applications, 2018, 77(22): 29905-29918
This paper proposes an ultrasound breast tumor CAD system based on BI-RADS feature scoring and a decision tree algorithm. Because of the difficulty of biopsy...

2.
Learning decision tree for ranking
The decision tree is one of the most effective and widely used classification methods. However, many real-world applications require instances to be ranked by their probability of class membership. The area under the receiver operating characteristic curve (AUC) has recently been used as a measure of the ranking performance of learning algorithms. In this paper, we present two novel class probability estimation algorithms to improve the ranking performance of decision trees. Instead of estimating the probability of class membership by simple voting at the leaf into which the test instance falls, our algorithms use similarity-weighted voting and naive Bayes. We design empirical experiments to verify that our new algorithms significantly outperform C4.4, the recent decision tree ranking algorithm, in terms of AUC.
Liangxiao Jiang
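As a rough illustration of the similarity-weighted voting idea (the paper's exact similarity function and tree integration may differ; all names below are hypothetical), a leaf's class probabilities can be estimated like this:

```python
# Minimal sketch of similarity-weighted class probability estimation at a
# decision tree leaf. The similarity function here (fraction of matching
# attribute values) is an assumption for the sketch.
from collections import defaultdict

def similarity(x, y):
    """Fraction of attribute values the two instances share."""
    return sum(a == b for a, b in zip(x, y)) / len(x)

def leaf_class_probs(leaf_instances, leaf_labels, test_instance):
    """Estimate P(class | leaf) with similarity-weighted voting
    instead of simple (unweighted) voting."""
    weights = defaultdict(float)
    for x, label in zip(leaf_instances, leaf_labels):
        weights[label] += similarity(x, test_instance)
    total = sum(weights.values()) or 1.0
    return {label: w / total for label, w in weights.items()}

# Toy usage: two attributes, binary classes.
instances = [(1, 0), (1, 1), (0, 1)]
labels = ["pos", "pos", "neg"]
print(leaf_class_probs(instances, labels, (1, 1)))
```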

3.
A decision tree classification algorithm based on an association-degree function
韩松来, 张辉, 周华平. 《计算机应用》, 2005, 25(11): 2655-2657
To overcome the multi-value bias problem common to decision tree algorithms, a new decision tree algorithm based on an association-degree function, the AF algorithm, is proposed, and the principle by which it overcomes the bias is analyzed theoretically. Experiments show that, compared with the ID3 algorithm, the AF algorithm not only overcomes the multi-value bias problem but also maintains high classification accuracy.

4.
5.
This paper focuses on improving decision tree induction algorithms when a particular kind of tie appears during rule generation for specific training datasets. The tie occurs when the records in a leaf node contain equal proportions of the target class outcomes, so that majority voting cannot be applied. To resolve this exception, we propose to base the prediction on a naive Bayes (NB) estimate, k-nearest neighbours (k-NN) or association rule mining (ARM). The other features used for splitting the parent nodes are also taken into consideration.
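A minimal sketch of the tie-breaking step, assuming scikit-learn is available; the fallback shown is k-NN (the paper also considers NB and ARM), and all function names are illustrative:

```python
# Illustrative tie-breaking at a leaf: when class counts are equal and
# majority voting is undefined, fall back to a secondary estimator fitted
# on the leaf's records.
from collections import Counter
from sklearn.neighbors import KNeighborsClassifier

def predict_at_leaf(leaf_X, leaf_y, test_x, k=3):
    counts = Counter(leaf_y).most_common()
    if len(counts) == 1 or counts[0][1] > counts[1][1]:
        return counts[0][0]          # clear majority: vote as usual
    # Tie: equal proportions of the target classes in this leaf.
    fallback = KNeighborsClassifier(n_neighbors=min(k, len(leaf_y)))
    fallback.fit(leaf_X, leaf_y)
    return fallback.predict([test_x])[0]

leaf_X = [[0.0], [0.2], [1.0], [1.2]]
leaf_y = ["a", "a", "b", "b"]        # tied 2-2, so voting alone fails
print(predict_at_leaf(leaf_X, leaf_y, [0.1]))
```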

6.
In this paper, a co-processor for hardware-aided decision tree induction using an evolutionary approach (EFTIP) is proposed. EFTIP provides hardware acceleration of the fitness evaluation task, which the paper shows to be the execution-time bottleneck. The EFTIP co-processor can significantly reduce the execution time of a novel algorithm for full decision tree induction using an evolutionary approach (EFTI) when used to accelerate fitness evaluation. A comparison of the HW/SW EFTI implementation with a pure software implementation suggests that the proposed HW/SW architecture offers substantial decision tree induction speedups on selected benchmark datasets from the standard UCI machine learning repository.

7.
Decision trees are off-the-shelf predictive models, and they have been used successfully as base learners in ensemble learning. To construct a strong classifier ensemble, the individual classifiers should be accurate and diverse. However, measuring diversity remains elusive despite many attempts. We conjecture that a deficiency of previous diversity measures is that they consider only behavioral diversity, i.e., how the classifiers behave when making predictions, neglecting the fact that classifiers may differ even when they make the same predictions. Based on this recognition, we advocate considering structural diversity in addition to behavioral diversity, and propose the TMD (tree matching diversity) measure for decision trees. To investigate the usefulness of TMD, we empirically evaluate selective ensemble approaches on decision forests, incorporating different diversity measures. Our results validate that considering structural and behavioral diversity together yields stronger ensembles. This may open a new direction for designing better diversity measures and ensemble methods.
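To make the behavioral/structural distinction concrete, here is a hedged toy comparison; this is not the paper's TMD measure, and the structural proxy below is deliberately crude:

```python
# Two trees can agree on every prediction (zero behavioral diversity) while
# splitting on different features (nonzero structural difference).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def behavioral_diversity(t1, t2, X):
    """Disagreement rate on a sample: 0 means identical predictions."""
    return float(np.mean(t1.predict(X) != t2.predict(X)))

def structural_distance(t1, t2):
    """Crude structural proxy: L1 distance between per-feature split counts."""
    def usage(t):
        feats = t.tree_.feature              # negative values mark leaf nodes
        return np.bincount(feats[feats >= 0], minlength=t.n_features_in_)
    return int(np.abs(usage(t1) - usage(t2)).sum())

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
a = DecisionTreeClassifier(random_state=1).fit(X, y)
b = DecisionTreeClassifier(random_state=2, max_features=2).fit(X, y)
# The trees may agree on (nearly) every prediction yet split differently:
print(behavioral_diversity(a, b, X), structural_distance(a, b))
```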

8.
《Knowledge》, 2002, 15(1-2): 37-43
The decision tree is a divide-and-conquer classification method used in machine learning. Most pruning methods for decision trees minimize a classification error rate. In uncertain domains, some sub-trees that do not decrease the error rate can still be relevant for pointing out populations of specific interest or for giving a representation of a large data file. A new pruning method (called DI pruning) is presented here. It takes the complexity of sub-trees into account and is able to keep sub-trees whose leaves yield relevant decision rules, even though they do not increase classification accuracy. DI pruning also makes it possible to assess the quality of the data used for the knowledge discovery task. In practice, this method is implemented in the UnDeT software.

9.
Background: The application of microarray data to cancer classification is important, and researchers have analyzed gene expression data using various computational intelligence methods. Purpose: We propose a novel gene selection method that uses particle swarm optimization combined with a decision tree classifier to select, from the thousands of genes in the data, a small number of informative genes that can help identify cancers. Conclusion: Statistical analysis on 11 gene expression cancer datasets reveals that the proposed method outperforms other popular classifiers, i.e., the support vector machine, self-organizing map, back-propagation neural network, and C4.5 decision tree.

10.
Cancer classification through gene expression data analysis has produced remarkable results and has indicated that gene expression assays could significantly aid the development of efficient cancer diagnosis and classification platforms. However, cancer classification based on DNA array data remains a difficult problem. The main challenge is the overwhelming number of genes relative to the number of training samples, which implies a large number of irrelevant genes to deal with. Another challenge is the noise inherent in the data set, which makes accurate classification harder when the sample size is small. We apply genetic algorithms (GAs) with an initial solution provided by t-statistics, called t-GA, to select a group of relevant genes from cancer microarray data. A decision-tree-based cancer classifier is then built on the selected genes. The performance of this approach is evaluated against other gene selection methods on publicly available gene expression data sets. Experimental results indicate that t-GA performs best among the gene selection methods compared. The Z-score figure also shows that some genes are consistently preferentially chosen by t-GA in each data set.
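A small sketch of the t-statistic pre-ranking that could seed the GA's initial solution; the GA itself and the paper's exact seeding scheme are omitted, and names and data are illustrative:

```python
# Rank genes (columns) by two-sample t-statistic between the two classes;
# top-ranked genes would form a candidate initial chromosome for the GA.
import numpy as np
from scipy import stats

def t_rank_genes(X, y, top_k=20):
    """Return indices of the top_k genes by |two-sample t| between classes."""
    g0, g1 = X[y == 0], X[y == 1]
    t, _ = stats.ttest_ind(g0, g1, axis=0, equal_var=False)
    order = np.argsort(-np.abs(t))
    return order[:top_k]

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 500))        # 60 samples, 500 "genes"
y = np.repeat([0, 1], 30)
X[y == 1, :5] += 2.0                  # make the first five genes informative
print(t_rank_genes(X, y, top_k=5))    # should recover genes 0..4
```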

11.
We propose a method for hierarchical clustering based on the decision tree approach. As with a supervised decision tree, the unsupervised decision tree is interpretable in terms of rules: each leaf node represents a cluster, and the path from the root node to a leaf node represents a rule. The branching decision at each node of the tree is based on the clustering tendency of the data available at that node. We present four different measures for selecting the most appropriate attribute for splitting the data at each branching (decision) node, and two different algorithms for performing the split. We provide a theoretical basis for the approach and demonstrate the capability of the unsupervised decision tree to segment various data sets. We also compare the performance of the unsupervised decision tree with that of the supervised one.

12.
Real-life datasets are often imbalanced; that is, significantly more training samples are available for some classes than for others, so the conventional aim of maximizing overall classification accuracy is not appropriate for such problems. Various approaches introduced in the literature to deal with imbalanced datasets are typically based on oversampling, undersampling or cost-sensitive classification. In this paper, we introduce an effective ensemble of cost-sensitive decision trees for imbalanced classification. Base classifiers are constructed according to a given cost matrix but are trained on random feature subspaces to ensure sufficient diversity among the ensemble members. We employ an evolutionary algorithm for simultaneous classifier selection and assignment of committee member weights for the fusion process. The proposed algorithm is evaluated on a variety of benchmark datasets and is confirmed to improve recognition of the minority class, to outperform other state-of-the-art algorithms, and hence to be a useful and effective approach for dealing with imbalanced datasets.
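A hedged sketch of the core construction, assuming scikit-learn: cost-sensitive trees on random feature subspaces, with the paper's evolutionary selection/weighting step replaced by plain vote averaging for brevity:

```python
# Cost-sensitive subspace ensemble: each tree sees a random half of the
# features; class costs enter via class_weight. All names are illustrative.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_subspace_ensemble(X, y, cost, n_trees=25, subspace=0.5, seed=0):
    """Train cost-sensitive trees, each on a random feature subspace."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    members = []
    for _ in range(n_trees):
        feats = rng.choice(d, size=max(1, int(subspace * d)), replace=False)
        tree = DecisionTreeClassifier(class_weight=cost,
                                      random_state=int(rng.integers(1 << 30)))
        members.append((feats, tree.fit(X[:, feats], y)))
    return members

def predict_ensemble(members, X):
    """Unweighted vote averaging (the paper evolves member weights instead)."""
    votes = np.mean([t.predict(X[:, f]) for f, t in members], axis=0)
    return (votes >= 0.5).astype(int)

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 10))
y = (rng.random(300) < 0.15).astype(int)                  # imbalanced labels
ens = fit_subspace_ensemble(X, y, cost={0: 1.0, 1: 5.0})  # minority costs more
print(predict_ensemble(ens, X[:5]))
```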

13.
A decision tree construction algorithm based on the granularity quotient
Based on rough set theory and the principle that knowledge relations have a granular nature, a quality measure that predicts and expresses the decision attribute set from the degree of association between the condition attribute set and the decision attribute set is derived, and from it the concept of the granularity quotient is defined. Building on the granularity principle of knowledge roughness and on the decision tree method, the granularity quotient is applied to decision tree construction, a new method for building decision trees is proposed, and its advantages are analyzed in detail. A case study shows that the proposed granularity-quotient decision tree construction algorithm is reliable and effective, providing a feasible approach for further research on granular computing of knowledge. The relations between different granularity worlds, however, were not studied and remain future work.

14.
An improved attribute selection method for decision tree classification
The basic principle and implementation steps of the ID3 algorithm, along with the strengths and weaknesses of two existing improved classification algorithms, are analyzed. To address ID3's bias toward multi-valued attributes and the shortcomings of the two improved algorithms in classification time and accuracy, a new attribute selection scheme is proposed and then optimized mathematically. Experiments show that the optimized scheme overcomes ID3's value bias and outperforms both ID3 and the two existing improved algorithms in classification time and accuracy.
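The paper's optimized selection scheme is not reproduced here, but the bias it targets can be illustrated with the classic gain-ratio correction, which penalizes many-valued attributes relative to raw information gain:

```python
# ID3's raw information gain favours many-valued attributes (an ID-like
# attribute splits perfectly); gain ratio divides by the split information
# to counter this. Data below are toy values for illustration only.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels):
    n = len(labels)
    rem = sum(cnt / n * entropy([l for v, l in zip(values, labels) if v == key])
              for key, cnt in Counter(values).items())
    return entropy(labels) - rem

def gain_ratio(values, labels):
    iv = entropy(values)             # split information (intrinsic value)
    return info_gain(values, labels) / iv if iv > 0 else 0.0

labels = ["y", "y", "n", "n"]
binary_attr = ["a", "a", "b", "b"]
id_like_attr = ["1", "2", "3", "4"]  # unique per record
# Raw gain ties at 1.0, but gain ratio penalizes the many-valued attribute:
print(info_gain(id_like_attr, labels), gain_ratio(id_like_attr, labels))
print(info_gain(binary_attr, labels), gain_ratio(binary_attr, labels))
```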

15.
With the rise of federated learning, gradient boosting decision trees (GBDT), a traditional machine learning method, have increasingly been applied in federated settings to achieve good classification performance. Existing horizontal federated learning models for GBDT suffer from accuracy degradation on non-independent and identically distributed (non-IID) data, information leakage, and high communication cost. To address these issues, a federated GBDT for non-IID datasets (nFL-GBDT) is proposed. First, locality-sensitive hashing (LSH) is used to compute similar samples across the participants, and the first tree is built with weighted gradients. Second, a trusted third party computes the global leaf weights, requiring only one round of communication, to update the tree model. Finally, experiments show that the algorithm protects the privacy of the raw data and has lower communication cost than simFL and FederBoost. On three public datasets partitioned by imbalance ratio, its accuracy improves over Individual, TFL and F-GBDT-G by 3.53%, 5.46% and 4.43%, respectively.
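As an illustration of the LSH building block only (the full federated protocol, weighted gradients, trusted third party and leaf aggregation are omitted), sign-random-projection hashing lets parties compare samples via short binary signatures:

```python
# Sign-random-projection LSH: each sample is hashed to n_bits by the signs
# of projections onto shared random hyperplanes; matching bits approximate
# angular similarity without exchanging raw feature vectors.
import numpy as np

def lsh_signatures(X, n_bits=16, seed=0):
    """Hash each row of X to an n_bits binary signature."""
    rng = np.random.default_rng(seed)     # seed must be shared by all parties
    planes = rng.normal(size=(X.shape[1], n_bits))
    return (X @ planes > 0).astype(np.uint8)

def signature_similarity(sig_a, sig_b):
    """Fraction of matching bits approximates cosine similarity."""
    return float(np.mean(sig_a == sig_b))

rng = np.random.default_rng(1)
party_a = rng.normal(size=(4, 8))
party_b = party_a + 0.05 * rng.normal(size=(4, 8))  # nearly identical samples
sa, sb = lsh_signatures(party_a), lsh_signatures(party_b)
print([signature_similarity(sa[i], sb[i]) for i in range(4)])  # high agreement
```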

16.
This paper proposes random feature weights (RFW), a method for constructing ensembles of decision trees. The method is similar to Random Forest in that both introduce randomness into the construction of the decision trees. But whereas Random Forest considers only a random subset of attributes at each node, RFW considers all of them; its source of randomness is a weight associated with each attribute. All the nodes in a tree use the same set of random weights, but the set differs from tree to tree, so the importance given to the attributes differs in each tree, which differentiates their construction. The method is compared to Bagging, Random Forest, Random Subspaces, AdaBoost and MultiBoost, with favourable results for the proposed method, especially on noisy data sets. RFW can also be combined with these methods; generally, the combination of RFW with another method produces better results than the combined methods alone. Kappa-error diagrams and kappa-error movement diagrams are used to analyse the relationship between the accuracy of the base classifiers and their diversity.
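A minimal sketch of the RFW idea, with the split criterion replaced by a stand-in vector of gains (the published method builds complete trees; names here are illustrative):

```python
# Every tree draws one random weight per attribute and multiplies its split
# criterion by that weight: all nodes of a tree share the weights, but each
# tree draws its own, so trees rank the same attributes differently.
import numpy as np

def rfw_pick_attribute(gains, weights):
    """Choose the split attribute by weighted merit instead of raw merit."""
    return int(np.argmax(np.asarray(gains) * weights))

rng = np.random.default_rng(0)
n_attrs = 5
gains_at_node = [0.30, 0.28, 0.10, 0.05, 0.29]   # e.g. information gains
for tree_id in range(3):
    w = rng.random(n_attrs)    # one weight vector per tree, reused at nodes
    print(tree_id, rfw_pick_attribute(gains_at_node, w))
```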

17.
The minimum ratio spanning tree problem asks for the spanning tree that minimizes an objective given by the ratio of two linear functions, for example the spanning tree minimizing the ratio of total cost to total benefit. When the sign of the denominator is unrestricted, the problem is NP-hard. After analyzing the mathematical properties of the minimum ratio spanning tree, a competitive decision algorithm for the problem is proposed. To keep the algorithm from getting trapped in local optima, an edge_exchange operation is used to enlarge the search range. To verify the algorithm's effectiveness, test data were generated with both uncorrelated and correlated strategies, and the algorithm was implemented in Delphi 7.0.
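For contrast with the NP-hard general case, the positive-denominator case admits a classical Dinkelbach-style parametric solution built on ordinary MSTs. This is not the paper's competitive decision algorithm; it is a known baseline, sketched here assuming networkx is available:

```python
# Dinkelbach-style parametric search: repeatedly solve an ordinary MST on
# edge weights c - lam*b and update lam to the achieved ratio; converges
# when every b > 0. Edge attributes "c" (cost) and "b" (benefit) are toy.
import networkx as nx

def min_ratio_spanning_tree(G, tol=1e-9):
    """Minimize (sum of c) / (sum of b) over spanning trees; needs b > 0."""
    lam = 0.0
    while True:
        for u, v, d in G.edges(data=True):
            d["w"] = d["c"] - lam * d["b"]
        T = nx.minimum_spanning_tree(G, weight="w")
        c = sum(d["c"] for *_, d in T.edges(data=True))
        b = sum(d["b"] for *_, d in T.edges(data=True))
        if abs(c / b - lam) < tol:
            return T, c / b
        lam = c / b

G = nx.Graph()
for u, v, c, b in [(0, 1, 4, 1), (1, 2, 2, 2), (0, 2, 3, 1), (2, 3, 5, 4)]:
    G.add_edge(u, v, c=c, b=b)
tree, ratio = min_ratio_spanning_tree(G)
print(sorted(tree.edges()), round(ratio, 4))
```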

18.
《Knowledge》, 2006, 19(7): 511-515
Decision trees are useful for obtaining a proper set of rules from a large number of instances. However, they have difficulty capturing the relationships between continuous-valued data points. We propose in this paper a novel algorithm, Self-adaptive NBTree, which induces a hybrid of a decision tree and naive Bayes. The Bayes measure used to construct the tree can directly handle continuous attributes and automatically find the most appropriate discretization boundaries and number of intervals. The naive Bayes nodes help solve the overgeneralization and overspecialization problems often seen in decision trees. Experimental results on a variety of natural domains indicate that Self-adaptive NBTree has clear advantages in generalization ability.

19.
With the developments in information technology, fraud is spreading all over the world, resulting in huge financial losses. Though fraud-prevention mechanisms such as CHIP&PIN have been developed for credit card systems, they do not prevent the most common fraud types, such as fraudulent credit card use over virtual POS (point of sale) terminals or mail orders, so-called online credit card fraud. As a result, fraud detection becomes an essential tool and probably the best way to stop such fraud. In this study, a new cost-sensitive decision tree approach that minimizes the sum of misclassification costs when selecting the splitting attribute at each non-terminal node is developed, and its performance is compared with well-known traditional classification models on a real-world credit card data set. In this approach, misclassification costs are taken as varying. The results show that this cost-sensitive decision tree algorithm outperforms the existing well-known methods on the given problem set, not only with respect to standard performance metrics such as accuracy and true positive rate but also on a newly defined cost-sensitive metric specific to the credit card fraud detection domain. Accordingly, financial losses due to fraudulent transactions can be decreased further by implementing this approach in fraud detection systems.
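A toy sketch of the central idea, choosing the split that minimizes total misclassification cost rather than maximizing an impurity-based gain; the cost matrix and data are invented for illustration:

```python
# Pick the splitting attribute whose children have the lowest total
# misclassification cost, where each child predicts its own cheapest class.
from collections import Counter, defaultdict

# cost[actual][predicted]; missing a fraud (actual=1, predicted=0) is costly.
COST = {0: {0: 0.0, 1: 1.0}, 1: {0: 20.0, 1: 0.0}}

def node_cost(labels):
    """Cost of a node that predicts its own cheapest class."""
    counts = Counter(labels)
    return min(sum(COST[a][p] * n for a, n in counts.items()) for p in COST)

def split_cost(values, labels):
    """Total cost of the children induced by splitting on this attribute."""
    groups = defaultdict(list)
    for v, l in zip(values, labels):
        groups[v].append(l)
    return sum(node_cost(g) for g in groups.values())

labels = [0, 0, 0, 1, 1, 0, 0, 1]
attrs = {"country": ["x", "x", "y", "y", "y", "x", "y", "y"],
         "channel": ["web", "pos", "web", "web", "web", "pos", "pos", "web"]}
best = min(attrs, key=lambda a: split_cost(attrs[a], labels))
print({a: split_cost(v, labels) for a, v in attrs.items()}, "->", best)
```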

20.
A robust and effective improved decision tree model
A robust and effective improved decision tree model, AID3, is proposed. Based on the classic ID3 decision tree model, it improves attribute selection: an attribute-priority association degree parameter is introduced to compute a corrected information gain at each node, and the attribute with the highest corrected gain is chosen as the test attribute for the current node. Experiments show that the AID3 model improves classification accuracy while effectively enhancing the model's robustness.
