1.
Multimedia Tools and Applications - This paper proposes an ultrasound breast tumor CAD system based on BI-RADS features scoring and decision tree algorithm. Because of the difficulty of biopsy...
2.
Learning decision tree for ranking (cited by: 1 in total; self-citations: 3; citations by others: 1)
Decision tree is one of the most effective and widely used methods for classification. However, many real-world applications
require instances to be ranked by the probability of class membership. The area under the receiver operating characteristics
curve, simply AUC, has been recently used as a measure for ranking performance of learning algorithms. In this paper, we present
two novel class probability estimation algorithms to improve the ranking performance of decision trees. Instead of estimating
the probability of class membership using simple voting at the leaf into which the test instance falls, our algorithms use
similarity-weighted voting and naive Bayes. We design empirical experiments to verify that our new algorithms significantly
outperform the recent decision tree ranking algorithm C4.4 in terms of AUC.
Liangxiao Jiang
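The abstract above uses AUC to measure how well class probability estimates rank instances. As a minimal sketch, assuming binary labels coded as 1 (positive) and 0 (negative), AUC can be computed as the fraction of positive/negative pairs in which the positive instance receives the higher score, with ties counted as half. The function name `auc` and the sample scores are illustrative and do not come from the paper.

```python
def auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative.

    scores: estimated probabilities of the positive class.
    labels: 1 for positive instances, 0 for negative instances.
    Tied scores count as half a win, as in the Mann-Whitney U statistic.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative instance")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Illustrative scores only: three of the four positive/negative pairs are ordered correctly.
example_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

Simple voting at a leaf gives every instance in that leaf the same score, so such instances are all tied in the ranking. That is one reason finer-grained estimates can change AUC without changing classification accuracy.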
3.
《Theoretical computer science》2003,292(2):387-416
4.
Real-life datasets are often imbalanced, that is, there are significantly more training samples available for some classes than for others, and consequently the conventional aim of maximizing overall classification accuracy is not appropriate when dealing with such problems. Various approaches have been introduced in the literature to deal with imbalanced datasets, and are typically based on oversampling, undersampling or cost-sensitive classification. In this paper, we introduce an effective ensemble of cost-sensitive decision trees for imbalanced classification. Base classifiers are constructed according to a given cost matrix, but are trained on random feature subspaces to ensure sufficient diversity of the ensemble members. We employ an evolutionary algorithm for simultaneous classifier selection and assignment of committee member weights for the fusion process. Our proposed algorithm is evaluated on a variety of benchmark datasets, and is confirmed to lead to improved recognition of the minority class, to be capable of outperforming other state-of-the-art algorithms, and hence to represent a useful and effective approach for dealing with imbalanced datasets.
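The following sketch illustrates two ideas mentioned in the abstract above: weighted fusion of committee members and a decision driven by a cost matrix. It is not the paper's algorithm. The weights are simply given here rather than found by an evolutionary algorithm, and the function names, probabilities and costs are hypothetical.

```python
def fuse(member_probabilities, weights):
    """Weighted average of the class-probability vectors of the ensemble members."""
    total = sum(weights)
    n_classes = len(member_probabilities[0])
    return [
        sum(w * p[c] for w, p in zip(weights, member_probabilities)) / total
        for c in range(n_classes)
    ]


def min_expected_cost_class(probabilities, cost_matrix):
    """Pick the prediction with the lowest expected misclassification cost.

    cost_matrix[i][j] is the (assumed) cost of predicting class j when the
    true class is i; the diagonal is normally zero.
    """
    n = len(probabilities)
    expected = [
        sum(probabilities[i] * cost_matrix[i][j] for i in range(n))
        for j in range(n)
    ]
    best = min(range(n), key=lambda j: expected[j])
    return best, expected


# Class 0 = majority, class 1 = minority; missing a minority case costs 10x more.
costs = [[0, 1], [10, 0]]
fused = fuse([[0.9, 0.1], [0.7, 0.3]], weights=[1.0, 1.0])
decision, expected_costs = min_expected_cost_class(fused, costs)
```

With these illustrative numbers the fused estimate favors the majority class, yet the cost matrix moves the decision to the minority class. This is the effect a cost-sensitive classifier relies on when data are imbalanced.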
5.
This paper proposes a method for constructing ensembles of decision trees, random feature weights (RFW). The method is similar to Random Forest: both introduce randomness into the construction of the decision trees. In Random Forest only a random subset of attributes is considered for each node, but RFW considers all of them. The source of randomness is a weight associated with each attribute. All nodes in a tree use the same set of random weights, which differs from the sets of weights used in other trees. The importance given to the attributes is therefore different in each tree, and that differentiates their construction. The method is compared to Bagging, Random Forest, Random-Subspaces, AdaBoost and MultiBoost, obtaining favourable results for the proposed method, especially when using noisy data sets. RFW can be combined with these methods. Generally, the combination of RFW with another method produces better results than the constituent methods used alone. Kappa-error diagrams and Kappa-error movement diagrams are used to analyse the relationship between the accuracies of the base classifiers and their diversity.
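A minimal sketch of the idea described above, under stated assumptions: the split merit is information gain, each tree draws one uniform random weight per attribute, and the merit of every attribute is multiplied by its weight before the best split is chosen. The abstract does not specify the merit function or the weight distribution, so both are illustrative choices, as are all function names.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """Reduction in label entropy obtained by splitting on one attribute index."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def draw_tree_weights(n_attributes, rng):
    """One random weight per attribute, drawn once and reused at every node of a tree."""
    return [rng.random() for _ in range(n_attributes)]


def choose_split(rows, labels, weights):
    """All attributes are candidates; each merit is scaled by the tree's weight."""
    merits = [w * information_gain(rows, labels, a) for a, w in enumerate(weights)]
    return max(range(len(weights)), key=lambda a: merits[a])


rows = [(0, 0), (0, 0), (1, 0), (1, 1)]
labels = [0, 0, 1, 1]
unweighted_choice = choose_split(rows, labels, [1.0, 1.0])  # plain information gain
weighted_choice = choose_split(rows, labels, [0.1, 1.0])    # attribute 0 down-weighted
tree_weights = draw_tree_weights(2, random.Random(0))       # weights for one tree
```

With equal weights the perfectly predictive attribute 0 wins. A low weight on attribute 0 lets attribute 1 win instead, which is how different weight vectors can lead to structurally different trees.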
6.
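A decision tree construction algorithm based on the granularity quotient (cited by: 1 in total; self-citations: 0; citations by others: 1)
Building on rough set theory and the principle that knowledge relations have granular properties, this paper uses the degree of association between the condition attribute set and the decision attribute set as a quality measure for predicting and expressing the decision attribute set, and on that basis defines the concept of the granularity quotient. Drawing on the granularity principle of knowledge roughness and taking the decision tree method as its theoretical foundation, the paper applies the granularity quotient to decision tree induction, proposes a new method for constructing decision trees, and analyzes the advantages of the algorithm in detail. A case study shows that the proposed granularity-quotient-based decision tree construction algorithm is reliable and effective, and that it offers a feasible approach for further research on granular computing of knowledge. The relationships between worlds of different granularity were not studied, however, and remain a topic for further work.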
7.
We propose a method for hierarchical clustering based on the decision tree approach. Like a supervised decision tree, the unsupervised decision tree is interpretable in terms of rules, i.e., each leaf node represents a cluster, and the path from the root node to a leaf node represents a rule. The branching decision at each node of the tree is made based on the clustering tendency of the data available at the node. We present four different measures for selecting the most appropriate attribute to be used for splitting the data at every branching node (or decision node), and two different algorithms for splitting the data at each decision node. We provide a theoretical basis for the approach and demonstrate the capability of the unsupervised decision tree for segmenting various data sets. We also compare the performance of the unsupervised decision tree with that of the supervised one.
8.
With developments in information technology, fraud is spreading all over the world, resulting in huge financial losses. Though fraud prevention mechanisms such as CHIP&PIN have been developed for credit card systems, these mechanisms do not prevent the most common fraud types, such as fraudulent credit card usage over virtual POS (Point Of Sale) terminals or mail orders, so-called online credit card fraud. As a result, fraud detection becomes the essential tool and probably the best way to stop such fraud types. In this study, a new cost-sensitive decision tree approach, which minimizes the sum of misclassification costs while selecting the splitting attribute at each non-terminal node, is developed, and the performance of this approach is compared with well-known traditional classification models on a real-world credit card data set. In this approach, misclassification costs are taken as varying. The results show that this cost-sensitive decision tree algorithm outperforms the existing well-known methods on the given problem set, not only with respect to well-known performance metrics such as accuracy and true positive rate, but also with respect to a newly defined cost-sensitive metric specific to the credit card fraud detection domain. Accordingly, financial losses due to fraudulent transactions can be decreased further by implementing this approach in fraud detection systems.
9.
10.
11.
12.
Alzu’bi Amal, Najadat Hassan, Doulat Wesam, Al-Shari Osama, Zhou Leming 《Multimedia Tools and Applications》2021,80(9):13787-13800
Multimedia Tools and Applications - Breast cancer is one of the most common types of cancer among Jordanian women. Recently, healthcare organizations in Jordan have adopted electronic health...
13.
In this paper, we present a novel computer-aided diagnostic (CAD) system based on the Breast Imaging Reporting and Data System (BI-RADS) terminology scores of screening ultrasonography (US). The decision tree algorithm is adopted to analyze the BI-RADS information to differentiate between malignant and benign breast tumors. Although ultrasonography CAD systems have been developed for decades, there are still some problems in clinical practice. Previous CAD systems are opaque to clinicians and cannot process ultrasound images from different ultrasound machines. This study proposes a novel CAD system utilizing the BI-RADS scoring standard and the Classification and Regression Tree (CART) algorithm to overcome these two problems. The original dataset consists of 1300 ultrasound breast images. Three well-experienced clinicians evaluated all of the images according to the BI-RADS feature scoring standard. Subsequently, each image could be transformed into a 25 × 1 vector. The CART algorithm was finally used to classify these vectors. In the experiments, we used the oversampling method to balance the number of malignant samples and benign samples. 5-fold cross validation was employed to evaluate the performance of the system. The accuracy reached 94.58%, the specificity was 98.84%, the sensitivity was 90.80%, the positive predictive value (PPV) was 98.91% and the negative predictive value (NPV) was 90.56%. The experimental results show that the proposed system can obtain sufficient performance in breast diagnosis and can effectively recognize benign breast tumors in BI-RADS 3.
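The five figures reported above all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions, treating malignant as the positive class. The counts in the example are made up for illustration and are not the paper's data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from confusion-matrix counts.

    tp/fn: malignant cases classified as malignant/benign.
    tn/fp: benign cases classified as benign/malignant.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }


# Hypothetical counts for 100 malignant and 100 benign test images.
metrics = diagnostic_metrics(tp=90, fp=5, tn=95, fn=10)
```

In a 5-fold cross validation, one common convention is to sum the counts over the five held-out folds before computing the metrics. Whether the paper pooled counts or averaged per-fold metrics is not stated in the abstract.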
14.
Database classification suffers from two well-known difficulties, i.e., the high dimensionality and non-stationary variations within the large historic data. This paper presents a hybrid classification model that integrates a case-based reasoning technique, a fuzzy decision tree (FDT), and genetic algorithms (GAs) to construct a decision-making system for data classification in various database applications. The model is mainly based on the idea that the historic database can be transformed into a smaller case base together with a group of fuzzy decision rules. As a result, the model can respond more accurately to the data currently being classified, using the inductions made by these smaller case-based fuzzy decision trees. Hit rate is applied as a performance measure, and the effectiveness of our proposed model is demonstrated experimentally in comparison with other approaches on different database classification applications. The average hit rate of our proposed model is the highest among the compared approaches.
15.
A new node splitting measure, termed the distinct class based splitting measure (DCSM), is proposed in this paper for decision tree induction; it gives importance to the number of distinct classes in a partition. The measure is the product of two terms. The first term deals with the number of distinct classes in each child partition. As the number of distinct classes in a partition increases, this first term increases, and thus purer partitions are preferred. The second term decreases when there are more examples of a class compared to the total number of examples in the partition. The combination thus still favors purer partitions. It is shown that the DCSM satisfies two important properties that a split measure should possess, viz. convexity and well-behavedness. Results obtained over several datasets indicate that decision trees induced based on the DCSM provide better classification accuracy and are more compact (have fewer nodes) than trees induced using two of the most popular node splitting measures presently in use.
16.
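This paper briefly analyzes the basic principle of the ID3 algorithm, its main shortcomings, and the strengths and weaknesses of several existing improved algorithms. To address the main shortcoming of ID3, namely its bias toward selecting multi-valued attributes, the algorithm is improved to some extent using rough set theory and related mathematical results. Theoretical analysis and experimental results show that the improved algorithm not only resolves the multi-value bias of ID3 reasonably well but also greatly simplifies the computation, and that it noticeably improves classification accuracy and execution efficiency.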
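The multi-value bias mentioned above can be reproduced on a toy data set. In the sketch below, an attribute with a unique value per row (a record ID) receives the maximum information gain even though it cannot generalize. For comparison the sketch uses the gain ratio from C4.5 as one well-known remedy. This is a substitute chosen for illustration, not the rough-set-based improvement the paper proposes, and the data are invented.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def information_gain(rows, labels, attribute):
    """ID3 split criterion: label entropy minus weighted entropy after the split."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(rows, labels, attribute):
    """C4.5-style criterion: gain divided by the entropy of the attribute itself."""
    split_info = entropy([row[attribute] for row in rows])
    if split_info == 0:
        return 0.0
    return information_gain(rows, labels, attribute) / split_info


# Attribute 0 is a unique record ID; attribute 1 is a genuinely informative binary attribute.
rows = [(1, "a"), (2, "a"), (3, "a"), (4, "a"), (5, "b"), (6, "b"), (7, "b"), (8, "a")]
labels = ["play"] * 4 + ["stay"] * 4

id_gain, attr_gain = information_gain(rows, labels, 0), information_gain(rows, labels, 1)
id_ratio, attr_ratio = gain_ratio(rows, labels, 0), gain_ratio(rows, labels, 1)
```

On this data, information gain ranks the ID attribute first, while the gain ratio ranks the binary attribute first, because the ID attribute is penalized for splitting the data into eight branches.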
17.
The aim of this study is to define the risk factors that contribute to Breast Cancer (BC) occurrence, and to construct a supportive model that clarifies the cause-and-effect relationships among the factors that are crucial to public health. We utilize the Rule-Based Fuzzy Cognitive Map (RBFCM) approach, which can successfully represent knowledge and human experience by introducing concepts that represent the essential elements of a system and the cause-and-effect relationships among those concepts in order to model the system's behavior. A decision-making system is constructed to evaluate risk factors of BC based on information from oncologists. To construct the causal relationships, the weight matrix of the RBFCM is determined by combining the experts’ experience, expertise and views. The results of the proposed methodology allow better insight into several root causes, with the help of which oncologists can improve their prevention and protection recommendations. The results showed that Social Class and Late Maternal Age can be seen as important modifiable factors; on the other hand, Benign Breast Disease, Family History and Breast Density can be considered important non-modifiable risk factors. This study weighs the interrelations of the BC risk factors and enables a sensitivity analysis between the scenario studies and the BC risk factors. A soft computing method is used to simulate the changes of a system over time and to address “what if” questions in order to compare different case studies.
18.
Witold Charatonik 《国际计算机数学杂志》2013,90(6):1150-1170
There are many decision problems in automata theory (including membership, emptiness, inclusion and universality problems) that are NP-hard for some classes of tree automata (TA). The study of their parameterized complexity allows us to find new bounds of their nonpolynomial time algorithmic behaviours. We present results of such a study for classical TA, rigid tree automata, TA with global equality and disequality and t-DAG automata. As parameters we consider the number of states, the cardinality of the signature, the size of the term or the t-dag and the size of the automaton.
19.
Petra Perner 《Applied Artificial Intelligence》2013,27(8):747-760
Selecting the right set of features for classification is one of the most important problems in designing a good classifier. Decision tree induction algorithms such as C4.5 have incorporated in their learning phase an automatic feature selection strategy, while some other statistical classification algorithms require the feature subset to be selected in a preprocessing phase. It is well known that correlated and irrelevant features may degrade the performance of the C4.5 algorithm. In our study, we evaluated the influence of feature preselection on the prediction accuracy of C4.5 using a real-world data set. We observed that accuracy of the C4.5 classifier could be improved with an appropriate feature preselection phase for the learning algorithm. Beyond that, the number of features used for classification can be reduced, which is important for image interpretation tasks since feature calculation is a time-consuming process.
20.
Aswini Kumar Mohanty, Manas Ranjan Senapati, Swapnasikta Beberta, Saroj Kumar Lenka 《Neural computing & applications》2013,23(3-4):1011-1017
Mammography (breast X-ray) is considered the most effective, low-cost, and reliable method for early detection of breast cancer. Although general rules for differentiating between benign and malignant breast lesions exist, only 15–30 % of masses referred for surgical biopsy are actually malignant. In this work, an approach is proposed to develop a computer-aided classification system for cancer detection from digital mammograms. The proposed system consists of three major steps. The first step is extraction of a region of interest (ROI) of 256 × 256 pixels. The second step is feature extraction: a set of 19 features extracted from the gray-level co-occurrence matrix (GLCM) and the gray-level run-length matrix (GLRLM) could distinguish malignant masses from benign masses with an accuracy of 96.7 %. Further analysis was carried out using only 12 of the 19 features, consisting of 5 features extracted from the GLCM and 7 features extracted from the GLRLM. The 12 selected features are Energy, Inertia, Entropy, Maxprob, Inverse, SRE, LRE, GLN, RLN, LGRE, HGRE, and SRLGE; ARM with these 12 features as predictors can distinguish malignant mass images from benign mass images with an accuracy of 93.6 %. Further analysis showed that the area under the receiver operating characteristic curve was 0.995, which means that the classification accuracy is good to very good. Based on these data, it was concluded that texture analysis based on GLCM and GLRLM can distinguish malignant images from benign images with considerably good results. The third step is classification: a decision tree using image content was used to classify masses as normal or cancerous. The proposed system was shown to have large potential for cancer detection from digital mammograms.
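To make the GLCM texture features named above more concrete, here is a minimal sketch that builds a normalized co-occurrence matrix for a tiny image and derives four of the listed features using the usual textbook definitions. The pixel offset (right-hand neighbor), the lack of symmetrization, and the two-level toy image are assumptions. The abstract does not state which offsets, gray-level quantization or exact feature formulas the authors used.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for one pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    n_rows, n_cols = len(image), len(image[0])
    for r in range(n_rows):
        for c in range(n_cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < n_rows and 0 <= c2 < n_cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("image is too small for the requested offset")
    return [[v / total for v in row] for row in counts]


def texture_features(p):
    """Energy, inertia (contrast), entropy and maximum probability of a GLCM."""
    cells = [(i, j, v) for i, row in enumerate(p) for j, v in enumerate(row)]
    return {
        "energy": sum(v * v for _, _, v in cells),
        "inertia": sum((i - j) ** 2 * v for i, j, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
        "maxprob": max(v for _, _, v in cells),
    }


# Toy 3 x 3 region with two gray levels (0 and 1), for illustration only.
roi = [
    [0, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
]
features = texture_features(glcm(roi, levels=2))
```

In a full pipeline such features would be computed for every ROI and passed, together with the run-length features, to the classifier. This sketch covers only the co-occurrence part.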