Similar Documents (20 results found)
1.
In this paper, a co-processor for hardware-aided decision tree induction using an evolutionary approach (EFTIP) is proposed. EFTIP is used to accelerate the fitness evaluation task, which the paper shows to be the execution-time bottleneck. The EFTIP co-processor can significantly improve the execution time of a novel algorithm for full decision tree induction using an evolutionary approach (EFTI). A comparison of the HW/SW EFTI implementation with a pure software implementation suggests that the proposed HW/SW architecture offers substantial decision tree induction speedups on selected benchmark datasets from the standard UCI Machine Learning Repository.

2.
Look-ahead based fuzzy decision tree induction
Decision tree induction is typically based on a top-down greedy algorithm that makes locally optimal decisions at each node. Because these decisions are greedy and local, instances at a node may be split along branches in such a way that some or all of the branches require many additional nodes for classification. In this paper, we present a computationally efficient way of incorporating look-ahead into fuzzy decision tree induction. Our algorithm establishes the decision at each internal node by jointly optimizing the node splitting criterion (information gain or gain ratio) and the classifiability of instances along each branch of the node. Simulation results confirm that the proposed look-ahead method leads to smaller decision trees and, as a consequence, better test performance.
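For reference, the two splitting criteria this abstract mentions, information gain and gain ratio, are standard quantities. A minimal sketch, assuming discrete class labels and branches already materialized as label lists (the look-ahead classifiability term itself is not reproduced here):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, branches):
    """Entropy reduction from splitting `labels` into `branches`,
    given as one label list per branch."""
    n = len(labels)
    remainder = sum(len(b) / n * entropy(b) for b in branches)
    return entropy(labels) - remainder

def gain_ratio(labels, branches):
    """Information gain normalized by the entropy of the split itself,
    which penalizes splits with many small branches."""
    n = len(labels)
    split_info = -sum(len(b) / n * math.log2(len(b) / n) for b in branches if b)
    return information_gain(labels, branches) / split_info if split_info else 0.0

parent = ['+', '+', '+', '-', '-', '-']
branches = [['+', '+', '+', '-'], ['-', '-']]
print(information_gain(parent, branches))  # ~0.459
print(gain_ratio(parent, branches))        # ~0.500
```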

3.
This paper addresses the problem of using decision lists to build machine learning algorithms. We first highlight the expressive power of Decision Lists (DL), which were already known to generalize decision trees. We then present ICDL, a new algorithm for learning simple decision lists. This problem, learning small and highly accurate lists, is theoretically hard, as we prove formally, and calls for heuristics such as CN2, BruteDL or ICDL. Our method is based on an original technique midway between rule-learning procedures and decision trees. ICDL operates in two stages: it first greedily builds a large decision list and then prunes it to obtain a smaller yet accurate one, thereby avoiding the drawbacks of the first phase alone. Experimental results show the efficiency of our approach in comparison with the two well-known algorithms CN2 and C4.5. ICDL's time complexity is low; it produces decision lists far smaller than those of both CN2 and C4.5, with accuracy that also compares favourably with theirs. Finally, ICDL can also be used to build decision trees using a CART-like scheme. Our algorithm thus has the distinctive property of providing two different types of widely used classifiers, between which the user can choose freely.
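A decision list of the kind discussed here is an ordered sequence of rules with a default class, where the first matching rule wins. A minimal sketch of the data structure and its classification step, with hypothetical toy rules (the ICDL learning and pruning stages are not shown):

```python
# A decision list: ordered (condition, class) pairs plus a default class;
# the first rule whose condition fires decides the prediction.

def classify(rules, default, example):
    """Return the class of the first matching rule, else the default."""
    for condition, label in rules:
        if condition(example):
            return label
    return default

# Hypothetical toy rules over examples represented as dicts.
rules = [
    (lambda x: x['outlook'] == 'rain' and x['windy'], 'no'),
    (lambda x: x['outlook'] == 'sunny', 'yes'),
]
print(classify(rules, 'yes', {'outlook': 'rain', 'windy': True}))       # 'no'
print(classify(rules, 'yes', {'outlook': 'overcast', 'windy': False}))  # 'yes' (default)
```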

4.
This paper deals with improvements to rule induction algorithms that resolve the ties that appear in special cases during the rule generation procedure for specific training data sets. These improvements are demonstrated by experimental results on various data sets. A tie occurs in a decision tree induction algorithm when the class prediction at a leaf node cannot be determined by majority voting. When there is such a conflict at a leaf node, we need to find both the source of the problem and a solution to it. In this paper, we propose calculating an influence factor for each attribute and suggest an update procedure for the decision tree to deal with the problem, together with subsequent rectification steps.
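The tie in question is easy to detect mechanically; a minimal sketch of leaf majority voting with tie detection (the paper's influence-factor computation and tree update are not reproduced here):

```python
from collections import Counter

def leaf_prediction(labels):
    """Majority vote at a leaf. Returns (class, tied), where `tied`
    flags the conflict this paper addresses: no unique majority."""
    counts = Counter(labels).most_common()
    top = counts[0][1]
    winners = [cls for cls, c in counts if c == top]
    return winners[0], len(winners) > 1

print(leaf_prediction(['a', 'a', 'b', 'b']))  # ('a', True): tie detected
print(leaf_prediction(['a', 'a', 'b']))       # ('a', False)
```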

5.
In this article we show that there is a strong connection between decision tree learning and local pattern mining. This connection allows us to solve the computationally hard problem of finding optimal decision trees in a wide range of applications by post-processing a set of patterns: we use local patterns to construct a global model. We exploit the connection between constraints in pattern mining and constraints in decision tree induction to develop a framework for categorizing decision tree mining constraints. This framework allows us to determine which model constraints can be pushed deep into the pattern mining process, and allows us to improve the state of the art in optimal decision tree induction.

6.
Classification based on decision trees is one of the important problems in data mining and has applications in many fields. In recent years, database systems have become highly distributed, and distributed system paradigms such as federated and peer-to-peer databases are being adopted. In this paper, we consider the problem of inducing decision trees in a large distributed network of genomic databases. Our work is motivated by the existence of distributed databases in healthcare and bioinformatics, by the emergence of systems which automatically analyze these databases, and by the expectation that these databases will soon contain large amounts of highly dimensional genomic data. Current decision tree algorithms require high communication bandwidth when executed in such large-scale distributed systems. We present an algorithm that sharply reduces the communication overhead by sending just a fraction of the statistical data, a fraction which is nevertheless sufficient to derive exactly the same decision tree that a sequential learner would produce on all the data in the network. Extensive experiments using standard synthetic SNP data show that the algorithm exploits the high dependency among attributes, typical of genomic data, to reduce communication overhead by up to 99 percent. Scalability tests show that the algorithm scales well with the size of the data set, the dimensionality of the data, and the size of the distributed system.
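The statistics sufficient for choosing a split are per-attribute class-count histograms, so sites can exchange counts rather than rows; a minimal sketch of that aggregation, assuming discrete attributes and rows as dicts (the paper's protocol sends only a selected fraction of even these statistics):

```python
from collections import Counter

def local_split_counts(rows, attribute):
    """Per-value class counts for one attribute, computed at one site.
    Rows are dicts that include a 'class' key."""
    counts = {}
    for row in rows:
        counts.setdefault(row[attribute], Counter())[row['class']] += 1
    return counts

def merge_counts(per_site):
    """Aggregate per-site histograms; the merged counts are exactly what
    a centralized learner would compute from the pooled rows."""
    merged = {}
    for site in per_site:
        for value, cls_counts in site.items():
            merged.setdefault(value, Counter()).update(cls_counts)
    return merged

site_a = local_split_counts([{'snp1': 'AA', 'class': 'case'}], 'snp1')
site_b = local_split_counts([{'snp1': 'AA', 'class': 'ctrl'}], 'snp1')
print(merge_counts([site_a, site_b]))  # {'AA': Counter({'case': 1, 'ctrl': 1})}
```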

7.
A mass-assignment-based ID3 algorithm for learning probabilistic fuzzy decision trees is introduced. Fuzzy partitions are used to discretize continuous feature universes and to reduce complexity when universes are discrete but have large cardinalities. Furthermore, the fuzzy partitioning of classification universes facilitates the use of these decision trees in function approximation problems. In general, incorporating fuzzy sets into this paradigm overcomes many of the problems associated with applying decision trees to real-world problems. The probabilities required for the trees are calculated according to mass assignment theory applied to fuzzy labels; the latter concept is introduced to overcome the computational complexity of higher-dimensional mass assignment evaluations on databases. ©1997 John Wiley & Sons, Inc.

8.
Fuzzy decision tree algorithms require the data to be fuzzified as a preprocessing step when handling numerical attributes. However, the number of linguistic terms into which each numerical attribute should be fuzzified is usually set by experience, and no prior work has used the standard particle swarm optimization (PSO) algorithm to set the number of linguistic terms automatically. We propose a fuzzy decision tree algorithm that uses PSO to determine the number of linguistic terms (the FDT-K algorithm). Experiments show that the fuzzy decision trees produced by FDT-K clearly outperform those built with empirically chosen numbers of linguistic terms.

9.
The decision tree (DT) induction process has two major phases: the growth phase and the pruning phase. The pruning phase aims to generalize the DT generated in the growth phase by producing a sub-tree that avoids over-fitting the training data. Most post-pruning methods essentially treat post-pruning as a single-objective problem (i.e. maximize validation accuracy) and address simplicity (in terms of the number of leaves) only to break ties. However, it is well known that, apart from accuracy, other performance measures (e.g. stability, simplicity, interpretability) are important for evaluating DT quality. In this paper, we propose that multi-objective evaluation be performed during the post-pruning phase in order to select the best sub-tree, and we propose a procedure for obtaining the optimal sub-tree based on user-provided preference and value-function information.
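As one concrete reading of this idea, a user-weighted linear value function over validation accuracy and leaf count suffices to rank candidate sub-trees; a minimal sketch with hypothetical weights (the paper elicits preference and value-function information more generally):

```python
def subtree_value(accuracy, n_leaves, max_leaves, w_acc=0.9, w_simp=0.1):
    """Linear value function over two normalized objectives:
    validation accuracy and simplicity (fewer leaves scores higher)."""
    simplicity = 1.0 - n_leaves / max_leaves
    return w_acc * accuracy + w_simp * simplicity

# Candidate sub-trees as (validation accuracy, number of leaves).
candidates = [(0.91, 40), (0.90, 12), (0.87, 5)]
max_leaves = max(n for _, n in candidates)
best = max(candidates, key=lambda c: subtree_value(c[0], c[1], max_leaves))
print(best)  # (0.9, 12): slightly less accurate but far simpler
```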

10.

Selecting the right set of features is one of the most important problems in designing a good classifier. Decision tree induction algorithms such as C4.5 incorporate an automatic feature selection strategy in their learning phase, while some other statistical classification algorithms require the feature subset to be selected in a preprocessing phase. It is well known that correlated and irrelevant features may degrade the performance of the C4.5 algorithm. In our study, we evaluated the influence of feature preselection on the prediction accuracy of C4.5 using a real-world data set. We observed that the accuracy of the C4.5 classifier can be improved by an appropriate feature preselection phase for the learning algorithm. Beyond that, the number of features used for classification can be reduced, which is important for image interpretation tasks since feature calculation is a time-consuming process.
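The abstract does not name the preselection method used; as a sketch of the general idea, a simple filter ranks features by a relevance score before the learner runs. A minimal example with a correlation-based score on synthetic data:

```python
import numpy as np

def preselect_features(X, y, k):
    """Rank features by absolute Pearson correlation with the class
    labels and keep the top k indices, as a filter applied before
    running a learner such as C4.5."""
    scores = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 100)
X = np.column_stack([y + 0.1 * rng.normal(size=100),  # informative feature
                     rng.normal(size=100),            # irrelevant
                     rng.normal(size=100)])           # irrelevant
print(preselect_features(X, y, k=1))  # [0]: the informative feature survives
```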

11.
Decision trees are a widely used tool for pattern recognition and data mining. Over the last four decades, many algorithms have been developed for the induction of decision trees. Most of the classic algorithms use a greedy, divide-and-conquer search method to induce the tree, whereas more recently evolutionary methods have been used to perform a global search in the space of possible trees. To the best of our knowledge, limited research has addressed the issue of multi-interval decision trees. In this paper, we improve our previous work on multi-interval trees and compare our previous and current methods with a classic algorithm, chi-squared automatic interaction detection (CHAID), and an evolutionary algorithm, evtree. The results show that the proposed method improves on our previous method in both accuracy and speed. It also outperforms CHAID and performs comparably to evtree. The trees generated by our method have more nodes but are shallower than those produced by evtree.

12.
Shuyu  Zhongying 《Knowledge》2006,19(8):675-680
This paper proposes an improved decision tree method for web information retrieval with self-map attributes. Our self-map tree stores a self-map attribute value in each internal node, together with information based on the dissimilarity between a pair of map sequences. Our method selects the self-map that exists between data items by exhaustive search, based on relation and attribute information. Experimental results confirm that our improved method constructs comprehensible and accurate decision trees. Moreover, an example shows that our self-map decision tree is promising for data mining and knowledge discovery.

13.
Learning decision tree for ranking
Decision tree is one of the most effective and widely used methods for classification. However, many real-world applications require instances to be ranked by the probability of class membership. The area under the receiver operating characteristic curve, or simply AUC, has recently been used as a measure of the ranking performance of learning algorithms. In this paper, we present two novel class probability estimation algorithms to improve the ranking performance of decision trees. Instead of estimating the probability of class membership by simple voting at the leaf into which the test instance falls, our algorithms use similarity-weighted voting and naive Bayes. We design empirical experiments to verify that our new algorithms significantly outperform the recent decision tree ranking algorithm C4.4 in terms of AUC.
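For context, the baseline these algorithms improve on, C4.4-style probability estimation, replaces raw voting frequencies at a leaf with Laplace-corrected ones; a minimal sketch of that estimate (the similarity-weighted and naive Bayes variants are not reproduced here):

```python
def leaf_probability(leaf_labels, target_class, n_classes):
    """Laplace-corrected estimate of P(target_class | leaf), the usual
    replacement for raw voting frequencies when ranking instances by
    class membership probability."""
    k = sum(1 for y in leaf_labels if y == target_class)
    return (k + 1) / (len(leaf_labels) + n_classes)

# A pure but tiny leaf no longer claims probability 1.0:
print(leaf_probability(['+', '+', '+'], '+', n_classes=2))  # 0.8
```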

14.
The relational data model has become the standard for mainstream database processing despite its well-known weakness in the area of representing application semantics. The research community's response to this situation has been the development of a collection of semantic data models that allow more of the meaning of information to be presented in a database. The primary tool for accomplishing this has been the use of various data abstractions, most commonly: inclusion, aggregation and association. This paper develops a general model for analyzing data abstractions, and then applies it to these three best-known abstractions.

15.
An induction principle, called context induction, is presented which is appropriate for the verification of behavioural properties of abstract data types. The usefulness of the proof principle is documented by several applications: the verification of behavioural theorems over a behavioural specification, the verification of behavioural implementations and the verification of forget-restrict-identify implementations. In particular, it is shown that behavioural implementations and forget-restrict-identify implementations (under certain assumptions) can be characterised by the same condition on contexts, i.e. (under the given assumptions) both concepts are equivalent. This leads to the suggestion to use context induction as a uniform proof method for correctness proofs of algebraic implementations.

16.
Hybrid decision tree

17.
A decision tree classification algorithm based on an association degree function
韩松来  张辉  周华平 《计算机应用》2005,25(11):2655-2657
To overcome the multi-value bias problem that is common in decision tree algorithms, a new decision tree algorithm based on an association degree function, the AF algorithm, is proposed, and the principle by which it overcomes multi-value bias is analyzed theoretically. Experiments show that, compared with the ID3 algorithm, the AF algorithm not only overcomes the multi-value bias problem but also maintains a high classification accuracy.
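The multi-value bias in question is the tendency of plain information gain to favor attributes with many distinct values; the abstract does not give AF's association function, but the standard gain-ratio correction illustrates the problem it targets. A minimal sketch on a toy data set:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain(labels, branches):
    n = len(labels)
    return entropy(labels) - sum(len(b) / n * entropy(b) for b in branches)

def split_info(branches, n):
    return -sum(len(b) / n * math.log2(len(b) / n) for b in branches)

labels = ['+', '+', '-', '-']
# An ID-like attribute separates every instance: maximal gain (1.0)
# although it is useless for generalization.
id_split = [['+'], ['+'], ['-'], ['-']]
# A genuine binary attribute achieving the same gain:
binary_split = [['+', '+'], ['-', '-']]
for split in (id_split, binary_split):
    print(gain(labels, split), gain(labels, split) / split_info(split, 4))
# Prints 1.0 0.5 for the ID-like split and 1.0 1.0 for the binary one:
# gain ratio demotes the many-valued attribute, plain gain does not.
```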

18.
In this paper, we present a new algorithm for learning oblique decision trees. Most current decision tree algorithms rely on impurity measures to assess the goodness of hyperplanes at each node while learning a decision tree in a top-down fashion. These impurity measures do not properly capture the geometric structure in the data. Motivated by this, our algorithm assesses hyperplanes in a way that takes the geometric structure of the data into account. At each node of the decision tree, we find the clustering hyperplanes for both classes and use their angle bisectors as the split rule at that node. We show through empirical studies that this idea leads to small decision trees and better performance. We also present analysis showing that the angle bisectors of the clustering hyperplanes used as split rules at each node are solutions of an interesting optimization problem, and hence argue that this is a principled method of learning a decision tree.
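The angle bisectors used as split rules have a closed form: for hyperplanes w1·x + b1 = 0 and w2·x + b2 = 0 with unit-normalized normals, the bisectors are (w1 ± w2)·x + (b1 ± b2) = 0. A minimal sketch, assuming the two clustering hyperplanes are already found:

```python
import numpy as np

def angle_bisectors(w1, b1, w2, b2):
    """Angle bisectors of hyperplanes w1.x + b1 = 0 and w2.x + b2 = 0.
    After unit-normalizing each normal, the bisectors are the sum and
    difference of the normalized planes; one is chosen as the split."""
    n1, n2 = np.linalg.norm(w1), np.linalg.norm(w2)
    w1, b1 = w1 / n1, b1 / n1
    w2, b2 = w2 / n2, b2 / n2
    return (w1 + w2, b1 + b2), (w1 - w2, b1 - b2)

# Hypothetical clustering hyperplanes x = 1 and y = 2 in the plane:
(bw, bb), _ = angle_bisectors(np.array([1.0, 0.0]), -1.0,
                              np.array([0.0, 1.0]), -2.0)
print(bw, bb)  # [1. 1.] -3.0, i.e. the bisector x + y = 3;
               # instances are routed by the sign of bw . x + bb
```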

19.
Cut selection based on heuristic information is one of the most fundamental issues in the induction of decision trees with continuous-valued attributes. This paper connects the selection of optimal cuts with a class of heuristic information functions. It shows statistically that both training and testing accuracies in decision tree learning depend strongly on the choice of heuristic. A clear relationship between the second-order derivative of the heuristic information function and the locations of optimal cuts is derived mathematically and confirmed experimentally. Incorporating this relationship into the process of building decision trees, we can significantly reduce the number of detected cuts and thereby improve the generalization of the decision tree.
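The second-order behaviour referred to here can be probed numerically by evaluating the heuristic at every candidate cut and differencing twice; a minimal sketch with information gain as the heuristic on hypothetical toy data (the paper's exact heuristic class and derivation are not reproduced):

```python
import math

def entropy(pos, neg):
    """Binary-class entropy from class counts."""
    total = pos + neg
    return -sum((c / total) * math.log2(c / total) for c in (pos, neg) if c)

def gains_along_cuts(points):
    """Information gain of the binary split x <= cut, evaluated at every
    boundary between consecutive values of a sorted numeric attribute."""
    pts = sorted(points)
    n = len(pts)
    total_pos = sum(1 for _, y in pts if y == '+')
    parent = entropy(total_pos, n - total_pos)
    gains, pos_left = [], 0
    for i in range(1, n):
        if pts[i - 1][1] == '+':
            pos_left += 1
        left_e = entropy(pos_left, i - pos_left)
        right_e = entropy(total_pos - pos_left, (n - i) - (total_pos - pos_left))
        gains.append(parent - (i / n) * left_e - ((n - i) / n) * right_e)
    return gains

g = gains_along_cuts([(1, '-'), (2, '-'), (3, '+'), (4, '+'), (5, '-')])
d2 = [g[i - 1] - 2 * g[i] + g[i + 1] for i in range(1, len(g) - 1)]
print(g)   # gain peaks at the cut between x=2 and x=3
print(d2)  # the most negative second difference sits at that same peak
```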

20.
Decision trees are a kind of off-the-shelf predictive model, and they have been successfully used as the base learners in ensemble learning. To construct a strong classifier ensemble, the individual classifiers should be accurate and diverse. However, diversity measurement remains a mystery despite many attempts. We conjecture that a deficiency of previous diversity measures lies in the fact that they consider only behavioral diversity, i.e., how the classifiers behave when making predictions, neglecting the fact that classifiers may be potentially different even when they make the same predictions. Based on this recognition, in this paper we advocate considering structural diversity in addition to behavioral diversity, and propose the TMD (tree matching diversity) measure for decision trees. To investigate the usefulness of TMD, we empirically evaluate the performance of selective ensemble approaches with decision forests incorporating different diversity measures. Our results validate that stronger ensembles can be constructed by considering structural and behavioral diversity together. This may open a new direction for designing better diversity measures and ensemble methods.
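The abstract does not define TMD itself; a crude recursive mismatch count between tree shapes conveys what a structural (rather than behavioral) diversity measure looks like. A minimal sketch, assuming binary trees encoded as nested tuples:

```python
# Trees as nested tuples (split_attribute, left, right); leaves are None.

def size(t):
    """Number of internal nodes."""
    return 0 if t is None else 1 + size(t[1]) + size(t[2])

def structural_distance(t1, t2):
    """Crude mismatch count between tree shapes: a leaf matched against
    a subtree costs that subtree's size; two internal nodes cost 1 when
    they test different attributes, plus their children's distances."""
    if t1 is None and t2 is None:
        return 0
    if t1 is None:
        return size(t2)
    if t2 is None:
        return size(t1)
    cost = 0 if t1[0] == t2[0] else 1
    return (cost + structural_distance(t1[1], t2[1])
                 + structural_distance(t1[2], t2[2]))

# Two trees that may predict identically but differ structurally:
a = ('x1', ('x2', None, None), None)
b = ('x1', None, ('x2', None, None))
print(structural_distance(a, b))  # 2
```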

