Similar Documents
Found 20 similar documents (search time: 46 ms)
1.
A retrieval system uses a learning-to-rank algorithm to build a ranking model from a training set, and reducing the time needed to rank retrieved data is an important research direction for such systems. To shorten retrieval time, this work studies pruning strategies for the ranking model together with caching. Exploiting the redundancy in decision trees and the CPU cache, a pruned decision tree model and a blocking algorithm are proposed. Experiments on two public datasets examine whether the efficiency of the ranking model can be improved without degrading its effectiveness. The results show that the pruned decision tree model and the blocking algorithm effectively reduce the ranking time per query.

2.

Pruning is an effective technique for improving the generalization performance of decision trees. However, most existing methods are time-consuming or unsuitable for small datasets. In this paper, a new pruning algorithm based on the structural risk of the leaf node is proposed. The structural risk is measured by the product of the accuracy and the volume (PAV) in the leaf node. Comparison experiments with the Cost-Complexity Pruning using cross-validation (CCP-CV) algorithm on several benchmark datasets show that PAV pruning largely reduces the time cost of CCP-CV, while the test accuracy of PAV pruning is close to that of CCP-CV.
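The abstract gives the PAV measure (accuracy times volume of a leaf) but not the exact pruning rule, so the following is only a minimal sketch of how a bottom-up PAV pass could look; the `Node` layout and the direction of the risk comparison are assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    n: int                        # training samples reaching this node
    correct: int                  # samples of the node's majority class
    volume: float                 # volume of the feature-space region it covers
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def pav(node: Node) -> float:
    """Structural-risk proxy for the node treated as a leaf: accuracy * volume."""
    return (node.correct / max(node.n, 1)) * node.volume

def prune_pav(node: Node) -> Node:
    """Bottom-up pass over a binary tree (each node has both children or none):
    collapse a subtree when its PAV score as a single leaf is no worse than
    that of its two children combined (comparison direction assumed)."""
    if node.left is None:
        return node
    node.left, node.right = prune_pav(node.left), prune_pav(node.right)
    if pav(node) >= pav(node.left) + pav(node.right):
        node.left = node.right = None          # replace the subtree by a leaf
    return node
```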


3.
张晓龙  骆名剑 《计算机应用》2005,25(9):1986-1988
The decision tree is a fundamental learning method in machine learning and data mining. This paper analyzes the C4.5 algorithm and its shortcomings, and proposes a decision tree pruning algorithm that uses the information content of rules as its pruning criterion. Experimental results show that this method improves the predictive accuracy of the final model and copes well with noise in the data.

4.
Research on decision tree pruning has focused mainly on pre-pruning and post-pruning algorithms, which are usually applied to conventional decision tree classifiers; little work has combined cost-sensitive learning with pruning optimization. Drawing on cost-benefit analysis from economics, this paper introduces the cost-benefit matrix and the notion of unit cost benefit, assigns class labels to decision tree leaves by maximizing the unit cost benefit, and, combined with a pre-pruning strategy, designs a new decision tree pruning algorithm. Applying unit cost-benefit pruning to the generated tree makes it cost-sensitive and well suited to practical problems. Experimental results show that the algorithm produces smaller trees and achieves better classification results than the REP and EBP algorithms.
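The leaf-labeling rule is concrete enough to sketch: pick the label maximizing expected benefit per unit cost rather than the majority class. The matrix values, the per-label commitment cost, and the normalization below are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hypothetical cost-benefit matrix B[i, j]: benefit of predicting class j when
# the true class is i (negative entries are misclassification costs).
B = np.array([[  1.0,  -1.0],
              [-10.0,   8.0]])
COST = np.array([1.0, 2.0])   # assumed cost of committing to each predicted label

def leaf_label(class_counts: np.ndarray) -> int:
    """Label a leaf with the class that maximizes expected benefit per unit cost."""
    p = class_counts / class_counts.sum()   # empirical class mix in the leaf
    return int(np.argmax((p @ B) / COST))   # unit cost benefit of each candidate label

# The heavy penalty for missing class 1 overrides the simple majority vote:
print(leaf_label(np.array([30, 10])))       # -> 1, not the majority class 0
```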

5.
《Knowledge》2002,15(1-2):37-43
The decision tree is a divide-and-conquer classification method used in machine learning. Most pruning methods for decision trees minimize a classification error rate. In uncertain domains, some sub-trees that do not decrease the error rate can still be relevant for pointing out populations of specific interest or for giving a compact representation of a large data file. A new pruning method (called DI pruning) is presented here. It takes into account the complexity of sub-trees and is able to keep sub-trees whose leaves yield relevant decision rules, even when they do not increase classification accuracy. DI pruning also makes it possible to assess the quality of the data used for the knowledge discovery task. In practice, the method is implemented in the UnDeT software.

6.
It is very expensive and time-consuming to annotate huge amounts of data. Active learning would be a suitable approach to minimize the effort of annotation. A novel active learning approach, coupled K nearest neighbor pseudo pruning (CKNNPP), is proposed in the paper, which is based on querying examples by KNNPP method. The KNNPP method applies k nearest neighbor technique to search for k neighbor samples from labeled samples of unlabeled samples. When k labeled samples are not belong to the same class, the corresponded unlabeled sample is queried and given its right label by supervisor, and then it is added to labeled training set. In contrast with the previous depiction, the unlabeled sample is not selected and pruned, that is the pseudo pruning. This definition is enlightened from the K nearest neighbor pruning preprocessing. These samples selected by KNNPP are considered to be near or on the optimal classification hyperplane that is crucial for active learning. Especially, in order to avoid the excursion of the optimal classification hyperplane after adding a queried sample, CKNNPP method is proposed finally that two samples with different class label (like a couple, annotated by supervisor) are queried by KNNPP and added in the training set simultaneously for updating training set in each iteration. The CKNNPP can provide a good performance, and especially it is simple, effective, and robust, and can solve the classification problem with unbalanced dataset compared with the existing methods. Then, the computational complexity of CKNNPP is analyzed. Additionally, a new stopping criterion is applied in the proposed method, and the classifier is implemented by Lagrangian Support Vector Machines in iterations of active learning. Finally, twelve UCI datasets, image datasets of aircrafts, and the dataset of radar high-resolution range profile are used to validate the feasibility and effectiveness of the proposed method. The results illuminate that CKNNPP gains superior performance compared with the other seven state-of-the-art active learning approaches.  相似文献   
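The KNNPP query rule itself is stated explicitly in the abstract, so a short sketch is possible; the distance metric and k value below are assumptions.

```python
import numpy as np

def knnpp_query_indices(X_lab, y_lab, X_unlab, k=5):
    """Flag unlabeled samples whose k nearest labeled neighbors disagree;
    those are queried, the rest are 'pseudo pruned' (simply not selected)."""
    to_query = []
    for i, x in enumerate(X_unlab):
        dist = np.linalg.norm(X_lab - x, axis=1)     # distances to the labeled pool
        neighbors = y_lab[np.argsort(dist)[:k]]
        if len(set(neighbors)) > 1:                  # disagreement: near the boundary
            to_query.append(i)
    return to_query
```

The coupled variant would then pick two flagged samples that receive different labels from the supervisor and add them together, so that a single iteration does not pull the decision boundary toward one class.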

7.
A generalisation of bottom-up pruning is proposed as a model-level combination method for a decision tree ensemble. Bottom-up pruning on a single tree involves choosing between a subtree rooted at a node and a leaf, dependent on a pruning criterion. A natural extension to an ensemble of trees is to allow subtrees from other ensemble trees to be grafted onto a node, in addition to the operations of pruning to a leaf and leaving the existing subtree intact. Suitable pruning criteria are proposed and tested for this multi-tree pruning context. Gains in performance and, in particular, compactness over individually pruned trees are observed in tests performed on a number of datasets from the UCI database. The method is further illustrated on a churn prediction problem in the telecommunications domain.
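A heavily simplified sketch of the three-way choice at each node follows; representing a subtree by a single held-out `error` field and using it directly as the pruning criterion are placeholder assumptions, not the paper's criteria.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Tree:
    error: float                    # estimated error of this (sub)tree on held-out data
    left: Optional["Tree"] = None
    right: Optional["Tree"] = None

def multi_tree_prune(node: Tree, donors: List[Tree]) -> Tree:
    """Bottom-up pass: at each node pick whichever of {prune to a leaf,
    keep the current subtree, graft a donor subtree from another ensemble
    tree} minimizes the (placeholder) criterion."""
    if node.left is None:
        return node
    node.left = multi_tree_prune(node.left, donors)
    node.right = multi_tree_prune(node.right, donors)
    leaf = Tree(error=node.error)                 # option (a): collapse to a leaf
    candidates = [node, leaf] + donors            # (b) keep, (c) graft a donor
    return min(candidates, key=lambda t: t.error)
```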

8.
An efficient intrusion detection method based on boosting rule learning
Improving the detection rate of intrusion detection models while reducing their false alarm rate is an important research topic in intrusion detection. Building on an in-depth study of inductive learning theory, this paper applies rule learning algorithms to intrusion detection modeling. To counter the drop in detection accuracy that occurs when audit training data are insufficient, an efficient approach to intrusion detection based on boosting rule learning (EAIDBRL) is proposed. EAIDBRL first restricts the weight-update step of the traditional Boosting algorithm to operate within each predicted target class, eliminating the degradation phenomenon; it then modifies the evaluation criterion functions used during rule growing and rule pruning in the traditional rule learning algorithm; finally, the improved Boosting algorithm is used to strengthen the weak rule learner's classification performance on network audit data. Test results on a standard intrusion detection dataset show that EAIDBRL substantially improves the intrusion detection performance of traditional rule learning models under small-sample conditions.
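The abstract's key modification is confining the weight update to each class. A sketch of what that could look like follows; the AdaBoost-style exponential update is an assumption, since the abstract does not specify the base Boosting form.

```python
import numpy as np

def per_class_weight_update(w, y, y_pred, alpha):
    """Boosting weight update applied within each true-class group: weights
    are renormalized per class so the total weight of each class stays
    fixed, mimicking the abstract's fix for degradation on scarce audit
    data. The exponential (AdaBoost-like) form is an assumption."""
    w_new = w * np.exp(alpha * (y != y_pred))    # up-weight misclassified samples
    for c in np.unique(y):
        m = (y == c)
        w_new[m] *= w[m].sum() / w_new[m].sum()  # keep per-class mass constant
    return w_new
```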

9.
Cost-sensitive learning algorithms are typically designed to minimize the total cost when multiple costs are taken into account. Like other learning algorithms, cost-sensitive learning algorithms face a significant challenge: over-fitting. That is, they can produce good results on training data yet often fail to yield an optimal model when applied to unseen data in real-world applications. This paper deals with over-fitting by designing three simple and efficient strategies, feature selection, smoothing, and threshold pruning, for the TCSDT (test cost-sensitive decision tree) method. Feature selection is used to pre-process the data set before applying the TCSDT algorithm; smoothing and threshold pruning are used within the TCSDT algorithm before calculating the class probability estimate for each decision tree leaf. To evaluate these approaches, we conduct extensive experiments on selected UCI data sets across different cost ratios, and on a real-world data set, KDD-98, with real misclassification costs. The experimental results show that our algorithms outperform both the original TCSDT and other competing algorithms in reducing over-fitting.
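The smoothing strategy mentioned above is typically done with a Laplace-style correction of the leaf's class probability estimate; a one-function sketch (assuming Laplace smoothing specifically, which the abstract does not spell out):

```python
def smoothed_probability(n_class: int, n_total: int, n_classes: int) -> float:
    """Laplace-smoothed class probability for a decision tree leaf: keeps
    small leaves from producing extreme 0/1 estimates that distort
    cost-sensitive decisions."""
    return (n_class + 1) / (n_total + n_classes)

# A leaf with 3 of 3 positives is pulled back from 1.0 toward the prior:
print(smoothed_probability(3, 3, 2))   # 0.8
```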

10.
To address the complex network models and large numbers of redundant parameters in YOLO-family object detectors, an iterative pruning algorithm based on an adaptive threshold is proposed. After basic training and sparsity training, an adaptive-threshold pruning module evaluates the importance of channels and convolutional layers from the distribution of their scaling factors, learns a pruning threshold on its own, and then prunes the network model; this process can be repeated iteratively and is applied to both channel pruning and layer pruning. The threshold is not set by hand but is learned from the current network structure, yielding a better compact model after pruning. Experiments based on YOLOv3 on three datasets show that the algorithm adapts well to different datasets and network structures; compared with a conventional fixed threshold, models pruned with the adaptive threshold achieve better detection accuracy, compression, and inference speed.
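The paper learns its threshold from the scaling-factor distribution; the largest-gap heuristic below is only an invented stand-in that illustrates adapting the cutoff to that distribution instead of fixing a global pruning ratio.

```python
import numpy as np

def adaptive_threshold(gammas: np.ndarray) -> float:
    """Illustrative stand-in for a learned threshold: place the cutoff at
    the largest gap in the sorted scaling factors, assuming unimportant
    channels cluster near zero (requires more than a handful of channels)."""
    g = np.sort(np.abs(gammas))
    idx = int(np.argmax(np.diff(g)))
    return float((g[idx] + g[idx + 1]) / 2)

def channel_mask(gammas: np.ndarray) -> np.ndarray:
    """True = keep the channel, False = prune it."""
    return np.abs(gammas) > adaptive_threshold(gammas)
```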

11.
Objective: Deep learning can greatly improve the accuracy and reliability of environment perception for autonomous driving, but existing deep neural network models are hard to deploy on computationally limited automotive embedded platforms because of their computation and storage demands. To resolve the conflict between the huge computational cost of deep neural networks and the limited capability of embedded platforms, a greedy network pruning method based on the probability distribution of the weights is proposed, aiming to remove redundant connections and improve computational efficiency. Method: The probability that each weight parameter takes a small value is recorded during training; in the pruning stage, incremental pruning and network repair are performed according to this recorded weight probability distribution, improving on current strategies that rely on weight magnitude alone. Results: On Cifar10, the method achieves higher accuracy than dynamic network pruning at every pruning rate. On ImageNet, it compresses the parameter counts of AlexNet and VGG (visual geometry group) 16 by 5.9x and 11.4x respectively with little loss of accuracy, and requires fewer training iterations than dynamic network pruning. Residual networks (ResNet34 and ResNet50) can also be compressed effectively; for ResNet50, it reaches a higher compression rate (2.1x) than the current best method, HRank, with only a small additional loss of accuracy. Conclusion: The probability-distribution-based greedy pruning strategy addresses the uncertainty in deep neural network pruning, further improves the stability of the compressed network, and reduces the number of parameters while preserving model accuracy.
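The core bookkeeping, recording how often each weight is small during training, is simple to sketch; the `eps` cutoff and quantile-based pruning fraction are illustrative assumptions.

```python
import numpy as np

class SmallWeightTracker:
    """Track, per weight, how often |w| falls below eps across training
    steps; weights that are small with high probability are pruned first."""
    def __init__(self, shape, eps=1e-3):
        self.eps = eps
        self.small_counts = np.zeros(shape)
        self.steps = 0

    def update(self, weights: np.ndarray):
        """Call once per training step with the current weight tensor."""
        self.small_counts += (np.abs(weights) < self.eps)
        self.steps += 1

    def prune_mask(self, fraction: float) -> np.ndarray:
        """True = keep; prunes the `fraction` of weights most likely small."""
        prob_small = self.small_counts / max(self.steps, 1)
        cutoff = np.quantile(prob_small, 1.0 - fraction)
        return prob_small < cutoff
```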

12.
Decision trees are well-known and established models for classification and regression. In this paper, we focus on the estimation and minimization of the misclassification rate of decision tree classifiers. We apply Lidstone's Law of Succession for the estimation of the class probabilities and error rates. In our work, we take into account not only the expected value of the error rate, which has been the norm in existing research, but also the corresponding reliability of the error rate (measured by its standard deviation). Based on this estimation, we propose an efficient pruning algorithm, called k-norm pruning, that has a clear theoretical interpretation, is easily implemented, and does not require a validation set. Our experiments show that our proposed pruning algorithm produces accurate trees quickly, and compares very favorably with two other well-known pruning algorithms, CCP of CART and EBP of C4.5.
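Lidstone's Law of Succession has a standard closed form for leaf class probabilities; a small sketch (the choice k = 1, which reduces to the Laplace estimate, is illustrative):

```python
def lidstone(n_class: int, n_total: int, n_classes: int, k: float = 1.0) -> float:
    """Lidstone estimate of a class probability in a leaf holding n_total
    samples, n_class of which belong to that class."""
    return (n_class + k) / (n_total + k * n_classes)

# Error-rate estimate of a leaf = 1 - probability of its majority class:
print(1 - lidstone(9, 10, 2))   # ~0.167 instead of the raw 0.1
```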

13.
Classification trees with neural network feature extraction
The use of small multilayer nets at the decision nodes of a binary classification tree to extract nonlinear features is proposed. The nets are trained and the tree is grown using a gradient-type learning algorithm in the multiclass case. The method improves on standard classification tree design methods in that it generally produces trees with lower error rates and fewer nodes. It also reduces the problems associated with training large unstructured nets and transfers the problem of selecting the size of the net to the simpler problem of finding a tree of the right size. An efficient tree pruning algorithm is proposed for this purpose. Trees constructed with this method and with the CART method are compared on a waveform recognition problem and a handwritten character recognition problem. The approach yields a significant decrease in error rate and tree size. It also yields comparable error rates and shorter training times than a large multilayer net trained with backpropagation on the same problems.

14.
Inductive Logic Programming (ILP) combines rule-based and statistical artificial intelligence methods by learning a hypothesis, comprising a set of rules, given background knowledge and constraints on the search space. We focus on extending the XHAIL algorithm for ILP, which is based on Answer Set Programming, and we evaluate our extensions using the Natural Language Processing application of sentence chunking. With respect to processing natural language, ILP can cater for the constant change in how we use language on a daily basis. At the same time, ILP does not require the huge numbers of training examples needed by other statistical methods, and it produces interpretable results, namely a set of rules, which can be analysed and tweaked if necessary. As contributions we extend XHAIL with (i) a pruning mechanism within the hypothesis generalisation algorithm that enables learning from larger datasets, (ii) better usage of modern solver technology through recently developed optimisation methods, and (iii) a time budget that permits the usage of suboptimal results. We evaluate these improvements on the task of sentence chunking using three datasets from a recent SemEval competition. Results show that our improvements allow learning on bigger datasets with results of similar quality to state-of-the-art systems on the same task. Moreover, we compare the hypotheses obtained on the datasets to gain insights into the structure of each dataset.

15.
Ensemble pruning deals with reducing the number of base classifiers prior to combination in order to improve generalization and prediction efficiency. Existing ensemble pruning algorithms require much pruning time. This paper presents a fast pruning approach: pattern mining based ensemble pruning (PMEP). In this algorithm, the prediction results of all base classifiers are organized as a transaction database, and an FP-Tree structure is used to compact the prediction results. A greedy pattern mining method is then used to find the best ensemble of size k; after obtaining the ensembles of all possible sizes, the one with the best accuracy is output. Compared with Bagging, GASEN, and Forward Selection, experimental results show that PMEP achieves the best prediction accuracy and keeps the size of the final ensemble small; more importantly, its pruning time is much less than that of other ensemble pruning algorithms.

16.
The C4.5 decision tree (DT) can be applied in various fields and discovers knowledge for human understanding. However, different problems typically require different parameter settings, which are generally determined by rule of thumb or trial and error; such methods may result in poor parameter settings and unsatisfactory results. Moreover, although a dataset can contain numerous features, not all features are beneficial for classification with the C4.5 algorithm. Therefore, a novel scatter search-based approach (SS + DT) is proposed to acquire optimal parameter settings and to select the subset of features that yields better classification results. To evaluate its efficiency, datasets from the UCI (University of California, Irvine) Machine Learning Repository are used to assess the performance of the proposed approach. Experimental results demonstrate that the parameter settings for the C4.5 algorithm obtained by the SS + DT approach are better than those obtained by other approaches, and when feature selection is considered, classification accuracy rates on most datasets increase. The proposed approach can therefore effectively identify the best parameter settings for the C4.5 algorithm as well as useful features.

17.
An iterative pruning algorithm for feedforward neural networks
The problem of determining the proper size of an artificial neural network is recognized to be crucial, especially for its practical implications in such important issues as learning and generalization. One popular approach for tackling this problem is commonly known as pruning and it consists of training a larger than necessary network and then removing unnecessary weights/nodes. In this paper, a new pruning method is developed, based on the idea of iteratively eliminating units and adjusting the remaining weights in such a way that the network performance does not worsen over the entire training set. The pruning problem is formulated in terms of solving a system of linear equations, and a very efficient conjugate gradient algorithm is used for solving it, in the least-squares sense. The algorithm also provides a simple criterion for choosing the units to be removed, which has proved to work well in practice. The results obtained over various test problems demonstrate the effectiveness of the proposed approach.
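The "remove a unit, re-fit the remaining weights in the least-squares sense" step can be sketched directly; the paper solves the linear system with conjugate gradient, whereas `np.linalg.lstsq` is used here for brevity, and the single-layer setup is an assumption.

```python
import numpy as np

def remove_unit(H: np.ndarray, W_out: np.ndarray, j: int):
    """Remove hidden unit j and re-fit the surviving outgoing weights so the
    next layer's inputs change as little as possible over the training set.
    H is (n_samples, n_hidden) hidden activations; W_out is (n_hidden, n_out)."""
    keep = [i for i in range(H.shape[1]) if i != j]
    target = H @ W_out                                    # outputs before pruning
    W_new, *_ = np.linalg.lstsq(H[:, keep], target, rcond=None)
    return keep, W_new                                    # weights for surviving units
```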

18.
A critical issue in classification tree design is addressed: obtaining right-sized trees, i.e. trees that neither underfit nor overfit the data. Instead of stopping rules to halt partitioning, the approach of growing a large tree with pure terminal nodes and selectively pruning it back is used. A new, efficient iterative method is proposed to grow and prune classification trees. This method divides the data sample into two subsets and iteratively grows a tree with one subset and prunes it with the other subset, successively interchanging the roles of the two subsets. The convergence and other properties of the algorithm are established. Theoretical and practical considerations suggest that the iterative tree growing and pruning algorithm should perform better and require less computation than other widely used tree growing and pruning algorithms. Numerical results on a waveform recognition problem are presented to support this view.
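A rough, runnable analogue of the grow-with-one-half / prune-with-the-other loop, using scikit-learn's cost-complexity path as the pruning mechanism; this substitution and the iteration count are illustrative, not the paper's procedure.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, random_state=0)
A, B = train_test_split(np.arange(len(y)), test_size=0.5, random_state=0)

grow, prune = A, B
for _ in range(4):                       # iterate, swapping the two subset roles
    tree = DecisionTreeClassifier(random_state=0).fit(X[grow], y[grow])
    alphas = tree.cost_complexity_pruning_path(X[grow], y[grow]).ccp_alphas
    # Prune with the other half: pick the alpha scoring best on `prune`.
    best = max(alphas, key=lambda a: DecisionTreeClassifier(
        ccp_alpha=a, random_state=0).fit(X[grow], y[grow]).score(X[prune], y[prune]))
    grow, prune = prune, grow            # interchange the roles of the subsets

print("selected ccp_alpha:", best)
```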

19.
Probability trees are decision trees that predict class probabilities rather than the most likely class. The pruning criterion used to learn a probability tree strongly influences the size of the tree and thereby also the quality of its probability estimates. While the effect of pruning criteria on classification accuracy is well-studied, only recently has there been more interest in the effect on probability estimates. Hence, it is currently unclear which pruning criteria for probability trees are preferable under which circumstances. In this paper we survey six of the most important pruning criteria for probability trees, and discuss their theoretical advantages and disadvantages. We also perform an extensive experimental study of the relative performance of these pruning criteria. The main conclusion is that overall a pruning criterion based on randomization tests performs best because it is most robust to extreme data characteristics (such as class skew or a high number of classes). We also identify and explain several shortcomings of the other pruning criteria.
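A randomization test at a split can be sketched generically; the test statistic (difference in positive rate, so binary labels are assumed), the permutation count, and the significance level are illustrative choices, not the paper's exact criterion.

```python
import numpy as np

def split_is_significant(y_left, y_right, n_perm=1000, alpha=0.05, rng=None):
    """Randomization test: is the class-distribution difference between the
    two children larger than expected under random relabeling? If not,
    the subtree below this split is a candidate for pruning."""
    if rng is None:
        rng = np.random.default_rng(0)
    y = np.concatenate([y_left, y_right])
    stat = lambda l, r: abs(np.mean(l) - np.mean(r))   # diff in positive rate
    observed = stat(y_left, y_right)
    exceed = sum(
        stat(p[:len(y_left)], p[len(y_left):]) >= observed
        for p in (rng.permutation(y) for _ in range(n_perm))
    )
    return exceed / n_perm < alpha                     # significant -> keep split
```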

20.
We propose a task allocation algorithm that aims at finding an optimal task assignment for any parallel program on a given machine configuration. The theme of the approach is to traverse a state-space tree that enumerates all possible task assignments. The efficiency of the task allocation algorithm comes from applying a pruning rule to each traversed state to check whether traversal of a given sub-tree is required, taking advantage of dominance relations and task-clustering heuristics. The pruning rules try to eliminate partial assignments that violate the clustering of tasks while still keeping some optimal assignments in the future search space. In contrast to previous state-space searching methods for task allocation, the proposed pruning rules significantly reduce the time and space required to obtain an optimal assignment and lead the traversal to a near-optimal assignment within a small number of states. Experimental evaluation shows that the pruning rules make the state-space searching approach feasible for practical use.
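A tiny branch-and-bound skeleton over the task-assignment state space illustrates the idea; the simple cost bound below only stands in for the paper's dominance and clustering rules, and the cost model is an assumption.

```python
def allocate(task_costs, n_procs, comm):
    """task_costs[t][p] = cost of running task t on processor p; comm maps a
    complete assignment to its (non-negative) communication cost. Returns
    (best total cost, best assignment)."""
    best_cost, best_assign = float("inf"), None

    def search(assign, cost):
        nonlocal best_cost, best_assign
        if cost >= best_cost:             # prune: partial cost already dominated
            return
        t = len(assign)
        if t == len(task_costs):          # complete assignment reached
            total = cost + comm(assign)
            if total < best_cost:
                best_cost, best_assign = total, list(assign)
            return
        for p in range(n_procs):          # branch on the next task's placement
            search(assign + [p], cost + task_costs[t][p])

    search([], 0.0)
    return best_cost, best_assign

costs = [[3, 1], [2, 4], [5, 2]]
print(allocate(costs, 2, comm=lambda a: 0.5 * len(set(a))))
```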
