Similar literature
20 similar documents found (search time: 31 ms)
1.
Detection of malware using data mining techniques has been explored extensively. Techniques that detect malware from structural features rely on identifying anomalies in the structure of executable files. The structural attributes that can be extracted from an executable include byte n-grams, Portable Executable (PE) features, API call sequences and strings. After a thorough analysis, we extracted various features from executable files and applied them to an ensemble of classifiers to detect malware efficiently. Ensemble methods combine several individual pattern classifiers in order to achieve better classification. The challenge is to choose the minimal number of classifiers that achieves the best performance. An ensemble that contains too many members might incur large storage requirements and even reduce classification performance. Hence the goal of ensemble pruning is to identify a subset of ensemble members that performs at least as well as the original ensemble and to discard the remaining members.

2.
Non-parametric classification procedures based on a certainty measure and the nearest neighbour rule were explored for motor unit potential (MUP) classification during electromyographic (EMG) signal decomposition. A diversity-based classifier fusion approach was developed and evaluated to achieve improved classification performance. The developed system constructs a set of non-parametric base classifiers and then automatically chooses, from the pool of base classifiers, subsets of classifiers to form candidate classifier ensembles. The system selects the ensemble members by exploiting a diversity measure for selecting classifier teams. The kappa statistic is used as the diversity measure to estimate the level of agreement between base classifier outputs, i.e., to measure the degree of decision similarity between base classifiers. The pool of base classifiers consists of two kinds of classifiers, adaptive certainty-based classifiers (ACCs) and adaptive fuzzy k-NN classifiers (AFNNCs), and both utilize different types of features. Once the classifier fusion system has assigned the patterns to their classes, firing pattern consistency statistics are calculated for each class to detect classification errors in an adaptive fashion. Performance of the developed system was evaluated using real and simulated EMG signals and was compared with the performance of the constituent base classifiers and of the fixed ensemble containing the full set of base classifiers. Across the EMG signal data sets used, the diversity-based classifier fusion approach had better average classification performance overall, especially in terms of reducing classification errors.
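The agreement measure named in this abstract can be illustrated with a minimal sketch of Cohen's kappa between the outputs of two base classifiers. The function name `cohen_kappa` and the toy labels are assumptions made for illustration; the abstract does not say how the authors compute or aggregate kappa over classifier teams.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa between two classifiers' decisions on the same samples."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of samples with identical decisions.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each classifier's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both classifiers always emit the same single label
    return (p_observed - p_chance) / (1.0 - p_chance)
```

A kappa near 1 means the two classifiers make nearly identical decisions (low diversity), while values near 0 or below indicate agreement no better than chance, which is what a diversity-driven selection would favour.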

3.
The primary effect of using a reduced number of classifiers is a reduction in the computational requirements at learning and classification time. In addition to this obvious result, research shows that fusing all available classifiers does not guarantee the best performance, only good results on average. The much-researched question of whether it is better to fuse or to select has attracted even more interest in recent years with the development of Online Boosting theory, where a limited set of classifiers is continuously updated as new inputs are observed and classifications performed. The concept of online classification has recently received significant interest in the computer vision community. Classifiers can be trained on the visual features of a target, casting the tracking problem as a binary classification problem: distinguishing the target from the background. Here we discuss how to optimize the performance of a classifier ensemble employed for target tracking in video sequences. In particular, we propose the F-score measure as a novel means of selecting the members of the ensemble dynamically. For each frame, the ensemble is built as a subset of a larger pool of classifiers, with members selected according to their F-score. We observed an overall increase in classification accuracy and a general tendency towards reduced redundancy among the members of an F-score optimized ensemble. We carried out our experiments on both benchmark binary datasets and standard video sequences.
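The selection step described in this abstract can be sketched as follows, assuming each pool member's predictions on a labelled reference set are available. The names `f_score` and `select_by_f_score`, the balanced F1 form of the score, and the fixed ensemble size `k` are illustrative assumptions; the abstract does not state how many members are kept per frame.

```python
def f_score(y_true, y_pred, positive=1):
    """Balanced F-score (F1) of one classifier for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def select_by_f_score(pool_predictions, y_true, k):
    """Return the indices of the k pool members with the highest F-score."""
    scores = [f_score(y_true, preds) for preds in pool_predictions]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k], scores
```

In a tracking setting this selection would be repeated for every frame, so the ensemble membership can change as the target's appearance changes.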

4.
Many techniques have been proposed for credit risk assessment, from statistical models to artificial intelligence methods. During the last few years, different approaches to classifier ensembles have been applied successfully to credit scoring problems and have proved more accurate than single prediction models. However, which base classifiers should be employed in each ensemble to achieve the highest performance remains an open question. Accordingly, the present paper evaluates the performance of seven individual prediction techniques when used as members of five different ensemble methods. The ultimate aim of this study is to suggest appropriate classifiers for each ensemble approach in the context of credit scoring. The experimental results and statistical tests show that the C4.5 decision tree constitutes the best solution for most ensemble methods, closely followed by the multilayer perceptron neural network and logistic regression, whereas the nearest neighbour and the naive Bayes classifiers appear to be significantly the worst.

5.
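Ensemble learning is widely used to improve classification accuracy, and recent studies show that building ensemble classifiers with a multimodal perturbation strategy can further improve classification performance. This paper proposes an ensemble pruning algorithm based on approximate reducts and optimal sampling (EPA_AO). In EPA_AO, we design a multimodal perturbation strategy to construct diverse individual classifiers. The strategy perturbs the attribute space and the training set simultaneously, which increases the diversity of the individual classifiers. We train the individual classifiers with the evidential KNN (K-nearest neighbor) algorithm and compare EPA_AO with existing algorithms of the same type on multiple UCI datasets. The experimental results show that EPA_AO is an effective ensemble learning method.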

6.
Recent research in fault classification has shown the importance of accurately selecting the features used as inputs to the diagnostic model. In this work, a multi-objective genetic algorithm (MOGA) is considered for the feature selection phase. Then, two different techniques for using the selected features to develop the fault classification model are compared: a single classifier based on the feature subset with the best classification performance, and an ensemble of classifiers working on different feature subsets. The motivation for developing ensembles of classifiers is that they can achieve higher accuracies than single classifiers. An important condition for an ensemble to be effective is diversity in the predictions of its base classifiers, i.e. their capability of erring on different sub-regions of the pattern space. In order to show the benefits of having diverse base classifiers in the ensemble, two different ensembles have been developed: in the first, the base classifiers are constructed on feature subsets found by MOGAs aimed at maximizing the fault classification performance and minimizing the number of features in the subsets; in the second, diversity among classifiers is added to the MOGA search as a third objective function to maximize. In both cases, a voting technique is used to combine the predictions of the base classifiers into the ensemble output. For verification, numerical experiments are conducted on a case of multiple-fault classification in rotating machinery, and the results achieved by the two ensembles are compared with those obtained by a single optimal classifier.

7.
This paper presents an exploratory study of the use of metaheuristic optimization techniques to select important parameters (features and members) in the design of classifier ensembles. To this end, an empirical investigation is performed using 10 different optimization techniques applied to 23 classification problems. Furthermore, we analyze the performance of both mono-objective and multi-objective versions of these techniques, using all combinations of three objectives: classification error and two important ensemble diversity measures, namely good diversity and bad diversity. Additionally, the optimization techniques also have to select members for heterogeneous ensembles that use k-NN, Decision Tree and Naive Bayes as individual classifiers, all combined by majority vote. The main aim of this study is to determine which optimization techniques obtain the best results in the mono-objective and multi-objective settings, and to provide a comparison with classical ensemble techniques such as bagging, boosting and random forest. Our findings indicate that three optimization techniques, Memetic, SA and PSO, performed better than the other optimization techniques as well as the traditional ensemble generators (bagging, boosting and random forest).

8.
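Cai Tie, Wu Xing, Li Ye. Journal of Computer Applications (《计算机应用》), 2008, 28(8): 2091-2093
To construct diverse base classifiers for ensemble learning, a base classifier construction method based on data discretization is proposed and applied to support vector machine ensembles. The method processes the training set with a rough set and Boolean reasoning discretization algorithm, which effectively removes irrelevant and redundant attributes and improves both the accuracy and the diversity of the base classifiers. Experimental results show that the proposed method outperforms the traditional ensemble learning algorithms Bagging and Adaboost.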

9.
Ensemble learning has attracted considerable attention owing to its good generalization performance. The main issues in constructing a powerful ensemble include training a set of diverse and accurate base classifiers and combining them effectively. The ensemble margin, computed as the difference between the number of votes received by the correct class and the number received by the other class with the most votes, is widely used to explain the success of ensemble learning. This definition of the ensemble margin does not consider the classification confidence of the base classifiers. In this work, we explore the influence of the classification confidence of the base classifiers in ensemble learning and obtain some interesting conclusions. First, we extend the definition of the ensemble margin based on the classification confidence of the base classifiers. Then, an optimization objective is designed to compute the weights of the base classifiers by minimizing the margin-induced classification loss. Several strategies for utilizing the classification confidences and the weights are tried. We observe that weighted voting based on classification confidence is better than simple voting if all the base classifiers are used. In addition, ensemble pruning can further improve the performance of a weighted voting ensemble. We also compare the proposed fusion technique with some classical algorithms. The experimental results show the effectiveness of weighted voting with classification confidence.
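The vote-based margin defined in this abstract can be sketched directly. The optional `weights` argument is an assumption added for illustration: it replaces each unit vote by a per-classifier weight (for example a confidence value), which is one simple way to make the margin confidence-aware and is not necessarily the extended definition the authors propose.

```python
from collections import defaultdict

def ensemble_margin(votes, true_label, weights=None):
    """Votes for the correct class minus votes for the strongest other class.

    votes:   one predicted label per base classifier
    weights: optional per-classifier vote weights (hypothetical extension);
             defaults to one unit vote per classifier
    """
    if weights is None:
        weights = [1.0] * len(votes)
    totals = defaultdict(float)
    for label, weight in zip(votes, weights):
        totals[label] += weight
    correct = totals.get(true_label, 0.0)
    others = [total for label, total in totals.items() if label != true_label]
    return correct - max(others, default=0.0)
```

A positive margin means the ensemble classifies the sample correctly under (weighted) plurality voting, and a larger margin means the decision is further from flipping.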

10.
In general, the analysis of microarray data requires two steps: feature selection and classification. Given the variety of feature selection methods and classifiers available, it is difficult to find optimal ensembles composed of feature-classifier pairs. This paper proposes a novel method based on the evolutionary algorithm (EA) to form sophisticated ensembles of features and classifiers that can be used to obtain high classification performance. In spite of the exponential number of possible ensembles of individual feature-classifier pairs, an EA can produce the best ensemble in a reasonable amount of time. The chromosome is encoded with real values that set the weight of each feature-classifier pair in an ensemble. Experimental results on two well-known microarray datasets, in terms of time and classification rate, indicate that the proposed method produces ensembles that are superior to individual classifiers as well as to other ensembles optimized by random and greedy strategies.

11.
Voting ensembles for spoken affect classification   Cited by: 1 (self-citations: 0, citations by others: 1)
Affect or emotion classification from speech has much to gain from ensemble classification methods. In this paper we apply a simple voting mechanism to an ensemble of classifiers and attain a modest performance increase compared to the individual classifiers. A natural emotional speech database was compiled from 11 speakers. Listener-judges were used to validate the emotional content of the speech. Thirty-eight prosody-based features correlating characteristics of speech with emotional states were extracted from the data. A classifier ensemble was designed using a multi-layer perceptron, support vector machine, K* instance-based learner, K-nearest neighbour, and random forest of decision trees. A simple voting scheme determined the most popular prediction. The accuracy of the ensemble is compared with the accuracies of the individual classifiers.

12.
Training neural networks to distinguish different emotions from physiological signals frequently involves fuzzy definitions of each affective state. In addition, manually designed classification tasks often use sub-optimal classifier parameter settings, leading to merely average classification performance. This study proposes a framework for multi-layered optimization of an ensemble of classifiers that aims to maximize the system's ability to learn and classify affect and to minimize human involvement in setting optimal parameters for the classification system. Using fuzzy adaptive resonance theory mapping (ARTMAP) as the classifier template, genetic algorithms (GAs) were employed to perform an exhaustive search for the best combination of parameter settings for individual classifier performance. Speciation was implemented using subset selection of classification data attributes as well as an island-model genetic algorithm. Subsequently, the generated population of optimal classifier configurations was used as the set of candidates for an ensemble of classifiers. Another set of GAs was used to search for the combination of classifiers that would yield the best ensemble classification accuracy. The proposed methodology was tested on two affective data sets and produced relatively small ensembles of fuzzy ARTMAPs with excellent affect recognition accuracy.

13.
Predicting future stock index price movement has always been a fascinating research area, both for investors who wish to profit by trading stocks and for researchers who attempt to uncover the information buried in complex stock market time series data. This prediction problem can be addressed as a binary classification problem with two class labels, one for upward movement and the other for downward movement. In the literature, a wide range of classifiers has been tested for this application. As the performance of individual classifiers varies across datasets and across performance measures, it is impractical to declare any specific classifier the best. Hence, designing an efficient classifier ensemble instead of an individual classifier is attracting increasing attention from many researchers. The selection of base classifiers and the choice of their preferences in the ensemble with respect to a variety of performance criteria can in turn be treated as a Multi Criteria Decision Making (MCDM) problem. In this paper, an integrated TOPSIS Crow Search based weighted voting classifier ensemble is proposed for stock index price movement prediction. Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS), one of the popular MCDM techniques, is used to rank and select a set of base classifiers for the ensemble, whereas the weights of the classifiers in the ensemble are tuned by the Crow Search method. The proposed ensemble model is validated for stock index price prediction on the historical prices of the BSE SENSEX, S&P500 and NIFTY 50 stock indices. The model shows better performance than individual classifiers and other ensemble models such as majority voting, weighted voting, and differential evolution and particle swarm optimization based classifier ensembles.
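The TOPSIS ranking step mentioned in this abstract can be sketched with the standard textbook procedure: vector-normalise each criterion, apply weights, and score each alternative by its relative closeness to the ideal solution. The function name `topsis`, the criteria, and the weights in the usage note are hypothetical; the abstract does not list the performance criteria or weights the authors use, and the Crow Search weight tuning is not shown.

```python
import math

def topsis(matrix, weights, benefit):
    """Closeness score in [0, 1] for each alternative (row); higher is better.

    matrix:  rows are alternatives (e.g. classifiers), columns are criteria
    weights: one weight per criterion
    benefit: benefit[j] is True if larger values of criterion j are better
    """
    n_criteria = len(weights)
    # Step 1: vector-normalise every column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
             for j in range(n_criteria)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n_criteria)]
                for row in matrix]
    # Step 2: ideal and anti-ideal points, respecting benefit/cost direction.
    columns = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    # Step 3: relative closeness to the ideal point.
    scores = []
    for row in weighted:
        d_best, d_worst = math.dist(row, ideal), math.dist(row, anti)
        total = d_best + d_worst
        scores.append(d_worst / total if total else 0.5)
    return scores
```

For example, with three candidate classifiers described by (accuracy, F-measure, training time) and `benefit=[True, True, False]`, the classifiers would be ranked by descending score and the top few kept as ensemble members.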

14.
Ensembles of classifiers can improve classification accuracy by combining several models. The fusion method plays an important role in ensemble performance. Usually, a criterion for weighting the decision of each ensemble member is adopted, often a heuristic based on accuracy or confidence. The fusion rule must then apply the established criterion to provide the most reliable ensemble output through a kind of competition among the ensemble members. This article presents a new ensemble fusion method, named centrality score-based fusion, which uses the concept of centrality from social network analysis (SNA) as a criterion for the ensemble decision. Centrality measures have been applied in SNA to measure the importance of each person within a social network, taking into account the relationship of each person with all others. Thus, the idea is to derive the classifier weight from the overall prominence of the classifier inside the ensemble network, which reflects the relationships among pairs of classifiers. We hypothesized that the prominence of a classifier, based on its pairwise relationships with the other ensemble members, could serve as its weight in the fusion process. A robust experimental protocol has confirmed that centrality measures represent a promising strategy for weighting the classifiers of an ensemble, showing that the proposed fusion method performed well against methods from the literature.

15.
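To address the mildly imbalanced classification problem in web spam detection, three random under-sampling ensemble classifier algorithms are proposed: one-time random under-sampling without replacement (RUS-once), multiple random under-sampling without replacement (RUS-multiple), and random under-sampling with replacement (RUS-replacement). First, one of the random under-sampling techniques converts the training set into balanced sample sets; then a classification and regression tree (CART) classifier is trained on each balanced sample set; finally, an ensemble classifier built by simple voting classifies the test samples. Experiments show that all three random under-sampling ensemble classifiers achieve good classification results, with RUS-multiple and RUS-replacement outperforming RUS-once. Compared with CART and its Bagging and Adaboost ensembles, RUS-multiple and RUS-replacement improve AUC by about 10% on the WEBSPAM UK-2006 dataset and by about 25% on the WEBSPAM UK-2007 dataset; compared with the best results reported in other studies, RUS-multiple and RUS-replacement achieve the best AUC.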
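The sampling and voting steps can be sketched as below. This is only an illustration under stated assumptions: the abstract does not spell out how RUS-once and RUS-multiple differ, so the sketch shows just the general idea of drawing majority-class samples with or without replacement to build balanced training sets, followed by simple voting. The names `balanced_subsets` and `simple_vote` are hypothetical, and training a CART on each subset is left out.

```python
import random
from collections import Counter

def balanced_subsets(majority, minority, n_subsets, with_replacement, seed=0):
    """Build n_subsets balanced training sets.

    Each set pairs the full minority class with an equally sized random
    draw from the majority class.
    """
    rng = random.Random(seed)
    k = len(minority)
    subsets = []
    for _ in range(n_subsets):
        if with_replacement:
            drawn = rng.choices(majority, k=k)  # duplicates allowed in a draw
        else:
            drawn = rng.sample(majority, k)     # no duplicates within a draw
        subsets.append(drawn + list(minority))
    return subsets

def simple_vote(predictions):
    """Return the label predicted by the most ensemble members."""
    return Counter(predictions).most_common(1)[0][0]
```

One base classifier would be trained per balanced subset, and `simple_vote` would then be applied to the members' predictions for each test sample.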

16.
The Kernel Matching Pursuit Classifier (KMPC), a novel classification machine in pattern recognition, has a clear advantage in solving classification problems owing to the sparsity of its solution. Unfortunately, the performance of the KMPC falls well short of its theoretically expected level. Ensemble methods are learning algorithms that construct a collection of individual classifiers that are independent yet accurate, and then classify a new data point by taking a vote of their predictions. In this way, classifier performance can be improved greatly. In this paper, after a thorough investigation of the principles of KMPC and ensemble methods, we elaborate on the theory of the KMPC ensemble and point out ways to construct it. Experiments on artificial data and UCI data show that the KMPC ensemble combines the advantages of KMPC with those of ensemble methods and improves classification performance remarkably.

17.
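To improve the classification and recognition of facial expressions, a selective ensemble classification model based on ensemble learning theory, Quadratic Optimization Choice (QOC), is proposed. First, the 9 base classifiers are ranked by performance and the top 30% are selected as candidate base classifiers for the ensemble. Second, a cluster of ensemble models is generated according to combination rules. Finally, quadratic optimization selection is applied to the cluster of ensemble models: the subset of ensemble classifiers with the smallest generalization error is chosen, which determines the optimal ensemble classification model. To verify the performance of the QOC model, ensemble models using the maximum, minimum, and mean rules were chosen for comparison. The experimental results show that the QOC model classifies better than the base classifiers; in particular, for the sad expression class, which has a poor recognition rate, the average recognition rate increases by 21.11%. The QOC model also significantly outperforms non-selective ensemble models.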
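The maximum, minimum, and mean rules used as comparison models here are standard fixed combiners over per-class scores. A minimal sketch follows, assuming each base classifier outputs one probability per class; the function name `combine` and the example numbers are hypothetical, and the QOC selection procedure itself is not shown.

```python
def combine(prob_vectors, rule="mean"):
    """Fuse per-classifier class probabilities with a fixed rule.

    prob_vectors: one list of class probabilities per base classifier
    Returns (index of the winning class, fused support per class).
    """
    per_class = list(zip(*prob_vectors))  # regroup scores by class
    if rule == "max":
        support = [max(p) for p in per_class]
    elif rule == "min":
        support = [min(p) for p in per_class]
    elif rule == "mean":
        support = [sum(p) / len(p) for p in per_class]
    else:
        raise ValueError(f"unknown rule: {rule}")
    winner = max(range(len(support)), key=support.__getitem__)
    return winner, support
```

The rules can disagree: for the outputs `[[0.99, 0.01], [0.2, 0.8], [0.2, 0.8]]`, the max and min rules pick class 0 because one classifier is very confident, while the mean rule picks class 1.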

18.
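Compared with plain ensemble learning, ensemble pruning searches for an optimal subset among multiple classifiers in order to improve generalization performance and simplify the ensemble. Pareto ensemble pruning considers both classifier accuracy and ensemble size, treating both as optimization objectives. However, it considers only the accuracy of the base classifiers and the ensemble size and ignores the diversity among classifiers, so the selected classifiers tend to be highly similar. This paper proposes a Pareto ensemble pruning algorithm that incorporates diversity: it combines classifier diversity and accuracy into the first optimization objective and uses ensemble size as the second, thereby achieving multi-objective optimization. Experiments show that, at comparable ensemble sizes, the improved algorithm performs better than Pareto ensemble pruning because it incorporates diversity.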
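The bi-objective view in this abstract rests on Pareto dominance: a candidate sub-ensemble is kept only if no other candidate is at least as good on both objectives and strictly better on one. The sketch below filters candidates described by two values to be minimised. The name `pareto_front` and the example pairs are hypothetical, and the first value merely stands in for the combined diversity-and-accuracy objective, whose exact form the abstract does not give.

```python
def pareto_front(candidates):
    """Indices of non-dominated candidates; every objective is minimised.

    candidates: list of tuples such as (first_objective, ensemble_size)
    """
    def dominates(a, b):
        # a dominates b: no worse on every objective, strictly better on one.
        return (all(x <= y for x, y in zip(a, b))
                and any(x < y for x, y in zip(a, b)))

    return [i for i, cand in enumerate(candidates)
            if not any(dominates(other, cand)
                       for j, other in enumerate(candidates) if j != i)]
```

For the candidates `[(0.10, 8), (0.12, 5), (0.12, 9), (0.20, 3), (0.25, 4)]`, the third and fifth are dominated, leaving a front that trades the first objective against ensemble size.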

19.
In this paper we introduce a framework for making statistical inference on the asymptotic prediction of parallel classification ensembles. The validity of the analysis is fairly general. It only requires that the individual classifiers are generated in independent executions of some randomized learning algorithm, and that the final ensemble prediction is made via majority voting. Given an unlabeled test instance, the predictions of the classifiers in the ensemble are obtained sequentially. As the individual predictions become known, Bayes' theorem is used to update an estimate of the probability that the class predicted by the current ensemble coincides with the classification of the corresponding ensemble of infinite size. Using this estimate, the voting process can be halted when the confidence in the asymptotic prediction is sufficiently high. An empirical investigation on several benchmark classification problems shows that most of the test instances require querying only a small number of classifiers to converge to the infinite ensemble prediction with a high degree of confidence. For these instances, the difference between the generalization error of the finite ensemble and the infinite ensemble limit is very small, often negligible.
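The halting idea can be illustrated for the simplest case. The sketch below assumes a binary problem and a uniform prior on the fraction of classifiers that vote for class 1; both are simplifying assumptions made for illustration, not the paper's stated setup, and the function names are hypothetical. Under these assumptions the posterior after the observed votes is a Beta distribution, and the probability that the current leader also wins in the infinite ensemble has a closed form as a binomial tail sum.

```python
from math import comb

def prob_leader_wins_in_limit(n_leading, n_trailing):
    """P(the leading class also wins in the infinite ensemble).

    Binary case with a uniform prior: the vote proportion p of the leading
    class has posterior Beta(n_leading + 1, n_trailing + 1), and
    P(p > 1/2) equals the binomial sum computed below.
    """
    t = n_leading + n_trailing
    return sum(comb(t + 1, j) for j in range(n_leading + 1)) / 2 ** (t + 1)

def query_until_confident(votes, confidence=0.99):
    """Consume binary votes (0 or 1) one at a time and stop when confident.

    Returns (predicted class, number of classifiers queried).
    """
    counts = [0, 0]
    used = 0
    for used, vote in enumerate(votes, start=1):
        counts[vote] += 1
        if prob_leader_wins_in_limit(max(counts), min(counts)) >= confidence:
            break
    leader = 0 if counts[0] >= counts[1] else 1
    return leader, used
```

When the classifiers agree unanimously, this rule stops after only six votes at the 99% level; contested instances keep querying until the vote iterable is exhausted.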

20.
In the class-imbalanced learning scenario, traditional machine learning algorithms that optimize overall accuracy tend to achieve poor classification performance, especially for the minority class, which is usually the class of most interest. Many effective approaches have been proposed to solve this problem. Among them, bagging ensemble methods integrated with under-sampling techniques have demonstrated better performance than alternatives such as bagging ensemble methods integrated with over-sampling techniques and cost-sensitive methods. Although these under-sampling techniques promote diversity among the generated base classifiers through random partitioning or sampling of the majority class, they take no measures to ensure individual classification performance, which limits the achievable ensemble performance. On the other hand, evolutionary under-sampling (EUS), a novel under-sampling technique, has been applied successfully to search for the best majority class subset for training a well-performing nearest neighbor classifier. Inspired by EUS, in this paper we introduce it into the under-sampling bagging framework and propose an EUS-based bagging ensemble method, EUS-Bag, by designing a new fitness function that considers three factors to make EUS better suited to the framework. With our fitness function, EUS-Bag can generate a set of accurate and diverse base classifiers. To verify the effectiveness of EUS-Bag, we conduct a series of comparison experiments on 22 two-class imbalanced classification problems. Experimental results measured using recall, geometric mean and AUC all demonstrate its superior performance.
