首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 24 毫秒
1.
Minimal Learning Machine (MLM) is a recently proposed supervised learning algorithm with performance comparable to most state-of-the-art machine learning methods. In this work, we propose ensemble methods for classification and regression using MLMs. The goal of ensemble strategies is to produce more robust and accurate models when compared to a single classifier or regression model. Despite its successful application, MLM employs a computationally intensive optimization problem as part of its test procedure (out-of-sample data estimation). This becomes even more noticeable in the context of ensemble learning, where multiple models are used. Aiming to provide fast alternatives to the standard MLM, we also propose the Nearest Neighbor Minimal Learning Machine and the Cubic Equation Minimal Learning Machine to cope with classification and single-output regression problems, respectively. The experimental assessment conducted on real-world datasets reports that ensemble of fast MLMs perform comparably or superiorly to reference machine learning algorithms.  相似文献   

2.
An ensemble in machine learning is defined as a set of models (such as classifiers or predictors) that are induced individually from data by using one or more machine learning algorithms for a given task and then work collectively in the hope of generating improved decisions. In this paper we investigate the factors that influence ensemble performance, which mainly include accuracy of individual classifiers, diversity between classifiers, the number of classifiers in an ensemble and the decision fusion strategy. Among them, diversity is believed to be a key factor but more complex and difficult to be measured quantitatively, and it was thus chosen as the focus of this study, together with the relationships between the other factors. A technique was devised to build ensembles with decision trees that are induced with randomly selected features. Three sets of experiments were performed using 12 benchmark datasets, and the results indicate that (i) a high level of diversity indeed makes an ensemble more accurate and robust compared with individual models; (ii) small ensembles can produce results as good as, or better than, large ensembles provided the appropriate (e.g. more diverse) models are selected for the inclusion. This has implications that for scaling up to larger databases the increased efficiency of smaller ensembles becomes more significant and beneficial. As a test case study, ensembles are built based on these findings for a real world application—osteoporosis classification, and found that, in each case of three datasets used, the ensembles out-performed individual decision trees consistently and reliably.  相似文献   

3.
支特向量机是一种新的机器学习方法,已成功地应用于模式分类、回归分析和密度估计等问题中.本文依据统计学习理论和最优化理论建立了线性支特向量机的无约束优化模型,并给出了一种有效的近似解法一极大熵方法,为求解支持向量机优化问题提供了一种新途径,本文方法特别易于计算机实现。数值实验结果表明了模型和算法的可行性和有效性.  相似文献   

4.
Improving accuracies of machine learning algorithms is vital in designing high performance computer-aided diagnosis (CADx) systems. Researches have shown that a base classifier performance might be enhanced by ensemble classification strategies. In this study, we construct rotation forest (RF) ensemble classifiers of 30 machine learning algorithms to evaluate their classification performances using Parkinson's, diabetes and heart diseases from literature.While making experiments, first the feature dimension of three datasets is reduced using correlation based feature selection (CFS) algorithm. Second, classification performances of 30 machine learning algorithms are calculated for three datasets. Third, 30 classifier ensembles are constructed based on RF algorithm to assess performances of respective classifiers with the same disease data. All the experiments are carried out with leave-one-out validation strategy and the performances of the 60 algorithms are evaluated using three metrics; classification accuracy (ACC), kappa error (KE) and area under the receiver operating characteristic (ROC) curve (AUC).Base classifiers succeeded 72.15%, 77.52% and 84.43% average accuracies for diabetes, heart and Parkinson's datasets, respectively. As for RF classifier ensembles, they produced average accuracies of 74.47%, 80.49% and 87.13% for respective diseases.RF, a newly proposed classifier ensemble algorithm, might be used to improve accuracy of miscellaneous machine learning algorithms to design advanced CADx systems.  相似文献   

5.
This paper performs an exploratory study of the use of metaheuristic optimization techniques to select important parameters (features and members) in the design of ensemble of classifiers. In order to do this, an empirical investigation, using 10 different optimization techniques applied to 23 classification problems, will be performed. Furthermore, we will analyze the performance of both mono and multi-objective versions of these techniques, using all different combinations of three objectives, classification error as well as two important diversity measures to ensembles, which are good and bad diversity measures. Additionally, the optimization techniques will also have to select members for heterogeneous ensembles, using k-NN, Decision Tree and Naive Bayes as individual classifiers and they are all combined using the majority vote technique. The main aim of this study is to define which optimization techniques obtained the best results in the context of mono and multi-objective as well as to provide a comparison with classical ensemble techniques, such as bagging, boosting and random forest. Our findings indicated that three optimization techniques, Memetic, SA and PSO, provided better performance than the other optimization techniques as well as traditional ensemble generator (bagging, boosting and random forest).  相似文献   

6.
《Information Fusion》2008,9(1):4-20
Broad classes of statistical classification algorithms have been developed and applied successfully to a wide range of real-world domains. In general, ensuring that the particular classification algorithm matches the properties of the data is crucial in providing results that meet the needs of the particular application domain. One way in which the impact of this algorithm/application match can be alleviated is by using ensembles of classifiers, where a variety of classifiers (either different types of classifiers or different instantiations of the same classifier) are pooled before a final classification decision is made. Intuitively, classifier ensembles allow the different needs of a difficult problem to be handled by classifiers suited to those particular needs. Mathematically, classifier ensembles provide an extra degree of freedom in the classical bias/variance tradeoff, allowing solutions that would be difficult (if not impossible) to reach with only a single classifier. Because of these advantages, classifier ensembles have been applied to many difficult real-world problems. In this paper, we survey select applications of ensemble methods to problems that have historically been most representative of the difficulties in classification. In particular, we survey applications of ensemble methods to remote sensing, person recognition, one vs. all recognition, and medicine.  相似文献   

7.
When applying machine learning technology to real-world applications, such as visual quality inspection, several practical issues need to be taken care of. One problem is posed by the reality that usually there are multiple human operators doing the inspection, who will inevitably contradict each other for some of the products to be inspected. In this paper an architecture for learning visual quality inspection is proposed which can be trained by multiple human operators, based on trained ensembles of classifiers. Most of the applicable ensemble techniques have however difficulties learning in these circumstances. In order to effectively train the system a novel ensemble framework is proposed as an enhancement of the grading ensemble technique—called active grading. The active grading algorithms are evaluated on data obtained from a real-world industrial system for visual quality inspection of the printing of labels on CDs, which was labelled independently by four different human operators and their supervisor, and compared to the standard grading algorithm and a range of other ensemble (classifier fusion) techniques.  相似文献   

8.
Classifier performance optimization in machine learning can be stated as a multi-objective optimization problem. In this context, recent works have shown the utility of simple evolutionary multi-objective algorithms (NSGA-II, SPEA2) to conveniently optimize the global performance of different anti-spam filters. The present work extends existing contributions in the spam filtering domain by using three novel indicator-based (SMS-EMOA, CH-EMOA) and decomposition-based (MOEA/D) evolutionary multi-objective algorithms. The proposed approaches are used to optimize the performance of a heterogeneous ensemble of classifiers into two different but complementary scenarios: parsimony maximization and e-mail classification under low confidence level. Experimental results using a publicly available standard corpus allowed us to identify interesting conclusions regarding both the utility of rule-based classification filters and the appropriateness of a three-way classification system in the spam filtering domain.  相似文献   

9.
In many applications of information systems learning algorithms have to act in dynamic environments where data are collected in the form of transient data streams. Compared to static data mining, processing streams imposes new computational requirements for algorithms to incrementally process incoming examples while using limited memory and time. Furthermore, due to the non-stationary characteristics of streaming data, prediction models are often also required to adapt to concept drifts. Out of several new proposed stream algorithms, ensembles play an important role, in particular for non-stationary environments. This paper surveys research on ensembles for data stream classification as well as regression tasks. Besides presenting a comprehensive spectrum of ensemble approaches for data streams, we also discuss advanced learning concepts such as imbalanced data streams, novelty detection, active and semi-supervised learning, complex data representations and structured outputs. The paper concludes with a discussion of open research problems and lines of future research.  相似文献   

10.
Urban traffic congestion prediction is a very hot topic due to the environmental and economical impacts that currently implies. In this sense, to be able to predict bottlenecks and to provide alternatives to the circulation of vehicles becomes an essential task for traffic management. A novel methodology, based on ensembles of machine learning algorithms, is proposed to predict traffic congestion in this paper. In particular, a set of seven algorithms of machine learning has been selected to prove their effectiveness in the traffic congestion prediction. Since all the seven algorithms are able to address supervised classification, the methodology has been developed to be used as a binary classification problem. Thus, collected data from sensors located at the Spanish city of Seville are analyzed and models reaching up to 83 % are generated.  相似文献   

11.
Ensemble classification – combining the results of a set of base learners – has received much attention in the machine learning community and has demonstrated promising capabilities in improving classification accuracy. Compared with neural network or decision tree ensembles, there is no comprehensive empirical research in support vector machine (SVM) ensembles. To fill this void, this paper analyses and compares SVM ensembles with four different ensemble constructing techniques, namely bagging, AdaBoost, Arc-X4 and a modified AdaBoost. Twenty real-world data sets from the UCI repository are used as benchmarks to evaluate and compare the performance of these SVM ensemble classifiers by their classification accuracy. Different kernel functions and different numbers of base SVM learners are tested in the ensembles. The experimental results show that although SVM ensembles are not always better than a single SVM, the SVM bagged ensemble performs as well or better than other methods with a relatively higher generality, particularly SVMs with a polynomial kernel function. Finally, an industrial case study of gear defect detection is conducted to validate the empirical analysis results.  相似文献   

12.
杨菊  袁玉龙  于化龙 《计算机科学》2016,43(10):266-271
针对现有极限学习机集成学习算法分类精度低、泛化能力差等缺点,提出了一种基于蚁群优化思想的极限学习机选择性集成学习算法。该算法首先通过随机分配隐层输入权重和偏置的方法生成大量差异的极限学习机分类器,然后利用一个二叉蚁群优化搜索算法迭代地搜寻最优分类器组合,最终使用该组合分类测试样本。通过12个标准数据集对该算法进行了测试,该算法在9个数据集上获得了最优结果,在另3个数据集上获得了次优结果。采用该算法可显著提高分类精度与泛化性能。  相似文献   

13.
《Information Fusion》2009,10(2):150-162
Information fusion research has recently focused on the characteristics of the decision profiles of ensemble members in order to optimize performance. These characteristics are particularly important in the selection of ensemble members. However, even though the control of overfitting is a challenge in machine learning problems, much less work has been devoted to the control of overfitting in selection tasks. The objectives of this paper are: (1) to show that overfitting can be detected at the selection stage; and (2) to present strategies to control overfitting. Decision trees and k nearest neighbors classifiers are used to create homogeneous ensembles, while single- and multi-objective genetic algorithms are employed as search algorithms at the selection stage. In this study, we use bagging and random subspace methods for ensemble generation. The classification error rate and a set of diversity measures are applied as search criteria. We show experimentally that the selection of classifier ensembles conducted by genetic algorithms is prone to overfitting, especially in the multi-objective case. In this study, the partial validation, backwarding and global validation strategies are tailored for classifier ensemble selection problem and compared. This comparison allows us to show that a global validation strategy should be applied to control overfitting in pattern recognition systems involving an ensemble member selection task. Furthermore, this study has helped us to establish that the global validation strategy can be used to measure the relationship between diversity and classification performance when diversity measures are employed as single-objective functions.  相似文献   

14.
Bagging and boosting negatively correlated neural networks.   总被引:2,自引:0,他引:2  
In this paper, we propose two cooperative ensemble learning algorithms, i.e., NegBagg and NegBoost, for designing neural network (NN) ensembles. The proposed algorithms incrementally train different individual NNs in an ensemble using the negative correlation learning algorithm. Bagging and boosting algorithms are used in NegBagg and NegBoost, respectively, to create different training sets for different NNs in the ensemble. The idea behind using negative correlation learning in conjunction with the bagging/boosting algorithm is to facilitate interaction and cooperation among NNs during their training. Both NegBagg and NegBoost use a constructive approach to automatically determine the number of hidden neurons for NNs. NegBoost also uses the constructive approach to automatically determine the number of NNs for the ensemble. The two algorithms have been tested on a number of benchmark problems in machine learning and NNs, including Australian credit card assessment, breast cancer, diabetes, glass, heart disease, letter recognition, satellite, soybean, and waveform problems. The experimental results show that NegBagg and NegBoost require a small number of training epochs to produce compact NN ensembles with good generalization.  相似文献   

15.
Ensemble methods aim at combining multiple learning machines to improve the efficacy in a learning task in terms of prediction accuracy, scalability, and other measures. These methods have been applied to evolutionary machine learning techniques including learning classifier systems (LCSs). In this article, we first propose a conceptual framework that allows us to appropriately categorize ensemble‐based methods for fair comparison and highlights the gaps in the corresponding literature. The framework is generic and consists of three sequential stages: a pre‐gate stage concerned with data preparation; the member stage to account for the types of learning machines used to build the ensemble; and a post‐gate stage concerned with the methods to combine ensemble output. A taxonomy of LCSs‐based ensembles is then presented using this framework. The article then focuses on comparing LCS ensembles that use feature selection in the pre‐gate stage. An evaluation methodology is proposed to systematically analyze the performance of these methods. Specifically, random feature sampling and rough set feature selection‐based LCS ensemble methods are compared. Experimental results show that the rough set‐based approach performs significantly better than the random subspace method in terms of classification accuracy in problems with high numbers of irrelevant features. The performance of the two approaches are comparable in problems with high numbers of redundant features.  相似文献   

16.
Real-time and reliable measurements of the effluent quality are essential to improve operating efficiency and reduce energy consumption for the wastewater treatment process.Due to the low accuracy and unstable performance of the traditional effluent quality measurements,we propose a selective ensemble extreme learning machine modeling method to enhance the effluent quality predictions.Extreme learning machine algorithm is inserted into a selective ensemble frame as the component model since it runs much faster and provides better generalization performance than other popular learning algorithms.Ensemble extreme learning machine models overcome variations in different trials of simulations for single model.Selective ensemble based on genetic algorithm is used to further exclude some bad components from all the available ensembles in order to reduce the computation complexity and improve the generalization performance.The proposed method is verified with the data from an industrial wastewater treatment plant,located in Shenyang,China.Experimental results show that the proposed method has relatively stronger generalization and higher accuracy than partial least square,neural network partial least square,single extreme learning machine and ensemble extreme learning machine model.  相似文献   

17.
Multi-label classification (MLC) tasks are encountered more and more frequently in machine learning applications. While MLC methods exist for the classical batch setting, only a few methods are available for streaming setting. In this paper, we propose a new methodology for MLC via multi-target regression in a streaming setting. Moreover, we develop a streaming multi-target regressor iSOUP-Tree that uses this approach. We experimentally compare two variants of the iSOUP-Tree method (building regression and model trees), as well as ensembles of iSOUP-Trees with state-of-the-art tree and ensemble methods for MLC on data streams. We evaluate these methods on a variety of measures of predictive performance (appropriate for the MLC task). The ensembles of iSOUP-Trees perform significantly better on some of these measures, especially the ones based on label ranking, and are not significantly worse than the competitors on any of the remaining measures. We identify the thresholding problem for the task of MLC on data streams as a key issue that needs to be addressed in order to obtain even better results in terms of predictive performance.  相似文献   

18.
Optimization is at the core of control theory and appears in several areas of this field, such as optimal control, distributed control, system identification, robust control, state estimation, model predictive control and dynamic programming. The recent advances in various topics of modern optimization have also been revamping the area of machine learning. Motivated by the crucial role of optimization theory in the design, analysis, control and operation of real-world systems, this tutorial paper offers a detailed overview of some major advances in this area, namely conic optimization and its emerging applications. First, we discuss the importance of conic optimization in different areas. Then, we explain seminal results on the design of hierarchies of convex relaxations for a wide range of nonconvex problems. Finally, we study different numerical algorithms for large-scale conic optimization problems.  相似文献   

19.
Learning classifier systems (LCS) are machine learning systems designed to work for both multi-step and single-step decision tasks. The latter case presents an interesting challenge for such algorithms, especially when they are applied to real-world data mining (DM) problems. The present investigation departs from the popular approach of applying accuracy-based LCS to single-step classification and aims to uncover the potential of strength-based LCS in such tasks. Although the latter family of algorithms have often been associated with poor generalization and performance, we aim at alleviating these problems by defining appropriate extensions to the traditional strength-based LCS framework. These extensions are detailed and their effect on system performance is studied through the application of the proposed algorithm on a set of artificial problems, designed to challenge its scalability and generalization abilities. The comparison of the proposed algorithm with UCS, its state-of-the-art accuracy-based counterpart, emphasizes the effects of our extended strength-based approach and validates its competitiveness in multi-class problems with various class distributions. Overall, our work presents an investigation of strength-based LCS in the domain of supervised classification. Our extensive analysis of the learning dynamics involved in these systems provides proof of their potential as real-world DM tools, inducing tractable rule-based classification models, even in the presence of severe class imbalances.  相似文献   

20.
This paper presents a cooperative coevolutive approach for designing neural network ensembles. Cooperative coevolution is a recent paradigm in evolutionary computation that allows the effective modeling of cooperative environments. Although theoretically, a single neural network with a sufficient number of neurons in the hidden layer would suffice to solve any problem, in practice many real-world problems are too hard to construct the appropriate network that solve them. In such problems, neural network ensembles are a successful alternative. Nevertheless, the design of neural network ensembles is a complex task. In this paper, we propose a general framework for designing neural network ensembles by means of cooperative coevolution. The proposed model has two main objectives: first, the improvement of the combination of the trained individual networks; second, the cooperative evolution of such networks, encouraging collaboration among them, instead of a separate training of each network. In order to favor the cooperation of the networks, each network is evaluated throughout the evolutionary process using a multiobjective method. For each network, different objectives are defined, considering not only its performance in the given problem, but also its cooperation with the rest of the networks. In addition, a population of ensembles is evolved, improving the combination of networks and obtaining subsets of networks to form ensembles that perform better than the combination of all the evolved networks. The proposed model is applied to ten real-world classification problems of a very different nature from the UCI machine learning repository and proben1 benchmark set. In all of them the performance of the model is better than the performance of standard ensembles in terms of generalization error. Moreover, the size of the obtained ensembles is also smaller.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号