Similar Literature
20 similar records found (search time: 31 ms)
1.
This research aims to evaluate the potential of ensemble learning (bagging, boosting, and modified bagging) for predicting microbially induced concrete corrosion in sewer systems from the data mining (DM) perspective. Particular focus is laid on ensemble techniques for network-based DM methods, including the multi-layer perceptron neural network (MLPNN) and radial basis function neural network (RBFNN), as well as tree-based DM methods, such as the chi-square automatic interaction detector (CHAID), classification and regression tree (CART), and random forests (RF). Hence, an interdisciplinary approach is presented by combining findings from material sciences and hydrochemistry with data mining analyses to predict concrete corrosion. The factors affecting concrete corrosion, such as time, gas temperature, gas-phase H2S concentration, relative humidity, pH, and exposure phase, are considered as the models’ inputs. All 433 datasets are randomly selected to construct an individual model and twenty component models of boosting, bagging, and modified bagging based on training, validating, and testing for each DM base learner. Considering several model performance indices (e.g., root mean square error, RMSE; mean absolute percentage error, MAPE; correlation coefficient, r), the best ensemble predictive models are selected. The results obtained indicate that the prediction ability of the random forests DM model is superior to the other ensemble learners, followed by the ensemble Bag-CHAID method. On average, the ensemble tree-based models performed better than the ensemble network-based models; nevertheless, it was also found that taking advantage of ensemble learning enhances the general performance of individual DM models by more than 10%.
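As a rough illustration of the kind of bagged network-based ensemble and the error indices (RMSE, MAPE, r) described above, the following Python sketch uses scikit-learn with synthetic data; the six-feature dataset, hyperparameters, and member count are placeholder assumptions, not the study's corrosion records.

```python
# Minimal sketch: single MLPNN vs. a 20-member bagged MLPNN vs. random forest,
# evaluated with RMSE, MAPE and the correlation coefficient r on a held-out set.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_percentage_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

X, y = make_regression(n_samples=433, n_features=6, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "single MLPNN": MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000, random_state=0),
    "bagged MLPNN (20 members)": BaggingRegressor(
        MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000, random_state=0),
        n_estimators=20, random_state=0),
    "random forest": RandomForestRegressor(n_estimators=200, random_state=0),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    rmse = mean_squared_error(y_te, pred) ** 0.5        # root mean square error
    mape = mean_absolute_percentage_error(y_te, pred)   # mean absolute percentage error
    r = np.corrcoef(y_te, pred)[0, 1]                   # correlation coefficient
    print(f"{name}: RMSE={rmse:.2f}  MAPE={mape:.3f}  r={r:.3f}")
```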

2.
Considerable research effort has been expended to identify more accurate models for decision support systems in financial decision domains, including credit scoring and bankruptcy prediction. The focus of this earlier work has been to identify the “single best” prediction model from a collection that includes simple parametric models, nonparametric models that directly estimate data densities, and nonlinear pattern recognition models such as neural networks. Recent theories suggest this work may be misguided in that ensembles of predictors provide more accurate generalization than reliance on a single model. This paper investigates three recent ensemble strategies: cross-validation, bagging, and boosting. We employ the multilayer perceptron neural network as a base classifier. The generalization ability of the neural network ensemble is found to be superior to the single best model for three real-world financial decision applications.
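A minimal sketch of the single-model-versus-ensemble comparison described above, assuming scikit-learn and synthetic binary data rather than the paper's financial datasets (boosting is omitted because scikit-learn's AdaBoost requires base learners that support sample weights, which the MLP does not):

```python
# Compare a single multilayer perceptron with a bagged ensemble of perceptrons.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

single_mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
bagged_mlp = BaggingClassifier(
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=0),
    n_estimators=10, random_state=0)   # each member sees a bootstrap replicate

for name, clf in [("single MLP", single_mlp), ("bagged MLP ensemble", bagged_mlp)]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```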

3.

Supply chain finance (SCF) has become more important for small- and medium-sized enterprises (SMEs) due to the global credit crunch, supply chain financing woes, and tightening credit criteria for corporate lending. Predicting SME credit risk is therefore significant for guaranteeing the smooth operation of SCF. In this paper, we apply six methods, i.e., one individual machine learning method (IML, i.e., decision tree), three ensemble machine learning methods [EML, i.e., bagging, boosting, and random subspace (RS)], and two integrated ensemble machine learning methods (IEML, i.e., RS–boosting and multi-boosting), to predict SME credit risk in SCF and compare the effectiveness and feasibility of the six methods. In the experiment, we choose the quarterly financial and non-financial data of 48 listed SMEs from the Small and Medium Enterprise Board of the Shenzhen Stock Exchange, six listed core enterprises (CEs) from the Shanghai Stock Exchange, and three listed CEs from the Shenzhen Stock Exchange during the period 2012–2013 as the empirical samples. Experimental results reveal that the IEML methods achieve better performance than the IML and EML methods. In particular, RS–boosting is the best of the six methods for predicting SME credit risk; a sketch of this integrated scheme follows below.

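One plausible reading of the RS–boosting combination described above is boosting wrapped in a random-subspace outer layer; the following scikit-learn sketch (synthetic data, illustrative hyperparameters, and my own composition of the two strategies, not necessarily the authors' exact algorithm) shows the idea:

```python
# Integrated "RS-boosting"-style ensemble: each member of a random-subspace outer
# layer is itself an AdaBoost ensemble of shallow decision trees.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=30, weights=[0.8, 0.2],
                           random_state=0)

rs_boosting = BaggingClassifier(
    AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), n_estimators=50,
                       random_state=0),
    n_estimators=10,
    max_features=0.5,     # each boosted member sees a random subspace of features
    bootstrap=False,      # no instance bootstrap: pure random-subspace outer layer
    random_state=0)

print("RS-boosting CV accuracy:", cross_val_score(rs_boosting, X, y, cv=5).mean())
```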

4.
This paper performs an exploratory study of the use of metaheuristic optimization techniques to select important parameters (features and members) in the design of ensembles of classifiers. To do this, an empirical investigation is carried out, applying 10 different optimization techniques to 23 classification problems. Furthermore, we analyze the performance of both mono- and multi-objective versions of these techniques, using all combinations of three objectives: classification error and two diversity measures that are important for ensembles, namely good and bad diversity. Additionally, the optimization techniques also select members for heterogeneous ensembles, using k-NN, Decision Tree and Naive Bayes as individual classifiers, all combined using the majority vote technique. The main aim of this study is to determine which optimization techniques obtain the best results in the mono- and multi-objective settings, and to provide a comparison with classical ensemble techniques such as bagging, boosting and random forest. Our findings indicate that three optimization techniques, Memetic, SA and PSO, provide better performance than the other optimization techniques as well as traditional ensemble generators (bagging, boosting and random forest).
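To make the idea of metaheuristic member selection concrete, here is a small illustrative sketch (my own simplification, not the paper's algorithm): a simulated-annealing-style search over a binary inclusion mask deciding which members of a heterogeneous k-NN / decision-tree / naive-Bayes pool join a majority-vote ensemble, scored by validation error.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=800, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Pool of candidate members (different classifier types / hyperparameters).
pool = [KNeighborsClassifier(k).fit(X_tr, y_tr) for k in (1, 3, 5, 7)]
pool += [DecisionTreeClassifier(max_depth=d, random_state=0).fit(X_tr, y_tr) for d in (2, 4, 8)]
pool += [GaussianNB().fit(X_tr, y_tr)]
val_preds = np.array([m.predict(X_val) for m in pool])   # shape (n_members, n_val)

def vote_error(mask):
    """Majority-vote classification error of the selected members on validation data."""
    if mask.sum() == 0:
        return 1.0
    votes = val_preds[mask.astype(bool)].mean(axis=0) >= 0.5   # binary majority vote
    return float(np.mean(votes.astype(int) != y_val))

# Simulated annealing over the binary inclusion mask.
mask = rng.integers(0, 2, size=len(pool))
best_mask, best_err = mask.copy(), vote_error(mask)
temperature = 1.0
for step in range(200):
    candidate = mask.copy()
    candidate[rng.integers(len(pool))] ^= 1              # flip one member in/out
    delta = vote_error(candidate) - vote_error(mask)
    if delta < 0 or rng.random() < np.exp(-delta / temperature):
        mask = candidate
    if vote_error(mask) < best_err:
        best_mask, best_err = mask.copy(), vote_error(mask)
    temperature *= 0.98                                   # cooling schedule

print("selected members:", np.flatnonzero(best_mask), "validation error:", best_err)
```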

5.
Banks provide a financial intermediary service by channeling funds efficiently between borrowers and lenders. Bank lending is subject to credit risk when loans are not paid back on a timely basis or are in default. The ability, or a methodology, to evaluate the creditworthiness of a borrower is therefore crucial to the bank’s risk management and profitability. The aim of the paper is the dichotomous classification of individual borrowers into groups of creditworthy or non-creditworthy clients. Borrowers are recognized by applying single and aggregated classification trees. Classification trees are a powerful alternative to more traditional statistical models. This model has the advantage of being able to detect non-linear relationships and of performing well in the presence of qualitative information, as happens in the creditworthiness evaluation of individual borrowers. As a result, they are widely used as base classifiers for ensemble methods. Aggregated classification trees are constructed employing two ensemble methods: AdaBoost and bagging. AdaBoost constructs its base classifiers in sequence, updating a distribution over the training examples to create each base classifier. Bagging combines the individual classifiers built on bootstrap replicates of the training set. The research is conducted employing actual data on individual borrowers that obtained a mortgage credit from one of the commercial banks operating in Poland. Each of the clients is described by 11 variables. The grouping variable indicates whether the client pays off the credit regularly according to the credit agreement or is behind in loan redemption. Diagnostic variables describe the clients in terms of demographic features and characterize the credits to be paid back (i.e., value and currency of the credit, credit rate, etc.).

6.
Enterprise credit risk assessment has long been regarded as a critical topic, and many statistical and intelligent methods have been explored for this issue. However, there are no consistent conclusions on which methods are better. Recent research suggests that combining multiple classifiers, i.e., ensemble learning, may give better performance. In this paper, we propose a new hybrid ensemble approach, called RSB-SVM, which is based on two popular ensemble strategies, i.e., bagging and random subspace, and uses the Support Vector Machine (SVM) as the base learner. As two different factors, i.e., bootstrap selection of instances and random selection of features, encourage diversity in RSB-SVM, it is expected to achieve better performance. The enterprise credit risk dataset, which includes 239 companies’ financial records and was collected by the Industrial and Commercial Bank of China, is selected to demonstrate the effectiveness and feasibility of the proposed method. Experimental results reveal that RSB-SVM can be used as an alternative method for enterprise credit risk assessment.
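Because both diversity-inducing factors mentioned above (bootstrap sampling of instances and random sampling of features) are available as options of scikit-learn's BaggingClassifier, an RSB-SVM-style ensemble can be sketched as follows; the data and parameter values are placeholders, not the ICBC dataset:

```python
# Bagging + random subspace with an SVM base learner.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

rsb_svm = BaggingClassifier(
    SVC(kernel="rbf", C=1.0),
    n_estimators=15,
    max_samples=0.8,      # bootstrap selection of instances (bagging)
    bootstrap=True,
    max_features=0.6,     # random selection of features (random subspace)
    random_state=0)

print("RSB-SVM CV accuracy:", cross_val_score(rsb_svm, X, y, cv=5).mean())
```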

7.
This work aims to connect two rarely combined research directions, i.e., non-stationary data stream classification and data analysis with skewed class distributions. We propose a novel framework employing stratified bagging for training base classifiers, integrating data preprocessing and dynamic ensemble selection methods for imbalanced data stream classification. The proposed approach has been evaluated through computer experiments carried out on 135 artificially generated data streams with various imbalance ratios, label noise levels, and types of concept drift, as well as on two selected real streams. Four preprocessing techniques and two dynamic selection methods, used at both the bagging classifier and base estimator levels, were considered. Experimental results showed that, for highly imbalanced data streams, dynamic ensemble selection coupled with data preprocessing can outperform online and chunk-based state-of-the-art methods.
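The stratified-bagging idea above can be sketched as follows: each base classifier is trained on a bootstrap drawn separately from each class, so minority examples are always represented. This is a simplified static illustration under my own assumptions; the framework's preprocessing, dynamic ensemble selection, and drift handling are omitted.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

def stratified_bootstrap(X, y, rng):
    """Draw a bootstrap sample within each class and concatenate the results."""
    idx = []
    for cls in np.unique(y):
        cls_idx = np.flatnonzero(y == cls)
        idx.append(rng.choice(cls_idx, size=len(cls_idx), replace=True))
    idx = np.concatenate(idx)
    return X[idx], y[idx]

ensemble = []
for _ in range(10):
    Xb, yb = stratified_bootstrap(X, y, rng)
    ensemble.append(DecisionTreeClassifier(max_depth=5, random_state=0).fit(Xb, yb))

# Majority vote of the stratified-bagging ensemble.
votes = np.mean([clf.predict(X) for clf in ensemble], axis=0) >= 0.5
print("training-set minority recall:", np.mean(votes[y == 1] == 1))
```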

8.
This paper addresses the task of identification of nonlinear dynamic systems from measured data. The discrete-time variant of this task is commonly reformulated as a regression problem. As tree ensembles have proven to be a successful predictive modeling approach, we investigate the use of tree ensembles for solving this regression problem. While different variants of tree ensembles have been proposed and used, they are mostly limited to using regression trees as base models. We introduce ensembles of fuzzified model trees with split attribute randomization and evaluate them for nonlinear dynamic system identification. Models of dynamic systems built for control purposes are usually evaluated by a more stringent procedure using the output, i.e., simulation error. Taking this into account, we perform ensemble pruning to optimize the output error of the tree ensemble models. The proposed Model-Tree Ensemble method is empirically evaluated using input–output data disturbed by noise. It is compared to representative state-of-the-art approaches on one synthetic dataset with artificially introduced noise and one real-world noisy dataset. The evaluation shows that the method is suitable for modeling dynamic systems and produces models with output error performance comparable to the other approaches. The method is also resilient to noise, as its performance does not deteriorate even when up to 20% noise is added.
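The distinction above between one-step-ahead prediction and the stricter simulation (output) error can be illustrated with a generic tree ensemble on a toy first-order system; the sketch below uses a random forest instead of the paper's Model-Tree Ensemble, and the system, lags, and noise level are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
u = rng.uniform(-1, 1, 500)                       # input signal
y = np.zeros(500)
for k in range(1, 500):                           # toy nonlinear dynamics + noise
    y[k] = 0.8 * y[k - 1] + 0.5 * np.tanh(u[k - 1]) + 0.02 * rng.standard_normal()

# One-step-ahead regression problem: y[k] = f(y[k-1], u[k-1])
X = np.column_stack([y[:-1], u[:-1]])
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y[1:])

# Free-run simulation: feed predicted outputs back instead of measured ones.
y_sim = np.zeros(500)
for k in range(1, 500):
    y_sim[k] = model.predict([[y_sim[k - 1], u[k - 1]]])[0]

print("simulation RMSE:", np.sqrt(np.mean((y[1:] - y_sim[1:]) ** 2)))
```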

9.
The decision tree method has grown fast in the past two decades and its performance in classification is promising. Tree-based ensemble algorithms have been used to improve the performance of an individual tree. In this study, we compared four basic ensemble methods, that is, bagging tree, random forest, AdaBoost tree and AdaBoost random tree, in terms of tree size, ensemble size, band selection (BS), random feature selection, classification accuracy and efficiency for ecological zone classification in Clark County, Nevada, using multi-temporal multi-source remote-sensing data. Furthermore, two BS schemes based on the feature importance of the bagging tree and the AdaBoost tree were also considered and compared. We conclude that random forest or AdaBoost random tree can achieve accuracies at least as high as bagging tree or AdaBoost tree with higher efficiency; and although bagging tree and random forest can be more efficient, AdaBoost tree and AdaBoost random tree can provide significantly higher accuracy. All ensemble methods provided significantly higher accuracies than the single decision tree. Finally, our results showed that classification accuracy could increase dramatically by combining multi-temporal and multi-source data sets.

10.
With the rapid growth and increased competition in the credit industry, corporate credit risk prediction is becoming more important for credit-granting institutions. In this paper, we propose an integrated ensemble approach, called RS-Boosting, which is based on two popular ensemble strategies, i.e., boosting and random subspace, for corporate credit risk prediction. As there are two different factors encouraging diversity in RS-Boosting, it is expected to achieve better performance. Two corporate credit datasets are selected to demonstrate the effectiveness and feasibility of the proposed method. Experimental results reveal that RS-Boosting achieves the best performance among the seven methods considered, the others being logistic regression analysis (LRA), decision tree (DT), artificial neural network (ANN), bagging, boosting and random subspace. All these results illustrate that RS-Boosting can be used as an alternative method for corporate credit risk prediction.

11.
12.
The prediction of bankruptcy for financial companies, especially banks, has been an extensively researched area, and creditors, auditors, stockholders and senior managers are all interested in bank bankruptcy prediction. In this paper, three common machine learning models, namely Logistic, J48 and Voted Perceptron, are used as the base learners. In addition, an attribute-based ensemble learning method, namely Random Subspaces, and two instance-based ensemble learning methods, namely Bagging and Multi-Boosting, are employed to enhance the prediction accuracy of conventional machine learning models for bank failure prediction. The models are grouped in the following families of approaches: (i) conventional machine learning models, (ii) ensemble learning models and (iii) hybrid ensemble learning models. Experimental results indicate a clear outperformance of hybrid ensemble machine learning models over conventional base and ensemble models. These results indicate that hybrid ensemble learning models can be used as reliable prediction models for bank failures.

13.
The idea of ensemble learning algorithms is to combine multiple learners and aggregate their predictions to form a final conclusion. Typical methods for combining learned models include voting, mixture of experts, stacked generalization and cascading, but the performance of these methods still needs further improvement. A novel ensemble learning algorithm, ReinforcedEnsemble, is proposed. The ReinforcedEnsemble algorithm consists of two main parts: a ReinforcedEnsemble feature extraction algorithm and ReinforcedEnsemble base classifiers. Experiments comparing the ReinforcedEnsemble algorithm with other ensemble learning algorithms show that the proposed algorithm achieves the best results on multiple metrics.

14.
Constrained cascade generalization of decision trees (total citations: 1; self-citations: 0; citations by others: 1)
While decision tree techniques have been widely used in classification applications, a shortcoming of many decision tree inducers is that they do not learn intermediate concepts, i.e., at each node, only one of the original features is involved in the branching decision. Combining other classification methods, which do learn intermediate concepts, with decision tree inducers can produce more flexible decision boundaries between classes, potentially improving classification accuracy. We propose a generic algorithm for cascade generalization of decision tree inducers with the maximum cascading depth as a parameter to constrain the degree of cascading. Cascading methods proposed in the past, i.e., loose coupling and tight coupling, are strictly special cases of this new algorithm. We have empirically evaluated the proposed algorithm using logistic regression and C4.5 as base inducers on 32 UCI data sets and found that neither loose coupling nor tight coupling is always the best cascading strategy, and that the maximum cascading depth in the proposed algorithm can be tuned for better classification accuracy. We have also empirically compared the proposed algorithm with ensemble methods such as bagging and boosting and found that the proposed algorithm performs marginally better than bagging and boosting on average.
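A minimal sketch of a single cascading step in this spirit, assuming scikit-learn and synthetic data (C4.5 is approximated by a CART decision tree; the depth constraint and the loose/tight coupling variants are omitted): the logistic regression's class probabilities are appended to the original features before the tree is induced.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=800, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Level-0 learner: logistic regression provides an intermediate (oblique) concept.
lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
X_tr_ext = np.hstack([X_tr, lr.predict_proba(X_tr)])
X_te_ext = np.hstack([X_te, lr.predict_proba(X_te)])

# Level-1 learner: decision tree induced on original + cascaded features.
tree = DecisionTreeClassifier(random_state=0).fit(X_tr_ext, y_tr)
print("cascaded tree accuracy:", tree.score(X_te_ext, y_te))
```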

15.
Both statistical techniques and Artificial Intelligence (AI) techniques have been explored for credit scoring, an important finance activity. Although there are no consistent conclusions on which are better, recent studies suggest that combining multiple classifiers, i.e., ensemble learning, may give better performance. In this study, we conduct a comparative assessment of the performance of three popular ensemble methods, i.e., Bagging, Boosting, and Stacking, based on four base learners, i.e., Logistic Regression Analysis (LRA), Decision Tree (DT), Artificial Neural Network (ANN) and Support Vector Machine (SVM). Experimental results reveal that the three ensemble methods can substantially improve individual base learners. In particular, Bagging performs better than Boosting across all credit datasets. Stacking and Bagging with DT obtain, in our experiments, the best performance in terms of average accuracy, type I error and type II error.
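A hedged sketch of this kind of comparison using scikit-learn's Bagging, AdaBoost (boosting) and Stacking implementations with LRA, DT, ANN and SVM base learners; a synthetic dataset stands in for the credit data, and all hyperparameters are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

models = {
    "Bagging DT": BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                                    random_state=0),
    "Boosting DT": AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                                      n_estimators=50, random_state=0),
    "Stacking (LRA, DT, ANN, SVM)": StackingClassifier(
        estimators=[("lra", LogisticRegression(max_iter=1000)),
                    ("dt", DecisionTreeClassifier(random_state=0)),
                    ("ann", MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000,
                                          random_state=0)),
                    ("svm", SVC(probability=True, random_state=0))],
        final_estimator=LogisticRegression(max_iter=1000)),
}

for name, clf in models.items():
    print(name, cross_val_score(clf, X, y, cv=5).mean())
```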

16.
We present attribute bagging (AB), a technique for improving the accuracy and stability of classifier ensembles induced using random subsets of features. AB is a wrapper method that can be used with any learning algorithm. It establishes an appropriate attribute subset size and then randomly selects subsets of features, creating projections of the training set on which the ensemble classifiers are built. The induced classifiers are then used for voting. This article compares the performance of our AB method with bagging and other algorithms on a hand-pose recognition dataset. It is shown that AB gives consistently better results than bagging, both in accuracy and stability. The performance of ensemble voting in bagging and the AB method as a function of the attribute subset size and the number of voters for both weighted and unweighted voting is tested and discussed. We also demonstrate that ranking the attribute subsets by their classification accuracy and voting using only the best subsets further improves the resulting performance of the ensemble.

17.
Seasonality effects and empirical regularities in financial data have been well documented in the financial economics literature for over seven decades. This paper proposes an expert system that uses novel machine learning techniques to predict the price return over these seasonal events, and then uses these predictions to develop a profitable trading strategy. While simple approaches to trading these regularities can prove profitable, such trading leads to potentially large drawdowns (the peak-to-trough decline of an investment, measured as a percentage between the peak and the trough) in profit. In this paper, we introduce an automated trading system based on performance-weighted ensembles of random forests that improves the profitability and stability of trading seasonality events. An analysis of various regression techniques is performed, as well as an exploration of the merits of various techniques for expert weighting. The performance of the models is analysed using a large sample of stocks from the DAX. The results show that recency-weighted ensembles of random forests produce superior results in terms of both profitability and prediction accuracy compared with other ensemble techniques. It is also found that modelling seasonality effects explicitly produces superior results to not modelling them.
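The notion of a performance- or recency-weighted ensemble can be illustrated with the sketch below, which trains one random forest per chronological window and weights later experts more heavily; the weighting rule, windows, and data are all illustrative assumptions, not the paper's expert-weighting scheme.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.standard_normal((600, 8))
y = 0.5 * X[:, 0] + np.sin(X[:, 1]) + 0.1 * rng.standard_normal(600)

# Train one forest (expert) per chronological window of the data.
windows = [(0, 200), (150, 350), (300, 500)]
experts, recency = [], []
for i, (lo, hi) in enumerate(windows):
    experts.append(RandomForestRegressor(n_estimators=100, random_state=i)
                   .fit(X[lo:hi], y[lo:hi]))
    recency.append(i + 1)          # later windows get a higher raw weight

weights = np.array(recency, dtype=float)
weights /= weights.sum()

# Recency-weighted prediction on the most recent (held-out) block.
X_new, y_new = X[500:], y[500:]
pred = sum(w * m.predict(X_new) for w, m in zip(weights, experts))
print("weighted-ensemble RMSE:", np.sqrt(np.mean((y_new - pred) ** 2)))
```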

18.
Oak forests are essential for the ecosystems of many countries, particularly when they are used in vegetal restoration. Therefore, models for predicting the potential habitat of oaks can be a valuable tool for environmental work. In accordance with this objective, the building and comparison of data mining models are presented for the prediction of potential habitats of the oak forest type in Mediterranean areas (southern Spain), with conclusions applicable to other regions. Thirty-one environmental input variables were measured and six base models for supervised classification problems were selected: linear and quadratic discriminant analysis, logistic regression, classification trees, neural networks and support vector machines. Three ensemble methods, based on combining classification tree models fitted from samples and sets of variables generated from the original data set, were also evaluated: bagging, random forests and boosting. The available data set was randomly split into three parts: training set (50%), validation set (25%), and test set (25%). Analysis of the accuracy, sensitivity and specificity, together with the area under the ROC curve, on the test set reveals that the best models for our oak data set are those of bagging and random forests. All of these models can be fitted with free R programs using the libraries and functions described in this paper. Furthermore, the methodology used in this study will allow researchers to determine the potential distribution of oaks in other kinds of areas.

19.
A theoretical analysis of bagging as a linear combination of classifiers (total citations: 1; self-citations: 0; citations by others: 1)
We apply an analytical framework for the analysis of linearly combined classifiers to ensembles generated by bagging. This provides an analytical model of bagging misclassification probability as a function of the ensemble size, which is a novel result in the literature. Experimental results on real data sets confirm the theoretical predictions. This allows us to derive a novel and theoretically grounded guideline for choosing the bagging ensemble size. Furthermore, our results are consistent with explanations of bagging in terms of classifier instability and variance reduction, support the optimality of the simple average over the weighted average combining rule for ensembles generated by bagging, and apply to other randomization-based methods for constructing classifier ensembles. Although our results do not allow us to compare the bagging misclassification probability with that of an individual classifier trained on the original training set, we discuss how the considered theoretical framework could be exploited to this end.
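As a small empirical companion to the analysis above (not the paper's analytical model), the misclassification probability of bagging can be estimated as a function of ensemble size by averaging test error over repeated splits, for example:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, flip_y=0.05, random_state=0)

for n_estimators in (1, 5, 11, 25, 51):
    errors = []
    for seed in range(5):
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                                  random_state=seed)
        bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=n_estimators,
                                random_state=seed).fit(X_tr, y_tr)
        errors.append(1.0 - bag.score(X_te, y_te))      # misclassification rate
    print(f"ensemble size {n_estimators:>2}: misclassification ~ {np.mean(errors):.3f}")
```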

20.
Predicting the growth of the COVID-19 pandemic is critical in determining how to battle the disease and track its progression. COVID-19 case data form a time series that can be projected using different methodologies. Thus, this work aims to gauge the spread and severity of the outbreak over time. Furthermore, data analytics and Machine Learning (ML) techniques are employed to gain a broader understanding of virus infections. We have simulated, adjusted, and fitted several statistical time-series forecasting models, linear ML models, and nonlinear ML models. Examples of these models are Logistic Regression, Lasso, Ridge, ElasticNet, Huber Regressor, Lasso Lars, Passive Aggressive Regressor, K-Neighbors Regressor, Decision Tree Regressor, Extra Trees Regressor, Support Vector Regression (SVR), AdaBoost Regressor, Random Forest Regressor, Bagging Regressor, AutoRegression, Moving Average, Gradient Boosting Regressor, Autoregressive Moving Average (ARMA), Auto-Regressive Integrated Moving Average (ARIMA), Simple Exponential Smoothing, Exponential Smoothing, Holt-Winters, Simple Moving Average, Weighted Moving Average, Croston, and naive Bayes. Furthermore, our suggested methodology includes the development and evaluation of ensemble models built on top of the best-performing statistical and ML-based prediction methods. A third stage in the proposed system examines three different implementations to determine which model delivers the best performance; this best method is then used for future forecasts, yielding the most accurate and dependable predictions.
