Similar Documents
Found 20 similar documents (search time: 15 ms)
1.

Dementia is one of the leading causes of severe cognitive decline; it induces memory loss and impairs the daily life of millions of people worldwide. In this work, we consider the classification of dementia using magnetic resonance (MR) imaging and clinical data with machine learning models. We adopt univariate feature selection in the MR data pre-processing step as a filter-based feature selection method. Bagged decision trees are also implemented to estimate the important features for achieving good classification accuracy. Several ensemble-learning-based machine learning approaches, namely gradient boosting (GB), extreme gradient boosting (XGB), voting-based, and random forest (RF) classifiers, are considered for the diagnosis of dementia. Moreover, we propose voting-based classifiers that train on an ensemble of several basic machine learning models, such as the extra trees classifier, RF, GB, and XGB. The implementation of a voting-based approach is one of the important contributions, and the performance of the different classifiers is evaluated in terms of precision, accuracy, recall, and F1 score. The receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) are also used as metrics for comparing these classifiers. Experimental results show that the voting-based classifiers often outperform RF, GB, and XGB in terms of precision, recall, and accuracy, indicating their promise for differentiating dementia from imaging and clinical data.
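The voting-based approach described in this abstract can be sketched with scikit-learn. The synthetic data, the model settings, and the substitution of sklearn's `GradientBoostingClassifier` for XGB are illustrative assumptions, not the paper's actual setup:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the MR + clinical feature matrix (hypothetical data).
X, y = make_classification(n_samples=400, n_features=20, n_informative=8,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Soft-voting ensemble over basic learners like those named in the abstract.
voting = VotingClassifier(
    estimators=[
        ("et", ExtraTreesClassifier(n_estimators=100, random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("gb", GradientBoostingClassifier(random_state=0)),
    ],
    voting="soft",  # average predicted class probabilities across members
)
voting.fit(X_tr, y_tr)
acc = accuracy_score(y_te, voting.predict(X_te))
```

With `voting="hard"` the ensemble would instead take a majority vote over predicted labels; soft voting requires every member to expose `predict_proba`.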


2.
Combining stock prediction with portfolio optimization can improve the performance of portfolio construction. In this article, we propose a novel portfolio construction approach that utilizes a two-stage ensemble model to forecast stock prices and combines the forecasting results with portfolio optimization. Specifically, the approach has two phases: stock prediction and portfolio optimization. The stock prediction has two stages. In the first stage, three neural networks, i.e., a multilayer perceptron (MLP), a gated recurrent unit (GRU), and a long short-term memory network (LSTM), are used to integrate the forecasting results of four individual models, i.e., LSTM, GRU, a deep multilayer perceptron (DMLP), and random forest (RF). In the second stage, a time-varying-weight ordinary least squares (OLS) model is utilized to combine the first-stage forecasting results into the ultimate forecasts, and the stocks with better potential return on investment are then chosen. For portfolio optimization, a diversified mean-variance model with forecasting, named DMVF, is proposed, in which an average predictive error term is included to capture excess returns and a 2-norm cost function is introduced to diversify the portfolio. Using historical data from the Shanghai Stock Exchange as the study sample, the experimental results indicate that the DMVF model with two-stage ensemble prediction outperforms benchmarks in terms of return and return-risk characteristics.
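The two-stage idea, individual forecasters whose outputs are recombined by an OLS model, can be sketched as below. The drifting-random-walk price series is synthetic, and a random forest plus a linear model stand in for the paper's LSTM/GRU/DMLP base models; all names and parameters are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Hypothetical daily closing prices: a drifting random walk as a stand-in
# for real exchange data.
prices = np.cumsum(rng.normal(0.1, 1.0, 300)) + 100.0
window = 5
X = np.array([prices[i:i + window] for i in range(len(prices) - window)])
y = prices[window:]
split = 250
X_tr, X_te, y_tr, y_te = X[:split], X[split:], y[:split], y[split:]

# Stage 1: individual forecasters (simple stand-ins for the deep models).
base = [RandomForestRegressor(n_estimators=50, random_state=0),
        LinearRegression()]
preds_tr = np.column_stack([m.fit(X_tr, y_tr).predict(X_tr) for m in base])
preds_te = np.column_stack([m.predict(X_te) for m in base])

# Stage 2: an OLS combiner weights the stage-1 forecasts into the final one.
ols = LinearRegression().fit(preds_tr, y_tr)
final = ols.predict(preds_te)
```

A production version would fit the combiner on out-of-sample stage-1 forecasts (e.g., via a rolling window) rather than in-sample ones, to avoid favoring overfit base models.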

3.
Machine learning algorithms have been widely used in mine fault diagnosis, and selecting a suitable algorithm is the key factor affecting diagnosis quality. However, the impact of machine learning algorithms on the prediction performance of mine fault diagnosis models has not been fully evaluated. In this study, windage alteration fault (WAF) diagnosis models based on the K-nearest neighbor algorithm (KNN), multi-layer perceptron (MLP), support vector machine (SVM), and decision tree (DT) are constructed. Furthermore, the applicability of these four algorithms to WAF diagnosis is explored through a T-type ventilation network simulation experiment and a field application study at the Jinchuan No. 2 mine. The accuracy of fault location diagnosis for the four models in both networks was 100%. In the simulation experiment, the mean absolute percentage errors (MAPEs) between the predicted and real values of the fault volume for the four models were 0.59%, 97.26%, 123.61%, and 8.78%, respectively; in the field application they were 3.94%, 52.40%, 25.25%, and 7.15%, respectively. A comprehensive evaluation of the fault location and fault volume diagnosis tests showed that the KNN model is the most suitable algorithm for WAF diagnosis, with the DT model second best. This study realizes the intelligent diagnosis of WAFs and provides technical support for the realization of intelligent ventilation.
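The MAPE metric used above to score fault-volume predictions can be computed as follows; the data and the KNN/DT comparison are synthetic stand-ins for the ventilation-network experiments, not the study's data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

def mape(y_true, y_pred):
    """Mean absolute percentage error (in %), as used for fault volume."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

rng = np.random.default_rng(1)
# Hypothetical sensor features vs. fault volume (illustrative data only).
X = rng.uniform(0.0, 1.0, (200, 4))
y_vol = 10.0 + 5.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0.0, 0.2, 200)

knn = KNeighborsRegressor(n_neighbors=5).fit(X[:150], y_vol[:150])
dt = DecisionTreeRegressor(random_state=0).fit(X[:150], y_vol[:150])
err_knn = mape(y_vol[150:], knn.predict(X[150:]))
err_dt = mape(y_vol[150:], dt.predict(X[150:]))
```

Note that MAPE is undefined when any true value is zero, so fault volumes must be strictly positive for this metric to be meaningful.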

4.
This research aims to evaluate the potential of ensemble learning (bagging, boosting, and modified bagging) for predicting microbially induced concrete corrosion in sewer systems from a data mining (DM) perspective. Particular focus is laid on ensemble techniques for network-based DM methods, including the multi-layer perceptron neural network (MLPNN) and radial basis function neural network (RBFNN), as well as tree-based DM methods, such as the chi-square automatic interaction detector (CHAID), classification and regression tree (CART), and random forests (RF). Hence, an interdisciplinary approach is presented that combines findings from materials science and hydrochemistry with data mining analyses to predict concrete corrosion. Factors that affect concrete corrosion, such as time, gas temperature, gas-phase H2S concentration, relative humidity, pH, and exposure phase, are considered as the models' inputs. All 433 datasets are randomly split to construct an individual model and twenty component models of boosting, bagging, and modified bagging, based on training, validating, and testing sets for each DM base learner. Considering several model performance indices (e.g., root mean square error, RMSE; mean absolute percentage error, MAPE; correlation coefficient, r), the best ensemble predictive models are selected. The results indicate that the prediction ability of the random forests DM model is superior to the other ensemble learners, followed by the ensemble Bag-CHAID method. On average, the ensemble tree-based models performed better than the ensemble network-based models; nevertheless, taking advantage of ensemble learning was also found to enhance the general performance of individual DM models by more than 10%.
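The study's core comparison, a single base learner versus a twenty-member bagging ensemble scored by RMSE and r, can be sketched as below. The six-column input matrix mimics the listed corrosion factors, but the data and response are synthetic assumptions:

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
# Hypothetical inputs (time, gas temperature, H2S, humidity, pH, exposure
# phase) and a synthetic corrosion response -- illustrative only.
X = rng.uniform(0.0, 1.0, (433, 6))
y = 2.0 * X[:, 0] + X[:, 2] - X[:, 4] + rng.normal(0.0, 0.1, 433)
X_tr, X_te, y_tr, y_te = X[:350], X[350:], y[:350], y[350:]

single = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)
# Twenty component models aggregated by bagging, matching the study design.
bagged = BaggingRegressor(DecisionTreeRegressor(), n_estimators=20,
                          random_state=0).fit(X_tr, y_tr)

def rmse(model):
    # Root mean square error on the held-out set.
    return mean_squared_error(y_te, model.predict(X_te)) ** 0.5

def r(model):
    # Correlation coefficient, another index used in the study.
    return np.corrcoef(y_te, model.predict(X_te))[0, 1]
```

Averaging the twenty bootstrap-trained trees reduces the variance of the single overfit tree, which is the usual source of bagging's advantage on noisy regression data.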

5.
This paper investigates the use of wavelet ensemble models for forecasting the compressive strength of high-performance concrete (HPC). More specifically, we first incorporate bagging and gradient boosting methods into building artificial neural network (ANN) ensembles: bagged artificial neural networks (BANN) and gradient-boosted artificial neural networks (GBANN). The coefficient of determination (R²), mean absolute error (MAE), and root mean squared error (RMSE) statistics are used for performance evaluation of the proposed predictive models. Empirical results show that the ensemble models (R² = 0.9278 for BANN, 0.9270 for GBANN) are superior to a conventional ANN model (R² = 0.9088). We then couple the discrete wavelet transform (DWT) with the ANN ensembles to enhance prediction accuracy. The study concludes that DWT is an effective tool for increasing the accuracy of the ANN ensembles (R² = 0.9397 for WBANN, 0.9528 for WGBANN).

6.
Price forecasting is important for the stability of bulk agricultural commodity markets, but commodity prices have complex relationships with many factors. To address the strong dependence of current price forecasting on data completeness, and the difficulty a single model has in fully exploiting multiple data characteristics, this paper proposes an enhanced ensemble learning method that combines an attention-based convolutional bidirectional long short-term memory network (CNN-BiLSTM-Attention), support vector regression (SVR), and LightGBM, and evaluates it on datasets containing historical trading, weather, exchange-rate, oil-price, and other feature data. Taking wheat and cotton price prediction as the target tasks, the experiments use the mutual information method for feature selection and select the lower-error CNN-BiLSTM-Attention model as the base model, which is combined with the machine learning models through linear regression for enhanced ensemble learning. The experimental results show that the ensemble method achieves root mean square error (RMSE) values of 12.812 and 74.365 on the wheat and cotton datasets, respectively, reductions of 11.00%/0.94%, 4.44%/1.99%, and 13.03%/4.39% relative to the three base models, effectively reducing price-forecasting error.

7.
Accurate prediction of electricity consumption is essential for providing actionable insights to decision-makers for managing volume and potential trends in future energy consumption for efficient resource management. A single model might not be sufficient to solve the challenges arising from the linear and non-linear problems that occur in electricity consumption prediction. Moreover, such models often cannot be applied in practice because they are either not interpretable or generalize poorly. In this paper, a stacking ensemble model for short-term electricity consumption is proposed. We experimented with machine learning and deep learning models such as random forests, long short-term memory networks, deep neural networks, and evolutionary trees as our base models. Based on the experimental observations, two different ensemble models are proposed in which the predictions of the base models are combined using gradient boosting and extreme gradient boosting (XGB). The proposed ensemble models were tested on a standard dataset containing around 500,000 electricity consumption values, measured at periodic intervals over a span of 9 years. Experimental validation revealed that the proposed ensemble model built on XGB reduces the training time of the second layer of the ensemble by a factor of close to 10 compared to the state of the art, and is also more accurate: an average reduction of approximately 39% was observed in the root mean square error.
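A stacking layout like the one described, base models whose predictions feed a boosted second layer, can be sketched with scikit-learn's `StackingRegressor`. The regression data is synthetic, and lightweight sklearn models stand in for the paper's deep and evolutionary base models:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the electricity consumption series.
X, y = make_regression(n_samples=500, n_features=8, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Second-layer gradient boosting combines the first-layer predictions.
stack = StackingRegressor(
    estimators=[("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
                ("ridge", Ridge())],
    final_estimator=GradientBoostingRegressor(random_state=0),
)
score = stack.fit(X_tr, y_tr).score(X_te, y_te)  # R^2 on held-out data
```

`StackingRegressor` trains the final estimator on cross-validated predictions of the base models, which prevents the second layer from simply memorizing the base models' in-sample fit.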

8.
Gestational diabetes mellitus (GDM) is an illness involving a degree of glucose intolerance with onset or first recognition during pregnancy. Over the past few decades, numerous investigations have been conducted on the early identification of GDM. Machine learning (ML) methods have been found to be efficient prediction techniques with significant advantages over statistical models. In this view, the current research paper presents an ensemble of ML-based GDM prediction and classification models. The presented model involves three steps: preprocessing, classification, and an ensemble voting process. First, the input medical data is preprocessed at four levels: format conversion, class labeling, replacement of missing values, and normalization. Four ML models, namely logistic regression (LR), k-nearest neighbor (KNN), support vector machine (SVM), and random forest (RF), are used for classification. The RF, LR, KNN, and SVM classifiers are then integrated through a voting classifier to perform the final classification. To investigate the proficiency of the proposed model, the authors conducted an extensive set of simulations and examined the results under distinct aspects. In particular, the ensemble model outperformed the classical ML models with a precision of 94%, recall of 94%, accuracy of 94.24%, and F-score of 94%.

9.
Software reliability prediction by soft computing techniques
In this paper, ensemble models are developed to accurately forecast software reliability. Various statistical techniques (multiple linear regression and multivariate adaptive regression splines) and intelligent techniques (backpropagation-trained neural networks, the dynamic evolving neuro-fuzzy inference system, and TreeNet) constitute the ensembles presented. Three linear ensembles and one non-linear ensemble are designed and tested. Based on experiments performed on software reliability data obtained from the literature, it is observed that the non-linear ensemble outperformed all the other ensembles as well as the constituent statistical and intelligent techniques.

10.
We address the task of multi-target regression, where we generate global models that simultaneously predict multiple continuous variables. We use ensembles of generalized decision trees, called predictive clustering trees (PCTs): in particular, bagging and random forests (RF) of PCTs and extremely randomized PCTs (extra PCTs). We add another dimension of randomization to these ensemble methods by learning individual base models that consider random subsets of target variables, while leaving the input space randomizations (in RF PCTs and extra PCTs) intact. Moreover, we propose a new ensemble prediction aggregation function, where the final ensemble prediction for a given target is influenced only by those base models that considered it during learning. An extensive experimental evaluation on a range of benchmark datasets compared the extended ensemble methods with the original ensemble methods, individual multi-target regression trees, and ensembles of single-target regression trees in terms of predictive performance, running times, and model sizes. The results show that the proposed ensemble extension can yield better predictive performance, reduce learning time, or both, without a considerable change in model size. The newly proposed aggregation function gives the best results when used with extremely randomized PCTs. We also include a comparison with three competing methods, namely random linear target combinations and two variants of random projections.

11.
Credit scoring is an effective tool for banks to guide lending decisions profitably. Ensemble methods, which by structure can be divided into parallel and sequential ensembles, have recently been developed in the credit scoring domain and have proven their superiority in discriminating borrowers accurately. However, among ensemble models, little consideration has been given to: (1) the hyper-parameter tuning of the base learner, despite it being critical to well-performing ensemble models; (2) building sequential models (i.e., boosting), as most work has focused on developing the same or different algorithms in parallel; and (3) the comprehensibility of models. This paper proposes a sequential ensemble credit scoring model based on a variant of the gradient boosting machine, extreme gradient boosting (XGBoost). The model comprises three main steps. First, data pre-processing is employed to scale the data and handle missing values. Second, a model-based feature selection system based on relative feature importance scores is used to remove redundant variables. Third, the hyper-parameters of XGBoost are adaptively tuned with Bayesian hyper-parameter optimization, and the model is trained on the selected feature subset. Several hyper-parameter optimization methods and baseline classifiers are considered as reference points in the experiment. Results demonstrate that Bayesian hyper-parameter optimization performs better than random search, grid search, and manual search. Moreover, the proposed model outperforms baseline models on average over four evaluation measures: accuracy, error rate, the area under the curve H measure (AUC-H measure), and the Brier score. The proposed model also provides feature importance scores and a decision chart, which enhance the interpretability of the credit scoring model.
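The tune-then-train step can be sketched as below. As assumptions for illustration, scikit-learn's `GradientBoostingClassifier` stands in for XGBoost, `RandomizedSearchCV` stands in for Bayesian optimization (which would need an external library such as scikit-optimize), and the credit data is synthetic:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

# Hypothetical credit data stand-in.
X, y = make_classification(n_samples=400, n_features=15, random_state=0)

# Randomized search over a small boosting hyper-parameter space, scored by AUC.
search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_distributions={
        "n_estimators": [50, 100, 200],
        "learning_rate": [0.05, 0.1, 0.2],
        "max_depth": [2, 3, 4],
    },
    n_iter=5, cv=3, scoring="roc_auc", random_state=0,
)
search.fit(X, y)
best_auc = search.best_score_  # cross-validated AUC of the best setting
```

Bayesian optimization differs from this sketch in that it models the score surface and picks each next candidate adaptively, which is why the paper finds it beats random and grid search at equal budget.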

12.
Improving the accuracy of machine learning algorithms is vital in designing high-performance computer-aided diagnosis (CADx) systems. Research has shown that a base classifier's performance can be enhanced by ensemble classification strategies. In this study, we construct rotation forest (RF) ensemble classifiers from 30 machine learning algorithms and evaluate their classification performance on Parkinson's, diabetes, and heart disease datasets from the literature. In the experiments, the feature dimension of the three datasets is first reduced using the correlation-based feature selection (CFS) algorithm. Second, the classification performance of the 30 machine learning algorithms is calculated for the three datasets. Third, 30 classifier ensembles are constructed based on the RF algorithm to assess the performance of the respective classifiers on the same disease data. All experiments are carried out with a leave-one-out validation strategy, and the performance of the 60 algorithms is evaluated using three metrics: classification accuracy (ACC), kappa error (KE), and area under the receiver operating characteristic (ROC) curve (AUC). The base classifiers achieved average accuracies of 72.15%, 77.52%, and 84.43% for the diabetes, heart, and Parkinson's datasets, respectively, while the RF classifier ensembles produced average accuracies of 74.47%, 80.49%, and 87.13% for the respective diseases. Rotation forest, a relatively recently proposed classifier ensemble algorithm, can thus be used to improve the accuracy of miscellaneous machine learning algorithms in designing advanced CADx systems.
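A rotation-forest-style ensemble trains each member on a rotated (PCA-transformed) view of the data. The sketch below is a deliberately simplified assumption, one PCA per bootstrap sample rather than the full per-feature-subset rotation of the original algorithm, on a public dataset rather than the study's three:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
members = []
for _ in range(10):
    idx = rng.integers(0, len(X_tr), len(X_tr))   # bootstrap sample
    rot = PCA(random_state=0).fit(X_tr[idx])      # per-member rotation
    tree = DecisionTreeClassifier(random_state=0).fit(
        rot.transform(X_tr[idx]), y_tr[idx])
    members.append((rot, tree))

# Majority vote across the rotated trees (binary labels 0/1).
votes = np.array([t.predict(r.transform(X_te)) for r, t in members])
pred = (votes.mean(axis=0) > 0.5).astype(int)
acc = (pred == y_te).mean()
```

The rotation matters because axis-aligned tree splits behave differently in the rotated space, which adds diversity beyond what bootstrapping alone provides.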

13.
This paper empirically evaluates and compares six popular computational intelligence models in the context of fault density prediction in aspect-oriented systems: multi-layer perceptron (MLP), radial basis function (RBF), k-nearest neighbor (KNN), regression tree (RT), dynamic evolving neuro-fuzzy inference system (DENFIS), and support vector regression (SVR). The models were trained and tested, using a leave-one-out procedure, on a dataset consisting of twelve aspect-level metrics (explanatory variables) that measure different structural properties of an aspect. It was observed that the DENFIS, SVR, and RT models were more accurate in predicting fault density than the MLP, RBF, and KNN models. The MLP model was the worst, and all the other models were significantly better than it.

14.
Bayesian model averaging (BMA) is a statistical method for post-processing forecast ensembles of atmospheric variables, obtained from multiple runs of numerical weather prediction models, in order to create calibrated predictive probability density functions (PDFs). The BMA predictive PDF of the future weather quantity is a mixture of the individual PDFs corresponding to the ensemble members, and the weights and model parameters are estimated using forecast ensembles and validating observations from a given training period. A BMA model for calibrating wind speed forecasts is introduced using truncated normal distributions as conditional PDFs, and the method is applied to the ALADIN-HUNEPS ensemble of the Hungarian Meteorological Service and to the University of Washington Mesoscale Ensemble. Three parameter estimation methods are proposed, and each of the corresponding models outperforms the traditional gamma BMA model both in calibration and in accuracy of predictions.
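The BMA predictive PDF described above, a weighted mixture of normals truncated at zero, one component per ensemble member, can be evaluated as below. The member means, common spread, and weights are made-up illustrative values, not fitted parameters, and SciPy is assumed to be available:

```python
import numpy as np
from scipy.stats import truncnorm

# Illustrative BMA parameters (not fitted): bias-corrected member forecasts,
# a common spread, and weights that sum to one.
member_means = np.array([4.2, 5.1, 4.8])  # wind speed forecasts (m/s)
sigma = 1.3                               # common spread parameter
weights = np.array([0.5, 0.3, 0.2])       # BMA weights

def bma_pdf(x):
    # Truncation at zero: the lower bound is (0 - mu) / sigma in
    # standardized units; the upper bound is +infinity.
    comps = [truncnorm.pdf(x, a=-mu / sigma, b=np.inf, loc=mu, scale=sigma)
             for mu in member_means]
    return np.dot(weights, comps)

grid = np.linspace(0.0, 15.0, 1501)
mass = bma_pdf(grid).sum() * (grid[1] - grid[0])  # should be close to 1
```

In the actual method the weights and spread are estimated from training-period forecasts and observations (e.g., by maximum likelihood via EM), rather than fixed by hand.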

15.
The aim of bankruptcy prediction in data mining and machine learning is to develop an effective model that provides high prediction accuracy. In the prior literature, various classification techniques have been developed and studied, among which classifier ensembles, which combine multiple classifiers, have outperformed many single classifiers. However, three critical issues can affect the performance of classifier ensembles: the classification technique actually adopted, the combination method used to combine multiple classifiers, and the number of classifiers to be combined. Since there are few relevant studies examining these issues, this paper conducts a comprehensive comparison of classifier ensembles built from three widely used classification techniques, multilayer perceptron (MLP) neural networks, support vector machines (SVM), and decision trees (DT), using two well-known combination methods, bagging and boosting, and different numbers of combined classifiers. Our experimental results on three public datasets show that DT ensembles composed of 80-100 classifiers using the boosting method perform best. The Wilcoxon signed-rank test also demonstrates that DT ensembles by boosting perform significantly differently from the other classifier ensembles. Moreover, a further study on a real-world Taiwan bankruptcy dataset also demonstrates the superiority of DT ensembles by boosting over the others.
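A boosted decision-tree ensemble in the 80-100 member range the study found best can be sketched with AdaBoost; the dataset here is a synthetic stand-in for the bankruptcy data, and the stump depth is an illustrative choice:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a bankruptcy dataset.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Cross-validated accuracy of boosted DT ensembles of 80, 90, and 100 members.
scores = {}
for n in (80, 90, 100):
    clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                             n_estimators=n, random_state=0)
    scores[n] = cross_val_score(clf, X, y, cv=5).mean()
```

Unlike bagging, each boosting round reweights the training samples toward the cases previous members misclassified, so members must be trained sequentially.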

16.
We propose and assess a set of non-parametric ensembles, including bagging and boosting schemes, to recognize tumors in digital mammograms. Different approaches were examined as candidates for the two major components of the bagging ensembles: three spatial resampling schemes (residuals, centers, and standardized centers) and four combination criteria (at least one, majority vote, top 25% of models, and false discovery rate). A conversion to a classification problem prior to aggregation was employed for the boosting ensemble. The ensembles were compared at the lesion level against a single expert, and against a set of Markov random field (MRF) models in real images using three different criteria. The performance of the ensembles depended on their components, particularly the combination criterion, with "at least one" and "top 25% of models" offering greater detection power independently of the type of lesion, and on the bootstrapping scheme to a lesser degree. The ensembles were comparable in performance to MRFs in the unsupervised recognition of patterns exhibiting spatial structure.

17.
Due to the deregulation of the electricity industry, accurate load forecasting and predicting future electricity demand play an important role in regional and national power system strategy management. Electricity load forecasting is a challenging task because electric load has complex and nonlinear relationships with several factors. In this paper, two hybrid models are developed for short-term load forecasting (STLF). These models use ant colony optimization (ACO) and a combination of genetic algorithm and ACO (GA-ACO) for feature selection, and a multi-layer perceptron (MLP) for hourly load prediction. Weather and climatic conditions, month, season, day of the week, and time of day are considered as load-influencing factors in this study. Using the load time series of a regional power system, the performance of the ACO+MLP and GA-ACO+MLP hybrid models is compared with a principal component analysis (PCA)+MLP hybrid model, and with the no-feature-selection (NFS) case using MLP and radial basis function (RBF) neural models. Experimental results and a performance comparison with similar recent research in this field show that the proposed GA-ACO+MLP hybrid model performs better for 24-h-ahead load prediction in terms of mean absolute percentage error (MAPE).

18.
This paper performs an exploratory study of the use of metaheuristic optimization techniques to select important parameters (features and members) in the design of classifier ensembles. To this end, an empirical investigation is performed using 10 different optimization techniques applied to 23 classification problems. Furthermore, we analyze the performance of both mono- and multi-objective versions of these techniques, using all combinations of three objectives: classification error and two diversity measures important to ensembles, known as good and bad diversity. Additionally, the optimization techniques also select members for heterogeneous ensembles, using k-NN, decision trees, and naive Bayes as individual classifiers, all combined using the majority vote technique. The main aim of this study is to determine which optimization techniques obtain the best results in the mono- and multi-objective contexts, and to provide a comparison with classical ensemble techniques such as bagging, boosting, and random forest. Our findings indicate that three optimization techniques, Memetic, SA, and PSO, provided better performance than the other optimization techniques as well as the traditional ensemble generators (bagging, boosting, and random forest).
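The heterogeneous pool the abstract describes, k-NN, decision tree, and naive Bayes combined by majority vote, can be sketched as below. The metaheuristic member-selection step is omitted (the pool is fixed by hand), and the data is a synthetic assumption:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for one of the 23 classification problems.
X, y = make_classification(n_samples=300, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Heterogeneous ensemble combined by majority vote.
ensemble = VotingClassifier(
    estimators=[("knn", KNeighborsClassifier()),
                ("dt", DecisionTreeClassifier(random_state=0)),
                ("nb", GaussianNB())],
    voting="hard",  # each member casts one vote per sample
).fit(X_tr, y_tr)
acc = ensemble.score(X_te, y_te)
```

A metaheuristic such as PSO would, in effect, search over which members (and features) to include so as to trade classification error against the good/bad diversity objectives.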

19.
Neural network ensembles (sometimes referred to as committees or classifier ensembles) are effective techniques for improving the generalization of a neural network system. Combining a set of neural network classifiers whose error distributions are diverse can generate better results than any single classifier. In this paper, methods for creating ensembles are reviewed, including the following approaches: selecting diverse training data from the original source data set, constructing different neural network models, selecting ensemble nets from ensemble candidates, and combining ensemble members' results. In addition, new results on ensemble combination methods are reported.

20.
Heart failure is now widespread throughout the world; heart disease affects approximately 48% of the population. The disease is expensive and difficult to cure. This research paper presents machine learning models to predict heart failure. The fundamental idea is to compare the accuracy of various machine learning (ML) algorithms and boosting algorithms to improve the models' prediction accuracy. Supervised algorithms, including k-nearest neighbor (KNN), support vector machine (SVM), decision trees (DT), random forest (RF), and logistic regression (LR), are considered to achieve the best results. Boosting algorithms, such as extreme gradient boosting (XGBoost) and CatBoost, are also used to improve prediction alongside artificial neural networks (ANN). This research also uses data visualization to identify patterns, trends, and outliers in a massive dataset. Python and scikit-learn are used for ML; TensorFlow and Keras, along with Python, are used for ANN model training. The DT and RF algorithms achieved the highest accuracy of 95% among the classifiers, while KNN obtained the second-highest accuracy of 93.33%. XGBoost achieved an accuracy of 91.67%; SVM, CatBoost, and ANN each achieved 90%; and LR achieved 88.33%.
