Similar Articles
20 similar articles found (search time: 31 ms)
1.
This paper proposes the random subspace binary logit (RSBL) model (random subspace binary logistic regression), which applies the random subspace approach to the classical logit model to generate a group of diverse logit decision agents that view a predictive problem from different perspectives. These diverse logit models are then combined for a more accurate analysis. The proposed RSBL model thus takes advantage of both the logit (logistic regression) and random subspace approaches: the random subspace approach generates diverse sets of variables, each representing the problem as a different mask, and a logit decision agent is constructed from each mask rather than relying on a single logit model. To verify its performance, we used the proposed RSBL model to forecast corporate failure in China. The results indicate that the model significantly improves the predictive ability of classical statistical models such as multivariate discriminant analysis, the logit model, and the probit model. The proposed model should therefore make the logit model more suitable for predictive problems in academic and industrial use.

2.
Financial distress prediction (FDP) is of great importance to parties both inside and outside a company. Although a large body of literature has comprehensively analyzed single-classifier FDP methods, ensemble methods for FDP have emerged only in recent years and need further study. The support vector machine (SVM) shows promising performance in FDP compared with other single-classifier methods. The contribution of this paper is a new FDP method based on an SVM ensemble whose candidate single classifiers are trained by SVM algorithms with different kernel functions on different feature subsets of one initial dataset. SVM kernels such as linear, polynomial, RBF, and sigmoid are applied, along with the filter feature selection/extraction methods of stepwise multiple discriminant analysis (MDA), stepwise logistic regression (logit), and principal component analysis (PCA). The algorithm for selecting the ensemble's base classifiers from the candidates considers both individual performance and diversity. Majority voting weighted by each base classifier's cross-validation accuracy on the training dataset is used as the combination mechanism. Experimental results indicate that the SVM ensemble is significantly superior to an individual SVM classifier when the number of base classifiers is set properly. They also show that an RBF SVM trained on features selected by stepwise MDA is a good choice for FDP when an individual SVM classifier is applied.
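The combination mechanism described above, majority voting weighted by cross-validation accuracy, can be sketched as follows. This is a hedged, simplified stand-in on synthetic data (no feature-subset stage, no diversity-based selection), not the paper's pipeline.

```python
# Sketch: SVMs with different kernels, combined by majority vote weighted by
# each base classifier's cross-validation accuracy on the training data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=15, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

bases, weights = [], []
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    clf = SVC(kernel=kernel, gamma="scale")
    weights.append(cross_val_score(clf, X_tr, y_tr, cv=5).mean())  # CV accuracy as weight
    bases.append(clf.fit(X_tr, y_tr))

# Weighted majority vote over the base classifiers' hard predictions.
votes = np.zeros((len(X_te), 2))
for w, clf in zip(weights, bases):
    votes[np.arange(len(X_te)), clf.predict(X_te)] += w
ensemble_pred = votes.argmax(axis=1)
print(round((ensemble_pred == y_te).mean(), 3))
```

A weak kernel (e.g. sigmoid on unscaled data) automatically gets a small say because its cross-validation accuracy, and hence its vote weight, is low.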

3.
Credit scoring focuses on the development of empirical models to support the financial decision-making processes of financial institutions and credit industries. It uses applicants' historical data and statistical or machine learning techniques to assess the risk associated with an applicant. However, the historical data may contain redundant and noisy features that degrade the performance of credit scoring models. The main focus of this paper is a hybrid model combining feature selection with a multilayer ensemble classifier framework to improve the predictive performance of credit scoring. The proposed hybrid credit scoring model is built in three phases. The initial phase performs preprocessing and assigns ranks and weights to classifiers. In the next phase, an ensemble feature selection approach is applied to the preprocessed dataset. In the last phase, the dataset with the selected features is used in a multilayer ensemble classifier framework. In addition, a classifier placement algorithm based on the Choquet integral value is designed, since classifier placement affects the predictive performance of the ensemble framework. The proposed hybrid credit scoring model is validated on real-world credit scoring datasets, namely the Australian, Japanese, German-categorical, and German-numerical datasets.

4.
Bayesian model averaging (BMA) is a statistical method for post-processing forecast ensembles of atmospheric variables, obtained from multiple runs of numerical weather prediction models, in order to create calibrated predictive probability density functions (PDFs). The BMA predictive PDF of the future weather quantity is a mixture of the individual PDFs corresponding to the ensemble members, with the weights and model parameters estimated from forecast ensembles and validating observations over a given training period. A BMA model for calibrating wind speed forecasts is introduced that uses truncated normal distributions as conditional PDFs, and the method is applied to the ALADIN-HUNEPS ensemble of the Hungarian Meteorological Service and to the University of Washington Mesoscale Ensemble. Three parameter estimation methods are proposed, and each of the corresponding models outperforms the traditional gamma BMA model in both calibration and accuracy of predictions.
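The shape of such a predictive density is easy to illustrate. The sketch below evaluates a mixture of zero-truncated normal components, one per ensemble member; the member means, weights, and spread are made-up numbers for illustration, not fitted values from the paper.

```python
# Illustrative BMA predictive density for wind speed: a weighted mixture of
# normal distributions truncated at zero, one component per ensemble member.
import numpy as np
from scipy.integrate import trapezoid
from scipy.stats import truncnorm

def bma_pdf(x, member_means, weights, sigma):
    """Mixture of N(mu_k, sigma^2) truncated to [0, inf), weighted by w_k."""
    pdf = np.zeros_like(x, dtype=float)
    for mu, w in zip(member_means, weights):
        a = (0.0 - mu) / sigma  # lower truncation point in standard units
        pdf += w * truncnorm.pdf(x, a, np.inf, loc=mu, scale=sigma)
    return pdf

x = np.linspace(0.0, 20.0, 2001)
density = bma_pdf(x, member_means=[4.0, 5.5, 6.2], weights=[0.5, 0.3, 0.2], sigma=1.5)
total_mass = trapezoid(density, x)  # a valid PDF integrates to ~1 on its support
print(round(total_mass, 3))
```

In an actual BMA fit the weights and the spread parameter would be estimated from training forecasts and observations (e.g. by maximum likelihood), which is where the paper's three estimation methods differ.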

5.
Much has been written about word of mouth and customer behavior. Telephone call detail records provide a novel way to measure the strength of the relationship between individuals. In this paper, we use call detail records to predict the impact that one customer's behavior has on another customer's decisions. We study this in the context of churn (the decision to leave a communication service provider) and cross-buying decisions, based on an anonymized dataset from a telecommunications provider. Call detail records are represented as a weighted graph, and a novel statistical learning technique, Markov logic networks, is used in conjunction with logit models based on lagged neighborhood variables to develop the predictive model. In addition, we propose an approach to propositionalization tailored to predictive modeling with social network data. The results show that information on the churn of network neighbors has a significant positive impact on predictive accuracy, and in particular on the sensitivity of churn models. The results provide evidence that word of mouth has a considerable impact on customers' churn decisions and also on their purchase decisions, leading to increases of 19.5% and 8.4%, respectively, in the sensitivity of the predictive models.

6.
In recent years, machine learning techniques have been successfully applied to engineering problems. However, owing to complexities found in real-world problems, such as class imbalance, classical learning algorithms may not reach a prescribed performance. There are situations where a good result on several conflicting objectives is desirable, such as the true positive and true negative rates, or where it is important to balance a model's complexity against its prediction score. To address such issues, multi-objective optimization design procedures can be used to analyze the trade-offs and build more robust machine learning models. This work addresses the creation of ensembles of predictive models using such procedures. First, a set of diverse predictive models is built by a multi-objective evolutionary algorithm. Next, a second multi-objective optimization step selects among these models as ensemble members, resulting in several non-dominated solutions. A final multi-criteria decision-making stage ranks and visualizes the resulting ensembles. To analyze the proposed methodology, two experiments on binary classification are conducted. The first case study is a well-known classification problem used to illustrate the procedure. The second is a challenging real-world problem in water quality monitoring, on which the proposed procedure is compared with four classical ensemble learning algorithms. Results on this second experiment show that the proposed technique creates robust ensembles that can outperform the other ensemble methods. Overall, the authors conclude that the proposed methodology for ensemble generation produces competitive models for real-world engineering problems.

7.
We address the task of multi-target regression, where we generate global models that simultaneously predict multiple continuous variables. We use ensembles of generalized decision trees, called predictive clustering trees (PCTs): in particular, bagging and random forests (RF) of PCTs and extremely randomized PCTs (extra PCTs). We add another dimension of randomization to these ensemble methods by learning individual base models on random subsets of the target variables, while leaving the input-space randomizations (in RF PCTs and extra PCTs) intact. Moreover, we propose a new ensemble prediction aggregation function, in which the final ensemble prediction for a given target is influenced only by those base models that considered it during learning. An extensive experimental evaluation on a range of benchmark datasets compares the extended ensemble methods with the original ensemble methods, individual multi-target regression trees, and ensembles of single-target regression trees in terms of predictive performance, running time, and model size. The results show that the proposed ensemble extension can yield better predictive performance, reduced learning time, or both, without a considerable change in model size. The newly proposed aggregation function gives the best results when used with extremely randomized PCTs. We also include a comparison with three competing methods, namely random linear target combinations and two variants of random projections.
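The target-subspace idea and the proposed aggregation function can be sketched as follows. This is a toy stand-in (generic random forests on synthetic data rather than PCTs, and rotating rather than random target subsets, to keep it deterministic), not the paper's implementation.

```python
# Sketch: each base multi-target model is trained on a subset of the target
# variables, and a target's final prediction averages only those base models
# that saw it during learning.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
Y = X @ rng.normal(size=(10, 4)) + 0.1 * rng.normal(size=(300, 4))  # 4 targets

n_targets, n_models = 4, 12
preds = np.zeros((len(X), n_targets))
seen = np.zeros(n_targets)
for i in range(n_models):
    targets = np.array([i % n_targets, (i + 1) % n_targets])  # this model's target subset
    model = RandomForestRegressor(n_estimators=20, random_state=i).fit(X, Y[:, targets])
    preds[:, targets] += model.predict(X)
    seen[targets] += 1
preds /= seen  # aggregate only over models that considered each target
print(seen.tolist())
```

Because the subsets rotate, every target is covered by exactly six of the twelve base models, and only those six contribute to its aggregated prediction.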

8.
In this paper, we present a highly accurate forecasting method that supports improved investment decisions. The proposed method extends the hybrid SVM-TLBO model, which consists of a support vector machine (SVM) and a teaching-learning-based optimization (TLBO) method that determines the optimal SVM parameters, by combining it with dimension reduction techniques (DR-SVM-TLBO). The dimension reduction techniques (a feature extraction approach) extract critical, non-collinear, relevant, and de-noised information from the input variables (features) and reduce time complexity. We investigated three feature extraction techniques: principal component analysis, kernel principal component analysis, and independent component analysis. The feasibility and effectiveness of the proposed ensemble model were examined in a case study predicting the daily closing prices of the COMDEX commodity futures index traded on the Multi Commodity Exchange of India Limited. We assessed the performance of the new ensemble model with the three feature extraction techniques using different performance metrics and statistical measures, and compared our results with those of a standard SVM model and the SVM-TLBO hybrid model. The experimental results show that the new ensemble model is viable and effective and provides better predictions. The proposed model can provide technical support for better financial investment decisions and can serve as an alternative for forecasting tasks that require more accurate predictions.

9.
We investigate business failure prediction (BFP) using a combination of decision-aid, statistical, and artificial intelligence techniques. The goal is to construct a hybrid forecasting method for BFP by combining various outranking preference functions with case-based reasoning (CBR), whose core is the k-nearest neighbor (k-NN) algorithm, and to empirically test the predictive performance of its modules. The hybrid² CBR (H2CBR) forecasting method was constructed by integrating six hybrid CBR modules, each built by combining and adapting one of six outranking preference functions with the k-NN algorithm inside CBR. A trial-and-error iterative process identifies the optimal hybrid CBR module of the H2CBR system, and the prediction of that optimal module is the final output of the method. We compared the predictive performance of the six hybrid CBR modules in the BFP of Chinese listed companies. In this empirical study, the classical CBR algorithm based on the Euclidean metric and the two classical statistical methods of logistic regression (logit) and multivariate discriminant analysis (MDA) served as baseline models for comparison, with feature subsets selected by the stepwise method of MDA. The predictive performance of the H2CBR system is promising; the most preferred hybrid CBR for short-term BFP of Chinese listed companies is based on the ranking-order preference function.

10.
In this paper, a brute-force logistic regression (LR) modeling approach is proposed and used to develop a predictive credit scoring model for corporate entities. The modeling is based on five years of data from end-of-year financial statements of Serbian corporate entities, together with default event data. To the best of our knowledge, no relevant research on the predictive power of financial ratios derived from Serbian financial statements has been published so far. This is also the first paper to generate 350 financial ratios as independent variables for default prediction on 7,590 corporate entities; many of the derived ratios are new and have not been discussed in the literature before. The weight of evidence (WOE) method was applied to transform and prepare the financial ratios for the brute-force LR fitting simulations. A clustering method was used to shorten the list of variables and remove highly correlated financial ratios from the partitioned training and validation datasets. The clustering results revealed that the variables can be reduced to a short list of 24 financial ratios, which were then analyzed in terms of default event predictive power; on this basis we propose the most predictive financial ratios from the financial statements of Serbian corporate entities. The resulting short list served as the main input for the brute-force LR model simulations. According to the literature, common practice for selecting the variables in a final model is to run stepwise, forward, or backward LR. This research, however, was conducted so that the brute-force LR simulations covered all possible models comprising 5–14 independent variables from the short list of 24 financial ratios: around 14 million candidate LR models in total, each fitted through extensive and time-consuming brute-force simulations using SAS® code written by the authors.
Of these, 342,016 simulated models ("well-founded" models) satisfied the established credit scoring model validity conditions. The well-founded models were ranked by Gini performance on the validation dataset. After ranking, the model with the highest predictive power, consisting of 8 financial ratios, was selected and analyzed in terms of the receiver operating characteristic (ROC) curve, Gini coefficient, AIC, SC, LR fitting statistics, and correlation coefficients. The financial ratio constituents of that model are discussed and benchmarked against several models from the relevant literature.
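The size of the brute-force search space quoted above ("around 14 million") can be checked directly: it is the number of variable subsets of size 5 to 14 drawn from 24 ratios.

```python
# Count all candidate models with 5-14 of the 24 short-listed financial ratios.
from math import comb

n_models = sum(comb(24, k) for k in range(5, 15))
print(n_models)  # 14185135, i.e. "around 14 million" as stated
```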

11.
During the last three decades, the large spatial coverage of remote sensing data has been used in coral reef research to map dominant substrate types, geomorphologic zones, and bathymetry. Over the same period, field studies have documented statistical relationships between variables quantifying aspects of the reef habitat and its fish community. Although the results of these studies are ambiguous, some habitat variables have frequently been found to correlate with one or more aspects of the fish community. Several of these habitat variables, including depth, the structural complexity of the substrate, and live coral cover, can be estimated from remote sensing data. In this study, we combine a set of statistical and machine-learning models with habitat variables derived from IKONOS data to produce spatially explicit predictions of the species richness, biomass, and diversity of the fish community around two reefs in Zanzibar. In the process, we assess the ability of IKONOS imagery to estimate live coral cover, structural complexity, and habitat diversity, and we explore the importance of habitat variables at a range of spatial scales in the predictive models using a permutation-based technique. Our findings indicate that structural complexity at a fine spatial scale (∼5 to 10 m) is the most important habitat variable in predictive models of fish species richness and diversity, whereas other variables such as depth, habitat diversity, and structural complexity at coarser spatial scales contribute to predictions of biomass. In addition, our results demonstrate that complex model types such as tree-based ensemble techniques provide superior predictive performance compared with the more frequently used linear models, reducing the cross-validated root-mean-squared prediction error by 3–11%.
Although aerial photographs and airborne lidar instruments have recently been used to produce spatially explicit predictions of reef fish community variables, our study illustrates the possibility of doing so with satellite data. The ability to use satellite data may bring the cost of creating such maps within reach of both spatial ecology researchers and the wide range of organizations involved in marine spatial planning.

12.
Boosting has been shown to improve the predictive performance of unstable learners such as decision trees, but not of stable learners such as support vector machines (SVM), k-nearest neighbors, and naive Bayes classifiers. In addition to the model stability problem, the high time complexity of some stable learners such as SVM prevents them from generating the multiple models needed to form an ensemble on large datasets. This paper introduces a simple method that not only enables Boosting to improve the predictive performance of stable learners, but also significantly reduces the computational time to generate an ensemble of stable learners such as SVM for large datasets that would otherwise be infeasible. The method builds local models instead of global models, and to the best of our knowledge it is the first to solve both problems of Boosting stable learners at the same time. We implement the method by using a decision tree to define local regions and building a local model for each region. We show that this implementation enables successful Boosting of three types of stable learners: SVM, k-nearest neighbors, and naive Bayes classifiers.
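The core "tree defines local regions, one local model per region" construction can be sketched as below. This is one plausible rendering on synthetic data (a single local-model layer, without the Boosting loop the paper wraps around it), not the paper's exact algorithm.

```python
# Sketch: a shallow decision tree partitions the space into local regions
# (its leaves), and a separate SVM is trained on the data in each region.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=8, random_state=2)
tree = DecisionTreeClassifier(max_depth=2, random_state=2).fit(X, y)
leaves = tree.apply(X)  # leaf id = local region of each row

local_models = {}
for leaf in np.unique(leaves):
    idx = leaves == leaf
    if len(np.unique(y[idx])) > 1:  # an SVM needs both classes present
        local_models[leaf] = SVC(kernel="rbf", gamma="scale").fit(X[idx], y[idx])

def predict(X_new):
    """Route each example to its leaf's local SVM (tree's own vote as fallback)."""
    out = tree.predict(X_new).copy()
    new_leaves = tree.apply(X_new)
    for leaf, model in local_models.items():
        mask = new_leaves == leaf
        if mask.any():
            out[mask] = model.predict(X_new[mask])
    return out

print(round((predict(X) == y).mean(), 3))
```

Each local SVM trains on only a fraction of the data, which is where the claimed speed-up for expensive stable learners comes from.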

13.
Case-based reasoning (CBR) has several advantages for business failure prediction (BFP), including ease of understanding, explanation, and implementation, and the ability to suggest how failure might be avoided. We constructed a new ensemble method of CBR, termed the principal component CBR ensemble (PC-CBR-E), intended to improve the predictive ability of CBR in BFP by integrating feature selection methods at the representation level, hybrids of principal component analysis with the two classical CBR algorithms at the modeling level, and weighted majority voting at the ensemble level. We statistically validated our method by comparing it with other methods, including the best base model, multivariate discriminant analysis, logistic regression, and the two classical CBR algorithms. The results of a one-tailed significance test indicated that PC-CBR-E produces superior predictive performance in Chinese short-term and medium-term BFP.

14.
Tzong-Huei, Neurocomputing, 2009, 72(16–18): 3507
In 2008, the financial tsunami began to impair the economic development of many countries, including Taiwan. Predicting financial crises becomes much more important, and undoubtedly commands public attention, when the world economy sinks into depression. This study examined the predictive ability of the four most commonly used financial distress prediction models and constructed reliable failure prediction models for public industrial firms in Taiwan. Multiple discriminant analysis (MDA), logit, probit, and artificial neural network (ANN) methodologies were applied to a matched sample of failed and non-failed Taiwanese public industrial firms during 1998–2005. The final models were validated with within-sample and out-of-sample tests, respectively. The results indicated that the probit, logit, and ANN models used in this study achieve high prediction accuracy and generalize well, with the probit model showing the best and most stable performance. However, when the data do not satisfy the assumptions of the statistical approaches, the ANN approach demonstrates its advantage and achieves higher prediction accuracy. In addition, the models used in this study achieve higher prediction accuracy and better generalization than those of Altman (Financial ratios, discriminant analysis and the prediction of corporate bankruptcy, Journal of Finance 23(4) (1968) 589–609), Ohlson (Financial ratios and the probabilistic prediction of bankruptcy, Journal of Accounting Research 18(1) (1980) 109–131), and Zmijewski (Methodological issues related to the estimation of financial distress prediction models, Journal of Accounting Research 22 (1984) 59–82). In summary, the models used in this study can assist investors, creditors, managers, auditors, and regulatory agencies in Taiwan in predicting the probability of business failure.

15.
Statistical shape modeling is a widely used technique for the representation and analysis of the shapes and shape variations present in a population. A statistical shape model models the distribution in a high-dimensional shape space, where each shape is represented by a single point. We present a design study on the intuitive exploration and visualization of shape spaces and shape models. Our approach focuses on the dual-space nature of these spaces. The high-dimensional shape space represents the population, whereas object space represents the shape of the 3D object associated with a point in shape space. A 3D object view provides local details for a single shape. The high-dimensional points in shape space are visualized using a 2D scatter plot projection, the axes of which can be manipulated interactively. This results in a dynamic scatter plot, with the further extension that each point is visualized as a small version of the object shape that it represents. We further enhance the population-object duality with a new type of view aimed at shape comparison. This new "shape evolution view" visualizes shape variability along a single trajectory in shape space, and serves as a link between the two spaces described above. Our three-view exploration concept strongly emphasizes linked interaction between all spaces. Moving the cursor over the scatter plot or evolution views, shapes are dynamically interpolated and shown in the object view. Conversely, camera manipulation in the object view affects the object visualizations in the other views. We present a GPU-accelerated implementation, and show the effectiveness of the three-view approach using a number of real-world cases. In these, we demonstrate how this multi-view approach can be used to visually explore important aspects of a statistical shape model, including specificity, compactness, and reconstruction error.

16.
Ratio Selection for Classification Models (total citations: 2; self-citations: 0; citations by others: 2)
This paper is concerned with the selection of inputs for classification models based on ratios of measured quantities. For this purpose, all possible ratios are built from the quantities involved, and variable selection techniques are used to choose a convenient subset of ratios. In this context, two selection techniques are proposed: one based on a pre-selection procedure and another based on a genetic algorithm. In an example involving the financial distress prediction of companies, the models obtained from ratios selected by the proposed techniques compare favorably with a model using ratios commonly found in the financial distress literature.

17.

To address online fault detection for multivariate industrial processes in complex environments, a method based on ensemble kernel principal component analysis is proposed. The method first computes several groups of approximate bases of the infinite-dimensional space into which the samples are mapped, and restricts the solution space of the eigenvector problem in principal component analysis to the span of these approximate bases. It then combines the resulting eigenvectors and eigenvalues and computes the Hotelling T² statistic and the squared prediction error, from which the detection result is determined. The method was tested on fault detection samples from the Tennessee Eastman process and compared with two other methods; the test results demonstrate its effectiveness.

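The two monitoring statistics that this abstract combines (Hotelling's T² and the squared prediction error, SPE) can be illustrated with ordinary linear PCA. The stand-in sketch below, on synthetic data, shows only how the statistics are formed; it does not reproduce the paper's ensemble kernel PCA.

```python
# Linear-PCA stand-in for process monitoring: Hotelling's T^2 in the retained
# principal subspace and the squared prediction error (SPE) in the residual
# subspace, one value of each per sample.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))           # stand-in for normal operating data
pca = PCA(n_components=3).fit(X)

scores = pca.transform(X)
t2 = np.sum(scores**2 / pca.explained_variance_, axis=1)  # Hotelling's T^2
X_hat = pca.inverse_transform(scores)
spe = np.sum((X - X_hat) ** 2, axis=1)                    # squared prediction error
print(t2.shape, spe.shape)
```

In monitoring practice, a new sample is flagged as a fault when either statistic exceeds a control limit estimated from normal operating data.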

18.
The ensemble learning paradigm has proved relevant to solving many challenging industrial problems. Despite its successful application, especially in bioinformatics, the petroleum industry has not benefited enough from the promise of this machine learning technology. The petroleum industry, with its persistent quest for high-performance predictive models, is in great need of this learning methodology: a marginal improvement in the prediction of petroleum reservoir properties could have a huge positive impact on the success of exploration, drilling, and the overall reservoir management portfolio. The support vector machine (SVM) is one of the promising machine learning tools that has performed excellently in most prediction problems. However, its performance depends on a prudent choice of its tuning parameters, most especially the regularization parameter C. Reports have shown that this parameter has a significant impact on the performance of SVM, yet understandably no specific value has been recommended for it. This paper proposes a stacked generalization ensemble model of SVM that incorporates different expert opinions on the optimal value of this parameter for predicting the porosity and permeability of petroleum reservoirs, using datasets from diverse geological formations. The performance of the proposed SVM ensemble was compared with those of the conventional SVM technique, an SVM implemented with the bagging method, and the random forest technique. The results showed that the proposed ensemble model, in most cases, outperformed the others, with the highest correlation coefficient and the lowest mean and absolute errors. The study indicates a great potential for ensemble learning in petroleum reservoir characterization to improve the accuracy of reservoir property predictions, for more successful exploration and increased production of petroleum resources.
The results also confirm that ensemble models perform better than the conventional SVM implementation.
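The stacking structure described above, level-0 SVMs differing only in the regularization parameter C, combined by a level-1 learner, can be sketched with scikit-learn. The paper's task is regression (porosity and permeability); for a compact self-contained sketch we use a synthetic binary classification task instead, and the C values are invented "expert opinions", not the paper's.

```python
# Hedged sketch of stacked generalization over the SVM regularization
# parameter: each candidate C becomes one level-0 model, and a level-1
# logistic regression learns how to combine their outputs.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=10, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)

# Each "expert opinion" on the optimal C becomes one level-0 model.
level0 = [(f"svc_C{C}", SVC(C=C)) for C in (0.1, 1.0, 10.0, 100.0)]
stack = StackingClassifier(estimators=level0, final_estimator=LogisticRegression())
stack.fit(X_tr, y_tr)
print(round(stack.score(X_te, y_te), 3))
```

The level-1 model is fitted on cross-validated level-0 outputs, so the combiner learns which C values to trust without seeing the test data.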

19.
This paper introduces a new ensemble approach, Feature-Subspace Aggregating (Feating), which builds local models instead of global models. Feating is a generic ensemble approach that can enhance the predictive performance of both stable and unstable learners. In contrast, most existing ensemble approaches can improve the predictive performance of unstable learners only. Our analysis shows that the new approach reduces the execution time to generate a model in an ensemble through an increased level of localisation in Feating. Our empirical evaluation shows that Feating performs significantly better than Boosting, Random Subspace, and Bagging in terms of predictive accuracy when a stable learner, SVM, is used as the base learner. The speed-up achieved by Feating makes feasible SVM ensembles that would otherwise be infeasible for large datasets. When SVM is the preferred base learner, we show that Feating SVM performs better than Boosting decision trees and Random Forests. We further demonstrate that Feating also substantially reduces the error of another stable learner, k-nearest neighbour, and an unstable learner, decision tree.

20.
This article proposes a new approach to improve the classification performance of remotely sensed images with an aggregative model based on classifier ensemble (AMCE). AMCE is a multi-classifier system with two procedures, namely ensemble learning and predictions combination. Two ensemble algorithms (Bagging and AdaBoost.M1) were used in the ensemble learning process to stabilize and improve the performance of single classifiers (i.e. maximum likelihood classifier, minimum distance classifier, back propagation neural network, classification and regression tree, and support vector machine (SVM)). Prediction results from the single classifiers were integrated according to a diversity measurement with an averaged double-fault indicator and different combination strategies (i.e. weighted vote, Bayesian product, logarithmic consensus, and behaviour knowledge space). The suitability of the AMCE model was examined using a Landsat Thematic Mapper (TM) image of Dongguan city (Guangdong, China), acquired on 2 January 2009. Experimental results show that the proposed model was significantly better than the most accurate single classifier (i.e. SVM) in terms of classification accuracy (from 88.83% to 92.45%) and kappa coefficient (from 0.8624 to 0.9088). A stepwise comparison illustrates that both ensemble learning and predictions combination in the AMCE model improved classification.
