20 similar documents retrieved.
1.
The aim of bankruptcy prediction in the areas of data mining and machine learning is to develop an effective model that provides high prediction accuracy. In the prior literature, various classification techniques have been developed and studied, among which classifier ensembles, which combine multiple classifiers, have outperformed many single classifiers. However, three critical issues affect the construction of classifier ensembles: the classification technique adopted, the method used to combine the classifiers, and the number of classifiers combined. Since few relevant studies have examined these issues, this paper conducts a comprehensive comparison of classifier ensembles built from three widely used classification techniques, namely multilayer perceptron (MLP) neural networks, support vector machines (SVM), and decision trees (DT), using two well-known combination methods, bagging and boosting, and different numbers of combined classifiers. Experimental results on three public datasets show that DT ensembles composed of 80–100 classifiers using the boosting method perform best. The Wilcoxon signed-rank test also shows that boosted DT ensembles perform significantly differently from the other classifier ensembles. Moreover, a further study on a real-world Taiwan bankruptcy dataset also demonstrates the superiority of boosted DT ensembles over the others.
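As a rough illustration of the kind of comparison described in item 1, the following hedged sketch trains bagged and boosted decision-tree ensembles of increasing size on a synthetic, imbalanced placeholder dataset (scikit-learn assumed; the data and ensemble sizes are illustrative, not the paper's; scikit-learn versions before 1.2 use `base_estimator` instead of `estimator`):

```python
# Compare bagged vs. boosted decision-tree ensembles over different ensemble sizes.
# X, y are synthetic placeholders for an imbalanced bankruptcy dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, weights=[0.9, 0.1], random_state=0)

for n in (20, 40, 60, 80, 100):
    ensembles = {
        "bagging": BaggingClassifier(estimator=DecisionTreeClassifier(),
                                     n_estimators=n, random_state=0),
        "boosting": AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                                       n_estimators=n, random_state=0),
    }
    for name, model in ensembles.items():
        acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
        print(f"DT ensemble, {name}, {n} classifiers: accuracy {acc:.3f}")
```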
2.
Bankruptcy prediction and credit scoring have long been regarded as critical topics and have been studied extensively in the accounting and finance literature. Artificial intelligence and machine learning techniques have been used to solve these financial decision-making problems. The multilayer perceptron (MLP) network trained by the back-propagation learning algorithm is the most widely used technique for financial decision-making problems, and it is usually superior to traditional statistical models. Recent studies suggest that combining multiple classifiers (classifier ensembles) should perform better than single classifiers. However, the performance of multiple classifiers in bankruptcy prediction and credit scoring is not fully understood. In this paper, we use a single neural network classifier as the baseline and compare it with multiple classifiers and diversified multiple classifiers on three datasets. In terms of average prediction accuracy, the multiple classifiers outperform the single-classifier benchmark on only one of the three datasets. The diversified multiple classifiers, trained with different classifier parameters as well as different sets of training data, perform worse on all datasets. However, for Type I and Type II errors there is no clear winner. We therefore suggest considering all three classifier architectures when making financial decisions.
3.
Ensembles of classifiers have previously been studied for bankruptcy prediction and credit scoring. In those studies, different ensemble schemes were applied to complex classifiers, and the best results were obtained with the Random Subspace method. Bagging was among the ensemble methods compared, but it was not applied appropriately: to produce diversity in the combination, this scheme should be used with weak, unstable classifiers. To improve the comparison, the Bagging scheme is applied here to several decision tree models for bankruptcy prediction and credit scoring, since decision trees encourage diversity in the combination of classifiers. An experimental study shows that Bagging over decision trees gives the best results for bankruptcy prediction and credit scoring.
4.
We discuss approaches to incrementally construct an ensemble. The first constructs an ensemble of classifiers by choosing a subset from a larger set, and the second constructs an ensemble of discriminants, where a classifier is used for some classes only. We investigate criteria including accuracy, significant improvement, diversity, correlation, and the role of search direction. For discriminant ensembles, we test subset selection and trees. Fusion is by voting or by a linear model. Using 14 classifiers on 38 data sets, incremental search finds small, accurate ensembles in polynomial time. The discriminant ensemble uses a subset of discriminants and is simpler, interpretable, and accurate. We find that an incremental ensemble has higher accuracy than bagging and the random subspace method, and accuracy comparable to AdaBoost with fewer classifiers.
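A hedged sketch of incremental (greedy forward) ensemble construction in the spirit of item 4: candidate classifiers are added one at a time while majority-vote accuracy on a held-out validation split improves. The candidate pool, the accuracy criterion, and the stopping rule are illustrative assumptions, not the paper's exact setup.

```python
# Greedy forward selection: add the candidate that most improves validation accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

candidates = [DecisionTreeClassifier(random_state=0), GaussianNB(),
              KNeighborsClassifier(), LogisticRegression(max_iter=1000)]
for c in candidates:
    c.fit(X_tr, y_tr)

def vote_accuracy(members):
    # Majority vote of the selected members on the validation set (binary labels 0/1).
    votes = np.stack([m.predict(X_val) for m in members])
    majority = (votes.mean(axis=0) >= 0.5).astype(int)
    return accuracy_score(y_val, majority)

ensemble, best = [], 0.0
while True:
    scored = [(vote_accuracy(ensemble + [c]), c) for c in candidates if c not in ensemble]
    if not scored:
        break
    acc, pick = max(scored, key=lambda t: t[0])
    if acc <= best:            # stop when no remaining candidate improves the ensemble
        break
    ensemble.append(pick)
    best = acc
print(f"selected {len(ensemble)} classifiers, validation accuracy {best:.3f}")
```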
5.
This paper presents an alternative technique for financial distress prediction systems. The method is based on a type of neural network called a hybrid associative memory with translation. While many different neural network architectures have successfully been used to predict credit risk and corporate failure, the power of associative memories for financial decision-making has not yet been explored in depth. The prediction performance of the hybrid associative memory with translation is compared with four traditional neural networks, a support vector machine, and a logistic regression model. The experimental results over nine real-life data sets show that the proposed associative memory constitutes an appropriate solution for bankruptcy and credit risk prediction, performing significantly better than the rest of the models under class imbalance and data overlapping conditions in terms of the true positive rate and the geometric mean of the true positive and true negative rates.
6.
In the accounting and finance domains, bankruptcy prediction is of great utility to all economic stakeholders. Accurately assessing business failure, especially under financial crisis scenarios, is known to be a complicated challenge. Although there have been many successful studies on bankruptcy detection, probabilistic approaches have seldom been explored. In this paper we take a probabilistic point of view by applying Gaussian processes (GP) to bankruptcy prediction and comparing them against support vector machines (SVM) and logistic regression (LR). Using real-world bankruptcy data, an in-depth analysis is conducted showing that, in addition to offering a probabilistic interpretation, the GP can effectively improve bankruptcy prediction performance with high accuracy compared to the other approaches. We additionally generate a complete graphical visualization to improve our understanding of the different attained performances, effectively compiling all the conducted experiments in a meaningful way. We complete our study with an entropy-based analysis that highlights the uncertainty-handling properties provided by the GP, crucial for prediction tasks under extremely competitive and volatile business environments.
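A minimal sketch of the comparison described in item 6, assuming scikit-learn and a synthetic, imbalanced placeholder dataset (the actual study uses real bankruptcy data and a much deeper analysis):

```python
# Compare a Gaussian process classifier against SVM and logistic regression,
# using predicted probabilities for a threshold-free ROC AUC evaluation.
from sklearn.datasets import make_classification
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=15, weights=[0.85, 0.15], random_state=0)

models = {
    "GP": GaussianProcessClassifier(kernel=1.0 * RBF(length_scale=1.0), random_state=0),
    "SVM": SVC(probability=True, random_state=0),
    "LR": LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean ROC AUC = {auc:.3f}")
```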
7.
The paper introduces meta decision trees (MDTs), a novel method for combining multiple classifiers. Instead of giving a prediction, MDT leaves specify which classifier should be used to obtain a prediction. We present an algorithm for learning MDTs based on the C4.5 algorithm for learning ordinary decision trees (ODTs). An extensive experimental evaluation of the new algorithm is performed on twenty-one data sets, combining classifiers generated by five learning algorithms: two algorithms for learning decision trees, a rule learning algorithm, a nearest neighbor algorithm and a naive Bayes algorithm. In terms of performance, stacking with MDTs combines classifiers better than voting and stacking with ODTs. In addition, the MDTs are much more concise than the ODTs and are thus a step towards comprehensible combination of multiple classifiers. MDTs also perform better than several other approaches to stacking.
8.
In this work, a new method for the creation of classifier ensembles is introduced. The patterns are partitioned into clusters so that similar patterns are grouped together, and a training set is built from the patterns that belong to each cluster. Each of the new sets is used to train a classifier. We show that the presented approach, called FuzzyBagging, performs better than Bagging.
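The underlying cluster-then-train idea can be sketched as below; this is a crisp k-means simplification for illustration, not the FuzzyBagging algorithm itself (scikit-learn assumed, synthetic placeholder data):

```python
# Cluster-then-train ensemble sketch: partition the training patterns with k-means,
# train one classifier per cluster, and combine the members by soft (probability) voting.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

k = 5
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_tr)
members = []
for c in range(k):
    idx = labels == c
    if len(np.unique(y_tr[idx])) < 2:   # skip degenerate single-class clusters
        continue
    members.append(DecisionTreeClassifier(random_state=0).fit(X_tr[idx], y_tr[idx]))

proba = np.mean([m.predict_proba(X_te) for m in members], axis=0)
print("cluster-ensemble accuracy:", round(accuracy_score(y_te, proba.argmax(axis=1)), 3))
```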
9.
Many data mining problems involve an investigation of the relationships between features in heterogeneous data sets, where different learning algorithms can be more appropriate for different regions. The author proposes a technique that integrates global and local voting of classifiers. A comparison with other well-known combining methods on standard benchmark data sets shows that the proposed method achieves higher accuracy.
10.
We review a large body of recent literature on machine learning (ML) approaches to predicting financial distress (FD), including supervised, unsupervised, and hybrid supervised–unsupervised learning algorithms. Four supervised ML models, namely the traditional support vector machine (SVM), the recently developed hybrid associative memory with translation (HACT), hybrid GA-fuzzy clustering, and extreme gradient boosting (XGBoost), were compared in prediction performance to the unsupervised deep belief network (DBN) classifier and the hybrid DBN-SVM model. A total of sixteen financial variables were selected from the financial statements of publicly listed Taiwanese firms as inputs to the six approaches. Our empirical findings, covering the 2010–2016 sample period, demonstrate that among the four supervised algorithms, XGBoost provided the most accurate FD prediction. Moreover, the hybrid DBN-SVM model generated more accurate forecasts than either the SVM or the DBN classifier used in isolation.
11.
The primary concern of the rating policies of the banking industry is to develop a more objective, accurate, and competitive scoring model to avoid losses from potential bad debt. This study proposes an artificial immune classifier based on the artificial immune network (the AINE-based classifier) to evaluate applicants’ credit scores. Two experimental credit datasets are used to show the accuracy rate of the artificial immune classifier, and ten-fold cross-validation is applied to evaluate its performance. The classifier is compared with other data mining techniques. Experimental results show that the AINE-based classifier is more competitive in credit scoring than the SVM and hybrid SVM-based classifiers, though not the BPN classifier. We further compare our classifier with three other AIS-based classifiers on the benchmark datasets and show that the AINE-based classifier can rival the AIRS-based classifiers and outperforms the SAIS classifier when the number of attributes and classes increases. Our classifier can provide credit card issuers with accurate and valuable credit scoring information, helping them avoid incorrect decisions that result in losses from applicants’ bad debt.
12.
Boosting is a set of methods for the construction of classifier ensembles. The distinguishing feature of these methods is that they obtain a strong classifier from a combination of weak classifiers, so it is possible to use boosting with very simple base classifiers. One of the simplest classifiers is the decision stump, a decision tree with only one decision node. This work proposes a variant of the best-known boosting method, AdaBoost. It considers as the base classifier for boosting not only the last weak classifier, but a classifier formed by the last r selected weak classifiers (r is a parameter of the method). If the weak classifiers are decision stumps, the combination of r weak classifiers is a decision tree. The ensembles obtained with the variant are formed by the same number of decision stumps as the original AdaBoost. Hence, the original version and the variant produce classifiers with very similar sizes and computational complexities (for training and classification). The experimental study shows that the variant is clearly beneficial.
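The baseline setup the variant builds on, AdaBoost over decision stumps, can be sketched as follows with scikit-learn (the r-stump grouping of the proposed variant is not reproduced here; versions before 1.2 use `base_estimator` instead of `estimator`):

```python
# Baseline for the described variant: AdaBoost with decision stumps (depth-1 trees)
# as the weak classifiers. X, y are synthetic placeholder data.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

stump = DecisionTreeClassifier(max_depth=1)     # a decision stump: one decision node
ada = AdaBoostClassifier(estimator=stump, n_estimators=100, random_state=0)
print("AdaBoost on stumps, 5-fold accuracy:",
      round(cross_val_score(ada, X, y, cv=5).mean(), 3))
```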
13.
Consumer credit scoring is often considered a classification task where clients receive either a good or a bad credit status. Default probabilities provide more detailed information about the creditworthiness of consumers, and they are usually estimated by logistic regression. Here, we present a general framework for estimating individual consumer credit risks using machine learning methods. Since a probability is an expected value, all nonparametric regression approaches that are consistent for the mean are consistent for the probability estimation problem. Among others, random forests (RF), k-nearest neighbors (kNN), and bagged k-nearest neighbors (bNN) belong to this class of consistent nonparametric regression approaches. We apply the machine learning methods and an optimized logistic regression to a large dataset of complete payment histories of short-term installment credits. We demonstrate probability estimation in Random Jungle, an RF package written in C++ with a generalized framework for fast tree growing, probability estimation, and classification, and we describe an algorithm for tuning the terminal node size for probability estimation. We demonstrate that regression RF outperforms the optimized logistic regression model, kNN, and bNN on the test data of the short-term installment credits.
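A hedged sketch of probability estimation by nonparametric regression in the spirit of item 13: a regression random forest fitted to 0/1 labels (a scikit-learn stand-in for Random Jungle) compared with logistic regression by Brier score on synthetic placeholder data:

```python
# Estimate default probabilities with a regression random forest on 0/1 labels,
# then compare against logistic regression by Brier score on held-out data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Regression RF: the mean of the 0/1 labels in each terminal node estimates the probability.
# min_samples_leaf plays the role of the terminal node size tuned in the paper.
rf = RandomForestRegressor(n_estimators=500, min_samples_leaf=20, random_state=0).fit(X_tr, y_tr)
rf_prob = np.clip(rf.predict(X_te), 0.0, 1.0)

lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
lr_prob = lr.predict_proba(X_te)[:, 1]

print("Brier score, regression RF:      ", round(brier_score_loss(y_te, rf_prob), 4))
print("Brier score, logistic regression:", round(brier_score_loss(y_te, lr_prob), 4))
```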
14.
In 2008, the financial tsunami began to impair the economic development of many countries, including Taiwan. Predicting financial crises becomes much more important, and undoubtedly attracts public attention, when the world economy falls into depression. This study examined the predictive ability of the four most commonly used financial distress prediction models and thereby constructed reliable failure prediction models for public industrial firms in Taiwan. Multiple discriminant analysis (MDA), logit, probit, and artificial neural network (ANN) methodologies were applied to a matched sample of failed and non-failed Taiwanese public industrial firms during 1998–2005. The final models are validated using within-sample and out-of-sample tests. The results indicate that the probit, logit, and ANN models used in this study achieve high prediction accuracy and generalize well, with the probit model showing the best and most stable performance. However, if the data do not satisfy the assumptions of the statistical approach, the ANN approach demonstrates its advantage and achieves higher prediction accuracy. In addition, the models used in this study achieve higher prediction accuracy and better generalization than those of [Altman, Financial ratios—discriminant analysis and the prediction of corporate bankruptcy using capital market data, Journal of Finance 23 (4) (1968) 589–609; Ohlson, Financial ratios and the probability prediction of bankruptcy, Journal of Accounting Research 18 (1) (1980) 109–131; and Zmijewski, Methodological issues related to the estimation of financial distress prediction models, Journal of Accounting Research 22 (1984) 59–82]. In summary, the models used in this study can assist investors, creditors, managers, auditors, and regulatory agencies in Taiwan in predicting the probability of business failure.
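The logit and probit portion of item 14 can be illustrated with statsmodels on synthetic placeholder data (the variables and sample are illustrative, not the Taiwanese firm dataset; the MDA and ANN models are omitted):

```python
# Fit logit and probit financial distress models and compare in-sample accuracy.
# X, y are synthetic placeholders for financial ratios and failure labels.
import statsmodels.api as sm
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=6, n_informative=4, random_state=0)
X_const = sm.add_constant(X)                      # add an intercept term

logit_res = sm.Logit(y, X_const).fit(disp=0)
probit_res = sm.Probit(y, X_const).fit(disp=0)

for name, res in (("logit", logit_res), ("probit", probit_res)):
    pred = (res.predict(X_const) >= 0.5).astype(int)
    print(f"{name}: in-sample accuracy = {(pred == y).mean():.3f}")
```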
15.
Selecting an effective method for combining the votes of base inducers in a multiclassifier system can have a significant impact on the system's overall classification accuracy. Some methods cannot even achieve as high a classification accuracy as the most accurate base classifier. To address this issue, we present the strategy of aggregate certainty estimators, which uses multiple measures to estimate a classifier's certainty in its predictions on an instance-by-instance basis. Using these certainty estimators for vote weighting allows the system to achieve a higher overall average classification accuracy than the most accurate base classifier. Weighting with these aggregate measures also results in higher average classification accuracy than weighting with single certainty estimates. Aggregate certainty estimators outperform three baseline strategies, as well as the methods of modified stacking and arbitration, in terms of average accuracy over 36 data sets.
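A simplified sketch of per-instance certainty-weighted voting, using each base classifier's predicted class probability as a single certainty estimate (the paper aggregates several such estimates); scikit-learn assumed, synthetic placeholder data:

```python
# Per-instance certainty-weighted voting: each base classifier votes for its predicted
# class with a weight equal to its confidence (max predicted probability) on that instance.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, n_classes=3,
                           n_informative=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

bases = [DecisionTreeClassifier(random_state=0), GaussianNB(), LogisticRegression(max_iter=1000)]
for b in bases:
    b.fit(X_tr, y_tr)

n_classes = len(np.unique(y_tr))
weighted_votes = np.zeros((len(X_te), n_classes))
for b in bases:
    proba = b.predict_proba(X_te)          # per-instance certainty per class
    pred = proba.argmax(axis=1)
    certainty = proba.max(axis=1)          # certainty in the predicted class
    weighted_votes[np.arange(len(X_te)), pred] += certainty

print("certainty-weighted voting accuracy:",
      round(accuracy_score(y_te, weighted_votes.argmax(axis=1)), 3))
```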
16.
In this paper, we propose a new encoding technique that combines the different physicochemical properties of amino acids together with Needleman–Wunsch algorithm. The algorithm was tested in the recognition of T-cell epitopes. A series of SVM classifiers, where each SVM is trained using a different physicochemical property, combined with the “max rule” enables us to obtain an improvement over the state-of-the-art approaches.
17.
An improved BP network model is used as the classifier. The improved model has the simplest topology, a fast learning rate, and high classification accuracy.
18.
Artificial neural network techniques have been successfully applied to vector quantization (VQ) encoding. The objective of VQ is to statistically preserve the topological relationships existing in a data set and to project the data to a lattice of lower dimension, for visualization, compression, storage, or transmission purposes. However, one of the major drawbacks in the application of artificial neural networks is the difficulty of properly specifying the structure of the lattice that best preserves the topology of the data. To overcome this problem, in this paper we introduce merging algorithms for machine-fusion, boosting-fusion-based, and hybrid-fusion ensembles of SOM, NG, and GSOM networks. In these ensembles, it is not the output signals of the base learners that are combined, but their architectures that are properly merged. We empirically show the quality and robustness of the topological representation of our proposed algorithm using both synthetic and real benchmark datasets.
19.
We empirically evaluate several state-of-the-art methods for constructing ensembles of heterogeneous classifiers with stacking and show that they perform (at best) comparably to selecting the best classifier from the ensemble by cross validation. Among state-of-the-art stacking methods, stacking with probability distributions and multi-response linear regression performs best. We propose two extensions of this method, one using an extended set of meta-level features and the other using multi-response model trees to learn at the meta-level. We show that the latter extension performs better than existing stacking approaches and better than selecting the best classifier by cross validation.
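A hedged sketch of stacking with class-probability meta-features and a linear meta-learner, assuming scikit-learn and synthetic placeholder data; logistic regression stands in for the multi-response linear regression discussed in item 19, and the model-tree extension is not shown:

```python
# Stacking sketch: heterogeneous base classifiers, class-probability meta-features,
# and a linear meta-learner at the meta-level.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("nb", GaussianNB()),
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",   # meta-level features are class probability distributions
    cv=5,
)
print("stacking accuracy:", round(cross_val_score(stack, X, y, cv=5).mean(), 3))
```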
20.
In this paper we study methods that combine multiple classification models learned over separate data sets. Numerous studies posit that such approaches provide the means to efficiently scale learning to large data sets, while also boosting the accuracy of individual classifiers. These gains, however, come at the expense of an increased demand for run-time system resources. The final ensemble meta-classifier may consist of a large collection of base classifiers that require increased memory resources while also slowing down classification throughput. Here, we describe an algorithm for pruning (i.e., discarding a subset of the available base classifiers) the ensemble meta-classifier as a means to reduce its size while preserving its accuracy, and we present a technique for measuring the trade-off between predictive performance and available run-time system resources. The algorithm is independent of the method used initially when computing the meta-classifier. It is based on decision tree pruning methods and relies on the mapping of an arbitrary ensemble meta-classifier to a decision tree model. Through an extensive empirical study on meta-classifiers computed over two real data sets, we illustrate our pruning algorithm to be a robust and competitive approach to discarding classification models without degrading the overall predictive performance of the smaller ensemble computed over those that remain after pruning.