共查询到20条相似文献,搜索用时 15 毫秒
1.
A data driven ensemble classifier for credit scoring analysis 总被引:2,自引:0,他引:2
This study focuses on predicting whether a credit applicant can be categorized as good, bad or borderline from information initially supplied. This is essentially a classification task for credit scoring. Given its importance, many researchers have recently worked on an ensemble of classifiers. However, to the best of our knowledge, unrepresentative samples drastically reduce the accuracy of the deployment classifier. Few have attempted to preprocess the input samples into more homogeneous cluster groups and then fit the ensemble classifier accordingly. For this reason, we introduce the concept of class-wise classification as a preprocessing step in order to obtain an efficient ensemble classifier. This strategy would work better than a direct ensemble of classifiers without the preprocessing step. The proposed ensemble classifier is constructed by incorporating several data mining techniques, mainly involving optimal associate binning to discretize continuous values; neural network, support vector machine, and Bayesian network are used to augment the ensemble classifier. In particular, the Markov blanket concept of Bayesian network allows for a natural form of feature selection, which provides a basis for mining association rules. The learned knowledge is represented in multiple forms, including causal diagram and constrained association rules. The data driven nature of the proposed system distinguishes it from existing hybrid/ensemble credit scoring systems. 相似文献
2.
The aim of bankruptcy prediction in the areas of data mining and machine learning is to develop an effective model which can provide the higher prediction accuracy. In the prior literature, various classification techniques have been developed and studied, in/with which classifier ensembles by combining multiple classifiers approach have shown their outperformance over many single classifiers. However, in terms of constructing classifier ensembles, there are three critical issues which can affect their performance. The first one is the classification technique actually used/adopted, and the other two are the combination method to combine multiple classifiers and the number of classifiers to be combined, respectively. Since there are limited, relevant studies examining these aforementioned disuses, this paper conducts a comprehensive study of comparing classifier ensembles by three widely used classification techniques including multilayer perceptron (MLP) neural networks, support vector machines (SVM), and decision trees (DT) based on two well-known combination methods including bagging and boosting and different numbers of combined classifiers. Our experimental results by three public datasets show that DT ensembles composed of 80–100 classifiers using the boosting method perform best. The Wilcoxon signed ranked test also demonstrates that DT ensembles by boosting perform significantly different from the other classifier ensembles. Moreover, a further study over a real-world case by a Taiwan bankruptcy dataset was conducted, which also demonstrates the superiority of DT ensembles by boosting over the others. 相似文献
3.
Hybrid mining approach in the design of credit scoring models 总被引:1,自引:0,他引:1
Unrepresentative data samples are likely to reduce the utility of data classifiers in practical application. This study presents a hybrid mining approach in the design of an effective credit scoring model, based on clustering and neural network techniques. We used clustering techniques to preprocess the input samples with the objective of indicating unrepresentative samples into isolated and inconsistent clusters, and used neural networks to construct the credit scoring model. The clustering stage involved a class-wise classification process. A self-organizing map clustering algorithm was used to automatically determine the number of clusters and the starting points of each cluster. Then, the K-means clustering algorithm was used to generate clusters of samples belonging to new classes and eliminate the unrepresentative samples from each class. In the neural network stage, samples with new class labels were used in the design of the credit scoring model. The proposed method demonstrates by two real world credit data sets that the hybrid mining approach can be used to build effective credit scoring models. 相似文献
4.
Bankruptcy prediction and credit scoring have long been regarded as critical topics and have been studied extensively in the accounting and finance literature. Artificial intelligence and machine learning techniques have been used to solve these financial decision-making problems. The multilayer perceptron (MLP) network trained by the back-propagation learning algorithm is the mostly used technique for financial decision-making problems. In addition, it is usually superior to other traditional statistical models. Recent studies suggest combining multiple classifiers (or classifier ensembles) should be better than single classifiers. However, the performance of multiple classifiers in bankruptcy prediction and credit scoring is not fully understood. In this paper, we investigate the performance of a single classifier as the baseline classifier to compare with multiple classifiers and diversified multiple classifiers by using neural networks based on three datasets. By comparing with the single classifier as the benchmark in terms of average prediction accuracy, the multiple classifiers only perform better in one of the three datasets. The diversified multiple classifiers trained by not only different classifier parameters but also different sets of training data perform worse in all datasets. However, for the Type I and Type II errors, there is no exact winner. We suggest that it is better to consider these three classifier architectures to make the optimal financial decision. 相似文献
5.
The aim of this paper is to propose a new hybrid data mining model based on combination of various feature selection and ensemble learning classification algorithms, in order to support decision making process. The model is built through several stages. In the first stage, initial dataset is preprocessed and apart of applying different preprocessing techniques, we paid a great attention to the feature selection. Five different feature selection algorithms were applied and their results, based on ROC and accuracy measures of logistic regression algorithm, were combined based on different voting types. We also proposed a new voting method, called if_any, that outperformed all other voting methods, as well as a single feature selection algorithm's results. In the next stage, a four different classification algorithms, including generalized linear model, support vector machine, naive Bayes and decision tree, were performed based on dataset obtained in the feature selection process. These classifiers were combined in eight different ensemble models using soft voting method. Using the real dataset, the experimental results show that hybrid model that is based on features selected by if_any voting method and ensemble GLM + DT model performs the highest performance and outperforms all other ensemble and single classifier models. 相似文献
6.
A two-stage hybrid credit scoring model using artificial neural networks and multivariate adaptive regression splines 总被引:4,自引:0,他引:4
The objective of the proposed study is to explore the performance of credit scoring using a two-stage hybrid modeling procedure with artificial neural networks and multivariate adaptive regression splines (MARS). The rationale under the analyses is firstly to use MARS in building the credit scoring model, the obtained significant variables are then served as the input nodes of the neural networks model. To demonstrate the effectiveness and feasibility of the proposed modeling procedure, credit scoring tasks are performed on one bank housing loan dataset using cross-validation approach. As the results reveal, the proposed hybrid approach outperforms the results using discriminant analysis, logistic regression, artificial neural networks and MARS and hence provides an alternative in handling credit scoring tasks. 相似文献
7.
María-Dolores Cubiles-De-La-Vega Antonio Blanco-Oliver Rafael Pino-Mejías Juan Lara-Rubio 《Expert systems with applications》2013,40(17):6910-6917
A wide range of supervised classification algorithms have been successfully applied for credit scoring in non-microfinance environments according to recent literature. However, credit scoring in the microfinance industry is a relatively recent application, and current research is based, to the best of our knowledge, on classical statistical methods. This lack is surprising since the implementation of credit scoring based on supervised classification algorithms should contribute towards the efficiency of microfinance institutions, thereby improving their competitiveness in an increasingly constrained environment. This paper explores an extensive list of Statistical Learning techniques as microfinance credit scoring tools from an empirical viewpoint. A data set of microcredits belonging to a Peruvian Microfinance Institution is considered, and the following models are applied to decide between default and non-default credits: linear and quadratic discriminant analysis, logistic regression, multilayer perceptron, support vector machines, classification trees, and ensemble methods based on bagging and boosting algorithm. The obtained results suggest the use of a multilayer perceptron trained in the R statistical system with a second order algorithm. Moreover, our findings show that, with the implementation of this MLP-based model, the MFI? misclassification costs could be reduced to 13.7% with respect to the application of other classic models. 相似文献
8.
A number of credit scoring models have been developed to evaluate credit risk of new loan applicants and existing loan customers, respectively. This study proposes a method to manage existing customers by using misclassification patterns of credit scoring model. We divide two groups of customers, the currently good and bad credit customers, into two subgroups, respectively, according to whether their credit status is misclassified or not by the neural network model. In addition, we infer the characteristics of each subgroup and propose management strategies corresponding to each subgroup. 相似文献
9.
Due to recent financial crisis and regulatory concerns of Basel II, credit risk assessment is becoming one of the most important topics in the field of financial risk management. Quantitative credit scoring models are widely used tools for credit risk assessment in financial institutions. Although single support vector machines (SVM) have been demonstrated with good performance in classification, a single classifier with a fixed group of training samples and parameters setting may have some kind of inductive bias. One effective way to reduce the bias is ensemble model. In this study, several ensemble models based on least squares support vector machines (LSSVM) are brought forward for credit scoring. The models are tested on two real world datasets and the results show that ensemble strategies can help to improve the performance in some degree and are effective for building credit scoring models. 相似文献
10.
In this paper, a generalized adaptive ensemble generation and aggregation (GAEGA) method for the design of multiple classifier systems (MCSs) is proposed. GAEGA adopts an “over-generation and selection” strategy to achieve a good bias-variance tradeoff. In the training phase, different ensembles of classifiers are adaptively generated by fitting the validation data globally with different degrees. The test data are then classified by each of the generated ensembles. The final decision is made by taking into consideration both the ability of each ensemble to fit the validation data locally and reducing the risk of overfitting. In this paper, the performance of GAEGA is assessed experimentally in comparison with other multiple classifier aggregation methods on 16 data sets. The experimental results demonstrate that GAEGA significantly outperforms the other methods in terms of average accuracy, ranging from 2.6% to 17.6%. 相似文献
11.
The ability to accurately predict business failure is a very important issue in financial decision-making. Incorrect decision-making in financial institutions is very likely to cause financial crises and distress. Bankruptcy prediction and credit scoring are two important problems facing financial decision support. As many related studies develop financial distress models by some machine learning techniques, more advanced machine learning techniques, such as classifier ensembles and hybrid classifiers, have not been fully assessed. The aim of this paper is to develop a novel hybrid financial distress model based on combining the clustering technique and classifier ensembles. In addition, single baseline classifiers, hybrid classifiers, and classifier ensembles are developed for comparisons. In particular, two clustering techniques, Self-Organizing Maps (SOMs) and k-means and three classification techniques, logistic regression, multilayer-perceptron (MLP) neural network, and decision trees, are used to develop these four different types of bankruptcy prediction models. As a result, 21 different models are compared in terms of average prediction accuracy and Type I & II errors. By using five related datasets, combining Self-Organizing Maps (SOMs) with MLP classifier ensembles performs the best, which provides higher predication accuracy and lower Type I & II errors. 相似文献
12.
Analyzing bank databases for customer behavior management is difficult since bank databases are multi-dimensional, comprised of monthly account records and daily transaction records. This study proposes an integrated data mining and behavioral scoring model to manage existing credit card customers in a bank. A self-organizing map neural network was used to identify groups of customers based on repayment behavior and recency, frequency, monetary behavioral scoring predicators. It also classified bank customers into three major profitable groups of customers. The resulting groups of customers were then profiled by customer's feature attributes determined using an Apriori association rule inducer. This study demonstrates that identifying customers by a behavioral scoring model is helpful characteristics of customer and facilitates marketing strategy development. 相似文献
13.
The credit card industry has been growing rapidly recently, and thus huge numbers of consumers’ credit data are collected by the credit department of the bank. The credit scoring manager often evaluates the consumer’s credit with intuitive experience. However, with the support of the credit classification model, the manager can accurately evaluate the applicant’s credit score. Support Vector Machine (SVM) classification is currently an active research area and successfully solves classification problems in many domains. This study used three strategies to construct the hybrid SVM-based credit scoring models to evaluate the applicant’s credit score from the applicant’s input features. Two credit datasets in UCI database are selected as the experimental data to demonstrate the accuracy of the SVM classifier. Compared with neural networks, genetic programming, and decision tree classifiers, the SVM classifier achieved an identical classificatory accuracy with relatively few input features. Additionally, combining genetic algorithms with SVM classifier, the proposed hybrid GA-SVM strategy can simultaneously perform feature selection task and model parameters optimization. Experimental results show that SVM is a promising addition to the existing data mining methods. 相似文献
14.
基于数据挖掘聚类技术的信用评分评级 总被引:7,自引:0,他引:7
本文提出了一个基于数据挖掘聚类技术的信用评分评级方法。该方法使用数据挖掘的聚类算法,对传统信用评分模型进行了改进,本文给出了方法的理论证明,并在一个信用卡分析系统DMCA中实现了该方法,进行了详细的数据测试。理论证明及实验结果都表明,聚类技术在传统信用评分模型的DM/MTM,分界值,均方差,交叉验证等问题上取得了良好的效果。 相似文献
15.
Neural nets have become one of the most important tools using in credit scoring. Credit scoring is regarded as a core appraised tool of commercial banks during the last few decades. The purpose of this paper is to investigate the ability of neural nets, such as probabilistic neural nets and multi-layer feed-forward nets, and conventional techniques such as, discriminant analysis, probit analysis and logistic regression, in evaluating credit risk in Egyptian banks applying credit scoring models. The credit scoring task is performed on one bank’s personal loans’ data-set. The results so far revealed that the neural nets-models gave a better average correct classification rate than the other techniques. A one-way analysis of variance and other tests have been applied, demonstrating that there are some significant differences amongst the means of the correct classification rates, pertaining to different techniques. 相似文献
16.
Bo-Wen ChiChiun-Chieh Hsu 《Expert systems with applications》2012,39(3):2650-2661
Credit scoring model is an important tool for assessing risks in financial industry, consequently the majority of financial institutions actively develops credit scoring model on the credit approval assessment of new customers and the credit risk management of existing customers. Nonetheless, most past researches used the one-dimensional credit scoring model to measure customer risk. In this study, we select important variables by genetic algorithm (GA) to combine the bank’s internal behavioral scoring model with the external credit bureau scoring model to construct the dual scoring model for credit risk management of mortgage accounts. It undergoes more accurate risk judgment and segmentation to further discover the parts which are required to be enhanced in management or control from mortgage portfolio. The results show that the predictive ability of the dual scoring model outperforms both one-dimensional behavioral scoring model and credit bureau scoring model. Moreover, this study proposes credit strategies such as on-lending retaining and collection actions for corresponding customers in order to contribute benefits to the practice of banking credit. 相似文献
17.
Jue Wang Abdel-Rahman HedarShouyang Wang Jian Ma 《Expert systems with applications》2012,39(6):6123-6128
As the credit industry has been growing rapidly, credit scoring models have been widely used by the financial industry during this time to improve cash flow and credit collections. However, a large amount of redundant information and features are involved in the credit dataset, which leads to lower accuracy and higher complexity of the credit scoring model. So, effective feature selection methods are necessary for credit dataset with huge number of features. In this paper, a novel approach, called RSFS, to feature selection based on rough set and scatter search is proposed. In RSFS, conditional entropy is regarded as the heuristic to search the optimal solutions. Two credit datasets in UCI database are selected to demonstrate the competitive performance of RSFS consisted in three credit models including neural network model, J48 decision tree and Logistic regression. The experimental result shows that RSFS has a superior performance in saving the computational costs and improving classification accuracy compared with the base classification methods. 相似文献
18.
Recently, credit scoring has become a very important task as credit cards are now widely used by customers. A method that can accurately predict credit scoring is greatly needed and good prediction techniques can help to predict credit more accurately. One powerful classifier, the support vector machine (SVM), was successfully applied to a wide range of domains. In recent years, researchers have applied the SVM-based in the prediction of credit scoring, and the results have been shown it to be effective. In this study, two real world credit datasets in the University of California Irvine Machine Learning Repository were selected. SVM and a new classifier, clustering-launched classification (CLC), were employed to predict the accuracy of credit scoring. The advantages of using CLC are that it can classify data efficiently and only need one parameter needs to be decided. In substance, the results show that CLC is better than SVM. Therefore, CLC is an effective tool to predict credit scoring. 相似文献
19.
In this paper the problem of finding piecewise linear boundaries between sets is considered and is applied for solving supervised data classification problems. An algorithm for the computation of piecewise linear boundaries, consisting of two main steps, is proposed. In the first step sets are approximated by hyperboxes to find so-called “indeterminate” regions between sets. In the second step sets are separated inside these “indeterminate” regions by piecewise linear functions. These functions are computed incrementally starting with a linear function. Results of numerical experiments are reported. These results demonstrate that the new algorithm requires a reasonable training time and it produces consistently good test set accuracy on most data sets comparing with mainstream classifiers. 相似文献
20.
Credit scoring aims to assess the risk associated with lending to individual consumers. Recently, ensemble classification methodology has become popular in this field. However, most researches utilize random sampling to generate training subsets for constructing the base classifiers. Therefore, their diversity is not guaranteed, which may lead to a degradation of overall classification performance. In this paper, we propose an ensemble classification approach based on supervised clustering for credit scoring. In the proposed approach, supervised clustering is employed to partition the data samples of each class into a number of clusters. Clusters from different classes are then pairwise combined to form a number of training subsets. In each training subset, a specific base classifier is constructed. For a sample whose class label needs to be predicted, the outputs of these base classifiers are combined by weighted voting. The weight associated with a base classifier is determined by its classification performance in the neighborhood of the sample. In the experimental study, two benchmark credit data sets are adopted for performance evaluation, and an industrial case study is conducted. The results show that compared to other ensemble classification methods, the proposed approach is able to generate base classifiers with higher diversity and local accuracy, and improve the accuracy of credit scoring. 相似文献