Similar literature
1.
Statistical methods have been widely employed to assess the capabilities of credit scoring classification models in order to reduce the risk of wrong decisions when granting credit facilities to clients. The predictive quality of a classification model can be evaluated based on measures such as sensitivity, specificity, predictive values, accuracy, correlation coefficients and information-theoretic measures, such as relative entropy and mutual information. In this paper we analyze the performance of a naive logistic regression model (Hosmer & Lemeshow, 1989) and a logistic regression with state-dependent sample selection model (Cramer, 2004) applied to simulated data. As a case study, the methodology is also illustrated on a data set extracted from a Brazilian bank portfolio. Our simulation results so far reveal no statistically significant difference in predictive capacity between the naive logistic regression models and the logistic regression with state-dependent sample selection models. However, there is a strong difference between the distributions of the estimated default probabilities from these two statistical modeling techniques, with the naive logistic regression models always underestimating such probabilities, particularly in the presence of balanced samples.
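The confusion-matrix measures named in this abstract (sensitivity, specificity, predictive values, accuracy) can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `classification_metrics` and the label coding (1 = default, 0 = non-default) are assumptions made for illustration.

```python
def classification_metrics(y_true, y_pred):
    """Confusion-matrix measures for binary labels.

    Assumed coding (not from the abstract): 1 = default, 0 = non-default.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defaulters flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # good clients passed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # good clients flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # defaulters missed

    def ratio(num, den):
        # Return NaN instead of raising when a class or prediction is absent.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
        "accuracy": ratio(tp + tn, len(pairs)),
    }
```

With three actual defaulters of whom two are flagged, for instance, the sensitivity is 2/3 regardless of how the non-defaulters are classified.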

2.
As the credit industry has grown rapidly, credit scoring models have been widely used by the financial industry to improve cash flow and credit collections. However, credit datasets contain a large amount of redundant information and many redundant features, which lowers the accuracy and raises the complexity of the credit scoring model. Effective feature selection methods are therefore necessary for credit datasets with a huge number of features. In this paper, a novel approach to feature selection based on rough sets and scatter search, called RSFS, is proposed. In RSFS, conditional entropy is used as the heuristic to search for optimal solutions. Two credit datasets from the UCI database are selected to demonstrate the competitive performance of RSFS with three credit models: a neural network model, a J48 decision tree and logistic regression. The experimental results show that RSFS performs better at saving computational cost and improving classification accuracy than the base classification methods.

3.
The development of an effective credit scoring model has become a very important issue as the credit industry is confronted with ever-intensifying competition and aggravating bad debt problems. During the past few years, a substantial number of studies in the field of statistics have been conducted to improve the accuracy of credit scoring models. In order to refine the classification and decrease misclassification, this paper presents a two-stage model. Focusing on classification, the first stage aims at constructing an artificial neural network (ANN)-based credit scoring model to categorize applicants into the group of accepted (good) credit and the group of rejected (bad) credit. Switching from classification to reassignment, the second stage proceeds to reduce the Type I error by retrieving the originally rejected good credit applicants to conditional acceptance using the Case-Based Reasoning (CBR) classification technique. The proposed model (RST–ANN–CBR) is applied to a credit card dataset to verify its effectiveness. As the results indicate, the proposed model is able to achieve more accurate credit scoring than four other methods; more importantly, it is validated to recover potentially lost customers and to increase business revenues.

4.
Credit scoring with a data mining approach based on support vector machines   Cited by: 3 (self-citations: 0, citations by others: 3)
The credit card industry has been growing rapidly recently, and thus huge numbers of consumers’ credit data are collected by the credit department of the bank. The credit scoring manager often evaluates the consumer’s credit with intuitive experience. However, with the support of the credit classification model, the manager can accurately evaluate the applicant’s credit score. Support Vector Machine (SVM) classification is currently an active research area and successfully solves classification problems in many domains. This study used three strategies to construct the hybrid SVM-based credit scoring models to evaluate the applicant’s credit score from the applicant’s input features. Two credit datasets from the UCI database are selected as the experimental data to demonstrate the accuracy of the SVM classifier. Compared with neural networks, genetic programming, and decision tree classifiers, the SVM classifier achieved an identical classificatory accuracy with relatively few input features. Additionally, by combining genetic algorithms with the SVM classifier, the proposed hybrid GA-SVM strategy can simultaneously perform feature selection and model parameter optimization. Experimental results show that SVM is a promising addition to the existing data mining methods.

5.
With the rapid growth of the credit industry, credit scoring models are of great significance for issuing credit cards to applicants with minimum risk, so credit scoring is very important in financial firms such as banks. A model is built from previous data, and from that model a decision is taken on whether an applicant will be granted a loan or credit card or will be rejected. There are several methodologies for constructing credit scoring models, e.g. neural network models, statistical classification techniques, genetic programming and support vector models. The computational time for running a model is of great importance in the 21st century: algorithms or models with less computational time are more efficient and thus give more profit to banks or firms. In this study, we propose a new strategy to reduce the computational time for credit scoring. In this approach we use SVM together with feature reduction using the F score, and we take a sample instead of the whole dataset to create the credit scoring model. We run our method on two real datasets to assess its performance, and compare its results with those obtained from other well-known methods. The new method is shown to be very competitive with the other methods in accuracy, while requiring less computational time.
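The abstract does not define its F score, so the Python sketch below assumes one common definition used for SVM feature selection: the squared distances of the two class means from the overall mean, divided by the sum of the two within-class sample variances. The function names `f_score` and `rank_features` and the 1/0 label coding are hypothetical, and this is not the authors' implementation.

```python
from statistics import mean, variance


def f_score(pos_values, neg_values):
    """F score of one feature (assumed definition, see the lead-in).

    pos_values / neg_values: the feature's values in the positive and
    negative class; each class needs at least two samples.
    """
    overall = mean(list(pos_values) + list(neg_values))
    between = (mean(pos_values) - overall) ** 2 + (mean(neg_values) - overall) ** 2
    within = variance(pos_values) + variance(neg_values)  # sample variances (n - 1)
    if within == 0:
        # Constant within both classes: perfectly separating or useless.
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(rows, labels):
    """Return feature indices ordered from highest to lowest F score."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        scores.append(f_score(pos, neg))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)
```

Keeping only the top-ranked features before training the SVM is one way to realise the feature reduction described above; the cut-off is a tuning choice that the abstract does not specify.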

6.
Credit scoring aims to assess the risk associated with lending to individual consumers. Recently, ensemble classification methodology has become popular in this field. However, most studies utilize random sampling to generate training subsets for constructing the base classifiers. Therefore, their diversity is not guaranteed, which may lead to a degradation of overall classification performance. In this paper, we propose an ensemble classification approach based on supervised clustering for credit scoring. In the proposed approach, supervised clustering is employed to partition the data samples of each class into a number of clusters. Clusters from different classes are then pairwise combined to form a number of training subsets. In each training subset, a specific base classifier is constructed. For a sample whose class label needs to be predicted, the outputs of these base classifiers are combined by weighted voting. The weight associated with a base classifier is determined by its classification performance in the neighborhood of the sample. In the experimental study, two benchmark credit data sets are adopted for performance evaluation, and an industrial case study is conducted. The results show that compared to other ensemble classification methods, the proposed approach is able to generate base classifiers with higher diversity and local accuracy, and improve the accuracy of credit scoring.
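The weighted-voting step can be sketched in Python under stated assumptions. The abstract does not say how the neighborhood or the weights are computed, so here the neighborhood is taken to be the `k` nearest validation samples by Euclidean distance, and each classifier's weight is its accuracy on those samples. The name `local_accuracy_vote`, the use of plain callables as classifiers, and the validation set are all hypothetical.

```python
import math


def local_accuracy_vote(classifiers, x, val_X, val_y, k=3):
    """Combine base classifiers by locally weighted voting.

    classifiers: callables mapping a feature vector to a class label.
    val_X, val_y: labelled validation samples (non-empty) that define the
    neighborhood of x.
    Assumption: a classifier's weight is its accuracy on the k validation
    samples closest to x.
    """
    # Indices of the k validation samples nearest to x.
    nearest = sorted(range(len(val_X)), key=lambda i: math.dist(x, val_X[i]))[:k]

    votes = {}
    for clf in classifiers:
        hits = sum(1 for i in nearest if clf(val_X[i]) == val_y[i])
        weight = hits / len(nearest)  # local accuracy of this classifier
        label = clf(x)
        votes[label] = votes.get(label, 0.0) + weight

    # The label with the largest total weight wins.
    return max(votes, key=votes.get)
```

A classifier that is unreliable near the query sample contributes little to the vote, even if it is accurate elsewhere in the feature space.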

7.
The paper deals with the problem of predicting the time to default in credit behavioural scoring. This area opens a possibility of including a dynamic component in behavioural scoring modelling which enables making decisions related to limit, collection and recovery strategies, retention and attrition, as well as providing an insight into the profitability, pricing or term structure of the loan. In this paper, we compare survival analysis and neural networks in terms of modelling and results. The neural network architecture is designed such that its output is comparable to the survival analysis output. Six neural network models were created, one for each period of default. A radial basis neural network algorithm was used to test all six models. The survival model used a Cox modelling procedure. Further, different performance measures of all models were discussed since even in highly accurate scoring models, misclassification patterns appear. A systematic comparison ‘3 + 2 + 2’ procedure is suggested to find the most effective model for a bank. Additionally, the survival analysis model is compared to neural network models according to the relative importance of different variables in predicting the time to default. Although different models can have very similar performance measures they may consist of different variables. The dataset used for the research was collected from a Croatian bank and credit customers were observed during a 12-month period. The paper emphasizes the importance of conducting a detailed comparison procedure while selecting the best model that satisfies the users’ interest.

8.
During the last few years there has been marked attention towards the development of hybrid and ensemble systems, which have proved able to be more accurate than single classifier models. However, among the hybrid and ensemble models developed in the literature there has been little consideration given to: 1) combining data filtering and feature selection methods; 2) combining classifiers of different algorithms; and 3) exploring different classifier output combination techniques other than the traditional ones found in the literature. In this paper, the aim is to improve predictive performance by presenting a new hybrid ensemble credit scoring model through the combination of two data pre-processing methods based on Gabriel Neighbourhood Graph editing (GNG) and Multivariate Adaptive Regression Splines (MARS) in the hybrid modelling phase. In addition, a new classifier combination rule based on the consensus approach (ConsA) of different classification algorithms during the ensemble modelling phase is proposed. Several comparisons are carried out in this paper, as follows: 1) comparison of individual base classifiers with the GNG and MARS methods applied separately and combined, in order to choose the best results for the ensemble modelling phase; 2) comparison of the proposed approach with all the base classifiers and ensemble classifiers with the traditional combination methods; and 3) comparison of the proposed approach with recent related studies in the literature. Five well-known base classifiers are used, namely, neural networks (NN), support vector machines (SVM), random forests (RF), decision trees (DT), and naïve Bayes (NB). The experimental results, analysis and statistical tests prove the ability of the proposed approach to improve prediction performance against all the base classifiers, hybrid and the traditional combination methods in terms of average accuracy, the area under the curve (AUC), H-measure and the Brier Score. The model was validated over seven real-world credit datasets.

9.
Credit risk evaluation is an integral part of any lending process, and even more so for financial institutions involved in lending to SMEs. The importance of credit scoring has increased recently because of the financial crisis and increased capital requirements for banks. There are, however, only a few studies that develop credit scoring models for SME lending. The objective of this study is to introduce a novel, more accurate credit risk estimation approach for SME business lending. Based on traditional statistical methods and recent artificial intelligence (AI) techniques, we propose a hybrid model which combines the logistic regression approach and artificial neural networks (ANN). In order to test the effectiveness and feasibility of the proposed hybrid model, we use the data of Finnish SMEs from the fiscal years 2004 to 2012. Our results suggest that the proposed ANN/logistic hybrid model is more accurate than either of the initial models, ANN or logistic regression. This improvement in the accuracy of the credit scoring model decreases evaluation errors and thereby has many potential practical implications. First, a more accurate credit scoring model can result in better performance of the whole SME loan portfolio. Second, it can also result in lower capital requirements from the bank's perspective and lower interest rates from the individual firm's perspective. Combined, these effects will enhance the bank's competitiveness in the market for SME loans.

10.
Credit scoring is an effective tool for banks to guide profitable decisions on granting loans. Ensemble methods, which according to their structures can be divided into parallel and sequential ensembles, have been recently developed in the credit scoring domain. These methods have proven their superiority in discriminating borrowers accurately. However, among the ensemble models, little consideration has been given to the following: (1) highlighting the hyper-parameter tuning of the base learner, despite its being critical to well-performing ensemble models; (2) building sequential models (i.e., boosting, as most have focused on developing the same or different algorithms in parallel); and (3) focusing on the comprehensibility of models. This paper aims to propose a sequential ensemble credit scoring model based on a variant of gradient boosting machine (i.e., extreme gradient boosting (XGBoost)). The model mainly comprises three steps. First, data pre-processing is employed to scale the data and handle missing values. Second, a model-based feature selection system based on the relative feature importance scores is utilized to remove redundant variables. Third, the hyper-parameters of XGBoost are adaptively tuned with Bayesian hyper-parameter optimization and used to train the model with the selected feature subset. Several hyper-parameter optimization methods and baseline classifiers are considered as reference points in the experiment. Results demonstrate that Bayesian hyper-parameter optimization performs better than random search, grid search, and manual search. Moreover, the proposed model outperforms baseline models on average over four evaluation measures: accuracy, error rate, the area under the curve (AUC) H measure (AUC-H measure), and Brier score. The proposed model also provides feature importance scores and a decision chart, which enhance the interpretability of the credit scoring model.
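Of the evaluation measures listed in this abstract, the Brier score is the simplest to state: it is the mean squared difference between the predicted default probability and the observed 0/1 outcome, so lower is better. The Python sketch below is illustrative only; the function name `brier_score` and the label coding (1 = default) are assumptions.

```python
def brier_score(y_true, p_default):
    """Mean squared error between predicted probabilities and 0/1 outcomes.

    y_true: observed labels, assumed coded 1 = default, 0 = non-default.
    p_default: predicted probabilities of default, one per sample.
    """
    if len(y_true) != len(p_default):
        raise ValueError("y_true and p_default must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    return sum((p - y) ** 2 for y, p in zip(y_true, p_default)) / len(y_true)
```

A model that always predicts 0.5 scores 0.25 under this definition, which gives a rough baseline for judging reported values.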

11.
A data driven ensemble classifier for credit scoring analysis   Cited by: 2 (self-citations: 0, citations by others: 2)
This study focuses on predicting whether a credit applicant can be categorized as good, bad or borderline from information initially supplied. This is essentially a classification task for credit scoring. Given its importance, many researchers have recently worked on an ensemble of classifiers. However, to the best of our knowledge, unrepresentative samples drastically reduce the accuracy of the deployment classifier. Few have attempted to preprocess the input samples into more homogeneous cluster groups and then fit the ensemble classifier accordingly. For this reason, we introduce the concept of class-wise classification as a preprocessing step in order to obtain an efficient ensemble classifier. This strategy would work better than a direct ensemble of classifiers without the preprocessing step. The proposed ensemble classifier is constructed by incorporating several data mining techniques, mainly involving optimal associate binning to discretize continuous values; neural network, support vector machine, and Bayesian network are used to augment the ensemble classifier. In particular, the Markov blanket concept of Bayesian network allows for a natural form of feature selection, which provides a basis for mining association rules. The learned knowledge is represented in multiple forms, including causal diagram and constrained association rules. The data driven nature of the proposed system distinguishes it from existing hybrid/ensemble credit scoring systems.

12.
Support vector machines (SVM) are an effective tool for building good credit scoring models. However, the performance of the model depends on its parameter settings. In this study, we use a direct search method to optimize the SVM-based credit scoring model and compare it with three other parameter optimization methods: grid search, a method based on design of experiment (DOE) and a genetic algorithm (GA). Two real-world credit datasets are selected to demonstrate the effectiveness and feasibility of the method. The results show that the direct search method can find an effective model with high classification accuracy and good robustness, with less dependency on the initial search space or point setting.

13.
Classification and regression models are widely used by mainstream credit granting institutions to assess the risk of customer default. In practice, the objectives used to derive model parameters and the business objectives used to assess models differ. Model parameters are determined by minimising some function of error or by maximising likelihood, but performance is assessed using global measures such as the GINI coefficient, or the misclassification rate at a specific point in the score distribution. This paper seeks to determine the impact on performance that results from having different objectives for model construction and model assessment. To do this, a genetic algorithm (GA) is utilized to generate linear scoring models that directly optimise business measures of interest. The performance of the GA models is then compared to those constructed using logistic and linear regression. Empirical results show that all models perform similarly well, suggesting that modelling and business objectives are well aligned.
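The GINI coefficient mentioned as a global performance measure is commonly computed from the area under the ROC curve as GINI = 2·AUC − 1. The Python sketch below uses that relation and a pairwise AUC. It is not the paper's code, and it assumes that a higher score means the sample is more likely to belong to class 1; many scorecards use the opposite orientation, in which case the scores should be negated first. The function names are hypothetical.

```python
def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Fraction of (class-1, class-0) pairs in which the class-1 sample has
    the higher score; ties count as half. Quadratic in the sample size,
    which is fine for a small illustration.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(y_true, scores):
    """GINI coefficient derived from the AUC: 0 is random, 1 is perfect."""
    return 2.0 * auc(y_true, scores) - 1.0
```

Because this measure depends only on the ranking of the scores, it is not a smooth function of the model parameters, which may be one reason a direct-search method such as a genetic algorithm is used to optimise it.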

14.
Least squares support vector machines ensemble models for credit scoring   Cited by: 1 (self-citations: 0, citations by others: 1)
Due to the recent financial crisis and the regulatory concerns of Basel II, credit risk assessment is becoming one of the most important topics in the field of financial risk management. Quantitative credit scoring models are widely used tools for credit risk assessment in financial institutions. Although single support vector machines (SVM) have demonstrated good performance in classification, a single classifier with a fixed group of training samples and parameter settings may have some kind of inductive bias. One effective way to reduce the bias is an ensemble model. In this study, several ensemble models based on least squares support vector machines (LSSVM) are put forward for credit scoring. The models are tested on two real-world datasets, and the results show that ensemble strategies can help to improve performance to some degree and are effective for building credit scoring models.

16.
Credit scoring is the term used to describe methods utilized for classifying applicants for credit into classes of risk. This paper evaluates two induction approaches, rough sets and decision trees, as techniques for classifying credit (business) applicants. Inductive learning methods, like rough sets and decision trees, have a better knowledge representational structure than neural networks or statistical procedures because they can be used to derive production rules. While decision trees have already been used for credit granting, the rough sets approach is rarely utilized in this domain. In this paper, we use production rules obtained on a sample of 1102 business loans in order to compare the classification abilities of the two techniques. We show that decision trees obtain better results, with 87.5% of good classifications with a pruned tree, against 76.7% for rough sets. However, decision trees make more type-II errors than rough sets, but fewer type-I errors.

17.
Credit scoring is very important in business, especially in banks. We want to describe a person as a good or a bad credit by evaluating his/her credit. We systematically propose three link analysis algorithms based on support vector machine preprocessing to estimate an applicant’s credit so as to decide whether a bank should provide a loan to the applicant. The proposed algorithms have two major phases, called the input weighted adjustor and the class by support vector machine-based models. In the first phase, we consider the link relation by link analysis and integrate the relation of applicants through their information into the input vector of the next phase. In the second phase, an algorithm is proposed based on the general support vector machine model. A real-world credit dataset is used to evaluate the performance of the proposed algorithms by the 10-fold cross-validation method. It is shown that the genetic link analysis ranking methods have higher performance in terms of classification accuracy.

18.
The ability to accurately predict business failure is a very important issue in financial decision-making. Incorrect decision-making in financial institutions is very likely to cause financial crises and distress. Bankruptcy prediction and credit scoring are two important problems facing financial decision support. While many related studies develop financial distress models with some machine learning techniques, more advanced machine learning techniques, such as classifier ensembles and hybrid classifiers, have not been fully assessed. The aim of this paper is to develop a novel hybrid financial distress model based on combining the clustering technique and classifier ensembles. In addition, single baseline classifiers, hybrid classifiers, and classifier ensembles are developed for comparisons. In particular, two clustering techniques, Self-Organizing Maps (SOMs) and k-means, and three classification techniques, logistic regression, multilayer-perceptron (MLP) neural network, and decision trees, are used to develop these four different types of bankruptcy prediction models. As a result, 21 different models are compared in terms of average prediction accuracy and Type I & II errors. Using five related datasets, combining Self-Organizing Maps (SOMs) with MLP classifier ensembles performs the best, providing higher prediction accuracy and lower Type I & II errors.

19.
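向欣 (Xiang Xin), 陆歌皓 (Lu Gehao). 《计算机应用研究》 (Application Research of Computers), 2021, 38(12): 3604-3610
To address the class imbalance and cost sensitivity found in real-world credit assessment, and to reduce the misclassification loss of credit risk evaluation, this paper proposes an ensemble credit assessment model based on DESMID-AD dynamic selection, which dynamically selects suitable base classifiers for each test sample according to that sample's characteristics. To improve the model's ability to identify customers with poor credit (the minority class), oversampling is used to balance the classes of the training data before the base classifiers are trained. The performance of the base classifiers is evaluated by meta-learning on multiple indicators, and a weighting mechanism is designed at this stage to strengthen the influence of the minority class. On three public credit assessment datasets, with AUC, Type I and Type II error rates, and misclassification cost as evaluation metrics, a comparison with nine commonly used credit assessment models demonstrates the effectiveness and feasibility of the method in the credit assessment domain.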

20.
Credit score classification is a prominent research problem in the banking and financial industry, and its predictive performance is responsible for the profitability of the financial industry. This paper addresses how the Spiking Extreme Learning Machine (SELM) can be effectively used for credit score classification. A novel spike-generating function is proposed in the Leaky Nonlinear Integrate and Fire Model (LNIF). Its interspike period is computed and utilized in the extreme learning machine (ELM) for credit score classification. The proposed model is named SELM and is validated on five real-world credit scoring datasets, namely Australian, German-categorical, German-numerical, Japanese, and Bankruptcy. Further, results obtained by SELM are compared with back propagation, probabilistic neural network, ELM, voting-based Q-generalized extreme learning machine, radial basis neural network and ELM with some existing spiking neuron models in terms of classification accuracy, area under the curve (AUC), H-measure and computational time. From the experimental results, it is observed that the improvement in accuracy and execution time for the proposed SELM is highly statistically significant for all the aforementioned credit scoring datasets. Thus, integrating a biological spiking function with ELM makes it more efficient for categorization.
