Similar literature
1.
A data driven ensemble classifier for credit scoring analysis   Total citations: 2, self-citations: 0, citations by others: 2
This study focuses on predicting whether a credit applicant can be categorized as good, bad or borderline from the information initially supplied. This is essentially a classification task for credit scoring. Given its importance, many researchers have recently worked on ensembles of classifiers. However, unrepresentative samples drastically reduce the accuracy of the deployed classifier, and, to the best of our knowledge, few have attempted to preprocess the input samples into more homogeneous cluster groups and then fit the ensemble classifier accordingly. For this reason, we introduce the concept of class-wise classification as a preprocessing step in order to obtain an efficient ensemble classifier. This strategy is expected to work better than a direct ensemble of classifiers without the preprocessing step. The proposed ensemble classifier is constructed by incorporating several data mining techniques: optimal associate binning discretizes continuous values, and a neural network, a support vector machine, and a Bayesian network are used to augment the ensemble classifier. In particular, the Markov blanket concept of the Bayesian network allows for a natural form of feature selection, which provides a basis for mining association rules. The learned knowledge is represented in multiple forms, including causal diagrams and constrained association rules. The data-driven nature of the proposed system distinguishes it from existing hybrid/ensemble credit scoring systems.

2.
A wrong number of clusters or poor starting points for each cluster negatively affect classification accuracy in hybrid-classifier-based credit scoring systems. This paper presents a new hybrid classifier based on fuzzy-rough instance selection, which has the same ability as clustering algorithms but can eliminate isolated and inconsistent instances without the need to determine the number of clusters or the starting points of each cluster. The unrepresentative instances that conflict with other instances are completely determined by the fuzzy-rough positive region, which depends only on the intrinsic structure of the dataset. Removing unrepresentative instances improves both the quality of the training data and the classifier training time. To avoid eliminating more instances than strictly necessary, the k-nearest neighbor algorithm is used to check the eliminated instances, and any instance whose predicted class matches its predefined class is added back. SVM classifiers with three different kernel functions are applied to the reduced dataset. The experimental results show that the proposed hybrid classifier achieves better classification accuracy on two real-world datasets.
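The k-nearest neighbor check described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names `knn_predict` and `restore_consistent`, the choice of k = 3, the Euclidean distance, and the decision to draw neighbors from the retained instances are all assumptions, and the toy data is made up.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Majority label among the k training points nearest to `query`."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def restore_consistent(kept, removed, k=3):
    """Add back removed instances whose k-NN prediction matches their label."""
    restored = [(x, y) for x, y in removed if knn_predict(kept, x, k) == y]
    return kept + restored


# Hypothetical toy data: (features, label) pairs.
kept = [((0.0, 0.0), "good"), ((0.1, 0.2), "good"), ((0.2, 0.1), "good"),
        ((1.0, 1.0), "bad"), ((0.9, 1.1), "bad"), ((1.1, 0.9), "bad")]
removed = [((0.15, 0.15), "good"),  # agrees with its neighbors, so it returns
           ((0.05, 0.1), "bad")]    # conflicts with its neighbors, so it stays out
reduced = restore_consistent(kept, removed)
```

In this sketch only the instance whose label agrees with its neighborhood is restored, so the reduced dataset keeps inconsistent points out while limiting over-pruning.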

3.

Credit scoring is important for credit risk evaluation and monitoring in the accounting and finance domain. For financial institutions, the ability to predict business failure is crucial, as incorrect decisions have direct financial consequences. A variety of pattern recognition techniques, including neural networks, decision trees, and support vector machines, have been applied to predict whether borrowers should be considered a good or bad credit risk. This paper presents a hybrid approach to building the credit scoring model and illustrates how unsupervised learning based on the self-organizing map (SOM) can improve the discriminant capability of a feedforward neural network (FNN). Within the hybridization scheme, the knowledge (i.e., prototypes of clusters) found by the SOM is transferred as input to the subsequent FNN model. Four real-world data sets for credit approval problems are used in the experiments. With varying parameters, the experimental results demonstrate that the predictive model built by the hybrid approach can achieve better performance than the stand-alone FNN, particularly when only a limited amount of labeled data is available. This gives some insight into how to construct more accurate predictive models when data collection is difficult in some financial applications. A complete and unique graphical visualization technique is shown which better outlines the trade-off between distinct metrics and attained performance.


4.
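A credit card company is a service-oriented financial enterprise. Improving service quality, refining service methods, and making company decisions more accurate and timely are goals that credit card companies pursue. This paper introduces the application of neural network methods and data mining techniques to customer scoring at credit card companies, compares the characteristics of several modeling methods for personal credit scoring, builds a decision tree-neural network personal credit scoring model, and proposes a nearest-neighbor clustering algorithm for that model. The algorithm yields fairly satisfactory results in credit scoring applications.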

5.
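Credit scoring and rating based on data mining clustering techniques   Total citations: 7, self-citations: 0, citations by others: 7
This paper proposes a credit scoring and rating method based on data mining clustering techniques. The method uses clustering algorithms from data mining to improve the traditional credit scoring model. The paper gives a theoretical proof of the method, implements it in a credit card analysis system, DMCA, and reports detailed data tests. Both the theoretical proof and the experimental results show that clustering achieves good results on problems of the traditional credit scoring model such as DM/MTM, cutoff values, mean square deviation, and cross-validation.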

6.
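Application of data mining techniques in personal credit evaluation models   Total citations: 4, self-citations: 0, citations by others: 4
To enable timely and appropriate personal credit evaluation and to speed up decision-making at credit card issuers, this paper introduces the application of data mining techniques to customer evaluation at credit card companies and compares the strengths and weaknesses of several modeling methods for personal credit evaluation, including mathematical-statistical models and classification-clustering models. A decision tree-neural network personal credit evaluation model is built, and a nearest-neighbor clustering algorithm is proposed for it. The algorithm does not require the number of clusters to be specified in advance and supports unsupervised learning. Comparative analysis shows that the algorithm yields fairly satisfactory results in personal credit evaluation applications.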

7.
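Research on a soft-sensing model for billet temperature in a reheating furnace   Total citations: 10, self-citations: 2, citations by others: 10
This paper studies a neural network soft-sensing model for billet temperature based on fuzzy clustering. The method consists of two parts: the FCM (Fuzzy C-Means) clustering algorithm classifies the training samples, and a distributed RBF (Radial Basis Function) network is trained on each class of samples. During online measurement, an adaptive fuzzy clustering algorithm computes the membership degrees of new operating-condition data. The algorithm is applied to billet temperature prediction in a walking-beam reheating furnace, and simulation results demonstrate its effectiveness.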
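As a rough illustration of the membership-degree step, the sketch below evaluates the standard FCM membership formula for one new data point against fixed cluster centers. It is not the adaptive fuzzy clustering algorithm of the paper, whose details the abstract does not give; the function name `fcm_memberships`, the fuzzifier m = 2, the Euclidean distance, and the example centers are assumptions.

```python
from math import dist


def fcm_memberships(x, centers, m=2.0):
    """Standard FCM membership of point x in each cluster center (fuzzifier m > 1)."""
    d = [dist(x, c) for c in centers]
    # A point that coincides with a center belongs fully to it.
    if any(di == 0.0 for di in d):
        hits = [1.0 if di == 0.0 else 0.0 for di in d]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    # u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1))
    return [1.0 / sum((di / dj) ** power for dj in d) for di in d]


centers = [(0.0, 0.0), (4.0, 0.0)]  # hypothetical cluster centers
u = fcm_memberships((1.0, 0.0), centers)
```

The memberships sum to one, so they could serve as weights for combining the outputs of per-cluster sub-networks, which is one plausible way to use them with a distributed RBF network.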

8.
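For strongly coupled multidimensional meteorological data, and building on effective data acquired from a photovoltaic (PV) multi-sensor system, a data mining method for PV power generation forecasting that accounts for the influence of haze is proposed. Big data are collected with multiple sensors, and stepwise selection is used to reduce the multidimensional variables, which effectively lowers the coupling among different weather factors. Samples are clustered with a Gaussian mixture clustering algorithm, a separate radial basis function (RBF) neural network model is built for each cluster, and fuzzy inference selects the appropriate model. Actual forecasting results verify the high accuracy and practicality of the method.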

9.
As churn management is a major task for companies seeking to retain valuable customers, the ability to predict customer churn is necessary. In the literature, neural networks have shown their applicability to churn prediction. On the other hand, hybrid data mining techniques that combine two or more techniques have been shown to perform better than many single techniques across a number of problem domains. This paper considers two hybrid models for churn prediction that combine two different neural network techniques: back-propagation artificial neural networks (ANN) and self-organizing maps (SOM). The hybrid models are ANN combined with ANN (ANN + ANN) and SOM combined with ANN (SOM + ANN). In particular, the first technique of each hybrid model performs the data reduction task by filtering out unrepresentative training data. Then, the outputs, as representative data, are used to create the prediction model based on the second technique. To evaluate the performance of these models, three different kinds of testing sets are considered: the general testing set and two fuzzy testing sets based on the data filtered out by the first technique of the two hybrid models, i.e. ANN and SOM, respectively. The experimental results show that the two hybrid models outperform the single neural network baseline model in terms of prediction accuracy and Type I and Type II errors over the three kinds of testing sets. In addition, the ANN + ANN hybrid model performs significantly better than the SOM + ANN hybrid model and the ANN baseline model.

10.
Clustering algorithms, a fundamental basis for data mining procedures and learning techniques, suffer from the lack of efficient methods for determining the optimal number of clusters to be found in an arbitrary dataset. The few methods existing in the literature always use some sort of evolutionary algorithm with a cluster validation index as its objective function. In this article, a new evolutionary algorithm, based on a hybrid model of global and local heuristic search, is proposed for the same task, and experiments are carried out with different datasets and indexes. Because its design is independent of any clustering procedure, it is applicable to virtually any clustering method, such as the widely used \(k\)-means algorithm. Moreover, non-parametric statistical tests on the experimental results clearly show the proposed algorithm to be more efficient than other evolutionary algorithms currently used for the same task.

11.
Credit scoring focuses on the development of empirical models to support the financial decision-making processes of financial institutions and credit industries. It makes use of applicants' historical data and statistical or machine learning techniques to assess the risk associated with an applicant. However, the historical data may contain redundant and noisy features that affect the performance of credit scoring models. The main focus of this paper is to develop a hybrid model, combining feature selection and a multilayer ensemble classifier framework, to improve the predictive performance of credit scoring. The proposed hybrid credit scoring model is built in three phases. The initial phase consists of preprocessing and assigns ranks and weights to classifiers. In the next phase, the ensemble feature selection approach is applied to the preprocessed dataset. In the last phase, the dataset with the selected features is used in a multilayer ensemble classifier framework. In addition, a classifier placement algorithm based on the Choquet integral value is designed, as classifier placement affects the predictive performance of the ensemble framework. The proposed hybrid credit scoring model is validated on real-world credit scoring datasets, namely the Australian, Japanese, German-categorical, and German-numerical datasets.

12.
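Credit risk and credit scoring are analyzed. Based on a comprehensive analysis of enterprise credit scoring indicator systems in China and abroad, and considering the characteristics of enterprise credit scoring in China, an indicator system suited to the credit evaluation of Chinese enterprises is established. Drawing on the current state and progress of related research in China and abroad, and on the characteristics of credit scoring itself, a credit scoring model based on a radial basis function neural network is built. Existing data are used for discrimination and analysis, and the gap between the computed results and the actual situation is examined. An improved RBFNN learning algorithm is then used to train the radial basis function neural network, yielding satisfactory evaluation results. The scoring system built with this model merits further study and wider application.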

13.
Recently, a massive quantity of data is being produced from many distinct sources, and the size of the data created daily on the Internet has crossed two exabytes. At the same time, clustering is one of the efficient techniques for mining big data to extract the useful and hidden patterns that exist in it. Density-based clustering techniques have gained significant attention because they help to effectively recognize complex patterns in spatial datasets. Big data clustering is a non-trivial process owing to the increasing quantity of data, a difficulty that can be addressed with the Map Reduce tool. With this motivation, this paper presents an efficient Map Reduce based hybrid density based clustering and classification algorithm for big data analytics (MR-HDBCC). The proposed MR-HDBCC technique is executed on the Map Reduce tool to handle big data. In addition, the MR-HDBCC technique involves three distinct processes, namely pre-processing, clustering, and classification. The proposed model utilizes the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) technique, which is capable of detecting random shapes and diverse clusters in noisy data. To improve the performance of the DBSCAN technique, a hybrid model using the cockroach swarm optimization (CSO) algorithm is developed to explore the search space and determine the optimal parameters for density based clustering. Finally, a bidirectional gated recurrent neural network (BGRNN) is employed for the classification of big data. The proposed MR-HDBCC technique is validated experimentally on a benchmark dataset, and the simulation outcomes demonstrate the promising performance of the proposed model in terms of different measures.

14.
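Credit risk and credit scoring are analyzed. Based on a comprehensive analysis of enterprise credit scoring indicator systems in China and abroad, and considering the characteristics of enterprise credit scoring in China, an indicator system suited to the credit evaluation of Chinese enterprises is established. Drawing on the current state and progress of related research in China and abroad, and on the characteristics of credit scoring itself, a credit scoring model based on a radial basis function neural network is built. Existing data are used for discrimination and analysis, and the gap between the computed results and the actual situation is examined. An improved RBFNN learning algorithm is then used to train the radial basis function neural network, yielding satisfactory evaluation results. The scoring system built with this model merits further study and wider application.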

15.
In this paper we propose a clustering algorithm that clusters data with arbitrary shapes without knowing the number of clusters in advance. The proposed algorithm has two stages. In the first stage, a neural network with an ART-like training algorithm is used to cluster data into a set of multi-dimensional hyperellipsoids. In the second stage, a dendrogram is built to complement the neural network. We then use dendrograms and so-called tables of relative frequency counts to help analysts pick trustworthy clustering results from among many different clustering results. Several data sets were tested to demonstrate the performance of the proposed algorithm.

16.
The objective of the proposed study is to explore the performance of credit scoring using a two-stage hybrid modeling procedure with artificial neural networks and multivariate adaptive regression splines (MARS). The rationale behind the analyses is to first use MARS to build the credit scoring model; the significant variables obtained then serve as the input nodes of the neural network model. To demonstrate the effectiveness and feasibility of the proposed modeling procedure, credit scoring tasks are performed on one bank housing loan dataset using a cross-validation approach. As the results reveal, the proposed hybrid approach outperforms discriminant analysis, logistic regression, artificial neural networks and MARS, and hence provides an alternative for handling credit scoring tasks.

17.
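A data mining method based on fuzzy clustering and combined BP neural networks is introduced. The model of the method and a heuristic improved BP algorithm, Heuristicbp, are given. Applied to predicting the values of mathematical functions, the method achieves short learning time and high prediction accuracy. Experiments show that the method is effective and highly practical.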

18.
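Application of an improved RBFNN in predicting athletes' competitive state   Total citations: 1, self-citations: 0, citations by others: 1
An improved radial basis function (RBF) neural network based on a fuzzy system model is proposed. First, a subtractive clustering algorithm determines the number of radial basis function centers. Next, the fuzzy C-means clustering algorithm optimizes the centers and widths of the basis functions. Finally, the RBF neural network is designed and trained according to the clustering results of the sample data. The network is applied to predicting the competitive state of athletes on a tennis team. Simulation results show that the algorithm is advanced and effective, achieves high accuracy, and that the model built with it is highly practical.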

19.
Rapid technological advances mean that the amount of data stored in databases is rising very fast, and data mining can discover helpful implicit information in large databases. How to detect implicit and useful information with low time cost, high correctness, and a high noise filtering rate, in a way that suits large databases, is a priority concern in data mining, which explains why many clustering schemes have been proposed in recent decades. This investigation presents a new data clustering approach called PHD, which is an enhanced version of KIDBSCAN. PHD is a hybrid density-based algorithm that partitions the data set by K-means and then clusters the resulting partitions with IDBSCAN. Finally, the closest pairs of clusters are merged until the natural number of clusters of the data set is reached. Experimental results reveal that the proposed algorithm can perform the entire clustering and efficiently reduce the run-time cost. They also indicate that the proposed clustering algorithm performs better than several existing well-known schemes such as the K-means, DBSCAN, IDBSCAN and KIDBSCAN algorithms. Consequently, the proposed PHD algorithm is efficient and effective for data clustering in large databases.
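The final merging step mentioned in this abstract can be sketched as below. This is only an illustration under stated assumptions: the abstract does not say how closeness between clusters is measured or how the natural number of clusters is found, so the sketch uses Euclidean distance between centroids and takes the target count as a hypothetical parameter `target`. The K-means and IDBSCAN stages are not shown.

```python
from itertools import combinations
from math import dist


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def merge_closest(clusters, target):
    """Repeatedly merge the two clusters with the nearest centroids."""
    clusters = [list(c) for c in clusters]  # copy so the input is not mutated
    while len(clusters) > target:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: dist(centroid(clusters[p[0]]), centroid(clusters[p[1]])),
        )
        # i < j, so popping j leaves index i valid.
        clusters[i].extend(clusters.pop(j))
    return clusters


# Hypothetical sub-clusters, e.g. as produced by clustering each partition.
parts = [[(0.0, 0.0), (0.0, 1.0)], [(0.5, 0.5)], [(10.0, 10.0)], [(10.0, 12.0)]]
merged = merge_closest(parts, target=2)
```

Recomputing every centroid on each pass keeps the sketch short; an implementation aimed at large databases would cache centroids and pairwise distances.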

20.
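Research on a clustering algorithm based on multidimensional self-organizing feature maps   Total citations: 2, self-citations: 1, citations by others: 1
Jiang Bo (江波), Zhang Li (张黎). 《计算机科学》 (Computer Science), 2008, 35(6): 181-182
As a neural network method, the self-organizing feature map is widely used in data mining, pattern classification, and machine learning. This paper discusses in detail the working principle of the clustering algorithm based on self-organizing feature maps and its concrete implementation. Analysis of system simulation experiments shows that the SOFMF algorithm overcomes problems found in many clustering algorithms and performs well in terms of time complexity.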
