Similar literature
 20 similar documents found (search time: 187 ms)
1.
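To improve the accuracy of enterprise credit risk assessment, an enterprise credit risk assessment model based on a PSO-BP ensemble is proposed. Bagging sampling is used to obtain a sufficient number of distinct training sets, different training subsets are used to train different PSO-BP member classifiers, and the classification results of the member classifiers are finally combined by majority voting. The effectiveness of the model is demonstrated on data sets containing detailed data on domestic and on foreign companies, respectively.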
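The Bagging and majority-voting steps described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the PSO-BP member classifier is replaced by a hypothetical one-feature threshold rule (`train_stump`), and the toy data, member count and seed are assumptions.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement (one Bagging replicate)."""
    return [rng.choice(data) for _ in data]


def train_stump(sample):
    """Hypothetical stand-in for a PSO-BP member: threshold at the midpoint of the class means."""
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == 0]
    if not pos or not neg:
        # Degenerate replicate containing a single class: always predict that class.
        label = 1 if pos else 0
        return lambda x: label
    pos_mean = sum(pos) / len(pos)
    neg_mean = sum(neg) / len(neg)
    threshold = (pos_mean + neg_mean) / 2
    high_label = 1 if pos_mean >= neg_mean else 0
    return lambda x: high_label if x > threshold else 1 - high_label


def bagging_ensemble(data, n_members=11, seed=0):
    """Train one member per bootstrap replicate of the training data."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(data, rng)) for _ in range(n_members)]


def majority_vote(members, x):
    """Return the label predicted by the largest number of members."""
    votes = Counter(member(x) for member in members)
    return votes.most_common(1)[0][0]


# Toy data: (feature value, label), where 1 = high risk and 0 = low risk.
toy_data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
            (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]
members = bagging_ensemble(toy_data)
prediction_low = majority_vote(members, 0.05)
prediction_high = majority_vote(members, 0.95)
```

An odd number of members avoids ties in a two-class vote; any classifier that exposes a predict-style callable could take the place of the stand-in rule.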

2.
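To address class imbalance and cost sensitivity in real-world credit scoring, and the practical need of financial institutions to view a loan applicant's credit risk intuitively as a score, a class-imbalanced credit scoring model based on an Ext-GBDT ensemble is proposed. Undersampling is used to randomly draw, from the "good" customers (majority class), multiple samples each equal in size to the full set of "bad" customers (minority class); each draw is combined with the entire minority class to form a training subset. Multiple diverse Ext-GBDT sub-models are then trained using the different training subsets together with feature sampling and parameter perturbation. The predicted probabilities of the sub-models are combined by simple averaging, and the resulting credit probability is finally converted into a credit score. On the UCI German credit data set, with AUC and cost-sensitive error rate as evaluation metrics, the model is compared with the most commonly used credit scoring models, including decision trees, logistic regression, naive Bayes, support vector machines, random forests and their ensembles, which verifies its effectiveness.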
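The undersampling, averaging and score-conversion steps in the abstract above can be sketched as follows. The abstract does not say how the probability is mapped to a score, so the log-odds scaling in `probability_to_score` and its parameters (`base_score`, `base_odds`, `pdo`) are assumptions chosen for illustration; the Ext-GBDT sub-models themselves are not implemented here.

```python
import math
import random


def balanced_subsets(good, bad, n_subsets, seed=0):
    """Pair all 'bad' samples with an equally sized random draw of 'good' samples."""
    rng = random.Random(seed)
    return [rng.sample(good, len(bad)) + list(bad) for _ in range(n_subsets)]


def average_probability(sub_model_probs):
    """Simple averaging of the sub-models' predicted probabilities of being 'good'."""
    return sum(sub_model_probs) / len(sub_model_probs)


def probability_to_score(p_good, base_score=600.0, base_odds=50.0, pdo=20.0):
    """Assumed scaling: base_score at odds of base_odds:1, plus pdo points per doubling of the odds."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    odds = p_good / (1.0 - p_good)
    return offset + factor * math.log(odds)


# Toy customers, tagged by class (70 good, 30 bad).
good_customers = [("good", i) for i in range(70)]
bad_customers = [("bad", i) for i in range(30)]
subsets = balanced_subsets(good_customers, bad_customers, n_subsets=5)

# Hypothetical probabilities from three sub-models for one applicant.
p = average_probability([0.90, 0.95, 0.85])
score = probability_to_score(p)
```

Each subset contains every minority-class sample, so no "bad" customer is discarded, while different draws of "good" customers give the sub-models different views of the majority class.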

3.
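Standard support vector machines train slowly and require heavy computation on large data sets. To address this, a support vector machine regression method based on a binary tree model is proposed. The binary tree model adaptively decomposes a large data set into several subsets, a support vector machine extracts the support vectors from each subset, and these support vectors are merged into a single training set that is used to train the decision function. The method is applied to chaotic time series prediction. Compared with the standard algorithm, it greatly speeds up training while maintaining the same generalization accuracy.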

4.
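Combining support vector machines with semi-supervised learning theory, a semi-supervised regression model based on support vector machine co-training is proposed, in which two support vector machine regression models influence each other and are trained cooperatively. Experiments are carried out on experimental data sets, and the model is compared with a supervised support vector machine regression model and a semi-supervised self-training support vector machine regression model. The results show that the co-training model improves the accuracy of regression estimation when labeled samples are scarce.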

5.
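Zhang Yong, Fu Panpan, Zhang Yuting, 《计算机应用》 (Journal of Computer Applications), 2013, 33(10): 2801-2803
For the classification of large-scale data, supervised and unsupervised learning are combined, and a support vector machine (SVM) classification method based on hierarchical clustering and resampling is proposed. The method first uses k-means clustering, an unsupervised learning algorithm, to partition the data set into subsets. It then clusters each subset class by class and selects the sample points in the neighborhood of each class center to form the final training set. Finally, an SVM is trained on the selected, most representative sample points. Experiments show that the method substantially reduces the learning cost of the SVM, achieves higher classification accuracy than random undersampling, and can match the results obtained by training on the complete data set.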

6.
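Support vector machines have received wide attention in machine learning because of their complete theoretical framework and their good results in practice. Their biggest drawback is that large training sets require a large amount of memory and a long training time. Against this background, parallel training of support vector machines has been proposed. The basic idea is to decompose the large data set into small subsets, train one support vector machine on each subset, and then fuse the training results effectively. Building on existing techniques, an improved scheme is proposed that uses parallelization to speed up support vector machine training while preserving correct classification. The experimental results show that the new scheme effectively reduces training time while keeping classification accuracy essentially unchanged.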

7.
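This paper proposes a prediction model based on K-means clustering and machine learning regression algorithms to solve the sales forecasting problem for multiple products in the retail industry. Cluster analysis first identifies products with similar sales patterns, which partitions the data set. Support vector regression, random forest and XGBoost models are then trained on each sub-data set, and a data pool is constructed to increase both the amount of training data and the range of candidate predictor variables. The model is validated on a real sales data set from a retail company. The experimental results show that the prediction model based on K-means and support vector regression performs best, and that the proposed model clearly outperforms both the baseline model and machine learning models that do not use clustering.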

8.
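Traditional intrusion detection models (IDMs) cannot detect the latest intrusion techniques, and their signature databases need frequent updating. To address this, an IDM is built by combining data mining techniques, namely K-means clustering, fuzzy neural networks and support vector machines. First, K-means clustering partitions the original training set into different training subsets. Next, a separate fuzzy neural network model is trained on each training subset, and the fuzzy neural network models generate the support vectors for the support vector machine. Finally, a radial basis support vector machine detects whether an intrusion has occurred. Experiments on the KDD CUP 1999 data set verify the effectiveness and reliability of the proposed model. The results show that it achieves higher detection accuracy than several other relatively advanced detection methods.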

9.
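The support vector machine is a new regression method. This paper introduces regression modeling based on support vector machines and applies it to the regression prediction of GDP. GDP attribute subsets are characterized by a small amount of training data and by sparse data. The parameters of the support vector machine are analyzed experimentally under transformation, addition and drill-down of GDP-related attributes. The experimental results show that the support vector machine handles changes in the attribute set well and gives good predictions.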

10.
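Cheng Haoxiang, Wang Jian, 《控制与决策》 (Control and Decision), 2016, 31(4): 755-758
To let the intrinsic distribution of the data set better influence the trained model, a density-weighted twin support vector regression algorithm is proposed. The algorithm uses the k-nearest-neighbor algorithm to compute, for each data point, a density weight based on the data density distribution, and introduces these weights into the standard twin support vector regression algorithm. The algorithm reflects the intrinsic distribution of the training data well, so that each data point influences the trained model accurately. Experimental results on six UCI data sets verify the effectiveness of the proposed algorithm.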
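A k-nearest-neighbor density weight of the kind mentioned in the abstract above can be sketched as follows. The abstract does not give the weighting formula, so the rule used here (inverse of the mean distance to the k nearest neighbors, scaled so the densest point has weight 1) is an assumption, and `knn_density_weights` is a hypothetical helper name. How the weights enter the twin support vector regression objective is not shown.

```python
import math


def knn_density_weights(points, k=2):
    """Assumed rule: inverse mean distance to the k nearest neighbours, scaled to a maximum of 1."""
    raw = []
    for i, p in enumerate(points):
        # Distances from point i to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_knn = sum(dists[:k]) / k
        raw.append(1.0 / (mean_knn + 1e-12))  # small constant guards against duplicates
    top = max(raw)
    return [w / top for w in raw]


# Four points in a tight cluster and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
weights = knn_density_weights(points, k=2)
```

Points in dense regions receive weights near 1, while the isolated point receives a much smaller weight, so it would contribute less to a weighted loss.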

11.
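When credit fraud data are extremely imbalanced, the noise errors caused by information distortion, periodic statistical errors and reporting bias interfere markedly with model training, and overfitting occurs easily. In view of this, a deep belief neural network ensemble algorithm is proposed to solve the extremely class-imbalanced credit fraud problem. First, a bidirectional joint sampling algorithm is proposed to overcome information loss and overfitting. Then, a two-stage cluster of base classifiers is constructed; for the support vector machine (support vector ...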

12.
Least squares support vector machines ensemble models for credit scoring   Cited by: 1 (self-citations: 0, citations by others: 1)
Owing to the recent financial crisis and the regulatory concerns of Basel II, credit risk assessment is becoming one of the most important topics in financial risk management. Quantitative credit scoring models are widely used tools for credit risk assessment in financial institutions. Although single support vector machines (SVMs) have demonstrated good classification performance, a single classifier with a fixed group of training samples and a fixed parameter setting may have some inductive bias. One effective way to reduce this bias is to use an ensemble model. In this study, several ensemble models based on least squares support vector machines (LSSVM) are proposed for credit scoring. The models are tested on two real-world data sets, and the results show that ensemble strategies can improve performance to some degree and are effective for building credit scoring models.

13.
This paper presents an optimal training subset for support vector regression (SVR) under deregulated power, which has a distinct advantage over SVR based on the full training set, since it addresses the O(N^2) memory complexity of large samples and prevents over-fitting during unbalanced data regression. To compute the proposed optimal training subset, an approximate convex optimization framework is constructed by coupling a penalty term for the size of the optimal training subset to the mean absolute percentage error (MAPE) of the prediction on the full training set. Furthermore, a special method for finding an approximate solution of the optimization objective is introduced, which makes it possible to extract maximum information from the full training set and increases overall prediction accuracy. The applicability and superiority of the presented algorithm are shown by experiments on half-hourly electric load data (48 data points per day) from New South Wales under three different sample sizes. In particular, the benefit of the developed methods for large data sets is demonstrated by significantly lower CPU running time.
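The coupling of MAPE with a subset-size penalty described in the abstract above might be written as follows. This is only a sketch: the abstract does not state the objective, so the notation (S for the training subset, \hat{y}_i(S) for the prediction of an SVR trained on S, \lambda for the penalty weight) and the linear form of the penalty are assumptions.

```latex
\mathrm{MAPE}(S) = \frac{100\%}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i(S)}{y_i} \right|,
\qquad
\min_{S \subseteq \{1,\dots,N\}} \; \mathrm{MAPE}(S) + \lambda \, |S|
```

Under this reading, a larger \lambda favors smaller subsets at the cost of a higher error on the full training set of N points.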

14.
The most commonly used technique for credit scoring is logistic regression, and more recent research has proposed the support vector machine as a more effective method. However, both logistic regression and the support vector machine suffer from the curse of dimensionality. In this paper, we introduce a new way to address this problem, which we call orthogonal dimension reduction. We discuss the properties of this method in detail and test it against other common statistical approaches, namely principal component analysis and hybridizing logistic regression, to better solve and evaluate the data. Experiments on the German data set also reveal an interesting phenomenon in the use of the support vector machine, which we define as 'dimensional interference' and discuss in general terms. Cross-validation results show that, by using logistic regression to filter the dummy variables and orthogonal feature extraction, the support vector machine not only reduces complexity and accelerates convergence but also achieves better performance.

15.
Recent finance and debt crises have made credit risk management one of the most important issues in financial research. Reliable credit scoring models are crucial for financial agencies to evaluate credit applications and have been widely studied in the field of machine learning and statistics. In this paper, a novel feature-weighted support vector machine (SVM) credit scoring model is presented for credit risk assessment, in which an F-score is adopted for feature importance ranking. Considering the mutual interaction among modeling features, random forest is further introduced for relative feature importance measurement. These two feature-weighted versions of SVM are tested against the traditional SVM on two real-world datasets, and the research results reveal the validity of the proposed method.
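An F-score feature ranking of the kind mentioned in the abstract above can be sketched as follows. The abstract does not define its F-score; this sketch assumes the commonly used two-class form (squared deviations of the class means from the overall mean, divided by the sum of the within-class sample variances). The function names and toy data are illustrative, and the step that feeds the scores into the SVM as feature weights is not shown.

```python
def f_score(pos_values, neg_values):
    """F-score of one feature: between-class separation over within-class sample variance."""
    all_values = pos_values + neg_values
    mean_all = sum(all_values) / len(all_values)
    mean_pos = sum(pos_values) / len(pos_values)
    mean_neg = sum(neg_values) / len(neg_values)
    var_pos = sum((v - mean_pos) ** 2 for v in pos_values) / (len(pos_values) - 1)
    var_neg = sum((v - mean_neg) ** 2 for v in neg_values) / (len(neg_values) - 1)
    numerator = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    # Assumes at least two samples per class and a non-zero total variance.
    return numerator / (var_pos + var_neg)


def rank_features(pos_rows, neg_rows):
    """Return (feature indices from most to least discriminative, per-feature scores)."""
    n_features = len(pos_rows[0])
    scores = [
        f_score([row[j] for row in pos_rows], [row[j] for row in neg_rows])
        for j in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Toy data with two features: feature 0 separates the classes, feature 1 does not.
good = [(1.0, 5.0), (1.2, 3.0), (0.8, 4.0)]
bad = [(3.0, 4.5), (3.2, 3.5), (2.8, 4.0)]
ranking, scores = rank_features(good, bad)
```

This score treats each feature independently, which is consistent with the abstract's motivation for adding a random forest importance measure that accounts for interactions among features.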

16.
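In personal credit evaluation, unlabeled data are easy to obtain while labeled data are relatively hard to obtain, and data asymmetry is widespread. To address these problems, a personal credit evaluation model based on an improved graph-based semi-supervised learning technique is proposed. Because the model uses semi-supervised learning, it can learn from large amounts of unlabeled data, which avoids the loss of generalization ability caused by the relative scarcity of labeled data. In addition, the graph-based technique is improved by normalizing the results of the graph-based semi-supervised iteration and by modifying the decision boundary, which effectively reduces the influence of data asymmetry. Evaluation on three UCI credit approval data sets shows that the model performs clearly better than support vector machines and the unimproved method.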

17.
Credit scoring aims to assess the risk associated with lending to individual consumers. Recently, ensemble classification methodology has become popular in this field. However, most studies use random sampling to generate the training subsets for constructing the base classifiers. The diversity of the base classifiers is therefore not guaranteed, which may degrade overall classification performance. In this paper, we propose an ensemble classification approach based on supervised clustering for credit scoring. In the proposed approach, supervised clustering is employed to partition the data samples of each class into a number of clusters. Clusters from different classes are then combined pairwise to form a number of training subsets. A specific base classifier is constructed on each training subset. For a sample whose class label needs to be predicted, the outputs of these base classifiers are combined by weighted voting. The weight associated with a base classifier is determined by its classification performance in the neighborhood of the sample. In the experimental study, two benchmark credit data sets are adopted for performance evaluation, and an industrial case study is conducted. The results show that, compared with other ensemble classification methods, the proposed approach generates base classifiers with higher diversity and local accuracy and improves the accuracy of credit scoring.

18.
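The quality of a credit evaluation classifier directly affects the profitability of lending institutions. Because traditional grid search is time-consuming for parameter optimization, a personal credit evaluation algorithm is proposed in which XGBoost is optimized by an improved grid search (GS-XGBoost). After feature selection with a random forest, the improved grid search optimizes the XGBoost parameters n_estimators and learning_rate, and the evaluation model is built. Credit data selected from the UCI repository are analyzed, and the model is compared with support vector machines, random forests, logistic regression, neural networks and the unimproved XGBoost. The experimental results show that the model improves both the F-score and the G-mean.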
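The two evaluation metrics named in the abstract above can be computed from a confusion matrix as sketched below. The abstract does not say which F-score variant is used, so the F1 form (equal weight on precision and recall) is an assumption; the G-mean is taken as the geometric mean of sensitivity and specificity. The labels and predictions are made-up toy values, and the improved grid search itself is not reproduced.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall (assumed F-score variant)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


# Toy labels: 4 positives and 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
f1 = f1_score(y_true, y_pred)
gm = g_mean(y_true, y_pred)
```

Unlike plain accuracy, the G-mean drops to zero when either class is never predicted correctly, which makes it a common choice for imbalanced credit data.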

19.
The assessment of the risk of default on credit is important for financial institutions. Logistic regression and discriminant analysis are techniques traditionally used in credit scoring to determine the likelihood of default based on consumer application and credit reference agency data. We test support vector machines against these traditional methods on a large credit card database. We find that they are competitive and can be used as the basis of a feature selection method to discover the features that are most significant in determining the risk of default.

20.
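Tax assessment involves imprecise, fuzzy and redundant information, and most traditional assessment models rely on empirical and comparative methods, which lack scientific rigor and fairness and give a low rate of correct assessments. To improve the accuracy of tax credit rating assessment, a tax credit rating assessment model based on a fuzzy neural network is proposed. Fuzzy logic inference first processes the imprecise and fuzzy information in the tax assessment process. The neural network model is then trained on the training data to obtain an assessment model relating the tax assessment indicators to the credit ratings. Finally, the model is validated on a test set. The results show that the fuzzy neural network method improves the accuracy of tax credit rating assessment and provides an effective basis for tax credit assessment.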
