
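One-Class Classification Method for High-Dimensional Mixed and Unbalanced Credit Score Data
Cite this article: ZHANG Dongmei, Mairidan Wushouer, Gulanbaier Tuerhong. One-Class Classification Method for High-Dimensional Mixed and Unbalanced Credit Score Data[J]. Computer Engineering and Applications, 2021, 57(10): 233-240. DOI: 10.3778/j.issn.1002-8331.2002-0212
Authors: ZHANG Dongmei, Mairidan Wushouer, Gulanbaier Tuerhong
Affiliation: College of Information Science and Engineering, Xinjiang University, Urumqi 830046
Funding: Xinjiang University Doctoral Start-up Fund; Department of Education Youth Research Project for Universities; Autonomous Region High-Level Innovative Talent Project
Abstract (translated from the Chinese): To accurately predict bad borrowers in high-dimensional, mixed, and unbalanced credit data, the work optimizes both the dimension-reduction preprocessing and the classification algorithm, and proposes a one-class K-Nearest Neighbor (KNN) mean-distance algorithm based on preprocessing with Principal Component Analysis of Mixed Data (PCAmix). Regarding the traditional principal...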

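Keywords (translated from the Chinese): credit scoring; one-class classification; unbalanced data; high-dimensional mixed data; Principal Component Analysis of Mixed Data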

One-Class Classification Method for High-Dimensional Mixed and Unbalanced Credit Score Data
ZHANG Dongmei, Mairidan Wushouer, Gulanbaier Tuerhong. One-Class Classification Method for High-Dimensional Mixed and Unbalanced Credit Score Data[J]. Computer Engineering and Applications, 2021, 57(10): 233-240. DOI: 10.3778/j.issn.1002-8331.2002-0212
Authors: ZHANG Dongmei, Mairidan Wushouer, Gulanbaier Tuerhong
Affiliation: College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
Abstract: To accurately predict “bad” loan applicants in high-dimensional, mixed, and unbalanced credit score data, this paper proposes a one-class KNN (K-Nearest Neighbor) algorithm based on preprocessing with Principal Component Analysis of Mixed Data (PCAmix), optimizing both the dimension-reduction preprocessing and the classification step itself. Because traditional Principal Component Analysis (PCA) methods cannot handle qualitative variables directly, the paper employs PCAmix; it also incorporates one-class classification with an average-distance calculation to avoid the poor performance of binary classification on unbalanced data. In addition, the proposed method adopts the Bootstrap algorithm to find the decision boundary that maximizes the separation of positive and negative samples, so as to predict customers’ default risk accurately. Experiments on the UCI German and Default credit score datasets show that the proposed algorithm performs better when the data are high-dimensional, mixed, and unbalanced.
Keywords: credit score; one-class classification; imbalanced data; high-dimensional mixed data; Principal Component Analysis of Mixed Data (PCAmix)
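
The abstract describes scoring a sample by its average distance to its K nearest neighbors within a single (target) class and choosing the decision boundary with the Bootstrap. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes the inputs are already numeric vectors (for example, component scores from a PCAmix step, which is not shown), and it assumes a simple bootstrap-averaged quantile of the training scores as the threshold, whereas the paper selects the boundary that best separates positive and negative samples. The names `OneClassMeanKNN`, `mean_knn_distance`, and `bootstrap_threshold`, and the parameters `k`, `quantile`, and `n_boot`, are hypothetical.

```python
import math
import random


def mean_knn_distance(x, train, k, skip_self=False):
    """Mean Euclidean distance from x to its k nearest neighbours in train."""
    dists = sorted(math.dist(x, t) for t in train)
    if skip_self:
        dists = dists[1:]  # drop the zero distance from x to itself
    return sum(dists[:k]) / k


def bootstrap_threshold(scores, quantile=0.95, n_boot=200, seed=0):
    """Assumed rule: average a high quantile of the scores over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(scores)
    estimates = []
    for _ in range(n_boot):
        sample = sorted(rng.choices(scores, k=n))  # resample with replacement
        estimates.append(sample[min(n - 1, int(quantile * n))])
    return sum(estimates) / n_boot


class OneClassMeanKNN:
    """Hypothetical one-class detector trained on target-class samples only."""

    def __init__(self, k=3, quantile=0.95):
        self.k = k
        self.quantile = quantile

    def fit(self, target):
        self.train = list(target)
        # Score each training point against the others (leave itself out).
        scores = [mean_knn_distance(x, self.train, self.k, skip_self=True)
                  for x in self.train]
        self.threshold = bootstrap_threshold(scores, self.quantile)
        return self

    def score(self, x):
        return mean_knn_distance(x, self.train, self.k)

    def is_outlier(self, x):
        # A large mean distance means x lies far from the target class.
        return self.score(x) > self.threshold


# Toy target class: a 5 x 5 grid of points with spacing 0.1.
grid = [(i * 0.1, j * 0.1) for i in range(5) for j in range(5)]
model = OneClassMeanKNN(k=3).fit(grid)
print(model.is_outlier((5.0, 5.0)))  # True: far from every training point
print(model.is_outlier((0.2, 0.2)))  # False: inside the grid
```

Training on one class only sidesteps the class imbalance: the rare class is never needed to fit the model, only to evaluate or tune the threshold.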