
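Integration of cost-sensitive algorithms based on average distance of K-nearest neighbor samples
Cite this article: YANG Hao, WANG Yu, ZHANG Zhongyuan. Integration of cost-sensitive algorithms based on average distance of K-nearest neighbor samples[J]. Journal of Computer Applications, 2019, 39(7): 1883-1887.
Authors: YANG Hao  WANG Yu  ZHANG Zhongyuan
Affiliations: College of Computer and Information, Hohai University, Nanjing 211100, China; Department of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
Funding: National Natural Science Foundation of China, Youth Science Fund (61103017); sub-project of the Chinese Academy of Sciences "Sensing China" pilot special program (XDA06040504).
Abstract: To address classification on imbalanced datasets, and the fact that general cost-sensitive learning algorithms cannot be extended to the multi-class case, an ensemble method of cost-sensitive algorithms based on the average distance of K-nearest-neighbor (KNN) samples is proposed. First, following the idea of maximizing the minimum margin, a resampling method is proposed that lowers the sample density near the decision boundary. Next, the average distance of each class's samples is used as the basis for the classification decision, and a learning algorithm consistent with Bayesian decision theory is proposed, which makes the improved algorithm cost-sensitive. Finally, the improved cost-sensitive algorithm is ensembled over values of K: the weights of the base learners are adjusted on the principle of minimum cost, yielding a cost-sensitive AdaBoost algorithm whose objective is the lowest overall misclassification cost. Experimental results show that, compared with the traditional KNN algorithm, the improved algorithm lowers the average misclassification cost by 31.4 percentage points and has better cost sensitivity.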

Keywords: cost-sensitive; maximizing the minimum margin; distance between samples; Bayesian decision theory; ensemble
Received: 2018-12-17
Revised: 2019-01-28
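
The second step of the abstract, deciding a class from per-class average KNN distances under Bayesian decision theory, might be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the inverse-distance pseudo-posterior, and the convention that `cost[i, j]` is the cost of predicting class `j` when the true class is `i` are all assumptions, and every class is assumed to have at least one training sample.

```python
import numpy as np

def class_mean_knn_distances(X_train, y_train, x, k, n_classes):
    """Mean distance from x to its k nearest training samples of each class."""
    dists = np.linalg.norm(X_train - x, axis=1)
    means = np.empty(n_classes)
    for c in range(n_classes):
        # Uses fewer than k samples when the class is smaller than k.
        nearest = np.sort(dists[y_train == c])[:k]
        means[c] = nearest.mean()
    return means

def min_cost_predict(X_train, y_train, x, k, cost):
    """Pick the class with the lowest expected misclassification cost.

    cost[i, j] is the (assumed) cost of predicting j when the truth is i.
    """
    n_classes = cost.shape[0]
    means = class_mean_knn_distances(X_train, y_train, x, k, n_classes)
    # Assumed scoring rule: a smaller mean distance gives a larger pseudo-posterior.
    inv = 1.0 / (means + 1e-12)
    posterior = inv / inv.sum()
    # Expected cost of each candidate prediction j: sum_i P(i | x) * cost[i, j].
    expected_cost = posterior @ cost
    return int(np.argmin(expected_cost))

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])
query = np.array([2.0, 2.0])

uniform = np.array([[0, 1], [1, 0]])
skewed = np.array([[0, 1], [10, 0]])  # missing a true class-1 sample costs 10

print(min_cost_predict(X, y, query, k=2, cost=uniform))
print(min_cost_predict(X, y, query, k=2, cost=skewed))
```

With the uniform cost matrix the query is assigned to the nearer class; once misclassifying class 1 becomes ten times as expensive, the same query is assigned to class 1, which is the cost-sensitive behavior the abstract describes.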

Integration of cost-sensitive algorithms based on average distance of K-nearest neighbor samples
YANG Hao, WANG Yu, ZHANG Zhongyuan. Integration of cost-sensitive algorithms based on average distance of K-nearest neighbor samples[J]. Journal of Computer Applications, 2019, 39(7): 1883-1887.
Authors:YANG Hao  WANG Yu  ZHANG Zhongyuan
Affiliation:1. College of Computer and Information, Hohai University, Nanjing, Jiangsu 211100, China;
2. Department of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
Abstract: To solve the classification problem on imbalanced datasets, and the problem that general cost-sensitive learning algorithms cannot be applied to the multi-class case, an ensemble method of cost-sensitive algorithms based on the average distance of K-Nearest Neighbor (KNN) samples was proposed. Firstly, following the idea of maximizing the minimum margin, a resampling method that reduces the density of samples near the decision boundary was proposed. Then, the average distance of the samples of each class was used as the basis for the classification decision, and a learning algorithm consistent with Bayesian decision theory was proposed, which made the improved algorithm cost-sensitive. Finally, the improved cost-sensitive algorithm was ensembled over the value of K. The weight of each base learner was adjusted according to the principle of minimum cost, yielding a cost-sensitive AdaBoost algorithm that aims at the minimum total misclassification cost. The experimental results show that, compared with the traditional KNN algorithm, the improved algorithm reduces the average misclassification cost by 31.4 percentage points and has better cost sensitivity.
Keywords: cost-sensitive; maximization of minimum margin; distance between samples; Bayesian decision theory; ensemble
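
The final step of the abstract, weighting base learners built for different K values by their misclassification cost, can be illustrated with a small sketch. It is not the paper's algorithm: it applies an AdaBoost-style weight formula to a fixed pool of learners without the sample-reweighting rounds of real AdaBoost, and the function names, the cost normalization, and the convention that `cost[i, j]` is the cost of predicting `j` when the truth is `i` are assumptions made for illustration.

```python
import numpy as np

def cost_based_weights(val_preds, y_val, cost):
    """Weight each base learner by its total cost on a validation set.

    val_preds: (n_learners, n_val) integer predictions, one row per K value.
    cost[i, j]: assumed cost of predicting j when the true class is i.
    """
    # Total misclassification cost incurred by each learner.
    incurred = cost[y_val[None, :], val_preds].sum(axis=1)
    # Cost of always making the most expensive mistake, used for normalization.
    worst = cost[y_val].max(axis=1).sum()
    err = np.clip(incurred / worst, 1e-6, 1 - 1e-6)
    # AdaBoost-style weight; learners with a cost rate above 0.5 get a negative weight.
    return 0.5 * np.log((1 - err) / err)

def weighted_vote(preds, weights, n_classes):
    """Combine one prediction per learner into a single weighted vote."""
    votes = np.zeros(n_classes)
    for p, w in zip(preds, weights):
        votes[p] += w
    return int(np.argmax(votes))

cost = np.array([[0, 1], [5, 0]])
y_val = np.array([0, 0, 1, 1])
val_preds = np.array([[0, 0, 1, 1],   # e.g. learner with K=1: no errors
                      [0, 0, 0, 0],   # e.g. learner with K=3: misses class 1
                      [1, 1, 1, 1]])  # e.g. learner with K=5: misses class 0

weights = cost_based_weights(val_preds, y_val, cost)
print(weights)
print(weighted_vote(np.array([1, 0, 1]), weights, n_classes=2))
```

The learner that misses the expensive class receives the lowest weight, so its vote counts least in the combined decision.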
This article is indexed in VIP, Wanfang Data, and other databases.