
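Classification of Imbalanced Datasets with an Improved Weighted KNN Algorithm
Cite this article: WANG Chao-xue, PAN Zheng-mao, MA Chun-sen, DONG Li-li, ZHANG Tao. Classification for Imbalanced Dataset of Improved Weighted KNN Algorithm[J]. Computer Engineering, 2012, 38(20): 160-163
Authors: WANG Chao-xue  PAN Zheng-mao  MA Chun-sen  DONG Li-li  ZHANG Tao
Affiliations: 1. School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an 710055, China
2. Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100193, China
Funding: National Natural Science Foundation of China (31170393); Natural Science Foundation of Shaanxi Province (2012JM8023); Natural Science Special Fund of the Shaanxi Provincial Department of Education (12JK0726)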
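Abstract: When the K-nearest neighbor (KNN) algorithm classifies an imbalanced dataset, its decisions always tend toward the majority class. To address this, a weighted KNN algorithm, GAK-KNN, is proposed. A new weight assignment model is defined that jointly accounts for the adverse effects of imbalanced distribution between classes and uneven distribution within classes. The training sample set is clustered with a K-means algorithm based on a genetic algorithm, the weight of each training sample is computed according to the weight assignment model, and the test samples are classified by the improved KNN algorithm. Extensive experiments on UCI datasets show that GAK-KNN outperforms the traditional KNN algorithm and other improved algorithms in both recognition rate and overall performance.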

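Keywords: imbalanced dataset; classification; K-nearest neighbor algorithm; weight assignment model; genetic algorithm; K-means algorithm
Received: 2011-12-16
Revised: 2012-02-20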

Classification for Imbalanced Dataset of Improved Weighted KNN Algorithm
WANG Chao-xue, PAN Zheng-mao, MA Chun-sen, DONG Li-li, ZHANG Tao. Classification for Imbalanced Dataset of Improved Weighted KNN Algorithm[J]. Computer Engineering, 2012, 38(20): 160-163
Authors: WANG Chao-xue    PAN Zheng-mao    MA Chun-sen    DONG Li-li    ZHANG Tao
Affiliation: 1. School of Information and Control Engineering, Xi'an University of Architecture and Technology, Xi'an 710055, China; 2. Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100193, China
Abstract: To address the shortcomings of the K-Nearest Neighbor (KNN) algorithm in classifying imbalanced datasets, a novel weighted KNN approach, GAK-KNN, is presented. The key to GAK-KNN is a new weight assignment model that takes into account the adverse effects of the uneven distribution of training samples both between classes and within classes. The specific steps are as follows: cluster the training sample set with a K-means algorithm based on a Genetic Algorithm (GA); compute the weight of each training sample from the clustering results and the weight assignment model; classify the test samples with the improved KNN algorithm. GAK-KNN can significantly improve the identification rate of minority-class samples and the overall classification performance. Theoretical analysis and comprehensive experimental results on UCI datasets confirm these claims.
Keywords: imbalanced dataset; classification; K-Nearest Neighbor (KNN) algorithm; weight assignment model; Genetic Algorithm (GA); K-means algorithm
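
The three steps named in the abstract (cluster, weight, classify) can be illustrated with a minimal sketch. It is not the authors' GAK-KNN: the abstract does not give the weight assignment model or the GA-based K-means, so plain Lloyd K-means is swapped in for clustering, and the weighting rule below (each class gets total weight 1, split evenly across its clusters and then across cluster members) is a hypothetical stand-in. The function names `kmeans`, `sample_weights`, and `weighted_knn_predict` are likewise illustrative assumptions.

```python
import numpy as np


def kmeans(X, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd K-means (assumption: stands in for GA-based K-means)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from distinct training points chosen at random.
    centers = X[rng.choice(len(X), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels


def sample_weights(X, y, clusters_per_class=2):
    """Hypothetical weight model, NOT the one defined in the paper.

    Each class receives total weight 1 (offsets between-class imbalance);
    that weight is split evenly over the class's non-empty clusters and
    then over each cluster's members (offsets within-class unevenness).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    w = np.empty(len(y))
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        k = min(clusters_per_class, len(idx))
        labels = kmeans(X[idx], k)
        sizes = np.bincount(labels, minlength=k)
        n_nonempty = np.count_nonzero(sizes)
        w[idx] = 1.0 / (n_nonempty * sizes[labels])
    return w


def weighted_knn_predict(X_train, y_train, w, x, k=5):
    """Return the class with the largest summed weight among the k nearest neighbors."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    dists = np.linalg.norm(X_train - np.asarray(x, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]
    classes = np.unique(y_train)
    votes = np.array([w[nearest][y_train[nearest] == c].sum() for c in classes])
    return classes[votes.argmax()]
```

Passing `np.ones(len(y_train))` as `w` recovers ordinary majority-vote KNN, which makes it easy to compare the two decisions on the same query point.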
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.