首页 | 本学科首页   官方微博 | 高级检索  
     

面向K最近邻分类的遗传实例选择算法
引用本文:黄宇扬,董明刚,敬超.面向K最近邻分类的遗传实例选择算法[J].计算机应用,2018,38(11):3112-3118.
作者姓名:黄宇扬  董明刚  敬超
作者单位:1. 桂林理工大学 信息科学与工程学院, 桂林 541004;2. 广西嵌入式技术与智能系统重点实验室(桂林理工大学), 桂林 541004
基金项目:国家自然科学基金资助项目(61563012,61203109);广西自然科学基金资助项目(2014GXNSFAA118371,2015GXNSFBA139260);广西嵌入式技术与智能系统重点实验室基金资助项目。
摘    要:针对传统的实例选择算法会误删训练集中非噪声样本、算法效率低的不足,提出了一种面向K最近邻(KNN)的遗传实例选择算法。该算法采用基于决策树和遗传算法的二阶段筛选机制,先使用决策树确定噪声样本存在的范围;再使用遗传算法在该范围内精确删除噪声样本,可有效地降低误删率并提高效率,采用基于最近邻规则的验证集选择策略,进一步提高了遗传算法实例选择的准确度;最后引进基于均方误差(MSE)的分类精度惩罚函数来计算遗传算法中个体的适应度,提高有效性和稳定性。在20个数据集上,该方法相较于基于预分类的KNN (PRKNN)、基于协同进化的实例特征选择算法(IFS-CoCo)、K最近邻(KNN),在分类精度上的提升分别为0.07~26.9个百分点、0.03~11.8个百分点、0.2~12.64个百分点,在AUC和Kappa的上的提升分别为0.25~18.32个百分点、1.27~23.29个百分点、0.04~12.82个百分点。实验结果表明,该方法相较于当前实例选择算法在分类精度和分类效率上均具有优势。

关 键 词:K最近邻  遗传算法  决策树  实例选择  噪声样本  机器学习  
收稿时间:2018-04-30
修稿时间:2018-06-21

Genetic instance selection algorithm for K-nearest neighbor classifier
HUANG Yuyang,DONG Minggang,JING Chao.Genetic instance selection algorithm for K-nearest neighbor classifier[J].journal of Computer Applications,2018,38(11):3112-3118.
Authors:HUANG Yuyang  DONG Minggang  JING Chao
Affiliation:1. College of Information Science and Engineering, Guilin University of Technology, Guilin Guangxi 541004, China;2. Guangxi Key Laboratory of Embedded Technology and Intelligent System(Guilin University of Technology), Guilin Guangxi 541004, China
Abstract:Traditional instance selection algorithms may remove non-noise samples by mistake and have low algorithm efficiency. For this issue, a genetic instance selection algorithm for K-Nearest Neighbor (KNN) classifier was proposed. A two-stage selection mechanism based on decision tree and genetic algorithm was used in the algorithm. Firstly, the decision tree was used to determine the range of noise samples. Then, the genetic algorithm was used to remove the noise samples in this range precisely, which could reduce the risk of mistaken remove effectively and improve the algorithm efficiency. Secondly, the 1NN-based selection strategy of validation set was proposed to improve the instance selection accuracy of the genetic algorithm. Finally, the MSE (Mean Squared Error)-based objective function was used as the fitness function, which could improve the effectiveness and stability of the algorithm. Compared with PRe-classification based KNN (PRKNN), Instance and Feature Selection based on Cooperative Coevolution (IFS-CoCo), K-Nearest Neighbors (KNN), the improvement in classification accuracy is 0.07 to 26.9 percentage points, 0.03 to 11.8 percentage points and 0.2 to 12.64 percentage points respectively, the improvement in AUC (Area Under Curve) and Kappa is 0.25 to 18.32 percentage points, 1.27 to 23.29 percentage points, and 0.04 to 12.82 percentage points respectively. The experimental results show that the proposed method has advantages in terms of classification accuracy and classification efficiency.
Keywords:K-Nearest Neighbor (KNN)  genetic algorithm  decision tree  instance selection  noise sample  machine learning  
点击此处可从《计算机应用》浏览原始摘要信息
点击此处可从《计算机应用》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号