面向K最近邻分类的遗传实例选择算法 Genetic instance selection algorithm for K-nearest neighbor classifier期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

面向K最近邻分类的遗传实例选择算法

引用本文：	黄宇扬,董明刚,敬超.面向K最近邻分类的遗传实例选择算法[J].计算机应用,2018,38(11):3112-3118.

作者姓名：	黄宇扬董明刚敬超

作者单位：	1. 桂林理工大学信息科学与工程学院, 桂林 541004;2. 广西嵌入式技术与智能系统重点实验室(桂林理工大学), 桂林 541004

基金项目：	国家自然科学基金资助项目（61563012，61203109）；广西自然科学基金资助项目（2014GXNSFAA118371，2015GXNSFBA139260）；广西嵌入式技术与智能系统重点实验室基金资助项目。

摘要：	针对传统的实例选择算法会误删训练集中非噪声样本、算法效率低的不足，提出了一种面向K最近邻（KNN）的遗传实例选择算法。该算法采用基于决策树和遗传算法的二阶段筛选机制，先使用决策树确定噪声样本存在的范围；再使用遗传算法在该范围内精确删除噪声样本，可有效地降低误删率并提高效率，采用基于最近邻规则的验证集选择策略，进一步提高了遗传算法实例选择的准确度；最后引进基于均方误差（MSE）的分类精度惩罚函数来计算遗传算法中个体的适应度，提高有效性和稳定性。在20个数据集上，该方法相较于基于预分类的KNN （PRKNN）、基于协同进化的实例特征选择算法（IFS-CoCo）、K最近邻（KNN），在分类精度上的提升分别为0.07~26.9个百分点、0.03~11.8个百分点、0.2~12.64个百分点，在AUC和Kappa的上的提升分别为0.25~18.32个百分点、1.27~23.29个百分点、0.04~12.82个百分点。实验结果表明，该方法相较于当前实例选择算法在分类精度和分类效率上均具有优势。
关键词：	K最近邻遗传算法决策树实例选择噪声样本机器学习
收稿时间：	2018-04-30
修稿时间：	2018-06-21
Genetic instance selection algorithm for K-nearest neighbor classifier

HUANG Yuyang,DONG Minggang,JING Chao.Genetic instance selection algorithm for K-nearest neighbor classifier[J].journal of Computer Applications,2018,38(11):3112-3118.

Authors:	HUANG Yuyang DONG Minggang JING Chao

Affiliation:	1. College of Information Science and Engineering, Guilin University of Technology, Guilin Guangxi 541004, China;2. Guangxi Key Laboratory of Embedded Technology and Intelligent System(Guilin University of Technology), Guilin Guangxi 541004, China

Abstract:	Traditional instance selection algorithms may remove non-noise samples by mistake and have low algorithm efficiency. For this issue, a genetic instance selection algorithm for K-Nearest Neighbor (KNN) classifier was proposed. A two-stage selection mechanism based on decision tree and genetic algorithm was used in the algorithm. Firstly, the decision tree was used to determine the range of noise samples. Then, the genetic algorithm was used to remove the noise samples in this range precisely, which could reduce the risk of mistaken remove effectively and improve the algorithm efficiency. Secondly, the 1NN-based selection strategy of validation set was proposed to improve the instance selection accuracy of the genetic algorithm. Finally, the MSE (Mean Squared Error)-based objective function was used as the fitness function, which could improve the effectiveness and stability of the algorithm. Compared with PRe-classification based KNN (PRKNN), Instance and Feature Selection based on Cooperative Coevolution (IFS-CoCo), K-Nearest Neighbors (KNN), the improvement in classification accuracy is 0.07 to 26.9 percentage points, 0.03 to 11.8 percentage points and 0.2 to 12.64 percentage points respectively, the improvement in AUC (Area Under Curve) and Kappa is 0.25 to 18.32 percentage points, 1.27 to 23.29 percentage points, and 0.04 to 12.82 percentage points respectively. The experimental results show that the proposed method has advantages in terms of classification accuracy and classification efficiency.

Keywords:	K-Nearest Neighbor (KNN) genetic algorithm decision tree instance selection noise sample machine learning

	点击此处可从《计算机应用》浏览原始摘要信息
	点击此处可从《计算机应用》下载全文

设为首页 | 免责声明 | 关于勤云 | 加入收藏