Research on an Ensemble Classification Algorithm Based on Heterogeneous Distance

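Citation:	ZHANG Yan, DU Hongle. Research on an ensemble classification algorithm based on heterogeneous distance[J]. CAAI Transactions on Intelligent Systems, 2019, 14(4): 733-742. DOI: 10.11992/tis.201807023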

Authors:	ZHANG Yan, DU Hongle

Affiliation:	School of Mathematics and Computer Application, Shangluo University, Shangluo, Shaanxi 726000, China

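Funding:	Natural Science Basic Research Program of Shaanxi Province (2015JM6347); Scientific Research Program of the Shaanxi Provincial Education Department (15JK1218); Science and Technology Project of Shangluo University (18sky014); Science and Technology Innovation Team Construction Project of Shangluo University (18SCX002); Key Discipline Construction Project of Shangluo University (discipline: Mathematics)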

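Abstract:	To address imbalanced classification on heterogeneous datasets, this paper works from three angles (dataset resampling, ensemble learning, and weak-classifier construction) and proposes HVDM-Adaboost-KNN (heterogeneous value difference metric-Adaboost-KNN), a classification method for heterogeneous imbalanced datasets. The algorithm first balances the dataset with a clustering algorithm, obtaining several balanced data subsets on which several sub-classifiers are built. It computes the distance between two samples of a heterogeneous dataset with a heterogeneous distance, which improves the classification performance of the KNN algorithm, and then iterates with the Adaboost algorithm to obtain the final classifier. Eight UCI datasets are used to evaluate the algorithm's classification performance on imbalanced data. Experimental results show that, compared with Adaboost and other algorithms, the proposed method improves the F1 value, AUC, and G-mean on heterogeneous imbalanced datasets.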
Keywords:	heterogeneous data; imbalanced data; heterogeneous distance; ensemble learning; oversampling; undersampling
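The central step of measuring the distance between samples that mix numeric and nominal attributes, and using that distance inside a KNN vote, can be sketched as follows. This is a minimal sketch assuming the commonly cited HVDM definition: a normalized difference |u - v| / (4σ) for numeric attributes, a normalized value difference metric over class-conditional probabilities for nominal attributes, and a distance of 1 when either value is missing. The helper names `fit_hvdm`, `hvdm`, and `knn_predict` are hypothetical, and the clustering-based balancing and Adaboost stages of HVDM-Adaboost-KNN are not shown.

```python
import math
from collections import Counter, defaultdict


def fit_hvdm(X, y, nominal):
    """Collect the statistics HVDM needs from training data.

    X: list of rows; y: class labels; nominal: set of column indices
    holding categorical values (all other columns are numeric).
    Assumes the training rows contain no missing values.
    """
    classes = sorted(set(y))
    stats = {}
    for a in range(len(X[0])):
        col = [row[a] for row in X]
        if a in nominal:
            # For each attribute value, estimate P(class | value).
            counts = defaultdict(Counter)
            for v, c in zip(col, y):
                counts[v][c] += 1
            probs = {
                v: {c: cnt[c] / sum(cnt.values()) for c in classes}
                for v, cnt in counts.items()
            }
            stats[a] = ("nominal", probs)
        else:
            # Population standard deviation of the numeric attribute.
            mean = sum(col) / len(col)
            std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
            stats[a] = ("numeric", std)
    return classes, stats


def hvdm(x1, x2, classes, stats):
    """HVDM: square root of the sum of squared per-attribute distances."""
    total = 0.0
    for a, (kind, s) in stats.items():
        u, v = x1[a], x2[a]
        if u is None or v is None:
            d = 1.0  # a missing value gets the maximum distance
        elif kind == "numeric":
            # Normalized difference |u - v| / (4 * sigma).
            d = abs(u - v) / (4 * s) if s > 0 else 0.0
        else:
            # Normalized VDM: Euclidean distance between the class-probability
            # vectors of the two values (an unseen value gets all-zero probabilities).
            pu, pv = s.get(u, {}), s.get(v, {})
            d = math.sqrt(sum((pu.get(c, 0.0) - pv.get(c, 0.0)) ** 2 for c in classes))
        total += d * d
    return math.sqrt(total)


def knn_predict(X_train, y_train, x, k, classes, stats):
    """Majority vote among the k training rows nearest to x under HVDM."""
    order = sorted(range(len(X_train)), key=lambda i: hvdm(X_train[i], x, classes, stats))
    votes = Counter(y_train[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Toy mixed-type dataset: column 0 is numeric, column 1 is nominal.
X = [[1.0, "red"], [1.2, "red"], [5.0, "blue"], [5.3, "blue"]]
y = ["a", "a", "b", "b"]
classes, stats = fit_hvdm(X, y, nominal={1})
print(knn_predict(X, y, [1.1, "red"], 3, classes, stats))  # prints a
```

Dividing numeric differences by 4σ keeps most numeric distances roughly in [0, 1], on the same scale as the nominal VDM term, so neither attribute type dominates the neighbor ranking.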
Imbalanced heterogeneous data ensemble classification based on HVDM-KNN

ZHANG Yan, DU Hongle. Imbalanced heterogeneous data ensemble classification based on HVDM-KNN[J]. CAAI Transactions on Intelligent Systems, 2019, 14(4): 733-742. DOI: 10.11992/tis.201807023

Authors:	ZHANG Yan, DU Hongle

Affiliation:	School of Math and Computer Application, Shangluo University, Shangluo 726000, China

Abstract:	A novel classification method, the heterogeneous value difference metric-Adaboost-KNN (HVDM-Adaboost-KNN), is proposed for the imbalanced classification of heterogeneous datasets, approaching the problem from three angles: data resampling, ensemble learning, and weak-classifier construction. The algorithm first balances the dataset using a clustering algorithm to obtain several balanced data subsets and constructs several sub-classifiers. The heterogeneous distance is used to calculate the distance between two samples in the heterogeneous dataset, improving the classification accuracy of the KNN algorithm. The Adaboost algorithm is then applied iteratively to obtain the final classifier. Eight UCI datasets are used to evaluate the classification performance of the algorithm on imbalanced data. The experimental results show that, compared with Adaboost and other algorithms, the proposed method achieves better F1 value, AUC, and G-mean on the heterogeneous imbalanced datasets.

Keywords:	heterogeneous data; imbalanced data; heterogeneous value difference metric; ensemble learning; oversampling; undersampling
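The three evaluation indices cited in the abstract (F1 value, AUC, G-mean) can be computed as below. This is a sketch assuming the standard definitions, which the abstract does not spell out: F1 is computed for the positive (minority) class, G-mean is the square root of TPR times TNR, and AUC is the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The function names and the `positive=1` convention are hypothetical.

```python
import math


def confusion(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the minority class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall on the positive class."""
    tp, fp, fn, _ = confusion(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of the true positive rate and the true negative rate."""
    tp, fp, fn, tn = confusion(y_true, y_pred, positive)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(tpr * tnr)


def auc(y_true, scores, positive=1):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.1, 0.3, 0.8, 0.2]
print(f1_score(y_true, y_pred))          # 0.5
print(round(g_mean(y_true, y_pred), 4))  # 0.6124
print(auc(y_true, scores))               # 0.875
```

Unlike plain accuracy, G-mean falls to 0 whenever a classifier ignores the minority class entirely, which is why it is commonly reported for imbalanced data.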
This article is indexed in VIP and other databases.