
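An Oversampling Method for Imbalanced Data Using Natural Nearest Neighbors
Cite this article: MENG Dongxia (孟东霞), LI Yujian (李玉鑑). An Oversampling Method for Imbalanced Data Using Natural Nearest Neighbors [J]. 计算机工程与应用 (Computer Engineering and Applications), 2021, 57(2): 91-96.
Authors: MENG Dongxia (孟东霞)  LI Yujian (李玉鑑)
Affiliations: 1. School of Financial Technology, Hebei Finance University, Baoding, Hebei 071051, China 2. School of Artificial Intelligence, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
Funding: National Natural Science Foundation of China; Fund of the Smart Finance Application Technology R&D Center of Hebei Universities
Abstract: To address the tendency of existing oversampling methods to introduce noise points and to generate overlapping synthetic samples, an oversampling method for imbalanced data based on natural nearest neighbors is proposed. The natural nearest neighbors of the minority-class samples are determined first; the number of neighbors of each sample is computed adaptively by the algorithm and reflects how dense or sparse the sample distribution is. The minority-class samples are then clustered according to their natural-neighbor relations, and new samples are generated from core points in dense regions and non-core points in sparse regions of the same cluster. On a two-dimensional synthetic dataset and UCI...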

Keywords: imbalanced datasets  oversampling  natural nearest neighbor  clustering

Oversampling Method for Unbalanced Data by Natural Nearest Neighbor
MENG Dongxia, LI Yujian. Oversampling Method for Unbalanced Data by Natural Nearest Neighbor [J]. Computer Engineering and Applications, 2021, 57(2): 91-96.
Authors: MENG Dongxia  LI Yujian
Affiliations: 1. School of Financial Technology, Hebei Finance University, Baoding, Hebei 071051, China 2. School of Artificial Intelligence, Guilin University of Electronic Technology, Guilin, Guangxi 541004, China
Abstract: To address two problems of existing oversampling methods, the introduction of noise points and the overlap of synthetic samples, this paper proposes an oversampling method based on natural nearest neighbors. The method first determines the natural nearest neighbors of the minority-class samples. The number of neighbors of each sample is computed adaptively by the algorithm and reflects how dense the local distribution is. The minority samples are then clustered according to their natural-neighbor relations, and new samples are generated from core points in dense regions and non-core points in sparse regions of the same cluster. Comparison experiments on a two-dimensional synthetic dataset and UCI datasets verify the feasibility and effectiveness of the method and show that it improves classification accuracy on imbalanced data.
Keywords: imbalanced data set  oversampling  natural nearest neighbor  clustering
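
The pipeline outlined in the abstract (adaptive neighbor search, clustering, interpolation between core and non-core points) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and the abstract does not give the exact rules, so several choices here are assumptions: the neighbor search stops when the number of points with no reverse neighbor stops changing (the stopping rule commonly described for natural-neighbor search), clusters are taken to be connected components of the mutual-neighbor graph, a "core" point is one with at least the average number of natural neighbors, and new samples are drawn by linear interpolation. The function names `natural_neighbors` and `oversample` are hypothetical.

```python
import numpy as np


def natural_neighbors(X):
    """Return (r, neighbors): the adaptive search range r and, for each
    point i, the set of points j such that i and j are in each other's
    r nearest neighbors."""
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)          # a point is not its own neighbor
    order = np.argsort(dist, axis=1, kind="stable")[:, : n - 1]

    knn = [set() for _ in range(n)]
    reverse_count = np.zeros(n, dtype=int)  # how often each point was chosen
    prev_orphans = -1
    r = 0
    while r < n - 1:
        for i in range(n):
            j = int(order[i, r])            # the (r+1)-th nearest neighbor of i
            knn[i].add(j)
            reverse_count[j] += 1
        r += 1
        orphans = int(np.sum(reverse_count == 0))
        # Assumed stopping rule: every point has been chosen by someone,
        # or the number of never-chosen points has stopped changing.
        if orphans == 0 or orphans == prev_orphans:
            break
        prev_orphans = orphans

    neighbors = [{j for j in knn[i] if i in knn[j]} for i in range(n)]
    return r, neighbors


def oversample(X_min, n_new, seed=0):
    """Generate n_new synthetic minority samples (illustrative rules only)."""
    rng = np.random.default_rng(seed)
    _, nbrs = natural_neighbors(X_min)
    n = len(X_min)

    # Assumption: clusters = connected components of the natural-neighbor graph.
    labels = -np.ones(n, dtype=int)
    cluster = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = cluster
        stack = [start]
        while stack:
            u = stack.pop()
            for v in nbrs[u]:
                if labels[v] == -1:
                    labels[v] = cluster
                    stack.append(v)
        cluster += 1

    # Assumption: core points have at least the mean number of natural neighbors.
    degree = np.array([len(s) for s in nbrs])
    core = degree >= degree.mean()

    # Candidate (core, non-core) pairs taken from the same cluster.
    pairs = [(i, j) for i in range(n) if core[i]
             for j in range(n) if not core[j] and labels[i] == labels[j]]
    if not pairs:
        return np.empty((0, X_min.shape[1]))

    picks = rng.integers(len(pairs), size=n_new)
    a = X_min[[pairs[p][0] for p in picks]]
    b = X_min[[pairs[p][1] for p in picks]]
    t = rng.random((n_new, 1))
    return a + t * (b - a)                  # points on the segment from a to b
```

Under these assumptions, a point that ends up with no natural neighbors forms a singleton cluster and never contributes to a (core, non-core) pair, so isolated minority points are not used as seeds. That matches the abstract's stated goal of not amplifying noise, although the paper's actual mechanism may differ.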