
Online Streaming Feature Selection for High-Dimensional and Class-Imbalanced Data Based on Max-Decision Boundary
Cite this article: LIN Yaojin, CHEN Xiangyan, BAI Shengxing, WANG Chenxi. Online Streaming Feature Selection for High-Dimensional and Class-Imbalanced Data Based on Max-Decision Boundary[J]. Pattern Recognition and Artificial Intelligence, 2020, 33(9): 820-829.
Authors: LIN Yaojin  CHEN Xiangyan  BAI Shengxing  WANG Chenxi
Affiliation: 1. School of Computer Science and Engineering, Minnan Normal University, Zhangzhou 363000
2. Key Laboratory of Data Science and Intelligence Application, The Education Department of Fujian Province, Minnan Normal University, Zhangzhou 363000
Funding: National Natural Science Foundation of China; Natural Science Foundation of Fujian Province; Natural Science Foundation of Fujian Province; Science and Technology Project of the Education Department of Fujian Province
Received: 2020-07-01

Abstract: The feature space of data often changes dynamically over time while the number of training samples stays fixed. Such a feature space is typically ultra-high-dimensional, and the decision space is usually class-imbalanced. To address this, an online streaming feature selection algorithm for high-dimensional and class-imbalanced data based on the max-decision boundary is proposed. Using the neighborhood rough set model and taking the influence of boundary samples fully into account, an adaptive neighborhood relation is defined and a rough dependency formula based on the max-decision boundary is designed. In addition, three online feature subset evaluation metrics are proposed to select features that discriminate strongly between the majority and minority classes. Experiments on eleven high-dimensional and class-imbalanced datasets show that, under the same experimental environment and with the same number of selected features, the proposed algorithm achieves better overall performance than some state-of-the-art online streaming feature selection algorithms.

Keywords: Online Feature Selection  High-Dimensional and Class-Imbalanced Data  Adaptive Neighborhood  Neighborhood Rough Set
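
The abstract does not give the adaptive neighborhood relation, the max-decision-boundary dependency formula, or the three evaluation metrics, so none of them can be reproduced here. As a rough illustration of the underlying idea only, the sketch below uses the classical fixed-radius neighborhood rough set dependency (the fraction of samples whose neighborhood contains no sample of another class) together with a simple greedy acceptance rule for features that arrive one at a time. The function names, the radius `delta`, the acceptance rule, and the toy data are all assumptions made for illustration, not the authors' method.

```python
from math import sqrt


def neighborhood_dependency(X, y, features, delta):
    """Classical neighborhood rough set dependency (fixed radius, assumed).

    Returns the fraction of samples whose delta-neighborhood, measured with
    Euclidean distance on the given feature indices, holds a single label.
    """
    n = len(X)
    consistent = 0
    for i in range(n):
        pure = True
        for j in range(n):
            dist = sqrt(sum((X[i][f] - X[j][f]) ** 2 for f in features))
            if dist <= delta and y[j] != y[i]:
                pure = False  # a neighbor from another class: boundary sample
                break
        if pure:
            consistent += 1
    return consistent / n


def stream_select(X, y, delta):
    """Hypothetical streaming loop: keep a new feature only if it raises
    the dependency of the currently selected subset."""
    selected = []
    best = 0.0
    for f in range(len(X[0])):  # features arrive one at a time
        score = neighborhood_dependency(X, y, selected + [f], delta)
        if score > best:
            selected.append(f)
            best = score
    return selected, best


# Toy imbalanced data: three majority samples (label 0), one minority (label 1).
# Feature 0 is constant and uninformative; feature 1 separates the classes.
X = [[0.5, 0.10], [0.5, 0.20], [0.5, 0.15], [0.5, 0.90]]
y = [0, 0, 0, 1]

print(stream_select(X, y, delta=0.3))
```

On this toy data the constant feature leaves every sample on the decision boundary, so it is rejected, while the second feature makes every neighborhood label-pure and is kept. A fixed radius treats majority and minority samples identically, which is the limitation an adaptive neighborhood is meant to address.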
This article is indexed in Wanfang Data and other databases.