
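Feature Selection Method for Incomplete Data Sets Based on Probability Matrix Decomposition (original title: 基于概率矩阵分解的不完整数据集特征选择方法)
Cite this article: 范林歌, 武欣嵘, 童玮, 曾维军. 基于概率矩阵分解的不完整数据集特征选择方法[J]. 计算机工程, 2022, 48(6): 57-64. DOI: 10.19678/j.issn.1000-3428.0061524
Authors: 范林歌 (FAN Linge)  武欣嵘 (WU Xinrong)  童玮 (TONG Wei)  曾维军 (ZENG Weijun)
Affiliation: College of Communications Engineering, Army Engineering University of PLA, Nanjing 210007, China
Funding: National Natural Science Foundation of China (61802425)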
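Abstract (translated from the Chinese): In machine learning theory and applications, feature selection is a common way to reduce the dimensionality of high-dimensional data. Most traditional feature selection methods assume complete data sets, and the missing data that is pervasive in practice has received little attention. Because incomplete data contain unobserved information and outliers, a robust feature selection method based on probabilistic matrix factorization is proposed. A cluster-based probabilistic matrix factorization model approximates the missing values in the data set, which effectively measures the similarity of data between adjacent clusters, shrinks the problem size, and reduces the imputation error. Because the data with missing values contain a small number of outliers, features are then selected with a method based on the l2,1 loss function. On this basis, the feature selection procedure for incomplete data sets is given and its convergence is analyzed theoretically. The method uses all the information in the incomplete data set and effectively handles the influence of the outliers in it. Experimental results show that, compared with traditional feature selection methods, the proposed method selects fewer irrelevant features on synthetic data sets and reduces the influence of outliers, while on real data sets it achieves higher classification accuracy and selects more accurate features.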

Keywords: matrix decomposition  missing value imputation  robust feature selection  incomplete data  l2,1 norm
Received: 2021-04-30
Revised: 2021-07-08

Feature Selection Method for Incomplete Data Sets Based on Probability Matrix Decomposition
FAN Linge,WU Xinrong,TONG Wei,ZENG Weijun. Feature Selection Method for Incomplete Data Sets Based on Probability Matrix Decomposition[J]. Computer Engineering, 2022, 48(6): 57-64. DOI: 10.19678/j.issn.1000-3428.0061524
Authors: FAN Linge  WU Xinrong  TONG Wei  ZENG Weijun
Affiliation: College of Communications Engineering, Army Engineering University of PLA, Nanjing 210007, China
Abstract: In machine learning theory and applications, feature selection is a common way to reduce the dimensionality of high-dimensional data. Traditional feature selection methods mostly assume complete data sets, and few studies address the missing data that is common in practical applications. This study proposes a robust feature selection method based on Probabilistic Matrix Factorization (PMF) for incomplete data that contain unobserved information and outliers. First, a cluster-based PMF model approximates the missing values in the data set. The model effectively measures data similarity between adjacent clusters, reduces the scale of the problem, and lowers the imputation error. Second, because the data with missing values contain a few outliers, features are selected with a method based on the l2,1 loss function. Finally, the feature selection procedure for incomplete data sets is presented and its convergence is analyzed theoretically. The proposed method uses all the information in incomplete data sets and effectively handles the influence of outliers. Experimental results show that, compared with traditional feature selection methods, the proposed method selects fewer irrelevant features on synthetic data sets and reduces the influence of outliers. On real data sets, it achieves higher classification accuracy and selects more accurate features.
Keywords: matrix decomposition  missing value filling  Robust Feature Selection (RFS)  incomplete data  l2,1 norm
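
The abstract names the l2,1 norm but does not define it. As a minimal illustration, and not the authors' implementation, the Python sketch below computes the l2,1 norm of a matrix (the sum of the Euclidean norms of its rows) and compares it with a squared-error loss on a residual matrix that contains one outlier sample. It also ranks features by the row norms of a weight matrix, as is common in l2,1-regularized feature selection. The matrix layouts (residual rows as samples, weight rows as features), the function names, and the toy numbers are all assumptions made for this example.

```python
import math


def row_norms(M):
    """Euclidean (l2) norm of each row of a matrix stored as a list of lists."""
    return [math.sqrt(sum(v * v for v in row)) for row in M]


def l21_norm(M):
    """l2,1 norm: the sum of the l2 norms of the rows."""
    return sum(row_norms(M))


def squared_frobenius(M):
    """Squared Frobenius norm: the sum of all squared entries."""
    return sum(v * v for row in M for v in row)


def top_features(W, k):
    """Indices of the k rows of W with the largest l2 norms.

    Assumes each row of W holds the weights of one feature (hypothetical layout).
    """
    scores = row_norms(W)
    order = sorted(range(len(W)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Toy residuals for three samples; the last row plays the role of an outlier.
residuals = [[0.0, 1.0], [1.0, 0.0], [6.0, 8.0]]
print(l21_norm(residuals))           # 12.0  (outlier contributes 10 of 12)
print(squared_frobenius(residuals))  # 102.0 (outlier contributes 100 of 102)

# Toy weight matrix: three features, two outputs.
W = [[0.1, 0.0], [3.0, 4.0], [1.0, 0.0]]
print(top_features(W, 2))            # [1, 2]
```

In this toy case the outlier row accounts for about 83% of the l2,1 loss but about 98% of the squared loss. This reflects the usual argument that an l2,1 loss is less sensitive to outliers, since each sample's residual enters unsquared.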
This article is indexed in Wanfang Data (万方数据) and other databases.