
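Safe Semi-supervised Classification Algorithm for High Dimensional Data
Cite this article: ZHAO Jian-Hua, LIU Ning. Safe Semi-supervised Classification Algorithm for High Dimensional Data[J]. Computer Systems & Applications, 2019, 28(5): 178-184.
Authors: ZHAO Jian-Hua, LIU Ning
Affiliations: College of Mathematics and Computer Application, Shangluo University, Shangluo 726000, China; Faculty of Economics and Management, Shangluo University, Shangluo 726000, China
Funding: Natural Science Basic Research Plan of Shaanxi Province (2015JM6347); Shangluo University Research Project (14SKY026); Shangluo University Science and Technology Innovation Team Building Project (18SCX002); Shangluo University Key Discipline Construction Project (discipline: Mathematics)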
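Abstract: In semi-supervised learning, randomly selecting unlabeled samples often degrades classifier performance and makes it unstable. Moreover, traditional semi-supervised learning algorithms perform poorly on classification problems involving high-dimensional data with only a small number of labeled samples. To address these problems, this paper explores both the sample space and the feature space of the data and proposes a safe semi-supervised learning algorithm that combines the random subspace technique with ensemble techniques (A safe semi-supervised learning algorithm combining stochastic subspace technology and ensemble technology, S3LSE) for classifying high-dimensional data with very few labeled samples. First, S3LSE uses the random subspace technique to decompose the high-dimensional data set into B feature subsets and optimizes each feature subset according to the implicit information among samples, forming B optimal feature subsets. Next, each optimal feature subset is sampled to form G sample subsets; within each sample subset, a safe sample-labeling method is used to expand the set of labeled samples, G classifiers are generated, and the G classifiers are combined into an ensemble. Then, the B ensemble classifiers generated from the B optimal feature subsets are combined again to classify the high-dimensional data. Finally, experiments that simulate the semi-supervised learning process on high-dimensional data show that S3LSE performs well.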
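The two-level structure described in the abstract (B feature subsets, each with G sample subsets, combined by two rounds of voting) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the nearest-centroid base learner, the per-class bootstrap sampling, majority voting as the combination rule, and all function names are assumptions, and the feature-subset optimization and safe labeling steps are omitted because the abstract does not specify them.

```python
import random
from collections import Counter

def random_subspaces(n_features, B, k, rng):
    """Draw B random feature-index subsets of size k (random subspace method)."""
    return [sorted(rng.sample(range(n_features), k)) for _ in range(B)]

def fit_centroids(X, y):
    """Assumed base learner: one mean vector (centroid) per class."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def predict_centroids(model, row):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(row, model[c]))
    return min(sorted(model), key=dist)

def majority(votes):
    """Most frequent vote (assumed combination rule)."""
    return Counter(votes).most_common(1)[0][0]

def fit_two_level(X, y, B=5, G=3, k=2, seed=0):
    """Train G classifiers on sample subsets inside each of B feature subsets."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    ensemble = []
    for feats in random_subspaces(len(X[0]), B, k, rng):
        members = []
        for _ in range(G):
            # Assumed sampling scheme: bootstrap within each class,
            # so every sample subset contains every class.
            idx = [rng.choice(rows) for rows in by_class.values() for _ in rows]
            Xs = [[X[i][j] for j in feats] for i in idx]
            members.append(fit_centroids(Xs, [y[i] for i in idx]))
        ensemble.append((feats, members))
    return ensemble

def predict_two_level(ensemble, row):
    """Inner vote over the G members, then outer vote over the B subspaces."""
    outer = []
    for feats, members in ensemble:
        sub = [row[j] for j in feats]
        outer.append(majority([predict_centroids(m, sub) for m in members]))
    return majority(outer)

# Toy data: 4 features, two well-separated classes.
X = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 0, 1, 1], [1, 1, 0, 0],
     [9, 10, 9, 10], [10, 9, 10, 9], [9, 9, 10, 10], [10, 10, 9, 9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = fit_two_level(X, y, B=5, G=3, k=2, seed=0)
pred_low = predict_two_level(model, [0.5, 0.5, 0.5, 0.5])
pred_high = predict_two_level(model, [9.5, 9.5, 9.5, 9.5])
```

Voting twice keeps the two sources of diversity separate: the inner vote smooths out sampling noise within one feature view, and the outer vote combines views that see different features.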

Keywords: high-dimensional data; semi-supervised learning; random subspace; ensemble technique; classification
Received: 2018/12/2
Revised: 2018/12/25

Safe Semi-supervised Classification Algorithm for High Dimensional Data
ZHAO Jian-Hua and LIU Ning. Safe Semi-supervised Classification Algorithm for High Dimensional Data[J]. Computer Systems & Applications, 2019, 28(5): 178-184.
Authors: ZHAO Jian-Hua and LIU Ning
Affiliation: College of Mathematics and Computer Application, Shangluo University, Shangluo 726000, China; Faculty of Economics and Management, Shangluo University, Shangluo 726000, China
Abstract: In semi-supervised learning, the random selection of unlabeled samples often degrades classifier performance and makes it unstable. In addition, traditional semi-supervised learning algorithms perform poorly on classification problems involving high-dimensional data with only a small number of labeled samples. To address these problems, this study proposes S3LSE, a safe semi-supervised learning algorithm that combines the random subspace technique with ensemble techniques and explores both the sample space and the feature space of the data. First, S3LSE decomposes the high-dimensional data set into B feature subsets using the random subspace technique and optimizes each feature subset according to the implicit information among the samples, forming B optimal feature subsets. Then, each optimal feature subset is sampled to form G sample subsets, a safe sample-labeling method is applied within each sample subset, and the resulting G classifiers are combined into an ensemble. The B ensemble classifiers generated from the B optimal feature subsets are then combined again to classify the high-dimensional data. Finally, a high-dimensional data set is used to simulate semi-supervised learning, and the experimental results show that the algorithm performs well.
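The abstract does not say how the "safe" sample-labeling step decides which unlabeled samples to trust. One common conservative rule, shown here purely as an assumption and not as the S3LSE rule, is to accept a pseudo-label only when a committee of classifiers trained on different feature views agrees on it. The sketch below uses a hypothetical `safe_expand` helper with a 1-nearest-neighbour classifier per view.

```python
from collections import Counter

def nearest_label(labeled, x):
    """1-nearest-neighbour label of x among (features, label) pairs."""
    best = min(labeled, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))
    return best[1]

def safe_expand(labeled, unlabeled, views, min_agree=1.0):
    """Pseudo-label an unlabeled sample only if enough views agree.

    labeled:   list of (feature_vector, label) pairs
    unlabeled: list of feature vectors
    views:     list of feature-index lists, one 1-NN classifier per view
    min_agree: required fraction of agreeing views (1.0 = unanimous)
    Returns (expanded labeled set, samples left unlabeled).
    """
    accepted, rejected = [], []
    for x in unlabeled:
        votes = []
        for feats in views:
            # Project the labeled set and the query onto this feature view.
            proj = [([row[j] for j in feats], lab) for row, lab in labeled]
            votes.append(nearest_label(proj, [x[j] for j in feats]))
        label, count = Counter(votes).most_common(1)[0]
        if count / len(votes) >= min_agree:
            accepted.append((x, label))   # confident: add with pseudo-label
        else:
            rejected.append(x)            # disagreement: leave unlabeled
    return labeled + accepted, rejected

labeled = [([0, 0], "a"), ([10, 10], "b")]
unlabeled = [[1, 1], [9, 9], [1, 9]]
expanded, left_out = safe_expand(labeled, unlabeled, views=[[0], [1]])
```

In this toy run the two views disagree about `[1, 9]`, so it stays unlabeled, while `[1, 1]` and `[9, 9]` are added with labels `"a"` and `"b"`. Leaving ambiguous samples out is what keeps the expansion conservative: a wrong pseudo-label would be reused as training data by every later classifier.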
Keywords:high dimensional data  semi-supervised learning  stochastic subspace  ensemble technology  classification
This article is indexed in Wanfang Data and other databases.