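Fast Unsupervised Linear Dimension Reduction Method Based on Maximum Entropy
Cite this article: WANG Ji-Kui, YANG Zheng-Guo, LIU Xue-Wen, YI Ji-Hai, LI Bing, NIE Fei-Ping. Fast Unsupervised Linear Dimension Reduction Method Based on Maximum Entropy[J]. Journal of Software (软件学报), 2023, 34(4): 1779-1795.
Authors: WANG Ji-Kui  YANG Zheng-Guo  LIU Xue-Wen  YI Ji-Hai  LI Bing  NIE Fei-Ping
Affiliation: College of Information Engineering, Lanzhou University of Finance and Economics, Lanzhou 730020, Gansu, China; Center for Optical Imagery Analysis and Learning, Northwestern Polytechnical University, Xi'an 710072, Shaanxi, China
Funding: National Natural Science Foundation of China (61772427, 11801345); Innovation Ability Improvement Project of Colleges and Universities of Gansu Province (2019B-97); Key University-level Project of Lanzhou University of Finance and Economics (Lzufe2020B-0010, Lzufe2020B-011)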
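Abstract (translated from the Chinese): High-dimensional data are ubiquitous in the real world, but they often contain a large amount of redundant and noisy information, which prevents many traditional clustering algorithms from performing well on them. In practice, the cluster structure of high-dimensional data is often found to be embedded in a lower-dimensional subspace, so dimension reduction has become a key technique for mining that cluster structure. Among the many dimension reduction methods, graph-based methods are a research hotspot. However, most graph-based dimension reduction algorithms have two problems: (1) they need to compute or learn an adjacency graph, which is computationally expensive; (2) the reduction process does not take into account how the reduced data will be used. To address these two problems, a fast unsupervised dimension reduction algorithm based on maximum entropy, MEDR, is proposed. MEDR combines linear projection with the maximum entropy clustering model and uses an effective iterative optimization algorithm to find the latent optimal cluster structure of high-dimensional data embedded in a low-dimensional subspace. MEDR does not require an adjacency graph as input, and its time complexity is linear in the number of samples. Experimental results on real datasets show that, compared with traditional dimension reduction methods, MEDR finds a projection matrix that better maps high-dimensional data into a low-dimensional subspace, so that the projected data are well suited to clustering.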

Keywords (translated from the Chinese): unsupervised learning  linear dimension reduction  adjacency graph  clustering  maximum entropy
Received: 2021/2/22
Revised: 2021/5/19

Fast Unsupervised Dimension Reduction Method Based on Maximum Entropy
WANG Ji-Kui,YANG Zheng-Guo,LIU Xue-Wen,YI Ji-Hai,LI Bing,NIE Fei-Ping.Fast Unsupervised Dimension Reduction Method Based on Maximum Entropy[J].Journal of Software,2023,34(4):1779-1795.
Authors:WANG Ji-Kui  YANG Zheng-Guo  LIU Xue-Wen  YI Ji-Hai  LI Bing  NIE Fei-Ping
Affiliation:College of Information Engineering, Lanzhou University of Finance and Economics, Lanzhou 730020, China; Center for Optical Imagery Analysis and Learning (OPTIMAL), Northwestern Polytechnical University, Xi'an 710072, China
Abstract:High-dimensional data are ubiquitous in the real world. However, such data usually contain plenty of redundant and noisy information, which accounts for the poor performance of many traditional clustering algorithms on high-dimensional data. In practice, the cluster structure of high-dimensional data is often found to be embedded in a lower-dimensional subspace. Dimension reduction has therefore become a key technology for mining the cluster structure of high-dimensional data. Among the many dimension reduction methods, graph-based methods have become a research hotspot. However, most graph-based dimension reduction algorithms suffer from two problems: (1) they need to calculate or learn an adjacency graph, which has high computational complexity; (2) the intended use of the reduced data is not considered during dimension reduction. To address these two problems, a fast unsupervised dimension reduction algorithm based on maximum entropy, MEDR, is proposed. It combines linear projection with the maximum entropy clustering model and finds the potential optimal cluster structure of high-dimensional data embedded in a low-dimensional subspace through an effective iterative optimization algorithm. The MEDR algorithm does not need an adjacency graph as input, and its time complexity is linear in the number of samples. Experimental results on real datasets show that, compared with traditional dimension reduction methods, the MEDR algorithm finds a better projection matrix for mapping high-dimensional data into a low-dimensional subspace, so that the projected data are conducive to clustering analysis.
Keywords:unsupervised learning  dimension reduction  adjacency graph  clustering  maximum entropy
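
Illustrative sketch (not the authors' implementation): the abstract names two ingredients, a linear projection and a maximum entropy clustering model, but does not give the update rules. The Python below is a minimal sketch of the clustering half only, under stated assumptions. It uses the textbook maximum entropy clustering objective, the sum over samples and clusters of u·(squared distance) + gamma·u·log u with each sample's memberships summing to 1, whose closed-form membership update is a softmax over negative squared distances. The projection matrix `W` is held fixed here because the abstract does not describe how MEDR updates it. All names (`project`, `memberships`, `update_centers`, `max_entropy_cluster`, `gamma`) and the toy data are hypothetical.

```python
import math

def project(X, W):
    """Linear projection: each row x (length d) becomes W^T x (length m)."""
    d, m = len(W), len(W[0])
    return [[sum(x[j] * W[j][c] for j in range(d)) for c in range(m)] for x in X]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def memberships(Y, centers, gamma):
    """Soft assignments: softmax of -distance/gamma, so each row sums to 1."""
    U = []
    for y in Y:
        d = [sq_dist(y, c) for c in centers]
        d_min = min(d)  # shift by the minimum for numerical stability
        w = [math.exp(-(v - d_min) / gamma) for v in d]
        s = sum(w)
        U.append([v / s for v in w])
    return U

def update_centers(Y, U, k):
    """Each center is the membership-weighted mean of the projected points."""
    m = len(Y[0])
    centers = []
    for j in range(k):
        mass = sum(u[j] for u in U)
        centers.append([sum(u[j] * y[c] for u, y in zip(U, Y)) / mass
                        for c in range(m)])
    return centers

def max_entropy_cluster(Y, centers, gamma=1.0, n_iter=50):
    """Alternate membership and center updates for a fixed number of rounds."""
    for _ in range(n_iter):
        U = memberships(Y, centers, gamma)
        centers = update_centers(Y, U, len(centers))
    return U, centers

# Toy data: two groups separated along the first axis; the second axis is noise.
X = [[0.0, 5.0], [0.2, -4.0], [0.1, 3.0], [5.0, -5.0], [5.2, 4.0], [4.9, -3.0]]
W = [[1.0], [0.0]]  # assumption: a fixed projection onto the first axis
Y = project(X, W)
U, centers = max_entropy_cluster(Y, centers=[[1.0], [4.0]], gamma=0.5)
labels = [max(range(2), key=lambda j: u[j]) for u in U]
print(labels)
```

Each round of this sketch touches every sample once per cluster and no adjacency graph is built, which is at least consistent with the linear-in-samples cost the abstract reports. A full method in the spirit of the abstract would also re-estimate `W` between rounds.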