首页 | 本学科首页   官方微博 | 高级检索  
     

一种解决大规模数据集问题的核主成分分析算法
引用本文:史卫亚,郭跃飞,薛向阳.一种解决大规模数据集问题的核主成分分析算法[J].软件学报,2009,20(8):2153-2159.
作者姓名:史卫亚  郭跃飞  薛向阳
作者单位:复旦大学,计算机科学与技术系,上海,200433
基金项目:Supported by the National High-Tech Research and Development Plan of China under Grant No.2007AA01Z176 (国家高技术研究发展计划(863)); the Key Project of the Ministry of Education of China under Grant No.104075 (国家教育部科学技术研究重点项目); the National Key Technology R&D Program of China under Grant No.2007BAH09B03 (国家科技支撑计划)
摘    要:提出一种大规模数据集求解核主成分的计算方法.首先使用Gram矩阵生成一个Gram-power矩阵,根据线性代数的理论可知,新形成的矩阵和原先的Gram矩阵具有相同的特征向量.因此,可以把Gram矩阵的每一列看成核空间迭代算法的输入样本,这样,无须使用特征分解即可迭代地计算出核主成分.该算法的空间复杂度只有O(m);在大规模数据集的情况下,时间复杂度也降低为O(pkm).实验结果表明了所提出算法的有效性.更为重要的是,在大规模数据集的情况下,当传统的特征分解技术无法使用时,该方法仍然可以提取非线性特征.

关 键 词:核主成分分析  Gram矩阵  大规模数据集  协方差无关  特征分解
收稿时间:2008/4/10 0:00:00
修稿时间:6/3/2008 12:00:00 AM

Efficient Kernel Principal Component Analysis Algorithm for Large-Scale Data Set
SHI Wei-Y,GUO Yue-Fei and XUE Xiang-Yang.Efficient Kernel Principal Component Analysis Algorithm for Large-Scale Data Set[J].Journal of Software,2009,20(8):2153-2159.
Authors:SHI Wei-Y  GUO Yue-Fei and XUE Xiang-Yang
Affiliation:Department of Computer Science and Technology;Fudan University;Shanghai 200433;China
Abstract:A covariance-free method of computing kernel principal components is proposed. First, a matrix, called Gram-power matrix, is constructed with the original Gram matrix. It is proven by the theorem of linear algebra that the eigenvectors of newly constructed matrix are the same as those of the Gram matrix. Therefore, each column of the Gram matrix can be treated as the input sample for the iterative algorithm. Thus, the kernel principle components can be iteratively computed without the eigen-decomposition. The space complexity of the proposed method is only O(m), and the time complexity is reduced to O(pkm). The effectiveness of the proposed method is validated by experimental results. More importantly, it still can be used even if traditional eigen-decomposition technique cannot be applied when faced with the extremely large-scale data set.
Keywords:KPCA (kernel principal component analysis)  Gram matrix  large-scale data set  covariance-free  eigen-decomposition
本文献已被 CNKI 万方数据 等数据库收录!
点击此处可从《软件学报》浏览原始摘要信息
点击此处可从《软件学报》下载全文
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号