Noise-free principal component analysis: An efficient dimension reduction technique for high dimensional molecular data
Affiliation: 1. Department of Computer Science, Tarbiat Modares University, P.O. Box 14115-175, Tehran, Iran; 2. Department of Pediatric Oncology, Erasmus Medical Center, Rotterdam, The Netherlands; 1. School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China; 2. State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China; 3. Department of Computing, The Hong Kong Polytechnic University, Hong Kong; 1. Product Lifecycle Management Research Lab, Department of Industrial and Manufacturing Systems Engineering, Faculty of Engineering, University of Windsor, Windsor, Canada; 2. Department of Industrial Engineering, Faculty of Engineering, University of Kharazmi, Karaj, Iran; 1. Electrical and Computer Engineering Department, University of Miami, Coral Gables, FL 33146, United States; 2. Evelyn F. McKnight Brain Institute, University of Miami, Miller School of Medicine, Miami, FL 33136, United States; 1. Indian Institute of Technology Banaras Hindu University, Varanasi 221005, Uttar Pradesh, India; 2. Malaviya National Institute of Technology, Jaipur 302017, Rajasthan, India
Abstract: Principal component analysis (PCA) is a powerful dimension reduction technique that is widely used in data mining. PCA projects the data into a lower-dimensional space while preserving as much of the intrinsic information in the data as possible. A disadvantage of PCA is that the extracted principal components (PCs) are linear combinations of all features, so the PCs may still be contaminated by noise in the data. To address this problem, we propose a modified version of PCA called noise-free PCA (NFPCA), in which regularization is introduced during the PC extraction step to mitigate the effect of noise. The potential of the proposed method is assessed in two important applications of high-dimensional molecular data: classification and survival prediction. Multiple publicly available real-world data sets are used for this illustration. Experimental results show that NFPCA produces more informative PCs than ordinary PCA. This is largely because NFPCA suppresses the effect of noise in the PCs more efficiently, with minimal information loss. NFPCA is a promising alternative to existing PCA approaches, not only because its PCs are highly informative but also because of its relatively low computational cost.
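The point that each PC is a linear combination of all features can be illustrated with a minimal sketch of ordinary PCA. This is not the NFPCA method, whose regularization is not specified in the abstract. The toy data, the power-iteration solver, and the names `first_principal_component` and `make_toy_data` are assumptions made purely for illustration.

```python
import random


def first_principal_component(rows, iters=200):
    """Leading PC loading vector of `rows` (ordinary PCA, power iteration)."""
    n, p = len(rows), len(rows[0])
    # Center each feature (column) at zero mean.
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    X = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix S = X^T X / (n - 1).
    S = [[sum(x[i] * x[j] for x in X) / (n - 1) for j in range(p)]
         for i in range(p)]
    # Power iteration: repeatedly apply S and renormalize to unit length.
    w = [1.0] * p
    for _ in range(iters):
        w = [sum(S[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        w = [v / norm for v in w]
    return w, means


def make_toy_data(n=200, seed=0):
    """Hypothetical data: two features share a signal, the third is pure noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        t = rng.gauss(0, 3)  # shared underlying signal
        rows.append([t + rng.gauss(0, 0.5),
                     t + rng.gauss(0, 0.5),
                     rng.gauss(0, 1.0)])  # noise-only feature
    return rows


rows = make_toy_data()
w, means = first_principal_component(rows)
# Each PC score is a weighted sum over ALL features, including the noisy one.
scores = [sum(w[j] * (r[j] - means[j]) for j in range(len(w))) for r in rows]
print("loadings:", [round(v, 3) for v in w])
```

In this sketch the noise-only feature receives a small but nonzero loading, so it still contributes to every PC score. That is the kind of contamination the abstract says regularization is meant to mitigate.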
Keywords: PCA; Regularization; High-dimensional data analysis; Classification
This article is indexed in ScienceDirect and other databases.