

Missing data methods in PCA and PLS: Score calculations with incomplete observations
Affiliation:1. Department of Electrical Power Systems, Kaunas University of Technology, Studentu 50, LT-51368 Kaunas, Lithuania;2. Department of Information Systems, Kaunas University of Technology, Studentu 50, LT-51368 Kaunas, Lithuania;3. Intelligent Systems Laboratory, Centre for Applied Intelligent Systems Research, Halmstad University, Kristian IV:s väg 3, PO Box 823, S-301 18 Halmstad, Sweden;4. Marine Science and Technology Centre, Klaipeda University, Herkaus Manto 84, LT-92294 Klaipeda, Lithuania;5. Department of Marine Research, Environmental Protection Agency, Taikos Av. 26, LT-91144 Klaipeda, Lithuania
Abstract: A very important problem in industrial applications of PCA and PLS models, such as process modelling or monitoring, is the estimation of scores when the observation vector has missing measurements. The alternative of suspending the application until all measurements are available is usually unacceptable. The problem treated in this work is that of estimating scores from an existing PCA or PLS model when new observation vectors are incomplete. Building the model with incomplete observations is not treated here, although the analysis given in this paper provides considerable insight into this problem. Several methods for estimating scores from data with missing measurements are presented and analysed: a method, termed single component projection, derived from the NIPALS algorithm for model building with missing data; a method of projection to the model plane; and data replacement by the conditional mean. Expressions are developed for the error in the scores calculated by each method. The error analysis is illustrated using simulated data sets designed to highlight problem situations. A larger industrial data set is also used to compare the approaches. In general, all the methods perform reasonably well with moderate amounts of missing data (up to 20% of the measurements). However, in extreme cases where critical combinations of measurements are missing, the conditional mean replacement method is generally superior to the other approaches.
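To make the three approaches named in the abstract concrete, the following is a minimal NumPy sketch for a PCA model only. It is an illustration under stated assumptions, not the paper's implementation: the function names, the simulated data, and the use of the training sample covariance for the conditional mean are all choices made here, and the sequential deflation in the single component projection is this sketch's reading of "derived from the NIPALS algorithm". Missing measurements are marked with NaN, and the new observation is assumed to be mean-centred already.

```python
import numpy as np


def fit_pca(X, n_components):
    """Return column means, loadings P (K x A) and sample covariance S (K x K)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    S = np.cov(Xc, rowvar=False)
    # Rows of Vt from the SVD of the centred data are the loading vectors.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T
    return mean, P, S


def scores_single_component(z, P):
    """One univariate regression per component on the observed entries,
    deflating the observed residual after each component (assumed reading)."""
    obs = ~np.isnan(z)
    resid = z[obs].copy()
    t = np.zeros(P.shape[1])
    for a in range(P.shape[1]):
        p = P[obs, a]
        t[a] = resid @ p / (p @ p)
        resid -= t[a] * p
    return t


def scores_projection(z, P):
    """Least-squares projection of the observed entries onto the model plane."""
    obs = ~np.isnan(z)
    t, *_ = np.linalg.lstsq(P[obs], z[obs], rcond=None)
    return t


def scores_conditional_mean(z, P, S):
    """Replace missing entries by their conditional mean given the observed
    ones (Gaussian formula with covariance S), then project as usual."""
    obs = ~np.isnan(z)
    miss = ~obs
    filled = z.copy()
    S_oo = S[np.ix_(obs, obs)]
    S_mo = S[np.ix_(miss, obs)]
    filled[miss] = S_mo @ np.linalg.solve(S_oo, z[obs])
    return P.T @ filled


# Hypothetical simulated data: 6 variables driven by 2 latent factors plus noise.
rng = np.random.default_rng(0)
T = rng.normal(size=(200, 2)) * np.array([3.0, 1.5])
W = rng.normal(size=(2, 6))
X = T @ W + 0.1 * rng.normal(size=(200, 6))

mean, P, S = fit_pca(X, n_components=2)

z = X[0] - mean          # a centred observation
z[[1, 4]] = np.nan       # pretend two measurements are missing

print("complete data      :", P.T @ (X[0] - mean))
print("single component   :", scores_single_component(z, P))
print("projection to plane:", scores_projection(z, P))
print("conditional mean   :", scores_conditional_mean(z, P, S))
```

With no missing entries, all three functions reduce to the ordinary score calculation `P.T @ z`, because the loadings are orthonormal. They differ only in how they compensate for the rows of `P` that are lost when measurements are missing, which is where the error expressions discussed in the abstract come into play.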
Keywords:
This article is indexed in ScienceDirect and other databases.