Dealing with missing values and outliers in principal component analysis

Talanta. 2007 Apr 15;72(1):172-8. doi: 10.1016/j.talanta.2006.10.011. Epub 2006 Nov 7.


An efficient methodology is proposed for dealing simultaneously with missing values and outlying observations in principal component analysis (PCA). The approach combines a robust technique for estimating the principal components with the expectation-maximization (EM) approach for processing data with missing elements. The proposed strategy is shown to work well for highly contaminated data containing different amounts of missing elements. This conclusion is based on the results of a simulation study and of the analysis of a real environmental data set.
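
The general idea of alternating between a robust PCA fit and EM-style imputation can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not name the robust technique, so the sketch assumes a simple trimming scheme (fit the model only on the rows with the smallest residuals) as a stand-in. The function name `robust_em_pca`, its parameters, and the synthetic data are all hypothetical.

```python
import numpy as np


def robust_em_pca(X, n_components=1, trim=0.25, n_iter=200, tol=1e-10):
    """Hypothetical helper: EM-style imputation around a trimmed PCA fit.

    Assumption: trimming stands in for the (unspecified) robust technique.
    Missing cells are marked with NaN.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    n_keep = int(np.ceil((1.0 - trim) * X.shape[0]))

    # Initial fill: column medians of the observed values.
    col_median = np.nanmedian(X, axis=0)
    filled = np.where(missing, col_median, X)

    # Initial trusted rows: those closest to the coordinate-wise median.
    start_dist = np.linalg.norm(filled - col_median, axis=1)
    keep = np.sort(np.argsort(start_dist)[:n_keep])

    for _ in range(n_iter):
        # Fit the PCA model on the currently trusted rows only.
        center = filled[keep].mean(axis=0)
        _, _, vt = np.linalg.svd(filled[keep] - center, full_matrices=False)
        loadings = vt[:n_components].T
        recon = center + (filled - center) @ loadings @ loadings.T

        # Residual distance of every row, using observed cells only.
        resid = np.where(missing, 0.0, filled - recon)
        dist = np.linalg.norm(resid, axis=1)
        new_keep = np.sort(np.argsort(dist)[:n_keep])

        # Replace missing cells by the model predictions; keep observed cells.
        new_filled = np.where(missing, recon, X)
        change = np.sum((new_filled - filled) ** 2)
        same_rows = np.array_equal(new_keep, keep)
        filled, keep = new_filled, new_keep
        if change < tol and same_rows:
            break

    return filled, center, loadings, dist


# Synthetic example (made-up data): one latent factor, five outliers,
# three missing cells.
rng = np.random.default_rng(0)
scores = rng.normal(size=(60, 1))
direction = np.array([[1.0, 2.0, -1.0, 0.5]])
X_true = scores @ direction + 0.01 * rng.normal(size=(60, 4))

X_obs = X_true.copy()
X_obs[:5] += np.array([10.0, -5.0, 0.0, 0.0])  # outliers, off the model line
X_obs[10, 1] = X_obs[20, 3] = X_obs[30, 0] = np.nan

filled, center, loadings, dist = robust_em_pca(X_obs, n_components=1)
```

In this sketch, `dist` holds each row's residual distance to the fitted model, so rows with unusually large values are candidates for outliers, while `filled` is the data matrix with the missing cells replaced by model predictions. The trimming fraction `trim` is a tuning choice and should exceed the expected share of contaminated rows.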