A high-dimensional regression setting is considered with p predictors X = (X_1, ..., X_p)^T and a response Y. The focus is on large p, possibly much larger than the number of observations n. Three novel methodologies based on Principal Fitted Components models (PFC; Cook, 2007) are presented: (1) Screening by PFC (SPFC) for variable screening when p is excessively large, (2) Prediction by PFC (PPFC), and (3) Sparse PFC (SpPFC) for variable selection.
SPFC uses a test statistic to detect all predictors marginally related to the outcome. We show that SPFC subsumes the Sure Independence Screening of Fan and Lv (2008).
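As a rough illustration of marginal screening in this inverse-regression spirit, the sketch below regresses each predictor on a basis in the response and ranks predictors by the resulting F statistic. The abstract does not specify the test statistic or the basis, so the F statistic, the polynomial basis, and the names `marginal_f_stats` and `degree` are all assumptions made for illustration only. With a linear basis (`degree=1`) the statistic is a monotone function of the squared marginal correlation, so the ranking coincides with a correlation-based ranking of the kind used in Sure Independence Screening.

```python
import numpy as np

def marginal_f_stats(X, y, degree=1):
    """F statistic from regressing each column of X on a basis in y.

    Assumption: the basis is polynomial, (y, y^2, ..., y^degree);
    this is an illustrative choice, not one taken from the source.
    """
    n, p = X.shape
    # Build and center the basis functions of the response.
    F = np.column_stack([y ** k for k in range(1, degree + 1)])
    F = F - F.mean(axis=0)
    # Center the predictors so no intercept column is needed.
    Xc = X - X.mean(axis=0)
    # Least-squares fit of all p predictors on the basis at once.
    coef, *_ = np.linalg.lstsq(F, Xc, rcond=None)
    fitted = F @ coef
    ss_reg = (fitted ** 2).sum(axis=0)
    ss_res = ((Xc - fitted) ** 2).sum(axis=0)
    r = F.shape[1]
    # Usual overall F statistic with (r, n - r - 1) degrees of freedom.
    return (ss_reg / r) / (ss_res / (n - r - 1))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 200, 50
    y = rng.normal(size=n)
    X = rng.normal(size=(n, p))
    X[:, 0] += 2.0 * y          # linear signal
    X[:, 1] += 2.0 * y ** 2     # purely quadratic signal
    stats = marginal_f_stats(X, y, degree=2)
    # Indices of the five predictors with the largest statistics.
    print(np.argsort(-stats)[:5])
```

A richer basis (here `degree=2`) lets the screen pick up predictors whose dependence on the response is not linear, which a purely correlation-based ranking may miss.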
PPFC is a novel methodology for prediction in regression where p can be large, including larger than n. PPFC assumes that X|Y has a normal distribution and applies to continuous response variables regardless of their distribution. It yields better prediction accuracy than current leading methods.
We adapt Sparse Principal Components Analysis (Zou et al., 2006) to the PFC model to develop SpPFC. SpPFC performs variable selection as well as forward linear model methods such as the lasso (Tibshirani, 1996); moreover, it covers cases where the distribution of Y|X is non-normal or the predictors and the response are not linearly related.