#' --- #' title: "Supplementary Appendix A: Linear Regression Example" #' output: html_document #' --- #' ## Objective #' #' This simulation example demonstrates how to conduct a permutation-based #' test for a partial regression coefficient in a multiple linear regression #' model. #' #' ## Document Preamble #+ docPreamble, warning=FALSE, message=FALSE # Load Libraries library(knitr) library(mosaic) library(ggplot2) library(MASS) # Set knitr options opts_chunk$set(fig.width = 6, fig.height=5) # Clear Environment (optional) remove(list=ls()) # Set seed set.seed(314159) #' ## Simulation Example #' #' Here we will consider a simple simulation where a response variable, y, #' is related to two predictor variables, x1 and x2. The predictors are #' themselves correlated. We will illustrate a simple permutation-based #' test for the effect of x1, adjusted for x2. #' #' Steps: #' #' 1. Fit a linear regression model relating x1 to x2. #' 2. Add the residuals from this model to the original data set. #' 3. Create the permutation distribution by shuffling these residuals. #' 4. Determine the p-value by comparing the t-statistic from the #' fit to the original data set to the permutation-based distribution #' of this same statistic. #' #' Simulation parameters #' #' - Sigma (variance/covariance matrix of x1 and x2). #' - We will assume mean of x1 and x2 =0 #' - Beta = vector of regression parameters (with intercept=0) #' Sigma <- matrix(c(10,3,3,2),2,2) Beta <- c(0.2, -0.5) #' Create correlated predictors X<- mvrnorm(n = 100, rep(0, 2), Sigma) cor(X) #' Form response variables y<-X%*%Beta+rnorm(100,0,2) Mydata<-data.frame(y=y, x1=X[,1], x2=X[,2]) #' Fit regression model to the data lmsim<-lm(y~x1+x2, data=Mydata) summary(lmsim) #' Step 1: capture the part of x1 that is not related to x2 lm1<-lm(x1~x2, data=Mydata) Mydata<-Mydata %>% mutate(x1resid=lm1$resid) #' Demonstrate that using the residuals here results in the same #' coefficient, standard error, t-statistic and p-value for x1 #' as in our original regression (lmsim) lmsim2<-lm(y~x1resid+x2, data=Mydata) summary(lmsim2) #' Store the t-statistic for x1 from this model (tstat<-summary(lmsim)$coefficients[2,3]) #' Step 2: create the permutation distribution randsims<-do(10000)*{ lmrand<-lm(y~shuffle(x1resid)+x2, data=Mydata) summary(lmrand)$coefficients[2,3] } head(randsims) #' Plot the randomization distribution with our original statistic histogram(~result, data=randsims, v=tstat, col="gray") #' Determine our p-value prop(~I(abs(result)>=tstat), data=randsims) #' ## Conclusions #' #' The permutation-based approach allows us to relax the Normality assumption. #' Our randomization-based p-value is really similar to the p-value #' of the original t-test. This result is not surprising given that #' the assumptions of linear regression (constant variance, normality, #' linearity) all hold in the simulation example. #' #' ## Document footer #' #' Session Information: #+ sessionInfo sessionInfo()