Misspecification of the covariance matrix in the linear mixed model: a Monte Carlo simulation

LeBeau, Brandon C.
2013-04-01
2013-04-01
2013-02
https://hdl.handle.net/11299/146916
University of Minnesota Ph.D. dissertation. February 2013. Major: Educational Psychology. Advisor: Michael Harwell. 1 computer file (PDF); xii, 143 pages.

The linear mixed model has become a popular method for analyzing longitudinal and cross-sectional data because it overcomes many of the limitations of classical methods such as repeated measures analysis of variance or multivariate analysis of variance. Although the linear mixed model allows for flexible modeling of clustered data, the simulation research literature on it is not nearly as extensive as that for classical methods. The current study adds to this literature by examining the statistical properties of the linear mixed model under longitudinal data conditions. Historically, when using the linear mixed model to analyze longitudinal data, researchers have allowed the random effects alone to account for the dependency due to repeated measurements. This dependency arises from repeated measurements on the same individual, and measurements taken closer together in time would be more correlated than measurements taken further apart. If measurements are taken close together in time (e.g., hourly, daily, or weekly), the random effects alone may not adequately account for this dependency. In this case, serial correlation may be present and may need to be modeled. Previous simulation work exploring the effects of misspecifying serial correlation has shown that the fixed effects tend to be unbiased; however, evidence of bias shows up in the variance of the random components of the model. In addition, some evidence of bias was found in the standard errors of the fixed effects. These simulation studies were done with all other model conditions being "perfect," including normally distributed random effects and larger sample sizes.
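To make the distinction between dependency from random effects and dependency from serial correlation concrete, the following Python sketch builds the marginal covariance of one subject's repeated measurements as Z G Z' + R. The first-order autoregressive (AR(1)) residual structure, the function names, and all numeric values are illustrative assumptions, not details taken from the dissertation.

```python
# Illustrative sketch only: the AR(1) structure, names, and values are assumptions.

def ar1_cov(n_times, rho, sigma2):
    """Residual covariance with AR(1) serial correlation: sigma2 * rho**|j - k|."""
    return [[sigma2 * rho ** abs(j - k) for k in range(n_times)]
            for j in range(n_times)]


def random_effects_cov(times, g):
    """Z G Z' for a random intercept and a random slope for time (g is 2 x 2)."""
    z = [(1.0, float(t)) for t in times]
    return [[sum(zj[a] * g[a][b] * zk[b] for a in range(2) for b in range(2))
             for zk in z] for zj in z]


def marginal_cov(times, g, rho, sigma2):
    """Marginal covariance of one subject's measurements: Z G Z' + R."""
    n = len(times)
    zgz = random_effects_cov(times, g)
    r = ar1_cov(n, rho, sigma2)
    return [[zgz[j][k] + r[j][k] for k in range(n)] for j in range(n)]


times = [0, 1, 2, 3]
g = [[1.0, 0.0], [0.0, 0.0]]  # hypothetical: random intercept only, variance 1
independent = marginal_cov(times, g, rho=0.0, sigma2=1.0)  # R = sigma2 * I
serial = marginal_cov(times, g, rho=0.5, sigma2=1.0)       # R is AR(1)
```

With independent residuals and only a random intercept, the covariance between any two occasions is the same regardless of how far apart they are. Under the assumed AR(1) residuals, the covariance shrinks as the lag grows, which is the pattern the random effects alone cannot reproduce in this example.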
The current simulation study aims to generalize these findings to a wider variety of data conditions. It used a factorial design in which four simulation conditions were manipulated: covariance structure, random effect distribution, number of subjects, and number of measurement occasions. Relative bias of the fixed and random components was explored descriptively and inferentially. In addition, the type I error rate was explored to examine any impact the simulation conditions had on the robustness of hypothesis testing. A second, smaller study was also conducted that explicitly misspecified the random slope for time to see whether serial correlation could compensate for the misspecification of that random effect. Results for the larger simulation study showed no bias in the fixed effects. There was, however, evidence of bias in the random components of the model. The fitted and generated serial correlation structures, as well as their interaction, explained significant variation in the bias of the random components. The largest amounts of bias were found when the fitted structure was underspecified as independent. Type I error rates for the five fixed effects were just over 0.05, with many around 0.06. Many of the simulation conditions explained significant variation in the empirical type I error rates. Study two again found no bias in the fixed effects. Just as in study one, bias was found in the random components, and the fitted and generated serial correlation structures, as well as the interaction between the two, explained significant variation in the relative bias statistics. Of most concern were the severely inflated type I error rates for the fixed effects associated with the slope terms. On average, the type I error rate was twice what would be expected, and it ranged as high as 0.25. The fitted serial correlation structure and the interaction between the fitted and generated serial correlation structures explained significant variation in the type I error rates for these terms.
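The two outcome measures named above can be sketched in a few lines of Python. The definitions used here (mean estimate minus true value, divided by the true value; proportion of replications rejecting a true null hypothesis at a nominal level) are the common Monte Carlo conventions and are assumed rather than quoted from the dissertation. The function names and sample numbers are hypothetical.

```python
# Illustrative sketch only: definitions follow common Monte Carlo conventions,
# and all names and numbers are hypothetical.

def relative_bias(estimates, true_value):
    """(mean of the estimates - true value) / true value; requires true_value != 0."""
    mean_estimate = sum(estimates) / len(estimates)
    return (mean_estimate - true_value) / true_value


def type1_error_rate(p_values, alpha=0.05):
    """Proportion of replications rejecting a true null hypothesis at level alpha."""
    rejections = sum(1 for p in p_values if p < alpha)
    return rejections / len(p_values)


# Hypothetical replications: variance estimates for a true value of 1.0, and
# p-values for a fixed effect whose true value is zero.
variance_estimates = [0.9, 1.1, 1.3]
null_p_values = [0.01, 0.20, 0.04, 0.50]

bias = relative_bias(variance_estimates, true_value=1.0)
rate = type1_error_rate(null_p_values, alpha=0.05)
```

In a real study each list would hold one value per simulation replication within a design cell, and the empirical rate would be compared with the nominal 0.05.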
More specifically, when the serial correlation was underspecified as independent in conjunction with a missing random effect for time, the type I error rate became severely inflated. Serial correlation does not appear to bias the fixed effects; therefore, if point estimates are all that is desired, serial correlation does not need to be modeled. However, if estimates of the random components or inference are of interest, care needs to be taken to at least include serial correlation in the model when it is found in the data. In addition, if serial correlation is present and the model is misspecified without the random effect for time, serious distortions of the empirical type I error rate occur. This would lead to rejecting many more true null hypotheses, which would make conclusions extremely uncertain.

en-US
Covariance Matrix, Linear Mixed Model, Longitudinal Data, Monte Carlo, Serial Correlation, Simulation
Thesis or Dissertation