Challenges in the Analysis of Data with Clustering or Correlation

Loading...
Thumbnail Image

Persistent link to this item

Statistics
View Statistics

Journal Title

Journal ISSN

Volume Title

Title

Challenges in the Analysis of Data with Clustering or Correlation

Published Date

2015-08

Publisher

Type

Thesis or Dissertation

Abstract

We examine several challenges associated with the analysis of clustered or correlated data. First, multiple maxima can occur in posterior distributions or likelihoods for mixed linear models, though they are sparsely noted in the literature. For those problems with covariance structures that are diagonalisable in a specific sense, we present the restricted likelihood as a generalised linear model with gamma errors, identity link and a prior distribution on the error variance. The generalised linear model portion of the restricted likelihood can be made to conflict with the portion of the restricted likelihood that functions like a prior distribution on the error variance, and in doing so we demonstrate that multiple maxima can occur in the restricted likelihood as well. Adding an explicit conjugate prior distribution to variance parameters permits a second local maximum in the marginal posterior distribution even if the likelihood contribution has a single maximum. Moreover, reparameterisation from variance to precision can change the posterior modality; the converse also is true. Modelers should beware of these potential pitfalls when selecting prior distributions or using peak-finding algorithms to estimate parameters. Second, Gaussian copula regression models provide a flexible, intuitive framework in which to model dependent responses with a variety of marginal distributions. The time required to compute the likelihood directly grows exponentially with sample size. Instead, we conduct inference for Gaussian copula regression models of non-continuous outcomes using three distinct approaches in a Bayesian setting: the continuous extension, the distributional transform, and the composite likelihood. The latter two include curvature correction. Using frequentist methods, we evaluate the inference resulting from these three approaches for several types of non-continuous data. We consider the computational performance as well. In most cases, the distributional transform with curvature correction has acceptable but considerably faster performance, making it attractive for evaluating models of mutually dependent non-continuous responses. Finally, we extend the Gaussian copula regression model to ordinal outcomes. The setting is femoroacetabular impingement (FAI), a condition in which subtle deformities of the femoral head and acetabulum (hip socket) result in pathologic abutment during hip motion. We apply a proportional odds model to relate T2* mapping data (an investigational, objective MRI sequence) to Beck's scale of cartilage damage, with adjustment for spatial proximity of responses. Each hip in the study is assigned its own parameter to describe the degree of association among the spatially grouped responses. A Dirichlet process allows clustering of these spatial association parameters. Using the fitted model, a patient-specific map of the entire acetabular cartilage displays gradations of cartilage quality with six colors representing the six grades of cartilage quality on Beck's scale. Such a map will facilitate patient education and clinical decision-making.

Description

University of Minnesota Ph.D. dissertation. August 2015. Major: Biostatistics. Advisors: James Hodges, John Hughes. 1 computer file (PDF); x, 87 pages.

Related to

Replaces

License

Collections

Series/Report Number

Funding information

Isbn identifier

Doi identifier

Previously Published Citation

Suggested citation

Henn, Lisa. (2015). Challenges in the Analysis of Data with Clustering or Correlation. Retrieved from the University Digital Conservancy, https://hdl.handle.net/11299/175537.

Content distributed via the University Digital Conservancy may be subject to additional license and use restrictions applied by the depositor. By using these files, users agree to the Terms of Use. Materials in the UDC may contain content that is disturbing and/or harmful. For more information, please see our statement on harmful content in digital repositories.