Browsing by Subject "Multivariate analysis"
Now showing 1 - 3 of 3
Likelihood ratio tests for high-dimensional normal distributions.
(2011-12) Yang, Fan
For a random sample of size n obtained from p-variate normal distributions, we consider the likelihood ratio tests (LRT) for their means and covariance matrices. Most of these test statistics have been extensively studied in classical multivariate analysis, and their limiting distributions under the null hypothesis were proved to be chi-square distributions under the assumption that n goes to infinity while p remains fixed. In our research, we consider the high-dimensional case where both p and n go to infinity and their ratio p/n converges to a constant y in (0, 1]. We prove that the likelihood ratio test statistics under this assumption converge in distribution to a normal random variable, and we give the explicit forms of its mean and variance. We run a simulation study to show that the likelihood ratio test using this new central limit theorem outperforms the one using the traditional chi-square approximation for analyzing high-dimensional data.

Model-based methods for high-dimensional multivariate analysis
(2017-04) Molstad, Aaron
This thesis consists of three main parts. In the first part, we propose a penalized likelihood method to fit the linear discriminant analysis model when the predictor is matrix valued. We simultaneously estimate the means and the precision matrix, which we assume has a Kronecker product decomposition. Our penalties encourage pairs of response category mean matrix estimators to have equal entries and also encourage zeros in the precision matrix estimator. To compute our estimators, we use a blockwise coordinate descent algorithm. To update the optimization variables corresponding to response category mean matrices, we use an alternating minimization algorithm that takes advantage of the Kronecker structure of the precision matrix.
We show that our method can outperform relevant competitors in classification, even when our modeling assumptions are violated. We analyze an EEG dataset to demonstrate our method's interpretability and classification accuracy. In the second part, we propose a class of estimators of the multivariate response linear regression coefficient matrix that exploits the assumption that the response and predictors have a joint multivariate normal distribution. This allows us to indirectly estimate the regression coefficient matrix through shrinkage estimation of the parameters of the inverse regression, or the conditional distribution of the predictors given the responses. We establish a convergence rate bound for estimators in our class and we study two examples, which respectively assume that the inverse regression's coefficient matrix is sparse and rank deficient. These estimators do not require that the forward regression coefficient matrix is sparse or has small Frobenius norm. Using simulation studies, we show that our estimators outperform competitors. In the final part of this thesis, we propose a framework to shrink a user-specified characteristic of a precision matrix estimator that is needed to fit a predictive model. Estimators in our framework minimize the Gaussian negative log-likelihood plus an L1 penalty on a linear or affine function evaluated at the optimization variable corresponding to the precision matrix. We establish convergence rate bounds for these estimators and we propose an alternating direction method of multipliers algorithm for their computation. Our simulation studies show that our estimators can perform better than competitors when they are used to fit predictive models. 
In particular, we illustrate cases where our precision matrix estimators perform worse at estimating the population precision matrix while performing better at prediction.

Statistical Modeling and Testing for Joint Association in Genome-Wide Association Studies
(2015-07) Ray, Debashree
Most common human diseases are complex genetic traits, with multiple genetic and environmental components contributing to the disease susceptibility. Genome-wide Association Studies (GWASs) offer a powerful approach to identify the genetic variants (single nucleotide polymorphisms or SNPs) that modulate the susceptibility to these complex diseases. GWASs have identified hundreds of SNPs associated with such diseases, but these SNPs appear to explain very little of the genetic risk. This dissertation aims to investigate several alternative hypotheses for explaining the disease risk and to develop statistical techniques that improve the power to detect SNPs influencing such diseases. A Bayesian dimension reduction model is developed to study the joint effect of a group of SNPs on the disease status for unrelated individuals. Modeling the joint effects of multiple SNPs can help in the detection of SNPs that jointly have significant risk effects but individually make only a small contribution. Thus, our method based on the proposed dimension reduction model, the Bayesian partitioning model (BPM), may have improved power over multiple single-SNP association analysis when testing the association of multiple SNPs with a single binary trait. Similarly, joint analysis of multiple disease-related traits may also improve detection of SNPs associated with a disease. GWASs often collect data on multiple disease-related traits. These traits may share a common set of SNPs influencing them, and a joint analysis of these traits may improve the power to detect these SNPs, which may provide a better understanding of the underlying disease mechanism.
Multivariate analysis of variance (MANOVA) can perform such an association analysis at a GWAS level. The behavior of MANOVA is investigated, both theoretically and using simulations, and the conditions where MANOVA loses power are derived. Based on these findings, a novel unified score-based association test (USAT) is proposed that adaptively uses the data to optimize power to detect association of a single SNP with multiple quantitative phenotypes/traits. This test and other such multivariate tests are based on the assumption of random sampling, and may suffer from severely inflated type I error in case of selected sampling. This motivated us to explore scenarios in which popular methods would fail to provide valid tests of the null hypothesis of no association of a single SNP with multiple traits within the framework of a case-control study. Two alternative hypothesis testing approaches (one based on maximum p-value and the other based on propensity score) are proposed for such scenarios.
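
As an illustration of the kind of single-SNP, multi-trait test discussed in the last abstract, here is a minimal sketch of the textbook Wilks' Lambda statistic for a one-predictor MANOVA. It is not the dissertation's USAT or its code. The function names (`center`, `det`, `wilks_lambda`), the additive 0/1/2 genotype coding, and the simulated effect sizes are all assumptions made for this example.

```python
import random


def center(col):
    """Subtract the mean from a list of numbers."""
    m = sum(col) / len(col)
    return [v - m for v in col]


def det(a):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in a]  # work on a copy
    n = len(a)
    d = 1.0
    for i in range(n):
        piv = max(range(i, n), key=lambda r: abs(a[r][i]))
        if a[piv][i] == 0.0:
            return 0.0
        if piv != i:
            a[i], a[piv] = a[piv], a[i]
            d = -d
        d *= a[i][i]
        for r in range(i + 1, n):
            f = a[r][i] / a[i][i]
            for c in range(i, n):
                a[r][c] -= f * a[i][c]
    return d


def wilks_lambda(genotype, traits):
    """Wilks' Lambda for regressing k traits on one SNP.

    genotype: list of n genotype codes (assumed additive 0/1/2, not constant).
    traits:   list of k columns, each a list of n trait values.
    Returns (Lambda, F), with F = (1 - Lambda) / Lambda * (n - k - 1) / k.
    """
    n, k = len(genotype), len(traits)
    x = center(genotype)
    ys = [center(col) for col in traits]
    sxx = sum(v * v for v in x)
    sxy = [sum(a * b for a, b in zip(x, y)) for y in ys]
    # Total sums-of-squares-and-cross-products matrix of the traits.
    total = [[sum(a * b for a, b in zip(ys[i], ys[j])) for j in range(k)]
             for i in range(k)]
    # Error matrix: total minus the rank-one part explained by the SNP.
    error = [[total[i][j] - sxy[i] * sxy[j] / sxx for j in range(k)]
             for i in range(k)]
    lam = det(error) / det(total)
    f_stat = (1.0 - lam) / lam * (n - k - 1) / k
    return lam, f_stat


# Simulated data (hypothetical effect sizes): one SNP, two traits.
random.seed(0)
n = 200
genotype = [random.choice([0, 1, 2]) for _ in range(n)]
trait1 = [0.3 * g + random.gauss(0, 1) for g in genotype]
trait2 = [0.3 * g + random.gauss(0, 1) for g in genotype]
lam, f_stat = wilks_lambda(genotype, [trait1, trait2])
print("Wilks' Lambda:", lam)
print("F statistic:", f_stat)
```

Under multivariate normal traits and random sampling, this F statistic has an F distribution with k and n - k - 1 degrees of freedom under the null hypothesis of no association. If SciPy is available, a p-value could be obtained with `scipy.stats.f.sf(f_stat, k, n - k - 1)`. As the abstract notes, tests of this kind rely on the random sampling assumption and may not be valid under selected sampling such as a case-control design.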