Dr. Charles J. Geyer

Persistent link for this collection: https://hdl.handle.net/11299/55568

Aster Models for Life History Analysis
(School of Statistics, University of Minnesota, 2005-09-05) Geyer, Charles J; Wagenius, Stuart; Shaw, Ruth G
We present a new class of statistical models designed for life history analysis of plants and animals. They allow joint analysis of data on survival and reproduction over multiple years, allow for variables having different statistical distributions, and correctly account for the dependence of variables on earlier variables (for example, that a dead individual stays dead and cannot reproduce). We illustrate their utility with an analysis of data taken from an experimental study of Echinacea angustifolia sampled from remnant prairie populations in western Minnesota. Statistically, they are graphical models with some resemblance to generalized linear models and survival analysis. They have directed acyclic graphs with nodes having no more than one parent. The conditional distribution of each node given the parent is a one-parameter exponential family with the parent variable the sample size. The model may be heterogeneous, each node having a different exponential family. We show that the joint distribution is a flat exponential family and derive its canonical parameters, Fisher information, and other properties. These models are implemented in an R package "aster" available from CRAN.
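The dependence structure described in this abstract can be illustrated with a small simulation. The following is a minimal sketch, not the implementation in the aster package: the three-node chain (survival in year 1, survival in year 2, flower count), the parameter values, and all function names are assumptions chosen for illustration. Each node is drawn as a sum of `parent` independent copies from its own one-parameter family, so a parent value of zero forces the child to zero.

```python
import math
import random


def draw_bernoulli(n, p, rng):
    """Sum of n independent Bernoulli(p) draws (zero when n == 0)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def draw_poisson(n, mean, rng):
    """Sum of n independent Poisson(mean) draws, using Knuth's method."""
    total = 0
    threshold = math.exp(-mean)
    for _ in range(n):
        count, product = 0, rng.random()
        while product > threshold:
            count += 1
            product *= rng.random()
        total += count
    return total


def simulate_individual(rng, p1=0.8, p2=0.7, flower_mean=3.0):
    """Simulate one individual along the chain 1 -> alive1 -> alive2 -> flowers.

    The parent value acts as the sample size of each child node
    (hypothetical parameter values, for illustration only).
    """
    alive1 = draw_bernoulli(1, p1, rng)          # root has constant parent 1
    alive2 = draw_bernoulli(alive1, p2, rng)     # dead in year 1 stays dead
    flowers = draw_poisson(alive2, flower_mean, rng)  # the dead do not flower
    return alive1, alive2, flowers


rng = random.Random(42)
population = [simulate_individual(rng) for _ in range(1000)]
```

Mixing a Bernoulli node with a Poisson node in one chain mirrors the heterogeneity the abstract mentions, where each node may have a different exponential family.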
Likelihood Ratio Tests and Inequality Constraints
(School of Statistics, University of Minnesota, 1995-12-18) Geyer, Charles J.
In likelihood ratio tests involving inequality-constrained hypotheses, the Neyman-Pearson test based on the least favourable parameter value in a compound null hypothesis can be extremely conservative. The ordinary parametric bootstrap is generally inconsistent and usually too liberal. Two methods of correcting the inconsistency of the parametric bootstrap are proposed: shrinking the constraint set toward the maximum likelihood estimate and superefficient estimation of the active set of constraints. Optimal shrinkage adjustment can be determined using bootstrap calibration. These methods are compared with the double bootstrap, the subsampling bootstrap, Bayes factors, and Bayesian P-values. The Bayesian methods are also too liberal if diffuse priors are used.
Two data sets that are examples for an article titled "Computationally efficient likelihood inference in exponential families when the maximum likelihood estimator does not exist"
(2018-05-22) Eck, Daniel J.; Geyer, Charles J.; http://dx.doi.org/10.1214/08-EJS349
Two data sets, one previously on the web since 2009 at http://www.stat.umn.edu/geyer/gdor/catrec.txt and used as an example in the article "Likelihood inference in exponential families and directions of recession" doi:10.1214/08-EJS349, and the other a new example for a new article (https://arxiv.org/abs/1803.11240). For neither does the maximum likelihood estimator exist in the conventional sense. The new data set is much bigger and takes 4 days of computer time to use the methods of the 2009 article but only seconds with the methods of the new article.
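A toy case may help show what it means for the maximum likelihood estimator not to exist in the conventional sense. The sketch below is not one of the two data sets in this record and does not use the methods of either article; it is a hypothetical four-point logistic regression through the origin in which the responses are completely separated by the sign of the predictor. The log likelihood keeps increasing as the coefficient grows and approaches, but never attains, its supremum of zero.

```python
import math


def log_likelihood(beta, xs, ys):
    """Log likelihood of logistic regression through the origin."""
    total = 0.0
    for x, y in zip(xs, ys):
        # eta is the linear predictor signed so that log P(y) = log sigmoid(eta)
        eta = beta * x if y == 1 else -beta * x
        # log sigmoid(eta), computed stably for either sign of eta
        if eta >= 0:
            total -= math.log1p(math.exp(-eta))
        else:
            total += eta - math.log1p(math.exp(eta))
    return total


# Hypothetical, completely separated data: y == 1 exactly when x > 0.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

# The log likelihood rises toward 0 along the direction beta -> +infinity.
values = [log_likelihood(b, xs, ys) for b in (1.0, 10.0, 100.0)]
```

Because no finite coefficient maximizes this function, an optimizer run on such data will drift toward infinity until a tolerance stops it.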
Aster Models with Random Effects and Additive Genetic Variance for Fitness
(2013-07-10) Geyer, Charles J.; Shaw, Ruth G.
This technical report is a minor supplement to the paper Geyer et al. (in press) and its accompanying technical report Geyer et al. (2012). It shows how to move variance components from the canonical parameter scale to the mean value parameter scale. This is useful in estimating additive genetic variance for fitness, the quantity that appears in Fisher's fundamental theorem of natural selection, which predicts the rate of increase in fitness via natural selection.
Aster Models with Random Effects via Penalized Likelihood
(2012-10-09) Geyer, Charles J.; Ridley, Caroline E.; Latta, Robert G.; Etterson, Julie R.; Shaw, Ruth G.
This technical report works out details of approximate maximum likelihood estimation for aster models with random effects. Fixed and random effects are estimated by penalized log likelihood. Variance components are estimated by integrating out the random effects in the Laplace approximation of the complete data likelihood following Breslow and Clayton (1993), which can be done analytically, and maximizing the resulting approximate missing data likelihood. A further approximation treats the second derivative matrix of the cumulant function of the exponential family where it appears in the approximate missing data log likelihood as a constant (not a function of parameters). Then first and second derivatives of the approximate missing data log likelihood can be done analytically. Minus the second derivative matrix of the approximate missing data log likelihood is treated as approximate Fisher information and used to estimate standard errors.
Supplementary Material for the paper "Asymptotics for Constrained Dirichlet Distributions"
(2012-06-25) Geyer, Charles J.; Meeden, Glen
This document is supplementary material for a paper. It shows how to simulate the linear-equality-and-inequality-constrained normal distribution that is the large sample approximation to a similarly constrained Dirichlet posterior.
Computation for the Introduction to MCMC Chapter of Handbook of Markov Chain Monte Carlo
(2010-07-29) Geyer, Charles J.
This technical report does the computation for the "Introduction to MCMC" chapter of Brooks, Gelman, Jones and Meng (forthcoming). All analyses are done in R (R Development Core Team, 2008) using the Sweave function so this entire technical report and all of the analyses reported in it are exactly reproducible by anyone who has R with the mcmc package (Geyer, 2005) installed and the R noweb file specifying the document.
Supplementary Material for "Asymptotics of Maximum Likelihood without the LLN or CLT or Sample Size Going to Infinity"
(University of Minnesota School of Statistics, 2005-05) Geyer, Charles J.
Supplementary material for a paper.
Markov Chain Monte Carlo Maximum Likelihood
(Interface Foundation of North America, 1991) Geyer, Charles J.
Markov chain Monte Carlo (e.g., the Metropolis algorithm and Gibbs sampler) is a general tool for simulation of complex stochastic processes useful in many types of statistical inference. The basics of Markov chain Monte Carlo are reviewed, including choice of algorithms and variance estimation, and some new methods are introduced. The use of Markov chain Monte Carlo for maximum likelihood estimation is explained, and its performance is compared with maximum pseudo likelihood estimation.
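For readers unfamiliar with the Metropolis algorithm named in this abstract, here is a minimal random-walk sketch. It is not code from the paper: the target (a standard normal known only up to its normalizing constant), the step size, the seed, and the function names are all assumptions made for illustration.

```python
import math
import random


def log_unnormalized_density(x):
    """Log of exp(-x^2 / 2): a standard normal without its constant."""
    return -0.5 * x * x


def metropolis(n_steps, step_size, rng, start=0.0):
    """Random-walk Metropolis sampler returning the whole sample path."""
    x = start
    log_h = log_unnormalized_density(x)
    path = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step_size)
        log_h_prop = log_unnormalized_density(proposal)
        # Accept with probability min(1, h(proposal) / h(x)); the unknown
        # normalizing constant cancels in this ratio.
        if rng.random() < math.exp(min(0.0, log_h_prop - log_h)):
            x, log_h = proposal, log_h_prop
        path.append(x)  # a rejected proposal repeats the current state
    return path


rng = random.Random(1)
path = metropolis(50000, 2.0, rng)
mean = sum(path) / len(path)
variance = sum((x - mean) ** 2 for x in path) / len(path)
```

Successive states are correlated, so the usual independent-sample formula understates the Monte Carlo error of `mean`; that is why the abstract lists variance estimation as a topic in its own right.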
Estimating Normalizing Constants and Reweighting Mixtures
(1994-07-09) Geyer, Charles J.
Markov chain Monte Carlo (the Metropolis-Hastings algorithm and the Gibbs sampler) is a general multivariate simulation method that permits sampling from any stochastic process whose density is known up to a constant of proportionality. It has recently received much attention as a method of carrying out Bayesian, likelihood, and frequentist inference in analytically intractable problems. Although many applications of Markov chain Monte Carlo do not need estimation of normalizing constants, three do: calculation of Bayes factors, calculation of likelihoods in the presence of missing data, and importance sampling from mixtures. Here reverse logistic regression is proposed as a solution to the problem of estimating normalizing constants, and convergence and asymptotic normality of the estimates are proved under very weak regularity conditions. Markov chain Monte Carlo is most useful when combined with importance reweighting so that a Monte Carlo sample from one distribution can be used for inference about many distributions. In Bayesian inference, reweighting permits the calculation of posteriors corresponding to a range of priors using a Monte Carlo sample from just one posterior. In likelihood inference, reweighting permits the calculation of the whole likelihood function using a Monte Carlo sample from just one distribution in the model. Given this estimate of the likelihood, a parametric bootstrap calculation of the sampling distribution of the maximum likelihood estimate can be done using just one more Monte Carlo sample. Although reweighting can save much calculation, it does not work well unless the distribution being reweighted places appreciable mass in all regions of interest. Hence it is often not advisable to sample from a distribution in the model. 
Reweighting a mixture of distributions in the model performs much better, but this cannot be done unless the mixture density is known and this requires knowledge of the normalizing constants, or at least good estimates such as those provided by reverse logistic regression.
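The importance reweighting idea in this abstract can be sketched in a few lines. This is a simplified illustration using self-normalized importance weights on independent draws, not the reverse logistic regression method the report proposes; the normal location family, the seed, and the function names are assumptions. One sample from the distribution with location 0 is reused to estimate means under several other locations, with unnormalized densities only.

```python
import math
import random


def reweighted_mean(sample, log_ratio):
    """Self-normalized importance-sampling estimate of a mean.

    log_ratio(x) is the log of (unnormalized target density) divided by
    (unnormalized sampling density); normalizing constants are not needed.
    """
    log_w = [log_ratio(x) for x in sample]
    shift = max(log_w)  # subtract the maximum to avoid overflow in exp
    weights = [math.exp(lw - shift) for lw in log_w]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, sample)) / total


def log_ratio_for(mu):
    """Log ratio h_mu(x) / h_0(x), where h_mu(x) = exp(-(x - mu)^2 / 2)."""
    return lambda x: mu * x - 0.5 * mu * mu


rng = random.Random(7)
sample = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # one sample, location 0

# Estimated means under locations 0, 0.5, and 1, all from the same sample.
estimates = {mu: reweighted_mean(sample, log_ratio_for(mu))
             for mu in (0.0, 0.5, 1.0)}
```

Trying a distant location such as 5 in `log_ratio_for` makes a handful of draws carry almost all of the weight, which is the failure the abstract describes when the sampled distribution places little mass in the region of interest.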
A Philosophical Look at Aster Models
(2010-02-03) Geyer, Charles J.
Aster Models and Lande-Arnold Beta (revised)
(2010-01-13) Geyer, Charles J.; Shaw, Ruth G.
Lande and Arnold (1983) proposed an estimate of beta, the directional selection gradient, by ordinary least squares (OLS). Aster models (Geyer, Wagenius and Shaw, 2007; Shaw, Geyer, Wagenius, Hangelbroek, and Etterson, 2008) estimate exactly the same beta, and so provide no improvement over the Lande-Arnold method in point estimation of this quantity. Aster models do provide correct confidence intervals, confidence regions, and hypothesis tests for beta; in contrast, such procedures derived from OLS are often invalid because the assumptions for OLS are grossly incorrect. This revision fixes a bug which made the figure incorrect in the original.
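As a concrete picture of the OLS point estimate discussed here, the sketch below computes a selection gradient for a single trait. It assumes the usual convention of regressing relative fitness (fitness divided by mean fitness) on the phenotype; the data are made up, the function name is hypothetical, and nothing here reproduces the aster analysis or the report's calculations.

```python
def selection_gradient(phenotype, fitness):
    """OLS slope of relative fitness on a single phenotypic trait."""
    n = len(phenotype)
    mean_fitness = sum(fitness) / n
    relative = [w / mean_fitness for w in fitness]  # relative fitness
    z_bar = sum(phenotype) / n
    w_bar = sum(relative) / n  # equals 1 by construction
    sxy = sum((z - z_bar) * (w - w_bar) for z, w in zip(phenotype, relative))
    sxx = sum((z - z_bar) ** 2 for z in phenotype)
    return sxy / sxx


# Made-up data for four individuals (illustration only).
phenotype = [-1.0, 0.0, 1.0, 2.0]
fitness = [1.0, 2.0, 3.0, 4.0]
beta = selection_gradient(phenotype, fitness)
```

The slope itself is only a point estimate; the abstract's concern is that standard errors and tests taken from the same OLS fit rest on assumptions that fitness data typically violate.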
Likelihood and Exponential Families
(University of Washington, 1990-06-08) Geyer, Charles J.
A family of probability densities with respect to a positive Borel measure on a finite-dimensional affine space is standard exponential if the log densities are affine functions. The family is convex if the natural parameter set (gradients of the log densities) is convex. In the closure of the family in the topology of pointwise almost everywhere convergence of densities, the maximum likelihood estimate (MLE) exists whenever the supremum of the log likelihood is finite. It is not defective if the family is convex. The MLE is a density in the original family conditioned on some affine subspace (the support of the MLE) which is determined by the "Phase I" algorithm, a sequence of linear programming feasibility problems. Standard methods determine the MLE in the family conditioned on the support ("Phase II"). An extended-real-valued function on an affine space is generalized affine if it is both convex and concave. The space of all generalized affine functions is a compact Hausdorff space, sequentially compact if the carrier is finite-dimensional. A family of probability densities is a standard generalized exponential family if the log densities are generalized affine. The closure of an exponential family is equivalent to a generalized exponential family. When the likelihood of an exponential family cannot be calculated exactly, it can sometimes be calculated by Monte Carlo using the Metropolis algorithm or the Gibbs sampler. The Monte Carlo log likelihood (the log likelihood in the exponential family generated by the Monte Carlo empirical distribution) then hypoconverges strongly (almost surely over sample paths) to the true log likelihood. For a closed convex family the Monte Carlo approximants to the MLE and all level sets of the likelihood converge strongly to the truth. For nonconvex families, the outer set limits converge.
These methods are demonstrated by an autologistic model for estimation of relatedness from DNA fingerprint data and by isotonic, convex logistic regression for the maternal-age-specific incidence of Down’s syndrome, both constrained MLE problems. Hypothesis tests and confidence intervals are constructed for these models using the iterated parametric bootstrap.
Aster Models and Lande-Arnold Beta
(2010-01-09) Geyer, Charles J.; Shaw, Ruth G.
Lande and Arnold (1983) proposed an estimate of beta, the directional selection gradient, by ordinary least squares (OLS). Aster models (Geyer, Wagenius and Shaw, 2007; Shaw, Geyer, Wagenius, Hangelbroek, and Etterson, 2008) estimate exactly the same beta, and so provide no improvement over the Lande-Arnold method in point estimation of this quantity. Aster models do provide correct confidence intervals, confidence regions, and hypothesis tests for beta; in contrast, such procedures derived from OLS are often invalid because the assumptions for OLS are grossly incorrect.
Hypothesis Tests and Confidence Intervals Involving Fitness Landscapes fit by Aster Models
(2010-01-09) Geyer, Charles J.; Shaw, Ruth G.
This technical report explores some issues left open in Technical Reports 669 and 670 (Geyer and Shaw, 2008a,b): for fitness landscapes fit using aster models, we propose hypothesis tests of whether the landscape has a maximum and confidence regions for the location of the maximum. All analyses are done in R (R Development Core Team, 2008) using the aster contributed package described by Geyer, Wagenius and Shaw (2007) and Shaw, Geyer, Wagenius, Hangelbroek, and Etterson (2008). Furthermore, all analyses are done using the Sweave function in R, so this entire technical report and all of the analyses reported in it are completely reproducible by anyone who has R with the aster package installed and the R noweb file specifying the document. The revision fixes one error in the confidence ellipsoids in Section 4 (a square root was forgotten so the regions in the original were too big).
Model Selection in Estimation of Fitness Landscapes
(School of Statistics, University of Minnesota, 2009-07-06) Geyer, Charles J.; Shaw, Ruth G.
A solution to the problem of estimating fitness landscapes was proposed by Lande and Arnold (1983). Another solution, which avoids problematic aspects of the Lande-Arnold methodology, was proposed by Shaw, Geyer, Wagenius, Hangelbroek, and Etterson (2008), who also provided an illustrative example involving real data. An earlier technical report (Geyer and Shaw, 2008) gave an example that was simpler in some ways (the data are simulated from the aster model so there are no issues making the data fit the model one has with real data) and much more complicated in others (each individual has five measured components of fitness over four time periods, 20 variables in all) and illustrates the full richness possible in aster analysis of fitness landscapes. The one issue that technical report did not deal with is model selection. When many phenotypic variables are measured, one often does not know which to put in the model. Lande and Arnold (1983) proposed using principal components regression as a method of dimension reduction, but this method is known to have no theoretical basis. Much of late 20th century and 21st century statistics is about model selection and model averaging, and we apply some of this methodology (which does have strong theoretical basis) to estimation of fitness landscapes using another simulated data set. All analyses are done in R (R Development Core Team, 2008) using the aster contributed package described by Geyer, Wagenius and Shaw (2007) except for analyses in the style of Lande and Arnold (1983), which use ordinary least squares regression. Furthermore, all analyses are done using the Sweave function in R, so this entire technical report and all of the analyses reported in it are completely reproducible by anyone who has R with the aster package installed and the R noweb file specifying the document. This revision corrects major errors in the frequentist model averaging calculations (Section 8) in the first version of the technical report.
Commentary on Lande-Arnold Analysis
(School of Statistics, University of Minnesota, 2008-05-14) Geyer, Charles J.; Shaw, Ruth G.
A solution to the problem of estimating fitness landscapes was proposed by Lande and Arnold (1983). Another solution, which avoids problematic aspects of the Lande-Arnold methodology, was proposed by Shaw, Geyer, Wagenius, Hangelbroek, and Etterson (2008). This technical report goes through Lande-Arnold theory in detail paying careful attention to problematic aspects. The only completely new material is a theoretical analysis of when the best quadratic approximation to a fitness landscape, which is what the Lande-Arnold method estimates, is a good approximation to the actual fitness landscape.
Supporting Data Analysis for a talk to be given at Evolution 2008 University of Minnesota, June 20-24
(School of Statistics, University of Minnesota, 2008-05-14) Geyer, Charles J.; Shaw, Ruth G.
A solution to the problem of estimating fitness landscapes was proposed by Lande and Arnold (1983). Another solution, which avoids problematic aspects of the Lande-Arnold methodology, was proposed by Shaw, Geyer, Wagenius, Hangelbroek, and Etterson (2008), who also provided an illustrative example. Here we provide another example using simulated data that are more suitable to aster analysis. All analyses are done in R (R Development Core Team, 2008) using the aster contributed package described by Geyer et al. (2007) except for analyses in the style of Lande and Arnold (1983), which use ordinary least squares regression. Furthermore, all analyses are done using the Sweave function in R, so this entire technical report and all of the analyses reported in it are completely reproducible by anyone who has R with the aster package installed and the R noweb file specifying the document.

University Digital Conservancy

University of Minnesota Twin Cities
