Fisher Example for Le Cam Made Simple
Charles J. Geyer
May 20, 2005
1 Introduction
1.1 Model
We do, rather inefficiently, an example from quantitative genetics, using
a model originally proposed by Fisher (1918) that directly began modern
quantitative genetics and indirectly led to much of modern regression and
analysis of variance. The data y are multivariate normal, decomposed as
y = µ+ b+ e
where µ is an unknown parameter vector (the mean of y) and b and e are
independent multivariate normal
b ∼ N (0, σ2A)
e ∼ N (0, τ2I)
and σ2 and τ2 are unknown parameters called the additive genetic and en-
vironmental variance, respectively, A is a known matrix called the numera-
tor relationship matrix in the animal breeding literature (Henderson, 1976;
Quaas, 1976), and I is the identity matrix. We will say more about A in
Section 1.3.
1.2 Pedigree
First we simulate a pedigree.
> set.seed(42)
> ngen <- 20
> popsize <- 200
> obsgen <- seq(11, 20)
1
> trips <- matrix(0, 0, 5)
> dimnames(trips) <- list(NULL, c("ind", "pa", "ma", "sex", "gen"))
> for (i in 1:ngen) {
+ if (i == 1) {
+ pas <- numeric(0)
+ mas <- numeric(0)
+ }
+ else {
+ pas <- trips[, "sex"] == 1 & trips[, "gen"] == (i - 1)
+ mas <- trips[, "sex"] == 2 & trips[, "gen"] == (i - 1)
+ pas <- seq(along = pas)[pas]
+ mas <- seq(along = mas)[mas]
+ if (length(pas) == 0 & length(mas) == 0)
+ stop("one or the other sex missing")
+ }
+ for (j in 1:popsize) {
+ sex <- sample(2:1, 1)
+ if (i == 1) {
+ pa <- 0
+ ma <- 0
+ }
+ else {
+ if (length(pas) == 1)
+ pa <- pas
+ else pa <- sample(pas, 1)
+ if (length(mas) == 1)
+ ma <- mas
+ else ma <- sample(mas, 1)
+ }
+ ind <- nrow(trips) + 1
+ trips <- rbind(trips, c(ind, pa, ma, sex, i))
+ }
+ }
> trips <- as.data.frame(trips)
> write.table(trips, file = "pedigree.txt", row.names = FALSE)
> rm(i, ind, j, ma, pa, mas, pas, sex)
2
1.3 Numerator Relationship Matrix
Then we calculate the numerator relationship matrix. From Henderson
(1976, Section 3.1) we find that the elements aij can be calculated recursively
if, as here, parents come before children in the pedigree.
(i) If the i-th individual is a founder (parents labeled zero), then
aii = 1
aij = aji = 0, j < i
(ii) If the i-th individual is not a founder (parents nonzero, say p and m,
but less than i), then
aii = 1 + 12apm
aij = aji = 12(ajp + ajm), j < i
> imat <- diag(nrow(trips))
> rmat <- imat
> founder <- trips$pa == 0
> for (ind in 1:nrow(trips)) if (!founder[ind]) {
+ pa <- trips[ind, "pa"]
+ ma <- trips[ind, "ma"]
+ for (j in 1:(ind - 1)) {
+ rmat[ind, j] <- rmat[j, ind] <- (rmat[j, pa] + rmat[j,
+ ma])/2
+ }
+ rmat[ind, ind] <- 1 + rmat[ma, pa]/2
+ }
> rm(founder, ind, j, ma, pa)
1.4 Data
And then we simulate some data. Let us suppose we only have data on
the bottom half of the pedigree.
> inies <- is.element(trips$gen, obsgen)
> imat <- imat[inies, inies]
> rmat <- rmat[inies, inies]
> range(rmat[lower.tri(rmat)])
[1] 0.02201080 0.80165141
3
Note that everybody on which we have data is related to everybody else
(relatedness greater than zero).
> sig <- tau <- 1
> mu <- 5
> library(MASS)
> zerovec <- rep(0, sum(inies))
> g <- mvrnorm(mu = zerovec, Sigma = sig^2 * rmat)
> e <- mvrnorm(mu = zerovec, Sigma = tau^2 * imat)
> y <- mu + g + e
> yout <- rep(NA, length(inies))
> yout[inies] <- y
> write.table(data.frame(y = yout), file = "data.txt", row.names = FALSE)
> rm(e, g, inies, yout, zerovec)
2 Variogram Estimation
The original plan for this note was that it would make no contribution to
the quantitative genetics literature, just use existing methodology, but our
need for a “good” starting point for Newton’s method prompts the following
new idea, which is a “technology transfer” from spatial statistics (not the
first such, MCMC for linkage being another). The estimator proposed here
is much better than, for example, the estimator of equations (2) and (3)
of Thomas et al. (2000). One may consider our proposal the obvious fix of
theirs from a statistician knowledgeable about kriging.
In spatial statistics, the term variogram refers to the variance of the dif-
ference of correlated quantities as a function of position. The semivariogram
is just half the variogram. Here the semivariogram is the function from (i, j)
pairs to
1
2E{(yi − yj)2} = 12(aii + ajj − 2aij)σ2 + τ2 (1a)
This suggests a regression
regress 12(yi − yj)2 on 12(aii + ajj − 2aij) (1b)
call the“response”here the empirical semivariogram and the“predictor”here
the theoretical additive genetic semivariogram. The “slope” in the regression
estimates σ2 and the “intercept” estimates τ2.
But the “response” is not suitable material for ordinary least squares
(OLS) regression. For one thing, its components are not independent, but,
following the usual variogram estimation idea, we ignore that. But worse,
4
its components are positive and highly skewed. They more resemble scaled
chi2(1) random variables, than normal random variables. Thus we use a
gamma generalized linear model (GLM) with identity link.
> foo <- outer(y, y, "-")^2/2
> foo <- foo[lower.tri(foo)]
> bar <- outer(diag(rmat), diag(rmat), "+")/2 - rmat
> bar <- bar[lower.tri(bar)]
> gout <- glm(foo ~ bar, family = Gamma("identity"), mustart = rep(1,
+ length(foo)))
> summary(gout)
Call:
glm(formula = foo ~ bar, family = Gamma("identity"), mustart = rep(1,
length(foo)))
Deviance Residuals:
Min 1Q Median 3Q Max
-7.7396 -1.6578 -0.6855 0.3028 6.0265
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.46191 0.04759 30.72 <2e-16 ***
bar 0.61748 0.04932 12.52 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for Gamma family taken to be 1.939356)
Null deviance: 5033847 on 1998999 degrees of freedom
Residual deviance: 5033618 on 1998998 degrees of freedom
AIC: 6118894
Number of Fisher Scoring iterations: 4
> foof <- round(foo/max(foo) * 499)
> barf <- round(bar/max(bar) * 499)
> quxf <- 1000 * foof + barf
> i <- match(sort(unique(quxf)), quxf)
Figure 1 (page 6) shows the scatter plot of “empirical semivariogram”versus
5
l lll lllllll llll l l lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll l l ll l l l l l ll llllll l lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll lll l l l l l ll lllll l lll l l ll ll l llll l lllll lll l l lll ll lll l l lllll lllllllllllllllllll lll ll llllllll
ll llllllllllllllll l l l l ll l l lll l l l ll l l lllllllll l l l l ll l llll lllll lll l l llllll l l ll l l ll lll llll lllll lll l l ll lll lll ll l lllllllll llll l ll l l l l ll l lll ll ll lll l ll llllll l l l l llll l l l lllllll lll lllllllllllll lllll lllllllll l
l lllllllll ll ll ll l l l l l lllll l lll l l l lllllll ll lll l ll l l l ll llllll l llll l l llll lll lllllllll ll llllll llllllll llll llllllllll ll l ll lll l ll l l l l lll lll ll l l l l ll llll ll l l ll ll l l ll lll l l lll lll llll l llll l ll l ll llllll lll lllll l l
l ll lll l l lll llll ll ll ll ll ll l l lll l l ll l lllll l l l l l l lll l llll ll ll l l ll ll ll l l l lll l l lll l llll llll llll lll l l llllllll l lllll lll lll llll lll ll l l ll ll lll ll lll ll l l ll l ll ll ll l l ll l ll lll l ll l l ll lll l ll l l ll ll llll lllll ll lll ll lll llllll l ll
l ll l l ll l ll lll lll l ll ll llllll lll lll l ll l l llllll ll ll l l ll lll l l lll l l lllllllll llll ll lll l lll l ll ll l l l l lll lllll ll lll lll l ll lllll ll lll lll l l lll l l lllll l lll ll lll l l l ll l ll lll l l l l lllllllll ll ll ll l ll lll l ll l l ll
l llll l lll l l l llll l l llllllll llll l ll l l l ll l ll l l lll l lll ll l ll llll l l l ll ll l ll ll ll lll l l llll llll l ll ll lll lll l l ll l l llll lllll ll l ll lllllll llll l l ll l l llll ll l lll l ll lll ll ll ll l llll lll llll lllllll lllllll l llll l
ll lll l l l l ll lll l l ll ll ll l ll ll lll l ll l l ll l l llll l l l ll lll l ll lllll ll ll l l l llll ll ll lllll lllll l lll ll l lll ll l l ll ll l l lll l lll lll l ll lllll l ll l l l ll l l ll ll ll l l ll ll l ll l ll l lll l llll ll llllll ll lll l lll lll
ll ll l l l l lll lll ll llll ll lll l l ll ll l l l ll ll l l ll l l llllll l l l l ll l l llll l l l lll l lllll l l l llllllll l ll llll llllllll ll lll l l l l llllll ll l l l l l l lll llll ll ll l ll l l l ll ll l llll l l lllll llll ll l
l l lll ll l llllllll l ll lll l llll ll l l l lll l l l ll l ll lll lll l lll ll l ll l l l llll ll lll ll l l lll llll l l l ll l l l l llll l l l ll lllll lll lll lllll ll l l l l l ll lll llll ll ll l l l ll l l lll lll llll lll l l l l l l l
lll l l l l l lll llll ll lll llllllll lll lll ll lll l l l lll l ll ll ll lll l ll l l ll ll l l l l lll lll llll lllll l l l l lll l ll l l l l ll l l l l lllllll lll lll l l lllll l ll ll l ll ll llll l ll l lll l l ll lll l l l l l llll l lllllll l l l ll
l l ll l l l ll lllllllllll llll lll ll ll lll l ll ll l llll lll lll ll ll l ll l ll l l l l lllll llll l ll l l ll l l l l ll l ll ll l l l l llllllll llll l ll lll l ll l lllll llll ll ll ll l ll l ll l l ll l l l lll ll l l ll
l l l ll l l llll lll l lllll l ll llll ll ll l ll l ll ll l ll l l l ll ll ll lll l l lll l l lll ll l l llll l llll lll l lllll l lll l ll l llll lll ll l ll ll lll ll l lll l l lll l lllll ll l l llll l l ll l ll ll l lll ll ll
ll l llll l l l llll llllll l l ll ll l lll l lll lll ll l l ll l l ll l ll l l l ll l l l l llll l lllllll l l l l ll ll ll l lll lllllll lll lll l lll l l llll ll l lllll ll ll l l lll lll l ll l l llllll ll ll lll l ll ll
l l l l l l lll lll lll l lll ll ll lll l l ll llll l l l l l lll l l lll llll l l l llll lll l lll l ll ll l ll l l ll llll lllll lll l lll lll llll l l l ll lllll l l l l l l llll ll l l l l l llll lll l llll lll ll
l l ll llll llll l ll ll lll l ll lllll ll l ll l l lll ll lll l l l l l lll ll ll l lllll l ll ll llllll l ll lll l lll l l l l ll l lllll l l ll l l lll l lll l l l l l lll lll ll lll ll lll l l
ll l lll llll ll l ll lll l l lll ll llllll l l l l llllllll l ll l l l l l l lllll lll ll l l l ll l lll l l l l ll l lllll l lllll l ll l l l l l l l ll l lll l ll l l l lll lll ll ll lll l l ll lll llll
l ll l llll ll lllll llllll l lll l l l l llllll l lll l l ll lll ll lll l llllll lllll l l l l ll l llllll l llllll l ll l l ll l ll lll l l llll l ll l lll l llll lllll ll ll llllll
ll l l l lll l l l ll l l lll l l lllllllll ll ll ll ll l lll ll lll l ll l ll llllllll l lll llllll l l l l ll l l ll l ll l l ll ll l l l ll l lll ll lllll l l l lllll ll l
l l l l l ll l lllll lll ll lll ll l ll ll l l l llllll lll l ll l lll ll l l lll llll lllll lll l lll l ll llllll l l l ll ll l ll l l l l ll l lll l ll l l lll llll llllll llll
ll ll lllll l l ll lll l ll ll l lll ll l l l lll l ll l l l lll l lll lll ll llll l l ll ll llll l l ll llll l lll l l ll l l ll l l ll l l ll ll l ll l lll ll ll l l lll l
ll l l llll l ll ll llll ll l l l l ll ll l ll l l l l lll lll lll l l ll l lll lll lll lll l l ll l ll l l ll ll llll ll l l l lllll ll llllll l l
ll llll ll ll l l l l llll ll lll l l ll ll ll lll l lll l l l l l lll ll l l l l lll llll l l l l l ll l llll l l l ll llll lllll ll llll ll ll ll ll l l l ll
lll llll l l ll lll lll l ll l lll l l l l ll ll l l ll llll ll l l l llll l l ll l l ll l lll llll llll l l l ll ll ll l l l l l ll ll l l l l ll llll l l l l l
l l l l l l l l l l ll ll l l ll ll llllllllllll l ll lllll lllll lll l l l lll l l ll l l ll l l l l llll lllllllll ll l l l l l l ll l lllll l llll l ll l ll
l lll l l ll ll lllll llll llllll lll l l ll l ll l ll l ll l l l ll ll l l l l ll l l ll l ll lll lllll ll l l ll l l ll ll l
l l lll l ll ll l l ll ll lll l lllllll ll l ll l l l ll lll lll l ll ll ll ll l ll ll llll l l llllllll l ll l l
l l ll l ll ll ll l ll l l l lll l lll lll ll l l lll ll l l l ll l ll l l ll l lll l l l llllll lllll l ll ll l l ll l lll
ll lll l l l lll lll lll llllllll l lll l l l ll l lll l l llll l l lll lll l lll llllllll l ll ll l ll
l ll llll ll l ll ll l l l llllll l l l lllll lll l lllll ll lll ll ll ll l l ll l ll l lllll l l l l
l l ll l l l lll l lll l ll lll ll llll lll l llllll ll l l ll l lll ll l ll l ll l l llll lllll lll
l l ll llll ll l l ll l l ll ll lll lll ll llll ll ll ll l lll lllllll ll lll ll l l l lll ll ll
ll l l ll ll l ll l ll lllll l ll l ll ll l l ll l l l ll l lllllll l l ll l l ll l l l lll
ll l l lll l l ll l l llll lll ll lllll ll l ll l ll l lllll l l lll l llll ll l lll lll l l l l ll
l l l llll lll ll ll ll l l lll ll ll ll ll ll l ll l l lll l llll llll l ll ll l l l lll l ll l l
l l ll l l l lll l ll llll l l lllll ll l l l l ll ll ll l l l ll lll l ll lllllll l l l
l l llllllll ll l lll l l l l lllllllllll lllll ll l l ll l
lllllllll l ll ll l l l lll ll llll ll l ll l l l ll lll l
l l l llllllll l l ll l l ll lll l ll lll l lllll l l ll ll l
lll llllll llll l ll l l ll lll l l l l llll lllll ll lll
l l ll lll lll ll lllll ll l ll ll lll ll l ll lll l l l ll
l l lll l ll l l l ll l ll lllllll ll l l ll llllll lllll
l l llllll ll l ll lll lll ll l l ll lll l
l l l l ll l l lll ll ll ll l lll ll l l ll l lll
l lllll ll ll lll lll l lll ll lll ll ll
ll lllll lllll llll l ll l l l llll l ll ll lll ll l
lllll llll l lll ll ll l lllll l ll ll l
llllll l ll lll lll l llll ll
lll l l lll ll lll l llll llll lll l
l l lll ll ll lll ll l llll
l l lllll ll ll l llllll l
l l l ll ll l lll
l ll lll lllll llllll
l ll ll ll ll ll
l l lll lll ll l ll
lll lllllll l l
ll ll l llll llll ll
l l lll lll ll
l l ll ll ll
l lll llll
l lll
l lll ll ll
ll ll
ll ll
lll l
l ll
ll l
ll
ll
l
l
0.4 0.6 0.8 1.0 1.2
0
10
20
30
40
theoretical genetic semivariogram
e
m
pi
ric
al
 s
em
iva
rio
gr
am
Figure 1: Semivariogram Estimation. Line is the gamma GLM with identity
link regression line.
6
“theoretical additive genetic semivariogram” along with its fitted (gamma
GLM with identity link) regression line.
3 Maximum Likelihood
3.1 Log Likelihood and Score
Now the log likelihood for these data is
l = −12(y − µ)V −1(y − µ)− 12 log detV (2)
where
V = σ2A+ τ2I
and the score is
∂l
∂µ
= u′V −1(y − µ) (3a)
∂l
∂θ
= +12(y − µ)V −1
∂V
∂θ
V −1(y − µ)− 12 tr
(
∂V
∂θ
V −1
)
(3b)
where u is the vector with all components 1, where θ is either σ2 or τ2,
the partial derivatives in (3b) being either A or I, respectively, and where
tr denotes the trace of a matrix. Direct naive use of (2), (3a), and (3b)
to find MLE is highly inefficient. This problem has been much studied
(see Johnson and Thompson, 1995, and references therein) and efficient free
implementations are available (e. g., Shaw and Shaw, 1994). We should
also note that generally REML (Patterson and Thompson, 1971) rather
than plain maximum likelihood is generally used to estimate these variance
components. Nevertheless, we make just such naive use in this example (no
high-quality implementation being available for R).
> logl <- function(theta, y) {
+ if (length(theta) != 3)
+ stop("length(theta) != 3")
+ mu <- theta[1]
+ sigsq <- theta[2]
+ tausq <- theta[3]
+ vmat <- sigsq * rmat + tausq * imat
+ r <- y - mu
+ logl <- -(1/2) * sum(r * solve(vmat, r)) - (1/2) * determinant(vmat,
+ logarithm = TRUE)$modulus
7
+ return(logl)
+ }
> score <- function(theta, y) {
+ if (length(theta) != 3)
+ stop("length(theta) != 3")
+ mu <- theta[1]
+ sigsq <- theta[2]
+ tausq <- theta[3]
+ vmat <- sigsq * rmat + tausq * imat
+ r <- y - mu
+ result <- double(3)
+ foo <- solve(vmat, r)
+ result[1] <- sum(foo)
+ vinv <- solve(vmat)
+ result[2] <- (1/2) * t(foo) %*% rmat %*% foo - (1/2) * sum(vinv *
+ rmat)
+ result[3] <- (1/2) * t(foo) %*% imat %*% foo - (1/2) * sum(vinv *
+ imat)
+ return(result)
+ }
3.2 Maximum Likelihood
A try at maximum likelihood
> theta.start <- as.numeric(c(mean(y), coef(gout)))
> lower <- c(-Inf, 0, 0)
> out <- optim(theta.start, logl, score, lower = lower, method = "L-BFGS-B",
+ control = list(fnscale = -length(y)), y = y)
> print(out)
$par
[1] 5.136033 1.049012 1.012568
$value
[1] -1554.177
$counts
function gradient
12 12
8
$convergence
[1] 0
$message
[1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
> theta.hat <- out$par
> mu.hat <- theta.hat[1]
> sig.hat <- sqrt(theta.hat[2])
> tau.hat <- sqrt(theta.hat[3])
> rm(foo, bar, foof, barf, quxf, i, gout)
3.3 Observed Fisher Information
For further likelihood analysis we also need the Hessian
∂2l
∂µ2
= −u′V −1u (4a)
∂2l
∂θ∂θ′
= −(y − µ)V −1∂V
∂θ
V −1
∂V
∂θ′
V −1(y − µ) (4b)
+ 12 tr
(
∂V
∂θ
V −1
∂V
∂θ′
V −1
)
(4c)
∂2l
∂µ∂θ
= −u′V −1∂V
∂θ
V −1(y − µ) (4d)
> info <- function(theta, y) {
+ if (length(theta) != 3)
+ stop("length(theta) != 3")
+ mu <- theta[1]
+ sigsq <- theta[2]
+ tausq <- theta[3]
+ vmat <- sigsq * rmat + tausq * imat
+ r <- y - mu
+ result <- matrix(NA, 3, 3)
+ foo <- solve(vmat, r)
+ vinv <- solve(vmat)
+ result[1, 1] <- sum(vinv)
+ result[1, 2] <- sum(vinv %*% rmat %*% foo)
+ result[1, 3] <- sum(vinv %*% imat %*% foo)
+ result[2, 1] <- result[1, 2]
+ result[3, 1] <- result[1, 3]
9
+ bar <- rmat %*% vinv %*% rmat
+ result[2, 2] <- t(foo) %*% bar %*% foo - (1/2) * sum(vinv *
+ bar)
+ bar <- imat %*% vinv %*% imat
+ result[3, 3] <- t(foo) %*% bar %*% foo - (1/2) * sum(vinv *
+ bar)
+ bar <- rmat %*% vinv %*% imat
+ result[2, 3] <- t(foo) %*% bar %*% foo - (1/2) * sum(vinv *
+ bar)
+ result[3, 2] <- result[2, 3]
+ return(result)
+ }
3.4 One-Step Newton
> theta.one <- theta.start + solve(info(theta.start, y = y), score(theta.start,
+ y = y))
> rbind(theta.start, theta.one, theta.hat)
[,1] [,2] [,3]
theta.start 5.077667 1.461912 0.6174799
theta.one 5.149669 1.105019 0.8854382
theta.hat 5.136033 1.049012 1.0125677
4 Parametric Bootstrap
4.1 Data
Great! So now we try a simple parametric bootstrap
> nboot <- 100
> yboot <- matrix(NA, nboot, length(y))
> for (iboot in 1:nboot) {
+ zerovec <- rep(0, length(y))
+ gstar <- mvrnorm(mu = zerovec, Sigma = sig.hat^2 * rmat)
+ estar <- mvrnorm(mu = zerovec, Sigma = tau.hat^2 * imat)
+ yboot[iboot, ] <- mu.hat + gstar + estar
+ }
4.2 Semivariogram Estimation
First we bundle up our semivariogram estimation as a procedure
10
> semivariogram <- function(y) {
+ foo <- outer(y, y, "-")^2/2
+ foo <- foo[lower.tri(foo)]
+ bar <- outer(diag(rmat), diag(rmat), "+")/2 - rmat
+ bar <- bar[lower.tri(bar)]
+ gout <- glm(foo ~ bar, family = Gamma("identity"), mustart = rep(1,
+ length(foo)))
+ qux <- coef(gout)
+ if (qux[1] < 0) {
+ cat("genetic variance component estimate negative -- fixing up\n")
+ gout <- glm(foo ~ 1, family = Gamma("identity"), mustart = rep(1,
+ length(foo)))
+ qux <- c(0, coef(gout))
+ }
+ else if (qux[2] < 0) {
+ cat("environmental variance component estimate negative -- fixing up\n")
+ gout <- glm(foo ~ bar + 0, family = Gamma("identity"),
+ mustart = rep(1, length(foo)))
+ qux <- c(coef(gout), 0)
+ }
+ return(as.numeric(c(mean(y), qux)))
+ }
> theta.start.boot <- matrix(NA, nboot, 3)
> for (iboot in 1:nboot) {
+ theta.start.boot[iboot, ] <- semivariogram(yboot[iboot, ])
+ }
environmental variance component estimate negative -- fixing up
environmental variance component estimate negative -- fixing up
environmental variance component estimate negative -- fixing up
environmental variance component estimate negative -- fixing up
4.3 Maximum Likelihood
> theta.star <- matrix(NA, nboot, 3)
> conv.star <- rep(NA, nboot)
> for (iboot in 1:nboot) {
+ out <- optim(theta.start.boot[iboot, ], logl, score, lower = lower,
+ method = "L-BFGS-B", control = list(fnscale = -length(y)),
+ y = yboot[iboot, ])
11
ll
l
l
l
l
l
l
l
l
ll
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
ll
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
ll
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
ll
l
l
l
l
l
l
l
l
l
ll
l
l
l
l
l
l
l
l
0.7 0.8 0.9 1.0 1.1 1.2 1.3
0.
90
0.
95
1.
00
1.
05
1.
10
1.
15
bootstrap genetic variance
bo
ot
st
ra
p 
en
vir
on
m
en
ta
l v
ar
ia
nc
e
Figure 2: Parametric Bootstrap Distribution of MLE for Variance Compo-
nents. Square is the MLE for the original data. Dots are MLE for bootstrap
data.
+ theta.star[iboot, ] <- out$par
+ conv.star[iboot] <- out$convergence
+ }
> all(conv.star == 0)
[1] TRUE
The mean parameter µ is just a nuisance parameter. Let’s look at the
joint distribution of the other two. The distribution (Figure 2) seems fairly
normal. The number of individuals n = 2000 does seem “large” but all of
data yi are highly correlated, there is no independence or stationarity that
we can use to justify an appeal to the central limit theorem.
12
4.4 One-Step Newton
> theta.one.boot <- matrix(NA, nboot, 3)
> for (iboot in 1:nboot) {
+ foo <- theta.start.boot[iboot, ]
+ bar <- yboot[iboot, ]
+ H <- info(foo, y = bar)
+ g <- score(foo, y = bar)
+ q1 <- foo + solve(H, g)
+ if (any(q1[2:3] < 0)) {
+ cat("variance component estimate negative -- fixing up\n")
+ fred <- function(delta) {
+ baz <- delta - foo
+ as.numeric(t(baz) %*% g - (1/2) * t(baz) %*% H %*%
+ baz)
+ }
+ sally <- function(delta) {
+ baz <- delta - foo
+ as.numeric(g - H %*% baz)
+ }
+ lower <- c(-Inf, 0, 0)
+ qout <- optim(foo, fred, sally, lower = lower, method = "L-BFGS-B",
+ control = list(fnscale = -1))
+ q1 <- qout$par
+ }
+ theta.one.boot[iboot, ] <- q1
+ }
Let us just consider how much progress Newton makes in one step.
Figure 3 (page 14) should show the one-step Newton estimator nearly con-
verging in one step. We can see from these plots that Newton does nearly
converge in one step for a majority of the bootstrap random data sets, but
not for all (as would be the case if we were really in “asymptopia”).
4.5 Confidence Interval for Heritability
We make a confidence interval for heritability first by putting it on the
log scale, getting first a confidence interval for φ = log σ2 − log τ2. Us-
ing the delta method or the formula for change-of-parameter with Fisher
13
ll
l
l
l
l
l
l
l
l
l l
l
l
l
l
l
l
l
l
l
l
l
l
l
ll
l l
l
ll
l
l
l
l
l
l
l
l
l
l
l
l
l
l ll
l
l
l
l
l
ll
l
l
l l
l
ll
l
l
l
l l
l
l
l
l
l
l
l
l
l
l
l
l
l
ll
l
l
ll
l
l l
l
−0.2 0.0 0.1 0.2
−
0.
05
0.
00
0.
05
fixed effect
start error
o
n
e
 s
te
p 
er
ro
r
l
ll lll
l
l
l
lll
lll
l
l
l
ll ll
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
ll ll
l
l
l
ll l
l
l l
−0.5 0.0 0.5 1.0
0.
0
0.
4
0.
8
1.
2
additive genetic variance
start error
o
n
e
 s
te
p 
er
ro
r
l
l
ll l
l
l
l
l
l l
l
l
l
l
l
ll
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
ll
l
l
l
l
lll
l
l
l
l
l
ll
−1.0 −0.5 0.0 0.5
−
0.
8
−
0.
4
0.
0
environmental variance
start error
o
n
e
 s
te
p 
er
ro
r
Figure 3: Error of starting point for Newton’s method (semivariogram es-
timator) versus error of one-step estimator, where “error” is difference from
MLE (infinite-step Newton estimator).
14
information, we obtain
I(θ)22
σ4
+
I(θ)33
τ4
− 2I(θ)
23
σ2τ2
for the asymptotic variance of φ, where I(θ)ij is the (i, j) component of
inverse (observed) Fisher information.
> phi.hat <- log(theta.hat[2]) - log(theta.hat[3])
> phi.star <- log(theta.star[, 2]) - log(theta.star[, 3])
> se.phi.star <- double(nboot)
> for (iboot in 1:nboot) {
+ foo <- theta.star[iboot, ]
+ bar <- yboot[iboot, ]
+ qux <- solve(info(foo, y = bar))
+ fred <- qux[2, 2]/foo[2]^2 + qux[3, 3]/foo[3]^2 - 2 * qux[2,
+ 3]/(foo[2] * foo[3])
+ se.phi.star[iboot] <- sqrt(fred)
+ }
> z.phi.star <- (phi.star - phi.hat)/se.phi.star
> mean(phi.star)
[1] 0.01381908
> median(phi.star)
[1] 0.01316959
Figure 4 (page 16) shows the the distribution of the“standardized”bootstrap
estimator of logit heritability. If we were in asymptopia, the points would lie
on the line. That they lie below the line shows we have some bias. Figure 5
(page 17) shows a histogram of the bootstrap distribution of logit heritability
φ∗. The histogram is clearly skewed. It is both mean biased and median
biased
> mean(phi.star < phi.hat)
[1] 0.59
> mean(phi.star - phi.hat)
[1] -0.02154033
15
ll
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
ll
l
l
ll
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
ll
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
l
−2 −1 0 1 2
−
3
−
2
−
1
0
1
2
Normal Q−Q Plot
Theoretical Quantiles
Sa
m
pl
e 
Qu
an
tile
s
Figure 4: Q-Q Plot of Bootstrap Distribution of Pivot for Logit Heritability.
16
Histogram of phi.star
phi.star
Fr
eq
ue
nc
y
−0.4 −0.2 0.0 0.2 0.4
0
5
10
15
20
25
Figure 5: Histogram of Bootstrap Distribution of Logit Heritability. Dashed
line shows original MLE.
17
So, again, we see that we are not in asymptopia. Figure 5 looks fairly normal,
but not quite normal enough. A simple asymptotic 95% confidence interval
for φ is
> qux <- solve(info(theta.hat, y = y))
> fred <- qux[2, 2]/theta.hat[2]^2 + qux[3, 3]/theta.hat[3]^2 -
+ 2 * qux[2, 3]/(theta.hat[2] * theta.hat[3])
> se.phi.hat <- sqrt(fred)
> phi.hat + c(-1, 1) * qnorm(0.975) * se.phi.hat
[1] -0.2565746 0.3272934
whereas a 95% bootstrap-t confidence interval is
> crit <- as.numeric(quantile(z.phi.star, probs = c(0.025, 0.975)))
> phi.hat - rev(crit) * se.phi.hat
[1] -0.2330632 0.3493724
that they are not the same says a (single) parametric bootstrap is not point-
less. It does provide nontrivial correction of the asymptotic interval.
5 Discussion
Admittedly, in this toy example, we have used a simulated pedigree
structure that does have some stationarity (discrete generations and random
mating), but that sort of stationarity assumption would not be appropriate
for real data, even supposing we could base a traditional “n goes to infin-
ity” story on it. Miller (1977) shows how difficult it is to make up such
stories and prove theorems about them when the variance components are
for highly structured designed experiments. It is not clear that there can be
any analogous theory for quantitative genetics on general pedigrees, much
less any theory relevant to actual applications.
The theory of “Le Cam Made Simple” applies without strain. We see
that our problem is on the borderline of asymptopia. Asymptotics sort
of works, but not perfectly. If we were to increase the pedigree size, we
would, of course, eventually arrive in asymptopia. We do not provide such
an example, because we think this “borderline” example, which illustrates
what can go wrong and some (by no means all) ways to detect failure of
asymptotics, is more informative than an example where asymptotics works
perfectly.
18
Since all of the code is in the file fisher.Rnw. Any reader can do their
own example. If you want to see what happens with larger or smaller pedi-
gree sizes, just change popsize or ngen or obsgen in the first code chunk.
References
Fisher, R. A. (1918). The correlation between relatives on the supposition
of Mendelian inheritance. Trans. R. Soc. Edinburgh 52 399–433.
Henderson, C. R. (1976). A simple method for computing the inverse
of a numerator relationship matrix used in prediction of breeding values.
Biometrics 32 69–83.
Johnson, D. L. and Thompson, R. (1995). Restricted maximum like-
lihood estimation of variance components for univariate animal models
using sparse matrix techniques and average information. J. Dairy Sci. 78
449–456.
Miller, J. J. (1977). Asymptotic properties of maximum likelihood es-
timates in the mixed model of the analysis of variance. Ann. Statist. 5
746–762.
Patterson, H. D. and Thompson, R. (1971). Recovery of inter-block
information when block sizes are unequal. Biometrika 58 545–554.
Quaas, R. L. (1976). Computing the diagonal elements and inverse of a
large numerator relationship matrix. Biometrics 32 949–953.
Shaw, R. G. and Shaw, F. H. (1994). Quercus: programs for quantitative
genetic analysis using maximum likelihood. http://biosci.cbs.umn.
edu/eeb/quercus.html.
Thomas, S. C., Pemberton, J. M. and Hill, W. G. (2000). Estimating
variance components in natural populations using inferred relationships.
Heredity 84 427–436.
19