Purpose: This code replicates the analysis presented in Muff, Signer, Fieberg (2019) Section 4.2 “Habitat selection of otters: an SSF analysis”.
library(survival)
library(TwoStepCLogit)
library(INLA)
library(glmmTMB)
options(width=150)
dat <- read.csv("d_otter.csv")
str(dat)
## 'data.frame': 41670 obs. of 8 variables:
## $ NA_ANIMAL : Factor w/ 9 levels "Alena","Baukje",..: 4 4 4 4 4 4 4 4 4 4 ...
## $ NA_ID : int 11624 11624 11624 11624 11624 11624 11624 11624 11624 12131 ...
## $ Loc : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Sohlenbrei: int 5 5 5 5 5 5 5 5 5 7 ...
## $ NAT1 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ REST1 : int 1 1 1 1 1 1 1 1 1 1 ...
## $ STAU1 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Breaks_Dis: num 109.6 138.9 156.8 208.3 30.4 ...
NAT1, REST1 and STAU1 are the three factor levels of the factor variable habitat type, encoded as dummy variables, where
Further, the two continuous variables in the model are:
Finally, Loc
is the binary response variable that indicates if a habitat point was used (1) or available (0).
Add numerical variable for animals:
dat$ANIMAL_ID <- as.numeric(dat$NA_ANIMAL)
Stratum ID is given as “NA_ID” in the data; It is easier to have sequential enumeration, so let’s generate a new stratum-ID variable str_ID:
d.map <- data.frame(NA_ID=unique(dat$NA_ID),str_ID=1:length(unique(dat$NA_ID)))
dat$str_ID <- d.map[match(dat$NA_ID,d.map$NA_ID),"str_ID"]
dat <- dat[order(dat$str_ID),]
Scale and center the two continuous variables river width (Sohlenbrei) and step length (Breaks_Dis)
dat$Sohlenbrei <- scale(dat$Sohlenbrei)
dat$Breaks_Dis <- scale(dat$Breaks_Dis)
r.clogit <- clogit(Loc ~ STAU1 + REST1 + Sohlenbrei + Breaks_Dis
+ strata(str_ID), data=dat)
summary(r.clogit)$coef
## coef exp(coef) se(coef) z Pr(>|z|)
## STAU1 -0.07328577 0.9293352 0.06862720 -1.067882 2.855736e-01
## REST1 -0.38498046 0.6804639 0.09722675 -3.959614 7.507086e-05
## Sohlenbrei 0.15680599 1.1697686 0.03780309 4.147967 3.354405e-05
## Breaks_Dis -0.02051501 0.9796940 0.01681737 -1.219871 2.225139e-01
Set mean and precision for the priors of slope coefficients
mean.beta <- 0
prec.beta <- 1e-4
The model formula for INLA, where we set the stratum-specific intercept variance to \(10^6\) (or rather: the precision to \(10^{-6}\)), is given as follows:
formula.fixed <- Loc ~ STAU1 + REST1 + Sohlenbrei + Breaks_Dis +
f(str_ID,model="iid",hyper=list(theta=list(initial=log(1e-6),fixed=T)))
Then fit the Poisson model
r.inla.fixed <- inla(formula.fixed, family ="Poisson", data=dat,
control.fixed = list(
mean = mean.beta,
prec = list(default = prec.beta)
)
)
The summary for the posterior distribution of the fixed effects:
r.inla.fixed$summary.fixed
## mean sd 0.025quant 0.5quant 0.975quant
## (Intercept) -2.93844954 15.96884734 -34.29067946 -2.93889925 28.38761226
## STAU1 -0.07304312 0.06864839 -0.20811387 -0.07294540 0.06135260
## REST1 -0.38476793 0.09725682 -0.57720188 -0.38426128 -0.19533048
## Sohlenbrei 0.15691572 0.03781281 0.08255463 0.15695646 0.23097784
## Breaks_Dis -0.02062429 0.01682072 -0.05406316 -0.02048214 0.01199488
## mode kld
## (Intercept) -2.93844954 7.386095e-12
## STAU1 -0.07274433 1.012655e-06
## REST1 -0.38324605 6.157114e-07
## Sohlenbrei 0.15704114 9.652643e-07
## Breaks_Dis -0.02019886 2.084364e-07
Note that we have now have to manually fix the variance of the intercept first.
Start by setting up the model, but do not yet fit it:
TMBStruc.fix = glmmTMB(Loc ~ STAU1 + REST1 + Sohlenbrei +
Breaks_Dis + (1|str_ID),
family=poisson, data=dat, doFit=FALSE)
Then fix the standard deviation of the first random term, which is the (1|str_ID)
component in the above model equation:
TMBStruc.fix$parameters$theta[1] = log(1e3)
We need to tell glmmTMB
not to change the variance by setting it to NA
:
TMBStruc.fix$mapArg = list(theta=factor(c(NA)))
Then fit the model and look at the results:
glmm.TMB.fixed = glmmTMB:::fitTMB(TMBStruc.fix)
summary(glmm.TMB.fixed)
## Family: poisson ( log )
## Formula:
## Loc ~ STAU1 + REST1 + Sohlenbrei + Breaks_Dis + (1 | str_ID)
## Data: dat
##
## AIC BIC logLik deviance df.resid
## 85062.9 85106.1 -42526.5 85052.9 41665
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## str_ID (Intercept) 1e+06 1000
## Number of obs: 41670, groups: str_ID, 4167
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.76924 15.49133 -0.179 0.858
## STAU1 -0.07329 0.06863 -1.068 0.286
## REST1 -0.38499 0.09723 -3.960 7.50e-05 ***
## Sohlenbrei 0.15680 0.03780 4.148 3.36e-05 ***
## Breaks_Dis -0.02052 0.01682 -1.220 0.222
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The two-step procedure with independent random effect (D=“UN(1)”):
r.Twostep <- Ts.estim(formula = Loc ~ STAU1 + REST1 + Sohlenbrei + strata(NA_ID) + Breaks_Dis +
cluster(NA_ANIMAL) ,data = dat, random = ~ STAU1 + REST1 + Sohlenbrei ,
all.m.1=F, D="UN(1)")
Slope estiamtes and standard errors
r.Twostep$beta
## STAU1 REST1 Sohlenbrei Breaks_Dis
## 0.04317714 -0.24426655 0.09830315 0.02651010
r.Twostep$se
## STAU1 REST1 Sohlenbrei Breaks_Dis
## 0.17224014 0.24010285 0.11712502 0.01675094
Variance estimates
r.Twostep$D
## STAU1 REST1 Sohlenbrei Breaks_Dis
## STAU1 0.1740639 0.0000000 0.00000000 0
## REST1 0.0000000 0.3478413 0.00000000 0
## Sohlenbrei 0.0000000 0.0000000 0.07741911 0
## Breaks_Dis 0.0000000 0.0000000 0.00000000 0
To fit the model with random slopes in INLA, we need to generate new (but identical) variables of individual ID (ID cannot be used multiple times in the model formula):
dat$ANIMAL_ID1 <- dat$ANIMAL_ID
dat$ANIMAL_ID2 <- dat$ANIMAL_ID
dat$ANIMAL_ID3 <- dat$ANIMAL_ID
Set the model formula as for the fixed-effects model, but now add three random slope terms, namely for river width and for the two levels of the categorical variable (STAU1 and REST1) which are not the reference categories. The priors for precision of the three random slopes are PC(3,0.05), while the intercept variance is again fixed:
formula.random <- Loc ~ -1 + STAU1 + REST1 + Sohlenbrei +
Breaks_Dis +
f(str_ID,model="iid",hyper=list(theta = list(initial=log(1e-6),fixed=T))) +
f(ANIMAL_ID1,Sohlenbrei,values=1:9,model="iid",
hyper=list(theta=list(initial=log(1),fixed=F,prior="pc.prec",param=c(3,0.05)))) +
f(ANIMAL_ID2,STAU1,values=1:9,model="iid",
hyper=list(theta=list(initial=log(1),fixed=F,prior="pc.prec",param=c(3,0.05)))) +
f(ANIMAL_ID3,REST1,values=1:9,model="iid",
hyper=list(theta=list(initial=log(1),fixed=F,prior="pc.prec",param=c(3,0.05))))
Fit the model
r.inla.random <- inla(formula.random, family ="Poisson", data=dat,
control.fixed = list(
mean = mean.beta,
prec = list(default = prec.beta)
)
)
The summary for the posterior distribution of the fixed effects:
r.inla.random$summary.fixed
## mean sd 0.025quant 0.5quant 0.975quant
## STAU1 0.02286614 0.18305962 -0.32311920 0.01404792 0.41644755
## REST1 -0.33412207 0.21707728 -0.76709956 -0.33615906 0.11162251
## Sohlenbrei 0.11148622 0.14265833 -0.18973420 0.11585016 0.38697776
## Breaks_Dis -0.01506916 0.01679532 -0.04845797 -0.01492708 0.01750034
## mode kld
## STAU1 -0.001813017 8.728272e-07
## REST1 -0.339055180 9.145451e-07
## Sohlenbrei 0.122774074 2.038647e-07
## Breaks_Dis -0.014643913 2.073831e-07
Since variances are parameterized and treated as precisions, the summary of the respective posterior distributions is given for the precisions:
r.inla.random$summary.hyperpar
## mean sd 0.025quant 0.5quant
## Precision for ANIMAL_ID1 13.439705 12.053290 2.0926356 10.025368
## Precision for ANIMAL_ID2 9.303474 9.826404 1.2568431 6.389658
## Precision for ANIMAL_ID3 7.119067 7.462072 0.9627068 4.909348
## 0.975quant mode
## Precision for ANIMAL_ID1 45.29953 5.396810
## Precision for ANIMAL_ID2 34.93340 3.170836
## Precision for ANIMAL_ID3 26.62829 2.439278
Source R functions for calculating posterior means and medians of the precisions.
source("inla_emarginal.R")
source("inla_mmarginal.R")
inla_emarginal(r.inla.random)
## Mean of variance for ANIMAL_ID1 Mean of variance for ANIMAL_ID2
## 0.1372604 0.2200796
## Mean of variance for ANIMAL_ID3
## 0.2867388
inla_mmarginal(r.inla.random)
## Mode of variance for ANIMAL_ID1 Mode of variance for ANIMAL_ID2
## 0.05448438 0.07527460
## Mode of variance for ANIMAL_ID3
## 0.09845147
Now the same model using glmmTMB(). Again start to set up the model without fitting it:
TMBStruc = glmmTMB(Loc ~ -1 + STAU1 + REST1 + Sohlenbrei +
Breaks_Dis + (1|str_ID) +
(0 + STAU1 | ANIMAL_ID) +
(0 + REST1 | ANIMAL_ID) +
(0 + Sohlenbrei | ANIMAL_ID),
family=poisson, data=dat, doFit=FALSE)
Set the value of the standard deviation of the first random effect (here (1|str_ID)):
TMBStruc$parameters$theta[1] = log(1e3)
Tell glmmTMB not to change the first standard deviation, all other values are freely estimated (and are different from each other)
TMBStruc$mapArg = list(theta=factor(c(NA,1:3)))
Fit the model and look at the summary:
glmm.TMB.random <- glmmTMB:::fitTMB(TMBStruc)
summary(glmm.TMB.random)
## Family: poisson ( log )
## Formula:
## Loc ~ -1 + STAU1 + REST1 + Sohlenbrei + Breaks_Dis + (1 | str_ID) +
## (0 + STAU1 | ANIMAL_ID) + (0 + REST1 | ANIMAL_ID) + (0 +
## Sohlenbrei | ANIMAL_ID)
## Data: dat
##
## AIC BIC logLik deviance df.resid
## 85026.6 85087.1 -42506.3 85012.6 41663
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## str_ID (Intercept) 1.000e+06 1000.0000
## ANIMAL_ID STAU1 7.375e-02 0.2716
## ANIMAL_ID.1 REST1 1.020e-01 0.3194
## ANIMAL_ID.2 Sohlenbrei 7.062e-02 0.2657
## Number of obs: 41670, groups: str_ID, 4167; ANIMAL_ID, 9
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## STAU1 -0.004082 0.136876 -0.030 0.9762
## REST1 -0.350802 0.161325 -2.175 0.0297 *
## Sohlenbrei 0.122937 0.114338 1.075 0.2823
## Breaks_Dis -0.015105 0.016801 -0.899 0.3686
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
95% CIs for fixed and random effects (standard deviations) are obtained via the confint() function:
confint(glmm.TMB.random)
## 2.5 % 97.5 % Estimate
## cond.STAU1 -0.27235446 0.26419106 -4.081702e-03
## cond.REST1 -0.66699367 -0.03461004 -3.508019e-01
## cond.Sohlenbrei -0.10116157 0.34703532 1.229369e-01
## cond.Breaks_Dis -0.04803470 0.01782379 -1.510546e-02
## str_ID.cond.Std.Dev.(Intercept) 1000.00000000 1000.00000000 1.000000e+03
## ANIMAL_ID.cond.Std.Dev.STAU1 0.09196532 0.80191477 2.715665e-01
## ANIMAL_ID.1.cond.Std.Dev.REST1 0.09636594 1.05838186 3.193618e-01
Note: It is currently a problem of glmmTMB that the confint() function only shows a table of the length equal to the number of parameters estimated. As the variance for str_ID was not estimated but fixed to 10^6, but is still listed, the last variance component (here the one for (0 + Sohlenbrei | ANIMAL_ID)) is not shown. This can be solved by moving the component (1|str_ID) to the last position in the formula, and then use
TMBStruc$parameters$theta[4] = log(1e3)
TMBStruc$mapArg = list(theta=factor(c(1:3, NA)))
in the above code.
knitr::opts_chunk$set(error = TRUE)
Remove all strata of animal 6 where at least one point (used or available) lied in residual water (REST1) habitat; keep only strata that exclusively contain natural (NAT1) and reservoir (STAU1) points
strata to be removed:
strata.remove <- unique(dat[which(dat$ANIMAL_ID=="6" & dat$REST1==1),"NA_ID"])
Keep only strata that are not in this list; store in a reduced dataset:
dat.red <- dat[!(dat$NA_ID%in%strata.remove),]
Number of total data points (used or available) that are removed (from the 41670 in total):
nrow(dat) - nrow(dat.red)
## [1] 120
Try to start the two-step procedure, but it does not run:
# Ts.estim(formula = Loc ~ STAU1 + REST1 + Sohlenbrei + strata(NA_ID) + Breaks_Dis +
# cluster(NA_ANIMAL) ,data = dat.red, random = ~ STAU1 + REST1 + Sohlenbrei ,
# all.m.1=F, D="UN(1)")
On the other hand, the same dataset is no problem for the Poisson model:
TMBStruc = glmmTMB(Loc ~ -1 + STAU1 + REST1 + Sohlenbrei +
Breaks_Dis + (1|str_ID) +
(0 + STAU1 | ANIMAL_ID) +
(0 + REST1 | ANIMAL_ID) +
(0 + Sohlenbrei | ANIMAL_ID),
family=poisson, data=dat.red, doFit=FALSE)
TMBStruc$parameters$theta[1] = log(1e3)
TMBStruc$mapArg = list(theta=factor(c(NA,1:3)))
glmm.TMB.random <- glmmTMB:::fitTMB(TMBStruc)
summary(glmm.TMB.random)
## Family: poisson ( log )
## Formula:
## Loc ~ -1 + STAU1 + REST1 + Sohlenbrei + Breaks_Dis + (1 | str_ID) +
## (0 + STAU1 | ANIMAL_ID) + (0 + REST1 | ANIMAL_ID) + (0 +
## Sohlenbrei | ANIMAL_ID)
## Data: dat.red
##
## AIC BIC logLik deviance df.resid
## 84782.6 84843.1 -42384.3 84768.6 41543
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## str_ID (Intercept) 1.000e+06 1000.0000
## ANIMAL_ID STAU1 6.297e-02 0.2509
## ANIMAL_ID.1 REST1 1.068e-01 0.3268
## ANIMAL_ID.2 Sohlenbrei 7.061e-02 0.2657
## Number of obs: 41550, groups: str_ID, 4155; ANIMAL_ID, 9
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## STAU1 -0.002667 0.133579 -0.020 0.9841
## REST1 -0.364769 0.166769 -2.187 0.0287 *
## Sohlenbrei 0.121956 0.114456 1.065 0.2866
## Breaks_Dis -0.016119 0.016840 -0.957 0.3385
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
devtools::session_info()
## setting value
## version R version 3.4.4 (2018-03-15)
## system x86_64, linux-gnu
## ui X11
## language de_CH:de
## collate de_CH.UTF-8
## tz Europe/Zurich
## date 2019-06-17
##
## package * version date source
## backports 1.1.4 2019-04-10 cran (@1.1.4)
## base * 3.4.4 2018-03-16 local
## codetools 0.2-15 2016-10-05 CRAN (R 3.3.1)
## compiler 3.4.4 2018-03-16 local
## datasets * 3.4.4 2018-03-16 local
## Deriv 3.8.5 2018-06-11 CRAN (R 3.4.4)
## devtools 1.13.4 2017-11-09 CRAN (R 3.4.2)
## digest 0.6.18 2018-10-10 cran (@0.6.18)
## evaluate 0.10.1 2017-06-24 CRAN (R 3.4.1)
## glmmTMB * 0.2.3 2019-01-11 CRAN (R 3.4.4)
## graphics * 3.4.4 2018-03-16 local
## grDevices * 3.4.4 2018-03-16 local
## grid 3.4.4 2018-03-16 local
## htmltools 0.3.6 2017-04-28 CRAN (R 3.4.0)
## INLA * 18.07.12 2019-05-17 local
## knitr 1.17 2017-08-10 CRAN (R 3.4.1)
## lattice 0.20-35 2017-03-25 CRAN (R 3.4.0)
## lme4 1.1-20 2019-02-04 cran (@1.1-20)
## magrittr 1.5 2014-11-22 CRAN (R 3.4.0)
## MASS 7.3-47 2017-04-21 CRAN (R 3.4.2)
## Matrix * 1.2-14 2018-04-09 CRAN (R 3.4.4)
## MatrixModels 0.4-1 2015-08-22 CRAN (R 3.4.0)
## memoise 1.1.0 2017-04-21 CRAN (R 3.4.2)
## methods * 3.4.4 2018-03-16 local
## minqa 1.2.4 2014-10-09 CRAN (R 3.4.0)
## nlme 3.1-137 2018-04-07 CRAN (R 3.4.4)
## nloptr 1.2.1 2018-10-03 cran (@1.2.1)
## Rcpp 1.0.1 2019-03-17 cran (@1.0.1)
## rmarkdown 1.7 2017-11-10 CRAN (R 3.4.2)
## rprojroot 1.3-2 2018-01-03 cran (@1.3-2)
## sp * 1.2-5 2017-06-29 cran (@1.2-5)
## splines 3.4.4 2018-03-16 local
## stats * 3.4.4 2018-03-16 local
## stringi 1.2.4 2018-07-20 cran (@1.2.4)
## stringr 1.3.1 2018-05-10 cran (@1.3.1)
## survival * 2.41-3 2017-04-04 CRAN (R 3.4.0)
## TMB 1.7.14 2018-06-23 CRAN (R 3.4.4)
## tools 3.4.4 2018-03-16 local
## TwoStepCLogit * 1.2.5 2016-03-21 CRAN (R 3.4.0)
## utils * 3.4.4 2018-03-16 local
## withr 2.1.2 2018-03-15 CRAN (R 3.4.4)
## yaml 2.1.14 2016-11-12 CRAN (R 3.4.1)