Purpose: This code replicates the analysis presented in Muff, Signer, Fieberg (2019) Section 4.2 “Habitat selection of otters: an SSF analysis”.
library(survival)
library(TwoStepCLogit)
library(INLA)
library(glmmTMB)
options(width=150)
dat <- read.csv("d_otter.csv")
str(dat)
## 'data.frame': 41670 obs. of 8 variables:
## $ NA_ANIMAL : Factor w/ 9 levels "Alena","Baukje",..: 4 4 4 4 4 4 4 4 4 4 ...
## $ NA_ID : int 11624 11624 11624 11624 11624 11624 11624 11624 11624 12131 ...
## $ Loc : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Sohlenbrei: int 5 5 5 5 5 5 5 5 5 7 ...
## $ NAT1 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ REST1 : int 1 1 1 1 1 1 1 1 1 1 ...
## $ STAU1 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ Breaks_Dis: num 109.6 138.9 156.8 208.3 30.4 ...
NAT1, REST1 and STAU1 are the three factor levels of the factor variable habitat type, encoded as dummy variables, where
Further, the two continuous variables in the model are:
Finally, Loc
is the binary response variable that indicates if a habitat point was used (1) or available (0).
Add numerical variable for animals:
dat$ANIMAL_ID <- as.numeric(as.factor(dat$NA_ANIMAL))
Stratum ID is given as “NA_ID” in the data; It is easier to have sequential enumeration, so let’s generate a new stratum-ID variable str_ID:
d.map <- data.frame(NA_ID=unique(dat$NA_ID),str_ID=1:length(unique(dat$NA_ID)))
dat$str_ID <- d.map[match(dat$NA_ID,d.map$NA_ID),"str_ID"]
dat <- dat[order(dat$str_ID),]
Scale and center the two continuous variables river width (Sohlenbrei) and step length (Breaks_Dis)
dat$Sohlenbrei <- scale(dat$Sohlenbrei)
dat$Breaks_Dis <- scale(dat$Breaks_Dis)
r.clogit <- clogit(Loc ~ STAU1 + REST1 + Sohlenbrei + Breaks_Dis
+ strata(str_ID), data=dat)
summary(r.clogit)$coef
## coef exp(coef) se(coef) z Pr(>|z|)
## STAU1 -0.07328577 0.9293352 0.06862720 -1.067882 2.855736e-01
## REST1 -0.38498046 0.6804639 0.09722675 -3.959614 7.507086e-05
## Sohlenbrei 0.15680599 1.1697686 0.03780309 4.147967 3.354405e-05
## Breaks_Dis -0.02051501 0.9796940 0.01681737 -1.219871 2.225139e-01
Set mean and precision for the priors of slope coefficients
mean.beta <- 0
prec.beta <- 1e-4
The model formula for INLA, where we set the stratum-specific intercept variance to \(10^6\) (or rather: the precision to \(10^{-6}\)), is given as follows:
formula.fixed <- Loc ~ STAU1 + REST1 + Sohlenbrei + Breaks_Dis +
f(str_ID,model="iid",hyper=list(theta=list(initial=log(1e-6),fixed=T)))
Then fit the Poisson model
r.inla.fixed <- inla(formula.fixed, family ="Poisson", data=dat,
control.fixed = list(
mean = mean.beta,
prec = list(default = prec.beta)
)
)
The summary for the posterior distribution of the fixed effects:
r.inla.fixed$summary.fixed
## mean sd 0.025quant 0.5quant 0.975quant mode kld
## (Intercept) -2.93844954 15.96884734 -34.29067946 -2.93889925 28.38761226 -2.93844954 7.386095e-12
## STAU1 -0.07304312 0.06864839 -0.20811387 -0.07294540 0.06135260 -0.07274433 1.012655e-06
## REST1 -0.38476793 0.09725682 -0.57720188 -0.38426128 -0.19533048 -0.38324605 6.157114e-07
## Sohlenbrei 0.15691572 0.03781281 0.08255463 0.15695646 0.23097784 0.15704114 9.652643e-07
## Breaks_Dis -0.02062429 0.01682072 -0.05406316 -0.02048214 0.01199488 -0.02019886 2.084364e-07
Note that we now have to manually fix the variance of the intercept first.
Start by setting up the model, but do not yet fit it:
TMBStruc.fix = glmmTMB(Loc ~ STAU1 + REST1 + Sohlenbrei +
Breaks_Dis + (1|str_ID),
family=poisson, data=dat, doFit=FALSE)
Then fix the standard deviation of the first random term, which is the (1|str_ID)
component in the above model equation:
TMBStruc.fix$parameters$theta[1] = log(1e3)
We need to tell glmmTMB
not to change the variance by setting it to NA
:
TMBStruc.fix$mapArg = list(theta=factor(c(NA)))
Then fit the model and look at the results:
glmm.TMB.fixed = glmmTMB:::fitTMB(TMBStruc.fix)
summary(glmm.TMB.fixed)
## Family: poisson ( log )
## Formula: Loc ~ STAU1 + REST1 + Sohlenbrei + Breaks_Dis + (1 | str_ID)
## Data: dat
##
## AIC BIC logLik deviance df.resid
## 85062.9 85106.1 -42526.5 85052.9 41665
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## str_ID (Intercept) 1e+06 1000
## Number of obs: 41670, groups: str_ID, 4167
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.76924 15.49133 -0.179 0.858
## STAU1 -0.07329 0.06863 -1.068 0.286
## REST1 -0.38499 0.09723 -3.960 7.50e-05 ***
## Sohlenbrei 0.15680 0.03780 4.148 3.36e-05 ***
## Breaks_Dis -0.02052 0.01682 -1.220 0.222
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Note also that there is a one-step way to carry out the regression with newer versions of glmmTMB:
glmm.TMB.fixed = glmmTMB(Loc ~ STAU1 + REST1 + Sohlenbrei +
Breaks_Dis + (1|str_ID),
family=poisson, data=dat,
map=list(theta=factor(c(NA))),
start=list(theta=c(log(1e3))))
The two-step procedure with independent random effect (D=“UN(1)”):
r.Twostep <- Ts.estim(formula = Loc ~ STAU1 + REST1 + Sohlenbrei + strata(NA_ID) + Breaks_Dis +
cluster(NA_ANIMAL) ,data = dat, random = ~ STAU1 + REST1 + Sohlenbrei ,
all.m.1=F, D="UN(1)")
Slope estiamtes and standard errors
r.Twostep$beta
## STAU1 REST1 Sohlenbrei Breaks_Dis
## 0.04317714 -0.24426655 0.09830315 0.02651010
r.Twostep$se
## STAU1 REST1 Sohlenbrei Breaks_Dis
## 0.17224014 0.24010285 0.11712502 0.01675094
Variance estimates
r.Twostep$D
## STAU1 REST1 Sohlenbrei Breaks_Dis
## STAU1 0.1740639 0.0000000 0.00000000 0
## REST1 0.0000000 0.3478413 0.00000000 0
## Sohlenbrei 0.0000000 0.0000000 0.07741911 0
## Breaks_Dis 0.0000000 0.0000000 0.00000000 0
To fit the model with random slopes in INLA, we need to generate new (but identical) variables of individual ID (ID cannot be used multiple times in the model formula):
dat$ANIMAL_ID1 <- dat$ANIMAL_ID
dat$ANIMAL_ID2 <- dat$ANIMAL_ID
dat$ANIMAL_ID3 <- dat$ANIMAL_ID
Set the model formula as for the fixed-effects model, but now add three random slope terms, namely for river width and for the two levels of the categorical variable (STAU1 and REST1) which are not the reference categories. The priors for precision of the three random slopes are PC(3,0.05), while the intercept variance is again fixed:
formula.random <- Loc ~ -1 + STAU1 + REST1 + Sohlenbrei +
Breaks_Dis +
f(str_ID,model="iid",hyper=list(theta = list(initial=log(1e-6),fixed=T))) +
f(ANIMAL_ID1,Sohlenbrei,values=1:9,model="iid",
hyper=list(theta=list(initial=log(1),fixed=F,prior="pc.prec",param=c(3,0.05)))) +
f(ANIMAL_ID2,STAU1,values=1:9,model="iid",
hyper=list(theta=list(initial=log(1),fixed=F,prior="pc.prec",param=c(3,0.05)))) +
f(ANIMAL_ID3,REST1,values=1:9,model="iid",
hyper=list(theta=list(initial=log(1),fixed=F,prior="pc.prec",param=c(3,0.05))))
Fit the model
r.inla.random <- inla(formula.random, family ="Poisson", data=dat,
control.fixed = list(
mean = mean.beta,
prec = list(default = prec.beta)
)
)
The summary for the posterior distribution of the fixed effects:
r.inla.random$summary.fixed
## mean sd 0.025quant 0.5quant 0.975quant
## STAU1 0.02286614 0.18305962 -0.32311920 0.01404792 0.41644755
## REST1 -0.33412207 0.21707728 -0.76709956 -0.33615906 0.11162251
## Sohlenbrei 0.11148622 0.14265833 -0.18973420 0.11585016 0.38697776
## Breaks_Dis -0.01506916 0.01679532 -0.04845797 -0.01492708 0.01750034
## mode kld
## STAU1 -0.001813017 8.728272e-07
## REST1 -0.339055180 9.145451e-07
## Sohlenbrei 0.122774074 2.038647e-07
## Breaks_Dis -0.014643913 2.073831e-07
Since variances are parameterized and treated as precisions, the summary of the respective posterior distributions is given for the precisions:
r.inla.random$summary.hyperpar
## mean sd 0.025quant 0.5quant 0.975quant
## Precision for ANIMAL_ID1 13.439705 12.053290 2.0926356 10.025368 45.29953
## Precision for ANIMAL_ID2 9.303474 9.826404 1.2568431 6.389658 34.93340
## Precision for ANIMAL_ID3 7.119067 7.462072 0.9627068 4.909348 26.62829
## mode
## Precision for ANIMAL_ID1 5.396810
## Precision for ANIMAL_ID2 3.170836
## Precision for ANIMAL_ID3 2.439278
Source R functions for calculating posterior means and medians of the precisions.
source("inla_emarginal.R")
source("inla_mmarginal.R")
inla_emarginal(r.inla.random)
## Mean of variance for ANIMAL_ID1 Mean of variance for ANIMAL_ID2
## 0.1372604 0.2200796
## Mean of variance for ANIMAL_ID3
## 0.2867388
inla_mmarginal(r.inla.random)
## Mode of variance for ANIMAL_ID1 Mode of variance for ANIMAL_ID2
## 0.05448438 0.07527460
## Mode of variance for ANIMAL_ID3
## 0.09845147
Now the same model using glmmTMB(). Again start to set up the model without fitting it:
TMBStruc = glmmTMB(Loc ~ -1 + STAU1 + REST1 + Sohlenbrei +
Breaks_Dis + (1|str_ID) +
(0 + STAU1 | ANIMAL_ID) +
(0 + REST1 | ANIMAL_ID) +
(0 + Sohlenbrei | ANIMAL_ID),
family=poisson, data=dat, doFit=FALSE)
Set the value of the standard deviation of the first random effect (here (1|str_ID)):
TMBStruc$parameters$theta[1] = log(1e3)
Tell glmmTMB not to change the first standard deviation, all other values are freely estimated (and are different from each other)
TMBStruc$mapArg = list(theta=factor(c(NA,1:3)))
Fit the model and look at the summary:
glmm.TMB.random <- glmmTMB:::fitTMB(TMBStruc)
summary(glmm.TMB.random)
## Family: poisson ( log )
## Formula:
## Loc ~ -1 + STAU1 + REST1 + Sohlenbrei + Breaks_Dis + (1 | str_ID) +
## (0 + STAU1 | ANIMAL_ID) + (0 + REST1 | ANIMAL_ID) + (0 +
## Sohlenbrei | ANIMAL_ID)
## Data: dat
##
## AIC BIC logLik deviance df.resid
## 85026.6 85087.1 -42506.3 85012.6 41663
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## str_ID (Intercept) 1.000e+06 1000.0000
## ANIMAL_ID STAU1 7.375e-02 0.2716
## ANIMAL_ID.1 REST1 1.020e-01 0.3194
## ANIMAL_ID.2 Sohlenbrei 7.062e-02 0.2657
## Number of obs: 41670, groups: str_ID, 4167; ANIMAL_ID, 9
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## STAU1 -0.004082 0.136876 -0.030 0.9762
## REST1 -0.350802 0.161325 -2.175 0.0297 *
## Sohlenbrei 0.122937 0.114338 1.075 0.2823
## Breaks_Dis -0.015105 0.016801 -0.899 0.3686
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
95% CIs for fixed and random effects (standard deviations) are obtained via the confint() function:
confint(glmm.TMB.random)
## 2.5 % 97.5 % Estimate
## cond.STAU1 -0.27235446 0.26419106 -0.004081702
## cond.REST1 -0.66699367 -0.03461004 -0.350801856
## cond.Sohlenbrei -0.10116157 0.34703532 0.122936875
## cond.Breaks_Dis -0.04803470 0.01782379 -0.015105456
## ANIMAL_ID.cond.Std.Dev.STAU1 0.09196532 0.80191477 0.271566470
## ANIMAL_ID.1.cond.Std.Dev.REST1 0.09636594 1.05838186 0.319361813
## ANIMAL_ID.2.cond.Std.Dev.Sohlenbrei 0.13332008 0.52972521 0.265749893
Note: It it has been a problem in a previous version of glmmTMB that the confint() function only showed a table of the length equal to the number of parameters estimated. As the variance for str_ID was not estimated but fixed to 10^6, but was still listed, the last variance component (here the one for (0 + Sohlenbrei | ANIMAL_ID)) was not shown. If you face this problem, you can solve the issue by moving the component (1|str_ID) to the last position in the formula, and then use
TMBStruc$parameters$theta[4] = log(1e3)
TMBStruc$mapArg = list(theta=factor(c(1:3, NA)))
in the above code.
Note also that there is a one-step way to carry out the regression with newer versions of glmmTMB:
glmm.TMB.random = glmmTMB(Loc ~ -1 + STAU1 + REST1 + Sohlenbrei +
Breaks_Dis + (1|str_ID) +
(0 + STAU1 | ANIMAL_ID) +
(0 + REST1 | ANIMAL_ID) +
(0 + Sohlenbrei | ANIMAL_ID),
family=poisson, data=dat,
map=list(theta=factor(c(NA,1:3))),
start=list(theta=c(log(1e3),0,0,0))
)
knitr::opts_chunk$set(error = TRUE)
Remove all strata of animal 6 where at least one point (used or available) lied in residual water (REST1) habitat; keep only strata that exclusively contain natural (NAT1) and reservoir (STAU1) points
strata to be removed:
strata.remove <- unique(dat[which(dat$ANIMAL_ID=="6" & dat$REST1==1),"NA_ID"])
Keep only strata that are not in this list; store in a reduced dataset:
dat.red <- dat[!(dat$NA_ID%in%strata.remove),]
Number of total data points (used or available) that are removed (from the 41670 in total):
nrow(dat) - nrow(dat.red)
## [1] 120
Try to start the two-step procedure, but it does not run:
# Ts.estim(formula = Loc ~ STAU1 + REST1 + Sohlenbrei + strata(NA_ID) + Breaks_Dis +
# cluster(NA_ANIMAL) ,data = dat.red, random = ~ STAU1 + REST1 + Sohlenbrei ,
# all.m.1=F, D="UN(1)")
On the other hand, the same dataset is no problem for the Poisson model:
TMBStruc = glmmTMB(Loc ~ -1 + STAU1 + REST1 + Sohlenbrei +
Breaks_Dis + (1|str_ID) +
(0 + STAU1 | ANIMAL_ID) +
(0 + REST1 | ANIMAL_ID) +
(0 + Sohlenbrei | ANIMAL_ID),
family=poisson, data=dat.red, doFit=FALSE)
TMBStruc$parameters$theta[1] = log(1e3)
TMBStruc$mapArg = list(theta=factor(c(NA,1:3)))
glmm.TMB.random <- glmmTMB:::fitTMB(TMBStruc)
summary(glmm.TMB.random)
## Family: poisson ( log )
## Formula:
## Loc ~ -1 + STAU1 + REST1 + Sohlenbrei + Breaks_Dis + (1 | str_ID) +
## (0 + STAU1 | ANIMAL_ID) + (0 + REST1 | ANIMAL_ID) + (0 +
## Sohlenbrei | ANIMAL_ID)
## Data: dat.red
##
## AIC BIC logLik deviance df.resid
## 84782.6 84843.1 -42384.3 84768.6 41543
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## str_ID (Intercept) 1.000e+06 1000.0000
## ANIMAL_ID STAU1 6.297e-02 0.2509
## ANIMAL_ID.1 REST1 1.068e-01 0.3268
## ANIMAL_ID.2 Sohlenbrei 7.061e-02 0.2657
## Number of obs: 41550, groups: str_ID, 4155; ANIMAL_ID, 9
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## STAU1 -0.002667 0.133579 -0.020 0.9841
## REST1 -0.364769 0.166769 -2.187 0.0287 *
## Sohlenbrei 0.121956 0.114456 1.065 0.2866
## Breaks_Dis -0.016119 0.016840 -0.957 0.3385
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
devtools::session_info()
## ─ Session info ───────────────────────────────────────────────────────────────
## setting value
## version R version 3.6.3 (2020-02-29)
## os Ubuntu 18.04.6 LTS
## system x86_64, linux-gnu
## ui X11
## language de_CH:de
## collate de_CH.UTF-8
## ctype de_CH.UTF-8
## tz Europe/Zurich
## date 2021-12-10
##
## ─ Packages ───────────────────────────────────────────────────────────────────
## package * version date lib source
## assertthat 0.2.1 2019-03-21 [2] CRAN (R 3.4.4)
## backports 1.1.5 2019-10-02 [1] CRAN (R 3.6.3)
## boot 1.3-24 2019-12-20 [1] CRAN (R 3.6.3)
## callr 3.2.0 2019-03-15 [2] CRAN (R 3.4.4)
## cli 1.1.0 2019-03-19 [2] CRAN (R 3.4.4)
## coda 0.19-2 2018-10-08 [2] CRAN (R 3.4.4)
## codetools 0.2-16 2018-12-24 [4] CRAN (R 3.6.3)
## crayon 1.3.4 2017-09-16 [2] CRAN (R 3.4.2)
## desc 1.2.0 2018-05-01 [2] CRAN (R 3.4.4)
## devtools 2.1.0 2019-07-03 [2] Github (hadley/devtools@4e08802)
## digest 0.6.25 2020-02-23 [1] CRAN (R 3.6.3)
## emmeans 1.3.5.1 2019-06-26 [2] CRAN (R 3.4.4)
## estimability 1.3 2018-02-11 [2] CRAN (R 3.4.4)
## evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.1)
## foreach * 1.4.4 2017-12-12 [2] CRAN (R 3.4.4)
## fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.0)
## glmmTMB * 1.1.2.3 2021-09-20 [1] CRAN (R 3.6.3)
## glue 1.3.2 2020-03-12 [1] CRAN (R 3.6.3)
## highr 0.8 2019-03-20 [1] CRAN (R 3.6.1)
## htmltools 0.3.6 2017-04-28 [1] CRAN (R 3.6.0)
## INLA * 20.03.17 2020-11-20 [1] local
## iterators 1.0.10 2018-07-13 [2] CRAN (R 3.4.4)
## knitr 1.31 2021-01-27 [1] CRAN (R 3.6.3)
## lattice 0.20-38 2018-11-04 [2] CRAN (R 3.6.0)
## lme4 1.1-21 2019-03-05 [2] CRAN (R 3.6.0)
## magrittr 1.5 2014-11-22 [2] CRAN (R 3.4.0)
## MASS 7.3-53 2020-09-09 [1] CRAN (R 3.6.3)
## Matrix * 1.2-17 2019-03-22 [2] CRAN (R 3.6.0)
## memoise 1.1.0 2017-04-21 [2] CRAN (R 3.4.2)
## minqa 1.2.4 2014-10-09 [2] CRAN (R 3.6.0)
## multcomp 1.4-8 2017-11-08 [2] CRAN (R 3.4.4)
## mvtnorm 1.0-11 2019-06-19 [1] CRAN (R 3.6.1)
## nlme 3.1-149 2020-08-23 [4] CRAN (R 3.6.3)
## nloptr 1.2.1 2018-10-03 [2] CRAN (R 3.6.0)
## numDeriv 2016.8-1 2016-08-27 [2] CRAN (R 3.4.4)
## pkgbuild 1.0.3 2019-03-20 [2] CRAN (R 3.4.4)
## pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.0)
## prettyunits 1.1.1 2020-01-24 [1] CRAN (R 3.6.3)
## processx 3.4.0 2019-07-03 [1] CRAN (R 3.6.0)
## ps 1.3.2 2020-02-13 [1] CRAN (R 3.6.3)
## R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.3)
## Rcpp 1.0.4 2020-03-17 [1] CRAN (R 3.6.3)
## remotes 2.2.0 2020-07-21 [1] CRAN (R 3.6.3)
## rlang 0.4.7 2020-07-09 [1] CRAN (R 3.6.3)
## rmarkdown 2.6 2020-12-14 [1] CRAN (R 3.6.3)
## rprojroot 1.3-2 2018-01-03 [2] CRAN (R 3.4.4)
## sandwich 2.4-0 2017-07-26 [2] CRAN (R 3.4.2)
## sessioninfo 1.1.1 2018-11-05 [2] CRAN (R 3.4.4)
## sp * 1.4-4 2020-10-07 [1] CRAN (R 3.6.3)
## stringi 1.4.6 2020-02-17 [1] CRAN (R 3.6.3)
## stringr 1.4.0 2019-02-10 [1] CRAN (R 3.6.1)
## survival * 2.44-1.1 2019-04-01 [2] CRAN (R 3.6.0)
## testthat 2.1.1 2019-04-23 [1] CRAN (R 3.6.0)
## TH.data 1.0-9 2018-07-10 [2] CRAN (R 3.4.4)
## TMB 1.7.15 2018-11-09 [2] CRAN (R 3.6.0)
## TwoStepCLogit * 1.2.5 2016-03-21 [2] CRAN (R 3.4.0)
## usethis 1.5.0 2019-04-07 [2] CRAN (R 3.4.4)
## withr 2.1.2 2018-03-15 [2] CRAN (R 3.4.4)
## xfun 0.20 2021-01-06 [1] CRAN (R 3.6.3)
## xtable 1.8-4 2019-04-21 [1] CRAN (R 3.6.3)
## yaml 2.2.1 2020-02-01 [1] CRAN (R 3.6.3)
## zoo 1.8-6 2019-05-28 [1] CRAN (R 3.6.1)
##
## [1] /home/steffi/R/x86_64-pc-linux-gnu-library/3.6
## [2] /usr/local/lib/R/site-library
## [3] /usr/lib/R/site-library
## [4] /usr/lib/R/library