Habitat selection of otters (an SSF analysis)

Load libraries and read in data
- Some data manipulation:
Fixed effects models
- clogit
- INLA
- glmmTMB
Random effects models
- 2StepCLogit
- INLA
- glmmTMB
Illustration of convergence problem for 2StepCLogit
Session Info

Purpose: This code replicates the analysis presented in Muff, Signer, Fieberg (2019) Section 4.2 “Habitat selection of otters: an SSF analysis”.

Load libraries and read in data

library(survival)
library(TwoStepCLogit)
library(INLA)
library(glmmTMB)
options(width=150)
dat <-  read.csv("d_otter.csv")
str(dat)

## 'data.frame':    41670 obs. of  8 variables:
##  $ NA_ANIMAL : Factor w/ 9 levels "Alena","Baukje",..: 4 4 4 4 4 4 4 4 4 4 ...
##  $ NA_ID     : int  11624 11624 11624 11624 11624 11624 11624 11624 11624 12131 ...
##  $ Loc       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Sohlenbrei: int  5 5 5 5 5 5 5 5 5 7 ...
##  $ NAT1      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ REST1     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ STAU1     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Breaks_Dis: num  109.6 138.9 156.8 208.3 30.4 ...

NAT1, REST1 and STAU1 are the three factor levels of the factor variable habitat type, encoded as dummy variables, where

NAT1: natural habitat (reference category)
REST1: residual water
STAU1: a reservoir

Further, the two continuous variables in the model are:

Sohlbrei: the river width
Breaks_Dis: step length

Finally, Loc is the binary response variable that indicates if a habitat point was used (1) or available (0).

Some data manipulation:

Add numerical variable for animals:

dat$ANIMAL_ID <- as.numeric(dat$NA_ANIMAL)

Stratum ID is given as “NA_ID” in the data; It is easier to have sequential enumeration, so let’s generate a new stratum-ID variable str_ID:

d.map <- data.frame(NA_ID=unique(dat$NA_ID),str_ID=1:length(unique(dat$NA_ID)))
dat$str_ID <- d.map[match(dat$NA_ID,d.map$NA_ID),"str_ID"]
dat <- dat[order(dat$str_ID),]

Scale and center the two continuous variables river width (Sohlenbrei) and step length (Breaks_Dis)

dat$Sohlenbrei <- scale(dat$Sohlenbrei)
dat$Breaks_Dis <- scale(dat$Breaks_Dis)

Fixed effects models

clogit

r.clogit <- clogit(Loc ~ STAU1 + REST1 + Sohlenbrei + Breaks_Dis  
                   +   strata(str_ID), data=dat) 

summary(r.clogit)$coef

##                   coef exp(coef)   se(coef)         z     Pr(>|z|)
## STAU1      -0.07328577 0.9293352 0.06862720 -1.067882 2.855736e-01
## REST1      -0.38498046 0.6804639 0.09722675 -3.959614 7.507086e-05
## Sohlenbrei  0.15680599 1.1697686 0.03780309  4.147967 3.354405e-05
## Breaks_Dis -0.02051501 0.9796940 0.01681737 -1.219871 2.225139e-01

INLA

Set mean and precision for the priors of slope coefficients

mean.beta <- 0
prec.beta <- 1e-4

The model formula for INLA, where we set the stratum-specific intercept variance to \(10^6\) (or rather: the precision to \(10^{-6}\)), is given as follows:

formula.fixed <-  Loc ~  STAU1 + REST1 + Sohlenbrei +  Breaks_Dis +
  f(str_ID,model="iid",hyper=list(theta=list(initial=log(1e-6),fixed=T)))

Then fit the Poisson model

r.inla.fixed <- inla(formula.fixed, family ="Poisson", data=dat,
                       control.fixed = list(
                         mean = mean.beta,
                         prec = list(default = prec.beta)
                       )
                )

The summary for the posterior distribution of the fixed effects:

r.inla.fixed$summary.fixed

##                    mean          sd   0.025quant    0.5quant  0.975quant
## (Intercept) -2.93844954 15.96884734 -34.29067946 -2.93889925 28.38761226
## STAU1       -0.07304312  0.06864839  -0.20811387 -0.07294540  0.06135260
## REST1       -0.38476793  0.09725682  -0.57720188 -0.38426128 -0.19533048
## Sohlenbrei   0.15691572  0.03781281   0.08255463  0.15695646  0.23097784
## Breaks_Dis  -0.02062429  0.01682072  -0.05406316 -0.02048214  0.01199488
##                    mode          kld
## (Intercept) -2.93844954 7.386095e-12
## STAU1       -0.07274433 1.012655e-06
## REST1       -0.38324605 6.157114e-07
## Sohlenbrei   0.15704114 9.652643e-07
## Breaks_Dis  -0.02019886 2.084364e-07

glmmTMB

Note that we have now have to manually fix the variance of the intercept first.

Start by setting up the model, but do not yet fit it:

TMBStruc.fix = glmmTMB(Loc ~ STAU1 + REST1 + Sohlenbrei +
                         Breaks_Dis +  (1|str_ID), 
                       family=poisson, data=dat, doFit=FALSE)

Then fix the standard deviation of the first random term, which is the (1|str_ID) component in the above model equation:

TMBStruc.fix$parameters$theta[1] = log(1e3)

We need to tell glmmTMB not to change the variance by setting it to NA:

TMBStruc.fix$mapArg = list(theta=factor(c(NA)))

Then fit the model and look at the results:

glmm.TMB.fixed = glmmTMB:::fitTMB(TMBStruc.fix) 
summary(glmm.TMB.fixed)

##  Family: poisson  ( log )
## Formula:          
## Loc ~ STAU1 + REST1 + Sohlenbrei + Breaks_Dis + (1 | str_ID)
## Data: dat
## 
##      AIC      BIC   logLik deviance df.resid 
##  85062.9  85106.1 -42526.5  85052.9    41665 
## 
## Random effects:
## 
## Conditional model:
##  Groups Name        Variance Std.Dev.
##  str_ID (Intercept) 1e+06    1000    
## Number of obs: 41670, groups:  str_ID, 4167
## 
## Conditional model:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -2.76924   15.49133  -0.179    0.858    
## STAU1       -0.07329    0.06863  -1.068    0.286    
## REST1       -0.38499    0.09723  -3.960 7.50e-05 ***
## Sohlenbrei   0.15680    0.03780   4.148 3.36e-05 ***
## Breaks_Dis  -0.02052    0.01682  -1.220    0.222    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Random effects models

2StepCLogit

The two-step procedure with independent random effect (D=“UN(1)”):

r.Twostep <-  Ts.estim(formula = Loc ~  STAU1 + REST1 + Sohlenbrei   + strata(NA_ID) + Breaks_Dis +
                     cluster(NA_ANIMAL) ,data = dat, random = ~ STAU1 + REST1 + Sohlenbrei ,
                   all.m.1=F, D="UN(1)")

Slope estiamtes and standard errors

r.Twostep$beta

##       STAU1       REST1  Sohlenbrei  Breaks_Dis 
##  0.04317714 -0.24426655  0.09830315  0.02651010

r.Twostep$se

##      STAU1      REST1 Sohlenbrei Breaks_Dis 
## 0.17224014 0.24010285 0.11712502 0.01675094

Variance estimates

r.Twostep$D

##                STAU1     REST1 Sohlenbrei Breaks_Dis
## STAU1      0.1740639 0.0000000 0.00000000          0
## REST1      0.0000000 0.3478413 0.00000000          0
## Sohlenbrei 0.0000000 0.0000000 0.07741911          0
## Breaks_Dis 0.0000000 0.0000000 0.00000000          0

INLA

To fit the model with random slopes in INLA, we need to generate new (but identical) variables of individual ID (ID cannot be used multiple times in the model formula):

dat$ANIMAL_ID1 <- dat$ANIMAL_ID
dat$ANIMAL_ID2 <- dat$ANIMAL_ID
dat$ANIMAL_ID3 <- dat$ANIMAL_ID

Set the model formula as for the fixed-effects model, but now add three random slope terms, namely for river width and for the two levels of the categorical variable (STAU1 and REST1) which are not the reference categories. The priors for precision of the three random slopes are PC(3,0.05), while the intercept variance is again fixed:

formula.random <- Loc ~  -1 + STAU1 + REST1 + Sohlenbrei +  
  Breaks_Dis +
  f(str_ID,model="iid",hyper=list(theta = list(initial=log(1e-6),fixed=T))) +
  f(ANIMAL_ID1,Sohlenbrei,values=1:9,model="iid",
    hyper=list(theta=list(initial=log(1),fixed=F,prior="pc.prec",param=c(3,0.05)))) + 
  f(ANIMAL_ID2,STAU1,values=1:9,model="iid",
    hyper=list(theta=list(initial=log(1),fixed=F,prior="pc.prec",param=c(3,0.05)))) + 
  f(ANIMAL_ID3,REST1,values=1:9,model="iid",
    hyper=list(theta=list(initial=log(1),fixed=F,prior="pc.prec",param=c(3,0.05))))

Fit the model

r.inla.random <- inla(formula.random, family ="Poisson", data=dat, 
                      control.fixed = list(
                        mean = mean.beta,
                        prec = list(default = prec.beta)
                      )
)

The summary for the posterior distribution of the fixed effects:

r.inla.random$summary.fixed

##                   mean         sd  0.025quant    0.5quant 0.975quant
## STAU1       0.02286614 0.18305962 -0.32311920  0.01404792 0.41644755
## REST1      -0.33412207 0.21707728 -0.76709956 -0.33615906 0.11162251
## Sohlenbrei  0.11148622 0.14265833 -0.18973420  0.11585016 0.38697776
## Breaks_Dis -0.01506916 0.01679532 -0.04845797 -0.01492708 0.01750034
##                    mode          kld
## STAU1      -0.001813017 8.728272e-07
## REST1      -0.339055180 9.145451e-07
## Sohlenbrei  0.122774074 2.038647e-07
## Breaks_Dis -0.014643913 2.073831e-07

Since variances are parameterized and treated as precisions, the summary of the respective posterior distributions is given for the precisions:

r.inla.random$summary.hyperpar

##                               mean        sd 0.025quant  0.5quant
## Precision for ANIMAL_ID1 13.439705 12.053290  2.0926356 10.025368
## Precision for ANIMAL_ID2  9.303474  9.826404  1.2568431  6.389658
## Precision for ANIMAL_ID3  7.119067  7.462072  0.9627068  4.909348
##                          0.975quant     mode
## Precision for ANIMAL_ID1   45.29953 5.396810
## Precision for ANIMAL_ID2   34.93340 3.170836
## Precision for ANIMAL_ID3   26.62829 2.439278

Source R functions for calculating posterior means and medians of the precisions.

source("inla_emarginal.R")
source("inla_mmarginal.R")
inla_emarginal(r.inla.random)

## Mean of variance for ANIMAL_ID1 Mean of variance for ANIMAL_ID2 
##                       0.1372604                       0.2200796 
## Mean of variance for ANIMAL_ID3 
##                       0.2867388

inla_mmarginal(r.inla.random)

## Mode of variance for ANIMAL_ID1 Mode of variance for ANIMAL_ID2 
##                      0.05448438                      0.07527460 
## Mode of variance for ANIMAL_ID3 
##                      0.09845147

glmmTMB

Now the same model using glmmTMB(). Again start to set up the model without fitting it:

TMBStruc = glmmTMB(Loc ~ -1 + STAU1 + REST1 + Sohlenbrei +  
                     Breaks_Dis +  (1|str_ID) + 
                     (0 + STAU1 | ANIMAL_ID) + 
                     (0 + REST1 | ANIMAL_ID)   + 
                     (0 + Sohlenbrei | ANIMAL_ID), 
                   family=poisson, data=dat, doFit=FALSE)

Set the value of the standard deviation of the first random effect (here (1|str_ID)):

TMBStruc$parameters$theta[1] = log(1e3)

Tell glmmTMB not to change the first standard deviation, all other values are freely estimated (and are different from each other)

TMBStruc$mapArg = list(theta=factor(c(NA,1:3)))

Fit the model and look at the summary:

glmm.TMB.random <- glmmTMB:::fitTMB(TMBStruc)
summary(glmm.TMB.random)

##  Family: poisson  ( log )
## Formula:          
## Loc ~ -1 + STAU1 + REST1 + Sohlenbrei + Breaks_Dis + (1 | str_ID) +  
##     (0 + STAU1 | ANIMAL_ID) + (0 + REST1 | ANIMAL_ID) + (0 +  
##     Sohlenbrei | ANIMAL_ID)
## Data: dat
## 
##      AIC      BIC   logLik deviance df.resid 
##  85026.6  85087.1 -42506.3  85012.6    41663 
## 
## Random effects:
## 
## Conditional model:
##  Groups      Name        Variance  Std.Dev. 
##  str_ID      (Intercept) 1.000e+06 1000.0000
##  ANIMAL_ID   STAU1       7.375e-02    0.2716
##  ANIMAL_ID.1 REST1       1.020e-01    0.3194
##  ANIMAL_ID.2 Sohlenbrei  7.062e-02    0.2657
## Number of obs: 41670, groups:  str_ID, 4167; ANIMAL_ID, 9
## 
## Conditional model:
##             Estimate Std. Error z value Pr(>|z|)  
## STAU1      -0.004082   0.136876  -0.030   0.9762  
## REST1      -0.350802   0.161325  -2.175   0.0297 *
## Sohlenbrei  0.122937   0.114338   1.075   0.2823  
## Breaks_Dis -0.015105   0.016801  -0.899   0.3686  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

95% CIs for fixed and random effects (standard deviations) are obtained via the confint() function:

confint(glmm.TMB.random)

##                                         2.5 %        97.5 %      Estimate
## cond.STAU1                        -0.27235446    0.26419106 -4.081702e-03
## cond.REST1                        -0.66699367   -0.03461004 -3.508019e-01
## cond.Sohlenbrei                   -0.10116157    0.34703532  1.229369e-01
## cond.Breaks_Dis                   -0.04803470    0.01782379 -1.510546e-02
## str_ID.cond.Std.Dev.(Intercept) 1000.00000000 1000.00000000  1.000000e+03
## ANIMAL_ID.cond.Std.Dev.STAU1       0.09196532    0.80191477  2.715665e-01
## ANIMAL_ID.1.cond.Std.Dev.REST1     0.09636594    1.05838186  3.193618e-01

Note: It is currently a problem of glmmTMB that the confint() function only shows a table of the length equal to the number of parameters estimated. As the variance for str_ID was not estimated but fixed to 10^6, but is still listed, the last variance component (here the one for (0 + Sohlenbrei | ANIMAL_ID)) is not shown. This can be solved by moving the component (1|str_ID) to the last position in the formula, and then use

TMBStruc$parameters$theta[4] = log(1e3) 
TMBStruc$mapArg = list(theta=factor(c(1:3, NA)))

in the above code.

Illustration of convergence problem for 2StepCLogit

knitr::opts_chunk$set(error = TRUE)

Remove all strata of animal 6 where at least one point (used or available) lied in residual water (REST1) habitat; keep only strata that exclusively contain natural (NAT1) and reservoir (STAU1) points

strata to be removed:

strata.remove <- unique(dat[which(dat$ANIMAL_ID=="6" & dat$REST1==1),"NA_ID"])

Keep only strata that are not in this list; store in a reduced dataset:

dat.red <- dat[!(dat$NA_ID%in%strata.remove),]

Number of total data points (used or available) that are removed (from the 41670 in total):

nrow(dat) - nrow(dat.red)

## [1] 120

Try to start the two-step procedure, but it does not run:

# Ts.estim(formula = Loc ~  STAU1 + REST1 + Sohlenbrei   + strata(NA_ID) + Breaks_Dis +
#   cluster(NA_ANIMAL) ,data = dat.red, random = ~ STAU1 + REST1 + Sohlenbrei ,
# all.m.1=F, D="UN(1)")

On the other hand, the same dataset is no problem for the Poisson model:

TMBStruc = glmmTMB(Loc ~ -1 + STAU1 + REST1 + Sohlenbrei +  
                     Breaks_Dis +  (1|str_ID) + 
                     (0 + STAU1 | ANIMAL_ID) + 
                     (0 + REST1 | ANIMAL_ID)   + 
                     (0 + Sohlenbrei | ANIMAL_ID), 
                   family=poisson, data=dat.red, doFit=FALSE) 

TMBStruc$parameters$theta[1] = log(1e3) 
TMBStruc$mapArg = list(theta=factor(c(NA,1:3)))
glmm.TMB.random <- glmmTMB:::fitTMB(TMBStruc)

summary(glmm.TMB.random)

##  Family: poisson  ( log )
## Formula:          
## Loc ~ -1 + STAU1 + REST1 + Sohlenbrei + Breaks_Dis + (1 | str_ID) +  
##     (0 + STAU1 | ANIMAL_ID) + (0 + REST1 | ANIMAL_ID) + (0 +  
##     Sohlenbrei | ANIMAL_ID)
## Data: dat.red
## 
##      AIC      BIC   logLik deviance df.resid 
##  84782.6  84843.1 -42384.3  84768.6    41543 
## 
## Random effects:
## 
## Conditional model:
##  Groups      Name        Variance  Std.Dev. 
##  str_ID      (Intercept) 1.000e+06 1000.0000
##  ANIMAL_ID   STAU1       6.297e-02    0.2509
##  ANIMAL_ID.1 REST1       1.068e-01    0.3268
##  ANIMAL_ID.2 Sohlenbrei  7.061e-02    0.2657
## Number of obs: 41550, groups:  str_ID, 4155; ANIMAL_ID, 9
## 
## Conditional model:
##             Estimate Std. Error z value Pr(>|z|)  
## STAU1      -0.002667   0.133579  -0.020   0.9841  
## REST1      -0.364769   0.166769  -2.187   0.0287 *
## Sohlenbrei  0.121956   0.114456   1.065   0.2866  
## Breaks_Dis -0.016119   0.016840  -0.957   0.3385  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Session Info

devtools::session_info()

##  setting  value                       
##  version  R version 3.4.4 (2018-03-15)
##  system   x86_64, linux-gnu           
##  ui       X11                         
##  language de_CH:de                    
##  collate  de_CH.UTF-8                 
##  tz       Europe/Zurich               
##  date     2019-06-17                  
## 
##  package       * version  date       source        
##  backports       1.1.4    2019-04-10 cran (@1.1.4) 
##  base          * 3.4.4    2018-03-16 local         
##  codetools       0.2-15   2016-10-05 CRAN (R 3.3.1)
##  compiler        3.4.4    2018-03-16 local         
##  datasets      * 3.4.4    2018-03-16 local         
##  Deriv           3.8.5    2018-06-11 CRAN (R 3.4.4)
##  devtools        1.13.4   2017-11-09 CRAN (R 3.4.2)
##  digest          0.6.18   2018-10-10 cran (@0.6.18)
##  evaluate        0.10.1   2017-06-24 CRAN (R 3.4.1)
##  glmmTMB       * 0.2.3    2019-01-11 CRAN (R 3.4.4)
##  graphics      * 3.4.4    2018-03-16 local         
##  grDevices     * 3.4.4    2018-03-16 local         
##  grid            3.4.4    2018-03-16 local         
##  htmltools       0.3.6    2017-04-28 CRAN (R 3.4.0)
##  INLA          * 18.07.12 2019-05-17 local         
##  knitr           1.17     2017-08-10 CRAN (R 3.4.1)
##  lattice         0.20-35  2017-03-25 CRAN (R 3.4.0)
##  lme4            1.1-20   2019-02-04 cran (@1.1-20)
##  magrittr        1.5      2014-11-22 CRAN (R 3.4.0)
##  MASS            7.3-47   2017-04-21 CRAN (R 3.4.2)
##  Matrix        * 1.2-14   2018-04-09 CRAN (R 3.4.4)
##  MatrixModels    0.4-1    2015-08-22 CRAN (R 3.4.0)
##  memoise         1.1.0    2017-04-21 CRAN (R 3.4.2)
##  methods       * 3.4.4    2018-03-16 local         
##  minqa           1.2.4    2014-10-09 CRAN (R 3.4.0)
##  nlme            3.1-137  2018-04-07 CRAN (R 3.4.4)
##  nloptr          1.2.1    2018-10-03 cran (@1.2.1) 
##  Rcpp            1.0.1    2019-03-17 cran (@1.0.1) 
##  rmarkdown       1.7      2017-11-10 CRAN (R 3.4.2)
##  rprojroot       1.3-2    2018-01-03 cran (@1.3-2) 
##  sp            * 1.2-5    2017-06-29 cran (@1.2-5) 
##  splines         3.4.4    2018-03-16 local         
##  stats         * 3.4.4    2018-03-16 local         
##  stringi         1.2.4    2018-07-20 cran (@1.2.4) 
##  stringr         1.3.1    2018-05-10 cran (@1.3.1) 
##  survival      * 2.41-3   2017-04-04 CRAN (R 3.4.0)
##  TMB             1.7.14   2018-06-23 CRAN (R 3.4.4)
##  tools           3.4.4    2018-03-16 local         
##  TwoStepCLogit * 1.2.5    2016-03-21 CRAN (R 3.4.0)
##  utils         * 3.4.4    2018-03-16 local         
##  withr           2.1.2    2018-03-15 CRAN (R 3.4.4)
##  yaml            2.1.14   2016-11-12 CRAN (R 3.4.1)

Habitat selection of otters (an SSF analysis)

S. Muff, J. Signer, J. Fieberg

17 Juni, 2019

Load libraries and read in data

Some data manipulation:

Fixed effects models

clogit

INLA

glmmTMB

Random effects models

2StepCLogit

INLA

glmmTMB

Illustration of convergence problem for 2StepCLogit

Session Info