The purpose of this document is to illustrate how simple RSF and SSF models with random effects can be fitted to tracking data. We use a data set containing fisher locations from the study "Martes pennanti LaPoint New York", available from the Movebank Data Repository (link below). First, load the required packages:
library(glmmTMB)
library(INLA)
library(tidyverse)
library(raster)
library(survival)
library(TwoStepCLogit)
library(amt)
First load the fisher data (this is the data as downloaded from Movebank). We simplified the fisher data slightly by running the following code prior to uploading the data (to save space). This code is not needed unless you download the original data from https://www.datarepository.movebank.org/handle/10255/move.330.
dat <- read_csv("Martes pennanti LaPoint New York.csv") %>%
filter(!is.na(`location-lat`)) %>%
select(x = `location-long`, y = `location-lat`,
t = `timestamp`, id = `tag-local-identifier`) %>%
filter(id %in% c(1465, 1466, 1072, 1078, 1016, 1469))
write_csv(dat, "Fisher_analysis/fisher_data.csv")
Now let's start by reading in the simplified fisher data:
dat <- read_csv("fisher_data.csv")
Include the sex of each animal and create tracks with an appropriate coordinate reference system using the amt package:
dat_all <- dat %>% nest(-id)
dat_all$sex <- c("f", "f", "f", "m", "m", "m")
dat_all <- dat_all %>%
mutate(trk = map(data, function(d) {
make_track(d, x, y, t, crs = sp::CRS("+init=epsg:4326")) %>%
transform_coords(sp::CRS("+init=epsg:5070"))
}))
Summarize the sampling rates:
dat_all %>% mutate(sr = lapply(trk, summarize_sampling_rate)) %>%
  select(id, sr) %>% unnest()
## # A tibble: 6 x 10
## id min q1 median mean q3 max sd n unit
## <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <chr>
## 1 1072 3.92 9.8 10.0 20.7 10.3 1650. 95.2 1348 min
## 2 1465 0.4 1.97 2.03 8.93 3.98 1080. 48.1 3003 min
## 3 1466 0.417 1.97 2.07 15.7 4.08 2167. 104. 1500 min
## 4 1078 1.35 9.78 10.0 21.7 10.3 2111. 87.2 1637 min
## 5 1469 0.417 1.97 2.17 13.3 5.42 2889. 90.2 2435 min
## 6 1016 0.1 1.93 2.03 8.04 2.57 1209. 44.0 8957 min
10 minutes seems to be appropriate for all animals. Resample the tracks to 10 minutes with a tolerance of 2 minutes:
dat1 <- dat_all %>% mutate(dat_clean = map(trk, ~ {
.x %>% track_resample(rate = minutes(10), tolerance = seconds(120))
}))
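As an optional check, we can look at how many locations per animal remain after resampling:
# optional check: number of retained locations per animal after resampling
dat1 %>% mutate(n = map_int(dat_clean, nrow)) %>% select(id, n)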
Read in the land-use raster and reclassify it into two categories (wet forests and other):
landuse <- raster("landuse_study_area.tif")
wet_forests <- landuse %in% c(90, 95)
names(wet_forests) <- "forest"
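As a quick sanity check, the cell counts of the reclassified binary layer can be tabulated:
# cell counts of the binary layer (0 = other, 1 = wet forest)
raster::freq(wet_forests)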
Now start with an RSF: create random points for each animal and extract the covariates for the observed and random points.
dat_rsf <- dat1 %>% mutate(rp = map(dat_clean, ~ .x %>% random_points() %>%
extract_covariates(wet_forests))) %>%
select(id, rp) %>% unnest()
Recode the id column to the integers 1 to 6:
dat_rsf$id <- as.numeric(factor(dat_rsf$id))
Make the response numeric (required for INLA):
dat_rsf$y <- as.numeric(dat_rsf$case_)
We use a weighted likelihood to fit the RSF. To this end, we need to create a variable for the weights, where used points (case_ = TRUE) keep weight 1 and available points (case_ = FALSE) obtain a large weight \(W\) (here \(W=1000\)):
dat_rsf$weight <- 1000^(1 - dat_rsf$case_)
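The expression 1000^(1 - case_) evaluates to 1 when case_ is TRUE and to 1000 when it is FALSE; an equivalent, perhaps more explicit, assignment is:
# equivalent: used points keep weight 1, available points receive weight 1000
dat_rsf$weight <- ifelse(dat_rsf$case_, 1, 1000)
table(dat_rsf$case_, dat_rsf$weight)  # quick check of the assignment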
As explained in the manuscript (Section 3.4), we recommend manually fixing the variance of the random intercept at a large value. This can be done in glmmTMB() by first setting up the model without fitting it:
fisher.tmp <- glmmTMB(case_ ~ forest + (1 | id) + (0 + forest | id), family = binomial(),
                      data = dat_rsf, doFit = FALSE, weights = weight)
Then fix the standard deviation of the first random term, which is the (1|id) component in the above model equation. We use \(\sigma=10^3\), which corresponds to a variance of \(10^6\):
fisher.tmp$parameters$theta[1] <- log(1e3)
We need to tell glmmTMB not to change the first entry of the vector of variances, and give all other variances another indicator to make sure they can be freely estimated:
fisher.tmp$mapArg <- list(theta=factor(c(NA, 1)))
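As a purely hypothetical illustration of how the map argument generalizes: if the model contained a second random slope, say for a covariate elevation (not available in this data set), the fixed intercept variance would still receive NA and each freely estimated variance its own factor level:
# hypothetical sketch only -- 'elevation' is not part of this data set
# tmp <- glmmTMB(case_ ~ forest + elevation + (1 | id) + (0 + forest | id) +
#                  (0 + elevation | id),
#                family = binomial(), data = dat_rsf, doFit = FALSE, weights = weight)
# tmp$parameters$theta[1] <- log(1e3)              # fix the SD of (1 | id)
# tmp$mapArg <- list(theta = factor(c(NA, 1, 2)))  # estimate the two slope SDs freely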
Then fit the model and look at the results:
fisher.rsf <- glmmTMB:::fitTMB(fisher.tmp)
summary(fisher.rsf)
## Family: binomial ( logit )
## Formula: case_ ~ forest + (1 | id) + (0 + forest | id)
## Data: dat_rsf
## Weights: weight
##
## AIC BIC logLik deviance df.resid
## 167133.4 167161.6 -83563.7 167127.4 90978
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## id (Intercept) 1.000e+06 1000.0000
## id.1 forest 2.569e-01 0.5068
## Number of obs: 90981, groups: id, 6
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.000185 408.280851 0.000 1
## forest 0.902882 0.209202 4.316 1.59e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
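If desired, the individual-specific deviations of the forest coefficient from its population mean can be inspected with ranef() (output not shown):
# individual-level random effects from the conditional model; the 'forest'
# column gives each animal's deviation from the mean forest coefficient
ranef(fisher.rsf)$cond$id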
Let us now carry out the same analysis, with random intercept \(\mathsf{N}(0,\sigma_{id}^2)\) and fixed variance \(\sigma_{id}^2=10^6\), using INLA. A peculiarity of INLA is that the same variable cannot be used more than once in a formula, so for the animal ID we need to generate a new (but identical) variable:
dat_rsf$id1 <-dat_rsf$id
For the fixed effects we use the INLA (default) priors \(\beta \sim \mathsf{N}(0,\sigma_\beta^2)\) with \(\sigma_\beta^2=10^4\). The precisions of the priors are thus set to:
prec.beta.forest <- 1e-4
We now store the INLA formula with the fixed effect forest, plus two random effects: one for the individual-specific intercept and one for the individual-specific slope for forest. Note that the precision (i.e., \(1/\sigma^2\)) for id is fixed (fixed=TRUE) at the value \(10^{-6}\) (thus the variance is fixed at \(10^6\)). The other precision is given a PC(1,0.05) prior, which encodes \(P(\sigma > 1) = 0.05\):
formula.inla <- y ~ forest +
  f(id, model = "iid", hyper = list(theta = list(initial = log(1e-6), fixed = TRUE))) +
  f(id1, forest, values = 1:6, model = "iid",
    hyper = list(theta = list(initial = log(1), fixed = FALSE, prior = "pc.prec", param = c(1, 0.05))))
The actual INLA call is then given as follows:
inla.setOption(enable.inla.argument.weights=TRUE)
fisher.inla <- inla(formula.inla, family ="binomial", data=dat_rsf, weights=dat_rsf$weight,
control.fixed = list(
mean = 0,
prec = list(forest = prec.beta.forest)
)
)
The summary for the posterior distribution of the fixed effects is given as follows:
fisher.inla$summary.fixed
## mean sd 0.025quant 0.5quant 0.975quant
## (Intercept) -9.4834515 418.0462496 -830.2491981 -9.4952241 810.597320
## forest 0.9023449 0.2522929 0.3916488 0.9030925 1.408816
## mode kld
## (Intercept) -9.4834515 2.009582e-14
## forest 0.9046724 5.460981e-04
Since INLA parameterizes variances as precisions, the summary of the respective posterior distributions is given on the precision scale:
fisher.inla$summary.hyperpar
## mean sd 0.025quant 0.5quant 0.975quant mode
## Precision for id1 3.57532 2.043031 0.8818724 3.172212 8.681721 3.177542
Source R functions for calculating the posterior mean and mode of the variances (obtained by transforming the marginal posteriors of the precisions):
source("inla_emarginal.R")
source("inla_mmarginal.R")
inla_emarginal(fisher.inla)
## Mean of variance for id1
## 0.3826514
inla_mmarginal(fisher.inla)
## Mode of variance for id1
## 0.3139998
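These helper functions essentially transform the marginal posterior of each precision to the variance scale. As a sketch of the idea (using the fitted object fisher.inla from above), the posterior mean of the variance for id1 can also be computed directly with inla.emarginal():
# posterior mean of the variance, E[1/precision], from the marginal of the precision
inla.emarginal(function(x) 1/x,
               fisher.inla$marginals.hyperpar$`Precision for id1`)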
We now turn to the SSF. First we need to prepare the data: pair each observed step with 10 random steps and extract the covariate value at the end point of each step.
dat_ssf <- dat1 %>%
  mutate(stps = map(dat_clean, ~ .x %>% steps_by_burst() %>%
    random_steps() %>% extract_covariates(wet_forests))) %>%
  select(id, stps) %>% unnest() %>%
  mutate(
    y = as.numeric(case_),
    id = as.numeric(factor(id)),
    # unique stratum identifier: animal id and step id
    step_id = paste(id, step_id_, sep = "-"))
dat_ssf
## # A tibble: 58,608 x 16
## id burst_ step_id_ case_ x1_ y1_ x2_ y2_
## <dbl> <dbl> <int> <lgl> <dbl> <dbl> <dbl> <dbl>
## 1 2 4 1 TRUE 1.78e6 2.41e6 1.78e6 2.41e6
## 2 2 4 1 FALSE 1.78e6 2.41e6 1.78e6 2.41e6
## 3 2 4 1 FALSE 1.78e6 2.41e6 1.78e6 2.41e6
## 4 2 4 1 FALSE 1.78e6 2.41e6 1.78e6 2.41e6
## 5 2 4 1 FALSE 1.78e6 2.41e6 1.78e6 2.41e6
## 6 2 4 1 FALSE 1.78e6 2.41e6 1.78e6 2.41e6
## 7 2 4 1 FALSE 1.78e6 2.41e6 1.78e6 2.41e6
## 8 2 4 1 FALSE 1.78e6 2.41e6 1.78e6 2.41e6
## 9 2 4 1 FALSE 1.78e6 2.41e6 1.78e6 2.41e6
## 10 2 4 1 FALSE 1.78e6 2.41e6 1.78e6 2.41e6
## # … with 58,598 more rows, and 8 more variables: t1_ <dttm>, t2_ <dttm>,
## # dt_ <drtn>, sl_ <dbl>, ta_ <dbl>, forest <dbl>, y <dbl>, step_id <chr>
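Each observed step should now be matched with 10 random steps within its stratum, which can be verified quickly:
# all strata should contain 11 rows: 1 observed step plus 10 random steps
table(table(dat_ssf$step_id))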
The two-step procedure with independent random effects (D = "UN(1)"), fitted with Ts.estim() from the TwoStepCLogit package:
r.Twostep <- Ts.estim(formula = y ~ forest + strata(step_id) + cluster(id),
                      data = dat_ssf, random = ~ forest,
                      all.m.1 = FALSE, D = "UN(1)")
Slope estimate and standard error:
r.Twostep$beta
## forest
## 0.4943116
r.Twostep$se
## forest
## 0.1427203
Variance estimate:
r.Twostep$D
## forest
## forest 0.1070722
Now fit the same model using glmmTMB(). Note that we do not need an overall intercept in this model, because the stratum-specific intercepts are (almost) freely estimated due to the large, fixed variance. Again, start by setting up the model without fitting it:
TMBStruc <- glmmTMB(y ~ -1 + forest + (1|step_id) +
(0 + forest | id),
family=poisson, data = dat_ssf, doFit=FALSE)
Set the value of the standard deviation of the first random effect (here (1|step_id)):
TMBStruc$parameters$theta[1] <- log(1e3)
Tell glmmTMB not to change the first standard deviation; all other values are freely estimated (and may differ from each other):
TMBStruc$mapArg <- list(theta=factor(c(NA,1)))
Fit the model and look at the summary:
glmm.TMB.random <- glmmTMB:::fitTMB(TMBStruc)
summary(glmm.TMB.random)
## Family: poisson ( log )
## Formula: y ~ -1 + forest + (1 | step_id) + (0 + forest | id)
## Data: dat_ssf
##
## AIC BIC logLik deviance df.resid
## 109683.5 109701.5 -54839.8 109679.5 58606
##
## Random effects:
##
## Conditional model:
## Groups Name Variance Std.Dev.
## step_id (Intercept) 1.000e+06 1000.0000
## id forest 8.737e-02 0.2956
## Number of obs: 58608, groups: step_id, 5328; id, 6
##
## Conditional model:
## Estimate Std. Error z value Pr(>|z|)
## forest 0.4937 0.1306 3.78 0.000157 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
95% CIs for fixed and random effects (standard deviations) are obtained via the confint() function:
confint(glmm.TMB.random)
## 2.5 % 97.5 % Estimate
## cond.forest 0.2377399 0.7497029 0.4937214
## step_id.cond.Std.Dev.(Intercept) 1000.0000000 1000.0000000 1000.0000000
Note: It is currently a limitation of glmmTMB that the confint() function only shows a table whose length equals the number of estimated parameters. Because the variance for step_id was not estimated but fixed to \(10^6\), yet is still listed, the last variance component (here the one for (0 + forest | id)) is not shown. This can be solved by moving the component (1|step_id) to the last position in the formula, and then replacing the respective lines by
TMBStruc$parameters$theta[1] = log(1e3)
TMBStruc$mapArg = list(theta=factor(c(1, NA)))
in the above code.
Start by setting the precision for the priors of the slope coefficients (here for forest):
prec.beta.forest <- 1e-4
In the model formula for INLA, we set the stratum-specific intercept variance to \(10^6\) (or rather: the precision to \(10^{-6}\)) by fixing it (fixed=TRUE) to an initial value. The other precision is again given a PC(1,0.05) prior:
formula.random <- y ~ -1 + forest +
  f(step_id, model = "iid", hyper = list(theta = list(initial = log(1e-6), fixed = TRUE))) +
  f(id, forest, values = 1:6, model = "iid",
    hyper = list(theta = list(initial = log(1), fixed = FALSE, prior = "pc.prec", param = c(1, 0.05))))
Fit the model:
fisher.inla.ssf <- inla(formula.random, family ="Poisson", data=dat_ssf,
control.fixed = list(
mean = 0,
prec = list(default = prec.beta.forest)
)
)
The summary for the posterior distribution of the fixed effects:
fisher.inla.ssf$summary.fixed
## mean sd 0.025quant 0.5quant 0.975quant mode
## forest 0.4942391 0.1688319 0.1547562 0.4936275 0.8361327 0.4923019
## kld
## forest 0.0003256006
Since INLA parameterizes variances as precisions, the summary of the respective posterior distributions is given on the precision scale:
fisher.inla.ssf$summary.hyperpar
## mean sd 0.025quant 0.5quant 0.975quant mode
## Precision for id 10.11578 7.516201 2.024617 8.25162 31.6272 5.936236
The posterior mean and mode of the variance are obtained as follows:
inla_emarginal(fisher.inla.ssf)
## Mean of variance for id
## 0.1572805
inla_mmarginal(fisher.inla.ssf)
## Mode of variance for id
## 0.08441722
Finally, the session information documents the R and package versions used:
devtools::session_info()
## setting value
## version R version 3.4.4 (2018-03-15)
## system x86_64, linux-gnu
## ui X11
## language de_CH:de
## collate de_CH.UTF-8
## tz Europe/Zurich
## date 2019-06-17
##
## package * version date source
## amt * 0.0.4.0 2018-04-21 CRAN (R 3.4.4)
## assertthat 0.2.1 2019-03-21 cran (@0.2.1)
## backports 1.1.4 2019-04-10 cran (@1.1.4)
## base * 3.4.4 2018-03-16 local
## boot 1.3-20 2017-07-30 CRAN (R 3.4.1)
## broom 0.4.2 2017-02-13 CRAN (R 3.4.0)
## cellranger 1.1.0 2016-07-27 CRAN (R 3.4.0)
## circular 0.4-93 2017-06-29 cran (@0.4-93)
## cli 1.1.0 2019-03-19 cran (@1.1.0)
## codetools 0.2-15 2016-10-05 CRAN (R 3.3.1)
## colorspace 1.3-2 2016-12-14 CRAN (R 3.4.0)
## compiler 3.4.4 2018-03-16 local
## crayon 1.3.4 2017-09-16 CRAN (R 3.4.2)
## datasets * 3.4.4 2018-03-16 local
## Deriv 3.8.5 2018-06-11 CRAN (R 3.4.4)
## devtools 1.13.4 2017-11-09 CRAN (R 3.4.2)
## digest 0.6.18 2018-10-10 cran (@0.6.18)
## dplyr * 0.8.0.1 2019-02-15 CRAN (R 3.4.4)
## evaluate 0.10.1 2017-06-24 CRAN (R 3.4.1)
## fansi 0.4.0 2018-10-05 cran (@0.4.0)
## fitdistrplus 1.0-9 2017-03-24 cran (@1.0-9)
## forcats * 0.2.0 2017-01-23 CRAN (R 3.4.0)
## foreign 0.8-70 2018-04-23 CRAN (R 3.4.4)
## ggplot2 * 3.1.1 2019-04-07 cran (@3.1.1)
## glmmTMB * 0.2.3 2019-01-11 CRAN (R 3.4.4)
## glue 1.3.1 2019-03-12 cran (@1.3.1)
## graphics * 3.4.4 2018-03-16 local
## grDevices * 3.4.4 2018-03-16 local
## grid 3.4.4 2018-03-16 local
## gtable 0.3.0 2019-03-25 cran (@0.3.0)
## haven 1.1.0 2017-07-09 CRAN (R 3.4.2)
## hms 0.3 2016-11-22 CRAN (R 3.4.0)
## htmltools 0.3.6 2017-04-28 CRAN (R 3.4.0)
## httr 1.3.1 2017-08-20 CRAN (R 3.4.2)
## INLA * 18.07.12 2019-05-17 local
## jsonlite 1.5 2017-06-01 cran (@1.5)
## knitr 1.17 2017-08-10 CRAN (R 3.4.1)
## lattice 0.20-35 2017-03-25 CRAN (R 3.4.0)
## lazyeval 0.2.2 2019-03-15 cran (@0.2.2)
## lme4 1.1-20 2019-02-04 cran (@1.1-20)
## lubridate 1.7.1 2017-11-03 CRAN (R 3.4.2)
## magrittr 1.5 2014-11-22 CRAN (R 3.4.0)
## MASS 7.3-47 2017-04-21 CRAN (R 3.4.2)
## Matrix * 1.2-14 2018-04-09 CRAN (R 3.4.4)
## MatrixModels 0.4-1 2015-08-22 CRAN (R 3.4.0)
## memoise 1.1.0 2017-04-21 CRAN (R 3.4.2)
## methods * 3.4.4 2018-03-16 local
## minqa 1.2.4 2014-10-09 CRAN (R 3.4.0)
## mnormt 1.5-5 2016-10-15 CRAN (R 3.4.0)
## modelr 0.1.1 2017-07-24 CRAN (R 3.4.2)
## munsell 0.5.0 2018-06-12 cran (@0.5.0)
## mvtnorm 1.0-6 2017-03-02 CRAN (R 3.4.0)
## nlme 3.1-137 2018-04-07 CRAN (R 3.4.4)
## nloptr 1.2.1 2018-10-03 cran (@1.2.1)
## parallel 3.4.4 2018-03-16 local
## pillar 1.4.0 2019-05-11 cran (@1.4.0)
## pkgconfig 2.0.2 2018-08-16 cran (@2.0.2)
## plyr 1.8.4 2016-06-08 CRAN (R 3.4.2)
## psych 1.8.4 2018-05-06 CRAN (R 3.4.4)
## purrr * 0.2.5 2018-05-29 cran (@0.2.5)
## R6 2.4.0 2019-02-14 cran (@2.4.0)
## raster * 2.6-7 2017-11-13 CRAN (R 3.4.2)
## Rcpp 1.0.1 2019-03-17 cran (@1.0.1)
## readr * 1.1.1 2017-05-16 CRAN (R 3.4.2)
## readxl 1.0.0 2017-04-18 CRAN (R 3.4.0)
## reshape2 1.4.3 2017-12-11 cran (@1.4.3)
## rgdal 1.2-15 2017-10-30 CRAN (R 3.4.2)
## rgeos 0.3-26 2017-10-31 CRAN (R 3.4.2)
## rlang 0.3.4 2019-04-07 cran (@0.3.4)
## rmarkdown 1.7 2017-11-10 CRAN (R 3.4.2)
## rprojroot 1.3-2 2018-01-03 cran (@1.3-2)
## rstudioapi 0.7 2017-09-07 CRAN (R 3.4.2)
## rvest 0.3.2 2016-06-17 CRAN (R 3.4.0)
## scales 1.0.0 2018-08-09 cran (@1.0.0)
## sp * 1.2-5 2017-06-29 cran (@1.2-5)
## splines 3.4.4 2018-03-16 local
## stats * 3.4.4 2018-03-16 local
## stringi 1.2.4 2018-07-20 cran (@1.2.4)
## stringr * 1.3.1 2018-05-10 cran (@1.3.1)
## survival * 2.41-3 2017-04-04 CRAN (R 3.4.0)
## tibble * 2.1.1 2019-03-16 cran (@2.1.1)
## tidyr * 0.7.2 2017-10-16 cran (@0.7.2)
## tidyselect 0.2.5 2018-10-11 CRAN (R 3.4.4)
## tidyverse * 1.2.1 2017-11-14 CRAN (R 3.4.3)
## TMB 1.7.14 2018-06-23 CRAN (R 3.4.4)
## tools 3.4.4 2018-03-16 local
## TwoStepCLogit * 1.2.5 2016-03-21 CRAN (R 3.4.0)
## utf8 1.1.4 2018-05-24 cran (@1.1.4)
## utils * 3.4.4 2018-03-16 local
## vctrs 0.1.0 2018-11-29 cran (@0.1.0)
## withr 2.1.2 2018-03-15 CRAN (R 3.4.4)
## xml2 1.2.0 2018-01-24 cran (@1.2.0)
## yaml 2.1.14 2016-11-12 CRAN (R 3.4.1)
## zeallot 0.1.0 2018-01-28 cran (@0.1.0)