```
require(foieGras)
#> Loading required package: foieGras
```

this vignette is an extended set of examples to highlight `foieGras`

’s functionality. Please, do NOT interpret these examples as instructions for conducting analysis of animal movement data. Numerous essential steps in a proper analysis have been left out of this document. It is your job to understand your data, ensure you are asking the right questions of your data, and that the analyses you undertake appropriately reflect those questions. We can not do this for you!

this vignette provides a (very) brief overview of how to use `foieGras`

to filter animal track locations obtained via the Argos satellite system or via processed light-level geolocation (GLS). `foieGras`

provides two state-space models (SSM’s) for filtering (ie. estimating “true” locations and associated movement model parameters, while accounting for error-prone observations):

- a simple Random Walk model,
`rw`

- a Correlated Random Walk model,
`crw`

both models are continuous-time models, that is, they account for the time intervals between successive observations, thereby naturally accounting for the commonly irregularly-timed nature of animal tracking data. We won’t dwell on the details of the models here (see Jonsen et al. 2020 for details on the `crw`

model), except to say there may be advantages to choosing one over the other in certain circumstances. The Random Walk model tends not to deal well with small to moderate gaps (relative to a specified time step) in observed locations and can over-fit to particularly noisy data. The Correlated Random Walk model can often deal better with these small to moderate data gaps and appropriately smooth through noisy data but tends to estimate nonsensical movement through larger data gaps.

additionally, `foieGras`

provides fast models (`mpm`

, `jmpm`

) for estimating a behavioural index along animals’ tracks (see Jonsen et al. 2019 for details). The `mpm`

is fit to individual tracks, whereas the `jmpm`

is fit to multiple tracks simultaneously with a variance parameter that is estimated jointly across the tracks. This latter model can often better resolve subtle changes in movement behaviour along tracks that lack much contrast in movements. Now, both models can be fit to time-regularized locations (discrete-time models) or to time-irregular locations (continuous-time models). See Auger-Méthé et al. 2017 for an example of the latter.

`foieGras`

expects data to be provided in one of several possible formats.

- a
`data.frame`

or`tibble`

that looks like this

```
#> id date lc lon lat smaj smin eor
#> 1 54591 2012-03-05 05:09:33 1 110.5707 -66.42752 2442 416 42
#> 2 54591 2012-03-06 04:55:14 0 110.3402 -66.39579 49660 391 90
#> 3 54591 2012-03-07 04:23:10 A 110.4778 -66.45266 5032 472 93
#> 4 54591 2012-03-07 21:23:06 A 110.3749 -66.39622 4007 286 116
#> 5 54591 2012-03-09 04:27:49 B 110.4732 -66.48743 13063 956 82
#> 6 54591 2012-03-10 00:10:41 A 110.5014 -66.43516 5099 478 79
```

where the Argos data are provided via CLS Argos’ Kalman filter model (KF) and include error ellipse information for each observed location.

- a
`data.frame`

or`tibble`

that looks like this

```
#> id date lc lon lat
#> 1 ct109-085-14 2015-02-03 00:11:02 B 70.45 -49.93
#> 2 ct109-085-14 2015-02-03 13:26:37 B 71.00 -50.21
#> 3 ct109-085-14 2015-02-03 21:53:15 B 71.31 -50.37
#> 4 ct109-085-14 2015-02-04 04:05:35 A 71.64 -50.43
#> 5 ct109-085-14 2015-02-04 17:12:42 B 72.04 -50.46
#> 6 ct109-085-14 2015-02-05 02:05:44 B 72.44 -50.47
```

where the Argos data are provided via CLS Argos’ Least-Squares model (LS) and do not include error ellipse information.

- a
`data.frame`

or`tibble`

that includes observations with missing KF error ellipse information

```
#> id date lc lon lat smaj smin eor
#> 1 54591 2012-03-05 05:09:33 1 110.5707 -66.42752 2442 416 42
#> 2 54591 2012-03-06 04:55:14 0 110.3402 -66.39579 49660 391 90
#> 3 54591 2012-03-07 04:23:10 A 110.4778 -66.45266 NA NA NA
#> 4 54591 2012-03-07 21:23:06 A 110.3749 -66.39622 NA NA NA
#> 5 54591 2012-03-09 04:27:49 B 110.4732 -66.48743 NA NA NA
#> 6 54591 2012-03-10 00:10:41 A 110.5014 -66.43516 5099 478 79
```

in this situation, `foieGras`

treats observations with missing error ellipse information as though they are LS-based observations.

- an
`sf-tibble`

where observations have any of the previous 3 structures and also include`CRS`

information

```
#> id date lc smaj smin eor geometry
#> 1 54591 2012-03-05 05:09:33 1 2442 416 42 POINT (2430.943 -912.3109)
#> 2 54591 2012-03-06 04:55:14 0 49660 391 90 POINT (2437.96 -903.7763)
#> 3 54591 2012-03-07 04:23:10 A 5032 472 93 POINT (2429.753 -907.3739)
#> 4 54591 2012-03-07 21:23:06 A 4007 286 116 POINT (2437.367 -905.2335)
#> 5 54591 2012-03-09 04:27:49 B 13063 956 82 POINT (2426.138 -905.8045)
#> 6 54591 2012-03-10 00:10:41 A 5099 478 79 POINT (2431.234 -909.0692)
```

- a
`data.frame`

,`tibble`

or`sf-tibble`

where processed GLS data are provided and include longitude and latitude error SD’s (in degrees). In this case, the`lc`

class is set to`GL`

for all GLS locations.

```
#> id date lc lon lat lonerr laterr
#> 1 54632 2021-04-26 14:51:25 GL 100.0 -55 0.80145678 0.24680960
#> 2 54632 2021-04-27 02:51:25 GL 100.5 -54 1.03366929 1.34625683
#> 3 54632 2021-04-27 14:51:25 GL 101.0 -53 0.03611925 2.84626597
#> 4 54632 2021-04-28 02:51:25 GL 101.5 -52 1.15502022 1.67499383
#> 5 54632 2021-04-28 14:51:25 GL 102.0 -51 0.98906118 0.05397196
```

- a
`data.frame`

,`tibble`

or`sf-tibble`

where GPS data are provided. In this case, the`lc`

class is set to`G`

for all GPS locations.

```
#> id date lc lon lat
#> 1 F02-B-17 2021-04-26 14:51:25 G 70.1 -49.2
#> 2 F02-B-17 2021-04-26 15:51:25 G 70.6 -48.2
#> 3 F02-B-17 2021-04-26 16:51:25 G 71.1 -47.2
#> 4 F02-B-17 2021-04-26 17:51:25 G 71.6 -46.2
#> 5 F02-B-17 2021-04-26 18:51:25 G 72.1 -45.2
```

- a
`data.frame`

,`tibble`

or`sf-tibble`

where any combination of Argos, GLS or GPS locations can be intermixed - though, most typically this would be a combination of Argos and GPS locations.

```
#> id date lc lon lat smaj smin eor
#> 1 F02-B-17 2017-09-17 05:20:00 G 70.1 -49.2 NA NA NA
#> 2 F02-B-17 2017-10-04 14:35:01 2 70.2 -49.1 1890 45 77
#> 3 F02-B-17 2017-10-05 04:03:25 G 70.1 -49.3 NA NA NA
#> 4 F02-B-17 2017-10-05 06:28:20 A 71.1 -48.7 28532 1723 101
#> 5 F02-B-17 2017-10-05 10:21:18 B 70.8 -48.5 45546 3303 97
```

model fitting for quality control of locations is comprised of 2 steps: a prefilter step where a number of checks are made on the input data (see `?foieGras::fit_ssm`

for details), including applying the `trip::sda`

filter to identify extreme outlier observations. Additionally, if the input data are not supplied as an `sf`

object,the prefilter guesses at an appropriate projection (typically world mercator, EPSG 3395) to apply to the data. The SSM is then fit to this projected version of the data. Users invoke this process via the `fit_ssm`

function:

```
## prefilter and fit Random Walk SSM using a 24 h time step
<-
fit fit_ssm(
sese1,model = "rw",
time.step = 24,
control = ssm_control(verbose = 0)
)
```

these are the minimum arguments required: the input data, the model (`rw`

or `crw`

) and the time.step (in h) to which locations are predicted (the argument `control = ssm_control(verbose = 0)`

is included for vignette tidyness). Additional control can be exerted over the prefiltering step, via the `vmax`

, `ang`

, `distlim`

, `spdf`

and `min.dt`

arguments. see `?fit_ssm`

for details, the defaults for these arguments are quite conservative (for non-flying species), usually leading to relative few observations being flagged to be ignored by the SSM. Additional control over the SSM fitting step can also be exerted via the `control = ssm_control()`

argument, see `?ssm_control`

for details.

`fit_ssm`

can be applied to single tracks or to multiple tracks, as shown above. The SSM is fit to each individual separately and the resulting output is a compound `tibble`

with rows corresponding to each individual `fG_ssm`

fit object. The `converged`

column indicates whether each model fit converged successfully.

```
## list fit outcomes for both seals
fit#> # A tibble: 1 x 5
#> id ssm converged pdHess pmodel
#> <chr> <named list> <lgl> <lgl> <chr>
#> 1 ct36-E-09 <ssm [15]> TRUE TRUE rw
```

individual `id`

is displayed in the 1st column, all fit output (`ssm`

) in the 2nd column, `convergence`

status (whether the optimizer found a global minimum) of each model fit is displayed in the 3rd column, whether the Hessian matrix was positive-definite and could be solved to obtain standard errors (`pdHess`

) is displayed in the 4th column, and the specified process model (`rw`

or `crw`

) in the 5th column. In some cases, the optimizer will converge but the Hessian matrix is not positive-definite, which typically indicates the optimizer converged on a local minimum. In this case, some standard errors can often be calculated but not all. One possible solution is to try specifying a longer `time.step`

or set `time.step = NA`

to turn off predictions and return only fitted values (location estimates at the pre-filtered observation times). If `pdHess = FALSE`

persists then careful inspection of the supplied data is warranted to determine if suspect observations not identified by `prefilter`

are present. The excellent glmmTMB troubleshooting vignette may also provide hints at solutions. Convergence failures should be examined for potential data issues, however, in some cases changes to the optimization parameters via `ssm_control()`

(see `?fit_ssm`

and `?ssm_control`

on usage) may overcome mild issues (see `?nlminb`

or `?optim`

for details on optimization control parameters).

simple summary information about the fit can be obtained by calling the individual fit objects:

```
$ssm[[1]]
fit#> Process model: rw
#> Time interval: 24 hours
#> number of original observations: 678
#> number of observations: 497
#> number of fitted states: 497
#> number of predicted states: 58
#>
#> parameter estimates
#> -------------------
#> Estimate Std. Error
#> rho_p -0.57210 0.05063
#> sigma_x 9.22751 0.44432
#> sigma_y 7.38602 0.32627
#> rho_o 0.46945 0.08541
#> tau_x 0.34763 0.02734
#> tau_y 0.32041 0.02210
#> -------------------
#> negative log-likelihood: 4409.909
#> convergence: yes
```

here, the `[[1]]`

denotes the first individual and so on. The summary table lists all estimated parameters, the specific ones listed depend on the process model selected and the data type. Here, `sigma_x`

and `sigma_y`

are the process error standard deviations in the x and y directions, `rho_p`

is the correlation parameter in the covariance term, `tau_x`

and `tau_y`

are the observation error standard deviations, and `rho_o`

is the correlation parameter. The `Std. Error`

column lists the standard errors, calculated via the Delta method (see TMB documentation for details), for each estimated parameter.

a summary `plot`

method allows a quick visual of the SSM fit to the data:

```
# plot time-series of the predicted values
plot(fit, what = "predicted", type = 1, pages = 1)
```

`plot(fit, what = "fitted", type = 2, pages = 1)`

the predicted values (red) are the state estimates predicted at regular time intervals, specified by `time.step`

(here every 24 h). These estimates are plotted on top of the observations that passed the `prefilter`

stage (blue points and blue rug at bottom). Fitted values are the state estimates corresponding to the time of each observation; their time series are plotted by default - `plot(fit)`

. A 2-D time series plot of the track is invoked by the argument `type = 2`

.

Assessing goodness-of-fit is an important component of any model fitting exercise. Residual plots are important for validating models, but classical Pearson residuals, for example, are not appropriate for state-space models. Instead, a prediction residual, or one-step-ahead residual, provides a useful alternative - albeit more computationally demanding to calculate. In `foieGras`

, prediction residuals from state-space model fits are calculated using the `osar`

function and can be visualised as time-series plots, Q-Q plots, or autocorrelation functions:

```
require(patchwork)
# calculate & plot residuals
<- osar(fit)
res
plot(res, type = "ts") | plot(res, type = "qq")) /
(plot(res, type = "acf") | plot_spacer()) (
```

As calculation of these residuals is computationally demanding, especially for multiple individual tracks, the `osar`

function is automatically implemented in parallel when calculating residuals for more than 2 tracks.

A behavioural index, movement persistence, can be estimated along SSM-fitted tracks by using the `fit_mpm`

function. This index ranges continuously from 0, associated with frequent changes in direction and/or speed, to 1, associated with infrequent changes (persistence) in direction and/or speed. Here we use the `fG_ssm`

fit object from earlier as data to estimate and visualise the time-varying movement persistence as a 1-D time-series and along an animal’s ssm-filtered track:

```
<- fit_mpm(fit, what = "predicted", model = "mpm", control = mpm_control(verbose = 0))
fmp
plot(fmp, pages = 1, pal = "Zissou1", rev = TRUE)
```

```
plot(fmp, fit, pages = 1, pal = "Cividis")
```