Building a workflow for warblers across Pennsylvania

This vignette illustrates the use of the intSDM R package for three species of warbler distributed across Pennsylvania, in the eastern United States of America. This case study has been used in numerous other integrated species distribution model analyses and includes three datasets: eBird, the North American Breeding Bird Survey (BBS) and the Pennsylvania Breeding Bird Atlas (BBA). Details on the data and the selection of observation models for each are provided in Mostert and O’Hara (2023), Isaac et al. (2020) and Miller et al. (2019).


We will assume that the BBA and BBS data are not available on GBIF, and will therefore load them directly from the PointedSDMs package.

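# Load intSDM so that startWorkflow() and sdmWorkflow() are available
library(intSDM)

# Structured datasets shipped with PointedSDMs; prefix the genus to each species name
BBA <- PointedSDMs::SetophagaData$BBA
BBA$Species_name <- paste0('Setophaga_', BBA$Species_name)
BBS <- PointedSDMs::SetophagaData$BBS
BBS$Species_name <- paste0('Setophaga_', BBS$Species_name)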
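
The prefix is presumably added so that the species names in the structured data take the same 'Genus_species' form as the names given to the Species argument of the workflow. The following is a minimal sketch of that step in base R; the toy data frame and its values are hypothetical and stand in for the real BBA and BBS records.

```r
# Toy stand-in for a structured dataset (hypothetical values, not the real data).
toy <- data.frame(Species_name = c("caerulescens", "fusca", "magnolia", "fusca"),
                  NPres = c(1, 0, 1, 1))

# Prepend the genus so that names take the form 'Genus_species'.
toy$Species_name <- paste0("Setophaga_", toy$Species_name)

# Species names as they are given to the workflow in this vignette.
target <- c("Setophaga_caerulescens", "Setophaga_fusca", "Setophaga_magnolia")

# TRUE only if every record now carries one of the target names.
all_match <- all(toy$Species_name %in% target)
print(unique(toy$Species_name))
print(all_match)
```

A check of this kind can catch spelling or formatting mismatches before any data are added to the workflow.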

We will then initialize the workflow using the startWorkflow function.

workflow <- startWorkflow(
  Projection = "WGS84",
  Species = c("Setophaga_caerulescens", "Setophaga_fusca", "Setophaga_magnolia"),
  saveOptions = list(projectName =  'Setophaga'), Save = FALSE
)

By itself, the .$addArea() function only gives us access to country borders. However, we can easily add other polygon objects to the workflow using the function's Object argument.

workflow$addArea(Object = USAboundaries::us_states(states = "Pennsylvania"))

Next, we add data to the analysis. The eBird dataset is available directly from GBIF, and can therefore be downloaded into our workflow using the .$addGBIF function with the relevant datasetKey. The other two datasets are not directly available on GBIF, but can still be added using the .$addStructured function. This requires us to specify the name of the response variable in each dataset (using the responseName argument) and the name of the species variable (using the speciesName argument).

workflow$addGBIF(datasetName = 'eBird', datasetType = 'PO', limit = 5000,
                 datasetKey = '4fa7b334-ce0d-4e88-aaae-2e0c138d049e')

workflow$addStructured(dataStructured = BBA, datasetType = 'PA',
                       responseName = 'NPres', 
                       speciesName = 'Species_name')

workflow$addStructured(dataStructured = BBS, datasetType = 'Counts',
                       responseName = 'Counts', 
                       speciesName = 'Species_name')

workflow$plot(Species = TRUE)
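
Because responseName and speciesName refer to columns by name, it can help to confirm that those columns exist before adding a dataset. The sketch below uses a hypothetical helper, check_columns, which is not part of intSDM or PointedSDMs, together with toy data; it only illustrates the idea.

```r
# Hypothetical helper (not part of intSDM): confirm that a data frame has the
# columns we intend to name via responseName and speciesName.
check_columns <- function(data, responseName, speciesName) {
  absent <- setdiff(c(responseName, speciesName), names(data))
  if (length(absent) > 0) {
    stop("Missing column(s): ", paste(absent, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy presence-absence records (hypothetical values).
toy_pa <- data.frame(Species_name = c("Setophaga_fusca", "Setophaga_magnolia"),
                     NPres = c(1, 0))

# Both named columns exist, so the check passes.
ok <- check_columns(toy_pa, responseName = "NPres", speciesName = "Species_name")

# A wrong column name is reported before any model code runs.
err <- tryCatch(check_columns(toy_pa, "Counts", "Species_name"),
                error = function(e) conditionMessage(e))
print(err)
```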

We can then add the elevation and canopy covariates from the PointedSDMs package. Had we wanted to use WorldClim covariates instead, we could have used the worldClim argument to specify which variable to download.

covariates <- scale(terra::rast(system.file('extdata/SetophagaCovariates.tif', 
                                      package = "PointedSDMs")))
names(covariates) <- c('elevation', 'canopy')

workflow$addCovariates(Object = covariates)
workflow$plot(Covariates = TRUE)
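
The call to scale above standardizes the covariates before they enter the model. As a rough illustration of what standardizing does, the base R sketch below applies scale to a small matrix of hypothetical values; it assumes the raster method behaves analogously layer by layer, which should be checked against the terra documentation.

```r
# Hypothetical covariate values on very different scales (not real data).
raw <- cbind(elevation = c(120, 340, 560, 780, 1000),
             canopy    = c(0.10, 0.35, 0.50, 0.80, 0.95))

# Centre each column on its mean and divide by its standard deviation.
standardised <- scale(raw)

# After standardizing, each column has mean 0 and standard deviation 1
# (up to floating-point error).
col_means <- colMeans(standardised)
col_sds   <- apply(standardised, 2, sd)
print(round(col_means, 10))
print(col_sds)
```

Putting covariates on a common scale makes their estimated effects easier to compare.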

We specify an additional field for the eBird dataset using the .$biasFields function, and create an inla.mesh object using .$addMesh.

workflow$biasFields(datasetName  = 'eBird')

workflow$addMesh(cutoff = 0.2,
                 max.edge = c(0.1, 0.24),
                 offset = c(0.1, 0.4))

For this case study, we specify the workflow output as Model. This returns the fitted R-INLA model objects, which we can then analyse further.


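# Request the fitted model objects as the workflow output
workflow$workflowOutput('Model')

# Options passed on to INLA when the models are fitted
workflow$modelOptions(INLA = list(control.inla=list(int.strategy = 'eb',
                                                    cmin = 0),
                                  safe = TRUE,
                                  inla.mode = 'experimental'))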

Models <- sdmWorkflow(Workflow = workflow)

lapply(unlist(Models, recursive = FALSE), summary)
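
The last line flattens the output by one level before summarizing each element. The sketch below shows that pattern on a toy nested list; the structure assumed here (one inner list per species) is a guess for illustration and may differ from what sdmWorkflow actually returns.

```r
# Hypothetical nested list standing in for the workflow output:
# outer level by species, inner level holding one object each.
nested <- list(
  Setophaga_fusca    = list(Model = c(1.2, 3.4, 5.6)),
  Setophaga_magnolia = list(Model = c(2.0, 4.0))
)

# recursive = FALSE removes only one level of nesting, so each inner
# object stays intact instead of being collapsed into a single vector.
flat <- unlist(nested, recursive = FALSE)

# Names of the flattened list are joined as 'outer.inner'.
print(names(flat))

# Apply summary() to every element of the flattened list.
summaries <- lapply(flat, summary)
print(summaries)
```

Without recursive = FALSE, unlist would keep descending into the inner objects, so summary would no longer be applied to one object per model.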

References

Isaac, Nick JB, Marta A Jarzyna, Petr Keil, Lea I Dambly, Philipp H Boersch-Supan, Ella Browning, Stephen N Freeman, et al. 2020. “Data Integration for Large-Scale Models of Species Distributions.” Trends in Ecology and Evolution 35 (1): 56–67.
Miller, David AW, Krishna Pacifici, Jamie S Sanderlin, and Brian J Reich. 2019. “The Recent Past and Promising Future for Data Integration Methods to Estimate Species’ Distributions.” Methods in Ecology and Evolution 10 (1): 22–37.
Mostert, Philip S, and Robert B O’Hara. 2023. “PointedSDMs: An R Package to Help Facilitate the Construction of Integrated Species Distribution Models.” Methods in Ecology and Evolution 14 (5): 1200–1207.