tRophicPosition: calculating trophic position for multiple species

Claudio Quezada-Romegialli, Andrew L Jackson & Chris Harrod

October 05 2017

tRophicPosition

tRophicPosition, is an R package incorporating a Bayesian model for the calculation of trophic position using stable isotopes with one or two baselines. As of 2017-10-05, the current version of the package is 0.7.2. tRophicPosition uses the powerful approach of Markov Chain Monte Carlo simulations provided by JAGS and the statistical language R. Vignettes can be browsed with browseVignettes("tRophicPosition").

Introduction

In this vignette, we will introduce the new functions we have developed for tRophicPosition in the version 0.6.8-9. These functions are extractIsotopeData(), jagsBayesianModel() and multiSpeciesTP(). Aditionally, we provide an example to run these calculations in parallel (maximizing efficient use of computing power). Each of these functions accomplish different tasks within the Bayesian estimation of trophic position with stable isotopes, extracting stable isotope data for multiple species and/or communities, and facilitating the calculation of trophic position for multiple consumers within one or more ecosystems. We have also included a parallel calculation of trophic position, to make full use of the multiple processor cores typical in modern computers.

Installing and loading the package

Stable version - CRAN

First of all, you need to install JAGS for your platform, and then install the stable version of tRophicPosition from CRAN:

install.packages("tRophicPosition")

After that, you have to load the package with:

library(tRophicPosition)

Development version - GitHub

If you want to install the development version of tRophicPosition, you must install it from GitHub. For this, we use the function install_github() from the package devtools (installation instructions here), which needs to be installed previously (either from CRAN or GitHub):

install.packages("devtools")
library(devtools)

If you are working in Windows, devtools also requires Rtools, or if you are working on a Mac, Xcode (from Apple Store). In Linux you will need to install a compiler and various development libraries.

Besides installing devtools, you must also install JAGS (Just Another Gibbs Sampler), which is at the core of the Bayesian model analysis supporting tRophicPosition.

install_github("clquezada/tRophicPosition", build_vignettes = TRUE)

After installing tRophicPosition, it should be loaded into memory, which automatically reports the version of the software (we need at least 0.6.8-8 to use the routines described in this vignette).

library(tRophicPosition)

Future releases and how to get support

You are encouraged to use tRophicPosition with your own data, test the package and see if there are any issues or problems. You can send your questions or commentaries to the google group tRophicPosition-support or directly to the email trophicposition-support@googlegroups.com. You can send your questions to http://stackexchange.com/ http://stackoverflow.com/ or even Facebook (stable isotope ecology group).

We are constantly working on future releases of tRophicPosition, so feedback is very much appreciated.

Multiple species calculation of trophic position (TP)

Before we start

If you have been using R prior to starting your analysis of trophic Position, it is possible that the environment contains unwanted data. Try to be as organized as possible with the data generated within R/RStudio, as this will make things a lot clearer.

When working on your analysis, a good first thing to do is to set the working directory to a convenient location. This is accomplished using setwd(). In RStudio you can use the menu “Session/Set Working Directory” to make things easier.

The next thing is to clear the environment. With RStudio you can click on the little broom icon in the environment tab, or use the command rm(list = ls()). Also, it is a good idea to clear the plots using the same broom icon in the plots tab, or the command graphics.off(). This is only needed if you have been working with some data before.

And now, an initial analysis - Bilagay TP from multiple coastal kelp forests in N Chile

We will start with some data from a fish common to the coastal kelp forests of N Chile, the bilagay Cheilodactylus variegatus (http://www.fishbase.se/summary/Cheilodactylus-variegatus.html). We have data from a number of populations (and associated filter feeding and grazing molluscs representing pelagic and benthic baselines respectively) from a series of different locations along the N Chilean coast. We know from the literature that bilagay diet is extremely variable and reflects what is locally available. We also know that there is a marked \(\delta^{15}\)N-enrichment at the base of the food chain reflecting increased upwelling intensity as you move northwards up the coast. As such, we need to account for variation in baseline \(\delta^{15}\)N variation before calculating TP at each site.

So, our question is, given the potential influences of local differences in diet stomach contents and upwelling intensity, are there measurable differences in bilagay TP between the different locations along the S-N latitudinal gradient?

We will use a data set included within tRophicPosition. The structure of the file has 7 columns, where species (Spp) are grouped into functional groups (FG) to describe its role in each of the ecosystems: “Bilagay” (our fish of interest), “Benthic_BL” (benthic baseline) and “Pelagic_BL” (pelagic baseline). We have samples from locations along the N Chilean coast (NS stands for the N-S ordering) and from multiple studies (Study) and locations (Location). We also have stable isotope values in two columns: “d13C” and “d15N”.

Structure of the file Bilagay_for_tRophicPosition.csv
Study Location Spp FG d13C d15N NS
1 MEL CHI Echinolittorina peruviana Benthic_BL -10.1 16.3 4
2 MEL CHI Echinolittorina peruviana Benthic_BL -11.7 16.3 4
3 MEL CHI Echinolittorina peruviana Benthic_BL -9.8 16.4 4
130 MEL COL Bilagay Bilagay -15.2 20.9 3
131 MEL COL Bilagay Bilagay -15.4 21.3 3
132 MEL COL Bilagay Bilagay -15.2 21.6 3
133 MEL COL Bilagay Bilagay -15.0 22.0 3
660 MEL MEJ Perumytilus purpuratus Pelagic_BL -15.9 18.1 9
661 MEL MEJ Perumytilus purpuratus Pelagic_BL -15.6 17.4 9

If you want to edit the raw data or modify it, open the following file with your favourite application, save it to your working directory and then load it with read.csv() or read.table().

system.file("extdata", "Bilagay_for_tRophicPosition.csv", package = "tRophicPosition")
## [1] "C:/Users/claud/AppData/Local/Temp/RtmpGiwC8z/Rinst34a83d955eca/tRophicPosition/extdata/Bilagay_for_tRophicPosition.csv"

To load the Bilagay data set into R, use data():

data("Bilagay")

As shown above, in this data set we have a combination of Study and Location variables, which need to be concatenated. Here we introduce mutate and arrange functions from the package dplyr and the pipe operator %>% (from magrittr but loaded through dplyr).

# install.packages(dplyr) if you haven't installed it before
library(dplyr)
Bilagay <- Bilagay %>% mutate(Community = paste(Study,"-", Location, sep = ""))

Also, this spreadsheet has information from the latitudinal arrangement of sampling sites, which is the variable NS (North to South order), ranging from 1 (most southern site) to 10 (most northern site). Below we order the whole data frame considering those values.

Bilagay <- Bilagay %>% arrange(NS)

Now, we are ready to proceed with the calculation of trophic position. This follows the same general process you have already encountered in the previous vignette (A guide to the use of tRophicPosition. First we must consecutively subset the data frame considering each of the communities we have samples of. We could use the subset command, repeating the same command each time for each community.

If we only have 2 sampling sites, this can be done by hand. However, here we have different 10 sampling sites, so this task can be prone to errors and time-consuming, as we have to repeat the same code 10 times, but with changes in the Location, selecting the baseline, etc.

To deal with this issue, we have developed a convenient function extractIsotopeData() where you can indicate the text for each of the baselines and the respective column where they are coded, the species column, and the community column. This function iterates through the data frame selecting unique values from the community column, extracting all the values that belong to each community and then iterate through them. Then it will extract all stable isotope values from the columns “d13C” and “d15N” (if their values haven’t changed) for each baseline. Finally, it will select each unique value from the species column and it generates an isotopeData class object for each consumer combining their baselines isotope values. See help(extractIsotopeData) for other options.

BilagayList <- extractIsotopeData(Bilagay, b1 = "Pelagic_BL", b2 = "Benthic_BL",
                                  baselineColumn = "FG", speciesColumn = "Spp",
                                  communityColumn = "Community",
                                  d13C = "d13C", d15N = "d15N")

If no \(\Delta\)N and/or \(\Delta\)C values are included as arguments, by default this function will use assumptions from Post (2002), i.e. 3.4 \(\pm\) 0.98 sd for nitrogen and 0.39 \(\pm\) 1.3 sd for carbon). You can change either TDF values using the function TDF() or a list of values, as explained in the short guide to the use of tRophicPosition vignette.

The next step is use the str function to check that all went well during the use of the extractIsotopeData() function:

str(BilagayList)
## List of 1
##  $ MEC-Coquimbo-Bilagay:List of 8
##   ..$ dNb1  : num [1:15] 13.8 14.3 14.2 13.4 13.8 13.3 14.6 14 14.2 13.9 ...
##   ..$ dCb1  : num [1:15] -17.3 -17.6 -17.4 -16.6 -17 -16.9 -18.5 -18.1 -17.5 -18.3 ...
##   ..$ dNb2  : num [1:28] 14.9 15.2 15 15.2 16.3 15.7 15.4 15.5 16.4 15.4 ...
##   ..$ dCb2  : num [1:28] -11.1 -11.9 -11.7 -11.5 -14.6 -13.8 -14.5 -14.3 -14.6 -13.9 ...
##   ..$ deltaN: num [1:56] 2.48 3.21 3.81 2.28 3.74 ...
##   ..$ deltaC: num [1:107] -1.158 -0.129 0.719 -1.45 0.622 ...
##   ..$ dNc   : num [1:10] 19.4 19 19.1 18.3 19.8 20.1 19.6 20.3 19.5 19.9
##   ..$ dCc   : num [1:10] -16.2 -16.7 -17.8 -16.5 -16.8 -17.8 -16.3 -15.5 -16.3 -16
##   ..- attr(*, "class")= chr "isotopeData"
##   ..- attr(*, "consumer")= chr "Bilagay"
##   ..- attr(*, "baseline1")= chr "Pelagic_BL"
##   ..- attr(*, "baseline2")= chr "Benthic_BL"
##   ..- attr(*, "community")= chr "MEC-Coquimbo"

If you execute the code above you will see a lot of information, but for this vignette, we have shortened the output to make it readable. In your console, you will note that there is a list of 10 elements, followed by the sign $ and the concatenated name of each of our communities with the name of our species (Bilagay). Each of these objects (MEC-Coquimbo-Bilagay, MEC-Taltal-Bilagay, etc.) is a list of 8 elements that relate to a particular sampling location, in a structure that is ready to use with tRophicPosition: they are objects of the class isotopeData.

The class isotopeData is a special object recognized within the package. It also has other information embedded within it: consumer, baseline 1, baseline 2, and community labels, besides a list of isotope values with the convention introduced before (dNb1 is d15N of baseline 1, dCb1 is d13C of baseline 1, dNb2 and dCb2 respectively for the baseline 2, dNc and dCc respectively for the consumer, and deltaN and deltaC are trophic discrimination factors).

Now, as we have extracted each community from the original data frame, we can iterate through them, prepare a summary and plot them. Here we introduce the for command, which is one of the basic control-flow elements of programming. Again, if you execute the following code you will have a summary and a plot for each community, but in this vignette, you will see only one example.

for (community in BilagayList) {
  print(summary(community))
  plot(community)
  }
## 
## Summary for stable isotope data object 
## 
## Community  MEC-Coquimbo 
## Consumer  Bilagay 
## Baseline 1  Pelagic_BL 
## Baseline 2  Benthic_BL 
## 
##        n   mean  sd  se  min   max   median
## dNb1   15  13.7  0.7 0.2 12.1  14.6  13.8  
## dCb1   15  -17.5 0.6 0.1 -18.5 -16.6 -17.5 
## dNb2   28  15.4  0.5 0.1 14.6  16.4  15.3  
## dCb2   28  -13.3 1.2 0.2 -15.1 -11.1 -13.8 
## deltaN 56  3.4   1   0.1 1.1   5.4   3.3   
## deltaC 107 0.4   1.3 0.1 -3.2  3     0.6   
## dNc    10  19.5  0.6 0.2 18.3  20.3  19.6  
## dCc    10  -16.6 0.7 0.2 -17.8 -15.5 -16.4

Now we have to model each community within the Bayesian framework of tRophicPosition (as introduced previously in other vignettes, e.g. a guide to the use of tRophicPosition.

For this, we have developed a function that automatically i) defines a Bayesian model, ii) initializes the model and iii) samples the posterior distribution of trophic position. This is accomplished using the function multiSpeciesTP(), similar to the function multiModelsTP() explained in the vignette multiple models calculation with tRophicPosition.

Bilagay_models <- multiSpeciesTP(BilagayList, model = "twoBaselinesFull",
                                             n.adapt = 10000, n.iter = 10000,
                                             burnin = 10000, n.chains = 2, print = FALSE)
## Compiling model graph
##    Resolving undeclared variables
##    Allocating nodes
## Graph information:
##    Observed stochastic nodes: 269
##    Unobserved stochastic nodes: 16
##    Total graph size: 354
## 
## Initializing model

By default, the function multiSpeciesTP() uses lambda = 2 (trophic level of baselines), n.chains = 2 (number of parallel MCMC simulations), and set n.adapt, n.iter and burnin as 10000 iterations. The model calculated by default is oneBaseline, so in this case we have changed it to model = "twoBaselinesFull" as we have two separate potential sources of N and C. In this particularly case, there was no need to explicitly state the number of adaptive iterations, iterations and burnin, but we decided to make them clear. Also, here we calculate 5 chains instead of the default (i.e. n.chains = 2).

The function multiSpeciesTP() returns 4 objects: 1. a list named multiSpeciesTP, which includes the raw data returned by posteriorTP() for each species per community/sampling location; 2. a data frame named df with the mode, median and credibility confidence interval for trophic position and alpha (if a two baselines model was chosen), grouped by model, species and community/sampling location; 3. a list named TPs with the posterior samples of trophic position for each community/location and species; and 4. a list named Alphas which includes the posterior samples of alpha (relative contribution of baseline 1) for each species by community/location.

More advanced users will likely directly access this object for subsequent analysis. For now, we will concentrate on plotting the mode or median and 95% credibility interval for Bilagay from each location, using the joint values of trophic position and alpha by community/location, allowing an easy visual comparison between them:

# By default the mode is used in both trophic position and alpha plots
credibilityIntervals(Bilagay_models$df, x = "community", xlab ="Community")

# If you want to use the median instead of the mode,
# just add y1 and y2 as arguments
credibilityIntervals(Bilagay_models$df, x = "community", xlab ="Community", 
                     y1 = "median", y2 = "alpha.median")

To get a numerical summary of posterior estimates (median and 95% credibility interval) of trophic position, we can write the following:

# To get a numerical summary
sapply(Bilagay_models$"TPs", quantile, probs = c(0.025, 0.5, 0.975)) %>% round(3)

# To get the mode
getPosteriorMode(Bilagay_models$"TPs")
##       MEC.Coquimbo.Bilagay.2bf MEC.Taltal.Bilagay.2bf MEL.COL.Bilagay.2bf
## 2.5%                     3.449                  2.908               3.343
## 50%                      3.670                  3.199               3.477
## 97.5%                    3.885                  3.476               3.646
##       MEL.CHI.Bilagay.2bf MEL.RIN.Bilagay.2bf Fondecyt.ISM.Bilagay.2bf
## 2.5%                3.549               3.410                    3.267
## 50%                 3.704               3.542                    3.480
## 97.5%               3.844               3.691                    3.671
##       MEL.LAG.Bilagay.2bf MEL.LOB.Bilagay.2bf MEL.MEJ.Bilagay.2bf
## 2.5%                3.072               3.265               3.223
## 50%                 3.195               3.384               3.322
## 97.5%               3.316               3.535               3.427
##       MEC.Tocopilla.Bilagay.2bf
## 2.5%                      3.540
## 50%                       3.754
## 97.5%                     4.033
##  MEC.Coquimbo.Bilagay.2bf    MEC.Taltal.Bilagay.2bf 
##                     3.668                     3.186 
##       MEL.COL.Bilagay.2bf       MEL.CHI.Bilagay.2bf 
##                     3.478                     3.704 
##       MEL.RIN.Bilagay.2bf  Fondecyt.ISM.Bilagay.2bf 
##                     3.535                     3.479 
##       MEL.LAG.Bilagay.2bf       MEL.LOB.Bilagay.2bf 
##                     3.182                     3.371 
##       MEL.MEJ.Bilagay.2bf MEC.Tocopilla.Bilagay.2bf 
##                     3.320                     3.753

Finally, we can statistically compare whether either the posterior samples of trophic position and/or alpha alpha differ between communities/locations. In this example we compare both posterior estimates accross communities/locations with the function pairwiseComparisons(). Remember that this output must to be read as: what is the probability that consumer in row Y has a posterior trophic position/alpha less than or equal to consumer in column X. To make readable the output, we will compare only the first 8 communities/locations. If you want to compare all the communities/locations, just remove [1:8] below:

# First, we compare bilagay posterior trophic position estimates
pairwiseTP <- pairwiseComparisons(Bilagay_models$TPs[1:8], print = TRUE)

# And then, we compare their posterior alpha estimates
pairwiseAlpha <- pairwiseComparisons(Bilagay_models$Alphas[1:8], print = TRUE)
##                                [1]   [2]   [3]   [4]   [5]   [6]   [7]
## [1] MEC.Coquimbo.Bilagay.2bf 0.000 0.005 0.089 0.604 0.183 0.104 0.000
## [2] MEC.Taltal.Bilagay.2bf   0.995 0.000 0.950 1.000 0.985 0.936 0.495
## [3] MEL.COL.Bilagay.2bf      0.911 0.050 0.000 0.965 0.718 0.470 0.010
## [4] MEL.CHI.Bilagay.2bf      0.396 0.000 0.035 0.000 0.084 0.054 0.000
## [5] MEL.RIN.Bilagay.2bf      0.817 0.015 0.282 0.916 0.000 0.292 0.010
## [6] Fondecyt.ISM.Bilagay.2bf 0.896 0.064 0.530 0.946 0.708 0.000 0.020
## [7] MEL.LAG.Bilagay.2bf      1.000 0.505 0.990 1.000 0.990 0.980 0.000
## [8] MEL.LOB.Bilagay.2bf      0.995 0.129 0.832 1.000 0.931 0.762 0.025
##                                [8]
## [1] MEC.Coquimbo.Bilagay.2bf 0.005
## [2] MEC.Taltal.Bilagay.2bf   0.871
## [3] MEL.COL.Bilagay.2bf      0.168
## [4] MEL.CHI.Bilagay.2bf      0.000
## [5] MEL.RIN.Bilagay.2bf      0.069
## [6] Fondecyt.ISM.Bilagay.2bf 0.238
## [7] MEL.LAG.Bilagay.2bf      0.975
## [8] MEL.LOB.Bilagay.2bf      0.000
##                                [1]   [2]   [3]   [4]   [5]   [6]   [7]
## [1] MEC.Coquimbo.Bilagay.2bf 0.000 0.495 0.797 0.579 0.837 0.015 0.896
## [2] MEC.Taltal.Bilagay.2bf   0.505 0.000 0.827 0.629 0.837 0.015 0.931
## [3] MEL.COL.Bilagay.2bf      0.203 0.173 0.000 0.262 0.545 0.005 0.663
## [4] MEL.CHI.Bilagay.2bf      0.421 0.371 0.738 0.000 0.757 0.015 0.832
## [5] MEL.RIN.Bilagay.2bf      0.163 0.163 0.455 0.243 0.000 0.005 0.653
## [6] Fondecyt.ISM.Bilagay.2bf 0.985 0.985 0.995 0.985 0.995 0.000 0.990
## [7] MEL.LAG.Bilagay.2bf      0.104 0.069 0.337 0.168 0.347 0.010 0.000
## [8] MEL.LOB.Bilagay.2bf      0.129 0.139 0.371 0.218 0.406 0.005 0.584
##                                [8]
## [1] MEC.Coquimbo.Bilagay.2bf 0.871
## [2] MEC.Taltal.Bilagay.2bf   0.861
## [3] MEL.COL.Bilagay.2bf      0.629
## [4] MEL.CHI.Bilagay.2bf      0.782
## [5] MEL.RIN.Bilagay.2bf      0.594
## [6] Fondecyt.ISM.Bilagay.2bf 0.995
## [7] MEL.LAG.Bilagay.2bf      0.416
## [8] MEL.LOB.Bilagay.2bf      0.000

Parallel trophic position

As an appendix, we will estimate trophic position using parallel calculations. We will use the library parallel that comes with base R. First we have to create a parallel socket cluster, i.e. to create a set of copies of R running in parallel. This is done with the function makePSOCKcluster() and detecting the number of cores with detectCores().

cl <- parallel::makePSOCKcluster(parallel::detectCores())

Then we will use the function parLapply(). This function is a parallel version of lapply() a function that applies a function over a list. As we coded tRophicPosition specifically to use lists, we can calculate the trophic position for a number of species in parallel rather in series. As we are just comparing the performance of the serial vs parallel version of multiModelTP, we will use just 500 adaptive iterations and 500 actual iterations after discarding 500 iterations as burnin:

# First we calculate the time elapsed with system.time() with parallel
time_parallel <- system.time(a <- parallel::parLapply(cl, BilagayList, multiModelTP, 
                                               adapt = 500, n.iter = 500,
                                               burnin = 500))

# Then we calculate the elapsed time with the normal version
time_serial <- system.time(b <- lapply(BilagayList, multiModelTP, quiet = TRUE, 
                                               adapt = 500, n.iter = 500,
                                               burnin = 500))

# We have to stop the cluster after each set of calculations
parallel::stopCluster(cl)

# And print the elapsed time
print(rbind(time_parallel, time_serial))
##               user.self sys.self elapsed user.child sys.child
## time_parallel     0.122    0.097  44.455      0.008     0.075
## time_serial      85.587    1.432  92.284      0.000     0.000

Above you will see that in a modest MacBook Air (Intel Core i5, 1.6 GHz, 4 cores) the elapsed time with the serial version of multiModelTP() is roughly 92 seconds, while using the parallel version, the analysis time is reduced by almost 50 % (44 seconds), a not trivial difference. However, on desktop pc running Windows 10 (i7 3930-K, 3.2 GHz, 12 cores), the serial version took 73 seconds, while the parallel version took only 14 seconds.