island: the Theory of Island Biogeography made easy

Vicente J. Ontiveros & David Alonso

2023-11-29

Introduction

The Theory of Island Biogeography

The Theory of Island Biogeography by MacArthur and Wilson (1967) is one of the most influential and successful theories in ecology, a rich source of inspiration for new research and ideas since it was first suggested. MacArthur and Wilson originally proposed that the number of species in an island is determined by the size of the island and its distance from the mainland. These factors result in a dynamic equilibrium between colonization from the mainland and extinction of species once they arrive on the island. The theory initially presented by MacArthur and Wilson was called ETIB (Simberloff 1974). Wilson and Simberloff’s (1969) study of the experimental defaunation of mangrove islands in the Florida Keys was the first validation of the theory, which explicitly emphasizes its dynamic aspects, and it showed how different mangrove islands reached an equilibrium with a species richness equivalent to the number of species present before the experimental defaunation.

The classical representation of the Theory of Island Biogeography. Islands further from the mainland receive less species than closer ones, and there are more extinctions on small islands than on the large ones. The number of species at equilibrium corresponds to the point where colonization equals extinction.

The classical representation of the Theory of Island Biogeography. Islands further from the mainland receive less species than closer ones, and there are more extinctions on small islands than on the large ones. The number of species at equilibrium corresponds to the point where colonization equals extinction.

The package island follows the stochastic implementation of Simberloff’s model (1969) developed by Alonso et al. (2015), that uses a likelihood approach to estimate colonization and extinction rates for communities that have been repeatedly sampled through time.

Aims of this package

This package allows:
1. Estimating colonization and extinction rates with regular and irregular sampling schemes, for whole communities or groups within those communities, taking into account or not imperfect detectability.
2. Converting those rates into transition probabilities.
3. Performing Akaike Information Criterion (AIC) -based model selection.
4. Estimating the influence of environmental variables on colonization and extinction dynamics.
5. Simulating the dynamics of the Equilibrium Theory of Island Biogeography, as well as three other population models.
6. Evaluating model error and R².

Data entry

In order to estimate colonization and extinction rates with package island, we need a dataframe of repeatedly sampled communities, organised in time and with at least one species (or the relevant taxonomic unit for our analysis). In this package we adopt the convention of indicating sampling times in columns and species in rows of the dataframe. As an example, take a look at the dataframe data(alonso15), that uses a regular sampling scheme for 4 years (2000 to 2003).

Species Trophic.Level Kd.2000 Kd.2001 Kd.2002 Kd.2003 Kd.2010 Kd.2011 Guild
Acanthurus auranticavus NA 1 1 1 1 1 1 Algal Feeder
Acanthurus leucosternon 2 1 1 1 1 1 1 Algal Feeder
Acanthurus lineatus 2 1 1 1 1 1 1 Algal Feeder
Acanthurus nigrofuscus 2 1 1 1 1 1 1 Algal Feeder
Acanthurus triostegus 2,78 1 1 1 1 1 1 Algal Feeder
Acanthurus xanthopterus 2,87 0 0 0 1 1 1 Algal Feeder

In the case of irregular sampling schemes, we adopt a convention of recording subsequent time intervals in the column header. For example, if our sampling started on day 17, and the next sample was taken 20 days after, that is, on day 37, the column heads for those samples should read 17 and 37. These conventions have been followed in data(simberloff), extracted here:

Taxa PRE 31 51 69 258 276 295 371 Tax. Unit 1 Tax. Unit 2 Genera Island
Gen. Sp. 1 0 0 0 0 0 0 0 Embioptera Teratembiidae Unknown E2
Aglaopteryx sp. 0 0 0 0 1 0 0 0 Orthoptera Blattidae Aglaopteryx E2
Latiblattella n. sp. 1 0 0 0 1 1 0 0 Orthoptera Blattidae Latiblattella E2
Latiblattella rehni 1 0 0 0 1 1 1 1 Orthoptera Blattidae Latiblattella E2
Cycloptilum sp. 1 0 0 0 0 0 0 0 Orthoptera Gryllidae Cycloptilum E2
Cyrtoxipha sp. 0 0 0 0 1 1 1 1 Orthoptera Gryllidae Cyrtoxipha E2

In both examples, we can see that we have columns for the taxa (or species) studied, some columns with additional information (such as Guild or Taxonomic Unit considered), and consecutive columns with data on presence (1) or absence (0).
Studying ecological communities can be messy. Temporal studies that track changes over many sites may often result in sampling schemes that are irregular both in time and space. In cases where we have different sampling schemes in our data, we require the different sampling schemes to be treated as different dataframes inside a list. An example would be the data set simberloff.

Likelihood and estimating colonisation and extinction rates

The Equilibrium Theory of Island Biogeography (ETIB)

The seminal work of Wilson & Simberloff (1969) on the experimental defaunation of mangrove islands in the Florida Keys was the first experimental validation of the ETIB, that predicted that colonization and extinction on an island would balance out and reach a dynamic equilibrium. The basic model underlying the theory can be expressed as follows: \[ \frac{dS_s}{dt} = c(S_P - S_s) - eS_s \] where \(S_s\) is the number of species present at a site, \(S_P\) is the number of species in the regional pool, and \(c\) and \(e\) are colonization and extinction rates respectively. This model states that the temporal variation in the number of species in a site over time is equal to the total rate of arrival of species from the pool that are not already at the site minus the total rate of species loss from the site. The equation can be easily solved for a single species and we can compute the probability of a species being present or absent at any time (see Alonso et al. 2015). With these probabilities we can construct a matrix of transition probabilities for the associated Markov model: \[\begin{equation} \label{T_00} \left\{ \begin{array}{ccc} T_{00} &=& 1 - \frac{c}{e+c} ( 1 - \exp(-(e+c)\Delta t))\\ T_{10} &=& \frac{c}{e+c} ( 1 - \exp(-(e+c)\Delta t) )\\ T_{01} &=& \frac{e}{e+c} ( 1 - \exp(-(e+c)\Delta t) ) \\ T_{11} &=& 1 - \frac{e}{e+c} ( 1 - \exp(-(e+c)\Delta t)) \end{array} \right. \end{equation}\] where \(T_{01}\) is the probability of extinction, \(T_{10}\) the probability of colonization, \(T_{11}\) of repeated presence and \(T_{00}\) of repeated absence. Two key assumptions, the assumption of species equivalence (the same parameters apply for a group of species), and the assumption of species independence (the presence of other species does not affect the dynamics of a focal species), allow to generalize this Markov process to groups of species or even communities. We can therefore calculate the likelihood of a given data set under the colonization-extinction model with the following expression for a regular sampling scheme given specific colonization and extinction rates, c and e: \[\begin{equation} P( M | c, e) = (1-T_{10})^{N_{00}} T_{10}^{N_{10}} T_{01}^{N_{01}} (1-T_{01})^{N_{11}} \end{equation}\] where \(N_{00}\) is the number of events of repeated absence, \(N_{10}\) events of colonization, \(N_{01}\) events of extinction and \(N_{11}\) events of repeated presence. In the case of an irregular sampling scheme we need to calculate the transition probabilities and the matching likelihood for each transition.
As an example, imagine that we have sampled arthropods on an island for three seasons and we have the following data:

Sp. 1 2 3
A 0 0 1
B 0 1 1
C 1 1 1
D 0 0 0
E 0 1 0
F 0 0 0
G 1 0 0
H 0 0 0
I 1 1 0
J 0 0 0

Here we can see that between samples 1 and 2 we have \(N_{01} = 1\) (G), \(N_{11} = 2\) (C, I), \(N_{00} = 5\) (A, D, F, H, J), and \(N_{10} = 2\) (B, E); we can similarly calculate transitions between samples 2 and 3. Assuming we already know the true value of transition probabilities (\(T_{01} = 0.4\), \(T_{11} = 0.6\), \(T_{00} = 0.7\) and \(T_{10} = 0.3\)), we can easily calculate the likelihood of the observed dataframe above as:

### LIKELIHOOD
0.4^3 * 0.6^4 * 0.7^10 * 0.3^3
#> [1] 6.325999e-06
# This is a pretty small likelihood. We can easily transform this likelihood into log-likelihood.

### LOG-LIKELIHOOD
3 * log(0.4) + 4 * log(0.6) + 10 * log(0.7) + 3 * log(0.3)
#> [1] -11.97084
# This is the log-likelihood associated with the given set of transition probabilities. Notice that we obtain the same value with both approaches.
log(0.4^3 * 0.6^4 * 0.7^10 * 0.3^3)
#> [1] -11.97084

The method above allows us to calculate the likelihood of our data set from known transition probabilities. However, say that we do not know the transition probabilities and want to estimate them. We can do this easily with the following code:

c_and_e <- regular_sampling_scheme(x = df, vector = 2:4)
c_and_e
#>           c c_up c_low        e e_up e_low  N      NLL
#> 1 0.3769053   NA    NA 0.699967   NA    NA 10 11.80301

This gives us the rates \(c\) and \(e\), but where are the transition probabilities? We obtain them with the function cetotrans:

cetotrans(c = c_and_e[1], e = c_and_e[4])
#>      T01       T10      
#> [1,] 0.4285714 0.2307692

The equations \(T_{11}= 1 - T_{01}\) and \(T_{00} = 1 - T_{10}\) will give us a complete set of transition probabilities estimated from the data.

Rates Vs Transition Probabilities

There are two main differences between transition probabilities and rates. First, transition probabilities are dimensionless numbers, while rates have dimensions of time\(^{-1}\). Second, with rates we can directly estimate the transition probabilities for any time interval, while transition probabilities are associated with a specific time interval, so if we have \(T_{10}=0.6\) for a specific time interval and we need the transition probability for twice that interval \(T_{10}' \ne 2 \times 0.6\) but if a colonization rate per week is \(c = 0.6 \ week^{-1}\) the colonization rate after 2 weeks would be \(c' = 2 \ weeks \times 0.6 \ week^{-1}= 1.2\). This property allows us to work with irregular sampling schemes —much more common in ecology than regular ones— naturally, simply using a pair of colonization and extinction rates.

Regular sampling schemes

The ideal scenario for ecological studies is one in which we sample our subject of study (say the community of fishes on an coral reef) regularly and frequently enough to observe changes in the community. Even though this is rare in ecology, this facilitates the estimation of rates and is less computationally intense. The estimation of rates for these regular sampling schemes has an analytically exact expression, while for irregular sampling schemes we need to rely on heuristic methods or numerical solvers (see section Irregular Sampling Schemes).
Throughout this section, we will use data(alonso15), a data set of three lists that have information on presence-absence of the community of coral reef fish communities in atolls in the Lakshadweep Archipelago (India). We will use data only from one of the atolls, Kadmat. The table below is an extract of the data:

data("alonso15")
df <- alonso15[[2]]
knitr::kable(head(df))
Species Trophic.Level Kd.2000 Kd.2001 Kd.2002 Kd.2003 Kd.2010 Kd.2011 Guild
Acanthurus auranticavus NA 1 1 1 1 1 1 Algal Feeder
Acanthurus leucosternon 2 1 1 1 1 1 1 Algal Feeder
Acanthurus lineatus 2 1 1 1 1 1 1 Algal Feeder
Acanthurus nigrofuscus 2 1 1 1 1 1 1 Algal Feeder
Acanthurus triostegus 2,78 1 1 1 1 1 1 Algal Feeder
Acanthurus xanthopterus 2,87 0 0 0 1 1 1 Algal Feeder

We have several columns of data. The first column of the data.frame lists the species studied, while the second, its associated trophic level. The third to the eighth columns contain presence-absence data of temporal samples collected in consecutive years between 2000 to 2003 and once again in 2010-2011. Finally, the last column has data on the guild of the species studied. For example, the third entry in the data.frame has data for Acanthurus lineatus, an algal feeder present in all the years studied.

Estimating colonization and extinction rates

From here, we can start estimating colonization and extinction rates for the entire community using the function regular_sampling_scheme. First, we have to specify a single data.frame in the function using argument x = and the name of our data.frame. Next, we need to tell the function in which columns the presence-absence data located with the argument vector =, in this case, columns 3 to 6. We will not use the data from 2010 and 2011. So let’s estimate the colonization and extinction rates:

rates <- regular_sampling_scheme(x = df, vector = 3:6)
rates
#>           c c_up c_low         e e_up e_low   N     NLL
#> 1 0.6034534   NA    NA 0.3505777   NA    NA 156 276.576

As we can see the colonization rate is c = 0.603 and the extinction rate e = 0.351. In addition, the output of the function tells us that we have examined 156 species and that the Negative Log-Likelihood of this model is NLL = 276.576.
What if we want to know the rates for each guild? We just have to add to the call the argument level, as in level = "Guild", indicating the name of the column that divides the data into groups, and n, as in n = 5 indicating the minimum number of species that a group needs in order to estimate its rates.

rates_g <- regular_sampling_scheme(x = df, vector = 3:6, level = "Guild", n = 5)
rates_g
#>               Group         c c_up c_low          e e_up e_low  N      NLL
#> 1      Algal Feeder 0.7685355   NA    NA 0.16302269   NA    NA 30 38.95667
#> 2       Corallivore 0.4427195   NA    NA 0.39352848   NA    NA 20 35.72338
#> 3 Macroinvertivore  1.0151676   NA    NA 0.56465265   NA    NA 41 78.09438
#> 4  Microinvertivore 0.4960388   NA    NA 0.51204007   NA    NA 21 39.36755
#> 5         Omnivore  0.5090504   NA    NA 0.40724033   NA    NA  9 16.33690
#> 6        Piscivore  0.5263375   NA    NA 0.60002472   NA    NA 21 40.03434
#> 7   Zooplanktivore  0.5124767   NA    NA 0.09189237   NA    NA 14 15.93931

This calculates the colonization and extinction rates for each guild separately, e.g. corallivores or piscivores, as well as the number of species in each guild and the negative log-likelihood associated with these rates and data.

Estimating Confidence Intervals

The estimates of colonization and extinction rates can not be regarded as fixed values, but have an associated uncertainty due, at least in part, to the amount of data we use to calculate the estimates. In interpreting these results we recommend to calculate confidence intervals for the rates we have just estimated. Confidence intervals can be calculated using function regular_sampling_scheme with the argument CI = T. This option provides confidence intervals whose limits have a difference in log-likelihood of 1.96 units around the estimator.
We employ two separate methods to estimate confidence intervals: a sequential method, which is computationally expensive, and a binary search seeded with the Hessian of the likelihood function. The rationale of the sequential method is the following: we start from one of the parameters, say c, and we sequentially add a small amount step to the parameter, obtaining the log-likelihood of the new value of the parameter, c’, until the difference in log-likelihood between c and c’ climbs above 1.96. This gives us the upper boundary of c. The lower boundary is calculated similarly, but with sequential subtraction. This is a straightforward method, but it has to balance computational time and accuracy with the size of the interval step.
The alternative method uses the Hessian of the likelihood function as a seed for a binary search to determine the intervals. After some algebra, the Hessian provides an asymptotic estimator of the confidence interval, thus being a good starting point to look for the actual interval. The method works by calculating the likelihood of this asymptotic estimator and expanding the search (if the difference with our estimate is less than 1.96) or narrowing it (if it is greater than 1.96). This method typically finds the confidence interval in about 10 steps, and is significantly less computationally intense than the sequential method. However, in some cases, the value of the Hessian reduces almost to zero or produces a non-invertible matrix. In these cases, we recommend using argument step.
Coming back to our example, we can calculate confidence intervals either for the entire community or for each studied guilds:

rates2a <- regular_sampling_scheme(x = df, vector = 3:6, CI = T)
rates2a
#>           c      c_up     c_low         e      e_up     e_low   N     NLL
#> 1 0.6034534 0.7527623 0.4763309 0.3505777 0.4483115 0.2686129 156 276.576

# Sequential method
rates2b <- regular_sampling_scheme(x = df, vector = 3:6, level = "Guild", n = 1, step = 0.005, CI = T)
knitr::kable(rates2b)
Group c c_up c_low e e_up e_low N NLL
Algal Feeder 0.7685355 1.3135355 0.4035355 0.1630227 0.3230227 0.0680227 30 38.95667
Corallivore 0.4427195 0.8127195 0.2077195 0.3935285 0.7485285 0.1735285 20 35.72338
Macroinvertivore 1.0151676 1.5001676 0.6601676 0.5646527 0.8546527 0.3496527 41 78.09438
Microinvertivore 0.4960388 0.8910388 0.2410388 0.5120401 0.9170401 0.2520401 21 39.36755
Omnivore 0.5090504 1.2240504 0.1540504 0.4072403 0.9772403 0.1222403 9 16.33690
Piscivore 0.5263375 0.9063375 0.2763375 0.6000247 1.1000247 0.2850247 21 40.03434
Zooplanktivore 0.5124767 1.1274767 0.1774767 0.0918924 0.2918924 0.0118924 14 15.93931

# Hessian-based method
rates2c <- regular_sampling_scheme(x = df, vector = 3:6, level = "Guild", n = 1, CI = T)
knitr::kable(rates2c)
Group c c_up c_low e e_up e_low N NLL
Algal Feeder 0.7685355 1.3122641 0.4078379 0.1630227 0.3195759 0.0690967 30 38.95667
Corallivore 0.4427195 0.8123286 0.2101798 0.3935285 0.7439781 0.1778156 20 35.72338
Macroinvertivore 1.0151676 1.4990009 0.6626292 0.5646527 0.8532916 0.3539581 41 78.09438
Microinvertivore 0.4960388 0.8864932 0.2458565 0.5120401 0.9149993 0.2538674 21 39.36755
Omnivore 0.5090504 1.2212077 0.1557325 0.4072403 0.9772242 0.1240222 9 16.33690
Piscivore 0.5263375 0.9051379 0.2774242 0.6000247 1.0951876 0.2879977 21 40.03434
Zooplanktivore 0.5124767 1.1233754 0.1808364 0.0918924 0.2884887 0.0148875 14 15.93931

As we can see, when the number of species in a group is low, uncertainty increases and expands the confidence intervals. In addition, the differences between the two methods are negligible in their output, except for the computational time needed.

Likelihood profiles

A likelihood profile is a simple way to bound the confidence interval of a parameter. Our sequential procedure is clearly based on likelihood profiles, and we will illustrate it using the function NLL_rss, that calculates the NLL of a model given its parameters. There are equivalent functions, NLL_isd, NLL_imd and NLL_env that produce the equivalent output with different sampling schemes or environmental influences on the parameters.
Next, for our case study, we will calculate the likelihood profile for colonization in Kadmat atoll. Function NLL_rss needs a dataframe x, a vector with the number of the columns to analyze, and given c and e rates. The procedure iteratively calculates the likelihood of the parameter given the data around the estimated value. The shape of the likelihood profile gives us an indication on how peaked around the maximum our likelihood functions are.

rates
#>           c c_up c_low         e e_up e_low   N     NLL
#> 1 0.6034534   NA    NA 0.3505777   NA    NA 156 276.576

seqs <- seq.int(5, 200, by = 5) / 100
NLLs <- NLL_rss(x = df, vector = 3:6, c = seqs * rates$c, e = rates$e)
plot(seqs * rates$c, NLLs, type = "l", xlab = "Colonization rate (year⁻¹)", ylab = "Negative Log-Likelihood", main = "Likelihood profile")
points(rates$c, rates$NLL, pch = 4, col = "magenta")
Likelihood profile for the colonization rate in Kadmat atoll. The cross in magenta indicates the m.l.e. for the rate.

Likelihood profile for the colonization rate in Kadmat atoll. The cross in magenta indicates the m.l.e. for the rate.

A few words on model selection

Model selection identifies the best model that fits a given data set. In this package we propose the use of the Akaike Information Criterion (AIC) and the associated measure of weight of evidence to choose the model that best fits the data and compares it against several alternative models. The functions in package island provide Negative Log-Likelihoods (NLLs) that can easily be transformed to AIC with functions akaikeic or akaikeicc. We can then calculate the weight_of_evidence for each competing model that potentially explains our observations. As a reminder, AICs are only comparable when we try to explain the same data with different models, because it makes no sense comparing the AIC of two different data sets with the same model, since the data is different in both cases.
Let us compare two competing models that try to explain the same data. We will use the data for Kadmat once again, to check if the model that treats each guild’s dynamics separately better explains the data than the model that works with a single colonization and extinction pair of rates for the entire assemblage. For this, we will extract the NLL of both models, calculate their AIC using function akaikeic, and compare the results with function weight_of_evidence.

guild_NLL <- sum(rates2c$NLL)
guild_NLL
#> [1] 264.4525
comm_NLL <- rates$NLL
comm_NLL
#> [1] 276.576

aic_g <- akaikeic(NLL = guild_NLL, k = 14)
aic_c <- akaikeic(NLL = comm_NLL, k = 2)
aic_g
#> [1] 556.905
aic_c
#> [1] 557.1519

ms_kadmat <- data.frame(Model = c("Guilds", "Community"), AIC = c(aic_g, aic_c))
weight_of_evidence(data = ms_kadmat)
#>       Model   IncAIC         w
#> 1    Guilds 0.000000 0.5308212
#> 2 Community 0.246883 0.4691788

The output indicates that the log-likelihood of the model with different dynamics for each guild is over 12 points better than the model with a single dynamic group, which means that considering the guilds separately better explains the data than lumping together all guilds in a single group. However, the “guilds” model has 14 parameters while the “community” model has only 2. It is unsurprising that the more parameters the model has, the better it explains the data. For this reason, we calculate the AIC of each model using function akaikeic, that, to preserve parsimony, accounts for the number of parameters with the argument k. AIC values suggest that the “guilds” model is only marginally better than the “community” model, with a difference of 0.25. This is a tiny difference, and we cannot conclusively say that one of the models has more support than the other. This analysis is backed by the function weight_of_evidence, that provides a measure of the support that each of the competing models has. We found that the “guilds” model has a weight of evidence of 53% while the “community” model is 47% and, as already noted, we cannot conclude that any model is significantly better. We will explore this dataset in some more detail in the section A short example.

Simulating Colonization and Extinction Dynamics

The package island allows to simulate ecological data driven by colonization and extinction dynamics. Functions data_generation and PA_simulation allow us to generate species richness or presence-absence data from an initial vector of presence-absence for specific transition probabilities that can vary between different simulated transitions. As an example, we will simulate the temporal evolution in richness for Kadmat atoll.

### First, take a look at the rates for Kadmat
tp <- cetotrans(c = rates$c, e = rates$e)

set.seed(10110111)
sim_1 <- PA_simulation(x = df, column = 3, transitions = tp, times = 11)
colSums(sim_1)
#>  [1]  88  88 100 106 100 103  98  97 102  99 101

sims <- data_generation(x = df, column = 3, transitions = tp, iter = 999, times = 11)
sims[, 1]
#>  [1]  85  98  91  97 102  97  92  93  96 100 102

ic <- apply(X = sims, MARGIN = 1, FUN = quantile, c(0.025, 0.975))
ic
#>       [,1]   [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
#> 2.5%    80  83.95   85   86   87   86   87   87   87    87    86
#> 97.5%  102 107.00  109  109  109  110  109  109  110   109   110
rates 
#>           c c_up c_low         e e_up e_low   N     NLL
#> 1 0.6034534   NA    NA 0.3505777   NA    NA 156 276.576
Temporal evolution in Kadmat atoll. The points connected with lines indicate the observed species richness, while the red dashed line indicates the 95% distribution of simulations. The dotted black line corresponds to one stochastic simulation of the colonization and extinction dynamics on the atoll.

Temporal evolution in Kadmat atoll. The points connected with lines indicate the observed species richness, while the red dashed line indicates the 95% distribution of simulations. The dotted black line corresponds to one stochastic simulation of the colonization and extinction dynamics on the atoll.

In the simulation above, we use colonization and extinction rates estimated with the first 4 samples to simulate community dynamics from 2000 to 2011. We can use this to evaluate how reliably our parameter estimates of colonization-extinction dynamics fit observed data. The plot above shows the actual observations, the result of a simulation (as a black dotted line) and the 95% distribution of the simulations of the dynamics (the area enclosed within the two red dashed lines). The actual data falls well within the boundaries delimited by the simulation except for the last sample.
Finally, we will estimate goodness-of-fit calculating model R^2. Any estimation of R^2 requires a null model to compare with. R^2 measures how well our model performs in relation to a null model of choice. As a null model, here we assume that at any given time we can have a number of species present from 0 to \(S_P\), the number of species in the pool, which is drawn from a uniform probability distribution. In order to estimate goodness-of-fit, we use the function r_squared, which requires us to specify the observed species richness, simulated species richness and the number of species in the pool sp.

simulated <- apply(sims, 1, quantile, 0.5)
simulated 
#>  [1] 91 96 97 98 99 98 99 98 98 99 99

simulated <- simulated[c(1:3, 10, 11)]

observed <- colSums(df[, 3:8])
observed
#> Kd.2000 Kd.2001 Kd.2002 Kd.2003 Kd.2010 Kd.2011 
#>      79      91     100      95     103     120

# We skip the first observation since we have to use it as the starting point of the simulations.
observed <- observed[-1] 

r_squared(observed = observed, simulated = simulated, sp = 156)
#> R-Squared 
#> 0.9651524

The output indicates that we have a good fit compared to our null model. However, changes in the null model would lead to different estimates of R². For this reason, we also include function simulated_model that produces the quadratic error of a model given some data. We can estimate a new R² using a different null model and estimating its error with the previous function using the equation below: \[ R^2 = 1 - \frac{\epsilon^2}{\epsilon^2 _0 }\]

Irregular Sampling Schemes

Given the complexities and challenges inherent to collecting real-world ecological data, temporal monitoring is often patchy, making irregular sampling schemes common in ecology. Data typically includes one or even several data sets with different sampling schemes. To accommodate the messiness of real-world data, ‘island’ has two functions irregular_single_dataset and irregular_multiple_datasets. Irregular sampling schemes force to alter the way in which we calculate colonization and extinction rates, precluding the use of the algebraic (exact) method for estimating rates. We can still calculate the likelihood of the dynamic model using two methods with different approaches: a heuristic method and a semianalytical method.

Heuristic Vs Semianalytical Methods

The heuristic or semianalytical approaches of calculating rates are implemented in functions irregular_single_dataset and irregular_multiple_dataset, and we can switch between approaches using the argument jacobian.
The heuristic method uses R’s built-in optimization routine, optim, to obtain estimates for the colonization and extinction rates. The function optim uses an implementation of the Nelder-Mead algorithm to find the optimum of the likelihood function. However, heuristic methods do not guarantee finding the global optima of the objective function, and they do not have a mechanism to evaluate how good the optima are for the selected point.
In contrast, the semianalytical method guarantees that the optima calculated is a root of the determinant of the Jacobian matrix of the likelihood function when the algorithm converges. In this package, we call routine multiroot from package rootSolve in order to find the semianalytical optimum value for our likelihood function.
Even though we encourage the use of the Jacobian-based method, it may not always converge. The function will notify possible problems, and we recommend using the heuristic method if difficulties are encountered. One way to find good initial values for the Jacobian-based method is to use the estimates of the heuristic method as a starting point.

Single dataset

The function that deals with an irregular sampling scheme in a single data set is called irregular_single_dataset. An example, partially shown below (some samples have been skipped) and similar to the one reproduced earlier, is data of island ST2 in data(simberloff):

Taxa PRE 21 40 59 231 250 322 Tax. Unit 1 Tax. Unit 2 Genera Island
Diradius caribbeana 1 0 0 0 0 0 0 Embioptera Teratembiidae Diradius ST2
Latiblattella n. sp. 1 0 1 1 1 1 1 Orthoptera Blattidae Latiblattella ST2
Cycloptilum sp. 1 0 0 0 0 0 0 Orthoptera Gryllidae Cycloptilum ST2
Cyrtoxipha sp. 0 0 0 0 0 0 0 Orthoptera Gryllidae Cyrtoxipha ST2
Tafalisca lurida 1 0 0 0 1 1 1 Orthoptera Gryllidae Tafalisca ST2
Kalotermes joutelli 0 0 0 1 0 0 0 Isoptera Kalotermitidae Kalotermes ST2

Functions irregular_single_dataset as well as irregular_multiple_datasets reproduce all the functionalities of regular_sampling_schemes. Please note that irregular_single_dataset requires numbers for the names of the columns with presence-absence data. The only other difference with the function for the regular sampling schemes is that we need to enter priors for c and e, our colonization and extinction rates, using arguments c = and e =.
Below we illustrate the use of this function with the data for island ST2 of data set simberloff. This data set was published by Simberloff and Wilson (1969). We start by estimating colonization and extinction rates for the whole island.

st2 <- simberloff[[6]]

# Before we begin, it is useful to look at the data for obvious errors.
head(st2) 
#>                   Taxa PRE 21 40 59 77 94 113 134 149 171 191 211 231 250 322
#> 1  Diradius caribbeana   1  0  0  0  0  0   0   0   0   0   1   0   0   0   0
#> 2 Latiblattella n. sp.   1  0  1  1  1  1   1   1   1   1   1   1   1   1   1
#> 3      Cycloptilum sp.   1  0  0  0  0  0   1   1   1   1   0   0   0   0   0
#> 4       Cyrtoxipha sp.   0  0  0  0  0  0   1   1   1   0   0   0   0   0   0
#> 5     Tafalisca lurida   1  0  0  0  0  1   1   1   1   1   1   1   1   1   1
#> 6  Kalotermes joutelli   0  0  0  1  0  0   0   0   0   0   0   0   0   0   0
#>   Tax. Unit 1    Tax. Unit 2        Genera Island
#> 1  Embioptera  Teratembiidae      Diradius    ST2
#> 2  Orthoptera      Blattidae Latiblattella    ST2
#> 3  Orthoptera      Gryllidae   Cycloptilum    ST2
#> 4  Orthoptera      Gryllidae    Cyrtoxipha    ST2
#> 5  Orthoptera      Gryllidae     Tafalisca    ST2
#> 6    Isoptera Kalotermitidae    Kalotermes    ST2

# Presence-absence data is in columns 3 to 16.
rates.st2 <- irregular_single_dataset(dataframe = st2, vector = 3:16, c = 0.001, e = 0.001, assembly = T, jacobian = T)
rates.st2
#>            c c_up c_low         e e_up e_low  N      NLL
#> 1 0.00670301   NA    NA 0.0127161   NA    NA 80 457.6403

# We will now estimate rates for different groups in "Tax. Unit 1" with more than 5 species.
rates.groups <- irregular_single_dataset(dataframe = st2, vector = 3:16, c = 0.001, e = 0.001, column = "Tax. Unit 1", n = 5, assembly = T, jacobian = T)
rates.groups
#>         Group           c c_up c_low           e e_up e_low  N       NLL
#> 1     Acarina 0.004820490   NA    NA 0.008007749   NA    NA  8  35.63007
#> 2     Araneae 0.008426681   NA    NA 0.017747947   NA    NA 11  70.27852
#> 3  Coleoptera 0.006089566   NA    NA 0.014810771   NA    NA  8  43.01984
#> 4 Hymenoptera 0.007776273   NA    NA 0.010184661   NA    NA 20 122.89718
#> 5 Lepidoptera 0.007806117   NA    NA 0.012146876   NA    NA  8  49.23273
#> 6  Psocoptera 0.008107215   NA    NA 0.017269065   NA    NA  5  30.16782

# Plotting the results.
plot(rates.groups[, 2], rates.groups[, 5], xlab = "Colonization rate (day⁻¹)", ylab = "Extinction rate (day⁻¹)", main = "ST2")
points(rates.st2[, 1], rates.st2[, 4], pch = 4)                        
loc <- list(x = c(0.004902829, 0.008421634, 0.006021524, 0.007679227, 0.007892796, 0.008025005, 0.006692741), y = c(0.008578302, 0.017145964, 0.014431104, 0.009668973, 0.011560750, 0.016878929, 0.011553715))

text(loc, c(substr(rates.groups[, 1], 1, 3), "Whole \ncommunity"))
Colonization and extinction rates for the whole invertebrate community or selected taxonomic groups in island ST2.

Colonization and extinction rates for the whole invertebrate community or selected taxonomic groups in island ST2.

We can also produce transition probabilities for different time intervals and simulate these time intervals, allowing a more efficient use of processor time and memory. Functions cetotrans and data_generation are optimized to use with irregular time intervals. In the case of function cetotrans, arguments c, e and dt must be vectors of the same length, while for function data_generation arguments transitions and times should have the same number of rows and the same value, respectively. We simulate colonization and extinction dynamics for island ST2 300 times, and we observe that species richness reaches a steady-state after a while.

dts <- as.numeric(colnames(st2)[4:16]) - as.numeric(colnames(st2)[3:15])
dts <- c(21, dts)

tps <- cetotrans(c = rep(rates.st2[[1]], length(dts)), e = rep(rates.st2[[4]], length(dts)), dt = dts)
tps
#>             T01        T10
#>  [1,] 0.2192933 0.11559563
#>  [2,] 0.2020453 0.10650373
#>  [3,] 0.2020453 0.10650373
#>  [4,] 0.1931669 0.10182363
#>  [5,] 0.1841143 0.09705176
#>  [6,] 0.2020453 0.10650373
#>  [7,] 0.2192933 0.11559563
#>  [8,] 0.1654731 0.08722548
#>  [9,] 0.2276694 0.12001087
#> [10,] 0.2107531 0.11109382
#> [11,] 0.2107531 0.11109382
#> [12,] 0.2107531 0.11109382
#> [13,] 0.2020453 0.10650373
#> [14,] 0.4930516 0.25990123

sims <- data_generation(x = matrix(0, 80, 1), column = 1, transitions = tps, iter = 300, times = length(dts))
ic <- apply(sims, 1, quantile, probs = c(0.025, 0.975))
ic
#>       [,1]   [,2] [,3]   [,4] [,5]   [,6] [,7] [,8] [,9]  [,10] [,11] [,12]
#> 2.5%     4  8.000   11 14.000   15 18.000   17   18   18 19.000    19    20
#> 97.5%   15 21.525   25 29.525   31 33.525   35   35   36 35.525    35    36
#>       [,13] [,14]
#> 2.5%     19    20
#> 97.5%    36    37

med.st2 <- apply(sims, 1, median)

days <- c(0, colnames(st2)[3:16]) 

plot(days, c(0, colSums(st2[, 3:16])), ylim = c(0, 40), xlab = "Days since defaunation", ylab = "Species richness", main = "ST2", type = "b")
lines(days, c(0, med.st2), col = "green", lty = 2)
lines(days, c(0, ic[1, ]), col = "magenta", lty = 3)
lines(days, c(0, ic[2, ]), col = "magenta", lty = 3)
Temporal evolution on island ST2. The points connected with lines indicate the observed species richness, while the red dashed line indicates the 95% of the distribution of simulations, and the green dashed line the median richness for 300 simulations.

Temporal evolution on island ST2. The points connected with lines indicate the observed species richness, while the red dashed line indicates the 95% of the distribution of simulations, and the green dashed line the median richness for 300 simulations.

Multiple datasets

Sampling schemes for similar communities can be very different due to multiple reasons usually associated to the challenges of ecological research. The function irregular_multiple_datasets allows us to use data from different sampling schemes and estimate joint parameters for these data sets. In order to use these data sets, they need to comply with the general structure for irregular sampling schemes and they should be combined in a list. The argument list will collate the data sets we want to study jointly and the argument vectorlist must contain the vectors (ordered in time) that indicate where the presence-absence data is located. The remaining arguments work as discussed in earlier sections. Next we estimate colonization and extinction rates for each island and each taxonomic group in the whole dataset simberloff.

rates.simberloff <- irregular_multiple_datasets(list = simberloff, vectorlist = list(3:17, 3:18, 3:17, 3:19, 3:17, 3:16), c = 0.001, e = 0.001, jacobian = T)
rates.islands <- irregular_multiple_datasets(list = simberloff, vectorlist = list(3:17, 3:18, 3:17, 3:19, 3:17, 3:16), c = 0.001, e = 0.001, column = "Island", n = 5, jacobian = T)
rates.taxonomy <- irregular_multiple_datasets(list = simberloff, vectorlist = list(3:17, 3:18, 3:17, 3:19, 3:17, 3:16), c = 0.0001, e = 0.0001, column = "Tax. Unit 1", n = 20, jacobian = T)
rates.simberloff
#>             c c_up c_low          e e_up e_low   N      NLL
#> 1 0.005102348   NA    NA 0.01175197   NA    NA 549 2938.645
rates.islands
#>   Group           c c_up c_low           e e_up e_low   N      NLL
#> 1    E1 0.003849172   NA    NA 0.008577194   NA    NA  54 226.9079
#> 2    E2 0.005353969   NA    NA 0.013019739   NA    NA 119 654.2403
#> 3    E3 0.005637932   NA    NA 0.011374938   NA    NA  92 486.9761
#> 4    E7 0.004133691   NA    NA 0.011507033   NA    NA  99 600.8253
#> 5    E9 0.005261473   NA    NA 0.011595805   NA    NA 105 526.3934
#> 6   ST2 0.006901374   NA    NA 0.012774564   NA    NA  80 433.5353
rates.taxonomy
#>         Group           c c_up c_low           e e_up e_low   N      NLL
#> 1     Acarina 0.005007757   NA    NA 0.010790814   NA    NA  48 240.7149
#> 2     Araneae 0.005658833   NA    NA 0.016369729   NA    NA  85 488.8871
#> 3  Coleoptera 0.004985066   NA    NA 0.011884045   NA    NA  65 349.9689
#> 4   Hemiptera 0.006196764   NA    NA 0.019892102   NA    NA  34 185.8024
#> 5 Hymenoptera 0.004933051   NA    NA 0.011527283   NA    NA 131 689.1720
#> 6 Lepidoptera 0.005971425   NA    NA 0.007636927   NA    NA  44 241.2960
#> 7  Orthoptera 0.005429715   NA    NA 0.005220060   NA    NA  24 122.8997
#> 8  Psocoptera 0.005809177   NA    NA 0.016437311   NA    NA  46 267.1539

plot(rates.taxonomy[, 2], rates.taxonomy[, 5], xlim = c(.0034, .0074), ylim = c(.0025, .0225), pch = 20, main = "Simberloff dataset", xlab = "Colonization rate (day⁻¹)", ylab = "Extinction rate (day⁻¹)")
points(rates.islands[, 2], rates.islands[, 5], pch = 4)
points(rates.simberloff[, 1], rates.simberloff[, 4], pch = 3)

locs <- list(x = c(0.005136858, 0.003858476, 0.005435459, 0.005724728, 0.004026439, 0.005379471, 0.006919129, 0.005015552, 0.005594090, 0.004912908, 0.006303266, 0.004722939, 0.006069984, 0.005435459, 0.005892691), y = c(0.01074532, 0.007403081, 0.014011605, 0.010289563, 0.012644324, 0.010593403, 0.011353003, 0.009985722, 0.017429807, 0.012872204, 0.018645168, 0.011201083, 0.006415600, 0.004212758, 0.015454846))
text(locs, c("TIB", as.character(rates.islands$Group), substr(rates.taxonomy[, 1], 1, 3)), 
     col = c("black", rep("magenta", 6), rep("green", 8)))
Colonization and extinction rates for the data from Simberloff (1969) on experimental island zoogeography. In green, some of the groups studied, in magenta the different islands.

Colonization and extinction rates for the data from Simberloff (1969) on experimental island zoogeography. In green, some of the groups studied, in magenta the different islands.

The rates obtained indicate differences among islands, specially in colonization, while the different taxonomic groups show different rates.

A short example

In this section, we will reproduce, step-by-step, some of the results presented in the article by Alonso et al. (2015), on the recolonization process of a coral reef fish community in the Lakshadweep Archipelago after a coral mass mortality event. This article shows that higher trophic guilds suffer increased rates of extinction even in the absence of targeted fishing.
The article estimated colonization and extinction rates for 4 competing models to explain the dynamics of the coral fish community, performed model selection among those models and simulated the dynamics of different trophic guilds in each island. The four models were: A, the different guilds in the three sampled islands behaved as a single community; B, each island had a different dynamics; C, each guild had a different dynamics, but did not vary between islands; and D, each guild had a different dynamics in each island. Below, we reproduce part of Table 1 of the article, showing the model selection procedure, Figure 3, showing the transition probabilities found by the best model, and a plot for the temporal evolution of corallivorous fishes in Agatti atoll, included in the original article in Figure 2.

data(alonso15)
head(alonso15[[2]]) # Examine the data
#>                   Species Trophic.Level Kd.2000 Kd.2001 Kd.2002 Kd.2003 Kd.2010
#> 1 Acanthurus auranticavus          <NA>       1       1       1       1       1
#> 2 Acanthurus leucosternon             2       1       1       1       1       1
#> 3     Acanthurus lineatus             2       1       1       1       1       1
#> 4  Acanthurus nigrofuscus             2       1       1       1       1       1
#> 5   Acanthurus triostegus          2,78       1       1       1       1       1
#> 6 Acanthurus xanthopterus          2,87       0       0       0       1       1
#>   Kd.2011        Guild
#> 1       1 Algal Feeder
#> 2       1 Algal Feeder
#> 3       1 Algal Feeder
#> 4       1 Algal Feeder
#> 5       1 Algal Feeder
#> 6       1 Algal Feeder

# To begin, we calculate rates for each island (model B), and for each group in each island (model D).
rates.aga <- regular_sampling_scheme(x = alonso15[[1]], vector = 3:6)
rates.kad <- regular_sampling_scheme(x = alonso15[[2]], vector = 3:6)
rates.kav <- regular_sampling_scheme(x = alonso15[[3]], vector = 3:5)

ModelB <- rates.aga$NLL + rates.kad$NLL + rates.kav$NLL 


rates.aga.guild <- regular_sampling_scheme(x = alonso15[[1]], vector = 3:6, level = "Guild", n = 1)
rates.kad.guild <- regular_sampling_scheme(x = alonso15[[2]], vector = 3:6, level = "Guild", n = 1)
rates.kav.guild <- regular_sampling_scheme(x = alonso15[[3]], vector = 3:5, level = "Guild", n = 1)

ModelD <- sum(rates.aga.guild$NLL) + sum(rates.kad.guild$NLL) + sum(rates.kav.guild$NLL)

# Next, we calculate rates using the function irregular_multiple_datasets. We need to first change the colnames of the columns with data.

colnames(alonso15[[1]])[3:6] <- 2000:2003
colnames(alonso15[[2]])[3:6] <- 2000:2003
colnames(alonso15[[3]])[3:5] <- 2001:2003

rates.lak <- irregular_multiple_datasets(list = alonso15, vectorlist = list(3:6, 3:6, 3:5), c = 0.1, e = 0.1, jacobian = T, CI = T)

rates.guild <- irregular_multiple_datasets(list = alonso15, vectorlist = list(3:6, 3:6, 3:5), c = 0.1, e = 0.1, jacobian = T, column = "Guild", n = 1, CI = T)

# We can now follow the model selection procedure used in the article.
ModelA <- rates.lak$NLL
ModelC <- sum(rates.guild$NLL)

NLL <- c(ModelA, ModelB, ModelC, ModelD)
NLL
#> [1] 742.3037 741.1601 725.4704 710.7865

Parameters <- c(2, 6, 14, 42)
N <- rep(1248, 4)

AIC <- akaikeic(NLL = NLL, k = Parameters)
AICc <- akaikeicc(NLL = NLL, k = Parameters, n = 1248)

Table1 <- weight_of_evidence(data = data.frame(Model = c("A", "B", "C", "D"), AICc = AICc))
Table1 <- cbind(Table1, AIC, AICc, NLL, Parameters, N)
knitr::kable(Table1)
Model IncAIC w AIC AICc NLL Parameters N
A 9.335557 0.0093009 1488.607 1488.617 742.3037 2 1248
B 15.106378 0.0005193 1494.320 1494.388 741.1601 6 1248
C 0.000000 0.9901794 1478.941 1479.282 725.4704 14 1248
D 29.289027 0.0000004 1505.573 1508.571 710.7865 42 1248
# The following steps are used to reproduce Figure 3 in Alonso *et al* (2015).
rates.guild
#>               Group         c      c_up     c_low         e      e_up     e_low
#> 1      Algal Feeder 0.6314581 0.8750604 0.4396526 0.1913754 0.2873993 0.1198905
#> 2       Corallivore 0.4505997 0.6573846 0.2938969 0.2743163 0.4345424 0.1598409
#> 3 Macroinvertivore  0.5286944 0.6873453 0.3977289 0.4329315 0.5721428 0.3191255
#> 4  Microinvertivore 0.4823432 0.6950086 0.3201004 0.5565499 0.8014445 0.3698255
#> 5         Omnivore  0.5107695 0.8562513 0.2773004 0.2907457 0.5674376 0.1238472
#> 6        Piscivore  0.6428574 0.8878366 0.4518301 0.6607665 0.9631957 0.4321006
#> 7   Zooplanktivore  0.7924913 1.1952252 0.4994883 0.2661351 0.4598341 0.1371437
#>     N       NLL
#> 1  90 116.33150
#> 2  60  89.39827
#> 3 123 201.79017
#> 4  63 105.29037
#> 5  27  41.50041
#> 6  63 110.06134
#> 7  42  61.09837

# transform rates into transition probabilities.
tps <- cetotrans(c = rates.guild$c, e = rates.guild$e)

plot(tps[, 2], tps[, 1], xlim = c(0, 0.7), ylim = c(0, 0.5), type = "n", xlab = "Colonization probability", ylab = "Extinction probability", main = "Lakshadweep guilds")
text(tps[, 2], tps[, 1], c("A", "C", "M", "m", "O", "P", "Z"))
text(0.07, 0.25, "Trophic level", srt = 90, col = "magenta")
arrows(0.1, 0.1, 0.1, 0.4, code = 2, col = "green")
Colonization and extinction probabilities by guild within the coral reef fish community in the Lakshadweep Archipelago, India. Algal Feeders (A), Omnivores (O), Corallivores (C), Zooplanktivores (Z), Micro-invertivores (m), Macro-invertivores (M), and Piscivores (P).
Colonization and extinction probabilities by guild within the coral reef fish community in the Lakshadweep Archipelago, India. Algal Feeders (A), Omnivores (O), Corallivores (C), Zooplanktivores (Z), Micro-invertivores (m), Macro-invertivores (M), and Piscivores (P).
rates.aga.guild
#>               Group         c c_up c_low         e e_up e_low  N      NLL
#> 1      Algal Feeder 0.5058664   NA    NA 0.2360710   NA    NA 30 49.00071
#> 2       Corallivore 0.5902927   NA    NA 0.2459553   NA    NA 20 33.70719
#> 3 Macroinvertivore  0.3438327   NA    NA 0.3625524   NA    NA 41 69.42637
#> 4  Microinvertivore 0.5865863   NA    NA 0.7502420   NA    NA 21 41.07112
#> 5         Omnivore  0.9037842   NA    NA 0.2397795   NA    NA  9 14.71404
#> 6        Piscivore  0.6161308   NA    NA 0.7701635   NA    NA 21 41.12469
#> 7   Zooplanktivore  0.8724753   NA    NA 0.4800701   NA    NA 14 26.87099

tps <- cetotrans(c = rates.aga.guild[2, 2], e = rates.aga.guild[2, 5])

cor.agatti <- alonso15[[1]][alonso15[[1]]$Guild == "Corallivore", ]
richness <- colSums(cor.agatti[, 3:8])
richness
#>   2000   2001   2002   2003 A.2010 A.2011 
#>      7     11     12     14     14     18

sims <- data_generation(x = cor.agatti, column = 3, transitions = tps, iter = 300, times = 11)
sims.ic <- apply(X = sims, MARGIN = 1, FUN = quantile, probs = c(0.025, 0.975))

plot(c(2000:2003, 2010, 2011), richness, type = "b", ylim = c(0, 21), col = "red", xlab = "Time (Year)", ylab = "Species", main = "Corallivore/AGATTI")
lines(c(2000:2011), c(7, sims.ic[1, ]), lty = 2)
lines(c(2000:2011), c(7, sims.ic[2, ]), lty = 2)
Temporal evolution of species richness of corallivores in Agatti atoll, according to model D. In red, the observed species richness, while the black dashed lines represent 95% of the distribution of 300 simulations with the estimated dynamics.
Temporal evolution of species richness of corallivores in Agatti atoll, according to model D. In red, the observed species richness, while the black dashed lines represent 95% of the distribution of 300 simulations with the estimated dynamics.

Environmental fit

Apart from several biotic factors (e.g.. the presence of top predators or the abundance of primary producers), the number of species in a community can also be influenced by several abiotic factors (e.g.. temperature or rainfall) - referred to as environmental covariates. The package island includes functions all_environmental_fit, greedy_environmental_fit, custom_environmental_fit and rates_calculator to analyze the influence of environmental covariates on the colonization and extinction dynamics of ecological communities. The functions assume a linear functional response for environmental covariates, \[ c_t = \alpha_0 + \sum_{i=1}^F \alpha_{i} Y_{it}, \quad e_t = \beta_0 + \sum_{i=1}^F \beta_{i} Y_{it},\] where \(Y_{it}\) is the value of the \(i\)-th environmental covariate (\(i=1,\dots,F\)) at time \(t\), since it is the simplest way to build this dependency into colonization and extinction rates.
We employ R expressions in these functions. An expression is an R object, frequently a call, symbol or constant, that has to be evaluated prior to its use. We have minimized the use of expressions to make certain function calls easier to understand. However we make use of them internally and provide function custom_environmental_fit that needs two expressions in order to hone the results from all_environmental_fit and greedy_environmental_fit as well as developing custom dependencies with environmental covariates that can be specified by advanced users.
All three functions for environmental fit need a single argument dataset with the same structure as the one for irregular schemes and an argument vector indicating the columns containing presence absence data. In addition, all three functions need another data.frame with related environmental covariates in columns. The names of these columns have to be specified to functions all_environmental_fit and greedy_environmental_fit with the argument env. These two functions also need arguments c, e and aic, this is, tentative values for colonization and extinction rates and a tentative AIC value for the model. In practice, this AIC value should be chosen very large (of the order of 10⁸ but this depends on the size of the data set). The difference between all_environmental_fit and greedy_environmental_fit is that the latter employs a greedy algorithm that sequentially adds environmental covariates, one at a time, to the previously best set (using AIC), i. e., the algorithm finds first the best environmental covariate and then the best combination with two covariates including the previously selected, and so forth until AIC does not justify the addition of new covariates. In contrast all_environmental_fit tries all combinations of environmental variables, starting with the minimum number of environmental covariates to the maximum. Since the number of combinations rapidly becomes unmanageable, this method is not recommended except when we have few environmental variables.
We illustrate the use of these functions using data(idaho). This is a long-term time series of a mapped plant community in Dubois, Idaho (USA). The plant community of a sagebrush steppe was mapped and identified from 1923 to 1973, using permanent 1-m² quadrats, and precipitation, mean temperature and snowfall were measured monthly (we added the annual mean for each one). The data included in this package is a sample of the full data set corresponding to several quadrats surveyed from 1932 to 1955 with some gaps. The following code explores the relation of this plant community with total annual precipitation and mean monthly temperature.

data(idaho)
df <- idaho[[1]]
env <- idaho[[2]]

# Examine the colnames.
colnames(df)
#>  [1] "quad"    "species" "32"      "33"      "34"      "35"      "36"     
#>  [8] "37"      "38"      "39"      "40"      "41"      "42"      "45"     
#> [15] "46"      "47"      "49"      "50"      "51"      "52"      "54"     
#> [22] "55"      "56"
colnames(env)
#>  [1] "YEAR"        "JAN.ppt"     "FEB.ppt"     "MAR.ppt"     "APR.ppt"    
#>  [6] "MAY.ppt"     "JUN.ppt"     "JUL.ppt"     "AUG.ppt"     "SEP.ppt"    
#> [11] "OCT.ppt"     "NOV.ppt"     "DEC.ppt"     "TOTAL.ppt"   "JAN.temp"   
#> [16] "FEB.temp"    "MAR.temp"    "APR.temp"    "MAY.temp"    "JUN.temp"   
#> [21] "JUL.temp"    "AUG.temp"    "SEP.temp"    "OCT.temp"    "NOV.temp"   
#> [26] "DEC.temp"    "ANNUAL.temp" "JAN.snow"    "FEB.snow"    "MAR.snow"   
#> [31] "APR.snow"    "MAY.snow"    "JUN.snow"    "JUL.snow"    "AUG.snow"   
#> [36] "SEP.snow"    "OCT.snow"    "NOV.snow"    "DEC.snow"    "TOTAL.snow"

# Estimate the influence of temperature and precipitation on colonization and extinction dynamics. We first include the call to all_environmental_fit but we do not run it because of computation limits. We then use the greedy algorithm to find a good dependency.

envfit <- greedy_environmental_fit(dataset = df, vector = 3:23, env = c("env$TOTAL.ppt", "env$ANNUAL.temp"), c = 0.13, e = 0.19, aic = 100000)
envfit
#> $Colonization
#> expression(params[1] * env$TOTAL.ppt[i] + params[3])
#> 
#> $Extinction
#> expression(params[2] * env$ANNUAL.temp[i] + params[4])
#> 
#> $Results
#> $Results$par
#> [1] -0.00497925 -0.01729602  0.19006501  0.93486956
#> 
#> $Results$value
#> [1] -1874.859
#> 
#> $Results$counts
#> function gradient 
#>      215       NA 
#> 
#> $Results$convergence
#> [1] 0
#> 
#> $Results$message
#> NULL

all_environmental_fit(dataset = df, vector = 3:23, env = c("env$TOTAL.ppt", "env$ANNUAL.temp"), c = 0.13, e = 0.19, aic = 100000) 
#> $Colonization
#> expression(params[1] * env$TOTAL.ppt[i] + params[3])
#> 
#> $Extinction
#> expression(params[2] * env$ANNUAL.temp[i] + params[4])
#> 
#> $Results
#> $Results$par
#> [1] -0.00497925 -0.01729602  0.19006501  0.93486956
#> 
#> $Results$value
#> [1] -1874.859
#> 
#> $Results$counts
#> function gradient 
#>      215       NA 
#> 
#> $Results$convergence
#> [1] 0
#> 
#> $Results$message
#> NULL

The greedy algorithm identified a model that included precipitation in the expression for colonization and temperature in the expression for extinction. The output of the function is a list that indicates that its first component is the expression for the colonization, the second one is the expression for the extinction, and the third one is the result of the optimizer for these expressions; in this case, the output of the function indicates that the colonization rate depends on precipitation (with a coefficient of -0.00497925 and an intercept of 0.19006501), the extinction rate depends on temperature (with a coefficient of -0.01729602 and an intercept of 0.93486956), and it has a log-likelihood of 1874.859 that was found after 215 calls to the corresponding likelihood function. This result was backed by function all_environmental_fit, that found the same dependency with the environmental covariates.
We will now use the function custom_environmental_fit to demonstrate its use and improve the result we found earlier. Besides arguments dataset and vector, this function requires an argument params that corresponds to the parameters estimated for expressions c_expression and e_expression, for colonization and extinction. Next, we use rates_calculator, that uses arguments params, c_expression and e_expression as in the previous function plus argument t that indicates the number of colonization and extinction rates needed. This function calculates the actual rates at each time in order to be able to simulate the dynamics of the colonization and extinction process resulting from these parameters. Please note that these rates have dimensions of time⁻¹, so when converting rates to transition probabilities we have to indicate the interval of time between samples for each rate.

custom_environmental_fit(dataset = df, vector = 3:23, params = envfit$Results$par, c_expression = envfit$Colonization, e_expression = envfit$Extinction)
#> $par
#> [1] -0.004973345 -0.017281488  0.190021313  0.934197043
#> 
#> $value
#> [1] -1874.859
#> 
#> $counts
#> function gradient 
#>       97       NA 
#> 
#> $convergence
#> [1] 0
#> 
#> $message
#> NULL

rates.env <- rates_calculator(params = envfit$Results$par, c_expression =  envfit$Colonization, e_expression = envfit$Extinction, t = 21)
head(rates.env)
#>              c          e
#> [1,] 0.1398742 0.23293924
#> [2,] 0.1448036 0.19041984
#> [3,] 0.1491356 0.08866157
#> [4,] 0.1445547 0.18681650
#> [5,] 0.1462476 0.17917743
#> [6,] 0.1460484 0.21189574

dts <- as.numeric(colnames(df)[4:23]) - as.numeric(colnames(df)[3:22])
dts
#>  [1] 1 1 1 1 1 1 1 1 1 1 3 1 1 2 1 1 1 2 1 1

tps <- cetotrans(c = rates.env[-21, 1], e = rates.env[-21, 2], dt = dts)
sims <- data_generation(x = df, column = 3, transitions = tps, iter = 100, times = 20)
sims.ic <- apply(sims, 1, quantile, probs = c(0.025, 0.975))
med.ida <- apply(sims, 1, quantile, probs = .5)
sims.ic
#>         [,1]   [,2]   [,3]    [,4]   [,5]    [,6]    [,7] [,8]    [,9]  [,10]
#> 2.5%  52.950 60.950 73.475  75.475  79.95  80.000  74.475   79  80.000  76.95
#> 97.5% 73.525 84.525 98.000 101.000 106.05 104.525 101.050  105 102.525 106.00
#>         [,11]  [,12]  [,13]   [,14]  [,15] [,16] [,17]  [,18]   [,19]  [,20]
#> 2.5%   74.475 70.000 71.475  75.425  75.95 75.00    72  71.00  74.475  73.00
#> 97.5% 100.575 94.525 97.000 101.525 104.05 99.05    98 100.05 106.525 100.05

days <- 1900 + as.numeric(colnames(df)[3:23])


plot(days, colSums(df[, 3:23]), type = "b", ylim = c(50, 115), pch = 4, ylab = "Accumulated species richness", xlab = "Year", main = "Sagebrush steppe Dubois, ID, USA")
lines(days, c(57, sims.ic[1, ]), col = "magenta", lty = 3)
lines(days, c(57, sims.ic[2, ]), col = "magenta", lty = 3)
lines(days, c(57, med.ida), col = "green", lty = 2)
Temporal evolution of species richness in a plant community. The points connected with lines indicate the observed species richness, while the red dashed line indicates the 95% of the distribution of simulations, and the green dashed line the median richness for 300 simulations.

Temporal evolution of species richness in a plant community. The points connected with lines indicate the observed species richness, while the red dashed line indicates the 95% of the distribution of simulations, and the green dashed line the median richness for 300 simulations.

Conclusion

This vignette comprehensively illustrates most of the functionality of the package island. This package allows the estimation of colonization and extinction rates for whole communities or groups of species under two basic assumptions: species independence and species equivalence. It allows the user to employ typically messy ecological monitoring data with a variety of sampling schemes. It also enables explorations of the influence of environmental variables on colonization and extinction rates, and helps identify the best model explaining data using a model selection procedure. Finally, it simulates the dynamics of estimated parameters for simple stochastic models of island biogeography. For more details, take a look at the other vignettes of the package.