Harrison River Female Chinook 2011 Data

Introduction

The experiment consisted of daily sampling, but condensed to weeks. Returning adult salmon were tagged and released at the upstream trap. They were recaptured during carass surveys

A stratified-Petersen estimator (Darrock, 1961; Plante et al. 1996) is found for the unpooled data and some pooling of the rows/columns.

library(SPAS)

This will load the SPAS fitting functions and any associated packages needed by SPAS.

Import the data.

The data should be stored as an $$s+1$$ by $$t+1$$ (before pooling) matrix where the upper $$s \times t$$ matrix consists of the The $$s \times t$$ upper left matrix is the number of animals released in row stratum $$i$$ and recovered in column stratum $$j$$. Row $$s+1$$ contains the total number of UNMARKED animals recovered in column stratum $$j$$. Column $$t+1$$ contains the number of animals marked in each row stratum but not recovered in any column stratum. The $$rawdata[s+1, t+1]$$ is not used and can be set to 0 or NA. The sum of the entries in each of the first $$s$$ rows is then the number of animals marked in each row stratum. The sum of the entries in each of the first $$t$$ columns is then the number of animals captured (marked and unmarked) in each column stratum. The row/column names of the matrix may be set to identify the entries in the output.

Here is the raw data for the Conne River. Notice that very small number of releases and recoveries in week 6.

harrison.2011.chinook.F.csv <- textConnection("
4   ,      2   ,      1   ,     1   ,     0   ,     0   ,   130
12   ,      7   ,     14   ,     1   ,     3   ,     0   ,   330
7   ,     11   ,     41   ,     9   ,     1   ,     1   ,   790
1   ,     13   ,     40   ,    12   ,     9   ,     1   ,   667
0   ,      1   ,      8   ,     8   ,     3   ,     0   ,   309
0   ,      0   ,      0   ,     0   ,     0   ,     1   ,    65
744   ,   1187   ,   2136   ,   951   ,   608   ,   127   ,     0")

har.data
#>       V1   V2   V3  V4  V5  V6  V7
#> [1,]   4    2    1   1   0   0 130
#> [2,]  12    7   14   1   3   0 330
#> [3,]   7   11   41   9   1   1 790
#> [4,]   1   13   40  12   9   1 667
#> [5,]   0    1    8   8   3   0 309
#> [6,]   0    0    0   0   0   1  65
#> [7,] 744 1187 2136 951 608 127   0

A total of 138 fish were tagged and released in week 1. Of these fish, 4 were recovered downnstream in column stratum 1; 2 were recovered in column stratum 2; and 130 were never seen again. A total of 744 UNMARKED fish were recovered in column stratum 1.

You can add row and column names to the matrix which will used when the data are printed.

Fitting the Stratified-Petersen model

The Stratified-Petersen model is fit to the above data object. The very sparse last row will make fitting the model to the original data difficult with a very flat likelihood surface and so we relax the convergence criteria:

mod1 <- SPAS.fit.model(har.data,
model.id="No restrictions",
row.pool.in=1:6, col.pool.in=1:6,
optMethod.control=list(ftol=.00001))
#> Warning in sqrt(diag(hess$vcv)): NaNs produced #> Warning in sqrt(diag(RESULT$est.star$vcv)): NaNs produced #> Warning in sqrt(diag(RESULT$real$vcv)): NaNs produced #> Warning in sqrt(RESULT$real$est.indiv$N.stratum.vcv): NaNs produced
#> Warning in sqrt(diag(RESULT$real$est.indiv$exp.factor.vcv)): NaNs produced The row.pool.in and col.pool.in inform the function on which rows or columns to pool before the analysis proceeds. Both parameters use a vector of codes (length $$s$$ for row pooing and length $$t$$ for column pooling) where rows/columns are pooled that share the same value. For example. row.pool.in=c(1,2,3,4,5,6) would imply that no rows are pooled, while row.pool.in=c('a','a','a','b','b','b') would imply that the first three rows and last three rows are pooled. The entries in the vector can be numeric or character; however, using character entries implies that the final pooled matrix is displayed in the order of the character entries. I find that using entries such as '123' to represent pooling rows 1, 2, and 3 to be easiest to use. The SPAS system only fits models were the number of rows after pooling is less than or equal to the number of columns after pooling. Results from model 1 (no pooling). The likelihood is rather flat, so don’t be too concerned about messages from the optimization on failing to converge. As long as the likelihood values seem to be have stabilized and estimates are sensible (i.e. capture rates not 0 or 1; estimates of movement sensible results), the estimates should be fine. The results of the model fit is a LARGE list. But the function SPAS.print.model produces a nice report SPAS.print.model(mod1) #> Model Name: No restrictions #> Date of Fit: 2019-02-06 22:25 #> Version of OPEN SPAS used : SPAS-R 2019-01-01 #> #> Raw data #> V1 V2 V3 V4 V5 V6 V7 #> [1,] 4 2 1 1 0 0 130 #> [2,] 12 7 14 1 3 0 330 #> [3,] 7 11 41 9 1 1 790 #> [4,] 1 13 40 12 9 1 667 #> [5,] 0 1 8 8 3 0 309 #> [6,] 0 0 0 0 0 1 65 #> [7,] 744 1187 2136 951 608 127 0 #> #> Row pooling setup : 1 2 3 4 5 6 #> Col pooling setup : 1 2 3 4 5 6 #> Theta pooling : FALSE #> CJS pooling : FALSE #> #> #> Raw data AFTER POOLING #> pool1 pool2 pool3 pool4 pool5 pool6 V7 #> pool1 4 2 1 1 0 0 130 #> pool2 12 7 14 1 3 0 330 #> pool3 7 11 41 9 1 1 790 #> pool4 1 13 40 12 9 1 667 #> pool5 0 1 8 8 3 0 309 #> pool6 0 0 0 0 0 1 65 #> 744 1187 2136 951 608 127 0 #> #> Conditional Log-Likelihood: 47240.58 ; np: 48 ; AICc: 94577.73 #> #> Code/Message from optimization is: 0 Successful convergence #> #> Estimates #> pool1 pool2 pool3 pool4 pool5 pool6 psi cap.prob exp factor #> pool1 4.0 2.4 0.9 1.1 0.1 0.1 130 0.028 34.9 #> pool2 12.0 9.2 11.4 1.1 3.0 0.1 330 0.021 46.2 #> pool3 7.0 11.1 40.7 9.0 1.0 1.0 790 0.430 1.3 #> pool4 1.0 16.1 33.7 13.2 9.1 2.8 667 0.026 37.8 #> pool5 0.1 1.3 6.7 8.9 3.0 0.1 309 0.025 39.0 #> pool6 0.1 0.1 0.1 0.1 0.1 1.1 65 0.199 4.0 #> est unmarked 744.0 1181.1 2147.0 948.8 607.8 124.9 0 NA NA #> Pop Est #> pool1 4950 #> pool2 17308 #> pool3 2002 #> pool4 28810 #> pool5 13168 #> pool6 332 #> est unmarked 66587 #> #> SE of above estimates #> pool1 pool2 pool3 pool4 pool5 pool6 psi cap.prob exp factor #> pool1 2.0 1.6 0.8 1.1 0.2 0.4 11.4 NaN NaN #> pool2 3.6 2.9 2.7 1.1 1.5 0.7 18.2 NaN NaN #> pool3 2.6 3.3 6.4 3.0 0.5 1.0 28.1 0.609 3.3 #> pool4 1.0 3.7 5.8 4.1 2.2 1.1 25.8 0.009 13.4 #> pool5 0.2 1.3 2.4 2.8 1.6 0.6 17.6 0.021 33.6 #> pool6 0.2 0.2 0.2 0.2 0.2 1.0 8.1 NaN NaN #> est unmarked NA NA NA NA NA NA 0.0 NA NA #> Pop Est #> pool1 NaN #> pool2 NaN #> pool3 2839 #> pool4 9960 #> pool5 11065 #> pool6 NaN #> est unmarked 6943 The original data, the data after pooling, estimates and their standard errors are shown. Here the stratified-Petersen estimate of the total number of smolts passing the at the first sampling station is 66,587 with a standard error of 6,943. Each entry in the output is available in fitted model object. You will need to explore the structure names(mod1) #> [1] "version" "date" "input" "fit.setup" #> [5] "conditional" "model.info" "est.red.star" "est.star" #> [9] "real" names(mod1$real)
#> [1] "est.all"   "vcv.all"   "se.all"    "est.indiv"
names(mod1$real$est.indiv)
#>  [1] "theta"          "psi"            "cap"            "N"
#>  [5] "N.stratum"      "theta.vcv"      "theta.se"       "cap.vcv"
#>  [9] "cap.se"         "psi.vcv"        "psi.se"         "N.vcv"
#> [13] "N.se"           "N.stratum.vcv"  "N.stratum.se"   "exp.factor"
#> [17] "exp.factor.vcv" "exp.factor.se"

The N entries refer to the population size; the N.stratum entries refer to the individual stratum population sizes; the cap entries refer to the estimated probability of capture in each row stratum; the exp.factor entries refer to (1-cap)/cap, or the expansion factor for each row; the psi entries refer to the number of animals tagged but never seen again (the right most column in the input data); and the theta entires refer to the expected number of animals that were tagged in row stratum $$i$$ and recovered in column stratum $$j$$ (after pooling).

Pooling some rows and columns

As noted by Darroch (1961), the stratified-Petersen will fail if the matrix of movements is close to singular. This often happens if two rows are proportional to each other. In this case, there is no unique MLE for the probability of capture in the two rows, and they should be pooled. A detailed discussion of pooling is found in Schwarz and Taylor (1998).

Let us now poolw the first two rows and the last two rows, but leave rows 3 and 4 alone. Similar, let us pool the last two columns.

The code is

mod2 <- SPAS.fit.model(har.data, model.id="Pooling some rows",
row.pool.in=c("12","12","3","4","56","56"),
col.pool.in=c(1,2,3,4,56,56))

Notice how we specify the pooling for rows and columns and the choice of entries for the two corresponding vectors. I used character entries for the row pooling to ensure that the pooled matrix is sorted properly (see below); the numeric entries for the columns is ok as is.

The results of the model fit is

SPAS.print.model(mod2)
#> Model Name: Pooling some rows
#>    Date of Fit: 2019-02-06 22:25
#>    Version of OPEN SPAS used : SPAS-R 2019-01-01
#>
#> Raw data
#>       V1   V2   V3  V4  V5  V6  V7
#> [1,]   4    2    1   1   0   0 130
#> [2,]  12    7   14   1   3   0 330
#> [3,]   7   11   41   9   1   1 790
#> [4,]   1   13   40  12   9   1 667
#> [5,]   0    1    8   8   3   0 309
#> [6,]   0    0    0   0   0   1  65
#> [7,] 744 1187 2136 951 608 127   0
#>
#> Row pooling setup : 12 12 3 4 56 56
#> Col pooling setup : 1 2 3 4 56 56
#> Theta pooling     : FALSE
#> CJS pooling       : FALSE
#>
#>
#> Raw data AFTER POOLING
#>        pool1 pool2 pool3 pool4 pool56  V7
#> pool12    16     9    15     2      3 460
#> pool3      7    11    41     9      2 790
#> pool4      1    13    40    12     10 667
#> pool56     0     1     8     8      4 374
#>          744  1187  2136   951    735   0
#>
#>   Conditional   Log-Likelihood: 48053.07    ;  np: 28 ;  AICc: 96162.34
#>
#>   Code/Message from optimization is:  0 Successful convergence
#>
#> Estimates
#>              pool1  pool2  pool3 pool4 pool56 psi cap.prob exp factor
#> pool12        14.0   12.6   13.1   2.1    3.3 460    0.019       51.3
#> pool3          7.0   11.0   41.0   9.0    2.0 790    0.962        0.0
#> pool4          0.9   15.5   36.9  12.2   10.5 667    0.034       28.7
#> pool56         0.0    1.5    6.8   8.3    4.4 374    0.016       59.9
#> est unmarked 746.1 1180.4 2142.2 950.5  733.8   0       NA         NA
#>              Pop Est
#> pool12         26400
#> pool3            894
#> pool4          22101
#> pool56         24059
#> est unmarked   73453
#>
#> SE of above estimates
#>              pool1 pool2 pool3 pool4 pool56  psi cap.prob exp factor
#> pool12         4.0   3.2   3.3   1.4    1.7 21.4    0.006       15.7
#> pool3          2.6   3.3   6.4   3.0    1.4 28.1    0.698        0.8
#> pool4          0.9   4.3   6.8   3.5    3.0 25.8    0.019       17.0
#> pool56         0.0   1.6   2.3   2.7    1.7 19.3    0.011       41.4
#> est unmarked    NA    NA    NA    NA     NA  0.0       NA         NA
#>              Pop Est
#> pool12          7939
#> pool3            649
#> pool4          12627
#> pool56         16355
#> est unmarked   10412

Here the stratified-Petersen estimate of the total number of smolts passing the at the first sampling station is 73,453 with a standard error of 10,412. which is a slight reduction from the unpooled estimates.

Pooling to a single row and complete pooling

You can pool to a single row (and multiple columns) or a single row and single column (which is equivalent to the pooled Petersen estimator). The code and output follow:

mod3 <- SPAS.fit.model(har.data, model.id="A single row",
row.pool.in=rep(1, nrow(har.data)-1),
col.pool.in=c(1,2,3,4,56,56))
SPAS.print.model(mod3)
#> Model Name: A single row
#>    Date of Fit: 2019-02-06 22:25
#>    Version of OPEN SPAS used : SPAS-R 2019-01-01
#>
#> Raw data
#>       V1   V2   V3  V4  V5  V6  V7
#> [1,]   4    2    1   1   0   0 130
#> [2,]  12    7   14   1   3   0 330
#> [3,]   7   11   41   9   1   1 790
#> [4,]   1   13   40  12   9   1 667
#> [5,]   0    1    8   8   3   0 309
#> [6,]   0    0    0   0   0   1  65
#> [7,] 744 1187 2136 951 608 127   0
#>
#> Row pooling setup : 1 1 1 1 1 1
#> Col pooling setup : 1 2 3 4 56 56
#> Theta pooling     : FALSE
#> CJS pooling       : FALSE
#>
#>
#> Raw data AFTER POOLING
#>   pool1 pool2 pool3 pool4 pool56   V7
#> 1    24    34   104    31     19 2291
#>     744  1187  2136   951    735    0
#>
#>   Conditional   Log-Likelihood: 51374.83    ;  np: 7 ;  AICc: 102763.7
#>
#>   Code/Message from optimization is:  0 Successful convergence
#>
#> Estimates
#>              pool1  pool2  pool3 pool4 pool56  psi cap.prob exp factor
#> 1             27.3   43.4   79.6  34.9   26.8 2291    0.036       27.1
#> est unmarked 740.7 1177.6 2160.4 947.1  727.2    0       NA         NA
#>              Pop Est
#> 1              70427
#> est unmarked   70427
#>
#> SE of above estimates
#>              pool1 pool2 pool3 pool4 pool56  psi cap.prob exp factor
#> 1              2.1   3.2   5.6   2.6    2.1 47.9    0.002        1.9
#> est unmarked    NA    NA    NA    NA     NA  0.0       NA         NA
#>              Pop Est
#> 1               4545
#> est unmarked    4545
mod4 <- SPAS.fit.model(har.data, model.id="Pooled Peteren",
row.pool.in=rep(1, nrow(har.data)-1),
col.pool.in=rep(1, ncol(har.data)-1))
SPAS.print.model(mod4)
#> Model Name: Pooled Peteren
#>    Date of Fit: 2019-02-06 22:25
#>    Version of OPEN SPAS used : SPAS-R 2019-01-01
#>
#> Raw data
#>       V1   V2   V3  V4  V5  V6  V7
#> [1,]   4    2    1   1   0   0 130
#> [2,]  12    7   14   1   3   0 330
#> [3,]   7   11   41   9   1   1 790
#> [4,]   1   13   40  12   9   1 667
#> [5,]   0    1    8   8   3   0 309
#> [6,]   0    0    0   0   0   1  65
#> [7,] 744 1187 2136 951 608 127   0
#>
#> Row pooling setup : 1 1 1 1 1 1
#> Col pooling setup : 1 1 1 1 1 1
#> Theta pooling     : FALSE
#> CJS pooling       : FALSE
#>
#>
#> Raw data AFTER POOLING
#>      1   V7
#> 1  212 2291
#>   5753    0
#>
#>   Conditional   Log-Likelihood: 60410.94    ;  np: 3 ;  AICc: 120827.9
#>
#>   Code/Message from optimization is:  0 Successful convergence
#>
#> Estimates
#>                 1  psi cap.prob exp factor Pop Est
#> 1             212 2291    0.036       27.1   70426
#> est unmarked 5753    0       NA         NA   70426
#>
#> SE of above estimates
#>                 1  psi cap.prob exp factor Pop Est
#> 1            14.6 47.9    0.002        1.9    4545
#> est unmarked   NA  0.0       NA         NA    4545

Both models have an estimated abundance of 70,427 with a standard error of 4,545. which is a slight reduction from the unpooled estimates.

Comparing different pooling

Unfortunately, because pooling actually changes the data (internally), it is not possible to do likelihood ratio testing or to use AICc to compare different poolings. Methods to compare different poolings is under active investigation.

References

Darroch, J. N. (1961). The two-sample capture-recapture census when tagging and sampling are stratified. Biometrika, 48, 241–260. https://www.jstor.org/stable/2332748

Plante, N., L.-P Rivest, and G. Tremblay. (1988). Stratified Capture-Recapture Estimation of the Size of a Closed Population. Biometrics 54, 47-60. https://www.jstor.org/stable/2533994

Schwarz, C. J., & Taylor, C. G. (1998). The use of the stratified-Petersen estimator in fisheries management with an illustration of estimating the number of pink salmon (Oncorhynchus gorbuscha) that return to spawn in the Fraser River. Canadian Journal of Fisheries and Aquatic Sciences, 55, 281–296. https://doi.org/10.1139/f97-238