`library(anovir)`

This vignette explains how to use the default form of the negative log-likelihood (*nll*) functions in this package.

In their default form, the various parameters underlying each *nll_function* are estimated as constants. However, this behaviour can be extended to make these parameters into functions themselves. In such cases, it is the parameters of these parameter functions that are estimated by constants.

How to modify *nll_functions* is described in a separate vignette: nll_functions_modifying

The *nll_functions* in this package calcuate the negative log-likelihood (*nll*) for observed patterns of mortality in survival data based on the approach analysing relative survival. The different functions vary in how they assume the data are structured. However the way to use each function is the same and involves a two-step process;

- Step #1 collects the information needed to form a likelihood expression,
- Step #2 calls a maximum likelihood estimation function to mimimise the
*nll*of the expression by varying the expressions' variables

The resulting estimates of these variables can be used to describe the patterns of background mortality and mortality due to infection in the observed data, including the pathogen's virulence.

The following sections provide further details of the two-step process.

The first step is to write a preparatory-, or prep-, function which collects the information needed to define and parameterise the likelihood expression to be minimised.

The formals, or arguments, of the 'prep-function' lists the names of the variables to be estimated

The body of the 'prep-function' contains the name of the

*nll_function*to be used.- The formals, or arguments, of the
*nll_function*repeats the list of variables to be estimated, plus details identifying where the data are, how they are labelled, and the choice of probability distribution(s) to describe them.

- The formals, or arguments, of the

The example below shows Step #1 preparing the function *nll_basic* for analysis, given the data frame `data01`

.

`data01`

is a subset of data from the study by Blanford et al [1]. The data are from the uninfected and infected treatments `cont`

and `Bb06`

, respectively, of the 3rd experimental block of the experiment.

```
head(data01, 6)
#> block treatment replicate_cage day censor d inf t fq
#> 450 3 cont 1 1 0 1 0 1 0
#> 451 3 cont 1 2 0 1 0 2 0
#> 452 3 cont 1 3 0 1 0 3 0
#> 453 3 cont 1 4 0 1 0 4 1
#> 454 3 cont 1 5 0 1 0 5 0
#> 455 3 cont 1 6 0 1 0 6 1
```

```
m01_prep_function <- function(a1, b1, a2, b2){
nll_basic(
a1, b1, a2, b2,
data = data01,
time = day,
censor = censor,
infected_treatment = inf,
d1 = 'Weibull', d2 = 'Weibull')
}
```

Here,

`m01_prep_function`

- name given to the 'prep' function being prepared

`function(a1, b1, a2, b2)`

- the formals, or arguments of the function containing the names of the variables to be estimated
- here they correspond with the location and scale parameters for background mortality and mortality due to infection,
*a1, b1, a2, b2*, respectively

`data = data01`

- data = the name of the data frame containing the data to be analysed;

`time = day`

- time = the name of the column in the data frame identifying the timing of events;
- the column
*t*could also have been used in this example - values in this column must be > 0

`censor = censor`

- censor = the name of the column in the data frame whether data were censored or not;
- values in this column must be,
- '0' data not censored
- '1' data right-censored

`infected_treatment = inf`

- infected_treatment = the name of the column in the data frame identifying whether data are from an infected or uninfected treatment;
- values in the this column must be,
- '0' uninfected or control treatment
- '1' infected treatment

- NB the function
*nll_basic*assumes all individuals in an infected treatment are infected; this is not necessarily the case for other functions; see nll_functions

`d1 = 'Weibull', d2 = 'Weibull'`

- d1 = the name of the probability distribution chosen to describe background mortality
- d2 = the name of the probability distribution chosen to describe mortality due to infection
- choice of distributions are, Gumbel, Weibull, Fréchet
- NB the exponential distribution is a special case of the Weibull distribution. It can be specified by setting a scale parameter to equal one, e.g.,
*b1 = 1*see the exponential distribution

- NB the exponential distribution is a special case of the Weibull distribution. It can be specified by setting a scale parameter to equal one, e.g.,

**NB***nll*functions automatically detect the column`fq`

and take into account the frequency of individuals involved- if there is no column
`fq`

it is assumed each line of data corresponds with an individual host.

- if there is no column

The information above is used to define a log-likelihood function of the form;

\[\begin{equation} \log L = \sum_{i=1}^{n} \left\{ d \log \left[ h_1(t_i) + g \cdot h_2(t_i) \right] + \log \left[ S_1(t_i) \right] + g \cdot \log \left[ S_2(t_i) \right] \right\} \end{equation}\]where;

*d*is a death indicator taking a value of '1' when data are for death and '0' for censored data;- it is the complement of the data defined as 'censor'

*g*is an indicator of infection treatment taking a value of '1' for an infected treatment and '0' for an uninfected treatment*h*_{1}(*t*) is the hazard function for background mortality for individual*i*at time*t*

*h*_{2}(*t*) is the hazard function for mortality due to infection for individual*i*at time*t*

*S*_{1}(*t*) is the cumulative survival function for background mortality for individual*i*at time*t*

*S*_{2}(*t*) is the cumulative survival function for mortality due to infection for individual*i*at time*t*

The Weibull distribution was chosen to describe the background rate of mortality and mortality due to infection.

The survival function describing background mortality was,

\[\begin{equation} S_1(t) = \exp \left[ - \exp \left( z_1 \right) \right] \end{equation}\]with the hazard function being,

\[\begin{equation} h_1(t) = \frac{1}{b_1 t} \exp \left( z_1 \right) \end{equation}\]where, \(z_1 = \left( \log t - a_1\right) / b_1 \)

Equivalent expressions described mortality due to infection, where the index '1' is replaced by '2'.

*a*_{1}, *b*_{1}, *a*_{2}, *b*_{2} are the variables to be estimated, as listed in the formal arguments of `m01_prep_function`

.

Step #2 involves calling the *mle2* function of the package *bbmle* [2] which will estimate the values of variables maximising the likelihood function.

In the example below, *mle2* is given the name of the prep-function, `m01_prep_function`

along with a list giving starting values for the variables to be estimated.

```
m01 <- mle2(m01_prep_function,
start = list(a1 = 2, b1 = 0.5, a2 = 2, b2 = 0.5)
)
```

Yielding the results,

```
#> Maximum likelihood estimation
#>
#> Call:
#> mle2(minuslogl = m01_prep_function, start = list(a1 = 2, b1 = 0.5,
#> a2 = 2, b2 = 0.5))
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(z)
#> a1 2.845099 0.069055 41.2005 < 2.2e-16 ***
#> b1 0.482788 0.043041 11.2168 < 2.2e-16 ***
#> a2 2.580764 0.028630 90.1411 < 2.2e-16 ***
#> b2 0.183133 0.031634 5.7891 7.078e-09 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> -2 log L: 1293.144
```

The location and scale parameters describing mortality due to infection, *a2, b2*, lead to a numerical expression for pathogen's virulence at time *t* as,

where the rate of mortality due to infection accelerates over time.

The default of *nll_basic* returns estimates for, *a1, b1, a2, b2*. The function *conf_ints_virulence* can be used to generate a matrix with the estimate of virulence (± 95% c.i.) based on the variance and covariance of estimates for *a2* and *b2*.

```
coef(m01)
#> a1 b1 a2 b2
#> 2.8450992 0.4827880 2.5807642 0.1831328
vcov(m01)
#> a1 b1 a2 b2
#> a1 0.004768599 0.0015413625 -0.0009298910 0.0007351280
#> b1 0.001541362 0.0018525636 -0.0001777557 -0.0001259104
#> a2 -0.000929891 -0.0001777557 0.0008196927 -0.0003119921
#> b2 0.000735128 -0.0001259104 -0.0003119921 0.0010007282
a2 <- coef(m01)[[3]]
b2 <- coef(m01)[[4]]
var_a2 <- vcov(m01)[3,3]
var_b2 <- vcov(m01)[4,4]
cov_a2b2 <- vcov(m01)[3,4]
ci_matrix01 <- conf_ints_virulence(
a2 = a2, b2 = b2,
var_a2 = var_a2, var_b2 = var_b2, cov_a2b2 = cov_a2b2,
d2 = "Weibull", tmax = 14)
tail(ci_matrix01)
#> t h2 dh2_da2 dh2_db2 sd_h2 lower_ci upper_ci
#> [9,] 9 0.07472008 -0.4080105 0.446496391 0.02120460 0.04284230 0.1303172
#> [10,] 10 0.11954723 -0.6527900 0.338799748 0.02453919 0.07994882 0.1787586
#> [11,] 11 0.18288260 -0.9986341 -0.001438483 0.02857553 0.13463847 0.2484138
#> [12,] 12 0.26960567 -1.4721871 -0.701596941 0.04030686 0.20112658 0.3614003
#> [13,] 13 0.38528853 -2.1038756 -1.922190342 0.06929851 0.27082267 0.5481345
#> [14,] 14 0.53622430 -2.9280634 -3.860097085 0.12200915 0.34329369 0.8375817
plot(ci_matrix01[, 't'], ci_matrix01[, 'h2'],
type = 'l', col = 'red', xlab = 'time', ylab = 'virulence (± 95% ci)'
)
lines(ci_matrix01[, 'lower_ci'], col = 'grey')
lines(ci_matrix01[, 'upper_ci'], col = 'grey')
```

1. Blanford S, Jenkins NE, Read AF, Thomas MB. 2012 Evaluating the lethal and pre-lethal effects of a range of fungi against adult anopheles stephensi mosquitoes. *Malaria Journal* **11**, 10. (doi:10.1186/1475-2875-11-365)

2. Bolker, B. and R Development Core Team. 2017 *Bbmle: Tools for general maximum likelihood estimation*. See https://CRAN.R-project.org/package=bbmle.