`bhrcr`

This vignette contains a brief description of malaria parasite clearance rate regression and provides a quick tutorial for the ** bhrcr** package. For more details, please see our paper Sharifi-Malvajerdi

Malaria is a mosquito-borne disease caused by parasites that was estimated to cause 429,000 deaths in 2015 (World Malaria Report, 2016). Resistance to anti-malarial drugs such as Artemisinin is a major concern in the public health fight against malaria (Ashley *et al.* 2014). Artemisinin resistance is manifested by delayed parasite clearance after treatment; slower parasite clearance can therefore indicate emergence of parasite resistance, although it can also be associated with host factors such as decreased immunity, inadequate dosing or poor drug absorption. Understanding how covariates relate to parasite clearance rate is important for understanding host and parasite factors’ association with delayed parasite clearance, characterizing resistance and defining spatio-temporal trends in resistance. The parasite clearance rate is defined as the negative of the slope of the log-parasitemia over the time in which the antimalarial is having its primary effect, and we call this time period the “decay” phase. There are some difficulties that arise in calculating the parasite clearance rates. First, some patients’ profiles may contain a “lag” phase, before the decay phase, in which the parasite density remains constant, or even increases, in a period right after artemisinin administration (Doolan (2002), Koning (2007)). Second, there might be also a “tail” phase, after the decay phase, where the true parasite count remains close to the detection limit, with no decline over a few measurements, and once the detection limit is reached, observations are left censored. Lastly, there may exist measurement errors in the measured values of parasite densities (see Dowling and Shute (1966) and O’Meara *et al.* (2005) for more details). The Parasite Clearance Estimator (PCE) was developed by the WorldWide Antimalarial Resistance Network (WWARN) in response to the need from field researchers for a method to quickly and reliably estimate parasite clearance rates, while accounting for existence of lag phases, tail phases, and censored observations (Flegg *et al.* (2011)).

Although the WWARN PCE serves as a powerful tool for estimating the clearance rates in many studies, estimating the impact of individual level covariates on these clearance rates might be the primary end point in some other studies. For instance, as in Amaratunga *et al.* (2012), understanding how parasite factors and host factors influence clearance rates can provide insights into the mechanism of artemisinin resistance. One common approach in estimating the effect of individual level covariates on clearance rates is using a *two-stage procedure*, where WWARN PCE is followed by a simple linear regression. Even though using the two-stage approach is straightforward, it has some drawbacks which motivated Fogarty *et al.* (2015) to develop the Bayesian Clearance Estimator. This procedure uses a Bayesian hierarchical model to estimate both clearance rates and the impact of patient level covariates on them, while accounting for lag phases, tail phases, and censored observations. Given the advantages of the Bayesian approach over the two-stage analysis, we built the ** bhrcr** package to provide researchers in the related fields with software that performs the Bayesian hierarchical regression on clearance rates. The

`bhrcr`

`calculatePCE`

function) to calculate the WWARN PCE estimates of the parasite clearance rates as well.The ** bhrcr** package takes serial measurements of a response on an individual (e.g., parasite counts after artemisinin administration) that is decaying over time, and performs Bayesian hierarchical regression on the clearance rates. While this tutorial illustrates the method in the context of malaria, the package can be utilized to analyze any clearance data fitting the framework presented in the next section. The

`bhrcr`

`clearanceEstimatorBayes`

, which will be clarified thoroughly later on. While the `clearanceEstimatorBayes`

function returns the WWARN PCE estimates as well, we have incorporated the `calculatePCE`

function in the package, which only provides the WWARN PCE estimates of the clearance rates. The generic `summary`

, `print`

, and `plot`

functions , as well as the `diagnostics`

function, will be illustrated by examples in following sections.For a quick demonstration of the package, please run the following functions:

```
library(bhrcr)
# If you don't bother to see the step-by-step interactive
# process of PCE estimation and generating plots
# please set "ask = F".
demo(fastExample, ask = F)
# or we can run the slowExample.
# to save your time, we have already run the MCMC in the slow example for you.
# the demo will show you the saved results.
demo(slowExample, ask = F)
```

We now briefly present the Bayesian Clearance Estimator developed in Fogarty *et al.* (2015). Let \(y_{ij}\) represent the \(j\)th measurement of patient \(i\)’s parasite count at time \(t_{ij}\), where \(1 \le i \le N\) and \(1 \le j \le n_i\)^{2}. Suppose \(\delta_{i}^\ell\) is patient \(i\)’s time of changepoint between the lag and decay phases, and let \(\delta_{i}^\tau\) be patient \(i\)’s time of changepoint between decay and tail phases. The observed data are assummed to follow a continuous piecewise linear model^{3}: \[
\log (y_{ij}) = \alpha_i - \beta_i \left(\delta_i ^\ell \mathbb{1}_{ t_{ij} < \delta_i ^\ell} + t_{ij} \mathbb{1}_{ \delta_i ^\ell \le t_{ij} \le \delta_i ^\tau} + \delta_i ^\tau \mathbb{1}_{ t_{ij} > \delta_i ^\tau } \right) + \epsilon_{ij} \tag{$\dagger$}
\] where \(\beta_i\) is the clearance rate of the \(i\)th individual, and \(\epsilon_{ij} \overset{iid} \sim \mathcal{N}(0 , \sigma_\epsilon^2).\)^{4}

Within a Bayesian hierarchical structure, the patients, and correspondingly the patients’ parameters such as \(\{ \beta_i \}_{i=1}^N\) and \(\{ \alpha_i \}_{i=1}^N\), are assumed to be draws from a common distribution. This hierarchical structure allows us to borrow strength across patients, in the sense that information about all patients informs the regularization of patient-specific parameters. For details on the prior distributions used in this Bayesian framework, see Fogarty *et al.* (2015) and our paper which will appear in the Malaria Journal.

`Pursat`

DataThe data sets contained in the ** bhrcr** package consist of

Sex: A factor variable with two levels

`F`

and`M`

Age Group: 21+ (21 years of age or older), or 21- (younger than 21 years)

Veal Veng or Kranvanh: whether or not an individual was from these two districts

Hemoglobin E: the number of alleles of Hemoglobin E variant

\(\alpha\)-thalassaemia: the number of alleles of \(\alpha\)-thalassaemia variant

G6PD deficient: the number of alleles of G6PD deficient variant

Log initial parasite density

Year:

`TRUE`

if 2010,`FALSE`

if 2009Parasite group: 1 if

*group 1*, 0 if*group 2*

For more details on the data, see (Amaratunga *et al.* 2012) and (Fogarty *et al.* 2015). One can use `data("pursat")`

and `data("pursat_covariates")`

to access the data sets.

`clearanceEstimatorBayes`

FunctionThe `clearanceEstimatorBayes`

function is the principal function in the ** bhrcr** package that analyzes the input data set in the Bayesian framework presented before, and provides the posterior distributions of the parameters, along with point estimates and credible intervals.

Usage:

```
out <- clearanceEstimatorBayes(data = data, covariates=covariates,
seed=1234, detect.limit=40, outlier.detect = TRUE, conf.level=.95,
niteration = 100000, burnin = 500, thin = 50,
filename = "output.csv")
```

See the manual page of this function for more information on the arguments and outputs.

`summary`

and `print`

FunctionsThe `summary`

function produces comprehensive and compressed output information based on the results from the main function, `clearanceEstimatorBayes`

. To further illustrate this point, we use the built-in data sets of ** bhrcr** package,

`pursat`

and `pursat_covariates`

, to provide an example.```
library(bhrcr)
data(pursat)
data(pursat_covariates)
results <- clearanceEstimatorBayes(data = pursat,
covariates = pursat_covariates, seed = 1234,
detect.limit = 15, burnin=50, niteration=100, thin=10)
summary(results)
```

For reproducibility of our results, we may set the `seed`

argument to be `1234`

. The output given by `summary`

includes a table containing posterior mean and median of the regression coefficients which represent the impact of covariates on log parasite clearance rates and also on the corresponding log half-life values. The half-life value is calculated as \(\log(2) ~/ \text{ (Clearance Rate) }\). Thus, even though our method originally regressed log clearance rates rather than log half-lives on the covariates, we can attain the slopes for a regression of the log half-lives by using \(\log(\text{Half-Life}) = \log\log(2) - \log(\text{Clearance Rate})\).

If the input data set does not contain WWARN PCE estimates, the `clearanceEstimatorBayes`

function will automatically generate a folder called `PceEstimates`

under your current working directory to store calculated WWARN PCE estimates for each individual.

In what follows, we display the results in terms of log half-lives which may be more intuitive to the malaria research community. The half-life is the time it takes for the parasite density to reduce by 50%; the longer the half-life, the slower the parasite clearance.

```
Summary:
clearanceEstimatorBayes(data = pursat, covariates = pursat_covariates,
seed = 1234, detect.limit = 15, niteration = 100, burnin = 50, thin = 10)
Posterior Estimates and Intervals for the Effect of Covariates on log half-lives
Mean Median CI 2.5% CI 97.5%
(Intercept) 1.1371 1.2486 0.3096 1.7616
SexM 0.1648 0.1508 0.0755 0.3060
agegroup21+ -0.0002 0.0163 -0.0674 0.0866
vvkvTRUE -0.0227 -0.0295 -0.0985 0.0567
HbE 0.0898 0.0961 -0.0201 0.2017
athal -0.0348 -0.0608 -0.1263 0.1307
g6pd -0.0168 -0.0222 -0.0814 0.0579
lnPf0 0.0356 0.0175 -0.0140 0.1162
year2010TRUE 0.0465 0.0488 -0.0306 0.1213
group 0.1532 0.1522 0.0734 0.2418
---
Detect Limit: 15 , Log Base: 2.718
```

Based on the output of the `summary`

function, we can perform an analysis of the covariates of interest. For details, please see (Fogarty *et al.* 2015) and our paper.

`diagnostics`

FunctionThe `diagnostics`

function provides diagnostic analysis such as trace plots, ACF and PACF plots for some important parameters in the MCMC process of Gibbs sampling.^{6} These diagnostic plots help to assess whether it is plausible that the MCMC process has reached stationarity and that we have thinned sufficiently (see (Cowles and Carlin 1996); (Gelman and Shirley 2011)).

```
# We use the results given by our previous example
# All diagnostic plots are saved under "./mcmcDiagnostics"
diagnostics(results)
```

In our fast example, the burn-in period and the total length of simulation (also referred to as the length of Markov chain) are short, which may not provide enough time for convergence. For serious malaria research, our recommendation is:

detect outliers by using the methodology suggested in (Flegg

*et al.*2011). Flegg’s outlier detection method is recommended. However, users can choose to toggle it off by setting`outlier.detect = FALSE`

when they are running the main function`clearanceEstimatorBayes`

. If the outliers are determined to be likely due to transcription errors, then the outlying data points should be deleted;run the MCMC algorithm (already embedded in

`clearanceEstimatorBayes`

) with various lengths and observe the trace plots, ACF plots (explained later), which helps determine the suitable burn-in period. Make sure the final sample is collected after the Markov chain reaches stationarity, i.e. the distribution of the values after the burn-in ends should be similar to the values at the middle and end of the chain. For the current version of`bhrcr`

, parallelization is not supported so that users have to run one chain at a time;run the formal MCMC with a long run instead of just several short runs. Only a long run can give the Markov chain enough time to mix well and thus to get its equilibrium since one is not able to foresee how slow the mixing rate might be for real problems especially for those in high-dimensional space;

Optional: set a suitable step size in “thinning” to make sure the final sample is close to independent if independence or low correlation is highly desired (the ACF plot can be used to detect autocorrelation). But “thinning” will inevitably sacrifice some estimation efficiency.

The posterior results produced by our fast example may not be very reliable; we have used the fast sample just for tutorial purposes. For the results of the Bayesian clearance estimator to truly reflect the posterior uncertainty in our estimators, we need to ensure that stationarity has been achieved. Results that satisfy the requisite diagnostics are found in a longer sample (`slowExample`

), which we have saved into a dataset called and incorporated into the ** bhrcr** package. To see the results, just run the slow example in the demo:

For detailed analysis of the diagnostic results, please see our paper.

`plot`

FunctionThe `plot`

function visualizes the results returned by the `clearanceEstimatorBayes`

function. We continue our previous example as follows:

The output provides a group of figures showing all patients’ posterior log-parasitemia profiles fitted by the Bayesian method. The following figure shows the profile of patient 1. It seems to exhibit only a decay phase.

Amaratunga, C., Sreng, S., Suon, S., Phelps, E.S., Stepniewska, K., Lim, P., and al., 2012. Artemisinin-resistant *plasmodium falciparum* in pursat province, western cambodia: A parasite clearance rate study.

Ashley, E.A., Dhorda, M., Fairhurst, R.M., Amaratunga, C., Lim, P., Suon, S., and al., 2014. Spread of artemisinin resistance in plasmodium falciparum malaria.

Cowles, M.K. and Carlin, B.P., 1996. Markov chain monte carlo convergence diagnostics: A comparative review. *Journal of the American Statistical Association*, 91 (434), 883–904.

Doolan, D.L., 2002. Malaria methods and protocols.

Dowling, M. and Shute, G., 1966. A comparative study of thick and thin blood films in the diagnosis of scanty malaria parasitaemia.

Flegg, J., Guerin, P., White, N., and Stepniewska, K., 2011. Standardizing the measurement of parasite clearance in *falciparum* malaria: The parasite clearance estimator.

Fogarty, C.B., Fay, M.P., Flegg, J.A., Stepniewska, K., Fairhurst, R.M., and Small, D.S., 2015. Bayesian heirarchical regression on clearance rates in the presence of "lag" and "tail" phases with an application to malaria parasites.

Gelman, A. and Shirley, K., 2011. Inference from simulations and monitoring convergence. *Handbook of markov chain monte carlo*, 163–174.

Koning, L.O., 2007. Progress in malaria research.

O’Meara, W.P., McKenzie, F.E., Magill, A.J., Forney, J.R., Permpanich, B., Lucas, C., Gasser Jr, R.A., and Wongsrichanalai, C., 2005. Sources of variability in determining malaria parasite density by microscopy.

Sharifi-Malvajerdi, S., Zhu, F., Fogarty, C.B., Fay, M.P., Fairhurst, R.M., Flegg, J.A., Stepniewska, K., and Small, D.S., 2019. Malaria parasite clearance rate regression: An r software package for a bayesian hierarchical regression model. *Malaria Journal*, 18 (1), 4.

contributed equally to this work↩

Note that this method allows uneven measurement times.↩

\(\mathbb{1}_A\) is the indicator function of \(A\) which takes the value one if \(A\) occurs, and zero otherwise.↩

\(\mathcal{N}(\mu, \sigma^2)\) represents the normal distribution with mean \(\mu\) and variance \(\sigma^2\).↩

It may take a while to run the code, depending on your computer’s hardware. Here we only use a small number of iterations for tutorial purpose.↩

For those who may not be very familiar with Gibbs sampling: In statistics, Gibbs sampling or a Gibbs sampler is a conditional sampling technique for obtaining a sequence of observations which are approximately from a specified multivariate probability distribution, when direct sampling is difficult. This method is frequently used in Bayesian statistics. Usually we need to set a “burn-in” period for our MCMC algorithm and to discard the first \(m\) samples. The idea is that a “bad” starting point may over-sample regions that have very low probability under the equilibrium distribution before it converges to the equilibrium distribution. So we need to give the Markov chain time to reach its equilibrium. Furthermore, Gibbs sampling generates a Markov chain of samples, each of which is correlated with nearby samples. Thus if uncorrelated samples are required for the model, we may thin the resulting chain (after the burn-in period) by only taking every \(n\)-th value, which is called “thinning”.↩

We still use the fast example with

`seed = 1234`

for reproducibility.↩