Example: Binary data imputation

In this example, we use the binary version of the schizophrenia data from the NIMH study, schizob, which is included in the package. As described in Wang and Liu (2022), the original response variable was on a numerical scale, and the binary response was created using a cut-point of 3.5. The data are in wide format, with one column per visit. Note that all binary variables to be imputed must be converted into factor variables.

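library(remiod)      # provides the schizob data and remiod()
library(kableExtra)  # kbl(), kable_classic_2(), column_spec(); also re-exports %>%

data(schizob)
# Show the first six rows as a formatted table
head(schizob) %>% kbl(align = "c") %>% 
  kable_classic_2(full_width = FALSE, html_font = "Cambria") %>%
  column_spec(1, width = "2cm") %>%
  add_header_above(c(" " = 1, "Responses at the baseline, week 1, week 3, and week 6" = 4))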
        Responses at the baseline, week 1, week 3, and week 6
tx  |   y0    y1    y3    y6
 1  |    1     0     0     1
 1  |    1     0     0     0
 1  |    1     0     0    NA
 1  |    0     0     0     0
 0  |    1     1     1     1
 1  |    1     1     1     1
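As noted above, binary variables to be imputed must be factors. The sketch below shows one way to do this in base R for data stored as 0/1 numbers. It uses a small hypothetical data frame, dat, holding only the six rows displayed above, so that it runs on its own; it does not assume anything about how schizob itself is stored.

```r
# Hypothetical toy data: the six rows of schizob shown above
# (illustration only; use data(schizob) for the real data)
dat <- data.frame(
  tx = c(1, 1, 1, 1, 0, 1),
  y0 = c(1, 1, 1, 0, 1, 1),
  y1 = c(0, 0, 0, 0, 1, 1),
  y3 = c(0, 0, 0, 0, 1, 1),
  y6 = c(1, 0, NA, 0, 1, 1)
)

# Convert the binary responses to factors with levels "0" and "1";
# missing values stay NA and remain available for imputation
bin_vars <- c("y0", "y1", "y3", "y6")
dat[bin_vars] <- lapply(dat[bin_vars], factor, levels = c(0, 1))

str(dat)
```

Fixing levels = c(0, 1) keeps both levels even if one of them happens not to occur in a subset of the data.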

The missing-data pattern is displayed in the following plot:


Figure 1. Missing pattern of Schizophrenia data
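A missing-data pattern plot summarizes which combinations of variables are observed. As a rough sketch of the same information in tabular form, the base-R code below counts missing values per variable and tabulates row patterns (1 = observed, 0 = missing). It again uses a hypothetical toy data frame with the six rows shown earlier, not the full schizob data; the mice package's md.pattern() produces a similar table for real data.

```r
# Hypothetical toy data: the six rows of schizob shown above
dat <- data.frame(
  tx = c(1, 1, 1, 1, 0, 1),
  y0 = c(1, 1, 1, 0, 1, 1),
  y1 = c(0, 0, 0, 0, 1, 1),
  y3 = c(0, 0, 0, 0, 1, 1),
  y6 = c(1, 0, NA, 0, 1, 1)
)

# Number of missing values in each column
miss_per_var <- colSums(is.na(dat))
miss_per_var

# Encode each row as a string such as "11110" (1 = observed, 0 = missing)
# and count how often each pattern occurs
patterns <- apply(!is.na(dat), 1, function(r) paste(as.integer(r), collapse = ""))
table(patterns)
```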

To impute missing values with logit models, set the family argument to family = binomial(link = "logit"):

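# Logit link via family; n.iter = 0 draws no posterior samples,
# so this call is only used to inspect the model specification
test = remiod(formula = y6 ~ tx + y0 + y1 + y3, data = schizob, family = binomial(link = "logit"),
              trtvar = 'tx', algorithm = "jags", method = "MAR", 
              n.iter = 0, warn = FALSE, mess = FALSE) 
# Model type used for each variable
print(test$mc.mar$models)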

                  y6                   y3                   y1                   y0 
"glm_binomial_logit" "glm_binomial_logit" "glm_binomial_logit" "glm_binomial_logit"
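Switching between logit and probit models changes only the link function of the binomial model, that is, how the linear predictor is mapped to a probability: the logistic CDF for logit and the standard normal CDF for probit. The short base-R check below, which is independent of remiod, makes this concrete using the binomial() family objects.

```r
# A few example values of the linear predictor
eta <- c(-2, -0.5, 0, 0.5, 2)

# Inverse links: linear predictor -> probability
p_logit  <- binomial(link = "logit")$linkinv(eta)   # logistic CDF, same as plogis()
p_probit <- binomial(link = "probit")$linkinv(eta)  # standard normal CDF, same as pnorm()

round(rbind(eta, p_logit, p_probit), 3)
```

Both links give a probability of 0.5 at eta = 0; the probit curve is steeper, so probit coefficients are on a smaller scale than logit coefficients for the same data.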

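However, if probit models are preferred, the models argument must be set together with the family argument. In the call below, family determines the probit model for the outcome y6, while models specifies the probit imputation models for y0, y1, and y3: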

test.probit = remiod(formula = y6 ~ tx + y0 + y1 + y3, data=schizob, family = binomial(link = "probit"),
                     models = c(y0="glm_binomial_probit",y1="glm_binomial_probit",y3="glm_binomial_probit"),
                     trtvar = 'tx', algorithm = "jags", method = "MAR", 
                     n.iter = 0, warn = FALSE, mess = FALSE) 
print(test.probit$mc.mar$models)

                   y6                    y3                    y1                    y0 
"glm_binomial_probit" "glm_binomial_probit" "glm_binomial_probit" "glm_binomial_probit"
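With more visits, typing out the models vector by hand becomes tedious. The following sketch uses plain base R (it is not a remiod helper) to build the same named character vector that is passed to models above; the result could then be supplied as models = models.

```r
# Variables that need probit imputation models (all except the outcome y6,
# whose model is set through the family argument)
vars <- c("y0", "y1", "y3")

# Named vector: one model type per variable
models <- setNames(rep("glm_binomial_probit", length(vars)), vars)
models
```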

Let’s run the probit model with 4 chains, each using 10,000 adaptation iterations followed by 2,000 iterations. The chains run in parallel, which is set up through the doFuture package:

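library(doFuture)  # registerDoFuture()
library(future)    # plan(), multisession, sequential

registerDoFuture()
plan(multisession, workers = 4)  # one background R session per chain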

bp.test = remiod(formula=y6 ~ tx + y0 + y1 + y3, data=schizob, family = binomial(link="probit"),
                 models = c(y0="glm_binomial_probit",y1="glm_binomial_probit",y3="glm_binomial_probit"),
                 n.iter = 2000,  n.chains = 4, n.adapt = 10000, thin=1, mess=TRUE, warn=FALSE,
                 algorithm = "jags", trtvar = 'tx', method="MAR")

plan(sequential)
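As a quick check on the size of the resulting posterior sample, the arithmetic below uses the settings from the call above. It assumes, as is the case for adaptation in JAGS, that the n.adapt iterations only tune the samplers and are not stored, so each chain keeps n.iter / thin draws.

```r
# Settings used in the remiod() call above
n.chains <- 4
n.adapt  <- 10000
n.iter   <- 2000
thin     <- 1

# Assumption: adaptation draws are discarded; every thin-th draw is kept
kept_per_chain <- n.iter %/% thin
total_kept     <- n.chains * kept_per_chain
run_per_chain  <- n.adapt + n.iter  # iterations actually run by each chain

c(kept_per_chain = kept_per_chain, total_kept = total_kept, run_per_chain = run_per_chain)
```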

The following plot shows the estimated intervals as shaded areas under the posterior density curves for the coefficients of the treatment variable tx in the imputation models.

The parameters to plot can be selected through the subset argument with the keyword selected_parms (alternatively, the keyword selected_vars, which will be available in the next release, can be used):

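library(bayesplot)  # color_scheme_set(), mcmc_areas()

# Parameters corresponding to the coefficients of tx
pms = c("beta[2]","alpha[2]","alpha[6]","alpha[9]")
# get_subset() is accessed with ::: because it is not exported
mcsub = remiod:::get_subset(object = bp.test$mc.mar, subset = c(selected_parms = list(pms)))

color_scheme_set("purple")
mcmc_areas(
  mcsub, 
  pars = pms,
  prob = 0.95,        # inner (shaded) 95% intervals
  prob_outer = 0.99,  # outer 99% intervals
  point_est = "mean"  # mark the posterior mean
)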
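The subset value in the call above is a named list with a single element, selected_parms. Writing c(selected_parms = list(pms)) is simply another way of writing list(selected_parms = pms), as this base-R check shows:

```r
pms <- c("beta[2]", "alpha[2]", "alpha[6]", "alpha[9]")

s1 <- c(selected_parms = list(pms))  # form used in the call above
s2 <- list(selected_parms = pms)     # equivalent, more direct form

identical(s1, s2)
```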


Figure 2. Intervals under the estimated posterior density curves for coefficients of tx in imputation models
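The shaded regions in Figure 2 correspond to posterior intervals around the posterior mean. As a sketch of the underlying computation, assuming (as we understand bayesplot to do) that these are central quantile intervals, the hypothetical helper summarize_draws() below returns the mean and the central 95% and 99% intervals for a numeric vector of posterior draws of one parameter.

```r
# Hypothetical helper: posterior mean plus central quantile intervals
summarize_draws <- function(draws, probs = c(0.95, 0.99)) {
  out <- c(mean = mean(draws))
  for (p in probs) {
    # Central interval: cut (1 - p) / 2 probability mass from each tail
    q <- quantile(draws, c((1 - p) / 2, 1 - (1 - p) / 2), names = FALSE)
    out[paste0("lo", round(p * 100))] <- q[1]
    out[paste0("hi", round(p * 100))] <- q[2]
  }
  out
}

# Example on a simple evenly spaced vector of "draws"
summarize_draws(0:1000)
```

If mcsub is a coda mcmc.list (an assumption), the draws of one parameter could be pooled across chains with as.matrix(mcsub)[, "beta[2]"] before calling the helper.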

Reference

Wang and Liu. 2022. “Remiod: Reference-Based Controlled Multiple Imputation of Longitudinal Binary and Ordinal Outcomes with Non-Ignorable Missingness.” arXiv:2203.02771.