The goal of mipred is to calibrate a prediction rule using generalized linear models or Cox regression, with multiple imputation to account for missing values in the predictors, as described by Mertens, Banzato and de Wreede (2018) (https://arxiv.org/abs/1810.05099). Imputations are generated using the R package ‘mice’, without using the outcomes of the observations for which predictions are generated.

Two options are provided to generate predictions. The first is prediction-averaging: predictions are calibrated from single models, each fitted on one imputed dataset within a set of multiple imputations, and then averaged. The second applies the Rubin’s rules pooled model. In both implementations, unobserved values in the predictor data of new observations for which predictions are derived are imputed automatically.

The package contains two workhorse functions. The first, mipred(), generates predictions of the outcome for new observations (whose outcomes will usually not be available when the prediction rule is calibrated). The second, mipred.cv(), generates cross-validated predictions with the same methodology on existing data for which outcomes have already been observed. This allows users to assess the predictive potential of the calibrated prediction models.

The present version of the package is a preliminary development version and has so far only been checked thoroughly for binary-outcome logistic regression. The included vignette documents application of the functions to binary outcome data. Although we have not checked this extensively, the package should also work for continuous and count outcomes. We are working to expand the functionality to censored survival outcomes.

You can install the released version of mipred from CRAN.
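The standard CRAN installation call, given here for convenience, would be:

```r
# Install the released version from CRAN (requires an internet connection).
install.packages("mipred")
```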

Alternatively, you can install the current version into R from GitHub using the `install_github()` function from devtools. You may need to install and load the devtools package first. See the book “R Packages” (online version) by Hadley Wickham, chapter “Git and GitHub”.

There are currently two key functions:

mipred() # prediction calibration with multiple imputation for missing predictors

mipred.cv() # cross-validation for prediction calibration with multiple imputation for missing predictors

The first function calibrates predictions for new observations and accounts for missing values in the predictor data (of either the calibration or new validation sample) through multiple imputation. The second function implements cross-validation of the same approach.

Let `dataset` be a data.frame consisting of a vector of binary outcomes `outcome` and two predictors `x1` and `x2`. The outcome must be fully observed. Likewise, let `newdataset` be a data.frame with new observations for which the same predictors `x1` and `x2` are recorded and for which we want to predict the outcome, using a model fitted to the old data in `dataset`. Either or both of these predictors may contain missing values in the calibration data, and missing values are also allowed in the new data for which we want to generate predictions.
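To make this setup concrete, the following sketch simulates two data frames of the kind described above, using base R only. The sample sizes, coefficients and missingness fractions are arbitrary assumptions chosen for illustration; they are not taken from the package or its vignette. Whether mipred() expects any further columns in the new data is not covered here, so please check the package documentation.

```r
# Simulated example data; all names and values here are illustrative only.
set.seed(1)

n <- 200       # calibration sample size (assumed)
n_new <- 50    # number of new observations (assumed)

# Calibration data: two predictors and a fully observed binary outcome.
x1 <- rnorm(n)
x2 <- rnorm(n)
outcome <- rbinom(n, size = 1, prob = plogis(-0.5 + x1 + 0.5 * x2))
dataset <- data.frame(outcome = outcome, x1 = x1, x2 = x2)

# Set 20% of each predictor to missing, completely at random.
dataset$x1[sample(n, size = 40)] <- NA
dataset$x2[sample(n, size = 40)] <- NA

# New observations: same predictors, some values missing.
newdataset <- data.frame(x1 = rnorm(n_new), x2 = rnorm(n_new))
newdataset$x1[sample(n_new, size = 10)] <- NA
newdataset$x2[sample(n_new, size = 10)] <- NA
```

The outcome column is left complete, in line with the requirement that the outcome be fully observed.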

We can generate predictions using the command

`preds <- mipred(outcome ~ x1 + x2, family=binomial, data=dataset, newdata=newdataset, nimp=100)`

This fits a logistic regression model and uses 100 imputations.
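For readers who want an intuition for the prediction-averaging option, the sketch below illustrates the general idea in base R. It is not the mipred implementation: a naive random fill from observed values stands in for the ‘mice’ imputations, the helper `fill_missing()` is hypothetical, and averaging on the probability scale is an assumption of this sketch.

```r
# Conceptual sketch of prediction-averaging. NOT the mipred implementation:
# a naive random fill from observed values stands in for 'mice' imputations.
set.seed(2)

# Small simulated example (illustrative values only).
n <- 200
dataset <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dataset$outcome <- rbinom(n, 1, plogis(dataset$x1 + 0.5 * dataset$x2))
dataset$x1[sample(n, 40)] <- NA
newdataset <- data.frame(x1 = c(NA, 0.3, -1.2), x2 = c(0.5, NA, 0.1))

# Hypothetical helper: fill missing entries of each predictor by sampling
# from the observed calibration values of that predictor.
fill_missing <- function(df, donors, vars) {
  for (v in vars) {
    miss <- is.na(df[[v]])
    obs <- donors[[v]][!is.na(donors[[v]])]
    df[[v]][miss] <- sample(obs, sum(miss), replace = TRUE)
  }
  df
}

nimp <- 20
vars <- c("x1", "x2")

# One column of predicted probabilities per imputation.
pred_matrix <- sapply(seq_len(nimp), function(i) {
  cal <- fill_missing(dataset, dataset, vars)      # completed calibration set
  new <- fill_missing(newdataset, dataset, vars)   # completed new-data set
  fit <- glm(outcome ~ x1 + x2, family = binomial, data = cal)
  predict(fit, newdata = new, type = "response")
})

# Average the predicted probabilities across imputations, per new observation.
pred_avg <- rowMeans(pred_matrix)
```

Each column of `pred_matrix` comes from a single model fitted on a single completed dataset, and `pred_avg` combines them into one prediction per new observation. The Rubin’s rules option instead pools the fitted models before predicting.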

To generate cross-validated predictions within `dataset` with the same model, we can use

`preds.cv <- mipred.cv(outcome ~ x1 + x2, family=binomial, data=dataset, nimp=100, folds=10)`

This will generate cross-validated predictions from the same model, with 100 imputations for each predicted observation, using 10 folds.
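Once cross-validated predicted probabilities are available, their predictive performance can be summarized with standard measures. The sketch below computes a Brier score and an AUC in base R from two hypothetical vectors, `y` and `p_cv`, which are simulated here. How to obtain such a vector from the object returned by mipred.cv() is described in the package documentation and is deliberately not shown.

```r
# Hypothetical inputs: 'y' holds observed binary outcomes and 'p_cv' one
# cross-validated predicted probability per observation (simulated here).
set.seed(3)
y <- rbinom(100, 1, 0.4)
p_cv <- plogis(qlogis(0.4) + 1.5 * (y - 0.4) + rnorm(100))

# Brier score: mean squared difference between prediction and outcome.
brier <- mean((p_cv - y)^2)

# AUC via the rank (Mann-Whitney) formulation.
r <- rank(p_cv)
n1 <- sum(y == 1)
n0 <- sum(y == 0)
auc <- (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
```

Lower Brier scores and higher AUC values indicate better predictions; which measure is most appropriate depends on the application.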

Please refer to the example included with the package. The package also includes a vignette which documents use for binary outcome data.