The LMTP Package

Nick Williams & Ivan Diaz


Most modern causal inference methods consider the effects of a treatment on a population mean outcome under interventions that set the treatment value deterministically. For example, the average treatment effect (ATE) considers the hypothetical difference in a population mean outcome if a dichotomous exposure were applied to all observations versus if it were applied to none. In the case of a continuous exposure, interventions that set the exposure to a static value deterministically are of little practical relevance. Furthermore, the estimation of causal effects requires the so-called positivity assumption, which states that all observations have a greater than zero chance of experiencing the exposure value under consideration. This assumption is often violated when evaluating the effects of deterministic interventions, and violations are usually exacerbated in longitudinal settings as the number of time points grows.

Modified treatment policies (MTPs) are a class of stochastic treatment regimes that can be formulated to avoid the above problems (Díaz and Laan 2012; Haneuse and Rotnitzky 2013). In a recent article (Díaz et al. 2020), we generalized the theoretical framework for estimation of the effect of MTPs to the longitudinal setting, accounting for time-varying treatment, covariates, and right-censoring of the outcome. Briefly, MTPs are hypothetical interventions where the post-intervention value of treatment can depend on the actual observed treatment level and the unit's history. As such, MTPs are useful for assessing the effects of continuous treatments. For example, Haneuse and Rotnitzky (2013) assess the effect of reducing surgery time by a predetermined amount (e.g., 5 minutes) for lung cancer patients, where the reduction is carried out only for those patients for whom the intervention is feasible. Furthermore, MTPs generalize many important effect estimands, such as the effect of a dynamic treatment rule in which the treatment level is assigned as a function of a unit's history. For example, dynamic treatment rules, a particular case of MTPs, may be used to estimate the effect of policies such as switching HIV treatment once the CD4 T-cell count goes below a predetermined threshold (Petersen et al. 2014). MTPs also generalize many familiar causal effects, such as the average treatment effect, the causal risk ratio, and the causal odds ratio. In this article we describe how lmtp can be used for estimating the causal effects of MTPs, and present examples of the use of the software for several of the above cases.

The package lmtp implements four methods for estimating the effects of MTPs. Two of these estimators, a targeted minimum-loss based estimator (TMLE) (M. J. van der Laan and Rose 2011; M. J. van der Laan and Rubin 2006) and a sequentially doubly-robust estimator (SDR) (Buckley and James 1979; Fan and Gijbels 1994; M. van der Laan and Dudoit 2003; Rotnitzky, Faraggi, and Schisterman 2006; Rubin and Laan 2006; Kennedy et al. 2017), are multiply-robust. TMLE and SDR are implemented using cross-fitting to allow for the use of flexible machine learning regression methodology (Díaz et al. 2020).


In this article, we will use the notation of (Díaz et al. 2020) with only slight modifications. Let \(i\) be the index of an observation from a data set with \(n\) total units and \(t\) be the index of time for a total number of time points \(\tau\). The observed data for observation \(i\) may be denoted as

\[ Z_i = (W, L_1, A_1, L_2, A_2, \ldots, L_{\tau}, A_{\tau}, Y_{\tau + 1}) \]

where \(W\) denotes baseline covariates, \(L_t\) denotes time-varying covariates, \(A_t\) denotes a vector of exposure and/or censoring variables, and \(Y = Y_{\tau + 1}\) denotes an outcome measured at the end of study follow-up. We observe \(n\) i.i.d. copies of \(Z\) with distribution \(P\). We use \(A_t = a_t\) to denote a realization of a random variable. If right-censoring exists, \(A_t\) can be adapted so that \(A_t = (A_{1, t}, A_{2, t})\), where \(A_{1, t}\) equals one if an observation is still in the study at time \(t\) and zero otherwise, and \(A_{2, t}\) denotes the exposure at time \(t\). We use an overbar to indicate the history of a variable up until time \(t\). We then use \(H_t = (\bar{L}_t, \bar{A}_{t-1})\) to denote the history of all variables up until just before \(A_t\).

Modified Treatment Policies

We use the potential outcomes framework to define the causal effect of interest using our established data structure. We consider a hypothetical policy where \(\bar{A}\) is set to a regime \(d\) defined as \(A^{d}_t = d_t(A_t, H^{d}_t)\), where \(H^{d}_t = (\bar{L}_t, \bar{A}^{d}_{t-1})\), for a set of user-given regimes \(d_t: t \in \{1, \ldots, \tau\}\). The defining characteristic that makes regime \(d_t\) a modified treatment policy is that it depends on the natural value of treatment \(A_t\), that is, the value that the treatment would have taken under no intervention. When the function \(d_t\) depends only on \(H_t\), the LMTP reduces to the dynamic treatment regimes studied in the literature. Furthermore, when \(d_t\) is a constant and does not depend on either \(A_t\) or \(H_t\), LMTPs reduce to the conventional static rules studied in the causal inference literature (Bang and Robins 2005; Mark J. van der Laan and Gruber 2011). Below we present examples of all these interventions.

First, consider a study of the effect of physical activity on mortality in the elderly. Assume that each patient is monitored at several time points, and that a measure of physical activity such as the metabolic equivalent of task (MET) (Mendes et al. 2018) is measured together with a number of lifestyle, health status, and demographic variables. In this setup, a natural question to ask would be "what is the effect on mortality of an intervention that increases physical activity by \(\delta\) units for patients whose socioeconomic and health status allows it?" Formally, consider a longitudinal study with loss-to-follow-up. Let \(A_t = (A_{1, t}, A_{2, t})\), where \(A_{1, t}\) equals one if an observation is still in the study at time \(t\) and zero otherwise, and \(A_{2, t}\) denotes a continuous exposure at time \(t\) that can be changed through some intervention. A modified treatment policy that increases \(A_{2,t}\), whenever it is feasible to do so, can be defined as

\[ d_t(a_t, h_t) = \begin{cases} (1, a_{2,t} + \delta_t) & \text{if } a_{2,t} + \delta_t \leq u_t(h_t) \\ (1, a_{2,t}) & \text{if } a_{2,t} + \delta_t > u_t(h_t) \end{cases} \]

where \(u_t(h_t)\) defines the maximum level of physical activity allowed for a patient with characteristics \(h_t\). Note that we also intervene on \(A_{1,t}\) because we are interested in a hypothetical world with no loss-to-follow-up. In this case the hypothetical exposure after intervention, \(A^{d}_t\), depends on the actually observed exposure, \(A_t\). This is in contrast to a deterministic intervention, where \(A^{d}_t\) would be set to some pre-specified value with probability one.
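
This policy translates directly into a shift function of the kind lmtp expects (shift functions are described in detail later in this article). The following is a minimal base-R sketch under hypothetical assumptions: the patient-specific upper bound \(u_t(h_t)\) has been precomputed into a column named u_t, and \(\delta_t\) is a fixed value of 5 MET units.

```r
delta <- 5  # hypothetical increase in MET units

# Increase the exposure by delta only when the shifted value stays at or
# below the patient-specific upper bound stored in the (hypothetical)
# column "u_t"; otherwise leave the natural value unchanged.
mtp_met <- function(data, trt) {
  a <- data[[trt]]
  ifelse(a + delta <= data[["u_t"]], a + delta, a)
}

# Toy example: only the first unit can be shifted without exceeding its bound
toy <- data.frame(A_1 = c(10, 28), u_t = c(30, 30))
mtp_met(toy, "A_1")  # [1] 15 28
```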

For dynamic treatment rules, consider a hypothetical longitudinal study where two different antiviral treatments are administered to HIV positive patients. Sometimes an antiviral drug works at first, until the virus develops resistance, at which point it is necessary to change the treatment regime. Assume we are interested in assessing a policy with two treatments encoded as \(A_{2,t} \in \{0,1\}\), and we want to assess the effect of a regime that would switch the antiviral treatment as soon as the CD4 T cell count drops below 300. Let \(A_t = (A_{1, t}, A_{2, t})\), where \(A_{1, t}\) equals one if an observation is still in the study at time \(t\) and zero otherwise, and \(A_{2, t}\) denotes the treatment arm at time \(t\). Let \(L_t\) denote the CD4 T cell count at time \(t\). In this case, one may decide to assess the effect of the rule

\[ d_t(h_t)= \begin{cases} (1, 1 - a_{2,t-1}) & \text{if } l_t < 300 \\ (1, a_{2,t-1}) & \text{if } l_t \geq 300. \end{cases} \]

In contrast to the previous rule, this dynamic treatment rule does not depend on the natural value of treatment at time \(t\); it depends only on the history. This avoids certain technicalities that arise in the estimation procedure for true MTPs that depend on the natural value of treatment (Díaz et al. 2020). However, the software and methods presented here handle both cases seamlessly.
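
The switching rule above can likewise be written as a shift function (shift functions are described in detail later in this article). The sketch below uses hypothetical column names: treatment arms A2_1, A2_2, ... and CD4 T cell counts L_1, L_2, ...; the rule at \(t = 1\) would be specified separately.

```r
# Switch treatment arms whenever the current CD4 count drops below 300;
# otherwise keep the previous arm. Handles t >= 2 only.
switch_rule <- function(data, trt) {
  t <- as.integer(sub("A2_", "", trt))
  prev_arm <- data[[paste0("A2_", t - 1)]]  # arm assigned at time t - 1
  cd4 <- data[[paste0("L_", t)]]            # CD4 T cell count at time t
  ifelse(cd4 < 300, 1 - prev_arm, prev_arm)
}

# Toy example: the first unit switches arms, the second does not
toy <- data.frame(A2_1 = c(0, 1), L_2 = c(250, 400))
switch_rule(toy, "A2_2")  # [1] 1 1
```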

In the single time point setting, where the data structure is \(Z = (W, A, Y)\), it follows from the above definitions that the average treatment effect from a cross-sectional study, defined as \(\text{E}[Y(1) - Y(0)]\), can be estimated using MTPs by simply letting \(\tau = 1\) and contrasting the two MTPs \(d(A) = 1\) and \(d(A) = 0\). The lmtp package presented in this article allows the contrast of different MTPs using differences, ratios, and odds ratios.
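
Concretely, these two static rules are just constant shift functions. The sketches below are conceptually equivalent to the static_binary_on() and static_binary_off() helpers shipped with the package (described later in this article).

```r
# d(A) = 1: set every unit's exposure to one
d_on <- function(data, trt) rep(1, nrow(data))

# d(A) = 0: set every unit's exposure to zero
d_off <- function(data, trt) rep(0, nrow(data))

toy <- data.frame(A = c(0, 1, 1))
d_on(toy, "A")   # [1] 1 1 1
d_off(toy, "A")  # [1] 0 0 0
```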

In what follows we focus on estimating the causal effect of an MTP \(d\) on outcome \(Y\), using lmtp, through the causal parameter

\[ \theta = \text{E}\{Y(\bar A^d)\}\text{,} \]

where \(Y(\bar A^d)\) is the counterfactual outcome in a hypothetical world where, possibly contrary to fact, each entry of \(\bar{A}\) was modified according to the MTP \(d\). When \(Y\) is continuous, \(\theta\) is the mean population value of \(Y\) under MTP \(d\); when \(Y\) is dichotomous, \(\theta\) is the population proportion of event \(Y\) under MTP \(d\). Similarly, when \(Y\) is the indicator of an event by the end of the study, \(\theta\) is defined as the cumulative incidence of \(Y\) under MTP \(d\).

Estimation methods

The lmtp package implements four estimation methods: a targeted minimum-loss based estimator (TMLE), a sequentially doubly-robust estimator (SDR), an estimator based on the parametric G-formula, and an inverse probability weighted (IPW) estimator. We describe only the use of TMLE, lmtp_tmle, and SDR, lmtp_sdr: their use is strongly suggested over the others because of advantageous theoretical properties that permit machine learning regression while maintaining the ability to compute valid confidence intervals and p-values.

Targeted minimum-loss based estimation is a general framework for constructing asymptotically linear estimators leveraging machine learning, with an optimal bias-variance trade-off for the target causal parameter (van der Laan and Rose 2011, 2018). In general, a TMLE is constructed from a factorization of the observed data likelihood into an outcome regression and an intervention mechanism. Using the outcome regression, an initial estimate of the target parameter is constructed and then de-biased by a fluctuation that depends on a function of the intervention mechanism. The sequentially doubly-robust estimator is based on an unbiased transformation of the efficient influence function of the target estimand.

TMLE and SDR require estimation of two nuisance parameters at each time point: an outcome mechanism and an intervention mechanism. Both TMLE and SDR are multiply-robust in that they allow certain configurations of nuisance parameters to be inconsistently estimated. Specifically, TMLE is \(\tau + 1\)-multiply robust in that it allows inconsistent estimation of all intervention mechanisms prior to any time point \(t\), as long as all outcome mechanisms after time \(t\) are consistently estimated. SDR is \(2^{\tau}\)-multiply robust in that, at each time point, at most one of the intervention mechanism or the outcome mechanism may be inconsistently estimated. Both TMLE and SDR are efficient when all treatment mechanisms and outcome regressions are consistently estimated at a given rate, but SDR has better protection against model misspecification (see Luedtke et al. 2017; Rotnitzky, Robins, and Babino 2017; Díaz et al. 2020 for more details).

It is important to note that the SDR estimator can produce an estimate \(\hat{\theta}\) outside of the bounds of the parameter space (e.g., probability estimates outside \([0,1]\)), while the TMLE guarantees that the estimate is within bounds of the parameter space. With this in mind and because for a single time-point TMLE and SDR are equally robust, we recommend use of TMLE for the case of a single time-point, while we recommend use of SDR for the longitudinal setting.

Using LMTP


Required data structure

Data is passed to lmtp estimators through the data argument. Data should be in wide format with one column per variable per time point under study (i.e., there should be one column for every variable in \(Z\)). These columns do not have to be in any specific order, and the data set may contain variables that are not used in estimation. The names of treatment variables, censoring variables, baseline covariates, and time-varying covariates are specified using the trt, cens, baseline, and time_vary arguments, respectively. The trt, cens, and baseline arguments accept character vectors, and the trt and cens arguments should be ordered according to the time-ordering of the data generating mechanism. The time_vary argument accepts an unnamed list ordered according to the time-ordering of the model, with each index containing the names of the time-varying covariates for the given time. The outcome variable is specified through the outcome argument.

Estimators are compatible with continuous, dichotomous, and survival outcomes. In the case of a dichotomous or continuous outcome, only a single variable name should be passed to the outcome argument. For survival outcomes, a vector containing the names of the intermediate outcome and final outcome variables, ordered according to time, should be specified with the outcome argument. Dichotomous and survival outcomes should be coded using zeros and ones, where one indicates the occurrence of an event and zero otherwise. If working with a survival outcome, once an observation experiences the event, all future outcome variables should also be coded with a one. The outcome_type argument should be set to "continuous" for continuous outcomes, "binomial" for dichotomous outcomes, and "survival" for survival outcomes.

If the study is subject to loss-to-follow-up, the cens argument must be provided. Censoring indicators should be coded using zeros and ones, where one indicates an observation is observed at the next time and zero indicates loss-to-follow-up. Once an observation's censoring status is switched to zero it cannot change back to one. Missing data before an observation is lost-to-follow-up is not allowed; a pre-processing step using multiple imputation is recommended for such variables.

The k argument controls a Markov assumption on the data generating mechanism. When k = Inf, the history \(H_t\) will be constructed using all previous time-point variables while setting k to any other value will restrict \(H_t\) to time-varying covariates from time \(t - k - 1\) until \(t-1\). Baseline confounders are always included in \(H_t\). The create_node_list() function may be used to inspect how variables will be used for estimation. It is specified with the same trt, baseline, time_vary, and k arguments as lmtp estimators and is used internally to create a "node list’’ that encodes which variables should be used at each time point of estimation. For example, consider a study with the observed data structure

\[ Z = (W_1, W_2, L_{1, 1}, L_{1, 2}, A_1, L_{2, 1}, L_{2, 2}, A_2, Y_3) \]

We can translate this data structure to R with

baseline <- c("W_1", "W_2")
trt <- c("A_1", "A_2")
time_vary <- list(c("L_11", "L_12"), 
                  c("L_21", "L_22"))
create_node_list(trt = trt, baseline = baseline, time_vary = time_vary, tau = 2)
#> $trt
#> $trt[[1]]
#> [1] "W_1"  "W_2"  "L_11" "L_12" "A_1" 
#> $trt[[2]]
#> [1] "W_1"  "W_2"  "L_11" "L_12" "L_21" "L_22" "A_1"  "A_2" 
#> $outcome
#> $outcome[[1]]
#> [1] "W_1"  "W_2"  "L_11" "L_12" "A_1" 
#> $outcome[[2]]
#> [1] "W_1"  "W_2"  "L_11" "L_12" "A_1"  "L_21" "L_22" "A_2"

A list of lists is returned with the names of the variables in \(H_t\) to be used for estimation of the outcome regression and the treatment mechanism at every time \(t\). Notice that variables \(A_1\) and \(A_2\) are included in the list of variables used for estimation of the treatment mechanism. This is because the nuisance parameter for the treatment mechanism is the density ratio \(r_t\), which is a function of \(A_1\) and \(A_2\).

The density ratio is estimated based on a classification trick using an auxiliary variable \(\Lambda\) as a pseudo outcome and the treatment as a predictor. Specifically, the TMLE and SDR estimation methods require estimation of the ratio of the densities of \(A_t^d\) and \(A_t\), conditional on the history \(H_t\), defined as \(r_t\) above. This is achieved through computing the odds in a classification problem in an augmented dataset with \(2n\) observations where the outcome is the auxiliary variable \(\Lambda\) (defined below) and the predictors are the variables \(A_t\) and \(H_t\). In the \(2n\) augmented data set, the data structure at time \(t\) is redefined as

\[ (H_{\lambda, i, t}, A_{\lambda, i, t}, \Lambda_{\lambda, i} : \lambda = 0, 1; i = 1, ..., n) \]

where \(\Lambda_{\lambda, i} = \lambda\) indexes the duplicates. For the two duplicated observations \(\lambda \in \{0, 1\}\) with the same \(i\), \(H_{\lambda, i, t}\) is the same. For \(\lambda = 0\), \(A_{\lambda, i, t}\) equals the observed exposure values \(A_{i, t}\), whereas for \(\lambda = 1\), \(A_{\lambda, i, t}\) equals the exposure values under the MTP \(d\), namely \(A^{d}_{i, t}\). The classification approach to density ratio estimation proceeds by estimating the conditional probability that \(\Lambda = 1\) in this data set and dividing it by the corresponding estimate of the conditional probability that \(\Lambda = 0\). Specifically, denoting by \(P^\lambda\) the distribution of the data in the augmented data set, we have:

\[ r_t(a_t, h_t) = \frac{p^\lambda(a_t, h_t \mid \Lambda =    1)}{p^\lambda(a_t, h_t \mid \Lambda =    0)}=\frac{P^\lambda(\Lambda = 1\mid A_t=a_t,    H_t=h_t)}{P^\lambda(\Lambda = 0\mid A_t=a_t, H_t=h_t)}. \]

Further details on this algorithm may be found in our technical paper (Díaz et al. 2020).
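
The classification trick is easy to emulate outside the package. The following sketch, for a single time point and with a plain logistic regression standing in for lmtp's internal machinery (which uses the super learner), stacks the observed and intervened exposures into a \(2n\) data set and converts the fitted probabilities into a density ratio estimate. All names and the simulated data generating mechanism are hypothetical.

```r
set.seed(1)
n <- 500
H <- rnorm(n)             # a single history variable, for illustration
A <- rnorm(n, mean = H)   # observed exposure
A_d <- A + 0.5            # exposure under the MTP d(a_t, h_t) = a_t + 0.5

# 2n augmented data set: Lambda = 1 flags rows carrying the shifted exposure
aug <- data.frame(
  Lambda = rep(c(0, 1), each = n),
  A = c(A, A_d),
  H = c(H, H)
)

# Estimate P(Lambda = 1 | A, H) and convert the odds into the ratio r_t
fit <- glm(Lambda ~ A + H, data = aug, family = binomial())
p <- predict(fit, newdata = data.frame(A = A, H = H), type = "response")
r_hat <- p / (1 - p)      # estimated density ratio at the observed (A, H)
```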

Creating treatment policies

Modified treatment policies and deterministic static/dynamic treatment rules are specified using the shift argument, which accepts a user-defined function that returns a vector of exposure values modified according to the policy of interest. Shift functions should take two arguments: the first specifying a data set and the second specifying the current exposure variable. For example, a possible MTP may decrease exposure by one unit, but only if doing so would not bring the exposure below one, and do nothing otherwise. A shift function for this MTP would look like

mtp <- function(data, trt) {
  (data[[trt]] - 1) * (data[[trt]] - 1 >= 1) + data[[trt]] * (data[[trt]] - 1 < 1)
}

It is then passed to estimators using the shift argument

A <- c("A_1", "A_2", "A_3", "A_4")
L <- list(c("L_1"), c("L_2"), c("L_3"), c("L_4"))
lmtp_sdr(sim_t4, A, "Y", time_vary = L, k = 0, shift = mtp, folds = 5)
#> LMTP Estimator: SDR
#> Trt. Policy: (mtp)
#> Population intervention effect
#> Estimate: 0.2645
#> Std. error: 0.0179
#> 95% CI: (0.2294, 0.2996)

This framework is flexible and allows for specifying complex treatment regimes that can depend on time and covariates. In the case of a binary exposure, two shift functions are installed with the package: static_binary_on() which sets \(A_{i, t} = 1\), and static_binary_off() which sets \(A_{i, t} = 0\).

if (require("twang", quietly = TRUE, attach.required = FALSE)) {
  data("iptwExWide", package = "twang")
  A <- paste0("tx", 1:3)
  W <- c("gender", "age")
  L <- list(c("use0"), c("use1"), c("use2"))
  lmtp_tmle(iptwExWide, A, "outcome", W, L, 
            shift = static_binary_on, outcome_type = "continuous",
            folds = 2, .SL_folds = 2)
}
#> LMTP Estimator: TMLE
#> Trt. Policy: (static_binary_on)
#> Population intervention effect
#> Estimate: -0.2844
#> Std. error: 0.0445
#> 95% CI: (-0.3716, -0.1972)

Dynamic treatment regimes are treatment rules where treatment is applied based on a fixed rule that depends on covariate history. lmtp is capable of estimating the effects of deterministic dynamic treatment rules as well as modified treatment policies that depend on covariate history.

A <- c("A_1", "A_2", "A_3", "A_4")
L <- list(c("L_1"), c("L_2"), c("L_3"), c("L_4"))

# creating a dynamic mtp that applies the shift function 
# but also depends on history and the current time
dynamic_mtp <- function(data, trt) {
  mtp <- function(data, trt) {
    (data[[trt]] - 1) * (data[[trt]] - 1 >= 1) + data[[trt]] * (data[[trt]] - 1 < 1)
  }
  # if it's the first time point, follow the same mtp as before
  if (trt == "A_1") return(mtp(data, trt))
  # otherwise, check if the time-varying covariate equals 1
  ifelse(
    data[[sub("A", "L", trt)]] == 1, 
    mtp(data, trt), # if yes, continue with the policy
    data[[trt]]     # otherwise do nothing
  )
}

lmtp_tmle(sim_t4, A, "Y", time_vary = L, k = 0, 
          shift = dynamic_mtp, folds = 2, .SL_folds = 2)
#> LMTP Estimator: TMLE
#> Trt. Policy: (dynamic_mtp)
#> Population intervention effect
#> Estimate: 0.2359
#> Std. error: 0.011
#> 95% CI: (0.2143, 0.2575)


An attractive property of multiply-robust estimators is that they can incorporate flexible machine-learning algorithms for the estimation of the nuisance parameters \(Q_t\) and \(r_t\) while remaining \(\sqrt{n}\)-consistent. The super learner algorithm is an ensemble learner that incorporates a set of candidate models through a weighted convex combination based on cross-validation (Mark J. van der Laan, Polley, and Hubbard 2007). Asymptotically, this weighted combination will perform at least as well as the best of its components.

lmtp uses the implementation of the super learner provided by the SuperLearner package (Polley et al. 2019). Analysts must specify a vector of SuperLearner prediction algorithms, which are then included in lmtp_tmle() and lmtp_sdr() calls with the learners_trt and learners_outcome arguments. The outcome variable type should guide users in selecting appropriate candidate learners for the learners_outcome argument. Regardless of whether an exposure is continuous, dichotomous, or categorical, the exposure mechanism is estimated using classification as discussed above; users should thus only include candidate learners capable of binary classification in the learners_trt argument. If learners_outcome and learners_trt aren't specified, estimation will be conducted using a main-effects generalized linear model.

Candidate learners that rely on cross-validation for the tuning of hyper-parameters should support grouped data if used with learners_trt. Because estimation of the treatment mechanism relies on the augmented \(2n\) duplicated data set, duplicated observations must be put into the same fold during sample-splitting. This is done automatically by the package.

if (require("ranger", quietly = TRUE, attach.required = FALSE) && 
    require("twang", quietly = TRUE, attach.required = FALSE)) {
  data("iptwExWide", package = "twang")
  A <- paste0("tx", 1:3)
  W <- c("gender", "age")
  L <- list(c("use0"), c("use1"), c("use2"))
  lrnrs <- c("SL.glm", "SL.ranger", "SL.glm.interaction")
  lmtp_tmle(iptwExWide, A, "outcome", W, L, shift = static_binary_on, 
            outcome_type = "continuous", learners_trt = lrnrs, 
            learners_outcome = lrnrs, folds = 2, .SL_folds = 2)
}
#> LMTP Estimator: TMLE
#> Trt. Policy: (static_binary_on)
#> Population intervention effect
#> Estimate: -0.2889
#> Std. error: 0.0423
#> 95% CI: (-0.3718, -0.206)

Censored outcomes

In the case of missing outcomes, lmtp can estimate the effect of a hypothetical treatment regime where all observations remained uncensored through the end of follow-up. To do this, the user must supply a vector containing the names of censoring indicators for each treatment time point to lmtp estimators through the cens argument. Censoring nodes should be defined such that, at any time \(t\), an observation receives a 1 if it is observed at time \(t + 1\) and a 0 otherwise.

head(sim_cens[sim_cens$C1 == 0, ])
#>     L1       A1 C1 L2 A2 C2  Y
#> 48   1 3.260499  0 NA NA  0 NA
#> 58   1 1.109737  0 NA NA  0 NA
#> 67   1 1.376540  0 NA NA  0 NA
#> 112  1 3.832451  0 NA NA  0 NA
#> 114  1 1.241776  0 NA NA  0 NA
#> 115  1 2.390811  0 NA NA  0 NA
A <- c("A1", "A2")
L <- list(c("L1"), c("L2"))
C <- c("C1", "C2")

lmtp_tmle(sim_cens, A, "Y", time_vary = L, cens = C,
          shift = function(data, trt) data[[trt]] + 0.5, 
          folds = 2, .SL_folds = 2)
#> LMTP Estimator: TMLE
#> Trt. Policy: (function(data, trt) data[[trt]] + 0.5)
#> Population intervention effect
#> Estimate: 0.8979
#> Std. error: 0.0095
#> 95% CI: (0.8794, 0.9165)

Population mean outcome

In certain situations, the user may be interested in the population mean outcome under no intervention. In the presence of censoring, this can be estimated by setting shift = NULL and providing censoring indicators.

A <- c("A1", "A2")
L <- list(c("L1"), c("L2"))
C <- c("C1", "C2")

lmtp_sdr(sim_cens, A, "Y", time_vary = L, cens = C, 
         shift = NULL, folds = 2, .SL_folds = 2)
#> LMTP Estimator: SDR
#> Trt. Policy: (NULL)
#> Population intervention effect
#> Estimate: 0.7989
#> Std. error: 0.0129
#> 95% CI: (0.7735, 0.8242)

Survival analysis

For a time-to-event analysis, the outcome argument should be provided a vector containing the names of the intermediate outcome variables as well as the final outcome variable, and the outcome_type argument should be set to "survival". The intermediate outcome variables serve as indicators of whether an observation experiences the event before the end of follow-up. If an observation does experience the event before the final outcome time, all future outcome variables (including the final outcome variable) should be set to 1. The function event_locf() (last observation carried forward, only for events) is provided to help with this imputation. Note that the survival probability, NOT the cumulative incidence, is estimated.
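
For intuition, the required coding convention can be sketched in a few lines of base R. This is only an illustration of the carrying-forward rule, not the package's event_locf() implementation; the helper and column names are hypothetical.

```r
# Once an event (1) is observed in an outcome column, propagate it to all
# later outcome columns; censored (NA) and event-free (0) rows are unchanged.
carry_event_forward <- function(data, outcomes) {
  for (i in seq_along(outcomes)[-1]) {
    prev <- data[[outcomes[i - 1]]]
    data[[outcomes[i]]] <- ifelse(!is.na(prev) & prev == 1, 1, data[[outcomes[i]]])
  }
  data
}

# Toy example: the second unit's event at Y.1 is carried forward to Y.2
toy <- data.frame(Y.1 = c(0, 1), Y.2 = c(0, 0))
carry_event_forward(toy, c("Y.1", "Y.2"))$Y.2  # [1] 0 1
```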

Time-to-event analyses are supported for both time-invariant…

A <- "trt"
Y <- paste0("Y.", 1:6)
C <- paste0("C.", 0:5)
W <- c("W1", "W2")

lmtp_tmle(sim_point_surv, A, Y, W, cens = C, shift = static_binary_on, 
          outcome_type = "survival", folds = 2, .SL_folds = 2)
#> LMTP Estimator: TMLE
#> Trt. Policy: (static_binary_on)
#> Population intervention effect
#> Estimate: 0.1858
#> Std. error: 0.0184
#> 95% CI: (0.1498, 0.2218)

…and time-varying exposures.

W <- "L0.c"
L <- list(c("L0.a", "L0.b"), c("L1.a", "L1.b"))
A <- c("A0", "A1")
C <- c("C0", "C1")
Y <- c("Y1", "Y2")

lmtp_sdr(sim_timevary_surv, A, Y, W, L, C, outcome_type = "survival", 
         shift = static_binary_on, folds = 2, .SL_folds = 2)
#> LMTP Estimator: SDR
#> Trt. Policy: (static_binary_on)
#> Population intervention effect
#> Estimate: 0.7968
#> Std. error: 0.0573
#> 95% CI: (0.6844, 0.9091)


The effects returned by lmtp estimators are population intervention effects, that is the expected mean outcome in the population under the hypothetical intervention. Often, however, we are also interested in the comparison of different interventions to each other or to no intervention at all. This is the role of lmtp_contrast().

A <- c("A1", "A2")
L <- list(c("L1"), c("L2"))
C <- c("C1", "C2")

fit_shift <- 
  lmtp_sdr(sim_cens, A, "Y", time_vary = L, cens = C, 
           shift = function(data, trt) data[[trt]] + 0.5, 
           folds = 2, .SL_folds = 2)

fit_noshift <- 
  lmtp_sdr(sim_cens, A, "Y", time_vary = L, cens = C,
           shift = NULL, folds = 2, .SL_folds = 2)
lmtp_contrast(fit_shift, ref = fit_noshift, type = "additive")
#> LMTP Contrast: additive
#> Null hypothesis: theta == 0
#>   theta shift   ref std.error conf.low conf.high p.value
#> 1   0.1 0.897 0.797   0.00588   0.0887     0.112  <0.001
lmtp_contrast(fit_shift, ref = fit_noshift, type = "rr")
#> LMTP Contrast: relative risk
#> Null hypothesis: theta == 1
#>   theta shift   ref std.error conf.low conf.high p.value
#> 1  1.13 0.897 0.797    0.0079     1.11      1.14  <0.001

Extra features

Tidy results

lmtp provides a tidy() method as described in broom (Robinson and Hayes 2020):

#> # A tibble: 1 × 5
#>   estimator estimate std.error conf.low conf.high
#>   <chr>        <dbl>     <dbl>    <dbl>     <dbl>
#> 1 SDR          0.897   0.00999    0.878     0.917

Parallel processing

Computation time can quickly increase with many time points, a large super learner library, and large datasets. To help, lmtp provides support for parallel processing using future (Bengtsson 2020a). The simplest way to run estimators in parallel is to call plan(multiprocess). We recommend consulting the future documentation for more information.

Progress bars

In the presence of long computation time, a lack of user feedback can become very frustrating. To address this, lmtp supports the use of progress bars during computation through progressr (Bengtsson 2020b).


Bang, Heejung, and James M Robins. 2005. “Doubly Robust Estimation in Missing Data and Causal Inference Models.” Biometrics 61 (4): 962–73.
Bengtsson, Henrik. 2020a. future: Unified Parallel and Distributed Processing in R for Everyone.
———. 2020b. progressr: An Inclusive, Unifying API for Progress Updates.
Buckley, Jonathan, and Ian James. 1979. “Linear Regression with Censored Data.” Biometrika 66 (3): 429–36.
Díaz, Iván, and Mark van der Laan. 2012. “Population Intervention Causal Effects Based on Stochastic Interventions.” Biometrics 68 (2): 541–49.
Díaz, Iván, Nicholas Williams, Katherine L. Hoffman, and Edward J. Schenck. 2020. “Non-Parametric Causal Effects Based on Longitudinal Modified Treatment Policies.” arXiv:2006.01366, June.
Fan, Jianqing, and Irene Gijbels. 1994. “Censored Regression: Local Linear Approximations and Their Applications.” Journal of the American Statistical Association 89 (426): 560–70.
Haneuse, S., and A. Rotnitzky. 2013. “Estimation of the Effect of Interventions That Modify the Received Treatment.” Statistics in Medicine 32 (30): 5260–77.
Kennedy, Edward H., Zongming Ma, Matthew D. McHugh, and Dylan S. Small. 2017. “Nonparametric Methods for Doubly Robust Estimation of Continuous Treatment Effects.” Journal of the Royal Statistical Society. Series B, Statistical Methodology 79 (4): 1229–45.
Laan, Mark J van der, and Susan Gruber. 2011. “Targeted Minimum Loss Based Estimation of an Intervention Specific Mean Outcome.”
Laan, Mark J. van der, Eric C. Polley, and Alan E. Hubbard. 2007. “Super Learner.” Statistical Applications in Genetics and Molecular Biology 6 (1).
Laan, Mark J. van der, and Daniel Rubin. 2006. “Targeted Maximum Likelihood Learning.” The International Journal of Biostatistics 2 (1).
Laan, Mark van der, and Sandrine Dudoit. 2003. “Unified Cross-Validation Methodology For Selection Among Estimators and a General Cross-Validated Adaptive Epsilon-Net Estimator: Finite Sample Oracle Inequalities and Examples.” U.C. Berkeley Division of Biostatistics Working Paper Series, November.
Luedtke, Alexander R, Oleg Sofrygin, Mark J van der Laan, and Marco Carone. 2017. “Sequential Double Robustness in Right-Censored Longitudinal Models.” arXiv Preprint arXiv:1705.02459.
Mendes, Márcio de Almeida, Inácio da Silva, Virgilio Ramires, Felipe Reichert, Rafaela Martins, Rodrigo Ferreira, and Elaine Tomasi. 2018. “Metabolic Equivalent of Task (METs) Thresholds as an Indicator of Physical Activity Intensity.” PloS One 13 (7): e0200701.
Petersen, Maya L, Linh Tran, Elvin H Geng, Steven J Reynolds, Andrew Kambugu, Robin Wood, David R Bangsberg, Constantin T Yiannoutsos, Steven G Deeks, and Jeffrey N Martin. 2014. “Delayed Switch of Antiretroviral Therapy After Virologic Failure Associated with Elevated Mortality Among HIV-Infected Adults in Africa.” AIDS (London, England) 28 (14): 2097.
Polley, Eric, Erin LeDell, Chris Kennedy, and Mark van der Laan. 2019. SuperLearner: Super Learner Prediction.
Robinson, David, and Alex Hayes. 2020. broom: Convert Statistical Analysis Objects into Tidy Tibbles.
Rotnitzky, Andrea, David Faraggi, and Enrique Schisterman. 2006. “Doubly Robust Estimation of the Area Under the Receiver-Operating Characteristic Curve in the Presence of Verification Bias.” Journal of the American Statistical Association 101 (475): 1276–88.
Rotnitzky, Andrea, James Robins, and Lucia Babino. 2017. “On the Multiply Robust Estimation of the Mean of the g-Functional.” arXiv Preprint arXiv:1705.08582.
Rubin, Daniel, and Mark van der Laan. 2006. “Doubly Robust Censoring Unbiased Transformations.” U.C. Berkeley Division of Biostatistics Working Paper Series, June.
van der Laan, Mark J, and Sherri Rose. 2011. Targeted Learning: Causal Inference for Observational and Experimental Data. New York: Springer.
———. 2018. Targeted Learning in Data Science: Causal Inference for Complex Longitudinal Studies. New York: Springer.