`ddml` is an implementation of double/debiased machine learning (DDML) estimators as proposed by Chernozhukov et al. (2018). The key feature of `ddml` is the straightforward estimation of nuisance parameters using (short-)stacking (Wolpert, 1992), which combines multiple machine learners to increase robustness to the underlying data generating process. See also Ahrens et al. (2024) for a detailed illustration of the practical benefits of combining DDML with (short-)stacking.
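The base R sketch below shows the idea behind short-stacking on simulated data. Each base learner's cross-fitted (out-of-fold) predictions are computed once. A single set of ensemble weights is then chosen to minimize the squared error of the combined cross-fitted predictions, with the weights non-negative and summing to one (our reading of the `ensemble_type = 'nnls1'` option used in the example later in this README). To keep it short, the sketch uses only two learners, linear and cubic-polynomial regression (both our choice), for which the constrained weight has a closed form. It is an illustration of the technique, not `ddml`'s implementation.

```r
# Sketch: short-stacking two base learners on simulated data (base R only).
set.seed(1)
n <- 1000
x <- runif(n, -2, 2)
y <- sin(2 * x) + rnorm(n, sd = 0.3)

# Step 1: cross-fitted (out-of-fold) predictions from each base learner
K <- 5
fold <- sample(rep(1:K, length.out = n))
pred_lin <- pred_poly <- numeric(n)
for (k in 1:K) {
  train <- data.frame(x = x[fold != k], y = y[fold != k])
  test <- data.frame(x = x[fold == k])
  pred_lin[fold == k] <- predict(lm(y ~ x, data = train), newdata = test)
  pred_poly[fold == k] <- predict(lm(y ~ poly(x, 3), data = train), newdata = test)
}

# Step 2: one set of weights fitted on the full sample of cross-fitted
# predictions: minimize sum((y - (1 - w) * pred_lin - w * pred_poly)^2)
# subject to 0 <= w <= 1. With two learners the objective is a convex
# quadratic in w, so clamping the unconstrained minimizer is exact.
d <- pred_poly - pred_lin
w <- sum((y - pred_lin) * d) / sum(d^2)
w <- min(max(w, 0), 1)
pred_stack <- (1 - w) * pred_lin + w * pred_poly

mse <- function(p) mean((y - p)^2)
c(weight_poly = w, mse_lin = mse(pred_lin), mse_poly = mse(pred_poly),
  mse_stack = mse(pred_stack))
```

Because the weight is fitted on predictions that are already out-of-fold, the learners are trained only once per fold. This is the computational saving short-stacking offers over re-fitting the stacking step inside every fold.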

`ddml` is the sister R package to our Stata package of the same name, mirroring its key features while also leveraging R to simplify estimation with user-provided machine learners and/or sparse matrices. See also Ahrens et al. (2023) for additional discussion of the supported causal models and the benefits of (short-)stacking.
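As a rough illustration of what a user-provided learner might look like, the sketch below defines a hypothetical ridge regression learner, `mdl_ridge`, in base R. It assumes that a learner is a function taking `y` and `X` (plus tuning arguments) and returning a fitted object whose class has a `predict()` method accepting `newdata`. `vignette("new_ml_wrapper")` is the authoritative reference for the exact interface `ddml` expects.

```r
# Hypothetical user-provided learner: ridge regression written in base R.
# Assumed interface: fun(y, X, ...) returns a fitted object whose class
# has a predict() method accepting `newdata`.
mdl_ridge <- function(y, X, lambda = 1, ...) {
  X <- as.matrix(X)
  x_means <- colMeans(X)
  Xc <- sweep(X, 2, x_means)                 # center the predictors
  # Ridge slopes on centered data (the intercept is left unpenalized);
  # predictors are not standardized, so lambda acts on their raw scales
  beta <- drop(solve(crossprod(Xc) + lambda * diag(ncol(Xc)),
                     crossprod(Xc, y - mean(y))))
  fit <- list(beta = beta, intercept = mean(y) - sum(x_means * beta))
  class(fit) <- "mdl_ridge"
  fit
}

# predict() method producing (out-of-sample) predictions for new data
predict.mdl_ridge <- function(object, newdata, ...) {
  drop(as.matrix(newdata) %*% object$beta) + object$intercept
}

# Usage on simulated data
set.seed(2)
X_sim <- matrix(rnorm(300), 100, 3)
y_sim <- drop(X_sim %*% c(1, 0, -1)) + rnorm(100)
fit <- mdl_ridge(y_sim, X_sim, lambda = 5)
head(predict(fit, newdata = X_sim))
```

If the interface assumption holds, `mdl_ridge` could then be supplied in the same way as the built-in learners, e.g. `list(fun = mdl_ridge, args = list(lambda = 5))` inside `learners`.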

Install the latest development version from GitHub (requires the `devtools` package):

```
if (!require("devtools")) {
  install.packages("devtools")
}
devtools::install_github("thomaswiemann/ddml", dependencies = TRUE)
```

Install the latest public release from CRAN:
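This uses only base R and assumes a reachable CRAN mirror:

```r
# Install the released version of ddml from CRAN (requires internet access)
install.packages("ddml")
```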

To illustrate `ddml` on a simple example, consider the included random subsample of 5,000 observations from the data of Angrist & Evans (1998). The data contain information on the labor supply of mothers, their children, and demographic characteristics. See `?AE98` for details.

```
# Load ddml and set seed
library(ddml)
set.seed(75523)
# Construct variables from the included Angrist & Evans (1998) data
y = AE98[, "worked"]
D = AE98[, "morekids"]
Z = AE98[, "samesex"]
X = AE98[, c("age","agefst","black","hisp","othrace","educ")]
```

`ddml_late` estimates the local average treatment effect (LATE) using double/debiased machine learning (see `?ddml_late`). Since the statistical properties of machine learners depend heavily on the underlying (unknown!) structure of the data, adaptively combining multiple machine learners can increase robustness. In the snippet below, `ddml_late` estimates the LATE with short-stacking based on three base learners:

- linear regression (see `?ols`)
- lasso (see `?mdl_glmnet`)
- gradient boosting (see `?mdl_xgboost`)

```
# Estimate the local average treatment effect using short-stacking with base
# learners ols, lasso (glmnet), and gradient boosting (xgboost).
late_fit_short <- ddml_late(y, D, Z, X,
                            learners = list(list(fun = ols),
                                            list(fun = mdl_glmnet),
                                            list(fun = mdl_xgboost,
                                                 args = list(nrounds = 100,
                                                             max_depth = 1))),
                            ensemble_type = 'nnls1',
                            shortstack = TRUE,
                            sample_folds = 10,
                            silent = TRUE)
summary(late_fit_short)
#> LATE estimation results:
#>
#>         Estimate Std. Error   t value  Pr(>|t|)
#> nnls1 -0.2105019   0.195529 -1.076576 0.2816698
```
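To make the estimation step less of a black box, the sketch below hand-codes a cross-fitted DDML estimator of the LATE on simulated data. It uses the standard orthogonal score for the LATE in the interactive IV model (Chernozhukov et al., 2018) with one set of simple parametric nuisance models in place of stacked learners: linear regressions for E[y | Z, X] and E[D | Z, X] within each instrument arm, and a logit for P(Z = 1 | X). The data generating process (true effect 0.5), the five folds, and the propensity trimming at 0.01 are illustrative assumptions. This is a simplified sketch, not `ddml_late`'s internals.

```r
# Sketch: cross-fitted DDML estimate of the LATE on simulated data.
set.seed(42)
n <- 5000
X <- matrix(rnorm(2 * n), n, 2, dimnames = list(NULL, c("x1", "x2")))
Z <- rbinom(n, 1, plogis(0.5 * X[, 1]))           # binary instrument
U <- rnorm(n)                                      # unobserved confounder
D <- as.numeric(0.8 * Z + 0.3 * X[, 1] + U > 0.5)  # endogenous treatment
y <- 0.5 * D + X[, 1] + 0.5 * U + rnorm(n)         # constant effect of 0.5

# Cross-fitted nuisance predictions
K <- 5
fold <- sample(rep(1:K, length.out = n))
g1 <- g0 <- r1 <- r0 <- m <- numeric(n)
dat <- data.frame(y = y, D = D, Z = Z, X)
for (k in 1:K) {
  tr <- dat[fold != k, ]
  te <- dat[fold == k, ]
  idx <- fold == k
  g1[idx] <- predict(lm(y ~ x1 + x2, data = tr[tr$Z == 1, ]), newdata = te)  # E[y|Z=1,X]
  g0[idx] <- predict(lm(y ~ x1 + x2, data = tr[tr$Z == 0, ]), newdata = te)  # E[y|Z=0,X]
  r1[idx] <- predict(lm(D ~ x1 + x2, data = tr[tr$Z == 1, ]), newdata = te)  # E[D|Z=1,X]
  r0[idx] <- predict(lm(D ~ x1 + x2, data = tr[tr$Z == 0, ]), newdata = te)  # E[D|Z=0,X]
  m[idx] <- predict(glm(Z ~ x1 + x2, family = binomial, data = tr),
                    newdata = te, type = "response")                      # P(Z=1|X)
}
m <- pmin(pmax(m, 0.01), 0.99)  # trim extreme propensity scores

# Orthogonal scores: numerator (effect of Z on y), denominator (effect of Z on D)
num <- g1 - g0 + Z * (y - g1) / m - (1 - Z) * (y - g0) / (1 - m)
den <- r1 - r0 + Z * (D - r1) / m - (1 - Z) * (D - r0) / (1 - m)
late <- mean(num) / mean(den)

# Plug-in standard error from the influence function of the ratio
psi <- (num - late * den) / mean(den)
se <- sqrt(mean(psi^2) / n)
c(estimate = late, std_error = se)
```

Swapping the `lm()`/`glm()` calls for other learners, or for a short-stacked combination of several, leaves the score unchanged; only the nuisance predictions `g1`, `g0`, `r1`, `r0`, and `m` change.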

Check out our articles to learn more about `ddml`:

- `vignette("ddml")` is a more detailed introduction to `ddml`
- `vignette("stacking")` discusses the computational benefits of short-stacking
- `vignette("new_ml_wrapper")` shows how to write user-provided base learners
- `vignette("sparse")` illustrates support of sparse matrices (see `?Matrix`)
- `vignette("did")` discusses integration with the diff-in-diff package `did`
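Sparse storage matters when the controls include high-cardinality categorical variables, whose dummy expansion is mostly zeros. The sketch below uses the `Matrix` package (a recommended package shipped with standard R installations) and a hypothetical 200-level `region` factor on simulated data to compare the memory footprint of the same dummy matrix stored sparsely and densely. How `ddml`'s learners consume such matrices is covered in `vignette("sparse")`.

```r
# Sketch: memory footprint of a dummy-variable design, sparse vs dense storage.
library(Matrix)
set.seed(4)
n <- 5000
# Hypothetical categorical control with 200 levels (e.g., a region code)
region <- factor(sample(sprintf("r%03d", 1:200), n, replace = TRUE))
X_sparse <- sparse.model.matrix(~ region - 1)  # one indicator column per level
X_dense <- as.matrix(X_sparse)                  # same matrix, dense storage
c(sparse = format(object.size(X_sparse), units = "Kb"),
  dense = format(object.size(X_dense), units = "Kb"))
```

Each row has exactly one nonzero entry, so the sparse version stores about `n` values rather than `n` times the number of levels.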

For additional applied examples, see our case studies:

- `vignette("example_401k")` revisits the effect of 401k participation on retirement savings
- `vignette("example_BLP95")` considers flexible demand estimation with endogenous prices

`ddml` is built to easily (and quickly) estimate common causal parameters with multiple machine learners. With its support for short-stacking, sparse matrices, and easy-to-learn syntax, we hope `ddml` is a useful complement to `DoubleML`, the expansive R and Python package. `DoubleML` supports many advanced features such as multiway clustering and stacking.

Ahrens A, Hansen C B, Schaffer M E, Wiemann T (2023). “ddml: Double/debiased machine learning in Stata.” https://arxiv.org/abs/2301.09397

Ahrens A, Hansen C B, Schaffer M E, Wiemann T (2024). “Model averaging and double machine learning.” https://arxiv.org/abs/2401.01645

Angrist J, Evans W (1998). “Children and Their Parents’ Labor Supply: Evidence from Exogenous Variation in Family Size.” American Economic Review, 88(3), 450-477.

Chernozhukov V, Chetverikov D, Demirer M, Duflo E, Hansen C B, Newey W, Robins J (2018). “Double/debiased machine learning for treatment and structural parameters.” The Econometrics Journal, 21(1), C1-C68.

Wolpert D H (1992). “Stacked generalization.” Neural Networks, 5(2), 241-259.