# Statistical Inference and Power Analysis for Direct and Spillover Effects

This vignette shows how to use the functions for statistical inference and power analysis of direct and spillover effects in two-stage randomized experiments, using the JD data set as a motivating example.

## Study Design

In 2007, the French ministry in charge of employment launched a public employment integration service contract, a job placement assistance program for young graduates seeking employment. The program was evaluated with a randomized experiment, and the methods in this package can be used to analyze the resulting data. The examples below focus on two outcomes: a fixed-term contract of six months or more (LTFC, long-term fixed-term contract) and a permanent contract (PC).

## Data

The data set is a subset of the original JD data set and includes the following variables:

- anonale: local employment agency (used as the cluster identifier)
- tempsc_av: full-time work at the time of assignment
- assigned: 1 if the individual is assigned to treatment, 0 otherwise
- pct0: share of the local population treated (the assignment mechanism)
- cdi: binary indicator for whether the individual works on a permanent contract (CDI), 8 months after assignment
- cdd6m: binary indicator for whether the individual works on a fixed-term contract (CDD) of more than 6 months, 8 months after assignment
- emploidur: binary indicator for whether the individual works on a permanent contract or a fixed-term contract of more than 6 months, 8 months after assignment
- tempsc: binary indicator for whether the individual works full time, 8 months after assignment
- salaire: individual's salary in euros

## Overview

The relevant functions for this analysis are the following:

1. ZSRE: returns Z, the vector of binary treatment assignments for the chosen treatment variable.

2. YSRE: returns Y, the vector of outcomes for the chosen variable of interest.

3. CalAPO: returns a list of point estimates and covariance matrices for the average potential outcomes, the direct effects, the marginal direct effect, and the spillover effects.

4. Test2SRE: tests a hypothesis about the chosen effect. It takes the data and the effect type (direct effect, marginal direct effect, or spillover effect) and reports whether the test statistic falls in the rejection region at the desired significance level.

5. calpara: returns a list with the estimated within-cluster variance, between-cluster variance, intra-class correlation coefficient, and average of the assignment vector, which are needed by the Calsamplesize function.

6. Calsamplesize: returns the total number of clusters needed to achieve a given power at a given significance level under each of the three assignment mechanisms.

## Functions

First, load the RCT2 package and the jd data set.

```r
library(RCT2)
data(jd)
```

### CalAPO

To calculate point estimates and variances for the effects of interest, use the CalAPO function. It first requires a data frame containing the individual treatment assignment Z, the assignment mechanism A of the individual's cluster (which depends on the study design), the outcome Y, and the cluster identifier id. In this experiment, there are three assignment mechanisms, with treated proportions of 25%, 50%, and 75%.
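The two-stage structure can be illustrated with a small simulation in base R. This is a hypothetical sketch: the number of clusters (30), the cluster size (20), the balanced allocation of mechanisms across clusters, exact within-cluster randomization, and the outcome model are all assumptions for illustration, not features of the jd data. In the first stage each cluster receives a treated share A; in the second stage that share of units in the cluster is randomly assigned to treatment (Z = 1).

```r
# Hypothetical two-stage randomized design (simulated, not the jd data)
set.seed(1)
n_clusters <- 30                  # assumed number of clusters
n_per <- 20                       # assumed number of units per cluster
shares <- c(0.25, 0.50, 0.75)     # the three assignment mechanisms

# Stage 1: assign each mechanism to the same number of clusters, at random
A_cluster <- sample(rep(shares, length.out = n_clusters))

# Stage 2: within each cluster, treat exactly round(A * n_per) random units
sim <- do.call(rbind, lapply(seq_len(n_clusters), function(j) {
  z <- integer(n_per)
  z[sample.int(n_per, round(A_cluster[j] * n_per))] <- 1L
  data.frame(Z = z, A = A_cluster[j], id = j)
}))

# Made-up binary outcome, only so that the data frame has a Y column
sim$Y <- rbinom(nrow(sim), 1, 0.2 + 0.02 * sim$Z)
sim <- sim[, c("Z", "A", "Y", "id")]
head(sim)
```

The simulated data frame uses the same Z, A, Y, id column layout as the CalAPO input built in this vignette.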

The CalAPO function takes a data frame with columns Z, A, Y, and id; in the long-term fixed-term contract example, Y is the indicator cdd6m. The estimated average potential outcomes are returned in Y.hat. As stated in the paper, the output also contains the estimated direct effects under the three assignment mechanisms (ADE.est), the estimated marginal direct effect (MDE.est), and the estimated spillover effects (ASE.est), together with the estimated covariance matrices of the average potential outcomes (Y.covariance), the direct effects (ADE.covariance), the spillover effects (ASE.covariance), and the marginal direct effect (MDE.covariance).

```r
# Input data frame: individual treatment (Z), assignment mechanism (A),
# outcome (Y, here cdd6m), and cluster identifier (id)
data_LTFC <- data.frame(Z = jd$assigned, A = jd$pct0, Y = jd$cdd6m, id = jd$anonale)

# Estimate the average potential outcomes and effects, then print them
res.LTFC <- CalAPO(data_LTFC)
print(res.LTFC)
## $Y.hat
##                          Potential Outcome Estimates
## treated group 1 estimate                   0.2109006
## control group 1 estimate                   0.1953872
## treated group 2 estimate                   0.2071030
## control group 2 estimate                   0.2027447
## treated group 3 estimate                   0.2018187
## control group 3 estimate                   0.2243082
##
## $Y.covariance
##               [,1]          [,2]         [,3]         [,4]          [,5]
## [1,]  9.352489e-05 -1.196691e-05 0.000000e+00 0.000000e+00  0.000000e+00
## [2,] -1.196691e-05  1.034387e-04 0.000000e+00 0.000000e+00  0.000000e+00
## [3,]  0.000000e+00  0.000000e+00 1.147296e-04 2.025355e-05  0.000000e+00
## [4,]  0.000000e+00  0.000000e+00 2.025355e-05 7.940618e-05  0.000000e+00
## [5,]  0.000000e+00  0.000000e+00 0.000000e+00 0.000000e+00  9.680927e-05
## [6,]  0.000000e+00  0.000000e+00 0.000000e+00 0.000000e+00 -3.197198e-05
##               [,6]
## [1,]  0.000000e+00
## [2,]  0.000000e+00
## [3,]  0.000000e+00
## [4,]  0.000000e+00
## [5,] -3.197198e-05
## [6,]  2.276049e-04
##
## $ADE.est
##                    Average Direct Effect
## assignment group 1           0.015513434
## assignment group 2           0.004358247
## assignment group 3          -0.022489545
##
## $ADE.covariance
##              [,1]         [,2]         [,3]
## [1,] 0.0002208974 0.0000000000 0.0000000000
## [2,] 0.0000000000 0.0001536287 0.0000000000
## [3,] 0.0000000000 0.0000000000 0.0003883582
##
## $ASE.est
##                                       Average Spillover Effect
## treatment group under assignments 1 2              0.003797605
## treatment group under assignments 2 3              0.005284307
## control group under assignments 1 2               -0.007357582
## control group under assignments 2 3               -0.021563484
##
## $ASE.covariance
##               [,1]          [,2]          [,3]          [,4]
## [1,]  2.082545e-04 -1.147296e-04  8.286640e-06 -2.025355e-05
## [2,] -1.147296e-04  2.115389e-04 -2.025355e-05 -1.171843e-05
## [3,]  8.286640e-06 -2.025355e-05  1.828448e-04 -7.940618e-05
## [4,] -2.025355e-05 -1.171843e-05 -7.940618e-05  3.070111e-04
##
## $MDE.est
##   Marginal Direct Effect
## 1          -0.0008726215
##
## $MDE.covariance
##              [,1]
## [1,] 8.476492e-05
```
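The printed effect estimates are linear contrasts of Y.hat, and their covariance matrices follow from Y.covariance. The base R sketch below reproduces them from the printed numbers. The contrast matrices were inferred from the output above (an assumption about how CalAPO combines the estimates, not taken from its source), and the marginal direct effect uses equal weights of 1/3 across the three assignment mechanisms, which reproduces the printed MDE.

```r
# Average potential outcomes copied from the printed Y.hat, ordered
# (treated 1, control 1, treated 2, control 2, treated 3, control 3)
y_hat <- c(0.2109006, 0.1953872, 0.2071030, 0.2027447, 0.2018187, 0.2243082)

# Block-diagonal covariance copied from the printed Y.covariance
V <- matrix(0, 6, 6)
V[1:2, 1:2] <- matrix(c(9.352489e-05, -1.196691e-05,
                        -1.196691e-05, 1.034387e-04), 2, 2)
V[3:4, 3:4] <- matrix(c(1.147296e-04, 2.025355e-05,
                        2.025355e-05, 7.940618e-05), 2, 2)
V[5:6, 5:6] <- matrix(c(9.680927e-05, -3.197198e-05,
                        -3.197198e-05, 2.276049e-04), 2, 2)

# Direct effects: treated minus control within each assignment mechanism
C_ade <- rbind(c(1, -1, 0,  0, 0,  0),
               c(0,  0, 1, -1, 0,  0),
               c(0,  0, 0,  0, 1, -1))

# Spillover effects: same treatment status, mechanism a minus mechanism a + 1
C_ase <- rbind(c(1, 0, -1,  0,  0,  0),   # treated, mechanisms 1 vs 2
               c(0, 0,  1,  0, -1,  0),   # treated, mechanisms 2 vs 3
               c(0, 1,  0, -1,  0,  0),   # control, mechanisms 1 vs 2
               c(0, 0,  0,  1,  0, -1))   # control, mechanisms 2 vs 3

# Marginal direct effect: ADEs averaged with assumed equal weights
qa <- rep(1/3, 3)
C_mde <- t(qa) %*% C_ade

# For a contrast matrix C: estimate C y_hat, covariance C V C'
contrast <- function(C) list(est = drop(C %*% y_hat), cov = C %*% V %*% t(C))
ade <- contrast(C_ade)
ase <- contrast(C_ase)
mde <- contrast(C_mde)
ade$est
mde$est
```

The zero off-diagonal blocks of Y.covariance are why ADE.covariance comes out diagonal: each direct effect only involves the two estimates from its own assignment mechanism.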

The same analysis can be run for permanent contracts (cdi):

```r
# Same input layout, with the permanent-contract indicator as the outcome
data_perm <- data.frame(Z = jd$assigned, A = jd$pct0, Y = jd$cdi, id = jd$anonale)
CalAPO(data_perm)
```

### Test2SRE

Hypothesis tests can be performed with the Test2SRE function. Test2SRE takes the same data frame as CalAPO (columns Z, A, Y, and id) plus an effect argument specifying the effect to test: "ADE" for the direct effect, "MDE" for the marginal direct effect, or "ASE" for the spillover effect. It returns TRUE if the null hypothesis is rejected and FALSE otherwise. The significance level defaults to 0.05 and can be changed through the alpha argument.

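```r
# Test the marginal direct effect on long-term fixed-term contracts at the 5% level
Test2SRE(data_LTFC, effect = "MDE", alpha = 0.05)
## [1] FALSE
```

The null hypothesis for the marginal direct effect on long-term fixed-term contracts is therefore not rejected at the 5% level.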
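The test procedure inside Test2SRE is not documented here. As a hypothetical illustration only, the base R sketch below applies a normal-approximation (Wald) test to the estimates and variances printed by CalAPO above: the statistic is the quadratic form of the estimate vector in the inverse covariance, compared with a chi-square quantile. The function name wald_reject is made up for this example, and the sketch is not presented as the package's method.

```r
# Hypothetical Wald test: reject if est' V^{-1} est exceeds the chi-square quantile
wald_reject <- function(est, V, alpha = 0.05) {
  est <- as.numeric(est)
  V <- as.matrix(V)
  stat <- drop(t(est) %*% solve(V, est))       # chi-square statistic
  crit <- qchisq(1 - alpha, df = length(est))  # rejection threshold
  list(stat = stat, crit = crit, reject = stat > crit)
}

# MDE estimate and variance copied from the CalAPO output above
mde_test <- wald_reject(-0.0008726215, 8.476492e-05)
mde_test$reject   # FALSE: |z| is about 0.09, well below 1.96

# Joint test of the three direct effects, using the diagonal ADE.covariance
ade_test <- wald_reject(c(0.015513434, 0.004358247, -0.022489545),
                        diag(c(2.208974e-04, 1.536287e-04, 3.883582e-04)))
ade_test$reject
```

For a single effect the statistic is the squared z-score, so the one-dimensional case is the usual two-sided z-test.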

### Calpara and Calsamplesize

Finally, we can compute the sample size required to reach a given power at a given significance level. First, the calpara function computes the parameters needed for the sample size calculation: the within-cluster and between-cluster variances and the intra-class correlation coefficient. The effect size and the assignment mechanism must also be specified from the study design: mu is the effect size and qa is the vector of probabilities of being assigned to each of the three assignment mechanisms.

```r
# calculate variances for permanent contracts
var.perm <- calpara(data_perm)

# calculate variances for long-term fixed-term contracts
var.LTFC <- calpara(data_LTFC)
```
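The estimator used inside calpara is not shown in this vignette. As an illustration of the quantities involved, the base R sketch below computes the classical one-way ANOVA (method-of-moments) estimates of the within-cluster variance, between-cluster variance, and intra-class correlation coefficient. This is a standard technique that may differ from what calpara implements, and the function and field names are hypothetical.

```r
# One-way ANOVA (method-of-moments) variance components for clustered data.
# Illustrative only: names are hypothetical, not calpara's output fields.
variance_components <- function(y, id) {
  id <- factor(id)
  k <- nlevels(id)                                    # number of clusters
  n_i <- tabulate(as.integer(id), nbins = k)          # cluster sizes
  N <- length(y)                                      # number of units
  cl_mean <- as.numeric(tapply(y, id, mean))          # cluster means, level order
  msb <- sum(n_i * (cl_mean - mean(y))^2) / (k - 1)   # between-cluster mean square
  msw <- sum((y - cl_mean[as.integer(id)])^2) / (N - k)  # within-cluster mean square
  n0 <- (N - sum(n_i^2) / N) / (k - 1)                # adjusted size for unequal clusters
  sigma_b2 <- max((msb - msw) / n0, 0)                # truncated at zero
  sigma_w2 <- msw
  list(sigma_w2 = sigma_w2, sigma_b2 = sigma_b2,
       total = sigma_w2 + sigma_b2,
       icc = sigma_b2 / (sigma_w2 + sigma_b2),
       n_avg = mean(n_i))
}

# Toy example: three clusters of three units each
vc <- variance_components(y = 1:9, id = rep(1:3, each = 3))
vc$icc   # 26/29, about 0.897
```

In the toy data, the cluster means (2, 5, 8) differ far more than units within a cluster, so most of the total variance is between clusters and the ICC is close to 1.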

Elements of the calpara output can be accessed with $. For example, the following code retrieves the total variance of the potential outcomes for permanent contracts and long-term fixed-term contracts:

```r
sigma.perm <- var.perm$sigma.tot
sigma.LTFC <- var.LTFC$sigma.tot
print(sigma.perm)
## [1] 0.1951648
```

Next, we specify the effect size and use the Calsamplesize function to calculate the required number of clusters for permanent contracts and long-term fixed-term contracts. The defaults are alpha = 0.05 (significance level) and beta = 0.2 (type II error rate), which corresponds to a power of 1 - beta = 0.8.

```r
# effect size and assignment mechanism
mu <- 0.03
qa <- rep(1/3, 3)

# calculate sample size for permanent contracts
print("Permanent Contract:")
## [1] "Permanent Contract:"
print(Calsamplesize(data_perm, mu, qa, alpha = 0.05, beta = 0.2))
##                          [,1]     [,2]     [,3]
## Assignment Mechanism   1.0000   2.0000   3.0000
## Number of Clusters   515.6595 116.4777 614.2199

# calculate sample size for long-term fixed-term contracts
print("Long Term Fixed Contract:")
## [1] "Long Term Fixed Contract:"
print(Calsamplesize(data_LTFC, mu, qa, alpha = 0.05, beta = 0.2))
##                          [,1]     [,2]     [,3]
## Assignment Mechanism   1.0000  2.00000   3.0000
## Number of Clusters   428.4264 96.59406 511.5405
```

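Each column gives, for one assignment mechanism, the total number of clusters with average size n.avg needed to detect an effect of size mu = 0.03 with power 0.8 at the 0.05 significance level. Since the reported numbers of clusters are not integers, they should be rounded up in practice; for example, 516 clusters under assignment mechanism 1 for permanent contracts.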
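For intuition about how the inputs drive such cluster counts, the sketch below implements the textbook sample size formula for a simpler design: a two-arm cluster-randomized comparison of means, inflated by the design effect 1 + (m - 1) * rho for clusters of size m and intra-class correlation rho. This is not the two-stage formula behind Calsamplesize, so it will not reproduce the numbers above; the function name and all inputs are hypothetical.

```r
# Textbook two-arm cluster-randomized sample size (hypothetical illustration,
# not the two-stage formula used by Calsamplesize)
clusters_needed <- function(sigma2, mu, m, rho, alpha = 0.05, beta = 0.2) {
  z <- qnorm(1 - alpha / 2) + qnorm(1 - beta)  # power = 1 - beta
  n_ind <- 2 * z^2 * sigma2 / mu^2             # individuals per arm, ignoring clustering
  deff <- 1 + (m - 1) * rho                    # design effect for clusters of size m
  per_arm <- n_ind * deff / m                  # clusters per arm, unrounded
  list(per_arm = per_arm, clusters_per_arm = ceiling(per_arm))
}

res <- clusters_needed(sigma2 = 0.25, mu = 0.1, m = 10, rho = 0.05)
res$per_arm            # about 56.9
res$clusters_per_arm   # 57
```

A larger outcome variance, a smaller effect size, or a larger intra-class correlation all increase the number of clusters required.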