Occasionally, I get questions about interpretation of the the `intEff()`

function and more generally about interactions in binary dependent variable models (e.g., logits and probits). I thought it might be useful to make a short vignette to discuss the various issues as there has been some recent research discussing providing useful guidance.

Let’s consider three different scenarios.

- Both variables in the interaction are dummy variables.
- One variable in the interaction is continuous and one is binary.
- Both variables in the interaction are continuous.

Let’s make some data first and then estimate the model.

```
set.seed(1234)
tibble(
df1 <-x1 = as.factor(rbinom(1000, 1, .5)),
x2 = as.factor(rbinom(1000, 1, .4)),
z = rnorm(1000),
ystar = 0 + as.numeric(x1 == "1") - as.numeric(x2 == "1") +
2*as.numeric(x1=="1")*as.numeric(x2=="1") + z,
p = plogis(ystar),
y = rbinom(1000, 1, p)
)
glm(y ~ x1*x2 + z, data=df1, family=binomial) mod1 <-
```

The Norton, Wang and Ai discussion suggests taking the discrete double-difference of the model above with respect to `x1`

and `x2`

. This is just the probability where `x1`

and `x2`

are equal to 1, minus the probability where `x1`

is 1 and `x2`

is 0 minus the probability where `x2`

is 1 and `x1`

is 0 plus the probability where `x1`

and `x2`

are both 0. We could calculate this by hand

```
## make the model matrix for all conditions
X10 <- X01 <- X00 <- model.matrix(mod1)
X11 <-## set the conditions for each of the four different
## scenarios above
## x1 = 1, x2=1
"x11"] <- X11[,"x21"] <- X11[,"x11:x21"] <- 1
X11[,
## x1=1, x2=0
"x11"] <- 1
X10[,"x21"] <- X10[,"x11:x21"] <- 0
X10[,
## x1=0, x2=1
"x21"] <- 1
X01[,"x11"] <- X01[,"x11:x21"] <- 0
X01[,
## x1=0, x2=0
"x11"] <- X00[,"x21"] <- X00[,"x11:x21"] <- 0
X00[,
## calculate the probabilities
plogis(X11 %*% coef(mod1))
p11 <- plogis(X10 %*% coef(mod1))
p10 <- plogis(X01 %*% coef(mod1))
p01 <- plogis(X00 %*% coef(mod1))
p00 <-
p11 - p10 - p01 + p00 eff1 <-
```

This is just what the `intEff()`

function does.

` intEff(mod1, c("x1", "x2"), df1) i1 <-`

The `byob$int`

element of the `i1`

object above gives the interaction effect, particularly the first column. We can just plot that relative to the effect calculated above to see that they’re the same.

```
tibble(e1 = eff1, i1 = i1$byobs$int$int_eff) %>%
ggplot(mapping= aes(x=e1, y=i1)) +
geom_point(pch=1) +
theme_bw() +
labs(x="Calculated by Hand", y="intEff Function Output")
```

So, the `byobs`

list has two elements - the `int`

element holds the interaction effects for each individual observation. The `X`

element holds the original data. These data were used to calculate the interaction effect, except that the variables involved in the interaction effect were changed as we did above. Here, you could plot a histogram of the effects:

```
$byobs$int %>%
i1 ggplot(mapping=aes(x=int_eff)) +
geom_histogram() +
theme_bw() +
labs(x="Interaction Effect")
```

In this case, all of the effects are significant, but you could also break these out by significant and not significant effects:

```
$byobs$int %>%
i1 mutate(sig = ifelse(abs(i1$byobs$int$zstat) > 1.96, 1, 0),
sig = factor(sig, levels=c(0,1), labels=c("No", "Yes"))) %>%
ggplot(mapping=aes(x=int_eff)) +
geom_histogram() +
theme_bw() +
facet_wrap(~sig) +
labs(x="Interaction Effect")
```

I also wrote another function that does this more generally called `secondDiff()`

. This function calculates second differences at user-defined values. The summary function summarises all of the individual second differences like those created above.

```
secondDiff(mod1, c("x1", "x2"), df1)
sd1 <-summary(sd1)
```

```
## Second Difference Using the Average Marginal Effect Approach
##
## Overall:
## Average Second Difference: -0.318, 95% CI: (-0.424,-0.214)
##
## Individual:
## Significant Negative Individual Second Differences: 1000
## Significant Positive Individual Second Differences: 0
## Inignificant Individual Second Differences: 0
```

These results all tell us about the change in probability when `x2`

changes from 0 to 1 under two conditions one when `x1`

is 1 and one when it is 0. For example, the first row of the `byobs$int`

element of the output from `intEff()`

:

`$byobs$int[1,] i1`

```
## int_eff linear phat se_int_eff zstat
## 1 0.3584801 0.2141966 0.1384038 0.06166805 5.813061
```

suggests that for the first observation, as `x2`

changes from 0 to 1, the first difference is .35 higher when `x1`

is 1 than when `x1`

is 0. The `atmean`

element of `i1`

shows what this difference is at the average of all of the covariates:

`$atmean i1`

```
## [[1]]
## int_eff linear phat se_int_eff zstat
## 1 0.3446747 0.4126915 0.6422852 0.05928781 5.813584
##
## $X
## (Intercept) x11 x21 z x11:x21
## [1,] 1 0.518 0.381 0.01450824 0.21
```

With one binary and one continuous variable, the Norton, Wang and Ai model would have us calculate the slope of the line tangent to the logit curve for the continuous variable at both of the values of the categorical variable.

First, let’s make the data and run the model:

```
set.seed(1234)
tibble(
df2 <-x2 = as.factor(rbinom(1000, 1, .5)),
x1 = runif(1000, -2,2),
z = rnorm(1000),
ystar = 0 + as.numeric(x2 == "1") - x1 +
.75*as.numeric(x2=="1")*x1 + z,
p = plogis(ystar),
y = rbinom(1000, 1, p)
)
glm(y ~ x1*x2 + z, data=df2, family=binomial) mod2 <-
```

Norton, Wang and Ai show that the interaction effect is the difference in the first derivatives of the probability with respect to `x1`

when `x2`

changes from 0 to 1. In the following model:

\[\begin{aligned} log\left(\frac{p_i}{1-p_{i}}\right) &= u_i\\ log\left(\frac{p_i}{1-p_{i}}\right) &= b_0 + b_1x_{1i} + b_2x_{2i} + b_3x_{1i}x_{2i} + \mathbf{Z\theta}, \end{aligned}\]

The first derivative of the probability with respect to `x1`

when `x2 is equal to 1 is:

\[\frac{\partial F(u_i)}{\partial x_{1i}} = (b_1 + b_3)f(u_i)\]

where \(F(u_i)\) is the predicted probability for observation \(i\) (i.e., the CDF of the logistic distribution evaluated at \(u_i\)) and \(f(u_{i})\) is the PDF of the logistic distribution distribution evaluated at \(u_i\). We could also calculate this for the condition when `x2`

= 0:

\[\frac{\partial F(u_i)}{\partial x_{1i}} = b_1f(u_i)\]

In both cases, this assumes that the values of \(\mathbf{x}_{i}\) are consistent with the condition. For example in the first partial derivative above, \(x_{2i}\) would have to equal 1 and in the second partial derivative, \(x_{2i}\) would have to equal zero. We could do this by hand just to see how it works:

```
X1 <- model.matrix(mod2)
X0 <-## set the conditions for each of the four different
## scenarios above
## x1 = 1, x2=1
"x21"] <- 1
X1[,"x1:x21"] <- X1[,"x1"]
X1[,
## x1=1, x2=0
"x21"] <- 0
X0[,"x1:x21"] <- 0
X0[,
coef(mod2)
b <-
## print the coefficients to show that the two coefficients
## we want are the second and fifth ones.
b
```

```
## (Intercept) x1 x21 z x1:x21
## 0.3414514 -0.8653259 0.6186529 0.8542471 0.6109292
```

```
## calculate the first effect
(b[2] + b[5])*dlogis(X1 %*% b)
e1 <-
## calculate the second effect
(b[2] )*dlogis(X0 %*% b)
e2 <-
## calculate the probabilities
e1 - e2 eff2 <-
```

Just like before, we can also estimate the same effect with `intEff()`

and show that the two are the same.

```
intEff(mod2, c("x1", "x2"), df2)
i2 <-ggplot(mapping=aes(y = i2$byobs$int[,1], x=eff2)) +
geom_point() +
theme_bw() +
labs(x="Calculated by hand", y= "intEff Output")
```

Looking at the first line of the output of `i2$byobs$int`

,

`$byobs$int[1,] i2`

```
## int_eff linear phat se_int_eff zstat
## 1 0.04013175 0.07137251 0.1350701 0.0233773 1.716697
```

We see that the slope of the line tangent to the logit curve when `x1`

takes on the value of the first observation (1.350536) is 0.04 higher when `x2`

= 1 than when `x2`

= 0. We could visualize this as in the figure below. The solid lines are the logit curves and the dotted lines are the lines tangent to the curves at \(x_1 = 1.350536\). The slope of the blue dotted line is -0.060961 and the slope of the orange dotted line is -0.1010927. You can see that the difference 0.0401317 is the first entry in the `int_eff`

column displayed above.

```
X1[rep(1, 51), ]
tmpX1 <- X0[rep(1, 51), ]
tmpX0 <-"x1"] <- tmpX1[,"x1:x21"] <- c(seq(-2, 2, length=50), 1.350536)
tmpX1[,"x1"] <- c(seq(-2, 2, length=50), 1.350536)
tmpX0[,"x1:x21"] <- 0
tmpX0[,
plogis(tmpX1 %*% b)
phat1 <- plogis(tmpX0 %*% b)
phat0 <- tibble(phat = c(phat0[1:50], phat1[1:50]),
plot.df <-x = rep(seq(-2,2,length=50), 2),
x2 = factor(rep(c(0,1), each=50), levels=c(0,1), labels=c("x2 = 0", "x2 = 1")))
phat1[51] - e1[1]*tmpX1[51, "x1"]
yint1 <- phat0[51] - e2[1]*tmpX0[51, "x1"]
yint0 <-
%>%
plot.df ggplot(aes(x=x, y=phat, colour = x2)) +
geom_line() +
scale_colour_manual(values=c("#0072B2", "#D55E00")) +
geom_abline(slope=e1[1], intercept=yint1, colour="#D55E00", lty=3) +
geom_abline(slope=e2[1], intercept=yint0, colour="#0072B2", lty=3) +
theme_bw() +
labs(x="x1", y="Predicted Probability of y=1", colour="x2")
```

From the `zstat`

entry, we see that the effect is not significant. We can plot the effects by significance.

```
$byobs$int %>%
i2 mutate(sig = ifelse(abs(zstat) > 1.96, 1, 0),
sig = factor(sig, levels=c(0,1), labels=c("No", "Yes"))) %>%
ggplot(mapping=aes(x=int_eff)) +
geom_histogram() +
theme_bw() +
facet_wrap(~sig) +
labs(x="Interaction Effect") +
theme(aspect.ratio=1)
```

Another option here is to do a second difference. Instead of looking at the difference in the slope of the line tangent to the curve for `x1`

, it looks at how the effect of a discrete change in `x1`

differs across two values of `x2`

. Using the minimum and maximum as the two values to make the change for `x1`

is the default. But let’s say that we wanted to see how changing `x1`

from -2 to -1 change the predicted probability of success for `x2=0`

and `x2=1`

. The result here is the first

\[\begin{aligned} & \left\{Pr(y=1|x_1=\text{high}, x_2=\text{high}) - Pr(y=1|x_1=\text{low}, x_2=\text{high})\right\} - \\ & \left\{Pr(y=1|x_1=\text{high}, x_2=\text{low}) - Pr(y=1|x_1=\text{low}, x_2=\text{low})\right\} \end{aligned}\]

```
secondDiff(mod2, c("x1", "x2"), df2, vals=list(x1=c(-2,-1), x2=c("0", "1")))
s2 <-summary(s2)
```

```
## Second Difference Using the Average Marginal Effect Approach
##
## Overall:
## Average Second Difference: 0.079, 95% CI: (0.053,0.110)
##
## Individual:
## Significant Negative Individual Second Differences: 0
## Significant Positive Individual Second Differences: 1000
## Inignificant Individual Second Differences: 0
```

The summary shows that the change in probabilities is bigger when x2 is low than when x2 is high. This corroborates what we saw in the plot above.

With two continuous variables using this model:

\[\begin{aligned} log\left(\frac{p_i}{1-p_{i}}\right) &= u_i\\ log\left(\frac{p_i}{1-p_{i}}\right) &= b_0 + b_1x_{1i} + b_2x_{2i} + b_3x_{1i}x_{2i} + \mathbf{Z\theta}, \end{aligned}\]

Norton, Wang and Ai show that the cross-derivative, rate of change in the the probabilities as a function of `x1`

changes as the rate of change in `x2`

changes.

\[\frac{\partial^2 F(u_{i})}{\partial x_{1i} \partial x_{2i}} = b_3f(u_i) + (b1 + b_3x_{2i})(b_2+b_3x_{1i})f'(u_i)\] where \(f'(u_i) = f(u_i)\times (1-2F(u_{i}))\). We could calculate this “by hand” as:

```
set.seed(1234)
tibble(
df3 <-x2 = runif(1000, -2,2),
x1 = runif(1000, -2,2),
z = rnorm(1000),
ystar = 0 + as.numeric(x2 == "1") - x1 +
.75*as.numeric(x2=="1")*x1 + z,
p = plogis(ystar),
y = rbinom(1000, 1, p)
)
glm(y ~ x1*x2 + z, data=df3, family=binomial) mod3 <-
```

```
model.matrix(mod3)
X <- coef(mod3)
b <- b[5]*dlogis(X%*% b) + (b[2] + b[5]*X[,"x2"])*(b[3] + b[5]*X[,"x1"])*dlogis(X%*%b)*(1-(2*plogis(X %*% b))) e3 <-
```

` intEff(mod3, c("x1", "x2"), data=df3) i3 <-`

We can content ourselves that these are the same:

`1] e3[`

`## [1] 0.002296147`

`$byobs$int[1,] i3`

```
## int_eff linear phat se_int_eff zstat
## 1 0.002296147 -0.01145364 0.1248453 0.005366089 0.4278994
```

So, the effects are the cross-derivative with respect to both `x1`

and `x2`

. I don’t find this to be a particularly intuitive metric, though we can consider the significance of these values and look at their distribution. I find the second difference metric to be a bit more intuitive because it still gives the difference in change in predicted probabilities for a change in another variable. For example, we could do:

```
secondDiff(mod3, c("x1", "x2"), data=df3, vals =list(x1=c(-1,0), x2=c(-2,2)))
s3 <-summary(s3)
```

```
## Second Difference Using the Average Marginal Effect Approach
##
## Overall:
## Average Second Difference: -0.083, 95% CI: (-0.170,0.003)
##
## Individual:
## Significant Negative Individual Second Differences: 148
## Significant Positive Individual Second Differences: 0
## Inignificant Individual Second Differences: 852
```

This shows us what happens when we change `x1`

from -1 to 0 for the two conditions - when `x2`

is at its minmum versus its maximum. We see that on average the second difference is around -.08 and about 20% of the differences are statistically significant. The average second difference is not, itself, significant. The -0.084 number means the same thing here as it did above. It’s the difference in the first difference of `x1`

when `x2`

is high and when `x2`

is low.