Estimating Models with Interactions

Interactions with continuous variables

To add an interaction between covariates in your model, include an additional term in the pars vector of the logitr() function that joins the two variable names with the * symbol. For example, suppose we want to interact price with feat in the following model:

library("logitr")

model <- logitr(
  data    = yogurt,
  outcome = 'choice',
  obsID   = 'obsID',
  pars    = c('price', 'feat', 'brand')
)

To do so, I could add "price*feat" to the pars vector:

model_price_feat <- logitr(
  data    = yogurt,
  outcome = 'choice',
  obsID   = 'obsID',
  pars    = c('price', 'feat', 'brand', 'price*feat')
)

The model now has an estimated coefficient for the price*feat effect:

summary(model_price_feat)
#> =================================================
#> Call:
#> logitr(data = yogurt, outcome = "choice", obsID = "obsID", pars = c("price", 
#>     "feat", "brand", "price*feat"))
#> 
#> Frequencies of alternatives:
#>        1        2        3        4 
#> 0.402156 0.029436 0.229270 0.339138 
#> 
#> Exit Status: 3, Optimization stopped because ftol_rel or ftol_abs was reached.
#>                                 
#> Model Type:    Multinomial Logit
#> Model Space:          Preference
#> Model Run:                1 of 1
#> Iterations:                   19
#> Elapsed Time:        0h:0m:0.03s
#> Algorithm:        NLOPT_LD_LBFGS
#> Weights Used?:             FALSE
#> Robust?                    FALSE
#> 
#> Model Coefficients: 
#>               Estimate Std. Error  z-value  Pr(>|z|)    
#> price        -0.356909   0.024696 -14.4522 < 2.2e-16 ***
#> feat          1.155206   0.378237   3.0542  0.002257 ** 
#> brandhiland  -3.724702   0.146520 -25.4212 < 2.2e-16 ***
#> brandweight  -0.640221   0.054543 -11.7380 < 2.2e-16 ***
#> brandyoplait  0.724315   0.080317   9.0182 < 2.2e-16 ***
#> price:feat   -0.086381   0.047275  -1.8272  0.067672 .  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>                                      
#> Log-Likelihood:         -2655.5403770
#> Null Log-Likelihood:    -3343.7419990
#> AIC:                     5323.0807540
#> BIC:                     5357.8100000
#> McFadden R2:                0.2058178
#> Adj McFadden R2:            0.2040234
#> Number of Observations:  2412.0000000
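
As a rough guide to reading these estimates, the interaction coefficient shifts the price coefficient whenever feat is non-zero. The sketch below simply copies the price and price:feat estimates from the summary above and adds them up; the helper price_effect() is a hypothetical name used only for illustration and is not part of logitr:

```r
# Estimates copied from the summary output above
b_price      <- -0.356909
b_price_feat <- -0.086381

# Marginal utility of price at a given value of feat
# (hypothetical helper, not a logitr function)
price_effect <- function(feat) {
  b_price + b_price_feat * feat
}

price_effect(0)  # -0.356909 (product not featured)
price_effect(1)  # -0.44329  (product featured)
```

Under this model, price sensitivity is somewhat stronger when the product is featured, although the interaction is only significant at the 0.1 level.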

Interactions with discrete variables

In the above example, both price and feat were modeled as continuous variables, so only a single interaction coefficient was needed.

When interacting with a discrete variable, multiple interaction coefficients will be estimated according to the number of levels in the discrete attribute. For example, the interaction of price with brand will require three new interaction coefficients, one for each level of the brand variable except the first (reference) level:

model_price_brand <- logitr(
  data    = yogurt,
  outcome = 'choice',
  obsID   = 'obsID',
  pars    = c('price', 'feat', 'brand', 'price*brand')
)

The model now has three estimated coefficients for the price*brand effect:

summary(model_price_brand)
#> =================================================
#> Call:
#> logitr(data = yogurt, outcome = "choice", obsID = "obsID", pars = c("price", 
#>     "feat", "brand", "price*brand"))
#> 
#> Frequencies of alternatives:
#>        1        2        3        4 
#> 0.402156 0.029436 0.229270 0.339138 
#> 
#> Exit Status: 3, Optimization stopped because ftol_rel or ftol_abs was reached.
#>                                 
#> Model Type:    Multinomial Logit
#> Model Space:          Preference
#> Model Run:                1 of 1
#> Iterations:                   37
#> Elapsed Time:        0h:0m:0.06s
#> Algorithm:        NLOPT_LD_LBFGS
#> Weights Used?:             FALSE
#> Robust?                    FALSE
#> 
#> Model Coefficients: 
#>                     Estimate Std. Error z-value  Pr(>|z|)    
#> price              -0.389503   0.045247 -8.6085 < 2.2e-16 ***
#> feat                0.421899   0.122578  3.4419 0.0005777 ***
#> brandhiland        -1.691789   0.623149 -2.7149 0.0066295 ** 
#> brandweight        -2.228475   0.561641 -3.9678 7.254e-05 ***
#> brandyoplait        0.438702   0.450424  0.9740 0.3300688    
#> price:brandhiland  -0.434271   0.115531 -3.7589 0.0001707 ***
#> price:brandweight   0.199919   0.069830  2.8629 0.0041973 ** 
#> price:brandyoplait  0.033233   0.050372  0.6598 0.5094089    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>                                      
#> Log-Likelihood:         -2643.2046775
#> Null Log-Likelihood:    -3343.7419990
#> AIC:                     5302.4093549
#> BIC:                     5348.7150000
#> McFadden R2:                0.2095070
#> Adj McFadden R2:            0.2071145
#> Number of Observations:  2412.0000000
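
One way to read the three interaction coefficients is as shifts in the price coefficient relative to the reference brand. The sketch below copies the estimates from the summary above and computes a price slope per brand; the label "reference" stands in for the omitted first level of brand, and the object names are illustrative only:

```r
# Estimates copied from the summary output above
b_price <- -0.389503

# Interaction shifts relative to the reference brand (shift of 0)
b_interactions <- c(
  reference =  0,
  hiland    = -0.434271,
  weight    =  0.199919,
  yoplait   =  0.033233
)

# Brand-specific price slopes: base price coefficient plus the shift
price_slopes <- b_price + b_interactions
round(price_slopes, 6)
# reference: -0.389503
# hiland:    -0.823774
# weight:    -0.189584
# yoplait:   -0.356270
```

Read this way, the estimates suggest that price sensitivity is strongest for hiland and weakest for weight, while the yoplait slope is not statistically distinguishable from the reference brand's.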

Interactions with individual-specific variables

If you want to include interactions with individual-specific variables (for example, to assess the difference in an effect between groups of respondents), you should not include the individual-specific variable interactions using * in pars. This is because interactions inside pars automatically generate the interaction coefficient as well as coefficients for each covariate.
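
The expansion described above mirrors how the * operator behaves in base R formulas, where a*b is shorthand for a + b + a:b. The sketch below uses only base R to list the terms; whether logitr relies on the same machinery internally is an assumption on my part, and price and group are used purely as symbols here:

```r
# Base R only: expand a formula-style interaction into its individual terms.
# No data are needed because terms() works on the symbols alone.
expanded <- attr(terms(~ price * group), "term.labels")
expanded
# "price" "group" "price:group"

# The terms one would actually want for an individual-specific variable:
# everything except the main effect of group
wanted <- setdiff(expanded, "group")
wanted
# "price" "price:group"
```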

For example, if you had a group variable that determined whether individuals belong to group A or group B, including price*group in pars would create coefficients for price, groupA, and price:groupA, but the groupA coefficient would be unidentified: group membership is constant across the alternatives within each choice observation, so its main effect adds the same amount to every alternative's utility and cancels out of the choice probabilities. In this case, you should only include price and price:groupA in the model. For now, the only way to handle this situation is to manually create dummy-coded interaction variables to include in the model.
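
The identification problem can be checked numerically with a small base R sketch. The utilities and the group coefficient below are made-up values, and logit_probs() is a hypothetical helper rather than a logitr function; the point is only that adding the same constant to every alternative leaves the logit probabilities unchanged:

```r
# Multinomial logit choice probabilities for one choice observation
# (hypothetical helper, not part of logitr)
logit_probs <- function(v) {
  exp(v) / sum(exp(v))
}

# Made-up utilities for four alternatives in a single choice observation
v <- c(-1.2, 0.3, 0.8, -0.5)

# A main effect for groupA is the same for every alternative,
# because group membership does not vary within an observation
beta_groupA <- 2.5  # arbitrary value for illustration
groupA      <- 1

p_without <- logit_probs(v)
p_with    <- logit_probs(v + beta_groupA * groupA)

# The probabilities are identical, so the data carry no information
# about beta_groupA
all.equal(p_without, p_with)  # TRUE
```

An interaction such as price:groupA does not cancel in this way, because price differs across alternatives.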

To illustrate how one might do this, suppose the yogurt data frame had two groups of individuals: A and B. For simple illustration, I’ll define these groups arbitrarily based on whether the obsID is even or odd:

# Create a dummy variable for group A (1 if obsID is even, 0 otherwise)
yogurt$groupA <- ifelse(yogurt$obsID %% 2 == 0, 1, 0)

An interaction between the group variable and price can be included in the model by first manually creating a price_groupA interaction variable and then including it in pars:

# Create the interaction variable between price and the group A dummy
yogurt$price_groupA <- yogurt$price*yogurt$groupA

model_price_group <- logitr(
  data    = yogurt,
  outcome = 'choice',
  obsID   = 'obsID',
  pars    = c('price', 'feat', 'brand', 'price_groupA')
)

The model now has attribute coefficients for price, feat, and brand as well as an interaction between the group and price:

summary(model_price_group)
#> =================================================
#> Call:
#> logitr(data = yogurt, outcome = "choice", obsID = "obsID", pars = c("price", 
#>     "feat", "brand", "price_groupA"))
#> 
#> Frequencies of alternatives:
#>        1        2        3        4 
#> 0.402156 0.029436 0.229270 0.339138 
#> 
#> Exit Status: 3, Optimization stopped because ftol_rel or ftol_abs was reached.
#>                                 
#> Model Type:    Multinomial Logit
#> Model Space:          Preference
#> Model Run:                1 of 1
#> Iterations:                   26
#> Elapsed Time:        0h:0m:0.03s
#> Algorithm:        NLOPT_LD_LBFGS
#> Weights Used?:             FALSE
#> Robust?                    FALSE
#> 
#> Model Coefficients: 
#>                Estimate Std. Error  z-value  Pr(>|z|)    
#> price        -0.3680634  0.0273911 -13.4373 < 2.2e-16 ***
#> feat          0.4915271  0.1200725   4.0936 4.248e-05 ***
#> brandhiland  -3.7155231  0.1454216 -25.5500 < 2.2e-16 ***
#> brandweight  -0.6411384  0.0544999 -11.7640 < 2.2e-16 ***
#> brandyoplait  0.7345568  0.0806444   9.1086 < 2.2e-16 ***
#> price_groupA  0.0030007  0.0254484   0.1179    0.9061    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>                                      
#> Log-Likelihood:         -2656.8808982
#> Null Log-Likelihood:    -3343.7419990
#> AIC:                     5325.7617965
#> BIC:                     5360.4911000
#> McFadden R2:                0.2054169
#> Adj McFadden R2:            0.2036225
#> Number of Observations:  2412.0000000

Interactions in mixed logit models

Suppose I want to include an interaction between two variables and I also want one of those variables to be modeled as normally distributed across the population. The example below illustrates this case, where a price*feat interaction is specified and the feat parameter is modeled as normally distributed by setting randPars = c(feat = "n"):

model_price_feat_mxl <- logitr(
  data    = yogurt,
  outcome = 'choice',
  obsID   = 'obsID',
  pars    = c('price', 'feat', 'brand', 'price*feat'),
  randPars = c(feat = "n")
)

In this case, the price*feat interaction parameter is interpreted as an interaction between the mean of the feat parameter (feat_mu) and price; that is, it describes how the mean feat effect changes with price:

summary(model_price_feat_mxl)
#> =================================================
#> Call:
#> logitr(data = yogurt, outcome = "choice", obsID = "obsID", pars = c("price", 
#>     "feat", "brand", "price*feat"), randPars = c(feat = "n"))
#> 
#> Frequencies of alternatives:
#>        1        2        3        4 
#> 0.402156 0.029436 0.229270 0.339138 
#> 
#> Exit Status: 3, Optimization stopped because ftol_rel or ftol_abs was reached.
#>                              
#> Model Type:       Mixed Logit
#> Model Space:       Preference
#> Model Run:             1 of 1
#> Iterations:                32
#> Elapsed Time:        0h:0m:1s
#> Algorithm:     NLOPT_LD_LBFGS
#> Weights Used?:          FALSE
#> Robust?                 FALSE
#> 
#> Model Coefficients: 
#>               Estimate Std. Error  z-value  Pr(>|z|)    
#> price        -0.388138   0.027026 -14.3615 < 2.2e-16 ***
#> feat_mu       0.829059   0.552235   1.5013    0.1333    
#> brandhiland  -3.991584   0.165893 -24.0612 < 2.2e-16 ***
#> brandweight  -0.662161   0.055780 -11.8709 < 2.2e-16 ***
#> brandyoplait  0.787758   0.086233   9.1352 < 2.2e-16 ***
#> price:feat   -0.076752   0.071063  -1.0800    0.2801    
#> feat_sigma    2.341209   0.493326   4.7458 2.077e-06 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>                                      
#> Log-Likelihood:         -2645.2196926
#> Null Log-Likelihood:    -3343.7419990
#> AIC:                     5304.4393851
#> BIC:                     5344.9569000
#> McFadden R2:                0.2089044
#> Adj McFadden R2:            0.2068109
#> Number of Observations:  2412.0000000
#> 
#> Summary of 10k Draws for Random Coefficients: 
#>        Min.    1st Qu.    Median      Mean  3rd Qu.     Max.
#> 1 -8.270551 -0.7516536 0.8272703 0.8257815 2.405655 9.283538
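
To make this interpretation concrete, the sketch below copies the feat_mu, feat_sigma, and price:feat estimates from the summary above and evaluates the mean feat effect at a given price. The price level of 8 is a hypothetical value chosen only for illustration, and reading the result as a normal distribution with a shifted mean is my assumption about how the pieces combine, not output produced by logitr:

```r
# Estimates copied from the summary output above
feat_mu      <-  0.829059
feat_sigma   <-  2.341209
b_price_feat <- -0.076752

# Hypothetical price level at which to evaluate the feat effect
price_level <- 8

# Mean of the feat effect at this price: the interaction shifts feat_mu
mean_feat_effect <- feat_mu + b_price_feat * price_level
mean_feat_effect  # 0.215043

# Assumed share of the population with a positive feat effect at this
# price, treating the effect as normal with the shifted mean and a
# standard deviation of |feat_sigma|
share_positive <- pnorm(mean_feat_effect / abs(feat_sigma))
share_positive
```

Neither feat_mu nor price:feat is statistically significant in this model, so the shifted mean should be read as illustrative rather than as a firm conclusion.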