Logit models relaxing the iid hypothesis

In the previous section, we assumed that the error terms were iid (identically and independently distributed), i.e., uncorrelated and homoscedastic. Extensions of the basic multinomial logit model have been proposed by relaxing one of these two hypothesis while maintaining the hypothesis of a Gumbel distribution.

The heteroskedastic logit model

The heteroskedastic logit model was proposed by Bhat (1995). The probability that \(U_l>U_j\) is:

\[ P(\epsilon_j<V_l-V_j+\epsilon_l)=e^{-e^{-\frac{(V_l-V_j+\epsilon_l)}{\theta_j}}}, \]

which implies the following conditional and unconditional probabilities

\[\begin{equation*} (P_l \mid \epsilon_l) =\prod_{j\neq l}e^{-e^{-\frac{(V_l-V_j+\epsilon_l)}{\theta_j}}}, \end{equation*}\]

\[\begin{equation*} \begin{array}{rcl} P_l&=&\displaystyle\int_{-\infty}^{+\infty} \prod_{j\neq l} \left(e^{-e^{-\frac{(V_l-V_j+t)}{\theta_j}}}\right)\frac{1}{\theta_l}e^{-\frac{t}{\theta_l}}e^{-e^{-\frac{t}{\theta_l}}} dt\\ &=& \displaystyle \int_{0}^{+\infty}\left(e^{-\sum_{j \neq l}e^{-\frac{V_l-V_j-\theta_l \ln t}{\theta_j}}}\right)e^{-t}dt. \end{array} \end{equation*}\]

There is no closed form for this integral, but it can be efficiently computed using a Gauss quadrature method, and more precisely the Gauss-Laguerre quadrature method.

The nested logit model

The nested logit model was first proposed by (McFadden 1978). It is a generalization of the multinomial logit model that is based on the idea that some alternatives may be joined in several groups (called nests). The error terms may then present some correlation in the same nest, whereas error terms of different nests are still uncorrelated.

Denoting \(m=1... M\) the nests and \(B_m\) the set of alternatives belonging to nest \(m\), the cumulative distribution of the errors is:

\[ \mbox{exp}\left(-\sum_{m=1}^M \left( \sum_{j \in B_m} e^{-\epsilon_j/\lambda_m}\right)^{\lambda_m}\right). \]

The marginal distributions of the \(\epsilon\)s are still univariate extreme value, but there is now some correlation within nests. \(1-\lambda_m\) is a measure of the correlation, i.e., \(\lambda_m = 1\) implies no correlation. In the special case where \(\lambda_m=1\; \forall m\), the errors are iid Gumbel errors and the nested logit model reduce to the multinomial logit model. It can then be shown that the probability of choosing alternative \(j\) that belongs to nest \(l\) is:

\[ P_j = \frac{e^{V_j/\lambda_l}\left(\sum_{k \in B_l} e^{V_k/\lambda_l}\right)^{\lambda_l-1}} {\sum_{m=1}^M\left(\sum_{k \in B_m} e^{V_k/\lambda_m}\right)^{\lambda_m}}, \]

and that this model is a random utility model if all the \(\lambda\) parameters are in the \(0-1\) interval.1

Let us now write the deterministic part of the utility of alternative \(j\) as the sum of two terms: the first one (\(Z_j\)) being specific to the alternative and the second one (\(W_l\)) to the nest it belongs to:


We can then rewrite the probabilities as follow:

\[ \begin{array}{rcl} P_j&=&\frac{e^{(Z_j+W_l)/\lambda_l}}{\sum_{k \in B_l} e^{(Z_k+W_l)/\lambda_l}}\times \frac{\left(\sum_{k \in B_l} e^{(Z_k+W_l)/\lambda_l}\right)^{\lambda_l}} {\sum_{m=1}^M\left(\sum_{k \in B_m} e^{(Z_k+W_m)/\lambda_m}\right)^{\lambda_m}}\\ &=&\frac{e^{Z_j/\lambda_l}}{\sum_{k \in B_l} e^{Z_k/\lambda_l}}\times \frac{\left(e^{W_l/\lambda_l}\sum_{k \in B_l} e^{Z_k/\lambda_l}\right)^{\lambda_l}} {\sum_{m=1}^M\left(e^{W_m/\lambda_m}\sum_{k \in B_m} e^{Z_k/\lambda_m}\right)^{\lambda_m}}. \end{array} \]

Then denote \(I_l=\ln \sum_{k \in B_l} e^{Z_k/\lambda_l}\) which is often called the log-sum, the inclusive value or the inclusive utility.2 We then can write the probability of choosing alternative \(j\) as:

\[ P_j=\frac{e^{Z_j/\lambda_l}}{\sum_{k \in B_l} e^{Z_k/\lambda_l}}\times \frac{e^{W_l+\lambda_l I_l}}{\sum_{m=1}^Me^{W_m+\lambda_m I_m}}. \]

The first term \(\mbox{P}_{j\mid l}\) is the conditional probability of choosing alternative \(j\) if nest \(l\) is chosen. It is often referred as the lower model. The second term \(\mbox{P}_l\) is the marginal probability of choosing nest \(l\) and is referred as the upper model. \(W_l+\lambda_l I_l\) can be interpreted as the expected utility of choosing the best alternative in \(l\), \(W_l\) being the expected utility of choosing an alternative in this nest (whatever this alternative is) and \(\lambda_l I_l\) being the expected extra utility gained by being able to choose the best alternative in the nest. The inclusive values link the two models. It is then straightforward to show that IIA applies within nests, but not for two alternatives in different nests.

A consistent but inefficient way of estimating the nested logit model is to estimate separately its two components. The coefficients of the lower model are first estimated, which enables the computation of the inclusive values \(I_l\). The coefficients of the upper model are then estimated, using \(I_l\) as covariates. Maximizing directly the likelihood function of the nested model leads to a more efficient estimator.



Bhat (1995) estimated the heteroscedastic logit model on the ModeCanada data set. Using mlogit, the heteroscedastic logit model is obtained by setting the heterosc argument to TRUE:

##           Estimate Std. Error   z-value     Pr(>|z|)
## sp.train 1.2371829  0.1104610 11.200182 0.000000e+00
## sp.air   0.5403239  0.1118353  4.831425 1.355592e-06

The variance of the error terms of train and air are respectively higher and lower than the variance of the error term of car (set to 1). Note that the z-values and p-values of the output are not particularly meaningful, as the hypothesis that the coefficient is zero (and not one) is tested. The homoscedasticity hypothesis can be tested using any of the three tests. A particular convenient syntax is provided in this case. For the likelihood ratio and the wald test, one can use only the fitted heteroscedastic model as argument. In this case, it is guessed that the hypothesis that the user wants to test is the homoscedasticity hypothesis.

or, more simply:

The Wald test can also be computed using the linearHypothesis function from the car package:

For the score test, we provide the constrained model as argument, which is the standard multinomial logit model and the supplementary argument which defines the unconstrained model, which is in this case heterosc = TRUE.

##           wald     lh score    lr
## stat    25.196 25.196 9.488 6.888
## p-value  0.000  0.000 0.009 0.032

The homoscedasticity hypothesis is strongly rejected using the Wald test, but only at the 1 and 5% level for, respectively, the score and the likelihood ratio tests.


Head and Mayer (2004) analyzed the choice of one of the 57 European regions belonging to 9 countries by Japenese firms to implement a new production unit.

Note that we’ve used an extra argument to mlogit.data called group.var which indicates the grouping variable, which will be used later to define easily the nests. There are two sets of covariates: the wage rate wage, the unemployment rate unemp, a dummy indicating that the region is eligible to European funds elig and the area area are observed at the regional level and are therefore relevant for the estimation of the lower model, whereas the social cotisation rate scrate and the corporate tax rate ctaxrate are observed at the country level and are therefore suitable for the upper model.

We first estimate a multinomial logit model:

Note that, as the covariates are only alternative specific, the intercepts are not identified and therefore have been removed. We next estimate the lower model, which analyses the choice of a region within a given country. Therefore, for each choice situation, we estimate the choice of a region on the subset of regions of the country which has been chosen. Moreover, observations concerning Portugal and Ireland are removed as these two countries are mono-region.

We next use the fitted lower model in order to compute the inclusive value, at the country level:

\[ \mbox{iv}_{ig} = \ln \sum_{j \in B_g} e^{\beta^\top x_{ij}}, \]

where \(B_g\) is the set of regions for country \(g\). When a grouping variable is provided in the mlogit.data function, inclusive values are by default computed for every group \(g\) (global inclusive values are obtained by setting the type argument to "global"). By default, output is set to "chid" and the results is a vector (if type = "global") or a matrix (if type = "region") with row number equal to the number of choice situations. If output is set to "obs", a vector of length equal to the number of lines of the data in long format is returned. The following code indicates different ways to use the logsum function:

##         BE       DE       ES       FR       IE       IT       NL
## 3 3.595818 5.415838 3.593702 5.153709 1.933707 5.051387 4.077845
## 4 4.113243 5.765190 4.445012 5.383095 1.960462 5.687569 4.490379
##         PT       UK
## 3 2.702028 4.900622
## 4 3.200124 5.378561
##        3        4        5        7        8        9 
## 6.736116 7.182139 7.121855 7.084245 7.133368 7.133368
## [1] 3.595818 3.595818 3.595818 3.595818 5.415838 5.415838
## [1] 6.736116 6.736116 6.736116 6.736116 6.736116 6.736116

To add the inclusive values in the original data.frame, we use output = "obs" and the type argument can be omitted as its default value is "group":

We next select the relevant variables for the estimation of the upper model, select unique lines in order to keep only one observation for every choice situation / country combination and finally we coerce the response (choice.c) to a logical for the chosen country.

Finally, we estimate the upper model, using the previously computed inclusive value as a covariate.

If one wants to obtain different iv coefficients for different countries, the iv covariate should be introduced in the 3th part of the formula and the coefficients for the two mono-region countries (Ireland and Portugal) should be set to 1, using the constPar argument.

We next estimate the full-information maximum likelihood nested model. It is obtained by adding a nests argument to the mlogit function. This should be a named list of alternatives (here regions), the names being the nests (here the countries). More simply, if a group variable has been indicated while using mlogit.data, nests can be a boolean.

Two flavors of nested models can be estimated, using the un.nest.el argument which is a boolean. If TRUE, one imposes that the coefficient associated with the inclusive utility is the same for every nest, which means that the degree of correlation inside each nest is the same. If FALSE, a different coefficient is estimated for every nest.

The results of the fitted models are presented in the following table:

<!DOCTYPE HTML PUBLIC “-//W3C//DTD HTML 4.01 Transitional//EN” “http://www.w3.org/TR/html4/loose.dtd”>
Choice by Japanese firms of a european region.
Mult. logit Lower model Upper model Upper model Nested logit Nested logit
log(wage) 0.47 1.21* 0.46 0.77**
(0.25) (0.48) (0.25) (0.25)
unemp -8.90*** -9.77*** -7.62*** -6.95**
(1.69) (2.39) (1.60) (2.28)
elig -0.25 -0.89** -0.34 -0.18
(0.21) (0.33) (0.20) (0.18)
log(area) 0.31*** 0.29*** 0.29*** 0.15**
(0.05) (0.06) (0.05) (0.05)
scrate -2.26*** -2.49*** 0.15 -2.44*** -0.88
(0.38) (0.38) (1.72) (0.38) (0.86)
ctaxrate -4.82*** -3.69*** -1.78 -4.13*** -2.35
(0.59) (0.60) (1.32) (0.66) (1.25)
iv 0.66*** 0.85***
(0.06) (0.08)
iv:BE 0.72*** 0.48*
(0.08) (0.19)
iv:DE 0.72*** 0.52**
(0.08) (0.17)
iv:ES 0.86*** 0.77***
(0.05) (0.20)
iv:FR 0.75*** 0.67***
(0.04) (0.09)
iv:IT 0.62*** 0.25**
(0.05) (0.08)
iv:NL 0.69*** 0.18
(0.06) (0.10)
iv:UK 0.87*** 0.86***
(0.06) (0.10)
AIC 3469.13 1738.04 1746.58 1713.86 3467.96 3437.01
Log Likelihood -1728.57 -865.02 -870.29 -845.93 -1726.98 -1703.50
Num. obs. 452 421 452 452 452 452
p < 0.001, p < 0.01, p < 0.05

For the nested logit models, two tests are of particular interest:

For the test of no nests, the nested model is provided as the unique argument for the lrtest and the waldtest function. For the scoretest, the constrained model (i.e., the multinomial logit model) is provided as the first argument and the second argument is nests, which describes the nesting structure that one wants to test.

The Wald test can also be performed using the linearHypothesis function:

##            wald      lh score     lr
## stat    208.407 208.407 60.28 50.122
## p-value   0.000   0.000  0.00  0.000

The three tests reject the null hypothesis of no correlation. We next test the hypothesis that all the nest elasticities are equal.

##           wald     lh score     lr
## stat    73.535 73.535 60.28 46.954
## p-value  0.000  0.000  0.00  0.000

Once again, the three tests strongly reject the hypothesis.


Bhat, Chandra R. 1995. “A Heteroscedastic Extreme Value Model of Intercity Travel Mode Choice.” Transportation Research Part B: Methodological 29 (6): 471–83. http://www.sciencedirect.com/science/article/pii/0191261595000156.

Daly, A. 1987. “Estimating ‘Tree’ Logit Models.” Transportation Research B, 251–67.

Head, Keith, and Thierry Mayer. 2004. “Market Potential and the Location of Japanese Investment in the European Union.” The Review of Economics and Statistics 86 (4): 959–72. https://doi.org/10.1162/0034653043125257.

Heiss, Florian. 2002. “Structural Choice Analysis with Nested Logit Models.” The Stata Journal 2 (3): 227–52.

Hensher, David A., and William H. Greene. 2002. “Specification and Estimation of the Nested Logit Model: Alternative Normalisations.” Transportation Research Part B 36: 1–17.

Koppelman, Franck S., and Chieh-Hua Wen. 1998. “Alternative Nested Logit Models: Structure, Properties and Estimation.” Transportation Research B 32 (5): 289–98.

McFadden, D. 1978. “Spatial Interaction Theory and Planning Models.” In Modeling the Choice of Residential Location, edited by A. Karlqvist, 75–96. North-Holland, Amsterdam.

  1. A slightly different version of the nested logit model (Daly 1987) is often used, but is not compatible with the random utility maximization hypothesis. Its difference with the previous expression is that the deterministic parts of the utility for each alternative is not divided by the nest elasticity. The differences between the two versions have been discussed in Koppelman and Wen (1998), Heiss (2002) and Hensher and Greene (2002).

  2. We’ve already encountered this expression in Section~@ref(sec:conssurplus).