Week 6: Discrete Choice Modelling

Multinomial & Conditional Logit

Jesper Lindmarker

How difficult was last week?


Week 5 recap: LPM and Logistic regression πŸ”™

  • Binary outcomes shift us from modelling levels β†’ modelling probabilities
  • Probabilities of a binary outcome follow an S-curve
  • LPM: simple, intuitive, but can give impossible predictions and wrong functional form
  • Logistic regression: transforms probabilities β†’ odds β†’ log-odds so we can fit a straight line
  • Log-odds coefficients: additive effects; odds ratios: multiplicative effects
  • Marginal effects vary across X because the S-curve is nonlinear
  • AMEs summarize these varying effects on the probability scale
  • Model goodness of fit uses likelihood, pseudo-RΒ², calibration, and classification (not residuals)

✨ Big idea: binary Y requires probability models, and every modelling choice brings trade-offs.

Today’s topic

MY RESEARCH ON PARTNER CHOICE! πŸ§‘β€β€οΈβ€πŸ’‹β€πŸ§‘

  • Maybe later. If we have time at the end.
  • But first, a conceptual question:
    • Is partner-choice a choice?

πŸš€ Our Journey So Far

\(y = f(x)\)

  • OLS: continuous \(y\) (e.g., income, GPA)
  • Logistic regression: binary \(y\) (yes/no, 1/0, pass/fail)
  • Each step: adapt regression to the form of the \(y\)

πŸ‘‰ Today: keep expanding the y side of the equation. What do we do if there are more than two choices?

Statistical modelling of choices πŸ‘‰πŸ‘œ

What do we mean by choices? πŸ€”

  1. A choice = a situation where someone selects one or several options from a choice set.

Choices Are Everywhere (when you start to look for them) 🌍

  • In small, daily actions:
- Coffee / Tea / Energy drink β˜•πŸ«–βš‘  
- Bike / Bus / Walk / Car  
- Netflix / HBO / Disney+ 🎬  
- Cook / Takeout / Restaurant 🍝  
  • And in big life decisions:
- Which person to partner with ❀️  
- Which career path to take πŸ’Ό  
- Which political party to vote for πŸ—³οΈ  
- Which neighbourhood to live in πŸ™οΈ  
- Which transport modes to use πŸš²πŸšŒπŸš—  
- Field of study πŸŽ“  
- Whether to cohabit or marry  
- Where to migrate ✈️
  • Do you agree that all of these are choices?
  • What do they depend on?

What do we mean by choices? πŸ€”

  1. A choice = a situation where someone selects one or several options from a choice set.

  2. A situation where the process can reasonably be thought of as a choice, i.e., where choice is a good approximation of the behaviour.

What Drives Choices? 🎯

Every choice has influences:

  • Individual traits
  • Context and opportunities
  • Features of the alternatives
  • Social norms and boundaries
  • Trade-offs (time, cost, distance, similarity)

Example: Political Science πŸ‘©β€βš–οΈ

  • Outcome: Which party do you vote for?
  • Predictors:
    • Age, gender, education (socio-demographics)
    • Political interest, ideology (rational/attitudinal)
    • Region, social circle norms (social influence)

More Examples

  • Transport mode: walk, bike, bus, car, train 🚎
    • Distance (rational cost/time)
    • Health attitudes (behavioural)
    • Peer norms (social influence)
  • Job choice: public, private, nonprofit πŸ’°
    • Salary (rational)
    • Parents’ occupation (social reproduction)
    • Altruism (values)
  • Eurovision jury votes 🎢
    • Alternatives: songs
    • Song quality, genre, performance running order (features)
    • Neighbour/diaspora ties (social mechanisms)

The Core Intuition ✨

A choice situation reveals:

  • something about the chooser,
  • something about the options,
  • something about the match between them.
  • And perhaps, something about the underlying process

Why Choices Matter in Social Science πŸ“Š

Choices reflect:

  • Preferences (of the individual)
  • Constraints (due to social structure)
  • Social boundaries (that exist due to prejudice)
  • Opportunities (that are unequally distributed)
  • Identity (socially constructed)

All are relevant measures and constructs in social science.

New models ⚑️

  • We will extend the toolkit with two models:

    • Multinomial logit, Conditional logit
  • General advice:

    • The key to statistical modelling is not memorizing models.
    • It is thinking carefully about the outcome you are studying and what data you have
    • πŸ‘‰ Once you understand that, you can find models to suit your needs.

Terminology Alert

β€œMultinomial logit” can sometimes (in economics) refer to β€œconditional logit”

Multinomial Regression

  • Classic in social science applications
  • Very much an extension of logistic regression, but with more than 2 outcomes
  • Choice situation:
    • Observations = choosers
    • Predictors = characteristics of the chooser
    • Outcome = one category among several (K > 2)

Example:

- Outcome: Which party do you vote for?  
- Predictors:
  - Age, gender, education (socio-demographics)  
  - Political interest, ideology (rational/attitudinal)  
  - Region, social circle norms (social influence)  

Data setup - Multinomial regression

head(as.data.frame(sim_df), n = 15)
   ego_id age unemployed mode
1       1  33          0 walk
2       2  59          0  car
3       3  39          0 walk
4       4  64          0 walk
5       5  67          0 walk
6       6  20          0 walk
7       7  45          0  car
8       8  64          0  car
9       9  47          1 walk
10     10  42          0 walk
11     11  68          0  car
12     12  42          0 walk
13     13  53          1 walk
14     14  48          0 walk
15     15  23          1 walk
  • 15 observations, distributed over 15 rows in the data
  • Each row = chosen alternative for one chooser
  • ego_id = chooser ID
  • mode = alternative (walk, bus, car)
  • Predictors vary for each ego: age, unemployed

All choices must sum to 1 πŸ“Š

  • In every choice situation, the set of alternatives must be exhaustive β€” every possible option must be represented.
  • An individual must choose at least one option
    • (or you include β€œnon-choice” as an option)
  • In other words: The set of probabilities across all options must sum to 1

If we model daily transport choices using only car, bike, and train, we should include a stay-home option if that was an available choice.

From two to more outcomes

  • With two outcomes, logistic regression ensures: \[P(Y=1) + P(Y=0) = 1\]
  • With more than two outcomes, we need a generalization where: \[P(Y=1) + P(Y=2) + \dots + P(Y=K) = 1\]

Probabilities sum to 1
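The generalization is the softmax transformation: each category's exponentiated linear predictor divided by the sum over all categories. A minimal sketch, with hypothetical linear predictors and the reference category fixed at 0:

```r
# Softmax: turn K linear predictors into K probabilities that sum to 1
# (the eta values are hypothetical; walk is the reference, fixed at 0)
eta <- c(walk = 0, bus = -1.2, car = 0.5)
probs <- exp(eta) / sum(exp(eta))
round(probs, 3)
sum(probs)  # 1
```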

Multinomial Regression

  • For each category \(k \neq r\):

\[ \log\!\left(\frac{P(y_i = k)}{P(y_i = r)}\right) = \beta_{0k} + \beta_{1k} X_{i1} + \beta_{2k} X_{i2} + \dots + \beta_{pk} X_{ip} \]

  • Once again we have this linear function! How neat!

  • How does this compare to logistic regression?

  • Each equation is the log-odds of two choice alternatives (repeated once for each category vs the reference)

  • All K βˆ’ 1 equations are estimated jointly

  • \(k \in \{1, \dots, K\}\) indexes outcome categories

  • One reference category \(r\) is chosen, and we set \(\beta_r = \mathbf{0}\) for identification

Example: Choosing mode of transportation

  • Outcomes: Walk \((r)\), Bus \((b)\), Car \((c)\)
  • Reference: Walk, so \(\beta_r = 0\)
  • Predictor: Age

Log-odds equations:

\[ \log\!\left(\frac{P(\text{Bus})}{P(\text{Walk})}\right) = \beta_{0b} + \beta_b\,\text{Age}_i \]

\[ \log\!\left(\frac{P(\text{Car})}{P(\text{Walk})}\right) = \beta_{0c} + \beta_c\,\text{Age}_i \]

  • Let’s interpret \(\beta_b\)
  • \(\beta_{b}\) = change in log-odds of taking the bus vs walking for a 1-unit increase in \(\text{Age}\)

Interpretation

library(nnet) # One possible package for multinomial regression
choice_mode_wide$actual_mode <- relevel(choice_mode_wide$actual_mode, "walk") # Set walk as reference category
m1 <- multinom(actual_mode ~ age + unemployed, data = choice_mode_wide)

Multinomial regression predicting mode of transport

            response   Est.        S.E.
(Intercept) bus        -7.153***   1.056
            car        -5.938***   1.036
age         bus         0.295***   0.042
            car         0.287***   0.041
unemployed  bus        -2.286***   0.519
            car        -3.734***   0.523
Num.Obs.                500
AIC                     716.4
BIC                     741.7

Intercepts

Multinomial regression predicting mode of transport

            response   Est.        S.E.
(Intercept) bus        -7.153***   1.056
            car        -5.938***   1.036
age         bus         0.295***   0.042
            car         0.287***   0.041
unemployed  bus        -2.286***   0.519
            car        -3.734***   0.523
Num.Obs.                500
AIC                     716.4
BIC                     741.7
  • Intercept:
    • Value of linear predictor when all other terms are = \(0\)
  • Car Intercept = -5.94
    • at age = 0 and employed, baseline log-odds are negative, indicating that taking the car is less likely than walking.
  • Bus Intercept = -7.15
    • at age = 0 and employed, log-odds are negative, indicating that taking the bus is less likely than walking.
  • Age = 0 is outside realistic data. Intercepts mainly anchor the equation.

Age effects

Multinomial regression predicting mode of transport

            response   Est.        S.E.
(Intercept) bus        -7.153***   1.056
            car        -5.938***   1.036
age         bus         0.295***   0.042
            car         0.287***   0.041
unemployed  bus        -2.286***   0.519
            car        -3.734***   0.523
Num.Obs.                500
AIC                     716.4
BIC                     741.7
  • Car estimate: 0.287
    • Each additional year of age increases the log-odds of Car vs Walking
    • Odds ratio = \(\exp(\beta)\) = 1.33
      β†’ +33% odds per year of age.
  • Bus estimate: 0.295
    • Each additional year increases the log-odds of Bus vs Walking
    • Odds ratio = \(\exp(\beta)\) = 1.34
      β†’ + 34% odds per year of age.

Unemployment effects

Multinomial regression predicting mode of transport

            response   Est.        S.E.
(Intercept) bus        -7.153***   1.056
            car        -5.938***   1.036
age         bus         0.295***   0.042
            car         0.287***   0.041
unemployed  bus        -2.286***   0.519
            car        -3.734***   0.523
Num.Obs.                500
AIC                     716.4
BIC                     741.7
  • Car vs Walk:
    • Being unemployed decreases the log-odds by 3.73.
    • Odds ratio = \(\exp(\beta)\) = 0.02
      β†’ -98% change in odds.
  • Bus vs Walk:
    • Being unemployed decreases the log-odds by 2.29.
    • Odds ratio = \(\exp(\beta)\) = 0.10
      β†’ -90% change in odds.
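As a check, the odds ratios on this and the previous slide can be recovered directly from the log-odds coefficients reported in the table:

```r
# Odds ratios from the log-odds coefficients in the table above
b <- c(age_bus = 0.295, age_car = 0.287,
       unemp_bus = -2.286, unemp_car = -3.734)
round(exp(b), 2)  # age: 1.34, 1.33; unemployed: 0.10, 0.02
```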

Predicted probabilities

  • You could calculate marginal effects in probability space for one additional year of age.
  • But, same problem as always: how do we represent our results as faithfully as possible: additive log-odds, multiplicative odds, or non-linear probabilities?
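To illustrate what predict() does internally, the predicted probabilities for a hypothetical employed 40-year-old can be computed by hand from the coefficients reported above (walk is the reference, so its linear predictor is 0):

```r
# Hand-computed predicted probabilities from the reported coefficients
# (age = 40 and employed status are hypothetical values for illustration)
age <- 40; unemployed <- 0
eta <- c(walk = 0,
         bus  = -7.153 + 0.295 * age - 2.286 * unemployed,
         car  = -5.938 + 0.287 * age - 3.734 * unemployed)
round(exp(eta) / sum(exp(eta)), 3)  # probabilities across the three modes
```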

Conditional logit 🧨

Transition – From Multinomial to Conditional Logit

  • Multinomial regression
    • Predictors = characteristics of the chooser (ego)
    • Outcome = category chosen
  • But what if predictors vary by alternative?
    • Bus time β‰  Car time β‰  Walk time
    • Ticket cost β‰  Gas cost β‰  Free walking
  • πŸ‘‰ This is where Conditional Logit comes in.

Data Structure: Long format

   ego_id mode chosen time  cost age unemployed actual_mode
1       1 walk      0  140  0.00  33          0         bus
2       1  bus      1   26  2.33  33          0         bus
3       1  car      0   27  3.65  33          0         bus
4       2 walk      0  358  0.00  59          0         bus
5       2  bus      1   63  3.79  59          0         bus
6       2  car      0   74 14.31  59          0         bus
7       3 walk      0  193  0.00  39          0         car
8       3  bus      0   35  2.69  39          0         car
9       3  car      1   29  5.72  39          0         car
10      4 walk      0  399  0.00  64          0         car
11      4  bus      0   71  4.06  64          0         car
12      4  car      1   53 14.71  64          0         car
  • 4 observations, distributed over 12 rows in the data
  • Each row = one alternative for one chooser
  • ego_id = chooser ID
  • mode = alternative (walk, bus, car)
  • chosen = 1 if chosen, 0 otherwise
  • Predictors vary by alternative: time, cost
  • Ego-specific predictors are constant within choice sets: age, unemployed

Conditional Logit Probability/Log-odds/Odds equations

Probability of individual \(i\) choosing alternative \(j\):

\[P_{ij} = \frac{\exp(X_{ij}\beta)}{\sum_{k\in C_i} \exp(X_{ik}\beta)}\]

Log odds:

\[\log\left(\frac{P_{ij}}{P_{ik}}\right) = (X_{ij} - X_{ik})\beta\]

Ratio of choice probabilities between two otherwise identical alternatives in the choice set:

\[\exp(\beta) = \frac{P_{ij}}{P_{ik}} \text{ when } X_{ij} - X_{ik} = 1\]
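A minimal numeric sketch of the probability formula for one choice set; the attribute values and the coefficients here are hypothetical:

```r
# P_ij = exp(X_ij * beta) / sum_k exp(X_ik * beta), within one choice set
beta <- c(time = -0.03, cost = -0.2)          # hypothetical coefficients
X <- rbind(walk = c(time = 140, cost = 0),    # hypothetical attributes
           bus  = c(time = 26,  cost = 2.33),
           car  = c(time = 27,  cost = 3.65))
eta <- as.vector(X %*% beta)
p <- exp(eta) / sum(exp(eta))
round(setNames(p, rownames(X)), 3)  # choice probabilities, summing to 1
```

The log-odds of bus vs car in this set equals \((X_{\text{bus}} - X_{\text{car}})\beta\), matching the second equation above.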

Fit the Conditional Logit

library(survival) #Package for clogit
m2 <- clogit(chosen ~ mode + time + cost + strata(ego_id), data = sim_df2) # `strata()` identifies each ego's choice set
summary(m2)
Call:
coxph(formula = Surv(rep(1, 1500L), chosen) ~ mode + time + cost + 
    strata(ego_id), data = sim_df2, method = "exact")

  n= 1500, number of events= 500 

           coef exp(coef) se(coef)     z Pr(>|z|)    
modebus -1.5811    0.2057   0.2833 -5.58  2.4e-08 ***
modecar -0.1499    0.8608   0.3377 -0.44     0.66    
time    -0.0289    0.9716   0.0032 -9.02  < 2e-16 ***
cost    -0.2130    0.8082   0.0244 -8.73  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

        exp(coef) exp(-coef) lower .95 upper .95
modebus     0.206       4.86     0.118     0.358
modecar     0.861       1.16     0.444     1.669
time        0.972       1.03     0.965     0.978
cost        0.808       1.24     0.770     0.848

Concordance= 0.783  (se = 0.018 )
Likelihood ratio test= 376  on 4 df,   p=<2e-16
Wald test            = 152  on 4 df,   p=<2e-16
Score (logrank) test = 275  on 4 df,   p=<2e-16
  • Coefficients: effect of time, cost and mode on choice probabilities between alternatives
  • No intercept; all coefficients are comparisons within the choice set.

🎯 Interpretation of Coefficients

Log-odds interpretation:

A one-unit increase in attribute \(X\) increases the log-odds of choosing an alternative over an otherwise identical alternative by \(\beta\).

Probability ratio interpretation:

A one-unit increase in attribute \(X\) multiplies the relative probability of choosing an alternative over an otherwise identical alternative by \(\exp(\beta)\).
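For example, with the cost coefficient from the fitted model above:

```r
# Relative-probability ratio per one-unit cost increase
# (coefficient taken from the clogit output above)
beta_cost <- -0.213
exp(beta_cost)  # ~0.81: each extra unit of cost multiplies the relative
                # choice probability by about 0.81 (roughly a 19% reduction)
```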

Interpretation

modelsummary(m2)
(1)
modebus -1.581
(0.283)
modecar -0.150
(0.338)
time -0.029
(0.003)
cost -0.213
(0.024)
Num.Obs. 1500
AIC 730.8
BIC 752.0
RMSE 0.39
modelsummary(m2, exponentiate = TRUE)
(1)
modebus 0.206
(0.058)
modecar 0.861
(0.291)
time 0.972
(0.003)
cost 0.808
(0.020)
Num.Obs. 1500
AIC 730.8
BIC 752.0
RMSE 0.39

Predicted probabilities

   ego_id mode chosen time  cost age unemployed actual_mode  prob
1       1 walk      0  140  0.00  33          0         bus 0.068
2       1  bus      1   26  2.33  33          0         bus 0.229
3       1  car      0   27  3.65  33          0         bus 0.703
4       2 walk      0  358  0.00  59          0         bus 0.002
5       2  bus      1   63  3.79  59          0         bus 0.754
6       2  car      0   74 14.31  59          0         bus 0.244
7       3 walk      0  193  0.00  39          0         car 0.024
8       3  bus      0   35  2.69  39          0         car 0.270
9       3  car      1   29  5.72  39          0         car 0.705
10      4 walk      0  399  0.00  64          0         car 0.001
11      4  bus      0   71  4.06  64          0         car 0.578
12      4  car      1   53 14.71  64          0         car 0.421
  • As with previous models, we can use predict() to calculate predicted probabilities
    • Within the choice set
  • And from this we could calculate error matrices etc.
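A quick sanity check on the output above: within each choice set, the predicted probabilities sum to 1.

```r
# Predicted probabilities for ego 1, copied from the table above
p_ego1 <- c(walk = 0.068, bus = 0.229, car = 0.703)
sum(p_ego1)  # 1
```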

But, what about characteristics of the decision-maker?

  • What about the effect of age or unemployment?
  • In conditional logit, comparisons are within a choice set (strata = ego)
  • Identification uses within-set variation only
  • Ego covariates (e.g., age, unemployed) are constant across alternatives β†’ no within-set variation
  • Decision maker characteristics can enter through interactions.
  • Idea: β€œAge effect on Bus vs Walk,” β€œAge effect on sensitivity to cost” etc.
  ego_id mode chosen time  cost age unemployed actual_mode  prob
1      1 walk      0  140  0.00  33          0         bus 0.068
2      1  bus      1   26  2.33  33          0         bus 0.229
3      1  car      0   27  3.65  33          0         bus 0.703
4      2 walk      0  358  0.00  59          0         bus 0.002
5      2  bus      1   63  3.79  59          0         bus 0.754
6      2  car      0   74 14.31  59          0         bus 0.244
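Why the interaction works can be seen directly in the data structure: the ego-level variable is constant within the choice set, but its product with an alternative dummy is not. A sketch with a hypothetical one-ego choice set (in clogit syntax, the corresponding hypothetical specification would be something like `chosen ~ time + cost + mode + mode:age + strata(ego_id)`):

```r
# Ego-level age has no within-set variation; age x bus-dummy does, so
# conditional logit can identify the interaction but not age's main effect
df <- data.frame(ego_id = 1, mode = c("walk", "bus", "car"), age = 33)
df$age_x_bus <- df$age * (df$mode == "bus")
var(df$age)        # 0: constant within the choice set
var(df$age_x_bus)  # > 0: varies across alternatives
```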

Main assumption: Independence of Irrelevant Alternatives (IIA) ☒️

  • IIA says the relative odds of choosing A over B do not depend on any other option C.

  • Any other choice option is irrelevant for A and B.

  • Adding or removing other options does not change \(\frac{P(A)}{P(B)}\)

  • Assumes options have no unobserved similarity

When IIA Makes Sense πŸ‘

IIA is reasonable when alternatives are clearly distinct.

  • Bus vs Car
  • Add Bike β†’ a very different option
  • The Bus/Car comparison stays the same

When IIA Breaks Down 🚍🚍

Red Bus / Blue Bus paradox:

  • Choice: Car (50%) vs Blue Bus (50%)
  • Add β€œRed Bus,” identical to Blue Bus
  • Model predicts Car 33%, Blue Bus 33%, Red Bus 33%
  • since the Car/Bus odds ratio is unchanged (\(50/50 = 33/33\)), in line with IIA

  • But intuition says Car should stay 50%, and Bus should split 25% / 25%.
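The paradox is easy to reproduce numerically: with identical (hypothetical) utilities, the logit model splits probability evenly across all three options instead of splitting only the bus share.

```r
# Red bus / blue bus: all options get the same (hypothetical) utility of 0
p2 <- exp(c(car = 0, bus_blue = 0)) / sum(exp(c(0, 0)))
p3 <- exp(c(car = 0, bus_blue = 0, bus_red = 0)) / sum(exp(c(0, 0, 0)))
p2  # 0.5, 0.5
p3  # 1/3 each: the model wrongly takes a third from Car as well
```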

Does Controlling for Similarity Fix IIA? 🧩

Yes, if the similarity is observable and included in the model.

  • Add a variable like is_bus
  • Or a bus-specific constant
  • Or features capturing shared attributes
  • Then the model β€œknows” the buses belong to the same group, and the IIA problem disappears.
  • IIA breaks only when the similarity is unobserved.

Summary: What We Learned Today 🧠

  • Choices are common in social science: any situation that can be framed as choosing from a choice set even if it’s not an explicit choice.

  • Multinomial logit models choices using predictors that vary across individuals

    • Models how predictors shift the likelihood of ending up in one category rather than another.
  • Conditional logit models choices where attributes vary across alternatives

    • It compares alternatives within each person’s choice set.
  • Data structure:

    • Multinomial β†’ one row per chooser
    • Conditional β†’ one row per chooser–alternative
  • Individual traits cannot be included directly in conditional logit without creating interactions, because they do not vary within a choice set.

  • IIA: the comparison between two options should not depend on other options in the set.

    • This is reasonable when options are distinct, but fails when options share unobserved similarities.

See you at the lab πŸ”¬

Using choice models to study romantic partnerships ❀️

❀️ Why study romantic partnerships?

Because partner choice reveals something deeper: How social boundaries work.

  • For example:
    • In school friendship networks, we already see strong divides.
      • Gender clusters
      • Ethnicity clusters

People seem to gravitate toward those who feel like β€œus” and away from those who feel like β€œthem.”

  • Similarity in social ties is typically called homophily (similar β†’ like).
  • When the similarity concerns partner choice, we use homogamy (similar β†’ marry).

Social distance & boundaries ⟷

We see the same divides later in life:

  • Who people feel comfortable around
  • Who they befriend
  • Who they object to marrying into their family
  • Who they see as β€œgood matches” vs β€œnot for us”

This is the logic of ingroup vs. outgroup (Allport 1954).

We can call this social boundaries or social distance.

Enter partnering patterns πŸ‘©β€β€οΈβ€πŸ’‹β€πŸ‘¨

Romantic partnerships give us a direct window into social distance.

  • If two groups
    • rarely partner,
    • partner only under certain conditions,
    • or partner strongly within-group,
  • that tells us something fundamental about how close or far their social boundaries are.
  • Assortative mating.

What predicts your partner?

- Age πŸŽ‚  
- Attractiveness 😍  
- Ethnicity 🌍  
- Language πŸ—£οΈ
- Religion πŸ™
- Geographic Distance πŸ“  
- Education πŸŽ“  
- Income πŸ’°  
- Social circles πŸ§‘β€πŸ€β€πŸ§‘  
- Personality 🧠  
- Values πŸ’­  
- Chemistry ✨  
- Random chance 🎲  
  • My main research questions:
    • How do ethnic boundaries shape partner choice in Sweden over time?
    • How does residential segregation mediate patterns of ethnic endogamy?
    • Does ethnic assortative mating reflect changing preferences versus structural shifts in the partner market?

Swedish Register Data

  • All new marriages and cohabitations with common children, 1991–2022
  • Infer the population of singles, 1991–2022
  • Independent variables:
    • Ancestry & generational status from parental country of birth
    • Education and age
    • Residential location at 100Γ—100 m grids (to calculate distances)
  • Both who partners with whom and structure of the partner market

Conditional logit with sampled alternatives

  • For each ego: data on actual partner and 100 sampled alternatives

  • Sampled from random opposite-sex singles in \(t - 1\) (the year before union start)

  • Predictors are dyadic, i.e. differences in age, education, ancestry, residential location

  • I model the probability of choosing the actual partner over the other possible partners in the choice set

  ego alter actual x1 x2   x3
1   1     A      1  1  3  low
2   1     B      0  0  5 high
3   1     C      0  0  8 diff
4   2     D      1  0  4 diff
5   2     E      0  1  4  low
6   2     F      0  1  2  low
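The sampling step can be sketched as follows; the pool of singles, its size, and the ego/partner IDs here are all hypothetical:

```r
# Build one ego's choice set: the actual partner plus 100 sampled
# alternatives drawn from the (hypothetical) risk set of singles in t - 1
set.seed(1)
risk_set <- paste0("single_", 1:1000)   # hypothetical pool of singles
actual   <- "single_42"                 # hypothetical actual partner
sampled  <- sample(setdiff(risk_set, actual), 100)
choice_set <- data.frame(alter  = c(actual, sampled),
                         actual = c(1, rep(0, 100)))
nrow(choice_set)  # 101: the actual partner plus 100 alternatives
```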

Example table

Example Figure