Week 6: Discrete Choice Modelling

Multinomial & Conditional Logit

Jesper Lindmarker

How difficult was last week?


Week 5 recap: LPM and Logistic regression πŸ”™

  • Binary outcomes shift us from modelling levels β†’ modelling probabilities
  • Probabilities of a binary outcome follow an S-curve
  • LPM: simple, intuitive, but can give impossible predictions and wrong functional form
  • Logistic regression: transforms probabilities β†’ odds β†’ log-odds so we can fit a straight line
  • Log-odds coefficients: additive effects; odds ratios: multiplicative effects
  • Marginal effects vary across X because the S-curve is nonlinear
  • AMEs summarize these varying effects on the probability scale
  • Model goodness of fit uses likelihood, pseudo-RΒ², calibration, and classification (not residuals)

✨ Big idea: binary Y requires probability models, and every modelling choice brings trade-offs.

Today’s topic

MY RESEARCH ON PARTNER CHOICE! πŸ§‘β€β€οΈβ€πŸ’‹β€πŸ§‘

  • Maybe later. If we have time at the end.
  • But first, a conceptual question:
    • Is partner-choice a choice?

πŸš€ Our Journey So Far

\(y = f(x)\)

  • OLS: continuous \(y\) (e.g., income, GPA)
  • Logistic regression: binary \(y\) (yes/no, 1/0, pass/fail)
  • Each step: adapt regression to the form of the \(y\)

πŸ‘‰ Today: keep expanding the y side of the equation. What do we do if there are more than two choices?

Statistical modelling of choices πŸ‘‰πŸ‘œ

What do we mean by choices? πŸ€”

  1. A choice = a situation where someone selects one or several options from a choice set.

Choices Are Everywhere (when you start to look for them) 🌍

  • In small, daily actions:
- Coffee / Tea / Energy drink β˜•πŸ«–βš‘  
- Bike / Bus / Walk / Car  
- Netflix / HBO / Disney+ 🎬  
- Cook / Takeout / Restaurant 🍝  
  • And in big life decisions:
- Which person to partner with ❀️  
- Which career path to take πŸ’Ό  
- Which political party to vote for πŸ—³οΈ  
- Which neighbourhood to live in πŸ™οΈ  
- Which transport modes to use πŸš²πŸšŒπŸš—  
- Field of study πŸŽ“  
- Whether to cohabit or marry  
- Where to migrate ✈️
  • Do you agree that all of these are choices?
  • What do they depend on?

What do we mean by choices? πŸ€”

  1. A choice = a situation where someone selects one or several options from a choice set.

  2. A situation where the process can reasonably be thought of as a choice, i.e., where choice is a good approximation of the behaviour.

What Drives Choices? 🎯

Every choice has influences:

  • Individual traits
  • Context and opportunities
  • Features of the alternatives
  • Social norms and boundaries
  • Trade-offs (time, cost, distance, similarity)

Example: Political Science πŸ‘©β€βš–οΈ

  • Outcome: Which party do you vote for?
  • Predictors:
    • Age, gender, education (socio-demographics)
    • Political interest, ideology (rational/attitudinal)
    • Region, social circle norms (social influence)

More Examples

  • Transport mode: walk, bike, bus, car, train 🚎
    • Distance (rational cost/time)
    • Health attitudes (behavioural)
    • Peer norms (social influence)
  • Job choice: public, private, nonprofit πŸ’°
    • Salary (rational)
    • Parents’ occupation (social reproduction)
    • Altruism (values)
  • Eurovision jury votes 🎢
    • Alternatives: songs
    • Song quality, genre, performance running order (features)
    • Neighbour/diaspora ties (social mechanisms)

The Core Intuition ✨

A choice situation reveals:

  • something about the chooser,
  • something about the options,
  • something about the match between them.
  • And perhaps, something about the underlying process

Why Choices Matter in Social Science πŸ“Š

Choices reflect:

  • Preferences (of the individual)
  • Constraints (due to social structure)
  • Social boundaries (that exist due to prejudice)
  • Opportunities (that are unequally distributed)
  • Identity (socially constructed)

All are relevant measures and constructs in social science.

New models ⚑️

  • We will extend the toolkit with two models:

    • Multinomial logit, Conditional logit
  • General advice:

    • The key to statistical modelling is not memorizing models.
    • It is thinking carefully about the outcome you are studying and what data you have
    • πŸ‘‰ Once you understand that, you can find models to suit your needs.

Terminology Alert

β€œMultinomial logit” can sometimes (in economics) refer to β€œconditional logit”

Multinomial Regression

  • Classic in social science applications
  • Very much an extension of logistic regression, but with more than 2 outcomes
  • Choice situation:
    • Observations = choosers
    • Predictors = characteristics of the chooser
    • Outcome = one category among several (K > 2)

Example:

- Outcome: Which party do you vote for?  
- Predictors:
  - Age, gender, education (socio-demographics)  
  - Political interest, ideology (rational/attitudinal)  
  - Region, social circle norms (social influence)  

Data setup - Multinomial regression

head(as.data.frame(sim_df), n = 15)
   ego_id age unemployed mode
1       1  33          0 walk
2       2  59          0  car
3       3  39          0 walk
4       4  64          0 walk
5       5  67          0 walk
6       6  20          0 walk
7       7  45          0  car
8       8  64          0  car
9       9  47          1 walk
10     10  42          0 walk
11     11  68          0  car
12     12  42          0 walk
13     13  53          1 walk
14     14  48          0 walk
15     15  23          1 walk
  • 15 observations, distributed over 15 rows in the data
  • Each row = chosen alternative for one chooser
  • ego_id = chooser ID
  • mode = alternative (walk, bus, car)
  • Predictors vary for each ego: age, unemployed

All choices must sum to 1 πŸ“Š

  • In every choice situation, the set of alternatives must be exhaustive β€” every possible option must be represented.
  • An individual must choose at least one option
    • (or you include β€œnon-choice” as an option)
  • In other words: The set of probabilities across all options must sum to 1

If we model daily transport choices using only car, bike, and train, we should include a stay-home option if that was an available choice.

From two to more outcomes

  • With two outcomes, logistic regression ensures: \[P(Y=1) + P(Y=0) = 1\]
  • With more than two outcomes, we need a generalization where: \[P(Y=1) + P(Y=2) + \dots + P(Y=K) = 1\]

Probabilities sum to 1
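The generalization is the softmax transformation: each category's exponentiated linear predictor divided by the sum over all categories. A minimal sketch, with hypothetical linear predictors and the reference category fixed at 0:

```r
# Softmax: turn K linear predictors into K probabilities that sum to 1
# (the eta values are hypothetical; walk is the reference, fixed at 0)
eta <- c(walk = 0, bus = -1.2, car = 0.5)
probs <- exp(eta) / sum(exp(eta))
round(probs, 3)
sum(probs)  # 1
```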

Multinomial Regression

  • For each category \(k \neq r\):

\[ \log\!\left(\frac{P(y_i = k)}{P(y_i = r)}\right) = \beta_{0k} + \beta_{1k} X_{i1} + \beta_{2k} X_{i2} + \dots + \beta_{pk} X_{ip} \]

  • Once again we have this linear function! How neat!

  • How does this compare to logistic regression?

  • Each equation is the log-odds of two choice alternatives (repeated once for each category vs the reference)

  • All K βˆ’ 1 equations are estimated jointly

  • \(k \in \{1, \dots, K\}\) indexes outcome categories

  • One reference category \(r\) is chosen, and we set \(\beta_r = \mathbf{0}\) for identification

Example: Choosing mode of transportation

  • Outcomes: Walk \((r)\), Bus \((b)\), Car \((c)\)
  • Reference: Walk, so \(\beta_r = 0\)
  • Predictor: Age

Log-odds equations:

\[ \log\!\left(\frac{P(\text{Bus})}{P(\text{Walk})}\right) = \beta_{0b} + \beta_b\,\text{Age}_i \]

\[ \log\!\left(\frac{P(\text{Car})}{P(\text{Walk})}\right) = \beta_{0c} + \beta_c\,\text{Age}_i \]

  • Let’s interpret \(\beta_b\)
  • \(\beta_{b}\) = change in log-odds of taking the bus vs walking for a 1-unit increase in \(\text{Age}\)

Interpretation

library(nnet) # One possible package for multinomial regression
choice_mode_wide$actual_mode <- relevel(choice_mode_wide$actual_mode, "walk") # Set walk as reference category
m1 <- multinom(actual_mode ~ age + unemployed, data = choice_mode_wide)

Multinomial regression predicting mode of transport

            response   Est.        S.E.
(Intercept) bus        -7.153***   1.056
            car        -5.938***   1.036
age         bus         0.295***   0.042
            car         0.287***   0.041
unemployed  bus        -2.286***   0.519
            car        -3.734***   0.523
Num.Obs.                500
AIC                     716.4
BIC                     741.7

Intercepts

Multinomial regression predicting mode of transport

            response   Est.        S.E.
(Intercept) bus        -7.153***   1.056
            car        -5.938***   1.036
age         bus         0.295***   0.042
            car         0.287***   0.041
unemployed  bus        -2.286***   0.519
            car        -3.734***   0.523
Num.Obs.                500
AIC                     716.4
BIC                     741.7
  • Intercept:
    • Value of linear predictor when all other terms are = \(0\)
  • Car Intercept = -5.94
    • at age = 0 and employed, baseline log-odds are negative, indicating that taking the car is less likely than walking.
  • Bus Intercept = -7.15
    • at age = 0 and employed, log-odds are negative, indicating that taking the bus is less likely than walking.
  • Age = 0 is outside realistic data. Intercepts mainly anchor the equation.

Age effects

Multinomial regression predicting mode of transport

            response   Est.        S.E.
(Intercept) bus        -7.153***   1.056
            car        -5.938***   1.036
age         bus         0.295***   0.042
            car         0.287***   0.041
unemployed  bus        -2.286***   0.519
            car        -3.734***   0.523
Num.Obs.                500
AIC                     716.4
BIC                     741.7
  • Car estimate: 0.287
    • Each additional year of age increases the log-odds of Car vs Walking
    • Odds ratio = \(\exp(\beta)\) = 1.33
      β†’ +33% odds per year of age.
  • Bus estimate: 0.295
    • Each additional year increases the log-odds of Bus vs Walking
    • Odds ratio = \(\exp(\beta)\) = 1.34
      β†’ + 34% odds per year of age.

Unemployment effects

Multinomial regression predicting mode of transport

            response   Est.        S.E.
(Intercept) bus        -7.153***   1.056
            car        -5.938***   1.036
age         bus         0.295***   0.042
            car         0.287***   0.041
unemployed  bus        -2.286***   0.519
            car        -3.734***   0.523
Num.Obs.                500
AIC                     716.4
BIC                     741.7
  • Car vs Walk:
    • Being unemployed decreases the log-odds by 3.73.
    • Odds ratio = \(\exp(\beta)\) = 0.02
      β†’ -98% change in odds.
  • Bus vs Walk:
    • Being unemployed decreases the log-odds by 2.29.
    • Odds ratio = \(\exp(\beta)\) = 0.10
      β†’ -90% change in odds.
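As a check, the odds ratios on this and the previous slide can be recovered directly from the log-odds coefficients reported in the table:

```r
# Odds ratios from the log-odds coefficients in the table above
b <- c(age_bus = 0.295, age_car = 0.287,
       unemp_bus = -2.286, unemp_car = -3.734)
round(exp(b), 2)  # age: 1.34, 1.33; unemployed: 0.10, 0.02
```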

Predicted probabilities

  • You could calculate marginal effects in probability space for one additional year of age.
  • But, same problem as always: how do we represent our results as faithfully as possible: additive log-odds, multiplicative odds, or non-linear probabilities?
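To illustrate what predict() does internally, the predicted probabilities for a hypothetical employed 40-year-old can be computed by hand from the coefficients reported above (walk is the reference, so its linear predictor is 0):

```r
# Hand-computed predicted probabilities from the reported coefficients
# (age = 40 and employed status are hypothetical values for illustration)
age <- 40; unemployed <- 0
eta <- c(walk = 0,
         bus  = -7.153 + 0.295 * age - 2.286 * unemployed,
         car  = -5.938 + 0.287 * age - 3.734 * unemployed)
round(exp(eta) / sum(exp(eta)), 3)  # probabilities across the three modes
```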

Conditional logit 🧨

Transition – From Multinomial to Conditional Logit

  • Multinomial regression
    • Predictors = characteristics of the chooser (ego)
    • Outcome = category chosen
  • But what if predictors vary by alternative?
    • Bus time β‰  Car time β‰  Walk time
    • Ticket cost β‰  Gas cost β‰  Free walking
  • πŸ‘‰ This is where Conditional Logit comes in.

Data Structure: Long format

   ego_id mode chosen time  cost age unemployed actual_mode
1       1 walk      0  140  0.00  33          0         bus
2       1  bus      1   26  2.33  33          0         bus
3       1  car      0   27  3.65  33          0         bus
4       2 walk      0  358  0.00  59          0         bus
5       2  bus      1   63  3.79  59          0         bus
6       2  car      0   74 14.31  59          0         bus
7       3 walk      0  193  0.00  39          0         car
8       3  bus      0   35  2.69  39          0         car
9       3  car      1   29  5.72  39          0         car
10      4 walk      0  399  0.00  64          0         car
11      4  bus      0   71  4.06  64          0         car
12      4  car      1   53 14.71  64          0         car
  • 4 observations, distributed over 12 rows in the data
  • Each row = one alternative for one chooser
  • ego_id = chooser ID
  • mode = alternative (walk, bus, car)
  • chosen = 1 if chosen, 0 otherwise
  • Predictors vary by alternative: time, cost
  • Ego-specific predictors are constant within choice sets: age, unemployed

Conditional Logit Probability/Log-odds/Odds equations

Probability of individual \(i\) choosing alternative \(j\):

\[P_{ij} = \frac{\exp(X_{ij}\beta)}{\sum_{k\in C_i} \exp(X_{ik}\beta)}\]

Log odds:

\[\log\left(\frac{P_{ij}}{P_{ik}}\right) = (X_{ij} - X_{ik})\beta\]

Ratio of choice probabilities between two otherwise identical alternatives in the choice set:

\[\exp(\beta) = \frac{P_{ij}}{P_{ik}} \text{ when } X_{ij} - X_{ik} = 1\]
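A minimal numeric sketch of the probability formula for one choice set; the attribute values and the coefficients here are hypothetical:

```r
# P_ij = exp(X_ij * beta) / sum_k exp(X_ik * beta), within one choice set
beta <- c(time = -0.03, cost = -0.2)          # hypothetical coefficients
X <- rbind(walk = c(time = 140, cost = 0),    # hypothetical attributes
           bus  = c(time = 26,  cost = 2.33),
           car  = c(time = 27,  cost = 3.65))
eta <- as.vector(X %*% beta)
p <- exp(eta) / sum(exp(eta))
round(setNames(p, rownames(X)), 3)  # choice probabilities, summing to 1
```

The log-odds of bus vs car in this set equals \((X_{\text{bus}} - X_{\text{car}})\beta\), matching the second equation above.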

Fit the Conditional Logit

library(survival) #Package for clogit
m2 <- clogit(chosen ~ mode + time + cost + strata(ego_id), data = sim_df2) # `strata()` identifies each ego's choice set
summary(m2)
Call:
coxph(formula = Surv(rep(1, 1500L), chosen) ~ mode + time + cost + 
    strata(ego_id), data = sim_df2, method = "exact")

  n= 1500, number of events= 500 

           coef exp(coef) se(coef)     z Pr(>|z|)    
modebus -1.5811    0.2057   0.2833 -5.58  2.4e-08 ***
modecar -0.1499    0.8608   0.3377 -0.44     0.66    
time    -0.0289    0.9716   0.0032 -9.02  < 2e-16 ***
cost    -0.2130    0.8082   0.0244 -8.73  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

        exp(coef) exp(-coef) lower .95 upper .95
modebus     0.206       4.86     0.118     0.358
modecar     0.861       1.16     0.444     1.669
time        0.972       1.03     0.965     0.978
cost        0.808       1.24     0.770     0.848

Concordance= 0.783  (se = 0.018 )
Likelihood ratio test= 376  on 4 df,   p=<2e-16
Wald test            = 152  on 4 df,   p=<2e-16
Score (logrank) test = 275  on 4 df,   p=<2e-16
  • Coefficients: effect of time, cost and mode on choice probabilities between alternatives
  • No intercept; all coefficients are comparisons within the choice set.

🎯 Interpretation of Coefficients

Log-odds interpretation:

A one-unit increase in attribute \(X\) increases the log-odds of choosing an alternative over an otherwise identical alternative by \(\beta\).

Probability ratio interpretation:

A one-unit increase in attribute \(X\) multiplies the relative probability of choosing an alternative over an otherwise identical alternative by \(\exp(\beta)\).
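For example, with the cost coefficient from the fitted model above:

```r
# Relative-probability ratio per one-unit cost increase
# (coefficient taken from the clogit output above)
beta_cost <- -0.213
exp(beta_cost)  # ~0.81: each extra unit of cost multiplies the relative
                # choice probability by about 0.81 (roughly a 19% reduction)
```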

Interpretation

modelsummary(m2)
(1)
modebus -1.581
(0.283)
modecar -0.150
(0.338)
time -0.029
(0.003)
cost -0.213
(0.024)
Num.Obs. 1500
AIC 730.8
BIC 752.0
RMSE 0.39
modelsummary(m2, exponentiate = TRUE)
(1)
modebus 0.206
(0.058)
modecar 0.861
(0.291)
time 0.972
(0.003)
cost 0.808
(0.020)
Num.Obs. 1500
AIC 730.8
BIC 752.0
RMSE 0.39

Predicted probabilities

   ego_id mode chosen time  cost age unemployed actual_mode  prob
1       1 walk      0  140  0.00  33          0         bus 0.068
2       1  bus      1   26  2.33  33          0         bus 0.229
3       1  car      0   27  3.65  33          0         bus 0.703
4       2 walk      0  358  0.00  59          0         bus 0.002
5       2  bus      1   63  3.79  59          0         bus 0.754
6       2  car      0   74 14.31  59          0         bus 0.244
7       3 walk      0  193  0.00  39          0         car 0.024
8       3  bus      0   35  2.69  39          0         car 0.270
9       3  car      1   29  5.72  39          0         car 0.705
10      4 walk      0  399  0.00  64          0         car 0.001
11      4  bus      0   71  4.06  64          0         car 0.578
12      4  car      1   53 14.71  64          0         car 0.421
  • As with previous models, we can use predict() to calculate predicted probabilities
    • Within the choice set
  • And from this we could calculate error matrices etc.
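A quick sanity check on the output above: within each choice set, the predicted probabilities sum to 1.

```r
# Predicted probabilities for ego 1, copied from the table above
p_ego1 <- c(walk = 0.068, bus = 0.229, car = 0.703)
sum(p_ego1)  # 1
```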

But, what about characteristics of the decision-maker?

  • What about the effect of age or unemployment?
  • In conditional logit, comparisons are within a choice set (strata = ego)
  • Identification uses within-set variation only
  • Ego covariates (e.g., age, unemployed) are constant across alternatives β†’ no within-set variation
  • Decision maker characteristics can enter through interactions.
  • Idea: β€œAge effect on Bus vs Walk,” β€œAge effect on sensitivity to cost” etc.
  ego_id mode chosen time  cost age unemployed actual_mode  prob
1      1 walk      0  140  0.00  33          0         bus 0.068
2      1  bus      1   26  2.33  33          0         bus 0.229
3      1  car      0   27  3.65  33          0         bus 0.703
4      2 walk      0  358  0.00  59          0         bus 0.002
5      2  bus      1   63  3.79  59          0         bus 0.754
6      2  car      0   74 14.31  59          0         bus 0.244
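Why the interaction works can be seen directly in the data structure: the ego-level variable is constant within the choice set, but its product with an alternative dummy is not. A sketch with a hypothetical one-ego choice set (in clogit syntax, the corresponding hypothetical specification would be something like `chosen ~ time + cost + mode + mode:age + strata(ego_id)`):

```r
# Ego-level age has no within-set variation; age x bus-dummy does, so
# conditional logit can identify the interaction but not age's main effect
df <- data.frame(ego_id = 1, mode = c("walk", "bus", "car"), age = 33)
df$age_x_bus <- df$age * (df$mode == "bus")
var(df$age)        # 0: constant within the choice set
var(df$age_x_bus)  # > 0: varies across alternatives
```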

Main assumption: Independence of Irrelevant Alternatives (IIA) ☒️

  • IIA says the relative odds of choosing A over B do not depend on any other option C.

  • Any other choice option is irrelevant for A and B.

  • Adding or removing other options does not change \(\frac{P(A)}{P(B)}\)

  • Assumes options have no unobserved similarity

When IIA Makes Sense πŸ‘

IIA is reasonable when alternatives are clearly distinct.

  • Bus vs Car
  • Add Bike β†’ a very different option
  • The Bus/Car comparison stays the same

When IIA Breaks Down 🚍🚍

Red Bus / Blue Bus paradox:

  • Choice: Car (50%) vs Blue Bus (50%)
  • Add β€œRed Bus,” identical to Blue Bus
  • Model predicts Car 33%, Blue Bus 33%, Red Bus 33%
  • since the Car/Bus odds ratio is unchanged (\(50/50 = 33/33\)), in line with IIA

  • But intuition says Car should stay 50%, and Bus should split 25% / 25%.
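The paradox is easy to reproduce numerically: with identical (hypothetical) utilities, the logit model splits probability evenly across all three options instead of splitting only the bus share.

```r
# Red bus / blue bus: all options get the same (hypothetical) utility of 0
p2 <- exp(c(car = 0, bus_blue = 0)) / sum(exp(c(0, 0)))
p3 <- exp(c(car = 0, bus_blue = 0, bus_red = 0)) / sum(exp(c(0, 0, 0)))
p2  # 0.5, 0.5
p3  # 1/3 each: the model wrongly takes a third from Car as well
```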

Does Controlling for Similarity Fix IIA? 🧩

Yes, if the similarity is observable and included in the model.

  • Add a variable like is_bus
  • Or a bus-specific constant
  • Or features capturing shared attributes
  • Then the model β€œknows” the buses belong to the same group, and the IIA problem disappears.
  • IIA breaks only when the similarity is unobserved.

Summary: What We Learned Today 🧠

  • Choices are common in social science: any situation that can be framed as choosing from a choice set even if it’s not an explicit choice.

  • Multinomial logit models choices using predictors that vary across individuals

    • Models how predictors shift the likelihood of ending up in one category rather than another.
  • Conditional logit models choices where attributes vary across alternatives

    • It compares alternatives within each person’s choice set.
  • Data structure:

    • Multinomial β†’ one row per chooser
    • Conditional β†’ one row per chooser–alternative
  • Individual traits cannot be included directly in conditional logit without creating interactions, because they do not vary within a choice set.

  • IIA: the comparison between two options should not depend on other options in the set.

    • This is reasonable when options are distinct, but fails when options share unobserved similarities.

See you at the lab πŸ”¬

Using choice models to study romantic partnerships ❀️

❀️ Why study romantic partnerships?

Because partner choice reveals something deeper: How social boundaries work.

  • For example:
    • In school friendship networks, we already see strong divides.
      • Gender clusters
      • Ethnicity clusters

People seem to gravitate toward those who feel like β€œus” and away from those who feel like β€œthem.”

  • Similarity in social ties is typically called homophily (similar β†’ like).
  • When the similarity concerns partner choice, we use homogamy (similar β†’ marry).

Social distance & boundaries ⟷

We see the same divides later in life:

  • Who people feel comfortable around
  • Who they befriend
  • Who they object to marrying into their family
  • Who they see as β€œgood matches” vs β€œnot for us”

This is the logic of ingroup vs. outgroup (Allport 1954).

We can call this social boundaries or social distance.

Enter partnering patterns πŸ‘©β€β€οΈβ€πŸ’‹β€πŸ‘¨

Romantic partnerships give us a direct window into social distance.

  • If two groups
    • rarely partner,
    • partner only under certain conditions,
    • or partner strongly within-group,
  • that tells us something fundamental about how close or far their social boundaries are.
  • Assortative mating.

What predicts your partner?

- Age πŸŽ‚  
- Attractiveness 😍  
- Ethnicity 🌍  
- Language πŸ—£οΈ
- Religion πŸ™
- Geographic Distance πŸ“  
- Education πŸŽ“  
- Income πŸ’°  
- Social circles πŸ§‘β€πŸ€β€πŸ§‘  
- Personality 🧠  
- Values πŸ’­  
- Chemistry ✨  
- Random chance 🎲  
  • My main research questions:
    • How do ethnic boundaries shape partner choice in Sweden over time?
    • How does residential segregation mediate patterns of ethnic endogamy?
    • Does ethnic assortative mating reflect changing preferences versus structural shifts in the partner market?

Swedish Register Data

  • All new marriages and cohabitations with common children, 1991–2022
  • Infer the population of singles, 1991–2022
  • Independent variables:
    • Ancestry & generational status from parental country of birth
    • Education and age
    • Residential location at 100Γ—100 m grids (to calculate distances)
  • Both who partners with whom and structure of the partner market

Conditional logit with sampled alternatives

  • For each ego: data on actual partner and 100 sampled alternatives

  • Sampled from random opposite-sex singles in \(t - 1\) (the year before union start)

  • Predictors are dyadic, i.e. differences in age, education, ancestry, residential location

  • I model the probability of choosing the actual partner over the other possible partners in the choice set

  ego alter actual x1 x2   x3
1   1     A      1  1  3  low
2   1     B      0  0  5 high
3   1     C      0  0  8 diff
4   2     D      1  0  4 diff
5   2     E      0  1  4  low
6   2     F      0  1  2  low
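The sampling step can be sketched as follows; the pool of singles, its size, and the ego/partner IDs here are all hypothetical:

```r
# Build one ego's choice set: the actual partner plus 100 sampled
# alternatives drawn from the (hypothetical) risk set of singles in t - 1
set.seed(1)
risk_set <- paste0("single_", 1:1000)   # hypothetical pool of singles
actual   <- "single_42"                 # hypothetical actual partner
sampled  <- sample(setdiff(risk_set, actual), 100)
choice_set <- data.frame(alter  = c(actual, sampled),
                         actual = c(1, rep(0, 100)))
nrow(choice_set)  # 101: the actual partner plus 100 alternatives
```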

Example table

Example Figure