Last week we learned to:
🧠 But note: Our models so far assumed that studying one more hour helps all students equally.
➡️ This week, we’ll challenge that idea.
What if the effect of studying depends on who you are
or on how much you already study?
Again, I want you to think about the association between hours studied and exam results for SDS I.
Now imagine two groups:
📝 Task:
What differences did you draw?
➡️ This idea — that a relationship differs across groups —
is the foundation of interaction effects.
Sometimes the effect of one variable depends on the level of another.
💡 This is called an interaction or moderation.
Example:
Does studying help everyone equally?
Or do other factors change how useful one hour of studying is?
Much of social science asks whether an association is stronger or weaker for different groups:
➡️ These are all interaction questions.
Imagine two students who study the same amount, but one was out partying the night before the exam.
➡️ If there is an interaction, the slopes should differ between those who partied and those who did not
We could simply fit separate regressions by stratifying the data, but then we cannot formally test whether the slopes actually differ.
💡 Solution: use a single regression with an interaction term.
\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 (x_1 \times x_2) \]
Let’s include an interaction between hours_studied and previous_gpa:
mod_interaction <- lm(exam_score ~ hours_studied * previous_gpa, data = data)
summary(mod_interaction)
Call:
lm(formula = exam_score ~ hours_studied * previous_gpa, data = data)
Residuals:
Min 1Q Median 3Q Max
-27.956 -5.675 0.125 5.385 26.147
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 67.848588 6.747162 10.056 < 2e-16 ***
hours_studied 0.417420 0.151585 2.754 0.00613 **
previous_gpa 0.525405 0.092194 5.699 2.18e-08 ***
hours_studied:previous_gpa -0.002731 0.002018 -1.353 0.17661
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.227 on 452 degrees of freedom
Multiple R-squared: 0.4508, Adjusted R-squared: 0.4472
F-statistic: 123.7 on 3 and 452 DF, p-value: < 2.2e-16
\[ \widehat{\text{Exam Score}} = \beta_0 + \beta_1 \times \text{Hours} + \beta_2 \times \text{GPA} + \beta_3 \times (\text{Hours} \times \text{GPA}) \]
🧠 In our data, the interaction coefficient is negative (though not statistically significant here). This means: the higher a student's GPA, the less an extra hour of studying adds to the predicted exam score.
\[ \widehat{\text{Exam Score}} = 67.85 + 0.42 \times \text{Hours} + 0.53 \times \text{GPA} + (-0.0027) \times (\text{Hours} \times \text{GPA}) \]
We don’t need the full equation — just the parts that change when we increase study time by 1 hour:
\[ \Delta\widehat{\text{Exam Score}} = 0.42 \cdot 1 + (-0.0027)\times (1 \times \text{GPA}) \]
At different values of GPA:

| GPA | Calculation | Effect of 1 extra hour |
|---|---|---|
| 40 | 0.42 + (−0.0027 × 40) = 0.31 | 0.31 |
| 60 | 0.42 + (−0.0027 × 60) = 0.25 | 0.25 |
| 80 | 0.42 + (−0.0027 × 80) = 0.20 | 0.20 |
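These by-hand calculations can be reproduced in R with the unrounded coefficients printed by `summary(mod_interaction)` above:

```r
# Marginal effect of one extra study hour at different GPA values,
# using the estimates from mod_interaction above
b_hours <- 0.417420   # coefficient on hours_studied
b_inter <- -0.002731  # coefficient on hours_studied:previous_gpa

gpa <- c(40, 60, 80)
round(b_hours + b_inter * gpa, 2)
#> [1] 0.31 0.25 0.20
```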
We’ve calculated the marginal effect at a few GPA levels. Now let’s compute and plot it across all values using the marginaleffects package.

Using the marginaleffects package.
In regression, the marginal effect of a variable tells us:
How much the outcome (\(y\)) changes when one predictor (\(x\)) increases by one unit, holding all other variables constant.
In a simple OLS model, the marginal effect is just the coefficient (\(\beta_1\)).
In more complex models, e.g. with interactions, the marginal effect is no longer constant; we have to calculate it for different values of the other variables.
👩🏽🔧 Slightly more technical explanation:
The marginal effect is the instantaneous slope or the partial derivative of the regression surface
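For the interaction model above, this derivative is
\[ \frac{\partial \hat{y}}{\partial x_1} = \beta_1 + \beta_3 x_2 \]
so the slope of \(x_1\) changes linearly with \(x_2\). To compute and plot this slope across all values of previous_gpa, the marginaleffects package offers `plot_slopes()` (a sketch, assuming the package is installed and `mod_interaction` is the model fitted above):

```r
library(marginaleffects)

# Slope of hours_studied at every value of previous_gpa,
# with confidence bands
plot_slopes(mod_interaction,
            variables = "hours_studied",
            condition = "previous_gpa")
```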

An interaction can also involve a categorical variable.
Does the effect of studying differ by gender?
Now we’re asking if the slope for hours studied is different for males and females.
mod_gender_interact <- lm(exam_score ~ hours_studied * gender, data = data)
summary(mod_gender_interact)
Call:
lm(formula = exam_score ~ hours_studied * gender, data = data)
Residuals:
Min 1Q Median 3Q Max
-29.4718 -6.7860 0.2575 6.8464 30.9571
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 98.47665 2.37389 41.483 < 2e-16 ***
hours_studied 0.38059 0.05322 7.152 3.47e-12 ***
genderFemale 5.66051 3.73230 1.517 0.130
hours_studied:genderFemale -0.10493 0.08062 -1.302 0.194
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 10.27 on 452 degrees of freedom
Multiple R-squared: 0.1448, Adjusted R-squared: 0.1391
F-statistic: 25.51 on 3 and 452 DF, p-value: 2.914e-15
✅ This fits a different slope for males and females
✅ Also adjusts for any overall mean difference
The fitted model becomes:
\[ y = \beta_0 + \beta_1 \times \text{Hours} + \beta_2 \times \text{Female} + \beta_3 \times (\text{Hours} \times \text{Female}) \]
For males (gender = 0):
➡️ Effect of studying is \(\beta_1\)
For females (gender = 1):
➡️ Effect of studying is \(\beta_1 + \beta_3\)
🧠 Interactions change the slope — not just the intercept
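The by-group slopes reported below can be obtained with the marginaleffects package (a sketch, assuming the package is installed and `mod_gender_interact` is the model fitted above):

```r
library(marginaleffects)

# Average slope (dY/dX) of hours_studied, separately by gender
slopes(mod_gender_interact, variables = "hours_studied", by = "gender")
```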
Plugging in the estimates: for males the slope is \(\hat\beta_1 = 0.381\); for females it is \(\hat\beta_1 + \hat\beta_3 = 0.381 - 0.105 = 0.276\) points per hour.

| gender | Estimate | Std. Error | z | Pr(>\|z\|) | S | 2.5 % | 97.5 % |
|---|---|---|---|---|---|---|---|
| Male | 0.381 | 0.0532 | 7.15 | <0.001 | 40.1 | 0.276 | 0.485 |
| Female | 0.276 | 0.0605 | 4.55 | <0.001 | 17.5 | 0.157 | 0.394 |

Term: hours_studied
Type: response
Comparison: dY/dX
Interactions (moderators) reveal effect heterogeneity:
The relationship between a predictor and the outcome varies across levels of another variable.
✅ Visual clue: Non-parallel lines in scatterplots or slope plots
✅ Statistical clue: Significant interaction term in the model
🧮 Numeric × Numeric → slope changes continuously
👥 Categorical × Numeric → slope changes by group
👥 Categorical × Categorical → group differences depend on each other
Not all relationships are straight lines.
💡 In regression, we can capture this by including transformed variables,
like squared terms, logs, or polynomials.
To model this curve, we can include \(x^2\):
\[ \widehat{y} = \beta_0 + \beta_1 x + \beta_2 x^2 \]
Call:
lm(formula = score ~ hours + I(hours^2), data = d)
Residuals:
Min 1Q Median 3Q Max
-24.4906 -8.3426 -0.4449 8.8298 30.1122
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 65.610923 3.951576 16.604 < 2e-16 ***
hours 1.211540 0.228325 5.306 1.02e-06 ***
I(hours^2) -0.007482 0.002761 -2.709 0.00828 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 12.15 on 78 degrees of freedom
Multiple R-squared: 0.6062, Adjusted R-squared: 0.5961
F-statistic: 60.04 on 2 and 78 DF, p-value: < 2.2e-16
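With a quadratic term, the marginal effect of hours is no longer constant; differentiating the fitted equation gives
\[ \frac{d\,\widehat{\text{score}}}{d\,\text{hours}} = \beta_1 + 2\beta_2 \times \text{hours} \]
With the estimates above (\(\hat\beta_1 = 1.21\), \(\hat\beta_2 = -0.0075\)), each extra hour helps less and less, and the fitted curve peaks around \(1.212 / (2 \times 0.00748) \approx 81\) hours.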
| Transformation | Example | Typical pattern captured | Interpretation |
|---|---|---|---|
| Quadratic | \(x^2\) | Curvature (U-shape or inverted-U) | Effect grows or shrinks with \(x\); marginal effect changes linearly |
| Logarithm of x | \(\log(x)\) | Diminishing returns | A % change in \(x\) gives a constant change in \(y\) |
| Logarithm of y | \(\log(y)\) | Multiplicative effects / skewed outcomes | Coefficients become semi-elasticities: 1-unit change in \(x\) → %Δ in \(y\) |
Transformations reshape the functional form
Call:
lm(formula = exam_score ~ hours_studied + previous_gpa, data = data)
Residuals:
Min 1Q Median 3Q Max
-27.6294 -5.5828 0.2299 5.3861 26.1088
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 76.52508 2.10517 36.35 < 2e-16 ***
hours_studied 0.21709 0.03269 6.64 8.98e-11 ***
previous_gpa 0.40547 0.02545 15.93 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.234 on 453 degrees of freedom
Multiple R-squared: 0.4486, Adjusted R-squared: 0.4462
F-statistic: 184.3 on 2 and 453 DF, p-value: < 2.2e-16
Note the coefficient for hours_studied and the \(R^2\)
Call:
lm(formula = exam_score ~ workdays_studied + previous_gpa, data = data)
Residuals:
Min 1Q Median 3Q Max
-27.6294 -5.5828 0.2299 5.3861 26.1088
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 76.52508 2.10517 36.35 < 2e-16 ***
workdays_studied 1.73669 0.26154 6.64 8.98e-11 ***
previous_gpa 0.40547 0.02545 15.93 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.234 on 453 degrees of freedom
Multiple R-squared: 0.4486, Adjusted R-squared: 0.4462
F-statistic: 184.3 on 2 and 453 DF, p-value: < 2.2e-16
Note the coefficient for workdays_studied and the \(R^2\)
Call:
lm(formula = exam_score ~ workweeks_studied + previous_gpa, data = data)
Residuals:
Min 1Q Median 3Q Max
-27.6294 -5.5828 0.2299 5.3861 26.1088
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 76.52508 2.10517 36.35 < 2e-16 ***
workweeks_studied 8.68346 1.30769 6.64 8.98e-11 ***
previous_gpa 0.40547 0.02545 15.93 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.234 on 453 degrees of freedom
Multiple R-squared: 0.4486, Adjusted R-squared: 0.4462
F-statistic: 184.3 on 2 and 453 DF, p-value: < 2.2e-16
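The three outputs above are the same model in different units: rescaling a predictor rescales its coefficient by the same factor and leaves every fit statistic untouched. A minimal sketch with simulated data (variable names are illustrative):

```r
set.seed(1)
# Illustrative simulated data: the same predictor measured
# in hours vs. 8-hour workdays
df <- data.frame(hours = runif(100, 0, 60))
df$score <- 70 + 0.2 * df$hours + rnorm(100, sd = 5)
df$workdays <- df$hours / 8

m_hours <- lm(score ~ hours, data = df)
m_days  <- lm(score ~ workdays, data = df)

# Dividing x by 8 multiplies its coefficient by 8; fit is unchanged
coef(m_days)[["workdays"]] / coef(m_hours)[["hours"]]             # 8
all.equal(summary(m_hours)$r.squared, summary(m_days)$r.squared)  # TRUE
```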
This is why the unit of measurement is crucial whenever we interpret a coefficient as:
“…associated with a one-unit increase in (\(X_j\))…”
Centering is a simple transformation that helps with:
We “center” a variable by subtracting its mean:
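In R this is one line of arithmetic; a tiny sketch with made-up numbers:

```r
# Centering: subtract the mean so that 0 means "average"
x <- c(10, 20, 30)
x_centered <- x - mean(x)
x_centered        # -10   0  10
mean(x_centered)  # 0
```

The hours_centered variable in the model below would be built analogously, e.g. `data$hours_centered <- data$hours_studied - mean(data$hours_studied)` (name taken from the output; exact construction assumed).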
Call:
lm(formula = exam_score ~ hours_studied + previous_gpa, data = data)
Residuals:
Min 1Q Median 3Q Max
-27.6294 -5.5828 0.2299 5.3861 26.1088
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 76.52508 2.10517 36.35 < 2e-16 ***
hours_studied 0.21709 0.03269 6.64 8.98e-11 ***
previous_gpa 0.40547 0.02545 15.93 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.234 on 453 degrees of freedom
Multiple R-squared: 0.4486, Adjusted R-squared: 0.4462
F-statistic: 184.3 on 2 and 453 DF, p-value: < 2.2e-16
Call:
lm(formula = exam_score ~ hours_centered + previous_gpa, data = data)
Residuals:
Min 1Q Median 3Q Max
-27.6294 -5.5828 0.2299 5.3861 26.1088
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 86.16382 1.89758 45.41 < 2e-16 ***
hours_centered 0.21709 0.03269 6.64 8.98e-11 ***
previous_gpa 0.40547 0.02545 15.93 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.234 on 453 degrees of freedom
Multiple R-squared: 0.4486, Adjusted R-squared: 0.4462
F-statistic: 184.3 on 2 and 453 DF, p-value: < 2.2e-16
🧠 Note: only the intercept changes. It is now the predicted score at the average number of study hours (with previous_gpa = 0); the slopes, standard errors, and \(R^2\) are identical.
When you include an interaction term:
\[ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 (X_1 \times X_2) \]
centering prevents the main effects (\(\beta_1, \beta_2\)) from being interpreted at unrealistic values (e.g., when both predictors = 0).
✅ Often good: Center continuous variables before testing interactions.
It does not change model fit or significance — only makes interpretation clearer and more stable.
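This invariance is easy to verify. A sketch with simulated data (names illustrative): centering both predictors leaves the interaction coefficient and the fit unchanged; only the intercept and "main effect" coefficients shift, because they are now evaluated at the means.

```r
set.seed(2)
# Illustrative simulated data with a true interaction
df <- data.frame(x1 = runif(200, 0, 40), x2 = runif(200, 20, 100))
df$y <- 50 + 0.4 * df$x1 + 0.5 * df$x2 - 0.003 * df$x1 * df$x2 + rnorm(200, sd = 5)

m_raw <- lm(y ~ x1 * x2, data = df)

df$x1c <- df$x1 - mean(df$x1)
df$x2c <- df$x2 - mean(df$x2)
m_ctr <- lm(y ~ x1c * x2c, data = df)

# Same interaction coefficient, same fit; only beta_0, beta_1, beta_2 move
all.equal(coef(m_raw)[["x1:x2"]], coef(m_ctr)[["x1c:x2c"]])    # TRUE
all.equal(summary(m_raw)$r.squared, summary(m_ctr)$r.squared)  # TRUE
```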