{did2s}: Two-Stage Difference-in-Differences

Kyle Butts and John Gardner

R Journal, 2022

We present statistical software to implement imputation estimator for difference-in-differences.


Recent work has highlighted the difficulties of estimating difference-in-differences models when treatment timing occurs at different times for different units. This article introduces the R package {did2s} which implements the estimator introduced in Gardner (2021). The article provides an approachable review of the underlying econometric theory and introduces the syntax for the function did2s. Further, the package introduces a function, event_study, that provides a common syntax for all the modern event-study estimators and plot_event_study to plot the results of each estimator.

5-Minute Summary

The standard difference-in-differences estimator is modeled using the Two-way Fixed Effects Model. Unit ii and time tt has potential outcome:

yit=μi+ηt+τDit+εit,y_{it} = \mu_i + \eta_t + \tau D_{it} + \varepsilon_{it},

where DitD_{it} is an idicator variable that equals one when unit ii is currently experiencing treatment at time tt. Researchers aim to estimate the effects of treatment and summarize it as the Average Treatment Effect on the Treatment, τ\tau.

If the two-way fixed effects model is correctly specified and the parallel trends assumption is satisfied, then OLS is fine and even BLUE! (woohoo 🎉🎉) So what’s with all these new Diff-in-Diff doesn’t work well problems?

Problem 1: τ\tau is not constant

The first problem is that τ\tau is not typically constant across units and over time:

  • Treatment effects may depend on when you start treatment. For example, groups that benefit more from a policy implement it earlier

  • Treatment effects may depend on treatment duration like an event-study. For example, policy effects slowly take hold as the policy is adopted broadly.

Our TWFE model clearly needs to be enriched:

yit=μi+ηt+τgtDit+εit,y_{it} = \mu_i + \eta_t + \tau_{gt} D_{it} + \varepsilon_{it},

where now we have what the literature calls the Group-Time Average Treatment Effect, τgt\tau_{gt}. This captures the two forms of treatment effect heterogeneity described above. The treatment effect size to depend on when you are treated, group gg, and how many periods it has been since treatment determined by tgt − g. It will prove really hard to accurately estimate any particular τgt\tau_{gt}, so researchers will try to estimate the Overall Average Treatment Effect:

τ1Ngtτgt,\tau \equiv \frac{1}{N_{gt}}\sum \tau_{gt},

the average of τgt\tau_{gt} across all (g,t)(g,t) observed in the sample.

Estimating Overall Average Treatment Effect

If we knew μi\mu_i and ηt\eta_t, then we could move terms around in our model and have:

yitμiηt=τgtDit+εit.y_{it} - \mu_i - \eta_t = \tau_{gt} D_{it} + \varepsilon_{it}.

Then, if we ran this regression of an outcome variable (yitμiηty_{it} - \mu_i - \eta_t) on an indicator variable DitD_{it}, then OLS algebra says that τ^\hat{\tau} will take the average of τgt\tau_{gt} for us and estimate the overall average treatment effect τ\tau.

Too bad we don’t know μi\mu_i and ηt\eta_t… Since we don’t know them, why don’t we instead estimate them? From the Frisch-Waugh-Lovell (FWL) theorem, we have:

yitμ^iη^t=τgtD~it+εit,y_{it} - \hat{\mu}_i - \hat{\eta}_t = \tau_{gt} \tilde{D}_{it} + \varepsilon_{it},

where everything looks about the same, but now μ^i\hat{\mu}_i and η^t\hat{\eta}_t are estimated and D~it\tilde{D}_{it} is the treatment indicator after being residualized by unit and time fixed effects. The left-hand side is a good estimate for τgt\tau_{gt}, so what’s the problem then? This residualized treatment variable is actually the source of all the problems you’ve heard about. No really, go look at the proofs in those papers, you’ll see D~it\tilde{D}_{it} right away.

The problem is that since D~it\tilde{D}_{it} is no longer a simple dummy variable, OLS no longer estimates the simple average of τgt\tau_{gt}.

Instead, OLS estimates

τ^=wgtτgtτ,\hat{\tau} = \sum w_{gt} \tau_{gt} \neq \tau,

where the weights wgtw_{gt} are quite strange, but sum to 1. I’ll leave it to the paper to describe the weights, but no this: In some cases, τ^\hat{\tau} can be the opposite sign of the overall average treatment effect, τ\tau 🚩🚩!

Okay, so why don’t you just regress yitμ^iη^ty_{it} - \hat{\mu}_i - \hat{\eta}_t on DitD_{it}? Great idea! You just thought of the two-stage difference-in-differences estimator (and Borusyak et. al.’s estimator too)

Two-stage Difference-in-Differences estimator

The above intution (hopefully) has lead us directly to the two-stage differences estimator as proposed by Gardner (2021). The two stages are as follows:

  • Stage 1: Estimate μi\mu_i and ηt\eta_t using untreated/not-yet-treated observations (Dit=0D_{it} = 0). Don’t include Dit=1D_{it} = 1 since the treatment effects will be partially absorbed by unit and time fixed effects.

  • Stage 2: Regress yitμ^iη^ty_{it} - \hat{\mu}_i - \hat{\eta}_t on DitD_{it}, not Dit~\tilde{D_{it}}!.

Inference is a bit more complicated in this two-step estimator since your “outcome variable” is a noisy estimated object. Valid inference needs to account for this first-stage estimation. The details are in the paper and done properly in the {did2s} package.


The paper then goes on to show off the {did2s} package and provide some practical advice when implementing the estimator. For example, we will show how {did2s} works for a dataset of simulated data with a treatment effect of 2.


static = did2s(
  yname = "dep_var", treatment = "treat", cluster_var = "state",
  first_stage = ~ 0 | unit + year,
  second_stage = ~ i(treat, ref=FALSE), 
  verbose = TRUE
Running Two-stage Difference-in-Differences
• first stage formula `~ 0 | unit + year`
• second stage formula `~ i(treat, ref = FALSE)`
• The indicator variable that denotes when treatment is on is `treat`
• Standard errors will be clustered by `state`

Since the package returns a fixest object, you can use the suite of exporting tools that fixest provides, like esttable and iplot/coefplot. This makes, for example, exporting latex tables and event-study plots super simple.

Dependent Var.:           dep_var
treat = TRUE    2.025*** (0.0307)
_______________ _________________
S.E. type                  Custom
Observations               31,000
R2                        0.31846
Adj. R2                   0.31846

Alternatively, if we use relative year indicator variables in the second-stage, we estimate event-study type coefficients.

dynamic = did2s(df_hom,
    yname = "dep_var", treatment = "treat", cluster_var = "state",
    first_stage = ~ 0 | unit + year,
    second_stage = ~ i(rel_year, ref=c(-1, Inf)),
    verbose = FALSE

Then, plutting an event study is super simple with fixest::coefplot.

# plot rel_year coefficients and standard errors
Two-stage Difference-in-Differences
Figure:Two-stage Difference-in-Differences