A note on did2s syntax

Kyle Butts

As the author of the did2s package, I get a lot of emails asking how to properly specify the first_stage and second_stage arguments. A typical call looks like:

library(did2s)
did2s(
  data = df,
  yname = "y",
  first_stage = ~ 0 | unit + year,
  second_stage = ~ i(post * treated),
  treatment = "treat",
  cluster_var = "state"
)

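It produces the same point estimates as the following (did2s additionally corrects the standard errors for the fact that the second-stage outcome is built from first-stage estimates):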

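library(fixest)

# First stage: fit the model for Y(0) on untreated observations only
fs <- feols(
  y ~ 0 | unit + year,
  data = subset(df, treat == 0)
)

# Residualize the outcome for the full sample using the first-stage forecast
df$y_diff <- df$y - predict(fs, newdata = df)

# Second stage: regress the residualized outcome on the treatment terms
ss <- feols(
  y_diff ~ i(post * treated),
  data = df
)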

Note that the first stage is run using only observations where the treatment variable takes the value 0. These should be the untreated and not-yet-treated observations ($d_{it} = 0$). The first_stage argument is therefore your model for untreated potential outcomes, $Y_{it}(0)$. So, this could include things like state-specific linear trends, time-invariant covariates interacted with time dummies, etc. But it should not include any treatment variables (these are untreated $Y$).
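As a rough sketch of what such first-stage specifications might look like, here are three formulas written with fixest's `i()` helper. The column names `state` and `x` are hypothetical, and whether a particular specification is supported by did2s's standard-error computation is worth checking against the package documentation before relying on it.

```r
# Hypothetical column names: unit, year, state, and x (a time-invariant covariate).
# These are plain formula objects; nothing is estimated here.

# Baseline: unit and year fixed effects only
fs_twfe <- ~ 0 | unit + year

# State-specific linear trends: i(state, year) interacts the numeric
# year variable with a dummy for each state
fs_trends <- ~ i(state, year) | unit + year

# Time-invariant covariate interacted with year dummies
fs_covs <- ~ i(year, x) | unit + year
```

None of these formulas contains the treatment variable, which is the point: each one only describes how untreated outcomes evolve.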

Next, you use your estimated model for $Y_{it}(0)$ to forecast untreated outcomes for the entire sample, including post-treatment observations with $d_{it} = 1$.
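Why the forecast is useful can be seen in one line. Assuming the usual potential-outcomes notation, where the observed outcome is $Y_{it} = Y_{it}(0) + \tau_{it} d_{it}$ and $\tau_{it}$ denotes the unit-time treatment effect, subtracting the forecast gives:

```latex
Y_{it} - \hat{Y}_{it}(0)
  = \tau_{it} \, d_{it} + \bigl( Y_{it}(0) - \hat{Y}_{it}(0) \bigr)
```

The term in parentheses is the first-stage forecast error. If the first-stage model is well specified, it is noise around zero, so the residualized outcome is a noisy measure of $\tau_{it} d_{it}$.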

The second_stage is what you regress $Y_{it} - \hat{Y}_{it}(0)$ on. If you observed the true $Y_{it}(0)$, this variable would take the value 0 for all $d_{it} = 0$ observations and the value $\tau_{it}$ for all $d_{it} = 1$ observations. Therefore, the second_stage is your model for how you want to summarize the (estimated) unit-time treatment effects. This could be things like a treat x post variable, event-study indicators, or an interaction between treat x post and some discrete variable (like gender). Of course, when you take these averages, you want each one to be based on a large number of observations (so a central limit theorem can kick in).
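To see both stages at work, here is a small simulation sketch. It uses base R's `lm()` with factor dummies instead of fixest so that it runs without any packages, and every name and parameter value in it (the panel dimensions, the treatment date, `tau = 2`) is an illustrative assumption rather than anything from did2s itself.

```r
# Simulated balanced panel; all names and values are illustrative assumptions
set.seed(1)
n_units <- 200
years   <- 1:10
df <- expand.grid(unit = 1:n_units, year = years)

# Half the units become treated in year 6; the rest are never treated
df$treated <- as.integer(df$unit <= n_units / 2)
df$post    <- as.integer(df$year >= 6)
df$treat   <- df$treated * df$post   # this is d_it

# Outcome: unit effect + year effect + constant treatment effect + noise
tau     <- 2
unit_fe <- rnorm(n_units)
year_fe <- rnorm(length(years))
df$y <- unit_fe[df$unit] + year_fe[df$year] + tau * df$treat + rnorm(nrow(df))

# First stage: two-way fixed effects model for Y(0), fit on d_it = 0 only
fs <- lm(y ~ factor(unit) + factor(year), data = subset(df, treat == 0))

# Forecast Y(0) for every observation and residualize the outcome
df$y_diff <- df$y - predict(fs, newdata = df)

# Second stage: a single average effect for all treated observations
ss <- lm(y_diff ~ 0 + treat, data = df)
coef(ss)[["treat"]]

# Average residualized outcome by treatment status
tapply(df$y_diff, df$treat, mean)
```

The residualized outcome averages to zero among the untreated observations and to roughly `tau` among the treated ones. The standard errors reported by this second-stage `lm()` ignore the first-stage estimation, so they should not be used for inference.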

In summary:

  1. first_stage specifies the model for $Y_{it}(0)$, which is estimated on the treatment = 0 observations.
  2. second_stage specifies how you want to summarize the estimated treatment effects.
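As a sketch of how the second point plays out in practice, the formulas below show three ways one might summarize treatment effects using fixest's `i()` helper. The column names `rel_year` (years relative to treatment, with `Inf` for never-treated units) and `female` are hypothetical, so adapt them to your data.

```r
# Hypothetical column names: treat (d_it), rel_year (years since treatment,
# Inf for never-treated units), female (a 0/1 group indicator).
# These are plain formula objects; nothing is estimated here.

# One overall average effect: a single coefficient on the treatment indicator
ss_static <- ~ i(treat, ref = FALSE)

# Event study: one coefficient per relative year, with -1 and the
# never-treated (Inf) as the omitted categories
ss_event <- ~ i(rel_year, ref = c(-1, Inf))

# Separate average effects by group: treat interacted with each level of female
ss_by_group <- ~ i(female, treat)
```

Each of these can be paired with the same first_stage, since changing how effects are summarized does not change the model for untreated outcomes.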