A note on did2s syntax
Being the author of the did2s
package, I get a lot of emails from folks asking how to properly specify a first_stage
and second_stage
argument.
A typical usage will look like:
library(did2s)did2s( data = df, yname = "y", first_stage = ~ 0 | unit + year, second_stage = ~ i(post * treated), treatment = "treat", cluster_var = "state")
It is equivalent to the following:
library(fixest)fs <- feols( y ~ 0 | unit + year, data = subset(df, treat == 0))df$y_diff <- df$y - predict(fs, newdata = df)ss <- feols( y_diff ~ i(post * treated), data = df)
Note the first-stage is run only using observations where the treatment
variable takes the value 0.
These should be the untreated and not-yet-treated observations ().
The first_stage
argument is therefore your model for untreated potential outcomes, .
So, this could include things like state-specific linear trends, time-invariant covariates interacted with time-dummies, etc.
But, it should not include any treatment variables (these are untreated ).
Next, you use your estimated model (predicting ) to forecast for the entire sample including post-treatment observations with .
The second_stage
is what you regress on.
If you observed this , this variable would take the value 0 for all observations and would take the value for all observations.
Therefore, the second_stage
is your model for how you want to summarize (estimated) unit-time treatment effects.
This could be things like a treat x post variable, event-study indicators, or an interaction between a treat x post and some discrete variable (like gender).
Of course, when you take these averages you want there to be a large number of observations (so a central limit theorem can kick in).
In summary:
first_stage
tells you the model for you estimate with thetreatment
= 0 group.second_stage
tells you how you want to summarize estimated treatment effects