ContinuousRD

ContinuousRD is a package that implements regression discontinuity designs when the treatment variable is continuous following Dong, Lee, and Gou (2022)

Installation

You can install the development version of ContinuousRD from GitHub with:

r
# install.packages("devtools")
devtools::install_github("kylebutts/ContinuousRD")

Setting

I will provide a quick (and simplified) overview of the methods from Dong, Lee, and Gou (2022). For the full details, please see the original text. I’m purposefully leaving out details since hopefully you are here after reading the paper :-)

Let YY denote the outcome variable and TT a continuous variable of interest. There exists a running variable RR and a cutoff cc (which we’ll normalize to zero) that affects the treatment TT. The paper assumes that different quantiles of TT are differentially affected by the discontinuity.

For example, in the paper they discuss a minimum capital requirement for banks in cities with population over 3,000 in 1900. The running variable would be population and the treatment variable is banking capital. Clearly, only banks with counterfactual banking capital below the threshold are affected by the population cutoff. Therefore the jump in TT will be zero for most banks, but positive for banks at the lower end of the counterfactual distribution of TT.

For this reason, the estimator proposed in the paper wants to estimate the LATE at different quantiles of the distribution of TT, which they denote as the Quantile-LATE or QLATE for short. The method proceeds as follows.

For a given quantile of TT, denoted u(0,1)u \in (0,1), estimate the “first-stage” RD of TT. Define q+(u)q^+(u) be the uthu^{th} percentile of TT under treatment at R=0R = 0 and q(u)q^-(u) be the uthu^{th} percentile of TT under no treatment at R=0R = 0. The RD estimate of TT would therefore estimate This will estimate E[q+(u)q(u)U=u,R=0]\mathbb{E}[ q^+(u) - q^-(u) \vert U = u, R = 0]. Intuitively, this gives the jump in the continuous treatment for units at the uthu^{th} percentile of the distribution.

Then, the reduced form involves running an RD of YY using observations to the right of the cutoff with values of TT near T1(u)T_1(u) and to the left of the cutoff with values of TT near T0(u)T_0(u). The left and right limits in the RD would respectively identify

m(u)=E[YT=q(u),R=0] and m+(u)=E[YT=q+(u),R=0]m^-(u) = \mathbb{E}[ Y | T = q^-(u), R = 0] \text{ and } m^+(u) = \mathbb{E}[ Y | T = q^+(u), R = 0]

The difference between these two terms would estimate the jump in YY for units near the uthu^{th} percentile of TT. The QLATE estimate will then be the ratio of these two quantities. Intuitively it’s the jump in YY rescaled by the jump in TT. Note that only quantiles that have a significant jump q+(u)q(u)0q^+(u) - q^-(u) \neq 0 are used to form estimates (since otherwise the denominator is 0).

A visual example using a made-up DGP similar to the paper’s main example is shown here for a given quantile u=0.1u = 0.1. The left-figure shows the estimated jump in TT. The right-figure shows the estimated jump in YY at value of q(u)q^-(u) and q+(u)q^+(u). The ratio of these two estimates would be the estimated treatment effect and can be interpreted as the jump in YY per unit increase in TT.

Example

I will use a set of simulated data as an illustration of the method

r
library(ContinuousRD)
#> Loading required package: data.table
# Load data
data(sim_data)
head(sim_data)
#> Key: <id>
#> id u R Z T Y
#> <int> <num> <num> <num> <num> <num>
#> 1: 1 0.2002145 -0.71892650 0 0.6673815 0.6652777
#> 2: 2 0.6852186 0.19781138 1 1.3139139 1.3141626
#> 3: 3 0.9168758 0.03111523 1 1.4876568 1.4881608
#> 4: 4 0.2843995 0.64044390 1 1.0132996 1.0165819
#> 5: 5 0.1046501 -0.75736198 0 0.3488338 0.3474286
#> 6: 6 0.7010575 0.06593254 1 1.3257931 1.3291840

The data consists of 10000 observations each of which is a unique unit. The variable Y is the outcome variable, T is the continuous treatment of interest, R is the running variable with a cutoff of 00. Positive values of the running variable are affected by the border, in this case a binding lower bound on T. The variable u is the (typically unobserved) percentile of each unit for the variable T.

The below data shows the observed continuous treatment along the running variable. The color of the dot represents the quantile of T at that value of R. You can see a clear jump at the border where T is affected by the binding constraint.

r
library(ggplot2)
ggplot(sim_data) +
geom_point(
aes(x = R, y = T, color = u),
alpha = 0.4
) +
geom_vline(xintercept = 0, color = "red") +
labs(
x = "Running Variable",
y = "Continuous Treatment"
) +
theme_minimal()

However, this is only binding for units at the lower end of the distribution. To see this more clearly, let’s look at multiple RDs at different quantiles of T.

r
library(rdrobust)
low_q = sim_data[sim_data$u > 0 & sim_data$u < 0.2,]
high_q = sim_data[sim_data$u > 0.5 & sim_data$R < 0.7,]
rdrobust::rdplot(
y = low_q$T, x = low_q$R,
title = "Low quantile of T", y.label = "Continuous Treatment", x.label = "Running Variable"
)
r
rdrobust::rdplot(
y = high_q$T, x = high_q$R,
title = "High quantile of T", y.label = "Continuous Treatment", x.label = "Running Variable"
)

The data is simulated to have units with quantile below 0.30.3 be affected by the cutoff (the magnitude of the change in T depends on quantile u) and those units experience a treatment effect of size 11. Let’s see if we get that using the QLATE_bc_se function:

r
res <- ContinuousRD::QLATE_bc_se(sim_data, qlist = seq(0.1, 0.9, by = 0.1), yname = "Y", tname = "T", rname = "R", c = 0)
#> `qlist`: 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, and 0.9
#> `tau_u`: 1, 1.001, NA, NA, NA, NA, NA, NA, and NA
#> `SEtau`: 0.005, 0.018, NA, NA, NA, NA, NA, NA, and NA
#> `tau_u_bc`: 0.992, 0.985, NA, NA, NA, NA, NA, NA, and NA
#> `SEtau_bc`: 2.384, 7.942, NA, NA, NA, NA, NA, NA, and NA
#> `pi`: 1.001
#> `SEpi`: 0.004
#> `pi_bc`: 0.991
#> `SEpi_bc`: 0.359

In the printed results a value of NA appears when no significant jump in the treatment variable is found at the cutoff. In this case, since the jump around the 30th percentile is near zero, it doesn’t show up in the estimate. However, we see there the effect is around 11 as expected for the 10th and 20th percentile. The overall average effect pi is also approximately 11 as well.