Multiple Imputation of Multivariate Regression Discontinuity Estimation
mrd_impute.Rd
mrd_impute
estimates treatment effects in a multivariate regression discontinuity design (MRDD) with imputed missing values.
Usage
mrd_impute(
formula,
data,
subset = NULL,
cutpoint = NULL,
bw = NULL,
front.bw = NA,
m = 10,
k = 5,
kernel = "triangular",
se.type = "HC1",
cluster = NULL,
impute = NULL,
verbose = FALSE,
less = FALSE,
est.cov = FALSE,
est.itt = FALSE,
local = 0.15,
ngrid = 250,
margin = 0.03,
boot = NULL,
method = c("center", "univ", "front"),
t.design = NULL,
stop.on.error = TRUE
)
Arguments
- formula
The formula of the MRDD; a symbolic description of the model to be fitted. This is supplied in the format of
y ~ x1 + x2
for a simple sharp MRDD ory ~ x1 + x2 | c1 + c2
for a sharp MRDD with two covariates. A fuzzy MRDD may be specified asy ~ x1 + x2 + z
wherex1
is the first running variable,x2
is the second running variable, andz
is the endogenous treatment variable. Covariates are then included in the same manner as in a sharp MRDD.- data
An optional data frame containing the variables in the model. If not found in
data
, the variables are taken fromenvironment(formula)
.- subset
An optional vector specifying a subset of observations to be used in the fitting process.
- cutpoint
A numeric vector of length 2 containing the cutpoints at which assignment to the treatment is determined. The default is
c(0, 0)
.- bw
A vector specifying the bandwidths at which to estimate the RD. Possible values are
"IK09"
,"IK12"
, and a user-specified non-negative numeric vector specifying the bandwidths at which to estimate the RD. The default is"IK12"
. Ifbw
is"IK12"
, the bandwidth is calculated using the Imbens-Kalyanaraman 2012 method. Ifbw
is"IK09"
, the bandwidth is calculated using the Imbens-Kalyanaraman 2009 method. Then the RD is estimated with that bandwidth, half that bandwidth, and twice that bandwidth. If only a single value is passed into the function, the RD will similarly be estimated at that bandwidth, half that bandwidth, and twice that bandwidth.- front.bw
A non-negative numeric vector of length 3 specifying the bandwidths at which to estimate the RD for each of three effects models (complete model, heterogeneous treatment model, and treatment only model) detailed in Wong, Steiner, and Cook (2013). If
NA
,front.bw
will be determined by cross-validation. The default isNA
.- m
A non-negative integer specifying the number of uniformly-at-random samples to draw as search candidates for
front.bw
, iffront.bw
isNA
. The default is 10.- k
A non-negative integer specifying the number of folds for cross-validation to determine
front.bw
, iffront.bw
isNA
. The default is 5.- kernel
A string indicating which kernel to use. Options are
"triangular"
(default and recommended),"rectangular"
,"epanechnikov"
,"quartic"
,"triweight"
,"tricube"
, and"cosine"
.- se.type
This specifies the robust standard error calculation method to use, from the "sandwich" package. Options are, as in
vcovHC
,"HC3"
,"const"
,"HC"
,"HC0"
,"HC1"
,"HC2"
,"HC4"
,"HC4m"
,"HC5"
. The default is"HC1"
. This option is overridden bycluster
.- cluster
An optional vector of length n specifying clusters within which the errors are assumed to be correlated. This will result in reporting cluster robust SEs. This option overrides anything specified in
se.type
. It is suggested that data with a discrete running variable be clustered by each unique value of the running variable (Lee and Card, 2008).- impute
An optional vector of length n containing a grouping variable that specifies the imputed variables with missing values.
- verbose
A logical value indicating whether to print additional information to the terminal. The default is
FALSE
.- less
Logical. If
TRUE
, return the estimates of parametric linear and optimal bandwidth non-parametric models only. IfFALSE
return the estimates of linear, quadratic, and cubic parametric models and optimal, half and double bandwidths in non-parametric models. The default isFALSE
.- est.cov
Logical. If
TRUE
, the estimates of covariates will be included. IfFALSE
, the estimates of covariates will not be included. The default isFALSE
. This option is not applicable if method is"front"
.- est.itt
Logical. If
TRUE
, the estimates of intent-to-treat (ITT) will be returned. IfFALSE
, the estimates of ITT will not be returned. The default isFALSE
. This option is not applicable if method is"front"
.- local
A non-negative numeric value specifying the range of neighboring points around the cutoff on the standardized scale, for each assignment variable. The default is 0.15.
- ngrid
A non-negative integer specifying the number of non-zero grid points on each assignment variable, which is also the number of zero grid points on each assignment variable. The default is 250. The value used in Wong, Steiner and Cook (2013) is 2500, which may cause long computational time.
- margin
A non-negative numeric value specifying the range of grid points beyond the minimum and maximum of sample points on each assignment variable. The default is 0.03.
- boot
An optional non-negative integer specifying the number of bootstrap samples to obtain standard error of estimates. This argument is not optional if method is
"front"
.- method
A string specifying the method to estimate the RD effect. Options are
"center"
,"univ"
,"front"
, based on the centering, univariate, and frontier approaches (respectively) from Wong, Steiner, and Cook (2013).- t.design
A character vector of length 2 specifying the treatment option according to design. The first entry is for
x1
and the second entry is forx2
. Options are"g"
(treatment is assigned ifx1
is greater than its cutoff),"geq"
(treatment is assigned ifx1
is greater than or equal to its cutoff),"l"
(treatment is assigned ifx1
is less than its cutoff), and"leq"
(treatment is assigned ifx1
is less than or equal to its cutoff). The same options are available forx2
.- stop.on.error
A logical value indicating whether to remove bootstraps which cause error in the
integrate
function. IfTRUE
, bootstraps which cause error are removed and resampled until the specified number of bootstrap samples are acquired. IfFALSE
, bootstraps which cause error are not removed. The default isTRUE
.
Value
mrd_impute
returns an object of class "mrd
" or "mrdi"
for "front"
method.
The function summary
is used to obtain and print a summary of the
estimated regression discontinuity. The object of class mrd
is a list
containing the following components for each estimated treatment effect,
tau_MRD
or tau_R
and tau_M
:
- call
The matched call.
- type
A string denoting either
"sharp"
or"fuzzy"
RDD.- cov
The names of covariates.
- bw
Numeric vector of each bandwidth used in estimation.
- obs
Vector of the number of observations within the corresponding bandwidth.
- model
For a sharp design, a list of the
lm
objects is returned. For a fuzzy design, a list of lists is returned, each with two elements:firststage
, the first stagelm
object, andiv
, theivreg
object. A model is returned for each parametric and non-parametric case and corresponding bandwidth.- frame
Returns the model frame used in fitting.
- na.action
The observations removed from fitting due to missingness.
- est
Numeric vector of the estimate of the discontinuity in the outcome under a sharp MRDD or the Wald estimator in the fuzzy MRDD, for each corresponding bandwidth.
- d
Numeric vector of the effect size (Cohen's d) for each estimate.
- se
Numeric vector of the standard error for each corresponding bandwidth.
- z
Numeric vector of the z statistic for each corresponding bandwidth.
- df
Numeric vector of the degrees of freedom computed using Barnard and Rubin (1999) adjustment for imputation.
- p
Numeric vector of the p-value for each corresponding bandwidth.
- ci
The matrix of the 95 for each corresponding bandwidth.
- impute
A logical value indicating whether multiple imputation is used or not.
References
Wong, V. C., Steiner, P. M., Cook, T. D. (2013). Analyzing regression-discontinuity designs with multiple assignment variables: A comparative study of four estimation methods. Journal of Educational and Behavioral Statistics, 38(2), 107-141. https://journals.sagepub.com/doi/10.3102/1076998611432172.
Lee, D. S., Lemieux, T. (2010). Regression Discontinuity Designs in Economics. Journal of Economic Literature, 48(2), 281-355. doi:10.1257/jel.48.2.281 .
Lee, D. S., Card, D. (2010). Regression discontinuity inference with specification error. Journal of Econometrics, 142(2), 655-674. doi:10.1016/j.jeconom.2007.05.003 .
Barnard, J., Rubin, D. (1999). Small-Sample Degrees of Freedom with Multiple Imputation. Biometrika, 86(4), 948-55.
Examples
set.seed(12345)
x1 <- runif(300, -1, 1)
x2 <- runif(300, -1, 1)
cov <- rnorm(300)
y <- 3 + 2 * (x1 >= 0) + 3 * cov + 10 * (x2 >= 0) + rnorm(300)
imp <- rep(1:3, each = 100)
# all examples below have smaller numbers of m to keep run-time low
# centering
mrd_impute(y ~ x1 + x2 | cov, impute = imp, method = "center", t.design = c("geq", "geq"), m = 3)
#> $call
#> mrd_impute(formula = y ~ x1 + x2 | cov, m = 3, impute = imp,
#> method = "center", t.design = c("geq", "geq"))
#>
#> $center
#> $center$tau_MRD
#>
#> Call:
#> rd_est(formula = Y ~ X | cov, data = data, subset = NULL, cutpoint = 0,
#> bw = NULL, kernel = "triangular", se.type = "HC1", cluster = cluster,
#> verbose = FALSE, less = FALSE, est.cov = FALSE, est.itt = FALSE,
#> t.design = "leq")
#>
#> Coefficients:
#> Linear Quadratic Cubic Opt Half-Opt Double-Opt
#> 6.679 5.056 6.532 6.534 6.126 6.348
#>
#>
#>
#> attr(,"class")
#> [1] "mrd"
# univariate
mrd_impute(y ~ x1 + x2 | cov, impute = imp, method = "univ", t.design = c("geq", "geq"), m = 3)
#> $call
#> mrd_impute(formula = y ~ x1 + x2 | cov, m = 3, impute = imp,
#> method = "univ", t.design = c("geq", "geq"))
#>
#> $univ
#> $univ$tau_R
#>
#> Call:
#> rd_est(formula = Y ~ X1 | cov, data = data, subset = subset1,
#> cutpoint = 0, bw = NULL, kernel = "triangular", se.type = "HC1",
#> cluster = cluster, verbose = FALSE, less = FALSE, est.cov = FALSE,
#> est.itt = FALSE, t.design = "geq")
#>
#> Coefficients:
#> Linear Quadratic Cubic Opt Half-Opt Double-Opt
#> 1.792 1.844 1.619 1.864 1.844 1.796
#>
#>
#> $univ$tau_M
#>
#> Call:
#> rd_est(formula = Y ~ X2 | cov, data = data, subset = subset2,
#> cutpoint = 0, bw = NULL, kernel = "triangular", se.type = "HC1",
#> cluster = cluster, verbose = FALSE, less = FALSE, est.cov = FALSE,
#> est.itt = FALSE, t.design = "geq")
#>
#> Coefficients:
#> Linear Quadratic Cubic Opt Half-Opt Double-Opt
#> 9.326 10.113 11.082 10.037 10.237 9.521
#>
#>
#>
#> attr(,"class")
#> [1] "mrd"
# frontier - don't run due to computation time
if (FALSE) mrd_impute(y ~ x1 + x2 | cov, impute = imp, method = "front",
boot = 1000, t.design = c("geq", "geq"), m = 3)