McCrary Sorting Test: dc_test

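dc_test implements the McCrary (2008) sorting test, which checks whether the density of the running variable jumps at the cutpoint. Such a jump suggests that units manipulated (sorted on) the running variable, violating the assignment rule of a regression discontinuity design. It is based on the DCdensity function in the "rdd" package.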
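In the notation of McCrary (2008), with f the density of the running variable and c the cutpoint (the symbols f, c, f^+ and f^- are introduced here only for illustration), the quantity being tested can be written as follows. McCrary estimates the two one-sided limits by local linear smoothing of a histogram of the running variable, which is why the function takes both a binwidth and a bandwidth.

```latex
\theta = \ln f^{+} - \ln f^{-}, \qquad
f^{+} = \lim_{r \downarrow c} f(r), \qquad
f^{-} = \lim_{r \uparrow c} f(r)

H_0\colon \theta = 0 \quad \text{(density continuous at } c\text{, i.e. no sorting)}
```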

Usage

dc_test(
  runvar,
  cutpoint,
  bin = NULL,
  bw = NULL,
  verbose = TRUE,
  plot = TRUE,
  ext.out = FALSE,
  htest = FALSE,
  level = 0.95,
  digits = max(3, getOption("digits") - 3),
  timeout = 30
)

Arguments

runvar: A numeric vector containing the running variable.
cutpoint: A numeric value containing the cutpoint at which assignment to the treatment is determined. The default is 0.
bin: A numeric value containing the binwidth. The default is 2*sd(runvar)*length(runvar)^(-.5).
bw: A numeric value containing the bandwidth to use. If no bandwidth is supplied, the bandwidth is selected using the procedure in McCrary (2008).
verbose: A logical value indicating whether to print diagnostic information to the terminal. The default is TRUE.
plot: A logical value indicating whether to plot the histogram and the density estimates. The default is TRUE. The user may wrap this function in additional graphical options to modify the plot.
ext.out: A logical value indicating whether to return extended output. The default is FALSE. When FALSE, dc_test returns only the p-value of the test but still prints additional information. When TRUE, dc_test returns and prints the additional information documented under Value.
htest: A logical value indicating whether to return an "htest" object compatible with base R's hypothesis test output. The default is FALSE.
level: A numerical value between 0 and 1 specifying the confidence level for confidence intervals. The default is 0.95.
digits: A non-negative integer specifying the number of digits to display in all output. The default is max(3, getOption("digits") - 3).
timeout: A non-negative numerical value specifying the maximum number of seconds that expressions in the function are allowed to run. The default is 30. Specify Inf to run all expressions to completion.
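The default binwidth used when bin = NULL can be reproduced from the formula given for the bin argument. The helper default_bin below is hypothetical and not part of the package; it is a minimal sketch that simply evaluates 2*sd(runvar)*length(runvar)^(-.5).

```r
# Hypothetical helper (not exported by the package): evaluates the documented
# default binwidth 2 * sd(runvar) * length(runvar)^(-1/2).
default_bin <- function(runvar) {
  2 * sd(runvar) * length(runvar)^(-0.5)
}

# Same seed and draws as the first example under Examples.
# For uniform draws on [-1, 1], sd is about 1 / sqrt(3), roughly 0.577,
# so with n = 1000 the default binwidth is roughly 0.036.
set.seed(12345)
x <- runif(1000, -1, 1)
default_bin(x)
```

Assuming dc_test applies this formula to the full running variable, the result can be compared with the Binwidth line printed in the first example.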

Value

If ext.out is FALSE, dc_test returns a numeric value specifying the p-value of the McCrary (2008) sorting test. Additional output is enabled when ext.out is TRUE. In this case, dc_test returns a list with the following elements:

theta: The estimated difference in the log heights of the density at the cutpoint (right side minus left side).
se: The standard error of theta.
z: The z statistic of the test.
p: The p-value of the test. A p-value below the chosen significance level means the user can reject the null hypothesis of no sorting, that is, that the density of the running variable is continuous at the cutpoint.
binsize: The calculated size of bins for the test.
bw: The calculated bandwidth for the test.
cutpoint: The cutpoint used.
data: A data frame describing the binning of the histogram. Its columns are cellmp (the midpoint of each cell) and cellval (the normalized height of each cell).
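The z, p and confidence-limit columns printed in the examples appear consistent with a two-sided normal (Wald) test of theta = 0, with limits theta plus or minus qnorm(1 - (1 - level)/2) * se. The sketch below makes that assumption explicit; dc_summary is a hypothetical helper, not a package function.

```r
# Hypothetical helper: derives z, p and confidence limits from theta and se,
# ASSUMING a two-sided normal (Wald) test, as the printed output suggests.
dc_summary <- function(theta, se, level = 0.95) {
  z <- theta / se                      # z statistic
  p <- 2 * pnorm(-abs(z))              # two-sided p-value
  crit <- qnorm(1 - (1 - level) / 2)   # about 1.96 when level = 0.95
  c(z = z, p = p,
    lower = theta - crit * se,
    upper = theta + crit * se)
}

# Rounded theta and se taken from the second example under Examples
round(dc_summary(0.56818, 0.20925), 5)
```

Because the inputs are the rounded printed estimates, the results match the printed z value, Pr(>|z|), lower.CL and upper.CL only up to rounding.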

References

McCrary, J. (2008). Manipulation of the running variable in the regression discontinuity design: A density test. Journal of Econometrics, 142(2), 698-714. doi:10.1016/j.jeconom.2007.05.005.

Dimmery, D. (2016). rdd: Regression Discontinuity Estimation. R package version 0.57. https://CRAN.R-project.org/package=rdd

Examples

set.seed(12345)
# No discontinuity: running variable uniform on [-1, 1]
x <- runif(1000, -1, 1)
dc_test(x, 0)
#> Binwidth:
#> 0.03597 
#> 
#> Bandwidth:
#> 0.5025 
#> 
#> Estimate for log difference in heights:
#>   Estimate  Std. Error  lower.CL  upper.CL  z value  Pr(>|z|)   
#>    0.2472    0.2011     -0.1469    0.6413    1.2294   0.2189    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Confidence interval used:  0.95 
#> 
#> [1] 0.2189345

# Discontinuity: about half of the draws below the cutpoint are shifted by +2 into [1, 2)
x <- runif(1000, -1, 1)
x <- x + 2 * (runif(1000, -1, 1) > 0 & x < 0)
dc_test(x, 0)
#> Binwidth:
#> 0.04767 
#> 
#> Bandwidth:
#> 0.6016 
#> 
#> Estimate for log difference in heights:
#>   Estimate  Std. Error  lower.CL  upper.CL  z value  Pr(>|z|)    
#>   0.56818   0.20925     0.15806   0.97830   2.71536  0.00662   **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Confidence interval used:  0.95 
#> 
#> [1] 0.006620445
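To see informally why the second example rejects at conventional levels while the first does not, one can count observations in equal-width windows on either side of the cutpoint. This is a crude illustration, not the McCrary estimator; the helper count_near_cutpoint and the choices n = 100000 and h = 0.1 are hypothetical and picked only to make the jump easy to see.

```r
# Hypothetical helper (crude check, NOT the McCrary local linear estimator):
# counts observations in [cutpoint - h, cutpoint) and [cutpoint, cutpoint + h).
count_near_cutpoint <- function(runvar, cutpoint, h) {
  c(left  = sum(runvar >= cutpoint - h & runvar < cutpoint),
    right = sum(runvar >= cutpoint & runvar < cutpoint + h))
}

set.seed(1)
n <- 100000
x <- runif(n, -1, 1)
# Same manipulation as the second example: move about half of the
# negative draws above the cutpoint.
x <- x + 2 * (runif(n, -1, 1) > 0 & x < 0)
count_near_cutpoint(x, 0, h = 0.1)
```

With this manipulation the window just above the cutpoint should hold roughly twice as many observations as the window just below it, which is the kind of density jump the test is designed to detect.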