Title: | Treatment Effects with Multiple Periods and Groups |
Version: | 2.1.2 |
URL: | https://bcallaway11.github.io/did/, https://github.com/bcallaway11/did/ |
Description: | The standard Difference-in-Differences (DID) setup involves two periods and two groups – a treated group and untreated group. Many applications of DID methods involve more than two periods and have individuals that are treated at different points in time. This package contains tools for computing average treatment effect parameters in Difference in Differences setups with more than two periods and with variation in treatment timing using the methods developed in Callaway and Sant'Anna (2021) <doi:10.1016/j.jeconom.2020.12.001>. The main parameters are group-time average treatment effects which are the average treatment effect for a particular group at a a particular time. These can be aggregated into a fewer number of treatment effect parameters, and the package deals with the cases where there is selective treatment timing, dynamic treatment effects, calendar time effects, or combinations of these. There are also functions for testing the Difference in Differences assumption, and plotting group-time average treatment effects. |
Depends: | R (≥ 3.5), |
License: | GPL-2 |
Encoding: | UTF-8 |
LazyData: | true |
Imports: | BMisc (≥ 1.4.4), Matrix, pbapply, ggplot2, ggpubr, DRDID, generics, methods, tidyr |
RoxygenNote: | 7.2.1 |
VignetteBuilder: | knitr |
Suggests: | rmarkdown, plm, here, knitr, covr |
NeedsCompilation: | no |
Packaged: | 2022-07-20 03:09:38 UTC; brant |
Author: | Brantly Callaway [aut, cre], Pedro H. C. Sant'Anna [aut] |
Maintainer: | Brantly Callaway <brantly.callaway@uga.edu> |
Repository: | CRAN |
Date/Publication: | 2022-07-20 16:00:05 UTC |
AGGTEobj
Description
Objects of this class hold results on aggregated group-time average treatment effects
An object for holding aggregated treatment effect parameters.
Usage
AGGTEobj(
overall.att = NULL,
overall.se = NULL,
type = "simple",
egt = NULL,
att.egt = NULL,
se.egt = NULL,
crit.val.egt = NULL,
inf.function = NULL,
min_e = NULL,
max_e = NULL,
balance_e = NULL,
call = NULL,
DIDparams = NULL
)
Arguments
overall.att |
The estimated overall ATT |
overall.se |
Standard error for overall ATT |
type |
Which type of aggregated treatment effect parameter to compute. One option is "simple" (this just computes a weighted average of all group-time average treatment effects with weights proportional to group size). Other options are "dynamic" (this computes average effects across different lengths of exposure to the treatment and is similar to an "event study"; here the overall effect averages the effect of the treatment across all positive lengths of exposure); "group" (this is the default option and computes average treatment effects across different groups; here the overall effect averages the effect across different groups); and "calendar" (this computes average treatment effects across different time periods; here the overall effect averages the effect across each time period). |
egt |
Holds the length of exposure (for dynamic effects), the group (for selective treatment timing), or the time period (for calendar time effects) |
att.egt |
The ATT specific to egt |
se.egt |
The standard error specific to egt |
crit.val.egt |
A critical value for computing uniform confidence bands for dynamic effects, selective treatment timing, or time period effects. |
inf.function |
The influence function of the chosen aggregated parameters |
min_e |
For event studies, this is the smallest event time to compute
dynamic effects for. By default, |
max_e |
For event studies, this is the largest event time to compute
dynamic effects for. By default, |
balance_e |
If set (and if one computes dynamic effects), it balances
the sample with respect to event time. For example, if |
call |
The function call to aggte |
DIDparams |
A DIDparams object |
Value
an AGGTEobj
DIDparams
Description
Object to hold did parameters that are passed across functions
Usage
DIDparams(
yname,
tname,
idname = NULL,
gname,
xformla = NULL,
data,
control_group,
anticipation = 0,
weightsname = NULL,
alp = 0.05,
bstrap = TRUE,
biters = 1000,
clustervars = NULL,
cband = TRUE,
print_details = TRUE,
pl = FALSE,
cores = 1,
est_method = "dr",
base_period = "varying",
panel = TRUE,
true_repeated_cross_sections,
n = NULL,
nG = NULL,
nT = NULL,
tlist = NULL,
glist = NULL,
call = NULL
)
Arguments
yname |
The name of the outcome variable |
tname |
The name of the column containing the time periods |
idname |
The individual (cross-sectional unit) id name |
gname |
The name of the variable in |
xformla |
A formula for the covariates to include in the
model. It should be of the form |
data |
The name of the data.frame that contains the data |
control_group |
Which units to use the control group.
The default is "nevertreated" which sets the control group
to be the group of units that never participate in the
treatment. This group does not change across groups or
time periods. The other option is to set
|
anticipation |
The number of time periods before participating in the treatment where units can anticipate participating in the treatment and therefore it can affect their untreated potential outcomes |
weightsname |
The name of the column containing the sampling weights. If not set, all observations have same weight. |
alp |
the significance level, default is 0.05 |
bstrap |
Boolean for whether or not to compute standard errors using
the multiplier bootstrap. If standard errors are clustered, then one
must set |
biters |
The number of bootstrap iterations to use. The default is 1000,
and this is only applicable if |
clustervars |
A vector of variables names to cluster on. At most, there
can be two variables (otherwise will throw an error) and one of these
must be the same as idname which allows for clustering at the individual
level. By default, we cluster at individual level (when |
cband |
Boolean for whether or not to compute a uniform confidence
band that covers all of the group-time average treatment effects
with fixed probability |
print_details |
Whether or not to show details/progress of computations.
Default is |
pl |
Whether or not to use parallel processing |
cores |
The number of cores to use for parallel processing |
est_method |
the method to compute group-time average treatment effects. The default is "dr" which uses the doubly robust
approach in the |
base_period |
Whether to use a "varying" base period or a "universal" base period. Either choice results in the same post-treatment estimates of ATT(g,t)'s. In pre-treatment periods, using a varying base period amounts to computing a pseudo-ATT in each treatment period by comparing the change in outcomes for a particular group relative to its comparison group in the pre-treatment periods (i.e., in pre-treatment periods this setting computes changes from period t-1 to period t, but repeatedly changes the value of t) A universal base period fixes the base period to always be (g-anticipation-1). This does not compute pseudo-ATT(g,t)'s in pre-treatment periods, but rather reports average changes in outcomes from period t to (g-anticipation-1) for a particular group relative to its comparison group. This is analogous to what is often reported in event study regressions. Using a varying base period results in an estimate of ATT(g,t) being reported in the period immediately before treatment. Using a universal base period normalizes the estimate in the period right before treatment (or earlier when the user allows for anticipation) to be equal to 0, but one extra estimate in an earlier period. |
panel |
Whether or not the data is a panel dataset.
The panel dataset should be provided in long format – that
is, where each row corresponds to a unit observed at a
particular point in time. The default is TRUE. When
is using a panel dataset, the variable |
true_repeated_cross_sections |
Whether or not the data really is repeated cross sections. (We include this because unbalanced panel code runs through the repeated cross sections code) |
n |
The number of observations. This is equal to the number of units (which may be different from the number of rows in a panel dataset). |
nG |
The number of groups |
nT |
The number of time periods |
tlist |
a vector containing each time period |
glist |
a vector containing each group |
call |
Function call to att_gt |
MP
Description
Multi-period objects that hold results for group-time average treatment effects
Usage
MP(
group,
t,
att,
V_analytical,
se,
c,
inffunc,
n = NULL,
W = NULL,
Wpval = NULL,
aggte = NULL,
alp = 0.05,
DIDparams = NULL
)
Arguments
group |
which group (defined by period first treated) an group-time average treatment effect is for |
t |
which time period a group-time average treatment effect is for |
att |
the group-average treatment effect for group |
V_analytical |
Analytical estimator for the asymptotic variance-covariance matrix for group-time average treatment effects |
se |
standard errors for group-time average treatment effects. If bootstrap is set to TRUE, this provides bootstrap-based se. |
c |
simultaneous critical value if one is obtaining simultaneous confidence bands. Otherwise it reports the critical value based on pointwise normal approximation. |
inffunc |
the influence function for estimating group-time average treatment effects |
n |
the number of unique cross-sectional units (unique values of idname) |
W |
the Wald statistic for pre-testing the common trends assumption |
Wpval |
the p-value of the Wald statistic for pre-testing the common trends assumption |
aggte |
an aggregate treatment effects object |
alp |
the significance level, default is 0.05 |
DIDparams |
a |
Value
MP object
MP.TEST
Description
An object that holds results from computing pre-test of the conditional parallel trends assumption
Usage
MP.TEST(
CvM = NULL,
CvMb = NULL,
CvMcval = NULL,
CvMpval = NULL,
KS = NULL,
KSb = NULL,
KScval = NULL,
KSpval = NULL,
clustervars = NULL,
xformla = NULL
)
Arguments
CvM |
Cramer von Mises test statistic |
CvMb |
a vector of bootstrapped Cramer von Mises test statistics |
CvMcval |
CvM critical value |
CvMpval |
p-value for CvM test |
KS |
Kolmogorov-Smirnov test statistic |
KSb |
a vector of bootstrapped KS test statistics |
KScval |
KS critical value |
KSpval |
p-value for KS test |
clustervars |
vector of which variables were clustered on for the test |
xformla |
formla for the X variables used in the test |
Aggregate Group-Time Average Treatment Effects
Description
A function to take group-time average treatment effects and aggregate them into a smaller number of parameters. There are several possible aggregations including "simple", "dynamic", "group", and "calendar."
Usage
aggte(
MP,
type = "group",
balance_e = NULL,
min_e = -Inf,
max_e = Inf,
na.rm = FALSE,
bstrap = NULL,
biters = NULL,
cband = NULL,
alp = NULL,
clustervars = NULL
)
Arguments
MP |
an MP object (i.e., the results of the |
type |
Which type of aggregated treatment effect parameter to compute. One option is "simple" (this just computes a weighted average of all group-time average treatment effects with weights proportional to group size). Other options are "dynamic" (this computes average effects across different lengths of exposure to the treatment and is similar to an "event study"; here the overall effect averages the effect of the treatment across all positive lengths of exposure); "group" (this is the default option and computes average treatment effects across different groups; here the overall effect averages the effect across different groups); and "calendar" (this computes average treatment effects across different time periods; here the overall effect averages the effect across each time period). |
balance_e |
If set (and if one computes dynamic effects), it balances
the sample with respect to event time. For example, if |
min_e |
For event studies, this is the smallest event time to compute
dynamic effects for. By default, |
max_e |
For event studies, this is the largest event time to compute
dynamic effects for. By default, |
na.rm |
Logical value if we are to remove missing Values from analyses. Defaults is FALSE. |
bstrap |
Boolean for whether or not to compute standard errors using
the multiplier bootstrap. If standard errors are clustered, then one
must set |
biters |
The number of bootstrap iterations to use. The default is the value set in the MP object,
and this is only applicable if |
cband |
Boolean for whether or not to compute a uniform confidence
band that covers all of the group-time average treatment effects
with fixed probability |
alp |
the significance level, default is value set in the MP object. |
clustervars |
A vector of variables to cluster on. At most, there can be two variables (otherwise will throw an error) and one of these must be the same as idname which allows for clustering at the individual level. Default is the variables set in the MP object |
Value
An AGGTEobj
object that holds the results from the
aggregation
Examples
Initial ATT(g,t) estimates from att_gt()
data(mpdta) out <- att_gt(yname="lemp", tname="year", idname="countyreal", gname="first.treat", xformla=NULL, data=mpdta)
You can aggregate the ATT(g,t) in many ways.
Overall ATT:
aggte(out, type = "simple") #> #> Call: #> aggte(MP = out, type = "simple") #> #> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> #> #> #> ATT Std. Error [ 95% Conf. Int.] #> -0.04 0.013 -0.0654 -0.0145 * #> #> #> --- #> Signif. codes: `*' confidence band does not cover 0 #> #> Control Group: Never Treated, Anticipation Periods: 0 #> Estimation Method: Doubly Robust
Dynamic ATT (Event-Study):
aggte(out, type = "dynamic") #> #> Call: #> aggte(MP = out, type = "dynamic") #> #> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> #> #> #> Overall summary of ATT's based on event-study/dynamic aggregation: #> ATT Std. Error [ 95% Conf. Int.] #> -0.0772 0.02 -0.1165 -0.038 * #> #> #> Dynamic Effects: #> Event time Estimate Std. Error [95% Simult. Conf. Band] #> -3 0.0305 0.0150 -0.0060 0.0670 #> -2 -0.0006 0.0139 -0.0344 0.0333 #> -1 -0.0245 0.0150 -0.0610 0.0121 #> 0 -0.0199 0.0117 -0.0485 0.0087 #> 1 -0.0510 0.0168 -0.0919 -0.0100 * #> 2 -0.1373 0.0380 -0.2299 -0.0446 * #> 3 -0.1008 0.0360 -0.1887 -0.0130 * #> --- #> Signif. codes: `*' confidence band does not cover 0 #> #> Control Group: Never Treated, Anticipation Periods: 0 #> Estimation Method: Doubly Robust
ATT for each group:
aggte(out, type = "group") #> #> Call: #> aggte(MP = out, type = "group") #> #> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> #> #> #> Overall summary of ATT's based on group/cohort aggregation: #> ATT Std. Error [ 95% Conf. Int.] #> -0.031 0.0127 -0.0558 -0.0062 * #> #> #> Group Effects: #> Group Estimate Std. Error [95% Simult. Conf. Band] #> 2004 -0.0797 0.0308 -0.1461 -0.0134 * #> 2006 -0.0229 0.0175 -0.0606 0.0148 #> 2007 -0.0261 0.0163 -0.0612 0.0091 #> --- #> Signif. codes: `*' confidence band does not cover 0 #> #> Control Group: Never Treated, Anticipation Periods: 0 #> Estimation Method: Doubly Robust
ATT for each calendar year:
aggte(out, type = "calendar") #> #> Call: #> aggte(MP = out, type = "calendar") #> #> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> #> #> #> Overall summary of ATT's based on calendar time aggregation: #> ATT Std. Error [ 95% Conf. Int.] #> -0.0417 0.0177 -0.0765 -0.0069 * #> #> #> Time Effects: #> Time Estimate Std. Error [95% Simult. Conf. Band] #> 2004 -0.0105 0.0248 -0.0689 0.0479 #> 2005 -0.0704 0.0313 -0.1443 0.0035 #> 2006 -0.0488 0.0199 -0.0956 -0.0020 * #> 2007 -0.0371 0.0143 -0.0708 -0.0033 * #> --- #> Signif. codes: `*' confidence band does not cover 0 #> #> Control Group: Never Treated, Anticipation Periods: 0 #> Estimation Method: Doubly Robust
Group-Time Average Treatment Effects
Description
att_gt
computes average treatment effects in DID
setups where there are more than two periods of data and allowing for
treatment to occur at different points in time and allowing for
treatment effect heterogeneity and dynamics.
See Callaway and Sant'Anna (2021) for a detailed description.
Usage
att_gt(
yname,
tname,
idname = NULL,
gname,
xformla = NULL,
data,
panel = TRUE,
allow_unbalanced_panel = FALSE,
control_group = c("nevertreated", "notyettreated"),
anticipation = 0,
weightsname = NULL,
alp = 0.05,
bstrap = TRUE,
cband = TRUE,
biters = 1000,
clustervars = NULL,
est_method = "dr",
base_period = "varying",
print_details = FALSE,
pl = FALSE,
cores = 1
)
Arguments
yname |
The name of the outcome variable |
tname |
The name of the column containing the time periods |
idname |
The individual (cross-sectional unit) id name |
gname |
The name of the variable in |
xformla |
A formula for the covariates to include in the
model. It should be of the form |
data |
The name of the data.frame that contains the data |
panel |
Whether or not the data is a panel dataset.
The panel dataset should be provided in long format – that
is, where each row corresponds to a unit observed at a
particular point in time. The default is TRUE. When
is using a panel dataset, the variable |
allow_unbalanced_panel |
Whether or not function should
"balance" the panel with respect to time and id. The default
values if |
control_group |
Which units to use the control group.
The default is "nevertreated" which sets the control group
to be the group of units that never participate in the
treatment. This group does not change across groups or
time periods. The other option is to set
|
anticipation |
The number of time periods before participating in the treatment where units can anticipate participating in the treatment and therefore it can affect their untreated potential outcomes |
weightsname |
The name of the column containing the sampling weights. If not set, all observations have same weight. |
alp |
the significance level, default is 0.05 |
bstrap |
Boolean for whether or not to compute standard errors using
the multiplier bootstrap. If standard errors are clustered, then one
must set |
cband |
Boolean for whether or not to compute a uniform confidence
band that covers all of the group-time average treatment effects
with fixed probability |
biters |
The number of bootstrap iterations to use. The default is 1000,
and this is only applicable if |
clustervars |
A vector of variables names to cluster on. At most, there
can be two variables (otherwise will throw an error) and one of these
must be the same as idname which allows for clustering at the individual
level. By default, we cluster at individual level (when |
est_method |
the method to compute group-time average treatment effects. The default is "dr" which uses the doubly robust
approach in the |
base_period |
Whether to use a "varying" base period or a "universal" base period. Either choice results in the same post-treatment estimates of ATT(g,t)'s. In pre-treatment periods, using a varying base period amounts to computing a pseudo-ATT in each treatment period by comparing the change in outcomes for a particular group relative to its comparison group in the pre-treatment periods (i.e., in pre-treatment periods this setting computes changes from period t-1 to period t, but repeatedly changes the value of t) A universal base period fixes the base period to always be (g-anticipation-1). This does not compute pseudo-ATT(g,t)'s in pre-treatment periods, but rather reports average changes in outcomes from period t to (g-anticipation-1) for a particular group relative to its comparison group. This is analogous to what is often reported in event study regressions. Using a varying base period results in an estimate of ATT(g,t) being reported in the period immediately before treatment. Using a universal base period normalizes the estimate in the period right before treatment (or earlier when the user allows for anticipation) to be equal to 0, but one extra estimate in an earlier period. |
print_details |
Whether or not to show details/progress of computations.
Default is |
pl |
Whether or not to use parallel processing |
cores |
The number of cores to use for parallel processing |
Value
an MP
object containing all the results for group-time average
treatment effects
Examples:
Basic att_gt()
call:
# Example data data(mpdta) out1 <- att_gt(yname="lemp", tname="year", idname="countyreal", gname="first.treat", xformla=NULL, data=mpdta) summary(out1) #> #> Call: #> att_gt(yname = "lemp", tname = "year", idname = "countyreal", #> gname = "first.treat", xformla = NULL, data = mpdta) #> #> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> #> #> Group-Time Average Treatment Effects: #> Group Time ATT(g,t) Std. Error [95% Simult. Conf. Band] #> 2004 2004 -0.0105 0.0235 -0.0752 0.0542 #> 2004 2005 -0.0704 0.0307 -0.1549 0.0140 #> 2004 2006 -0.1373 0.0365 -0.2379 -0.0367 * #> 2004 2007 -0.1008 0.0383 -0.2062 0.0046 #> 2006 2004 0.0065 0.0236 -0.0585 0.0715 #> 2006 2005 -0.0028 0.0195 -0.0564 0.0509 #> 2006 2006 -0.0046 0.0185 -0.0556 0.0464 #> 2006 2007 -0.0412 0.0202 -0.0969 0.0145 #> 2007 2004 0.0305 0.0155 -0.0122 0.0733 #> 2007 2005 -0.0027 0.0158 -0.0462 0.0408 #> 2007 2006 -0.0311 0.0176 -0.0794 0.0173 #> 2007 2007 -0.0261 0.0167 -0.0720 0.0199 #> --- #> Signif. codes: `*' confidence band does not cover 0 #> #> P-value for pre-test of parallel trends assumption: 0.16812 #> Control Group: Never Treated, Anticipation Periods: 0 #> Estimation Method: Doubly Robust
Using covariates:
out2 <- att_gt(yname="lemp", tname="year", idname="countyreal", gname="first.treat", xformla=~lpop, data=mpdta) summary(out2) #> #> Call: #> att_gt(yname = "lemp", tname = "year", idname = "countyreal", #> gname = "first.treat", xformla = ~lpop, data = mpdta) #> #> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> #> #> Group-Time Average Treatment Effects: #> Group Time ATT(g,t) Std. Error [95% Simult. Conf. Band] #> 2004 2004 -0.0145 0.0233 -0.0759 0.0469 #> 2004 2005 -0.0764 0.0297 -0.1546 0.0018 #> 2004 2006 -0.1404 0.0348 -0.2321 -0.0488 * #> 2004 2007 -0.1069 0.0340 -0.1964 -0.0174 * #> 2006 2004 -0.0005 0.0236 -0.0627 0.0618 #> 2006 2005 -0.0062 0.0184 -0.0548 0.0424 #> 2006 2006 0.0010 0.0194 -0.0502 0.0521 #> 2006 2007 -0.0413 0.0189 -0.0912 0.0086 #> 2007 2004 0.0267 0.0145 -0.0115 0.0650 #> 2007 2005 -0.0046 0.0163 -0.0476 0.0384 #> 2007 2006 -0.0284 0.0191 -0.0788 0.0219 #> 2007 2007 -0.0288 0.0166 -0.0724 0.0149 #> --- #> Signif. codes: `*' confidence band does not cover 0 #> #> P-value for pre-test of parallel trends assumption: 0.23267 #> Control Group: Never Treated, Anticipation Periods: 0 #> Estimation Method: Doubly Robust
Specify comparison units:
out3 <- att_gt(yname="lemp", tname="year", idname="countyreal", gname="first.treat", xformla=~lpop, control_group = "notyettreated", data=mpdta) summary(out3) #> #> Call: #> att_gt(yname = "lemp", tname = "year", idname = "countyreal", #> gname = "first.treat", xformla = ~lpop, data = mpdta, control_group = "notyettreated") #> #> Reference: Callaway, Brantly and Pedro H.C. Sant'Anna. "Difference-in-Differences with Multiple Time Periods." Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. <https://doi.org/10.1016/j.jeconom.2020.12.001>, <https://arxiv.org/abs/1803.09015> #> #> Group-Time Average Treatment Effects: #> Group Time ATT(g,t) Std. Error [95% Simult. Conf. Band] #> 2004 2004 -0.0212 0.0225 -0.0810 0.0386 #> 2004 2005 -0.0816 0.0296 -0.1603 -0.0029 * #> 2004 2006 -0.1382 0.0382 -0.2397 -0.0367 * #> 2004 2007 -0.1069 0.0353 -0.2007 -0.0131 * #> 2006 2004 -0.0075 0.0236 -0.0703 0.0553 #> 2006 2005 -0.0046 0.0193 -0.0559 0.0468 #> 2006 2006 0.0087 0.0171 -0.0367 0.0540 #> 2006 2007 -0.0413 0.0195 -0.0931 0.0105 #> 2007 2004 0.0269 0.0136 -0.0093 0.0632 #> 2007 2005 -0.0042 0.0158 -0.0461 0.0377 #> 2007 2006 -0.0284 0.0185 -0.0777 0.0208 #> 2007 2007 -0.0288 0.0167 -0.0732 0.0156 #> --- #> Signif. codes: `*' confidence band does not cover 0 #> #> P-value for pre-test of parallel trends assumption: 0.23326 #> Control Group: Not Yet Treated, Anticipation Periods: 0 #> Estimation Method: Doubly Robust
References
Callaway, Brantly and Pedro H.C. Sant'Anna. \"Difference-in-Differences with Multiple Time Periods.\" Journal of Econometrics, Vol. 225, No. 2, pp. 200-230, 2021. doi:10.1016/j.jeconom.2020.12.001, https://arxiv.org/abs/1803.09015
build_sim_dataset
Description
A function for building simulated data
Usage
build_sim_dataset(sp_list, panel = TRUE)
Arguments
sp_list |
A list of simulation parameters. See |
panel |
whether to construct panel data (the default) or repeated cross sections data |
Value
a data.frame with the following columns
G observations group
X value of covariate
id observation's id
cluster observation's cluster (by construction there is no within-cluster correlation)
period time period for current observation
Y outcome
treat whether or not this unit is ever treated
citation
Description
print the citation for the relevant paper
Usage
citation()
Compute Aggregated Treatment Effect Parameters
Description
Does the heavy lifting on computing aggregated group-time average treatment effects
Usage
compute.aggte(
MP,
type = "group",
balance_e = NULL,
min_e = -Inf,
max_e = Inf,
na.rm = FALSE,
bstrap = NULL,
biters = NULL,
cband = NULL,
alp = NULL,
clustervars = NULL,
call = NULL
)
Arguments
MP |
an MP object (i.e., the results of the |
type |
Which type of aggregated treatment effect parameter to compute. One option is "simple" (this just computes a weighted average of all group-time average treatment effects with weights proportional to group size). Other options are "dynamic" (this computes average effects across different lengths of exposure to the treatment and is similar to an "event study"; here the overall effect averages the effect of the treatment across all positive lengths of exposure); "group" (this is the default option and computes average treatment effects across different groups; here the overall effect averages the effect across different groups); and "calendar" (this computes average treatment effects across different time periods; here the overall effect averages the effect across each time period). |
balance_e |
If set (and if one computes dynamic effects), it balances
the sample with respect to event time. For example, if |
min_e |
For event studies, this is the smallest event time to compute
dynamic effects for. By default, |
max_e |
For event studies, this is the largest event time to compute
dynamic effects for. By default, |
na.rm |
Logical value if we are to remove missing Values from analyses. Defaults is FALSE. |
bstrap |
Boolean for whether or not to compute standard errors using
the multiplier bootstrap. If standard errors are clustered, then one
must set |
biters |
The number of bootstrap iterations to use. The default is 1000,
and this is only applicable if |
cband |
Boolean for whether or not to compute a uniform confidence
band that covers all of the group-time average treatment effects
with fixed probability |
alp |
the significance level, default is 0.05 |
clustervars |
A vector of variables names to cluster on. At most, there
can be two variables (otherwise will throw an error) and one of these
must be the same as idname which allows for clustering at the individual
level. By default, we cluster at individual level (when |
call |
The function call to aggte |
Value
AGGTEobj
object
Compute Group-Time Average Treatment Effects
Description
compute.att_gt
does the main work for computing
multiperiod group-time average treatment effects
Usage
compute.att_gt(dp)
Arguments
dp |
A DIDparams object |
Value
a list with length equal to the number of groups times the number of time periods; each element of the list contains an object that contains group-time average treatment effect as well as which group it is for and which time period it is for. It also exports the influence function which is used externally to compute standard errors.
Pre-Test of Conditional Parallel Trends Assumption
Description
An integrated moments test for the conditional parallel trends assumption holding in all pre-treatment time periods for all groups
Usage
conditional_did_pretest(
yname,
tname,
idname = NULL,
gname,
xformla = NULL,
data,
panel = TRUE,
allow_unbalanced_panel = FALSE,
control_group = c("nevertreated", "notyettreated"),
weightsname = NULL,
alp = 0.05,
bstrap = TRUE,
cband = TRUE,
biters = 1000,
clustervars = NULL,
est_method = "ipw",
print_details = FALSE,
pl = FALSE,
cores = 1
)
Arguments
yname |
The name of the outcome variable |
tname |
The name of the column containing the time periods |
idname |
The individual (cross-sectional unit) id name |
gname |
The name of the variable in |
xformla |
A formula for the covariates to include in the
model. It should be of the form |
data |
The name of the data.frame that contains the data |
panel |
Whether or not the data is a panel dataset.
The panel dataset should be provided in long format – that
is, where each row corresponds to a unit observed at a
particular point in time. The default is TRUE. When
is using a panel dataset, the variable |
allow_unbalanced_panel |
Whether or not function should
"balance" the panel with respect to time and id. The default
values if |
control_group |
Which units to use the control group.
The default is "nevertreated" which sets the control group
to be the group of units that never participate in the
treatment. This group does not change across groups or
time periods. The other option is to set
|
weightsname |
The name of the column containing the sampling weights. If not set, all observations have same weight. |
alp |
the significance level, default is 0.05 |
bstrap |
Boolean for whether or not to compute standard errors using
the multiplier bootstrap. If standard errors are clustered, then one
must set |
cband |
Boolean for whether or not to compute a uniform confidence
band that covers all of the group-time average treatment effects
with fixed probability |
biters |
The number of bootstrap iterations to use. The default is 1000,
and this is only applicable if |
clustervars |
A vector of variables names to cluster on. At most, there
can be two variables (otherwise will throw an error) and one of these
must be the same as idname which allows for clustering at the individual
level. By default, we cluster at individual level (when |
est_method |
the method to compute group-time average treatment effects. The default is "dr" which uses the doubly robust
approach in the |
print_details |
Whether or not to show details/progress of computations.
Default is |
pl |
Whether or not to use parallel processing |
cores |
The number of cores to use for parallel processing |
Value
an MP.TEST
object
References
Callaway, Brantly and Sant'Anna, Pedro H. C. "Difference-in-Differences with Multiple Time Periods and an Application on the Minimum Wage and Employment." Working Paper https://arxiv.org/abs/1803.09015v2 (2018).
Examples
## Not run:
data(mpdta)
pre.test <- conditional_did_pretest(yname="lemp",
tname="year",
idname="countyreal",
gname="first.treat",
xformla=~lpop,
data=mpdta)
summary(pre.test)
## End(Not run)
Difference in Differences
Description
Difference in Differences with multiple periods and variation in treatment timing
Take influence function and return standard errors
Description
Function to take an nx1 influence function and return a standard error
Usage
getSE(thisinffunc, DIDparams = NULL)
Arguments
thisinffunc |
An influence function |
Value
scalar standard error
Get an influence function for particular aggregate parameters
Description
Get an influence function for particular aggregate parameters
This is a generic internal function for combining influence functions across ATT(g,t)'s to return an influence function for various aggregated treatment effect parameters.
Usage
get_agg_inf_func(att, inffunc1, whichones, weights.agg, wif = NULL)
Arguments
att |
vector of group-time average treatment effects |
inffunc1 |
influence function for all group-time average treatment effects (matrix) |
whichones |
which elements of att will be used to compute the aggregated treatment effect parameter |
weights.agg |
the weights to apply to each element of att(whichones); should have the same dimension as att(whichones) |
wif |
extra influence function term coming from estimating the weights; should be n x k matrix where k is dimension of whichones |
Value
nx1 influence function
Plot did
objects using ggplot2
Description
Function to plot objects from the did
package
Usage
ggdid(object, ...)
Arguments
object |
either a |
... |
other arguments |
Plot AGGTEobj
objects
Description
A function to plot AGGTEobj
objects
Usage
## S3 method for class 'AGGTEobj'
ggdid(
object,
ylim = NULL,
xlab = NULL,
ylab = NULL,
title = "",
xgap = 1,
legend = TRUE,
ref_line = 0,
theming = TRUE,
...
)
Arguments
object |
either a |
ylim |
optional y limits for the plot; setting here makes the y limits the same across different plots |
xlab |
optional x-axis label |
ylab |
optional y-axis label |
title |
optional plot title |
xgap |
optional gap between the labels on the x-axis. For example,
|
legend |
Whether or not to include a legend (which will indicate color
of pre- and post-treatment estimates). Default is |
ref_line |
A reference line at this value, usually to compare confidence intervals to 0. Set to NULL to omit. |
theming |
Set to FALSE to skip all theming so you can do it yourself. |
... |
other arguments |
Plot MP
objects using ggplot2
Description
A function to plot MP
objects
Usage
## S3 method for class 'MP'
ggdid(
object,
ylim = NULL,
xlab = NULL,
ylab = NULL,
title = "Group",
xgap = 1,
ncol = 1,
legend = TRUE,
group = NULL,
ref_line = 0,
theming = TRUE,
grtitle = "Group",
...
)
Arguments
object |
either a |
ylim |
optional y limits for the plot; setting here makes the y limits the same across different plots |
xlab |
optional x-axis label |
ylab |
optional y-axis label |
title |
optional plot title |
xgap |
optional gap between the labels on the x-axis. For example,
|
ncol |
The number of columns to include in the resulting plot. The default is 1. |
legend |
Whether or not to include a legend (which will indicate color
of pre- and post-treatment estimates). Default is |
group |
Vector for which groups to include in the plots of ATT(g,t).
Default is NULL, and, in this case, plots for all groups will be included ( |
ref_line |
A reference line at this value, usually to compare confidence intervals to 0. Set to NULL to omit. |
theming |
Set to FALSE to skip all theming so you can do it yourself. |
grtitle |
Title to append before each group name ( |
... |
other arguments |
glance model characteristics from AGGTEobj objects
Description
glance model characteristics from AGGTEobj objects
Usage
## S3 method for class 'AGGTEobj'
glance(x, ...)
Arguments
x |
a model of class AGGTEobj produced by the |
... |
other arguments passed to methods |
glance model characteristics from MP objects
Description
glance model characteristics from MP objects
Usage
## S3 method for class 'MP'
glance(x, ...)
Arguments
x |
a model of class MP produced by the |
... |
other arguments passed to methods |
gplot
Description
does the heavy lifting for making a plot of an group-time average treatment effect
Usage
gplot(
ssresults,
ylim = NULL,
xlab = NULL,
ylab = NULL,
title = "Group",
xgap = 1,
legend = TRUE,
ref_line = 0,
theming = TRUE
)
Value
a ggplot2
object
indicator
Description
indicator weighting function
Usage
indicator(X, u)
Arguments
X |
matrix of X's from the data |
u |
a particular value to compare X's to |
Value
numeric vector
Examples
data(mpdta)
dta <- subset(mpdta, year==2007)
X <- model.matrix(~lpop, data=dta)
X <- indicator(X, X[1,])
Multiplier Bootstrap
Description
A function to take an influence function and use the multiplier bootstrap to compute standard errors and critical values for uniform confidence bands.
Usage
mboot(inf.func, DIDparams, pl = FALSE, cores = 1)
Arguments
inf.func |
an influence function |
DIDparams |
DIDparams object |
pl |
whether or not to use parallel processing in the multiplier bootstrap, default=FALSE |
cores |
the number of cores to use with parallel processing, default=1 |
Value
list with elements
bres |
results from each bootstrap iteration |
V |
variance matrix |
se |
standard errors |
crit.val |
a critical value for computing uniform confidence bands |
County Teen Employment Dataset
Description
A dataset containing (the log of) teen employment in 500 counties in the U.S. from 2004 to 2007. This is a subset of the dataset used in Callaway and Sant'Anna (2021). See that paper for additional descriptions.
Usage
mpdta
Format
A data frame with 2000 rows and 5 variables:
- year
the year of the observation
- countyreal
a unique identifier for a particular county
- lpop
the log of 1000s of population for the county
- lemp
the log of teen employment in the county
- first.treat
the year that the state where the county is located raised its minimum wage, it is set equal to 0 for counties that have minimum wages equal to the federal minimum wage over the entire period.
- treat
whether or not a particular county is treated in that year
Source
Callaway and Sant'Anna (2020)
Process did
Function Arguments
Description
Function to process arguments passed to the main methods in the
did
package as well as conducting some tests to make sure
data is in proper format / try to throw helpful error messages.
Usage
pre_process_did(
yname,
tname,
idname,
gname,
xformla = NULL,
data,
panel = TRUE,
allow_unbalanced_panel,
control_group = c("nevertreated", "notyettreated"),
anticipation = 0,
weightsname = NULL,
alp = 0.05,
bstrap = FALSE,
cband = FALSE,
biters = 1000,
clustervars = NULL,
est_method = "dr",
base_period = "varying",
print_details = TRUE,
pl = FALSE,
cores = 1,
call = NULL
)
Arguments
yname |
The name of the outcome variable |
tname |
The name of the column containing the time periods |
idname |
The individual (cross-sectional unit) id name |
gname |
The name of the variable in |
xformla |
A formula for the covariates to include in the
model. It should be of the form |
data |
The name of the data.frame that contains the data |
panel |
Whether or not the data is a panel dataset.
The panel dataset should be provided in long format – that
is, where each row corresponds to a unit observed at a
particular point in time. The default is TRUE. When
is using a panel dataset, the variable |
allow_unbalanced_panel |
Whether or not function should
"balance" the panel with respect to time and id. The default
values if |
control_group |
Which units to use the control group.
The default is "nevertreated" which sets the control group
to be the group of units that never participate in the
treatment. This group does not change across groups or
time periods. The other option is to set
|
anticipation |
The number of time periods before participating in the treatment where units can anticipate participating in the treatment and therefore it can affect their untreated potential outcomes |
weightsname |
The name of the column containing the sampling weights. If not set, all observations have same weight. |
alp |
the significance level, default is 0.05 |
bstrap |
Boolean for whether or not to compute standard errors using
the multiplier bootstrap. If standard errors are clustered, then one
must set |
cband |
Boolean for whether or not to compute a uniform confidence
band that covers all of the group-time average treatment effects
with fixed probability |
biters |
The number of bootstrap iterations to use. The default is 1000,
and this is only applicable if |
clustervars |
A vector of variables names to cluster on. At most, there
can be two variables (otherwise will throw an error) and one of these
must be the same as idname which allows for clustering at the individual
level. By default, we cluster at individual level (when |
est_method |
the method to compute group-time average treatment effects. The default is "dr" which uses the doubly robust
approach in the |
base_period |
Whether to use a "varying" base period or a "universal" base period. Either choice results in the same post-treatment estimates of ATT(g,t)'s. In pre-treatment periods, using a varying base period amounts to computing a pseudo-ATT in each treatment period by comparing the change in outcomes for a particular group relative to its comparison group in the pre-treatment periods (i.e., in pre-treatment periods this setting computes changes from period t-1 to period t, but repeatedly changes the value of t) A universal base period fixes the base period to always be (g-anticipation-1). This does not compute pseudo-ATT(g,t)'s in pre-treatment periods, but rather reports average changes in outcomes from period t to (g-anticipation-1) for a particular group relative to its comparison group. This is analogous to what is often reported in event study regressions. Using a varying base period results in an estimate of ATT(g,t) being reported in the period immediately before treatment. Using a universal base period normalizes the estimate in the period right before treatment (or earlier when the user allows for anticipation) to be equal to 0, but one extra estimate in an earlier period. |
print_details |
Whether or not to show details/progress of computations.
Default is |
pl |
Whether or not to use parallel processing |
cores |
The number of cores to use for parallel processing |
call |
Function call to att_gt |
Value
a DIDparams
object
print.AGGTEobj
Description
prints value of a AGGTEobj
object
Usage
## S3 method for class 'AGGTEobj'
print(x, ...)
Arguments
x |
a |
... |
extra arguments |
print.MP
Description
prints value of a MP
object
Usage
## S3 method for class 'MP'
print(x, ...)
Arguments
x |
a |
... |
extra arguments |
Process Results from compute.att_gt()
Description
Process Results from compute.att_gt()
Usage
process_attgt(attgt.list)
Arguments
attgt.list |
list of results from |
Value
list with elements:
group |
which group a set of results belongs to |
tt |
which time period a set of results belongs to |
att |
the group time average treatment effect |
tidy results
Description
These objects are imported from other packages. Follow the links below to see their documentation.
reset.sim
Description
a function to create a "reasonable" set of parameters to create simulated panel data that obeys a parallel trends assumption. In particular, it provides parameters where the the effect of participating in the treatment is equal to one in all post-treatment time periods.
After calling this function, the user can change particular values of the parameters in order to generate dynamics, heterogeneous effects across groups, etc.
Usage
reset.sim(time.periods = 4, n = 5000, ipw = TRUE, reg = TRUE)
Arguments
time.periods |
The number of time periods to include |
n |
The total number of observations |
ipw |
If TRUE, sets parameters so that DGP is compatible with recovering ATT(g,t)'s using IPW (i.e., where logit that just includes a linear term in X works). If FALSE, sets parameters that will be incompatible with IPW. Either way, these parameters can be specified by the user if so desired. |
reg |
If TRUE, sets parameters so that DGP is compatible with recovering ATT(g,t)'s using regressions on untreated untreated potential outcomes. If FALSE, sets parameters that will be incompatible with using regressions (i.e., regressions that include only linear term in X). Either way, these parameters can be specified by the user if so desired. |
Value
list of simulation parameters
sim
Description
An internal function that builds simulated data, computes
ATT(g,t)'s and some aggregations. It is useful for testing the inference
procedures in the did
function.
Usage
sim(
sp_list,
ret = NULL,
bstrap = TRUE,
cband = TRUE,
control_group = "nevertreated",
xformla = ~X,
est_method = "dr",
clustervars = NULL,
panel = TRUE
)
Arguments
sp_list |
A list of simulation parameters. See |
ret |
which type of results to return. The options are |
bstrap |
whether or not to use the bootstrap to conduct inference (default is TRUE) |
cband |
whether or not to compute uniform confidence bands in the call to |
control_group |
Whether to use the "nevertreated" comparison group (the default) or the "notyettreated" as the comparison group |
xformla |
Formula for covariates in |
est_method |
Which estimation method to use in |
clustervars |
Any additional variables which should be clustered on |
panel |
whether to simulate panel data (the default) or otherwise repeated cross sections data |
Value
When ret=NULL
, returns the results of the call to att_gt
, otherwise returns
1 if the specified test rejects or 0 if not.
splot
Description
alternative plotting function for the case with selective treatment timing
Usage
splot(
ssresults,
ylim = NULL,
xlab = NULL,
ylab = NULL,
title = "Group",
legend = TRUE,
ref_line = 0,
theming = TRUE
)
Value
a ggplot2
object
Summary Aggregate Treatment Effect Parameter Objects
Description
A function to summarize aggregated treatment effect parameters.
Usage
## S3 method for class 'AGGTEobj'
summary(object, ...)
Arguments
object |
an |
... |
other arguments |
summary.MP
Description
prints a summary of a MP
object
Usage
## S3 method for class 'MP'
summary(object, ...)
Arguments
object |
an |
... |
extra arguments |
summary.MP.TEST
Description
print a summary of test results
Usage
## S3 method for class 'MP.TEST'
summary(object, ...)
Arguments
object |
an MP.TEST object |
... |
other variables |
Multiplier Bootstrap for Conditional Moment Test
Description
A slightly modified multiplier bootstrap procedure for the pre-test of the conditional parallel trends assumption
Usage
test.mboot(inf.func, DIDparams, cores = 1)
Arguments
inf.func |
an influence function |
DIDparams |
DIDparams object |
cores |
The number of cores to use to bootstrap the test
statistic in parallel. Default is |
Value
list
bres |
CvM test statistics for each bootstrap iteration |
crit.val |
critical value for CvM test statistic |
tidy results from AGGTEobj objects
Description
tidy results from AGGTEobj objects
Usage
## S3 method for class 'AGGTEobj'
tidy(x, ...)
Arguments
x |
a model of class AGGTEobj produced by the |
... |
Additional arguments to tidying method. |
tidy results from MP objects
Description
tidy results from MP objects
Usage
## S3 method for class 'MP'
tidy(x, ...)
Arguments
x |
a model of class MP produced by the |
... |
Additional arguments to tidying method. |
trimmer
Description
A utility function to find observations that appear to violate support conditions. This function is not called anywhere in the code, but it is just useful for debugging some common issues that users run into.
Usage
trimmer(
g,
tname,
idname,
gname,
xformla,
data,
control_group = "notyettreated",
threshold = 0.999
)
Arguments
g |
is a particular group (below I pass in 2009) |
tname |
The name of the column containing the time periods |
idname |
The individual (cross-sectional unit) id name |
gname |
The name of the variable in |
xformla |
A formula for the covariates to include in the
model. It should be of the form |
data |
The name of the data.frame that contains the data |
control_group |
Which units to use the control group.
The default is "nevertreated" which sets the control group
to be the group of units that never participate in the
treatment. This group does not change across groups or
time periods. The other option is to set
|
threshold |
the cutoff for which observations are flagged as likely violators of the support condition. |
Value
list of ids of observations that likely violate support conditions
Compute extra term in influence function due to estimating weights
Description
A function to compute the extra term that shows up in the influence function for aggregated treatment effect parameters due to estimating the weights
Usage
wif(keepers, pg, weights.ind, G, group)
Arguments
keepers |
a vector of indices for which group-time average treatment effects are used to compute a particular aggregated parameter |
pg |
a vector with same length as total number of group-time average treatment effects that contains the probability of being in particular group |
weights.ind |
additional sampling weights (nx1) |
G |
vector containing which group a unit belongs to (nx1) |
group |
vector of groups |
Value
nxk influence function matrix