Type: | Package |
Title: | Estimation of Conditional Average Treatment Effects with High-Dimensional Data |
Version: | 0.1.0 |
Imports: | KernSmooth, R6, hdm, locpol, caret |
Description: | A two-step double-robust method to estimate the conditional average treatment effects (CATE) with potentially high-dimensional covariate(s). In the first stage, the nuisance functions necessary for identifying CATE are estimated by machine learning methods, allowing the number of covariates to be comparable to or larger than the sample size. The second stage consists of a low-dimensional local linear regression, reducing CATE to a function of the covariate(s) of interest. The CATE estimator implemented in this package not only allows for high-dimensional data, but also has the “double robustness” property: either the model for the propensity score or the models for the conditional means of the potential outcomes are allowed to be misspecified (but not both). This package is based on the paper by Fan et al., "Estimation of Conditional Average Treatment Effects With High-Dimensional Data" (2022), Journal of Business & Economic Statistics <doi:10.1080/07350015.2020.1811102>. |
License: | GPL (≥ 3) |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.3 |
Suggests: | knitr, rmarkdown, xfun, randomForest, dplyr, ggplot2, ggthemes |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2022-12-13 19:08:51 UTC; hengzhaohong |
Author: | Qingliang Fan [aut, cre], Hengzhao Hong [aut] |
Maintainer: | Qingliang Fan <michaelqfan@cuhk.edu.hk> |
Repository: | CRAN |
Date/Publication: | 2022-12-14 11:50:02 UTC |
High-Dimensional Conditional Average Treatment Effects (HDCATE) Estimator
Description
Use a two-step procedure to estimate the conditional average treatment effects (CATE) with potentially high-dimensional covariate(s).
Run browseVignettes('hdcate') to browse the user manual of this package.
Usage
HDCATE(data, y_name, d_name, x_formula)
Arguments
data |
data frame of the observed data |
y_name |
variable name of the observed outcomes |
d_name |
variable name of the treatment indicators |
x_formula |
formula of the covariates |
Value
An initialized HDCATE model (object), ready for estimation.
Examples
# get simulation data
n_obs <- 500 # Num of observations
n_var <- 100 # Num of observed variables
n_rel_var <- 4 # Num of relevant variables
data <- HDCATE.get_sim_data(n_obs, n_var, n_rel_var)
# conditional expectation model is misspecified
x_formula <- paste(paste0('X', c(2:n_var)), collapse ='+')
# alternatively, for example, the propensity score model is misspecified
# x_formula <- paste(paste0('X', c(1:(n_var-1))), collapse ='+')
# Example 1: full-sample estimator
# create a new HDCATE model
model <- HDCATE(data=data, y_name='Y', d_name='D', x_formula=x_formula)
# estimate HDCATE function, inference, and plot
HDCATE.set_condition_var(model, 'X2', min=-1, max=1, step=0.01)
HDCATE.fit(model)
HDCATE.inference(model)
HDCATE.plot(model)
# Example 2: cross-fitting estimator
# change above estimator to cross-fitting mode, 5 folds, for example.
HDCATE.use_cross_fitting(model, k_fold=5)
# estimate HDCATE function, inference, and plot
HDCATE.set_condition_var(model, 'X2', min=-1, max=1, step=0.01)
HDCATE.fit(model)
HDCATE.inference(model)
HDCATE.plot(model)
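The two-step idea behind these examples can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's internal code: the data-generating process is hypothetical, plain `glm`/`lm` stand in for the machine learning first stage, and the helper `local_linear` and the bandwidth `h = 0.3` are illustrative choices.

```r
set.seed(1)
n  <- 400
x1 <- runif(n, -1, 1)                      # covariate of interest
x2 <- rnorm(n)                             # an additional covariate
d  <- rbinom(n, 1, plogis(0.5 * x1))       # treatment indicator
y  <- 1 + x1 + x2 + d * (1 + x1^2) + rnorm(n)  # hypothetical outcome model
df <- data.frame(y, d, x1, x2)

# First stage: nuisance functions (glm/lm used here in place of ML learners)
ps_fit  <- glm(d ~ x1 + x2, family = binomial(), data = df)
mu1_fit <- lm(y ~ x1 + x2, data = df[df$d == 1, ])
mu0_fit <- lm(y ~ x1 + x2, data = df[df$d == 0, ])
pi_hat  <- predict(ps_fit, df, type = "response")
mu1_hat <- predict(mu1_fit, df)
mu0_hat <- predict(mu0_fit, df)

# Doubly robust score: outcome-model difference plus
# inverse-propensity-weighted residual corrections
psi <- mu1_hat - mu0_hat +
  d * (y - mu1_hat) / pi_hat -
  (1 - d) * (y - mu0_hat) / (1 - pi_hat)

# Second stage: local linear regression of the score on x1, evaluated at x0
local_linear <- function(x0, x, z, h) {
  w <- dnorm((x - x0) / h)                 # Gaussian kernel weights
  unname(coef(lm(z ~ I(x - x0), weights = w))[1])
}
grid     <- seq(-0.8, 0.8, by = 0.2)
cate_hat <- sapply(grid, local_linear, x = x1, z = psi, h = 0.3)
```

Each entry of `cate_hat` is the intercept of a kernel-weighted linear fit centred at one grid point, which is the local linear estimate of the CATE at that point.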
Fit the HDCATE function
Description
Fit the HDCATE function
Usage
HDCATE.fit(HDCATE_model, verbose = TRUE)
Arguments
HDCATE_model |
an object created via HDCATE |
verbose |
whether the verbose message is displayed, the default is TRUE |
Value
None. The HDCATE_model is fitted.
Examples
# get simulation data
n_obs <- 500 # Num of observations
n_var <- 100 # Num of observed variables
n_rel_var <- 4 # Num of relevant variables
data <- HDCATE.get_sim_data(n_obs, n_var, n_rel_var)
# conditional expectation model is misspecified
x_formula <- paste(paste0('X', c(2:n_var)), collapse ='+')
# propensity score model is misspecified
# x_formula <- paste(paste0('X', c(1:(n_var-1))), collapse ='+')
# create a new HDCATE model
model <- HDCATE(data=data, y_name='Y', d_name='D', x_formula=x_formula)
HDCATE.set_condition_var(model, 'X2', min=-1, max=1, step=0.01)
HDCATE.fit(model)
Get simulation data
Description
Get simulation data
Usage
HDCATE.get_sim_data(
n_obs = 500,
n_var = 100,
n_rel_var = 4,
sig_strength_propensity = 0.5,
sig_strength_outcome = 1,
intercept = 10
)
Arguments
n_obs |
Num of observations |
n_var |
Num of covariates |
n_rel_var |
Num of relevant variables, only the first |
sig_strength_propensity |
signal strength in propensity score functions |
sig_strength_outcome |
signal strength in outcome functions |
intercept |
value of intercept in outcome functions |
Value
a data.frame, which is the simulated observed data.
Examples
HDCATE.get_sim_data()
HDCATE.get_sim_data(n_obs=50, n_var=4, n_rel_var=2)
Construct uniform confidence bands
Description
Construct uniform confidence bands
Usage
HDCATE.inference(
HDCATE_model,
sig_level = 0.01,
n_rep_boot = 1000,
verbose = FALSE
)
Arguments
HDCATE_model |
an object created via HDCATE |
sig_level |
a significance level or a vector of significance levels, such as 0.01 or c(0.01, 0.05, 0.10) |
n_rep_boot |
number of bootstrap replications, the default is 1000 |
verbose |
whether the verbose message is displayed, the default is FALSE |
Value
None. The HDCATE confidence bands are constructed.
Examples
# get simulation data
n_obs <- 500 # Num of observations
n_var <- 100 # Num of observed variables
n_rel_var <- 4 # Num of relevant variables
data <- HDCATE.get_sim_data(n_obs, n_var, n_rel_var)
# conditional expectation model is misspecified
x_formula <- paste(paste0('X', c(2:n_var)), collapse ='+')
# propensity score model is misspecified
# x_formula <- paste(paste0('X', c(1:(n_var-1))), collapse ='+')
# create a new HDCATE model
model <- HDCATE(data=data, y_name='Y', d_name='D', x_formula=x_formula)
HDCATE.set_condition_var(model, 'X2', min=-1, max=1, step=0.01)
HDCATE.fit(model)
HDCATE.inference(model)
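As a rough illustration of how a uniform band differs from a pointwise one, the sketch below uses a weighted bootstrap in base R. It is an assumption-laden sketch, not the algorithm used by HDCATE.inference: the data, the exponential bootstrap weights, the bandwidth and the helper `ll_fit` are all hypothetical. The key step is taking the quantile of the maximum absolute t-statistic over the whole grid.

```r
set.seed(42)
n    <- 300
x    <- runif(n, -1, 1)
z    <- 1 + x^2 + rnorm(n)        # stand-in for doubly robust scores (hypothetical)
grid <- seq(-0.8, 0.8, by = 0.1)
h    <- 0.3                       # illustrative bandwidth

# Local linear estimate at x0 with observation weights omega
ll_fit <- function(x0, omega) {
  w <- omega * dnorm((x - x0) / h)
  unname(coef(lm(z ~ I(x - x0), weights = w))[1])
}
est <- sapply(grid, ll_fit, omega = rep(1, n))

# Weighted bootstrap: positive random weights with mean 1
n_rep_boot <- 200
boot <- replicate(n_rep_boot, {
  omega <- rexp(n)
  sapply(grid, ll_fit, omega = omega)
})                                # matrix: length(grid) x n_rep_boot

se    <- apply(boot, 1, sd)                     # bootstrap standard errors
sup_t <- apply(abs(boot - est) / se, 2, max)    # sup of |t| over the grid
sig_level <- 0.05
crit  <- unname(quantile(sup_t, 1 - sig_level)) # uniform critical value

lower <- est - crit * se
upper <- est + crit * se
```

Because `crit` is calibrated on the supremum over all evaluation points, the band `[lower, upper]` is meant to cover the whole curve at once rather than each point separately.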
Plot HDCATE function and the uniform confidence bands
Description
Plot HDCATE function and the uniform confidence bands
Usage
HDCATE.plot(
HDCATE_model,
output_pdf = FALSE,
pdf_name = "hdcate_plot.pdf",
include_band = TRUE,
test_side = "both",
y_axis_min = "auto",
y_axis_max = "auto",
display.hdcate = "HDCATEF",
display.ate = "ATE",
display.siglevel = "sig_level"
)
Arguments
HDCATE_model |
an object created via HDCATE |
output_pdf |
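if TRUE, save the image to a PDF file named as pdf_name |
pdf_name |
file name when output_pdf is TRUE |
include_band |
if TRUE, plot uniform confidence bands as well |
test_side |
'both' for a 2-sided test, 'left' for a left-sided test or 'right' for a right-sided test |
y_axis_min |
minimum value of the Y axis to plot in the graph, the default is 'auto' |
y_axis_max |
maximum value of the Y axis to plot in the graph, the default is 'auto' |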
display.hdcate |
the name of HDCATE function in the legend, the default is 'HDCATEF' |
display.ate |
the name of average treatment effect in the legend, the default is 'ATE' |
display.siglevel |
the name of the significance level for confidence bands in the legend, the default is 'sig_level' |
Value
None. A plot will be shown or saved as PDF.
Examples
# get simulation data
n_obs <- 500 # Num of observations
n_var <- 100 # Num of observed variables
n_rel_var <- 4 # Num of relevant variables
data <- HDCATE.get_sim_data(n_obs, n_var, n_rel_var)
# conditional expectation model is misspecified
x_formula <- paste(paste0('X', c(2:n_var)), collapse ='+')
# propensity score model is misspecified
# x_formula <- paste(paste0('X', c(1:(n_var-1))), collapse ='+')
# create a new HDCATE model
model <- HDCATE(data=data, y_name='Y', d_name='D', x_formula=x_formula)
HDCATE.set_condition_var(model, 'X2', min=-1, max=1, step=0.01)
HDCATE.fit(model)
HDCATE.inference(model)
HDCATE.plot(model)
Set bandwidth
Description
Set user-defined bandwidth.
Usage
HDCATE.set_bw(model, bandwidth = "default")
Arguments
model |
an object created via HDCATE |
bandwidth |
the value of bandwidth |
Value
None.
Examples
# get simulation data
n_obs <- 500 # Num of observations
n_var <- 100 # Num of observed variables
n_rel_var <- 4 # Num of relevant variables
data <- HDCATE.get_sim_data(n_obs, n_var, n_rel_var)
# conditional expectation model is misspecified
x_formula <- paste(paste0('X', c(2:n_var)), collapse ='+')
# propensity score model is misspecified
# x_formula <- paste(paste0('X', c(1:(n_var-1))), collapse ='+')
# create a new HDCATE model
model <- HDCATE(data=data, y_name='Y', d_name='D', x_formula=x_formula)
# Set user-defined bandwidth, e.g., 0.15.
HDCATE.set_bw(model, 0.15)
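To see why the bandwidth matters, the base-R sketch below fits a local linear regression with a small and a large bandwidth on hypothetical data. The data, the helper names and the roughness measure are illustrative assumptions and are not part of the package.

```r
set.seed(7)
n <- 300
x <- runif(n, -1, 1)
z <- sin(2 * x) + rnorm(n, sd = 0.5)   # hypothetical noisy signal

# Local linear estimate at x0 with Gaussian kernel and bandwidth h
local_linear <- function(x0, h) {
  w <- dnorm((x - x0) / h)
  unname(coef(lm(z ~ I(x - x0), weights = w))[1])
}

grid      <- seq(-0.8, 0.8, by = 0.05)
fit_small <- sapply(grid, local_linear, h = 0.05)  # little smoothing
fit_large <- sapply(grid, local_linear, h = 0.50)  # heavy smoothing

# Sum of squared second differences as a simple roughness measure
roughness <- function(f) sum(diff(f, differences = 2)^2)
c(small_h = roughness(fit_small), large_h = roughness(fit_large))
```

A smaller bandwidth tracks local features more closely but gives a noisier curve, while a larger one gives a smoother curve at the risk of oversmoothing.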
Set the conditional variable in CATE
Description
Set the conditional variable in CATE
Usage
HDCATE.set_condition_var(
HDCATE_model,
name = NA,
min = NA,
max = NA,
step = NA
)
Arguments
HDCATE_model |
an object created via HDCATE |
name |
name of the conditional variable |
min |
minimum value of the conditional variable for evaluation |
max |
maximum value of the conditional variable for evaluation |
step |
minimum distance between two evaluation points |
Value
None. The HDCATE_model is ready to fit.
Examples
# get simulation data
n_obs <- 500 # Num of observations
n_var <- 100 # Num of observed variables
n_rel_var <- 4 # Num of relevant variables
data <- HDCATE.get_sim_data(n_obs, n_var, n_rel_var)
# conditional expectation model is misspecified
x_formula <- paste(paste0('X', c(2:n_var)), collapse ='+')
# propensity score model is misspecified
# x_formula <- paste(paste0('X', c(1:(n_var-1))), collapse ='+')
# create a new HDCATE model
model <- HDCATE(data=data, y_name='Y', d_name='D', x_formula=x_formula)
HDCATE.set_condition_var(model, 'X2', min=-1, max=1, step=0.01)
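The min, max and step arguments describe a grid of evaluation points. Assuming the grid is an evenly spaced sequence (an assumption about the package internals, with hypothetical variable names), the call above corresponds to something like:

```r
cond_min  <- -1
cond_max  <- 1
cond_step <- 0.01

# Evenly spaced evaluation points from min to max
eval_points <- seq(cond_min, cond_max, by = cond_step)
length(eval_points)  # 201 evaluation points
```

A smaller step gives a finer curve but increases the number of local regressions to run in the second stage.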
Set user-defined first-stage estimating methods
Description
Set user-defined ML methods (such as random forests, elastic-net, boosting) to run the first-stage estimation.
Usage
HDCATE.set_first_stage(
model,
fit.treated,
fit.untreated,
fit.propensity,
predict.treated,
predict.untreated,
predict.propensity
)
Arguments
model |
an object created via HDCATE |
fit.treated |
function that accepts a data.frame as the only argument, fits the treated expectation function, and returns a fitted object |
fit.untreated |
function that accepts a data.frame as the only argument, fits the untreated expectation function, and returns a fitted object |
fit.propensity |
function that accepts a data.frame as the only argument, fits the propensity function, and returns a fitted object |
predict.treated |
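function that accepts the returned object of fit.treated and a data.frame as arguments, and returns the predictions for that data.frame |
predict.untreated |
function that accepts the returned object of fit.untreated and a data.frame as arguments, and returns the predictions for that data.frame |
predict.propensity |
function that accepts the returned object of fit.propensity and a data.frame as arguments, and returns the predicted propensity scores for that data.frame |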
Value
None.
Examples
# get simulation data
n_obs <- 500 # Num of observations
n_var <- 100 # Num of observed variables
n_rel_var <- 4 # Num of relevant variables
data <- HDCATE.get_sim_data(n_obs, n_var, n_rel_var)
# conditional expectation model is misspecified
x_formula <- paste(paste0('X', c(2:n_var)), collapse ='+')
# propensity score model is misspecified
# x_formula <- paste(paste0('X', c(1:(n_var-1))), collapse ='+')
# create a new HDCATE model
model <- HDCATE(data=data, y_name='Y', d_name='D', x_formula=x_formula)
# manually define a lasso method
my_lasso_fit_exp <- function(df) {
hdm::rlasso(as.formula(paste0('Y', "~", x_formula)), df)
}
my_lasso_predict_exp <- function(fitted_model, df) {
predict(fitted_model, df)
}
my_lasso_fit_ps <- function(df) {
hdm::rlassologit(as.formula(paste0('D', "~", x_formula)), df)
}
my_lasso_predict_ps <- function(fitted_model, df) {
predict(fitted_model, df, type="response")
}
# Apply the "my-lasso" approach to the first stage
HDCATE.set_first_stage(
model,
my_lasso_fit_exp,
my_lasso_fit_exp,
my_lasso_fit_ps,
my_lasso_predict_exp,
my_lasso_predict_exp,
my_lasso_predict_ps
)
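The fit/predict contract can be checked outside the package before handing the functions to HDCATE.set_first_stage. The sketch below uses base R `lm` and `glm` as stand-in learners on hypothetical data. The function names, the small data frame and the propensity trimming at 0.01 and 0.99 are assumptions made for illustration.

```r
set.seed(11)
n  <- 200
df <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n))
df$D <- rbinom(n, 1, plogis(0.5 * df$X1))
df$Y <- 1 + df$X1 + df$D * (1 + df$X2) + rnorm(n)
x_formula <- "X1+X2+X3"

# fit.* functions: take a data.frame, return a fitted object
fit_exp <- function(df) {
  lm(as.formula(paste0("Y~", x_formula)), data = df)
}
fit_ps <- function(df) {
  glm(as.formula(paste0("D~", x_formula)), family = binomial(), data = df)
}

# predict.* functions: take the fitted object and a data.frame,
# return one prediction per row
predict_exp <- function(fitted_model, df) {
  as.numeric(predict(fitted_model, newdata = df))
}
predict_ps <- function(fitted_model, df) {
  p <- as.numeric(predict(fitted_model, newdata = df, type = "response"))
  pmin(pmax(p, 0.01), 0.99)  # trimming is an added assumption, not package behavior
}

# Quick check of the contract on the toy data
mu1_hat <- predict_exp(fit_exp(df[df$D == 1, ]), df)
ps_hat  <- predict_ps(fit_ps(df), df)

# With an HDCATE model at hand, the functions would be registered as:
# HDCATE.set_first_stage(model, fit_exp, fit_exp, fit_ps,
#                        predict_exp, predict_exp, predict_ps)
```

Checking that each predict function returns a numeric vector with one value per row helps catch shape problems before running the full estimation.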
Clear the user-defined first-stage estimating methods
Description
Inverse operation of HDCATE.set_first_stage
Usage
HDCATE.unset_first_stage(model)
Arguments
model |
an object created via HDCATE |
Value
None.
Examples
# get simulation data
n_obs <- 500 # Num of observations
n_var <- 100 # Num of observed variables
n_rel_var <- 4 # Num of relevant variables
data <- HDCATE.get_sim_data(n_obs, n_var, n_rel_var)
# conditional expectation model is misspecified
x_formula <- paste(paste0('X', c(2:n_var)), collapse ='+')
# propensity score model is misspecified
# x_formula <- paste(paste0('X', c(1:(n_var-1))), collapse ='+')
# create a new HDCATE model
model <- HDCATE(data=data, y_name='Y', d_name='D', x_formula=x_formula)
# ... manually set user-defined first-stage estimating methods via `HDCATE.set_first_stage`
# Clear those user-defined methods and use the built-in method
HDCATE.unset_first_stage(model)
Use k-fold cross-fitting estimator
Description
Use k-fold cross-fitting estimator
Usage
HDCATE.use_cross_fitting(model, k_fold = 5, folds = NULL)
Arguments
model |
an object created via HDCATE |
k_fold |
number of folds |
folds |
optional list of index vectors used to set the folds manually |
Value
None.
Examples
# get simulation data
n_obs <- 500 # Num of observations
n_var <- 100 # Num of observed variables
n_rel_var <- 4 # Num of relevant variables
data <- HDCATE.get_sim_data(n_obs, n_var, n_rel_var)
# conditional expectation model is misspecified
x_formula <- paste(paste0('X', c(2:n_var)), collapse ='+')
# propensity score model is misspecified
# x_formula <- paste(paste0('X', c(1:(n_var-1))), collapse ='+')
# create a new HDCATE model
model <- HDCATE(data=data, y_name='Y', d_name='D', x_formula=x_formula)
# for example, use 5-fold cross-fitting estimator
HDCATE.use_cross_fitting(model, k_fold=5)
# alternatively, pass a list of index vectors as the third argument to set the folds manually;
# in this case, k_fold is detected automatically from the list, so any value may be passed for it.
HDCATE.use_cross_fitting(model, k_fold=2, folds=list(c(1:250), c(251:500)))
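A list of index vectors for the folds argument can be built in base R. The following sketch creates a random, balanced partition. The variable names are illustrative, and whether the package expects shuffled or contiguous folds is not stated in this manual.

```r
set.seed(123)
n_obs  <- 500
k_fold <- 5

# Assign each observation to one of k_fold folds at random, in equal numbers
fold_id <- sample(rep(seq_len(k_fold), length.out = n_obs))

# List of index vectors, one per fold
folds <- unname(split(seq_len(n_obs), fold_id))

# With an HDCATE model at hand, the folds would be passed as:
# HDCATE.use_cross_fitting(model, k_fold = k_fold, folds = folds)
```

Every observation index appears in exactly one fold, so the folds form a partition of the sample.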
Use full-sample estimator
Description
This is the default mode when creating a model via HDCATE
Usage
HDCATE.use_full_sample(model)
Arguments
model |
an object created via HDCATE |
Value
None.
Examples
# get simulation data
n_obs <- 500 # Num of observations
n_var <- 100 # Num of observed variables
n_rel_var <- 4 # Num of relevant variables
data <- HDCATE.get_sim_data(n_obs, n_var, n_rel_var)
# conditional expectation model is misspecified
x_formula <- paste(paste0('X', c(2:n_var)), collapse ='+')
# propensity score model is misspecified
# x_formula <- paste(paste0('X', c(1:(n_var-1))), collapse ='+')
# create a new HDCATE model
model <- HDCATE(data=data, y_name='Y', d_name='D', x_formula=x_formula)
HDCATE.use_full_sample(model)
High-Dimensional Conditional Average Treatment Effects (HDCATE) Estimator
Description
Use a two-step procedure to estimate the conditional average treatment effects (CATE) for all possible values of the covariate(s).
Format
R6::R6Class object.
Methods
Public methods
Method new()
Usage
HDCATE_R6Class$new(data, y_name, d_name, x_formula)
Method propensity_hd_estimate()
Usage
HDCATE_R6Class$propensity_hd_estimate(data = NA, verbose = F)
Method conditional_expectations_hd_estimate()
Usage
HDCATE_R6Class$conditional_expectations_hd_estimate(data = NA, verbose = F)
Method first_stage()
Usage
HDCATE_R6Class$first_stage(data = NA, verbose = F)
Method second_stage()
Usage
HDCATE_R6Class$second_stage( predictor_eta_hat = NA, eta_hat = NA, subsample_idx = NULL, local_weight = NULL, estimate_std = TRUE, verbose = FALSE, save_model = TRUE )
Method get_bw()
Usage
HDCATE_R6Class$get_bw(phi, use_sample_idx)
Method fit()
Fit the HDCATE function
Usage
HDCATE_R6Class$fit(verbose = FALSE)
Returns
estimated HDCATE
Method inference()
Usage
HDCATE_R6Class$inference( sig_level = 0.01, boot_method = "normal", n_rep_boot = 1000, verbose = FALSE )
Method plot()
Plot the results.
Usage
HDCATE_R6Class$plot( output_pdf = FALSE, pdf_name = "hdcate_plot.pdf", include_band = TRUE, test_side = "both", y_axis_min = "auto", y_axis_max = "auto", display.hdcate = "HDCATEF", display.ate = "ATE", display.siglevel = "sig_level" )
Arguments
output_pdf
if TRUE, save image to a pdf file named as pdf_name
pdf_name
the name of the output PDF file
include_band
if TRUE, plot uniform confidence bands as well.
test_side
'both' for a 2-sided test, 'left' for a left-sided test or 'right' for a right-sided test
y_axis_min
the lowest value plotted in Y axis, the default is 'auto'
y_axis_max
the largest value plotted in Y axis, the default is 'auto'
display.hdcate
the name of HDCATE function in the legend, the default is 'HDCATEF'
display.ate
the name of average treatment effect in the legend, the default is 'ATE'
display.siglevel
the name of the significance level for confidence bands in the legend, the default is 'sig_level'
Method get_confidence_bands()
Usage
HDCATE_R6Class$get_confidence_bands(test_side = "both")
Method draw_weights()
Usage
HDCATE_R6Class$draw_weights(method, n_rep_boot, n_obs)
Method set_condition_var()
Usage
HDCATE_R6Class$set_condition_var(name = NA, min = NA, max = NA, step = NA)
Method clone()
The objects of this class are cloneable with this method.
Usage
HDCATE_R6Class$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.