Type: | Package |
Date: | 2022-03-30 |
Title: | Stratification and Matching for Large Observational Data Sets |
Version: | 0.1.9 |
Maintainer: | Rachael C. Aikens <rockyaikens@gmail.com> |
BugReports: | https://github.com/raikens1/stratamatch/issues |
Description: | A pilot matching design to automatically stratify and match large datasets. The manual_stratify() function allows users to manually stratify a dataset based on categorical variables of interest, while the auto_stratify() function does so automatically by allocating a held-aside (pilot) data set, fitting a prognostic score (see Hansen (2008) <doi:10.1093/biomet/asn004>) on the pilot set, and stratifying the data set based on prognostic score quantiles. The strata_match() function then does optimal matching of the data set in parallel within strata. |
License: | GPL-3 |
Encoding: | UTF-8 |
LazyData: | true |
Imports: | dplyr (≥ 0.8.3), Hmisc (≥ 4.2-0), magrittr (≥ 1.5), rlang (≥ 0.4.0), survival (≥ 2.44.1.1) |
Depends: | R (≥ 3.4.0) |
Suggests: | knitr, optmatch (≥ 0.9-11), rmarkdown, testthat (≥ 2.1.0), glmnet (≥ 4.0), randomForest (≥ 4.6-14) |
URL: | https://github.com/raikens1/stratamatch |
RoxygenNote: | 7.1.2 |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2022-03-31 00:19:08 UTC; rocky |
Author: | Rachael C. Aikens [aut, cre], Joseph Rigdon [aut], Justin Lee [aut], Michael Baiocchi [aut], Jonathan Chen [aut] |
Repository: | CRAN |
Date/Publication: | 2022-03-31 06:00:02 UTC |
Pipe operator
Description
Pipe operator
Demographics and comorbidities of 10,157 ICU patients
Description
A deidentified data set containing the demographics, comorbidities, DNR code
status, and surgical team assignment of 10,157 patients in the Stanford
University Hospital Intensive Care Unit (ICU). This data was extracted from
the electronic record system, deidentified, and made publicly available by
Chavez et al. (2018) <doi:10.1371/journal.pone.0190569>. It was reprocessed
for use in the stratamatch package as a sample data set. For more
details on the data extraction and inclusion criteria, see Chavez et al.
Usage
ICU_data
Format
A data frame with 10157 rows and 29 variables:
- patid
patient id, numeric
- Birth.preTimeDays
age of patient at time of admission to the ICU in days, numeric
- Female.pre
whether the patient was documented to be female prior to ICU visit, binary
- RaceAsian.pre
whether the patient's race/ethnicity was documented as Asian prior to ICU visit, binary
- RaceUnknown.pre
whether the patient's race/ethnicity was unknown prior to ICU visit, binary
- RaceOther.pre
whether the patient's race/ethnicity was documented as "Other" prior to ICU visit, binary
- RaceBlack.pre
whether the patient's race/ethnicity was documented as Black/African American prior to ICU visit, binary
- RacePacificIslander.pre
whether the patient's race/ethnicity was documented as Pacific Islander prior to ICU visit, binary
- RaceNativeAmerican.pre
whether the patient's race/ethnicity was documented as Native American prior to ICU visit, binary
- self_pay
whether the patient was "self pay" (i.e. uninsured), binary
- all_latinos
whether the patient was documented to be Latino prior to ICU visit, binary
- DNR
whether the patient had code status set to any DNR ("Do not resuscitate") order at any point during their ICU stay, binary
- surgicalTeam
whether the patient was assigned to a surgical team at any point during their ICU stay, binary
Details
License information for this data is as follows:
Copyright (c) 2016, Stanford University
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Source
https://simtk.org/frs/download_confirm.php/latestzip/1969/ICUDNR-latest.zip?group_id=892
Auto Stratify
Description
Automatically creates strata for matching based on a prognostic score formula
or a vector of prognostic scores already estimated by the user. Creates an
auto_strata object, which can be passed to strata_match
for stratified matching or unpacked by the user to be matched by some other
means.
Usage
auto_stratify(
data,
treat,
prognosis,
outcome = NULL,
size = 2500,
pilot_fraction = 0.1,
pilot_size = NULL,
pilot_sample = NULL,
group_by_covariates = NULL
)
Arguments
data |
data.frame with observations as rows, features as columns |
treat |
string giving the name of column designating treatment assignment |
prognosis |
information on how to build prognostic scores. Three different input types are allowed:
|
outcome |
string giving the name of column with outcome information. Required if prognostic_scores is specified. Otherwise it will be inferred from prog_formula |
size |
numeric, desired size of strata (default = 2500) |
pilot_fraction |
numeric between 0 and 1 giving the proportion of controls to be allotted for building the prognostic score (default = 0.1) |
pilot_size |
alternative to pilot_fraction. Approximate number of
observations to be used in pilot set. Note that the actual pilot set size
returned may not be exactly |
pilot_sample |
a data.frame of held aside samples for building
prognostic score model. If |
group_by_covariates |
character vector giving the names of covariates to be grouped by (optional). If specified, the pilot set will be sampled in a stratified manner, so that the composition of the pilot set reflects the composition of the whole data set in terms of these covariates. The specified covariates must be categorical. |
Details
Stratifying by prognostic score quantiles can be more effective than manually stratifying a data set. Because the prognostic score is continuous, the strata produced tend to be of equal size, and individuals within a stratum tend to have similar prognosis.
Automatic stratification requires information on how the prognostic scores
should be derived. This is primarily determined by the specification of the
prognosis argument. Three main forms of input for prognosis are allowed:
- A vector of prognostic scores. This vector should be the same length and order as the rows in the data set. If this method is used, the outcome argument must also be specified; this is simply a string giving the name of the column which contains outcome information.
- A formula for prognosis (e.g. outcome ~ X1 + X2). If this method is used, auto_stratify will automatically split the data set into a pilot_set and an analysis_set. The pilot set will be used to fit a logistic regression model for outcome in the absence of treatment, and this model will be used to estimate prognostic scores on the analysis set. The analysis set will then be stratified based on the estimated prognostic scores. In this case the outcome argument need not be specified since it can be inferred from the input formula.
- A model for prognosis (e.g. a glm object). If this method is used, the outcome argument must also be specified.
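The formula-based workflow described above can be sketched in base R. This is only an illustration of the idea under stated assumptions, not the package's internal code: the simulated data, the 10% pilot fraction, and the choice of four strata are all hypothetical.

```r
# Hypothetical illustration in base R; not the package's internal code.
set.seed(1)
n <- 400
dat <- data.frame(
  treat = rbinom(n, 1, 0.3),
  X1 = rnorm(n),
  X2 = rnorm(n)
)
dat$outcome <- rbinom(n, 1, plogis(dat$X1))

# 1. Hold aside a random 10% of the controls as the pilot set.
control_rows <- which(dat$treat == 0)
pilot_rows <- sample(control_rows, size = round(0.1 * length(control_rows)))
pilot_set <- dat[pilot_rows, ]
analysis_set <- dat[-pilot_rows, ]

# 2. Fit a logistic prognostic model on the pilot controls only.
prog_model <- glm(outcome ~ X1 + X2, data = pilot_set, family = "binomial")

# 3. Score the analysis set on the response (probability) scale.
scores <- predict(prog_model, newdata = analysis_set, type = "response")

# 4. Cut the scores at their quantiles to form strata of roughly equal size.
n_strata <- 4
breaks <- quantile(scores, probs = seq(0, 1, length.out = n_strata + 1))
analysis_set$stratum <- cut(scores,
  breaks = breaks, include.lowest = TRUE, labels = FALSE
)

table(analysis_set$stratum)
```

Because the pilot controls are removed before scoring, the prognostic model is never evaluated on the observations used to fit it.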
Value
Returns an auto_strata object. This contains:
- outcome
a string giving the name of the column where outcome information is stored
- treat
a string giving the name of the column encoding treatment assignment
- analysis_set
the data set with strata assignments
- call
the call to auto_stratify used to generate this object
- issue_table
a table of each stratum and potential issues of size and treat:control balance. In small or imbalanced strata, it may be difficult or infeasible to find high-quality matches, while very large strata may be computationally intensive to match.
- strata_table
a table of each stratum and the prognostic score quantile bin to which it corresponds
- prognostic_scores
a vector of prognostic scores.
- prognostic_model
a model for prognosis fit on a pilot data set. Will be NULL if a vector of prognostic scores was provided as the prognosis argument to auto_stratify rather than a model or formula.
- pilot_set
the set of controls used to fit the prognostic model. These are excluded from subsequent analysis so that the prognostic score is not overfit to the data used to estimate the treatment effect. Will be NULL if a pre-fit model or a vector of prognostic scores was provided as the prognosis argument to auto_stratify rather than a formula.
Troubleshooting
This section suggests fixes for common errors that appear while fitting the prognostic model or using it to estimate prognostic scores on the analysis set.
- Encountered an error while fitting the prognostic model... numeric probabilities 0 or 1 produced. This error means that the prognostic model can perfectly separate positive from negative outcomes. Estimating a treatment effect in this case is unwise since an individual's baseline characteristics perfectly determine their outcome, regardless of whether they receive the treatment. This error may also appear on rare occasions when your pilot set is very small (number of observations approximately <= number of covariates in the prognostic model), so that perfect separation happens by chance.
- Encountered an error while estimating prognostic scores ... factor X has new levels ... This may indicate that some value(s) of one or more categorical variables appear in the analysis set which were not seen in the pilot set. This means that when we try to obtain prognostic scores for our analysis set, we run into some new value that our prognostic model was not prepared to handle. There are a few options for troubleshooting this problem:
  - Rejection sampling. Run auto_stratify again with the same arguments until this error does not occur (i.e. until some observations with the missing value are randomly selected into the pilot set).
  - Eliminate this covariate from the prognostic formula.
  - Remove observations with the rare covariate value from the entire data set. Consider carefully how this exclusion might affect your results.
- Other errors or warnings can occur if the pilot set is too small and the prognostic formula is too complicated. Always make sure that the number of observations in the pilot set is large enough that you can confidently fit a prognostic model with the number of covariates you want.
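When the "new levels" error appears, it can help to see which categorical values occur only in the analysis set. The helper below is a hypothetical base R sketch (find_unseen_levels is not a stratamatch function); it assumes you already hold the pilot and analysis sets as data frames.

```r
# Hypothetical helper (not part of stratamatch): list the values of each
# categorical column that occur in the analysis set but not in the pilot set.
find_unseen_levels <- function(pilot_set, analysis_set, columns) {
  unseen <- lapply(columns, function(col) {
    setdiff(
      unique(as.character(analysis_set[[col]])),
      unique(as.character(pilot_set[[col]]))
    )
  })
  names(unseen) <- columns
  # Keep only the columns that actually have unseen values.
  unseen[lengths(unseen) > 0]
}

# Toy data: the value "c" of C1 never appears in the pilot set.
pilot <- data.frame(C1 = c("a", "b", "a"), B1 = c(0, 1, 1))
analysis <- data.frame(C1 = c("a", "b", "c"), B1 = c(1, 0, 1))
unseen <- find_unseen_levels(pilot, analysis, c("C1", "B1"))
unseen
```

Any column reported here is a candidate for one of the three remedies listed above.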
See Also
manual_stratify, new_auto_strata
Examples
# make sample data set
set.seed(111)
dat <- make_sample_data(n = 75)
# construct a pilot set, build a prognostic score for `outcome` based on X2
# and stratify the data set based on the scores into sets of about 25
# observations
a.strat_formula <- auto_stratify(dat, "treat", outcome ~ X2, size = 25)
# stratify the data set based on a model for prognosis
pilot_data <- make_sample_data(n = 30)
prognostic_model <- glm(outcome ~ X2, pilot_data, family = "binomial")
a.strat_model <- auto_stratify(dat, "treat", prognostic_model,
  outcome = "outcome", size = 25
)
# stratify the data set based on a vector of prognostic scores
prognostic_scores <- predict(prognostic_model,
  newdata = dat,
  type = "response"
)
a.strat_scores <- auto_stratify(dat, "treat", prognostic_scores,
  outcome = "outcome", size = 25
)
# diagnostic plots
plot(a.strat_formula)
plot(a.strat_formula, type = "AC", propensity = treat ~ X1, stratum = 1)
plot(a.strat_formula, type = "hist", propensity = treat ~ X1, stratum = 1)
plot(a.strat_formula, type = "residual")
Build Autostrata object
Description
Not meant to be called externally. Given the arguments to auto_stratify,
build the prognostic scores and return the analysis set, the prognostic
scores, the pilot set, the prognostic model, and the outcome string. The
primary function of this code is to determine the type of prognosis
and handle it appropriately.
Usage
build_autostrata(
data,
treat,
prognosis,
outcome,
pilot_fraction,
pilot_size,
pilot_sample,
group_by_covariates
)
Arguments
data |
|
treat |
string giving the name of column designating treatment assignment |
prognosis |
information on how to build prognostic scores. Three different input types are allowed:
|
outcome |
string giving the name of column with outcome information. Required if prognostic_scores is specified. Otherwise it will be inferred from prog_formula |
pilot_fraction |
numeric between 0 and 1 giving the proportion of controls to be allotted for building the prognostic score (default = 0.1) |
pilot_size |
alternative to pilot_fraction. Approximate number of
observations to be used in pilot set. Note that the actual pilot set size
returned may not be exactly |
pilot_sample |
a data.frame of held aside samples for building
prognostic score model. If |
group_by_covariates |
character vector giving the names of covariates to be grouped by (optional). If specified, the pilot set will be sampled in a stratified manner, so that the composition of the pilot set reflects the composition of the whole data set in terms of these covariates. The specified covariates must be categorical. |
Value
a list of: analysis set, prognostic scores, pilot set, prognostic model, and outcome string
Check inputs from auto_stratify
Description
Not meant to be called externally. Throws errors if basic auto_stratify inputs are incorrect.
Usage
check_base_inputs_auto_stratify(data, treat, outcome)
Arguments
data |
|
treat |
string giving the name of column designating treatment assignment |
outcome |
string giving the name of column with outcome information. Required if prognostic_scores is specified. Otherwise it will be inferred from prog_formula |
Value
nothing; produces errors and warnings if anything is wrong
Check inputs to manual_stratify
Description
Not meant to be called externally. Checks validity of formula, types of all inputs to manual stratify, and warns if covariates are continuous.
Usage
check_inputs_manual_stratify(data, strata_formula, force)
Arguments
data |
data.frame with observations as rows, features as columns |
strata_formula |
the formula to be used for stratification. (e.g. |
force |
a boolean. If true, run even if a variable appears continuous. (default = FALSE) |
Value
nothing; produces errors and warnings if anything is wrong
Check inputs to any matching function
Description
Check inputs to any matching function
Usage
check_inputs_matcher(object, model, k)
Arguments
object |
a strata object |
model |
(optional) formula for matching. If left blank, all
columns of the analysis set in |
k |
the number of control individuals to be matched to each treated
individual. If |
Value
nothing
Check Outcome
Description
Checks that outcome is a string which is a column in the data
Usage
check_outcome(outcome, data, treat)
Arguments
outcome |
string giving the name of column with outcome information. Required if prognostic_scores is specified. Otherwise it will be inferred from prog_formula |
data |
|
treat |
string giving the name of column designating treatment assignment |
Value
nothing
Check Pilot set options
Description
Check Pilot set options
Usage
check_pilot_set_options(
pilot_fraction,
pilot_size,
group_by_covariates,
data,
n_c
)
Arguments
pilot_fraction |
numeric between 0 and 1 giving the proportion of controls to be allotted for building the prognostic score (default = 0.1) |
pilot_size |
alternative to pilot_fraction. Approximate number of
observations to be used in pilot set. Note that the actual pilot set size
returned may not be exactly |
group_by_covariates |
character vector giving the names of covariates to be grouped by (optional). If specified, the pilot set will be sampled in a stratified manner, so that the composition of the pilot set reflects the composition of the whole data set in terms of these covariates. The specified covariates must be categorical. |
data |
|
n_c |
number of control observations in |
Value
nothing
Check Prognostic Formula
Description
Check Prognostic Formula
Usage
check_prognostic_formula(prog_formula, data, outcome, treat)
Arguments
prog_formula |
a formula for prognostic score |
data |
|
outcome |
string giving the name of column with outcome information. Required if prognostic_scores is specified. Otherwise it will be inferred from prog_formula |
treat |
string giving the name of column designating treatment assignment |
Value
nothing
Check Propensity Formula
Description
Check Propensity Formula
Usage
check_prop_formula(prop_formula, data, treat)
Arguments
prop_formula |
a formula |
data |
the analysis set data within a stratum |
treat |
the name of the treatment assignment column |
Value
nothing
Check Scores
Description
Checks that prognostic scores are the same length as data
Usage
check_scores(prognostic_scores, data, outcome)
Arguments
prognostic_scores |
a numeric vector |
data |
|
outcome |
string giving the name of column with outcome information. Required if prognostic_scores is specified. Otherwise it will be inferred from prog_formula |
Value
nothing
Estimate Prognostic Scores
Description
Tries to make prognostic scores. If successful, returns them; otherwise throws an error message. A common failure mode is that the prognostic score is built on some categorical variable that takes on values in the analysis set that are never seen in the pilot set. Outputs are on the response scale (rather than the linear predictor), so the score is the expected value of the outcome under the control assignment based on the observed covariates.
Usage
estimate_scores(prognostic_model, analysis_set)
Arguments
prognostic_model |
Model of prognosis |
analysis_set |
data set on which prognostic scores should be estimated |
Value
vector of prognostic scores
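The difference between the response scale and the linear predictor can be seen with a plain logistic regression. The snippet below is a small base R sketch on simulated data (the data and model are hypothetical and do not call estimate_scores itself).

```r
# Simulated pilot data and a logistic prognostic model (illustrative only).
set.seed(2)
pilot <- data.frame(X1 = rnorm(200))
pilot$outcome <- rbinom(200, 1, plogis(0.5 * pilot$X1))
model <- glm(outcome ~ X1, data = pilot, family = "binomial")

new_data <- data.frame(X1 = c(-1, 0, 1))
link_scale <- predict(model, newdata = new_data, type = "link")
response_scale <- predict(model, newdata = new_data, type = "response")

# For a logistic model, the response scale is the inverse logit of the
# linear predictor, i.e. a predicted probability between 0 and 1.
all.equal(response_scale, plogis(link_scale))
```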
Extract cutoffs between strata
Description
By default, returns only the internal cut points. Cutoffs at 0 and 1 are implied.
Usage
extract_cut_points(x)
Arguments
x |
an autostrata object |
Value
a vector of the score values delineating cutoffs between strata
Examples
dat <- make_sample_data()
a.strat <- auto_stratify(dat, "treat", outcome ~ X1 + X2)
cutoffs <- extract_cut_points(a.strat)
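To illustrate what "internal cut points" means, the base R sketch below computes quantile cutoffs for a vector of stand-in scores. The uniform scores and the choice of four strata are assumptions made for the example; the snippet does not inspect an auto_strata object.

```r
set.seed(3)
scores <- runif(1000) # stand-in for estimated prognostic scores in (0, 1)
n_strata <- 4

# Internal cut points only: the 25%, 50% and 75% quantiles for four strata.
internal_probs <- seq_len(n_strata - 1) / n_strata
cut_points <- quantile(scores, probs = internal_probs, names = FALSE)

# Adding the implied end points 0 and 1 recovers the full set of bin edges.
bin_edges <- c(0, cut_points, 1)
stratum <- cut(scores, breaks = bin_edges, include.lowest = TRUE, labels = FALSE)
table(stratum)
```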
Extract cutoffs between strata
Description
Extract cutoffs between strata
Usage
## S3 method for class 'auto_strata'
extract_cut_points(x)
Arguments
x |
an autostrata object |
Value
a vector of the score values delineating cutoffs between strata
Fit Prognostic Model
Description
Given a pilot set and a prognostic formula, return the fitted model. If the outcome is binary, fit a logistic regression. Otherwise, fit a linear model.
Usage
fit_prognostic_model(dat, prognostic_formula, outcome)
Arguments
dat |
data.frame on which model should be fit |
prognostic_formula |
formula for prognostic model |
outcome |
string giving name of column of data where outcomes are recorded |
Value
a glm or lm object fit from prognostic_formula on data
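The choice between logistic and linear regression described above might look like the following. This is a hypothetical re-implementation for illustration: the argument names follow the usage shown above, but the body is an assumption rather than the package source.

```r
# Hypothetical sketch: the body is an assumption, not the package source.
fit_prognostic_sketch <- function(dat, prognostic_formula, outcome) {
  y <- dat[[outcome]]
  outcome_is_binary <- is.logical(y) || all(y %in% c(0, 1))
  if (outcome_is_binary) {
    # Binary outcome: logistic regression.
    glm(prognostic_formula, data = dat, family = "binomial")
  } else {
    # Any other outcome: ordinary linear model.
    lm(prognostic_formula, data = dat)
  }
}

set.seed(9)
pilot <- data.frame(X1 = rnorm(100))
pilot$y_binary <- rbinom(100, 1, plogis(pilot$X1))
pilot$y_continuous <- 2 * pilot$X1 + rnorm(100)

binary_fit <- fit_prognostic_sketch(pilot, y_binary ~ X1, "y_binary")
continuous_fit <- fit_prognostic_sketch(pilot, y_continuous ~ X1, "y_continuous")
class(binary_fit)
class(continuous_fit)
```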
Get Issues
Description
Helper for make_issue_table to return issues string. Given a row which summarizes the Treat, Control, Total, and Control_Proportion of a stratum, return a string of potential issues with the stratum.
Usage
get_issues(row)
Arguments
row |
a row of the data.frame produced in make_issue_table |
Value
Returns a string of potential issues
Parse propensity input to obtain propensity scores
Description
The propensity input to plot.auto_strata or plot.manual_strata
can be propensity scores, a propensity model, or a
formula for propensity score. This function figures out which type
propensity is and returns the propensity scores. Returns the
propensity score on the response scale (rather than the linear predictor), so
the scores are the predicted probabilities of treatment.
Usage
get_prop_scores(propensity, data, treat)
Arguments
propensity |
either a vector of propensity scores, a model for propensity, or a formula for propensity scores |
data |
the analysis set data within a stratum |
treat |
the name of the treatment assignment column |
Value
vector of propensity scores
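One way to handle the three accepted input types is to branch on the class of the input. The function below is a hypothetical base R sketch of that dispatch, assuming a logistic propensity model; it is not the package's own implementation.

```r
# Hypothetical sketch of the three-way dispatch; not the package source.
prop_scores_sketch <- function(propensity, data, treat) {
  if (is.numeric(propensity)) {
    # Already a vector of scores: check the length and return it unchanged.
    stopifnot(length(propensity) == nrow(data))
    return(propensity)
  }
  if (inherits(propensity, "formula")) {
    # The left-hand side of the formula should be the treatment column.
    stopifnot(all.vars(propensity)[1] == treat)
    propensity <- glm(propensity, data = data, family = "binomial")
  }
  # At this point `propensity` is assumed to be a fitted model.
  predict(propensity, newdata = data, type = "response")
}

set.seed(7)
dat <- data.frame(X1 = rnorm(150))
dat$treat <- rbinom(150, 1, plogis(dat$X1))

from_formula <- prop_scores_sketch(treat ~ X1, dat, "treat")
fitted_model <- glm(treat ~ X1, data = dat, family = "binomial")
from_model <- prop_scores_sketch(fitted_model, dat, "treat")
from_vector <- prop_scores_sketch(unname(from_model), dat, "treat")
```

All three calls yield the same scores here, because the formula, the model, and the vector describe the same fitted propensity model.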
Checks auto_strata class
Description
Checks if the target object is an auto_strata object.
Usage
is.auto_strata(object)
Arguments
object |
any R object |
Value
Returns TRUE if its argument has auto_strata among its
classes and FALSE otherwise.
Examples
dat <- make_sample_data()
a.strat <- auto_stratify(dat, "treat", outcome ~ X1 + X2)
is.auto_strata(a.strat) # returns TRUE
Checks manual_strata class
Description
Checks if the target object is a manual_strata object.
Usage
is.manual_strata(object)
Arguments
object |
any R object |
Value
Returns TRUE if its argument has manual_strata among
its classes and FALSE otherwise.
Examples
dat <- make_sample_data()
m.strat <- manual_stratify(dat, treat ~ C1)
is.manual_strata(m.strat) # returns TRUE
Checks strata class
Description
Checks if the target object is a strata object.
Usage
is.strata(object)
Arguments
object |
any R object |
Value
Returns TRUE if its argument has strata among its
classes and FALSE otherwise.
Examples
dat <- make_sample_data()
m.strat <- manual_stratify(dat, treat ~ C1)
is.strata(m.strat) # returns TRUE
Check if a vector is binary
Description
Returns TRUE if the input is logical or if it contains only 0's and 1's.
Usage
is_binary(col)
Arguments
col |
a column from a data frame |
Value
logical
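A check of this kind can be written in one line of base R. The version below is a hypothetical sketch (is_binary_sketch is not the package function), and its treatment of missing values is an assumption: a vector containing NA is reported as not binary.

```r
# Hypothetical sketch; NA handling here is an assumption, not package behavior.
is_binary_sketch <- function(col) {
  is.logical(col) || all(col %in% c(0, 1))
}

is_binary_sketch(c(TRUE, FALSE)) # TRUE: logical input
is_binary_sketch(c(0, 1, 1))     # TRUE: only 0's and 1's
is_binary_sketch(c(0, 1, 2))     # FALSE: contains a 2
```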
Make Size-Ratio plot
Description
Not meant to be called externally. Helper plot function for strata.
Produces a scatter plot of strata by size and control proportion.
Usage
make_SR_plot(x, label)
Arguments
x |
a |
label |
ignored unless |
Make Assignment-Control plot
Description
Not meant to be called externally. Helper plot function for strata
object with type = "AC". Produces an Assignment-Control plot of stratum s
Usage
make_ac_plot(
x,
propensity,
strat,
strata_lines,
jitter_prognosis,
jitter_propensity
)
Arguments
x |
an |
propensity |
ignored unless |
strat |
the number code of the stratum to be plotted. If "all", plots all strata. |
strata_lines |
default = |
jitter_prognosis |
ignored unless |
jitter_propensity |
ignored unless |
See Also
Aikens et al. (preprint) https://arxiv.org/abs/1908.09077 . Section 3.2 for an explanation of Assignment-Control plots
Make strata table
Description
Make strata table
Usage
make_autostrata_table(qcut)
Arguments
qcut |
the prognostic score quantile cuts |
Value
data.frame of strata definitions
Make histogram plot
Description
Not meant to be called externally. Helper plot function for strata
object with type = "hist". Produces a histogram of propensity scores
within a stratum
Usage
make_hist_plot(x, propensity, strat)
Arguments
x |
a |
propensity |
ignored unless |
strat |
the number code of the strata to be plotted. If "all", plots all strata |
Make Issue Table
Description
Not meant to be called externally. Produce table of the number of treated and control individuals in each stratum. Also checks for potential problems with treat/control ratio or stratum size which might result in slow or poor quality matching.
Usage
make_issue_table(a_set, treat)
Arguments
a_set |
|
treat |
string name of treatment column |
Value
Returns a data frame with one row per stratum and the columns Treat, Control, Total, Control Proportion, and Potential Issues
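The counting part of such a table can be reproduced with base R. The sketch below is hypothetical: it assumes the analysis set stores stratum assignments in a column named stratum and treatment as 0/1, and it leaves out the "Potential Issues" column because the thresholds used to flag issues are not stated here.

```r
# Simulated analysis set; the column name `stratum` is an assumption.
set.seed(4)
a_set <- data.frame(
  treat = rbinom(300, 1, 0.25),
  stratum = sample(1:3, 300, replace = TRUE)
)

# Count treated and control individuals per stratum.
counts <- table(a_set$stratum, a_set$treat)
issue_sketch <- data.frame(
  Stratum = as.integer(rownames(counts)),
  Treat = as.vector(counts[, "1"]),
  Control = as.vector(counts[, "0"])
)
issue_sketch$Total <- issue_sketch$Treat + issue_sketch$Control
issue_sketch$Control_Proportion <- issue_sketch$Control / issue_sketch$Total
issue_sketch
```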
Make match distances within strata
Description
Makes the match distance with strata specifications for strata_match.
This function is largely unnecessary to call outside of stratamatch, but it is
exported for the benefit of the user to aid in debugging. Note that this
function requires that the R package optmatch is installed.
Usage
make_match_distances(object, model, method)
Arguments
object |
a strata object |
model |
(optional) formula for matching. If left blank, all
columns of the analysis set in |
method |
either "prop" for propensity score matching based on a glm fit
with model |
Value
a match distance matrix for optmatch
See Also
https://cran.r-project.org/package=optmatch
Examples
dat <- make_sample_data(n = 75)
# stratify with auto_stratify
a.strat <- auto_stratify(dat, "treat", outcome ~ X2, size = 25)
# make match distances. Requires optmatch package to be installed.
md <- make_match_distances(a.strat, treat ~ X1 + X2, method = "mahal")
Make Residual Plot
Description
Not yet implemented. Not meant to be called externally. Helper plot function
for strata object with type = "residual". Produces the
diagnostic plots for the prognostic score model
Usage
make_resid_plot(x)
Arguments
x |
an |
Make sample data
Description
Makes a simple data frame with treat (binary), outcome (binary), and five covariates: X1 (continuous), X2 (continuous), B1 (binary), B2 (binary), and C1 (categorical). Probability outcome = 1 is sigmoid(treat + X1). Probability treatment = 1 is sigmoid(- 0.2 * X1 + X2 - B1 + 2 * B2)
Usage
make_sample_data(n = 100)
Arguments
n |
the size of the desired data set |
Examples
# make sample data set of 30 observations
dat <- make_sample_data(n = 30)
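The generating model in the description can be written out in base R. The sketch below is hypothetical: the two sigmoid formulas come from the description, but the covariate distributions (standard normal, Bernoulli(0.5), and three equally likely categories) are assumptions, since the description does not state them.

```r
# Hypothetical sketch; covariate distributions are assumptions.
sample_data_sketch <- function(n = 100) {
  X1 <- rnorm(n)
  X2 <- rnorm(n)
  B1 <- rbinom(n, 1, 0.5)
  B2 <- rbinom(n, 1, 0.5)
  C1 <- sample(c("a", "b", "c"), n, replace = TRUE)
  # Probability treatment = 1 is sigmoid(-0.2 * X1 + X2 - B1 + 2 * B2).
  treat <- rbinom(n, 1, plogis(-0.2 * X1 + X2 - B1 + 2 * B2))
  # Probability outcome = 1 is sigmoid(treat + X1).
  outcome <- rbinom(n, 1, plogis(treat + X1))
  data.frame(treat, outcome, X1, X2, B1, B2, C1)
}

set.seed(8)
sketch_dat <- sample_data_sketch(50)
head(sketch_dat)
```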
Manual Stratify
Description
Stratifies a data set based on a set of blocking covariates specified by the
user. Creates a manual_strata object, which can be passed to
strata_match for stratified matching or unpacked by the user to be
matched by some other means.
Usage
manual_stratify(data, strata_formula, force = FALSE)
Arguments
data |
data.frame with observations as rows, features as columns |
strata_formula |
the formula to be used for stratification. (e.g. |
force |
a boolean. If true, run even if a variable appears continuous. (default = FALSE) |
Value
Returns a manual_strata object. This contains:
- treat
a string giving the name of the column encoding treatment assignment
- covariates
a character vector with the names of the categorical columns on which the data were stratified
- analysis_set
the data set with strata assignments
- call
the call to manual_stratify used to generate this object
- issue_table
a table of each stratum and potential issues of size and treat:control balance. In small or imbalanced strata, it may be difficult or infeasible to find high-quality matches, while very large strata may be computationally intensive to match.
- strata_table
a table of each stratum and the covariate bin to which it corresponds
See Also
auto_stratify, new_manual_strata
Examples
# make sample data set
dat <- make_sample_data(n = 75)
# stratify based on B1 and B2
m.strat <- manual_stratify(dat, treat ~ B1 + B2)
# diagnostic plot
plot(m.strat)
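Conceptually, stratifying on categorical covariates assigns one stratum to each observed combination of covariate values. The base R sketch below illustrates that idea on simulated data; the stratum numbering and the column name stratum are assumptions and need not match what manual_stratify produces.

```r
# Simulated data with two binary blocking covariates (illustrative only).
set.seed(6)
sketch_dat <- data.frame(
  treat = rbinom(200, 1, 0.3),
  B1 = rbinom(200, 1, 0.5),
  B2 = rbinom(200, 1, 0.5)
)

# Each distinct (B1, B2) combination becomes one stratum.
combos <- interaction(sketch_dat$B1, sketch_dat$B2, drop = TRUE)
sketch_dat$stratum <- as.integer(combos)

# A lookup table from stratum number to the covariate values it represents.
strata_lookup <- unique(sketch_dat[, c("stratum", "B1", "B2")])
strata_lookup <- strata_lookup[order(strata_lookup$stratum), ]
strata_lookup
```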
New Autostrata
Description
Basic constructor for an auto_strata object. These objects hold all
the information associated with a dataset that has been stratified via
auto_stratify. This object may be passed to
strata_match to be matched or it may be unpacked by the user to be
matched by other means.
Usage
new_auto_strata(
outcome,
treat,
analysis_set = NULL,
call = NULL,
issue_table = NULL,
strata_table = NULL,
prognostic_scores = NULL,
prognostic_model = NULL,
pilot_set = NULL
)
Arguments
outcome |
a string giving the name of the column where outcome information is stored |
treat |
a string giving the name of the column where treatment information is stored |
analysis_set |
the data set which will be stratified |
call |
the call to |
issue_table |
a table of each stratum and potential issues of size and treat:control balance |
strata_table |
a table of each stratum and the prognostic score quantile bin this corresponds to |
prognostic_scores |
a vector of prognostic scores. |
prognostic_model |
a model for prognosis fit on a separate data set. |
pilot_set |
the set of controls used to fit the prognostic model. These are excluded from subsequent analysis so that the prognostic score is not overfit to the data used to estimate the treatment effect. |
Value
a basic auto_strata object
See Also
auto_stratify, a function which calls this constructor
to produce an auto_strata object.
New Manual Strata
Description
Basic constructor for a manual_strata object. These objects hold all
the information associated with a dataset that has been stratified via
manual_stratify. This object may be passed to
strata_match to be matched or it may be unpacked by the user to be
matched by other means.
Usage
new_manual_strata(
treat = character(),
covariates = character(),
analysis_set = data.frame(),
call = call(),
issue_table = data.frame(),
strata_table = data.frame()
)
Arguments
treat |
a string giving the name of the column where treatment information is stored |
covariates |
a character vector with the names of the categorical columns on which to stratify |
analysis_set |
the data set which will be stratified |
call |
the call to |
issue_table |
a table of each stratum and potential issues of size and treat:control balance |
strata_table |
a table of each stratum and the covariate bin this corresponds to |
Value
a basic manual_strata object
Plot method for auto_strata object
Description
Generates diagnostic plots for the product of a stratification by
auto_stratify. There are four plot types:
- "SR" (default)
produces a scatter plot of strata by size and treat:control ratio
- "hist"
produces a histogram of propensity scores within a stratum
- "AC"
produces an Assignment-Control plot of individuals within a stratum
- "residual"
produces a residual plot for the prognostic model
Usage
## S3 method for class 'auto_strata'
plot(
x,
type = "SR",
label = FALSE,
stratum = "all",
strata_lines = TRUE,
jitter_prognosis,
jitter_propensity,
propensity,
...
)
Arguments
x |
an |
type |
string giving the plot type (default = |
label |
ignored unless |
stratum |
ignored unless |
strata_lines |
default = |
jitter_prognosis |
ignored unless |
jitter_propensity |
ignored unless |
propensity |
ignored unless |
... |
other arguments |
See Also
Aikens, Greaves, and Baiocchi (2020) in Statistics in Medicine, Section 3.2 for an explanation of Assignment-Control plots (formerly "Fisher-Mill" plots).
Examples
dat <- make_sample_data()
a.strat <- auto_stratify(dat, "treat", outcome ~ X1 + X2)
plot(a.strat) # makes size-ratio scatter plot
plot(a.strat, type = "hist", propensity = treat ~ X1, stratum = 1)
plot(a.strat, type = "AC", propensity = treat ~ X1, stratum = 1)
plot(a.strat, type = "residual")
Plot method for manual_strata object
Description
Generates diagnostic plots for the product of a stratification by
manual_stratify. There are two plot types:
- "SR" (default)
produces a scatter plot of strata by size and treat:control ratio
- "hist"
produces a histogram of propensity scores within a stratum.
Note that residual plots and AC plots are not
supported for manual_strata objects because no prognostic model is
fit.
Usage
## S3 method for class 'manual_strata'
plot(x, type = "SR", label = FALSE, stratum = "all", propensity, ...)
Arguments
x |
a |
type |
string giving the plot type (default = |
label |
ignored unless |
stratum |
ignored unless |
propensity |
ignored unless |
... |
other arguments |
Examples
dat <- make_sample_data()
m.strat <- manual_stratify(dat, treat ~ C1)
plot(m.strat) # makes size-ratio scatter plot
plot(m.strat, type = "hist", propensity = treat ~ X1, stratum = 1)
Print Auto Strata
Description
Print method for auto_strata object
Usage
## S3 method for class 'auto_strata'
print(x, ...)
Arguments
x |
an |
... |
other arguments |
Examples
dat <- make_sample_data()
a.strat <- auto_stratify(dat, "treat", outcome ~ X1 + X2)
print(a.strat) # prints information about a.strat
Print Manual Strata
Description
Print method for manual_strata object
Usage
## S3 method for class 'manual_strata'
print(x, ...)
Arguments
x |
a |
... |
other arguments |
Examples
dat <- make_sample_data()
m.strat <- manual_stratify(dat, treat ~ C1)
print(m.strat) # prints information about m.strat
Split data into pilot and analysis sets
Description
Given a data set and some parameters about how to split the data, this
function partitions the data accordingly and returns the partitioned data as
a list containing the analysis_set and pilot_set.
Usage
split_pilot_set(
data,
treat,
pilot_fraction = 0.1,
pilot_size = NULL,
group_by_covariates = NULL
)
Arguments
data |
|
treat |
string giving the name of column designating treatment assignment |
pilot_fraction |
numeric between 0 and 1 giving the proportion of controls to be allotted for building the prognostic score (default = 0.1) |
pilot_size |
alternative to pilot_fraction. Approximate number of
observations to be used in pilot set. Note that the actual pilot set size
returned may not be exactly |
group_by_covariates |
character vector giving the names of covariates to be grouped by (optional). If specified, the pilot set will be sampled in a stratified manner, so that the composition of the pilot set reflects the composition of the whole data set in terms of these covariates. The specified covariates must be categorical. |
Value
a list with analysis_set and pilot_set
Examples
dat <- make_sample_data()
splt <- split_pilot_set(dat, "treat", 0.2)
# can be passed into auto_stratify if desired
a.strat <- auto_stratify(splt$analysis_set, "treat", outcome ~ X1,
  pilot_sample = splt$pilot_set
)
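The split itself can be pictured with a short base R sketch. It assumes the treatment column is coded 0/1 and, following the description of pilot_fraction, draws the pilot set from the controls only. The helper name sketch_split is hypothetical and this is not the package's implementation; in particular it ignores pilot_size and group_by_covariates.

```r
# Hypothetical helper (not part of stratamatch): sample a fraction of the
# control rows as a pilot set and keep all remaining rows for analysis.
sketch_split <- function(data, treat, pilot_fraction = 0.1) {
  control_rows <- which(data[[treat]] == 0)
  n_pilot <- round(pilot_fraction * length(control_rows))
  # Index through sample.int() so a single control row is handled correctly.
  pilot_rows <- control_rows[sample.int(length(control_rows), n_pilot)]
  analysis_rows <- setdiff(seq_len(nrow(data)), pilot_rows)
  list(
    analysis_set = data[analysis_rows, , drop = FALSE],
    pilot_set = data[pilot_rows, , drop = FALSE]
  )
}

# Toy data (made up): 80 controls followed by 20 treated rows.
set.seed(1)
toy <- data.frame(treat = rep(c(0, 1), times = c(80, 20)), X1 = rnorm(100))
splt <- sketch_split(toy, "treat", pilot_fraction = 0.2)

nrow(splt$pilot_set)    # 16 rows, all controls
nrow(splt$analysis_set) # the remaining 84 rows
```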
Strata function from package survival
Description
Strata function from package survival
Strata Match
Description
Match within strata in series using optmatch. Note that this function
requires that the R package optmatch is installed.
Usage
strata_match(object, model = NULL, method = "prop", k = 1)
Arguments
object |
a strata object |
model |
(optional) formula for matching. If left blank, all
columns of the analysis set in |
method |
either "prop" for propensity score matching based on a glm fit
with model |
k |
the number of control individuals to be matched to each treated
individual. If |
Value
a named factor with matching assignments
See Also
https://cran.r-project.org/package=optmatch
Examples
# make a sample data set
set.seed(1)
dat <- make_sample_data(n = 75)
# stratify with auto_stratify
a.strat <- auto_stratify(dat, "treat", outcome ~ X2, size = 25)
# 1:1 match based on propensity formula: treat ~ X1 + X2
# Requires optmatch package to be installed.
strata_match(a.strat, model = treat ~ X1 + X2, k = 1)
# 1:1 match within strata based on Mahalanobis distance.
# Requires optmatch package to be installed.
strata_match(a.strat, model = treat ~ X1 + X2, method = "mahal", k = 1)
Match without Stratification
Description
Not meant to be called externally. Match a data set without stratifying.
Used to compare performance with and without stratification. Note that this
function requires that the R package optmatch is installed.
Usage
strata_match_nstrat(object, model = NULL, k = 1)
Arguments
object |
a strata object |
model |
(optional) formula for matching. If left blank, all
columns of the analysis set in |
k |
the number of control individuals to be matched to each treated
individual. If |
Value
a named factor with matching assignments
See Also
https://cran.r-project.org/package=optmatch
stratamatch: stratify and match large data sets
Description
This package employs a pilot matching design to automatically stratify and
match large datasets. The manual_stratify function allows users to manually
stratify a dataset based on categorical variables of interest, while the
auto_stratify function does so automatically by allocating a held-aside
(pilot) data set, fitting a prognostic score (see
Hansen (2008) <doi:10.1093/biomet/asn004>) on the pilot set, and stratifying
the data set based on prognostic score quantiles. The strata_match function
then does optimal matching of the data set within strata.
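The pilot design described above can be sketched end to end in base R. This is an illustration under stated assumptions, not the code inside auto_stratify: the toy data, the 20% pilot fraction, the logistic prognostic model, and the choice of four strata are all made up for the example.

```r
# Toy data (made up): two covariates, a 0/1 treatment and a 0/1 outcome.
set.seed(1)
n <- 1000
toy <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
toy$treat <- rbinom(n, 1, plogis(toy$X1 - 1))
toy$outcome <- rbinom(n, 1, plogis(toy$X2))

# 1. Hold aside a random 20% of the controls as the pilot set.
controls <- which(toy$treat == 0)
pilot_rows <- controls[sample.int(length(controls), round(0.2 * length(controls)))]
pilot_set <- toy[pilot_rows, ]
analysis_set <- toy[setdiff(seq_len(n), pilot_rows), ]

# 2. Fit a prognostic model (outcome given covariates) on the pilot set only.
prog_model <- glm(outcome ~ X1 + X2, data = pilot_set, family = binomial())

# 3. Score the analysis set and cut the scores at their quartiles.
prog_score <- predict(prog_model, newdata = analysis_set, type = "response")
breaks <- quantile(prog_score, probs = seq(0, 1, length.out = 5))
analysis_set$stratum <- cut(prog_score, breaks = breaks,
                            include.lowest = TRUE, labels = FALSE)

# Treated and control counts per stratum; matching would then run within each.
table(stratum = analysis_set$stratum, treat = analysis_set$treat)
```

Because the prognostic model is fit only on held-aside controls, the outcomes of the rows that are later matched play no part in forming the strata.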
Summary for strata object
Description
Summarize number and sizes of strata in a strata object. Also prints
number of strata with potential issues.
Usage
## S3 method for class 'strata'
summary(object, ...)
Arguments
object |
a |
... |
other arguments |
Details
For more information, access the issue table for your strata object with
mystrata$issue_table.
Examples
dat <- make_sample_data()
m.strat <- manual_stratify(dat, treat ~ C1)
summary(m.strat) # Summarizes strata in m.strat
Warn if continuous
Description
Throws an error (or only a warning, if force is true) if a column appears to be continuous
Usage
warn_if_continuous(column, name, force, n)
Arguments
column |
vector or factor column from a |
name |
name of the input column |
force |
a boolean. If true, warn but do not stop |
n |
the number of rows in the data set |
Details
Not meant to be called externally. Only categorical or binary covariates should be used to manually stratify a data set. However, it's hard to tell for sure if something is continuous or just discrete with real-numbered values. Returns without throwing an error if the column is a factor, but throws an error or warning if the column has many distinct values.
Value
Does not return anything
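The behaviour described in Details can be approximated by the base R sketch below. The function name sketch_warn_if_continuous, the max_fraction_distinct argument, its 5% default, and the message text are all assumptions for illustration; the rule the package actually uses for "many distinct values" is not stated here.

```r
# Hypothetical re-creation of the documented behaviour (not the package's code).
sketch_warn_if_continuous <- function(column, name, force = FALSE,
                                      n = length(column),
                                      max_fraction_distinct = 0.05) {
  # Factors are accepted as categorical without further checks.
  if (is.factor(column)) {
    return(invisible(NULL))
  }
  n_distinct <- length(unique(column))
  # Assumed rule: too many distinct values relative to n suggests continuity.
  if (n_distinct > max_fraction_distinct * n) {
    msg <- sprintf("%s has %d distinct values and may be continuous",
                   name, n_distinct)
    if (force) warning(msg) else stop(msg)
  }
  invisible(NULL)
}

set.seed(1)
sketch_warn_if_continuous(factor(c("a", "b", "a")), "C1") # silent: factor
sketch_warn_if_continuous(rep(0:1, 50), "B1")             # silent: 2 distinct values
res <- tryCatch(
  sketch_warn_if_continuous(rnorm(100), "X1"),
  error = function(e) "error"
)
res # "error": 100 distinct values in 100 rows
```

With force = TRUE the same call signals a warning instead of stopping, which matches the documented role of the force argument.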