Help for package stratamatch

Type:

Package

Date:

2022-03-30

Title:

Stratification and Matching for Large Observational Data Sets

Version:

0.1.9

Maintainer:

Rachael C. Aikens <rockyaikens@gmail.com>

BugReports:

https://github.com/raikens1/stratamatch/issues

Description:

A pilot matching design to automatically stratify and match large datasets. The manual_stratify() function allows users to manually stratify a dataset based on categorical variables of interest, while the auto_stratify() function does so automatically by allocating a held-aside (pilot) data set, fitting a prognostic score (see Hansen (2008) <doi:10.1093/biomet/asn004>) on the pilot set, and stratifying the data set based on prognostic score quantiles. The strata_match() function then does optimal matching of the data set in parallel within strata.

License:

GPL-3

Encoding:

UTF-8

LazyData:

true

Imports:

dplyr (≥ 0.8.3), Hmisc (≥ 4.2-0), magrittr (≥ 1.5), rlang (≥ 0.4.0), survival(≥ 2.44.1.1)

Depends:

R (≥ 3.4.0)

Suggests:

knitr, optmatch (≥ 0.9-11), rmarkdown, testthat (≥ 2.1.0), glmnet (≥ 4.0), randomForest (≥ 4.6-14)

URL:

https://github.com/raikens1/stratamatch

RoxygenNote:

7.1.2

VignetteBuilder:

knitr

NeedsCompilation:

Packaged:

2022-03-31 00:19:08 UTC; rocky

Author:

Rachael C. Aikens [aut, cre], Joseph Rigdon [aut], Justin Lee [aut], Michael Baiocchi [aut], Jonathan Chen [aut]

Repository:

CRAN

Date/Publication:

2022-03-31 06:00:02 UTC

Pipe operator

Description

Pipe operator

Demographics and comorbidities of 10,157 ICU patients

Description

A deidentified data set containing the demographics, comorbidities, DNR code status, and surgical team assignment of 10,157 patients in the Stanford University Hospital Intensive Care Unit (ICU). These data were extracted from the electronic record system, deidentified, and made publicly available by Chavez et al (2018) <doi:10.1371/journal.pone.0190569>. They were reprocessed for use in the stratamatch package as a sample data set. For more details on the data extraction and inclusion criteria, see Chavez et al.

Usage

ICU_data

Format

A data frame with 10157 rows and 29 variables:

patid: patient id, numeric
Birth.preTimeDays: age of patient at time of admission to the ICU in days, numeric
Female.pre: whether the patient was documented to be female prior to ICU visit, binary
RaceAsian.pre: whether the patient's race/ethnicity was documented as Asian prior to ICU visit, binary
RaceUnknown.pre: whether the patient's race/ethnicity was unknown prior to ICU visit, binary
RaceOther.pre: whether the patient's race/ethnicity was documented as "Other" prior to ICU visit, binary
RaceBlack.pre: whether the patient's race/ethnicity was documented as Black/African American prior to ICU visit, binary
RacePacificIslander.pre: whether the patient's race/ethnicity was documented as Pacific Islander prior to ICU visit, binary
RaceNativeAmerican.pre: whether the patient's race/ethnicity was documented as Native American prior to ICU visit, binary
self_pay: whether the patient was "self pay" (i.e. uninsured), binary
all_latinos: whether the patient was documented to be Latino prior to ICU visit, binary
DNR: whether the patient had code status set to any DNR ("Do not resuscitate") order at any point during their ICU stay, binary
surgicalTeam: whether the patient was assigned to a surgical team at any point during their ICU stay, binary

Details

License information for this data is as follows:

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Source

https://simtk.org/frs/download_confirm.php/latestzip/1969/ICUDNR-latest.zip?group_id=892

Auto Stratify

Description

Automatically creates strata for matching based on a prognostic score formula or a vector of prognostic scores already estimated by the user. Creates an auto_strata object, which can be passed to strata_match for stratified matching or unpacked by the user to be matched by some other means.

Usage

auto_stratify(
  data,
  treat,
  prognosis,
  outcome = NULL,
  size = 2500,
  pilot_fraction = 0.1,
  pilot_size = NULL,
  pilot_sample = NULL,
  group_by_covariates = NULL
)

Arguments

data

data.frame with observations as rows, features as columns

treat

string giving the name of column designating treatment assignment

prognosis

information on how to build prognostic scores. Three different input types are allowed:

vector of prognostic scores for all individuals in the data set. Should be in the same order as the rows of data.
a formula for fitting a prognostic model
an already-fit prognostic score model

outcome

string giving the name of the column with outcome information. Required if prognosis is a vector of prognostic scores or an already-fit model. Otherwise it is inferred from the prognostic formula.

size

numeric, desired size of strata (default = 2500)

pilot_fraction

numeric between 0 and 1 giving the proportion of controls to be allotted for building the prognostic score (default = 0.1)

pilot_size

alternative to pilot_fraction. Approximate number of observations to be used in pilot set. Note that the actual pilot set size returned may not be exactly pilot_size if group_by_covariates is specified because balancing by covariates may result in deviations from desired size. If pilot_size is specified, pilot_fraction is ignored.

pilot_sample

a data.frame of held aside samples for building prognostic score model. If pilot_sample is specified, pilot_size and pilot_fraction are both ignored.

group_by_covariates

character vector giving the names of covariates to be grouped by (optional). If specified, the pilot set will be sampled in a stratified manner, so that the composition of the pilot set reflects the composition of the whole data set in terms of these covariates. The specified covariates must be categorical.

Details

Stratifying by prognostic score quantiles can be more effective than manually stratifying a data set because the prognostic score is continuous, so the strata produced tend to be of equal size with similar prognosis.

Automatic stratification requires information on how the prognostic scores should be derived. This is primarily determined by the specification of the prognosis argument. Three main forms of input for prognosis are allowed:

A vector of prognostic scores. This vector should have the same length and order as the rows in the data set. If this method is used, the outcome argument must also be specified; this is simply a string giving the name of the column which contains outcome information.
A formula for prognosis (e.g. outcome ~ X1 + X2). If this method is used, auto_stratify will automatically split the data set into a pilot_set and an analysis_set. The pilot set will be used to fit a logistic regression model for outcome in the absence of treatment, and this model will be used to estimate prognostic scores on the analysis set. The analysis set will then be stratified based on the estimated prognostic scores. In this case the outcome argument need not be specified since it can be inferred from the input formula.
A model for prognosis (e.g. a glm object). If this method is used, the outcome argument must also be specified.
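The formula workflow described above can be sketched in base R. This is only an illustration of the idea under simplifying assumptions (simulated data, a fixed number of strata `n_strata` rather than a target stratum size), not the package's internal implementation; all variable names are hypothetical.

```r
# Minimal base-R sketch of the pilot design (not stratamatch internals).
set.seed(1)
n <- 1000
dat <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
dat$treat <- rbinom(n, 1, plogis(dat$X2))
dat$outcome <- rbinom(n, 1, plogis(dat$treat + dat$X1))

# 1. Hold aside a fraction of the controls as the pilot set.
pilot_fraction <- 0.1
control_rows <- which(dat$treat == 0)
pilot_rows <- sample(control_rows,
                     size = floor(pilot_fraction * length(control_rows)))
pilot_set <- dat[pilot_rows, ]
analysis_set <- dat[-pilot_rows, ]

# 2. Fit the prognostic model on the pilot set only.
prognostic_model <- glm(outcome ~ X1 + X2, data = pilot_set,
                        family = "binomial")

# 3. Estimate prognostic scores on the analysis set (response scale).
scores <- predict(prognostic_model, newdata = analysis_set,
                  type = "response")

# 4. Stratify the analysis set by prognostic score quantiles.
n_strata <- 4  # hypothetical choice for this sketch
breaks <- quantile(scores, probs = seq(0, 1, length.out = n_strata + 1))
analysis_set$stratum <- cut(scores, breaks = breaks,
                            include.lowest = TRUE, labels = FALSE)

table(analysis_set$stratum)
```

Because the pilot observations are removed before stratification, the prognostic model is never evaluated on the data used to fit it.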

Value

Returns an auto_strata object. This contains:

outcome - a string giving the name of the column where outcome information is stored
treat - a string giving the name of the column encoding treatment assignment
analysis_set - the data set with strata assignments
call - the call to auto_stratify used to generate this object
issue_table - a table of each stratum and potential issues of size and treat:control balance. In small or imbalanced strata, it may be difficult or infeasible to find high-quality matches, while very large strata may be computationally intensive to match.
strata_table - a table of each stratum and the prognostic score quantile bin to which it corresponds
prognostic_scores - a vector of prognostic scores.
prognostic_model - a model for prognosis fit on a pilot data set. Will be NULL if a vector of prognostic scores was provided as the prognosis argument to auto_stratify rather than a model or formula.
pilot_set - the set of controls used to fit the prognostic model. These are excluded from subsequent analysis so that the prognostic score is not overfit to the data used to estimate the treatment effect. Will be NULL if a pre-fit model or a vector of prognostic scores was provided as the prognosis argument to auto_stratify rather than a formula.

Troubleshooting

This section suggests fixes for common errors that appear while fitting the prognostic model or using it to estimate prognostic scores on the analysis set.

Encountered an error while fitting the prognostic model... numeric probabilities 0 or 1 produced. This error means that the prognostic model can perfectly separate positive from negative outcomes. Estimating a treatment effect in this case is unwise since an individual's baseline characteristics perfectly determine their outcome, regardless of whether they receive the treatment. This error may also appear on rare occasions when your pilot set is very small (number of observations approximately <= number of covariates in the prognostic model), so that perfect separation happens by chance.
Encountered an error while estimating prognostic scores ... factor X has new levels ... This may indicate that some value(s) of one or more categorical variables appear in the analysis set which were not seen in the pilot set. This means that when we try to obtain prognostic scores for our analysis set, we run into some new value that our prognostic model was not prepared to handle. There are a few options we have to troubleshoot this problem:
- Rejection sampling. Run auto_stratify again with the same arguments until this error does not occur (i.e. until some observations with the missing value are randomly selected into the pilot set)
- Eliminate this covariate from the prognostic formula.
- Remove observations with the rare covariate value from the entire data set. Consider carefully how this exclusion might affect your results.

Other errors or warnings can occur if the pilot set is too small and the prognostic formula is too complicated. Always make sure that the number of observations in the pilot set is large enough that you can confidently fit a prognostic model with the number of covariates you want.
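One way to anticipate the "new levels" error is to compare the values of each categorical covariate across the two sets before estimating scores. The sketch below uses small hypothetical data frames and a hypothetical covariate C1; it is a diagnostic aid, not something stratamatch does on the user's behalf.

```r
# Hypothetical pilot and analysis sets sharing a categorical covariate C1.
pilot_set <- data.frame(C1 = factor(c("a", "b", "a", "b")))
analysis_set <- data.frame(C1 = factor(c("a", "b", "c", "a")))

# Values of C1 seen in the analysis set but never in the pilot set.
unseen <- setdiff(unique(as.character(analysis_set$C1)),
                  unique(as.character(pilot_set$C1)))
unseen

# One of the options listed above: drop observations carrying the rare value.
# In practice this exclusion would be applied to the entire data set.
analysis_trimmed <- analysis_set[!(analysis_set$C1 %in% unseen), ,
                                 drop = FALSE]
nrow(analysis_trimmed)
```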

Examples

# make sample data set
set.seed(111)
dat <- make_sample_data(n = 75)

# construct a pilot set, build a prognostic score for `outcome` based on X2
# and stratify the data set based on the scores into sets of about 25
# observations
a.strat_formula <- auto_stratify(dat, "treat", outcome ~ X2, size = 25)

# stratify the data set based on a model for prognosis
pilot_data <- make_sample_data(n = 30)
prognostic_model <- glm(outcome ~ X2, pilot_data, family = "binomial")
a.strat_model <- auto_stratify(dat, "treat", prognostic_model,
  outcome = "outcome", size = 25
)

# stratify the data set based on a vector of prognostic scores
prognostic_scores <- predict(prognostic_model,
  newdata = dat,
  type = "response"
)
a.strat_scores <- auto_stratify(dat, "treat", prognostic_scores,
  outcome = "outcome", size = 25
)

# diagnostic plots
plot(a.strat_formula)
plot(a.strat_formula, type = "AC", propensity = treat ~ X1, stratum = 1)
plot(a.strat_formula, type = "hist", propensity = treat ~ X1, stratum = 1)
plot(a.strat_formula, type = "residual")

Build Autostrata object

Description

Not meant to be called externally. Given the arguments to auto_stratify, build the prognostic scores and return the analysis set, the prognostic scores, the pilot set, the prognostic model, and the outcome string. The primary function of this code is to determine the type of prognosis and handle it appropriately.

Usage

build_autostrata(
  data,
  treat,
  prognosis,
  outcome,
  pilot_fraction,
  pilot_size,
  pilot_sample,
  group_by_covariates
)

Arguments

data

data.frame with observations as rows, features as columns

treat

string giving the name of column designating treatment assignment

prognosis

information on how to build prognostic scores. Three different input types are allowed:

vector of prognostic scores for all individuals in the data set. Should be in the same order as the rows of data.
a formula for fitting a prognostic model
an already-fit prognostic score model

outcome

string giving the name of column with outcome information. Required if prognostic_scores is specified. Otherwise it will be inferred from prog_formula

pilot_fraction

numeric between 0 and 1 giving the proportion of controls to be allotted for building the prognostic score (default = 0.1)

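pilot_size

alternative to pilot_fraction. Approximate number of observations to be used in pilot set. Note that the actual pilot set size returned may not be exactly pilot_size if group_by_covariates is specified because balancing by covariates may result in deviations from desired size. If pilot_size is specified, pilot_fraction is ignored.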

pilot_sample

a data.frame of held aside samples for building prognostic score model. If pilot_sample is specified, pilot_size and pilot_fraction are both ignored.

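group_by_covariates

character vector giving the names of covariates to be grouped by (optional). If specified, the pilot set will be sampled in a stratified manner, so that the composition of the pilot set reflects the composition of the whole data set in terms of these covariates. The specified covariates must be categorical.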

Value

a list of: analysis set, prognostic scores, pilot set, prognostic model, and outcome string

Check inputs from auto_stratify

Description

Not meant to be called externally. Throws errors if basic auto_stratify inputs are incorrect.

Usage

check_base_inputs_auto_stratify(data, treat, outcome)

Arguments

data

data.frame with observations as rows, features as columns

treat

string giving the name of column designating treatment assignment

outcome

string giving the name of column with outcome information. Required if prognostic_scores is specified. Otherwise it will be inferred from prog_formula

Value

nothing; produces errors and warnings if anything is wrong

Check inputs to manual_stratify

Description

Not meant to be called externally. Checks validity of formula, types of all inputs to manual stratify, and warns if covariates are continuous.

Usage

check_inputs_manual_stratify(data, strata_formula, force)

Arguments

data

data.frame with observations as rows, features as columns

strata_formula

the formula to be used for stratification (e.g. treat ~ X1). The variable on the left is taken to be the name of the treatment assignment column, and the variables on the right are taken to be the variables by which the data should be stratified

force

a boolean. If true, run even if a variable appears continuous. (default = FALSE)

Value

nothing; produces errors and warnings if anything is wrong

Check inputs to any matching function

Description

Check inputs to any matching function

Usage

check_inputs_matcher(object, model, k)

Arguments

object

a strata object

model

(optional) formula for matching. If left blank, all columns of the analysis set in object will be used as covariates in the propensity model or Mahalanobis match (except outcome, treatment and stratum)

k

the number of control individuals to be matched to each treated individual. If k = "full" is used, full matching is done instead of pair matching

Value

nothing

Check Outcome

Description

Checks that outcome is a string which is a column in the data

Usage

check_outcome(outcome, data, treat)

Arguments

outcome

string giving the name of column with outcome information. Required if prognostic_scores is specified. Otherwise it will be inferred from prog_formula

data

data.frame with observations as rows, features as columns

treat

string giving the name of column designating treatment assignment

Value

nothing

Check Pilot set options

Description

Check Pilot set options

Usage

check_pilot_set_options(
  pilot_fraction,
  pilot_size,
  group_by_covariates,
  data,
  n_c
)

Arguments

pilot_fraction

numeric between 0 and 1 giving the proportion of controls to be allotted for building the prognostic score (default = 0.1)

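pilot_size

alternative to pilot_fraction. Approximate number of observations to be used in pilot set. Note that the actual pilot set size returned may not be exactly pilot_size if group_by_covariates is specified because balancing by covariates may result in deviations from desired size. If pilot_size is specified, pilot_fraction is ignored.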

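group_by_covariates

character vector giving the names of covariates to be grouped by (optional). If specified, the pilot set will be sampled in a stratified manner, so that the composition of the pilot set reflects the composition of the whole data set in terms of these covariates. The specified covariates must be categorical.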

data

data.frame with observations as rows, features as columns

n_c

number of control observations in data

Value

nothing

Check Prognostic Formula

Description

Check Prognostic Formula

Usage

check_prognostic_formula(prog_formula, data, outcome, treat)

Arguments

prog_formula

a formula for prognostic score

data

data.frame with observations as rows, features as columns

outcome

string giving the name of column with outcome information. Required if prognostic_scores is specified. Otherwise it will be inferred from prog_formula

treat

string giving the name of column designating treatment assignment

Value

nothing

Check Propensity Formula

Description

Check Propensity Formula

Usage

check_prop_formula(prop_formula, data, treat)

Arguments

prop_formula

a formula

data

the analysis set data within a stratum

treat

the name of the treatment assignment column

Value

nothing

Check Scores

Description

Checks that prognostic scores are the same length as data

Usage

check_scores(prognostic_scores, data, outcome)

Arguments

prognostic_scores

a numeric vector

data

data.frame with observations as rows, features as columns

outcome

string giving the name of column with outcome information. Required if prognostic_scores is specified. Otherwise it will be inferred from prog_formula

Value

nothing

Estimate Prognostic Scores

Description

Tries to make prognostic scores. If successful, returns them; otherwise throws an error message. A common failure mode is that the prognostic score is built on some categorical variable that takes on some values in the analysis set that are never seen in the pilot set. Outputs are on the response scale (rather than the linear predictor), so the score is the expected value of the outcome under the control assignment based on the observed covariates.

Usage

estimate_scores(prognostic_model, analysis_set)

Arguments

prognostic_model

Model of prognosis

analysis_set

data set on which prognostic scores should be estimated

Value

vector of prognostic scores
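The behavior described above can be approximated with predict() wrapped in tryCatch(). The function name `safe_estimate_scores` and the toy data are hypothetical; this is a sketch of the pattern, not the package's source.

```r
# Sketch of guarded score estimation; `safe_estimate_scores` is a
# hypothetical name, not a stratamatch function.
safe_estimate_scores <- function(prognostic_model, analysis_set) {
  tryCatch(
    # type = "response" returns predictions on the outcome scale rather
    # than the linear predictor.
    predict(prognostic_model, newdata = analysis_set, type = "response"),
    error = function(e) {
      stop("Encountered an error while estimating prognostic scores: ",
           conditionMessage(e), call. = FALSE)
    }
  )
}

# Toy pilot set with a two-level categorical covariate.
pilot <- data.frame(
  y = c(0, 1, 0, 1, 1, 0),
  C1 = factor(c("a", "a", "b", "b", "a", "b"))
)
fit <- glm(y ~ C1, data = pilot, family = "binomial")

# Works: every level in the new data was seen when fitting.
ok <- safe_estimate_scores(fit, data.frame(C1 = factor(c("a", "b"))))

# Fails: level "c" never appeared in the pilot set.
bad <- tryCatch(
  safe_estimate_scores(fit, data.frame(C1 = factor("c"))),
  error = function(e) conditionMessage(e)
)
bad
```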

Extract cutoffs between strata

Description

By default, returns only the internal cut points. Cutoffs at 0 and 1 are implied.

Usage

extract_cut_points(x)

Arguments

x

an auto_strata object

Value

a vector of the score values delineating cutoffs between strata

Examples

dat <- make_sample_data()
a.strat <- auto_stratify(dat, "treat", outcome ~ X1 + X2)
cutoffs <- extract_cut_points(a.strat)
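To illustrate what "internal cut points" means, the following base-R sketch computes quantile cut points for simulated scores and then re-adds the implied bounds of 0 and 1 to assign strata. The scores and the choice of four strata are assumptions made for the example.

```r
# Illustrative scores; in practice these would come from a prognostic model.
set.seed(2)
scores <- runif(200)

# Internal cut points for four quantile strata (0 and 1 are left implied).
cutoffs <- quantile(scores, probs = c(0.25, 0.5, 0.75), names = FALSE)

# Assign each score to a stratum by adding the implied outer bounds back in.
stratum <- cut(scores, breaks = c(0, cutoffs, 1),
               include.lowest = TRUE, labels = FALSE)
table(stratum)
```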

Extract cutoffs between strata

Description

Extract cutoffs between strata

Usage

## S3 method for class 'auto_strata'
extract_cut_points(x)

Arguments

x

an auto_strata object

Value

a vector of the score values delineating cutoffs between strata

Fit Prognostic Model

Description

Given a pilot set and a prognostic formula, return the fitted model. If the outcome is binary, fit a logistic regression. Otherwise, fit a linear model.

Usage

fit_prognostic_model(dat, prognostic_formula, outcome)

Arguments

dat

data.frame on which model should be fit

prognostic_formula

formula for prognostic model

outcome

string giving name of column of data where outcomes are recorded

Value

a glm or lm object fit from prognostic_formula on dat
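The choice between logistic and linear regression can be sketched as follows. `fit_sketch` is a hypothetical stand-in written for illustration, and its test for a binary outcome (logical, or only 0s and 1s) is an assumption about how "binary" is decided.

```r
# `fit_sketch` is a hypothetical stand-in, not the package's implementation.
fit_sketch <- function(dat, prognostic_formula, outcome) {
  y <- dat[[outcome]]
  binary <- is.logical(y) || all(y %in% c(0, 1))
  if (binary) {
    # Binary outcome: logistic regression.
    glm(prognostic_formula, data = dat, family = "binomial")
  } else {
    # Any other outcome: ordinary linear model.
    lm(prognostic_formula, data = dat)
  }
}

set.seed(3)
d <- data.frame(X1 = rnorm(100))
d$y_bin <- rbinom(100, 1, plogis(d$X1))
d$y_num <- d$X1 + rnorm(100)

bin_fit <- fit_sketch(d, y_bin ~ X1, "y_bin")
num_fit <- fit_sketch(d, y_num ~ X1, "y_num")
class(bin_fit)
class(num_fit)
```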

Get Issues

Description

Helper for make_issue_table to return issues string. Given a row which summarizes the Treat, Control, Total, and Control_Proportion of a stratum, return a string of potential issues with the stratum.

Usage

get_issues(row)

Arguments

row

a row of the data.frame produced in make_issue_table

Value

Returns a string of potential issues

Parse `propensity` input to obtain propensity scores

Description

The propensity input to plot.auto_strata or plot.manual_strata can be propensity scores, a propensity model, or a formula for propensity score. This function figures out which type propensity is and returns the propensity scores. Returns the propensity score on the response scale (rather than the linear predictor), so the scores are the predicted probabilities of treatment.

Usage

get_prop_scores(propensity, data, treat)

Arguments

propensity

either a vector of propensity scores, a model for propensity, or a formula for propensity scores

data

the analysis set data within a stratum

treat

the name of the treatment assignment column

Value

vector of propensity scores
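A rough sketch of this three-way dispatch in base R is shown below. `prop_scores_sketch` is a hypothetical name, and the input checks are simplified assumptions rather than the package's actual validation.

```r
# Hypothetical helper illustrating dispatch on the type of `propensity`.
prop_scores_sketch <- function(propensity, data, treat) {
  if (inherits(propensity, "formula")) {
    # Formula: the left-hand side should name the treatment column.
    stopifnot(all.vars(propensity)[1] == treat)
    fit <- glm(propensity, data = data, family = "binomial")
    return(predict(fit, newdata = data, type = "response"))
  }
  if (inherits(propensity, "glm")) {
    # Pre-fit model: predict on the response (probability) scale.
    return(predict(propensity, newdata = data, type = "response"))
  }
  if (is.numeric(propensity) && length(propensity) == nrow(data)) {
    # Already a vector of scores: return unchanged.
    return(propensity)
  }
  stop("propensity must be a formula, a glm, or a numeric vector of scores")
}

set.seed(4)
d <- data.frame(X1 = rnorm(200))
d$treat <- rbinom(200, 1, plogis(d$X1))

from_formula <- prop_scores_sketch(treat ~ X1, d, "treat")
from_model <- prop_scores_sketch(
  glm(treat ~ X1, data = d, family = "binomial"), d, "treat"
)
from_vector <- prop_scores_sketch(rep(0.5, 200), d, "treat")
```

All three calls return a vector with one predicted probability of treatment per row of the data.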

Checks `auto_strata` class

Description

Checks if the target object is an auto_strata object.

Usage

is.auto_strata(object)

Arguments

object

any R object

Value

Returns TRUE if its argument has auto_strata among its classes and FALSE otherwise.

Examples

dat <- make_sample_data()
a.strat <- auto_stratify(dat, "treat", outcome ~ X1 + X2)
is.auto_strata(a.strat) # returns TRUE

Checks `manual_strata` class

Description

Checks if the target object is a manual_strata object.

Usage

is.manual_strata(object)

Arguments

object

any R object

Value

Returns TRUE if its argument has manual_strata among its classes and FALSE otherwise.

Examples

dat <- make_sample_data()
m.strat <- manual_stratify(dat, treat ~ C1)
is.manual_strata(m.strat) # returns TRUE

Checks `strata` class

Description

Checks if the target object is a strata object.

Usage

is.strata(object)

Arguments

object

any R object

Value

Returns TRUE if its argument has strata among its classes and FALSE otherwise.

Examples

dat <- make_sample_data()
m.strat <- manual_stratify(dat, treat ~ C1)
is.strata(m.strat) # returns TRUE
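Checks of this kind are conventionally written with inherits(). The sketch below tags a bare list with a class vector to show why a manual_strata object passes both is.manual_strata and is.strata; the exact class vector used by the package is an assumption here.

```r
# A bare list tagged with classes; real objects are built by stratamatch.
# The class vector below is assumed for illustration.
obj <- structure(list(), class = c("manual_strata", "strata"))

inherits(obj, "manual_strata")
inherits(obj, "strata")
inherits(obj, "auto_strata")
```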

Check if a vector is binary

Description

return TRUE if the input is logical or if it contains only 0's and 1's

Usage

is_binary(col)

Arguments

col

a column from a data frame

Value

logical
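A check matching this description might look like the following; `is_binary_sketch` is a hypothetical name, and how missing values are treated is an assumption (here, an NA makes a numeric column non-binary).

```r
# Hypothetical re-implementation of the described check.
is_binary_sketch <- function(col) {
  is.logical(col) || all(col %in% c(0, 1))
}

is_binary_sketch(c(TRUE, FALSE))  # TRUE: logical input
is_binary_sketch(c(0, 1, 1, 0))   # TRUE: only 0's and 1's
is_binary_sketch(c(0, 1, 2))      # FALSE: contains another value
```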

Make Size-Ratio plot

Description

Not meant to be called externally. Helper plot function for strata. Produces a scatter plot of strata by size and control proportion.

Usage

make_SR_plot(x, label)

Arguments

x

a strata object returned by auto_stratify or manual_stratify

label

ignored unless type = "SR". If TRUE, a clickable plot is produced. The user may click on any number of strata and press finish to have those strata labeled. Note: uses identify, which may not be supported on some devices

Make Assignment-Control plot

Description

Not meant to be called externally. Helper plot function for strata object with type = "AC". Produces an Assignment-Control plot of stratum strat

Usage

make_ac_plot(
  x,
  propensity,
  strat,
  strata_lines,
  jitter_prognosis,
  jitter_propensity
)

Arguments

x

an auto_strata object returned by auto_stratify

propensity

ignored unless type = "hist" or type = "AC". Specifies propensity score information for plots where this is required. Accepts either a vector of propensity scores, a glm model for propensity scores, or a formula for fitting a propensity score model.

strat

the number code of the stratum to be plotted. If "all", plots all strata.

strata_lines

default = TRUE. Ignored unless type = "AC". If TRUE, lines on the plot indicate strata cut points.

jitter_prognosis

ignored unless type = "AC". Amount of uniform random noise to add to prognostic scores in plot.

jitter_propensity

ignored unless type = "AC". Amount of uniform random noise to add to propensity scores in plot.

Make strata table

Description

Make strata table

Usage

make_autostrata_table(qcut)

Arguments

qcut

the prognostic score quantile cuts

Value

data.frame of strata definitions

Make histogram plot

Description

Not meant to be called externally. Helper plot function for strata object with type = "hist". Produces a histogram of propensity scores within a stratum

Usage

make_hist_plot(x, propensity, strat)

Arguments

x

a strata object returned by auto_stratify or manual_stratify

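propensity

ignored unless type = "hist" or type = "AC". Specifies propensity score information for plots where this is required. Accepts either a vector of propensity scores, a glm model for propensity scores, or a formula for fitting a propensity score model.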

strat

the number code of the stratum to be plotted. If "all", plots all strata

Make Issue Table

Description

Not meant to be called externally. Produce table of the number of treated and control individuals in each stratum. Also checks for potential problems with treat/control ratio or stratum size which might result in slow or poor quality matching.

Usage

make_issue_table(a_set, treat)

Arguments

a_set

data.frame with observations as rows, features as columns. This should be the analysis set from the recently stratified data.

treat

string name of treatment column

Value

Returns a data frame with one row per stratum, reporting Treat, Control, Total, Control Proportion, and Potential Issues
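A table of this shape can be assembled in base R as sketched below. The simulated analysis set and the 0.2 and 0.8 thresholds used to flag issues are hypothetical choices for illustration, as are the issue labels; the package's own criteria may differ.

```r
# Simulated analysis set with a stratum column and a binary treatment column.
set.seed(5)
a_set <- data.frame(
  stratum = sample(1:3, 120, replace = TRUE),
  treat = rbinom(120, 1, 0.3)
)

# Count treated and control individuals in each stratum.
counts <- table(factor(a_set$stratum, levels = 1:3),
                factor(a_set$treat, levels = 0:1))
issue_sketch <- data.frame(
  Stratum = as.integer(rownames(counts)),
  Treat = as.vector(counts[, "1"]),
  Control = as.vector(counts[, "0"])
)
issue_sketch$Total <- issue_sketch$Treat + issue_sketch$Control
issue_sketch$Control_Proportion <- issue_sketch$Control / issue_sketch$Total

# Hypothetical thresholds and labels, chosen only for this example.
issue_sketch$Potential_Issues <- ifelse(
  issue_sketch$Control_Proportion < 0.2, "Not enough controls",
  ifelse(issue_sketch$Control_Proportion > 0.8,
         "Not enough treated samples", "none")
)
issue_sketch
```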

Make match distances within strata

Description

Makes the match distance with strata specifications for strata_match. This function is largely unnecessary to call outside of stratamatch, but it is exported for the benefit of the user to aid in debugging. Note that this function requires that the R package optmatch is installed.

Usage

make_match_distances(object, model, method)

Arguments

object

a strata object

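model

(optional) formula for matching. If left blank, all columns of the analysis set in object will be used as covariates in the propensity model or Mahalanobis match (except outcome, treatment and stratum)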

method

either "prop" for propensity score matching based on a glm fit with the formula model, or "mahal" for Mahalanobis distance matching by the covariates in model.

Value

a match distance matrix for optmatch

Examples


dat <- make_sample_data(n = 75)

# stratify with auto_stratify
a.strat <- auto_stratify(dat, "treat", outcome ~ X2, size = 25)

# make match distances.  Requires optmatch package to be installed.

md <- make_match_distances(a.strat, treat ~ X1 + X2, method = "mahal")

Make Residual Plot

Description

Not yet implemented. Not meant to be called externally. Helper plot function for strata object with type = "residual". Produces the diagnostic plots for the prognostic score model

Usage

make_resid_plot(x)

Arguments

x

an auto_strata object returned by auto_stratify

Make sample data

Description

Makes a simple data frame with treat (binary), outcome (binary), and five covariates: X1 (continuous), X2 (continuous), B1 (binary), B2 (binary), and C1 (categorical). Probability outcome = 1 is sigmoid(treat + X1). Probability treatment = 1 is sigmoid(- 0.2 * X1 + X2 - B1 + 2 * B2)

Usage

make_sample_data(n = 100)

Arguments

n

the size of the desired data set

Examples

# make sample data set of 30 observations
dat <- make_sample_data(n = 30)
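The data-generating process described above can be written out as a short sketch. Only the two probability formulas come from the description; the covariate distributions (standard normal, fair Bernoulli, three equally likely categories) and the name `sample_data_sketch` are assumptions.

```r
# Hypothetical re-creation of the described data-generating process.
sample_data_sketch <- function(n = 100) {
  X1 <- rnorm(n)                 # assumed standard normal
  X2 <- rnorm(n)                 # assumed standard normal
  B1 <- rbinom(n, 1, 0.5)        # assumed fair coin
  B2 <- rbinom(n, 1, 0.5)        # assumed fair coin
  C1 <- sample(c("a", "b", "c"), n, replace = TRUE)  # assumed categories
  # Probability treatment = 1 is sigmoid(-0.2 * X1 + X2 - B1 + 2 * B2).
  treat <- rbinom(n, 1, plogis(-0.2 * X1 + X2 - B1 + 2 * B2))
  # Probability outcome = 1 is sigmoid(treat + X1).
  outcome <- rbinom(n, 1, plogis(treat + X1))
  data.frame(X1, X2, B1, B2, C1, treat, outcome)
}

set.seed(6)
d <- sample_data_sketch(50)
str(d)
```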

Manual Stratify

Description

Stratifies a data set based on a set of blocking covariates specified by the user. Creates a manual_strata object, which can be passed to strata_match for stratified matching or unpacked by the user to be matched by some other means.

Usage

manual_stratify(data, strata_formula, force = FALSE)

Arguments

data

data.frame with observations as rows, features as columns

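strata_formula

the formula to be used for stratification (e.g. treat ~ X1). The variable on the left is taken to be the name of the treatment assignment column, and the variables on the right are taken to be the variables by which the data should be stratified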

force

a boolean. If true, run even if a variable appears continuous. (default = FALSE)

Value

Returns a manual_strata object. This contains:

treat - a string giving the name of the column encoding treatment assignment
covariates - a character vector with the names of the categorical columns on which the data were stratified
analysis_set - the data set with strata assignments
call - the call to manual_stratify used to generate this object
issue_table - a table of each stratum and potential issues of size and treat:control balance. In small or imbalanced strata, it may be difficult or infeasible to find high-quality matches, while very large strata may be computationally intensive to match.
strata_table - a table of each stratum and the covariate bin to which it corresponds

Examples

# make sample data set
dat <- make_sample_data(n = 75)

# stratify based on B1 and B2
m.strat <- manual_stratify(dat, treat ~ B1 + B2)

# diagnostic plot
plot(m.strat)
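Conceptually, stratifying on B1 and B2 places each distinct combination of their values in its own stratum. The base-R sketch below illustrates that idea with interaction() on simulated data; it is not the package's implementation, and the stratum numbering it produces is arbitrary.

```r
set.seed(7)
dat <- data.frame(
  treat = rbinom(60, 1, 0.4),
  B1 = rbinom(60, 1, 0.5),
  B2 = rbinom(60, 1, 0.5)
)

# Each distinct combination of B1 and B2 becomes one stratum.
groups <- interaction(dat$B1, dat$B2, drop = TRUE)
dat$stratum <- as.integer(groups)

# Table mapping each stratum to the covariate bin it corresponds to.
strata_table <- unique(dat[order(dat$stratum), c("stratum", "B1", "B2")])
strata_table
```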

New Autostrata

Description

Basic constructor for an auto_strata object. These objects hold all the information associated with a dataset that has been stratified via auto_stratify. This object may be passed to strata_match to be matched or it may be unpacked by the user to be matched by other means.

Usage

new_auto_strata(
  outcome,
  treat,
  analysis_set = NULL,
  call = NULL,
  issue_table = NULL,
  strata_table = NULL,
  prognostic_scores = NULL,
  prognostic_model = NULL,
  pilot_set = NULL
)

Arguments

outcome

a string giving the name of the column where outcome information is stored

treat

a string giving the name of the column where treatment information is stored

analysis_set

the data set which will be stratified

call

the call to auto_stratify used to generate this object

issue_table

a table of each stratum and potential issues of size and treat:control balance

strata_table

a table of each stratum and the prognostic score quantile bin this corresponds to

prognostic_scores

a vector of prognostic scores.

prognostic_model

a model for prognosis fit on a separate data set.

pilot_set

the set of controls used to fit the prognostic model. These are excluded from subsequent analysis so that the prognostic score is not overfit to the data used to estimate the treatment effect.

Value

a basic auto_strata object

New Manual Strata

Description

Basic constructor for a manual_strata object. These objects hold all the information associated with a dataset that has been stratified via manual_stratify. This object may be passed to strata_match to be matched or it may be unpacked by the user to be matched by other means.

Usage

new_manual_strata(
  treat = character(),
  covariates = character(),
  analysis_set = data.frame(),
  call = call(),
  issue_table = data.frame(),
  strata_table = data.frame()
)

Arguments

treat

a string giving the name of the column where treatment information is stored

covariates

a character vector with the names of the categorical columns on which to stratify

analysis_set

the data set which will be stratified

call

the call to manual_stratify used to generate this object

issue_table

a table of each stratum and potential issues of size and treat:control balance

strata_table

a table of each stratum and the covariate bin this corresponds to

Value

a basic manual_strata object

Plot method for `auto_strata` object

Description

Generates diagnostic plots for the product of a stratification by auto_stratify. There are four plot types:

"SR" (default) - produces a scatter plot of strata by size and treat:control ratio
"hist" - produces a histogram of propensity scores within a stratum
"AC" - produces an Assignment-Control plot of individuals within a stratum
"residual" - produces a residual plot for the prognostic model

Usage

## S3 method for class 'auto_strata'
plot(
  x,
  type = "SR",
  label = FALSE,
  stratum = "all",
  strata_lines = TRUE,
  jitter_prognosis,
  jitter_propensity,
  propensity,
  ...
)

Arguments

x

an auto_strata object returned by auto_stratify

type

string giving the plot type (default = "SR"). Other options are "hist", "AC" and "residual"

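label

ignored unless type = "SR". If TRUE, a clickable plot is produced. The user may click on any number of strata and press finish to have those strata labeled. Note: uses identify, which may not be supported on some devices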

stratum

ignored unless type = "hist" or type = "AC". A number specifying which stratum to plot.

strata_lines

default = TRUE. Ignored unless type = "AC". If TRUE, lines on the plot indicate strata cut points.

jitter_prognosis

ignored unless type = "AC". Amount of uniform random noise to add to prognostic scores in plot.

jitter_propensity

ignored unless type = "AC". Amount of uniform random noise to add to propensity scores in plot.

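propensity

ignored unless type = "hist" or type = "AC". Specifies propensity score information for plots where this is required. Accepts either a vector of propensity scores, a glm model for propensity scores, or a formula for fitting a propensity score model.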

...

other arguments

Examples

dat <- make_sample_data()
a.strat <- auto_stratify(dat, "treat", outcome ~ X1 + X2)
plot(a.strat) # makes size-ratio scatter plot
plot(a.strat, type = "hist", propensity = treat ~ X1, stratum = 1)
plot(a.strat, type = "AC", propensity = treat ~ X1, stratum = 1)
plot(a.strat, type = "residual")

Plot method for `manual_strata` object

Description

Generates diagnostic plots for the product of a stratification by manual_stratify. There are two plot types:

"SR" (default) - produces a scatter plot of strata by size and treat:control ratio
"hist" - produces a histogram of propensity scores within a stratum.

Note that residual plots and AC plots are not supported for manual_strata objects because no prognostic model is fit.

Usage

## S3 method for class 'manual_strata'
plot(x, type = "SR", label = FALSE, stratum = "all", propensity, ...)

Arguments

x

a manual_strata object returned by manual_stratify

type

string giving the plot type (default = "SR"). Other option is "hist"

label

stratum

ignored unless type = "hist". A number specifying which stratum to plot.

propensity

ignored unless type = "hist". Specifies propensity score information for plots where this is required. Accepts either a vector of propensity scores, a glm model for propensity scores, or a formula for fitting a propensity score model.

...

other arguments

Examples

dat <- make_sample_data()
m.strat <- manual_stratify(dat, treat ~ C1)
plot(m.strat) # makes size-ratio scatter plot
plot(m.strat, type = "hist", propensity = treat ~ X1, stratum = 1)

Print Auto Strata

Description

Print method for auto_strata object

Usage

## S3 method for class 'auto_strata'
print(x, ...)

Arguments

x

an auto_strata object

...

other arguments

Examples

dat <- make_sample_data()
a.strat <- auto_stratify(dat, "treat", outcome ~ X1 + X2)
print(a.strat) # prints information about a.strat

Print Manual Strata

Description

Print method for manual_strata object

Usage

## S3 method for class 'manual_strata'
print(x, ...)

Arguments

x

a manual_strata object

...

other arguments

Examples

dat <- make_sample_data()
m.strat <- manual_stratify(dat, treat ~ C1)
print(m.strat) # prints information about m.strat

Split data into pilot and analysis sets

Description

Given a data set and some parameters about how to split the data, this function partitions the data accordingly and returns the partitioned data as a list containing the analysis_set and pilot_set.

Usage

split_pilot_set(
  data,
  treat,
  pilot_fraction = 0.1,
  pilot_size = NULL,
  group_by_covariates = NULL
)

Arguments

data

data.frame with observations as rows, features as columns

treat

string giving the name of column designating treatment assignment

pilot_fraction

numeric between 0 and 1 giving the proportion of controls to be allotted for building the prognostic score (default = 0.1)

pilot_size

group_by_covariates

Value

a list with analysis_set and pilot_set

Examples

dat <- make_sample_data()
splt <- split_pilot_set(dat, "treat", 0.2)
# can be passed into auto_stratify if desired
a.strat <- auto_stratify(splt$analysis_set, "treat", outcome ~ X1,
  pilot_sample = splt$pilot_set
)
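
To make the role of pilot_fraction concrete, the following base R sketch holds aside a fraction of the control rows as a pilot set and keeps everything else as the analysis set. It is a minimal sketch on made-up data, assuming simple random sampling of controls; it is not the package's implementation and ignores the pilot_size and group_by_covariates options.

```r
set.seed(42)

# Toy data: 80 controls and 20 treated individuals (hypothetical)
toy <- data.frame(
  treat = rep(c(0, 1), times = c(80, 20)),
  X1    = rnorm(100)
)
pilot_fraction <- 0.1

# Sample the pilot set from control rows only
control_rows <- which(toy$treat == 0)
n_pilot      <- round(pilot_fraction * length(control_rows))
pilot_rows   <- sample(control_rows, n_pilot)

# Everything not sampled for the pilot set stays in the analysis set
split <- list(
  analysis_set = toy[setdiff(seq_len(nrow(toy)), pilot_rows), ],
  pilot_set    = toy[pilot_rows, ]
)

print(nrow(split$pilot_set))
print(table(split$analysis_set$treat))
```

Because only controls are sampled, every treated individual remains available for matching in the analysis set.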

Strata function from package Survival

Description

Strata function from package Survival

Strata Match

Description

Match within strata in series using optmatch. Note that this function requires that the R package optmatch is installed.

Usage

strata_match(object, model = NULL, method = "prop", k = 1)

Arguments

object

a strata object

model

method

either "prop" for propensity score matching based on a glm fit with the formula given in model, or "mahal" for Mahalanobis distance matching on the covariates in model.

k

the number of control individuals to be matched to each treated individual. If k = "full" is used, full matching is done instead of pair matching.

Value

a named factor with matching assignments

Examples

# make a sample data set
set.seed(1)
dat <- make_sample_data(n = 75)

# stratify with auto_stratify
a.strat <- auto_stratify(dat, "treat", outcome ~ X2, size = 25)

# 1:1 match based on propensity formula: treat ~ X1 + X2
# Requires optmatch package to be installed.

strata_match(a.strat, model = treat ~ X1 + X2, k = 1)


# 1:1 match within strata based on Mahalanobis distance.
# Requires optmatch package to be installed.

strata_match(a.strat, model = treat ~ X1 + X2, method = "mahal", k = 1)
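
The idea of matching separately inside each stratum can be sketched in base R without optmatch. Note the substitution: the sketch below uses greedy nearest-neighbor pairing on a propensity score, not the optimal matching that strata_match performs. The toy data, the ps column, and the "stratum.pair" labels are all assumptions made for illustration.

```r
set.seed(1)

# Toy stratified data: 2 strata, each with 5 treated and 15 controls
n <- 40
toy <- data.frame(
  id      = seq_len(n),
  treat   = rep(c(1, 0, 0, 0), times = 10),
  stratum = rep(1:2, each = 20),
  ps      = runif(n) # hypothetical propensity scores
)

match_id <- rep(NA_character_, n)
for (s in unique(toy$stratum)) {
  treated  <- which(toy$stratum == s & toy$treat == 1)
  controls <- which(toy$stratum == s & toy$treat == 0)
  for (i in seq_along(treated)) {
    if (length(controls) == 0) break
    t_idx <- treated[i]
    # Greedy step: closest remaining control in the same stratum
    best <- controls[which.min(abs(toy$ps[controls] - toy$ps[t_idx]))]
    match_id[c(t_idx, best)] <- paste(s, i, sep = ".")
    controls <- setdiff(controls, best)
  }
}

# Named factor of match assignments; unmatched individuals are NA
matches <- factor(match_id)
names(matches) <- toy$id
print(table(matches))
```

Restricting the search to one stratum at a time is what keeps each matching problem small, which is the motivation for stratifying before matching.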

Match without Stratification

Description

Not meant to be called externally. Match a data set without stratifying. Used to compare performance with and without stratification. Note that this function requires that the R package optmatch is installed.

Usage

strata_match_nstrat(object, model = NULL, k = 1)

Arguments

object

a strata object

model

k

the number of control individuals to be matched to each treated individual. If k = "full" is used, full matching is done instead of pair matching.

Value

a named factor with matching assignments

stratamatch: stratify and match large data sets

Description

This package employs a pilot matching design to automatically stratify and match large data sets. The manual_stratify function allows users to manually stratify a data set based on categorical variables of interest, while the auto_stratify function does so automatically by allocating a held-aside (pilot) data set, fitting a prognostic score (see Hansen (2008) <doi:10.1093/biomet/asn004>) on the pilot set, and stratifying the data set based on prognostic score quantiles. The strata_match function then does optimal matching of the data set within strata.
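
The pilot design described above can be sketched in base R as four steps: hold aside pilot controls, fit a prognostic model on them, score the remaining data, and cut the scores at quantiles. This is a minimal sketch on simulated data, assuming a logistic prognostic model and four strata; the variable names and simulation settings are hypothetical, and the package's own functions handle many details omitted here.

```r
set.seed(7)

# Simulated data with a binary treatment and a binary outcome
n <- 1000
toy <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
toy$treat   <- rbinom(n, 1, plogis(-1 + 0.5 * toy$X1))
toy$outcome <- rbinom(n, 1, plogis(0.8 * toy$X1 - 0.4 * toy$X2))

# 1. Hold aside a pilot set drawn from the controls
control_rows <- which(toy$treat == 0)
pilot_rows   <- sample(control_rows, round(0.2 * length(control_rows)))
pilot_set    <- toy[pilot_rows, ]
analysis_set <- toy[setdiff(seq_len(n), pilot_rows), ]

# 2. Fit a prognostic model for the outcome on the pilot set
prog_model <- glm(outcome ~ X1 + X2, data = pilot_set, family = binomial)

# 3. Estimate prognostic scores on the analysis set
analysis_set$prog_score <- predict(prog_model, newdata = analysis_set,
                                   type = "response")

# 4. Stratify the analysis set at prognostic score quantiles
n_strata <- 4
cuts <- quantile(analysis_set$prog_score,
                 probs = seq(0, 1, length.out = n_strata + 1))
analysis_set$stratum <- cut(analysis_set$prog_score, breaks = cuts,
                            include.lowest = TRUE, labels = FALSE)

print(table(stratum = analysis_set$stratum, treat = analysis_set$treat))
```

Fitting the prognostic model only on held-aside controls means the individuals who are later matched play no part in defining the strata.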

Summary for strata object

Description

Summarize number and sizes of strata in a strata object. Also prints number of strata with potential issues.

Usage

## S3 method for class 'strata'
summary(object, ...)

Arguments

object

a strata object

...

other arguments

Details

For more information, access the issue table for your strata object with mystrata$issue_table.

Examples

dat <- make_sample_data()
m.strat <- manual_stratify(dat, treat ~ C1)
summary(m.strat) # Summarizes strata in m.strat

Warn if continuous

Description

Throws an error if a column appears to be continuous (or only a warning if force is TRUE)

Usage

warn_if_continuous(column, name, force, n)

Arguments

column

vector or factor column from a data.frame

name

name of the input column

force

a boolean. If TRUE, warn but do not stop

n

the number of rows in the data set

Details

Not meant to be called externally. Only categorical or binary covariates should be used to manually stratify a data set. However, it's hard to tell for sure if something is continuous or just discrete with real-numbered values. Returns without throwing an error if the column is a factor, but throws an error or warning if the column has many distinct values.

Value

Does not return anything
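
The behavior described in Details can be sketched as follows. The helper name check_not_continuous, its max_levels argument, and the cutoff of 15 distinct values are all assumptions made for illustration; the actual rule used by warn_if_continuous is not stated here and may differ (for example, it may depend on n).

```r
# Hypothetical helper; the cutoff of 15 distinct values is an assumption
check_not_continuous <- function(column, name, force = FALSE,
                                 max_levels = 15) {
  if (is.factor(column)) {
    return(invisible(NULL)) # factors are accepted as categorical
  }
  n_distinct <- length(unique(column))
  if (n_distinct > max_levels) {
    msg <- sprintf("Column '%s' has %d distinct values and may be continuous.",
                   name, n_distinct)
    # With force = TRUE, downgrade the error to a warning
    if (force) warning(msg, call. = FALSE) else stop(msg, call. = FALSE)
  }
  invisible(NULL)
}

check_not_continuous(factor(c("a", "b", "a")), "C1") # passes silently
check_not_continuous(c(0, 1, 1, 0), "B1")            # passes silently

# A column with 50 distinct values triggers an error by default
res <- tryCatch(
  check_not_continuous(seq(0, 1, length.out = 50), "X1"),
  error = function(e) conditionMessage(e)
)
print(res)
```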
