Type: Package
Date: 2022-03-30
Title: Stratification and Matching for Large Observational Data Sets
Version: 0.1.9
Maintainer: Rachael C. Aikens <rockyaikens@gmail.com>
BugReports: https://github.com/raikens1/stratamatch/issues
Description: A pilot matching design to automatically stratify and match large datasets. The manual_stratify() function allows users to manually stratify a dataset based on categorical variables of interest, while the auto_stratify() function does so automatically by allocating a held-aside (pilot) data set, fitting a prognostic score (see Hansen (2008) <doi:10.1093/biomet/asn004>) on the pilot set, and stratifying the data set based on prognostic score quantiles. The strata_match() function then does optimal matching of the data set in parallel within strata.
License: GPL-3
Encoding: UTF-8
LazyData: true
Imports: dplyr (≥ 0.8.3), Hmisc (≥ 4.2-0), magrittr (≥ 1.5), rlang (≥ 0.4.0), survival (≥ 2.44.1.1)
Depends: R (≥ 3.4.0)
Suggests: knitr, optmatch (≥ 0.9-11), rmarkdown, testthat (≥ 2.1.0), glmnet (≥ 4.0), randomForest (≥ 4.6-14)
URL: https://github.com/raikens1/stratamatch
RoxygenNote: 7.1.2
VignetteBuilder: knitr
NeedsCompilation: no
Packaged: 2022-03-31 00:19:08 UTC; rocky
Author: Rachael C. Aikens [aut, cre], Joseph Rigdon [aut], Justin Lee [aut], Michael Baiocchi [aut], Jonathan Chen [aut]
Repository: CRAN
Date/Publication: 2022-03-31 06:00:02 UTC

Pipe operator

Description

Pipe operator


Demographics and comorbidities of 10,157 ICU patients

Description

A deidentified data set containing the demographics, comorbidities, DNR code status, and surgical team assignment of 10,157 patients in the Stanford University Hospital Intensive Care Unit (ICU). This data was extracted from the electronic record system, deidentified, and made publicly available by Chavez et al. (2018) <doi:10.1371/journal.pone.0190569>. It was reprocessed for use in the stratamatch package as a sample data set. For more details on the data extraction and inclusion criteria, see Chavez et al.

Usage

ICU_data

Format

A data frame with 10157 rows and 29 variables:

patid

patient id, numeric

Birth.preTimeDays

age of patient at time of admission to the ICU in days, numeric

Female.pre

whether the patient was documented to be female prior to ICU visit, binary

RaceAsian.pre

whether the patient's race/ethnicity was documented as Asian prior to ICU visit, binary

RaceUnknown.pre

whether the patient's race/ethnicity was unknown prior to ICU visit, binary

RaceOther.pre

whether the patient's race/ethnicity was documented as "Other" prior to ICU visit, binary

RaceBlack.pre

whether the patient's race/ethnicity was documented as Black/African American prior to ICU visit, binary

RacePacificIslander.pre

whether the patient's race/ethnicity was documented as Pacific Islander prior to ICU visit, binary

RaceNativeAmerican.pre

whether the patient's race/ethnicity was documented as Native American prior to ICU visit, binary

self_pay

whether the patient was "self pay" (i.e. uninsured), binary

all_latinos

whether the patient was documented to be Latino prior to ICU visit, binary

DNR

whether the patient had code status set to a DNR (Do Not Resuscitate) order at any point during their ICU stay, binary

surgicalTeam

whether the patient was assigned to a surgical team at any point during their ICU stay, binary

Details

License information for this data is as follows:

Copyright (c) 2016, Stanford University

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Source

https://simtk.org/frs/download_confirm.php/latestzip/1969/ICUDNR-latest.zip?group_id=892


Auto Stratify

Description

Automatically creates strata for matching based on a prognostic score formula or a vector of prognostic scores already estimated by the user. Creates an auto_strata object, which can be passed to strata_match for stratified matching or unpacked by the user to be matched by some other means.

Usage

auto_stratify(
  data,
  treat,
  prognosis,
  outcome = NULL,
  size = 2500,
  pilot_fraction = 0.1,
  pilot_size = NULL,
  pilot_sample = NULL,
  group_by_covariates = NULL
)

Arguments

data

data.frame with observations as rows, features as columns

treat

string giving the name of column designating treatment assignment

prognosis

information on how to build prognostic scores. Three different input types are allowed:

  1. vector of prognostic scores for all individuals in the data set. Should be in the same order as the rows of data.

  2. a formula for fitting a prognostic model

  3. an already-fit prognostic score model

outcome

string giving the name of column with outcome information. Required if prognosis is a vector of prognostic scores or an already-fit prognostic model. Otherwise it will be inferred from the prognostic formula.

size

numeric, desired size of strata (default = 2500)

pilot_fraction

numeric between 0 and 1 giving the proportion of controls to be allotted for building the prognostic score (default = 0.1)

pilot_size

alternative to pilot_fraction. Approximate number of observations to be used in pilot set. Note that the actual pilot set size returned may not be exactly pilot_size if group_by_covariates is specified because balancing by covariates may result in deviations from desired size. If pilot_size is specified, pilot_fraction is ignored.

pilot_sample

a data.frame of held aside samples for building prognostic score model. If pilot_sample is specified, pilot_size and pilot_fraction are both ignored.

group_by_covariates

character vector giving the names of covariates to be grouped by (optional). If specified, the pilot set will be sampled in a stratified manner, so that the composition of the pilot set reflects the composition of the whole data set in terms of these covariates. The specified covariates must be categorical.

Details

Stratifying by prognostic score quantiles can be more effective than manually stratifying a data set because the prognostic score is continuous; thus the strata produced tend to be of roughly equal size and to contain individuals with similar prognosis.

Automatic stratification requires information on how the prognostic scores should be derived. This is primarily determined by the specification of the prognosis argument. Three main forms of input for prognosis are allowed:

  1. A vector of prognostic scores. This vector should be the same length and order as the rows in the data set. If this method is used, the outcome argument must also be specified; this is simply a string giving the name of the column which contains outcome information.

  2. A formula for prognosis (e.g. outcome ~ X1 + X2). If this method is used, auto_stratify will automatically split the data set into a pilot_set and an analysis_set. The pilot set will be used to fit a model for outcome in the absence of treatment (a logistic regression if the outcome is binary, otherwise a linear model), and this model will be used to estimate prognostic scores on the analysis set. The analysis set will then be stratified based on the estimated prognostic scores. In this case the outcome argument need not be specified since it can be inferred from the input formula.

  3. A model for prognosis (e.g. a glm object). If this method is used, the outcome argument must also be specified.
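The formula path in item 2 can be approximated in base R as below. This is a minimal sketch, not the stratamatch implementation: it simulates its own data, assumes binary 0/1 treat and outcome columns, and assumes that size is turned into a number of strata by dividing the analysis set size by size.

```r
# Base-R sketch of pilot-based prognostic score stratification (illustrative only)
set.seed(1)
n <- 1000
dat <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
dat$treat <- rbinom(n, 1, plogis(dat$X2))
dat$outcome <- rbinom(n, 1, plogis(dat$treat + dat$X1))

pilot_fraction <- 0.1
size <- 250

# 1. Hold aside a random fraction of the controls as the pilot set
controls <- which(dat$treat == 0)
pilot_idx <- sample(controls, round(pilot_fraction * length(controls)))
pilot_set <- dat[pilot_idx, ]
analysis_set <- dat[-pilot_idx, ]

# 2. Fit the prognostic model on the pilot controls only
prog_model <- glm(outcome ~ X1 + X2, data = pilot_set, family = binomial)

# 3. Estimate prognostic scores on the analysis set (response scale)
scores <- predict(prog_model, newdata = analysis_set, type = "response")

# 4. Cut scores at quantiles into strata of roughly `size` observations (assumed rule)
n_strata <- max(1, round(nrow(analysis_set) / size))
breaks <- quantile(scores, probs = seq(0, 1, length.out = n_strata + 1))
analysis_set$stratum <- cut(scores, breaks = unique(breaks),
                            include.lowest = TRUE, labels = FALSE)
table(analysis_set$stratum)
```

Because the pilot controls are removed from the analysis set, the scores used for stratification are never evaluated on the data used to fit the model.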

Value

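Returns an auto_strata object. This contains the components outcome, treat, analysis_set, call, issue_table, strata_table, prognostic_scores, prognostic_model, and pilot_set (see new_auto_strata for a description of each).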

Troubleshooting

This section suggests fixes for common errors that appear while fitting the prognostic score or using it to estimate prognostic scores on the analysis set.

Other errors or warnings can occur if the pilot set is too small or the prognostic formula too complex. Always make sure that the number of observations in the pilot set is large enough to confidently fit a prognostic model with the number of covariates you want.

See Also

manual_stratify, new_auto_strata

Examples

# make sample data set
set.seed(111)
dat <- make_sample_data(n = 75)

# construct a pilot set, build a prognostic score for `outcome` based on X2
# and stratify the data set based on the scores into sets of about 25
# observations
a.strat_formula <- auto_stratify(dat, "treat", outcome ~ X2, size = 25)

# stratify the data set based on a model for prognosis
pilot_data <- make_sample_data(n = 30)
prognostic_model <- glm(outcome ~ X2, pilot_data, family = "binomial")
a.strat_model <- auto_stratify(dat, "treat", prognostic_model,
  outcome = "outcome", size = 25
)

# stratify the data set based on a vector of prognostic scores
prognostic_scores <- predict(prognostic_model,
  newdata = dat,
  type = "response"
)
a.strat_scores <- auto_stratify(dat, "treat", prognostic_scores,
  outcome = "outcome", size = 25
)

# diagnostic plots
plot(a.strat_formula)
plot(a.strat_formula, type = "AC", propensity = treat ~ X1, stratum = 1)
plot(a.strat_formula, type = "hist", propensity = treat ~ X1, stratum = 1)
plot(a.strat_formula, type = "residual")

Build Autostrata object

Description

Not meant to be called externally. Given the arguments to auto_stratify, build the prognostic scores and return the analysis set, the prognostic scores, the pilot set, the prognostic model, and the outcome string. The primary function of this code is to determine the type of prognosis and handle it appropriately.

Usage

build_autostrata(
  data,
  treat,
  prognosis,
  outcome,
  pilot_fraction,
  pilot_size,
  pilot_sample,
  group_by_covariates
)

Arguments

data

data.frame with observations as rows, features as columns

treat

string giving the name of column designating treatment assignment

prognosis

information on how to build prognostic scores. Three different input types are allowed:

  1. vector of prognostic scores for all individuals in the data set. Should be in the same order as the rows of data.

  2. a formula for fitting a prognostic model

  3. an already-fit prognostic score model

outcome

string giving the name of column with outcome information. Required if prognosis is a vector of prognostic scores or an already-fit prognostic model. Otherwise it will be inferred from the prognostic formula.

pilot_fraction

numeric between 0 and 1 giving the proportion of controls to be allotted for building the prognostic score (default = 0.1)

pilot_size

alternative to pilot_fraction. Approximate number of observations to be used in pilot set. Note that the actual pilot set size returned may not be exactly pilot_size if group_by_covariates is specified because balancing by covariates may result in deviations from desired size. If pilot_size is specified, pilot_fraction is ignored.

pilot_sample

a data.frame of held aside samples for building prognostic score model. If pilot_sample is specified, pilot_size and pilot_fraction are both ignored.

group_by_covariates

character vector giving the names of covariates to be grouped by (optional). If specified, the pilot set will be sampled in a stratified manner, so that the composition of the pilot set reflects the composition of the whole data set in terms of these covariates. The specified covariates must be categorical.

Value

a list of: analysis set, prognostic scores, pilot set, prognostic model, and outcome string

See Also

auto_stratify


Check inputs from auto_stratify

Description

Not meant to be called externally. Throws errors if basic auto_stratify inputs are incorrect.

Usage

check_base_inputs_auto_stratify(data, treat, outcome)

Arguments

data

data.frame with observations as rows, features as columns

treat

string giving the name of column designating treatment assignment

outcome

string giving the name of column with outcome information. Required if prognosis is a vector of prognostic scores or an already-fit prognostic model. Otherwise it will be inferred from the prognostic formula.

Value

nothing; produces errors and warnings if anything is wrong


Check inputs to manual_stratify

Description

Not meant to be called externally. Checks the validity of the formula and the types of all inputs to manual_stratify, and warns if covariates appear continuous.

Usage

check_inputs_manual_stratify(data, strata_formula, force)

Arguments

data

data.frame with observations as rows, features as columns

strata_formula

the formula to be used for stratification (e.g. treat ~ X1). The variable on the left is taken to be the name of the treatment assignment column, and the variables on the right are taken to be the variables by which the data should be stratified.

force

a boolean. If true, run even if a variable appears continuous. (default = FALSE)

Value

nothing; produces errors and warnings if anything is wrong


Check inputs to any matching function

Description

Check inputs to any matching function

Usage

check_inputs_matcher(object, model, k)

Arguments

object

a strata object

model

(optional) formula for matching. If left blank, all columns of the analysis set in object will be used as covariates in the propensity model or Mahalanobis match (except outcome, treatment and stratum)

k

the number of control individuals to be matched to each treated individual. If k = "full", full matching is done instead of pair matching.

Value

nothing


Check Outcome

Description

Checks that outcome is a string which is a column in the data

Usage

check_outcome(outcome, data, treat)

Arguments

outcome

string giving the name of column with outcome information. Required if prognosis is a vector of prognostic scores or an already-fit prognostic model. Otherwise it will be inferred from the prognostic formula.

data

data.frame with observations as rows, features as columns

treat

string giving the name of column designating treatment assignment

Value

nothing


Check Pilot set options

Description

Check Pilot set options

Usage

check_pilot_set_options(
  pilot_fraction,
  pilot_size,
  group_by_covariates,
  data,
  n_c
)

Arguments

pilot_fraction

numeric between 0 and 1 giving the proportion of controls to be allotted for building the prognostic score (default = 0.1)

pilot_size

alternative to pilot_fraction. Approximate number of observations to be used in pilot set. Note that the actual pilot set size returned may not be exactly pilot_size if group_by_covariates is specified because balancing by covariates may result in deviations from desired size. If pilot_size is specified, pilot_fraction is ignored.

group_by_covariates

character vector giving the names of covariates to be grouped by (optional). If specified, the pilot set will be sampled in a stratified manner, so that the composition of the pilot set reflects the composition of the whole data set in terms of these covariates. The specified covariates must be categorical.

data

data.frame with observations as rows, features as columns

n_c

number of control observations in data

Value

nothing


Check Prognostic Formula

Description

Check Prognostic Formula

Usage

check_prognostic_formula(prog_formula, data, outcome, treat)

Arguments

prog_formula

a formula for prognostic score

data

data.frame with observations as rows, features as columns

outcome

string giving the name of column with outcome information. Required if prognosis is a vector of prognostic scores or an already-fit prognostic model. Otherwise it will be inferred from the prognostic formula.

treat

string giving the name of column designating treatment assignment

Value

nothing


Check Propensity Formula

Description

Check Propensity Formula

Usage

check_prop_formula(prop_formula, data, treat)

Arguments

prop_formula

a formula

data

the analysis set data within a stratum

treat

the name of the treatment assignment column

Value

nothing


Check Scores

Description

Checks that prognostic scores are the same length as data

Usage

check_scores(prognostic_scores, data, outcome)

Arguments

prognostic_scores

a numeric vector

data

data.frame with observations as rows, features as columns

outcome

string giving the name of column with outcome information. Required if prognosis is a vector of prognostic scores or an already-fit prognostic model. Otherwise it will be inferred from the prognostic formula.

Value

nothing


Estimate Prognostic Scores

Description

Tries to make prognostic scores. If successful, returns them; otherwise throws an error message. A common failure mode is that the prognostic score is built on some categorical variable that takes on values in the analysis set that are never seen in the pilot set. Outputs are on the response scale (rather than the linear predictor), so the score is the expected value of the outcome under the control assignment based on the observed covariates.
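Both points above can be illustrated with stats::predict on a glm. This is a base-R sketch on simulated data, not estimate_scores itself: it shows that response-scale scores are the inverse logit of the linear predictor for a logistic model, and that a factor level absent from the pilot set makes prediction fail.

```r
set.seed(2)
pilot <- data.frame(C1 = sample(c("a", "b"), 50, replace = TRUE),
                    X1 = rnorm(50))
pilot$outcome <- rbinom(50, 1, plogis(pilot$X1))
fit <- glm(outcome ~ X1 + C1, data = pilot, family = binomial)

# Levels seen in the pilot set: response scale = inverse logit of link scale
seen <- data.frame(C1 = c("a", "b"), X1 = c(0, 1))
p_resp <- predict(fit, newdata = seen, type = "response")
p_link <- predict(fit, newdata = seen, type = "link")
all.equal(unname(p_resp), unname(plogis(p_link)))  # TRUE

# Level "c" never appears in the pilot set, so prediction raises an error
unseen <- data.frame(C1 = "c", X1 = -1)
res <- tryCatch(predict(fit, newdata = unseen, type = "response"),
                error = function(e) e)
inherits(res, "error")  # TRUE
```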

Usage

estimate_scores(prognostic_model, analysis_set)

Arguments

prognostic_model

Model of prognosis

analysis_set

data set on which prognostic scores should be estimated

Value

vector of prognostic scores


Extract cutoffs between strata

Description

By default, returns only the internal cut points. Cutoffs at 0 and 1 are implied.

Usage

extract_cut_points(x)

Arguments

x

an auto_strata object

Value

a vector of the score values delineating cutoffs between strata

Examples

dat <- make_sample_data()
a.strat <- auto_stratify(dat, "treat", outcome ~ X1 + X2)
cutoffs <- extract_cut_points(a.strat)

Extract cutoffs between strata

Description

Extract cutoffs between strata

Usage

## S3 method for class 'auto_strata'
extract_cut_points(x)

Arguments

x

an auto_strata object

Value

a vector of the score values delineating cutoffs between strata


Fit Prognostic Model

Description

Given a pilot set and a prognostic formula, return the fitted model. If the outcome is binary, fit a logistic regression. Otherwise, fit a linear model.
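The binary-versus-continuous rule can be sketched as follows. The helper fit_prog_sketch is hypothetical (not the package's internal code), and the binary test used here (logical, or only 0 and 1 values) is an assumption modeled on the is_binary description.

```r
# Hypothetical helper: logistic regression for binary outcomes, else linear model
fit_prog_sketch <- function(dat, prognostic_formula, outcome) {
  y <- dat[[outcome]]
  binary <- is.logical(y) || all(y %in% c(0, 1))
  if (binary) {
    glm(prognostic_formula, data = dat, family = binomial)
  } else {
    lm(prognostic_formula, data = dat)
  }
}

set.seed(3)
d <- data.frame(X1 = rnorm(40))
d$y_bin <- rbinom(40, 1, plogis(d$X1))
d$y_cont <- 2 * d$X1 + rnorm(40)

m1 <- fit_prog_sketch(d, y_bin ~ X1, "y_bin")
m2 <- fit_prog_sketch(d, y_cont ~ X1, "y_cont")
class(m1)  # "glm" "lm"
class(m2)  # "lm"
```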

Usage

fit_prognostic_model(dat, prognostic_formula, outcome)

Arguments

dat

data.frame on which model should be fit

prognostic_formula

formula for prognostic model

outcome

string giving name of column of data where outcomes are recorded

Value

a glm or lm object fit from prognostic_formula on dat


Get Issues

Description

Helper for make_issue_table to return issues string. Given a row which summarizes the Treat, Control, Total, and Control_Proportion of a stratum, return a string of potential issues with the stratum.

Usage

get_issues(row)

Arguments

row

a row of the data.frame produced in make_issue_table

Value

Returns a string of potential issues


Parse propensity input to obtain propensity scores

Description

The propensity input to plot.auto_strata or plot.manual_strata can be propensity scores, a propensity model, or a formula for the propensity score. This function determines which type of input propensity is and returns the propensity scores. Scores are returned on the response scale (rather than the linear predictor), so they are the predicted probabilities of treatment.
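A type dispatch of this kind might look like the sketch below. The helper get_prop_sketch is hypothetical; it assumes a formula is fit as a logistic regression with glm, and that a numeric vector must supply one score per row.

```r
# Hypothetical helper: accept a formula, a fitted glm, or a numeric vector
get_prop_sketch <- function(propensity, data, treat) {
  if (inherits(propensity, "formula")) {
    if (all.vars(propensity)[1] != treat) {
      stop("left-hand side of the propensity formula must be ", treat)
    }
    propensity <- glm(propensity, data = data, family = binomial)
  }
  if (inherits(propensity, "glm")) {
    # response scale: predicted probability of treatment
    return(unname(predict(propensity, newdata = data, type = "response")))
  }
  if (is.numeric(propensity) && length(propensity) == nrow(data)) {
    return(propensity)
  }
  stop("propensity must be a formula, a glm, or a numeric vector with one score per row")
}

set.seed(4)
d <- data.frame(X1 = rnorm(60))
d$treat <- rbinom(60, 1, plogis(d$X1))
s_formula <- get_prop_sketch(treat ~ X1, d, "treat")
s_model <- get_prop_sketch(glm(treat ~ X1, data = d, family = binomial), d, "treat")
all.equal(s_formula, s_model)  # TRUE
```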

Usage

get_prop_scores(propensity, data, treat)

Arguments

propensity

either a vector of propensity scores, a model for propensity, or a formula for propensity scores

data

the analysis set data within a stratum

treat

the name of the treatment assignment column

Value

vector of propensity scores


Checks auto_strata class

Description

Checks if the target object is an auto_strata object.

Usage

is.auto_strata(object)

Arguments

object

any R object

Value

Returns TRUE if its argument has auto_strata among its classes and FALSE otherwise.

Examples

dat <- make_sample_data()
a.strat <- auto_stratify(dat, "treat", outcome ~ X1 + X2)
is.auto_strata(a.strat) # returns TRUE

Checks manual_strata class

Description

Checks if the target object is a manual_strata object.

Usage

is.manual_strata(object)

Arguments

object

any R object

Value

Returns TRUE if its argument has manual_strata among its classes and FALSE otherwise.

Examples

dat <- make_sample_data()
m.strat <- manual_stratify(dat, treat ~ C1)
is.manual_strata(m.strat) # returns TRUE

Checks strata class

Description

Checks if the target object is a strata object.

Usage

is.strata(object)

Arguments

object

any R object

Value

Returns TRUE if its argument has strata among its classes and FALSE otherwise.

Examples

dat <- make_sample_data()
m.strat <- manual_stratify(dat, treat ~ C1)
is.strata(m.strat) # returns TRUE

Check if a vector is binary

Description

Returns TRUE if the input is logical or if it contains only 0's and 1's.
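The check can be sketched in one line of base R. The helper name is hypothetical, and requiring a numeric type for the 0/1 case is an assumption of this sketch.

```r
# Hypothetical helper: logical, or numeric containing only 0 and 1
is_binary_sketch <- function(col) {
  is.logical(col) || (is.numeric(col) && all(col %in% c(0, 1)))
}

is_binary_sketch(c(0, 1, 1))      # TRUE
is_binary_sketch(c(TRUE, FALSE))  # TRUE
is_binary_sketch(c(0, 1, 2))      # FALSE
```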

Usage

is_binary(col)

Arguments

col

a column from a data frame

Value

logical


Make Size-Ratio plot

Description

Not meant to be called externally. Helper plot function for strata. Produces a scatter plot of strata by size and control proportion.

Usage

make_SR_plot(x, label)

Arguments

x

a strata object returned by auto_stratify or manual_stratify

label

ignored unless type = "SR". If TRUE, a clickable plot is produced. The user may click on any number of strata and press finish to have those strata labeled. Note: uses identify, which may not be supported on some devices


Make Assignment-Control plot

Description

Not meant to be called externally. Helper plot function for strata object with type = "AC". Produces an Assignment-Control plot of the stratum given by strat.

Usage

make_ac_plot(
  x,
  propensity,
  strat,
  strata_lines,
  jitter_prognosis,
  jitter_propensity
)

Arguments

x

an auto_strata object returned by auto_stratify

propensity

ignored unless type = "hist" or type = "AC". Specifies propensity score information for plots where this is required. Accepts either a vector of propensity scores, a glm model for propensity scores, or a formula for fitting a propensity score model.

strat

the number code of the stratum to be plotted. If "all", plots all strata.

strata_lines

default = TRUE. Ignored unless type = "AC". If TRUE, lines on the plot indicate strata cut points.

jitter_prognosis

ignored unless type = "AC". Amount of uniform random noise to add to prognostic scores in plot.

jitter_propensity

ignored unless type = "AC". Amount of uniform random noise to add to propensity scores in plot.

See Also

Aikens et al. (preprint), https://arxiv.org/abs/1908.09077, Section 3.2, for an explanation of Assignment-Control plots


Make strata table

Description

Make strata table

Usage

make_autostrata_table(qcut)

Arguments

qcut

the prognostic score quantile cuts

Value

data.frame of strata definitions


Make histogram plot

Description

Not meant to be called externally. Helper plot function for strata object with type = "hist". Produces a histogram of propensity scores within a stratum

Usage

make_hist_plot(x, propensity, strat)

Arguments

x

a strata object returned by auto_stratify or manual_stratify

propensity

ignored unless type = "hist" or type = "AC". Specifies propensity score information for plots where this is required. Accepts either a vector of propensity scores, a glm model for propensity scores, or a formula for fitting a propensity score model.

strat

the number code of the stratum to be plotted. If "all", plots all strata


Make Issue Table

Description

Not meant to be called externally. Produce table of the number of treated and control individuals in each stratum. Also checks for potential problems with treat/control ratio or stratum size which might result in slow or poor quality matching.

Usage

make_issue_table(a_set, treat)

Arguments

a_set

data.frame with observations as rows, features as columns. This should be the analysis set from the recently stratified data.

treat

string name of treatment column

Value

Returns a data frame with one row per stratum and columns Treat, Control, Total, Control Proportion, and Potential Issues
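The count columns of such a table can be computed in base R as below. This hypothetical sketch assumes a 0/1 treatment column and a stratum column named "stratum"; it omits the Potential Issues column because the thresholds used to flag issues are not documented here.

```r
# Hypothetical helper: per-stratum treated/control counts (no issue flags)
issue_counts_sketch <- function(a_set, treat) {
  tab <- table(a_set$stratum, factor(a_set[[treat]], levels = c(0, 1)))
  out <- data.frame(
    Stratum = as.integer(rownames(tab)),
    Treat = as.vector(tab[, "1"]),
    Control = as.vector(tab[, "0"])
  )
  out$Total <- out$Treat + out$Control
  out$Control_Proportion <- out$Control / out$Total
  out
}

a_set <- data.frame(stratum = c(1, 1, 1, 2, 2), treat = c(1, 0, 0, 1, 1))
res <- issue_counts_sketch(a_set, "treat")
res
```

In this toy input, stratum 2 has no controls (Control_Proportion of 0), which is the kind of imbalance that would make matching in that stratum impossible.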


Make match distances within strata

Description

Makes the match distance, with the strata specifications, for strata_match. This function is largely unnecessary to call outside of stratamatch, but it is exported to aid the user in debugging. Note that this function requires that the R package optmatch is installed.

Usage

make_match_distances(object, model, method)

Arguments

object

a strata object

model

(optional) formula for matching. If left blank, all columns of the analysis set in object will be used as covariates in the propensity model or Mahalanobis match (except outcome, treatment and stratum)

method

either "prop" for propensity score matching based on a glm fit with the formula given in model, or "mahal" for Mahalanobis distance matching on the covariates in model.

Value

a match distance matrix for optmatch
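The idea of computing distances only within strata can be sketched without optmatch. The sketch below swaps in stats::mahalanobis and is not make_match_distances: the helper is hypothetical, it returns squared Mahalanobis distances as one treated-by-control matrix per stratum, and it assumes the covariance of all units in the stratum, which may differ from the covariance optmatch uses.

```r
# Hypothetical helper: squared Mahalanobis distances, treated x control, per stratum
mahal_within_strata <- function(data, treat, covs, stratum = "stratum") {
  lapply(split(data, data[[stratum]]), function(s) {
    X <- as.matrix(s[, covs, drop = FALSE])
    S <- cov(X)  # covariance of all units in this stratum (assumption)
    tr <- X[s[[treat]] == 1, , drop = FALSE]
    ct <- X[s[[treat]] == 0, , drop = FALSE]
    D <- matrix(NA_real_, nrow(tr), nrow(ct))
    for (i in seq_len(nrow(tr))) {
      D[i, ] <- mahalanobis(ct, center = tr[i, ], cov = S)
    }
    D
  })
}

set.seed(7)
d <- data.frame(X1 = rnorm(40), X2 = rnorm(40),
                treat = rep(c(1, 0, 0, 0), 10),
                stratum = rep(1:2, each = 20))
D <- mahal_within_strata(d, "treat", c("X1", "X2"))
lapply(D, dim)
```

Restricting distances to within-stratum pairs is what keeps each matching problem small enough to solve separately.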

See Also

https://cran.r-project.org/package=optmatch

Examples


dat <- make_sample_data(n = 75)

# stratify with auto_stratify
a.strat <- auto_stratify(dat, "treat", outcome ~ X2, size = 25)

# make match distances.  Requires optmatch package to be installed.

md <- make_match_distances(a.strat, treat ~ X1 + X2, method = "mahal")


Make Residual Plot

Description

Not yet implemented. Not meant to be called externally. Helper plot function for strata object with type = "residual". Produces the diagnostic plots for the prognostic score model

Usage

make_resid_plot(x)

Arguments

x

an auto_strata object returned by auto_stratify


Make sample data

Description

Makes a simple data frame with treat (binary), outcome (binary), and five covariates: X1 (continuous), X2 (continuous), B1 (binary), B2 (binary), and C1 (categorical). The probability that outcome = 1 is sigmoid(treat + X1), and the probability that treat = 1 is sigmoid(-0.2 * X1 + X2 - B1 + 2 * B2).
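The two sigmoid models above can be reproduced in a short base-R sketch. Only the treatment and outcome models come from this description; the covariate distributions (standard normal X1 and X2, Bernoulli(0.5) B1 and B2, C1 uniform over "a", "b", "c") and the helper name are assumptions for illustration.

```r
# Hypothetical re-creation; covariate distributions are assumed, not documented
make_sample_sketch <- function(n = 100) {
  sigmoid <- function(x) 1 / (1 + exp(-x))
  X1 <- rnorm(n)
  X2 <- rnorm(n)
  B1 <- rbinom(n, 1, 0.5)
  B2 <- rbinom(n, 1, 0.5)
  C1 <- sample(c("a", "b", "c"), n, replace = TRUE)
  treat <- rbinom(n, 1, sigmoid(-0.2 * X1 + X2 - B1 + 2 * B2))
  outcome <- rbinom(n, 1, sigmoid(treat + X1))
  data.frame(treat, outcome, X1, X2, B1, B2, C1)
}

set.seed(5)
dat <- make_sample_sketch(n = 30)
str(dat)
```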

Usage

make_sample_data(n = 100)

Arguments

n

the size of the desired data set

Examples

# make sample data set of 30 observations
dat <- make_sample_data(n = 30)

Manual Stratify

Description

Stratifies a data set based on a set of blocking covariates specified by the user. Creates a manual_strata object, which can be passed to strata_match for stratified matching or unpacked by the user to be matched by some other means.
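Manual stratification amounts to treating each observed combination of the blocking covariates as one stratum. A minimal base-R sketch of that idea (not the package's implementation) uses interaction():

```r
set.seed(6)
dat <- data.frame(treat = rbinom(75, 1, 0.4),
                  B1 = rbinom(75, 1, 0.5),
                  B2 = rbinom(75, 1, 0.5))

# drop = TRUE keeps only combinations that actually occur in the data
key <- interaction(dat$B1, dat$B2, drop = TRUE)
dat$stratum <- as.integer(key)
table(dat$stratum, dat$treat)
```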

Usage

manual_stratify(data, strata_formula, force = FALSE)

Arguments

data

data.frame with observations as rows, features as columns

strata_formula

the formula to be used for stratification (e.g. treat ~ X1). The variable on the left is taken to be the name of the treatment assignment column, and the variables on the right are taken to be the variables by which the data should be stratified.

force

a boolean. If true, run even if a variable appears continuous. (default = FALSE)

Value

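Returns a manual_strata object. This contains the components treat, covariates, analysis_set, call, issue_table, and strata_table (see new_manual_strata for a description of each).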

See Also

auto_stratify, new_manual_strata

Examples

# make sample data set
dat <- make_sample_data(n = 75)

# stratify based on B1 and B2
m.strat <- manual_stratify(dat, treat ~ B1 + B2)

# diagnostic plot
plot(m.strat)

New Autostrata

Description

Basic constructor for an auto_strata object. These objects hold all the information associated with a dataset that has been stratified via auto_stratify. This object may be passed to strata_match to be matched or it may be unpacked by the user to be matched by other means.

Usage

new_auto_strata(
  outcome,
  treat,
  analysis_set = NULL,
  call = NULL,
  issue_table = NULL,
  strata_table = NULL,
  prognostic_scores = NULL,
  prognostic_model = NULL,
  pilot_set = NULL
)

Arguments

outcome

a string giving the name of the column where outcome information is stored

treat

a string giving the name of the column where treatment information is stored

analysis_set

the data set which will be stratified

call

the call to auto_stratify used to generate this object

issue_table

a table of each stratum and potential issues of size and treat:control balance

strata_table

a table of each stratum and the prognostic score quantile bin this corresponds to

prognostic_scores

a vector of prognostic scores.

prognostic_model

a model for prognosis fit on a separate data set.

pilot_set

the set of controls used to fit the prognostic model. These are excluded from subsequent analysis so that the prognostic score is not overfit to the data used to estimate the treatment effect.

Value

a basic auto_strata object

See Also

auto_stratify, a function which calls this constructor to produce an auto_strata object.


New Manual Strata

Description

Basic constructor for a manual_strata object. These objects hold all the information associated with a dataset that has been stratified via manual_stratify. This object may be passed to strata_match to be matched or it may be unpacked by the user to be matched by other means.

Usage

new_manual_strata(
  treat = character(),
  covariates = character(),
  analysis_set = data.frame(),
  call = call(),
  issue_table = data.frame(),
  strata_table = data.frame()
)

Arguments

treat

a string giving the name of the column where treatment information is stored

covariates

a character vector with the names of the categorical columns on which to stratify

analysis_set

the data set which will be stratified

call

the call to manual_stratify used to generate this object

issue_table

a table of each stratum and potential issues of size and treat:control balance

strata_table

a table of each stratum and the covariate bin this corresponds to

Value

a basic manual_strata object


Plot method for auto_strata object

Description

Generates diagnostic plots for the product of a stratification by auto_stratify. There are four plot types:

  1. "SR" (default) - produces a scatter plot of strata by size and treat:control ratio

  2. "hist" - produces a histogram of propensity scores within a stratum

  3. "AC" - produces an Assignment-Control plot of individuals within a stratum

  4. "residual" - produces a residual plot for the prognostic model

Usage

## S3 method for class 'auto_strata'
plot(
  x,
  type = "SR",
  label = FALSE,
  stratum = "all",
  strata_lines = TRUE,
  jitter_prognosis,
  jitter_propensity,
  propensity,
  ...
)

Arguments

x

an auto_strata object returned by auto_stratify

type

string giving the plot type (default = "SR"). Other options are "hist", "AC" and "residual"

label

ignored unless type = "SR". If TRUE, a clickable plot is produced. The user may click on any number of strata and press finish to have those strata labeled. Note: uses identify, which may not be supported on some devices

stratum

ignored unless type = "hist" or type = "AC". A number specifying which stratum to plot.

strata_lines

default = TRUE. Ignored unless type = "AC". If TRUE, lines on the plot indicate strata cut points.

jitter_prognosis

ignored unless type = "AC". Amount of uniform random noise to add to prognostic scores in plot.

jitter_propensity

ignored unless type = "AC". Amount of uniform random noise to add to propensity scores in plot.

propensity

ignored unless type = "hist" or type = "AC". Specifies propensity score information for plots where this is required. Accepts either a vector of propensity scores, a glm model for propensity scores, or a formula for fitting a propensity score model.

...

other arguments

See Also

Aikens, Greaves, and Baiocchi (2020) in Statistics in Medicine, Section 3.2, for an explanation of Assignment-Control plots (formerly "Fisher-Mill" plots).

plot.manual_strata

Examples

dat <- make_sample_data()
a.strat <- auto_stratify(dat, "treat", outcome ~ X1 + X2)
plot(a.strat) # makes size-ratio scatter plot
plot(a.strat, type = "hist", propensity = treat ~ X1, stratum = 1)
plot(a.strat, type = "AC", propensity = treat ~ X1, stratum = 1)
plot(a.strat, type = "residual")

Plot method for manual_strata object

Description

Generates diagnostic plots for the product of a stratification by manual_stratify. There are two plot types:

  1. "SR" (default) - produces a scatter plot of strata by size and treat:control ratio

  2. "hist" - produces a histogram of propensity scores within a stratum.

Note that residual plots and AC plots are not supported for manual_strata objects because no prognostic model is fit.
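The "SR" plot summarizes each stratum by two numbers: its size and its treated:control ratio. The sketch below computes those two quantities in base R and draws a plain scatter plot. It is only an illustration, assuming a 0/1 treatment column; the helper name strata_size_ratio and the simulated column C1 are hypothetical, and the package's own plot may differ in layout and detail.

```r
# Hypothetical sketch: per-stratum size and treated:control ratio, the two
# quantities shown on an "SR" (size-ratio) plot. Not the package's code.
strata_size_ratio <- function(stratum, treat) {
  # Rows = strata, columns = control ("0") and treated ("1") counts
  tab <- table(factor(stratum), factor(treat, levels = c(0, 1)))
  out <- data.frame(
    stratum = rownames(tab),
    size = as.vector(rowSums(tab)),
    n_treat = as.vector(tab[, "1"]),
    n_control = as.vector(tab[, "0"]),
    stringsAsFactors = FALSE
  )
  # A stratum with no controls gets an infinite ratio
  out$ratio <- out$n_treat / out$n_control
  out
}

# Example: stratify on a single categorical covariate
set.seed(7)
dat <- data.frame(C1 = sample(c("a", "b", "c"), 300, replace = TRUE),
                  treat = rbinom(300, 1, 0.25))
sr <- strata_size_ratio(dat$C1, dat$treat)
plot(sr$size, sr$ratio, pch = 16,
     xlab = "Stratum size", ylab = "Treat:control ratio")
```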

Usage

## S3 method for class 'manual_strata'
plot(x, type = "SR", label = FALSE, stratum = "all", propensity, ...)

Arguments

x

a manual_strata object returned by manual_stratify

type

string giving the plot type (default = "SR"). The other option is "hist".

label

ignored unless type = "SR". If TRUE, a clickable plot is produced. The user may click on any number of strata and press finish to have those strata labeled. Note: uses identify, which may not be supported on some graphics devices.

stratum

ignored unless type = "hist". A number specifying which stratum to plot.

propensity

ignored unless type = "hist". Specifies propensity score information for plots where this is required. Accepts either a vector of propensity scores, a glm model for propensity scores, or a formula for fitting a propensity score model.

...

other arguments

Examples

dat <- make_sample_data()
m.strat <- manual_stratify(dat, treat ~ C1)
plot(m.strat) # makes size-ratio scatter plot
plot(m.strat, type = "hist", propensity = treat ~ X1, stratum = 1)
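The propensity argument accepts three forms: a vector of scores, a fitted glm, or a formula. One way to normalize those inputs into a single score vector, and then draw a within-stratum histogram, is sketched below. This is an assumption-laden illustration rather than the package's internals: the helper resolve_propensity is hypothetical, and for the formula case it assumes a logistic regression fit on the supplied data.

```r
# Hypothetical helper: turn the three documented forms of `propensity`
# (numeric vector, fitted glm, or formula) into a vector of scores.
# Mirrors the documented inputs only; not stratamatch's actual code.
resolve_propensity <- function(propensity, data) {
  if (is.numeric(propensity)) {
    stopifnot(length(propensity) == nrow(data))
    return(propensity)
  }
  if (inherits(propensity, "glm")) {
    return(unname(predict(propensity, newdata = data, type = "response")))
  }
  if (inherits(propensity, "formula")) {
    # Assumption: fit a logistic propensity model on the same data
    fit <- glm(propensity, data = data, family = binomial())
    return(unname(predict(fit, newdata = data, type = "response")))
  }
  stop("`propensity` must be a numeric vector, a glm, or a formula")
}

# Example: overlaid histograms of propensity scores in stratum 1
set.seed(3)
dat <- data.frame(stratum = rep(1:2, each = 100),
                  treat = rbinom(200, 1, 0.3),
                  X1 = rnorm(200))
ps <- resolve_propensity(treat ~ X1, dat)
in_s1 <- dat$stratum == 1
hist(ps[in_s1 & dat$treat == 0], col = "grey",
     main = "Stratum 1", xlab = "Propensity score")
hist(ps[in_s1 & dat$treat == 1], col = rgb(1, 0, 0, 0.5), add = TRUE)
```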

Print Auto Strata

Description

Print method for auto_strata object

Usage

## S3 method for class 'auto_strata'
print(x, ...)

Arguments

x

an auto_strata object

...

other arguments

Examples

dat <- make_sample_data()
a.strat <- auto_stratify(dat, "treat", outcome ~ X1 + X2)
print(a.strat) # prints information about a.strat

Print Manual Strata

Description

Print method for manual_strata object

Usage

## S3 method for class 'manual_strata'
print(x, ...)

Arguments

x

a manual_strata object

...

other arguments

Examples

dat <- make_sample_data()
m.strat <- manual_stratify(dat, treat ~ C1)
print(m.strat) # prints information about m.strat

Split data into pilot and analysis sets

Description

Given a data set and some parameters about how to split the data, this function partitions the data accordingly and returns the partitioned data as a list containing the analysis_set and pilot_set.

Usage

split_pilot_set(
  data,
  treat,
  pilot_fraction = 0.1,
  pilot_size = NULL,
  group_by_covariates = NULL
)

Arguments

data

data.frame with observations as rows, features as columns

treat

string giving the name of column designating treatment assignment

pilot_fraction

numeric between 0 and 1 giving the proportion of controls to be allotted for building the prognostic score (default = 0.1)

pilot_size

alternative to pilot_fraction. Approximate number of observations to be used in the pilot set. Note that the actual pilot set size returned may not be exactly pilot_size if group_by_covariates is specified, because balancing by covariates may result in deviations from the desired size. If pilot_size is specified, pilot_fraction is ignored.

group_by_covariates

character vector giving the names of covariates to be grouped by (optional). If specified, the pilot set will be sampled in a stratified manner, so that the composition of the pilot set reflects the composition of the whole data set in terms of these covariates. The specified covariates must be categorical.

Value

a list with elements analysis_set and pilot_set

Examples

dat <- make_sample_data()
splt <- split_pilot_set(dat, "treat", 0.2)
# can be passed into auto_stratify if desired
a.strat <- auto_stratify(splt$analysis_set, "treat", outcome ~ X1,
  pilot_sample = splt$pilot_set
)
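The idea behind split_pilot_set (hold aside a fraction of the controls, optionally sampling within groups of categorical covariates so the pilot set mirrors their composition) can be sketched in a few lines of base R. This is a simplified illustration, not the package's implementation: the name sketch_split_pilot is hypothetical, the pilot_size option is not reproduced, and per-group rounding is an assumption that explains why a stratified pilot set can deviate slightly from the requested size.

```r
# Hypothetical sketch of a pilot/analysis split; not split_pilot_set's code.
# Draws about `pilot_fraction` of the controls into the pilot set, sampling
# separately within each combination of the `group_by` covariates.
sketch_split_pilot <- function(data, treat, pilot_fraction = 0.1,
                               group_by = NULL) {
  controls <- which(data[[treat]] == 0)
  if (is.null(group_by)) {
    groups <- list(controls)
  } else {
    key <- interaction(data[controls, group_by, drop = FALSE], drop = TRUE)
    groups <- split(controls, key)
  }
  pilot_idx <- unlist(lapply(groups, function(idx) {
    n_take <- round(length(idx) * pilot_fraction)  # rounding per group
    idx[sample.int(length(idx), n_take)]
  }), use.names = FALSE)
  # setdiff() avoids the data[-integer(0), ] pitfall when no rows are drawn
  analysis_idx <- setdiff(seq_len(nrow(data)), pilot_idx)
  list(analysis_set = data[analysis_idx, , drop = FALSE],
       pilot_set = data[pilot_idx, , drop = FALSE])
}

set.seed(1)
dat <- data.frame(treat = rep(c(0, 1), each = 50),
                  sex = rep(c("F", "M"), times = 50),
                  X1 = rnorm(100))
splt <- sketch_split_pilot(dat, "treat", pilot_fraction = 0.2,
                           group_by = "sex")
```

Because every pilot row is a control, the prognostic model fit on the pilot set never sees a treated outcome, and the pilot rows are excluded from the analysis set.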

Strata function from package survival

Description

Strata function from package survival


Strata Match

Description

Match within strata in series using optmatch. Note that this function requires that the R package optmatch is installed.

Usage

strata_match(object, model = NULL, method = "prop", k = 1)

Arguments

object

a strata object

model

(optional) formula for matching. If left blank, all columns of the analysis set in object will be used as covariates in the propensity model or Mahalanobis match (except outcome, treatment, and stratum).

method

either "prop" for propensity score matching based on a glm fit with formula model, or "mahal" for Mahalanobis distance matching on the covariates in model.

k

the number of control individuals to be matched to each treated individual. If k = "full", full matching is done instead of pair matching.

Value

a named factor with matching assignments

See Also

https://cran.r-project.org/package=optmatch
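To show what "match within strata in series" means mechanically, the sketch below loops over strata one at a time and pairs units inside each. It deliberately swaps in a different technique: greedy nearest-neighbour 1:1 matching on Mahalanobis distance (base R's mahalanobis, which returns squared distances), which is NOT the optimal matching that optmatch performs and can give worse overall matches. The helper names greedy_mahal_pairs and match_within_strata and the column names stratum and treat are assumptions. Like strata_match, it returns a named factor of match assignments, with NA for unmatched units.

```r
# Hypothetical sketch: greedy 1:1 Mahalanobis matching, one stratum at a time.
# This is NOT optimal matching (optmatch); it only illustrates the structure.
greedy_mahal_pairs <- function(X, treat) {
  X <- as.matrix(X)
  S <- cov(X)                        # covariate covariance within the stratum
  treated <- which(treat == 1)
  available <- which(treat == 0)
  pair <- rep(NA_integer_, length(treat))
  for (k in seq_along(treated)) {
    if (length(available) == 0) break
    i <- treated[k]
    # squared Mahalanobis distances from treated unit i to unused controls
    d <- mahalanobis(X[available, , drop = FALSE], center = X[i, ], cov = S)
    j <- available[which.min(d)]
    pair[c(i, j)] <- k
    available <- setdiff(available, j)
  }
  pair
}

match_within_strata <- function(data, covariates, treat = "treat",
                                stratum = "stratum") {
  ids <- rep(NA_character_, nrow(data))
  for (s in unique(data[[stratum]])) {   # strata handled in series
    rows <- which(data[[stratum]] == s)
    p <- greedy_mahal_pairs(data[rows, covariates, drop = FALSE],
                            data[[treat]][rows])
    ids[rows] <- ifelse(is.na(p), NA_character_, paste(s, p, sep = "."))
  }
  # named factor of match IDs; unmatched units are NA
  factor(setNames(ids, rownames(data)))
}

set.seed(3)
dat <- data.frame(stratum = rep(1:2, each = 20),
                  treat = rep(c(1, 0, 0, 0), times = 10),
                  X1 = rnorm(40), X2 = rnorm(40))
m <- match_within_strata(dat, c("X1", "X2"))
```

Because matches never cross strata, each stratum is an independent subproblem; that independence is what lets the work be split across strata.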

Examples

# make a sample data set
set.seed(1)
dat <- make_sample_data(n = 75)

# stratify with auto_stratify
a.strat <- auto_stratify(dat, "treat", outcome ~ X2, size = 25)

# 1:1 match based on propensity formula: treat ~ X1 + X2
# Requires optmatch package to be installed.
if (requireNamespace("optmatch", quietly = TRUE)) {
  strata_match(a.strat, model = treat ~ X1 + X2, k = 1)
}

# full match within strata based on Mahalanobis distance.
# Requires optmatch package to be installed.
if (requireNamespace("optmatch", quietly = TRUE)) {
  strata_match(a.strat, model = treat ~ X1 + X2, method = "mahal", k = "full")
}


Match without Stratification

Description

Not meant to be called externally. Match a data set without stratifying. Used to compare performance with and without stratification. Note that this function requires that the R package optmatch is installed.

Usage

strata_match_nstrat(object, model = NULL, k = 1)

Arguments

object

a strata object

model

(optional) formula for matching. If left blank, all columns of the analysis set in object will be used as covariates in the propensity model or Mahalanobis match (except outcome, treatment, and stratum).

k

the number of control individuals to be matched to each treated individual. If k = "full", full matching is done instead of pair matching.

Value

a named factor with matching assignments

See Also

https://cran.r-project.org/package=optmatch


stratamatch: stratify and match large data sets

Description

This package employs a pilot matching design to automatically stratify and match large datasets. The manual_stratify function allows users to manually stratify a dataset based on categorical variables of interest, while the auto_stratify function does so automatically by allocating a held-aside (pilot) data set, fitting a prognostic score (see Hansen (2008) <doi:10.1093/biomet/asn004>) on the pilot set, and stratifying the data set based on prognostic score quantiles. The strata_match function then does optimal matching of the data set within strata.

See Also

  1. https://github.com/raikens1/stratamatch
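The pilot design described above (fit a prognostic model on held-aside controls, score the analysis set, cut at score quantiles) can be sketched in base R. This is a minimal illustration under stated assumptions, not auto_stratify's code: it assumes a binary outcome and a logistic prognostic model, stratifies into a fixed number of quantile groups rather than by a target stratum size, and uses the hypothetical names prognostic_strata and make_set.

```r
# Hypothetical sketch of prognostic-score stratification with a pilot set.
# Assumes a binary outcome and a logistic model; not auto_stratify's code.
prognostic_strata <- function(analysis_set, pilot_set, prog_formula,
                              n_strata = 4) {
  # 1. Fit the prognostic model on the held-aside pilot controls only
  prog_model <- glm(prog_formula, data = pilot_set, family = binomial())
  # 2. Score every analysis-set observation (linear predictor)
  score <- predict(prog_model, newdata = analysis_set)
  # 3. Cut the scores at quantiles; unique() guards against tied breaks
  breaks <- quantile(score, probs = seq(0, 1, length.out = n_strata + 1))
  cut(score, breaks = unique(breaks), include.lowest = TRUE, labels = FALSE)
}

set.seed(10)
make_set <- function(n) {
  X1 <- rnorm(n)
  X2 <- rnorm(n)
  data.frame(X1 = X1, X2 = X2,
             outcome = rbinom(n, 1, plogis(0.5 * X1 - 0.5 * X2)))
}
pilot <- make_set(200)      # stands in for held-aside controls
analysis <- make_set(400)
strata <- prognostic_strata(analysis, pilot, outcome ~ X1 + X2)
```

Fitting the model only on pilot observations, which are then excluded from the analysis set, keeps the analysis outcomes from influencing how strata are formed.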


Summary for strata object

Description

Summarizes the number and sizes of strata in a strata object. Also prints the number of strata with potential issues.

Usage

## S3 method for class 'strata'
summary(object, ...)

Arguments

object

a strata object

...

other arguments

Details

For more information, access the issue table for your strata object with mystrata$issue_table.

Examples

dat <- make_sample_data()
m.strat <- manual_stratify(dat, treat ~ C1)
summary(m.strat) # Summarizes strata in m.strat

Warn if continuous

Description

Throws an error (or, if force = TRUE, a warning) if a column appears to be continuous.

Usage

warn_if_continuous(column, name, force, n)

Arguments

column

vector or factor column from a data.frame

name

name of the input column

force

a boolean. If TRUE, warn but do not stop.

n

the number of rows in the data set

Details

Not meant to be called externally. Only categorical or binary covariates should be used to manually stratify a data set. However, it's hard to tell for sure if something is continuous or just discrete with real-numbered values. Returns without throwing an error if the column is a factor, but throws an error or warning if the column has many distinct values.

Value

Does not return anything
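The heuristic in the Details (accept factors, otherwise flag columns with many distinct values, and stop or warn depending on force) can be sketched as follows. The distinct-value threshold used here, more than max(10, 5% of n), is purely an illustrative assumption and not the rule stratamatch applies; the name check_categorical is likewise hypothetical.

```r
# Hypothetical sketch of a "looks continuous" check. The threshold
# max(10, 0.05 * n) is an illustrative assumption, not stratamatch's rule.
check_categorical <- function(column, name, force = FALSE,
                              n = length(column)) {
  if (is.factor(column)) return(invisible(NULL))   # factors are accepted
  n_distinct <- length(unique(column))
  if (n_distinct > max(10, 0.05 * n)) {
    msg <- sprintf("column '%s' has %d distinct values and may be continuous",
                   name, n_distinct)
    # force = TRUE downgrades the error to a warning
    if (force) warning(msg, call. = FALSE) else stop(msg, call. = FALSE)
  }
  invisible(NULL)
}
```

For example, check_categorical(rep(1:3, 100), "C1") returns silently, while check_categorical(rnorm(1000), "X1") stops with an error.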