Title: | Language to Manage Many Models |
Version: | 0.1.0 |
Description: | A system for describing and manipulating the many models that are generated in causal inference and data analysis projects, as based on the causal theory and criteria of Austin Bradford Hill (1965) <doi:10.1177/003591576505800503>. This system includes the addition of formal attributes that modify base 'R' objects, including terms and formulas, with a focus on variable roles in the "do-calculus" of modeling, as described in Pearl (2010) <doi:10.2202/1557-4679.1203>. For example, the definitions of exposure, outcome, and interaction are implicit in the roles variables take in a formula. These premises allow for a more fluent modeling approach focusing on variable relationships, and assessing effect modification, as described by VanderWeele and Robins (2007) <doi:10.1097/EDE.0b013e318127181b>. The essential goal is to help contextualize formulas and models in causality-oriented workflows. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.1 |
Depends: | R (≥ 4.1.0), vctrs (≥ 0.5.0), tibble (≥ 3.0.0) |
Imports: | stats, utils, generics, methods, dplyr, broom, tidyr, rlang, pillar, purrr, janitor |
Suggests: | testthat (≥ 3.0.0), covr, cli, rmarkdown, knitr, ggplot2, gt, survival, cmprsk, tidycmprsk |
VignetteBuilder: | knitr |
URL: | https://github.com/shah-in-boots/rmdl |
BugReports: | https://github.com/shah-in-boots/rmdl/issues |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2024-05-02 16:48:30 UTC; asshah4 |
Author: | Anish S. Shah |
Maintainer: | Anish S. Shah <ashah282@uic.edu> |
Repository: | CRAN |
Date/Publication: | 2024-05-02 22:20:02 UTC |
rmdl package
Description
A system for describing and manipulating the many models that are generated in causal inference and data analysis projects, as based on the causal theory and criteria of Austin Bradford Hill (1965) doi:10.1177/003591576505800503. This system includes the addition of formal attributes that modify base 'R' objects, including terms and formulas, with a focus on variable roles in the "do-calculus" of modeling, as described in Pearl (2010) doi:10.2202/1557-4679.1203. For example, the definitions of exposure, outcome, and interaction are implicit in the roles variables take in a formula. These premises allow for a more fluent modeling approach focusing on variable relationships, and assessing effect modification, as described by VanderWeele and Robins (2007) doi:10.1097/EDE.0b013e318127181b. The essential goal is to help contextualize formulas and models in causality-oriented workflows.
Author(s)
Maintainer: Anish S. Shah ashah282@uic.edu (ORCID) [copyright holder]
See Also
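Useful links:
- https://github.com/shah-in-boots/rmdl
- Report bugs at https://github.com/shah-in-boots/rmdl/issues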
Restructure formulas to fit within a model table
Description
Restructure formulas to fit within a model table
Usage
construct_table_from_formulas(x, ...)
Arguments
x |
Vector of |
Restructure models to fit within a model table
Description
Restructure models to fit within a model table. Passes information to new_model_table() for initialization.
Usage
construct_table_from_models(x, ...)
Arguments
x |
Vector of |
Data summarization and classification methods
Description
These related functions are intended to analyze a single data vector (e.g. a column from a dataset) and help predict its classification or other relevant attributes. These are simple yet opinionated convenience functions.
Usage
number_of_missing(x)
is_dichotomous(x)
Arguments
x |
A vector of any of the atomic types (see [ |
Details
The functions that are currently supported are:
- number_of_missing() returns the number of missing values in a vector
- is_dichotomous() returns TRUE if the vector is dichotomous, FALSE otherwise
Value
Returns a single value determined by the individual functions
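As a rough illustration of what helpers of this kind compute, the following base-R sketch defines two stand-ins. The names count_missing_sketch() and is_dichotomous_sketch() are hypothetical and are not the package's implementation; in particular, treating "dichotomous" as "exactly two distinct non-missing values" is an assumption made for this example.

```r
# Hypothetical stand-in: count the NA entries in a vector
count_missing_sketch <- function(x) {
  sum(is.na(x))
}

# Hypothetical stand-in: TRUE when exactly two distinct non-missing values exist
# (this definition of "dichotomous" is an assumption for the sketch)
is_dichotomous_sketch <- function(x) {
  length(unique(x[!is.na(x)])) == 2L
}

v <- c(0, 1, NA, 1, 0, NA)
n_missing <- count_missing_sketch(v)
v_is_binary <- is_dichotomous_sketch(v)
letters_are_binary <- is_dichotomous_sketch(c("a", "b", "c"))
```

Both helpers return a single value, which mirrors the return contract described above.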
Describe attributes of a tm vector
Description
Describe attributes of a tm vector
Usage
describe(x, property)
Arguments
x |
A vector |
property |
A character vector of the following attributes of a |
Value
A list of term = property pairs, where the term is the name of the element and the property is its value (e.g. the 'role' of the term).
Examples
f <- .o(output) ~ .x(input) + .m(mediator) + random
t <- tm(f)
describe(t, "role")
If objects are model tables, attributes are carried over to the subset object
Description
If objects are model tables, attributes are carried over to the subset object
Usage
df_reconstruct(x, to)
Extending dplyr for tm class
Description
The filter() function extension subsets tm that satisfy set conditions. To be retained, the tm object must produce a value of TRUE for all conditions. Note that when a condition evaluates to NA, the row will be dropped, unlike base subsetting with [.
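The difference in NA handling noted above can be seen with base R alone. This sketch uses base subset() as a stand-in for the filter() behaviour, since subset() also treats an NA condition as FALSE; the data frame and its column names are made up for illustration and no tm object is involved.

```r
# Toy data: the second row has a missing value
df <- data.frame(id = 1:4, value = c(10, NA, 30, 5))

# Base `[` keeps a row of NAs wherever the condition is NA
kept_by_bracket <- df[df$value > 8, ]

# subset() drops rows where the condition is NA, as filter() is described to do
kept_by_subset <- subset(df, value > 8)

nrow(kept_by_bracket)
nrow(kept_by_subset)
```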
Usage
## S3 method for class 'tm'
filter(.data, ...)
Arguments
.data |
A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details. |
... |
< |
Value
An object of the same type as .data. The output has the following properties:
- tm objects are a subset of the input, but appear in the same order
- Underlying data.frame columns are not modified
- Underlying data.frame object's attributes are preserved
See Also
dplyr::filter() for examples of generic implementation
Estimating interaction effect estimates
Description
When using categorical interaction terms in a mdl_tbl object, estimates on interaction terms and their confidence intervals can be evaluated. The effect of interaction on the estimates is based on the levels of the interaction term. The estimates and intervals can be derived through the estimate_interaction() function. The approach is based on the method described by Figueiras et al. (1998).
Usage
estimate_interaction(object, exposure, interaction, conf_level = 0.95, ...)
Arguments
object |
A |
exposure |
The exposure variable in the model |
interaction |
The interaction variable in the model |
conf_level |
The confidence level for the confidence interval |
... |
Arguments to be passed to or from other methods |
Details
estimate_interaction() requires a mdl_tbl object that is a single row in length. Filtering the mdl_tbl should occur prior to passing it to this function. Additionally, this function assumes the interaction term is binary. If it is categorical, the current recommendation is to use dummy variables for the corresponding levels prior to modeling.
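To make the idea of level-specific estimates concrete, here is a base-R sketch on simulated data with a binary interaction term. It uses the usual variance of a sum of two coefficients, Var(b_x + b_xz) = Var(b_x) + Var(b_xz) + 2 Cov(b_x, b_xz), with a normal approximation for the interval. This is in the spirit of the cited approach but is not the package's implementation; the variable names, the simulated data, and the Wald-type interval are all assumptions.

```r
set.seed(1)
n <- 200
x <- rnorm(n)                 # exposure
z <- rbinom(n, 1, 0.5)        # binary interaction term
y <- 1 + 0.5 * x + 0.3 * z + 0.8 * x * z + rnorm(n)

fit <- lm(y ~ x * z)
b <- coef(fit)
V <- vcov(fit)

# Effect of x when z = 0 is b_x; when z = 1 it is b_x + b_xz
est <- c(b[["x"]], b[["x"]] + b[["x:z"]])

# Standard errors, using the covariance between b_x and b_xz for z = 1
se <- c(
  sqrt(V["x", "x"]),
  sqrt(V["x", "x"] + V["x:z", "x:z"] + 2 * V["x", "x:z"])
)

conf_level <- 0.95
crit <- qnorm(1 - (1 - conf_level) / 2)

effects <- data.frame(
  level = c(0, 1),
  estimate = est,
  conf_low = est - crit * se,
  conf_high = est + crit * se
)
effects
```

The covariance term matters: ignoring it would generally misstate the interval for the z = 1 level.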
Value
A data.frame with one row per level of the interaction term (for the presence or absence of the interaction term) and the following columns:
- estimate: beta coefficient for the interaction effect based on level
- conf_low: lower bound of confidence interval for the estimate
- conf_high: upper bound of confidence interval for the estimate
- p_value: p-value for the overall interaction effect across levels
- nobs: number of observations within the interaction level
- level: level of the interaction term
References
A. Figueiras, J. M. Domenech-Massons, and Carmen Cadarso, 'Regression models: calculating the confidence intervals of effects in the presence of interactions', Statistics in Medicine, 17, 2099-2105 (1998)
Vectorized formulas
Description
This function defines a modified formula class that has been vectorized. The fmls serves as a set of instructions or a script for the formula and its tm. It expands upon the functionality of formulas, allowing for additional descriptions and relationships to exist between the tm.
Usage
fmls(
  x = unspecified(),
  pattern = c("direct", "sequential", "parallel", "fundamental"),
  ...
)
is_fmls(x)
key_terms(x)
Arguments
x |
Objects of the following types can be used as inputs
|
pattern |
A
|
... |
Arguments to be passed to or from other methods |
Details
This is not meant to supersede a stats::formula() object, but to provide a series of relationships that can be helpful in causal modeling. All fmls can be converted to a traditional formula with ease. The base for this object is built on the tm() object.
Value
An object of class fmls
Patterns
The expansion pattern allows for instructions on how the covariates should be included in different formulas. Below, assume that x1, x2, and x3 are covariates in the following formula:
y = x1 + x2 + x3
Direct:
y = x1 + x2 + x3
Sequential:
y = x1
y = x1 + x2
y = x1 + x2 + x3
Parallel:
y = x1
y = x2
y = x3
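The three expansions listed above can be reproduced with base R's reformulate(), which may help when reasoning about what each pattern yields. This is a sketch of the expansion logic only, not the package's internals; the variable names are taken from the example above and the "fundamental" pattern is not covered.

```r
covariates <- c("x1", "x2", "x3")

# Direct: a single formula with every covariate
direct <- list(reformulate(covariates, response = "y"))

# Sequential: add one covariate at a time, keeping the earlier ones
sequential <- lapply(seq_along(covariates), function(i) {
  reformulate(covariates[seq_len(i)], response = "y")
})

# Parallel: one formula per covariate
parallel <- lapply(covariates, function(v) {
  reformulate(v, response = "y")
})

sequential
parallel
```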
Roles
Specific roles the variable plays within the formula. These are of particular importance, as they serve as special terms that can affect how a formula is interpreted.
Role | Shortcut | Description |
outcome | .o(...) | outcome ~ exposure |
exposure | .x(...) | outcome ~ exposure |
predictor | .p(...) | outcome ~ exposure + predictor |
confounder | .c(...) | outcome + exposure ~ confounder |
mediator | .m(...) | outcome <- mediator <- exposure |
interaction | .i(...) | outcome ~ exposure * interaction |
strata | .s(...) | outcome ~ exposure / strata |
group | .g(...) | outcome ~ exposure + group |
unknown | - | not yet assigned |
Formulas can be condensed by applying their specific role to individual terms as a function/wrapper. For example, y ~ .x(x1) + x2 + x3. This would signify that x1 has the specific role of an exposure.
Grouped variables are slightly different in that they are placed together in a hierarchy or tier. To indicate the group and the tier, the shortcut can have an integer following the .g. If no number is given, then it is assumed they are all on the same tier. For example: y ~ x1 + .g1(x2) + .g1(x3)
Warning: Only a single shortcut can be applied to a variable within a formula directly.
Pluralized Labeling Arguments
For a single argument, e.g. for the tm.formula() method, such as to identify variable X as an exposure, a formula should be given with the term of interest on the LHS, and the description or instruction on the RHS. This would look like role = X ~ "exposure".
For the arguments that would be dispatched for objects that are plural, e.g. containing multiple terms, each formula() should be placed within a list(). For example, the role argument would be written:
role = list(X ~ "exposure", M ~ "mediator", C ~ "confounder")
Further implementation details can be seen in the implementation of labeled_formulas_to_named_list().
Tools for working with formula-like objects
Description
Tools for working with formula-like objects
Usage
lhs(x, ...)
rhs(x, ...)
## S3 method for class 'formula'
rhs(x, ...)
## S3 method for class 'formula'
lhs(x, ...)
Arguments
x |
A formula-like object |
... |
Arguments to be passed to or from other methods |
Value
A character describing part of a formula or fmls object
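For a plain stats formula, comparable information can be pulled out with base R, which gives a feel for what "a character describing part of a formula" looks like. The helper names lhs_sketch() and rhs_sketch() are hypothetical; how the package's own methods treat role shortcuts such as .x() is not shown here.

```r
f <- y ~ x1 + x2

# Hypothetical stand-in: variable names on the left-hand side
lhs_sketch <- function(f) {
  all.vars(f[[2]])
}

# Hypothetical stand-in: term labels on the right-hand side
rhs_sketch <- function(f) {
  labels(terms(f))
}

left <- lhs_sketch(f)
right <- rhs_sketch(f)
```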
Convert labeling formulas to named lists
Description
Takes a list of formulas, or a similar construct, and returns a named list. The convention here is similar to reading from left to right, where the name or position of the term is on the LHS and the output label or target instruction is on the RHS.
If no label is desired, then the LHS can be left empty, such as ~ x.
Usage
labeled_formulas_to_named_list(x)
Arguments
x |
An argument that may represent a formula to label variables, or can
be converted to one. This includes, |
Value
A named list with the index as a character representing the term or variable of interest, and the value at that position as a character representing the label value.
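The conversion described above can be approximated in a few lines of base R. This sketch handles only two-sided formulas of the form term ~ "label"; the function name labels_to_list_sketch() is hypothetical, and the one-sided case and other input types accepted by the real function are deliberately left out.

```r
# Hypothetical stand-in: turn list(X ~ "exposure", ...) into list(X = "exposure", ...)
labels_to_list_sketch <- function(x) {
  # Allow a single formula as well as a list of formulas
  if (inherits(x, "formula")) {
    x <- list(x)
  }
  out <- list()
  for (i in seq_along(x)) {
    f <- x[[i]]
    stopifnot(inherits(f, "formula"), length(f) == 3L)
    term <- deparse(f[[2]])   # LHS: the term or variable of interest
    label <- f[[3]]           # RHS: the label or instruction
    out[[term]] <- if (is.character(label)) label else deparse(label)
  }
  out
}

roles <- labels_to_list_sketch(
  list(X ~ "exposure", M ~ "mediator", C ~ "confounder")
)
roles
```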
Model tables
Description
The model_table() or mdl_tbl() function creates a mdl_tbl object that is composed of either fmls objects or mdl objects, which are thin/informative wrappers for generic formulas and hypothesis-based models. The mdl_tbl is a data frame of model information, such as model fit, parameter estimates, and summary statistics about a model, or a formula if it has not yet been fit.
Usage
mdl_tbl(..., data = NULL)
model_table(..., data = NULL)
is_model_table(x)
Arguments
... |
Named or unnamed |
data |
A |
x |
A |
Details
The table itself allows for ease of organization of model information and has three additional, major components (stored as scalar attributes):
- A formula matrix that describes the terms used in each model, and how they are combined.
- A term table that describes the terms and their properties and/or labels.
- A list of datasets used for the analyses that can help support additional diagnostic testing.
We go into further detail in the sections below.
Value
A mdl_tbl object, which is essentially a data.frame with additional information on the relevant data, terms, and formulas used to generate the models.
Data List
NA
Term Table
NA
Formula Matrix
NA
Model table helper functions
Description
These functions are used to help manage the mdl_tbl object. They allow for specific manipulation of the internal components, and are intended to generally extend the functionality of the object.
- attach_data(): Attaches a dataset to a mdl_tbl object
- flatten_models(): Flattens a mdl_tbl object down to its specific parameters
Usage
attach_data(x, data, ...)
flatten_models(x, exponentiate = FALSE, which = NULL, ...)
Arguments
x |
A |
data |
A |
... |
Arguments to be passed to or from other methods |
exponentiate |
A |
which |
A |
Value
When using attach_data(), this returns a modified version of the mdl_tbl object with the dataset attached. When using the flatten_models() function, this returns a simplified data.frame of the original model table that contains the model-level and parameter-level statistics.
Attaching Data
When models are built, the matrix of data is often available within the raw model object. When handling many models, however, this can be expensive in terms of memory and space. By attaching datasets independently so that they persist regardless of the underlying models, and by knowing which models used which datasets, it becomes easier to back-transform information.
Flattening Models
A mdl_tbl object can be flattened to its specific parameters, their estimates, and model-level summary statistics. This function additionally helps by allowing for exponentiation of estimates when deemed appropriate. The user can specify which models to exponentiate by name. This heavily relies on the broom::tidy() functionality.
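To show what exponentiating estimates means for a single model, here is a base-R sketch that flattens one logistic regression into a parameter-level data frame and adds odds ratios. It deliberately avoids broom and rmdl so that it runs on its own; the column names, the simulated data, and the layout of the result are illustrative assumptions rather than the output format of flatten_models().

```r
set.seed(42)
d <- data.frame(x = rnorm(100))
d$y <- rbinom(100, 1, plogis(0.5 * d$x))

fit <- glm(y ~ x, data = d, family = binomial())

# Coefficient table from the model summary (log-odds scale)
est <- coef(summary(fit))

flat <- data.frame(
  term = rownames(est),
  estimate = est[, "Estimate"],
  std_error = est[, "Std. Error"],
  p_value = est[, "Pr(>|z|)"],
  row.names = NULL
)

# Exponentiating a log-odds coefficient gives an odds ratio
flat$odds_ratio <- exp(flat$estimate)
flat
```

Exponentiation is appropriate here because the model is on the log-odds scale; for a linear model it generally would not be, which is presumably why it is left as a per-model choice.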
Can mdl_tbl be reconstructed based on invariants?
Description
Can mdl_tbl be reconstructed based on invariants?
Usage
model_table_reconstructable(x, to)
Arguments
x |
data frame that will have invariants checked |
to |
the tibble subclass of |
Model Prototypes
Description
Usage
mdl(x = unspecified(), ...)
## S3 method for class 'character'
mdl(
  x,
  formulas,
  parameter_estimates = data.frame(),
  summary_info = list(),
  data_name,
  strata_variable = NA_character_,
  strata_level = NA_character_,
  ...
)
## S3 method for class 'lm'
mdl(
  x = unspecified(),
  formulas = fmls(),
  data_name = character(),
  strata_variable = character(),
  strata_level = character(),
  ...
)
## S3 method for class 'glm'
mdl(
  x = unspecified(),
  formulas = fmls(),
  data_name = character(),
  strata_variable = character(),
  strata_level = character(),
  ...
)
## S3 method for class 'coxph'
mdl(
  x = unspecified(),
  formulas = fmls(),
  data_name = character(),
  strata_variable = character(),
  strata_level = character(),
  ...
)
## Default S3 method:
mdl(x, ...)
model(x = unspecified(), ...)
Arguments
x |
Model object or representation |
... |
Arguments to be passed to or from other methods |
formulas |
Formula(s) given as either an |
parameter_estimates |
A
|
summary_info |
A
|
data_name |
String representing name of dataset that was used |
strata_variable |
String of a term that served as a stratifying variable |
strata_level |
Value of the level of the term specified by
|
Value
An object of the mdl class, which is essentially an equal-length list of parameters that describe a single model. It retains the original formula call and the related roles in the formula.
Apply patterns to formulas
Description
The family of apply_*_pattern() functions is used to expand fmls by specified patterns. These functions are not intended to be used directly but as internal functions. They have been exposed to allow for potential user-defined use cases.
Usage
apply_pattern(x, pattern)
apply_fundamental_pattern(x)
apply_direct_pattern(x)
apply_sequential_pattern(x)
apply_parallel_pattern(x)
apply_rolling_interaction_pattern(x)
Arguments
x |
A |
pattern |
A character string that specifies the pattern to use |
Details
Currently supported patterns are: fundamental, direct, sequential, parallel.
Value
Returns a tbl_df object that has special column names and rows. Each row is essentially a precursor to a new formula. These columns and rows must be present to be used with the fmls() function, and generally are the expected result of the specified pattern. They will undergo further internal modification prior to being turned into a fmls object, but this is a developer consideration. If developing a pattern, please use this guide to ensure that the output is compatible with the fmls() function.
- outcome: a single term that is the expected outcome variable
- exposure: a single term that is the expected exposure variable, which may not be present in every row
- covariate_*: the covariates expand based on the number that are present (e.g. "covariate_1", "covariate_2", etc)
Objects exported from other packages
Description
These objects are imported from other packages. Follow the links below to see their documentation.
Internal vctrs methods
Description
Internal vctrs methods
Create vectorized terms
Description
Usage
tm(x = unspecified(), ...)
## S3 method for class 'character'
tm(
  x,
  role = character(),
  side = character(),
  label = character(),
  group = integer(),
  type = character(),
  distribution = character(),
  description = character(),
  transformation = character(),
  ...
)
## S3 method for class 'formula'
tm(
  x,
  role = formula(),
  label = formula(),
  group = formula(),
  type = formula(),
  distribution = formula(),
  description = formula(),
  transformation = formula(),
  ...
)
## S3 method for class 'fmls'
tm(x, ...)
## S3 method for class 'tm'
tm(x, ...)
## Default S3 method:
tm(x = unspecified(), ...)
is_tm(x)
Arguments
x |
An object that can be coerced to a |
... |
Arguments to be passed to or from other methods |
role |
Specific roles the variable plays within the formula. These are of particular importance, as they serve as special terms that can affect how a formula is interpreted. Please see the Roles section below for further details. The options for roles are as below:
|
side |
Which side of a formula should the term be on. Options are
|
label |
Display-quality label describing the variable |
group |
Grouping variable name for modeling or placing terms together.
An integer value is given to identify which group the term will be in. The
hierarchy will be |
type |
Type of variable, either categorical (qualitative) or continuous (quantitative) |
distribution |
How the variable itself is more specifically subcategorized, e.g. ordinal, continuous, dichotomous, etc |
description |
Option for further descriptions or definitions needed for the tm, potentially part of a data dictionary |
transformation |
Modification of the term to be applied when combining with data |
Details
A vectorized term object that allows for additional information to be carried with the variable name.
This is not meant to replace traditional stats::terms(), but to supplement it using additional information that is more informative for causal modeling.
Value
A tm object, which is a series of individual terms with corresponding attributes, including the role, formula side, label, grouping, and other related features.
Roles
Specific roles the variable plays within the formula. These are of particular importance, as they serve as special terms that can affect how a formula is interpreted.
Role | Shortcut | Description |
outcome | .o(...) | outcome ~ exposure |
exposure | .x(...) | outcome ~ exposure |
predictor | .p(...) | outcome ~ exposure + predictor |
confounder | .c(...) | outcome + exposure ~ confounder |
mediator | .m(...) | outcome <- mediator <- exposure |
interaction | .i(...) | outcome ~ exposure * interaction |
strata | .s(...) | outcome ~ exposure / strata |
group | .g(...) | outcome ~ exposure + group |
unknown | - | not yet assigned |
Formulas can be condensed by applying their specific role to individual terms as a function/wrapper. For example, y ~ .x(x1) + x2 + x3. This would signify that x1 has the specific role of an exposure.
Grouped variables are slightly different in that they are placed together in a hierarchy or tier. To indicate the group and the tier, the shortcut can have an integer following the .g. If no number is given, then it is assumed they are all on the same tier. For example: y ~ x1 + .g1(x2) + .g1(x3)
Warning: Only a single shortcut can be applied to a variable within a formula directly.
Pluralized Labeling Arguments
For a single argument, e.g. for the tm.formula() method, such as to identify variable X as an exposure, a formula should be given with the term of interest on the LHS, and the description or instruction on the RHS. This would look like role = X ~ "exposure".
For the arguments that would be dispatched for objects that are plural, e.g. containing multiple terms, each formula() should be placed within a list(). For example, the role argument would be written:
role = list(X ~ "exposure", M ~ "mediator", C ~ "confounder")
Further implementation details can be seen in the implementation of labeled_formulas_to_named_list().
Update tm objects
Description
This updates properties or attributes of a tm vector. This only updates objects that already exist.
Usage
## S3 method for class 'tm'
update(object, ...)
Arguments
object |
A |
... |
A series of |
Value
A tm object with updated attributes
Model table object validation
Description
Model table object validation
Usage
validate_model_table(x)
Arguments
x |
data frame that will have invariants checked |