Type: | Package |
Version: | 0.5.7 |
Title: | Simulating Longitudinal Data with Causal Inference Applications |
Description: | A flexible tool for simulating complex longitudinal data using structural equations, with emphasis on problems in causal inference. Specify interventions and simulate from intervened data generating distributions. Define and evaluate treatment-specific means, the average treatment effects and coefficients from working marginal structural models. User interface designed to facilitate the conduct of transparent and reproducible simulation studies, and allows concise expression of complex functional dependencies for a large number of time-varying nodes. See the package vignette for more information, documentation and examples. |
URL: | https://github.com/osofr/simcausal |
BugReports: | https://github.com/osofr/simcausal/issues |
Depends: | R (≥ 3.2.0) |
Imports: | data.table, igraph, stringr, R6, assertthat, Matrix, methods |
Suggests: | copula, RUnit, ltmle, knitr, ggplot2, Hmisc, mvtnorm, bindata |
VignetteBuilder: | knitr |
License: | GPL-2 |
RoxygenNote: | 7.3.1 |
Encoding: | UTF-8 |
NeedsCompilation: | no |
Packaged: | 2024-10-19 14:11:05 UTC; fred |
Author: | Oleg Sofrygin [aut], Mark J. van der Laan [aut], Romain Neugebauer [aut], Fred Gruber [ctb, cre] |
Maintainer: | Fred Gruber <fgruber@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2024-10-19 14:30:02 UTC |
Simulating Longitudinal Data with Causal Inference Applications
Description
The simcausal R package is a tool for specification and simulation of complex longitudinal data structures that are based on structural equation models. The package provides a flexible tool for conducting transparent and reproducible simulation studies, with a particular emphasis on the types of data and interventions frequently encountered in typical causal inference problems, such as, observational data with time-dependent confounding, selection bias, and random monitoring processes. The package interface allows for concise expression of complex functional dependencies between a large number of nodes, where each node may represent a time-varying random variable. The package allows for specification and simulation of counterfactual data under various user-specified interventions (e.g., static, dynamic, deterministic, or stochastic). In particular, the interventions may represent exposures to treatment regimens, the occurrence or non-occurrence of right-censoring events, or of clinical monitoring events. simcausal enables the computation of a selected set of user-specified features of the distribution of the counterfactual data that represent common causal quantities of interest, such as, treatment-specific means, the average treatment effects and coefficients from working marginal structural models. For additional details and examples please see the package vignette and the function-specific documentation.
Documentation
To see the package vignette use:
vignette("simcausal_vignette", package="simcausal")
To see all available package documentation use:
help(package = 'simcausal')
Routines
The following routines will be generally invoked by a user, in the same order as presented below.
DAG.empty
Initiates an empty
DAG
object that contains no nodes.node
Defines node(s) in the structural equation model and its conditional distribution(s) using a language of vector-like R expressions. A call to
node
can specify either a single node or multiple nodes at once.add.nodes
or+node
Provide two equivalent ways of growing the structural equation model by adding new nodes and their conditional distributions. Sequentially define nodes in the
DAG
object, with each node representing the outcomes of one or more structural equation(s), altogether making-up the causal model of interest.set.DAG
Performs consistency checks and locks the
DAG
object so that no additional nodes can be subsequently added to the structural equation model.sim
orsimobs
Simulates iid observations of the complete node sequence defined by the
DAG
object. The output dataset is stored as adata.frame
and is referred to as the observed data.add.action
or+action
Provide two equivalent ways to define one or more actions. Each action modifies the conditional distribution for a subset of nodes in the original
DAG
object. The resulting data generating distribution is referred to as the post-intervention distribution. It is saved in theDAG
object alongside the original structural equation model (DAG
object).sim
orsimfull
Simulates independent observations from one or more post-intervention distribution(s). Produces a named list of
data.frame
s, collectively referred to as the full data. The number of outputdata.frame
s is equal to the number of post-intervention distributions specified in theactions
argument, where eachdata.frame
object is an iid sample from a particular post-intervention distribution.set.targetE
andset.targetMSM
Define two distinct types of target causal parameters. The function
set.targetE
defines causal parameters as the expected value(s) ofDAG
node(s) under one post-intervention distribution or the contrast of such expected value(s) from two post-intervention distributions. The functionset.targetMSM
defines causal parameters based on a user-specified working marginal structural model.eval.target
Evaluates the previously defined causal parameter using simulated full data
Data structures
The following most common types of output are produced by the package:
-
parameterized causal
DAG
model - object that specifies the structural equation model, along with interventions and the causal target parameter of interest. -
observed data - data simulated from the (pre-intervention) distribution specified by the structural equation model.
-
full data - data simulated from one or more post-intervention distributions defined by actions on the structural equation model.
-
causal target parameter - the true value of the causal target parameter evaluated with full data.
Updates
Check for updates and report bugs at https://github.com/osofr/simcausal.
Author(s)
Maintainer: Fred Gruber fgruber@gmail.com [contributor]
Authors:
Oleg Sofrygin oleg.sofrygin@gmail.com
Mark J. van der Laan laan@berkeley.edu
Romain Neugebauer Romain.S.Neugebauer@kp.org
References
Sofrygin O, van der Laan MJ, Neugebauer R (2017). "simcausal R Package: Conducting Transparent and Reproducible Simulation Studies of Causal Effect Estimation with Complex Longitudinal Data." Journal of Statistical Software, 81(2), 1-47. doi: 10.18637/jss.v081.i02.
See Also
Useful links:
Subsetting/Indexing Actions Defined for DAG
Object
Description
Subsetting/Indexing Actions Defined for DAG
Object
Usage
A(DAG)
Arguments
DAG |
A DAG object that was defined using functions |
Value
returns a list of actions, which are intervened versions of the original observed data DAG.
Examples
D <- DAG.empty()
D <- D + node(name="W1", distr="rbern", prob=plogis(-0.5))
D <- D + node(name="W2", distr="rbern", prob=plogis(-0.5 + 0.5*W1))
D <- D + node(name="A", distr="rbern", prob=plogis(-0.5 + 0.5*W1+ 0.5*W2))
D <- set.DAG(D)
# Define two actions, acting on node "A"
D <- D + action("A0", nodes=node("A", distr="rbern", prob=0))
D <- D + action("A1", nodes=node("A", distr="rbern", prob=1))
# Select both actions
A(D)
# Select action "A1" only
A(D)["A1"]
Initialize an empty DAG object
Description
Initialize an empty DAG object
Usage
DAG.empty()
Convert Data from Wide to Long Format Using reshape
Description
This utility function takes a simulated data.frame in wide format as an input and converts it into a long format (slower compared to DF.to.longDT
).
Usage
DF.to.long(df_wide)
Arguments
df_wide |
A |
Details
Keeps all covariates that appear only once and at the first time-point constant (carry-forward).
All covariates that appear fewer than range(t) times are imputed with NA for missing time-points.
Observations with all NA's for all time-varying covariates are removed.
When removing NA's the time-varying covariates that are attributes (attnames) are not considered.
Value
A data.frame
object in long format
See Also
DF.to.longDT
- a faster version of DF.to.long
that uses data.table
package
Other data manipulation functions:
DF.to.longDT()
,
doLTCF()
Faster Conversion of Data from Wide to Long Format Using dcast.data.table
Description
Faster utility function for converting wide-format data.frame
into a long format.
Internally uses data.table package functions melt.data.table
and dcast.data.table
.
Usage
DF.to.longDT(df_wide, return_DF = TRUE)
Arguments
df_wide |
A |
return_DF |
|
Details
Keeps all covariates that appear only once and at the first time-point constant (carry-forward).
All covariates that appear fewer than range(t) times are imputed with NA for missing time-points.
Observations with all NA's for all time-varying covariates are removed.
When removing NA's the time-varying covariates that are attributes (attnames) are not considered.
Value
A data.frame
in long format
See Also
Other data manipulation functions:
DF.to.long()
,
doLTCF()
Class for defining and evaluating user-specified summary measures (exprs_list)
Description
Evaluates and and stores arbitrary summary measure expressions. The expressions (exprs_list) are evaluated in the environment of the input data.frame.
Format
An R6 class object.
Details
Following fields are created during initialization
nodes ...
subset_regs ...
sA_nms ...
sW_nms ...
Kmax ...
Methods
Public methods
Method new()
Usage
Define_sVar$new()
Method set.new.exprs()
Usage
Define_sVar$set.new.exprs(exprs_list)
Method eval.nodeforms()
Usage
Define_sVar$eval.nodeforms(cur.node, data.df)
Method eval.EFU()
Usage
Define_sVar$eval.EFU(cur.node, data.df)
Method df.names()
Usage
Define_sVar$df.names(data.df)
Method setnode.setenv()
Usage
Define_sVar$setnode.setenv(cur.node)
Method set.net()
Usage
Define_sVar$set.net(netind_cl)
Method clone()
The objects of this class are cloneable with this method.
Usage
Define_sVar$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
Subsetting/Indexing DAG
Nodes
Description
Subsetting/Indexing DAG
Nodes
Usage
N(DAG)
Arguments
DAG |
A DAG object that was defined using functions |
Value
returns a list of nodes that can be indexed as a typical named list "[[]]".
Examples
D <- DAG.empty()
D <- D + node(name="W1", distr="rbern", prob=plogis(-0.5))
D <- D + node(name="W2", distr="rbern", prob=plogis(-0.5 + 0.5*W1))
D <- set.DAG(D)
#Returns all nodes from DAG D
N(D)
#Returns node W1 from DAG D
N(D)["W1"]
Convert Network IDs Matrix into Sparse Adjacency Matrix
Description
Convert simcausal
network ID matrix (NetInd_k
) into a network represented by a sparse adjacency matrix.
Usage
NetInd.to.sparseAdjMat(NetInd_k, nF)
Arguments
NetInd_k |
Matrix of network IDs of dimension |
nF |
Integer vector of length |
Value
Network represented as a sparse adjacency matrix (S4 class object dgCMatrix
from package Matrix
).
NOTE: The friend IDs for observation i
will be listed in column i
(i.e, which(sparseAdjMat[,i])
are friends of i
).
See Also
network
; sparseAdjMat.to.igraph
; igraph.to.sparseAdjMat
; sparseAdjMat.to.NetInd
;
R6 class for creating and storing a friend matrix (network IDs) for network data
Description
This R6 class defines fields and methods for creating and storing NetInd_k
,
a matrix of friend indices (network IDs) of dim = (nobs x Kmax)
.
Format
An R6Class
generator object
Details
NetInd - Matrix of friend indices (network IDs) of
dim = (nobs x Kmax)
(Active Binding).nF - Vector of integers, where
nF[i]
is the integer number of friends (0 toKmax
) for observationi
.nobs - Number of observations
Kmax - Maximum number of friends for any observation.
Methods
new(nobs, Kmax = 1)
Uses
nobs
andKmax
to instantiate an object of R6 class and pre-allocate memory for the future network ID matrix.makeNetInd.fromIDs(Net_str, IDs_str = NULL, sep = ' ')
Build the matrix of network IDs (
NetInd_k
) from IDs string vector, all friends of one observationi
are located in a string Net_str[i], with two distinct friend IDs ofi
separated by charactersep
. IfIDs_str
is NULL it is assumed that the friends in Net_str are actual row numbers in1:nobs
, otherwise IDs from Net_str will be used for looking up the observation row numbers inIDs_str
.make.nF(NetInd_k = self$NetInd_k, nobs = self$nobs, Kmax = self$Kmax)
This method calculates the integer number of friends for each row of the network ID matrix (
self$NetInd_k
). The result is assigned to a fieldself$nF
and is returned invisibly.mat.nF(nFnode)
nFnode
- the character name for the number of friends variable that is assigned as a column name to a single column matrix inself$nF
.
Methods
Public methods
Method new()
Usage
NetIndClass$new(nobs, Kmax = 1)
Method makeNetInd.fromIDs()
Usage
NetIndClass$makeNetInd.fromIDs(Net_str, IDs_str = NULL, sep = " ")
Method make.nF()
Usage
NetIndClass$make.nF( NetInd_k = self$NetInd_k, nobs = self$nobs, Kmax = self$Kmax )
Method mat.nF()
Usage
NetIndClass$mat.nF(nFnode)
Method clone()
The objects of this class are cloneable with this method.
Usage
NetIndClass$clone(deep = FALSE)
Arguments
deep
Whether to make a deep clone.
Define and Add Actions (Interventions)
Description
Define and add new action (intervention) to the existing DAG object. Use either syntax DAG +
action(name = ,nodes = )
or add.action((DAG = ,name = ,nodes = )
. Both give identical results, see the examples in the vignette and below for details.
Usage
add.action(DAG, name, nodes, ..., attr = list())
action(...)
Arguments
DAG |
DAG object |
name |
Unique name of the action |
nodes |
A list of node objects that defines the action on the DAG (replaces the distributions of the corresponding nodes in DAG) |
... |
Additional named attributes defining / indexing the action |
attr |
Additional named attributes defining / indexing the action |
Details
In addition to the action name and list of action nodes, both of these functions accept arbitrary named attributes (as additional arguments which must be given a name). This additional attributes can be used to simplify specification of dynamic regimes (actions that depend on the past observed covariates).
The formula of the intervention node is allowed to contain undefined variables, as long as those are later defined as a named argument to action
.
In Example 2 below, node("A",..., mean = ifelse(W1 >= theta, 1, 0))
,
defines the mean of the node "A" as a function of some undefined variable theta
, setting A
to 1 if the baseline node W1
is above or equal to theta
and 0 vice versa.
One specifies actual values of theta
while defining a new action, possible creating a series of actions, each indexed by a different value of theta
.
A new action can be defined with D<-D+action("A1th0.1", nodes=actN, theta=0.1)
.
Note that any name can be used in place of theta
. This attribute variable can appear anywhere inside the node distribution formula.
Finally, the attribute variable can also be time varying and, just like with DAG nodes, can be indexed by square bracket notation, theta[t]
. See Example 3 for defining time-varying attributes.
Value
A modified DAG
object with the added action
Examples
#---------------------------------------------------------------------------------------
# EXAMPLE 1: Showing two equivalent ways of defining an action for a simple DAG
#---------------------------------------------------------------------------------------
D <- DAG.empty()
D <- D + node(name="W1", distr="rbern", prob=plogis(-0.5))
D <- D + node(name="W2", distr="rbern", prob=plogis(-0.5 + 0.5*W1))
D <- D + node(name="A", distr="rbern", prob=plogis(-0.5 + 0.5*W1+ 0.5*W2))
Dset <- set.DAG(D)
# Syntax '+ action': define two actions, intervening on node "A", imputing order
Dset <- Dset + action("A0", nodes=node("A", distr="rbern", prob=0))
Dset <- Dset + action("A1", nodes=node("A", distr="rbern", prob=1))
# Equivalent syntax 'add.action': define two actions, intervening on node "A"
Dset <- add.action(Dset, "A0", nodes=node("A", distr="rbern", prob=0))
Dset <- add.action(Dset, "A1", nodes=node("A", distr="rbern", prob=1))
#---------------------------------------------------------------------------------------
# EXAMPLE 2: Adding named attributes that define (index) the action.
# Define intervention on A that is conditional on W1 crossing some threshold theta
#---------------------------------------------------------------------------------------
# Redefining node W1 as uniform [0,1]
D <- DAG.empty()
D <- D + node(name="W1", distr="runif", min=0, max=1)
D <- D + node(name="W2", distr="rbern", prob=plogis(-0.5 + 0.5*W1))
D <- D + node(name="A", distr="rbern", prob=plogis(-0.5 + 0.5*W1+ 0.5*W2))
Dset <- set.DAG(D)
# Define a node that is indexed by unknown variable theta
actN <- node("A",distr="rbern",prob=ifelse(W1 >= theta,1,0))
# Define 3 actions for theta=0.1, 0.5, 0.9
Dset <- Dset + action("A1th0.1", nodes = actN, theta = 0.1)
Dset <- Dset + action("A1th0.5", nodes = actN, theta = 0.5)
Dset <- Dset + action("A1th0.9", nodes = actN, theta = 0.9)
# Simulate 50 observations per each action above
simfull(A(Dset), n=50)
#---------------------------------------------------------------------------------------
# EXAMPLE 3: Time-varying action attributes for longitudinal DAG
#---------------------------------------------------------------------------------------
# Define longitudinal data structure over 6 time-points t=(0:5) with survival outcome "Y"
t_end <- 5
D <- DAG.empty()
D <- D + node("L2", t=0, distr="rbern", prob=0.05)
D <- D + node("L1", t=0, distr="rbern", prob=ifelse(L2[0]==1,0.5,0.1))
D <- D + node("A1", t=0, distr="rbern", prob=ifelse(L1[0]==1, 0.5, 0.1))
D <- D + node("Y", t=0, distr="rbern",
prob=plogis(-6.5 + L1[0] + 4*L2[0] + 0.05*I(L2[0]==0)), EFU=TRUE)
D <- D + node("L2", t=1:t_end, distr="rbern", prob=ifelse(A1[t-1]==1, 0.1, 0.9))
D <- D + node("A1", t=1:t_end, distr="rbern",
prob=ifelse(A1[t-1]==1, 1, ifelse(L1[0]==1 & L2[0]==0, 0.3, 0.5)))
D <- D + node("Y", t=1:t_end, distr="rbern", prob=plogis(-6.5+L1[0]+4*L2[t]), EFU=TRUE)
D <- set.DAG(D)
#---------------------------------------------------------------------------------------
# Dynamic actions indexed by constant value of parameter theta={0,1})
#---------------------------------------------------------------------------------------
# Define time-varying node A1: sets A1 to 1 if L2 at t is >= theta
actN_A1 <- node("A1",t=0:t_end, distr="rbern", prob=ifelse(L2[t] >= theta,1,0))
# Define two actions, indexed by fixed values of theta={0,1}
D_act <- D + action("A1_th0", nodes=actN_A1, theta=0)
D_act <- D_act + action("A1_th1", nodes=actN_A1, theta=1)
# Simulate 50 observations for per each action above
simfull(simcausal::A(D_act), n=50)
#---------------------------------------------------------------------------------------
# Dynamic actions indexed by time-varying parameter theta[t]
#---------------------------------------------------------------------------------------
# This defines an action node with threshold theta varying in time (note syntax theta[t])
actN_A1 <- node("A1",t=0:t_end, distr="rbern", prob=ifelse(L2[t] >= theta[t],1,0))
# Now define 3 actions that are indexed by various values of theta over time
D_act <- D + action("A1_th_const0", nodes=actN_A1, theta=rep(0,(t_end+1)))
D_act <- D_act + action("A1_th_var1", nodes=actN_A1, theta=c(0,0,0,1,1,1))
D_act <- D_act + action("A1_th_var2", nodes=actN_A1, theta=c(0,1,1,1,1,1))
# Simulate 50 observations for per each action above
simfull(simcausal::A(D_act), n=50)
Adding Node(s) to DAG
Description
Adding nodes to a growing DAG object, as in DAG + node()
. Use either syntax DAG + node()
or add.nodes(DAG = , nodes = node())
. Both give identical results, see the examples in the vignette and below for details.
Usage
add.nodes(DAG, nodes)
## S3 method for class 'DAG'
obj1 + obj2
Arguments
DAG |
DAG object |
nodes |
A node or several nodes returned from a call to |
obj1 |
Object that belongs to either classes: |
obj2 |
Object that belongs to either classes: |
Value
An updated DAG object with new nodes
See Also
List All Custom Distribution Functions in simcausal
.
Description
List All Custom Distribution Functions in simcausal
.
Usage
distr.list()
Missing Variable Imputation with Last Time Point Value Carried Forward (LTCF)
Description
Forward imputation for missing variable values in simulated data after a particular end of the follow-up event. The end of follow-up event is defined by the node of type EOF=TRUE
being equal to 1.
Usage
doLTCF(data, LTCF)
Arguments
data |
Simulated |
LTCF |
Character string specifying the outcome node that is the indicator of the end of follow-up (observations with value of the outcome variable being 1 indicate that the end of follow-up has been reached). The outcome variable must be a binary node that was declared with |
Value
Modified data.frame
, all time-varying missing variables after the EFU
outcome specified in LTCF
are forward imputed with their last available non-missing value.
Details
The default behavior of the sim
function consists in setting all nodes that temporally follow an EFU
node whose simulated value is 1 to missing (i.e., NA
).
The argument LTCF
of the sim
function can however be used to change this default behavior and impute some of these missing values with last time point value carried forward (LTCF).
More specifically, only the missing values of time-varying nodes (i.e., those with non-missing t
argument) that follow the end of follow-up event encoded by the EFU
node specified by the LTCF
argument will be imputed.
One can use the function doLTCF
to apply the last time point value carried forward (LTCF) imputation to an existing simulated dataset obtained from the function sim
that was called with its default imputation setting (i.e., with no LTCF
argument).
Illustration of the use of the LTCF imputation functionality are provided in the package vignette.
The first example below shows the default data format of the sim
function after an end of the follow-up event and how this behavior can be modified to generate data with LTCF imputation by either using the LTCF
argument of the
sim
function or by calling the doLTCF
function. The second example demonstrates how to use the doLTCF
function to perform LTCF imputation on already existing data simulated with the sim
function based on its default non-imputation behavior.
See Also
sim
, simobs
and simfull
for simulating data with and without carry forward imputation.
Other data manipulation functions:
DF.to.long()
,
DF.to.longDT()
Examples
t_end <- 10
lDAG <- DAG.empty()
lDAG <- lDAG +
node(name = "L2", t = 0, distr = "rconst", const = 0) +
node(name = "A1", t = 0, distr = "rconst", const = 0) +
node(name = "L2", t = 1:t_end, distr = "rbern",
prob = ifelse(A1[t - 1] == 1, 0.1,
ifelse(L2[t-1] == 1, 0.9,
min(1,0.1 + t/t_end)))) +
node(name = "A1", t = 1:t_end, distr = "rbern",
prob = ifelse(A1[t - 1] == 1, 1,
ifelse(L2[0] == 0, 0.3,
ifelse(L2[0] == 0, 0.1,
ifelse(L2[0] == 1, 0.7, 0.5))))) +
node(name = "Y", t = 1:t_end, distr = "rbern",
prob = plogis(-6.5 + 4 * L2[t] + 0.05 * sum(I(L2[0:t] == rep(0,(t + 1))))),
EFU = TRUE)
lDAG <- set.DAG(lDAG)
#---------------------------------------------------------------------------------------
# EXAMPLE 1. No forward imputation.
#---------------------------------------------------------------------------------------
Odat.wide <- sim(DAG = lDAG, n = 1000, rndseed = 123)
Odat.wide[c(21,47), 1:18]
Odat.wideLTCF <- sim(DAG = lDAG, n = 1000, LTCF = "Y", rndseed = 123)
Odat.wideLTCF[c(21,47), 1:18]
#---------------------------------------------------------------------------------------
# EXAMPLE 2. With forward imputation.
#---------------------------------------------------------------------------------------
Odat.wideLTCF2 <- doLTCF(data = Odat.wide, LTCF = "Y")
Odat.wideLTCF2[c(21,47), 1:18]
# all.equal(Odat.wideLTCF, Odat.wideLTCF2)
Evaluate the True Value of the Causal Target Parameter
Description
This function estimates the true value of the previously set target parameter (set.targetE
or set.targetMSM
) using the DAG object and either 1) data
: list of action-specific simulated data.frames
; or 2) actions
; or 3) when data
and actions
are missing, using all distinct actions previously defined on the DAG
object.
Usage
eval.target(
DAG,
n,
data,
actions,
rndseed = NULL,
verbose = getOption("simcausal.verbose")
)
Arguments
DAG |
DAG object with target parameter set via |
n |
Number of observations to simulate (if simulating full data), this is overwritten by the number of observations in each data |
data |
List of action-specific |
actions |
Character vector of action names which play the role of the data generating mechanism for simulated data when argument |
rndseed |
Seed for the random number generator. |
verbose |
Set to |
Details
For examples and additional details see documentation for set.targetE
or set.targetMSM
Value
For targetE returns a vector of counterfactual means, ATE or ATR; for targetMSM returns a named list with the MSM model fit ("msm"
),
MSM model coefficients ("coef"
), the mapping of the MSM summary terms S()
to the actual variable names used in the data, ("S.msm.map"
),
and the long format full data that was used for fitting this MSM "df_long"
.
Convert igraph Network Object into Sparse Adjacency Matrix
Description
Convert igraph network object into its sparse adjacency matrix representation using as_adjacency_matrix
function from the igraph
package.
Usage
igraph.to.sparseAdjMat(igraph_network)
Arguments
igraph_network |
Network as an |
Value
Sparase adjacency matrix returned by igraph::as_adjacency_matrix
function.
NOTE: for directed graphs the friend IDs pointing into vertex i
are assumed to be listed in the column i
(i.e, which(adjmat[,i])
are friends of i
).
See Also
network
; sparseAdjMat.to.NetInd
; NetInd.to.sparseAdjMat
; sparseAdjMat.to.igraph
;
List All Custom Network Generator Functions in simcausal
.
Description
List All Custom Network Generator Functions in simcausal
.
Usage
net.list()
Define a Network Generator
Description
Define a network generator by providing a function (using the argument netfun
) which will simulate a network of connected friends for observations i
in 1:n
.
This network then serves as a backbone for defining and simulating from the structural equation models for dependent data.
In particular, the network allows new nodes to be defined as functions of the previously simulated node values of i
's friends, across all observations i
.
Let F_i
denote the set of friends of one observation i
(observations in F_i
are assumed to be "connected" to i
) and
refer to the union of these sets F_i
as a "network" on n
observations, denoted by F
.
A user-supplied network generating function netfun
should be able to simulate such network F
by returning a matrix of n
rows,
where each row i
defines a friend set F_i
, i.e., row i
should be a vector of observations in 1:n
that are connected to i
(friends of i
),
with the remainder filled by NA
s.
Each friend set F_i
can contain up to Kmax
unique indices j
from 1:n
, except for i
itself.
F_i
is also allowed to be empty (row i
has only NA
s), implying that i
has no friends.
The functionality is illustrated in the examples below. For additional information see Details.
To learn how to use the node
function for defining a node as a function of the friend node values, see Syntax and Network Summary Measures.
Usage
network(name, netfun, ..., params = list())
Arguments
name |
Character string specifiying the name of the current network, may be used for adding new network that replaces the existing one (resample previous network) |
netfun |
Character name of the user-defined network generating function, can be any R function that returns a matrix of friend IDs of dimension |
... |
Named arguments specifying distribution parameters that are accepted by the network sampling function in |
params |
A list of additional named parameters to be passed on to the |
Details
Without the network of friends, the DAG
objects constructed by calling the node
function can only specify structural equation models for independent and identically distributed data.
That is, if no network is specified, for each observation i
a node can be defined conditionally only on i
's own previously simulated node values.
As a result, any two observations simulated under such data-generating model are always independent and identically distributed.
Defining a network F
allows one to define a new structural equation model where a node for each observation i
can depend
on its own simulated past, but also on the previously simulated node values of i
's friends (F_i
).
This is accomplished by allowing the data generating distribution for each observation i
's node to be defined conditionally
on the past node values of i
's friends (observations in F_i
).
The network of friends can be used in subsequent calls to node
function where new nodes (random variables) defined by the node
function can depend on the node values of i
's friends
(observations in the set F_i
). During simulation it is assumed observations on F_i
can simultaneously influence i
.
Note that the current version of the package does not allow combining time-varying node indexing Var[t]
and network node indexing Var[[net_indx]]
for the same data generating distribution.
Each argument for the input network can be an evaluable R expression. All formulas are captured by delayed evaluation and are evaluated during the simulation.
Formulas can refer to standard or user-specified R functions that must only apply to the values of previously defined nodes
(i.e. node(s) that were called prior to network()
function call).
Value
A list containing the network object(s) of type DAG.net
, this will be utilized when data is simulated with sim
function.
Syntax
The network
function call that defines the network of friends can be added to a growing DAG
object by using '+'
syntax, much like a new node
is added to a DAG
.
Subsequently defined nodes (node
function calls) can employ the double square bracket subsetting syntax to reference previously simulated node values
for specific friends in F_i
simultaneously across all observations i
.
For example, VarName[[net_indx]]
can be used inside the node
formula to reference the node VarName
values of i
's friends in F_i[net_indx]
,
simultaneously across all i
in 1:n
.
The friend subsetting index net_indx
can be any non-negative integer vector that takes values from 0 to Kmax
,
where 0 refers to the VarName
node values of observation i
itself (this is equivalent to just using VarnName
in the node
formula),
net_indx
value of 1 refers to node VarName
values for observations in F_i[1]
, across all i
in 1:n
(that is, the value of VarName
of i
's first friend F_i[1]
, if the friend exists and NA
otherwise),
and so on, up to net_indx
value of Kmax
, which would reference to the last friend node values of VarName
, as defined by observations in F_i[Kmax]
across all i
.
Note that net_indx
can be a vector (e.g, net_indx=c(1:Kmax)
),
in which case the result of the query VarName[[c(1:Kmax)]]
is a matrix of Kmax
columns and n
rows.
By default, VarName[[j]]
evaluates to missing (NA
) when observation i
does not have a friend under F_i[j]
(i.e., in the j
th spot of i
's friend set).
This default behavior however can be changed to return 0 instead of NA
, by passing an additional argument replaceNAw0 = TRUE
to the corresponding node
function.
Network Summary Measures
One can also define summary measures of the network covariates by specifying a node formula that applies an R function to the result of VarName[[net_indx]]
.
The rules for defining and applying such summary measures are identical to the rules for defining summary measures for time-varying nodes VarName[t_indx].
For example, use sum(VarName[[net_indx]])
to define a summary measure as a sum of VarName
values of friends in F_i[net_indx]
, across all observations i
in 1:n
.
Similarly, use mean(VarName[[net_indx]])
to define a summary measure as a mean of VarName
values of friends in F_i[net_indx]
, across all i
.
For more details on defining such summary functions see the simcausal
vignette.
See Also
igraph.to.sparseAdjMat
; sparseAdjMat.to.NetInd
; NetInd.to.sparseAdjMat
; sparseAdjMat.to.igraph
Examples
#--------------------------------------------------------------------------------------------------
# EXAMPLE 1. USING igraph R PACKAGE TO SIMULATE NETWORKS
#--------------------------------------------------------------------------------------------------
#--------------------------------------------------------------------------------------------------
# Example of a network sampler, will be provided as "netfun" argument to network(, netfun=);
# Generates a random graph according to the G(n,m) Erdos-Renyi model using the igraph package;
# Returns (n,Kmax) matrix of net IDs (friends) by row;
# Row i contains the IDs (row numbers) of i's friends;
# i's friends are assumed connected to i and can influence i in equations defined by node())
# When i has less than Kmax friends, the remaining i row entries are filled with NAs;
# Argument m_pn: > 0
# a total number of edges in the network as a fraction (or multiplier) of n (sample size)
#--------------------------------------------------------------------------------------------------
gen.ER <- function(n, m_pn, ...) {
m <- as.integer(m_pn*n)
if (n<=10) m <- 20
igraph.ER <- igraph::sample_gnm(n = n, m = m, directed = TRUE)
sparse_AdjMat <- igraph.to.sparseAdjMat(igraph.ER)
NetInd_out <- sparseAdjMat.to.NetInd(sparse_AdjMat)
return(NetInd_out$NetInd_k)
}
D <- DAG.empty()
# Sample ER model network using igraph::sample_gnm with m_pn argument:
D <- D + network("ER.net", netfun = "gen.ER", m_pn = 50)
# W1 - categorical (6 categories, 1-6):
D <- D +
node("W1", distr = "rcat.b1",
probs = c(0.0494, 0.1823, 0.2806, 0.2680, 0.1651, 0.0546)) +
# W2 - binary infection status, positively correlated with W1:
node("W2", distr = "rbern", prob = plogis(-0.2 + W1/3)) +
# W3 - binary confounder:
node("W3", distr = "rbern", prob = 0.6)
# A[i] is a function W1[i] and the total of i's friends values W1, W2 and W3:
D <- D + node("A", distr = "rbern",
prob = plogis(2 + -0.5 * W1 +
-0.1 * sum(W1[[1:Kmax]]) +
-0.4 * sum(W2[[1:Kmax]]) +
-0.7 * sum(W3[[1:Kmax]])),
replaceNAw0 = TRUE)
# Y[i] is a function of netW3 (friends of i W3 values) and the total N of i's friends
# who are infected AND untreated:
D <- D + node("Y", distr = "rbern",
prob = plogis(-1 + 2 * sum(W2[[1:Kmax]] * (1 - A[[1:Kmax]])) +
-2 * sum(W3[[1:Kmax]])
),
replaceNAw0 = TRUE)
# Can add N untreated friends to the above outcome Y equation: sum(1 - A[[1:Kmax]]):
D <- D + node("Y", distr = "rbern",
prob = plogis(-1 + 1.5 * sum(W2[[1:Kmax]] * (1 - A[[1:Kmax]])) +
-2 * sum(W3[[1:Kmax]]) +
0.25 * sum(1 - A[[1:Kmax]])
),
replaceNAw0 = TRUE)
# Can add N infected friends at baseline to the above outcome Y equation: sum(W2[[1:Kmax]]):
D <- D + node("Y", distr = "rbern",
prob = plogis(-1 + 1 * sum(W2[[1:Kmax]] * (1 - A[[1:Kmax]])) +
-2 * sum(W3[[1:Kmax]]) +
0.25 * sum(1 - A[[1:Kmax]]) +
0.25 * sum(W2[[1:Kmax]])
),
replaceNAw0 = TRUE)
Dset <- set.DAG(D, n.test = 100)
# Simulating data from the above sem:
datnet <- sim(Dset, n = 1000, rndseed = 543)
head(datnet)
# Obtaining the network object from simulated data:
net_object <- attributes(datnet)$netind_cl
# Max number of friends:
net_object$Kmax
# Network matrix
head(attributes(datnet)$netind_cl$NetInd)
#--------------------------------------------------------------------------------------------------
# EXAMPLE 2. USING CUSTOM NETWORK GENERATING FUNCTION
#--------------------------------------------------------------------------------------------------
#--------------------------------------------------------------------------------------------------
# Example of a user-defined network sampler(s) function
# Arguments K, bslVar[i] (W1) & nF are evaluated in the environment of the simulated data then
# passed to genNET() function
# - K: maximum number of friends for any unit
# - bslVar[i]: used for contructing weights for the probability of selecting i as
# someone else's friend (weighted sampling), when missing the sampling goes to uniform
# - nF[i]: total number of friends that need to be sampled for observation i
#--------------------------------------------------------------------------------------------------
genNET <- function(n, K, bslVar, nF, ...) {
prob_F <- plogis(-4.5 + 2.5*c(1:K)/2) / sum(plogis(-4.5 + 2.5*c(1:K)/2))
NetInd_k <- matrix(NA_integer_, nrow = n, ncol = K)
nFriendTot <- rep(0L, n)
for (index in (1:n)) {
FriendSampSet <- setdiff(c(1:n), index)
nFriendSamp <- max(nF[index] - nFriendTot[index], 0L)
if (nFriendSamp > 0) {
if (length(FriendSampSet) == 1) {
friends_i <- FriendSampSet
} else {
friends_i <- sort(sample(FriendSampSet, size = nFriendSamp,
prob = prob_F[bslVar[FriendSampSet] + 1]))
}
NetInd_k[index, ] <- c(as.integer(friends_i),
rep_len(NA_integer_, K - length(friends_i)))
nFriendTot[index] <- nFriendTot[index] + nFriendSamp
}
}
return(NetInd_k)
}
D <- DAG.empty()
D <- D +
# W1 - categorical or continuous confounder (5 categories, 0-4):
node("W1", distr = "rcat.b0",
probs = c(0.0494, 0.1823, 0.2806, 0.2680, 0.1651, 0.0546)) +
# W2 - binary infection status at t=0, positively correlated with W1:
node("W2", distr = "rbern", prob = plogis(-0.2 + W1/3)) +
# W3 - binary confounder:
node("W3", distr = "rbern", prob = 0.6)
# def.nF: total number of friends for each i (0-K), each def.nF[i] is influenced by categorical W1
K <- 10
set.seed(12345)
normprob <- function(x) x / sum(x)
p_nF_W1_mat <- apply(matrix(runif((K+1)*6), ncol = 6, nrow = (K+1)), 2, normprob)
colnames(p_nF_W1_mat) <- paste0("p_nF_W1_", c(0:5))
create_probs_nF <- function(W1) t(p_nF_W1_mat[,W1+1])
vecfun.add("create_probs_nF")
D <- D + node("def.nF", distr = "rcat.b0", probs = create_probs_nF(W1))
# Adding the network generator that depends on nF and categorical W1:
D <- D + network(name="net.custom", netfun = "genNET", K = K, bslVar = W1, nF = def.nF)
# Define A[i] is a function W1[i] as well as the total sum of i's friends values for W1, W2 and W3:
D <- D + node("A", distr = "rbern",
prob = plogis(2 + -0.5 * W1 +
-0.1 * sum(W1[[1:Kmax]]) +
-0.4 * sum(W2[[1:Kmax]]) +
-0.7 * sum(W3[[1:Kmax]])),
replaceNAw0 = TRUE)
# Y[i] is a the total N of i's friends who are infected AND untreated
# + a function of friends W3 values
D <- D + node("pYRisk", distr = "rconst",
const = plogis(-1 + 2 * sum(W2[[1:Kmax]] * (1 - A[[1:Kmax]])) +
-1.5 * sum(W3[[1:Kmax]])),
replaceNAw0 = TRUE)
D <- D + node("Y", distr = "rbern", prob = pYRisk)
Dset <- set.DAG(D, n.test = 100)
# Simulating data from the above sem:
datnet <- sim(Dset, n = 1000, rndseed = 543)
head(datnet, 10)
# Obtaining the network object from simulated data:
net_object <- attributes(datnet)$netind_cl
# Max number of friends:
net_object$Kmax
# Network matrix
head(attributes(datnet)$netind_cl$NetInd)
plotDAG(Dset)
Create Node Object(s)
Description
Define a single DAG node and its distribution or define many repeated-measure/time-varying nodes by using argument t
.
The node distribution is allowed to vary as a function of time (t
). Conditionaing on past nodes is accomplished by using the syntactic sugar, such as, NodeName[t]
.
After all the nodes have been added to the DAG, call set.DAG
, a DAG object constructor, and add.action
, an action (intervention) constructor.
Usage
node(name, t, distr, EFU, order, ..., params = list(), asis.params = list())
Arguments
name |
Character node name or a vector of names when specifying a multivariate node. For time-dependent nodes the names will be automatically expanded to a scheme "name_t" for each t provided specified. |
t |
Node time-point(s). Allows specification of several time-points when t is a vector of positive integers, in which case the output will consist of a named list of length(t) nodes, corresponding to each value in t. |
distr |
Character name of the node distribution, can be a standard distribution R function, s.a. rnorm, rbinom, runif or user defined. The function must accept a named argument "n" to specify the total sample size. Distributional parameters (arguments) must be passed as either named arguments to node or as a named list of parameters "params". |
EFU |
End-of-Follow Up flag for designating a survival/censoring type node, only applies to Bernoulli nodes. When |
order |
An optional integer parameter specifying the order in which these nodes will be sampled. The value of order has to start at 1 and be unique for each new node,
can be specified as a range / vector and has to be of the same length as the argument |
... |
Named arguments specifying distribution parameters that are accepted by the |
params |
A list of additional named parameters to be passed on to the |
asis.params |
(ADVANCED USE) A list of additional named distributional parameters that will be evaluated "as is",
inside the currently simulated data.frame + the calling environment, without any modifications to the R expression strings inside the |
Value
A list containing node object(s) (expanded to several nodes if t is an integer vector of length > 1)
Details
The combination of name
and t
must uniquely identify each node in the DAG. Use argument t
to identify measurements of the same attribute (e.g. 'A1c') at various time points.
The collection of all unique t
values, across all nodes, should consist of non-negative integers (i.e., starting at 0).
The optional order
argument can be specified, used for determining the sampling order of each node.
When order
not specified, it is automatically inferred based on the actual order in which the nodes were added to the DAG (earlier added nodes get lower order
value and are sampled first)
All node calls that share the same generic name name
must also share the same EFU
value (if any is specified in at least one of them).
A value of TRUE
for the EFU
indicates that if a simulated value for a measurement of the attribute represented by node is 1
then all the following nodes with that measurement (in terms of higher t
values) in the DAG will be unobserved (i.e., their simulated value will be set to NA).
Node formulas (parameters of the distribution)
Each user-supplied argument to the node function is an evaluable R expression, their evaluation is delayed until the actual simulation time.
These arguments can refer to standard or user-specified R functions that must only apply to the values of parent nodes,
i.e. a subset of the node(s) with an order
value strictly lower than that of the node characterized by the formula.
Formulas must reference the parent nodes with unique name
identifiers, employing the square bracket vector subsetting name[t]
for referencing a
parent node at a particular time point t
(if any time-points were specified).
The square bracket notation is used to index a generic name with the relevant time point as illustrated in the examples.
When an input node is used to define several nodes (i.e., several measurement of the same attribute, t=0:5
), the formula(s) specified in that node can apply
to each node indexed by a given time point denoted by t
. This generic expression t
can then be referenced within a formula to simultaneously identify a
different set of parent nodes for each time point as illustrated below. Note that the parents of each node represented by a given node
object are implicitly defined
by the nodes referenced in formulas of that node
call.
Different types of evaluation for node function arguments
There is quite a bit of flexibility in the way in which the node
function arguments can be evaluated.
By default, the named arguments specified as expressions are first captured by delayed-evaluation and then
modified by simcausal
to enable the special types of functional syntax.
For example, simcausal will over-ride the subsetting operators '[...]
' (for time varying nodes) and '[[...]]' (for networks), implying
that these operators can no longer be used in their typical R
way.
Furthermore, simcausal will over-ride the standard 'c' function, with its own definition. Similarly, it will over-ride any calls to sum
and mean
functions
with their row-matrix counterpart functions rowSums
and rowMeans
.
When programming with simcausal
(such as passing node arguments inside a function, prior to defining the node), it may be helpful to instead pass
such node arguments as character strings, rather than as R expressions. In this case one should use the argument params
by adopting the following syntax node(...,params = list(mean="A+B"))
, which in this case is equivalent to: node(..., mean = A+B)
.
There are also instances when it might be desirable to retain the original behavior of all R
expressions and functions and evaluate a particular node argument "as is".
For example, the user may wish to retain the
original R
functionality of all its operators, including those of [...]
and [[...]]
.
In this case the node argument (or a specific part of the node argument) should be wrapped in .()
or eval()
.
Note that once the expression has been wrapped with .(...)
(or eval(...)
), the simcausal
definitions of operators [...]
and [[...]]
no
longer apply to these expressions and no error checking for "correctness" of these node arguments will be performed.
The forced-evaluation operator .()
can be also used as part of an expression,
which will prevent the typical simcausal
evaluation on only that specific part of the expression. Example 8 below demonstrates the following use case
for the expression .(coefAi[t]) * A[t-1]
,
which will look for vector coefAi
and then subset it by current value of t
(and return a scalar),
while A[t-1]
will evaluate to the entire column vector of variable A
for time point t-1
.
Such an expression will multiply the entire time-varying vector A[t-1]
by scalar value determined
by current value of t
and the previously defined vector coefAi
.
Furthermore, even when a vector or a matrix is wrapped in .(...) it still will be automatically re-parsed into K column matrix with n rows.
When this is not desired, for example, when defining a multivariate node distribution, the user may pass such vector or matrix node arguments as a character string
in a list argument asis.params
. See Example 7 and 8 below for additional details.
Multivariate random variables (multivariate nodes)
Starting from v.0.5, a single node
call can be used for defining a multivariate (and possibly correlated) random vector.
To define a random vector that has more than 1 dimension, use the argument name
to specify a vector with names for each dimension, e.g.,
node(c("X1","X2"), distr = "mvtnorm::rmvnorm", asis.params = list(mean = "c(0,1)", sigma = "matrix(c(1,0.75,0.75,1), ncol=2)"))
will define a bi-variate (correlated) normally distributed node,
the simulated data set will contain this bi-variately distributed random variable in columns "X1" and "X2".
Note that the multivariate sampling distribution function (such as function rmvnorm
from the package mvtnorm
) must return a matrix of
n
rows (number of observations) and length(name)
columns (dimensionality). See additional examples below.
Note that one can also define time-varying multivariate nodes, e.g.,
node(c("X1","X2"), t=0:5, distr = "mvtnorm::rmvnorm", asis.params = list(mean = "c(0,1)"))
.
References
Sofrygin O, van der Laan MJ, Neugebauer R (2017). "simcausal R Package: Conducting Transparent and Reproducible Simulation Studies of Causal Effect Estimation with Complex Longitudinal Data." Journal of Statistical Software, 81(2), 1-47. doi: 10.18637/jss.v081.i02.
Examples
#---------------------------------------------------------------------------------------
# EXAMPLE 1A: Define some Bernoulli nodes, survival outcome Y and put it together in a
# DAG object
#---------------------------------------------------------------------------------------
W1node <- node(name = "W1", distr = "rbern",
prob = plogis(-0.5), order = 1)
W2node <- node(name = "W2", distr = "rbern",
prob = plogis(-0.5 + 0.5 * W1), order = 2)
Anode <- node(name = "A", distr = "rbern",
prob = plogis(-0.5 - 0.3 * W1 - 0.3 * W2), order = 3)
Ynode <- node(name = "Y", distr = "rbern",
prob = plogis(-0.1 + 1.2 * A + 0.3 * W1 + 0.3 * W2), order = 4)
D1Aset <- set.DAG(c(W1node,W2node,Anode,Ynode))
#---------------------------------------------------------------------------------------
# EXAMPLE 1B: Same as 1A using +node interface and no order argument
#---------------------------------------------------------------------------------------
D1B <- DAG.empty()
D1B <- D1B +
node(name = "W1", distr = "rbern", prob = plogis(-0.5)) +
node(name = "W2", distr = "rbern", prob = plogis(-0.5 + 0.5 * W1)) +
node(name = "A", distr = "rbern", prob = plogis(-0.5 - 0.3 * W1 - 0.3 * W2)) +
node(name = "Y", distr = "rbern", prob = plogis(-0.1 + 1.2 * A + 0.3 * W1 + 0.3 * W2))
D1Bset <- set.DAG(D1B)
#---------------------------------------------------------------------------------------
# EXAMPLE 1C: Same as 1A and 1B using add.nodes interface and no order argument
#---------------------------------------------------------------------------------------
D1C <- DAG.empty()
D1C <- add.nodes(D1C, node(name = "W1", distr = "rbern", prob = plogis(-0.5)))
D1C <- add.nodes(D1C, node(name = "W2", distr = "rbern",
prob = plogis(-0.5 + 0.5 * W1)))
D1C <- add.nodes(D1C, node(name = "A", distr = "rbern",
prob = plogis(-0.5 - 0.3 * W1 - 0.3 * W2)))
D1C <- add.nodes(D1C, node(name = "Y", distr = "rbern",
prob = plogis(-0.1 + 1.2 * A + 0.3 * W1 + 0.3 * W2)))
D1C <- set.DAG(D1C)
#---------------------------------------------------------------------------------------
# EXAMPLE 1D: Add a uniformly distributed node and redefine outcome Y as categorical
#---------------------------------------------------------------------------------------
D_unif <- DAG.empty()
D_unif <- D_unif +
node("W1", distr = "rbern", prob = plogis(-0.5)) +
node("W2", distr = "rbern", prob = plogis(-0.5 + 0.5 * W1)) +
node("W3", distr = "runif", min = plogis(-0.5 + 0.7 * W1 + 0.3 * W2), max = 10) +
node("An", distr = "rbern", prob = plogis(-0.5 - 0.3 * W1 - 0.3 * W2 - 0.2 * sin(W3)))
# Categorical syntax 1 (probabilities as values):
D_cat_1 <- D_unif + node("Y", distr = "rcat.b1", probs = {0.3; 0.4})
D_cat_1 <- set.DAG(D_cat_1)
# Categorical syntax 2 (probabilities as formulas):
D_cat_2 <- D_unif +
node("Y", distr = "rcat.b1",
probs={plogis(-0.1 + 1.2 * An + 0.3 * W1 + 0.3 * W2 + 0.2 * cos(W3));
plogis(-0.5 + 0.7 * W1)})
D_cat_2 <- set.DAG(D_cat_2)
#---------------------------------------------------------------------------------------
# EXAMPLE 2A: Define Bernoulli nodes using R rbinom() function, defining prob argument
# for L2 as a function of node L1
#---------------------------------------------------------------------------------------
D <- DAG.empty()
D <- D +
node("L1", t = 0, distr = "rbinom", prob = 0.05, size = 1) +
node("L2", t = 0, distr = "rbinom", prob = ifelse(L1[0] == 1, 0.5, 0.1), size = 1)
Dset <- set.DAG(D)
#---------------------------------------------------------------------------------------
# EXAMPLE 2B: Equivalent to 2A, passing argument size to rbinom inside a named list
# params
#---------------------------------------------------------------------------------------
D <- DAG.empty()
D <- D +
node("L1", t = 0, distr = "rbinom", prob = 0.05, params = list(size = 1)) +
node("L2", t = 0, distr = "rbinom",
prob = ifelse(L1[0] == 1,0.5,0.1), params = list(size = 1))
Dset <- set.DAG(D)
#---------------------------------------------------------------------------------------
# EXAMPLE 2C: Equivalent to 2A and 2B, define Bernoulli nodes using a wrapper "rbern"
#---------------------------------------------------------------------------------------
D <- DAG.empty()
D <- D +
node("L1", t = 0, distr = "rbern", prob = 0.05) +
node("L2", t = 0, distr = "rbern", prob = ifelse(L1[0] == 1, 0.5, 0.1))
Dset <- set.DAG(D)
#---------------------------------------------------------------------------------------
# EXAMPLE 3: Define node with normal distribution using rnorm() R function
#---------------------------------------------------------------------------------------
D <- DAG.empty()
D <- D + node("L2", t = 0, distr = "rnorm", mean = 10, sd = 5)
Dset <- set.DAG(D)
#---------------------------------------------------------------------------------------
# EXAMPLE 4: Define 34 Bernoulli nodes, or 2 Bernoulli nodes over 17 time points,
#---------------------------------------------------------------------------------------
t_end <- 16
D <- DAG.empty()
D <- D +
node("L2", t = 0:t_end, distr = "rbinom",
prob = ifelse(t == t_end, 0.5, 0.1), size = 1) +
node("L1", t = 0:t_end, distr = "rbinom",
prob = ifelse(L2[0] == 1, 0.5, 0.1), size = 1)
Dset <- set.DAG(D)
sim(Dset, n=10)
#---------------------------------------------------------------------------------------
# EXAMPLE 5: Defining new distribution function 'rbern', defining and passing a custom
# vectorized node function 'customfun'
#---------------------------------------------------------------------------------------
rbern <- function(n, prob) { # defining a bernoulli wrapper based on R rbinom function
rbinom(n = n, prob = prob, size = 1)
}
customfun <- function(arg, lambda) {
res <- ifelse(arg == 1, lambda, 0.1)
res
}
D <- DAG.empty()
D <- D +
node("W1", distr = "rbern", prob = 0.05) +
node("W2", distr = "rbern", prob = customfun(W1, 0.5)) +
node("W3", distr = "rbern", prob = ifelse(W1 == 1, 0.5, 0.1))
D1d <- set.DAG(D, vecfun = c("customfun"))
sim(D1d, n = 10, rndseed = 1)
#---------------------------------------------------------------------------------------
# EXAMPLE 6: Defining latent variables I and U.Y (will be hidden from simulated data)
#---------------------------------------------------------------------------------------
D <- DAG.empty()
D <- D +
node("I",
distr = "rcat.b1",
probs = c(0.1, 0.2, 0.2, 0.2, 0.1, 0.1, 0.1)) +
node("W1",
distr = "rnorm",
mean = ifelse(I == 1, 0, ifelse(I == 2, 3, 10)) + 0.6 * I,
sd = 1) +
node("W2",
distr = "runif",
min = 0.025*I, max = 0.7*I) +
node("W3",
distr = "rbern",
prob = plogis(-0.5 + 0.7*W1 + 0.3*W2 - 0.2*I)) +
node("A",
distr = "rbern",
prob = plogis(+4.2 - 0.5*W1 + 0.2*W2/2 + 0.2*W3)) +
node("U.Y", distr = "rnorm", mean = 0, sd = 1) +
node("Y",
distr = "rconst",
const = -0.5 + 1.2*A + 0.1*W1 + 0.3*W2 + 0.2*W3 + 0.2*I + U.Y)
Dset1 <- set.DAG(D, latent.v = c("I", "U.Y"))
sim(Dset1, n = 10, rndseed = 1)
#---------------------------------------------------------------------------------------
# EXAMPLE 7: Multivariate random variables
#---------------------------------------------------------------------------------------
if (requireNamespace("mvtnorm", quietly = TRUE)) {
D <- DAG.empty()
# 2 dimensional normal (uncorrelated), using rmvnorm function from rmvnorm package:
D <- D +
node(c("X1","X2"), distr = "mvtnorm::rmvnorm",
asis.params = list(mean = "c(0,1)")) +
# Can define a degenerate (rconst) multivariate node:
node(c("X1.copy", "X2.copy"), distr = "rconst", const = c(X1, X2))
Dset1 <- set.DAG(D, verbose = TRUE)
sim(Dset1, n = 10)
}
# On the other hand this syntax wont work,
# since simcausal will parse c(0,1) into a two column matrix:
## Not run:
D <- DAG.empty()
D <- D + node(c("X1","X2"), distr = "mvtnorm::rmvnorm", mean = c(0,1))
Dset1 <- set.DAG(D, verbose = TRUE)
## End(Not run)
if (requireNamespace("mvtnorm", quietly = TRUE)) {
D <- DAG.empty()
# Bivariate normal (correlation coef 0.75):
D <- D +
node(c("X1","X2"), distr = "mvtnorm::rmvnorm",
asis.params = list(mean = "c(0,1)",
sigma = "matrix(c(1,0.75,0.75,1), ncol=2)")) +
# Can use any component of such multivariate nodes when defining new nodes:
node("A", distr = "rconst", const = 1 - X1)
Dset2 <- set.DAG(D, verbose = TRUE)
sim(Dset2, n = 10)
}
# Time-varying multivar node (3 time-points, 2 dimensional normal)
# plus changing the mean over time, as as function of t:
if (requireNamespace("mvtnorm", quietly = TRUE)) {
D <- DAG.empty()
D <- D +
node(c("X1", "X2"), t = 0:2, distr = "mvtnorm::rmvnorm",
asis.params = list(
mean = "c(0,1) + t",
sigma = "matrix(rep(0.75,4), ncol=2)"))
Dset5b <- set.DAG(D)
sim(Dset5b, n = 10)
}
# Two ways to define the same bivariate uniform copula:
if (requireNamespace("copula", quietly = TRUE)) {
D <- DAG.empty()
D <- D +
# with a warning since normalCopula() returns an object unknown to simcausal:
node(c("X1","X2"), distr = "copula::rCopula",
copula = eval(copula::normalCopula(0.75, dim = 2))) +
# same, as above:
node(c("X3","X4"), distr = "copula::rCopula",
asis.params = list(copula = "copula::normalCopula(0.75, dim = 2)"))
vecfun.add("qbinom")
# Bivariate binomial derived from previous copula, with same correlation:
D <- D +
node(c("A.Bin1", "A.Bin2"), distr = "rconst",
const = c(qbinom(X1, 10, 0.5), qbinom(X2, 15, 0.7)))
Dset3 <- set.DAG(D)
sim(Dset3, n = 10)
}
# Same as "A.Bin1" and "A.Bin2", but directly using rmvbin function in bindata package:
if (requireNamespace("bindata", quietly = TRUE)) {
D <- DAG.empty()
D <- D +
node(c("B.Bin1","B.Bin2"), distr = "bindata::rmvbin",
asis.params = list(
margprob = "c(0.5, 0.5)",
bincorr = "matrix(c(1,0.75,0.75,1), ncol=2)"))
Dset4b <- set.DAG(D)
sim(Dset4b, n = 10)
}
#---------------------------------------------------------------------------------------
# EXAMPLE 8: Combining simcausal non-standard evaluation with eval() forced evaluation
#---------------------------------------------------------------------------------------
coefAi <- 1:10
D <- DAG.empty()
D <- D +
node("A", t = 1, distr = "rbern", prob = 0.7) +
node("A", t = 2:10, distr = "rconst", const = eval(coefAi[t]) * A[t-1])
Dset8 <- set.DAG(D)
sim(Dset8, n = 10)
#---------------------------------------------------------------------------------------
# TWO equivalent ways of creating a multivariate node (combining nodes W1 and W2):
#---------------------------------------------------------------------------------------
D <- DAG.empty()
D <- D + node("W1", distr = "rbern", prob = 0.45)
D <- D + node("W2", distr = "rconst", const = 1)
# option 1:
D <- D + node(c("W1.copy1", "W2.copy1"), distr = "rconst", const = c(W1, W2))
# equivalent option 2:
create_mat <- function(W1, W2) cbind(W1, W2)
vecfun.add("create_mat")
D <- D + node(c("W1.copy2", "W2.copy2"), distr = "rconst", const = create_mat(W1, W2))
Dset <- set.DAG(D)
sim(Dset, n=10, rndseed=1)
Show Node Parents Given DAG Object
Description
Given a vector of node names, this function provides the name(s) of node parents that were obtained by parsing the node formulas.
Usage
parents(DAG, nodesChr)
Arguments
DAG |
A DAG object that was specified by calling |
nodesChr |
A vector of node names that are already defined in DAG |
Value
A list with parent names for each node name in nodesChr
Examples
D <- DAG.empty()
D <- D + node(name="W1", distr="rbern", prob=plogis(-0.5))
D <- D + node(name="W2", distr="rbern", prob=plogis(-0.5 + 0.5*W1))
D <- D + node(name="A", distr="rbern", prob=plogis(-0.5 - 0.3*W1 - 0.3*W2))
D <- D + node(name="Y", distr="rbern", prob=plogis(-0.1 + 1.2*A + 0.3*W1 + 0.3*W2), EFU=TRUE)
D <- set.DAG(D)
parents(D, c("W2", "A", "Y"))
Plot DAG
Description
Plot DAG object using functions from igraph
package.
The default setting is to keep the regular (observed) DAG nodes with shape
set to "none", which can be over-ridden by the user.
For latent (hidden) DAG nodes the default is to:
1) set the node color as grey;
2) enclose the node by a circle; and
3) all directed edges coming out of the latent node are plotted as dashed.
Usage
plotDAG(
DAG,
tmax = NULL,
xjitter,
yjitter,
node.action.color,
vertex_attrs = list(),
edge_attrs = list(),
excludeattrs,
customvlabs,
verbose = getOption("simcausal.verbose")
)
Arguments
DAG |
A DAG object that was specified by calling |
tmax |
Maximum time-point to plot for time-varying DAG objects |
xjitter |
Amount of random jitter for node x-axis plotting coordinates |
yjitter |
Amount of random jitter for node y-axis plotting coordinates |
node.action.color |
Color of the action node labels (only for action DAG of class DAG.action). If missing, defaults to red. |
vertex_attrs |
A named list of |
edge_attrs |
A named list of |
excludeattrs |
A character vector for DAG nodes that should be excluded from the plot |
customvlabs |
A named vector of custom DAG node labels (replaces node names from the DAG object). |
verbose |
Set to |
References
Sofrygin O, van der Laan MJ, Neugebauer R (2017). "simcausal R Package: Conducting Transparent and Reproducible Simulation Studies of Causal Effect Estimation with Complex Longitudinal Data." Journal of Statistical Software, 81(2), 1-47. doi: 10.18637/jss.v081.i02.
(EXPERIMENTAL) Plot Discrete Survival Function(s)
Description
Plot discrete survival curves from a list of discrete survival probabilities by calling plot
with type='b'
.
Usage
plotSurvEst(
surv = list(),
xindx = NULL,
ylab = "",
xlab = "t",
ylim = c(0, 1),
legend.xyloc = "topright",
...
)
Arguments
surv |
A list of vectors, each containing action-specific discrete survival probabilities over time. |
xindx |
A vector of indices for subsetting the survival vectors in |
ylab |
An optional title for y axis, passed to |
xlab |
An optional title for x axis, passed to |
ylim |
Optional y limits for the plot, passed to |
legend.xyloc |
Optional x and y co-ordinates to be used to position the legend.
Can be specified by keyword or as a named list with (x,y), uses the same convention as in |
... |
Additional arguments passed to |
Print DAG Object
Description
Print DAG Object
Usage
## S3 method for class 'DAG'
print(x, ...)
Arguments
x |
A DAG object. |
... |
Other arguments to generic print. |
Print Action Object
Description
Print Action Object
Usage
## S3 method for class 'DAG.action'
print(x, ...)
Arguments
x |
An object. |
... |
Other arguments to generic print. |
Print DAG.node Object
Description
Print DAG.node Object
Usage
## S3 method for class 'DAG.node'
print(x, ...)
Arguments
x |
A Node object. |
... |
Other arguments to generic print. |
Random Sample from Bernoulli Distribution
Description
Wrapper for Bernoulli node distribution.
Usage
rbern(n, prob)
Arguments
n |
Sample size. |
prob |
A vector of success probabilities. |
Value
Binary vector of length n
.
Examples
#---------------------------------------------------------------------------------------
# Specifying and simulating from a DAG with 3 Bernoulli nodes
#---------------------------------------------------------------------------------------
D <- DAG.empty()
D <- D + node("W1", distr="rbern", prob=0.05)
D <- D + node("W2", distr="rbern", prob=ifelse(W1==1,0.5,0.1))
D <- D + node("W3", distr="rbern", prob=ifelse(W1==1,0.5,0.1))
Dset <- set.DAG(D)
simdat <- sim(Dset, n=200, rndseed=1)
Random Sample for a Categorical Factor
Description
Matrix version of the categorical distribution. The argument probs
can be a matrix of n rows,
specifying individual (varying in sample) categorical probabilities.
The number of categories generated is equal to ncol(probs)+1
, the levels labeled as: 1,...,ncol(probs)+1
.
Usage
rcat.factor(n, probs)
rcategor(n, probs)
Arguments
n |
Sample size. |
probs |
Either a vector or a matrix of success probabilities.
When |
Value
A factor of length n
with levels: 1,2, ...,ncol(probs)+1
.
Functions
-
rcategor()
: (Deperecated) Random Sample of a Categorical Factor
See Also
Examples
#---------------------------------------------------------------------------------------
# Specifying and simulating from a DAG with one categorical node with constant
# probabilities
#---------------------------------------------------------------------------------------
D <- DAG.empty()
D <- D + node("race",t=0,distr="rcat.factor",probs=c(0.2,0.1,0.4,0.15,0.05,0.1))
Dset <- set.DAG(D)
simdat <- sim(Dset, n=200, rndseed=1)
#---------------------------------------------------------------------------------------
# Specifying and simulating from a DAG with a categorical node with varying
# probabilities (probabilities are determined by values sampled for nodes L0 and L1)
#---------------------------------------------------------------------------------------
D <- DAG.empty()
D <- D + node("L0", distr="rnorm", mean=10, sd=5)
D <- D + node("L1", distr="rnorm", mean=10, sd=5)
D <- D + node("L2", distr="rcat.factor", probs=c(abs(1/L0), abs(1/L1)))
Dset <- set.DAG(D)
simdat <- sim(Dset, n=200, rndseed=1)
Random Sample from Base 1 (rcat.b1) or Base 0 (rcat.b0) Categorical (Integer) Distribution
Description
Same as , but returning a vector of sampled integers with range 1, 2, ..., ncol(probs)+1
for rcat.b1
or range 0, 1, ..., ncol(probs)
for rcat.b0
. For sampling categorical factors see rcat.factor.
Usage
rcategor.int(n, probs)
rcat.b1(n, probs)
rcat.b0(n, probs)
Arguments
n |
Sample size. |
probs |
Either a vector or a matrix of success probabilities.
When probs is a vector, |
Value
An integer vector of length n
with range either in 0,...,ncol(probs)
or in 1,...,ncol(probs)+1
.
Functions
-
rcategor.int()
: (Deperecated) Random Sample from Base 1 Categorical (Integer) Distribution -
rcat.b1()
: Random Sample from Base 1 Categorical (Integer) Distribution -
rcat.b0()
: Random Sample from Base 0 Categorical (Integer) Distribution
See Also
Constant (Degenerate) Distribution (Returns its Own Argument const
)
Description
Wrapper for constant value (degenerate) distribution.
Usage
rconst(n, const)
Arguments
n |
Sample size. |
const |
Either a vector with one constant value (replicated |
Value
A vector of constants of length n
.
Examples
#---------------------------------------------------------------------------------------
# Specifying and simulating from a DAG with 1 Bernoulli and 2 constant nodes
#---------------------------------------------------------------------------------------
D <- DAG.empty()
D <- D + node("W1", distr = "rbern", prob = 0.45)
D <- D + node("W2", distr = "rconst", const = 1)
D <- D + node("W3", distr = "rconst", const = ifelse(W1 == 1, 5, 10))
# TWO equivalent ways of creating a multivariate node (just repeating W1 and W2):
create_mat <- function(W1, W2) cbind(W1, W2)
vecfun.add("create_mat")
D <- D + node(c("W1.copy1", "W2.copy1"), distr = "rconst", const = c(W1, W2))
D <- D + node(c("W1.copy2", "W2.copy2"), distr = "rconst", const = create_mat(W1, W2))
Dset <- set.DAG(D)
sim(Dset, n=10, rndseed=1)
Template for Writing Custom Distribution Functions
Description
Template function for writing SimCausal
custom distribution wrappers.
Usage
rdistr.template(n, arg1, arg2, ...)
Arguments
n |
Sample size that needs to be generated |
arg1 |
Argument 2 |
arg2 |
Argument 1 |
... |
Additional optional parameters |
Details
One of the named arguments must be 'n', this argument is passed on to the function automatically by
the package and is assigned to the number of samples that needs to be generated from this distribution.
Other arguments (in this example arg1 and arg2) must be declared by the user as
arguments inside the node() function that uses this distribution,
e.g., node("Node1"
, distr="distr.template"
, arg1 = ...
, arg2 = ...)
.
Both, arg1 and arg2, can be either numeric constants or formulas involving past node names.
The constants get passed on to the distribution function unchanged.
The formulas are evaluated inside the environment of the simulated data and are passed on to the
distribution functions as vectors.
The output of the distribution function is expected to be a vector of length n of the sampled covariates.
Value
A vector of length n
Call igraph::sample_smallworld
to Generate Random Graph Object from the Watts-Strogatz Small-World Model
Description
Call igraph::sample_smallworld
and convert the output to simcausal
network matrix.
The parameters are the same as those of igraph::sample_smallworld
.
The loop edges aren't allowed (loops = FALSE
) and the multiple edges aren't allowed either multiple = FALSE
.
Usage
rnet.SmWorld(n, dim, nei, p)
Arguments
n |
Size of the network graph (the number of nodes). |
dim |
Same as in |
nei |
Same as in |
p |
Same as in |
Value
A matrix with n rows, each row lists the indices of friends connected to that particular observation.
See Also
Call igraph::sample_gnm
to Generate Random Graph Object According to the G(n,m) Erdos-Renyi Model
Description
Call igraph::sample_gnm
and convert the output to simcausal
network matrix.
The parameter m
of igraph::sample_gnm
is derived from n
and m_pn
as as.integer(m_pn*n)
Usage
rnet.gnm(n, m_pn)
Arguments
n |
Size of the network graph (number of nodes). |
m_pn |
The total number of edges as a fraction of the sample size |
Value
A matrix with n rows, each row lists the indices of friends connected to that particular observation.
See Also
Call igraph::sample_gnp
to Generate Random Graph Object According to the G(n,p) Erdos-Renyi Model
Description
Call igraph::sample_gnp
and convert the output to simcausal
network matrix.
Usage
rnet.gnp(n, p)
Arguments
n |
Size of the network graph (number of nodes). |
p |
Same as |
Value
A matrix with n rows, each row lists the indices of friends connected to that particular observation.
See Also
Create and Lock DAG Object
Description
Check current DAG (created with node
) for errors and consistency of its node distributions, set the observed data generating distribution.
Attempts to simulates several observations to catch any errors in DAG definition. New nodes cannot be added after function set.DAG has been applied.
Usage
set.DAG(
DAG,
vecfun,
latent.v,
n.test = 10,
verbose = getOption("simcausal.verbose")
)
Arguments
DAG |
Named list of node objects that together will form a DAG.
Temporal ordering of nodes is either determined by the order in which the nodes were added to the DAG (using |
vecfun |
A character vector with names of the vectorized user-defined node formula functions. See examples and the vignette for more information. |
latent.v |
The names of the unobserved (latent) DAG node names. These variables will be hidden from the observed simulated data and will be marked differently on the DAG plot. |
n.test |
Non-negative simulation sample size used for testing the consistency of the |
verbose |
Set to |
Value
A DAG (S3) object, which is a list consisting of node object(s) sorted by their temporal order.
References
Sofrygin O, van der Laan MJ, Neugebauer R (2017). "simcausal R Package: Conducting Transparent and Reproducible Simulation Studies of Causal Effect Estimation with Complex Longitudinal Data." Journal of Statistical Software, 81(2), 1-47. doi: 10.18637/jss.v081.i02.
Examples
#---------------------------------------------------------------------------------------
# EXAMPLE 1A: Define some Bernoulli nodes, survival outcome Y and put it together in a
# DAG object
#---------------------------------------------------------------------------------------
W1node <- node(name = "W1", distr = "rbern",
prob = plogis(-0.5), order = 1)
W2node <- node(name = "W2", distr = "rbern",
prob = plogis(-0.5 + 0.5 * W1), order = 2)
Anode <- node(name = "A", distr = "rbern",
prob = plogis(-0.5 - 0.3 * W1 - 0.3 * W2), order = 3)
Ynode <- node(name = "Y", distr = "rbern",
prob = plogis(-0.1 + 1.2 * A + 0.3 * W1 + 0.3 * W2), order = 4)
D1Aset <- set.DAG(c(W1node,W2node,Anode,Ynode))
#---------------------------------------------------------------------------------------
# EXAMPLE 1B: Same as 1A using +node interface and no order argument
#---------------------------------------------------------------------------------------
D1B <- DAG.empty()
D1B <- D1B +
node(name = "W1", distr = "rbern", prob = plogis(-0.5)) +
node(name = "W2", distr = "rbern", prob = plogis(-0.5 + 0.5 * W1)) +
node(name = "A", distr = "rbern", prob = plogis(-0.5 - 0.3 * W1 - 0.3 * W2)) +
node(name = "Y", distr = "rbern", prob = plogis(-0.1 + 1.2 * A + 0.3 * W1 + 0.3 * W2))
D1Bset <- set.DAG(D1B)
#---------------------------------------------------------------------------------------
# EXAMPLE 1C: Same as 1A and 1B using add.nodes interface and no order argument
#---------------------------------------------------------------------------------------
D1C <- DAG.empty()
D1C <- add.nodes(D1C, node(name = "W1", distr = "rbern", prob = plogis(-0.5)))
D1C <- add.nodes(D1C, node(name = "W2", distr = "rbern",
prob = plogis(-0.5 + 0.5 * W1)))
D1C <- add.nodes(D1C, node(name = "A", distr = "rbern",
prob = plogis(-0.5 - 0.3 * W1 - 0.3 * W2)))
D1C <- add.nodes(D1C, node(name = "Y", distr = "rbern",
prob = plogis(-0.1 + 1.2 * A + 0.3 * W1 + 0.3 * W2)))
D1C <- set.DAG(D1C)
#---------------------------------------------------------------------------------------
# EXAMPLE 1D: Add a uniformly distributed node and redefine outcome Y as categorical
#---------------------------------------------------------------------------------------
D_unif <- DAG.empty()
D_unif <- D_unif +
node("W1", distr = "rbern", prob = plogis(-0.5)) +
node("W2", distr = "rbern", prob = plogis(-0.5 + 0.5 * W1)) +
node("W3", distr = "runif", min = plogis(-0.5 + 0.7 * W1 + 0.3 * W2), max = 10) +
node("An", distr = "rbern", prob = plogis(-0.5 - 0.3 * W1 - 0.3 * W2 - 0.2 * sin(W3)))
# Categorical syntax 1 (probabilities as values):
D_cat_1 <- D_unif + node("Y", distr = "rcat.b1", probs = {0.3; 0.4})
D_cat_1 <- set.DAG(D_cat_1)
# Categorical syntax 2 (probabilities as formulas):
D_cat_2 <- D_unif +
node("Y", distr = "rcat.b1",
probs={plogis(-0.1 + 1.2 * An + 0.3 * W1 + 0.3 * W2 + 0.2 * cos(W3));
plogis(-0.5 + 0.7 * W1)})
D_cat_2 <- set.DAG(D_cat_2)
#---------------------------------------------------------------------------------------
# EXAMPLE 2A: Define Bernoulli nodes using R rbinom() function, defining prob argument
# for L2 as a function of node L1
#---------------------------------------------------------------------------------------
D <- DAG.empty()
D <- D +
node("L1", t = 0, distr = "rbinom", prob = 0.05, size = 1) +
node("L2", t = 0, distr = "rbinom", prob = ifelse(L1[0] == 1, 0.5, 0.1), size = 1)
Dset <- set.DAG(D)
#---------------------------------------------------------------------------------------
# EXAMPLE 2B: Equivalent to 2A, passing argument size to rbinom inside a named list
# params
#---------------------------------------------------------------------------------------
D <- DAG.empty()
D <- D +
node("L1", t = 0, distr = "rbinom", prob = 0.05, params = list(size = 1)) +
node("L2", t = 0, distr = "rbinom",
prob = ifelse(L1[0] == 1,0.5,0.1), params = list(size = 1))
Dset <- set.DAG(D)
#---------------------------------------------------------------------------------------
# EXAMPLE 2C: Equivalent to 2A and 2B, define Bernoulli nodes using a wrapper "rbern"
#---------------------------------------------------------------------------------------
D <- DAG.empty()
D <- D +
node("L1", t = 0, distr = "rbern", prob = 0.05) +
node("L2", t = 0, distr = "rbern", prob = ifelse(L1[0] == 1, 0.5, 0.1))
Dset <- set.DAG(D)
#---------------------------------------------------------------------------------------
# EXAMPLE 3: Define node with normal distribution using rnorm() R function
#---------------------------------------------------------------------------------------
D <- DAG.empty()
D <- D + node("L2", t = 0, distr = "rnorm", mean = 10, sd = 5)
Dset <- set.DAG(D)
#---------------------------------------------------------------------------------------
# EXAMPLE 4: Define 34 Bernoulli nodes, or 2 Bernoulli nodes over 17 time points,
#---------------------------------------------------------------------------------------
t_end <- 16
D <- DAG.empty()
D <- D +
node("L2", t = 0:t_end, distr = "rbinom",
prob = ifelse(t == t_end, 0.5, 0.1), size = 1) +
node("L1", t = 0:t_end, distr = "rbinom",
prob = ifelse(L2[0] == 1, 0.5, 0.1), size = 1)
Dset <- set.DAG(D)
sim(Dset, n=10)
#---------------------------------------------------------------------------------------
# EXAMPLE 5: Defining new distribution function 'rbern', defining and passing a custom
# vectorized node function 'customfun'
#---------------------------------------------------------------------------------------
rbern <- function(n, prob) { # defining a bernoulli wrapper based on R rbinom function
rbinom(n = n, prob = prob, size = 1)
}
customfun <- function(arg, lambda) {
res <- ifelse(arg == 1, lambda, 0.1)
res
}
D <- DAG.empty()
D <- D +
node("W1", distr = "rbern", prob = 0.05) +
node("W2", distr = "rbern", prob = customfun(W1, 0.5)) +
node("W3", distr = "rbern", prob = ifelse(W1 == 1, 0.5, 0.1))
D1d <- set.DAG(D, vecfun = c("customfun"))
sim(D1d, n = 10, rndseed = 1)
#---------------------------------------------------------------------------------------
# EXAMPLE 6: Defining latent variables I and U.Y (will be hidden from simulated data)
#---------------------------------------------------------------------------------------
D <- DAG.empty()
D <- D +
node("I",
distr = "rcat.b1",
probs = c(0.1, 0.2, 0.2, 0.2, 0.1, 0.1, 0.1)) +
node("W1",
distr = "rnorm",
mean = ifelse(I == 1, 0, ifelse(I == 2, 3, 10)) + 0.6 * I,
sd = 1) +
node("W2",
distr = "runif",
min = 0.025*I, max = 0.7*I) +
node("W3",
distr = "rbern",
prob = plogis(-0.5 + 0.7*W1 + 0.3*W2 - 0.2*I)) +
node("A",
distr = "rbern",
prob = plogis(+4.2 - 0.5*W1 + 0.2*W2/2 + 0.2*W3)) +
node("U.Y", distr = "rnorm", mean = 0, sd = 1) +
node("Y",
distr = "rconst",
const = -0.5 + 1.2*A + 0.1*W1 + 0.3*W2 + 0.2*W3 + 0.2*I + U.Y)
Dset1 <- set.DAG(D, latent.v = c("I", "U.Y"))
sim(Dset1, n = 10, rndseed = 1)
#---------------------------------------------------------------------------------------
# EXAMPLE 7: Multivariate random variables
#---------------------------------------------------------------------------------------
if (requireNamespace("mvtnorm", quietly = TRUE)) {
D <- DAG.empty()
# 2 dimensional normal (uncorrelated), using rmvnorm function from rmvnorm package:
D <- D +
node(c("X1","X2"), distr = "mvtnorm::rmvnorm",
asis.params = list(mean = "c(0,1)")) +
# Can define a degenerate (rconst) multivariate node:
node(c("X1.copy", "X2.copy"), distr = "rconst", const = c(X1, X2))
Dset1 <- set.DAG(D, verbose = TRUE)
sim(Dset1, n = 10)
}
# On the other hand this syntax wont work,
# since simcausal will parse c(0,1) into a two column matrix:
## Not run:
D <- DAG.empty()
D <- D + node(c("X1","X2"), distr = "mvtnorm::rmvnorm", mean = c(0,1))
Dset1 <- set.DAG(D, verbose = TRUE)
## End(Not run)
if (requireNamespace("mvtnorm", quietly = TRUE)) {
D <- DAG.empty()
# Bivariate normal (correlation coef 0.75):
D <- D +
node(c("X1","X2"), distr = "mvtnorm::rmvnorm",
asis.params = list(mean = "c(0,1)",
sigma = "matrix(c(1,0.75,0.75,1), ncol=2)")) +
# Can use any component of such multivariate nodes when defining new nodes:
node("A", distr = "rconst", const = 1 - X1)
Dset2 <- set.DAG(D, verbose = TRUE)
sim(Dset2, n = 10)
}
# Time-varying multivar node (3 time-points, 2 dimensional normal)
# plus changing the mean over time, as as function of t:
if (requireNamespace("mvtnorm", quietly = TRUE)) {
D <- DAG.empty()
D <- D +
node(c("X1", "X2"), t = 0:2, distr = "mvtnorm::rmvnorm",
asis.params = list(
mean = "c(0,1) + t",
sigma = "matrix(rep(0.75,4), ncol=2)"))
Dset5b <- set.DAG(D)
sim(Dset5b, n = 10)
}
# Two ways to define the same bivariate uniform copula:
if (requireNamespace("copula", quietly = TRUE)) {
D <- DAG.empty()
D <- D +
# with a warning since normalCopula() returns an object unknown to simcausal:
node(c("X1","X2"), distr = "copula::rCopula",
copula = eval(copula::normalCopula(0.75, dim = 2))) +
# same, as above:
node(c("X3","X4"), distr = "copula::rCopula",
asis.params = list(copula = "copula::normalCopula(0.75, dim = 2)"))
vecfun.add("qbinom")
# Bivariate binomial derived from previous copula, with same correlation:
D <- D +
node(c("A.Bin1", "A.Bin2"), distr = "rconst",
const = c(qbinom(X1, 10, 0.5), qbinom(X2, 15, 0.7)))
Dset3 <- set.DAG(D)
sim(Dset3, n = 10)
}
# Same as "A.Bin1" and "A.Bin2", but directly using rmvbin function in bindata package:
if (requireNamespace("bindata", quietly = TRUE)) {
D <- DAG.empty()
D <- D +
node(c("B.Bin1","B.Bin2"), distr = "bindata::rmvbin",
asis.params = list(
margprob = "c(0.5, 0.5)",
bincorr = "matrix(c(1,0.75,0.75,1), ncol=2)"))
Dset4b <- set.DAG(D)
sim(Dset4b, n = 10)
}
#---------------------------------------------------------------------------------------
# EXAMPLE 8: Combining simcausal non-standard evaluation with eval() forced evaluation
#---------------------------------------------------------------------------------------
coefAi <- 1:10
D <- DAG.empty()
D <- D +
node("A", t = 1, distr = "rbern", prob = 0.7) +
node("A", t = 2:10, distr = "rconst", const = eval(coefAi[t]) * A[t-1])
Dset8 <- set.DAG(D)
sim(Dset8, n = 10)
#---------------------------------------------------------------------------------------
# TWO equivalent ways of creating a multivariate node (combining nodes W1 and W2):
#---------------------------------------------------------------------------------------
D <- DAG.empty()
D <- D + node("W1", distr = "rbern", prob = 0.45)
D <- D + node("W2", distr = "rconst", const = 1)
# option 1:
D <- D + node(c("W1.copy1", "W2.copy1"), distr = "rconst", const = c(W1, W2))
# equivalent option 2:
create_mat <- function(W1, W2) cbind(W1, W2)
vecfun.add("create_mat")
D <- D + node(c("W1.copy2", "W2.copy2"), distr = "rconst", const = create_mat(W1, W2))
Dset <- set.DAG(D)
sim(Dset, n=10, rndseed=1)
Define Non-Parametric Causal Parameters
Description
Set up the causal target parameter as a vector of expectations, ratio of expectations or contrast of expectations (average treatment effect) over the nodes of specified actions.
These settings are then used to evaluate the true value of the causal target parameter by calling eval.target
function.
Usage
set.targetE(DAG, outcome, t, param, ..., attr = list())
Arguments
DAG |
Object specifying the directed acyclic graph (DAG) for the observed data |
outcome |
Name of the outcome node |
t |
Integer vector of time points to use for expectations, has to be omitted or NULL for non-time-varying DAGs. |
param |
A character vector |
... |
Additional attributes (to be used in future versions) |
attr |
Additional attributes (to be used in future versions) |
Value
A modified DAG object with the target parameter saved as part of the DAG,
this DAG can now be passed as an argument to eval.target
function for actual Monte-Carlo evaluation of the target parameter. See Examples.
References
Sofrygin O, van der Laan MJ, Neugebauer R (2017). "simcausal R Package: Conducting Transparent and Reproducible Simulation Studies of Causal Effect Estimation with Complex Longitudinal Data." Journal of Statistical Software, 81(2), 1-47. doi: 10.18637/jss.v081.i02.
Examples
#---------------------------------------------------------------------------------------
# EXAMPLE 1: DAG with single point treatment
#---------------------------------------------------------------------------------------
# Define a DAG with single-point treatment ("Anode")
D <- DAG.empty()
D <- D + node("W1", distr="rbern", prob=plogis(-0.5))
D <- D + node("W2", distr="rbern", prob=plogis(-0.5 + 0.5*W1))
D <- D + node("Anode", distr="rbern", prob=plogis(-0.5 - 0.3*W1 - 0.3*W2))
D <- D + node("Y", distr="rbern", prob=plogis(-0.1 + 1.2*Anode + 0.3*W1 + 0.3*W2),
EFU=TRUE)
D_WAY <- set.DAG(D)
# Defining interventions (actions)
# define action "A1" that sets the treatment node to constant 1
D_WAY <- D_WAY + action("A1", nodes=node("Anode",distr="rbern", prob=1))
# define another action "A0" that sets the treatment node to constant 0
D_WAY <- D_WAY + action("A0", nodes=node("Anode",distr="rbern", prob=0))
#---------------------------------------------------------------------------------------
# Defining and calculating causal parameters:
#---------------------------------------------------------------------------------------
# Counterfactual mean of node "Y" under action "A1"
D_WAY <- set.targetE(D_WAY, outcome="Y", param="A1")
eval.target(D_WAY, n=10000)
# Contrasts of means of "Y" under action "A1" minus action "A0"
D_WAY <- set.targetE(D_WAY, outcome="Y", param="A1-A0")
eval.target(D_WAY, n=10000)
# Ratios of "Y" under action "A1" over action "A0"
D_WAY <- set.targetE(D_WAY, outcome="Y", param="A1/A0")
eval.target(D_WAY, n=10000)
# Alternative parameter evaluation by passing already simulated full data to
# \code{eval.target}
X_dat1 <- simfull(A(D_WAY), n=10000)
D_WAY <- set.targetE(D_WAY, outcome="Y", param="A1/A0")
eval.target(D_WAY, data=X_dat1)
#---------------------------------------------------------------------------------------
# EXAMPLE 2: DAG with time-varying outcomes (survival outcome)
#---------------------------------------------------------------------------------------
# Define longitudinal data structure over 6 time-points t=(0:5)
t_end <- 5
D <- DAG.empty()
D <- D + node("L2", t=0, distr="rbern", prob=0.05)
D <- D + node("L1", t=0, distr="rbern", prob=ifelse(L2[0]==1,0.5,0.1))
D <- D + node("A1", t=0, distr="rbern", prob=ifelse(L1[0]==1 & L2[0]==0, 0.5,
ifelse(L1[0]==0 & L2[0]==0, 0.1,
ifelse(L1[0]==1 & L2[0]==1, 0.9, 0.5))))
D <- D + node("A2", t=0, distr="rbern", prob=0, order=4, EFU=TRUE)
D <- D + node("Y", t=0, distr="rbern",
prob=plogis(-6.5 + L1[0] + 4*L2[0] + 0.05*I(L2[0]==0)),
EFU=TRUE)
D <- D + node("L2", t=1:t_end, distr="rbern", prob=ifelse(A1[t-1]==1, 0.1,
ifelse(L2[t-1]==1, 0.9,
min(1,0.1 + t/16))))
D <- D + node("A1", t=1:t_end, distr="rbern", prob=ifelse(A1[t-1]==1, 1,
ifelse(L1[0]==1 & L2[0]==0, 0.3,
ifelse(L1[0]==0 & L2[0]==0, 0.1,
ifelse(L1[0]==1 & L2[0]==1, 0.7,
0.5)))))
D <- D + node("A2", t=1:t_end, distr="rbern", prob=0, EFU=TRUE)
D <- D + node("Y", t=1:t_end, distr="rbern",
prob=plogis(-6.5 + L1[0] + 4*L2[t] + 0.05*sum(I(L2[0:t]==rep(0,(t+1))))),
EFU=TRUE)
D <- set.DAG(D)
# Add two dynamic actions (indexed by values of the parameter theta={0,1})
# Define intervention nodes
act_t0_theta <- node("A1",t=0, distr="rbern", prob=ifelse(L2[0] >= theta,1,0))
act_tp_theta <- node("A1",t=1:t_end, distr="rbern",
prob=ifelse(A1[t-1]==1,1,ifelse(L2[t] >= theta,1,0)))
# Add two actions to current DAG object
D <- D + action("A1_th0", nodes=c(act_t0_theta, act_tp_theta), theta=0)
D <- D + action("A1_th1", nodes=c(act_t0_theta, act_tp_theta), theta=1)
#---------------------------------------------------------------------------------------
# Defining and calculating the target parameter
#---------------------------------------------------------------------------------------
# Counterfactual mean of node "Y" at time-point t=4 under action "A1_th0"
D <- set.targetE(D, outcome="Y", t=4, param="A1_th0")
eval.target(D, n=5000)
# Vector of counterfactual means of"Y" over all time points under action "A1_th1"
D <- set.targetE(D, outcome="Y", t=0:5, param="A1_th1")
eval.target(D, n=5000)
# Vector of counterfactual contrasts of "Y" over all time points
# for action "A1_th1" minus action "A1_th0"
D <- set.targetE(D, outcome="Y", t=0:5, param="A1_th1 - A1_th0")
eval.target(D, n=5000)
# Vector of counterfactual ratios of "Y" over all time points
# for action "A1_th0" over action "A1_th1"
D <- set.targetE(D, outcome="Y", t=0:5, param="A1_th0 / A1_th1")
eval.target(D, n=5000)
Define Causal Parameters with a Working Marginal Structural Model (MSM)
Description
Set up the MSM causal target parameter for the current DAG object. These settings can be later used to evaluate the true value of the MSM parameter on the full (counterfactual) data by calling eval.target
function.
Usage
set.targetMSM(
DAG,
outcome,
t,
formula,
family = "quasibinomial",
hazard,
...,
attr = list()
)
Arguments
DAG |
Object specifying the directed acyclic graph (DAG) for the observed data |
outcome |
Name of the outcome node |
t |
Vector of time points which are used for pooling the |
formula |
MSM formula for modeling pooled outcome on the full data with glm regression. Left hand side should be equal to the |
family |
Model family to use in the |
hazard |
When TRUE MSM fits the discrete hazard function for survival |
... |
Additional attributes (to be used in future versions) |
attr |
Additional attributes (to be used in future versions) |
Details
Enclosing an MSM formula term inside S(), e.g., S(mean(A[0:t])), forces this term to be evaluated as a summary measure of time-indexed nodes in the full data environment. All such MSM terms are parsed and then evaluated inside the previously simulated full data environment, each S() term is then replaced with a vector name 'XMSMterms.i' that is a result of this evaluation.
Value
A modified DAG object with well-defined target parameter saved as part of the DAG, this DAG can now be passed as an argument to eval.target
function for actual Monte-Carlo evaluation of the target parameter. See Examples.
References
Sofrygin O, van der Laan MJ, Neugebauer R (2017). "simcausal R Package: Conducting Transparent and Reproducible Simulation Studies of Causal Effect Estimation with Complex Longitudinal Data." Journal of Statistical Software, 81(2), 1-47. doi: 10.18637/jss.v081.i02.
Examples
#---------------------------------------------------------------------------------------
# DAG with time-varying outcomes (survival outcome)
#---------------------------------------------------------------------------------------
# Define longitudinal data structure over 6 time-points t=(0:5)
t_end <- 5
D <- DAG.empty()
D <- D + node("L2", t=0, distr="rbern", prob=0.05)
D <- D + node("L1", t=0, distr="rbern", prob=ifelse(L2[0]==1,0.5,0.1))
D <- D + node("A1", t=0, distr="rbern", prob=ifelse(L1[0]==1 & L2[0]==0, 0.5,
ifelse(L1[0]==0 & L2[0]==0, 0.1,
ifelse(L1[0]==1 & L2[0]==1, 0.9, 0.5))))
D <- D + node("A2", t=0, distr="rbern", prob=0, order=4, EFU=TRUE)
D <- D + node("Y", t=0, distr="rbern",
prob=plogis(-6.5 + L1[0] + 4*L2[0] + 0.05*I(L2[0]==0)),
EFU=TRUE)
D <- D + node("L2", t=1:t_end, distr="rbern", prob=ifelse(A1[t-1]==1, 0.1,
ifelse(L2[t-1]==1, 0.9,
min(1,0.1 + t/16))))
D <- D + node("A1", t=1:t_end, distr="rbern", prob=ifelse(A1[t-1]==1, 1,
ifelse(L1[0]==1 & L2[0]==0, 0.3,
ifelse(L1[0]==0 & L2[0]==0, 0.1,
ifelse(L1[0]==1 & L2[0]==1, 0.7,
0.5)))))
D <- D + node("A2", t=1:t_end, distr="rbern", prob=0, EFU=TRUE)
D <- D + node( "Y", t=1:t_end, distr="rbern",
prob=plogis(-6.5 + L1[0] + 4*L2[t] + 0.05*sum(I(L2[0:t]==rep(0,(t+1))))),
EFU=TRUE)
D <- set.DAG(D)
# Add two dynamic actions (indexed by values of the parameter theta={0,1})
# Define intervention nodes
act_t0_theta <- node("A1",t=0, distr="rbern", prob=ifelse(L2[0] >= theta,1,0))
act_tp_theta <- node("A1",t=1:t_end, distr="rbern",
prob=ifelse(A1[t-1]==1,1,ifelse(L2[t] >= theta,1,0)))
# Add two actions to current DAG object
D <- D + action("A1_th0", nodes=c(act_t0_theta, act_tp_theta), theta=0)
D <- D + action("A1_th1", nodes=c(act_t0_theta, act_tp_theta), theta=1)
#---------------------------------------------------------------------------------------
# MSM EXAMPLE 1: Modeling survival over time
#---------------------------------------------------------------------------------------
# Modeling pooled survival Y_t over time as a projection on the following working
# logistic model:
msm.form <- "Y ~ theta + t + I(theta*t)"
D <- set.targetMSM(D, outcome="Y", t=0:5, formula=msm.form, family="binomial",
hazard=FALSE)
MSMres <- eval.target(D, n=1000)
MSMres$coef
#---------------------------------------------------------------------------------------
# MSM EXAMPLE 2: Modeling survival over time with exposure-based summary measures
#---------------------------------------------------------------------------------------
# Now we want to model Y_t by adding a summary measure covariate defined as mean
# exposure A1 from time 0 to t;
# Enclosing any term inside S() forces its evaluation in the environment
# of the full (counterfactual) data.
msm.form_sum <- "Y ~ theta + t + I(theta*t) + S(mean(A1[0:t]))"
D <- set.targetMSM(D, outcome="Y", t=0:5, formula=msm.form_sum, family="binomial",
hazard=FALSE)
MSMres <- eval.target(D, n=1000)
MSMres$coef
Simulate Observed or Full Data from DAG
Object
Description
This function simulates full data based on a list of intervention DAGs, returning a list of data.frame
s. See the vignette for examples and detailed description.
Usage
sim(
DAG,
actions,
n,
wide = TRUE,
LTCF = NULL,
rndseed = NULL,
rndseed.reset.node = NULL,
verbose = getOption("simcausal.verbose")
)
Arguments
DAG |
A DAG objects that has been locked with set.DAG(DAG). Observed data from this DAG will be simulated if actions argument is omitted. |
actions |
Character vector of action names which will be extracted from the DAG object. Alternatively, this can be a list of action DAGs selected with |
n |
Number of observations to sample. |
wide |
A logical, if TRUE the output data is generated in wide format, if FALSE, the output longitudinal data in generated in long format |
LTCF |
If forward imputation is desired for the missing variable values, this argument should be set to the name of the node that indicates the end of follow-up event. |
rndseed |
Seed for the random number generator. |
rndseed.reset.node |
When |
verbose |
Set to |
Value
If actions argument is missing a simulated data.frame is returned, otherwise the function returns a named list of action-specific simulated data.frames with action names giving names to corresponding list items.
Forward Imputation
By default, when LTCF is left unspecified, all variables that follow after any end of follow-up (EFU) event are set to missing (NA).
The end of follow-up event occurs when a binary node of type EFU=TRUE
is equal to 1, indicating a failing or right-censoring event.
To forward impute the values of the time-varying nodes after the occurrence of the EFU
event, set the LTCF argument to a name of the EFU node representing this event.
For additional details and examples see the vignette and doLTCF
function.
References
Sofrygin O, van der Laan MJ, Neugebauer R (2017). "simcausal R Package: Conducting Transparent and Reproducible Simulation Studies of Causal Effect Estimation with Complex Longitudinal Data." Journal of Statistical Software, 81(2), 1-47. doi: 10.18637/jss.v081.i02.
See Also
simobs
- a wrapper function for simulating observed data only; simfull
- a wrapper function for simulating full data only; doLTCF
- forward imputation of the missing values in already simulating data; DF.to.long
, DF.to.longDT
- converting longitudinal data from wide to long formats.
Other simulation functions:
simfull()
,
simobs()
Examples
t_end <- 10
lDAG <- DAG.empty()
lDAG <- lDAG +
node(name = "L2", t = 0, distr = "rconst", const = 0) +
node(name = "A1", t = 0, distr = "rconst", const = 0) +
node(name = "L2", t = 1:t_end, distr = "rbern",
prob = ifelse(A1[t - 1] == 1, 0.1,
ifelse(L2[t-1] == 1, 0.9,
min(1,0.1 + t/t_end)))) +
node(name = "A1", t = 1:t_end, distr = "rbern",
prob = ifelse(A1[t - 1] == 1, 1,
ifelse(L2[0] == 0, 0.3,
ifelse(L2[0] == 0, 0.1,
ifelse(L2[0] == 1, 0.7, 0.5))))) +
node(name = "Y", t = 1:t_end, distr = "rbern",
prob = plogis(-6.5 + 4 * L2[t] + 0.05 * sum(I(L2[0:t] == rep(0,(t + 1))))),
EFU = TRUE)
lDAG <- set.DAG(lDAG)
#---------------------------------------------------------------------------------------
# EXAMPLE 1. No forward imputation.
#---------------------------------------------------------------------------------------
Odat.wide <- sim(DAG = lDAG, n = 1000, rndseed = 123)
Odat.wide[c(21,47), 1:18]
Odat.wideLTCF <- sim(DAG = lDAG, n = 1000, LTCF = "Y", rndseed = 123)
Odat.wideLTCF[c(21,47), 1:18]
#---------------------------------------------------------------------------------------
# EXAMPLE 2. With forward imputation.
#---------------------------------------------------------------------------------------
Odat.wideLTCF2 <- doLTCF(data = Odat.wide, LTCF = "Y")
Odat.wideLTCF2[c(21,47), 1:18]
# all.equal(Odat.wideLTCF, Odat.wideLTCF2)
Simulate Full Data (From Action DAG(s))
Description
This function simulates full data based on a list of intervention DAGs, returning a list of data.frame
s.
Usage
simfull(
actions,
n,
wide = TRUE,
LTCF = NULL,
rndseed = NULL,
rndseed.reset.node = NULL,
verbose = getOption("simcausal.verbose")
)
Arguments
actions |
Actions specifying the counterfactual DAG. This argument must be either an object of class DAG.action or a list of DAG.action objects. |
n |
Number of observations to sample. |
wide |
A logical, if TRUE the output data is generated in wide format, if FALSE, the output longitudinal data in generated in long format |
LTCF |
If forward imputation is desired for the missing variable values, this argument should be set to the name of the node that indicates the end of follow-up event. See the vignette, |
rndseed |
Seed for the random number generator. |
rndseed.reset.node |
When |
verbose |
Set to |
Value
A named list, each item is a data.frame
corresponding to an action specified by the actions argument, action names are used for naming these list items.
See Also
simobs
- a wrapper function for simulating observed data only; sim
- a wrapper function for simulating both types of data; doLTCF
for forward imputation of the missing values in already simulating data; DF.to.long
, DF.to.longDT
- converting longitudinal data from wide to long formats.
Other simulation functions:
sim()
,
simobs()
Simulate Observed Data
Description
This function simulates observed data from a DAG object.
Usage
simobs(
DAG,
n,
wide = TRUE,
LTCF = NULL,
rndseed = NULL,
rndseed.reset.node = NULL,
verbose = getOption("simcausal.verbose")
)
Arguments
DAG |
A DAG objects that has been locked with set.DAG(DAG). Observed data from this DAG will be simulated. |
n |
Number of observations to sample. |
wide |
A logical, if TRUE the output data is generated in wide format, if FALSE, the output longitudinal data in generated in long format |
LTCF |
If forward imputation is desired for the missing variable values, this argument should be set to the name of the node that indicates the end of follow-up event. See the vignette, |
rndseed |
Seed for the random number generator. |
rndseed.reset.node |
When |
verbose |
Set to |
Value
A data.frame
where each column is sampled from the conditional distribution specified by the corresponding DAG
object node.
See Also
simfull
- a wrapper function for simulating full data only; sim
- a wrapper function for simulating both types of data; doLTCF
for forward imputation of the missing values in already simulating data; DF.to.long
, DF.to.longDT
- converting longitudinal data from wide to long formats.
Other simulation functions:
sim()
,
simfull()
Convert Network from Sparse Adjacency Matrix into Network IDs Matrix
Description
Convert network represented by a sparse adjacency matrix into simcausal
network IDs matrix (NetInd_k
).
Usage
sparseAdjMat.to.NetInd(sparseAdjMat, trimKmax)
Arguments
sparseAdjMat |
Network represented as a sparse adjacency matrix (S4 class object |
trimKmax |
Trim the maximum number of friends to this integer value. If this argument is not missing,
the conversion network matrix obtained from |
Value
A named list with 3 items: 1) NetInd_k
; 2) nF
; and 3) Kmax
.
1) NetInd_k
- matrix of network IDs of dimension (n=nrow(sparseAdjMat),Kmax)
, where each row i
consists of the network IDs (friends) for observation i
.
Remainders are filled with NAs.
2) nF
- integer vector of length n
specifying the number of friends for each observation.
3) Kmax
- integer constant specifying the maximum observed number of friends in input sparseAdjMat
(this is the column dimension for the output matrix NetInd_k
).
See Also
network
; NetInd.to.sparseAdjMat
; sparseAdjMat.to.igraph
; igraph.to.sparseAdjMat
;
Convert Network from Sparse Adjacency Matrix into igraph Object
Description
Uses graph_from_adjacency_matrix
function from the igraph
package to convert the network in sparse adjacency matrix format into igraph
network object.
Usage
sparseAdjMat.to.igraph(sparseAdjMat, mode = "directed")
Arguments
sparseAdjMat |
Network represented as a sparse adjacency matrix (S4 class object |
mode |
Character scalar, passed on to |
Value
A list containing the network object(s) of type DAG.net
.
See Also
network
; igraph.to.sparseAdjMat
; sparseAdjMat.to.NetInd
; NetInd.to.sparseAdjMat
;
Add Custom Vectorized Functions
Description
Add user-defined function names to a global list of custom vectorized functions.
The functions in vecfun_names
are intended for use inside the node formulas.
Adding functions to this list will generally greatly expedite the simulation run time.
Any node formula calling a function on this list will be evaluated "as is", the function should
be written to accept arguments as either vectors of length n
or as matrices with n
rows.
Adding function to this list will effects simulation from all DAG objects that call this function. See vignette for more details.
Usage
vecfun.add(vecfun_names)
Arguments
vecfun_names |
A character vector of function names that will be treated as "vectorized" by the node formula R parser |
Value
An old vector of user-defined vectorized function names
Print Names of All Vectorized Functions
Description
Print all vectorized function names (build-in and user-defined).
Usage
vecfun.all.print()
Value
A vector of build-in and user-defined vectorized function names
Print Names of Custom Vectorized Functions
Description
Print current user-defined vectorized function names.
Usage
vecfun.print()
Value
A vector of vectorized function names
Remove Custom Vectorized Functions
Description
Remove user-defined function names from a global list of custom vectorized functions. See vignette for more details.
Usage
vecfun.remove(vecfun_names)
Arguments
vecfun_names |
A character vector of function names that will be removed from the custom list |
Value
An old vector of user-defined vectorized function names
Reset Custom Vectorized Function List
Description
Reset a listing of user-defined vectorized functions.
Usage
vecfun.reset()
Value
An old vector of user-defined vectorized function names