Type: Package
Title: Additional Predictor with Maximum Effect Size
Version: 0.1.1
Date: 2025-04-01
Description: Methods of selecting one from many numeric predictors for a regression model, to ensure that the additional predictor has the maximum effect size.
RoxygenNote: 7.3.2
Encoding: UTF-8
License: GPL-2
Depends: R (≥ 4.4)
Language: en-US
Imports: caret, parallel, rpart, spatstat.geom
Suggests: knitr, groupedHyperframe, survival, rmarkdown
VignetteBuilder: knitr
NeedsCompilation: no
Packaged: 2025-04-01 17:18:23 UTC; tingtingzhan
Author: Tingting Zhan
Maintainer: Tingting Zhan <tingtingzhan@gmail.com>
Repository: CRAN
Date/Publication: 2025-04-02 17:40:01 UTC
maxEff: Additional Predictor with Maximum Effect Size
Description
Methods of selecting one from many numeric predictors for a regression model, to ensure that the additional predictor has the maximum effect size.
Note
Help files of individual functions are intentionally suppressed in this PDF manual.
Users are encouraged to get started with
vignette('intro', package = 'maxEff')
Author(s)
Maintainer: Tingting Zhan <tingtingzhan@gmail.com> (ORCID)
Authors:
Inna Chervoneva <Inna.Chervoneva@jefferson.edu> (ORCID)
S3 Method Dispatches to 'add_' Class
Description
S3 method dispatches to the 'add_' class.
Usage
## S3 method for class 'add_'
print(x, ...)
## S3 method for class 'add_'
sort_by(x, y, ...)
Arguments
x: an object returned from functions
y: the criterion to sort by; see Details
...: additional parameters of S3 generic sort_by, etc.
Details
Function sort_by.add_() sorts the elements of an 'add_' object by a certain criterion y. We suggest using y = abs(effsize) and decreasing = TRUE, i.e., decreasing order of the absolute values of the effect sizes of the additional predictors.
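The suggested call can be mimicked with base R's sort_by() generic (available in R ≥ 4.4, which this package depends on); the data frame of candidate predictors below is hypothetical, standing in for an 'add_' object:

```r
# base-R analogue of sort_by(x, y = abs(effsize), decreasing = TRUE);
# 'd' is a hypothetical table of candidate predictors and their effect sizes
d = data.frame(predictor = c('a', 'b', 'c'), effsize = c(0.2, -1.5, 0.7))
sort_by(d, abs(d$effsize), decreasing = TRUE)  # 'b' (|-1.5|) first, then 'c', then 'a'
```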
Value
Function print.add_() does not have a returned value. Function sort_by.add_() returns an object of the same class as the input x.
S3 Method Dispatches to 'add_dummy' Class
Description
S3 method dispatches to the 'add_dummy' class.
Usage
## S3 method for class 'add_dummy'
subset(x, subset, ...)
## S3 method for class 'add_dummy'
predict(object, ...)
Arguments
x, object: an object returned from functions
subset: a logical expression; see Details
...: additional parameters
Details
Function subset.add_dummy() subsets an 'add_dummy' object; the default criterion is subset = (p1 > .15 & p1 < .85). See the explanation of p_1 in function splitd().
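A minimal sketch of the default criterion (the p1 values below are hypothetical):

```r
# p1: hypothetical Pr(D(x1) = 1) values for four dichotomizing rules
p1 = c(.05, .40, .90, .50)
keep = (p1 > .15 & p1 < .85)  # default 'subset' criterion from this help page
which(keep)                   # rules 2 and 4 survive; extreme prevalences are dropped
```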
Value
Function subset.add_dummy() returns an 'add_dummy' object.
Function predict.add_dummy() returns a list of regression models.
Additional Predictor as logical
Description
Additional predictor as logical.
Usage
add_dummy_partition(
start.model,
x,
data = eval(start.model$call$data),
times,
mc.cores = switch(.Platform$OS.type, windows = 1L, detectCores()),
...
)
add_dummy(
start.model,
x,
data = eval(start.model$call$data),
mc.cores = switch(.Platform$OS.type, windows = 1L, detectCores()),
...
)
Arguments
start.model: a regression model (see splitd())
x: one-sided formula specifying the numeric predictors
data: (optional) data.frame in the model call of start.model
times, ...: additional parameters of function createDataPartition or statusPartition()
mc.cores: number of CPU cores to use (see parallel::mclapply)
Details
Function add_dummy_partition() partitions each additional numeric predictor into a logical variable in the following steps.
1. Generate multiple, i.e., repeated, partitions via function createDataPartition or statusPartition().
2. For each partition, create a dichotomizing rule (via function node1()) on the training set. Apply this dichotomizing rule to the test set and obtain the estimated regression coefficient (i.e., effect size) of the additional logical predictor.
3. Among all partitions, select the one with the median effect size of the additional logical predictor.
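One repetition of these steps can be written out directly with rpart and survival (a hedged sketch, not the package's internal code; the lung data and the simple random split stand in for the package's partition machinery):

```r
# sketch: one training-test partition, dichotomize on the training set,
# then estimate the effect size on the test set
library(survival)
library(rpart)
d = na.omit(lung[c('time', 'status', 'age')])
set.seed(1)
id = seq_len(nrow(d)) %in% sample(nrow(d), size = round(.8 * nrow(d)))  # training indices
r = rpart(Surv(time, status) ~ age, data = d[id, ], cp = 0, maxdepth = 2L)
a = r$splits[1L, 'index']                 # first-node cutoff, cf. node1()
z = d$age[!id] > a                        # dichotomized predictor on the test set
coef(coxph(Surv(time, status) ~ z, data = d[!id, ]))  # effect size for this partition
```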
Function add_dummy() partitions each additional numeric predictor into a logical variable using function node1(), then updates the starting model by adding in each of the dichotomized logical predictors.
Value
Functions add_dummy_partition() and add_dummy() both return an object of class 'add_dummy', which is a list of node1 objects.
Additional Predictor as numeric
Description
Additional predictor as numeric.
Usage
add_num(
start.model,
x,
data = eval(start.model$call$data),
mc.cores = switch(.Platform$OS.type, windows = 1L, detectCores()),
...
)
Arguments
start.model: a regression model (see splitd())
x: one-sided formula to specify the numeric predictors
data: (optional) data.frame in the model call of start.model
mc.cores: number of CPU cores to use (see parallel::mclapply)
...: additional parameters, currently of no use
Details
Function add_num()
treats each additional predictor as a numeric variable,
and updates the starting model with each additional predictor.
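What this does for each candidate can be sketched with base R's update() (mtcars stands in for real data; this is an illustration, not the package's internal code):

```r
# sketch: update the starting model with one additional numeric predictor
m0 = lm(mpg ~ cyl, data = mtcars)   # starting model
m1 = update(m0, . ~ . + disp)       # add the candidate predictor 'disp'
coef(summary(m1))['disp', ]         # its coefficient row, cf. the effect size
```

Function add_num() repeats this update for every numeric predictor specified in x.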
Value
Function add_num() returns an 'add_num' object, which is a list of objects with an internal class 'add_num_'.
Get Cutoff Value from a Dichotomizing Rule node1()
Description
Get the cutoff value from a dichotomizing rule node1().
Usage
get_cutoff(x)
## S3 method for class 'node1'
get_cutoff(x)
Arguments
x: see Usage
Value
Function get_cutoff.node1()
returns a numeric scalar.
Find labels from node1
Description
Find labels from a node1 object.
Usage
## S3 method for class 'node1'
labels(object, ...)
Arguments
object: a node1 object
...: additional parameters, currently not in use
Value
Function labels.node1()
returns a character scalar.
Dichotomize via 1st Node of Recursive Partitioning
Description
Dichotomize one or more predictors of a Surv, a logical, or a double response, using recursive partitioning and regression tree rpart.
Usage
node1(x, check_degeneracy = TRUE, ...)
Arguments
x: an rpart object
check_degeneracy: logical scalar, whether to allow the dichotomized value to be all-TRUE or all-FALSE (see Details)
...: additional parameters of rpart and/or rpart.control
Details
Function node1() dichotomizes one predictor in the following steps.
1. Recursive partitioning and regression tree (rpart) analysis is performed for the response y and the predictor x.
2. The labels.rpart of the first node of the rpart tree is considered as the dichotomizing rule of the double predictor x. The term dichotomizing rule indicates the combination of an inequality sign (>, ≥, < and ≤) and a double cutoff threshold a.
3. The dichotomizing rule from Step 2 is further processed, such that
   - < a is regarded as ≥ a;
   - ≤ a is regarded as > a;
   - > a and ≥ a are regarded as is.
   This step is necessary for a narrative of greater than, or greater than or equal to, the threshold a.
4. A warning message is produced if the dichotomizing rule, applied to a new double predictor newx, creates an all-TRUE or all-FALSE result. We do not make the algorithm stop, as most regression models in R are capable of handling an all-TRUE or all-FALSE predictor, by returning an NA_real_ regression coefficient estimate.
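Step 3 can be sketched as a small helper (flip() and its list representation of a rule are hypothetical, for illustration only, not part of the package):

```r
# flip(): hypothetical illustration of the Step-3 normalization
flip = function(rule) {
  # rule: list(op = inequality sign, a = cutoff threshold)
  switch(rule$op,
    '<'  = list(op = '>=', a = rule$a),  # < a is regarded as >= a
    '<=' = list(op = '>',  a = rule$a),  # <= a is regarded as > a
    rule)                                # > a and >= a are kept as is
}
flip(list(op = '<', a = 24.5))  # becomes op '>=', same cutoff 24.5
```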
Value
Function node1() returns an object of class 'node1', which is a function with one parameter newx taking a double vector.
Note
In future, integer and factor predictors will be supported.
Function rpart is quite slow.
Examples
library(rpart)
(r = rpart(Price ~ Mileage, data = cu.summary, control = rpart.control(maxdepth = 2L)))
(foo = r |> node1())
get_cutoff(foo)
labels(foo)
rnorm(6L, mean = 24.5) |> foo()
Regression Models with Optimal Dichotomizing Predictors
Description
Regression models with optimal dichotomizing predictor(s), used either as boolean or continuous predictor(s).
Usage
## S3 method for class 'add_num'
predict(object, ...)
Arguments
object: an add_num object
...: additional parameters
Value
Function predict.add_num() returns a list of regression models.
Regression Models with Optimal Dichotomizing Predictors
Description
Regression models with optimal dichotomizing predictor(s), used either as boolean or continuous predictor(s).
Usage
## S3 method for class 'node1'
predict(object, newdata, ...)
Arguments
object: a node1 object, as an element of the list returned from functions
newdata: data.frame, candidate numeric predictors
...: additional parameters, currently not in use
Value
Function predict.node1() returns an updated regression model.
Split-Dichotomized Regression Model
Description
Split-dichotomized regression model.
Usage
splitd(start.model, x_, data, id, ...)
Arguments
start.model: a regression model
x_: see Usage
data: see Usage
id: logical vector, indices of training (TRUE) and test (FALSE) subjects
...: additional parameters, currently not in use
Value
Function splitd() returns a function, the dichotomizing rule \mathcal{D} based on the training set (y_0, x_0), with additional attributes:
attr(,'p1'): double scalar, p_1 = \text{Pr}(\mathcal{D}(x_1)=1)
attr(,'effsize'): double scalar, univariable regression coefficient estimate of y_1 \sim \mathcal{D}(x_1)
Details
Function splitd() performs a univariable regression model on the test set with a dichotomized predictor, using a dichotomizing rule determined by a recursive partitioning of the training set. Specifically, given a training-test sample split,
1. find the dichotomizing rule \mathcal{D} of the predictor x_0 given the response y_0 in the training set (via function node1());
2. fit a univariable regression model of the response y_1 with the dichotomized predictor \mathcal{D}(x_1) in the test set.
Currently the Cox proportional hazards (coxph) regression for Surv response, logistic (glm) regression for logical response, and linear (lm) regression for gaussian response are supported.
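The supported test-set fits can be sketched by response type (fit1() is a hypothetical helper for illustration, not the package's internal code):

```r
library(survival)
# fit1(): choose the univariable test-set model by the class of the response
fit1 = function(y1, z1) {
  if (inherits(y1, 'Surv')) {
    coxph(y1 ~ z1)                   # Cox proportional hazards for a Surv response
  } else if (is.logical(y1)) {
    glm(y1 ~ z1, family = binomial)  # logistic regression for a logical response
  } else {
    lm(y1 ~ z1)                      # linear regression for a gaussian response
  }
}
m = fit1(rnorm(10L), rep(c(TRUE, FALSE), times = 5L))  # dispatches to lm()
```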
Stratified Partition
Description
A variation of createDataPartition, to split Surv y by survival status instead of the percentiles of the survival time.
Usage
statusPartition(y, times, p = 0.8, ...)
Arguments
y: response
times: positive integer scalar, number of partitions
p: double scalar between 0 and 1, percentage of the training set
...: additional parameters, currently not in use
Details
See vignette('intro', package = 'maxEff').
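The idea can be sketched with base R (a stratified draw of training indices within each event-status stratum; this mirrors, but is not, the package's implementation):

```r
library(survival)
d = na.omit(lung[c('time', 'status')])
y = Surv(d$time, d$status)
status = y[, 'status']                   # 0/1 event indicator
set.seed(1)
# draw ~80% of subjects within each status stratum as the training set
id = sort(unlist(lapply(split(seq_along(status), status), function(i)
  sample(i, size = round(.8 * length(i))))))
head(id)  # integer indices of training subjects, cf. statusPartition()
```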
Value
Function statusPartition() returns a length-n list of integer vectors. Each integer vector indicates the training subjects.
Note
Function caTools::sample.split is not what we need.