Type: Package
Title: Additional Predictor with Maximum Effect Size
Version: 0.1.1
Date: 2025-04-01
Description: Methods of selecting one from many numeric predictors for a regression model, to ensure that the additional predictor has the maximum effect size.
RoxygenNote: 7.3.2
Encoding: UTF-8
License: GPL-2
Depends: R (≥ 4.4)
Language: en-US
Imports: caret, parallel, rpart, spatstat.geom
Suggests: knitr, groupedHyperframe, survival, rmarkdown
VignetteBuilder: knitr
NeedsCompilation: no
Packaged: 2025-04-01 17:18:23 UTC; tingtingzhan
Author: Tingting Zhan [aut, cre] (ORCID), Inna Chervoneva [aut] (ORCID)
Maintainer: Tingting Zhan <tingtingzhan@gmail.com>
Repository: CRAN
Date/Publication: 2025-04-02 17:40:01 UTC

maxEff: Additional Predictor with Maximum Effect Size

Description

Methods of selecting one from many numeric predictors for a regression model, to ensure that the additional predictor has the maximum effect size.

Note

Help files of individual functions are intentionally suppressed in this PDF manual. Users are encouraged to get started with

vignette('intro', package = 'maxEff')

Author(s)

Maintainer: Tingting Zhan tingtingzhan@gmail.com (ORCID)

Authors:

Inna Chervoneva (ORCID)

S3 Method Dispatches to 'add_' Class

Description

S3 Method Dispatches to 'add_' Class

Usage

## S3 method for class 'add_'
print(x, ...)

## S3 method for class 'add_'
sort_by(x, y, ...)

Arguments

x

an object returned from functions add_dummy_partition(), add_dummy() or add_num()

...

additional parameters of S3 generic sort_by, etc.

y

language, see function sort_by

Details

Function sort_by.add_() sorts the elements of an 'add_' object by a certain criterion y. We suggest using y = abs(effsize), i.e., sorting by the absolute values of the effect sizes of the additional predictors, in decreasing = TRUE order.

Value

Function print.add_() does not return a value.

Function sort_by.add_() returns an object of the same class as input x.


S3 Method Dispatches to 'add_dummy' Class

Description

S3 Method Dispatches to 'add_dummy' Class

Usage

## S3 method for class 'add_dummy'
subset(x, subset, ...)

## S3 method for class 'add_dummy'
predict(object, ...)

Arguments

x, object

an object returned from functions add_dummy_partition() or add_dummy()

subset

language

...

additional parameters of function predict.node1(), e.g., newdata

Details

Function subset.add_dummy() subsets an 'add_dummy' object; the default criterion is subset = (p1 > .15 & p1 < .85). See the explanation of p_1 in function splitd().

Value

Function subset.add_dummy() returns an 'add_dummy' object.

Function predict.add_dummy() returns a listof regression models.


Additional Predictor as logical

Description

Additional predictor as logical.

Usage

add_dummy_partition(
  start.model,
  x,
  data = eval(start.model$call$data),
  times,
  mc.cores = switch(.Platform$OS.type, windows = 1L, detectCores()),
  ...
)

add_dummy(
  start.model,
  x,
  data = eval(start.model$call$data),
  mc.cores = switch(.Platform$OS.type, windows = 1L, detectCores()),
  ...
)

Arguments

start.model

a regression model, such as lm, glm, or coxph

x

one-sided formula to specify the numeric predictors x's as the columns of one matrix column in data

data

(optional) data.frame in the model call of start.model

times, ...

additional parameters of function statusPartition() for function add_dummy_partition(). For function add_dummy(), these parameters are not in use

mc.cores

integer scalar, see function mclapply

Details

Function add_dummy_partition() partitions each additional numeric predictor into a logical variable in the following steps.

  1. Generate multiple, i.e., repeated, partitions via functions createDataPartition or statusPartition().

  2. For each partition, create a dichotomizing rule (via function node1()) on the training set. Apply this dichotomizing rule on the test set and obtain the estimated regression coefficient (i.e., effect size) of the additional logical predictor.

  3. Among all partitions, select the one in which the additional logical predictor has the median effect size.

Function add_dummy() partitions each additional numeric predictor into a logical variable using function node1(), then updates the starting model by adding in each of the dichotomized logical predictors.

Value

Function add_dummy_partition() returns an object of class 'add_dummy', which is a listof node1 objects.

Function add_dummy() returns an object of class 'add_dummy', which is a listof node1 objects.
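
A minimal sketch of the call shape, using simulated toy data; the dataset, the variable names, and the post-processing chain below are illustrative assumptions rather than package examples, and follow the sort_by and subset criteria documented above.

library(survival)
library(maxEff)
set.seed(42)
n <- 100L
X <- matrix(rnorm(n * 5L), nrow = n, dimnames = list(NULL, paste0('marker', 1:5)))  # candidate numeric predictors
d <- data.frame(tm = rexp(n, rate = exp(.8 * X[, 1L])), ev = TRUE, z = rnorm(n))    # toy survival data
d$X <- X                                                                            # matrix column of candidates
m0 <- coxph(Surv(tm, ev) ~ z, data = d)                                             # starting model
a1 <- add_dummy_partition(m0, x = ~ X, times = 10L, mc.cores = 1L)
a1 |> sort_by(y = abs(effsize), decreasing = TRUE) |> subset(subset = p1 > .15 & p1 < .85)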


Additional Predictor as numeric

Description

Additional predictor as numeric.

Usage

add_num(
  start.model,
  x,
  data = eval(start.model$call$data),
  mc.cores = switch(.Platform$OS.type, windows = 1L, detectCores()),
  ...
)

Arguments

start.model

a regression model, such as lm, glm, or coxph

x

one-sided formula to specify the numeric predictors x's as the columns of one matrix column in data

data

data.frame

mc.cores

integer scalar, see function mclapply

...

additional parameters, currently of no use

Details

Function add_num() treats each additional predictor as a numeric variable, and updates the starting model with each additional predictor.

Value

Function add_num() returns an object of class 'add_num', which is a listof objects with an internal class 'add_num_'.
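
A matching sketch of the call shape for add_num(), again with simulated toy data; the object and column names are illustrative assumptions, not package examples.

library(maxEff)
set.seed(1)
d <- data.frame(y = rnorm(80L), z = runif(80L))
d$X <- matrix(rnorm(80L * 3L), nrow = 80L, dimnames = list(NULL, c('p1', 'p2', 'p3')))  # candidate numeric predictors
m0 <- lm(y ~ z, data = d)                         # starting model
a2 <- add_num(m0, x = ~ X, mc.cores = 1L)         # one candidate model per column of X
sort_by(a2, y = abs(effsize), decreasing = TRUE)  # largest absolute effect size first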


Get Cutoff Value from a Dichotomizing Rule node1()

Description

Get the cutoff value from a dichotomizing rule node1().

Usage

get_cutoff(x)

## S3 method for class 'node1'
get_cutoff(x)

Arguments

x

see Usage

Value

Function get_cutoff.node1() returns a numeric scalar.


Find labels from node1

Description

Find labels from node1

Usage

## S3 method for class 'node1'
labels(object, ...)

Arguments

object

a node1 object

...

additional parameters, currently not in use

Value

Function labels.node1() returns a character scalar.


Dichotomize via 1st Node of Recursive Partitioning

Description

Dichotomize one or more predictors of a Surv, a logical, or a double response, using recursive partitioning and regression tree rpart.

Usage

node1(x, check_degeneracy = TRUE, ...)

Arguments

x

a rpart object

check_degeneracy

logical scalar, whether to check that the dichotomized values are not all-FALSE or all-TRUE (i.e., degenerate) for any one of the predictors. Default TRUE, in which case a warning message is produced for degeneracy.

...

additional parameters of rpart and/or rpart.control

Details

Function node1() dichotomizes one predictor in the following steps,

  1. Recursive partitioning and regression tree rpart analysis is performed for the response y and the predictor x.

  2. The labels.rpart of the first node of the rpart tree is considered as the dichotomizing rule of the double predictor x. The term dichotomizing rule indicates the combination of an inequality sign (>, >=, < and <=) and a double cutoff threshold a.

  3. The dichotomizing rule from Step 2 is further processed, such that

    • < a is regarded as >= a

    • <= a is regarded as > a

    • > a and >= a are regarded as is.

    This step is necessary for a narrative of "greater than" or "greater than or equal to" the threshold a.

  4. A warning message is produced, if the dichotomizing rule, applied to a new double predictor newx, creates an all-TRUE or all-FALSE result. We do not make the algorithm stop, as most regression models in R are capable of handling an all-TRUE or all-FALSE predictor, by returning a NA_real_ regression coefficient estimate.

Value

Function node1() returns an object of class 'node1', which is a function with one parameter newx taking a double vector.

Note

In the future, integer and factor predictors will be supported.

Function rpart is quite slow.

Examples

library(rpart)
# a shallow regression tree of Price on Mileage
(r = rpart(Price ~ Mileage, data = cu.summary, control = rpart.control(maxdepth = 2L)))
# dichotomizing rule from the first node of the tree
(foo = r |> node1())
get_cutoff(foo)  # cutoff value of the rule
labels(foo)      # text label of the rule
rnorm(6L, mean = 24.5) |> foo()  # apply the rule to new predictor values

Regression Models with Optimal Dichotomizing Predictors

Description

Regression models with optimal dichotomizing predictor(s), used either as logical or as numeric predictor(s).

Usage

## S3 method for class 'add_num'
predict(object, ...)

Arguments

object

an add_num object

...

additional parameters of function predict.add_num_, e.g., newdata

Value

Function predict.add_num() returns a listof regression models.


Regression Models with Optimal Dichotomizing Predictors

Description

Regression models with optimal dichotomizing predictor(s), used either as logical or as numeric predictor(s).

Usage

## S3 method for class 'node1'
predict(object, newdata, ...)

Arguments

object

a node1 object, an element of the listof returned from functions add_dummy() or add_dummy_partition()

newdata

data.frame, candidate numeric predictors x's must have the same name and dimension as the training data. If missing, the training data is used

...

additional parameters, currently not in use

Value

Function predict.node1() returns an updated regression model.


Split-Dichotomized Regression Model

Description

Split-dichotomized regression model.

Usage

splitd(start.model, x_, data, id, ...)

Arguments

start.model

a regression model

x_

language

data

data.frame

id

logical vector, indices of training (TRUE) and test (FALSE) subjects

...

additional parameters, currently not in use

Value

Function splitd() returns a function, the dichotomizing rule \mathcal{D} based on the training set (y_0, x_0), with additional attributes

attr(,'p1')

double scalar, p_1 = \text{Pr}(\mathcal{D}(x_1)=1)

attr(,'effsize')

double scalar, univariable regression coefficient estimate of y_1\sim\mathcal{D}(x_1)

Split-Dichotomized Regression Model

Function splitd() fits a univariable regression model on the test set with a dichotomized predictor, using a dichotomizing rule determined by a recursive partitioning of the training set. Specifically, given a training-test sample split,

  1. find the dichotomizing rule \mathcal{D} of the predictor x_0 given the response y_0 in the training set (via function node1());

  2. fit a univariable regression model of the response y_1 with the dichotomized predictor \mathcal{D}(x_1) in the test set.

Currently, the Cox proportional hazards (coxph) regression for Surv response, logistic (glm) regression for logical response, and linear (lm) regression for Gaussian response are supported.
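
The two steps can be illustrated outside the package with a hand-rolled split; the following is a conceptual sketch using survival::lung and rpart, not the splitd() implementation.

library(survival)
library(rpart)
set.seed(1)
d <- survival::lung
id <- seq_len(nrow(d)) %in% sample.int(nrow(d), size = floor(.8 * nrow(d)))  # TRUE = training
r <- rpart(Surv(time, status) ~ age, data = d[id, ],
           control = rpart.control(maxdepth = 2L, cp = 0))                   # shallow tree on the training set
a <- r$splits[1L, 'index']                                                   # cutoff at the first node
coxph(Surv(time, status) ~ I(age >= a), data = d[!id, ])                     # effect size on the test set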


Stratified Partition

Description

A variation of createDataPartition, to split Surv y by survival status instead of by the percentiles of survival time.

Usage

statusPartition(y, times, p = 0.8, ...)

Arguments

y

response y, a Surv object

times

positive integer scalar n, number of replicates of partitions. Default 1L.

p

double scalar between 0 and 1, proportion p of training subjects, default .8

...

additional parameters, currently not in use

Details

See vignette('intro', package = 'maxEff').

Value

Function statusPartition() returns a length-n listof integer vectors. Each integer vector indicates the indices of the training subjects.

Note

Function caTools::sample.split is not what we need.
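
A small usage sketch, assuming the package is attached and using survival::lung for the response; the object names are illustrative.

library(survival)
library(maxEff)
y <- with(survival::lung, Surv(time, status))
ps <- statusPartition(y, times = 3L, p = .8)  # three training index sets, stratified by survival status
lengths(ps)                                   # each holds roughly 80% of the subjects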