Help for package cenROC

Type:

Package

Title:

Estimating Time-Dependent ROC Curve and AUC for Censored Data

Version:

2.0.0

Description:

Contains functions to estimate smoothed and non-smoothed (empirical) time-dependent receiver operating characteristic (ROC) curves, the corresponding area under the curve (AUC), and the optimal cutoff point for right censored and interval censored survival data. See Beyene and El Ghouch (2020) <doi:10.1002/sim.8671> and Beyene and El Ghouch (2022) <doi:10.1002/bimj.202000382>.

License:

GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]

Depends:

R (≥ 3.5.0)

Imports:

Rcpp (≥ 1.0.0), icenReg, condSURV, survival, stats, graphics, methods

LinkingTo:

Rcpp, RcppEigen

Encoding:

UTF-8

LazyData:

true

RoxygenNote:

7.2.3

NeedsCompilation:

yes

Packaged:

2023-03-27 03:57:11 UTC; m2kas

Author:

Kassu Mehari Beyene [aut, cre], Anouar El Ghouch [aut, ths]

Maintainer:

Kassu Mehari Beyene <kassu.mehari@wu.edu.et>

Repository:

CRAN

Date/Publication:

2023-03-27 08:10:05 UTC

The cross-validation bandwidth selection for weighted data

Description

This function computes the data-driven bandwidth for smoothing the ROC (or distribution) function using the CV method of Beyene and El Ghouch (2020). This is an extension of the classical (unweighted) cross-validation bandwidth selection method to the case of weighted data.

Usage

CV(X, wt, ktype = "normal")

Arguments

X

The numeric data vector.

wt

The non-negative weight vector.

ktype

A character string giving the type of kernel to be used: "normal", "epanechnikov", "biweight", or "triweight". By default, the "normal" kernel is used.

Details

Bowman et al. (1998) proposed the cross-validation bandwidth selection method for the unweighted kernel smoothed distribution function. This method is implemented in the R package kerdiest. We adapted it to weighted data by incorporating the weight variable into the cross-validation function of Bowman's method. See Beyene and El Ghouch (2020) for details.
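The idea can be sketched in base R as follows. This is only an illustration under stated assumptions, not the code used by the package: the helper name weighted_cv_bw, the restriction to the normal kernel, the grid approximation of the integral and the search interval are all choices made here for brevity. Each observation contributes the integrated squared distance between its indicator function and the leave-one-out weighted kernel estimate, and that contribution is multiplied by the observation's weight.

```r
# Hypothetical helper (not part of cenROC): weighted cross-validation
# bandwidth for a kernel distribution estimate, normal kernel only.
weighted_cv_bw <- function(X, wt, n_grid = 200) {
  wt <- wt / sum(wt)                               # normalise the weights
  grid <- seq(min(X) - sd(X), max(X) + sd(X), length.out = n_grid)
  step <- grid[2] - grid[1]
  ind <- outer(grid, X, ">=") * 1                  # I(x >= X_i) on the grid
  cv_score <- function(h) {
    Kmat <- pnorm(outer(grid, X, "-") / h)         # K((x - X_j) / h)
    total <- as.vector(Kmat %*% wt)                # sum_j w_j K((x - X_j) / h)
    score <- 0
    for (i in seq_along(X)) {
      # leave-one-out weighted estimate, renormalised without observation i
      loo <- (total - wt[i] * Kmat[, i]) / (1 - wt[i])
      score <- score + wt[i] * sum((ind[, i] - loo)^2) * step
    }
    score
  }
  # assumed search interval: a small to a large multiple of the sample sd
  optimize(cv_score, interval = c(0.01, 2) * sd(X))$minimum
}

set.seed(1)
X <- rnorm(60)
wt <- runif(60)
h_cv <- weighted_cv_bw(X, wt)
```

With equal weights the criterion reduces to an unweighted leave-one-out criterion of the Bowman type, which is the sense in which the weighted version extends the classical one.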

Value

Returns the computed value for the bandwidth parameter.

Author(s)

Kassu Mehari Beyene and Anouar El Ghouch

References

Beyene, K. M. and El Ghouch, A. (2020). Smoothed time-dependent receiver operating characteristic curve for right censored survival data. Statistics in Medicine. 39: 3373–3396.

Bowman, A., Hall, P. and Prvan, T. (1998). Bandwidth selection for the smoothing of distribution functions. Biometrika 85: 799–808.

Quintela-del-Rio, A. and Estevez-Perez, G. (2015). kerdiest: Nonparametric kernel estimation of the distribution function, bandwidth selection and estimation of related functions. R package version 1.2.

Examples

library(cenROC)

X <- rnorm(100) # random data vector
wt <- runif(100) # weight vector

## Cross-validation bandwidth selection
CV(X = X, wt = wt)$bw

Estimation of the survival probability conditional on the observed data for right censored data

Description

Estimates the survival probability conditional on the observed data for right censored data.

Usage

Csurv(Y, M, censor, t, h = NULL, kernel = "normal")

Arguments

Y

The numeric vector of event-times or observed times.

M

The numeric vector of marker values for which we want to compute the time-dependent ROC curves.

censor

The censoring indicator, 1 if event, 0 otherwise.

t

A scalar time point at which we want to compute the time-dependent ROC curve.

h

A scalar bandwidth for the calculation of Beran's weights. By default, the bandwidth is selected with the method of Sheather and Jones (1991).

kernel

A character string giving the type of kernel to be used: "normal", "epanechnikov", "tricube", "boxcar", "triangular", or "quartic". The default is the "normal" kernel.

Value

Returns the following vectors:

positive P(T<t|Y,censor,M).

negative P(T>t|Y,censor,M).
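As an illustration of what these two vectors contain, the sketch below computes them in base R for a user-supplied conditional survival function. It is a simplified reading of the weighting scheme described by Li et al. (2018) and Beyene and El Ghouch (2020), not the package's code: the helper event_prob_right and the toy exponential model surv_exp are assumptions made here, whereas the bandwidth and kernel arguments above suggest that Csurv relies on a Beran-type kernel estimate. A subject with an observed event before t is a case with probability 1, a subject still under observation at t is a case with probability 0, and a subject censored before t receives 1 - S(t | M) / S(Y | M).

```r
# Hypothetical helper (not part of cenROC). `surv(s, m)` is any estimate of
# the conditional survival function S(s | M = m).
event_prob_right <- function(Y, M, censor, t, surv) {
  positive <- numeric(length(Y))                 # default: still event-free at t
  positive[Y <= t & censor == 1] <- 1            # event observed before t
  cens <- Y <= t & censor == 0                   # censored before t
  positive[cens] <- 1 - surv(t, M[cens]) / surv(Y[cens], M[cens])
  list(positive = positive, negative = 1 - positive)
}

# Toy conditional survival function (assumed): exponential with rate exp(m).
surv_exp <- function(s, m) exp(-exp(m) * s)

Y <- c(1, 2, 3, 4)
censor <- c(1, 0, 1, 0)
M <- c(0, 0, 0, 0)
res <- event_prob_right(Y, M, censor, t = 2.5, surv = surv_exp)
```

In this toy example only the second subject, censored at time 2, receives a fractional value.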

References

Beyene, K. M. and El Ghouch, A. (2020). Smoothed time-dependent receiver operating characteristic curve for right censored survival data. Statistics in Medicine. 39: 3373–3396.

Li, Liang, Bo Hu and Tom Greene (2018). A simple method to estimate the time-dependent receiver operating characteristic curve and the area under the curve with right censored data, Statistical Methods in Medical Research, 27(8): 2264-2278.

Pablo Martínez-Camblor and Gustavo F. Bayón and Sonia Pérez-Fernández (2016). Cumulative/dynamic roc curve estimation, Journal of Statistical Computation and Simulation, 86(17): 3582-3594.

Estimation of the survival probability conditional on the observed data for interval censored data

Description

Estimates the survival probability conditional on the observed data for interval censored data.

Usage

ICsur(L, R, M, t, method, dist)

Arguments

L

The numeric vector of left limits of the observed time. For left censored observations, L == 0.

R

The numeric vector of right limits of the observed time. For right censored observations, R == Inf.

M

The numeric vector of marker values.

t

A scalar time point used to calculate the ROC curve.

method

A character string indicating the type of modeling. The options are nonparametric "np", parametric "pa" and semiparametric "sp".

dist

A character string indicating the type of distribution for the parametric model. The options are "exponential", "weibull", "gamma", "lnorm", "loglogistic" and "generalgamma".

Value

Returns the following vectors:

positive P(T<t|L,R,M).

negative P(T>t|L,R,M).
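A minimal base R sketch of how such probabilities can be formed from a conditional survival function is given below. It follows the general idea in Beyene and El Ghouch (2022) as understood here and is not the package's implementation: the helper event_prob_interval and the Weibull-type function surv_weib are assumptions. If the whole interval lies before t the subject is a case, if it lies after t the subject is a control, and if t falls inside (L, R) the probability is (S(L | M) - S(t | M)) / (S(L | M) - S(R | M)).

```r
# Hypothetical helper (not part of cenROC). `surv(s, m)` is any estimate of
# the conditional survival function S(s | M = m), with surv(Inf, m) = 0.
event_prob_interval <- function(L, R, M, t, surv) {
  positive <- numeric(length(L))                 # default: event after t
  positive[R <= t] <- 1                          # event certainly before t
  mid <- L < t & t < R                           # t falls inside (L, R)
  SL <- surv(L[mid], M[mid])
  SR <- surv(R[mid], M[mid])
  positive[mid] <- (SL - surv(t, M[mid])) / (SL - SR)
  list(positive = positive, negative = 1 - positive)
}

# Toy Weibull-type conditional survival function (assumed shape 1.5).
surv_weib <- function(s, m) exp(-(s * exp(m))^1.5)

L <- c(0, 1, 3, 1)
R <- c(1, 4, 5, Inf)
M <- rep(0, 4)
res <- event_prob_interval(L, R, M, t = 2, surv = surv_weib)
```

For a right censored subject (R == Inf) the expression reduces to 1 - S(t | M) / S(L | M), the same form as in the right censored case.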

References

Beyene, K. M. and El Ghouch, A. (2022). Time-dependent ROC curve estimation for interval-censored data. Biometrical Journal, 64, 1056–1074.

Time-dependent ROC curve estimation for interval-censored survival data

Description

This function computes the time-dependent ROC curve for interval censored survival data using the cumulative sensitivity and dynamic specificity definitions. The ROC curves can be either empirical (non-smoothed) or smoothed with/without boundary correction. It also calculates the time-dependent AUC.

Usage

IntROC(L, R, M, t, U = NULL, method = "emp", method2 = "pa", dist = "weibull",
        bw = NULL, ktype = "normal", len = 151, B = 0, alpha = 0.05, plot = "TRUE")

Arguments

L

The numeric vector of left limits of the observed time. For left censored observations, L == 0.

R

The numeric vector of right limits of the observed time. For right censored observations, R == Inf.

M

The numeric vector of marker values.

t

A scalar time point used to calculate the ROC curve.

U

The numeric vector of cutoff values.

method

The method of ROC curve estimation. The possible options are "emp" for the empirical method, "untra" for the smoothed estimate without boundary correction, and "tra" for the smoothed estimate with boundary correction. The default is the "emp" empirical method.

method2

A character string indicating the type of modeling. The options are nonparametric "np", parametric "pa" and semiparametric "sp". The default is "pa", the parametric model, with the Weibull distribution.

dist

A character string indicating the type of distribution for the parametric model. The options are "exponential", "weibull", "gamma", "lnorm", "loglogistic" and "generalgamma".

bw

A character string specifying the bandwidth estimation method. The possible options are "NR" for the normal reference, the plug-in "PI" and the cross-validation "CV". The default is the "NR" normal reference method. It is also possible to use a numeric value.

ktype

A character string giving the type of kernel distribution to be used for smoothing the ROC curve: "normal", "epanechnikov", "biweight", or "triweight". By default, the "normal" kernel is used.

len

The length of the grid points for ROC estimation. Default is 151.

B

The number of bootstrap samples to be used for variance estimation. The default is 0, no variance estimation.

alpha

The significance level. The default is 0.05.

plot

The logical parameter to see the ROC curve plot. The default is TRUE.

Details

This function implements the time-dependent ROC curve and the corresponding AUC using model-based and nonparametric methods for estimating the conditional survival function. The empirical (non-smoothed) ROC estimate and the smoothed ROC estimate with/without boundary correction can be obtained using this function. The smoothed ROC curve estimators require selecting a bandwidth parameter for smoothing the ROC curve. To this end, three data-driven methods are implemented: the normal reference "NR", the plug-in "PI" and the cross-validation "CV". See Beyene and El Ghouch (2020) for details.
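Once each subject has an estimated probability of being a case by time t, an empirical cumulative/dynamic ROC curve can be obtained by using these probabilities as weights. The sketch below is a base R illustration of that weighting idea and makes no claim to match the internals of IntROC; the helper name weighted_emp_roc, the convention that larger marker values indicate higher risk, and the exclusion of self-pairs in the AUC are assumptions.

```r
# Hypothetical helper (not part of cenROC): empirical ROC quantities from
# per-subject event probabilities `positive` = P(T <= t | data).
weighted_emp_roc <- function(M, positive) {
  negative <- 1 - positive
  cuts <- sort(unique(M), decreasing = TRUE)
  # cumulative sensitivity and dynamic specificity at each cutoff
  sens <- vapply(cuts, function(ct) sum(positive * (M > ct)) / sum(positive),
                 numeric(1))
  spec <- vapply(cuts, function(ct) sum(negative * (M <= ct)) / sum(negative),
                 numeric(1))
  # AUC as a weighted concordance probability; ties count one half
  conc <- outer(M, M, ">") + 0.5 * outer(M, M, "==")
  pair_w <- outer(positive, negative)
  diag(pair_w) <- 0                      # a subject is not compared with itself
  list(cutoffs = cuts, sens = sens, spec = spec,
       AUC = sum(pair_w * conc) / sum(pair_w))
}

M <- c(1, 2, 3, 4)
perfect <- weighted_emp_roc(M, positive = c(0, 0, 1, 1))
fuzzy <- weighted_emp_roc(M, positive = c(0.1, 0.2, 0.7, 0.9))
```

When every probability is 0 or 1 this is the ordinary empirical ROC for a binary outcome; fractional probabilities spread each subject across the case and control groups.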

Value

Returns the following items:

ROC The vector of estimated ROC values. These will be numbers between zero and one.

U The vector of grid points used.

AUC A data frame of dimension 1 × 4. The columns are: AUC, standard error of AUC, and the lower and upper limits of the bootstrap CI.

bw The computed value of bandwidth. For the empirical method this is always NA.

Dt The vector of estimated event status.

M The vector of Marker values.

References

Beyene, K. M. and El Ghouch, A. (2022). Time-dependent ROC curve estimation for interval-censored data. Biometrical Journal, 64, 1056–1074.

Beyene, K. M. and El Ghouch, A. (2020). Smoothed time-dependent receiver operating characteristic curve for right censored survival data. Statistics in Medicine. 39: 3373–3396.

Examples

library(cenROC)

data(hds)

est = IntROC(L=hds$L, R=hds$R, M=hds$M, t=2)
est$AUC

The normal reference bandwidth selection for weighted data

Description

This function computes the data-driven bandwidth for smoothing the ROC (or distribution) function using the NR method of Beyene and El Ghouch (2020). This is an extension of the classical (unweighted) normal reference bandwidth selection method to the case of weighted data.

Usage

NR(X, wt, ktype = "normal")

Arguments

X

The numeric data vector.

wt

The non-negative weight vector.

ktype

A character string giving the type of kernel to be used: "normal", "epanechnikov", "biweight", or "triweight". By default, the "normal" kernel is used.

Details

See Beyene and El Ghouch (2020) for details.
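For orientation, a normal reference rule for kernel distribution function estimation can be sketched as below. With a normal kernel and a normal reference distribution with standard deviation sigma, the usual asymptotic argument gives h = (4 / n)^(1/3) * sigma, roughly 1.587 * sigma * n^(-1/3). The sketch simply plugs a weighted standard deviation into that formula. The helper name weighted_nr_bw, the use of the raw sample size n and the restriction to the normal kernel are assumptions, and the package may treat the weights differently.

```r
# Hypothetical helper (not part of cenROC): normal reference bandwidth for a
# weighted kernel distribution estimate with the normal kernel.
weighted_nr_bw <- function(X, wt) {
  wt <- wt / sum(wt)                       # normalise the weights
  mu <- sum(wt * X)                        # weighted mean
  s <- sqrt(sum(wt * (X - mu)^2))          # weighted standard deviation
  4^(1 / 3) * s * length(X)^(-1 / 3)
}

set.seed(3)
X <- rnorm(100)
wt <- runif(100)
h_nr <- weighted_nr_bw(X, wt)
```

Other kernels change the leading constant through the kernel's second moment and the integral of x k(x) K(x).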

Value

Returns the computed value for the bandwidth parameter.

Author(s)

Kassu Mehari Beyene and Anouar El Ghouch

References

Beyene, K. M. and El Ghouch, A. (2020). Smoothed time-dependent receiver operating characteristic curve for right censored survival data. Statistics in Medicine. 39: 3373–3396.

Examples

library(cenROC)

X <- rnorm(100) # random data vector
wt <- runif(100) # weight vector

## Normal reference bandwidth selection
NR(X = X, wt = wt)$bw

The plug-in bandwidth selection for weighted data

Description

This function computes the data-driven bandwidth for smoothing the ROC (or distribution) function using the PI method of Beyene and El Ghouch (2020). This is an extension of the classical (unweighted) direct plug-in bandwidth selection method to the case of weighted data.

Usage

PI(X, wt, ktype = "normal")

Arguments

X

The numeric data vector.

wt

The non-negative weight vector.

ktype

A character string giving the type of kernel to be used: "normal", "epanechnikov", "biweight", or "triweight". By default, the "normal" kernel is used.

Details

See Beyene and El Ghouch (2020) for details.

Value

Returns the computed value for the bandwidth parameter.

Author(s)

Kassu Mehari Beyene and Anouar El Ghouch

References

Beyene, K. M. and El Ghouch, A. (2020). Smoothed time-dependent receiver operating characteristic curve for right censored survival data. Statistics in Medicine. 39: 3373–3396.

Examples

library(cenROC)

X <- rnorm(100) # random data vector
wt <- runif(100) # weight vector

## Plug-in bandwidth selection
PI(X = X, wt = wt)$bw

ROC estimation function

Description

ROC estimation function

Usage

RocFun(U, D, M, bw = "NR", method, ktype)

Arguments

U

The vector of grid points where the ROC curve is estimated.

D

The event indicator.

M

The numeric vector of marker values for which the time-dependent ROC curve is computed.

bw

The bandwidth parameter for smoothing the ROC function. The possible options are NR normal reference method; PI plug-in method and CV cross-validation method. The default is the NR normal reference method.

method

The method of ROC curve estimation. The possible options are "emp" for the empirical method, "untra" for the smoothed estimate without boundary correction, and "tra" for the smoothed estimate with boundary correction.

ktype

A character string giving the type of kernel to be used: "normal", "epanechnikov", "biweight", or "triweight".

Author(s)

Beyene K. Mehari and El Ghouch Anouar

References

Beyene, K. M. and El Ghouch, A. (2020). Smoothed time-dependent receiver operating characteristic curve for right censored survival data. Statistics in Medicine. 39: 3373–3396.

Estimation of the time-dependent ROC curve for right censored survival data

Description

This function computes the time-dependent ROC curve for right censored survival data using the cumulative sensitivity and dynamic specificity definitions. The ROC curves can be either empirical (non-smoothed) or smoothed with/without boundary correction. It also calculates the time-dependent area under the ROC curve (AUC).

Usage

cenROC(Y, M, censor, t, U = NULL, h = NULL, bw = "NR", method = "tra",
    ktype = "normal", ktype1 = "normal", B = 0, alpha = 0.05, plot = "TRUE")

Arguments

Y

The numeric vector of event-times or observed times.

M

The numeric vector of marker values for which the time-dependent ROC curve is computed.

censor

The censoring indicator, 1 if event, 0 otherwise.

t

A scalar time point at which the time-dependent ROC curve is computed.

U

The vector of grid points where the ROC curve is estimated. The default is a sequence of 151 numbers between 0 and 1.

h

A scalar bandwidth for the calculation of Beran's weights. The default is the value obtained by using the method of Sheather and Jones (1991).

bw

A character string specifying the bandwidth estimation method for the ROC itself. The possible options are "NR" for the normal reference, the plug-in "PI" and the cross-validation "CV". The default is the "NR" normal reference method. The user can also introduce a numerical value.

method

The method of ROC curve estimation. The possible options are "emp" for the empirical method, "untra" for the smoothed estimate without boundary correction, and "tra" for the smoothed estimate with boundary correction. The default is "tra", the smoothed ROC curve estimate with boundary correction.

ktype

A character string giving the type of kernel distribution to be used for smoothing the ROC curve: "normal", "epanechnikov", "biweight", or "triweight". By default, the "normal" kernel is used.

ktype1

A character string specifying the kernel used for the calculation of Beran's weights. The possible options are "normal", "epanechnikov", "tricube", "boxcar", "triangular", or "quartic". The default is the "normal" kernel.

B

The number of bootstrap samples to be used for variance estimation. The default is 0, no variance estimation.

alpha

The significance level. The default is 0.05.

plot

The logical parameter to see the ROC curve plot. The default is TRUE.

Details

The empirical (non-smoothed) ROC estimate and the smoothed ROC estimate with/without boundary correction can be obtained using this function. The smoothed ROC curve estimators require selecting two bandwidth parameters: one for Beran’s weight calculation and one for smoothing the ROC curve. For the latter, three data-driven methods are implemented: the normal reference "NR", the plug-in "PI" and the cross-validation "CV". To select the bandwidth parameter needed for Beran’s weight calculation, the plug-in method of Sheather and Jones (1991) is used by default, but it is also possible to supply a numeric value. See Beyene and El Ghouch (2020) for details.
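To make the role of the first bandwidth concrete, the following base R sketch computes a Beran-type kernel estimate of the conditional survival function S(t | M = m), with the bandwidth chosen by stats::bw.SJ, the Sheather and Jones selector shipped with R. It is a textbook-style illustration rather than the package's code: the helper name beran_surv, the normal kernel, the simulated data and the assumption of no tied observation times are choices made here.

```r
# Hypothetical helper (not part of cenROC): Beran-type estimate of S(t | M = m).
beran_surv <- function(t, m, Y, M, censor, h = stats::bw.SJ(M)) {
  k <- dnorm((m - M) / h)                 # kernel weights around marker value m
  B <- k / sum(k)                         # normalised (Nadaraya-Watson) weights
  ord <- order(Y)                         # process subjects in time order
  Y <- Y[ord]; censor <- censor[ord]; B <- B[ord]
  at_risk <- rev(cumsum(rev(B)))          # total weight of subjects with Y_j >= Y_i
  # product-limit form: only observed events up to time t contribute a factor
  factors <- ifelse(Y <= t & censor == 1, 1 - B / at_risk, 1)
  prod(factors)
}

set.seed(2)
M <- rnorm(80)
event_time <- rexp(80, rate = exp(0.5 * M))   # simulated event times (assumed model)
cens_time <- rexp(80, rate = 0.3)             # simulated censoring times
Y <- pmin(event_time, cens_time)
censor <- as.numeric(event_time <= cens_time)
S_hat <- beran_surv(t = 1, m = 0, Y, M, censor)
```

With a very large bandwidth all subjects receive the same weight and the estimate reduces to the Kaplan-Meier estimate, which is a convenient sanity check.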

Value

Returns the following items:

ROC The vector of estimated ROC values. These will be numbers between zero and one.

U The vector of grid points used.

AUC A data frame of dimension 1 × 4. The columns are: AUC, standard error of AUC, and the lower and upper limits of the bootstrap CI.

bw The computed value of bandwidth. For the empirical method this is always NA.

Dt The vector of estimated event status.

M The vector of Marker values.

Author(s)

Kassu Mehari Beyene and Anouar El Ghouch

References

Beyene, K. M. and El Ghouch, A. (2020). Smoothed time-dependent receiver operating characteristic curve for right censored survival data. Statistics in Medicine. 39: 3373–3396.

Sheather, S. J. and Jones, M. C. (1991). A reliable data-based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society. Series B (Methodological) 53(3): 683–690.

Examples

library(cenROC)

data(mayo)

est = cenROC(Y=mayo$time, M=mayo$mayoscore5, censor=mayo$censor, t=365*6)
est$AUC

Compute the conditional survival function for Interval Censored Survival Data

Description

Computes the survival function for interval censored survival data using a spline-based constrained maximum likelihood estimator. The likelihood is maximized with the generalized gradient projection method.

Usage

condS(L, R, M, Delta, t, m)

Arguments

L

The numeric vector of left limits of the observed time. For left censored observations, L == 0.

R

The numeric vector of right limits of the observed time. For right censored observations, R == Inf.

M

An array containing the marker levels for the samples.

Delta

An array of indicators for the censoring type: 1, 2 and 3 indicate that the event happened before the left bound time, within the defined time range, and after it, respectively.

t

A scalar indicating the prediction time.

m

A scalar for the cutoff of the marker variable.

References

Wu, Yuan; Zhang, Ying. Partially monotone tensor spline estimation of the joint distribution function with bivariate current status data. Ann. Statist. 40, 2012, 1609-1636 <doi:10.1214/12-AOS1016>

Derivative of normal distribution

Description

Derivative of normal distribution

Usage

dnorkernel(ord, X)

Arguments

ord

The order of derivative.

X

The numeric data vector.
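Derivatives of the standard normal density have a closed form: the r-th derivative equals (-1)^r He_r(x) phi(x), where He_r is the probabilists' Hermite polynomial and phi is the density. The base R sketch below uses the recursion He_{r+1}(x) = x He_r(x) - r He_{r-1}(x). The helper name dnorm_deriv is an assumption, and the sketch is not claimed to be how dnorkernel is written.

```r
# Hypothetical helper (not part of cenROC): ord-th derivative of the standard
# normal density, evaluated at each element of X.
dnorm_deriv <- function(ord, X) {
  if (ord == 0) return(dnorm(X))
  he_prev <- rep(1, length(X))            # He_0(x) = 1
  he <- X                                 # He_1(x) = x
  if (ord >= 2) {
    for (r in 1:(ord - 1)) {
      he_next <- X * he - r * he_prev     # He_{r+1} = x He_r - r He_{r-1}
      he_prev <- he
      he <- he_next
    }
  }
  (-1)^ord * he * dnorm(X)
}

x <- c(-1, 0, 0.5, 2)
d2 <- dnorm_deriv(2, x)                   # equals (x^2 - 1) * dnorm(x)
```

Derivatives of this kind typically appear in plug-in bandwidth selectors, where functionals of the density derivatives have to be estimated.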

NASA Hypobaric Decompression Sickness Marker Data

Description

This data contains the marker values with the left and right limits of the observed time for the subjects in NASA Hypobaric Decompression Sickness Data.

Usage

data(hds)

Format

This is a data frame with 238 observations and 3 variables: L (left limit of the observed time), R (right limit of the observed time) and M (marker). The marker is a score derived by combining the covariates Age, Sex, TR360, and Noadyn.

References

Beyene, K. M. and El Ghouch, A. (2022). Time-dependent ROC curve estimation for interval-censored data. Biometrical Journal, 64, 1056–1074.

Numerical Integral function using Simpson's rule

Description

Numerical Integral function using Simpson's rule

Usage

integ(x, fx, method, n.pts = 256)

Arguments

x

The numeric data vector.

fx

The function.

method

The character string specifying the method of numerical integration. The possible options are trap for the trapezoidal rule and simps for Simpson's rule.

n.pts

Number of points.
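The two rules can be sketched in base R as follows, assuming that x is an equally spaced, increasing grid and that fx holds the function values on that grid. The helper name num_integral and this reading of the arguments are assumptions; how integ uses n.pts is not shown in this help page, so it is left out of the sketch.

```r
# Hypothetical helper (not part of cenROC): composite trapezoidal and
# Simpson's rules on an equally spaced grid.
num_integral <- function(x, fx, method = c("trap", "simps")) {
  method <- match.arg(method)
  n <- length(x)
  h <- (x[n] - x[1]) / (n - 1)                   # grid spacing
  if (method == "trap") {
    return(h * (sum(fx) - (fx[1] + fx[n]) / 2))  # end points get weight 1/2
  }
  stopifnot(n >= 3, n %% 2 == 1)                 # Simpson needs an even number of intervals
  idx4 <- seq(2, n - 1, by = 2)                  # interior points with weight 4
  idx2 <- if (n > 3) seq(3, n - 2, by = 2) else integer(0)  # weight 2
  h / 3 * (fx[1] + 4 * sum(fx[idx4]) + 2 * sum(fx[idx2]) + fx[n])
}

x <- seq(0, 1, length.out = 257)
simps <- num_integral(x, x^2, "simps")   # Simpson is exact for cubics
trap <- num_integral(x, x^2, "trap")
```

Simpson's rule is exact for polynomials up to degree three, while the trapezoidal rule has an error of order h^2, so the two results differ slightly for the integral of x^2 on [0, 1].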

Distribution function without the ith observation

Description

Distribution function without the ith observation

Usage

ker_dis_i(X, y, wt, ktype, bw)

Arguments

X

The numeric data vector.

y

The vector where the kernel estimation is computed.

wt

The non-negative weight vector.

ktype

A character string giving the type of kernel to be used: "normal", "epanechnikov", "biweight", or "triweight".

bw

A numeric bandwidth value.

Value

Returns the estimated value for the bandwidth parameter.

Author(s)

Kassu Mehari Beyene and Anouar El Ghouch

Function to evaluate the matrix of data vector minus the grid points divided by the bandwidth value.

Description

Function to evaluate the matrix of data vector minus the grid points divided by the bandwidth value.

Usage

kfunc(ktype = "normal", difmat)

Arguments

ktype

A character string giving the type of kernel to be used: "normal", "epanechnikov", "biweight", or "triweight". By default, the "normal" kernel is used.

difmat

A numeric matrix of sample data (X) minus evaluation points (x0) divided by bandwidth value (bw).

Value

Returns the matrix resulting from evaluating difmat.

Kernel distribution function

Description

Kernel distribution function

Usage

kfunction(ktype, X)

Arguments

ktype

A character string giving the type of kernel to be used: "normal", "epanechnikov", "biweight", or "triweight".

X

A numeric vector of sample data.

Value

Returns a vector resulting from evaluating X.
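For reference, the distribution functions of the four kernels in their standard textbook parameterisation can be written in closed form, as in the base R sketch below. The helper name kernel_cdf and the choice of support [-1, 1] for the compact kernels are assumptions; the package may scale its kernels differently.

```r
# Hypothetical helper (not part of cenROC): kernel distribution functions
# K(x), i.e. the integral of the kernel density from -Inf to x.
kernel_cdf <- function(ktype = c("normal", "epanechnikov", "biweight", "triweight"), X) {
  ktype <- match.arg(ktype)
  if (ktype == "normal") return(pnorm(X))
  x <- pmin(pmax(X, -1), 1)       # compact kernels: clamp to the support [-1, 1]
  switch(ktype,
    epanechnikov = 0.5 + 0.75 * (x - x^3 / 3),
    biweight     = 0.5 + 15 / 16 * (x - 2 * x^3 / 3 + x^5 / 5),
    triweight    = 0.5 + 35 / 32 * (x - x^3 + 3 * x^5 / 5 - x^7 / 7))
}

vals <- kernel_cdf("epanechnikov", c(-2, 0, 0.5, 2))
```

Each function equals 0 below the support, 0.5 at the origin and 1 above the support, which follows from integrating the corresponding symmetric kernel density.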

Mayo Marker Data

Description

Two marker values with event time and censoring status for the subjects in Mayo PBC data.

Usage

data(mayo)

Format

A data frame with 312 observations and 4 variables: time (event time/censoring time), censor (censoring indicator), mayoscore4, mayoscore5. The two scores are derived from 4 and 5 covariates respectively.

References

Heagerty, P. J., and Zheng, Y. (2005). Survival model predictive accuracy and ROC curves. Biometrics, 61(1), 92-105.

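The value of the squared integral (∫ x^2 k(x) dx)^2 and of the integral ∫ x k(x) K(x) dx

Description

The value of the squared integral (∫ x^2 k(x) dx)^2 and of the integral ∫ x k(x) K(x) dx, where k is the kernel density and K the kernel distribution function.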

Usage

muro(ktype)

Arguments

ktype

A character string giving the type of kernel to be used: "normal", "epanechnikov", "biweight", or "triweight".
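These two kernel constants can be checked numerically with stats::integrate, as in the sketch below. Here k denotes the kernel density and K its distribution function, and the title is read as asking for the square of the integral of x^2 k(x) together with the integral of x k(x) K(x). The helper name kernel_constants is an assumption, and the package presumably stores closed-form values rather than integrating numerically.

```r
# Hypothetical helper (not part of cenROC): numerical kernel constants.
kernel_constants <- function(k, K, lower = -Inf, upper = Inf) {
  mu2 <- integrate(function(x) x^2 * k(x), lower, upper)$value
  rho <- integrate(function(x) x * k(x) * K(x), lower, upper)$value
  c(mu2_squared = mu2^2, rho = rho)
}

# Normal kernel: density dnorm, distribution function pnorm.
normal_const <- kernel_constants(dnorm, pnorm)

# Epanechnikov kernel on [-1, 1] (standard parameterisation, assumed).
k_ep <- function(x) 0.75 * (1 - x^2)
K_ep <- function(x) 0.5 + 0.75 * (x - x^3 / 3)
epan_const <- kernel_constants(k_ep, K_ep, lower = -1, upper = 1)
```

For the normal kernel the exact values are 1 and 1 / (2 * sqrt(pi)); for the Epanechnikov kernel on [-1, 1] they are 1/25 and 9/70.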

Weighted inter-quartile range estimation

Description

Weighted inter-quartile range estimation

Usage

wIQR(X, wt)

Arguments

X

The numeric data vector.

wt

The non-negative weight vector.

Function to select the bandwidth parameter needed for smoothing the time-dependent ROC curve.

Description

This function computes the data-driven bandwidth value for smoothing the ROC curve. It contains three methods: the normal reference, the plug-in and the cross-validation methods.

Usage

wbw(X, wt, bw = "NR", ktype = "normal")

Arguments

X

The numeric data vector.

wt

The non-negative weight vector.

bw

A character string specifying the bandwidth selection method. The possible options are "NR" for the normal reference, the plug-in "PI" and cross-validation "CV".

ktype

A character string indicating the type of kernel function: "normal", "epanechnikov", "biweight", or "triweight". Default is "normal" kernel.

Value

Returns the estimated value for the bandwidth parameter.

Author(s)

Kassu Mehari Beyene and Anouar El Ghouch

References

Beyene, K. M. and El Ghouch, A. (2020). Smoothed time-dependent receiver operating characteristic curve for right censored survival data. Statistics in Medicine. 39: 3373–3396.

Weighted quantile estimation

Description

Weighted quantile estimation

Usage

wquantile(X, wt, p = 0.5)

Arguments

X

The numeric data vector.

wt

The non-negative weight vector.

p

The probability, between 0 and 1, of the quantile to be estimated. The default is 0.5.
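One common definition of a weighted quantile inverts the weighted empirical distribution function: sort the data, accumulate the normalised weights, and return the smallest value whose cumulative weight reaches p. The base R sketch below follows that definition. The helper names weighted_quantile and weighted_iqr are assumptions, and wquantile and wIQR may use a different interpolation rule.

```r
# Hypothetical helpers (not part of cenROC).
weighted_quantile <- function(X, wt, p = 0.5) {
  ord <- order(X)
  X <- X[ord]
  cw <- cumsum(wt[ord]) / sum(wt)     # weighted empirical distribution function
  X[which(cw >= p)[1]]                # smallest value with cumulative weight >= p
}

weighted_iqr <- function(X, wt) {
  weighted_quantile(X, wt, 0.75) - weighted_quantile(X, wt, 0.25)
}

med_equal <- weighted_quantile(1:5, rep(1, 5))         # plain median
med_skew <- weighted_quantile(c(1, 2, 3), c(1, 1, 8))  # weight concentrated on 3
iqr_equal <- weighted_iqr(1:8, rep(1, 8))
```

With equal weights this coincides with the inverse of the ordinary empirical distribution function.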

Weighted variance estimation

Description

Weighted variance estimation

Usage

wvar(X, wt, na.rm = FALSE)

Arguments

X

The numeric data vector.

wt

The non-negative weight vector.

na.rm

A logical value indicating whether missing values should be removed before the computation. The default is FALSE.
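A weighted variance can be sketched in base R as the weighted mean of squared deviations from the weighted mean. The helper name weighted_var and the normalisation by the sum of the weights (rather than a bias-corrected denominator) are assumptions; wvar may normalise differently.

```r
# Hypothetical helper (not part of cenROC): weighted variance.
weighted_var <- function(X, wt, na.rm = FALSE) {
  if (na.rm) {
    keep <- !is.na(X) & !is.na(wt)    # drop pairs with a missing value
    X <- X[keep]
    wt <- wt[keep]
  }
  wt <- wt / sum(wt)                  # normalise the weights
  mu <- sum(wt * X)                   # weighted mean
  sum(wt * (X - mu)^2)
}

v_equal <- weighted_var(c(1, 2, 3, 4), rep(1, 4))
v_clean <- weighted_var(c(1, 2, 3, 4, NA), rep(1, 5), na.rm = TRUE)
```

With equal weights this is the population variance (denominator n), which is why the two calls above give the same value.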

Computes the optimal cutoff point using the Youden index criterion

Description

This function computes the optimal cutoff point using the Youden index criterion for both right and interval censored time-to-event data. The Youden index estimator can be either empirical (non-smoothed) or smoothed with/without boundary correction.

Usage

youden(est, plot = "FALSE")

Arguments

est

The object returned either by cenROC or IntROC.

plot

The logical parameter to see the ROC curve plot along with the Youden index. The default is FALSE.

Details

In medical decision-making, obtaining the optimal cutoff value is crucial for identifying subjects at high risk of experiencing the event of interest. It is therefore necessary to select a marker value that classifies subjects into healthy and diseased groups. Several methods for selecting the optimal cutoff point have been proposed in the literature. This package includes only the Youden index criterion.
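The Youden index at a cutoff c is J(c) = sensitivity(c) + specificity(c) - 1, and the optimal cutoff is the value of c that maximises J. The base R sketch below applies this to sensitivity and specificity values on a grid of cutoffs. The helper name youden_from_curve and the toy numbers are assumptions made for illustration; the sketch does not reproduce how youden extracts these quantities from a cenROC or IntROC object.

```r
# Hypothetical helper (not part of cenROC): Youden index from a grid of
# cutoffs with their sensitivities and specificities.
youden_from_curve <- function(cutoffs, sens, spec) {
  J <- sens + spec - 1                 # Youden index at each cutoff
  best <- which.max(J)                 # first cutoff attaining the maximum
  list(Youden.index = J[best], cutopt = cutoffs[best],
       sens = sens[best], spec = spec[best])
}

# Toy values (assumed, not taken from any data set).
cutoffs <- c(1, 2, 3, 4)
sens <- c(0.95, 0.80, 0.55, 0.20)
spec <- c(0.30, 0.70, 0.85, 0.97)
res <- youden_from_curve(cutoffs, sens, spec)
```

On an ROC curve, J is the vertical distance between the curve and the diagonal, so the optimal cutoff corresponds to the point of the curve farthest above the chance line.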

Value

Returns the following items:

Youden.index The maximum Youden index value.

cutopt The optimal cutoff value.

sens The sensitivity corresponding to the optimal cutoff value.

spec The specificity corresponding to the optimal cutoff value.

References

Beyene, K. M. and El Ghouch, A. (2022). Time-dependent ROC curve estimation for interval-censored data. Biometrical Journal, 64, 1056–1074.

Youden, W.J. (1950). Index for rating diagnostic tests. Cancer 3, 32–35.

Examples

library(cenROC)

# Right censored data
data(mayo)

resu <- cenROC(Y=mayo$time, M=mayo$mayoscore5, censor=mayo$censor, t=365*6, plot="FALSE")
youden(resu,  plot="TRUE")

# Interval censored data
data(hds)

resu1 = IntROC(L=hds$L, R=hds$R, M=hds$M, t=2)
youden(resu1,  plot="TRUE")