Type: | Package |
Title: | Estimating the Error Variance in a High-Dimensional Linear Model |
Version: | 0.9.0 |
Maintainer: | Guo Yu <gy63@cornell.edu> |
Description: | Implementation of the two error variance estimation methods in high-dimensional linear models of Yu, Bien (2017) <doi:10.48550/arXiv.1712.02412>. |
URL: | https://arxiv.org/abs/1712.02412 |
BugReports: | https://github.com/hugogogo/natural/issues |
License: | GPL-3 |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 6.0.1 |
Imports: | Matrix, glmnet |
Suggests: | knitr, rmarkdown |
VignetteBuilder: | knitr |
NeedsCompilation: | yes |
Packaged: | 2018-01-16 01:32:01 UTC; hugo |
Author: | Guo Yu [aut, cre] |
Repository: | CRAN |
Date/Publication: | 2018-01-16 10:35:43 UTC |
natural: Natural and Organic lasso estimates of error variance in high-dimensional linear models
Description
The package contains implementations of the two methods introduced in Yu and Bien (2017) https://arxiv.org/abs/1712.02412.
Details
The main functions are nlasso_cv, olasso_cv, and olasso.
Get the two (theoretical) values of lambdas used in the organic lasso
Description
Get the two (theoretical) values of lambdas used in the organic lasso
Usage
getLam_olasso(x)
Arguments
x |
design matrix |
Get the two (theoretical) values of lambdas used in scaled lasso
Description
Get the two (theoretical) values of lambdas used in scaled lasso
Usage
getLam_slasso(n, p)
Arguments
n |
number of observations |
p |
number of features |
Generate sparse linear model and random samples
Description
Generate a design matrix and response following the linear model y = X \beta + \epsilon, where \epsilon ~ N(0, \sigma^2) and X ~ N(0, \Sigma).
Usage
make_sparse_model(n, p, alpha, rho, snr, nsim)
Arguments
n |
the sample size |
p |
the number of features |
alpha |
sparsity, i.e., |
rho |
pairwise correlation among features |
snr |
signal to noise ratio, defined as |
nsim |
the number of simulations |
Value
A list object containing:
x: The n by p design matrix.
y: The n by nsim matrix of response vectors, each column representing one replication of the simulation.
beta: The true regression coefficient vector.
sigma: The true error standard deviation.
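The data-generating process described above can be sketched in base R. This is a hedged illustration, not the package's implementation: the name `make_model_sketch` is hypothetical, and because the exact definitions of `alpha` and `snr` are not reproduced in this manual, the sketch takes the number of nonzero coefficients `k` and the error standard deviation `sigma` directly. It also assumes an equicorrelated \Sigma (unit diagonal, off-diagonal entries `rho`) and sets every nonzero coefficient to 1.

```r
# Hypothetical sketch: simulate y = X beta + epsilon with X ~ N(0, Sigma).
# Assumptions (not taken from the package): equicorrelated Sigma, the first
# k coefficients equal to 1, and sigma supplied directly instead of via snr.
make_model_sketch <- function(n, p, k, rho, sigma, nsim) {
  Sigma <- matrix(rho, p, p)
  diag(Sigma) <- 1
  R <- chol(Sigma)                        # upper triangular, t(R) %*% R = Sigma
  x <- matrix(rnorm(n * p), n, p) %*% R   # each row is a draw from N(0, Sigma)
  beta <- numeric(p)
  beta[seq_len(k)] <- 1                   # sparse coefficient vector
  noise <- matrix(rnorm(n * nsim, sd = sigma), n, nsim)
  # the length-n signal is recycled down each column of the noise matrix
  y <- as.numeric(x %*% beta) + noise
  list(x = x, y = y, beta = beta, sigma = sigma)
}

set.seed(123)
sim_sketch <- make_model_sketch(n = 50, p = 20, k = 3, rho = 0.6,
                                sigma = 1, nsim = 2)
```

The returned list mirrors the four components listed under Value, so `sim_sketch$y[, 1]` is one replication of the response.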
Cross-validation for natural lasso
Description
Provide the natural lasso estimate (of the error standard deviation), using cross-validation to select the tuning parameter value. The output also includes the cross-validation results for the naive estimate and the degree-of-freedom adjusted estimate of the error standard deviation.
Usage
nlasso_cv(x, y, lambda = NULL, intercept = TRUE, nlam = 100,
flmin = 0.01, nfold = 5, foldid = NULL, thresh = 1e-08,
glmnet_output = NULL)
Arguments
x |
An |
y |
A response vector of size |
lambda |
A user specified list of tuning parameter. Default to be NULL, and the program will compute its own |
intercept |
Indicator of whether intercept should be fitted. Default to be |
nlam |
The number of |
flmin |
The ratio of the smallest and the largest values in |
nfold |
Number of folds in cross-validation. Default value is 5. If each fold gets too few observations, a warning is thrown and the minimal |
foldid |
A vector of length |
thresh |
Threshold value for underlying optimization algorithm to claim convergence. Default to be |
glmnet_output |
Should the estimate be computed using a user-specified output from |
Value
A list object containing:
n and p: The dimensions of the problem.
lambda: The path of tuning parameters used.
beta: Estimate of the regression coefficients, in the original scale, corresponding to the tuning parameter selected by cross-validation.
a0: Estimate of the intercept.
mat_mse: The estimated prediction error on the test sets in cross-validation. A matrix of size nlam by nfold. If glmnet_output is not NULL, then mat_mse will be NULL.
cvm: The averaged estimated prediction error on the test sets over K folds.
cvse: The standard error of the estimated prediction error on the test sets over K folds.
ibest: The index in lambda that attains the minimal mean cross-validated error.
foldid: Fold assignment. A vector of length n.
nfold: The number of folds used in cross-validation.
sig_obj: Natural lasso estimate of the standard deviation of the error, with the optimal tuning parameter selected by cross-validation.
sig_obj_path: Natural lasso estimates of the standard deviation of the error. A vector of length nlam.
sig_naive: Naive estimate of the error standard deviation based on lasso regression, i.e., ||y - X \hat{\beta}||_2 / \sqrt n, selected by cross-validation.
sig_naive_path: Naive estimates of the standard deviation of the error based on lasso regression. A vector of length nlam.
sig_df: Degree-of-freedom adjusted estimate of the standard deviation of the error, selected by cross-validation. See Reid et al. (2016).
sig_df_path: Degree-of-freedom adjusted estimates of the standard deviation of the error. A vector of length nlam.
type: Whether the output is of a natural or an organic lasso.
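How the cross-validation summaries relate to one another can be sketched in base R. The helper names below are hypothetical, and two details are assumptions rather than documented behaviour: folds are assigned by randomly permuting balanced labels, and `cvse` is taken to be the per-lambda standard deviation across folds divided by the square root of the number of folds.

```r
# Hypothetical sketch of the cross-validation bookkeeping (not package code).

# Balanced random fold labels 1..nfold for n observations (assumed scheme).
make_foldid_sketch <- function(n, nfold) {
  sample(rep(seq_len(nfold), length.out = n))
}

# Summarise an nlam-by-nfold matrix of test-set prediction errors.
cv_summary_sketch <- function(mat_mse) {
  nfold <- ncol(mat_mse)
  cvm <- rowMeans(mat_mse)                     # mean error per lambda
  cvse <- apply(mat_mse, 1, sd) / sqrt(nfold)  # assumed standard-error formula
  ibest <- which.min(cvm)                      # index of the minimal mean error
  list(cvm = cvm, cvse = cvse, ibest = ibest)
}

# Toy error matrix: 3 values of lambda (rows), 2 folds (columns).
toy_mse <- matrix(c(4, 2, 3,
                    6, 2, 5), nrow = 3)
toy_cv <- cv_summary_sketch(toy_mse)
```

Under these assumptions, the reported `sig_obj` would correspond to the entry of `sig_obj_path` at position `ibest`.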
Examples
set.seed(123)
sim <- make_sparse_model(n = 50, p = 200, alpha = 0.6, rho = 0.6, snr = 2, nsim = 1)
nl_cv <- nlasso_cv(x = sim$x, y = sim$y[, 1])
Fit a linear model with natural lasso
Description
Calculate a solution path of the natural lasso estimate (of error standard deviation) with a list of tuning parameter values. In particular, this function solves the lasso problems and returns the lasso objective function values as estimates of the error variance:
\hat{\sigma}^2_{\lambda} = \min_{\beta} ||y - X \beta||_2^2 / n + 2 \lambda ||\beta||_1.
The output also includes a path of naive estimates and a path of degree of freedom adjusted estimates of the error standard deviation.
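To make the displayed formula concrete, the following base-R sketch minimizes the lasso objective for a single \lambda and returns the square root of the minimal objective value as the estimate of the error standard deviation. It is an illustration under stated assumptions, not the package's solver: `natural_lasso_sketch` is a hypothetical name, plain cyclic coordinate descent is swapped in as the optimizer, and no intercept is fitted (the data are assumed to be centered).

```r
# Soft-thresholding operator used by the coordinate update.
soft_threshold <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

# Hypothetical sketch: minimise ||y - X beta||_2^2 / n + 2 * lambda * ||beta||_1
# by cyclic coordinate descent, then report sqrt(minimal objective value).
natural_lasso_sketch <- function(x, y, lambda, max_iter = 1000, tol = 1e-10) {
  n <- nrow(x)
  p <- ncol(x)
  beta <- numeric(p)
  r <- as.numeric(y)                 # residual y - x %*% beta at beta = 0
  col_ss <- colSums(x^2) / n         # ||x_j||_2^2 / n for each column
  for (iter in seq_len(max_iter)) {
    max_change <- 0
    for (j in seq_len(p)) {
      if (col_ss[j] == 0) next       # skip all-zero columns
      old <- beta[j]
      # x_j' (partial residual) / n, where the partial residual excludes x_j
      z <- sum(x[, j] * r) / n + col_ss[j] * old
      new <- soft_threshold(z, lambda) / col_ss[j]
      if (new != old) {
        r <- r - x[, j] * (new - old)
        beta[j] <- new
        max_change <- max(max_change, abs(new - old))
      }
    }
    if (max_change < tol) break
  }
  sigma2 <- sum(r^2) / n + 2 * lambda * sum(abs(beta))
  list(beta = beta, sigma_hat = sqrt(sigma2))
}

set.seed(123)
x_demo <- scale(matrix(rnorm(50 * 10), 50, 10), scale = FALSE)  # centered columns
y_demo <- 2 * x_demo[, 1] + rnorm(50)
y_demo <- y_demo - mean(y_demo)
fit_demo <- natural_lasso_sketch(x_demo, y_demo, lambda = 0.1)
```

Because the objective includes the penalty term, the resulting estimate is never smaller than the naive residual-based estimate at the same coefficient vector.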
Usage
nlasso_path(x, y, lambda = NULL, nlam = 100, flmin = 0.01,
thresh = 1e-08, intercept = TRUE, glmnet_output = NULL)
Arguments
x |
An |
y |
A response vector of size |
lambda |
A user specified list of tuning parameter. Default to be NULL, and the program will compute its own |
nlam |
The number of |
flmin |
The ratio of the smallest and the largest values in |
thresh |
Threshold value for the underlying optimization algorithm to claim convergence. Default to be |
intercept |
Indicator of whether intercept should be fitted. Default to be |
glmnet_output |
Should the estimate be computed using a user-specified output from |
Value
A list object containing:
n and p: The dimensions of the problem.
lambda: The path of tuning parameters used.
beta: Matrix of estimates of the regression coefficients, in the original scale. The matrix is of size p by nlam. The j-th column represents the estimate of the coefficients corresponding to the j-th tuning parameter in lambda.
a0: Estimate of the intercept. A vector of length nlam.
sig_obj_path: Natural lasso estimates of the error standard deviation. A vector of length nlam.
sig_naive_path: Naive estimates of the error standard deviation based on lasso regression, i.e., ||y - X \hat{\beta}||_2 / \sqrt n. A vector of length nlam.
sig_df_path: Degree-of-freedom adjusted estimates of the standard deviation of the error. A vector of length nlam. See Reid et al. (2016).
type: Whether the output is of a natural or an organic lasso.
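The two residual-based estimates can be written out for a single coefficient vector as a short base-R sketch. The function names are hypothetical. The naive formula follows the expression given above; the degree-of-freedom adjustment (residual sum of squares divided by n minus the number of nonzero coefficients) is an assumption based on the estimator usually attributed to Reid et al. (2016), and is not confirmed by this manual.

```r
# Hypothetical helpers; not part of the package's exported interface.

# Naive estimate: ||y - a0 - X beta||_2 / sqrt(n).
sigma_naive_sketch <- function(x, y, beta, a0 = 0) {
  res <- as.numeric(y) - a0 - as.numeric(x %*% beta)
  sqrt(sum(res^2) / length(res))
}

# Degree-of-freedom adjusted estimate (assumed form): the divisor n is
# replaced by n minus the number of nonzero coefficients.
sigma_df_sketch <- function(x, y, beta, a0 = 0) {
  res <- as.numeric(y) - a0 - as.numeric(x %*% beta)
  n <- length(res)
  df <- sum(beta != 0)
  if (df >= n) return(NA_real_)     # adjustment undefined in this case
  sqrt(sum(res^2) / (n - df))
}

# Tiny worked case: identity design, one nonzero coefficient.
x_toy <- diag(4)
y_toy <- c(1, 2, 3, 4)
beta_toy <- c(1, 0, 0, 0)
naive_toy <- sigma_naive_sketch(x_toy, y_toy, beta_toy)
df_toy <- sigma_df_sketch(x_toy, y_toy, beta_toy)
```

In the toy case the residual sum of squares is 29, so the naive estimate is sqrt(29 / 4) and the adjusted one is sqrt(29 / 3).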
Examples
set.seed(123)
sim <- make_sparse_model(n = 50, p = 200, alpha = 0.6, rho = 0.6, snr = 2, nsim = 1)
nl_path <- nlasso_path(x = sim$x, y = sim$y[, 1])
Error standard deviation estimation using organic lasso
Description
Solve the organic lasso problem
\tilde{\sigma}^2_{\lambda} = \min_{\beta} ||y - X \beta||_2^2 / n + 2 \lambda ||\beta||_1^2
with two pre-specified values of the tuning parameter: \lambda_1 = \log p / n, and \lambda_2, which is a Monte Carlo estimate of ||X^T e||_\infty^2 / n^2, where e is an n-dimensional standard normal vector.
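The two tuning parameter values can be approximated with a few lines of base R. This is a hedged sketch rather than the package's `getLam_olasso`: the name `lambda_olasso_sketch` and the argument `nrep` are hypothetical, and averaging over `nrep` independent draws is an assumption about how the Monte Carlo estimate is formed.

```r
# Hypothetical sketch of the two organic lasso tuning parameter values.
lambda_olasso_sketch <- function(x, nrep = 1000) {
  n <- nrow(x)
  p <- ncol(x)
  lam_1 <- log(p) / n
  # each column of e is one draw of an n-dimensional standard normal vector
  e <- matrix(rnorm(n * nrep), nrow = n, ncol = nrep)
  # crossprod(x, e) equals t(x) %*% e, a p-by-nrep matrix;
  # the column-wise maximum absolute entry is ||X^T e||_infinity per draw
  sup_norm_sq <- apply(abs(crossprod(x, e)), 2, max)^2
  lam_2 <- mean(sup_norm_sq) / n^2   # assumed summary: Monte Carlo average
  c(lam_1 = lam_1, lam_2 = lam_2)
}

set.seed(123)
x_lam <- matrix(rnorm(50 * 200), 50, 200)
lams <- lambda_olasso_sketch(x_lam, nrep = 200)
```

Unlike `lam_1`, the value of `lam_2` depends on the design matrix and on the random draws, so it changes with the seed.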
Usage
olasso(x, y, intercept = TRUE, thresh = 1e-08)
Arguments
x |
An |
y |
A response vector of size |
intercept |
Indicator of whether intercept should be fitted. Default to be |
thresh |
Threshold value for underlying optimization algorithm to claim convergence. Default to be |
Value
A list object containing:
n and p: The dimensions of the problem.
lam_1, lam_2: log(p) / n, and a Monte Carlo estimate of ||X^T e||_\infty^2 / n^2, where e is an n-dimensional standard normal vector.
a0_1, a0_2: Estimates of the intercept, corresponding to lam_1 and lam_2.
beta_1, beta_2: Organic lasso estimates of the regression coefficients, corresponding to lam_1 and lam_2.
sig_obj_1, sig_obj_2: Organic lasso estimates of the error standard deviation, corresponding to lam_1 and lam_2.
Examples
set.seed(123)
sim <- make_sparse_model(n = 50, p = 200, alpha = 0.6, rho = 0.6, snr = 2, nsim = 1)
ol <- olasso(x = sim$x, y = sim$y[, 1])
Cross-validation for organic lasso
Description
Provide the organic lasso estimate (of the error standard deviation), using cross-validation to select the tuning parameter value.
Usage
olasso_cv(x, y, lambda = NULL, intercept = TRUE, nlam = 100,
flmin = 0.01, nfold = 5, foldid = NULL, thresh = 1e-08)
Arguments
x |
An |
y |
A response vector of size |
lambda |
A user specified list of tuning parameter. Default to be NULL, and the program will compute its own |
intercept |
Indicator of whether intercept should be fitted. Default to be |
nlam |
The number of |
flmin |
The ratio of the smallest and the largest values in |
nfold |
Number of folds in cross-validation. Default value is 5. If each fold gets too few observations, a warning is thrown and the minimal |
foldid |
A vector of length |
thresh |
Threshold value for underlying optimization algorithm to claim convergence. Default to be |
Value
A list object containing:
n and p: The dimensions of the problem.
lambda: The path of tuning parameters used.
beta: Estimate of the regression coefficients, in the original scale, corresponding to the tuning parameter selected by cross-validation.
a0: Estimate of the intercept.
mat_mse: The estimated prediction error on the test sets in cross-validation. A matrix of size nlam by nfold.
cvm: The averaged estimated prediction error on the test sets over K folds.
cvse: The standard error of the estimated prediction error on the test sets over K folds.
ibest: The index in lambda that attains the minimal mean cross-validated error.
foldid: Fold assignment. A vector of length n.
nfold: The number of folds used in cross-validation.
sig_obj: Organic lasso estimate of the error standard deviation, selected by cross-validation.
sig_obj_path: Organic lasso estimates of the error standard deviation. A vector of length nlam.
type: Whether the output is of a natural or an organic lasso.
Examples
set.seed(123)
sim <- make_sparse_model(n = 50, p = 200, alpha = 0.6, rho = 0.6, snr = 2, nsim = 1)
ol_cv <- olasso_cv(x = sim$x, y = sim$y[, 1])
Fit a linear model with organic lasso
Description
Calculate a solution path of the organic lasso estimate (of error standard deviation) with a list of tuning parameter values. In particular, this function solves the squared-lasso problems and returns the objective function values as estimates of the error variance:
\tilde{\sigma}^2_{\lambda} = \min_{\beta} ||y - X \beta||_2^2 / n + 2 \lambda ||\beta||_1^2.
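The only difference from the natural lasso objective is that the \ell_1 norm of \beta is squared in the penalty. The base-R sketch below evaluates the organic lasso objective at a given coefficient vector; the function name is hypothetical and no minimization is performed, so the returned value is an upper bound on the minimized objective (the variance estimate) rather than the estimate itself.

```r
# Hypothetical helper: evaluate ||y - X beta||_2^2 / n + 2 * lambda * ||beta||_1^2
# at a user-supplied beta (no optimisation is carried out here).
olasso_objective_sketch <- function(x, y, beta, lambda) {
  res <- as.numeric(y) - as.numeric(x %*% beta)
  sum(res^2) / length(res) + 2 * lambda * sum(abs(beta))^2
}

# Small worked case: residuals are (0, 2), so ||res||^2 / n = 2,
# and ||beta||_1 = 2, so the penalty is 2 * 0.5 * 2^2 = 4.
obj_toy <- olasso_objective_sketch(diag(2), c(1, 1), c(1, -1), lambda = 0.5)
```

Evaluating the function at `beta = 0` gives `sum(y^2) / n`, which bounds the minimized objective from above for every value of `lambda`.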
Usage
olasso_path(x, y, lambda = NULL, nlam = 100, flmin = 0.01,
thresh = 1e-08, intercept = TRUE)
Arguments
x |
An |
y |
A response vector of size |
lambda |
A user specified list of tuning parameter. Default to be NULL, and the program will compute its own |
nlam |
The number of |
flmin |
The ratio of the smallest and the largest values in |
thresh |
Threshold value for underlying optimization algorithm to claim convergence. Default to be |
intercept |
Indicator of whether intercept should be fitted. Default to be |
Details
This function also outputs the naive and the degree-of-freedom adjusted estimates, in analogy to nlasso_path.
Value
A list object containing:
n and p: The dimensions of the problem.
lambda: The path of tuning parameters used.
a0: Estimate of the intercept. A vector of length nlam.
beta: Matrix of estimates of the regression coefficients, in the original scale. The matrix is of size p by nlam. The j-th column represents the estimate of the coefficients corresponding to the j-th tuning parameter in lambda.
sig_obj_path: Organic lasso estimates of the error standard deviation. A vector of length nlam.
sig_naive: Naive estimates of the error standard deviation based on the squared-lasso regression. A vector of length nlam.
sig_df: Degree-of-freedom adjusted estimates of the error standard deviation, based on the squared-lasso regression. A vector of length nlam.
type: Whether the output is of a natural or an organic lasso.
Examples
set.seed(123)
sim <- make_sparse_model(n = 50, p = 200, alpha = 0.6, rho = 0.6, snr = 2, nsim = 1)
ol_path <- olasso_path(x = sim$x, y = sim$y[, 1])
Solve the organic lasso problem with a single value of lambda
Description
Solve the organic lasso problem with a single value of lambda. The lambda values are for slow rates, which could give less satisfying results.
Usage
olasso_slow(x, y, thresh = 1e-08)
Arguments
x |
An |
y |
A response vector of size |
thresh |
Threshold value for underlying optimization algorithm to claim convergence. Default to be |
Plot a natural.cv object
Description
This function is adapted from the ggb R package.
Usage
## S3 method for class 'natural.cv'
plot(x, ...)
Arguments
x |
an object of class |
... |
additional arguments (not used here; only for S3 generic/method consistency) |
Plot a natural.path object
Description
This function is adapted from the ggb R package.
Usage
## S3 method for class 'natural.path'
plot(x, ...)
Arguments
x |
an object of class |
... |
additional arguments (not used here; only for S3 generic/method consistency) |
Print a natural.path object
Description
This function is adapted from the ggb R package.
Usage
## S3 method for class 'natural.path'
print(x, ...)
Arguments
x |
an object of class |
... |
additional arguments (not used here; only for S3 generic/method consistency) |
Standardize the n-by-p design matrix X to have column means zero and ||X_j||_2^2 = n for all j
Description
Standardize the n-by-p design matrix X to have column means zero and ||X_j||_2^2 = n for all j
Usage
standardize(x, center = TRUE)
Arguments
x |
design matrix |
center |
should we set column means equal to zero |
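The standardization described here can be reproduced in base R as follows. This is a minimal sketch with a hypothetical name, `standardize_sketch`, and it is not the package's own code; it assumes no column is constant, since a zero-variance column would lead to division by zero after centering.

```r
# Hypothetical sketch: centre each column (optionally) and rescale it so that
# its squared Euclidean norm equals the number of rows n.
standardize_sketch <- function(x, center = TRUE) {
  x <- as.matrix(x)
  n <- nrow(x)
  if (center) x <- sweep(x, 2, colMeans(x), "-")   # column means become zero
  scale_j <- sqrt(colSums(x^2) / n)                # current ||X_j||_2 / sqrt(n)
  sweep(x, 2, scale_j, "/")                        # now ||X_j||_2^2 = n
}

x_raw <- matrix(c(1, 2, 3, 4,
                  2, 4, 6, 9), nrow = 4)
x_std <- standardize_sketch(x_raw)
```

With `center = FALSE` the columns are only rescaled, so their means are generally nonzero while the squared norms still equal n.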