Type: | Package |
Title: | Bayesian Deep Gaussian Processes using MCMC |
Version: | 1.1.3 |
Date: | 2024-07-31 |
Depends: | R (≥ 3.6) |
Description: | Performs Bayesian posterior inference for deep Gaussian processes following Sauer, Gramacy, and Higdon (2023, <doi:10.48550/arXiv.2012.08015>). See Sauer (2023, http://hdl.handle.net/10919/114845) for comprehensive methodological details and https://bitbucket.org/gramacylab/deepgp-ex/ for a variety of coding examples. Models are trained through MCMC including elliptical slice sampling of latent Gaussian layers and Metropolis-Hastings sampling of kernel hyperparameters. Vecchia-approximation for faster computation is implemented following Sauer, Cooper, and Gramacy (2023, <doi:10.48550/arXiv.2204.02904>). Optional monotonic warpings are implemented following Barnett et al. (2024, <doi:10.48550/arXiv.2408.01540>). Downstream tasks include sequential design through active learning Cohn/integrated mean squared error (ALC/IMSE; Sauer, Gramacy, and Higdon, 2023), optimization through expected improvement (EI; Gramacy, Sauer, and Wycoff, 2022 <doi:10.48550/arXiv.2112.07457>), and contour location through entropy (Booth, Renganathan, and Gramacy, 2024 <doi:10.48550/arXiv.2308.04420>). Models extend up to three layers deep; a one layer model is equivalent to typical Gaussian process regression. Incorporates OpenMP and SNOW parallelization and utilizes C/C++ under the hood. |
License: | LGPL-2 | LGPL-2.1 | LGPL-3 [expanded from: LGPL] |
Encoding: | UTF-8 |
NeedsCompilation: | yes |
Imports: | grDevices, graphics, stats, doParallel, foreach, parallel, GpGp, Matrix, Rcpp, mvtnorm, FNN |
LinkingTo: | Rcpp, RcppArmadillo |
Suggests: | interp, knitr, rmarkdown |
VignetteBuilder: | knitr |
RoxygenNote: | 7.2.3 |
Packaged: | 2024-08-19 14:07:08 UTC; asauer3 |
Author: | Annie S. Booth [aut, cre] |
Maintainer: | Annie S. Booth <annie_booth@ncsu.edu> |
Repository: | CRAN |
Date/Publication: | 2024-08-19 15:00:02 UTC |
Package deepgp
Description
Performs Bayesian posterior inference for deep Gaussian processes following Sauer, Gramacy, and Higdon (2023). See Sauer (2023) for comprehensive methodological details and https://bitbucket.org/gramacylab/deepgp-ex/ for a variety of coding examples. Models are trained through MCMC including elliptical slice sampling of latent Gaussian layers and Metropolis-Hastings sampling of kernel hyperparameters. Vecchia-approximation for faster computation is implemented following Sauer, Cooper, and Gramacy (2023). Optional monotonic warpings are implemented following Barnett et al. (2024). Downstream tasks include sequential design through active learning Cohn/integrated mean squared error (ALC/IMSE; Sauer, Gramacy, and Higdon, 2023), optimization through expected improvement (EI; Gramacy, Sauer, and Wycoff, 2022), and contour location through entropy (Booth, Renganathan, and Gramacy, 2024). Models extend up to three layers deep; a one layer model is equivalent to typical Gaussian process regression. Incorporates OpenMP and SNOW parallelization and utilizes C/C++ under the hood.
Important Functions
- fit_one_layer: conducts MCMC sampling of hyperparameters for a one layer GP
- fit_two_layer: conducts MCMC sampling of hyperparameters and hidden layer for a two layer deep GP
- fit_three_layer: conducts MCMC sampling of hyperparameters and hidden layers for a three layer deep GP
- continue: collects additional MCMC samples
- trim: cuts off burn-in and optionally thins samples
- predict: calculates posterior mean and variance over a set of input locations (optionally calculates EI or entropy)
- plot: produces trace plots, hidden layer plots, and posterior predictive plots
- ALC: calculates active learning Cohn over a set of input locations using a reference grid
- IMSE: calculates integrated mean-squared error over a set of input locations
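A minimal end-to-end sketch of how these functions chain together, assuming toy one-dimensional data (see the Examples sections throughout this manual for fuller versions):
library(deepgp)
# Toy data (illustrative only)
x <- seq(0, 1, length = 25)
y <- sin(2 * pi * x) + rnorm(25, 0, 0.05)
xx <- seq(0, 1, length = 100)
# Fit a two layer deep GP, gather more samples, trim burn-in, predict, plot
fit <- fit_two_layer(x, y, nmcmc = 1000)
fit <- continue(fit, 1000)          # append additional MCMC iterations
fit <- trim(fit, 1000, 2)           # remove burn-in and thin
fit <- predict(fit, xx, cores = 1)  # posterior mean and variance at xx
plot(fit)                           # posterior predictive plot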
Author(s)
Annie S. Booth annie_booth@ncsu.edu
References
Sauer, A. (2023). Deep Gaussian process surrogates for computer experiments.
*Ph.D. Dissertation, Department of Statistics, Virginia Polytechnic Institute and State University.*
http://hdl.handle.net/10919/114845
Sauer, A., Gramacy, R.B., & Higdon, D. (2023). Active learning for deep
Gaussian process surrogates. *Technometrics, 65,* 4-18. arXiv:2012.08015
Sauer, A., Cooper, A., & Gramacy, R. B. (2023). Vecchia-approximated deep Gaussian
processes for computer experiments.
*Journal of Computational and Graphical Statistics, 32*(3), 824-837. arXiv:2204.02904
Gramacy, R. B., Sauer, A. & Wycoff, N. (2022). Triangulation candidates for Bayesian
optimization. *Advances in Neural Information Processing Systems (NeurIPS), 35,*
35933-35945. arXiv:2112.07457
Booth, A., Renganathan, S. A. & Gramacy, R. B. (2024). Contour location for
reliability in airfoil simulation experiments using deep Gaussian
processes. *In Review.* arXiv:2308.04420
Barnett, S., Beesley, L. J., Booth, A. S., Gramacy, R. B., & Osthus D. (2024). Monotonic warpings for additive and deep Gaussian processes. *In Review.* arXiv:2408.01540
Examples
# See vignette, ?fit_one_layer, ?fit_two_layer, ?fit_three_layer,
# ?ALC, or ?IMSE for examples
# Many more examples including real-world computer experiments are available at:
# https://bitbucket.org/gramacylab/deepgp-ex/
Active Learning Cohn for Sequential Design
Description
Acts on a gp, dgp2, or dgp3 object. Current version requires squared exponential covariance (cov = "exp2"). Calculates ALC over the input locations x_new using the specified reference grid. If no reference grid is specified, x_new is used as the reference. Optionally utilizes SNOW parallelization. User should select the point with the highest ALC to add to the design.
Usage
ALC(object, x_new, ref, cores)
## S3 method for class 'gp'
ALC(object, x_new = NULL, ref = NULL, cores = 1)
## S3 method for class 'dgp2'
ALC(object, x_new = NULL, ref = NULL, cores = 1)
## S3 method for class 'dgp3'
ALC(object, x_new = NULL, ref = NULL, cores = 1)
Arguments
object | object of class gp, dgp2, or dgp3 |
x_new | matrix of possible input locations; if the object has been run through predict, the previously stored x_new is used |
ref | optional reference grid for ALC approximation; if NULL, x_new is used as the reference |
cores | number of cores to utilize in parallel, by default no parallelization is used |
Details
Not yet implemented for Vecchia-approximated fits or Matern kernels.
All iterations in the object are used in the calculation, so samples should be burned-in. Thinning the samples using trim will speed up computation. This function may be used in two ways:
- Option 1: called on an object with only MCMC iterations, in which case x_new must be specified
- Option 2: called on an object that has been predicted over, in which case the x_new from predict is used
In Option 2, it is recommended to set store_latent = TRUE for dgp2 and dgp3 objects so latent mappings do not have to be re-calculated. Through predict, the user may specify a mean mapping (mean_map = TRUE) or a full sample from the MVN distribution over w_new (mean_map = FALSE). When the object has not yet been predicted over (Option 1), the mean mapping is used.
SNOW parallelization reduces computation time but requires more memory storage. C code derived from the "laGP" package (Robert B Gramacy and Furong Sun).
Value
list with elements:
- value: vector of ALC values, indices correspond to x_new
- time: computation time in seconds
References
Sauer, A., Gramacy, R.B., & Higdon, D. (2023). Active learning for deep
Gaussian process surrogates. *Technometrics, 65,* 4-18. arXiv:2012.08015
Seo, S, M Wallat, T Graepel, and K Obermayer. 2000. Gaussian Process Regression:
Active Data Selection and Test Point Rejection. In Mustererkennung 2000,
27-34. New York, NY: Springer-Verlag.
Gramacy, RB and F Sun. (2016). laGP: Large-Scale Spatial Modeling via Local
Approximate Gaussian Processes in R. Journal of Statistical Software
72 (1), 1-46. doi:10.18637/jss.v072.i01
Examples
# Additional examples including real-world computer experiments are available at:
# https://bitbucket.org/gramacylab/deepgp-ex/
# --------------------------------------------------------
# Example 1: toy step function, runs in less than 5 seconds
# --------------------------------------------------------
f <- function(x) {
if (x <= 0.4) return(-1)
if (x >= 0.6) return(1)
if (x > 0.4 & x < 0.6) return(10*(x-0.5))
}
x <- seq(0.05, 0.95, length = 7)
y <- sapply(x, f)
x_new <- seq(0, 1, length = 100)
# Fit model and calculate ALC
fit <- fit_two_layer(x, y, nmcmc = 100, cov = "exp2")
fit <- trim(fit, 50)
fit <- predict(fit, x_new, cores = 1, store_latent = TRUE)
alc <- ALC(fit)
# --------------------------------------------------------
# Example 2: damped sine wave
# --------------------------------------------------------
f <- function(x) {
exp(-10*x) * (cos(10*pi*x - 1) + sin(10*pi*x - 1)) * 5 - 0.2
}
# Training data
x <- seq(0, 1, length = 30)
y <- f(x) + rnorm(30, 0, 0.05)
# Testing data
xx <- seq(0, 1, length = 100)
yy <- f(xx)
plot(xx, yy, type = "l")
points(x, y, col = 2)
# Conduct MCMC (can replace fit_two_layer with fit_one_layer/fit_three_layer)
fit <- fit_two_layer(x, y, D = 1, nmcmc = 2000, cov = "exp2")
plot(fit)
fit <- trim(fit, 1000, 2)
# Option 1 - calculate ALC from MCMC iterations
alc <- ALC(fit, xx)
# Option 2 - calculate ALC after predictions
fit <- predict(fit, xx, cores = 1, store_latent = TRUE)
alc <- ALC(fit)
# Visualize fit
plot(fit)
par(new = TRUE) # overlay ALC
plot(xx, alc$value, type = 'l', lty = 2, axes = FALSE, xlab = '', ylab = '')
# Select next design point
x_new <- xx[which.max(alc$value)]
Integrated Mean-Squared (prediction) Error for Sequential Design
Description
Acts on a gp, dgp2, or dgp3 object. Current version requires squared exponential covariance (cov = "exp2"). Calculates IMSE over the input locations x_new. Optionally utilizes SNOW parallelization. User should select the point with the lowest IMSE to add to the design.
Usage
IMSE(object, x_new, cores)
## S3 method for class 'gp'
IMSE(object, x_new = NULL, cores = 1)
## S3 method for class 'dgp2'
IMSE(object, x_new = NULL, cores = 1)
## S3 method for class 'dgp3'
IMSE(object, x_new = NULL, cores = 1)
Arguments
object | object of class gp, dgp2, or dgp3 |
x_new | matrix of possible input locations; if the object has been run through predict, the previously stored x_new is used |
cores | number of cores to utilize in parallel, by default no parallelization is used |
Details
Not yet implemented for Vecchia-approximated fits or Matern kernels.
All iterations in the object are used in the calculation, so samples should be burned-in. Thinning the samples using trim will speed up computation. This function may be used in two ways:
- Option 1: called on an object with only MCMC iterations, in which case x_new must be specified
- Option 2: called on an object that has been predicted over, in which case the x_new from predict is used
In Option 2, it is recommended to set store_latent = TRUE for dgp2 and dgp3 objects so latent mappings do not have to be re-calculated. Through predict, the user may specify a mean mapping (mean_map = TRUE) or a full sample from the MVN distribution over w_new (mean_map = FALSE). When the object has not yet been predicted over (Option 1), the mean mapping is used.
SNOW parallelization reduces computation time but requires more memory storage.
Value
list with elements:
- value: vector of IMSE values, indices correspond to x_new
- time: computation time in seconds
References
Sauer, A., Gramacy, R.B., & Higdon, D. (2023). Active learning for
deep Gaussian process surrogates. *Technometrics, 65,* 4-18. arXiv:2012.08015
Binois, M, J Huang, RB Gramacy, and M Ludkovski. 2019. "Replication or Exploration?
Sequential Design for Stochastic Simulation Experiments." Technometrics
61, 7-23. Taylor & Francis. doi:10.1080/00401706.2018.1469433
Examples
# Additional examples including real-world computer experiments are available at:
# https://bitbucket.org/gramacylab/deepgp-ex/
# --------------------------------------------------------
# Example 1: toy step function, runs in less than 5 seconds
# --------------------------------------------------------
f <- function(x) {
if (x <= 0.4) return(-1)
if (x >= 0.6) return(1)
if (x > 0.4 & x < 0.6) return(10*(x-0.5))
}
x <- seq(0.05, 0.95, length = 7)
y <- sapply(x, f)
x_new <- seq(0, 1, length = 100)
# Fit model and calculate IMSE
fit <- fit_one_layer(x, y, nmcmc = 100, cov = "exp2")
fit <- trim(fit, 50)
fit <- predict(fit, x_new, cores = 1, store_latent = TRUE)
imse <- IMSE(fit)
# --------------------------------------------------------
# Example 2: Higdon function
# --------------------------------------------------------
f <- function(x) {
i <- which(x <= 0.48)
x[i] <- 2 * sin(pi * x[i] * 4) + 0.4 * cos(pi * x[i] * 16)
x[-i] <- 2 * x[-i] - 1
return(x)
}
# Training data
x <- seq(0, 1, length = 30)
y <- f(x) + rnorm(30, 0, 0.05)
# Testing data
xx <- seq(0, 1, length = 100)
yy <- f(xx)
plot(xx, yy, type = "l")
points(x, y, col = 2)
# Conduct MCMC (can replace fit_three_layer with fit_one_layer/fit_two_layer)
fit <- fit_three_layer(x, y, D = 1, nmcmc = 2000, cov = "exp2")
plot(fit)
fit <- trim(fit, 1000, 2)
# Option 1 - calculate IMSE from only MCMC iterations
imse <- IMSE(fit, xx)
# Option 2 - calculate IMSE after predictions
fit <- predict(fit, xx, cores = 1, store_latent = TRUE)
imse <- IMSE(fit)
# Visualize fit
plot(fit)
par(new = TRUE) # overlay IMSE
plot(xx, imse$value, col = 2, type = 'l', lty = 2, axes = FALSE,
xlab = '', ylab = '')
# Select next design point
x_new <- xx[which.min(imse$value)]
Continues MCMC sampling
Description
Acts on a gp, gpvec, dgp2, dgp2vec, dgp3, or dgp3vec object. Continues MCMC sampling of hyperparameters and hidden layers using settings from the original object. Appends new samples to existing samples. When vecchia = TRUE, this function provides the option to update Vecchia ordering/conditioning sets based on latent layer warpings through the specification of re_approx = TRUE.
Usage
continue(object, new_mcmc, verb, re_approx, ...)
## S3 method for class 'gp'
continue(object, new_mcmc = 1000, verb = TRUE, ...)
## S3 method for class 'dgp2'
continue(object, new_mcmc = 1000, verb = TRUE, ...)
## S3 method for class 'dgp3'
continue(object, new_mcmc = 1000, verb = TRUE, ...)
## S3 method for class 'gpvec'
continue(object, new_mcmc = 1000, verb = TRUE, re_approx = FALSE, ...)
## S3 method for class 'dgp2vec'
continue(object, new_mcmc = 1000, verb = TRUE, re_approx = FALSE, ...)
## S3 method for class 'dgp3vec'
continue(object, new_mcmc = 1000, verb = TRUE, re_approx = FALSE, ...)
Arguments
object | object from fit_one_layer, fit_two_layer, or fit_three_layer |
new_mcmc | number of new MCMC iterations to conduct and append |
verb | logical indicating whether to print iteration progress |
re_approx | logical indicating whether to re-randomize the ordering and update Vecchia nearest-neighbor conditioning sets (only for fits with vecchia = TRUE) |
... | N/A |
Details
See fit_one_layer, fit_two_layer, or fit_three_layer for details on MCMC. The resulting object will have nmcmc equal to the previous nmcmc plus new_mcmc. It is recommended to start an MCMC fit then investigate trace plots to assess burn-in. The primary use of this function is to gather more MCMC iterations in order to obtain burned-in samples.
Specifying re_approx = TRUE updates random orderings and nearest-neighbor conditioning sets (only for vecchia = TRUE fits). In one-layer, there is no latent warping but the Vecchia approximation is still re-randomized and nearest-neighbors are adjusted accordingly. In two- and three-layers, the latest samples of hidden layers are used to update nearest-neighbors. If you update the Vecchia approximation, you should later remove previous samples (updating the approximation effectively starts a new chain). When re_approx = FALSE the previous orderings and conditioning sets are used (maintaining the continuity of the previous chain).
Value
object of the same class with the new iterations appended
Examples
# See ?fit_two_layer for an example
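A minimal sketch of the typical continue workflow, assuming toy x and y (see ?fit_two_layer for a complete example):
fit <- fit_two_layer(x, y, nmcmc = 1000)
plot(fit)                  # assess burn-in from trace plots
fit <- continue(fit, 1000) # fit$nmcmc is now 2000
fit <- trim(fit, 1000)     # remove burn-in once achieved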
Calculates CRPS
Description
Calculates continuous ranked probability score (lower CRPS indicates a better fit with better uncertainty quantification).
Usage
crps(y, mu, s2)
Arguments
y |
response vector |
mu |
predicted mean |
s2 |
predicted point-wise variances |
References
Gneiting, T, and AE Raftery. 2007. Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association 102 (477), 359-378.
Examples
# Additional examples including real-world computer experiments are available at:
# https://bitbucket.org/gramacylab/deepgp-ex/
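# For reference, a minimal sketch of the closed-form CRPS for a Gaussian
# predictive distribution (Gneiting and Raftery, 2007); shown for
# illustration only, the package's internal implementation may differ in
# details such as averaging
crps_sketch <- function(y, mu, s2) {
  sigma <- sqrt(s2)
  z <- (y - mu) / sigma
  mean(sigma * (z * (2 * pnorm(z) - 1) + 2 * dnorm(z) - 1 / sqrt(pi)))
}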
MCMC sampling for one layer GP
Description
Conducts MCMC sampling of hyperparameters for a one layer GP. Length scale parameter theta governs the strength of the correlation and nugget parameter g governs noise. In Matern covariance, v governs smoothness.
Usage
fit_one_layer(
x,
y,
nmcmc = 10000,
sep = FALSE,
verb = TRUE,
g_0 = 0.001,
theta_0 = 0.1,
true_g = NULL,
settings = NULL,
cov = c("matern", "exp2"),
v = 2.5,
vecchia = FALSE,
m = min(25, length(y) - 1),
ordering = NULL
)
Arguments
x | vector or matrix of input locations |
y | vector of response values |
nmcmc | number of MCMC iterations |
sep | logical indicating whether to use separable (sep = TRUE) or isotropic (sep = FALSE) lengthscales |
verb | logical indicating whether to print iteration progress |
g_0 | initial value for g |
theta_0 | initial value for theta |
true_g | if true nugget is known it may be specified here (set to a small value to make fit deterministic). Note - values that are too small may cause numerical issues in matrix inversions. |
settings | hyperparameters for proposals and priors (see details) |
cov | covariance kernel, either Matern ("matern") or squared exponential ("exp2") |
v | Matern smoothness parameter (only used if cov = "matern") |
vecchia | logical indicating whether to use Vecchia approximation |
m | size of Vecchia conditioning sets (only used if vecchia = TRUE) |
ordering | optional ordering for Vecchia approximation, must correspond to rows of x |
Details
Utilizes Metropolis Hastings sampling of the length scale and nugget parameters with proposals and priors controlled by settings. When true_g is set to a specific value, the nugget is not estimated. When vecchia = TRUE, all calculations leverage the Vecchia approximation with specified conditioning set size m. Vecchia approximation is only implemented for cov = "matern".
NOTE on OpenMP: The Vecchia implementation relies on OpenMP parallelization for efficient computation. This function will produce a warning message if the package was installed without OpenMP (this is the default for CRAN packages installed on Apple machines). To set up OpenMP parallelization, download the package source code and install using the gcc/g++ compiler.
Proposals for g and theta follow a uniform sliding window scheme, e.g. g_star <- runif(1, l * g_t / u, u * g_t / l), with defaults l = 1 and u = 2 provided in settings. To adjust these, set settings = list(l = new_l, u = new_u).
Priors on g and theta follow Gamma distributions with shape parameters (alpha) and rate parameters (beta) controlled within the settings list object. Defaults have been updated with package version 1.1.3. Default priors differ for noisy/deterministic settings and depend on whether monowarp = TRUE. All default values are visible in the internal deepgp:::check_settings function. These priors are designed for x scaled to [0, 1] and y scaled to have mean 0 and variance 1. These may be adjusted using the settings input.
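For example, a minimal sketch of this pre-scaling (x_raw and y_raw are hypothetical names for unscaled data):
x <- (x_raw - min(x_raw)) / (max(x_raw) - min(x_raw))  # map inputs to [0, 1]
y <- (y_raw - mean(y_raw)) / sd(y_raw)                 # mean 0, variance 1
fit <- fit_one_layer(x, y, nmcmc = 2000)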
The output object of class gp is designed for use with continue, trim, and predict.
Value
a list of the S3 class gp or gpvec with elements:
- x: copy of input matrix
- y: copy of response vector
- nmcmc: number of MCMC iterations
- settings: copy of proposal/prior settings
- v: copy of Matern smoothness parameter (v = 999 indicates cov = "exp2")
- g: vector of MCMC samples for g
- theta: vector of MCMC samples for theta
- tau2: vector of MLE estimates for tau2 (scale parameter)
- ll: vector of MVN log likelihood for each Gibbs iteration
- time: computation time in seconds
References
Sauer, A. (2023). Deep Gaussian process surrogates for computer experiments.
*Ph.D. Dissertation, Department of Statistics, Virginia Polytechnic Institute and State University.*
Sauer, A., Gramacy, R.B., & Higdon, D. (2023). Active learning for deep
Gaussian process surrogates. *Technometrics, 65,* 4-18. arXiv:2012.08015
Sauer, A., Cooper, A., & Gramacy, R. B. (2023). Vecchia-approximated deep Gaussian
processes for computer experiments.
*Journal of Computational and Graphical Statistics, 32*(3), 824-837. arXiv:2204.02904
Examples
# Additional examples including real-world computer experiments are available at:
# https://bitbucket.org/gramacylab/deepgp-ex/
# Booth function (inspired by the Higdon function)
f <- function(x) {
i <- which(x <= 0.58)
x[i] <- sin(pi * x[i] * 6) + cos(pi * x[i] * 12)
x[-i] <- 5 * x[-i] - 4.9
return(x)
}
# Training data
x <- seq(0, 1, length = 25)
y <- f(x)
# Testing data
xx <- seq(0, 1, length = 200)
yy <- f(xx)
plot(xx, yy, type = "l")
points(x, y, col = 2)
# Example 1: full model (nugget fixed)
fit <- fit_one_layer(x, y, nmcmc = 2000, true_g = 1e-6)
plot(fit)
fit <- trim(fit, 1000, 2)
fit <- predict(fit, xx, cores = 1)
plot(fit)
# Example 2: full model (nugget estimated, EI calculated)
fit <- fit_one_layer(x, y, nmcmc = 2000)
plot(fit)
fit <- trim(fit, 1000, 2)
fit <- predict(fit, xx, cores = 1, EI = TRUE)
plot(fit)
par(new = TRUE) # overlay EI
plot(xx[order(xx)], fit$EI[order(xx)], type = 'l', lty = 2,
axes = FALSE, xlab = '', ylab = '')
# Example 3: Vecchia approximated model (nugget estimated)
fit <- fit_one_layer(x, y, nmcmc = 2000, vecchia = TRUE, m = 10)
plot(fit)
fit <- trim(fit, 1000, 2)
fit <- predict(fit, xx, cores = 1)
plot(fit)
MCMC sampling for three layer deep GP
Description
Conducts MCMC sampling of hyperparameters, hidden layer z, and hidden layer w for a three layer deep GP. Separate length scale parameters theta_z, theta_w, and theta_y govern the correlation strength of the inner layer, middle layer, and outer layer respectively. Nugget parameter g governs noise on the outer layer. In Matern covariance, v governs smoothness.
Usage
fit_three_layer(
x,
y,
nmcmc = 10000,
D = ifelse(is.matrix(x), ncol(x), 1),
verb = TRUE,
w_0 = NULL,
z_0 = NULL,
g_0 = 0.001,
theta_y_0 = 0.1,
theta_w_0 = 0.1,
theta_z_0 = 0.1,
true_g = NULL,
settings = NULL,
cov = c("matern", "exp2"),
v = 2.5,
vecchia = FALSE,
m = min(25, length(y) - 1),
ordering = NULL
)
Arguments
x | vector or matrix of input locations |
y | vector of response values |
nmcmc | number of MCMC iterations |
D | integer designating dimension of hidden layers, defaults to dimension of x |
verb | logical indicating whether to print iteration progress |
w_0 | initial value for hidden layer w |
z_0 | initial value for hidden layer z |
g_0 | initial value for g |
theta_y_0 | initial value for theta_y (length scale of outer layer) |
theta_w_0 | initial value for theta_w (length scale of middle layer) |
theta_z_0 | initial value for theta_z (length scale of inner layer) |
true_g | if true nugget is known it may be specified here (set to a small value to make fit deterministic). Note - values that are too small may cause numerical issues in matrix inversions. |
settings | hyperparameters for proposals and priors (see details) |
cov | covariance kernel, either Matern ("matern") or squared exponential ("exp2") |
v | Matern smoothness parameter (only used if cov = "matern") |
vecchia | logical indicating whether to use Vecchia approximation |
m | size of Vecchia conditioning sets (only used if vecchia = TRUE) |
ordering | optional ordering for Vecchia approximation, must correspond to rows of x |
Details
pmx = TRUE option not yet implemented for three-layer DGP.
Maps inputs x through hidden layer z then hidden layer w to outputs y. Conducts sampling of the hidden layers using Elliptical Slice sampling. Utilizes Metropolis Hastings sampling of the length scale and nugget parameters with proposals and priors controlled by settings. When true_g is set to a specific value, the nugget is not estimated. When vecchia = TRUE, all calculations leverage the Vecchia approximation with specified conditioning set size m. Vecchia approximation is only implemented for cov = "matern".
NOTE on OpenMP: The Vecchia implementation relies on OpenMP parallelization for efficient computation. This function will produce a warning message if the package was installed without OpenMP (this is the default for CRAN packages installed on Apple machines). To set up OpenMP parallelization, download the package source code and install using the gcc/g++ compiler.
Proposals for g, theta_y, theta_w, and theta_z follow a uniform sliding window scheme, e.g. g_star <- runif(1, l * g_t / u, u * g_t / l), with defaults l = 1 and u = 2 provided in settings. To adjust these, set settings = list(l = new_l, u = new_u).
Priors on g, theta_y, theta_w, and theta_z follow Gamma distributions with shape parameters (alpha) and rate parameters (beta) controlled within the settings list object. Defaults have been updated with package version 1.1.3. Default priors differ for noisy/deterministic settings and depend on whether monowarp = TRUE. All default values are visible in the internal deepgp:::check_settings function. These priors are designed for x scaled to [0, 1] and y scaled to have mean 0 and variance 1. These may be adjusted using the settings input.
In the current version, the three-layer does not have any equivalent setting for monowarp = TRUE or pmx = TRUE as in fit_two_layer.
When w_0 = NULL and/or z_0 = NULL, the hidden layers are initialized at x (i.e. the identity mapping). The default prior mean of the inner hidden layer z is zero, but may be adjusted to x using settings = list(z_prior_mean = x). The prior mean of the middle hidden layer w is set at zero and is not user adjustable. If w_0 and/or z_0 is of dimension nrow(x) - 1 by D, the final row is predicted using kriging. This is helpful in sequential design when adding a new input location and starting the MCMC at the place where the previous MCMC left off.
The output object of class dgp3 or dgp3vec is designed for use with continue, trim, and predict.
Value
a list of the S3 class dgp3 or dgp3vec with elements:
- x: copy of input matrix
- y: copy of response vector
- nmcmc: number of MCMC iterations
- settings: copy of proposal/prior settings
- v: copy of Matern smoothness parameter (v = 999 indicates cov = "exp2")
- g: vector of MCMC samples for g
- theta_y: vector of MCMC samples for theta_y (length scale of outer layer)
- theta_w: matrix of MCMC samples for theta_w (length scale of middle layer)
- theta_z: matrix of MCMC samples for theta_z (length scale of inner layer)
- tau2: vector of MLE estimates for tau2 (scale parameter of outer layer)
- w: list of MCMC samples for middle hidden layer w
- z: list of MCMC samples for inner hidden layer z
- ll: vector of MVN log likelihood of the outer layer for each Gibbs iteration
- time: computation time in seconds
References
Sauer, A. (2023). Deep Gaussian process surrogates for computer experiments.
*Ph.D. Dissertation, Department of Statistics, Virginia Polytechnic Institute and State University.*
Sauer, A., Gramacy, R.B., & Higdon, D. (2023). Active learning for deep
Gaussian process surrogates. *Technometrics, 65,* 4-18. arXiv:2012.08015
Sauer, A., Cooper, A., & Gramacy, R. B. (2023). Vecchia-approximated deep Gaussian
processes for computer experiments.
*Journal of Computational and Graphical Statistics, 32*(3), 824-837. arXiv:2204.02904
Examples
# Additional examples including real-world computer experiments are available at:
# https://bitbucket.org/gramacylab/deepgp-ex/
# G function in 2 dimensions (https://www.sfu.ca/~ssurjano/gfunc.html)
f <- function(xx, a = (c(1:length(xx)) - 1) / 2) {
new1 <- abs(4 * xx - 2) + a
new2 <- 1 + a
prod <- prod(new1 / new2)
return((prod - 1) / 0.86)
}
# Training data
d <- 2
n <- 30
x <- matrix(runif(n * d), ncol = d)
y <- apply(x, 1, f)
# Testing data
n_test <- 500
xx <- matrix(runif(n_test * d), ncol = d)
yy <- apply(xx, 1, f)
i <- interp::interp(xx[, 1], xx[, 2], yy)
image(i, col = heat.colors(128))
contour(i, add = TRUE)
contour(i, level = -0.5, col = 4, add = TRUE) # potential failure limit
points(x)
# Example 1: full model (nugget estimated, entropy calculated)
fit <- fit_three_layer(x, y, nmcmc = 2000)
plot(fit)
fit <- trim(fit, 1000, 2)
fit <- predict(fit, xx, entropy_limit = -0.5, cores = 1)
plot(fit)
i <- interp::interp(xx[, 1], xx[, 2], fit$entropy)
image(i, col = heat.colors(128), main = "Entropy")
# Example 2: Vecchia approximated model (nugget fixed)
# (Vecchia approximation is faster for larger data sizes)
fit <- fit_three_layer(x, y, nmcmc = 2000, vecchia = TRUE,
m = 10, true_g = 1e-6)
plot(fit)
fit <- trim(fit, 1000, 2)
fit <- predict(fit, xx, cores = 1)
plot(fit)
MCMC sampling for two layer deep GP
Description
Conducts MCMC sampling of hyperparameters and hidden layer w for a two layer deep GP. Separate length scale parameters theta_w and theta_y govern the correlation strength of the hidden layer and outer layer respectively. Nugget parameter g governs noise on the outer layer. In Matern covariance, v governs smoothness.
Usage
fit_two_layer(
x,
y,
nmcmc = 10000,
D = ifelse(is.matrix(x), ncol(x), 1),
monowarp = FALSE,
pmx = FALSE,
verb = TRUE,
w_0 = NULL,
g_0 = 0.001,
theta_y_0 = 0.1,
theta_w_0 = 0.1,
true_g = NULL,
settings = NULL,
cov = c("matern", "exp2"),
v = 2.5,
vecchia = FALSE,
m = min(25, length(y) - 1),
ordering = NULL
)
Arguments
x | vector or matrix of input locations |
y | vector of response values |
nmcmc | number of MCMC iterations |
D | integer designating dimension of hidden layer, defaults to dimension of x |
monowarp | indicates whether warpings should be forced to be monotonic. Input may be a matrix of grid points (or a vector which will be applied to every dimension) for interpolation of the cumulative sum, an integer specifying the number of grid points to use over the range [0, 1], or simply the boolean TRUE which triggers the default grid (see details). |
pmx | "prior mean x", logical indicating whether the prior mean of the hidden layer should be set at x (see details) |
verb | logical indicating whether to print iteration progress |
w_0 | initial value for hidden layer w |
g_0 | initial value for g |
theta_y_0 | initial value for theta_y (length scale of outer layer) |
theta_w_0 | initial value for theta_w (length scale of inner layer) |
true_g | if true nugget is known it may be specified here (set to a small value to make fit deterministic). Note - values that are too small may cause numerical issues in matrix inversions. |
settings | hyperparameters for proposals and priors (see details) |
cov | covariance kernel, either Matern ("matern") or squared exponential ("exp2") |
v | Matern smoothness parameter (only used if cov = "matern") |
vecchia | logical indicating whether to use Vecchia approximation |
m | size of Vecchia conditioning sets (only used if vecchia = TRUE) |
ordering | optional ordering for Vecchia approximation, must correspond to rows of x |
Details
Maps inputs x through hidden layer w to outputs y. Conducts sampling of the hidden layer using Elliptical Slice sampling. Utilizes Metropolis Hastings sampling of the length scale and nugget parameters with proposals and priors controlled by settings. When true_g is set to a specific value, the nugget is not estimated. When vecchia = TRUE, all calculations leverage the Vecchia approximation with specified conditioning set size m. Vecchia approximation is only implemented for cov = "matern".
When monowarp = TRUE, each input dimension is warped separately and monotonically. This requires D = ncol(x). Monotonic warpings trigger separable lengthscales on the outer layer (theta_y). As a default, monotonic warpings use the reference grid: seq(0, 1, length = 50). The grid size may be controlled by passing a numeric integer to monowarp (i.e., monowarp = 100 uses the grid seq(0, 1, length = 100)). Alternatively, any user-specified grid may be passed as the argument to monowarp.
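As a quick illustration (toy x and y assumed), the following calls request the default grid, a denser 100-point grid, and a user-specified grid, respectively:
fit <- fit_two_layer(x, y, monowarp = TRUE)
fit <- fit_two_layer(x, y, monowarp = 100)
fit <- fit_two_layer(x, y, monowarp = seq(0, 1, length = 250))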
When pmx = TRUE, the prior on the latent layer is set at x (rather than the default of zero). This requires D = ncol(x). If pmx is set to a numeric value, then that value is used as the scale parameter on the latent layer. Specifying a small value here encourages an identity mapping.
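For instance (toy x and y assumed), the prior mean of the latent layer may be set to x, optionally with a small scale to encourage an identity mapping:
fit <- fit_two_layer(x, y, pmx = TRUE)
fit <- fit_two_layer(x, y, pmx = 0.1)  # small prior scale on the latent layer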
NOTE on OpenMP: The Vecchia implementation relies on OpenMP parallelization for efficient computation. This function will produce a warning message if the package was installed without OpenMP (this is the default for CRAN packages installed on Apple machines). To set up OpenMP parallelization, download the package source code and install using the gcc/g++ compiler.
Proposals for g, theta_y, and theta_w follow a uniform sliding window scheme, e.g. g_star <- runif(1, l * g_t / u, u * g_t / l), with defaults l = 1 and u = 2 provided in settings. To adjust these, set settings = list(l = new_l, u = new_u).
Priors on g, theta_y, and theta_w follow Gamma distributions with shape parameters (alpha) and rate parameters (beta) controlled within the settings list object. Defaults have been updated with package version 1.1.3. Default priors differ for noisy/deterministic settings and depend on whether monowarp = TRUE. All default values are visible in the internal deepgp:::check_settings function. These priors are designed for x scaled to [0, 1] and y scaled to have mean 0 and variance 1. These may be adjusted using the settings input.
When w_0 = NULL, the hidden layer is initialized at x (i.e. the identity mapping). If w_0 is of dimension nrow(x) - 1 by D, the final row is predicted using kriging. This is helpful in sequential design when adding a new input location and starting the MCMC at the place where the previous MCMC left off.
The output object of class dgp2 or dgp2vec is designed for use with continue, trim, and predict.
Value
a list of the S3 class dgp2 or dgp2vec with elements:
- x: copy of input matrix
- y: copy of response vector
- nmcmc: number of MCMC iterations
- settings: copy of proposal/prior settings
- v: copy of Matern smoothness parameter (v = 999 indicates cov = "exp2")
- g: vector of MCMC samples for g
- theta_y: vector of MCMC samples for theta_y (length scale of outer layer)
- theta_w: matrix of MCMC samples for theta_w (length scale of inner layer)
- tau2: vector of MLE estimates for tau2 (scale parameter of outer layer)
- w: list of MCMC samples for hidden layer w
- ll: vector of MVN log likelihood of the outer layer for each Gibbs iteration
- time: computation time in seconds
References
Sauer, A. (2023). Deep Gaussian process surrogates for computer experiments.
*Ph.D. Dissertation, Department of Statistics, Virginia Polytechnic Institute and State University.*
Sauer, A., Gramacy, R.B., & Higdon, D. (2023). Active learning for deep
Gaussian process surrogates. *Technometrics, 65,* 4-18. arXiv:2012.08015
Sauer, A., Cooper, A., & Gramacy, R. B. (2023). Vecchia-approximated deep Gaussian
processes for computer experiments.
*Journal of Computational and Graphical Statistics, 32*(3), 824-837. arXiv:2204.02904
Barnett, S., Beesley, L. J., Booth, A. S., Gramacy, R. B., & Osthus D. (2024). Monotonic
warpings for additive and deep Gaussian processes. *In Review.* arXiv:2408.01540
Examples
# Additional examples including real-world computer experiments are available at:
# https://bitbucket.org/gramacylab/deepgp-ex/
# Booth function (inspired by the Higdon function)
f <- function(x) {
i <- which(x <= 0.58)
x[i] <- sin(pi * x[i] * 6) + cos(pi * x[i] * 12)
x[-i] <- 5 * x[-i] - 4.9
return(x)
}
# Training data
x <- seq(0, 1, length = 25)
y <- f(x)
# Testing data
xx <- seq(0, 1, length = 200)
yy <- f(xx)
plot(xx, yy, type = "l")
points(x, y, col = 2)
# Example 1: full model (nugget estimated, using continue)
fit <- fit_two_layer(x, y, nmcmc = 1000)
plot(fit)
fit <- continue(fit, 1000)
plot(fit, hidden = TRUE) # trace plots and ESS samples
fit <- trim(fit, 1000, 2)
fit <- predict(fit, xx, cores = 1)
plot(fit)
# Example 2: Vecchia approximated model (nugget estimated)
# (Vecchia approximation is faster for larger data sizes)
fit <- fit_two_layer(x, y, nmcmc = 2000, vecchia = TRUE, m = 10)
plot(fit, hidden = TRUE) # trace plots and ESS samples
fit <- trim(fit, 1000, 2)
fit <- predict(fit, xx, cores = 1)
plot(fit)
# Example 3: Vecchia approximated model, re-approximated after burn-in
fit <- fit_two_layer(x, y, nmcmc = 1000, vecchia = TRUE, m = 10)
fit <- continue(fit, 1000, re_approx = TRUE)
plot(fit, hidden = TRUE) # trace plots and ESS samples
fit <- trim(fit, 1000, 2)
fit <- predict(fit, xx, cores = 1)
plot(fit)
# Example 4: full model with monotonic warpings (nugget estimated)
fit <- fit_two_layer(x, y, nmcmc = 2000, monowarp = TRUE)
plot(fit, hidden = TRUE) # trace plots and ESS samples
fit <- trim(fit, 1000, 2)
fit <- predict(fit, xx, cores = 1)
plot(fit)
Plots object from deepgp package
Description
Acts on a gp, gpvec, dgp2, dgp2vec, dgp3, or dgp3vec object. Generates trace plots for outer log likelihood, length scale, and nugget hyperparameters. Generates plots of hidden layers for one-dimensional inputs or monotonic warpings. Generates plots of the posterior mean and estimated 90% prediction intervals for one-dimensional inputs; generates heat maps of the posterior mean and point-wise variance for two-dimensional inputs.
Usage
## S3 method for class 'gp'
plot(x, trace = NULL, predict = NULL, ...)
## S3 method for class 'gpvec'
plot(x, trace = NULL, predict = NULL, ...)
## S3 method for class 'dgp2'
plot(x, trace = NULL, hidden = NULL, predict = NULL, ...)
## S3 method for class 'dgp2vec'
plot(x, trace = NULL, hidden = NULL, predict = NULL, ...)
## S3 method for class 'dgp3'
plot(x, trace = NULL, hidden = NULL, predict = NULL, ...)
## S3 method for class 'dgp3vec'
plot(x, trace = NULL, hidden = NULL, predict = NULL, ...)
Arguments
x | object of class gp, gpvec, dgp2, dgp2vec, dgp3, or dgp3vec |
trace | logical indicating whether to generate trace plots (default is TRUE if the object has not been through predict) |
hidden | logical indicating whether to generate plots of hidden layers (two or three layer only, default is FALSE) |
predict | logical indicating whether to generate posterior predictive plot (default is TRUE if the object has been through predict) |
... | N/A |
Details
Trace plots are useful in assessing burn-in. If there are too many hyperparameters to plot them all, then it is most useful to visualize the log likelihood (e.g., plot(fit$ll, type = "l")).
Hidden layer plots are colored on a gradient - red lines represent earlier iterations and yellow lines represent later iterations - to help assess burn-in of the hidden layers. Only every 100th sample is plotted.
Examples
# See ?fit_one_layer, ?fit_two_layer, or ?fit_three_layer
# for examples
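A brief sketch of requesting specific panels explicitly (assumes a fitted and predicted two layer object named fit):
plot(fit, trace = TRUE)    # trace plots of hyperparameters
plot(fit, hidden = TRUE)   # hidden layer plots (two or three layer only)
plot(fit, predict = TRUE)  # posterior predictive plot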
Predict posterior mean and variance/covariance
Description
Acts on a gp, dgp2, or dgp3 object. Calculates posterior mean and variance/covariance over specified input locations. Optionally calculates expected improvement (EI) or entropy over candidate inputs. Optionally utilizes SNOW parallelization.
Usage
## S3 method for class 'gp'
predict(
object,
x_new,
lite = TRUE,
return_all = FALSE,
EI = FALSE,
entropy_limit = NULL,
cores = 1,
...
)
## S3 method for class 'dgp2'
predict(
object,
x_new,
lite = TRUE,
store_latent = FALSE,
mean_map = TRUE,
return_all = FALSE,
EI = FALSE,
entropy_limit = NULL,
cores = 1,
...
)
## S3 method for class 'dgp3'
predict(
object,
x_new,
lite = TRUE,
store_latent = FALSE,
mean_map = TRUE,
return_all = FALSE,
EI = FALSE,
entropy_limit = NULL,
cores = 1,
...
)
## S3 method for class 'gpvec'
predict(
object,
x_new,
m = object$m,
ordering_new = NULL,
lite = TRUE,
return_all = FALSE,
EI = FALSE,
entropy_limit = NULL,
cores = 1,
...
)
## S3 method for class 'dgp2vec'
predict(
object,
x_new,
m = object$m,
ordering_new = NULL,
lite = TRUE,
store_latent = FALSE,
mean_map = TRUE,
return_all = FALSE,
EI = FALSE,
entropy_limit = NULL,
cores = 1,
...
)
## S3 method for class 'dgp3vec'
predict(
object,
x_new,
m = object$m,
ordering_new = NULL,
lite = TRUE,
store_latent = FALSE,
mean_map = TRUE,
return_all = FALSE,
EI = FALSE,
entropy_limit = NULL,
cores = 1,
...
)
Arguments
object | object from fit_one_layer, fit_two_layer, or fit_three_layer |
x_new | matrix of predictive input locations |
lite | logical indicating whether to calculate only point-wise variances (lite = TRUE) or the full covariance matrix (lite = FALSE) |
return_all | logical indicating whether to return mean and point-wise variance prediction for ALL samples (only available for lite = TRUE) |
EI | logical indicating whether to calculate expected improvement (for minimizing the response) |
entropy_limit | optional limit state for entropy calculations (separating passes and failures), default value of NULL skips entropy calculations |
cores | number of cores to utilize in parallel |
... | N/A |
store_latent | logical indicating whether to store and return mapped values of latent layers (two or three layer models only) |
mean_map | logical indicating whether to map hidden layers using conditional mean (mean_map = TRUE) or a full sample from the MVN distribution (mean_map = FALSE) |
m | size of Vecchia conditioning sets (only for fits with vecchia = TRUE), defaults to the m of the original object |
ordering_new | optional ordering for Vecchia approximation, must correspond to rows of x_new |
Details
All iterations in the object are used for prediction, so samples should be burned-in. Thinning the samples using trim will speed up computation. Posterior moments are calculated using conditional expectation and variance. As a default, only point-wise variance is calculated. Full covariance may be calculated using lite = FALSE.
Expected improvement is calculated with the goal of minimizing the response. See Chapter 7 of Gramacy (2020) for details. Entropy is calculated based on two classes separated by the specified limit. See Sauer (2023, Chapter 3) for details.
SNOW parallelization reduces computation time but requires more memory storage.
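For example, a minimal sketch (assumes an existing fit and held-out test data xx and yy) requesting the full predictive covariance and then evaluating the fit:
fit <- predict(fit, xx, lite = FALSE, cores = 1)
score(yy, fit$mean, fit$Sigma)  # requires the full covariance
rmse(yy, fit$mean)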
Value
object of the same class with the following additional elements:
- x_new: copy of predictive input locations
- mean: predicted posterior mean, indices correspond to x_new locations
- s2: predicted point-wise variances, indices correspond to x_new locations (only returned when lite = TRUE)
- mean_all: predicted posterior mean for each sample (column indices), only returned when return_all = TRUE
- s2_all: predicted point-wise variances for each sample (column indices), only returned when return_all = TRUE
- Sigma: predicted posterior covariance, indices correspond to x_new locations (only returned when lite = FALSE)
- EI: vector of expected improvement values, indices correspond to x_new locations (only returned when EI = TRUE)
- entropy: vector of entropy values, indices correspond to x_new locations (only returned when entropy_limit is numeric)
- w_new: list of hidden layer mappings (only returned when store_latent = TRUE), list index corresponds to iteration and row index corresponds to x_new location (two or three layer models only)
- z_new: list of hidden layer mappings (only returned when store_latent = TRUE), list index corresponds to iteration and row index corresponds to x_new location (three layer models only)
Computation time is added to the computation time of the existing object.
References
Sauer, A. (2023). Deep Gaussian process surrogates for computer experiments.
*Ph.D. Dissertation, Department of Statistics, Virginia Polytechnic Institute and State University.*
Sauer, A., Gramacy, R.B., & Higdon, D. (2023). Active learning for deep
Gaussian process surrogates. *Technometrics, 65,* 4-18. arXiv:2012.08015
Sauer, A., Cooper, A., & Gramacy, R. B. (2023). Vecchia-approximated deep Gaussian
processes for computer experiments.
*Journal of Computational and Graphical Statistics, 32*(3), 824-837. arXiv:2204.02904
Barnett, S., Beesley, L. J., Booth, A. S., Gramacy, R. B., & Osthus D. (2024). Monotonic
warpings for additive and deep Gaussian processes. *In Review.* arXiv:2408.01540
Examples
# See ?fit_one_layer, ?fit_two_layer, or ?fit_three_layer
# for examples
Calculates RMSE
Description
Calculates root mean squared error (lower RMSE indicates better fits).
Usage
rmse(y, mu)
Arguments
y | response vector |
mu | predicted mean |
Examples
# Additional examples including real-world computer experiments are available at:
# https://bitbucket.org/gramacylab/deepgp-ex/
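# RMSE here is the usual root mean squared error; a one-line sketch
# (y and mu assumed to be numeric vectors of equal length)
sqrt(mean((y - mu)^2))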
Calculates score
Description
Calculates score, proportional to the multivariate normal log likelihood. Higher scores indicate better fits. Only applicable to noisy data. Requires full covariance matrix (e.g. predict with lite = FALSE).
Usage
score(y, mu, sigma)
Arguments
y | response vector |
mu | predicted mean |
sigma | predicted covariance |
References
Gneiting, T, and AE Raftery. 2007. Strictly Proper Scoring Rules, Prediction, and Estimation. Journal of the American Statistical Association 102 (477), 359-378.
Examples
# Additional examples including real-world computer experiments are available at:
# https://bitbucket.org/gramacylab/deepgp-ex/
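# A hedged sketch of a quantity proportional to the MVN log likelihood with
# constants dropped; the package's exact scaling may differ, and the name
# score_sketch is illustrative only
score_sketch <- function(y, mu, sigma) {
  resid <- y - mu
  ldet <- as.numeric(determinant(sigma, logarithm = TRUE)$modulus)
  -ldet - as.numeric(t(resid) %*% solve(sigma, resid))
}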
Calculates squared pairwise distances
Description
Calculates squared pairwise euclidean distances using C.
Usage
sq_dist(X1, X2 = NULL)
Arguments
X1 | matrix of input locations |
X2 | matrix of second input locations (if NULL, distances are calculated between X1 and itself) |
Details
C code derived from the "laGP" package (Robert B Gramacy and Furong Sun).
Value
symmetric matrix of squared euclidean distances
References
Gramacy, RB and F Sun. (2016). laGP: Large-Scale Spatial Modeling via Local Approximate Gaussian Processes in R. Journal of Statistical Software 72 (1), 1-46. doi:10.18637/jss.v072.i01
Examples
x <- seq(0, 1, length = 10)
d2 <- sq_dist(x)
Trim/Thin MCMC iterations
Description
Acts on a gp, gpvec, dgp2, dgp2vec, dgp3vec, or dgp3 object. Removes the specified number of MCMC iterations (starting at the first iteration). After these samples are removed, the remaining samples are optionally thinned.
Usage
trim(object, burn, thin)
## S3 method for class 'gp'
trim(object, burn, thin = 1)
## S3 method for class 'gpvec'
trim(object, burn, thin = 1)
## S3 method for class 'dgp2'
trim(object, burn, thin = 1)
## S3 method for class 'dgp2vec'
trim(object, burn, thin = 1)
## S3 method for class 'dgp3'
trim(object, burn, thin = 1)
## S3 method for class 'dgp3vec'
trim(object, burn, thin = 1)
Arguments
object | object from fit_one_layer, fit_two_layer, or fit_three_layer |
burn | integer specifying number of iterations to cut off as burn-in |
thin | integer specifying amount of thinning (thin = 1 keeps every iteration, thin = 2 keeps every other iteration, and so on) |
Details
The resulting object will have nmcmc equal to the previous nmcmc minus burn divided by thin. It is recommended to start an MCMC fit then investigate trace plots to assess burn-in. Once burn-in has been achieved, use this function to remove the starting iterations. Thinning reduces the size of the resulting object while accounting for the high correlation between consecutive iterations.
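For instance, a brief sketch (assumes a fit with nmcmc = 2000): dropping the first 1000 iterations and keeping every other remaining sample leaves 500.
fit <- trim(fit, burn = 1000, thin = 2)
# fit$nmcmc is now (2000 - 1000) / 2 = 500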
Value
object of the same class with the selected iterations removed
Examples
# See ?fit_one_layer, ?fit_two_layer, or ?fit_three_layer
# for examples