Type: | Package |
Title: | Computing MSPE Estimates in Small Area Estimation |
Version: | 1.4 |
Date: | 2024-11-25 |
Maintainer: | Peiwen Xiao <2569613200@qq.com> |
Description: | Compute various common mean squared predictive error (MSPE) estimators, as well as several existing variance component predictors as a byproduct, for FH model (Fay and Herriot, 1979) and NER model (Battese et al., 1988) in small area estimation. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
Depends: | R (≥ 3.5.0), Matrix, smallarea |
Imports: | Rcpp (≥ 1.0.7) |
LinkingTo: | Rcpp, RcppArmadillo |
Suggests: | testthat |
RoxygenNote: | 7.3.2 |
Encoding: | UTF-8 |
LazyData: | true |
NeedsCompilation: | yes |
Author: | Peiwen Xiao [aut, cre], Xiaohui Liu [aut], Yu Zhang [aut], Yuzi Liu [aut], Jiming Jiang [ths] |
Packaged: | 2024-11-25 07:34:19 UTC; 25696 |
Repository: | CRAN |
Date/Publication: | 2024-11-25 07:50:02 UTC |
Compute MSPE Estimates for the Fay Herriot Model and Nested Error Regression Model
Description
We describe a new R package entitled 'saeMSPE' for the well-known Fay Herriot model and nested error regression model in small area estimation. With this package, it is easy to compute various common mean squared predictive error (MSPE) estimators, as well as several existing variance component predictors as a byproduct, for these two models.
Details
Package: | saeMSPE |
Type: | Package |
Version: | 1.4 |
Date: | 2024-11-25 |
License: | GPL (>=2) |
Depends: | Matrix, smallarea |
Author(s)
Peiwen Xiao, Xiaohui Liu, Yu Zhang, Yuzi Liu, Jiming Jiang
Maintainer: Peiwen Xiao <2569613200@qq.com>
References
- V. Arora and P. Lahiri. On the superiority of the Bayesian method over the BLUP in small area estimation problems. Statistica Sinica, 7(4):1053-1063, 1997.
- G. E. Battese, R. M. Harter, and W. A. Fuller. An error components model for prediction of county crop areas using survey and satellite data. Journal of the American Statistical Association, 83(401):28-36, 1988.
- F. B. Butar and P. Lahiri. On measures of uncertainty of empirical Bayes small area estimators. Journal of Statistical Planning and Inference, 112(1-2):63-76, 2003.
- G. S. Datta and P. Lahiri. A unified measure of uncertainty of estimated best linear unbiased predictors in small area estimation problems. Statistica Sinica, 10(2):613-627, 2000.
- G. S. Datta, J. N. K. Rao, and D. D. Smith. On measuring the variability of small area estimators under a basic area level model. Biometrika, 92(1):183-196, 2005.
- D. Eddelbuettel and C. Sanderson. RcppArmadillo: Accelerating R with high-performance C++ linear algebra. Computational Statistics and Data Analysis, 71(March):1054-1063, 2014.
- D. Eddelbuettel, R. Francois, J. J. Allaire, K. Ushey, and Q. Kou. Rcpp: Seamless R and C++ integration. Journal of Statistical Software, 40(8), 2011.
- R. E. Fay and R. A. Herriot. Estimates of income for small places: An application of James-Stein procedures to census data. Journal of the American Statistical Association, 74(366a):269-277, 1979.
- W. González, M. J. Lombardía, I. Molina, D. Morales, and L. Santamaría. Bootstrap mean squared error of a small-area EBLUP. Journal of Statistical Computation and Simulation, 78(5):443-462, 2008.
- P. Hall and T. Maiti. On parametric bootstrap methods for small area prediction. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2006a.
- P. Hall and T. Maiti. Nonparametric estimation of mean squared prediction error in nested error regression models. Annals of Statistics, 34(4):1733-1750, 2006b.
- C. R. Henderson. Best linear unbiased estimation and prediction under a selection model. Biometrics, 31(2):423-447, 1975.
- J. Jiang. Asymptotic Analysis of Mixed Effects Models: Theory, Applications, and Open Problems. 2017. ISBN 978-1-3151-1928-1. doi: 10.1201/9781315119281.
- J. Jiang and T. Nguyen. Linear and Generalized Linear Mixed Models and Their Applications. 2021. ISBN 978-1-0716-1281-1. doi: 10.1007/978-1-0716-1282-8.
- J. Jiang and M. Torabi. Sumca: simple, unified, Monte-Carlo-assisted approach to second-order unbiased mean squared prediction error estimation. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 82(2):467-485, 2020.
- J. Jiang and L. S. M. Wan. A unified jackknife theory for empirical best prediction with M-estimation. Annals of Statistics, 30(6):1782-1810, 2002.
- J. Jiang, T. Nguyen, and J. S. Rao. Best predictive small area estimation. Journal of the American Statistical Association, 106(494):732-745, 2011.
- J. Jiang, P. Lahiri, and T. Nguyen. A unified Monte-Carlo jackknife for small area estimation after model selection. Annals of Mathematical Sciences and Applications, 3(2):405-438, 2018.
- R. N. Kackar and D. A. Harville. Approximations for standard errors of estimators of fixed and random effects in mixed linear models. Journal of the American Statistical Association, 1984.
- X. Liu, H. Ma, and J. Jiang. That Prasad-Rao is robust: Estimation of mean squared prediction error of observed best predictor under potential model misspecification. Statistica Sinica, 2020.
- N. G. N. Prasad and J. N. K. Rao. The estimation of the mean squared error of small area estimators. Journal of the American Statistical Association, 85(409):163-171, 1990.
- M. H. Quenouille. Approximate tests of correlation in time series. Journal of the Royal Statistical Society. Series B (Methodological), 11(1):68-84, 1949.
- J. N. K. Rao and I. Molina. Small Area Estimation. Wiley, 2015.
- J. W. Tukey. Bias and confidence in not quite large samples. Annals of Mathematical Statistics, 29(2):614, 1958.
- Y. You and B. Chapman. Small area estimation using area level models and estimated sampling variances. Survey Methodology, 32(1):97-103, 2006.
Compute MSPE through double bootstrap method for Fay Herriot model
Description
This function returns the MSPE estimate obtained with the double bootstrap approximation method for the Fay Herriot model.
Usage
mspeFHdb(formula, data, D, K = 50, C = 50, method = 1, na_rm, na_omit)
Arguments
formula |
(formula). Stands for the model formula that specifies the auxiliary variables to be used in the regression model. This should follow the R model formula syntax. |
data |
(data frame). It represents the data containing the response values and auxiliary variables for the Fay Herriot model. |
D |
(vector). It represents the known sampling variances for the Fay Herriot model. |
K |
(integer). It represents the first bootstrap sample number. Default value is 50. |
C |
(integer). It represents the second bootstrap sample number. Default value is 50. |
method |
It represents the variance component estimation method. See "Details". |
na_rm |
A logical value indicating whether to remove missing values (NaN) from the input matrices and vectors.
If |
na_omit |
A logical value indicating whether to stop the execution if missing values (NaN) are present in the input data.
If |
Details
This method was proposed by P. Hall and T. Maiti. The double bootstrap method applies the bootstrap twice for the Fay Herriot model to avoid an unattractive bias correction: once to estimate the estimator's bias, and once to correct for that bias.
Default value for method is 1: method = 1 represents the MOM method, while method = 2 and method = 3 represent the ML and REML methods, respectively.
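The bias-correction step can be sketched in base R. This is an illustration of the double-bootstrap idea under the assumption that a first-level bootstrap MSPE estimate and a second-level re-estimate are combined linearly; it is not the package's internal code.

```r
# Double-bootstrap bias correction (sketch, not package internals).
# mspe1: first-level bootstrap MSPE estimate (biased to first order).
# mspe2: second-level bootstrap re-estimate of the same quantity.
# Their difference estimates the bias of mspe1, which is subtracted:
double_boot_correct <- function(mspe1, mspe2) {
  2 * mspe1 - mspe2
}
double_boot_correct(1.2, 1.1)  # 2 * 1.2 - 1.1 = 1.3
```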
Value
This function returns a list with components:
MSPE |
(vector) MSPE estimate based on the double bootstrap method. |
bhat |
(vector) Estimates of the unknown regression coefficients. |
Ahat |
(numeric) Estimates of the variance component. |
Author(s)
Peiwen Xiao, Xiaohui Liu, Yu Zhang, Yuzi Liu, Jiming Jiang
References
P. Hall and T. Maiti. On parametric bootstrap methods for small area prediction. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2006.
Examples
X <- matrix(runif(10 * 3), 10, 3)
X[,1] <- rep(1, 10)
D <- (1:10) / 10 + 0.5
Y <- X %*% c(0.5, 1, 1.5) + rnorm(10, 0, sqrt(2)) + rnorm(10, 0, sqrt(D))
data <- data.frame(Y = Y, X1 = X[,2], X2 = X[,3])
formula <- Y ~ X1 + X2
result <- mspeFHdb(formula, data, D, K = 10, C = 10, method = 1)
Compute MSPE through Jackknife-based MSPE estimation method for Fay Herriot model
Description
This function returns the MSPE estimate obtained with the jackknife method for the Fay Herriot model.
Usage
mspeFHjack(formula, data, D, method = 1, na_rm, na_omit)
Arguments
formula |
(formula). Stands for the model formula that specifies the auxiliary variables to be used in the regression model. This should follow the R model formula syntax. |
data |
(data frame). It represents the data containing the response values and auxiliary variables for the Fay Herriot model. |
D |
(vector). Stands for the known sampling variance of each small area level. |
method |
The variance component estimation method to be used. See "Details". |
na_rm |
A logical value indicating whether to remove missing values (NaN) from the input matrices and vectors.
If |
na_omit |
A logical value indicating whether to stop the execution if missing values (NaN) are present in the input data.
If |
Details
This bias-corrected jackknife MSPE estimator was proposed by J. Jiang and L. S. M. Wan. It covers a fairly general class of mixed models, including GLMMs, the mixed logistic model, and some widely used linear mixed models as special cases.
Default value for method is 1: method = 1 represents the MOM method, while method = 2 and method = 3 represent the ML and REML methods, respectively.
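The Quenouille-Tukey bias correction underlying this estimator can be sketched in base R; `est_fun` is a hypothetical scalar estimator, and this sketch illustrates only the delete-one idea, not the Jiang-Wan MSPE formula itself.

```r
# Delete-one jackknife bias correction (illustration only).
jackknife_bc <- function(x, est_fun) {
  m    <- length(x)
  full <- est_fun(x)                            # full-sample estimate
  loo  <- vapply(seq_len(m),
                 function(i) est_fun(x[-i]),
                 numeric(1))                    # leave-one-out estimates
  full - (m - 1) * mean(loo - full)             # subtract estimated bias
}
jackknife_bc(c(1, 2, 3, 4, 5), mean)  # mean is already unbiased: returns 3
```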
Value
This function returns a list with components:
MSPE |
(vector) MSPE estimates for Fay Herriot model. |
bhat |
(vector) Estimates of the unknown regression coefficients. |
Ahat |
(numeric) Estimates of the variance component. |
Author(s)
Peiwen Xiao, Xiaohui Liu, Yu Zhang, Yuzi Liu, Jiming Jiang
References
M. H. Quenouille. Approximate tests of correlation in time series. Journal of the Royal Statistical Society. Series B (Methodological), 11(1):68-84, 1949.
J. W. Tukey. Bias and confidence in not quite large samples. Annals of Mathematical Statistics, 29(2):614, 1958.
J. Jiang and L. S. M. Wan. A unified jackknife theory for empirical best prediction with m estimation. Annals of Statistics, 30(6):1782-1810, 2002.
Examples
X <- matrix(runif(10 * 3), 10, 3)
X[,1] <- rep(1, 10)
D <- (1:10) / 10 + 0.5
Y <- X %*% c(0.5, 1, 1.5) + rnorm(10, 0, sqrt(2)) + rnorm(10, 0, sqrt(D))
data <- data.frame(Y = Y, X1 = X[,2], X2 = X[,3])
formula <- Y ~ X1 + X2
result <- mspeFHjack(formula, data, D, method = 1)
Compute MSPE through linearization method for Fay Herriot model
Description
This function returns the MSPE estimate obtained with linearization methods for the Fay Herriot model. These include the seminal Prasad-Rao method and its generalizations by Datta-Lahiri, Datta-Rao-Smith, and Liu et al. All these methods are developed for general linear mixed effects models.
Usage
mspeFHlin(formula, data, D, method = "PR", var.method = "default", na_rm, na_omit)
mspeFHPR(formula, data, D, var.method = "default", na_rm, na_omit)
mspeFHDL(formula, data, D, var.method = "default", na_rm, na_omit)
mspeFHDRS(formula, data, D, var.method = "default", na_rm, na_omit)
mspeFHMPR(formula, data, D, var.method = "default", na_rm, na_omit)
Arguments
formula |
(formula). Stands for the model formula that specifies the auxiliary variables to be used in the regression model. This should follow the R model formula syntax. |
data |
(data frame). It represents the data containing the response values and auxiliary variables for the Fay Herriot model. |
D |
(vector). Stands for the known sampling variance of each small area level. |
method |
The MSPE estimation method to be used. See "Details". |
var.method |
The variance component estimation method to be used. See "Details". |
na_rm |
A logical value indicating whether to remove missing values (NaN) from the input matrices and vectors.
If |
na_omit |
A logical value indicating whether to stop the execution if missing values (NaN) are present in the input data.
If |
Details
The default method for mspeFHlin is "PR", proposed by N. G. N. Prasad and J. N. K. Rao; the Prasad-Rao (PR) method uses a Taylor series expansion to obtain a second-order approximation to the MSPE. Function mspeFHlin also provides the following methods:
Method "DL", proposed by Datta and Lahiri, extends the PR method to cover the cases where the variance components are estimated by the ML and REML estimators. Set method = "DL".
Method "DRS", proposed by Datta, Rao and Smith, focuses on the second-order unbiased approximation when the variance component is replaced by the empirical Bayes estimator. Set method = "DRS".
Method "MPR", a modified version of "PR" proposed by Liu et al., is a robust method that broadens the mean function beyond the linear form. Set method = "MPR".
The default var.method and the available variance component estimation methods for each method are listed as follows:
For method = "PR", var.method = "MOM" is the only available variance component estimation method.
For method = "DL", var.method = "ML" or var.method = "REML" is available.
For method = "DRS", var.method = "EB" is the only available variance component estimation method.
For method = "MPR", var.method = "OBP" is the only available variance component estimation method.
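The leading term of the Prasad-Rao approximation has a simple closed form, sketched below; the second-order correction terms (usually denoted g2 and g3) are omitted, so this is an illustration rather than the full estimator computed by mspeFHlin.

```r
# Leading term g1 of the Prasad-Rao MSPE approximation for the Fay Herriot
# model: g1_i = A * D_i / (A + D_i), where A is the model variance and D_i
# the known sampling variance of area i.
g1 <- function(A, D) A * D / (A + D)
g1(A = 1, D = c(0.5, 1, 2))  # 1/3, 1/2, 2/3
```

Note that g1 is increasing in both A and D_i: areas with large sampling variance keep most of the model-based uncertainty.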
Value
This function returns a list with components:
MSPE |
(vector) MSPE estimates for Fay Herriot model. |
bhat |
(vector) Estimates of the unknown regression coefficients. |
Ahat |
(numeric) Estimates of the variance component. |
Author(s)
Peiwen Xiao, Xiaohui Liu, Yu Zhang, Yuzi Liu, Jiming Jiang
References
N. G. N. Prasad and J. N. K. Rao. The estimation of the mean squared error of small-area estimators. Journal of the American Statistical Association, 85(409):163-171, 1990.
G. S. Datta and P. Lahiri. A unified measure of uncertainty of estimated best linear unbiased predictors in small area estimation problems. Statistica Sinica, 10(2):613-627, 2000.
G. S. Datta, J. N. K. Rao, and D. D. Smith. On measuring the variability of small area estimators under a basic area level model. Biometrika, 92(1):183-196, 2005.
X. Liu, H. Ma, and J. Jiang. That prasad-rao is robust: Estimation of mean squared prediction error of observed best predictor under potential model misspecification. Statistica Sinica, 2020.
Examples
X <- matrix(runif(10 * 3), 10, 3)
X[,1] <- rep(1, 10)
D <- (1:10) / 10 + 0.5
Y <- X %*% c(0.5, 1, 1.5) + rnorm(10, 0, sqrt(2)) + rnorm(10, 0, sqrt(D))
data <- data.frame(Y = Y, X1 = X[,2], X2 = X[,3])
formula <- Y ~ X1 + X2
result <- mspeFHlin(formula, data, D, method = "PR", var.method = "default")
Compute MSPE through parametric bootstrap method for Fay Herriot model
Description
This function returns the MSPE estimate obtained with the parametric bootstrap method for the Fay Herriot model.
Usage
mspeFHpb(formula, data, D, K = 50, method = 4, na_rm, na_omit)
Arguments
formula |
(formula). Stands for the model formula that specifies the auxiliary variables to be used in the regression model. This should follow the R model formula syntax. |
data |
(data frame). It represents the data containing the response values and auxiliary variables for the Fay Herriot model. |
D |
(vector). It represents the known sampling variances for the Fay Herriot model. |
K |
(integer). It represents the bootstrap sample number. Default value is 50. |
method |
The variance component estimation method to be used. See "Details". |
na_rm |
A logical value indicating whether to remove missing values (NaN) from the input matrices and vectors.
If |
na_omit |
A logical value indicating whether to stop the execution if missing values (NaN) are present in the input data.
If |
Details
This method was proposed by P. Hall and T. Maiti. The parametric bootstrap (pb) method uses a bootstrap-based approach to measure the accuracy of the EB estimator. In this case, only the EB estimator is available (method = 4).
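A minimal parametric-bootstrap skeleton for the Fay Herriot model, assuming a known model variance A and a zero fitted mean for simplicity; it illustrates the Hall-Maiti idea of refitting on data simulated from the fitted model, not the package implementation.

```r
# Parametric bootstrap sketch: simulate from the fitted FH model, predict,
# and average squared prediction errors over K replicates.
set.seed(1)
m  <- 10
D  <- (1:m) / 10 + 0.5      # known sampling variances
A  <- 1                     # (assumed) estimated model variance
mu <- rep(0, m)             # (assumed) fitted mean X %*% bhat
K  <- 20
boot_mspe <- rowMeans(replicate(K, {
  theta_b   <- mu + rnorm(m, 0, sqrt(A))        # bootstrap true area means
  y_b       <- theta_b + rnorm(m, 0, sqrt(D))   # bootstrap direct estimates
  B         <- D / (A + D)                      # shrinkage factor
  theta_hat <- (1 - B) * y_b + B * mu           # EB/BLUP predictor
  (theta_hat - theta_b)^2
}))
```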
Value
This function returns a list with components:
MSPE |
(vector) MSPE estimates for Fay Herriot model. |
bhat |
(vector) Estimates of the unknown regression coefficients. |
Ahat |
(numeric) Estimates of the variance component. |
Author(s)
Peiwen Xiao, Xiaohui Liu, Yu Zhang, Yuzi Liu, Jiming Jiang
References
F. B. Butar and P. Lahiri. On measures of uncertainty of empirical bayes small area estimators. Journal of Statistical Planning and Inference, 112(1-2):63-76, 2003.
N. G. N. Prasad and J. N. K. Rao. The estimation of the mean squared error of small-area estimators. Journal of the American Statistical Association, 85(409):163-171, 1990.
P. Hall and T. Maiti. On parametric bootstrap methods for small area prediction. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2006a.
P. Hall and T. Maiti. Nonparametric estimation of mean squared prediction error in nested error regression models. Annals of Statistics, 34(4):1733-1750, 2006b.
Examples
X <- matrix(runif(10 * 3), 10, 3)
X[,1] <- rep(1, 10)
D <- (1:10) / 10 + 0.5
Y <- X %*% c(0.5, 1, 1.5) + rnorm(10, 0, sqrt(2)) + rnorm(10, 0, sqrt(D))
data <- data.frame(Y = Y, X1 = X[,2], X2 = X[,3])
formula <- Y ~ X1 + X2
result <- mspeFHpb(formula, data, D, K = 50, method = 4)
Compute MSPE through Sumca method for Fay Herriot model
Description
This function returns the MSPE estimate obtained with "Sumca", a method that combines linearization and resampling approximation, for the Fay Herriot model.
Usage
mspeFHsumca(formula, data, D, K = 50, method = 1, na_rm, na_omit)
Arguments
formula |
(formula). Stands for the model formula that specifies the auxiliary variables to be used in the regression model. This should follow the R model formula syntax. |
data |
(data frame). It represents the data containing the response values and auxiliary variables for the Fay Herriot model. |
D |
(vector). It represents the known sampling variances for the Fay Herriot model. |
K |
(integer). It represents the Monte-Carlo sample size for "Sumca". Default value is 50. |
method |
It represents the variance component estimation method. See "Details". |
na_rm |
A logical value indicating whether to remove missing values (NaN) from the input matrices and vectors.
If |
na_omit |
A logical value indicating whether to stop the execution if missing values (NaN) are present in the input data.
If |
Details
This method was proposed by J. Jiang, P. Lahiri, and T. Nguyen. The Sumca method combines the advantages of the linearization and resampling methods and yields unified, positive, computationally inexpensive, and second-order unbiased MSPE estimators.
Default value for method is 1: method = 1 represents the MOM method, while method = 2 and method = 3 represent the ML and REML methods, respectively.
Value
This function returns a list with components:
MSPE |
(vector) MSPE estimates for Fay Herriot model. |
bhat |
(vector) Estimates of the unknown regression coefficients. |
Ahat |
(numeric) Estimates of the variance component. |
Author(s)
Peiwen Xiao, Xiaohui Liu, Yu Zhang, Yuzi Liu, Jiming Jiang
References
J. Jiang and M. Torabi. Sumca: simple; unified; monte carlo assisted approach to second order unbiased mean squared prediction error estimation. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 82(2):467-485, 2020.
Examples
X <- matrix(runif(10 * 3), 10, 3)
X[,1] <- rep(1, 10)
D <- (1:10) / 10 + 0.5
Y <- X %*% c(0.5, 1, 1.5) + rnorm(10, 0, sqrt(2)) + rnorm(10, 0, sqrt(D))
data <- data.frame(Y = Y, X1 = X[,2], X2 = X[,3])
formula <- Y ~ X1 + X2
result <- mspeFHsumca(formula, data, D, K = 50, method = 3)
Compute MSPE through double bootstrap (DB) method for Nested error regression model
Description
This function returns the MSPE estimate obtained with the double bootstrap method for the Nested error regression model.
Usage
mspeNERdb(ni, formula, data, Xmean, K = 50, C = 50, method = 1, na_rm, na_omit)
Arguments
ni |
(vector). It represents the sample size of each small area. |
formula |
(formula). Stands for the model formula that specifies the auxiliary variables to be used in the regression model. This should follow the R model formula syntax. |
data |
(data frame). It represents the data containing the response values and auxiliary variables for the Nested Error Regression Model. |
Xmean |
(matrix). Stands for the population mean of auxiliary values. |
K |
(integer). It represents the first bootstrap sample number. Default value is 50. |
C |
(integer). It represents the second bootstrap sample number. Default value is 50. |
method |
The variance component estimation method to be used. See "Details". |
na_rm |
A logical value indicating whether to remove missing values (NaN) from the input matrices and vectors.
If |
na_omit |
A logical value indicating whether to stop the execution if missing values (NaN) are present in the input data.
If |
Details
This method was proposed by P. Hall and T. Maiti. The double bootstrap method applies the bootstrap twice for the NER model to avoid an unattractive bias correction: once to estimate the estimator's bias, and once to correct for that bias.
Default value for method is 1: method = 1 represents the MOM method, while method = 2 and method = 3 represent the ML and REML methods, respectively.
Value
This function returns a list with components:
MSPE |
(vector) MSPE estimates for NER model. |
bhat |
(vector) Estimates of the unknown regression coefficients. |
sigvhat2 |
(numeric) Estimates of the area-specific variance component. |
sigehat2 |
(numeric) Estimates of the random error variance component. |
Author(s)
Peiwen Xiao, Xiaohui Liu, Yu Zhang, Yuzi Liu, Jiming Jiang
References
F. B. Butar and P. Lahiri. On measures of uncertainty of empirical bayes small area estimators. Journal of Statistical Planning and Inference, 112(1-2):63-76, 2003.
N. G. N. Prasad and J. N. K. Rao. The estimation of the mean squared error of small-area estimators. Journal of the American Statistical Association, 85(409):163-171, 1990.
P. Hall and T. Maiti. On parametric bootstrap methods for small area prediction. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2006a.
P. Hall and T. Maiti. Nonparametric estimation of mean squared prediction error in nested error regression models. Annals of Statistics, 34(4):1733-1750, 2006b.
Examples
Ni <- 1000
sigmaX <- 1.5
m <- 10
beta <- c(0.5, 1)
sigma_v2 <- 0.8
sigma_e2 <- 1
ni <- sample(seq(1, 10), m, replace = TRUE)
n <- sum(ni)
p <- length(beta)
pop.model <- function(Ni, sigmaX, beta, sigma_v2, sigma_e2, m) {
x <- rnorm(m * Ni, 1, sqrt(sigmaX))
v <- rnorm(m, 0, sqrt(sigma_v2))
y <- numeric(m * Ni)
theta <- numeric(m)
kk <- 1
for (i in 1:m) {
sumx <- 0
for (j in 1:Ni) {
sumx <- sumx + x[kk]
y[kk] <- beta[1] + beta[2] * x[kk] + v[i] + rnorm(1, 0, sqrt(sigma_e2))
kk <- kk + 1
}
meanx <- sumx / Ni
theta[i] <- beta[1] + beta[2] * meanx + v[i]
}
group <- rep(seq(m), each = Ni)
data <- data.frame(y = y, group = group, x1 = x)
return(list(data = data, theta = theta))
}
sampleXY <- function(Ni, ni, m, Population) {
Indx <- c()
for (i in 1:m) {
Indx <- c(Indx, sample(c(((i - 1) * Ni + 1):(i * Ni)), ni[i]))
}
Sample <- Population[Indx, ]
return(Sample)
}
Population <- pop.model(Ni, sigmaX, beta, sigma_v2, sigma_e2, m)$data
XY <- sampleXY(Ni, ni, m, Population)
formula <- y ~ x1
data <- XY
Xmean <- matrix(NA, m, p)
for (tt in 1:m) {
Xmean[tt, ] <- colMeans(Population[which(Population$group == tt), "x1", drop = FALSE])
}
result <- mspeNERdb(ni, formula, data, Xmean, K = 10, C = 10, method = 1)
print(result)
Compute MSPE through Jackknife-based MSPE estimation method for Nested error regression model
Description
This function returns the MSPE estimate obtained with the jackknife-based MSPE estimation method for the Nested error regression model.
Usage
mspeNERjack(ni, formula, data, Xmean, method = 1, na_rm, na_omit)
Arguments
ni |
(vector). It represents the sample size of each small area. |
formula |
(formula). Stands for the model formula that specifies the auxiliary variables to be used in the regression model. This should follow the R model formula syntax. |
data |
(data frame). It represents the data containing the response values and auxiliary variables for the Nested Error Regression Model. |
Xmean |
(matrix). Stands for the population mean of auxiliary values. |
method |
The variance component estimation method to be used. See "Details". |
na_rm |
A logical value indicating whether to remove missing values (NaN) from the input matrices and vectors.
If |
na_omit |
A logical value indicating whether to stop the execution if missing values (NaN) are present in the input data.
If |
Details
This bias-corrected jackknife MSPE estimator was proposed by J. Jiang and L. S. M. Wan. It covers a fairly general class of mixed models, including GLMMs, the mixed logistic model, and some widely used linear mixed models as special cases.
Default value for method is 1: method = 1 represents the MOM method, while method = 2 and method = 3 represent the ML and REML methods, respectively.
Value
This function returns a list with components:
MSPE |
(vector) MSPE estimates for NER model. |
bhat |
(vector) Estimates of the unknown regression coefficients. |
sigvhat2 |
(numeric) Estimates of the area-specific variance component. |
sigehat2 |
(numeric) Estimates of the random error variance component. |
Author(s)
Peiwen Xiao, Xiaohui Liu, Yu Zhang, Yuzi Liu, Jiming Jiang
References
M. H. Quenouille. Approximate tests of correlation in time series. Journal of the Royal Statistical Society. Series B (Methodological), 11(1):68-84, 1949.
J. W. Tukey. Bias and confidence in not quite large samples. Annals of Mathematical Statistics, 29(2):614, 1958.
J. Jiang and L. S. M. Wan. A unified jackknife theory for empirical best prediction with m estimation. Annals of Statistics, 30(6):1782-1810, 2002.
Examples
### parameter setting
Ni <- 1000
sigmaX <- 1.5
m <- 5
beta <- c(0.5, 1)
sigma_v2 <- 0.8
sigma_e2 <- 1
ni <- sample(seq(1, 10), m, replace = TRUE)
n <- sum(ni)
p <- length(beta)
### population function
pop.model <- function(Ni, sigmaX, beta, sigma_v2, sigma_e2, m) {
x <- rnorm(m * Ni, 1, sqrt(sigmaX))
v <- rnorm(m, 0, sqrt(sigma_v2))
y <- numeric(m * Ni)
theta <- numeric(m)
kk <- 1
for (i in 1:m) {
sumx <- 0
for (j in 1:Ni) {
sumx <- sumx + x[kk]
y[kk] <- beta[1] + beta[2] * x[kk] + v[i] + rnorm(1, 0, sqrt(sigma_e2))
kk <- kk + 1
}
meanx <- sumx / Ni
theta[i] <- beta[1] + beta[2] * meanx + v[i]
}
group <- rep(seq(m), each = Ni)
data <- data.frame(y = y, group = group, x1 = x)
return(list(data = data, theta = theta))
}
### sample function
sampleXY <- function(Ni, ni, m, Population) {
Indx <- c()
for (i in 1:m) {
Indx <- c(Indx, sample(c(((i - 1) * Ni + 1):(i * Ni)), ni[i]))
}
Sample <- Population[Indx, ]
return(Sample)
}
### data generation process
Population <- pop.model(Ni, sigmaX, beta, sigma_v2, sigma_e2, m)$data
XY <- sampleXY(Ni, ni, m, Population)
### Creating formula and data frame
formula <- y ~ x1
data <- XY
### Compute group-wise means for X
Xmean <- matrix(NA, m, p)
for (tt in 1:m) {
Xmean[tt, ] <- colMeans(Population[which(Population$group == tt), "x1", drop = FALSE])
}
result <- mspeNERjack(ni, formula, data, Xmean, method = 1)
Compute MSPE through linearization method for Nested error regression model
Description
This function returns the MSPE estimate obtained with linearization methods for the Nested error regression model. These include the seminal Prasad-Rao method and its generalization by Datta-Lahiri. All these methods are developed for general linear mixed effects models.
Usage
mspeNERlin(ni, formula, data, X.mean,
method = "PR", var.method = "default", na_rm, na_omit)
mspeNERPR(ni, formula, data, X.mean,
var.method = "default", na_rm, na_omit)
mspeNERDL(ni, formula, data, X.mean,
var.method = "default", na_rm, na_omit)
Arguments
ni |
(vector). It represents the sample size of each small area. |
formula |
(formula). Stands for the model formula that specifies the auxiliary variables to be used in the regression model. This should follow the R model formula syntax. |
data |
(data frame). It represents the data containing the response values and auxiliary variables for the Nested Error Regression Model. |
X.mean |
(matrix). Stands for the population mean of auxiliary values. |
method |
The MSPE estimation method to be used. See "Details". |
var.method |
The variance component estimation method to be used. See "Details". |
na_rm |
A logical value indicating whether to remove missing values (NaN) from the input matrices and vectors.
If |
na_omit |
A logical value indicating whether to stop the execution if missing values (NaN) are present in the input data.
If |
Details
Default method
for mspeNERlin
is "PR", proposed by N. G. N. Prasad and J. N. K. Rao, Prasad-Rao (PR) method uses Taylor series expansion to obtain a second-order approximation to the MSPE. Function mspeNERlin
also provide the following method:
Method "DL" advanced PR method to cover the cases when the variance components are estimated by ML and REML estimator. Set method = "DL"
.
For method = "PR"
, var.method = "MOM"
is the only available variance component estimation method,
For method = "DL"
, var.method = "ML"
or var.method = "REML"
are available.
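For the NER model the leading Prasad-Rao term also has a closed form, sketched below; the correction terms are omitted, so this is an illustration rather than the full estimator computed by mspeNERlin.

```r
# Leading MSPE term for the NER model: with shrinkage weight
# gamma_i = sigma_v^2 / (sigma_v^2 + sigma_e^2 / n_i),
# the leading term is g1_i = gamma_i * sigma_e^2 / n_i.
g1_ner <- function(sig_v2, sig_e2, ni) {
  gamma <- sig_v2 / (sig_v2 + sig_e2 / ni)   # shrinkage toward the regression fit
  gamma * sig_e2 / ni                        # leading MSPE term per area
}
g1_ner(sig_v2 = 0.8, sig_e2 = 1, ni = c(2, 5, 10))
```

Larger areas (bigger n_i) get smaller leading terms, reflecting more precise direct information.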
Value
This function returns a list with components:
MSPE |
(vector) MSPE estimates for NER model. |
bhat |
(vector) Estimates of the unknown regression coefficients. |
sigvhat2 |
(numeric) Estimates of the area-specific variance component. |
sigehat2 |
(numeric) Estimates of the random error variance component. |
Author(s)
Peiwen Xiao, Xiaohui Liu, Yu Zhang, Yuzi Liu, Jiming Jiang
References
N. G. N. Prasad and J. N. K. Rao. The estimation of the mean squared error of small-area estimators. Journal of the American Statistical Association, 85(409):163-171, 1990.
G. S. Datta and P. Lahiri. A unified measure of uncertainty of estimated best linear unbiased predictors in small area estimation problems. Statistica Sinica, 10(2):613-627, 2000.
Examples
### parameter setting
Ni <- 1000
sigmaX <- 1.5
m <- 10
beta <- c(0.5, 1)
sigma_v2 <- 0.8
sigma_e2 <- 1
ni <- sample(seq(1, 10), m, replace = TRUE)
n <- sum(ni)
p <- length(beta)
pop.model <- function(Ni, sigmaX, beta, sigma_v2, sigma_e2, m) {
x <- rnorm(m * Ni, 1, sqrt(sigmaX))
v <- rnorm(m, 0, sqrt(sigma_v2))
y <- numeric(m * Ni)
theta <- numeric(m)
kk <- 1
for (i in 1:m) {
sumx <- 0
for (j in 1:Ni) {
sumx <- sumx + x[kk]
y[kk] <- beta[1] + beta[2] * x[kk] + v[i] + rnorm(1, 0, sqrt(sigma_e2))
kk <- kk + 1
}
meanx <- sumx / Ni
theta[i] <- beta[1] + beta[2] * meanx + v[i]
}
group <- rep(seq(m), each = Ni)
data <- data.frame(y = y, group = group, x1 = x)
return(list(data = data, theta = theta))
}
sampleXY <- function(Ni, ni, m, Population) {
Indx <- c()
for (i in 1:m) {
Indx <- c(Indx, sample(c(((i - 1) * Ni + 1):(i * Ni)), ni[i]))
}
Sample <- Population[Indx, ]
return(Sample)
}
Population <- pop.model(Ni, sigmaX, beta, sigma_v2, sigma_e2, m)$data
XY <- sampleXY(Ni, ni, m, Population)
formula <- y ~ x1
data <- XY
Xmean <- matrix(NA, m, p)
for (tt in 1:m) {
Xmean[tt, ] <- colMeans(Population[which(Population$group == tt), "x1", drop = FALSE])
}
result <- mspeNERlin(ni, formula, data, Xmean, method = "PR", var.method = "default")
Compute MSPE through parametric bootstrap method for Nested error regression model
Description
This function returns the MSPE estimate obtained with the parametric bootstrap approximation method for the Nested error regression model.
Usage
mspeNERpb(ni, formula, data, Xmean, K = 50, method = 4, na_rm, na_omit)
Arguments
ni |
(vector). It represents the sample size of each small area. |
formula |
(formula). Stands for the model formula that specifies the auxiliary variables to be used in the regression model. This should follow the R model formula syntax. |
data |
(data frame). It represents the data containing the response values and auxiliary variables for the Nested Error Regression Model. |
Xmean |
(matrix). Stands for the population mean of auxiliary values. |
K |
(integer). It represents the bootstrap sample number. Default value is 50. |
method |
The variance component estimation method to be used. See "Details". |
na_rm |
A logical value indicating whether to remove missing values (NaN) from the input matrices and vectors.
If TRUE, rows containing missing values are removed before estimation. |
na_omit |
A logical value indicating whether to stop the execution if missing values (NaN) are present in the input data.
If TRUE, the function stops with an error when missing values are present. |
Details
This method was proposed by P. Hall and T. Maiti. The parametric bootstrap (pb) method uses a bootstrap-based approach to measure the accuracy of the EB estimator. In this case, only the EB estimator is available (method = 4).
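The parametric bootstrap recipe can be sketched in a few lines for a toy intercept-only nested error model. This is an illustration of the idea only, with simplified moment-type plug-in estimates; the actual method also re-estimates the model parameters on each bootstrap sample, a refitting step omitted here for brevity.

```r
# Toy parametric-bootstrap MSPE sketch for an intercept-only nested
# error model (illustration only; not the package's internal code).
set.seed(1)
m <- 5; ni <- rep(4, m); n <- sum(ni)
group <- rep(seq_len(m), times = ni)
y <- 2 + rep(rnorm(m, 0, 0.8), times = ni) + rnorm(n)

# crude plug-in estimates (ANOVA-type, balanced case)
mu_hat <- mean(y)
sige2_hat <- mean(tapply(y, group, var))                 # within-area variance
sigv2_hat <- max(var(tapply(y, group, mean)) - sige2_hat / ni[1], 0)

K <- 200
sq_err <- matrix(NA, K, m)
for (k in seq_len(K)) {
  v_b <- rnorm(m, 0, sqrt(sigv2_hat))
  theta_b <- mu_hat + v_b                                # bootstrap "truth"
  y_b <- theta_b[group] + rnorm(n, 0, sqrt(sige2_hat))
  ybar_b <- tapply(y_b, group, mean)
  gam <- sigv2_hat / (sigv2_hat + sige2_hat / ni)        # shrinkage weight
  theta_hat_b <- gam * ybar_b + (1 - gam) * mean(y_b)    # EB-type predictor
  sq_err[k, ] <- (theta_hat_b - theta_b)^2
}
mspe_pb <- colMeans(sq_err)  # bootstrap MSPE estimate per area
```

Averaging the squared deviation of the re-computed predictor from the bootstrap truth over K replicates is what mspeNERpb does internally, with the full regression model and refitted variance components.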
Value
This function returns a list with components:
MSPE |
(vector) MSPE estimates for NER model. |
bhat |
(vector) Estimates of the unknown regression coefficients. |
sigvhat2 |
(numeric) Estimates of the area-specific variance component. |
sigehat2 |
(numeric) Estimates of the random error variance component. |
Author(s)
Peiwen Xiao, Xiaohui Liu, Yu Zhang, Yuzi Liu, Jiming Jiang
References
F. B. Butar and P. Lahiri. On measures of uncertainty of empirical bayes small area estimators. Journal of Statistical Planning and Inference, 112(1-2):63-76, 2003.
N. G. N. Prasad and J. N. K. Rao. The estimation of the mean squared error of small-area estimators. Journal of the American Statistical Association, 85(409):163-171, 1990.
P. Hall and T. Maiti. On parametric bootstrap methods for small area prediction. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(2):221-238, 2006a.
P. Hall and T. Maiti. Nonparametric estimation of mean squared prediction error in nested error regression models. Annals of Statistics, 34(4):1733-1750, 2006b.
Examples
Ni <- 1000
sigmaX <- 1.5
K <- 50
C <- 50
m <- 10
beta <- c(0.5, 1)
sigma_v2 <- 0.8
sigma_e2 <- 1
ni <- sample(seq(1, 10), m, replace = TRUE)
n <- sum(ni)
p <- length(beta)
### population function
pop.model <- function(Ni, sigmaX, beta, sigma_v2, sigma_e2, m) {
x <- rnorm(m * Ni, 1, sqrt(sigmaX))
v <- rnorm(m, 0, sqrt(sigma_v2))
y <- numeric(m * Ni)
theta <- numeric(m)
kk <- 1
for (i in 1:m) {
sumx <- 0
for (j in 1:Ni) {
sumx <- sumx + x[kk]
y[kk] <- beta[1] + beta[2] * x[kk] + v[i] + rnorm(1, 0, sqrt(sigma_e2))
kk <- kk + 1
}
meanx <- sumx / Ni
theta[i] <- beta[1] + beta[2] * meanx + v[i]
}
group <- rep(seq(m), each = Ni)
data <- data.frame(y = y, group = group, x1 = x)
return(list(data = data, theta = theta))
}
### sample function
sampleXY <- function(Ni, ni, m, Population) {
Indx <- c()
for (i in 1:m) {
Indx <- c(Indx, sample(c(((i - 1) * Ni + 1):(i * Ni)), ni[i]))
}
Sample <- Population[Indx, ]
return(Sample)
}
### data generation process
Population <- pop.model(Ni, sigmaX, beta, sigma_v2, sigma_e2, m)$data
XY <- sampleXY(Ni, ni, m, Population)
formula <- y ~ x1
data <- XY
Xmean <- matrix(NA, m, p)
for (tt in 1:m) {
Xmean[tt, ] <- colMeans(Population[which(Population$group == tt), "x1", drop = FALSE])
}
result <- mspeNERpb(ni, formula, data, Xmean, K = 50, method = 4)
Compute MSPE through Sumca method for Nested error regression model
Description
This function returns the MSPE estimator obtained with a combination of linearization and resampling approximation methods for the Nested error regression model.
Usage
mspeNERsumca(ni, formula, data, Xmean, K = 50, method = 1, na_rm, na_omit)
Arguments
ni |
(vector). It represents the sample size of each small area. |
formula |
(formula). Stands for the model formula that specifies the auxiliary variables to be used in the regression model. This should follow the R model formula syntax. |
data |
(data frame). It represents the data containing the response values and auxiliary variables for the Nested Error Regression Model. |
Xmean |
(matrix). Stands for the population mean of auxiliary values. |
K |
(integer). It represents the Monte-Carlo sample size for "Sumca". Default value is 50. |
method |
The MSPE estimation method to be used. See "Details". |
na_rm |
A logical value indicating whether to remove missing values (NaN) from the input matrices and vectors.
If TRUE, rows containing missing values are removed before estimation. |
na_omit |
A logical value indicating whether to stop the execution if missing values (NaN) are present in the input data.
If TRUE, the function stops with an error when missing values are present. |
Details
This method was proposed by J. Jiang, P. Lahiri, and T. Nguyen. The Sumca method combines the advantages of the linearization and resampling methods and yields unified, positive, computationally inexpensive, and second-order unbiased MSPE estimators.
The default value for method
is 1: method = 1
represents the MOM method, while method = 2
and method = 3
represent the REML and ML methods, respectively.
Value
This function returns a list with components:
MSPE |
(vector) MSPE estimates for NER model. |
bhat |
(vector) Estimates of the unknown regression coefficients. |
sigvhat2 |
(numeric) Estimates of the area-specific variance component. |
sigehat2 |
(numeric) Estimates of the random error variance component. |
Author(s)
Peiwen Xiao, Xiaohui Liu, Yu Zhang, Yuzi Liu, Jiming Jiang
References
J. Jiang and M. Torabi. Sumca: simple, unified, Monte-Carlo-assisted approach to second-order unbiased mean squared prediction error estimation. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 82(2):467-485, 2020.
Examples
Ni <- 1000
sigmaX <- 1.5
m <- 10
beta <- c(0.5, 1)
sigma_v2 <- 0.8
sigma_e2 <- 1
ni <- sample(seq(1, 10), m, replace = TRUE)
n <- sum(ni)
p <- length(beta)
pop.model <- function(Ni, sigmaX, beta, sigma_v2, sigma_e2, m) {
x <- rnorm(m * Ni, 1, sqrt(sigmaX))
v <- rnorm(m, 0, sqrt(sigma_v2))
y <- numeric(m * Ni)
theta <- numeric(m)
kk <- 1
for (i in 1:m) {
sumx <- 0
for (j in 1:Ni) {
sumx <- sumx + x[kk]
y[kk] <- beta[1] + beta[2] * x[kk] + v[i] + rnorm(1, 0, sqrt(sigma_e2))
kk <- kk + 1
}
meanx <- sumx / Ni
theta[i] <- beta[1] + beta[2] * meanx + v[i]
}
group <- rep(seq(m), each = Ni)
data <- data.frame(y = y, group = group, x1 = x)
return(list(data = data, theta = theta))
}
sampleXY <- function(Ni, ni, m, Population) {
Indx <- c()
for (i in 1:m) {
Indx <- c(Indx, sample(c(((i - 1) * Ni + 1):(i * Ni)), ni[i]))
}
Sample <- Population[Indx, ]
return(Sample)
}
Population <- pop.model(Ni, sigmaX, beta, sigma_v2, sigma_e2, m)$data
XY <- sampleXY(Ni, ni, m, Population)
formula <- y ~ x1
data <- XY
Xmean <- matrix(NA, m, p)
for (tt in 1:m) {
Xmean[tt, ] <- colMeans(Population[which(Population$group == tt), "x1", drop = FALSE])
}
result <- mspeNERsumca(ni, formula, data, Xmean, K = 50, method = 1)
Estimates of the variance component using several methods for Fay Herriot model
Description
This function returns the estimate of the variance component using several existing methods for the Fay Herriot model. This function does not accept missing values.
Usage
varfh(formula, data, D, method, na_rm, na_omit)
varOBP(formula, data, D, na_rm, na_omit)
Arguments
formula |
(formula). Stands for the model formula that specifies the auxiliary variables to be used in the regression model. This should follow the R model formula syntax. |
data |
(data frame). It represents the data containing the response values and auxiliary variables for the Fay Herriot model. |
D |
(vector). It represents the known sampling variances for the Fay Herriot model. |
method |
Variance component estimation method. See "Details". |
na_rm |
A logical value indicating whether to remove missing values (NaN) from the input matrices and vectors.
If TRUE, rows containing missing values are removed before estimation. |
na_omit |
A logical value indicating whether to stop the execution if missing values (NaN) are present in the input data.
If TRUE, the function stops with an error when missing values are present. |
Details
The default value for method
is 1, which represents the moment estimator, also called the ANOVA estimator. The available variance component estimation methods are listed as follows:
method = 1
represents the moment (MOM) estimator;
method = 2
represents the restricted maximum likelihood (REML) estimator;
method = 3
represents the maximum likelihood (ML) estimator;
method = 4
represents the empirical Bayes (EB) estimator.
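As an illustration of the idea behind method = 1 (not the package's internal code), one common form of the moment estimator, the Prasad-Rao form Ahat = max{[RSS - sum(D_i (1 - h_ii))] / (m - p), 0}, can be written in a few lines of base R:

```r
# Sketch of the Prasad-Rao moment (ANOVA-type) estimator of the FH
# variance component A; illustrative only.
set.seed(2)
m <- 30; p <- 2
X <- cbind(1, runif(m))                 # design matrix with intercept
D <- runif(m, 0.3, 0.8)                 # known sampling variances
y <- drop(X %*% c(0.5, 1)) + rnorm(m, 0, 1) + rnorm(m, 0, sqrt(D))

XtXinv <- solve(crossprod(X))
beta_ols <- XtXinv %*% crossprod(X, y)  # OLS fit, ignoring the random effect
resid <- drop(y - X %*% beta_ols)
h <- diag(X %*% XtXinv %*% t(X))        # leverages h_ii
Ahat <- max((sum(resid^2) - sum(D * (1 - h))) / (m - p), 0)
```

Truncation at zero is what makes moment-type estimates of A nonnegative; varfh returns the corresponding estimate as the Ahat component.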
Value
This function returns a list with components:
bhat |
(vector) Estimates of the unknown regression coefficients. |
Ahat |
(numeric) Estimates of the variance component. |
Author(s)
Peiwen Xiao, Xiaohui Liu, Yu Zhang, Yuzi Liu, Jiming Jiang
References
J. Jiang. Linear and Generalized Linear Mixed Models and Their Applications. 2007.
Examples
X <- matrix(runif(10 * 3), 10, 3)
X[,1] <- rep(1, 10)
D <- (1:10) / 10 + 0.5
Y <- X %*% c(0.5, 1, 1.5) + rnorm(10, 0, sqrt(2)) + rnorm(10, 0, sqrt(D))
data <- data.frame(Y = Y, X1 = X[,2], X2 = X[,3])
formula <- Y ~ X1 + X2
result <- varfh(formula, data, D, method = 1)
Estimates of the variance component using several methods for Nested error regression model.
Description
This function returns the estimates of the variance components using several existing methods for the Nested error regression model. This function does not accept missing values.
Usage
varner(ni, formula, data, method, na_rm, na_omit)
Arguments
ni |
(vector). It represents the sample size of each small area. |
formula |
(formula). Stands for the model formula that specifies the auxiliary variables to be used in the regression model. This should follow the R model formula syntax. |
data |
(data frame). It represents the data containing the response values and auxiliary variables for the Nested Error Regression Model. |
method |
The variance component estimation method to be used. See "Details". |
na_rm |
A logical value indicating whether to remove missing values (NaN) from the input matrices and vectors.
If TRUE, rows containing missing values are removed before estimation. |
na_omit |
A logical value indicating whether to stop the execution if missing values (NaN) are present in the input data.
If TRUE, the function stops with an error when missing values are present. |
Details
The default value for method
is 1, which represents the moment estimator, also called the ANOVA estimator. The available variance component estimation methods are listed as follows:
method = 1
represents the MOM estimator;
method = 2
represents the restricted maximum likelihood (REML) estimator;
method = 3
represents the maximum likelihood (ML) estimator;
method = 4
represents the empirical Bayes (EB) estimator.
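The moment idea behind method = 1 can be illustrated for the simplest balanced, intercept-only nested error model (a sketch only; varner handles unbalanced data with auxiliary regressors):

```r
# ANOVA-type moment estimates of the NER variance components for a
# balanced one-way layout; illustrative only.
set.seed(3)
m <- 8; n0 <- 5; n <- m * n0
group <- rep(seq_len(m), each = n0)
y <- 1 + rep(rnorm(m, 0, sqrt(0.8)), each = n0) + rnorm(n)

ybar_i <- tapply(y, group, mean)                 # area means
MSE <- sum((y - ybar_i[group])^2) / (n - m)      # within-area mean square
MSB <- n0 * sum((ybar_i - mean(y))^2) / (m - 1)  # between-area mean square
sigehat2 <- MSE                                  # estimates sigma_e^2
sigvhat2 <- max((MSB - MSE) / n0, 0)             # estimates sigma_v^2
```

The within-area mean square estimates the error variance directly, and the between/within contrast, truncated at zero, estimates the area-level variance; these correspond to the sigehat2 and sigvhat2 components returned by varner.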
Value
This function returns a list with components:
bhat |
(vector) Estimates of the unknown regression coefficients. |
sigvhat2 |
(numeric) Estimates of the area-specific variance component. |
sigehat2 |
(numeric) Estimates of the random error variance component. |
Author(s)
Peiwen Xiao, Xiaohui Liu, Yu Zhang, Yuzi Liu, Jiming Jiang
References
J. Jiang. Linear and Generalized Linear Mixed Models and Their Applications. 2007.
Examples
### parameter setting
Ni <- 1000
sigmaX <- 1.5
m <- 10
beta <- c(0.5, 1)
sigma_v2 <- 0.8
sigma_e2 <- 1
ni <- sample(seq(1,10), m, replace = TRUE)
n <- sum(ni)
p <- length(beta)
### population function
pop.model <- function(Ni, sigmaX, beta, sigma_v2, sigma_e2, m) {
x <- rnorm(m * Ni, 1, sqrt(sigmaX))
v <- rnorm(m, 0, sqrt(sigma_v2))
y <- numeric(m * Ni)
theta <- numeric(m)
kk <- 1
for (i in 1:m) {
sumx <- 0
for (j in 1:Ni) {
sumx <- sumx + x[kk]
y[kk] <- beta[1] + beta[2] * x[kk] + v[i] + rnorm(1, 0, sqrt(sigma_e2))
kk <- kk + 1
}
meanx <- sumx / Ni
theta[i] <- beta[1] + beta[2] * meanx + v[i]
}
group <- rep(seq(m), each = Ni)
x <- cbind(rep(1, m*Ni), x)
data <- data.frame(y = y, group = group, x1 = x[,2])
return(list(data = data, theta = theta))
}
### sample function
sampleXY <- function(Ni, ni, m, Population) {
Indx <- c()
for (i in 1:m) {
Indx <- c(Indx, sample(c(((i - 1) * Ni + 1) : (i * Ni)), ni[i]))
}
Sample <- Population[Indx, ]
return(Sample)
}
Population <- pop.model(Ni, sigmaX, beta, sigma_v2, sigma_e2, m)$data
XY <- sampleXY(Ni, ni, m, Population)
formula <- y ~ x1
data <- XY
result <- varner(ni, formula, data, method = 1)
Wheat area measurement and satellite data.
Description
Wheat area data measured in the field for blocks in Yanzhou District, Jining City, Shandong Province. The data corresponding to each block come from the ArcGIS platform. The whole dataset consists of a total of 458 villages and 14750 wheat blocks.
Usage
data(wheatarea)
Format
A data frame with 14708 observations on the following 3 variables.
pixel
:Pixel size of each wheat block.
F_AREA
:Field inspection area of each wheat block.
code
:Street code.
Source
- Liu, Y., Qu, W., Cui, Z., Liu, X., Xu, W., and Jiang, J. (2021). Estimation of Wheat Growing Area via Mixed Model Prediction Using Satellite Data. Journal of Applied Statistics and Management.