Title: | Spatial Sampling Design and Analysis |
Version: | 5.5.1 |
Maintainer: | Michael Dumelle <Dumelle.Michael@epa.gov> |
Description: | A design-based approach to statistical inference, with a focus on spatial data. Spatially balanced samples are selected using the Generalized Random Tessellation Stratified (GRTS) algorithm. The GRTS algorithm can be applied to finite resources (point geometries) and infinite resources (linear / linestring and areal / polygon geometries) and flexibly accommodates a diverse set of sampling design features, including stratification, unequal inclusion probabilities, proportional (to size) inclusion probabilities, legacy (historical) sites, a minimum distance between sites, and two options for replacement sites (reverse hierarchical order and nearest neighbor). Data are analyzed using a wide range of analysis functions that perform categorical variable analysis, continuous variable analysis, attributable risk analysis, risk difference analysis, relative risk analysis, change analysis, and trend analysis. spsurvey can also be used to summarize objects, visualize objects, select samples that are not spatially balanced, select panel samples, measure the amount of spatial balance in a sample, adjust design weights, and more. For additional details, see Dumelle et al. (2023) <doi:10.18637/jss.v105.i03>. |
Depends: | R (≥ 3.5.0), sf, survey (≥ 4.1-1) |
Imports: | boot, crossdes, deldir, graphics, grDevices, lme4, MASS, sampling, stats, units |
Suggests: | knitr, testthat, rmarkdown |
License: | GPL (≥ 3) |
URL: | https://usepa.github.io/spsurvey/, https://github.com/USEPA/spsurvey |
BugReports: | https://github.com/USEPA/spsurvey/issues |
VignetteBuilder: | knitr |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.2.3 |
NeedsCompilation: | no |
Packaged: | 2024-01-08 20:47:59 UTC; MDUMELLE |
Author: | Michael Dumelle |
Repository: | CRAN |
Date/Publication: | 2024-01-09 08:20:02 UTC |
spsurvey: Spatial Sampling Design and Analysis
Description
spsurvey implements a design-based approach to statistical inference, with a focus on spatial data. Spatially balanced samples are selected using the Generalized Random Tessellation Stratified (GRTS) algorithm. The GRTS algorithm can be applied to finite resources (point geometries) and infinite resources (linear / linestring and areal / polygon geometries) and flexibly accommodates a diverse set of sampling design features, including stratification, unequal inclusion probabilities, proportional (to size) inclusion probabilities, legacy (historical) sites, a minimum distance between sites, and two options for replacement sites (reverse hierarchical order and nearest neighbor). Data are analyzed using a wide range of analysis functions that perform categorical variable analysis, continuous variable analysis, attributable risk analysis, risk difference analysis, relative risk analysis, change analysis, and trend analysis. spsurvey can also be used to summarize objects, visualize objects, select samples that are not spatially balanced, select panel samples, measure the amount of spatial balance in a sample, adjust design weights, and more. This R package has been reviewed in accordance with U.S. Environmental Protection Agency policy and approved for publication. Mention of trade names or commercial products does not constitute endorsement or recommendation for use.
Author(s)
Maintainer: Michael Dumelle Dumelle.Michael@epa.gov (ORCID)
Authors:
Tom Kincaid Kincaid.Tom@epa.gov
Tony Olsen Olsen.Tony@epa.gov
Marc Weber Weber.Marc@epa.gov
Other contributors:
Don Stevens [contributor]
Denis White [contributor]
See Also
Useful links:
Report bugs at https://github.com/USEPA/spsurvey/issues
Illinois River data
Description
An sf MULTILINESTRING object of 244 segments of the Illinois River in Arkansas and Oklahoma.
Usage
Illinois_River
Format
244 rows and 2 variables:
STATE_NAME
State name.
geometry
MULTILINESTRING geometry using the NAD83 / Conus Albers coordinate reference system (EPSG: 5070).
Illinois River legacy data
Description
An sf POINT object of legacy sites for the Illinois River data.
Usage
Illinois_River_Legacy
Format
5 rows and 2 variables:
STATE_NAME
State name.
geometry
POINT geometry using the NAD83 / Conus Albers coordinate reference system (EPSG: 5070).
Lake Ontario data
Description
An sf MULTIPOLYGON object of 187 polygons consisting of shore segments in Lake Ontario.
Usage
Lake_Ontario
Format
187 rows and 5 variables:
COUNTRY
Country.
RSRC_CLASS
Bay class.
PSTL_CODE
Postal code.
AREA_SQKM
Area in square kilometers.
geometry
MULTIPOLYGON geometry using the NAD83 / Conus Albers coordinate reference system (EPSG: 5070).
New England Lakes data
Description
An sf POINT object of 195 lakes in the Northeastern United States.
Usage
NE_Lakes
Format
195 rows and 5 variables:
AREA
Lake area in hectares.
AREA_CAT
Lake area categories based on a hectare cutoff.
ELEV
Elevation in meters.
ELEV_CAT
Elevation categories based on a meter cutoff.
geometry
POINT geometry using the NAD83 / Conus Albers coordinate reference system (EPSG: 5070).
New England Lakes legacy data
Description
An sf POINT object of 5 legacy sites for the NE Lakes data.
Usage
NE_Lakes_Legacy
Format
5 rows and 5 variables:
AREA
Lake area in hectares.
AREA_CAT
Lake area categories based on a hectare cutoff.
ELEV
Elevation in meters.
ELEV_CAT
Elevation categories based on a meter cutoff.
geometry
POINT geometry using the NAD83 / Conus Albers coordinate reference system (EPSG: 5070).
New England Lakes data (as a data frame)
Description
A data frame of 195 lakes in the Northeastern United States.
Usage
NE_Lakes_df
Format
195 rows and 6 variables:
AREA
Lake area in hectares.
AREA_CAT
Lake area categories based on a hectare cutoff.
ELEV
Elevation in meters.
ELEV_CAT
Elevation categories based on a meter cutoff.
XCOORD
x-coordinate using the WGS 84 coordinate reference system (EPSG: 4326).
YCOORD
y-coordinate using the WGS 84 coordinate reference system (EPSG: 4326).
NLA PNW data
Description
An sf POINT object of 96 lakes in the Pacific Northwest Region of the United States during the year 2017, from a subset of the Environmental Protection Agency's "National Lakes Assessment."
Usage
NLA_PNW
Format
96 rows and 9 variables:
SITE_ID
A unique lake identifier.
WEIGHT
The sampling design weight.
URBAN
Urban category.
STATE
State name.
BMMI
Benthic MMI value.
BMMI_COND
Benthic MMI condition categories.
PHOS_COND
Phosphorus condition categories.
NITR_COND
Nitrogen condition categories.
geometry
POINT geometry using the NAD83 / Conus Albers coordinate reference system (EPSG: 5070).
NRSA EPA7 data
Description
An sf POINT object of 353 stream segments in the Central United States during the years 2008 and 2013, from a subset of the Environmental Protection Agency's "National Rivers and Streams Assessment."
Usage
NRSA_EPA7
Format
353 rows and 10 variables:
SITE_ID
A unique site identifier.
YEAR
Year of design cycle.
WEIGHT
Sampling design weights.
ECOREGION
Ecoregion.
STATE
State name.
BMMI
Benthic MMI value.
BMMI_COND
Benthic MMI categories.
PHOS_COND
Phosphorus condition categories.
NITR_COND
Nitrogen condition categories.
geometry
POINT geometry using the NAD83 / Conus Albers coordinate reference system (EPSG: 5070).
Adjust survey design weights by categories
Description
Adjust initial survey design weights so that the final weights sum to a desired frame size. Adjusted weights proportionally scale the initial weights to sum to the desired frame size. Separate adjustments are applied to each category specified in wgtcat.
Usage
adjwgt(wgt, wgtcat = NULL, framesize, sites = NULL)
Arguments
wgt |
Vector of initial weights for each site. These equal the reciprocal of the site's inclusion probability. |
wgtcat |
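Vector containing each site's weight adjustment category name. The default is NULL. |
framesize |
Vector containing the known size of the frame for each category name in wgtcat. |
sites |
Vector indicating site use: TRUE for sites that are used and FALSE for sites that are not (these receive an adjusted weight of 0). The default is NULL. |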
Value
Vector of adjusted weights, where the adjusted weight is set to 0 for sites whose value in the sites argument was set to FALSE.
Author(s)
Tony Olsen olsen.tony@epa.gov
Examples
wgt <- runif(50)
wgtcat <- rep(c("A", "B"), c(30, 20))
framesize <- c(A = 15, B = 10)
sites <- rep(rep(c(TRUE, FALSE), c(9, 1)), 5)
adjwgt(wgt, wgtcat, framesize, sites)
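The proportional, per-category adjustment described above can be illustrated with a short base R sketch. It is written from this description, not from the package source: the function name adjwgt_sketch is hypothetical, and treating wgtcat = NULL as a single category and sites = NULL as "all sites used" are assumptions.

```r
# Hypothetical sketch of the per-category proportional weight adjustment
# described above; not the spsurvey source code.
adjwgt_sketch <- function(wgt, wgtcat = NULL, framesize, sites = NULL) {
  # Assumption: sites = NULL means every site is used
  if (is.null(sites)) sites <- rep(TRUE, length(wgt))
  # Assumption: wgtcat = NULL means a single adjustment category
  if (is.null(wgtcat)) {
    wgtcat <- rep("all", length(wgt))
    names(framesize) <- "all"
  }
  wgtcat <- as.character(wgtcat)
  # Sites that are not used get weight 0
  w <- ifelse(sites, wgt, 0)
  # Current weight total within each category
  cat_total <- vapply(split(w, wgtcat), sum, numeric(1))
  # Scale factor per category: known frame size / current weight total
  adj <- framesize[names(cat_total)] / cat_total
  unname(w * adj[wgtcat])
}

adjwgt_sketch(c(1, 2, 3, 4), c("A", "A", "B", "B"), c(A = 30, B = 14),
              sites = c(TRUE, TRUE, TRUE, FALSE))
```

In this call, category A is scaled by 30/3 = 10 and the single used weight in category B is scaled up to 14, so the adjusted weights are 10, 20, 14, and 0, and each category's weights sum to its frame size.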
Adjust survey design weights for non-response by categories
Description
Adjust weights for target sample units that do not respond and are missing at random within categories. The missing at random assumption implies that the weights of these units may be reassigned to units in the same category that have responded (i.e., have been sampled). This is a class-based method for non-response adjustment.
Usage
adjwgtNR(wgt, MARClass, EvalStatus, TNRClass, TRClass)
Arguments
wgt |
Vector of weights for each sample unit that will be adjusted for non-response. Weights must be the weights for the design as implemented. All weights must be greater than zero. |
MARClass |
Vector that identifies, for each sample unit, the category used in the non-response weight adjustment for sample units that are known to be target. Within each missing at random (MAR) category, target sample units that are not sampled are assumed to be missing at random. |
EvalStatus |
Vector of the evaluation status for each sample unit. Values must include the values given in TNRClass and TRClass and may include other values not required for the non-response adjustment. |
TNRClass |
Subset of values in EvalStatus that identify sample units whose target status is known and that do not respond (i.e., are not sampled). |
TRClass |
Subset of values in EvalStatus that identify sample units whose target status is known and that respond (i.e., are target and sampled). |
Value
Vector of sample unit weights that are adjusted for non-response and that has the same length as the input weights. Weights for sample units that did not respond but were known to be eligible are set to zero. Weights for all other sample units are also set to zero.
Author(s)
Tony Olsen olsen.tony@epa.gov
Examples
set.seed(5)
wgt <- runif(40)
MARClass <- rep(c("A", "B"), rep(20, 2))
EvalStatus <- sample(c("Not_Target", "Target_Sampled", "Target_Not_Sampled"), 40, replace = TRUE)
TNRClass <- "Target_Not_Sampled"
TRClass <- "Target_Sampled"
adjwgtNR(wgt, MARClass, EvalStatus, TNRClass, TRClass)
# function that has an error check
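A minimal sketch of the class-based idea, written from the description above rather than from the package source: within each MARClass category, the weight of target non-respondents is redistributed to target respondents in proportion to their weights, and non-respondents end up with weight 0. The function name adjwgtNR_sketch is hypothetical, and leaving units outside TRClass and TNRClass at their original weight is an assumption of this sketch only.

```r
# Hypothetical sketch of class-based non-response weight adjustment;
# not the spsurvey source code.
adjwgtNR_sketch <- function(wgt, MARClass, EvalStatus, TNRClass, TRClass) {
  resp <- EvalStatus %in% TRClass      # target units that responded
  nonresp <- EvalStatus %in% TNRClass  # target units that did not respond
  out <- wgt                           # assumption: other units unchanged
  for (cls in unique(MARClass)) {
    in_cls <- MARClass == cls
    w_resp <- sum(wgt[in_cls & resp])
    w_nonresp <- sum(wgt[in_cls & nonresp])
    if (w_resp == 0) stop("no respondents in class ", cls)
    # Respondents absorb the weight of non-respondents in the same class
    out[in_cls & resp] <- wgt[in_cls & resp] * (w_resp + w_nonresp) / w_resp
  }
  # Non-respondents carry no weight after the adjustment
  out[nonresp] <- 0
  out
}
```

Within each class the total weight of target units (respondents plus non-respondents) is preserved and is carried entirely by the respondents, which is what the missing at random assumption justifies.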
Compute the average shifted histogram (ASH) for one-dimensional weighted data
Description
Calculate the average shifted histogram estimate of a density based on one-dimensional data from a survey design with weights.
Usage
ash1_wgt(
x,
wgt = rep(1, length(x)),
m = 5,
nbin = 50,
ab = NULL,
support = "Continuous"
)
Arguments
x |
Vector used to estimate the density. |
wgt |
Vector of weights for each observation from a probability sample. The default assigns equal weights (equal probability). |
m |
Number of empty bins to add to the ends when the range is not completely specified. The default is 5. |
nbin |
Number of bins for density estimation. The default is 50. |
ab |
Optional range for support associated with the density. Both
values may be equal to |
support |
Type of support. If equal to |
Value
List containing the ASH density estimate. List consists of
tcen
x-coordinate for center of bin
f
y-coordinate for density estimate height
Author(s)
Tony Olsen Olsen.tony@epa.gov
References
Scott, D. W. (1985). "Averaged shifted histograms: effective nonparametric density estimators in several dimensions." The Annals of Statistics 13(3): 1024-1040.
Examples
x <- rnorm(100, 10, sqrt(10))
wgt <- runif(100, 10, 100)
rslt <- ash1_wgt(x, wgt)
plot(rslt)
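To show the idea behind the estimator, here is a minimal weighted version of Scott's (1985) ASH in base R. It is a sketch, not the ash1_wgt implementation: the name ash1_sketch is hypothetical, it replaces raw bin counts with sums of weights, and it uses m both as the number of shifted histograms (triangular weights 1 - |i|/m) and as the number of empty padding bins on each side, an assumption that keeps all of the smoothed mass inside the grid.

```r
# Hypothetical sketch of a weighted average shifted histogram (Scott, 1985);
# not the ash1_wgt() source. Assumes m >= 1 and that x is not constant.
ash1_sketch <- function(x, wgt = rep(1, length(x)), m = 5, nbin = 50) {
  rng <- range(x)
  delta <- diff(rng) / nbin            # width of each fine bin
  nb <- nbin + 2 * m                   # number of fine bins after padding
  a <- rng[1] - m * delta              # pad m empty bins on each side
  breaks <- a + delta * (0:nb)
  # Weighted count (sum of weights) in each fine bin
  k <- findInterval(x, breaks)
  nu <- vapply(seq_len(nb), function(j) sum(wgt[k == j]), numeric(1))
  # Triangular weights for shifts -(m - 1), ..., m - 1; they sum to m
  shifts <- -(m - 1):(m - 1)
  kern <- 1 - abs(shifts) / m
  f <- vapply(seq_len(nb), function(j) {
    idx <- j + shifts
    ok <- idx >= 1 & idx <= nb
    sum(kern[ok] * nu[idx[ok]])
  }, numeric(1))
  # Normalize by the total weight and the bandwidth m * delta
  list(tcen = breaks[-1] - delta / 2, f = f / (sum(wgt) * m * delta))
}

x <- rnorm(100, 10, sqrt(10))
w <- runif(100, 10, 100)
r <- ash1_sketch(x, w)
plot(r$tcen, r$f, type = "l")
```

Because the padding is as wide as the smoothing window, sum(r$f) times the bin width equals 1 up to rounding, so the result is a proper density estimate.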
Attributable risk analysis
Description
This function organizes input and output for the analysis of attributable risk (for categorical variables). The analysis data, dframe, can be either a data frame or a simple features (sf) object. If an sf object is used, coordinates are extracted from the geometry column in the object, arguments xcoord and ycoord are assigned values "xcoord" and "ycoord", respectively, and the geometry column is dropped from the object.
Usage
attrisk_analysis(
dframe,
vars_response,
vars_stressor,
response_levels = NULL,
stressor_levels = NULL,
subpops = NULL,
siteID = NULL,
weight = "weight",
xcoord = NULL,
ycoord = NULL,
stratumID = NULL,
clusterID = NULL,
weight1 = NULL,
xcoord1 = NULL,
ycoord1 = NULL,
sizeweight = FALSE,
sweight = NULL,
sweight1 = NULL,
fpc = NULL,
popsize = NULL,
vartype = "Local",
conf = 95,
All_Sites = FALSE
)
Arguments
dframe |
Data to be analyzed (analysis data). A data frame or sf object. |
vars_response |
Vector composed of character values that identify the names of response variables in dframe. |
vars_stressor |
Vector composed of character values that identify the names of stressor variables in dframe. |
response_levels |
List providing the category values (levels) for each element in the vars_response argument. |
stressor_levels |
List providing the category values (levels) for each element in the vars_stressor argument. |
subpops |
Vector composed of character values that identify the names of subpopulation (domain) variables in dframe. |
siteID |
Character value providing the name of the site ID variable in dframe. |
weight |
Character value providing the name of the design weight variable in dframe. |
xcoord |
Character value providing the name of the x-coordinate variable in dframe. |
ycoord |
Character value providing the name of the y-coordinate variable in dframe. |
stratumID |
Character value providing the name of the stratum ID variable in dframe. |
clusterID |
Character value providing the name of the cluster (stage one) ID variable in dframe. |
weight1 |
Character value providing the name of the stage one weight variable in dframe. |
xcoord1 |
Character value providing the name of the stage one x-coordinate variable in dframe. |
ycoord1 |
Character value providing the name of the stage one y-coordinate variable in dframe. |
sizeweight |
Logical value that indicates whether size weights should be used during estimation, where TRUE means size weights are used and FALSE means they are not. The default is FALSE. |
sweight |
Character value providing the name of the size weight variable in dframe. |
sweight1 |
Character value providing the name of the stage one size weight variable in dframe. |
fpc |
Object that specifies values required for calculation of the finite population correction factor used during variance estimation. The object must match the survey design in terms of stratification and whether the design is single-stage or two-stage. For an unstratified design, the object is a vector. The vector is composed of a single numeric value for a single-stage design. For a two-stage unstratified design, the object is a named vector containing one more than the number of clusters in the sample, where the first item in the vector specifies the number of clusters in the population and each subsequent item specifies the number of stage two units for the cluster. The name for the first item in the vector is arbitrary. Subsequent names in the vector identify clusters and must match the cluster IDs. For a stratified design, the object is a named list of vectors, where names must match the strata IDs. For each stratum, the format of the vector is identical to the format described for unstratified single-stage and two-stage designs. Note that the finite population correction factor is not used with the local mean variance estimator. Example fpc for a single-stage unstratified survey design:
Example fpc for a single-stage stratified survey design:
Example fpc for a two-stage unstratified survey design:
Example fpc for a two-stage stratified survey design:
|
popsize |
Object that provides values for the population argument of the
Example popsize for calibration:
Example popsize for post-stratification using a data frame:
Example popsize for post-stratification using a table:
Example popsize for post-stratification using an xtabs object:
|
vartype |
Character value providing the choice of the variance estimator, where |
conf |
Numeric value providing the Gaussian-based confidence level. The default value is 95. |
All_Sites |
A logical variable used when |
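The four fpc object shapes described in the fpc argument above can be written out as R objects. In this sketch the stratum names match the strata in the example data below, while the cluster names and all of the counts are hypothetical placeholders, not values from the package documentation.

```r
# Single-stage, unstratified: one population size (hypothetical value)
fpc_1s <- 15000

# Single-stage, stratified: named list with one value per stratum;
# the names must match the strata IDs in dframe
fpc_1s_strat <- list(Stratum1 = 9000, Stratum2 = 6000)

# Two-stage, unstratified: the first element is the number of clusters in
# the population (its name is arbitrary); each remaining element is the
# number of stage two units for a sampled cluster, named by cluster ID
fpc_2s <- c(Ncluster = 150, Cluster1 = 150, Cluster2 = 75, Cluster3 = 125)

# Two-stage, stratified: named list of two-stage vectors, one per stratum
fpc_2s_strat <- list(
  Stratum1 = c(Ncluster = 100, Cluster1 = 125, Cluster2 = 100),
  Stratum2 = c(Ncluster = 50, Cluster3 = 75, Cluster4 = 150)
)
```

Following the note in the fpc description, these objects only affect the results when vartype selects an estimator other than the local mean estimator.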
Value
The analysis results. A data frame of population estimates for all combinations of subpopulations, categories within each subpopulation, response variables, and categories within each response variable. Estimates are provided for proportion and size of the population plus standard error, margin of error, and confidence interval estimates. The data frame contains the following variables:
- Type
subpopulation (domain) name
- Subpopulation
subpopulation name within a domain
- Response
response variable
- Stressor
stressor variable
- nResp
sample size
- Estimate
attributable risk estimate
- StdError_log
attributable risk standard error (on the log scale)
- MarginofError_log
attributable risk margin of error (on the log scale)
- LCBxxPct
xx% (default 95%) lower confidence bound
- UCBxxPct
xx% (default 95%) upper confidence bound
- WeightTotal
sum of design weights
- Count_RespPoor_StressPoor
number of observations in the poor response and poor stressor group
- Count_RespPoor_StressGood
number of observations in the poor response and good stressor group
- Count_RespGood_StressPoor
number of observations in the good response and poor stressor group
- Count_RespGood_StressGood
number of observations in the good response and good stressor group
- Prop_RespPoor_StressPoor
weighted proportion of observations in the poor response and poor stressor group
- Prop_RespPoor_StressGood
weighted proportion of observations in the poor response and good stressor group
- Prop_RespGood_StressPoor
weighted proportion of observations in the good response and poor stressor group
- Prop_RespGood_StressGood
weighted proportion of observations in the good response and good stressor group
Details
Attributable risk measures the proportional reduction in the extent of poor condition of a response variable that presumably would result from eliminating a stressor variable, where the response and stressor variables are classified as either good (i.e., reference condition) or poor (i.e., different from reference condition). Attributable risk is defined as one minus the ratio of two probabilities. The numerator of the ratio is the conditional probability that the response variable is in poor condition given that the stressor variable is in good condition. The denominator of the ratio is the probability that the response variable is in poor condition. Attributable risk values close to zero indicate that removing the stressor variable will have little or no impact on the probability that the response variable is in poor condition. Attributable risk values close to one indicate that removing the stressor variable will result in extensive reduction of the probability that the response variable is in poor condition.
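The definition above can be turned into a point estimate by replacing each probability with a design-weighted proportion. The base R sketch below does only that: the name attrisk_sketch is hypothetical, and it does not reproduce the log-scale standard errors or confidence bounds that attrisk_analysis reports.

```r
# Hypothetical sketch of the attributable risk point estimate:
# AR = 1 - P(response poor | stressor good) / P(response poor)
attrisk_sketch <- function(resp, stress, wgt, poor = "Poor", good = "Good") {
  # Weighted probability that the response is in poor condition
  p_poor <- sum(wgt[resp == poor]) / sum(wgt)
  # Weighted probability of a poor response among sites with a good stressor
  s_good <- stress == good
  p_poor_given_good <- sum(wgt[s_good & resp == poor]) / sum(wgt[s_good])
  1 - p_poor_given_good / p_poor
}

attrisk_sketch(
  resp = c("Poor", "Poor", "Poor", "Good"),
  stress = c("Poor", "Poor", "Good", "Good"),
  wgt = rep(1, 4)
)
```

With equal weights, P(response poor) = 3/4 and P(response poor | stressor good) = 1/2, so the estimate is 1 - (1/2)/(3/4) = 1/3: removing the stressor would be expected to cut the extent of poor response by about a third.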
Author(s)
Tom Kincaid Kincaid.Tom@epa.gov
References
Van Sickle, J., & Paulsen, S. G. (2008). Assessing the attributable risks, relative risks, and regional extents of aquatic stressors. Journal of the North American Benthological Society, 27(4), 920-931.
See Also
relrisk_analysis
for relative risk analysis
diffrisk_analysis
for risk difference analysis
Examples
dframe <- data.frame(
siteID = paste0("Site", 1:100),
wgt = runif(100, 10, 100),
xcoord = runif(100),
ycoord = runif(100),
stratum = rep(c("Stratum1", "Stratum2"), 50),
RespVar1 = sample(c("Poor", "Good"), 100, replace = TRUE),
RespVar2 = sample(c("Poor", "Good"), 100, replace = TRUE),
StressVar = sample(c("Poor", "Good"), 100, replace = TRUE),
All_Sites = rep("All Sites", 100),
Resource_Class = rep(c("Agr", "Forest"), c(55, 45))
)
myresponse <- c("RespVar1", "RespVar2")
mystressor <- c("StressVar")
mysubpops <- c("All_Sites", "Resource_Class")
attrisk_analysis(dframe,
vars_response = myresponse,
vars_stressor = mystressor, subpops = mysubpops, siteID = "siteID",
weight = "wgt", xcoord = "xcoord", ycoord = "ycoord",
stratumID = "stratum"
)
Categorical variable analysis
Description
This function organizes input and output for the analysis of categorical variables. The analysis data, dframe, can be either a data frame or a simple features (sf) object. If an sf object is used, coordinates are extracted from the geometry column in the object, arguments xcoord and ycoord are assigned values "xcoord" and "ycoord", respectively, and the geometry column is dropped from the object.
Usage
cat_analysis(
dframe,
vars,
subpops = NULL,
siteID = NULL,
weight = "weight",
xcoord = NULL,
ycoord = NULL,
stratumID = NULL,
clusterID = NULL,
weight1 = NULL,
xcoord1 = NULL,
ycoord1 = NULL,
sizeweight = FALSE,
sweight = NULL,
sweight1 = NULL,
fpc = NULL,
popsize = NULL,
vartype = "Local",
jointprob = "overton",
conf = 95,
All_Sites = FALSE
)
Arguments
dframe |
Data to be analyzed (analysis data). A data frame or sf object. |
vars |
Vector composed of character values that identify the names of response variables in dframe. |
subpops |
Vector composed of character values that identify the names of subpopulation (domain) variables in dframe. |
siteID |
Character value providing the name of the site ID variable in dframe. |
weight |
Character value providing the name of the design weight variable in dframe. |
xcoord |
Character value providing the name of the x-coordinate variable in dframe. |
ycoord |
Character value providing the name of the y-coordinate variable in dframe. |
stratumID |
Character value providing the name of the stratum ID variable in dframe. |
clusterID |
Character value providing the name of the cluster (stage one) ID variable in dframe. |
weight1 |
Character value providing the name of the stage one weight variable in dframe. |
xcoord1 |
Character value providing the name of the stage one x-coordinate variable in dframe. |
ycoord1 |
Character value providing the name of the stage one y-coordinate variable in dframe. |
sizeweight |
Logical value that indicates whether size weights should be used during estimation, where TRUE means size weights are used and FALSE means they are not. The default is FALSE. |
sweight |
Character value providing the name of the size weight variable in dframe. |
sweight1 |
Character value providing the name of the stage one size weight variable in dframe. |
fpc |
Object that specifies values required for calculation of the finite population correction factor used during variance estimation. The object must match the survey design in terms of stratification and whether the design is single-stage or two-stage. For an unstratified design, the object is a vector. The vector is composed of a single numeric value for a single-stage design. For a two-stage unstratified design, the object is a named vector containing one more than the number of clusters in the sample, where the first item in the vector specifies the number of clusters in the population and each subsequent item specifies the number of stage two units for the cluster. The name for the first item in the vector is arbitrary. Subsequent names in the vector identify clusters and must match the cluster IDs. For a stratified design, the object is a named list of vectors, where names must match the strata IDs. For each stratum, the format of the vector is identical to the format described for unstratified single-stage and two-stage designs. Note that the finite population correction factor is not used with the local mean variance estimator. Example fpc for a single-stage unstratified survey design:
Example fpc for a single-stage stratified survey design:
Example fpc for a two-stage unstratified survey design:
Example fpc for a two-stage stratified survey design:
|
popsize |
Object that provides values for the population argument of the
Example popsize for calibration:
Example popsize for post-stratification using a data frame:
Example popsize for post-stratification using a table:
Example popsize for post-stratification using an xtabs object:
|
vartype |
Character value providing the choice of the variance estimator, where |
jointprob |
Character value providing the choice of joint inclusion probability approximation for use with Horvitz-Thompson and Yates-Grundy variance estimators, where |
conf |
Numeric value providing the Gaussian-based confidence level. The default value is 95. |
All_Sites |
A logical variable used when |
Value
The analysis results. A data frame of population estimates for all combinations of subpopulations, categories within each subpopulation, response variables, and categories within each response variable. Estimates are provided for proportion and total of the population plus standard error, margin of error, and confidence interval estimates. The data frame contains the following variables:
- Type
subpopulation (domain) name
- Subpopulation
subpopulation name within a domain
- Indicator
response variable
- Category
category of response variable
- nResp
sample size
- Estimate.P
proportion estimate (in %)
- StdError.P
standard error of proportion estimate
- MarginofError.P
margin of error of proportion estimate
- LCBxxPct.P
xx% (default 95%) lower confidence bound of proportion estimate
- UCBxxPct.P
xx% (default 95%) upper confidence bound of proportion estimate
- Estimate.U
total estimate
- StdError.U
standard error of total estimate
- MarginofError.U
margin of error of total estimate
- LCBxxPct.U
xx% (default 95%) lower confidence bound of total estimate
- UCBxxPct.U
xx% (default 95%) upper confidence bound of total estimate
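The point estimates in Estimate.P and Estimate.U are, at their simplest, design-weighted sums: the total for a category is the sum of the weights of the sites in that category, and the proportion is that total as a percent of all weight. The base R sketch below shows only that; the name cat_sketch is hypothetical, and it ignores stratification, the popsize adjustment, and all of the variance-based columns.

```r
# Hypothetical sketch of design-weighted category estimates for one response
# variable; not the cat_analysis() implementation.
cat_sketch <- function(category, wgt) {
  category <- as.character(category)
  # Sum of design weights within each category (estimated total)
  totals <- vapply(split(wgt, category), sum, numeric(1))
  data.frame(
    Category = names(totals),
    nResp = as.vector(table(category)[names(totals)]),
    Estimate.P = 100 * unname(totals) / sum(wgt),
    Estimate.U = unname(totals),
    stringsAsFactors = FALSE
  )
}

cat_sketch(c("north", "south", "north"), c(10, 30, 60))
```

Here "north" has two sites whose weights sum to 70 and "south" one site with weight 30, so the estimated totals are 70 and 30 and, because the weights sum to 100, the proportions are 70% and 30%.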
Author(s)
Tom Kincaid Kincaid.Tom@epa.gov
See Also
cont_analysis
for continuous variable analysis
Examples
dframe <- data.frame(
siteID = paste0("Site", 1:100),
wgt = runif(100, 10, 100),
xcoord = runif(100),
ycoord = runif(100),
stratum = rep(c("Stratum1", "Stratum2"), 50),
CatVar = rep(c("north", "south", "east", "west"), 25),
All_Sites = rep("All Sites", 100),
Resource_Class = rep(c("Good", "Poor"), c(55, 45))
)
myvars <- c("CatVar")
mysubpops <- c("All_Sites", "Resource_Class")
mypopsize <- data.frame(
Resource_Class = c("Good", "Poor"),
Total = c(4000, 1500)
)
cat_analysis(dframe,
vars = myvars, subpops = mysubpops, siteID = "siteID",
weight = "wgt", xcoord = "xcoord", ycoord = "ycoord",
stratumID = "stratum", popsize = mypopsize
)
Plot a cumulative distribution function (CDF)
Description
This function creates a CDF plot. Input data for the plots are provided by a data frame with the same structure as the "CDF" output from cont_analysis. Confidence limits for the CDF are also plotted.
Usage
cdf_plot(
cdfest,
var = NULL,
subpop = NULL,
subpop_level = NULL,
units_cdf = "Percent",
type_cdf = "Continuous",
log = "",
xlab = NULL,
ylab = NULL,
ylab_r = NULL,
main = NULL,
legloc = NULL,
confcut = 0,
conflev = 95,
cex.main = 1.2,
cex.legend = 1,
...
)
Arguments
cdfest |
Data frame with the same structure as the "CDF" output from cont_analysis. |
var |
If |
subpop |
If |
subpop_level |
If |
units_cdf |
Indicator for the label utilized for the left side y-axis and the values used for the left side y-axis tick marks, where "Percent" means the label and values are in terms of percent of the population, and "Units" means the label and values are in terms of units (count, length, or area) of the population. The default is "Percent". |
type_cdf |
Character string consisting of the value "Continuous" or "Ordinal" that controls the type of CDF plot. The default is "Continuous". |
log |
Character string consisting of the value "" or "x" that controls whether the x axis uses the original scale ("") or the base 10 logarithmic scale ("x"). The default is "". |
xlab |
Character string providing the x-axis label. If this argument equals NULL, then the indicator name is used as the label. The default is NULL. |
ylab |
Character string providing the left side y-axis label. If argument units_cdf equals "Units", a value should be provided for this argument. Otherwise, the label will be "Percent". The default is "Percent". |
ylab_r |
Character string providing the label for the right side y-axis (and, hence, determining the values used for the right side y-axis tick marks), where NULL means a right side y-axis is not created. If this argument equals "Same", the right side y-axis will have the same label and tick mark values as the left side y-axis. If this argument equals a character string other than "Same", the right side y-axis label will be the value provided for argument ylab_r, and the right side y-axis tick mark values will be determined by the choice not utilized for argument units_cdf, which means that the default value of argument units_cdf (i.e., "Percent") will result in the right side y-axis tick mark values being expressed in terms of units of the population (i.e., count, length, or area). The default is NULL. |
main |
Character string providing the plot title. The default is NULL. |
legloc |
Indicator for location of the plot legend, where "BR" means bottom right, "BL" means bottom left, "TR" means top right, "TL" means top left, and NULL means no legend. The default is NULL. |
confcut |
Numeric value that controls plotting confidence limits at the CDF extremes. Confidence limits for CDF values (percent scale) less than confcut or greater than 100 minus confcut are not plotted. A value of zero means confidence limits are plotted for the complete range of the CDF. The default is 0. |
conflev |
Numeric value of the confidence level used for confidence limits. The default is 95. |
cex.main |
Expansion factor for the plot title. The default is 1.2. |
cex.legend |
Expansion factor for the legend title. The default is 1. |
... |
Additional arguments passed to the |
Value
A plot of a variable's CDF estimates and associated confidence limits.
Author(s)
Tom Kincaid Kincaid.Tom@epa.gov
See Also
cont_cdfplot
for creating a PDF file containing CDF plots
cont_cdftest
for CDF hypothesis testing
Examples
## Not run:
dframe <- data.frame(
siteID = paste0("Site", 1:100),
wgt = runif(100, 10, 100),
xcoord = runif(100),
ycoord = runif(100),
stratum = rep(c("Stratum1", "Stratum2"), 50),
ContVar = rnorm(100, 10, 1),
All_Sites = rep("All Sites", 100),
Resource_Class = rep(c("Good", "Poor"), c(55, 45))
)
myvars <- c("ContVar")
mysubpops <- c("All_Sites", "Resource_Class")
mypopsize <- data.frame(
Resource_Class = c("Good", "Poor"),
Total = c(4000, 1500)
)
myanalysis <- cont_analysis(dframe,
vars = myvars, subpops = mysubpops,
siteID = "siteID", weight = "wgt", xcoord = "xcoord", ycoord = "ycoord",
stratumID = "stratum", popsize = mypopsize
)
keep <- with(myanalysis$CDF, Type == "Resource_Class" &
Subpopulation == "Good")
par(mfrow = c(2, 1))
cdf_plot(myanalysis$CDF[keep, ],
xlab = "ContVar",
ylab = "Percent of Stream Length", ylab_r = "Stream Length (km)",
main = "Estimates for Resource Class: Good"
)
cdf_plot(myanalysis$CDF[keep, ],
xlab = "ContVar",
ylab = "Percent of Stream Length", ylab_r = "Same",
main = "Estimates for Resource Class: Good"
)
## End(Not run)
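The curve that cdf_plot draws is a design-weighted CDF: at each value, the estimated percent (or number of units) of the population at or below that value. A minimal base R sketch of those point estimates follows; the name cdf_sketch is hypothetical, its column names only loosely mirror the "CDF" output of cont_analysis, and it omits the confidence limits.

```r
# Hypothetical sketch of design-weighted CDF point estimates on the
# "Percent" and "Units" scales; not the cont_analysis() implementation.
cdf_sketch <- function(x, wgt) {
  vals <- sort(unique(x))
  # Weighted total of units at or below each observed value
  units <- vapply(vals, function(v) sum(wgt[x <= v]), numeric(1))
  data.frame(
    Value = vals,
    Estimate.P = 100 * units / sum(wgt),
    Estimate.U = units
  )
}

est <- cdf_sketch(c(3, 1, 2, 2), rep(1, 4))
plot(est$Value, est$Estimate.P, type = "s",
     xlab = "Value", ylab = "Percent of population")
```

With four equally weighted sites at 3, 1, 2, and 2, the estimated CDF is 25% at 1, 75% at 2, and 100% at 3; the "Units" scale carries the same shape in terms of the summed weights.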
Change analysis
Description
This function organizes input and output for the estimation of change between two samples (for categorical and continuous variables). The analysis data, dframe, can be either a data frame or a simple features (sf) object. If an sf object is used, coordinates are extracted from the geometry column in the object, arguments xcoord and ycoord are assigned values "xcoord" and "ycoord", respectively, and the geometry column is dropped from the object.
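For a continuous variable with test = "mean", the core of a change estimate is the difference between the design-weighted means of the two surveys. The base R sketch below computes only that point estimate. The name mean_change_sketch is hypothetical, and the second-minus-first sign convention is an assumption. The sketch also ignores the covariance between surveys for repeat visit sites, which a real change analysis must account for in its standard errors.

```r
# Hypothetical sketch: change in a design-weighted mean between two surveys.
mean_change_sketch <- function(y, wgt, survey, survey_names) {
  # Design-weighted mean of y within one survey
  wmean <- function(s) {
    keep <- survey == s
    sum(wgt[keep] * y[keep]) / sum(wgt[keep])
  }
  # Assumed convention: second survey minus first survey
  wmean(survey_names[2]) - wmean(survey_names[1])
}

mean_change_sketch(
  y = c(1, 3, 5, 5),
  wgt = c(1, 1, 2, 2),
  survey = c("S1", "S1", "S2", "S2"),
  survey_names = c("S1", "S2")
)
```

The weighted mean is 2 in survey S1 and 5 in survey S2, so the estimated change is 3.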
Usage
change_analysis(
dframe,
vars_cat = NULL,
vars_cont = NULL,
test = "mean",
subpops = NULL,
surveyID = "surveyID",
survey_names = NULL,
siteID = "siteID",
weight = "weight",
revisitwgt = FALSE,
xcoord = NULL,
ycoord = NULL,
stratumID = NULL,
clusterID = NULL,
weight1 = NULL,
xcoord1 = NULL,
ycoord1 = NULL,
sizeweight = FALSE,
sweight = NULL,
sweight1 = NULL,
fpc = NULL,
popsize = NULL,
vartype = "Local",
jointprob = "overton",
conf = 95,
All_Sites = FALSE
)
Arguments
dframe |
Data to be analyzed (analysis data). A data frame or sf object. |
vars_cat |
Vector composed of character values that identify the names of categorical response variables in dframe. |
vars_cont |
Vector composed of character values that identify the names of continuous response variables in dframe. |
test |
Character string or character vector providing the location measure(s) to use for change estimation for continuous variables. The choices are |
subpops |
Vector composed of character values that identify the names of subpopulation (domain) variables in dframe. |
surveyID |
Character value providing the name of the survey ID variable in dframe. |
survey_names |
Character vector of length two that provides the survey names contained in the |
siteID |
Character value providing the name of the site ID variable in dframe. |
weight |
Character value providing the name of the design weight variable in dframe. |
revisitwgt |
Logical value that indicates whether each repeat visit site has the same design weight in the two surveys, where |
xcoord |
Character value providing the name of the x-coordinate variable in dframe. |
ycoord |
Character value providing the name of the y-coordinate variable in dframe. |
stratumID |
Character value providing the name of the stratum ID variable in dframe. |
clusterID |
Character value providing the name of the cluster (stage one) ID variable in dframe. |
weight1 |
Character value providing the name of the stage one weight variable in dframe. |
xcoord1 |
Character value providing the name of the stage one x-coordinate variable in dframe. |
ycoord1 |
Character value providing the name of the stage one y-coordinate variable in dframe. |
sizeweight |
Logical value that indicates whether size weights should be used during estimation, where TRUE means size weights are used and FALSE means they are not. The default is FALSE. |
sweight |
Character value providing the name of the size weight variable in dframe. |
sweight1 |
Character value providing the name of the stage one size weight variable in dframe. |
fpc |
Object that specifies values required for calculation of the finite population correction factor used during variance estimation. The object must match the survey design in terms of stratification and whether the design is single-stage or two-stage. For an unstratified design, the object is a vector. The vector is composed of a single numeric value for a single-stage design. For a two-stage unstratified design, the object is a named vector containing one more than the number of clusters in the sample, where the first item in the vector specifies the number of clusters in the population and each subsequent item specifies the number of stage two units for the cluster. The name for the first item in the vector is arbitrary. Subsequent names in the vector identify clusters and must match the cluster IDs. For a stratified design, the object is a named list of vectors, where names must match the strata IDs. For each stratum, the format of the vector is identical to the format described for unstratified single-stage and two-stage designs. Note that the finite population correction factor is not used with the local mean variance estimator. Example fpc for a single-stage unstratified survey design:
Example fpc for a single-stage stratified survey design:
Example fpc for a two-stage unstratified survey design:
Example fpc for a two-stage stratified survey design:
|
popsize |
Object that provides values for the population argument of the calibrate or postStratify functions in the survey package.
Example popsize for calibration:
Example popsize for post-stratification using a data frame:
Example popsize for post-stratification using a table:
Example popsize for post-stratification using an xtabs object:
|
vartype |
Character value providing the choice of the variance estimator. The default value is "Local", which selects the local mean variance estimator. |
jointprob |
Character value providing the choice of joint inclusion probability approximation for use with Horvitz-Thompson and Yates-Grundy variance estimators. The default value is "overton", which selects the Overton approximation. |
conf |
Numeric value providing the Gaussian-based confidence level. The default value is 95. |
All_Sites |
A logical variable used when |
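Following the fpc description above, the four layouts can be built in base R as sketched below. Every count and every stratum and cluster name is a hypothetical placeholder, the single-stage stratified object follows the "named list of vectors" wording of the description, and check_fpc_2s is a hypothetical helper for illustration, not an spsurvey function.

```r
# Hypothetical fpc objects for the four design layouts described above.
# All counts and all stratum / cluster names are made-up placeholders.

# Single-stage, unstratified: a single number of population units.
fpc_1s <- 15000

# Single-stage, stratified: a named list with one value per stratum;
# the names must match the stratum IDs in the data.
fpc_1s_strat <- list(Stratum1 = 9000, Stratum2 = 6000)

# Two-stage, unstratified: the first element is the number of clusters in the
# population (its name is arbitrary); each later element is the number of
# stage two units in one sampled cluster, named by that cluster's ID.
fpc_2s <- c(Ncluster = 150, Cluster1 = 150, Cluster2 = 75, Cluster3 = 75)

# Two-stage, stratified: a named list holding one two-stage vector per stratum.
fpc_2s_strat <- list(
  Stratum1 = c(Ncluster = 100, Cluster1 = 125, Cluster2 = 100),
  Stratum2 = c(Ncluster = 50, Cluster3 = 75, Cluster4 = 80)
)

# Hypothetical helper (not part of spsurvey): TRUE when a two-stage fpc vector
# has one entry per sampled cluster plus the leading population cluster count,
# and its cluster names match the sampled cluster IDs.
check_fpc_2s <- function(fpc, cluster_ids) {
  ids <- unique(cluster_ids)
  length(fpc) == length(ids) + 1 && setequal(names(fpc)[-1], ids)
}

sampled_clusters <- c("Cluster1", "Cluster1", "Cluster2", "Cluster3", "Cluster3")
check_fpc_2s(fpc_2s, sampled_clusters)
```

With sampled cluster IDs taken from the clusterID variable, the helper returns TRUE for fpc_2s and FALSE when the vector and the sampled clusters disagree.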
Value
List of change estimates composed of four items: (1) catsum contains change estimates for categorical variables, (2) contsum_mean contains estimates for continuous variables using the mean, (3) contsum_total contains estimates for continuous variables using the total, and (4) contsum_median contains estimates for continuous variables using the median. The items in the list will contain NULL for estimates that were not calculated. Each data frame includes estimates for all combinations of population Types, subpopulations within types, response variables, and categories within each response variable (for categorical variables and continuous variables using the median). Change estimates are provided plus standard error estimates and confidence interval estimates.
The catsum data frame contains the following variables:
- Survey_1
first survey name
- Survey_2
second survey name
- Type
subpopulation (domain) name
- Subpopulation
subpopulation name within a domain
- Indicator
response variable
- Category
category of response variable
- DiffEst.P
proportion difference estimate (in %; second survey - first survey)
- StdError.P
standard error of proportion difference estimate
- MarginofError.P
margin of error of proportion difference estimate
- LCBxxPct.P
xx% (default 95%) lower confidence bound of proportion difference estimate
- UCBxxPct.P
xx% (default 95%) upper confidence bound of proportion difference estimate
- Estimate.U
total difference estimate (second survey - first survey)
- StdError.U
standard error of total difference estimate
- MarginofError.U
margin of error of total difference estimate
- LCBxxPct.U
xx% (default 95%) lower confidence bound of total difference estimate
- UCBxxPct.U
xx% (default 95%) upper confidence bound of total difference estimate
- nResp_1
sample size in the first survey
- Estimate.P_1
proportion estimate (in %) from the first survey
- StdError.P_1
standard error of proportion estimate from the first survey
- MarginofError.P_1
margin of error of proportion estimate from the first survey
- LCBxxPct.P_1
xx% (default 95%) lower confidence bound of proportion estimate from the first survey
- UCBxxPct.P_1
xx% (default 95%) upper confidence bound of proportion estimate from the first survey
- nResp_2
sample size in the second survey
- Estimate.U_1
total estimate from the first survey
- StdError.U_1
standard error of total estimate from the first survey
- MarginofError.U_1
margin of error of total estimate from the first survey
- LCBxxPct.U_1
xx% (default 95%) lower confidence bound of total estimate from the first survey
- UCBxxPct.U_1
xx% (default 95%) upper confidence bound of total estimate from the first survey
- Estimate.P_2
proportion estimate (in %) from the second survey
- StdError.P_2
standard error of proportion estimate from the second survey
- MarginofError.P_2
margin of error of proportion estimate from the second survey
- LCBxxPct.P_2
xx% (default 95%) lower confidence bound of proportion estimate from the second survey
- UCBxxPct.P_2
xx% (default 95%) upper confidence bound of proportion estimate from the second survey
- Estimate.U_2
total estimate from the second survey
- StdError.U_2
standard error of total estimate from the second survey
- MarginofError.U_2
margin of error of total estimate from the second survey
- LCBxxPct.U_2
xx% (default 95%) lower confidence bound of total estimate from the second survey
- UCBxxPct.U_2
xx% (default 95%) upper confidence bound of total estimate from the second survey
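The xx in the LCBxxPct and UCBxxPct names stands for the confidence level set by conf. Assuming the usual normal-theory interval implied by the "Gaussian-based confidence level" wording of conf, the margin of error is the standard normal quantile times the standard error, and the bounds are the estimate minus and plus that margin. The sketch below shows this arithmetic on made-up numbers; gaussian_ci is hypothetical, and spsurvey's own computation may differ in details such as truncating bounds to a valid range.

```r
# Hypothetical helper (not part of spsurvey): Gaussian-based interval for an
# estimate and its standard error, with column names that follow the
# LCBxxPct / UCBxxPct pattern, where xx is the confidence level (conf).
gaussian_ci <- function(estimate, std_error, conf = 95) {
  z <- qnorm(1 - (1 - conf / 100) / 2)  # about 1.96 when conf = 95
  moe <- z * std_error                  # margin of error
  out <- data.frame(estimate, std_error, moe, estimate - moe, estimate + moe)
  names(out) <- c(
    "DiffEst.P", "StdError.P", "MarginofError.P",
    paste0("LCB", conf, "Pct.P"), paste0("UCB", conf, "Pct.P")
  )
  out
}

# Made-up proportion difference (second survey minus first survey, in %)
gaussian_ci(estimate = 4.2, std_error = 2.5, conf = 90)
```

With conf = 90 the bound columns are named LCB90Pct.P and UCB90Pct.P, matching the xx pattern used throughout the output descriptions.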
The contsum_mean data frame contains the following variables:
- Survey_1
first survey name
- Survey_2
second survey name
- Type
subpopulation (domain) name
- Subpopulation
subpopulation name within a domain
- Indicator
response variable
- Statistic
value of percentile
- nResp
sample size at or below Value
- DiffEst
mean difference estimate
- StdError
standard error of mean difference estimate
- MarginofError
margin of error of mean difference estimate
- LCBxxPct
xx% (default 95%) lower confidence bound of mean difference estimate
- UCBxxPct
xx% (default 95%) upper confidence bound of mean difference estimate
- nResp_1
sample size in the first survey
- Estimate_1
mean estimate from the first survey
- StdError_1
standard error of mean estimate from the first survey
- MarginofError_1
margin of error of mean estimate from the first survey
- LCBxxPct_1
xx% (default 95%) lower confidence bound of mean estimate from the first survey
- UCBxxPct_1
xx% (default 95%) upper confidence bound of mean estimate from the first survey
- nResp_2
sample size in the second survey
- Estimate_2
mean estimate from the second survey
- StdError_2
standard error of mean estimate from the second survey
- MarginofError_2
margin of error of mean estimate from the second survey
- LCBxxPct_2
xx% (default 95%) lower confidence bound of mean estimate from the second survey
- UCBxxPct_2
xx% (default 95%) upper confidence bound of mean estimate from the second survey
The contsum_total data frame contains the following variables:
- Survey_1
first survey name
- Survey_2
second survey name
- Type
subpopulation (domain) name
- Subpopulation
subpopulation name within a domain
- Indicator
response variable
- Statistic
value of percentile
- nResp
sample size at or below Value
- DiffEst
total difference estimate
- StdError
standard error of total difference estimate
- MarginofError
margin of error of total difference estimate
- LCBxxPct
xx% (default 95%) lower confidence bound of total difference estimate
- UCBxxPct
xx% (default 95%) upper confidence bound of total difference estimate
- nResp_1
sample size in the first survey
- Estimate_1
total estimate from the first survey
- StdError_1
standard error of total estimate from the first survey
- MarginofError_1
margin of error of total estimate from the first survey
- LCBxxPct_1
xx% (default 95%) lower confidence bound of total estimate from the first survey
- UCBxxPct_1
xx% (default 95%) upper confidence bound of total estimate from the first survey
- nResp_2
sample size in the second survey
- Estimate_2
total estimate from the second survey
- StdError_2
standard error of total estimate from the second survey
- MarginofError_2
margin of error of total estimate from the second survey
- LCBxxPct_2
xx% (default 95%) lower confidence bound of total estimate from the second survey
- UCBxxPct_2
xx% (default 95%) upper confidence bound of total estimate from the second survey
The contsum_median data frame contains the following variables:
- Survey_1
first survey name
- Survey_2
second survey name
- Type
subpopulation (domain) name
- Subpopulation
subpopulation name within a domain
- Indicator
response variable
- Category
category of response variable
- DiffEst.P
proportion above or below median difference estimate (in %; second survey - first survey)
- StdError.P
standard error of proportion above or below median difference estimate
- MarginofError.P
margin of error of proportion above or below median difference estimate
- LCBxxPct.P
xx% (default 95%) lower confidence bound of proportion above or below median difference estimate
- UCBxxPct.P
xx% (default 95%) upper confidence bound of proportion above or below median difference estimate
- Estimate.U
total above or below median difference estimate (second survey - first survey)
- StdError.U
standard error of total above or below median difference estimate
- MarginofError.U
margin of error of total above or below median difference estimate
- LCBxxPct.U
xx% (default 95%) lower confidence bound of total above or below median difference estimate
- UCBxxPct.U
xx% (default 95%) upper confidence bound of total above or below median difference estimate
- nResp_1
sample size in the first survey
- Estimate.P_1
proportion above or below median estimate (in %) from the first survey
- StdError.P_1
standard error of proportion above or below median estimate from the first survey
- MarginofError.P_1
margin of error of proportion above or below median estimate from the first survey
- LCBxxPct.P_1
xx% (default 95%) lower confidence bound of proportion above or below median estimate from the first survey
- UCBxxPct.P_1
xx% (default 95%) upper confidence bound of proportion above or below median estimate from the first survey
- nResp_2
sample size in the second survey
- Estimate.U_1
total above or below median estimate from the first survey
- StdError.U_1
standard error of total above or below median estimate from the first survey
- MarginofError.U_1
margin of error of total above or below median estimate from the first survey
- LCBxxPct.U_1
xx% (default 95%) lower confidence bound of total above or below median estimate from the first survey
- UCBxxPct.U_1
xx% (default 95%) upper confidence bound of total above or below median estimate from the first survey
- Estimate.P_2
proportion above or below median estimate (in %) from the second survey
- StdError.P_2
standard error of proportion above or below median estimate from the second survey
- MarginofError.P_2
margin of error of proportion above or below median estimate from the second survey
- LCBxxPct.P_2
xx% (default 95%) lower confidence bound of proportion above or below median estimate from the second survey
- UCBxxPct.P_2
xx% (default 95%) upper confidence bound of proportion above or below median estimate from the second survey
- Estimate.U_2
total above or below median estimate from the second survey
- StdError.U_2
standard error of total above or below median estimate from the second survey
- MarginofError.U_2
margin of error of total above or below median estimate from the second survey
- LCBxxPct.U_2
xx% (default 95%) lower confidence bound of total above or below median estimate from the second survey
- UCBxxPct.U_2
xx% (default 95%) upper confidence bound of total above or below median estimate from the second survey
Author(s)
Tom Kincaid Kincaid.Tom@epa.gov
See Also
trend_analysis
for trend analysis
Examples
# Categorical variable example for three resource classes
dframe <- data.frame(
surveyID = rep(c("Survey 1", "Survey 2"), c(100, 100)),
siteID = paste0("Site", 1:200),
wgt = runif(200, 10, 100),
xcoord = runif(200),
ycoord = runif(200),
stratum = rep(rep(c("Stratum 1", "Stratum 2"), c(2, 2)), 50),
CatVar = rep(c("North", "South"), 100),
All_Sites = rep("All Sites", 200),
Resource_Class = sample(c("Good", "Fair", "Poor"), 200, replace = TRUE)
)
myvars <- c("CatVar")
mysubpops <- c("All_Sites", "Resource_Class")
change_analysis(dframe,
vars_cat = myvars, subpops = mysubpops,
surveyID = "surveyID", siteID = "siteID", weight = "wgt",
xcoord = "xcoord", ycoord = "ycoord", stratumID = "stratum"
)
Continuous variable analysis
Description
This function organizes input and output for the analysis of continuous variables. The analysis data, dframe, can be either a data frame or a simple features (sf) object. If an sf object is used, coordinates are extracted from the geometry column in the object, arguments xcoord and ycoord are assigned values "xcoord" and "ycoord", respectively, and the geometry column is dropped from the object.
Usage
cont_analysis(
dframe,
vars,
subpops = NULL,
siteID = NULL,
weight = "weight",
xcoord = NULL,
ycoord = NULL,
stratumID = NULL,
clusterID = NULL,
weight1 = NULL,
xcoord1 = NULL,
ycoord1 = NULL,
sizeweight = FALSE,
sweight = NULL,
sweight1 = NULL,
fpc = NULL,
popsize = NULL,
vartype = "Local",
jointprob = "overton",
conf = 95,
pctval = c(5, 10, 25, 50, 75, 90, 95),
statistics = c("CDF", "Pct", "Mean", "Total"),
All_Sites = FALSE
)
Arguments
dframe |
Data to be analyzed (analysis data). A data frame or sf object containing survey design variables, response variables, and subpopulation (domain) variables. |
vars |
Vector composed of character values that identify the names of response variables in dframe. |
subpops |
Vector composed of character values that identify the names of subpopulation (domain) variables in dframe. |
siteID |
Character value providing name of the site ID variable in the dframe data frame. |
weight |
Character value providing name of the design weight variable in dframe. |
xcoord |
Character value providing name of the x-coordinate variable in the dframe data frame. |
ycoord |
Character value providing name of the y-coordinate variable in the dframe data frame. |
stratumID |
Character value providing name of the stratum ID variable in the dframe data frame. |
clusterID |
Character value providing the name of the cluster (stage one) ID variable in dframe. |
weight1 |
Character value providing name of the stage one weight variable in dframe. |
xcoord1 |
Character value providing the name of the stage one x-coordinate variable in dframe. |
ycoord1 |
Character value providing the name of the stage one y-coordinate variable in dframe. |
sizeweight |
Logical value that indicates whether size weights should be used during estimation, where TRUE means size weights are used and FALSE means they are not. The default is FALSE. |
sweight |
Character value providing the name of the size weight variable in dframe. |
sweight1 |
Character value providing name of the stage one size weight variable in dframe. |
fpc |
Object that specifies values required for calculation of the finite population correction factor used during variance estimation. The object must match the survey design in terms of stratification and whether the design is single-stage or two-stage. For an unstratified design, the object is a vector. The vector is composed of a single numeric value for a single-stage design. For a two-stage unstratified design, the object is a named vector containing one more than the number of clusters in the sample, where the first item in the vector specifies the number of clusters in the population and each subsequent item specifies the number of stage two units for the cluster. The name for the first item in the vector is arbitrary. Subsequent names in the vector identify clusters and must match the cluster IDs. For a stratified design, the object is a named list of vectors, where names must match the strata IDs. For each stratum, the format of the vector is identical to the format described for unstratified single-stage and two-stage designs. Note that the finite population correction factor is not used with the local mean variance estimator. Example fpc for a single-stage unstratified survey design:
Example fpc for a single-stage stratified survey design:
Example fpc for a two-stage unstratified survey design:
Example fpc for a two-stage stratified survey design:
|
popsize |
Object that provides values for the population argument of the calibrate or postStratify functions in the survey package.
Example popsize for calibration:
Example popsize for post-stratification using a data frame:
Example popsize for post-stratification using a table:
Example popsize for post-stratification using an xtabs object:
|
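vartype |
Character value providing the choice of the variance estimator. The default value is "Local", which selects the local mean variance estimator. |
jointprob |
Character value providing the choice of joint inclusion probability approximation for use with Horvitz-Thompson and Yates-Grundy variance estimators. The default value is "overton", which selects the Overton approximation. |
conf |
Numeric value providing the Gaussian-based confidence level. The default value is 95. |
pctval |
Vector of the set of values at which percentiles are estimated. The default set is: c(5, 10, 25, 50, 75, 90, 95). |
statistics |
Character vector specifying desired estimates, where "CDF" requests CDF estimates, "Pct" requests percentile estimates, "Mean" requests mean estimates, and "Total" requests total estimates. The default is c("CDF", "Pct", "Mean", "Total"). |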
All_Sites |
A logical variable used when |
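For post-stratification, popsize can be a data frame, a table, or an xtabs object. The sketch below builds all three for the Resource_Class totals used in this manual's examples (4000 and 1500). It assumes, as in those examples, that a data frame lists the factor variable(s) first and the population totals in the last column, and that a table or xtabs object has dimnames named after the dframe factor variable(s); the survey package's postStratify documentation is the authoritative reference for these rules.

```r
# Hypothetical population totals for post-stratification by Resource_Class.
# The totals match the mypopsize data frame used in this manual's examples.

# 1. Data frame: factor variable first, population totals in the last column.
popsize_df <- data.frame(
  Resource_Class = c("Good", "Poor"),
  Total = c(4000, 1500)
)

# 2. Table: dimnames named after the factor variable in dframe.
popsize_tab <- as.table(array(
  c(4000, 1500),
  dim = 2,
  dimnames = list(Resource_Class = c("Good", "Poor"))
))

# 3. xtabs object built from the data frame above.
popsize_xt <- xtabs(Total ~ Resource_Class, data = popsize_df)

popsize_xt
```

Cross-classified totals (for example Resource_Class by stratum) follow the same pattern, with one extra column in the data frame or one extra term in the xtabs formula.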
Value
The analysis results. A list composed of one, two, three, or four data frames that contain population estimates for all combinations of subpopulations, categories within each subpopulation, and response variables, where the number of data frames is determined by argument statistics. The possible data frames in the output list are:
- CDF
a data frame containing CDF estimates
- Pct
a data frame containing percentile estimates
- Mean
a data frame containing mean estimates
- Total
a data frame containing total estimates
The CDF data frame contains the following variables:
- Type
subpopulation (domain) name
- Subpopulation
subpopulation name within a domain
- Indicator
response variable
- Value
value of response variable
- nResp
sample size at or below Value
- Estimate.P
CDF proportion estimate (in %)
- StdError.P
standard error of CDF proportion estimate
- MarginofError.P
margin of error of CDF proportion estimate
- LCBxxPct.P
xx% (default 95%) lower confidence bound of CDF proportion estimate
- UCBxxPct.P
xx% (default 95%) upper confidence bound of CDF proportion estimate
- Estimate.U
CDF total estimate
- StdError.U
standard error of CDF total estimate
- MarginofError.U
margin of error of CDF total estimate
- LCBxxPct.U
xx% (default 95%) lower confidence bound of CDF total estimate
- UCBxxPct.U
xx% (default 95%) upper confidence bound of CDF total estimate
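As a minimal sketch of what the CDF columns represent, assume the simple design-weighted form: at each response value, Estimate.U is the summed design weight of sites at or below that value and Estimate.P is that weight as a percent of the total weight. The function weighted_cdf and the data are hypothetical; the sketch ignores strata, clusters, popsize adjustments, and variance estimation, so it is not spsurvey's implementation.

```r
# Sketch (not spsurvey's implementation): design-weighted CDF estimates at each
# observed response value, ignoring strata, clusters, and popsize adjustments.
weighted_cdf <- function(y, w) {
  vals <- sort(unique(y))
  data.frame(
    Value = vals,
    # number of sampled sites with a response at or below Value
    nResp = vapply(vals, function(v) sum(y <= v), integer(1)),
    # CDF total estimate: summed design weight of those sites
    Estimate.U = vapply(vals, function(v) sum(w[y <= v]), numeric(1)),
    # CDF proportion estimate (in %): share of the total design weight
    Estimate.P = vapply(vals, function(v) 100 * sum(w[y <= v]) / sum(w), numeric(1))
  )
}

y <- c(9.1, 10.4, 9.8, 11.2, 10.4)  # made-up responses
w <- c(20, 35, 15, 50, 30)          # made-up design weights
weighted_cdf(y, w)
```

Here Estimate.U runs 20, 35, 100, 150 and Estimate.P reaches 100 at the largest value, as a CDF must.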
The Pct data frame contains the following variables:
- Type
subpopulation (domain) name
- Subpopulation
subpopulation name within a domain
- Indicator
response variable
- Statistic
value of percentile
- nResp
sample size at or below Value
- Estimate
percentile estimate
- StdError
standard error of percentile estimate
- MarginofError
margin of error of percentile estimate
- LCBxxPct
xx% (default 95%) lower confidence bound of percentile estimate
- UCBxxPct
xx% (default 95%) upper confidence bound of percentile estimate
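A percentile estimate can be read off a design-weighted CDF by inversion. The sketch below uses the simplest step rule (the smallest response whose weighted CDF reaches the requested percent); weighted_pct and the data are hypothetical, and spsurvey may interpolate between values rather than use this rule.

```r
# Sketch (not spsurvey's implementation): percentile estimates obtained by
# inverting a design-weighted CDF with a simple step rule (no interpolation).
weighted_pct <- function(y, w, pct) {
  o <- order(y)
  y_sorted <- y[o]
  cdf <- 100 * cumsum(w[o]) / sum(w)  # weighted CDF (in %) at each sorted value
  # smallest response whose weighted CDF reaches each requested percentile
  vapply(pct, function(p) y_sorted[which(cdf >= p)[1]], numeric(1))
}

y <- c(9.1, 10.4, 9.8, 11.2, 10.4)  # made-up responses
w <- c(20, 35, 15, 50, 30)          # made-up design weights
weighted_pct(y, w, pct = c(10, 25, 50, 95))
```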
The Mean data frame contains the following variables:
- Type
subpopulation (domain) name
- Subpopulation
subpopulation name within a domain
- Indicator
response variable
- nResp
sample size
- Estimate
mean estimate
- StdError
standard error of mean estimate
- MarginofError
margin of error of mean estimate
- LCBxxPct
xx% (default 95%) lower confidence bound of mean estimate
- UCBxxPct
xx% (default 95%) upper confidence bound of mean estimate
The Total data frame contains the following variables:
- Type
subpopulation (domain) name
- Subpopulation
subpopulation name within a domain
- Indicator
response variable
- nResp
sample size
- Estimate
total estimate
- StdError
standard error of total estimate
- MarginofError
margin of error of total estimate
- LCBxxPct
xx% (default 95%) lower confidence bound of total estimate
- UCBxxPct
xx% (default 95%) upper confidence bound of total estimate
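For a single-stage design, the point estimates behind the Mean and Total data frames can be sketched with design weights alone: the total is the weighted sum of the responses, and the mean is that total divided by the summed weight. This is a minimal illustration on made-up data, not spsurvey's implementation; it omits stratification, popsize adjustments, and the variance estimators selected by vartype.

```r
# Sketch (not spsurvey's implementation): single-stage point estimates of a
# total and a mean from design weights, without variance estimation.
y <- c(9.1, 10.4, 9.8, 11.2, 10.4)  # made-up responses
w <- c(20, 35, 15, 50, 30)          # made-up design weights

total_est <- sum(w * y)          # weighted (Horvitz-Thompson style) total
mean_est <- total_est / sum(w)   # weighted mean: total divided by summed weight

# base R's weighted.mean() gives the same point estimate
c(total = total_est, mean = mean_est, check = weighted.mean(y, w))
```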
Author(s)
Tom Kincaid Kincaid.Tom@epa.gov
See Also
cat_analysis
for categorical variable analysis
Examples
dframe <- data.frame(
siteID = paste0("Site", 1:100),
wgt = runif(100, 10, 100),
xcoord = runif(100),
ycoord = runif(100),
stratum = rep(c("Stratum1", "Stratum2"), 50),
ContVar = rnorm(100, 10, 1),
All_Sites = rep("All Sites", 100),
Resource_Class = rep(c("Good", "Poor"), c(55, 45))
)
myvars <- c("ContVar")
mysubpops <- c("All_Sites", "Resource_Class")
mypopsize <- data.frame(
Resource_Class = c("Good", "Poor"),
Total = c(4000, 1500)
)
cont_analysis(dframe,
vars = myvars, subpops = mysubpops, siteID = "siteID",
weight = "wgt", xcoord = "xcoord", ycoord = "ycoord",
stratumID = "stratum", popsize = mypopsize, statistics = "Mean"
)
Create a PDF file containing cumulative distribution function (CDF) plots
Description
This function creates a PDF file containing CDF plots. Input data for the plots is provided by a data frame with the same structure as the "CDF" output from cont_analysis. Plots are produced for every combination of Type of population, Subpopulation within Type, and Indicator (every combination of subpopulations, subpopulation levels, and variables).
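The number of plots follows from the distinct Type, Subpopulation, and Indicator combinations in the CDF data frame, which base R can count directly. The data frame below is made up, and the page count assumes plots are simply laid out cdf_page per page; cont_cdfplot's actual layout logic is not reproduced.

```r
# Hypothetical CDF output: two subpopulation types, one indicator
cdfest <- data.frame(
  Type = c(rep("All_Sites", 2), rep("Resource_Class", 4)),
  Subpopulation = c("All Sites", "All Sites", "Good", "Good", "Poor", "Poor"),
  Indicator = "ContVar",
  Value = c(9, 11, 9, 11, 9, 11),
  Estimate.P = c(40, 100, 45, 100, 35, 100)
)

# One CDF plot per Type / Subpopulation / Indicator combination
combos <- unique(cdfest[, c("Type", "Subpopulation", "Indicator")])
n_plots <- nrow(combos)

# Assumed page count when plots are laid out cdf_page per page
cdf_page <- 4
n_pages <- ceiling(n_plots / cdf_page)
combos
```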
Usage
cont_cdfplot(
pdffile = "cdf2x2.pdf",
cdfest,
units_cdf = "Percent",
ind_type = rep("Continuous", nind),
log = rep("", nind),
xlab = NULL,
ylab = NULL,
ylab_r = NULL,
legloc = NULL,
cdf_page = 4,
width = 10,
height = 8,
confcut = 0,
cex.main = 1.2,
cex.legend = 1,
...
)
Arguments
pdffile |
Name of the PDF file. The default is "cdf2x2.pdf". |
cdfest |
Data frame with the same structure as the "CDF"
output from cont_analysis. |
units_cdf |
Indicator for the label utilized for the left side y-axis and the values used for the left side y-axis tick marks, where "Percent" means the label and values are in terms of percent of the population, and "Units" means the label and values are in terms of units (count, length, or area) of the population. The default is "Percent". |
ind_type |
Character vector consisting of the values "Continuous" or "Ordinal" that controls the type of CDF plot for each indicator. The default is "Continuous" for every indicator. |
log |
Character vector consisting of the values "" or "x" that controls whether the x axis uses the original scale ("") or the base 10 logarithmic scale ("x") for each indicator. The default is "" for every indicator. |
xlab |
Character vector consisting of the x-axis label for each indicator. If this argument equals NULL, then indicator names are used as the labels. The default is NULL. |
ylab |
Character string providing the left side y-axis label. If argument units_cdf equals "Units", a value should be provided for this argument. Otherwise, the label will be "Percent". The default is NULL, in which case the label "Percent" is used. |
ylab_r |
Character string providing the label for the right side y-axis (and, hence, determining the values used for the right side y-axis tick marks), where NULL means a right side y-axis is not created. If this argument equals "Same", the right side y-axis will have the same label and tick mark values as the left side y-axis. If this argument equals a character string other than "Same", the right side y-axis label will be the value provided for argument ylab_r, and the right side y-axis tick mark values will be determined by the choice not utilized for argument units_cdf, which means that the default value of argument units_cdf (i.e., "Percent") will result in the right side y-axis tick mark values being expressed in terms of units of the population (i.e., count, length, or area). The default is NULL. |
legloc |
Indicator for location of the plot legend, where "BR" means bottom right, "BL" means bottom left, "TR" means top right, "TL" means top left, and NULL means no legend. The default is NULL. |
cdf_page |
Number of CDF plots on each page, which must be chosen from the values: 1, 2, 4, or 6. The default is 4. |
width |
Width of the graphic region in inches. The default is 10. |
height |
Height of the graphic region in inches. The default is 8. |
confcut |
Numeric value that controls plotting confidence limits at the CDF extremes. Confidence limits for CDF values (percent scale) less than confcut or greater than 100 minus confcut are not plotted. A value of zero means confidence limits are plotted for the complete range of the CDF. The default is 0. |
cex.main |
Expansion factor for the plot title. The default is 1.2. |
cex.legend |
Expansion factor for the legend title. The default is 1. |
... |
Additional arguments passed to the |
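The confcut rule amounts to a filter on the CDF proportion estimates: limits are drawn only where Estimate.P lies between confcut and 100 minus confcut. The sketch below applies that rule to made-up CDF rows with confcut = 5; it only marks which rows would get confidence limits and does not draw anything.

```r
# Hypothetical CDF rows (column names follow the cont_analysis CDF output)
cdfest <- data.frame(
  Value = c(9.1, 9.8, 10.4, 11.2),
  Estimate.P = c(2, 23.3, 66.7, 100),
  LCB95Pct.P = c(0, 10.1, 50.2, 100),
  UCB95Pct.P = c(6.5, 36.5, 83.2, 100)
)
confcut <- 5

# Confidence limits are drawn only where the CDF (percent scale) lies
# between confcut and 100 - confcut
show_limits <- cdfest$Estimate.P >= confcut & cdfest$Estimate.P <= 100 - confcut
cdfest[show_limits, ]
```

The first row (CDF of 2%) and the last row (100%) fall outside the [5, 95] band, so only the middle two rows keep their limits.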
Value
A PDF file containing the CDF plots.
Author(s)
Tom Kincaid Kincaid.Tom@epa.gov
See Also
cdf_plot
for plotting a cumulative distribution function (CDF)
cont_cdftest
for CDF hypothesis testing
Examples
## Not run:
dframe <- data.frame(
siteID = paste0("Site", 1:100),
wgt = runif(100, 10, 100),
xcoord = runif(100),
ycoord = runif(100),
stratum = rep(c("Stratum1", "Stratum2"), 50),
ContVar = rnorm(100, 10, 1),
All_Sites = rep("All Sites", 100),
Resource_Class = rep(c("Good", "Poor"), c(55, 45))
)
myvars <- c("ContVar")
mysubpops <- c("All_Sites", "Resource_Class")
mypopsize <- data.frame(
Resource_Class = c("Good", "Poor"),
Total = c(4000, 1500)
)
myanalysis <- cont_analysis(dframe,
vars = myvars, subpops = mysubpops,
siteID = "siteID", weight = "wgt", xcoord = "xcoord", ycoord = "ycoord",
stratumID = "stratum", popsize = mypopsize
)
cont_cdfplot("myanalysis.pdf", myanalysis$CDF, ylab_r = "Stream Length (km)")
## End(Not run)
Cumulative distribution function (CDF) inference for a probability survey
Description
This function organizes input and output for conducting inference regarding cumulative distribution functions (CDFs) generated by a probability survey. For every response variable and every subpopulation (domain) variable, differences between CDFs are tested for every pair of subpopulations within the domain. Data input to the function can be either a single survey or multiple surveys (two or more). If the data contain multiple surveys, then the domain variables will reference those surveys and (potentially) subpopulations within those surveys. The inferential procedures divide the CDFs into a discrete set of intervals (classes) and then utilize procedures that have been developed for analysis of categorical data from probability surveys. Choices for inference are the Wald, adjusted Wald, Rao-Scott first order corrected (mean eigenvalue corrected), and Rao-Scott second order corrected (Satterthwaite corrected) test statistics. The default test statistic is the adjusted Wald statistic. The input data argument can be either a data frame or a simple features (sf) object. If an sf object is used, coordinates are extracted from the geometry column in the object, arguments xcoord and ycoord are assigned values "xcoord" and "ycoord", respectively, and the geometry column is dropped from the object.
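The binning step described above can be sketched in base R as follows. This illustrates the idea under assumptions and is not spsurvey's code: class boundaries are placed here at equal-probability points of the pooled design-weighted CDF, the data are simulated, and the Wald and Rao-Scott statistics themselves are not computed. The sketch only produces the weighted class proportions per subpopulation that such categorical tests compare.

```r
set.seed(1)
# Made-up data: responses, design weights, and two subpopulations
y <- c(rnorm(30, 10, 1), rnorm(30, 10.5, 1))
w <- runif(60, 10, 100)
group <- rep(c("Agr", "Urban"), each = 30)
nclass <- 3

# Assumed rule: interior class boundaries at equal-probability points of the
# pooled design-weighted CDF (1/3 and 2/3 when nclass = 3)
o <- order(y)
pooled_cdf <- cumsum(w[o]) / sum(w)
probs <- seq_len(nclass - 1) / nclass
cuts <- vapply(probs, function(p) y[o][which(pooled_cdf >= p)[1]], numeric(1))
breaks <- c(-Inf, cuts, Inf)

# Assign each site to a class, then compute weighted class proportions per group
cls <- cut(y, breaks = breaks, labels = paste0("Class", seq_len(nclass)))
props <- prop.table(xtabs(w ~ group + cls), margin = 1)
round(props, 3)
```

Each row of props sums to 1; a CDF test then asks whether the class proportions differ between the two subpopulations more than the survey design's sampling variability would explain.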
Usage
cont_cdftest(
dframe,
vars,
subpops = NULL,
surveyID = NULL,
siteID = "siteID",
weight = "weight",
xcoord = NULL,
ycoord = NULL,
stratumID = NULL,
clusterID = NULL,
weight1 = NULL,
xcoord1 = NULL,
ycoord1 = NULL,
sizeweight = FALSE,
sweight = NULL,
sweight1 = NULL,
fpc = NULL,
popsize = NULL,
vartype = "Local",
jointprob = "overton",
testname = "adjWald",
nclass = 3
)
Arguments
dframe |
Data frame containing survey design variables, response variables, and subpopulation (domain) variables. |
vars |
Vector composed of character values that identify the names of response variables in the dframe data frame. |
subpops |
Vector composed of character values that identify the names of subpopulation (domain) variables in the dframe data frame. |
surveyID |
Character value providing name of the survey ID variable in the dframe data frame. |
siteID |
Character value providing name of the site ID variable in the dframe data frame. |
weight |
Character value providing name of the survey design weight variable in the dframe data frame. |
xcoord |
Character value providing name of the x-coordinate variable in the dframe data frame. |
ycoord |
Character value providing name of the y-coordinate variable in the dframe data frame. |
stratumID |
Character value providing name of the stratum ID variable in the dframe data frame. |
clusterID |
Character value providing the name of the cluster (stage one) ID variable in the dframe data frame. |
weight1 |
Character value providing name of the stage one weight variable in the dframe data frame. |
xcoord1 |
Character value providing the name of the stage one x-coordinate variable in the dframe data frame. |
ycoord1 |
Character value providing the name of the stage one y-coordinate variable in the dframe data frame. |
sizeweight |
Logical value that indicates whether size weights should be used during estimation, where TRUE means size weights are used and FALSE means they are not. The default is FALSE. |
sweight |
Character value providing the name of the size weight variable in the dframe data frame. |
sweight1 |
Character value providing name of the stage one size weight variable in the dframe data frame. |
fpc |
Object that specifies values required for calculation of the finite population correction factor used during variance estimation. The object must match the survey design in terms of stratification and whether the design is single-stage or two-stage. For an unstratified design, the object is a vector. The vector is composed of a single numeric value for a single-stage design. For a two-stage unstratified design, the object is a named vector containing one more than the number of clusters in the sample, where the first item in the vector specifies the number of clusters in the population and each subsequent item specifies the number of stage two units for the cluster. The name for the first item in the vector is arbitrary. Subsequent names in the vector identify clusters and must match the cluster IDs. For a stratified design, the object is a named list of vectors, where names must match the strata IDs. For each stratum, the format of the vector is identical to the format described for unstratified single-stage and two-stage designs. Note that the finite population correction factor is not used with the local mean variance estimator. Example fpc for a single-stage unstratified survey design:
Example fpc for a single-stage stratified survey design:
Example fpc for a two-stage unstratified survey design:
Example fpc for a two-stage stratified survey design:
|
popsize |
Object that provides values for the population argument of the calibrate or postStratify functions in the survey package.
Example popsize for calibration:
Example popsize for post-stratification using a data frame:
Example popsize for post-stratification using a table:
Example popsize for post-stratification using an xtabs object:
|
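vartype |
Character value providing the choice of the variance estimator. The default value is "Local", which selects the local mean variance estimator. |
jointprob |
Character value providing the choice of joint inclusion probability approximation for use with Horvitz-Thompson and Yates-Grundy variance estimators. The default value is "overton", which selects the Overton approximation. |
testname |
Name of the test statistic to be reported in the output data frame. Choices for the name are: "Wald" (Wald), "adjWald" (adjusted Wald), "RaoScott_First" (Rao-Scott first order corrected), and "RaoScott_Second" (Rao-Scott second order corrected). The default is "adjWald". |
nclass |
Number of classes into which the CDFs will be divided (binned), which must equal at least 2. The default is 3. |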
Value
Data frame of CDF test results for all pairs of subpopulations within each population type for every response variable. The data frame includes the test statistic specified by argument testname plus its degrees of freedom and p-value.
Author(s)
Tom Kincaid Kincaid.Tom@epa.gov
See Also
cdf_plot
for visualizing CDF plots
cont_cdfplot
for making CDF plots output to pdfs
Examples
library(spsurvey)
set.seed(5) # make the simulated data reproducible
n <- 200
mysiteID <- paste("Site", 1:n, sep = "")
dframe <- data.frame(
siteID = mysiteID,
wgt = runif(n, 10, 100),
xcoord = runif(n),
ycoord = runif(n),
stratum = rep(c("Stratum1", "Stratum2"), n / 2),
Resource_Class = sample(c("Agr", "Forest", "Urban"), n, replace = TRUE)
)
ContVar <- numeric(n)
tst <- dframe$Resource_Class == "Agr"
ContVar[tst] <- rnorm(sum(tst), 10, 1)
tst <- dframe$Resource_Class == "Forest"
ContVar[tst] <- rnorm(sum(tst), 10.1, 1)
tst <- dframe$Resource_Class == "Urban"
ContVar[tst] <- rnorm(sum(tst), 10.5, 1)
dframe$ContVar <- ContVar
myvars <- c("ContVar")
mysubpops <- c("Resource_Class")
mypopsize <- data.frame(
Resource_Class = rep(c("Agr", "Forest", "Urban"), rep(2, 3)),
stratum = rep(c("Stratum1", "Stratum2"), 3),
Total = c(2500, 1500, 1000, 500, 600, 450)
)
cont_cdftest(dframe,
vars = myvars, subpops = mysubpops, siteID = "siteID",
weight = "wgt", xcoord = "xcoord", ycoord = "ycoord",
stratumID = "stratum", popsize = mypopsize, testname = "RaoScott_First"
)
Create a covariance matrix for a panel design
Description
Covariance structure accounts for the panel design and the four variance components: unit variation, period variation, unit by period interaction variation and index (or residual) variation. It also includes a provision for unit correlation and period autocorrelation.
Usage
cov_panel_dsgn(
paneldsgn = matrix(50, 1, 10),
nrepeats = 1,
unit_var = NULL,
period_var = NULL,
unitperiod_var = NULL,
index_var = NULL,
unit_rho = 1,
period_rho = 0
)
Arguments
paneldsgn |
A matrix (dimensions: number of panels (rows) by number of periods (columns)) containing the number of units visited for each combination of panel and period. The default is matrix(50, 1, 10), which is a single panel of 50 units visited 10 times; typically, each visit corresponds to one period. |
nrepeats |
Either |
unit_var |
The variance component estimate for unit. The default is
|
period_var |
The variance component estimate for period. The default is
|
unitperiod_var |
The variance component estimate for unit by period
interaction. The default is |
index_var |
The variance component estimate for index error. The
default is |
unit_rho |
Unit correlation across periods. The default is |
period_rho |
Period autocorrelation. The default is |
Details
Covariance structure accounts for the panel design and the four variance components: unit variation, period variation, unit by period interaction variation and index (or residual) variation. It uses the model structure defined by Urquhart (2012).
If nrepeats is NULL, it is assumed that no unit is sampled more than once in a specific panel and period combination, so the unit by period and index variances are added together; alternatively, the user may have estimated only the unit, period, and unit by period variance components, in which case the index component is zero. The function calculates the covariance matrix for the simple linear regression. The standard error for a linear trend coefficient is the square root of its variance.
Value
A list containing the covariance matrix (cov) for the panel design, the input panel design (paneldsgn), the input nrepeats design (nrepeats.dsgn), and the function call.
Author(s)
Tony Olsen Olsen.Tony@epa.gov
References
Urquhart, N. S., W. S. Overton, et al. (1993). Comparing sampling designs for monitoring ecological status and trends: impact of temporal patterns. In: Statistics for the Environment. V. Barnett and K. F. Turkman (eds.). John Wiley & Sons, New York, pp. 71-86.
Urquhart, N. S. and T. M. Kincaid (1999). Designs for detecting trends from repeated surveys of ecological resources. Journal of Agricultural, Biological, and Environmental Statistics, 4(4), 404-414.
Urquhart, N. S. (2012). The role of monitoring design in detecting trend in long-term ecological monitoring studies. In: Design and Analysis of Long-term Ecological Monitoring Studies. R. A. Gitzen, J. J. Millspaugh, A. B. Cooper, and D. S. Licht (eds.). Cambridge University Press, New York, pp. 151-173.
See Also
power_dsgn
for power calculations of multiple panel designs
Risk difference analysis
Description
This function organizes input and output for risk difference analysis (of categorical variables). The analysis data, dframe, can be either a data frame or a simple features (sf) object. If an sf object is used, coordinates are extracted from the geometry column in the object, arguments xcoord and ycoord are assigned values "xcoord" and "ycoord", respectively, and the geometry column is dropped from the object.
Usage
diffrisk_analysis(
dframe,
vars_response,
vars_stressor,
response_levels = NULL,
stressor_levels = NULL,
subpops = NULL,
siteID = NULL,
weight = "weight",
xcoord = NULL,
ycoord = NULL,
stratumID = NULL,
clusterID = NULL,
weight1 = NULL,
xcoord1 = NULL,
ycoord1 = NULL,
sizeweight = FALSE,
sweight = NULL,
sweight1 = NULL,
fpc = NULL,
popsize = NULL,
vartype = "Local",
conf = 95,
All_Sites = FALSE
)
Arguments
dframe |
Data to be analyzed (analysis data). A data frame or
|
vars_response |
Vector composed of character values that identify the
names of response variables in |
vars_stressor |
Vector composed of character values that identify the
names of stressor variables in |
response_levels |
List providing the category values (levels) for each
element in the |
stressor_levels |
List providing the category values (levels) for each
element in the |
subpops |
Vector composed of character values that identify the
names of subpopulation (domain) variables in |
siteID |
Character value providing the name of the site ID variable in
|
weight |
Character value providing the name of the design weight
variable in |
xcoord |
Character value providing the name of the x-coordinate variable in
|
ycoord |
Character value providing the name of the y-coordinate variable in
|
stratumID |
Character value providing the name of the stratum ID
variable in |
clusterID |
Character value providing the name of the cluster
(stage one) ID variable in |
weight1 |
Character value providing the name of the stage one weight
variable in |
xcoord1 |
Character value providing the name of the stage one
x-coordinate variable in |
ycoord1 |
Character value providing the name of the stage one
y-coordinate variable in |
sizeweight |
Logical value that indicates whether size weights should be
used during estimation, where |
sweight |
Character value providing the name of the size weight variable
in |
sweight1 |
Character value providing the name of the stage one size
weight variable in |
fpc |
Object that specifies values required for calculation of the finite population correction factor used during variance estimation. The object must match the survey design in terms of stratification and whether the design is single-stage or two-stage. For an unstratified design, the object is a vector. The vector is composed of a single numeric value for a single-stage design. For a two-stage unstratified design, the object is a named vector containing one more than the number of clusters in the sample, where the first item in the vector specifies the number of clusters in the population and each subsequent item specifies the number of stage two units for the cluster. The name for the first item in the vector is arbitrary. Subsequent names in the vector identify clusters and must match the cluster IDs. For a stratified design, the object is a named list of vectors, where names must match the strata IDs. For each stratum, the format of the vector is identical to the format described for unstratified single-stage and two-stage designs. Note that the finite population correction factor is not used with the local mean variance estimator. Example fpc for a single-stage unstratified survey design:
Example fpc for a single-stage stratified survey design:
Example fpc for a two-stage unstratified survey design:
Example fpc for a two-stage stratified survey design:
|
popsize |
Object that provides values for the population argument of the
Example popsize for calibration:
Example popsize for post-stratification using a data frame:
Example popsize for post-stratification using a table:
Example popsize for post-stratification using an xtabs object:
|
vartype |
Character value providing the choice of the variance
estimator, where |
conf |
Numeric value providing the Gaussian-based confidence level. The default value
is |
All_Sites |
A logical variable used when |
Value
The analysis results. A data frame of population estimates for all combinations of subpopulations, categories within each subpopulation, response variables, and stressor variables. Estimates are provided for the risk difference plus standard error, margin of error, and confidence interval estimates, together with supporting counts and weighted proportions. The data frame contains the following variables:
- Type
subpopulation (domain) name
- Subpopulation
subpopulation name within a domain
- Response
response variable
- Stressor
stressor variable
- nResp
sample size
- Estimate
risk difference estimate
- Estimate_StressPoor
risk estimate for poor condition stressor
- Estimate_StressGood
risk estimate for good condition stressor
- StdError
risk difference standard error
- MarginofError
risk difference margin of error
- LCBxxPct
xx% (default 95%) lower confidence bound
- UCBxxPct
xx% (default 95%) upper confidence bound
- WeightTotal
sum of design weights
- Count_RespPoor_StressPoor
number of observations in the poor response and poor stressor group
- Count_RespPoor_StressGood
number of observations in the poor response and good stressor group
- Count_RespGood_StressPoor
number of observations in the good response and poor stressor group
- Count_RespGood_StressGood
number of observations in the good response and good stressor group
- Prop_RespPoor_StressPoor
weighted proportion of observations in the poor response and poor stressor group
- Prop_RespPoor_StressGood
weighted proportion of observations in the poor response and good stressor group
- Prop_RespGood_StressPoor
weighted proportion of observations in the good response and poor stressor group
- Prop_RespGood_StressGood
weighted proportion of observations in the good response and good stressor group
Details
Risk difference measures the absolute strength of association between conditional probabilities defined for a response variable and a stressor variable, where the response and stressor variables are classified as either good (i.e., reference condition) or poor (i.e., different from reference condition). Risk difference is defined as the difference between two conditional probabilities: the probability that the response variable is in poor condition given that the stressor variable is in poor condition and the probability that the response variable is in poor condition given that the stressor variable is in good condition. Risk difference values close to zero indicate that the stressor variable has little or no impact on the probability that the response variable is in poor condition. Risk difference values much greater than zero indicate that the stressor variable has a substantial impact on the probability that the response variable is in poor condition.
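The definition above can be sketched as a weighted point estimate, with each conditional probability computed as a ratio of design-weight totals. This is an illustration only and not the diffrisk_analysis() code. It assumes the categories are literally "Poor" and "Good" and returns no standard error or confidence bounds. The helper name risk_difference() and the toy data are hypothetical. The element names follow the Estimate_StressPoor, Estimate_StressGood, and Estimate output columns, on the assumption that those columns hold these quantities.

```r
# Hypothetical helper: weighted risk difference
#   P(response Poor | stressor Poor) - P(response Poor | stressor Good)
risk_difference <- function(response, stressor, wgt) {
  # Weighted share of "Poor" responses among sites with the given stressor level
  poor_given <- function(level) {
    in_group <- stressor == level
    sum(wgt[in_group & response == "Poor"]) / sum(wgt[in_group])
  }
  risk_poor <- poor_given("Poor")
  risk_good <- poor_given("Good")
  c(
    Estimate_StressPoor = risk_poor,
    Estimate_StressGood = risk_good,
    Estimate = risk_poor - risk_good
  )
}

# Toy data: six sites with design weights
dat <- data.frame(
  resp   = c("Poor", "Poor", "Good", "Poor", "Good", "Good"),
  stress = c("Poor", "Poor", "Poor", "Good", "Good", "Good"),
  wgt    = c(10, 20, 10, 5, 15, 20)
)
rd <- risk_difference(dat$resp, dat$stress, dat$wgt)
rd # StressPoor risk 0.75, StressGood risk 0.125, difference 0.625
```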
Author(s)
Tom Kincaid Kincaid.Tom@epa.gov
See Also
attrisk_analysis
for attributable risk analysis
relrisk_analysis
for relative risk analysis
Examples
library(spsurvey)
dframe <- data.frame(
siteID = paste0("Site", 1:100),
wgt = runif(100, 10, 100),
xcoord = runif(100),
ycoord = runif(100),
stratum = rep(c("Stratum1", "Stratum2"), 50),
RespVar1 = sample(c("Poor", "Good"), 100, replace = TRUE),
RespVar2 = sample(c("Poor", "Good"), 100, replace = TRUE),
StressVar = sample(c("Poor", "Good"), 100, replace = TRUE),
All_Sites = rep("All Sites", 100),
Resource_Class = rep(c("Agr", "Forest"), c(55, 45))
)
myresponse <- c("RespVar1", "RespVar2")
mystressor <- c("StressVar")
mysubpops <- c("All_Sites", "Resource_Class")
diffrisk_analysis(dframe,
vars_response = myresponse,
vars_stressor = mystressor, subpops = mysubpops, siteID = "siteID",
weight = "wgt", xcoord = "xcoord", ycoord = "ycoord",
stratumID = "stratum"
)
Print errors from analysis functions
Description
This function prints the vector of error messages generated by the analysis functions.
Usage
errorprnt(error_vec = get("error_vec", envir = .GlobalEnv))
Arguments
error_vec |
Data frame that contains error messages. The default is
|
Value
Printed errors.
Author(s)
Tom Kincaid Kincaid.Tom@epa.gov
Select a generalized random tessellation stratified (GRTS) sample
Description
Select a spatially balanced sample from a point (finite), linear / linestring (infinite), or areal / polygon (infinite) sampling frame using the Generalized Random Tessellation Stratified (GRTS) algorithm. The GRTS algorithm accommodates unstratified and stratified sampling designs and allows for equal inclusion probabilities, unequal inclusion probabilities according to a categorical variable, and inclusion probabilities proportional to a positive auxiliary variable. Several additional sampling options are included, such as including legacy (historical) sites, requiring a minimum distance between sites, and selecting replacement sites. For technical details, see Stevens and Olsen (2004).
Usage
grts(
sframe,
n_base,
stratum_var = NULL,
seltype = NULL,
caty_var = NULL,
caty_n = NULL,
aux_var = NULL,
legacy_var = NULL,
legacy_sites = NULL,
legacy_stratum_var = NULL,
legacy_caty_var = NULL,
legacy_aux_var = NULL,
mindis = NULL,
maxtry = 10,
n_over = NULL,
n_near = NULL,
wgt_units = NULL,
pt_density = NULL,
DesignID = "Site",
SiteBegin = 1,
sep = "-",
projcrs_check = TRUE
)
Arguments
sframe |
A sampling frame as an |
n_base |
The base sample size required. If the sampling design is unstratified,
this is a single numeric value. If the sampling design is stratified, this is a named
vector or list whose names represent each stratum and whose values represent each
stratum's sample size. These names must match the values of the stratification
variable represented by |
stratum_var |
A character string containing the name of the column from
|
seltype |
A character string or vector indicating the inclusion probability type,
which must be one of the following: |
caty_var |
A character string containing the name of the column from
|
caty_n |
A character vector indicating the expected sample size for each
level of |
aux_var |
A character string containing the name of the column from
|
legacy_var |
This argument can be used instead of |
legacy_sites |
An sf object with a |
legacy_stratum_var |
A character string containing the name of the column from
|
legacy_caty_var |
A character string containing the name of the column from
|
legacy_aux_var |
A character string containing the name of the column from
|
mindis |
A numeric value indicating the desired minimum distance between sampled
sites. If the sampling design is stratified and |
maxtry |
The maximum number of attempts to apply the minimum distance algorithm to obtain
the desired minimum distance between sites. Each iteration takes roughly as long as the
standard GRTS algorithm. Successive iterations will always contain at least as many
sites satisfying the minimum distance requirement as the previous iteration. The algorithm stops
when the minimum distance requirement is met or there are |
n_over |
The number of reverse hierarchically ordered (rho) replacement sites.
If the sampling design is unstratified, then
|
n_near |
The number of nearest neighbor (nn) replacement sites.
If the sampling design is unstratified, |
wgt_units |
The units used to compute the design weights. These
units must be standard units as defined by the |
pt_density |
A positive integer controlling the density of the GRTS approximation
for infinite sampling frames. The GRTS approximation for infinite sampling
frames vastly improves computational efficiency by generating many finite points and
selecting a sample from the points. |
DesignID |
A character string indicating the naming structure for each
site's identifier selected in the sample, which is matched with |
SiteBegin |
A character string indicating the first number to use to match
with |
sep |
A character string that acts as a separator between
|
projcrs_check |
A check for whether the coordinates are projected. If |
Details
n_base is the number of sites used to calculate the design weights, which is typically the number of sites used in an analysis. When a panel sampling design is implemented, n_base is typically the number of sites in all panels that will be sampled in the same temporal period; n_base is not the total number of sites in all panels. The sum of n_base and n_over is equal to the total number of sites to be visited for all panels plus any replacement sites that may be required.
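The following sketch shows why n_base, and not n_base + n_over, drives the design weights. It computes equal-probability weights for an unstratified finite (point) frame, assuming ip = n_base / N and wgt = 1 / ip, which matches the statement that ip is the reciprocal of wgt. The helper name equal_prob_weights() and the frame size of 2000 are hypothetical. grts() computes its weights internally and also handles unequal, proportional, stratified, and infinite-frame designs, none of which this sketch covers.

```r
# Hypothetical helper: equal-probability design weights for an unstratified
# finite frame of N sites and a base sample of n_base sites.
equal_prob_weights <- function(N, n_base) {
  stopifnot(n_base > 0, n_base <= N)
  ip <- n_base / N # inclusion probability shared by every frame site
  wgt <- 1 / ip    # design weight is the reciprocal of ip
  list(ip = ip, wgt = wgt)
}

w <- equal_prob_weights(N = 2000, n_base = 100)
w$wgt       # 20: each base site represents 20 frame sites
w$wgt * 100 # 2000: the base-site weights sum to the frame size
```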
Value
The sampling design sites and additional information about the sampling design. More specifically, it is a list with five elements:
- sites_legacy
An sf object containing legacy sites. This is NULL if legacy sites were not included in the sample.
- sites_base
An sf object containing the base sites. This is NULL if n_base equals the number of legacy sites.
- sites_over
An sf object containing the reverse hierarchically ordered replacement sites. This is NULL if no reverse hierarchically ordered replacement sites were included in the sample.
- sites_near
An sf object containing the nearest neighbor replacement sites. This is NULL if no nearest neighbor replacement sites were included in the sample.
- design
A list documenting the specifications of this sampling design. This can be checked to verify that your sampling design ran as intended. It contains the following elements:
  - call
  The original function call.
  - stratum_var
  The name of the stratification variable in sframe. This equals NULL if no stratification is used.
  - stratum
  The unique strata. This equals "None" if the sampling design is unstratified.
  - n_base
  The base sample size per stratum.
  - seltype
  The selection type per stratum.
  - caty_var
  The name of the unequal probability variable in sframe. This equals NULL if no unequal probability variable is used.
  - caty_n
  The expected sample sizes for each level of the unequal probability grouping variable per stratum. This equals NULL when seltype is not "unequal".
  - aux_var
  The name of the proportional probability (auxiliary) variable in sframe. This equals NULL if no proportional probability variable is used.
  - legacy
  A logical variable indicating whether legacy sites were included in the sample.
  - legacy_stratum_var
  The name of the stratification variable in legacy_sites. Omitted if legacy sites are not used. This equals NULL if legacy sites were used but no stratification variable is used.
  - legacy_caty_var
  The name of the unequal probability variable in legacy_sites. Omitted if legacy sites are not used. This equals NULL if legacy sites were used but no unequal probability variable is used.
  - legacy_aux_var
  The name of the proportional probability (auxiliary) variable in legacy_sites. Omitted if legacy sites are not used. This equals NULL if legacy sites were used but no proportional probability variable is used.
  - mindis
  The minimum distance requirement desired. This is NULL when no minimum distance requirement was applied.
  - n_over
  The reverse hierarchically ordered replacement site sample sizes per stratum. If seltype is "unequal", this represents the expected sample sizes. This is NULL when no reverse hierarchically ordered replacement sites were selected.
  - n_near
  The number of nearest neighbor replacement sites desired. This is NULL when no nearest neighbor replacement sites were selected.
When non-NULL, the sites_legacy, sites_base, sites_over, and sites_near objects contain the original columns in sframe and include a few additional columns. These additional columns are:
- siteID
A site identifier (as named using the DesignID and SiteBegin arguments to grts()).
- siteuse
Whether the site is a legacy site (Legacy), base site (Base), reverse hierarchically ordered replacement site (Over), or nearest neighbor replacement site (Near).
- replsite
The replacement site ordering. replsite is None if the site is not a replacement site, Next if it is the next reverse hierarchically ordered replacement site to use, or Near_, where the word following _ indicates the ordering of sites closest to the originally sampled site.
- lon_WGS84
Longitude coordinates using the WGS84 coordinate system (EPSG:4326). Only given if coordinates are projected.
- lat_WGS84
Latitude coordinates using the WGS84 coordinate system (EPSG:4326). Only given if coordinates are projected.
- X
Longitude coordinates using the provided coordinate system. Only given if coordinates are not projected (i.e., they are geographic or NA).
- Y
Latitude coordinates using the provided coordinate system. Only given if coordinates are not projected (i.e., they are geographic or NA).
- stratum
A stratum indicator. stratum is None if the sampling design was unstratified. If the sampling design was stratified, stratum indicates the stratum.
- wgt
The design weight.
- ip
The site's original inclusion probability (the reciprocal of wgt).
- caty
An unequal probability grouping indicator. caty is None if the sampling design did not use unequal inclusion probabilities. If the sampling design did use unequal inclusion probabilities, caty indicates the unequal probability level.
- aux
The auxiliary proportional probability variable. This column is only returned if seltype was proportional in the original sampling design.
If any columns in sframe contain these names, those columns from sframe will be automatically prefixed with sframe_ in the returned sites objects. When output is printed, a summary of site counts by the levels in stratum_var and caty_var is shown.
Author(s)
Tony Olsen olsen.tony@epa.gov
References
Stevens Jr., Don L. and Olsen, Anthony R. (2004). Spatially balanced sampling of natural resources. Journal of the American Statistical Association, 99(465), 262-278.
See Also
irs
to select a sample that is not spatially balanced
Examples
## Not run:
library(spsurvey)
samp <- grts(NE_Lakes, n_base = 100)
print(samp)
strata_n <- c(low = 25, high = 30)
samp_strat <- grts(NE_Lakes, n_base = strata_n, stratum_var = "ELEV_CAT")
print(samp_strat)
samp_over <- grts(NE_Lakes, n_base = 30, n_over = 5)
print(samp_over)
## End(Not run)
Select an independent random sample (IRS)
Description
Select a sample that is not spatially balanced from a point (finite), linear / linestring (infinite), or areal / polygon (infinite) sampling frame using the Independent Random Sampling (IRS) algorithm. The IRS algorithm accommodates unstratified and stratified sampling designs and allows for equal inclusion probabilities, unequal inclusion probabilities according to a categorical variable, and inclusion probabilities proportional to a positive auxiliary variable. Several additional sampling options are included, such as including legacy (historical) sites, requiring a minimum distance between sites, and selecting replacement sites.
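For contrast with GRTS, the simplest case (unstratified, equal probability, finite point frame) can be illustrated with a simple random sample drawn without replacement, a draw that ignores the coordinates entirely and so produces no spatial balance. This is a conceptual sketch and not necessarily how irs() draws its sample. The frame, its size, and the column names are hypothetical.

```r
# Hypothetical finite (point) frame of N sites with made-up coordinates
set.seed(1)
N <- 500
frame <- data.frame(
  id = sprintf("Pt-%03d", seq_len(N)),
  x = runif(N),
  y = runif(N)
)

# Equal-probability independent random sample: every point is equally
# likely, no point is drawn twice, and x/y play no role in the selection.
n_base <- 25
idx <- sample.int(N, n_base)
samp <- frame[idx, ]
samp$ip <- n_base / N     # equal inclusion probability for every site
samp$wgt <- 1 / samp$ip   # design weight is the reciprocal of ip
```

Because the draw never looks at x and y, sampled sites can cluster together by chance, which is the gap GRTS is designed to close.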
Usage
irs(
sframe,
n_base,
stratum_var = NULL,
seltype = NULL,
caty_var = NULL,
caty_n = NULL,
aux_var = NULL,
legacy_var = NULL,
legacy_sites = NULL,
legacy_stratum_var = NULL,
legacy_caty_var = NULL,
legacy_aux_var = NULL,
mindis = NULL,
maxtry = 10,
n_over = NULL,
n_near = NULL,
wgt_units = NULL,
pt_density = NULL,
DesignID = "Site",
SiteBegin = 1,
sep = "-",
projcrs_check = TRUE
)
Arguments
sframe |
A sampling frame as an |
n_base |
The base sample size required. If the sampling design is unstratified,
this is a single numeric value. If the sampling design is stratified, this is a named
vector or list whose names represent each stratum and whose values represent each
stratum's sample size. These names must match the values of the stratification
variable represented by |
stratum_var |
A character string containing the name of the column from
|
seltype |
A character string or vector indicating the inclusion probability type,
which must be one of the following: |
caty_var |
A character string containing the name of the column from
|
caty_n |
A character vector indicating the expected sample size for each
level of |
aux_var |
A character string containing the name of the column from
|
legacy_var |
This argument can be used instead of |
legacy_sites |
An sf object with a |
legacy_stratum_var |
A character string containing the name of the column from
|
legacy_caty_var |
A character string containing the name of the column from
|
legacy_aux_var |
A character string containing the name of the column from
|
mindis |
A numeric value indicating the desired minimum distance between sampled
sites. If the sampling design is stratified and |
maxtry |
The maximum number of attempts to apply the minimum distance algorithm to obtain
the desired minimum distance between sites. Each iteration takes roughly as long as the
standard GRTS algorithm. Successive iterations will always contain at least as many
sites satisfying the minimum distance requirement as the previous iteration. The algorithm stops
when the minimum distance requirement is met or there are |
n_over |
The number of reverse hierarchically ordered (rho) replacement sites.
If the sampling design is unstratified, then
|
n_near |
The number of nearest neighbor (nn) replacement sites.
If the sampling design is unstratified, |
wgt_units |
The units used to compute the design weights. These
units must be standard units as defined by the |
pt_density |
A positive integer controlling the density of the GRTS approximation
for infinite sampling frames. The GRTS approximation for infinite sampling
frames vastly improves computational efficiency by generating many finite points and
selecting a sample from the points. |
DesignID |
A character string indicating the naming structure for each
site's identifier selected in the sample, which is matched with |
SiteBegin |
A character string indicating the first number to use to match
with |
sep |
A character string that acts as a separator between
|
projcrs_check |
A check for whether the coordinates are projected. If |
Details
n_base is the number of sites used to calculate the design weights, which is typically the number of sites used in an analysis. When a panel sampling design is implemented, n_base is typically the number of sites in all panels that will be sampled in the same temporal period; n_base is not the total number of sites in all panels. The sum of n_base and n_over is equal to the total number of sites to be visited for all panels plus any replacement sites that may be required.
Value
The sampling design sites and additional information about the sampling design. More specifically, it is a list with five elements:
- sites_legacy
An sf object containing legacy sites. This is NULL if legacy sites were not included in the sample.
- sites_base
An sf object containing the base sites. This is NULL if n_base equals the number of legacy sites.
- sites_over
An sf object containing the reverse hierarchically ordered replacement sites. This is NULL if no reverse hierarchically ordered replacement sites were included in the sample.
- sites_near
An sf object containing the nearest neighbor replacement sites. This is NULL if no nearest neighbor replacement sites were included in the sample.
- design
A list documenting the specifications of this sampling design. This can be checked to verify that your sampling design ran as intended. It contains the following elements:
  - call
  The original function call.
  - stratum_var
  The name of the stratification variable in sframe. This equals NULL if no stratification is used.
  - stratum
  The unique strata. This equals "None" if the sampling design is unstratified.
  - n_base
  The base sample size per stratum.
  - seltype
  The selection type per stratum.
  - caty_var
  The name of the unequal probability variable in sframe. This equals NULL if no unequal probability variable is used.
  - caty_n
  The expected sample sizes for each level of the unequal probability grouping variable per stratum. This equals NULL when seltype is not "unequal".
  - aux_var
  The name of the proportional probability (auxiliary) variable in sframe. This equals NULL if no proportional probability variable is used.
  - legacy
  A logical variable indicating whether legacy sites were included in the sample.
  - legacy_stratum_var
  The name of the stratification variable in legacy_sites. Omitted if legacy sites are not used. This equals NULL if legacy sites were used but no stratification variable is used.
  - legacy_caty_var
  The name of the unequal probability variable in legacy_sites. Omitted if legacy sites are not used. This equals NULL if legacy sites were used but no unequal probability variable is used.
  - legacy_aux_var
  The name of the proportional probability (auxiliary) variable in legacy_sites. Omitted if legacy sites are not used. This equals NULL if legacy sites were used but no proportional probability variable is used.
  - mindis
  The minimum distance requirement desired. This is NULL when no minimum distance requirement was applied.
  - n_over
  The reverse hierarchically ordered replacement site sample sizes per stratum. If seltype is "unequal", this represents the expected sample sizes. This is NULL when no reverse hierarchically ordered replacement sites were selected.
  - n_near
  The number of nearest neighbor replacement sites desired. This is NULL when no nearest neighbor replacement sites were selected.
When non-NULL, the sites_legacy, sites_base, sites_over, and sites_near objects contain the original columns in sframe and include a few additional columns. These additional columns are:
- siteID
A site identifier (as named using the DesignID and SiteBegin arguments to irs()).
- siteuse
Whether the site is a legacy site (Legacy), base site (Base), reverse hierarchically ordered replacement site (Over), or nearest neighbor replacement site (Near).
- replsite
The replacement site ordering. replsite is None if the site is not a replacement site, Next if it is the next reverse hierarchically ordered replacement site to use, or Near_, where the word following _ indicates the ordering of sites closest to the originally sampled site.
- lon_WGS84
Longitude coordinates using the WGS84 coordinate system (EPSG:4326). Only given if coordinates are projected.
- lat_WGS84
Latitude coordinates using the WGS84 coordinate system (EPSG:4326). Only given if coordinates are projected.
- X
Longitude coordinates using the provided coordinate system. Only given if coordinates are not projected (i.e., they are geographic or NA).
- Y
Latitude coordinates using the provided coordinate system. Only given if coordinates are not projected (i.e., they are geographic or NA).
- stratum
A stratum indicator. stratum is None if the sampling design was unstratified. If the sampling design was stratified, stratum indicates the stratum.
- wgt
The design weight.
- ip
The site's original inclusion probability (the reciprocal of wgt).
- caty
An unequal probability grouping indicator. caty is None if the sampling design did not use unequal inclusion probabilities. If the sampling design did use unequal inclusion probabilities, caty indicates the unequal probability level.
- aux
The auxiliary proportional probability variable. This column is only returned if seltype was proportional in the original sampling design.
If any columns in sframe contain these names, those columns from sframe will be automatically prefixed with sframe_ in the returned sites objects. When output is printed, a summary of site counts by the levels in stratum_var and caty_var is shown.
Author(s)
Tony Olsen olsen.tony@epa.gov
See Also
grts
to select a sample that is spatially balanced
Examples
## Not run:
library(spsurvey)
samp <- irs(NE_Lakes, n_base = 100)
print(samp)
strata_n <- c(low = 25, high = 30)
samp_strat <- irs(NE_Lakes, n_base = strata_n, stratum_var = "ELEV_CAT")
print(samp_strat)
samp_over <- irs(NE_Lakes, n_base = 30, n_over = 5)
print(samp_over)
## End(Not run)
Internal Function: Variance-Covariance Matrix Based on Local Mean Estimator
Description
This function calculates the variance-covariance matrix using the local mean estimator.
Usage
localmean_cov(zmat, weight_1st)
Arguments
zmat |
Matrix of weighted response values or weighted residual values for the sample points. |
weight_1st |
List from the local mean weight function containing two
elements: a matrix named ij composed of the index values of neighboring points and a vector named gwt composed of weights. |
Value
The local mean estimator of the variance-covariance matrix.
Author(s)
Tom Kincaid Kincaid.Tom@epa.gov
Internal Function: Local Mean Variance Estimator
Description
This function calculates the local mean variance estimator.
Usage
localmean_var(z, weight_1st)
Arguments
z |
Vector of weighted response values or weighted residual values for the sample points. |
weight_1st |
List from the local mean weight function containing two
elements: a matrix named ij composed of the index values of neighboring points and a vector named gwt composed of weights. |
Value
The local mean estimator of the variance.
Author(s)
Tom Kincaid Kincaid.Tom@epa.gov
Internal Function: Local Mean Variance Neighbors and Weights
Description
This function calculates the index values of neighboring points and associated weights required by the local mean variance estimator.
Usage
localmean_weight(x, y, prb, nbh = 4)
Arguments
x |
Vector of x-coordinates for location of the sample points. |
y |
Vector of y-coordinates for location of the sample points. |
prb |
Vector of inclusion probabilities for the sample points. |
nbh |
Number of neighboring points to use in the calculations. |
Value
If ginv fails to return valid output, a NULL object. Otherwise, a
list containing two elements: a matrix named ij
composed of the
index values of neighboring points and a vector named gwt
composed of weights.
Author(s)
Tom Kincaid Kincaid.Tom@epa.gov
Summary characteristics of a panel revisit design
Description
Panel revisit design characteristics are summarized: number of panels, number of time periods, total number of sample events for the revisit design, total number of sample events for each panel, total number of sample events for each time period and cumulative number of unique units sampled by time periods.
Usage
pd_summary(object, visitdsgn = NULL, ...)
Arguments
object |
Two-dimensional array from |
visitdsgn |
Two-dimensional array with same dimensions as |
... |
Additional arguments (S3 consistency) |
Details
The revisit panel design and the visit design (if present) are summarized. Summaries are useful for gauging the effort required to complete the survey design. See Value for the summaries that are produced.
Value
List of seven elements.
n_panel
number of panels in the revisit design
n_period
number of time periods in the revisit design
n_total
total number of sample events across all panels and all time periods, accounting for visitdsgn, that will be sampled in the revisit design
n_periodunit
vector of the number of time periods a unit will be sampled in each panel
n_unitpnl
vector of the number of sample units, accounting for visitdsgn, that will be sampled in each panel
n_unitperiod
vector of the number of sample units, accounting for visitdsgn, that will be sampled during each time period
ncum_unit
vector of the cumulative number of unique units that will be sampled in time periods up to and including the current time period.
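For illustration only, the base R sketch below computes comparable summaries directly from a panels-by-periods matrix of sample sizes. The function summarize_panels and its output names are hypothetical, it ignores visitdsgn, and it assumes each panel samples the same units, with a constant panel size, whenever the panel is visited. It is not the pd_summary implementation.

```r
# Hypothetical helper (not pd_summary itself). dsgn has one row per panel,
# one column per time period, and entries equal to the number of units
# sampled in that panel and period.
summarize_panels <- function(dsgn) {
  visited <- dsgn > 0
  # first period in which each panel is sampled (NA if never sampled)
  first_visit <- apply(visited, 1, function(v) if (any(v)) which(v)[1] else NA_integer_)
  # assumed constant panel size: the largest number sampled in any one period
  panel_size <- apply(dsgn, 1, max)
  # number of new (previously unsampled) units entering in each period
  new_units <- tapply(panel_size,
                      factor(first_visit, levels = seq_len(ncol(dsgn))), sum)
  new_units[is.na(new_units)] <- 0
  list(
    n_panels = nrow(dsgn),
    n_periods = ncol(dsgn),
    events_total = sum(dsgn),                  # all sample events
    periods_per_panel = rowSums(visited),      # periods each panel is visited
    events_per_panel = rowSums(dsgn),          # sample events in each panel
    events_per_period = colSums(dsgn),         # sample events in each period
    cum_unique_units = cumsum(as.vector(new_units))
  )
}

# Two panels over four periods: panel 1 (10 units) visited in periods 1 and 3,
# panel 2 (5 units) visited in periods 2 and 4.
dsgn <- matrix(c(10, 0, 10, 0,
                 0, 5, 0, 5), nrow = 2, byrow = TRUE)
s <- summarize_panels(dsgn)
s$cum_unique_units   # 10 15 15 15
```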
Author(s)
Tony Olsen Olsen.Tony@epa.gov
Examples
# Serially alternating panel revisit design summary
sa_dsgn <- revisit_dsgn(20, panels = list(SA60N = list(
n = 60, pnl_dsgn = c(1, 4),
pnl_n = NA, start_option = "None"
)), begin = 1)
pd_summary(sa_dsgn)
# Add visit design where first panel is sampled twice at every time period
sa_visit <- sa_dsgn
sa_visit[sa_visit > 0] <- 1
sa_visit[1, sa_visit[1, ] > 0] <- 2
pd_summary(sa_dsgn, sa_visit)
Plot sampling frames, design sites, and analysis data.
Description
This function plots sampling frames, design sites, and analysis data.
If the left-hand side of the formula is empty, plots are of the distributions of the right-hand side variables. If the left-hand side of the formula contains a variable, plots are of the left-hand side variable for each level of each right-hand side variable.
This function is largely built on plot.sf(), and all spsurvey plotting methods can supply additional arguments to plot.sf(). For more information on plotting in sf, run ?sf::plot.sf. Equivalent to sp_plot(); both are currently maintained for backwards compatibility.
Usage
## S3 method for class 'sp_frame'
plot(
x,
formula = ~1,
xcoord,
ycoord,
crs,
var_args = NULL,
varlevel_args = NULL,
geom = FALSE,
onlyshow = NULL,
fix_bbox = TRUE,
...
)
## S3 method for class 'sp_design'
plot(
x,
sframe = NULL,
formula = ~siteuse,
siteuse = NULL,
var_args = NULL,
varlevel_args = NULL,
geom = FALSE,
onlyshow = NULL,
fix_bbox = TRUE,
...
)
Arguments
x |
An object to plot. When plotting sampling frames an |
formula |
A formula. One-sided formulas are used to summarize the
distribution of numeric or categorical variables. For one-sided formulas,
variable names are placed to the right of |
xcoord |
Name of the x-coordinate (east-west) in |
ycoord |
Name of the y-coordinate (north-south) in |
crs |
Projection code for |
var_args |
A named list. The name of each list element corresponds to a
right-hand side variable in |
varlevel_args |
A named list. The name of each list element corresponds to a
right-hand side variable in |
geom |
Should separate geometries for each level of the right-hand
side |
onlyshow |
A string indicating the single level of the single right-hand side variable for which a summary is requested. This argument is only used when a single right-hand side variable is provided. |
fix_bbox |
Should the geometry bounding box be fixed across plots?
If a length-four vector with names "xmin", "ymin", "xmax", and "ymax" and values
indicating bounding box edges, the bounding box will be fixed as |
... |
Additional arguments to pass to |
sframe |
The sampling frame (an |
siteuse |
A character vector of site types to include when plotting design sites.
It can only take on values |
Author(s)
Michael Dumelle Dumelle.Michael@epa.gov
Examples
## Not run:
data("NE_Lakes")
NE_Lakes <- sp_frame(NE_Lakes)
plot(NE_Lakes, formula = ~ELEV_CAT)
sample <- grts(NE_Lakes, 30)
plot(sample, NE_Lakes)
## End(Not run)
Plot a cumulative distribution function (CDF)
Description
This function creates a CDF plot. Input data for the plots is provided by a data frame from the "CDF" output given by cont_analysis. Confidence limits for the CDF also are plotted. Equivalent to cdf_plot(); both are currently maintained for backwards compatibility.
Usage
## S3 method for class 'sp_CDF'
plot(
x,
var = NULL,
subpop = NULL,
subpop_level = NULL,
units_cdf = "Percent",
type_cdf = "Continuous",
log = "",
xlab = NULL,
ylab = NULL,
ylab_r = NULL,
main = NULL,
legloc = NULL,
confcut = 0,
conflev = 95,
cex.main = 1.2,
cex.legend = 1,
...
)
Arguments
x |
Data frame from the "CDF" output given by
|
var |
If |
subpop |
If |
subpop_level |
If |
units_cdf |
Indicator for the label utilized for the left side y-axis and the values used for the left side y-axis tick marks, where "Percent" means the label and values are in terms of percent of the population, and "Units" means the label and values are in terms of units (count, length, or area) of the population. The default is "Percent". |
type_cdf |
Character string consisting of the value "Continuous" or "Ordinal" that controls the type of CDF plot. The default is "Continuous". |
log |
Character string consisting of the value "" or "x" that controls whether the x axis uses the original scale ("") or the base 10 logarithmic scale ("x"). The default is "". |
xlab |
Character string providing the x-axis label. If this argument equals NULL, then the indicator name is used as the label. The default is NULL. |
ylab |
Character string providing the left side y-axis label. If argument units_cdf equals "Units", a value should be provided for this argument. Otherwise, the label will be "Percent". The default is "Percent". |
ylab_r |
Character string providing the label for the right side y-axis (and, hence, determining the values used for the right side y-axis tick marks), where NULL means a right side y-axis is not created. If this argument equals "Same", the right side y-axis will have the same label and tick mark values as the left side y-axis. If this argument equals a character string other than "Same", the right side y-axis label will be the value provided for argument ylab_r, and the right side y-axis tick mark values will be determined by the choice not utilized for argument units_cdf, which means that the default value of argument units_cdf (i.e., "Percent") will result in the right side y-axis tick mark values being expressed in terms of units of the population (i.e., count, length, or area). The default is NULL. |
main |
Character string providing the plot title. The default is NULL. |
legloc |
Indicator for location of the plot legend, where "BR" means bottom right, "BL" means bottom left, "TR" means top right, "TL" means top left, and NULL means no legend. The default is NULL. |
confcut |
Numeric value that controls plotting confidence limits at the CDF extremes. Confidence limits for CDF values (percent scale) less than confcut or greater than 100 minus confcut are not plotted. A value of zero means confidence limits are plotted for the complete range of the CDF. The default is 0. |
conflev |
Numeric value of the confidence level used for confidence limits. The default is 95. |
cex.main |
Expansion factor for the plot title. The default is 1.2. |
cex.legend |
Expansion factor for the legend title. The default is 1. |
... |
Additional arguments passed to the |
Value
A plot of a variable's CDF estimates and associated confidence limits.
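The confcut rule described under Arguments can be sketched in base R as follows, together with the two-sided Gaussian multiplier that corresponds to a conflev value. Both helper names are hypothetical, and the multiplier assumes limits of the form estimate plus or minus multiplier times standard error; this sketch does not show how plot.sp_CDF itself obtains its limits.

```r
# Hypothetical helper: TRUE where a confidence limit would be drawn.
# CDF values (percent scale) below confcut or above 100 - confcut are skipped.
conf_limits_drawn <- function(cdf_pct, confcut = 0) {
  cdf_pct >= confcut & cdf_pct <= 100 - confcut
}

# Hypothetical helper: two-sided Gaussian multiplier for a confidence level
# given in percent (e.g., conflev = 95 gives about 1.96).
gauss_multiplier <- function(conflev = 95) {
  qnorm(1 - (1 - conflev / 100) / 2)
}

cdf_pct <- c(2, 5, 50, 95, 98)
conf_limits_drawn(cdf_pct, confcut = 5)   # FALSE TRUE TRUE TRUE FALSE
gauss_multiplier(95)
```

With the default confcut = 0, every confidence limit is kept, matching the statement that limits are then plotted for the complete range of the CDF.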
Author(s)
Tom Kincaid Kincaid.Tom@epa.gov
See Also
cont_cdfplot
for creating a PDF file containing CDF plots
cont_cdftest
for CDF hypothesis testing
Examples
## Not run:
dframe <- data.frame(
siteID = paste0("Site", 1:100),
wgt = runif(100, 10, 100),
xcoord = runif(100),
ycoord = runif(100),
stratum = rep(c("Stratum1", "Stratum2"), 50),
ContVar = rnorm(100, 10, 1),
All_Sites = rep("All Sites", 100),
Resource_Class = rep(c("Good", "Poor"), c(55, 45))
)
myvars <- c("ContVar")
mysubpops <- c("All_Sites", "Resource_Class")
mypopsize <- data.frame(
Resource_Class = c("Good", "Poor"),
Total = c(4000, 1500)
)
myanalysis <- cont_analysis(dframe,
vars = myvars, subpops = mysubpops,
siteID = "siteID", weight = "wgt", xcoord = "xcoord", ycoord = "ycoord",
stratumID = "stratum", popsize = mypopsize
)
keep <- with(myanalysis$CDF, Type == "Resource_Class" &
Subpopulation == "Good")
par(mfrow = c(2, 1))
plot(myanalysis$CDF[keep, ],
xlab = "ContVar",
ylab = "Percent of Stream Length", ylab_r = "Stream Length (km)",
main = "Estimates for Resource Class: Good"
)
plot(myanalysis$CDF[keep, ],
xlab = "ContVar",
ylab = "Percent of Stream Length", ylab_r = "Same",
main = "Estimates for Resource Class: Good"
)
## End(Not run)
Power calculation for multiple panel designs
Description
Calculates the power for trend detection for one or more variables, for one or more panel designs, for one or more linear trends, and for one or more significance levels. The panel designs create a covariance model where the model includes variance components for units, periods, the interaction of units and periods, and the residual (or index) variance.
Usage
power_dsgn(
ind_names,
ind_values,
unit_var,
period_var,
unitperiod_var,
index_var,
unit_rho = 1,
period_rho = 0,
paneldsgn,
nrepeats = NULL,
trend_type = "mean",
ind_pct = NULL,
ind_tail = NULL,
trend = 2,
alpha = 0.05
)
Arguments
ind_names |
Vector of indicator names |
ind_values |
Vector of indicator mean values |
unit_var |
Vector of variance component estimates for unit variability for the indicators |
period_var |
Vector of variance component estimates for period variability for the indicators |
unitperiod_var |
Vector of variance component estimates for unit by period interaction variability for the indicators |
index_var |
Vector of variance component estimates for index (residual) error for the indicators |
unit_rho |
Correlation across units. Default is |
period_rho |
Correlation across periods. Default is |
paneldsgn |
A list of panel designs each as a matrix. Each element of
the list is a matrix with |
nrepeats |
Either |
trend_type |
Trend type is either |
ind_pct |
When |
ind_tail |
When trend_type is equal to |
trend |
Single value or vector of assumed percent change from
initial value in the indicator for each period. Assumes the trend is
expressed as percent per period. Note that the trend may be either positive
or negative. The default is |
alpha |
Single value or vector of significance levels (Type I error rates, alpha)
for the linear trend test. The default is |
Details
Calculates the power for detecting a change in the mean for different panel design structures. The model incorporates unit, period, unit by period, and index variance components as well as correlation across units and across periods. See references for methods.
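As a hedged illustration of how the variance components enter a trend power calculation, the sketch below treats only the simplest case: one always revisit (fixed) panel of n units sampled in every period, with unit_rho = 1 and period_rho = 0. Because the same unit effects then appear in every period mean, they shift all means equally and drop out of the slope, leaving period_var + (unitperiod_var + index_var) / n as the period-to-period variance of a mean. The function name, the ordinary least squares slope on period means, and the two-sided z-test are assumptions of this sketch; power_dsgn handles general panel designs and correlation structures and may use a different statistic.

```r
# Hypothetical sketch (not power_dsgn): power to detect a linear trend for a
# single fixed panel of n units observed in each of n_period periods,
# assuming unit_rho = 1, period_rho = 0, an OLS slope fitted to the period
# means, and a two-sided z-test at level alpha.
fixed_panel_power <- function(ind_value, trend_pct, n, n_period,
                              period_var, unitperiod_var, index_var,
                              alpha = 0.05) {
  periods <- seq_len(n_period)
  sxx <- sum((periods - mean(periods))^2)
  # variance of a period mean that differs between periods; the unit
  # component is common to every period and cancels out of the slope
  var_mean <- period_var + (unitperiod_var + index_var) / n
  se_slope <- sqrt(var_mean / sxx)
  # trend given as percent change of the initial value per period
  slope <- ind_value * trend_pct / 100
  z_crit <- qnorm(1 - alpha / 2)
  pnorm(abs(slope) / se_slope - z_crit) + pnorm(-abs(slope) / se_slope - z_crit)
}

# Variance components borrowed from the example below, applied to a fixed panel
p20 <- fixed_panel_power(ind_value = 43, trend_pct = 1, n = 60, n_period = 20,
                         period_var = 4, unitperiod_var = 40, index_var = 90)
p10 <- fixed_panel_power(ind_value = 43, trend_pct = 1, n = 60, n_period = 10,
                         period_var = 4, unitperiod_var = 40, index_var = 90)
```

Under these assumptions power grows with the number of periods, since the spread of the time points (sxx) grows faster than the variance of each period mean.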
Value
A list with components trend_type, ind_pct, ind_tail, trend values across periods, periods (all periods included in one or more panel designs), significance levels, a five-dimensional array of power calculations (dimensions: panel design names, periods, indicator names, trend names, alpha_names), an array of indicator mean values for each trend, and the function call.
Author(s)
Tony Olsen Olsen.Tony@epa.gov
References
Urquhart, N. S., W. S. Overton, et al. (1993). Comparing sampling designs for monitoring ecological status and trends: impact of temporal patterns. In: Statistics for the Environment. V. Barnett and K. F. Turkman (eds.). John Wiley & Sons, New York, pp. 71-86.
Urquhart, N. S. and T. M. Kincaid (1999). Designs for detecting trends from repeated surveys of ecological resources. Journal of Agricultural, Biological, and Environmental Statistics, 4(4), 404-414.
Urquhart, N. S. (2012). The role of monitoring design in detecting trend in long-term ecological monitoring studies. In: Design and Analysis of Long-term Ecological Monitoring Studies. R. A. Gitzen, J. J. Millspaugh, A. B. Cooper, and D. S. Licht (eds.). Cambridge University Press, New York, pp. 151-173.
See Also
ppd_plot
to plot power curves for panel designs
Examples
# Power for rotating panel with sample size 60
power_dsgn("Variable_Name",
ind_values = 43, unit_var = 280, period_var = 4,
unitperiod_var = 40, index_var = 90, unit_rho = 1, period_rho = 0,
paneldsgn = list(NoR60 = revisit_dsgn(20,
panels = list(NoR60 = list(
n = 60, pnl_dsgn = c(1, NA),
pnl_n = NA, start_option = "None"
)), begin = 1
)),
nrepeats = NULL, trend_type = "mean", trend = 1.0, alpha = 0.05
)
Plot power curves for panel designs
Description
Plot power curves and relative power curves for trend detection for a set of panel designs, time periods, indicators, significance levels and trends. Trend may be based on percent change per period in the mean or percent change in the proportion of the cumulative distribution function above or below a fixed cut point. Types of plots are combinations of standard/relative, mean/percent, period/change and design/indicator. Input must be of class powerpaneldesign and is normally the output of function power_dsgn.
Usage
ppd_plot(
object,
plot_type = "standard",
trend_type = "mean",
xaxis_type = "period",
comp_type = "design",
dsgns = NULL,
indicator = NULL,
trend = NULL,
period = NULL,
alpha = NULL,
...
)
Arguments
object |
List object of class |
plot_type |
Default is |
trend_type |
Character value for trend in mean ( |
xaxis_type |
Character value equal to |
comp_type |
Character value equal to |
dsgns |
Vector of names of panel designs that are to be plotted. Names
must be all, or a subset of, names of designs in |
indicator |
Vector of indicator names contained in |
trend |
|
period |
|
alpha |
A single value or vector of significance levels (as proportion,
e.g. |
... |
Additional arguments (S3 consistency) |
Details
By default the plot function produces a standard power curve with the end of each time period on the x-axis and power on the y-axis. When more than one panel design is in dsgnpower, the first panel design is used. When more than one indicator is in dsgnpower, the first indicator is used. When more than one trend value is in dsgnpower, the maximum trend value is used. When more than one significance level, alpha, is in dsgnpower, the minimum significance level is used.
Control of the type of plot produced is governed by plot_type, trend_type, xaxis_type and comp_type. The number of plots produced is governed by the number of panel designs (dsgns) specified, the number of indicators (indicator) specified, the number of time periods (period) specified, the number of trend values (trend) specified and the number of significance levels (alpha) specified.
When the comparison type (comp_type) is equal to "design", all power curves specified by dsgns are plotted on the same plot. When comp_type is equal to "indicator", all power curves specified by indicator are plotted on the same plot. Typically, no more than 4-5 power curves should be plotted on the same plot.
Value
One or more power curve plots are created and plotted. User must specify output graphical device if more than one plot is created. See Devices for graphical output options.
Author(s)
Tony Olsen Olsen.Tony@epa.gov
Examples
## Not run:
# Construct a rotating panel design with sample size of 60
R60N <- revisit_dsgn(20, panels = list(R60N = list(
n = 60, pnl_dsgn = c(1, NA),
pnl_n = NA, start_option = "None"
)), begin = 1)
# Construct a fixed panel design with sample size of 60
F60 <- revisit_dsgn(20, panels = list(F60 = list(
n = 60, pnl_dsgn = c(1, 0),
pnl_n = NA, start_option = "None"
)), begin = 1)
# Power for rotating panel with sample size 60
Power_tst <- power_dsgn("Variable_Name",
ind_values = 43, unit_var = 280,
period_var = 4, unitperiod_var = 40, index_var = 90,
unit_rho = 1, period_rho = 0, paneldsgn = list(
R60N = R60N, F60 = F60
), nrepeats = NULL,
trend_type = "mean", trend = c(1.0, 2.0), alpha = 0.05
)
ppd_plot(Power_tst)
ppd_plot(Power_tst, dsgns = c("F60", "R60N"))
ppd_plot(Power_tst, dsgns = c("F60", "R60N"), trend = 1.0)
ppd_plot(Power_tst,
plot_type = "relative", comp_type = "design",
trend_type = "mean", trend = c(1, 2), dsgns = c("R60N", "F60"),
indicator = "Variable_Name"
)
## End(Not run)
Relative risk analysis
Description
This function organizes input and output for relative risk analysis (of categorical variables). The analysis data, dframe, can be either a data frame or a simple features (sf) object. If an sf object is used, coordinates are extracted from the geometry column in the object, arguments xcoord and ycoord are assigned values "xcoord" and "ycoord", respectively, and the geometry column is dropped from the object.
Usage
relrisk_analysis(
dframe,
vars_response,
vars_stressor,
response_levels = NULL,
stressor_levels = NULL,
subpops = NULL,
siteID = NULL,
weight = "weight",
xcoord = NULL,
ycoord = NULL,
stratumID = NULL,
clusterID = NULL,
weight1 = NULL,
xcoord1 = NULL,
ycoord1 = NULL,
sizeweight = FALSE,
sweight = NULL,
sweight1 = NULL,
fpc = NULL,
popsize = NULL,
vartype = "Local",
conf = 95,
All_Sites = FALSE
)
Arguments
dframe |
Data to be analyzed (analysis data). A data frame or
|
vars_response |
Vector composed of character values that identify the
names of response variables in |
vars_stressor |
Vector composed of character values that identify the
names of stressor variables in |
response_levels |
List providing the category values (levels) for each
element in the |
stressor_levels |
List providing the category values (levels) for each
element in the |
subpops |
Vector composed of character values that identify the
names of subpopulation (domain) variables in |
siteID |
Character value providing the name of the site ID variable in
|
weight |
Character value providing the name of the design weight
variable in |
xcoord |
Character value providing name of the x-coordinate variable in
|
ycoord |
Character value providing name of the y-coordinate variable in
|
stratumID |
Character value providing the name of the stratum ID
variable in |
clusterID |
Character value providing the name of the cluster
(stage one) ID variable in |
weight1 |
Character value providing the name of the stage one weight
variable in |
xcoord1 |
Character value providing the name of the stage one
x-coordinate variable in |
ycoord1 |
Character value providing the name of the stage one
y-coordinate variable in |
sizeweight |
Logical value that indicates whether size weights should be
used during estimation, where |
sweight |
Character value providing the name of the size weight variable
in |
sweight1 |
Character value providing the name of the stage one size
weight variable in |
fpc |
Object that specifies values required for calculation of the finite population correction factor used during variance estimation. The object must match the survey design in terms of stratification and whether the design is single-stage or two-stage. For an unstratified design, the object is a vector. The vector is composed of a single numeric value for a single-stage design. For a two-stage unstratified design, the object is a named vector containing one more than the number of clusters in the sample, where the first item in the vector specifies the number of clusters in the population and each subsequent item specifies the number of stage two units for the cluster. The name for the first item in the vector is arbitrary. Subsequent names in the vector identify clusters and must match the cluster IDs. For a stratified design, the object is a named list of vectors, where names must match the strata IDs. For each stratum, the format of the vector is identical to the format described for unstratified single-stage and two-stage designs. Note that the finite population correction factor is not used with the local mean variance estimator. Example fpc for a single-stage unstratified survey design:
Example fpc for a single-stage stratified survey design:
Example fpc for a two-stage unstratified survey design:
Example fpc for a two-stage stratified survey design:
|
popsize |
Object that provides values for the population argument of the
Example popsize for calibration:
Example popsize for post-stratification using a data frame:
Example popsize for post-stratification using a table:
Example popsize for post-stratification using an xtabs object:
|
vartype |
Character value providing the choice of the variance
estimator, where |
conf |
Numeric value providing the Gaussian-based confidence level. The default value
is |
All_Sites |
A logical variable used when |
Value
The analysis results. A data frame of population estimates for all combinations of subpopulations, categories within each subpopulation, response variables, and categories within each response variable. Estimates are provided for proportion and size of the population plus standard error, margin of error, and confidence interval estimates. The data frame contains the following variables:
- Type
subpopulation (domain) name
- Subpopulation
subpopulation name within a domain
- Response
response variable
- Stressor
stressor variable
- nResp
sample size
- Estimate
relative risk estimate
- Estimate_num
relative risk numerator estimate
- Estimate_denom
relative risk denominator estimate
- StdError
relative risk standard error
- MarginofError
relative risk margin of error
- LCBxxPct
xx% (default 95%) lower confidence bound
- UCBxxPct
xx% (default 95%) upper confidence bound
- WeightTotal
sum of design weights
- Count_RespPoor_StressPoor
number of observations in the poor response and poor stressor group
- Count_RespPoor_StressGood
number of observations in the poor response and good stressor group
- Count_RespGood_StressPoor
number of observations in the good response and poor stressor group
- Count_RespGood_StressGood
number of observations in the good response and good stressor group
- Prop_RespPoor_StressPoor
weighted proportion of observations in the poor response and poor stressor group
- Prop_RespPoor_StressGood
weighted proportion of observations in the poor response and good stressor group
- Prop_RespGood_StressPoor
weighted proportion of observations in the good response and poor stressor group
- Prop_RespGood_StressGood
weighted proportion of observations in the good response and good stressor group
Details
Relative risk measures the relative strength of association between conditional probabilities defined for a response variable and a stressor variable, where the response and stressor variables are classified as either good (i.e., reference condition) or poor (i.e., different from reference condition). Relative risk is defined as the ratio of two conditional probabilities. The numerator of the ratio is the probability that the response variable is in poor condition given that the stressor variable is in poor condition. The denominator of the ratio is the probability that the response variable is in poor condition given that the stressor variable is in good condition. A relative risk value equal to one indicates that the response variable is independent of the stressor variable. Relative risk values greater than one measure the extent to which poor condition of the stressor variable is associated with poor condition of the response variable.
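The ratio defined above can be illustrated with a point-estimate sketch that uses design weights in place of counts. The function relrisk_point is hypothetical; it computes no standard errors or confidence bounds, assumes the categories are coded "Good" and "Poor", and is not the relrisk_analysis implementation.

```r
# Hypothetical sketch: weighted relative risk point estimate.
relrisk_point <- function(response, stressor, wgt) {
  # sum of design weights for one response/stressor cell
  w <- function(r, s) sum(wgt[response == r & stressor == s])
  # numerator: P(response Poor | stressor Poor)
  num <- w("Poor", "Poor") / (w("Poor", "Poor") + w("Good", "Poor"))
  # denominator: P(response Poor | stressor Good)
  den <- w("Poor", "Good") / (w("Poor", "Good") + w("Good", "Good"))
  c(relrisk = num / den, numerator = num, denominator = den)
}

response <- c("Poor", "Poor", "Good", "Poor", "Good", "Good")
stressor <- c("Poor", "Poor", "Poor", "Good", "Good", "Good")
wgt      <- c(1, 1, 2, 1, 1, 2)
rr <- relrisk_point(response, stressor, wgt)
rr   # relrisk 2, numerator 0.5, denominator 0.25
```

Here poor response has weighted probability 0.5 among poor-stressor sites and 0.25 among good-stressor sites, so poor stressor condition is associated with twice the risk of poor response.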
Author(s)
Tom Kincaid Kincaid.Tom@epa.gov
See Also
attrisk_analysis
for attributable risk analysis
diffrisk_analysis
for risk difference analysis
Examples
dframe <- data.frame(
siteID = paste0("Site", 1:100),
wgt = runif(100, 10, 100),
xcoord = runif(100),
ycoord = runif(100),
stratum = rep(c("Stratum1", "Stratum2"), 50),
RespVar1 = sample(c("Poor", "Good"), 100, replace = TRUE),
RespVar2 = sample(c("Poor", "Good"), 100, replace = TRUE),
StressVar = sample(c("Poor", "Good"), 100, replace = TRUE),
All_Sites = rep("All Sites", 100),
Resource_Class = rep(c("Agr", "Forest"), c(55, 45))
)
myresponse <- c("RespVar1", "RespVar2")
mystressor <- c("StressVar")
mysubpops <- c("All_Sites", "Resource_Class")
relrisk_analysis(dframe,
vars_response = myresponse,
vars_stressor = mystressor, subpops = mysubpops, siteID = "siteID",
weight = "wgt", xcoord = "xcoord", ycoord = "ycoord",
stratumID = "stratum"
)
Create a balanced incomplete block panel revisit design
Description
Create a revisit design for panels in a survey that specifies the time periods for the units of each panel to be sampled based on searching for a D-optimal block design that is a member of the class of generalized Youden designs. The resulting design need not be a balanced incomplete block design. Based on an algorithmic idea by Cook and Nachtsheim (1989) and implemented by Robert Wheeler.
Usage
revisit_bibd(
n_period,
n_pnl,
n_visit,
nsamp,
panel_name = "BIB",
begin = 1,
skip = 1,
iter = 30
)
Arguments
n_period |
Number of time periods for the survey design. Typically, number of periods if sampling occurs once per period or number of months if sampling occurs once per month. (v, number of varieties/treatments in BIBD terms) |
n_pnl |
Number of panels (b, number of blocks in BIBD terms) |
n_visit |
Number of time periods to be visited in a panel (k, block size in BIBD terms) |
nsamp |
Number of samples in each panel. |
panel_name |
Prefix for name of each panel |
begin |
Numeric name of first sampling occasion, e.g. a specific period. |
skip |
Number of sampling occasions to skip between planned sampling
periods, e.g., sampling will occur only every 5 periods if |
iter |
Maximum number of iterations in search for D-optimal Generalized Youden Design. |
Details
The function uses the find.BIB function from the crossdes package to search for a D-optimal block design. crossdes uses package AlgDesign to search for balanced incomplete block designs.
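Why the result need not be a balanced incomplete block design can be checked with the standard necessary conditions for a BIBD with v treatments (n_period), b blocks (n_pnl) and block size k (n_visit): each treatment appears r = bk/v times, each pair of treatments appears together lambda = r(k - 1)/(v - 1) times, both must be whole numbers, and Fisher's inequality b >= v must hold. The helper bibd_check is hypothetical and tests necessary, not sufficient, conditions.

```r
# Hypothetical helper: necessary conditions for a balanced incomplete block
# design with v treatments, b blocks, and block size k.
bibd_check <- function(v, b, k) {
  r <- b * k / v                      # replications of each treatment
  lambda <- r * (k - 1) / (v - 1)     # co-occurrences of each treatment pair
  whole <- function(x) abs(x - round(x)) < 1e-9
  list(r = r, lambda = lambda,
       conditions_met = whole(r) && whole(lambda) && b >= v)
}

# Parameters of the revisit_bibd() example: n_period = 20, n_pnl = 20, n_visit = 3
bibd_check(v = 20, b = 20, k = 3)   # lambda = 6/19, so no BIBD exists
# The classic 7-treatment, 7-block design with blocks of size 3
bibd_check(v = 7, b = 7, k = 3)     # r = 3, lambda = 1
```

For the example parameters lambda is not a whole number, so no true BIBD exists and the D-optimal search necessarily returns a design that is not balanced in that sense.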
Value
A two-dimensional array of sample sizes to be sampled for each panel and each sampling occasion.
Author(s)
Tony Olsen Olsen.Tony@epa.gov
References
Cook R. D. and C. Nachtsheim. (1989). Computer-aided blocking of factorial and response-surface designs. Technometrics 31(3), 339-346.
See Also
revisit_dsgn
to create a panel revisit design
revisit_rand
to create a panel revisit design with random assignment to panels and time periods
pd_summary
to summarize characteristics of a panel revisit design
Examples
# Balanced incomplete block design with 20 sample occasions, 20 panels,
# 3 visits to each unit, and 20 units in each panel.
revisit_bibd(n_period = 20, n_pnl = 20, n_visit = 3, nsamp = 20)
Create a panel revisit design
Description
Create a revisit design for panels in a survey that specifies the time periods that members of each panel will be sampled. Three basic panel design structures may be created: always revisit panel, serially alternating panels, or rotating panels.
Usage
revisit_dsgn(n_period, panels, begin = 1, skip = 1)
Arguments
n_period |
Number of time periods for the panel design. For example, number of periods if sampling occurs once per period or number of months if sampling occurs once per month. |
panels |
List of lists where each list specifies a revisit panel
structure. Each sublist consists of four components: |
begin |
Numeric name of first sampling occasion, e.g. a specific period. |
skip |
Number of time periods to skip between planned sampling
periods, e.g., sampling will occur only every 5 periods if |
Details
The function creates revisit designs using the concepts in McDonald (2003) to specify the revisit pattern across time periods for each panel. The panel revisit schedule is specified by a vector. Odd positions in vector specify the number of consecutive time periods when panel units are sampled. Even positions in vector specify the number of consecutive time periods when panel units are not sampled.
If the last even position is a "0", then a single panel follows an always revisit panel structure. After satisfying the initial revisit schedule specified prior to the "0", units in a panel are always visited for the rest of the time periods. The simplest always revisit panel design is to revisit every sample unit on every time period, specified as pnl_dsgn = c(1, 0) or using McDonald's notation [1-0].
If the last even position is NA, the panels follow a rotating panel structure. For example, pnl_dsgn = c(1, NA) designates that sample units in a panel will be visited once and then never again, [1-n] in McDonald's notation. pnl_dsgn = c(1, 4, 1, NA) designates that sample units in a panel will be visited once, then not sampled on the next four time periods, then sampled again once at the next time period and then never sampled again, [1-4-1-n] in McDonald's notation.
If the last even position is > 0, the panels follow a serially alternating panel structure. For example, pnl_dsgn = c(1, 4) designates that sample units in a panel will be visited once, then not sampled during the next four time periods, then sampled once and not sampled for the next four time periods, with that cycle repeated until the end of the time periods, [1-4] in McDonald's notation. pnl_dsgn = c(2, 3, 1, 4) designates a cycle in which sample units in a panel are visited during two consecutive time periods, not sampled for three consecutive time periods, sampled for one time period and then not sampled on the next four time periods, with the cycle repeated until the end of the time periods, [2-3-1-4] in McDonald's notation.
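The three rules above can be expressed as a short base R function that expands a pnl_dsgn vector into the visit schedule of a single panel. This is a hypothetical helper written for illustration, not code from spsurvey, and it covers only one panel starting at period 1 (no begin, skip, pnl_n, or start up options).

```r
# Hypothetical helper: expand pnl_dsgn (McDonald's notation) into a logical
# visit schedule of length n_period for one panel starting at period 1.
# Odd positions count consecutive sampled periods; even positions count
# consecutive unsampled periods.
panel_schedule <- function(pnl_dsgn, n_period) {
  k <- length(pnl_dsgn)
  on_off <- rep(c(TRUE, FALSE), length.out = k)
  last <- pnl_dsgn[k]
  if (is.na(last)) {
    # rotating: one pass through the pattern, then never sampled again
    sched <- c(rep(on_off[-k], pnl_dsgn[-k]), rep(FALSE, n_period))
  } else if (last == 0) {
    # always revisit: initial pattern, then sampled in every later period
    sched <- c(rep(on_off[-k], pnl_dsgn[-k]), rep(TRUE, n_period))
  } else {
    # serially alternating: the full pattern repeats until the last period
    sched <- rep(rep(on_off, pnl_dsgn), length.out = n_period)
  }
  sched[seq_len(n_period)]
}

which(panel_schedule(c(1, 0), 6))          # 1 2 3 4 5 6
which(panel_schedule(c(1, 4, 1, NA), 20))  # 1 6
which(panel_schedule(c(1, 4), 20))         # 1 6 11 16
which(panel_schedule(c(2, 3, 1, 4), 20))   # 1 2 6 11 12 16
```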
The number of panels in a single panel design is specified by pnl_n. For an always revisit panel structure, a single panel is created and pnl_n is ignored. For a rotating panel structure, when pnl_n = NA, the number of panels is equal to n_period. Note that this should only be used when the rotating panel structure is the only panel design, i.e., no split panel design (see below for split panel details). If pnl_n = m is specified for a rotating panel design, then the number of panels will be m. For example, pnl_dsgn = c(1, 4, 1, NA) and pnl_n = 5 means that only 5 panels will be constructed and the last time period to be sampled will be time period 10. In McDonald's notation the panel design structure is [(1-4-1-n)^5]. If the number of time periods, n_period, is 20 and no other panel design structure is specified, then the last 10 time periods will not be sampled.
For serially alternating panels, when pnl_n = NA
, the number of panels will
be the sum of the elements in pan_dsgn (ignoring NA
). If pnl_n
is specified
as m
, then m
panels will be created. For example, pnl_dsgn = c(1, 4, 1, 4)
and pnl_n = 3
, [(1-4-1-4)^3] in McDonald's notation, will create first three
panels of the 510 serially alternating panels specified by pnl_dsgn
.
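The panel-count rules can be restated as a small base R function. Both n_panels() and last_rotating_period() are hypothetical helpers for illustration only; the second one additionally assumes that panel j begins in time period j, which is consistent with the [(1-4-1-n)^5] example above ending at time period 10.

```r
# Hypothetical helper (not part of spsurvey): number of panels implied by
# pnl_dsgn and pnl_n for a single (non-split) panel design.
n_panels <- function(pnl_dsgn, pnl_n = NA, n_period) {
  last <- pnl_dsgn[length(pnl_dsgn)]
  if (!is.na(last) && last == 0) return(1)  # always revisit: pnl_n ignored
  if (!is.na(pnl_n)) return(pnl_n)          # user-specified m panels
  if (is.na(last)) return(n_period)         # rotating: one panel per period
  sum(pnl_dsgn)                             # serially alternating
}

# Last period sampled by m rotating panels, assuming panel j starts in period j
last_rotating_period <- function(pnl_dsgn, m) {
  m + sum(pnl_dsgn[-length(pnl_dsgn)]) - 1
}

n_panels(c(1, 4, 1, 4), n_period = 20)
n_panels(c(1, 4, 1, 4), pnl_n = 3, n_period = 20)
last_rotating_period(c(1, 4, 1, NA), m = 5)
```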
A serially alternating or rotating panel revisit design may not result in the same number of units being sampled during each time period, particularly during the initial start-up period. The default is to not specify a start-up option ("None"). Start-up option "Partial_Begin" initiates the revisit design at the last time period scheduled for sampling in the first panel. For example, a [2-3-1-4] design starts at time period 6 instead of time period 1 under the Partial_Begin option. For a serially alternating panel structure, start-up option "Partial_End" initiates the revisit design at the time period that begins the second serially alternating pattern. For example, a [2-3-1-4] design starts at time period 11 instead of time period 1. For a rotating panel structure design, use of Partial_End assumes that the number of panels equals the number of time periods and adds units to the last "m" panels for time periods 1 to "m", as if the number of time periods were extended by "m", where "m" is one less than the sum of the panel design. For example, a [1-4-1-4-1-n] design would result in m = 10. Note that some designs with pnl_n not equal to the number of sample occasions can produce unexpected panel designs. See examples.
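A sketch of the start-up arithmetic in base R, with hypothetical helper names. It only reproduces the worked numbers given above (periods 6 and 11 for [2-3-1-4], m = 10 for [1-4-1-4-1-n]) and does not build the panel table itself.

```r
# Hypothetical helper (not part of spsurvey): first time period sampled under
# each start-up option. Partial_End here applies to serially alternating designs.
start_period <- function(pnl_dsgn,
                         start_option = c("None", "Partial_Begin", "Partial_End")) {
  start_option <- match.arg(start_option)
  k <- length(pnl_dsgn)
  switch(start_option,
    None = 1,
    # last period scheduled for sampling in the first cycle of the first panel
    Partial_Begin = sum(pnl_dsgn[-k]),
    # first period of the second serially alternating cycle
    Partial_End = sum(pnl_dsgn) + 1
  )
}

# "m" used by Partial_End for a rotating design: one less than the sum of the
# non-NA panel design entries
partial_end_m <- function(pnl_dsgn) sum(pnl_dsgn, na.rm = TRUE) - 1

start_period(c(2, 3, 1, 4), "Partial_Begin")
start_period(c(2, 3, 1, 4), "Partial_End")
partial_end_m(c(1, 4, 1, 4, 1, NA))
```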
Different types of panel structures can be combined (termed split panels by many authors) by specifying more than one list for the panels argument. The total number of panels is the sum of the number of panels in each of the panel structures specified by the split panel design.
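To illustrate how a split panel adds up, the base R sketch below builds a panel-by-period matrix of sample sizes for [1-0, 1-n] with 60 units per panel over 20 time periods, the same design as the split panel example below. It does not call revisit_dsgn(), and the row names are hypothetical, so the real output may be labeled and ordered differently.

```r
# Illustrative sketch (base R only): split panel [1-0, 1-n], 60 units per panel
n_period <- 20
annual <- matrix(60, nrow = 1, ncol = n_period)  # [1-0]: one panel, every period
rotating <- diag(60, n_period)                   # [1-n]: panel j visited only in period j
split_dsgn <- rbind(annual, rotating)
rownames(split_dsgn) <- c("Annual", paste0("R60N_", seq_len(n_period)))  # hypothetical names
colnames(split_dsgn) <- seq_len(n_period)

nrow(split_dsgn)     # total panels = 1 + 20
colSums(split_dsgn)  # units sampled in each time period
```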
Value
A two-dimensional array of sample sizes to be sampled at each combination of panel and time period.
Author(s)
Tony Olsen Olsen.Tony@epa.gov
References
McDonald, T. (2003). Review of environmental monitoring methods: survey designs. Environmental Monitoring and Assessment 85, 277-292.
See Also
revisit_bibd
to create a balanced incomplete block panel revisit design
revisit_rand
to create a revisit design with random assignment to panels and time periods
pd_summary
to summarize characteristics of a panel revisit design
Examples
# One panel of 60 sample units sampled at every time period: [1-0]
revisit_dsgn(20, panels = list(
Annual = list(
n = 60, pnl_dsgn = c(1, 0), pnl_n = NA,
start_option = "None"
)
), begin = 1)
# Rotating panels of 60 units sampled once and never again: [1-n]. The number
# of panels equals n_period.
revisit_dsgn(20,
panels = list(
R60N = list(n = 60, pnl_dsgn = c(1, NA), pnl_n = NA, start_option = "None")
),
begin = 1
)
# Serially alternating panel with three visits to each sample unit, then a skip
# of the next two time periods: [3-2]
revisit_dsgn(20, panels = list(
SA60PE = list(
n = 20, pnl_dsgn = c(3, 2), pnl_n = NA,
start_option = "Partial_End"
)
), begin = 1)
# Split panel of sample units combining above two panel designs: [1-0, 1-n]
revisit_dsgn(n_period = 20, begin = 2017, panels = list(
Annual = list(
n = 60, pnl_dsgn = c(1, 0), pnl_n = NA,
start_option = "None"
),
R60N = list(n = 60, pnl_dsgn = c(1, NA), pnl_n = NA, start_option = "None")
))
Create a revisit design with random assignment to panels and time periods
Description
Create a revisit design for a survey that specifies the panels and time periods that will be sampled by random selection of panels and time periods. Three options for random assignments are "period", where the number of time periods to be sampled in a panel is fixed; "panel", where the number of panels to be sampled in a time period is fixed; and "none", where the number of panel-period combinations is fixed.
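The three rand_control options can be sketched in base R as shown below. This is an assumption-laden illustration, not the revisit_rand() source: it treats n_visit as the fixed count (periods per panel for "period", panels per period for "panel", total panel-period cells for "none") and fills the selected cells with nsamp, which matches the shape of the examples but has not been checked against the package.

```r
# Hypothetical sketch (not part of spsurvey) of random panel/period assignment
rand_revisit_sketch <- function(n_period, n_pnl, rand_control, n_visit, nsamp) {
  dsgn <- matrix(0, nrow = n_pnl, ncol = n_period)
  if (rand_control == "period") {
    # each panel is sampled in exactly n_visit randomly chosen periods
    for (i in seq_len(n_pnl)) dsgn[i, sample.int(n_period, n_visit)] <- nsamp
  } else if (rand_control == "panel") {
    # each period samples exactly n_visit randomly chosen panels
    for (j in seq_len(n_period)) dsgn[sample.int(n_pnl, n_visit), j] <- nsamp
  } else if (rand_control == "none") {
    # exactly n_visit panel-period cells are chosen at random
    dsgn[sample.int(n_pnl * n_period, n_visit)] <- nsamp
  } else {
    stop("rand_control must be \"period\", \"panel\", or \"none\"")
  }
  dsgn
}

set.seed(1)
rand_revisit_sketch(n_period = 20, n_pnl = 10, rand_control = "period",
                    n_visit = 5, nsamp = 10)
```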
Usage
revisit_rand(
n_period,
n_pnl,
rand_control = "period",
n_visit,
nsamp,
panel_name = "Random",
begin = 1,
skip = 1
)
Arguments
n_period |
Number of time periods for the survey design. Typically, number of periods if sampling occurs once per period or number of months if sampling occurs once per month. (v, number of varieties (or treatments) in BIBD terms) |
n_pnl |
Number of panels |
rand_control |
Character value must be |
n_visit |
If |
nsamp |
Number of samples in each panel. |
panel_name |
Prefix for name of each panel |
begin |
Numeric name of first sampling occasion, e.g. a specific period. |
skip |
Number of sampling occasions to skip between planned sampling
periods, e.g., sampling will occur only every 5 periods if |
Details
The revisit design for a survey is created by random selection of panels and time periods that will have sample events. Unless rand_control = "period", the number of sample occasions that will be visited by a panel is random.
Value
A two-dimensional array of sample sizes to be sampled for each panel and each time period.
Author(s)
Tony Olsen Olsen.Tony@epa.gov
See Also
revisit_bibd
to create a balanced incomplete block panel revisit design
revisit_dsgn
to create a panel revisit design
pd_summary
to summarize characteristics of a panel revisit design
Examples
revisit_rand(
n_period = 20, n_pnl = 10, rand_control = "none", n_visit = 50,
nsamp = 20
)
revisit_rand(
n_period = 20, n_pnl = 10, rand_control = "panel", n_visit = 5,
nsamp = 10
)
revisit_rand(
n_period = 20, n_pnl = 10, rand_control = "period",
n_visit = 5, nsamp = 10
)
Calculate spatial balance metrics
Description
This function measures the spatial balance (with respect to the sampling frame) of design sites using Voronoi polygons (Dirichlet tessellations).
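As a hedged sketch of a Pielou-type balance metric, the base R function below assumes the Voronoi step has already been done and that v holds the total inclusion probability falling in each design site's polygon. It returns one minus Pielou's evenness of those totals, which is zero when every polygon carries the same total, matching the lower bound described below for perfect balance. The function name and this exact formula are assumptions, not a copy of sp_balance() internals.

```r
# Hypothetical sketch: v = total inclusion probability in each site's polygon
pielou_balance <- function(v) {
  p <- v / sum(v)
  p <- p[p > 0]                  # treat 0 * log(0) as 0
  H <- -sum(p * log(p))          # Shannon entropy of the proportions
  1 - H / log(length(v))         # approximately 0 for equal totals
}

pielou_balance(rep(1, 30))                      # perfectly even polygons
pielou_balance(c(rep(0.5, 15), rep(1.5, 15)))   # uneven polygons, larger value
```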
Usage
sp_balance(
object,
sframe,
stratum_var = NULL,
ip = NULL,
metrics = "pielou",
extents = FALSE
)
Arguments
object |
An |
sframe |
The sampling frame as an |
stratum_var |
The name of the stratum variable in |
ip |
Inclusion probabilities associated with each row of |
metrics |
A character vector of spatial balance metrics:
All spatial balance metrics have a lower bound of zero, which indicates perfect spatial balance. As the metric value increases, the spatial balance decreases. |
extents |
Should the extent (total units) within each Voronoi polygon
be returned? Defaults to |
Value
A data frame with columns providing the stratum (stratum), spatial balance metric (metric), and spatial balance (value).
Author(s)
Michael Dumelle Dumelle.Michael@epa.gov
Examples
## Not run:
sample <- grts(NE_Lakes, 30)
sp_balance(sample$sites_base, NE_Lakes)
strata_n <- c(low = 25, high = 30)
sample_strat <- grts(NE_Lakes, n_base = strata_n, stratum_var = "ELEV_CAT")
sp_balance(sample_strat$sites_base, NE_Lakes, stratum_var = "ELEV_CAT", metrics = "rmse")
## End(Not run)
sp_frame objects
Description
Turn sampling frames or analysis data into an sp_frame object, or transform sp_frame objects back into their original object.
Usage
sp_frame(frame)
sp_unframe(sp_frame)
Arguments
frame |
A sampling frame or analysis data |
sp_frame |
An |
Details
The sp_frame() function adds class sp_frame to frame so that it can be used by summary() and plot(). sp_frame objects can sometimes clash with other sf and tidyverse generics, so sp_unframe() removes class sp_frame, leaving the original classes of frame intact.
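The class handling described above can be pictured with two small base R functions. add_sp_frame() and drop_sp_frame() are hypothetical stand-ins for sp_frame() and sp_unframe(), written only to show the S3 idea: putting "sp_frame" first in the class vector lets summary() and plot() dispatch to sp_frame methods, and removing it restores the original classes.

```r
# Hypothetical re-implementation for illustration only; spsurvey's own
# sp_frame() and sp_unframe() may differ internally.
add_sp_frame <- function(frame) {
  class(frame) <- c("sp_frame", setdiff(class(frame), "sp_frame"))
  frame
}
drop_sp_frame <- function(x) {
  class(x) <- setdiff(class(x), "sp_frame")
  x
}

df <- data.frame(ELEV_CAT = c("low", "high"))
class(add_sp_frame(df))
class(drop_sp_frame(add_sp_frame(df)))
```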
Value
An sp_frame object.
Examples
NE_Lakes <- sp_frame(NE_Lakes)
class(NE_Lakes)
NE_Lakes <- sp_unframe(NE_Lakes)
class(NE_Lakes)
Plot sampling frames, design sites, and analysis data.
Description
This function plots sampling frames, design sites, and analysis data. If the left-hand side of the formula is empty, plots are of the distributions of the right-hand side variables. If the left-hand side of the formula contains a variable, plots are of the left-hand side variable for each level of each right-hand side variable. This function is largely built on plot.sf(), and all spsurvey plotting methods can supply additional arguments to plot.sf(). For more information on plotting in sf, run ?sf::plot.sf. Equivalent to spsurvey::plot(); both are currently maintained for backwards compatibility.
Usage
sp_plot(object, ...)
## Default S3 method:
sp_plot(
object,
formula = ~1,
xcoord,
ycoord,
crs,
var_args = NULL,
varlevel_args = NULL,
geom = FALSE,
onlyshow = NULL,
fix_bbox = TRUE,
...
)
## S3 method for class 'sp_design'
sp_plot(
object,
sframe = NULL,
formula = ~siteuse,
siteuse = NULL,
var_args = NULL,
varlevel_args = NULL,
geom = FALSE,
onlyshow = NULL,
fix_bbox = TRUE,
...
)
Arguments
object |
An object to plot. When plotting sampling frames or analysis data,
a data frame or |
... |
Additional arguments to pass to |
formula |
A formula. One-sided formulas are used to summarize the
distribution of numeric or categorical variables. For one-sided formulas,
variable names are placed to the right of |
xcoord |
Name of the x-coordinate (east-west) in |
ycoord |
Name of the y-coordinate (north-south) in |
crs |
Projection code for |
var_args |
A named list. The name of each list element corresponds to a
right-hand side variable in |
varlevel_args |
A named list. The name of each list element corresponds to a
right-hand side variable in |
geom |
Should separate geometries for each level of the right-hand
side |
onlyshow |
A string indicating the single level of the single right-hand side variable for which a summary is requested. This argument is only used when a single right-hand side variable is provided. |
fix_bbox |
Should the geometry bounding box be fixed across plots?
If a length-four vector with names "xmin", "ymin", "xmax", and "ymax" and values
indicating bounding box edges, the bounding box will be fixed as |
sframe |
The sampling frame (an |
siteuse |
A character vector of site types to include when plotting design sites.
It can only take on values |
Author(s)
Michael Dumelle Dumelle.Michael@epa.gov
Examples
## Not run:
data("NE_Lakes")
sp_plot(NE_Lakes, formula = ~ELEV_CAT)
sample <- grts(NE_Lakes, 30)
sp_plot(sample, NE_Lakes)
data("NLA_PNW")
sp_plot(NLA_PNW, formula = ~BMMI)
## End(Not run)
Combine rows from GRTS or IRS samples.
Description
This function row binds the sites_legacy, sites_base, sites_over, and sites_near objects from a GRTS or IRS sample into a single sf object. This function is most useful when a single sf object that contains all design sites is desired (e.g., writing out a single shapefile using sf::write_sf()).
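The row binding can be sketched in base R, with plain data frames standing in for the sf objects returned by grts() or irs(). The combine_sites() helper, its column names, and the site values are hypothetical; sp_rbind() itself works on sf objects and may add or order columns differently.

```r
# Plain data frames stand in for the sf site objects of a design (hypothetical)
design <- list(
  sites_legacy = NULL,
  sites_base = data.frame(siteID = c("Site-1", "Site-2"), siteuse = "Base"),
  sites_over = data.frame(siteID = "Site-3", siteuse = "Over"),
  sites_near = NULL
)

# Hypothetical helper: drop absent site types, row bind, optionally filter
combine_sites <- function(design, siteuse = NULL) {
  site_types <- c("sites_legacy", "sites_base", "sites_over", "sites_near")
  parts <- Filter(Negate(is.null), design[site_types])
  out <- do.call(rbind, unname(parts))
  if (!is.null(siteuse)) out <- out[out$siteuse %in% siteuse, , drop = FALSE]
  out
}

combine_sites(design)
combine_sites(design, siteuse = "Base")
```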
Usage
sp_rbind(object, siteuse = NULL)
Arguments
object |
The design sites (output from |
siteuse |
A character vector of site types to return. Can contain
|
Value
A single sf object containing all requested design sites.
Author(s)
Michael Dumelle Dumelle.Michael@epa.gov
Examples
## Not run:
sample <- grts(NE_Lakes, 50, n_over = 10)
sample <- sp_rbind(sample)
write_sf(sample, "mypath/sample.shp")
## End(Not run)
Summarize sampling frames, design sites, and analysis data.
Description
sp_summary() summarizes sampling frames, design sites, and analysis data. The right-hand side of the formula specifies the variables (or factors) to summarize by. If the left-hand side of the formula is empty, the summary will be of the distributions of the right-hand side variables. If the left-hand side of the formula contains a variable, the summary will be of the left-hand side variable for each level of each right-hand side variable. Equivalent to spsurvey::summary(); both are currently maintained for backwards compatibility.
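The two formula forms correspond roughly to familiar base R summaries. The sketch below uses a small made-up data frame with ELEV and ELEV_CAT columns (the values are hypothetical) to show the idea; sp_summary() produces its own output format.

```r
# Hypothetical data standing in for a sampling frame
frame <- data.frame(
  ELEV = c(120, 340, 80, 560, 410, 95),
  ELEV_CAT = c("low", "high", "low", "high", "high", "low")
)

# ~ ELEV_CAT (empty left-hand side): count distribution of ELEV_CAT
tab <- table(frame$ELEV_CAT)
tab

# ELEV ~ ELEV_CAT: numeric summary of ELEV within each level of ELEV_CAT
elev_by_cat <- lapply(split(frame$ELEV, frame$ELEV_CAT), summary)
elev_by_cat
```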
Usage
sp_summary(object, ...)
## Default S3 method:
sp_summary(object, formula = ~1, onlyshow = NULL, ...)
## S3 method for class 'sp_design'
sp_summary(object, formula = ~siteuse, siteuse = NULL, onlyshow = NULL, ...)
Arguments
object |
An object to summarize. When summarizing sampling frames,
an |
... |
Additional arguments to pass to |
formula |
A formula. One-sided formulas are used to summarize the
distribution of numeric or categorical variables. For one-sided formulas,
variable names are placed to the right of |
onlyshow |
A string indicating the single level of the single right-hand side variable for which a summary is requested. This argument is only used when a single right-hand side variable is provided. |
siteuse |
A character vector indicating the design sites
for which summaries are requested in |
Value
If the left-hand side of the formula is empty, a named list containing summaries of the count distribution for each right-hand side variable is returned. If the left-hand side of the formula contains a variable, a named list containing five-number summaries (numeric left-hand side) or tables (categorical or factor left-hand side) is returned for each right-hand side variable.
Author(s)
Michael Dumelle Dumelle.Michael@epa.gov
Examples
## Not run:
data("NE_Lakes")
sp_summary(NE_Lakes, ELEV ~ 1)
sp_summary(NE_Lakes, ~ ELEV_CAT * AREA_CAT)
sample <- grts(NE_Lakes, 100)
sp_summary(sample, ~ ELEV_CAT * AREA_CAT)
## End(Not run)
Print grts() and irs() errors.
Description
This function prints the error messages generated by the grts() and irs() functions.
Usage
stopprnt(stop_df = get("stop_df", envir = .GlobalEnv), m = 1:nrow(stop_df))
Arguments
stop_df |
Data frame that contains stop messages. The default is
|
m |
Vector of indices for stop messages that are to be printed. The
default is a vector containing the integers from 1 through the number of
rows in |
Value
Printed errors.
Author(s)
Tony Olsen Olsen.Tony@epa.gov
Summarize sampling frames, design sites, and analysis data.
Description
summary() summarizes sampling frames, design sites, and analysis data. The right-hand side of the formula specifies the variables (or factors) to summarize by. If the left-hand side of the formula is empty, the summary will be of the distributions of the right-hand side variables. If the left-hand side of the formula contains a variable, the summary will be of the left-hand side variable for each level of each right-hand side variable. Equivalent to sp_summary(); both are currently maintained for backwards compatibility.
Usage
## S3 method for class 'sp_frame'
summary(object, formula = ~1, onlyshow = NULL, ...)
## S3 method for class 'sp_design'
summary(object, formula = ~siteuse, siteuse = NULL, onlyshow = NULL, ...)
Arguments
object |
An object to summarize. When summarizing sampling frames,
an |
formula |
A formula. One-sided formulas are used to summarize the
distribution of numeric or categorical variables. For one-sided formulas,
variable names are placed to the right of |
onlyshow |
A string indicating the single level of the single right-hand side variable for which a summary is requested. This argument is only used when a single right-hand side variable is provided. |
... |
Additional arguments to pass to |
siteuse |
A character vector indicating the design sites
for which summaries are requested in |
Value
If the left-hand side of the formula is empty, a named list containing summaries of the count distribution for each right-hand side variable is returned. If the left-hand side of the formula contains a variable, a named list containing five-number summaries (numeric left-hand side) or tables (categorical or factor left-hand side) is returned for each right-hand side variable.
Author(s)
Michael Dumelle Dumelle.Michael@epa.gov
Examples
## Not run:
data("NE_Lakes")
summary(NE_Lakes, ELEV ~ 1)
summary(NE_Lakes, ~ ELEV_CAT * AREA_CAT)
sample <- grts(NE_Lakes, 100)
summary(sample, ~ ELEV_CAT * AREA_CAT)
## End(Not run)
Trend analysis
Description
This function organizes input and output for estimation of trend across time for a series of samples (for categorical and continuous variables). Trend is estimated using the analytical procedure identified by the model arguments. For categorical variables, the choices for the model_cat argument are: (1) simple linear regression, (2) weighted linear regression, and (3) generalized linear mixed-effects model. For continuous variables, the choices for the model_cont argument are: (1) simple linear regression, (2) weighted linear regression, and (3) linear mixed-effects model. The analysis data, dframe, can be either a data frame or a simple features (sf) object. If an sf object is used, coordinates are extracted from the geometry column in the object, arguments xcoord and ycoord are assigned values "xcoord" and "ycoord", respectively, and the geometry column is dropped from the object.
Usage
trend_analysis(
dframe,
vars_cat = NULL,
vars_cont = NULL,
subpops = NULL,
model_cat = "SLR",
cat_rhs = NULL,
model_cont = "LMM",
cont_rhs = NULL,
siteID = "siteID",
yearID = "year",
weight = "weight",
xcoord = NULL,
ycoord = NULL,
stratumID = NULL,
clusterID = NULL,
weight1 = NULL,
xcoord1 = NULL,
ycoord1 = NULL,
sizeweight = FALSE,
sweight = NULL,
sweight1 = NULL,
fpc = NULL,
popsize = NULL,
invprboot = TRUE,
nboot = 1000,
vartype = "Local",
jointprob = "overton",
conf = 95,
All_Sites = FALSE
)
Arguments
dframe |
Data to be analyzed (analysis data). A data frame or
|
vars_cat |
Vector composed of character values that identify the names
of categorical response variables in |
vars_cont |
Vector composed of character values that identify the
names of continuous response variables in |
subpops |
Vector composed of character values that identify the
names of subpopulation (domain) variables in |
model_cat |
Character value identifying the analytical procedure used
for trend estimation for categorical variables. The choices are:
|
cat_rhs |
Character value specifying the right hand side of the formula
for a generalized linear mixed-effects model. If a value is not provided,
the argument is assigned a value that specifies the Piepho and Ogutu (2002)
model. The default value is |
model_cont |
Character value identifying the analytical procedure used
for trend estimation for continuous variables. The choices are:
|
cont_rhs |
Character value specifying the right hand side of the
formula for a linear mixed-effects model. If a value is not provided, the
argument is assigned a value that specifies the Piepho and Ogutu (2002)
model. The default value is |
siteID |
Character value providing name of the site ID variable in
|
yearID |
Character value providing name of the time period variable in
|
weight |
Character value providing name of the design weight
variable in |
xcoord |
Character value providing name of the x-coordinate variable in
|
ycoord |
Character value providing name of the y-coordinate variable in
|
stratumID |
Character value providing name of the stratum ID variable in
|
clusterID |
Character value providing name of the cluster (stage one) ID
variable in |
weight1 |
Character value providing name of the stage one weight
variable in |
xcoord1 |
Character value providing name of the stage one x-coordinate
variable in |
ycoord1 |
Character value providing name of the stage one y-coordinate
variable in |
sizeweight |
Logical value that indicates whether size weights should be
used during estimation, where |
sweight |
Character value providing name of the size weight variable in
|
sweight1 |
Character value providing name of the stage one size weight
variable in |
fpc |
Object that specifies values required for calculation of the finite population correction factor used during variance estimation. The object must match the survey design in terms of stratification and whether the design is single-stage or two-stage. For an unstratified design, the object is a vector. The vector is composed of a single numeric value for a single-stage design. For a two-stage unstratified design, the object is a named vector containing one more than the number of clusters in the sample, where the first item in the vector specifies the number of clusters in the population and each subsequent item specifies the number of stage two units for the cluster. The name for the first item in the vector is arbitrary. Subsequent names in the vector identify clusters and must match the cluster IDs. For a stratified design, the object is a named list of vectors, where names must match the strata IDs. For each stratum, the format of the vector is identical to the format described for unstratified single-stage and two-stage designs. Note that the finite population correction factor is not used with the local mean variance estimator. Example fpc for a single-stage unstratified survey design:
Example fpc for a single-stage stratified survey design:
Example fpc for a two-stage unstratified survey design:
Example fpc for a two-stage stratified survey design:
|
popsize |
Object that provides values for the population argument of the
Example popsize for calibration:
Example popsize for post-stratification using a data frame:
Example popsize for post-stratification using a table:
Example popsize for post-stratification using an xtabs object:
|
invprboot |
Logical value that indicates whether the inverse probability
bootstrap procedure is used to calculate trend parameter estimates. This
bootstrap procedure is only available for the "LMM" option for continuous
variables. Inverse probability references the design weights, which
are the inverse of the sample inclusion probabilities. The default value
is |
nboot |
Numeric value for the number of bootstrap iterations. The
default is |
vartype |
Character value providing choice of the variance estimator,
where |
jointprob |
Character value providing choice of joint inclusion
probability approximation for use with Horvitz-Thompson and Yates-Grundy
variance estimators, where |
conf |
Numeric value for the Gaussian-based confidence level. The default is
|
All_Sites |
A logical variable used when |
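The formats described for the fpc argument can be written out as R objects. Every number, cluster name, and the check_two_stage_fpc() helper below are hypothetical illustrations of the stated structure, not values from any real survey or functions from spsurvey.

```r
# Single-stage, unstratified: a single population size
fpc_1 <- 15000

# Single-stage, stratified: named list, names must match the stratum IDs
fpc_2 <- list(Stratum_1 = 9000, Stratum_2 = 6000)

# Two-stage, unstratified: first element = number of clusters in the
# population (name arbitrary); remaining elements = number of stage two
# units in each sampled cluster, named by cluster ID
fpc_3 <- c(Ncluster = 150, Cluster_1 = 150, Cluster_2 = 75, Cluster_3 = 75)

# Two-stage, stratified: named list of two-stage vectors, one per stratum
fpc_4 <- list(
  Stratum_1 = c(Ncluster_1 = 100, Cluster_1 = 125, Cluster_2 = 75),
  Stratum_2 = c(Ncluster_2 = 50, Cluster_3 = 75, Cluster_4 = 150)
)

# Hypothetical check: one more element than sampled clusters, and the
# names after the first must match the sampled cluster IDs
check_two_stage_fpc <- function(fpc, cluster_ids) {
  ids <- unique(cluster_ids)
  length(fpc) == length(ids) + 1 && setequal(names(fpc)[-1], ids)
}

check_two_stage_fpc(fpc_3, c("Cluster_1", "Cluster_2", "Cluster_3"))
```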
Value
The analysis results. A list composed of two data frames containing trend estimates for all combinations of population Types, subpopulations within Types, and response variables. For categorical variables, trend estimates are calculated for each category of the variable. The two data frames in the output list are:
catsum
data frame containing trend estimates for categorical variables
contsum
data frame containing trend estimates for continuous variables
For the SLR and WLR model options, the data frame contains the following variables:
- Type
subpopulation (domain) name
- Subpopulation
subpopulation name within a domain
- Indicator
response variable
- Trend_Estimate
trend estimate
- Trend_Std_Error
trend standard error
- Trend_LCBxxPct
trend xx% (default 95%) lower confidence bound
- Trend_UCBxxPct
trend xx% (default 95%) upper confidence bound
- Trend_p_Value
trend p-value
- Intercept_Estimate
intercept estimate
- Intercept_Std_Error
intercept standard error
- Intercept_LCBxxPct
intercept xx% (default 95%) lower confidence bound
- Intercept_UCBxxPct
intercept xx% (default 95%) upper confidence bound
- Intercept_p_Value
intercept p-value
- R_Squared
R-squared value
- Adj_R_Squared
adjusted R-squared value
For the GLMM and LMM model options, contents of the data frames will vary depending on the model specified by arguments cat_rhs and cont_rhs. For the default PO model, the data frame contains the following variables:
- Type
subpopulation (domain) name
- Subpopulation
subpopulation name within a domain
- Indicator
response variable
- Trend_Estimate
trend estimate
- Trend_Std_Error
trend standard error
- Trend_LCBxxPct
trend xx% (default 95%) lower confidence bound
- Trend_UCBxxPct
trend xx% (default 95%) upper confidence bound
- Trend_p_Value
trend p-value
- Intercept_Estimate
intercept estimate
- Intercept_Std_Error
intercept standard error
- Intercept_LCBxxPct
intercept xx% (default 95%) lower confidence bound
- Intercept_UCBxxPct
intercept xx% (default 95%) upper confidence bound
- Intercept_p_Value
intercept p-value
- Var_SiteInt
variance of the site intercepts
- Var_SiteTrend
variance of the site trends
- Corr_SiteIntSlope
correlation of site intercepts and site trends
- Var_Year
year variance
- Var_Residual
residual variance
- AIC
generalized Akaike Information Criterion
Details
For the simple linear regression (SLR) model, a design-based estimate of the category proportion (categorical variables) or the mean (continuous variables) is calculated for each time period (year). Four choices of variance estimator are available for calculating variance of the design-based estimates: (1) the local mean estimator, (2) the simple random sampling estimator, (3) the Horvitz-Thompson estimator, and (4) the Yates-Grundy estimator. For the Horvitz-Thompson and Yates-Grundy estimators, there are three choices for calculating joint inclusion probabilities: (1) the Overton approximation, (2) the Hartley-Rao approximation, and (3) the Brewer approximation. The lm function in the stats package is used to fit a linear model using a formula argument that specifies the proportion or mean estimates as the response variable and years as the regressor variable. For fitting the SLR model, the yearID variable from the dframe argument is modified by subtracting the minimum value of years from all values of the variable. Parameter estimates are extracted from the object returned by the lm function. For the weighted linear regression (WLR) model, the process is the same as the SLR model except that the inverse of the variances of the proportion or mean estimates is used as the weights argument in the call to the lm function.
For the LMM option, the lmer function in the lme4 package is used to fit a linear mixed-effects model for trend across years. For both the GLMM and LMM options, the default Piepho and Ogutu (PO) model includes fixed effects for intercept and trend (slope) and random effects for intercept and trend for individual sites, where the siteID variable from the dframe argument identifies sites. Correlation between the random effects for site intercepts and site trends is included in the model. Finally, the PO model contains random effects for year variance and residual variance. For the GLMM and LMM options, arguments cat_rhs and cont_rhs, respectively, can be used to specify the right hand side of the model formula. Internally, a variable named Wyear is created that is useful for specifying the cat_rhs and cont_rhs arguments. The Wyear variable is created by subtracting the minimum value of the yearID variable from all values of the variable. If argument invprboot is FALSE, parameter estimates are extracted from the object returned by the lmer function. If argument invprboot is TRUE, the boot function in the boot package is used to generate bootstrap replicates using a function named bootfcn as the statistic argument passed to the boot function. For each bootstrap replicate, bootfcn calls the glmer or lmer function, as appropriate, using the specified model. Design weights identified by the weight argument for the trend_analysis function are passed as the weights argument for the boot function, which specifies importance weights. Using the design weights as the weights argument ensures that bootstrap replicates are representative of the survey population. Parameter estimates are calculated using the object returned by the boot function.
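The SLR and WLR steps can be sketched in base R. This is a simplified illustration under stated assumptions: the per-year estimate is a design-weighted mean sum(w * y) / sum(w), no design-based variance estimator is computed (the WLR variances are made-up placeholders), and the data are constructed so that the true trend is exactly 0.2 per year. It is not the trend_analysis() implementation.

```r
# Hypothetical data: 10 sites visited every 5 years, each with a design weight
years <- seq(2000, 2020, by = 5)
dat <- data.frame(
  siteID = rep(paste0("Site", 1:10), each = length(years)),
  year = rep(years, times = 10),
  wgt = rep(c(10, 20, 30, 40, 50, 10, 20, 30, 40, 50), each = length(years))
)
dat$y <- 10 + 0.2 * (dat$year - 2000)  # every site trends by exactly 0.2 per year

# Step 1: design-weighted mean for each year (assumed estimator)
est <- sapply(split(dat, dat$year), function(d) sum(d$wgt * d$y) / sum(d$wgt))

# Step 2 (SLR): shift years so the earliest year is 0, then regress est on it
Wyear <- years - min(years)
fit_slr <- lm(est ~ Wyear)

# Step 2 (WLR): same model, weighted by inverse variances of the estimates
est_var <- c(0.04, 0.05, 0.03, 0.06, 0.05)  # hypothetical variances
fit_wlr <- lm(est ~ Wyear, weights = 1 / est_var)

coef(fit_slr)
coef(fit_wlr)
```

Because the intercept is estimated at Wyear = 0, it refers to the earliest year in the data rather than to year 0.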
Author(s)
Tom Kincaid Kincaid.Tom@epa.gov
See Also
change_analysis
for change analysis
Examples
# Example using a categorical variable with three resource classes and a
# continuous variable
mydframe <- data.frame(
siteID = rep(paste0("Site", 1:40), rep(5, 40)),
yearID = rep(seq(2000, 2020, by = 5), 40),
wgt = rep(runif(40, 10, 100), rep(5, 40)),
xcoord = rep(runif(40), rep(5, 40)),
ycoord = rep(runif(40), rep(5, 40)),
All_Sites = rep("All Sites", 200),
Region = sample(c("North", "South"), 200, replace = TRUE),
Resource_Class = sample(c("Good", "Fair", "Poor"), 200, replace = TRUE),
ContVar = rnorm(200, 10, 1)
)
myvars_cat <- c("Resource_Class")
myvars_cont <- c("ContVar")
mysubpops <- c("All_Sites", "Region")
trend_analysis(
dframe = mydframe,
vars_cat = myvars_cat,
vars_cont = myvars_cont,
subpops = mysubpops,
model_cat = "WLR",
model_cont = "SLR",
siteID = "siteID",
yearID = "yearID",
weight = "wgt",
xcoord = "xcoord",
ycoord = "ycoord"
)
Print grts(), irs(), and analysis function warnings
Description
This function prints the warning messages from the grts(), irs(), and analysis functions.
Usage
warnprnt(warn_df = get("warn_df", envir = .GlobalEnv), m = 1:nrow(warn_df))
Arguments
warn_df |
Data frame that contains warning messages. The default is
|
m |
Vector of indices for warning messages that are to be printed. The
default is a vector containing the integers from 1 through the number of
rows in |
Value
Printed warnings.
Author(s)
Tom Kincaid Kincaid.Tom@epa.gov