Help for package robnptests

Version:

1.1.0

Type:

Package

Title:

Robust Nonparametric Two-Sample Tests for Location/Scale

Author:

Sermad Abbas [aut, cre], Barbara Brune [aut], Roland Fried [aut]

Maintainer:

Sermad Abbas <abbas@statistik.tu-dortmund.de>

BugReports:

https://github.com/s-abbas/robnptests/issues

Description:

Implementations of several robust nonparametric two-sample tests for location or scale differences. The test statistics are based on robust location and scale estimators, e.g. the sample median or the Hodges-Lehmann estimators as described in Fried & Dehling (2011) <doi:10.1007/s10260-011-0164-1>. The p-values can be computed via the permutation principle, the randomization principle, or by using the asymptotic distributions of the test statistics under the null hypothesis, which ensures (approximate) distribution independence of the test decision. To test for a difference in scale, we apply the tests for location difference to transformed observations; see Fried (2012) <doi:10.1016/j.csda.2011.02.012>. Random noise on a small range can be added to the original observations in order to hold the significance level on data from discrete distributions. The location tests assume homoscedasticity and the scale tests require the location parameters to be zero.

License:

GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]

Depends:

R (≥ 4.0.0)

URL:

https://github.com/s-abbas/robnptests

Encoding:

UTF-8

RoxygenNote:

7.2.1

Imports:

Rdpack, gtools, robustbase, statmod, stats, utils, checkmate

RdMacros:

Rdpack

Suggests:

testthat, knitr, rmarkdown, usethis, covr

VignetteBuilder:

knitr

Config/testthat/edition:

NeedsCompilation:

Packaged:

2023-02-13 20:44:58 UTC; abbas

Repository:

CRAN

Date/Publication:

2023-02-13 21:10:02 UTC

Calculation of permutation p-value

Description

calc_perm_p_value calculates the permutation p-value following Phipson and Smyth (2010).

Usage

calc_perm_p_value(
  statistic,
  distribution,
  m,
  n,
  randomization,
  n.rep,
  alternative
)

Arguments

statistic

observed value of the test statistic.

distribution

a numeric vector with the permutation/randomization distribution.

m

an integer value giving size of first sample.

n

an integer value giving size of second sample.

randomization

a logical value indicating whether the p-value should be computed from a permutation (FALSE, default) or a randomization (TRUE) distribution.

n.rep

an integer value specifying the number of random splits used to calculate the randomization distribution if randomization = TRUE. This argument is ignored if randomization = FALSE. The default is n.rep = 10000.

alternative

a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater", or "less".

Value

p-value for the specified alternative.

References

Phipson B, Smyth GK (2010). “Permutation p-values should never be zero: Calculating exact p-values when permutations are randomly drawn.” Statistical Applications in Genetics and Molecular Biology, 9(1), Article 39. doi:10.2202/1544-6115.1585.
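
As a rough illustration of why such a p-value can never be zero, the following sketch computes the simple estimate (b + 1) / (B + 1) discussed by Phipson and Smyth (2010), where b counts the resampled statistics that are at least as extreme as the observed one and B is the number of resampled statistics. This is not the exact computation used by the package; the function name perm_p_sketch is hypothetical.

```r
# Hypothetical helper, not part of robnptests: simple resampling p-value
# of the form (b + 1) / (B + 1), which is never exactly zero.
perm_p_sketch <- function(statistic, distribution,
                          alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  # Count resampled statistics at least as extreme as the observed one
  b <- switch(alternative,
    two.sided = sum(abs(distribution) >= abs(statistic)),
    greater   = sum(distribution >= statistic),
    less      = sum(distribution <= statistic)
  )
  (b + 1) / (length(distribution) + 1)
}

# Toy distribution with five resampled statistics
perm_p_sketch(2.5, c(-1, 0.5, 3, -2.7, 1.2), alternative = "two.sided")
```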

Checks for input arguments

Description

check_test_input is a helper function that contains checks for the input arguments of the two-sample tests.

Usage

check_test_input(
  x,
  y,
  alternative,
  delta,
  method,
  scale,
  n.rep,
  na.rm,
  scale.test,
  wobble,
  wobble.seed,
  gamma = NULL,
  psi = NULL,
  k = NULL,
  test.name
)

Arguments

x

a (non-empty) numeric vector of data values.

y

a (non-empty) numeric vector of data values.

alternative

a character string specifying the alternative hypothesis, must be one of "two.sided", "greater", or "less".

delta

a numeric value indicating the true difference in the location or scale parameter, depending on whether the test should be performed for a difference in location or in scale.

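method

a character string specifying how the p-value is computed; must be one of "asymptotic", "permutation", or "randomization".

scale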

a character string specifying the scale estimator used for standardization in the test statistic; must be one of "S1", "S2", "S3", and "S4".

n.rep

an integer value specifying the number of random splits used to calculate the randomization distribution if method = "randomization".

na.rm

a logical value indicating whether NA values in x and y should be stripped before the computation proceeds.

scale.test

a logical value indicating whether the samples should be compared for a difference in scale.

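wobble

a logical value indicating whether the sample should be checked for duplicated values that can cause the scale estimate to be zero. If such values are present, uniform noise is added to the sample, see wobble.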

wobble.seed

an integer value used as a seed for the random number generation in case of wobble = TRUE or when scale.test = TRUE with one of the vectors x and y containing zeros. When no seed is specified, it is chosen randomly and printed in a message. The argument is ignored if scale.test = FALSE and/or wobble = FALSE.

gamma

a numeric value in [0, 0.5] specifying the fraction of observations to be trimmed/replaced from each end of the sample before computing the trimmed mean/winsorized variance.

psi

kernel used for optimization in the computation of the M-estimates. Must be one of "bisquare", "hampel" and "huber".

k

tuning parameter(s) for the respective psi function.

test.name

character string specifying the two-sample test for which the helper function is used.

Details

The two-sample tests in this package share similar arguments. To reduce the amount of repetitive code, this function contains the argument checks so that only check_test_input needs to be called within the functions for the two-sample tests.

The scale estimators "S1" and "S2" can only be used in combination with test.name = "hl1_test" or test.name = "hl2_test". The estimators "S3" and "S4" can only be used with test.name = "med_test".

Value

An error message if a check fails.

Test decision for asymptotic versions of HL1-, HL2-, and MED-tests

Description

compute_results_asymptotic is a helper function to compute the test decision for the HL1-, HL2-, and MED-test when method = "asymptotic".

Usage

compute_results_asymptotic(x, y, alternative, delta, type)

Arguments

x

a (non-empty) numeric vector of data values.

y

a (non-empty) numeric vector of data values.

alternative

a character string specifying the alternative hypothesis, must be one of "two.sided", "greater", or "less".

delta

a numeric value indicating the true difference in the location or scale parameter, depending on whether the test should be performed for a difference in location or in scale.

type

a character string specifying the desired test statistic. It must be one of "HL11", "HL12", "HL21", "HL22", "MED1", and "MED2", where "HL1", "HL2" and "MED" specify the location estimator and the numbers 1 and 2 the scale estimator, see the vignette (vignette("robnptests")) for more information.

Value

A named list containing the following components:

statistic

the value of the test statistic.

estimates

the location estimates for both samples in case of the HL1- and the MED-tests. The estimate for the location difference in case of the HL2-tests.

p.value

the p-value for the test.
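
The last step of an asymptotic test, turning a standardized statistic into a p-value, can be sketched as follows. The sketch assumes that the statistic is approximately standard normal under the null hypothesis and that "greater" refers to large positive values of the statistic; the function name asymptotic_p_sketch is hypothetical and not part of the package.

```r
# Hypothetical helper: p-value for a statistic that is approximately
# standard normal under the null hypothesis.
asymptotic_p_sketch <- function(statistic,
                                alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  switch(alternative,
    two.sided = 2 * stats::pnorm(abs(statistic), lower.tail = FALSE),
    greater   = stats::pnorm(statistic, lower.tail = FALSE),
    less      = stats::pnorm(statistic)
  )
}

asymptotic_p_sketch(1.96)
asymptotic_p_sketch(1.96, alternative = "greater")
```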

Finite-sample test decision for HL1-, HL2-, and MED-tests

Description

compute_results_finite is a helper function to compute the test decision for the HL1-, HL2-, and MED-test when method = "randomization" or method = "permutation".

Usage

compute_results_finite(x, y, alternative, delta, method, n.rep, type)

Arguments

x

a (non-empty) numeric vector of data values.

y

a (non-empty) numeric vector of data values.

alternative

a character string specifying the alternative hypothesis, must be one of "two.sided", "greater", or "less".

delta

a numeric value indicating the true difference in the location or scale parameter, depending on whether the test should be performed for a difference in location or in scale.

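method

a character string specifying how the p-value is computed; must be either "randomization" or "permutation".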

n.rep

an integer value specifying the number of random splits used to calculate the randomization distribution if method = "randomization".

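type

a character string specifying the desired test statistic. It must be one of "HL11", "HL12", "HL21", "HL22", "MED1", and "MED2", where "HL1", "HL2" and "MED" specify the location estimator and the numbers 1 and 2 the scale estimator, see the vignette (vignette("robnptests")) for more information.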

Value

A named list containing the following components:

statistic

the value of the test statistic.

estimates

the location estimates for both samples in case of the HL1- and the MED-tests. The estimate for the location difference in case of the HL2-tests.

p.value

the p-value for the test.

Two-sample location tests based on one-sample Hodges-Lehmann estimator

Description

hl1_test performs a two-sample location test based on the difference of the one-sample Hodges-Lehmann estimators of both samples.

Usage

hl1_test(
  x,
  y,
  alternative = c("two.sided", "greater", "less"),
  delta = ifelse(scale.test, 1, 0),
  method = c("asymptotic", "permutation", "randomization"),
  scale = c("S1", "S2"),
  n.rep = 10000,
  na.rm = FALSE,
  scale.test = FALSE,
  wobble = FALSE,
  wobble.seed = NULL
)

Arguments

x

a (non-empty) numeric vector of data values.

y

a (non-empty) numeric vector of data values.

alternative

a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater", or "less".

delta

a numeric value indicating the true difference in the location or scale parameter, depending on whether the test should be performed for a difference in location or in scale. The default is delta = 0 for a location test and delta = 1 for a scale test. In case of scale.test = TRUE, delta represents the ratio of the squared scale parameters.

method

a character string specifying how the p-value is computed with possible values "asymptotic" for an asymptotic test based on a normal approximation, "permutation" for a permutation test, and "randomization" for a randomization test. The permutation test uses all splits of the joint sample into two samples of sizes m and n, while the randomization test draws n.rep random splits with replacement. The values m and n denote the sample sizes. If not specified explicitly, defaults to "permutation" if m < 30, n < 30 and n.rep >= choose(m + n, m), "randomization" if m < 30, n < 30 and n.rep < choose(m + n, m), and "asymptotic" if m >= 30 and n >= 30.

scale

a character string specifying the scale estimator used for standardization of the test statistic; must be one of "S1" and "S2". The default is "S1". Ignored if method = "asymptotic"; see details for the definition of the scale estimators.

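n.rep

an integer value specifying the number of random splits used to calculate the randomization distribution if method = "randomization". This argument is ignored if method = "permutation" or method = "asymptotic". The default is n.rep = 10000.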

na.rm

a logical value indicating whether NA values in x and y should be stripped before the computation proceeds. The default is na.rm = FALSE.

scale.test

a logical value to specify if the samples should be compared for a difference in scale. The default is scale.test = FALSE.

wobble

a logical value indicating whether the sample should be checked for duplicated values that can cause the scale estimate to be zero. If such values are present, uniform noise is added to the sample, see wobble. Only necessary for the permutation and randomization version of the test. The default is wobble = FALSE.

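wobble.seed

an integer value used as a seed for the random number generation in case of wobble = TRUE or when scale.test = TRUE with one of the vectors x and y containing zeros. When no seed is specified, it is chosen randomly and printed in a message. The argument is ignored if scale.test = FALSE and/or wobble = FALSE.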

Details

The test statistic for this test is based on the difference of the one-sample Hodges-Lehmann estimators of x and y, see hodges_lehmann. Three versions of the test are implemented: randomization, permutation, and asymptotic.

The test statistic for the permutation and randomization versions of the test is standardized using a robust scale estimator, see Fried and Dehling (2011).

With scale = "S1", the scale is estimated by

S = med(|x_i - x_j|: 1 \le i < j \le m, |y_i - y_j|: 1 \le i < j \le n),

whereas scale = "S2" uses

S = med(|z_i - z_j|: 1 \le i < j \le m + n).

Here, z = (z_1, ..., z_{m + n}) = (x_1 - med(x), ..., x_m - med(x), y_1 - med(y), ..., y_n - med(y)) is the median-corrected sample.
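
The two scale estimators can be sketched in base R as follows. This is only an illustration of the formulas above, not the package's implementation; the helper names pairwise_abs_diff, scale_s1, and scale_s2 are hypothetical.

```r
# Hypothetical helpers illustrating the formulas for "S1" and "S2".

# All absolute differences |v_i - v_j| with i < j
pairwise_abs_diff <- function(v) as.vector(stats::dist(v))

# "S1": median of the pooled within-sample absolute pairwise differences
scale_s1 <- function(x, y) {
  stats::median(c(pairwise_abs_diff(x), pairwise_abs_diff(y)))
}

# "S2": median of the absolute pairwise differences in the
# median-corrected joint sample z
scale_s2 <- function(x, y) {
  z <- c(x - stats::median(x), y - stats::median(y))
  stats::median(pairwise_abs_diff(z))
}

x <- c(1, 2, 4, 7, 11)
y <- c(3, 5, 6, 10, 15)
scale_s1(x, y)
scale_s2(x, y)
```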

The randomization distribution is based on randomly drawn splits with replacement. The function permp (Phipson and Smyth 2010) is used to calculate the p-value. For the asymptotic test, a transformed version of the difference of the HL1-estimators, which asymptotically follows a normal distribution, is used. For more details on the asymptotic test, see Fried and Dehling (2011).

For scale.test = TRUE, the test compares the two samples for a difference in scale. This is achieved by log-transforming the original squared observations, i.e. x is replaced by log(x^2) and y by log(y^2). A potential scale difference then appears as a location difference between the transformed samples, see Fried (2012). Note that the samples need to have equal locations. The sample should not contain zeros to prevent problems with the necessary log-transformation. If it contains zeros, uniform noise is added to all variables in order to remove zeros and a message is printed.
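
The effect of the transformation can be illustrated with a small base R sketch. It uses the difference of sample medians as a stand-in location estimator and toy data in which x has exactly twice the scale of y; neither choice reflects what the test does internally.

```r
# Toy data: x has twice the scale of y, both are centred at zero
x <- c(2, -4, 6, -8, 10)
y <- c(1, -2, 3, -4, 5)

# Log-transform the squared observations
x_t <- log(x^2)
y_t <- log(y^2)

# A scale difference becomes a location shift on the transformed data.
# Back-transforming the shift gives the ratio of the squared scales.
shift <- stats::median(x_t) - stats::median(y_t)
exp(shift)
```

The back-transformed shift corresponds to the ratio of the squared scale parameters, which is the quantity that delta refers to when scale.test = TRUE.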

If the sample has been modified (either because of zeros if scale.test = TRUE or wobble = TRUE), the modified samples can be retrieved using

set.seed(wobble.seed); wobble(x, y).

Both samples need to contain at least 5 non-missing values.

Value

A named list with class "htest" containing the following components:

statistic

the value of the test statistic.

p.value

the p-value for the test.

estimate

the one-sample Hodges-Lehmann estimates of x and y (if scale.test = FALSE) or of log(x^2) and log(y^2) (if scale.test = TRUE).

null.value

the specified hypothesized value of the mean difference/squared scale ratio.

alternative

a character string describing the alternative hypothesis.

method

a character string indicating how the p-value was computed.

data.name

a character string giving the names of the data.

References

Fried R, Dehling H (2011). “Robust nonparametric tests for the two-sample location problem.” Statistical Methods & Applications, 20(4), 409–422. doi:10.1007/s10260-011-0164-1.

Fried R (2012). “On the online estimation of piecewise constant volatilities.” Computational Statistics & Data Analysis, 56(11), 3080–3090. doi:10.1016/j.csda.2011.02.012.

Examples

# Generate random samples
set.seed(108)
x <- rnorm(20); y <- rnorm(20)

# Asymptotic HL1 test
hl1_test(x, y, method = "asymptotic", scale = "S1")

## Not run: 
# HL12 test using randomization principle by drawing 1000 random permutations
# with replacement

hl1_test(x, y, method = "randomization", n.rep = 1000, scale = "S2")

## End(Not run)

Two-sample location tests based on two-sample Hodges-Lehmann estimator

Description

hl2_test performs a two-sample location test based on the two-sample Hodges-Lehmann estimator for shift.

Usage

hl2_test(
  x,
  y,
  alternative = c("two.sided", "greater", "less"),
  delta = ifelse(scale.test, 1, 0),
  method = c("asymptotic", "permutation", "randomization"),
  scale = c("S1", "S2"),
  n.rep = 10000,
  na.rm = FALSE,
  scale.test = FALSE,
  wobble = FALSE,
  wobble.seed = NULL
)

Arguments

x

a (non-empty) numeric vector of data values.

y

a (non-empty) numeric vector of data values.

alternative

a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater", or "less".

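delta

a numeric value indicating the true difference in the location or scale parameter, depending on whether the test should be performed for a difference in location or in scale. The default is delta = 0 for a location test and delta = 1 for a scale test. In case of scale.test = TRUE, delta represents the ratio of the squared scale parameters.

method

a character string specifying how the p-value is computed with possible values "asymptotic" for an asymptotic test based on a normal approximation, "permutation" for a permutation test, and "randomization" for a randomization test. The permutation test uses all splits of the joint sample into two samples of sizes m and n, while the randomization test draws n.rep random splits with replacement. The values m and n denote the sample sizes. If not specified explicitly, defaults to "permutation" if m < 30, n < 30 and n.rep >= choose(m + n, m), "randomization" if m < 30, n < 30 and n.rep < choose(m + n, m), and "asymptotic" if m >= 30 and n >= 30.

scale

a character string specifying the scale estimator used for standardization of the test statistic; must be one of "S1" and "S2". The default is "S1". Ignored if method = "asymptotic"; see details for the definition of the scale estimators.

n.rep

an integer value specifying the number of random splits used to calculate the randomization distribution if method = "randomization". This argument is ignored if method = "permutation" or method = "asymptotic". The default is n.rep = 10000.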

na.rm

a logical value indicating whether NA values in x and y should be stripped before the computation proceeds. The default is na.rm = FALSE.

scale.test

a logical value to specify if the samples should be compared for a difference in scale. The default is scale.test = FALSE.

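wobble

a logical value indicating whether the sample should be checked for duplicated values that can cause the scale estimate to be zero. If such values are present, uniform noise is added to the sample, see wobble. Only necessary for the permutation and randomization version of the test. The default is wobble = FALSE.

wobble.seed

an integer value used as a seed for the random number generation in case of wobble = TRUE or when scale.test = TRUE with one of the vectors x and y containing zeros. When no seed is specified, it is chosen randomly and printed in a message. The argument is ignored if scale.test = FALSE and/or wobble = FALSE.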

Details

The test statistic for this test is based on the two-sample Hodges-Lehmann estimator of x and y, see hodges_lehmann_2sample. Three versions of the test are implemented: randomization, permutation, and asymptotic.

The test statistic for the permutation and randomization versions of the test is standardized using a robust scale estimator, see Fried and Dehling (2011).

With scale = "S1", the scale is estimated by

S = med(|x_i - x_j|: 1 \le i < j \le m, |y_i - y_j|: 1 \le i < j \le n),

whereas scale = "S2" uses

S = med(|z_i - z_j|: 1 \le i < j \le m + n).

Here, z = (z_1, ..., z_{m + n}) = (x_1 - med(x), ..., x_m - med(x), y_1 - med(y), ..., y_n - med(y)) is the median-corrected sample.

The randomization distribution is based on randomly drawn splits with replacement. The function permp (Phipson and Smyth 2010) is used to calculate the p-value. For the asymptotic test, a transformed version of the HL2-estimator, which asymptotically follows a normal distribution, is used. For more details on the asymptotic test, see Fried and Dehling (2011).

If the sample has been modified (either because of zeros if scale.test = TRUE or wobble = TRUE), the modified samples can be retrieved using

set.seed(wobble.seed); wobble(x, y).

Both samples need to contain at least 5 non-missing values.

Value

A named list with class "htest" containing the following components:

statistic

the value of the test statistic.

p.value

the p-value for the test.

estimate

the estimated location difference between x and y (if scale.test = FALSE) or of log(x^2) and log(y^2) (if scale.test = TRUE) based on the two-sample Hodges-Lehmann estimator.

null.value

the specified hypothesized value of the mean difference/squared scale ratio.

alternative

a character string describing the alternative hypothesis.

method

a character string indicating how the p-value was computed.

data.name

a character string giving the names of the data.

References

Fried R, Dehling H (2011). “Robust nonparametric tests for the two-sample location problem.” Statistical Methods & Applications, 20(4), 409–422. doi:10.1007/s10260-011-0164-1.

Fried R (2012). “On the online estimation of piecewise constant volatilities.” Computational Statistics & Data Analysis, 56(11), 3080–3090. doi:10.1016/j.csda.2011.02.012.

Examples

# Generate random samples
set.seed(108)
x <- rnorm(20); y <- rnorm(20)

# Asymptotic HL2 test
hl2_test(x, y, method = "asymptotic", scale = "S1")

## Not run: 
# HL22 test using randomization principle by drawing 1000 random permutations
# with replacement

hl2_test(x, y, method = "randomization", n.rep = 1000, scale = "S2")

## End(Not run)

One-sample Hodges-Lehmann estimator

Description

hodges_lehmann calculates the one-sample Hodges-Lehmann estimator of a sample.

Usage

hodges_lehmann(x, na.rm = FALSE)

Arguments

x

a (non-empty) numeric vector of data values.

na.rm

a logical value indicating whether NA values in x should be stripped before the computation proceeds. The default is na.rm = FALSE.

Details

The one-sample Hodges-Lehmann estimator for a sample of size n is defined as

med(\frac{X_i + X_j}{2}, 1 \le i < j \le n).
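
The definition can be written out directly in base R. The following is a minimal sketch for illustration, assuming a numeric vector with at least two values and no missing values; the function name hl1_sketch is hypothetical and this is not the package's implementation.

```r
# Hypothetical helper: median of all pairwise means with i < j
hl1_sketch <- function(x) {
  pairs <- utils::combn(x, 2)        # 2 x choose(n, 2) matrix of pairs
  stats::median(colMeans(pairs))     # median of the pairwise averages
}

hl1_sketch(c(1, 2, 4, 8))
```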

Value

The one-sample Hodges-Lehmann estimator.

References

Hodges JL, Lehmann EL (1963). “Estimates of location based on rank tests.” The Annals of Mathematical Statistics, 34(2), 598–611. doi:10.1214/aoms/1177704172.

Examples

# Generate random sample
set.seed(108)
x <- rnorm(10)

# Compute one-sample Hodges-Lehmann estimator
hodges_lehmann(x)

Two-sample Hodges-Lehmann estimator

Description

hodges_lehmann_2sample calculates the two-sample Hodges-Lehmann estimator for the location difference of two samples x and y.

Usage

hodges_lehmann_2sample(x, y, na.rm = FALSE)

Arguments

x

a (non-empty) numeric vector of data values.

y

a (non-empty) numeric vector of data values.

na.rm

a logical value indicating whether NA values in x and y should be stripped before the computation proceeds. The default is na.rm = FALSE.

Details

The two-sample Hodges-Lehmann estimator for two samples x and y of sizes m and n is defined as

med(x_i - y_j, 1 \le i \le m, 1 \le j \le n).
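
For illustration, an estimator of this type can be sketched in base R as the median of the signed pairwise differences x_i - y_j, which is the usual form of the Hodges-Lehmann estimator for a shift between two samples. The function name hl2_sketch is hypothetical; the sketch assumes no missing values and is not the package's implementation.

```r
# Hypothetical helper: median of all m * n signed differences x_i - y_j
hl2_sketch <- function(x, y) {
  stats::median(outer(x, y, "-"))
}

hl2_sketch(c(5, 6, 9), c(1, 2, 4))
```

Swapping the two samples changes the sign of the result, as expected for an estimate of a location difference.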

Value

The two-sample Hodges-Lehmann estimator.

References

Hodges JL, Lehmann EL (1963). “Estimates of location based on rank tests.” The Annals of Mathematical Statistics, 34(2), 598–611. doi:10.1214/aoms/1177704172.

Examples

# Generate random samples
set.seed(108)
x <- rnorm(10); y <- rnorm(10)

# Compute two-sample Hodges-Lehmann estimator
hodges_lehmann_2sample(x, y)

M-estimator of location

Description

m_est calculates an M-estimate of location and its variance for different psi functions.

Usage

m_est(
  x,
  psi,
  k = robustbase::.Mpsi.tuning.default(psi),
  tol = 1e-06,
  max.it = 15,
  na.rm = FALSE
)

Arguments

x

a (non-empty) numeric vector of data values.

psi

kernel used for optimization. Must be one of "bisquare", "hampel" and "huber". The default is "huber".

k

tuning parameter(s) for the respective kernel function, defaults to parameters implemented in .Mpsi.tuning.default(psi) in the package robustbase.

tol

tolerance for convergence. The default is 1e-06.

max.it

the maximum number of iterations. The default is 15.

na.rm

a logical value indicating whether NA values in x should be stripped before the computation proceeds. The default is na.rm = FALSE.

Details

To compute the M-estimate, the iterative algorithm described in Maronna et al. (2019) is used. The variance is estimated as in Huber (1981).

If max.it contains decimal places, it is truncated to an integer value.
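
The general idea of such an iterative algorithm can be sketched as an iteratively reweighted mean with Huber weights. The sketch below makes several assumptions that are not taken from this help page: the scale is fixed at the normalized MAD, the iteration starts at the median, the tuning constant is k = 1.345, and only the location estimate (not its variance) is returned. The function name huber_location_sketch is hypothetical.

```r
# Hypothetical helper: Huber M-estimate of location via iterative reweighting.
# Assumptions: fixed scale (normalized MAD), start at the median, k = 1.345.
huber_location_sketch <- function(x, k = 1.345, tol = 1e-06, max.it = 15) {
  mu <- stats::median(x)
  sigma <- stats::mad(x)               # normalized MAD, kept fixed
  if (sigma == 0) return(mu)           # degenerate sample: no reweighting possible
  for (it in seq_len(max.it)) {
    r <- (x - mu) / sigma              # standardized residuals
    w <- pmin(1, k / abs(r))           # Huber weights; equal to 1 where r == 0
    mu_new <- sum(w * x) / sum(w)      # weighted mean
    converged <- abs(mu_new - mu) < tol * sigma
    mu <- mu_new
    if (converged) break
  }
  mu
}

# The outlier 100 is downweighted, so the estimate stays close to the bulk
huber_location_sketch(c(-2, -1, 0, 1, 2, 100))
```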

Value

A named list containing the components:

est

estimated mean.

var

estimated variance.

References

Maronna RA, Martin DR, Yohai VJ, Salibián-Barrera M (2019). Robust Statistics: Theory and Methods (with R), Wiley Series in Probability and Statistics, Second edition. Wiley. doi:10.1002/9781119214656.

Huber PJ (1981). Robust Statistics. Wiley, New York. doi:10.1002/0471725250.

Examples


# Generate random sample
set.seed(108)
x <- rnorm(10)

# Compute Huber's M-estimate
m_est(x, psi = "huber")

Permutation distribution for M-statistics

Description

m_est_perm_distribution calculates the permutation distribution for the M-statistics from m_test_statistic.

Usage

m_est_perm_distribution(x, y, psi, k, randomization = FALSE, n.rep = 10000)

Arguments

x

a (non-empty) numeric vector of data values.

y

a (non-empty) numeric vector of data values.

psi

kernel used for optimization. Must be one of "bisquare", "hampel" and "huber". The default is "huber".

k

tuning parameter(s) for the respective kernel function, defaults to parameters implemented in .Mpsi.tuning.default(psi) in the package robustbase.

randomization

a logical value indicating whether the p-value should be computed from a permutation (FALSE, default) or a randomization (TRUE) distribution.

n.rep

an integer value specifying the number of random splits used to calculate the randomization distribution if method = "randomization". The default is n.rep = 10000.

Details

Missing values in either x or y are not allowed.

Value

Vector with permutation distribution of the test statistic specified by psi and k.

References

Maechler M, Rousseeuw P, Croux C, Todorov V, Ruckstuhl A, Salibián-Barrera M, Verbeke T, Koller M, Conceicao EL, di Palma MA (2022). robustbase: Basic robust statistics. R package version 0.95-0, https://CRAN.R-project.org/package=robustbase.

Two-sample location test based on M-estimators

Description

m_test performs a two-sample location test based on an M-estimator.

Usage

m_test(
  x,
  y,
  alternative = c("two.sided", "greater", "less"),
  delta = ifelse(scale.test, 1, 0),
  method = c("asymptotic", "permutation", "randomization"),
  psi = c("huber", "hampel", "bisquare"),
  k = robustbase::.Mpsi.tuning.default(psi),
  n.rep = 10000,
  na.rm = FALSE,
  scale.test = FALSE,
  wobble.seed = NULL,
  ...
)

Arguments

x

a (non-empty) numeric vector of data values.

y

a (non-empty) numeric vector of data values.

alternative

a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater", or "less".

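delta

a numeric value indicating the true difference in the location or scale parameter, depending on whether the test should be performed for a difference in location or in scale. The default is delta = 0 for a location test and delta = 1 for a scale test. In case of scale.test = TRUE, delta represents the ratio of the squared scale parameters.

method

a character string specifying how the p-value is computed with possible values "asymptotic" for an asymptotic test based on a normal approximation, "permutation" for a permutation test, and "randomization" for a randomization test. The permutation test uses all splits of the joint sample into two samples of sizes m and n, while the randomization test draws n.rep random splits with replacement. The values m and n denote the sample sizes. If not specified explicitly, defaults to "permutation" if m < 30, n < 30 and n.rep >= choose(m + n, m), "randomization" if m < 30, n < 30 and n.rep < choose(m + n, m), and "asymptotic" if m >= 30 and n >= 30.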

psi

kernel used for optimization. Must be one of "bisquare", "hampel" and "huber". The default is "huber".

k

tuning parameter(s) for the respective kernel function, defaults to parameters implemented in .Mpsi.tuning.default(psi) in the package robustbase.

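n.rep

an integer value specifying the number of random splits used to calculate the randomization distribution if method = "randomization". This argument is ignored if method = "permutation" or method = "asymptotic". The default is n.rep = 10000.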

na.rm

a logical value indicating whether NA values in x and y should be stripped before the computation proceeds. The default is na.rm = FALSE.

scale.test

a logical value to specify if the samples should be compared for a difference in scale. The default is scale.test = FALSE.

wobble.seed

an integer value used as a seed for the random number generation in case that scale.test = TRUE and one of the vectors x and y contains zeros. When no seed is specified, it is chosen randomly and printed in a message. The argument is ignored if scale.test = FALSE.

...

additional arguments c1 and c2 that can be passed to the function scaleTau2(), which is used internally for estimating the within-sample dispersion, in order to account for non-normal distributions; see Maronna and Zamar (2002).

Details

The test statistic for this test is based on the difference of the M-estimates of location of x and y, see m_est.

Three different psi-functions can be used: huber, hampel, and bisquare. The corresponding tuning parameter(s) can be set by the argument k of the function.

The estimate for the location difference is scaled by a pooled estimate for the standard deviation. This estimate is based on the tau-estimate of scale and is computed with the default parameter settings of the function scaleTau2. These can be changed by setting c1 and c2.

More details on the construction of the test statistic are given in the vignettes vignette("robnptests") and vignette("m_tests").

Three versions of the test are implemented: randomization, permutation, and asymptotic.

The randomization distribution is based on randomly drawn splits with replacement. The function permp (Phipson and Smyth 2010) is used to calculate the p-value. The psi-function for the M-estimate is computed with the implementations in the package robustbase.

For the asymptotic test, the distribution of the test statistic is approximated by a standard normal distribution. However, this is only justified under the normality assumption. When the observations do not come from a normal distribution, the tests might not keep the desired significance level. Simulations indicate that the level is kept under symmetric distributions if the variance exists. Under skewed distributions, it tends to be anti-conservative, see the vignette vignette("m_tests"). The test statistic can be corrected by a factor which has to be determined individually for a specific distribution in such cases.

If the sample has been modified because of zeros when scale.test = TRUE, the modified samples can be retrieved using

set.seed(wobble.seed); wobble(x, y)

Both samples need to contain at least 5 non-missing values.

Value

A named list with class "htest" containing the following components:

statistic

the value of the test statistic.

parameter

the degrees of freedom for the test statistic.

p.value

the p-value for the test.

estimate

the M-estimates of x and y (if scale.test = FALSE) or of log(x^2) and log(y^2) (if scale.test = TRUE).

null.value

the specified hypothesized value of the mean difference/squared scale ratio.

alternative

a character string describing the alternative hypothesis.

method

a character string indicating how the p-value was computed.

data.name

a character string giving the names of the data.

References

Fried R (2012). “On the online estimation of piecewise constant volatilities.” Computational Statistics & Data Analysis, 56(11), 3080–3090. doi:10.1016/j.csda.2011.02.012.

Maronna RA, Zamar RH (2002). “Robust estimates of location and dispersion of high-dimensional datasets.” Technometrics, 44(4), 307–317. doi:10.1198/004017002188618509.

Examples

# Generate random samples
set.seed(108)
x <- rnorm(20); y <- rnorm(20)

# Asymptotic test based on Huber M-estimator
m_test(x, y, method = "asymptotic", psi = "huber")

## Not run: 
# Randomization test based on Hampel M-estimator with 1000 random permutations
# drawn with replacement

m_test(x, y, method = "randomization", n.rep = 1000, psi = "hampel")

## End(Not run)

Test statistics for the M-tests

Description

m_test_statistic calculates the test statistics for tests based on M-estimators.

Usage

m_test_statistic(x, y, psi, k = robustbase::.Mpsi.tuning.default(psi), ...)

Arguments

x

a (non-empty) numeric vector of data values.

y

a (non-empty) numeric vector of data values.

psi

kernel used for optimization. Must be one of "bisquare", "hampel" and "huber". The default is "huber".

k

tuning parameter(s) for the respective kernel function, defaults to parameters implemented in .Mpsi.tuning.default(psi) in the package robustbase.

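...

additional arguments c1 and c2 that can be passed to the function scaleTau2(), which is used internally for estimating the within-sample dispersion, in order to account for non-normal distributions; see Maronna and Zamar (2002).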

Details

For details on how the test statistic is constructed, we refer to the vignette vignette("m_tests").

Value

A named list containing the following components:

statistic

standardized test statistic.

estimates

M-estimates of location for both x and y.

Examples

# Generate random samples
set.seed(108)
x <- rnorm(20); y <- rnorm(20)

# Compute Huber-M-statistic
m_test_statistic(x, y, psi = "huber")

Two-sample location tests based on the sample median

Description

med_test performs a two-sample location test based on the difference of the sample medians for both samples.

Usage

med_test(
  x,
  y,
  alternative = c("two.sided", "greater", "less"),
  delta = ifelse(scale.test, 1, 0),
  method = c("asymptotic", "permutation", "randomization"),
  scale = c("S3", "S4"),
  n.rep = 10000,
  na.rm = FALSE,
  scale.test = FALSE,
  wobble = FALSE,
  wobble.seed = NULL
)

Arguments

x

a (non-empty) numeric vector of data values.

y

a (non-empty) numeric vector of data values.

alternative

a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater", or "less".

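delta

a numeric value indicating the true difference in the location or scale parameter, depending on whether the test should be performed for a difference in location or in scale. The default is delta = 0 for a location test and delta = 1 for a scale test. In case of scale.test = TRUE, delta represents the ratio of the squared scale parameters.

method

a character string specifying how the p-value is computed with possible values "asymptotic" for an asymptotic test based on a normal approximation, "permutation" for a permutation test, and "randomization" for a randomization test. The permutation test uses all splits of the joint sample into two samples of sizes m and n, while the randomization test draws n.rep random splits with replacement. The values m and n denote the sample sizes. If not specified explicitly, defaults to "permutation" if m < 30, n < 30 and n.rep >= choose(m + n, m), "randomization" if m < 30, n < 30 and n.rep < choose(m + n, m), and "asymptotic" if m >= 30 and n >= 30.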

scale

a character string specifying the scale estimator used for standardization of the test statistic, must be one of "S3" and "S4". The default is "S3". Ignored if method = "asymptotic"; see details for the definition of the scale estimators.

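n.rep

an integer value specifying the number of random splits used to calculate the randomization distribution if method = "randomization". This argument is ignored if method = "permutation" or method = "asymptotic". The default is n.rep = 10000.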

na.rm

a logical value indicating whether NA values in x and y should be stripped before the computation proceeds. The default is na.rm = FALSE.

scale.test

a logical value to specify if the samples should be compared for a difference in scale. The default is scale.test = FALSE.

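wobble

a logical value indicating whether the sample should be checked for duplicated values that can cause the scale estimate to be zero. If such values are present, uniform noise is added to the sample, see wobble. Only necessary for the permutation and randomization version of the test. The default is wobble = FALSE.

wobble.seed

an integer value used as a seed for the random number generation in case of wobble = TRUE or when scale.test = TRUE with one of the vectors x and y containing zeros. When no seed is specified, it is chosen randomly and printed in a message. The argument is ignored if scale.test = FALSE and/or wobble = FALSE.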

Details

The test statistic for this test is based on the difference of the sample medians of x and y. Three versions of the test are implemented: randomization, permutation, and asymptotic.

The test statistic for the permutation and randomization versions of the test is standardized using a robust scale estimator, see Fried and Dehling (2011).

With scale = "S3", the scale is estimated by

S = 2 * med(|x_1 - med(x)|, ..., |x_m - med(x)|, |y_1 - med(y)|, ..., |y_n - med(y)|),

whereas scale = "S4" uses

S = med(|x_1 - med(x)|, ..., |x_m - med(x)|) + med(|y_1 - med(y)|, ..., |y_n - med(y)|).
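
Both estimators can be sketched in base R as follows, reading "S3" as twice the median of the pooled absolute deviations from the respective sample medians and "S4" as the sum of the two within-sample median absolute deviations. The helper names scale_s3 and scale_s4 are hypothetical and the code is an illustration only, not the package's implementation.

```r
# Hypothetical helpers illustrating "S3" and "S4".

# "S3": twice the median of the pooled absolute deviations from the medians
scale_s3 <- function(x, y) {
  dev_x <- abs(x - stats::median(x))
  dev_y <- abs(y - stats::median(y))
  2 * stats::median(c(dev_x, dev_y))
}

# "S4": sum of the (unscaled) median absolute deviations of both samples
scale_s4 <- function(x, y) {
  stats::median(abs(x - stats::median(x))) +
    stats::median(abs(y - stats::median(y)))
}

x <- c(1, 2, 4, 7, 11)
y <- c(3, 5, 6, 10, 15)
scale_s3(x, y)
scale_s4(x, y)
```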

When computing the randomization distribution based on randomly drawn splits with replacement, the function permp (Phipson and Smyth 2010) is used to calculate the p-value. For the asymptotic test, a transformed version of the difference of the sample medians, which asymptotically follows a normal distribution, is used. For more details on the asymptotic test, see Fried and Dehling (2011).

If the sample has been modified (either because of zeros for scale.test = TRUE, or wobble = TRUE), the modified samples can be retrieved using

set.seed(wobble.seed); wobble(x, y)

Both samples need to contain at least 5 non-missing values.

Value

A named list with class "htest" containing the following components:

statistic

the value of the test statistic.

p.value

the p-value for the test.

estimate

the sample medians of x and y (if scale.test = FALSE) or of log(x^2) and log(y^2) (if scale.test = TRUE).

null.value

the specified hypothesized value of the mean difference/squared scale ratio.

alternative

a character string describing the alternative hypothesis.

method

a character string indicating how the p-value was computed.

data.name

a character string giving the names of the data.

References

Fried R, Dehling H (2011). “Robust nonparametric tests for the two-sample location problem.” Statistical Methods & Applications, 20(4), 409–422. doi:10.1007/s10260-011-0164-1.

Fried R (2012). “On the online estimation of piecewise constant volatilities.” Computational Statistics & Data Analysis, 56(11), 3080–3090. doi:10.1016/j.csda.2011.02.012.

Examples

# Generate random samples
set.seed(108)
x <- rnorm(20); y <- rnorm(20)

# Asymptotic MED test
med_test(x, y, method = "asymptotic", scale = "S3")

## Not run: 
# MED2 test using randomization principle by drawing 1000 random permutations
# with replacement

med_test(x, y, method = "randomization", n.rep = 1000, scale = "S4")

## End(Not run)

Permutation distribution for robust test statistics

Description

perm_distribution() calculates the permutation distribution for several test statistics.

Usage

perm_distribution(x, y, type, randomization = FALSE, n.rep = 10000)

Arguments

x

a (non-empty) numeric vector of data values.

y

a (non-empty) numeric vector of data values.

type

a character string specifying the desired test statistic. It must be one of "HL11" (default), "HL12", "HL21", "HL22", "MED1", and "MED2", where "HL1", "HL2" and "MED" specify the location estimator and the numbers 1 and 2 the scale estimator, see the vignette vignette("robnptests") for more information.

randomization

a logical value indicating whether the full permutation distribution (FALSE, default) or a randomization distribution (TRUE) should be computed.

n.rep

an integer value specifying the number of random splits used to calculate the randomization distribution if randomization = TRUE. The default is n.rep = 10000.

Details

Missing values in either x or y are not allowed.

Value

A numeric vector containing the permutation (or randomization) distribution of the test statistic specified by type.
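
For small samples, an exact permutation distribution can be enumerated in base R. The following sketch uses a hypothetical function perm_dist_sketch and an unstandardized difference of medians as the statistic. It is meant to illustrate the idea, not to reproduce the package's internal code.

```r
# Hypothetical sketch: exact permutation distribution of a two-sample statistic.
perm_dist_sketch <- function(x, y, stat = function(a, b) median(a) - median(b)) {
  pooled <- c(x, y)
  # Each column of 'splits' holds the indices assigned to the first sample
  splits <- combn(length(pooled), length(x))
  apply(splits, 2, function(idx) stat(pooled[idx], pooled[-idx]))
}

x <- c(1.2, 3.4, 2.2, 5.1)
y <- c(0.3, 4.8, 2.9, 1.7)
perm.dist <- perm_dist_sketch(x, y)
length(perm.dist)  # choose(8, 4) = 70 splits
```

The number of splits grows quickly with the sample sizes, which is why a randomization distribution based on n.rep random splits is the practical choice for larger samples.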

Preprocess data for the robust two-sample tests

Description

preprocess_data is a helper function that performs several preprocessing steps on the data before performing the two-sample tests.

Usage

preprocess_data(x, y, delta, na.rm, wobble, wobble.seed, scale.test)

Arguments

x

a (non-empty) numeric vector of data values.

y

a (non-empty) numeric vector of data values.

delta

a numeric value indicating the true difference in the location or scale parameter, depending on whether the test should be performed for a difference in location or in scale.

na.rm

a logical value indicating whether NA values in x and y should be stripped before the computation proceeds.

wobble

wobble.seed

scale.test

a logical value to specify if the samples should be compared for a difference in scale.

Details

The preprocessing steps include the removal of missing values and, if specified, wobbling and a transformation of the observations to test for differences in scale.
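
A simplified sketch of these steps in base R is given below. The function preprocess_sketch is hypothetical, wobbling and the handling of delta are omitted, and the transformation log(x^2) is assumed from the description of the estimates returned by the scale tests.

```r
# Hypothetical sketch of the preprocessing steps (wobbling omitted).
preprocess_sketch <- function(x, y, na.rm = TRUE, scale.test = FALSE) {
  if (na.rm) {
    x <- x[!is.na(x)]
    y <- y[!is.na(y)]
  } else if (anyNA(x) || anyNA(y)) {
    stop("Missing values found; set na.rm = TRUE to remove them.")
  }
  if (scale.test) {
    # Log-squared transformation; zeros would map to -Inf, so samples
    # containing zeros need to be modified before this step
    x <- log(x^2)
    y <- log(y^2)
  }
  list(x = x, y = y)
}

res <- preprocess_sketch(c(1, NA, -2, 4), c(0.5, 3, NA), scale.test = TRUE)
res
```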

Value

A named list containing the following components:

x

the (possibly transformed) input vector x.

y

the (possibly transformed) input vector y.

delta

the (possibly transformed) input value delta.

Robust test statistics based on robust location estimators

Description

rob_perm_statistic calculates test statistics for robust permutation/randomization tests based on the sample median, the one-sample Hodges-Lehmann estimator, or the two-sample Hodges-Lehmann estimator.

Usage

rob_perm_statistic(
  x,
  y,
  type = c("HL11", "HL12", "HL21", "HL22", "MED1", "MED2"),
  na.rm = FALSE
)

Arguments

x

a (non-empty) numeric vector of data values.

y

a (non-empty) numeric vector of data values.

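type

a character string specifying the desired test statistic. It must be one of "HL11" (default), "HL12", "HL21", "HL22", "MED1", and "MED2", where "HL1", "HL2" and "MED" specify the location estimator and the numbers 1 and 2 the scale estimator, see the vignette vignette("robnptests") for more information.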

na.rm

a logical value indicating whether NA values in x and y should be stripped before the computation proceeds. The default is na.rm = FALSE.

Details

The test statistics returned by rob_perm_statistic are of the form

D_i/S_j

where the D_i, i = 1,...,3, are different estimators of location and the S_j, j = 1,...,4, are estimates for the mutual sample scale. See Fried and Dehling (2011) or the vignette vignette("robnptests") for details.
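
The ratio form can be illustrated with a base R sketch. Here D is the two-sample Hodges-Lehmann estimator (the median of all pairwise differences) and S is supplied by the caller. The function names are hypothetical, and no attempt is made to map this sketch to the package's type codes.

```r
# Hypothetical sketch: two-sample Hodges-Lehmann shift estimate divided by a
# robust scale estimate. Not the package's internal code.
hl2_shift <- function(x, y) {
  # Median of all pairwise differences x_i - y_j
  median(outer(x, y, "-"))
}

ratio_statistic <- function(x, y, scale_fun) {
  s <- scale_fun(x, y)
  if (s == 0) stop("Scale estimate is zero; the statistic is undefined.")
  hl2_shift(x, y) / s
}

# Example scale: sum of the per-sample medians of absolute deviations
mad_sum <- function(x, y) mad(x, constant = 1) + mad(y, constant = 1)

set.seed(108)
x <- rnorm(20); y <- rnorm(20)
ratio_statistic(x, y, mad_sum)
```

The explicit check for a zero scale estimate mirrors the reason a zero estimate is problematic: the ratio would not exist.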

Value

A named list containing the following components:

statistic

the selected test statistic.

estimates

estimate of location for each sample if available.

References

Fried R, Dehling H (2011). “Robust nonparametric tests for the two-sample location problem.” Statistical Methods & Applications, 20(4), 409–422. doi:10.1007/s10260-011-0164-1.

Examples

# Generate random samples
set.seed(108)
x <- rnorm(20); y <- rnorm(20)

# Compute HL21-statistic
rob_perm_statistic(x, y, type = "HL21")

Robust scale estimators based on median absolute deviation

Description

rob_scale calculates an estimator for the within-sample dispersion based on two samples.

Usage

rob_scale(
  x,
  y,
  type = c("S1", "S2", "S3", "S4"),
  na.rm = FALSE,
  check.for.zero = FALSE
)

Arguments

x

a (non-empty) numeric vector of data values.

y

a (non-empty) numeric vector of data values.

type

a character string specifying the scale estimator; must be one of "S1", "S2", "S3", and "S4". See Details for a description of the scale estimators.

na.rm

a logical value indicating whether NA values in x and y should be stripped before the computation proceeds. The default is na.rm = FALSE.

check.for.zero

a logical value indicating whether an error should be raised if the scale estimate is zero. The default is FALSE.

Details

For definitions of the scale estimators, see Fried and Dehling (2011).

If check.for.zero = TRUE, an error is thrown when the scale estimate is zero. This argument is only included because the function is used in rob_perm_statistic to compute values of robust test statistics where the scale estimate is used for standardization. A scale estimate of zero leads to a non-existing test statistic, so that the corresponding test cannot be performed.

Value

An estimate of the pooled scale of the two samples.

References

Fried R, Dehling H (2011). “Robust nonparametric tests for the two-sample location problem.” Statistical Methods & Applications, 20(4), 409–422. doi:10.1007/s10260-011-0164-1.

Select principle for computing null distribution

Description

select_method is a helper function that chooses the principle for computing the null distribution of a two-sample test.

Usage

select_method(x, y, method, test.name, n.rep)

Arguments

x

a (non-empty) numeric vector of data values.

y

a (non-empty) numeric vector of data values.

method

test.name

character string specifying the two-sample test for which the helper function is used.

n.rep

an integer value specifying the number of random splits used to calculate the randomization distribution if method = "randomization".

Details

When the principle is specified by the user, i.e. method contains only one element, the selected method is returned. Otherwise, the choice depends on the sample sizes: when both samples contain more than 30 observations, an asymptotic test is performed; if one of the samples contains fewer than 30 observations, the null distribution is computed via the randomization principle. The number of replications n.rep for the randomization test needs to be specified outside of this function; each test function has an argument n.rep for this purpose.

If n.rep is larger than the maximum number of splits and method = "randomization", a permutation test is performed.
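
The selection rule can be sketched as follows. The function select_method_sketch is hypothetical. Because the text does not state what happens at exactly 30 observations, the sketch assumes that this case is treated as a large sample, and it applies the fallback to a permutation test after automatic selection as well.

```r
# Hypothetical sketch of the selection rule described above.
select_method_sketch <- function(x, y,
                                 method = c("asymptotic", "permutation", "randomization"),
                                 n.rep = 10000) {
  if (length(method) > 1) {
    # No explicit choice by the user: decide by sample size.
    # Assumption: exactly 30 observations counts as "large".
    method <- if (length(x) >= 30 && length(y) >= 30) "asymptotic" else "randomization"
  }
  # Number of distinct splits of the pooled sample into groups of size m and n
  max.splits <- choose(length(x) + length(y), length(x))
  if (method == "randomization" && n.rep > max.splits) {
    method <- "permutation"
  }
  method
}

select_method_sketch(rnorm(40), rnorm(35))              # "asymptotic"
select_method_sketch(rnorm(5), rnorm(5), n.rep = 1000)  # "permutation", since choose(10, 5) = 252 < 1000
select_method_sketch(rnorm(10), rnorm(40), "randomization", n.rep = 1000)  # "randomization"
```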

Value

A character string, which specifies the principle for computing the null distribution.

Trimmed mean

Description

trim_mean calculates a trimmed mean of a sample.

Usage

trim_mean(x, gamma = 0.2, na.rm = FALSE)

Arguments

x

a (non-empty) numeric vector of data values.

gamma

a numeric value in [0, 0.5] specifying the fraction of observations to be trimmed from each end of the sample before calculating the mean. The default value is 0.2.

na.rm

a logical value indicating whether NA values in x should be stripped before the computation proceeds. The default is na.rm = FALSE.

Details

This is a wrapper function for the function mean.
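
For orientation, base R's mean already supports trimming through its trim argument. The snippet below shows that behavior directly. It assumes that the wrapper forwards gamma to trim, which is not stated explicitly here.

```r
# Base R: trimmed mean via the 'trim' argument of mean()
x <- c(2, 4, 6, 8, 1000)

untrimmed <- mean(x)            # ordinary mean, pulled up by the outlier
trimmed <- mean(x, trim = 0.2)  # drops one value at each end, then averages 4, 6, 8

untrimmed
trimmed
```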

Value

The trimmed mean.

Examples

# Generate random sample
set.seed(108)
x <- rnorm(10)

# Compute 20% trimmed mean
trim_mean(x, gamma = 0.2)

Test statistic for the two-sample trimmed t-test (Yuen's t-test)

Description

trimmed_t calculates the test statistic for the two-sample trimmed t-test.

Usage

trimmed_t(x, y, gamma = 0.2, na.rm = FALSE)

Arguments

x

a (non-empty) numeric vector of data values.

y

a (non-empty) numeric vector of data values.

gamma

a numeric value in [0, 0.5] specifying the fraction of observations to be trimmed from each end of the sample before calculating the mean. The default value is 0.2.

na.rm

a logical value indicating whether NA values in x and y should be stripped before the computation proceeds. The default is na.rm = FALSE.

Value

A named list containing the following components:

statistic

the value of the test statistic.

estimates

the trimmed means for both samples.

df

the degrees of freedom for the test statistic.

References

Yuen KK, Dixon WT (1973). “The approximate behaviour and performance of the two-sample trimmed t.” Biometrika, 60(2), 369–374. doi:10.2307/2334550.

Yuen KK (1974). “The two-sample trimmed t for unequal population variances.” Biometrika, 61(1), 165–170. doi:10.2307/2334299.

Examples

# Generate random samples
set.seed(108)
x <- rnorm(20); y <- rnorm(20)

# Compute trimmed t-statistic
trimmed_t(x, y, gamma = 0.2)

Two-sample trimmed t-test (Yuen's t-test)

Description

trimmed_test performs the two-sample trimmed t-test.

Usage

trimmed_test(
  x,
  y,
  gamma = 0.2,
  alternative = c("two.sided", "less", "greater"),
  method = c("asymptotic", "permutation", "randomization"),
  delta = ifelse(scale.test, 1, 0),
  n.rep = 1000,
  na.rm = FALSE,
  scale.test = FALSE,
  wobble.seed = NULL
)

Arguments

x

a (non-empty) numeric vector of data values.

y

a (non-empty) numeric vector of data values.

gamma

a numeric value in [0, 0.5] specifying the fraction of observations to be trimmed from each end of the sample before calculating the mean. The default value is 0.2.

alternative

a character string specifying the alternative hypothesis, must be one of "two.sided" (default), "greater", or "less".

method

delta

n.rep

na.rm

a logical value indicating whether NA values in x and y should be stripped before the computation proceeds. The default is na.rm = FALSE.

scale.test

a logical value to specify if the samples should be compared for a difference in scale. The default is scale.test = FALSE.

wobble.seed

Details

The function performs Yuen's t-test based on the trimmed mean and winsorized variance (Yuen and Dixon 1973). The amount of trimming/winsorization is set by gamma and defaults to 0.2, i.e. 20% of the values at each end of the sample are removed/replaced. In addition to the asymptotic version, a permutation and a randomization version of the test are implemented.
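
A compact base R sketch of Yuen's statistic and its approximate degrees of freedom, following the usual textbook formulation, is given below. The function yuen_sketch is hypothetical, it assumes gamma < 0.5 and that floor(gamma * n) observations are trimmed at each end, and it may differ in details from the package's implementation.

```r
# Hypothetical sketch of Yuen's statistic; not the package's implementation.
yuen_sketch <- function(x, y, gamma = 0.2) {
  pieces <- lapply(list(x, y), function(z) {
    n <- length(z)
    k <- floor(gamma * n)          # observations trimmed/replaced at each end
    zs <- sort(z)
    # Winsorize: replace the k smallest/largest values by their nearest kept neighbors
    zw <- pmin(pmax(zs, zs[k + 1]), zs[n - k])
    h <- n - 2 * k                 # number of observations left after trimming
    list(tmean = mean(zs[(k + 1):(n - k)]),
         d = (n - 1) * var(zw) / (h * (h - 1)),
         h = h)
  })
  d1 <- pieces[[1]]$d; d2 <- pieces[[2]]$d
  list(
    statistic = (pieces[[1]]$tmean - pieces[[2]]$tmean) / sqrt(d1 + d2),
    df = (d1 + d2)^2 / (d1^2 / (pieces[[1]]$h - 1) + d2^2 / (pieces[[2]]$h - 1))
  )
}

set.seed(108)
x <- rnorm(20); y <- rnorm(20)
yuen_sketch(x, y, gamma = 0.2)
```

With gamma = 0 nothing is trimmed, and the sketch reduces to Welch's two-sample t-test, which gives a convenient sanity check against t.test(x, y).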

When computing a randomization distribution based on randomly drawn splits with replacement, the function permp (Phipson and Smyth 2010) is used to calculate the p-value.

If the sample has been modified because of zeros when scale.test = TRUE, the modified samples can be retrieved using

set.seed(wobble.seed); wobble(x, y)

Both samples need to contain at least 5 non-missing values.

Value

A named list with class "htest" containing the following components:

statistic

the value of the test statistic.

parameter

the degrees of freedom for the test statistic.

p.value

the p-value for the test.

estimate

the trimmed means of x and y (if scale.test = FALSE) or of log(x^2) and log(y^2) (if scale.test = TRUE).

null.value

the specified hypothesized value of the location difference/squared scale ratio.

alternative

a character string describing the alternative hypothesis.

method

a character string indicating how the p-value was computed.

data.name

a character string giving the names of the data.

References

Yuen KK, Dixon WT (1973). “The approximate behaviour and performance of the two-sample trimmed t.” Biometrika, 60(2), 369–374. doi:10.2307/2334550.

Yuen KK (1974). “The two-sample trimmed t for unequal population variances.” Biometrika, 61(1), 165–170. doi:10.2307/2334299.

Fried R (2012). “On the online estimation of piecewise constant volatilities.” Computational Statistics & Data Analysis, 56(11), 3080–3090. doi:10.1016/j.csda.2011.02.012.

Examples

# Generate random samples
set.seed(108)
x <- rnorm(20); y <- rnorm(20)

# Trimmed t-test
trimmed_test(x, y, gamma = 0.1)

Winsorized mean

Description

win_mean calculates the winsorized mean of a sample.

Usage

win_mean(x, gamma = 0.2, na.rm = FALSE)

Arguments

x

a (non-empty) numeric vector of data values.

gamma

a numeric value in [0, 0.5] specifying the fraction of observations to be replaced at each end of the sample before calculating the mean. The default value is 0.2.

na.rm

a logical value indicating whether NA values in x should be stripped before the computation proceeds. The default is na.rm = FALSE.

Value

The winsorized mean.
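
Winsorizing replaces the most extreme observations at each end by the nearest remaining value instead of dropping them. A minimal base R sketch follows. The function win_mean_sketch is hypothetical and assumes that floor(gamma * n) values are replaced at each end.

```r
# Hypothetical sketch of a winsorized mean; not the package's implementation.
win_mean_sketch <- function(x, gamma = 0.2) {
  n <- length(x)
  k <- floor(gamma * n)  # number of values replaced at each end
  xs <- sort(x)
  # Clamp the sorted values to the (k + 1)-th smallest and (k + 1)-th largest
  mean(pmin(pmax(xs, xs[k + 1]), xs[n - k]))
}

x <- c(2, 4, 6, 8, 1000)
win_mean_sketch(x, gamma = 0.2)  # averages 4, 4, 6, 8, 8
```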

Examples

# Generate random sample
set.seed(108)
x <- rnorm(10)

# Compute 20% winsorized mean
win_mean(x, gamma = 0.2)

Winsorized variance

Description

win_var calculates the winsorized variance of a sample.

Usage

win_var(x, gamma = 0, na.rm = FALSE)

Arguments

x

a (non-empty) numeric vector of data values.

gamma

a numeric value in [0, 0.5] specifying the fraction of observations to be replaced at each end of the sample before calculating the variance. The default value is 0.

na.rm

a logical value indicating whether NA values in x should be stripped before the computation proceeds. The default is na.rm = FALSE.

Value

A named list containing the following items:

var

winsorized variance.

h

degrees of freedom used for tests based on trimmed means and the winsorized variance.

Examples

# Generate random sample
set.seed(108)
x <- rnorm(10)

# Compute 20% winsorized variance
win_var(x, gamma = 0.2)

Add random noise to remove ties

Description

wobble adds noise from a continuous uniform distribution to the observations to remove ties.

Usage

wobble(x, y, check = TRUE)

Arguments

x

a (non-empty) numeric vector of data values.

y

a (non-empty) numeric vector of data values.

check

a logical value indicating whether the samples should be checked for ties prior to adding uniform noise or not, defaults to TRUE.

Details

If check = TRUE, the function checks whether all values in the two numeric input vectors are distinct. If so, it returns the original values; otherwise, the ties are removed by adding noise from a continuous uniform distribution to all observations. If check = FALSE, it simply determines the number of digits and adds uniform noise.

More precisely, we determine the minimum number of digits d_min in the sample and then add random numbers from the U[-0.5 * 10^(-d_min), 0.5 * 10^(-d_min)] distribution to each of the observations.
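
The noise step can be sketched in base R as follows. The function wobble_sketch is hypothetical. To sidestep the question of how the number of digits is derived from the data, the sketch takes it as an argument digits instead.

```r
# Hypothetical sketch; 'digits' is supplied by the caller instead of being
# derived from the data as the package does.
wobble_sketch <- function(x, y, digits = 0) {
  half.width <- 0.5 * 10^(-digits)
  list(
    x = x + runif(length(x), -half.width, half.width),
    y = y + runif(length(y), -half.width, half.width)
  )
}

set.seed(108)
x <- c(1, 1, 2, 3, 3)
y <- c(2, 2, 4, 5, 5)
w <- wobble_sketch(x, y, digits = 0)
anyDuplicated(c(w$x, w$y))  # 0 when all ties have been broken
```

Because the noise is bounded by half a unit of the last digit, each wobbled value stays within that distance of the original observation.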

Value

A named list of length two containing the modified input samples x and y.

References

Fried R, Gather U (2007). “On rank tests for shift detection in time series.” Computational Statistics & Data Analysis, 52(1), 221–233. doi:10.1016/j.csda.2006.12.017.

Examples

x <- rnorm(20); y <- rnorm(20); x <- round(x)
wobble(x, y)