Version: | 0.8.10 |
Date: | 2018-04-15 |
Title: | Analysis of Scientific Evidence Using Bayesian and Likelihood Methods |
Author: | Robert van Hulst |
Maintainer: | Robert van Hulst <rvhulst@ubishops.ca> |
BugReports: | https://github.com/rvhulst/evidence/ |
Depends: | rstan, rstanarm, loo, lattice, stats, utils, graphics, grDevices |
Imports: | LearnBayes, LaplacesDemon, |
ByteCompile: | TRUE |
Description: | Bayesian (and some likelihoodist) functions as alternatives to hypothesis-testing functions in R base using a user interface patterned after those of R's hypothesis testing functions. See McElreath (2016, ISBN: 978-1-4822-5344-3), Gelman and Hill (2007, ISBN: 0-521-68689-X) (new edition in preparation) and Albert (2009, ISBN: 978-0-387-71384-7) for good introductions to Bayesian analysis and Pawitan (2002, ISBN: 0-19-850765-8) for the Likelihood approach. The functions in the package also make extensive use of graphical displays for data exploration and model comparison. |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
NeedsCompilation: | no |
Packaged: | 2018-05-15 14:42:44 UTC; rvhulst |
Repository: | CRAN |
Date/Publication: | 2018-05-15 15:19:39 UTC |
evidence: Functions and Data for Bayesian and Likelihood Analysis
Description
The functions in this package include Bayesian and likelihood alternatives to the standard statistical hypothesis tests that form part of base R. Their aim is to provide a wider perspective on how statistical evidence can be analyzed than the usual hypothesis-testing one. In view of the increasing importance in science of Bayesian and likelihood inference a wider exposure to these alternatives has become overdue.
This package makes Bayesian and likelihood analyses of simple statistical problems as convenient as traditional frequentist ones are in R. In addition, it makes effective use of R's excellent plotting capabilities, and facilitates exploratory data analysis and an interactive approach to modeling. Both data exploration and model exploration are crucial in data analysis, and these are facilitated by an interactive and graphics-centered approach.
Details
Package: | evidence |
Type: | Package |
Version: | 0.8.10 |
Date: | 2018-04-15 |
License: | GPL |
Author(s)
Robert van Hulst
Maintainer: <rvhulst@ubishops.ca>
References
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Made-up data for a balanced one-way anova.
Description
Made-up data with easy numbers for practicing one-way anova by hand to understand how an anova works.
Usage
data(AOV1)
Format
A data frame with 15 observations on the following 2 variables.
y
response
i
predictor, a factor with 3 levels
Details
Note that the design is balanced.
Source
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
data(AOV1)
summary(aov(y ~ i, data=AOV1))
Made-up data for an unbalanced one-way anova.
Description
Made-up data with easy numbers for practicing one-way anova by hand to understand how an anova works.
Usage
data(AOV2)
Format
A data frame with 22 observations on the following 2 variables:
y
response
i
predictor: a factor with 4 levels
Details
Note that the design is unbalanced.
Source
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
data(AOV2)
summary(aov(y ~ i, data=AOV2))
A contingency table for heart attacks and aspirin use.
Description
The Physicians' Health Study data, cross-classified according to Infarct (heart attack or not) and Group (Placebo or Aspirin).
Usage
data(Aspirin)
Format
A 2 by 2 matrix of counts with row names:
Infarct:Yes and Infarct:No,
and column names:
Group:placebo and Group:aspirin.
Source
Steering Committee of the Physicians' Health Study Research Group. 1989. Final report of the aspirin component of the ongoing Physicians' Health Study. N Engl J Med, 321:129–135.
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Bayesian analysis of one sample from a Normal distribution with imprecise priors.
Description
This function performs a standard Bayesian analysis of a single sample of a population presumably following a Normal distribution. Imprecise priors for the mean and the standard deviation are used.
Usage
B1Nmean(x, plotit = TRUE, hists = FALSE, pdf = FALSE)
Arguments
x |
a vector of sample values |
plotit |
should the function produce plots? Defaults to TRUE. |
hists |
should histograms of the posterior distribution for the data with twenty posterior predictive histograms also be plotted? Defaults to FALSE. |
pdf |
should the histograms be saved as a pdf-file? Defaults to FALSE. |
Value
none produced: text and graphical output are produced
Author(s)
Robert van Hulst
References
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
## Not run:
data(Fat)
B1Nmean(Fat$Height)
## End(Not run)
Bayesian analysis of a Normal sample using a SIR prior.
Description
This function performs a standard Bayesian analysis of a single sample of a population assumed to follow a Normal distribution. A Standard Improper Reference prior is assumed.
Usage
B1Nsir(x, r = 10000, alpha = 0.05)
Arguments
x |
a vector of sample values |
r |
the number of samples to be taken from the posterior distribution (defaults to 10000) |
alpha |
1 - level of credibility, so that for alpha = 0.05 (the default) credible intervals will have 95% credibility |
Value
none returned; the function produces a plot of the posterior distribution and prints some statistics.
Author(s)
Robert van Hulst
References
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
data(darwin)
B1Nsir(darwin$difference)
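As an illustration of what sampling from the posterior under a standard improper reference prior can look like, here is a minimal base-R sketch. It is not the package's implementation: it assumes the usual prior p(mu, sigma^2) proportional to 1/sigma^2, and the data vector `x` and the object names `sigma2`, `mu`, and `ci` are made up for this example.

```r
set.seed(1)
x <- rnorm(15, mean = 20, sd = 5)   # made-up sample, not the darwin data
n <- length(x)
r <- 10000                          # number of posterior draws
alpha <- 0.05

# Under p(mu, sigma^2) proportional to 1/sigma^2:
#   sigma^2 | x      ~ (n - 1) * s^2 / chi-square(n - 1)
#   mu | sigma^2, x  ~ Normal(mean(x), sigma^2 / n)
sigma2 <- (n - 1) * var(x) / rchisq(r, df = n - 1)
mu     <- rnorm(r, mean = mean(x), sd = sqrt(sigma2 / n))

# Equal-tailed 100 * (1 - alpha)% credible interval for the mean
ci <- quantile(mu, c(alpha / 2, 1 - alpha / 2))
print(ci)
```

The interval here is equal-tailed for simplicity; a highest-density interval could be used instead.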
Bayesian analysis of the binomial parameter for one sample.
Description
This function computes the posterior distribution of the binomial probability π when given the number of “successes” and the sample size, as well as one of a choice of priors. A plot of the posterior distribution is produced with the 95% credible interval of π.
Usage
B1prop(s, n, p = 0.5, alpha = 0.05, prior = c("uniform", "near_0.5",
"not_near_0.5", "near_0", "near_1", "custom"), params = NULL)
Arguments
s |
the number of sampling units with the feature |
n |
the number of sampling units examined |
p |
an optional hypothesized probability |
alpha |
1 - alpha is the desired level of credibility of a credible interval |
prior |
one of: "uniform", "near_0.5", "not_near_0.5", "near_0", "near_1", "custom", which are all beta distributions with appropriate parameter values. Note that if prior="custom" the following argument has to be supplied: |
params |
a vector with the a and b parameters of the custom beta prior |
Value
the posterior probability
Author(s)
Robert van Hulst
References
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
B1prop(13, 100, .1, prior="near_0")
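The conjugate calculation behind a Beta prior for a binomial probability can be sketched in a few lines of base R. This is only an illustration, not the function's own code: it assumes a uniform Beta(1, 1) prior (the parameters of the package's named priors are not documented here), and the names `a0`, `b0`, `a1`, `b1` are hypothetical.

```r
s <- 13; n <- 100; alpha <- 0.05
a0 <- 1; b0 <- 1                    # Beta(1, 1), i.e. a uniform prior (assumed)

# Conjugate update: Beta(a0, b0) prior plus s successes in n trials
a1 <- a0 + s
b1 <- b0 + n - s

post_mean <- a1 / (a1 + b1)

# Equal-tailed 100 * (1 - alpha)% credible interval
ci <- qbeta(c(alpha / 2, 1 - alpha / 2), a1, b1)

# Posterior probability that the parameter exceeds a hypothesized value p
p <- 0.1
prob_gt_p <- 1 - pbeta(p, a1, b1)

print(c(mean = post_mean, lower = ci[1], upper = ci[2], prob_gt_p = prob_gt_p))
```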
simulates Bayesian updating of the binomial parameter π.
Description
Provides a simple demonstration of how the posterior distribution improves as increasing amounts of data become available. A Binomial variable with a known parametric probability is sampled, and as increasing numbers of samples become available the posterior distribution is re-evaluated and plotted.
Usage
B1propSim(p, N = 100, prior = c("uniform", "near_0.5",
"not_near_0.5", "near_0", "near_1"))
Arguments
p |
the “real” binomial probability; if a number smaller than 0 or larger than 1 is entered, the function will choose an arbitrary probability |
N |
the number of observations to accumulate |
prior |
one of: "uniform", "near_0.5", "not_near_0.5", "near_0", or "near_1". |
Value
none returned; the function is run for the plot it produces.
Author(s)
Robert van Hulst
References
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
B1propSim(p = 0.44, prior = "near_0.5")
Bayesian analysis of the means of two Normal samples using SIR priors.
Description
Produces exploratory plots (boxplots and, if the sample sizes are equal, a quantile-quantile plot of the two samples). Also produces Bayesian posterior densities of the two sample means and of the difference between the means. The priors used are standard improper reference priors.
Usage
B2Nsir(formula, data, var.equal = TRUE, alpha = 0.05, plotit = TRUE, r = 10000)
Arguments
formula |
the standard formula interface: response ~ factor |
data |
a data.frame containing the response and the two-level factor |
var.equal |
if TRUE the group variances are assumed to be equal, if FALSE two separate group variances are estimated |
alpha |
1 - level of credibility, so that for alpha = 0.05 (the default) credible intervals will have 95% credibility |
plotit |
should plots be produced? |
r |
the number of samples from the posterior distribution; can usually be left at its default value of 10000 |
Details
Note that in the first plot the second sub-plot is NOT a normality plot but a quantile-quantile plot that compares the observations in the two groups.
Value
none returned; the function produces several plots and prints some statistics.
References
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
data(bodytemp)
B2Nsir(temperature ~ gender, bodytemp)
Bayesian analysis of the binomial parameters for two samples.
Description
This function computes the posterior distributions of the binomial parameters π[1] and π[2] when given the numbers of “successes” and the sample sizes for the two samples. It uses uniform priors. A plot of the posterior distributions of the two π's is produced, and a plot of the posterior distribution of π[1] - π[2] with its 95% credible interval.
Usage
B2props(s, n, alpha = 0.05)
Arguments
s |
a vector containing the 2 numbers of sampling units with the feature ("success") |
n |
a vector containing the 2 numbers of sampling units examined |
alpha |
1 - level of credibility, so that for alpha = 0.05 (the default) credible intervals will have 95% credibility |
Value
None, the inferred difference between the probabilities and its 95% credible interval is calculated and several plots are produced
Author(s)
Robert van Hulst
References
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
B2props(c(13, 22), c(78, 92))
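With uniform priors, the posterior of each binomial parameter is a Beta distribution, and the posterior of the difference can be approximated by simulation. The following base-R sketch shows the idea under that assumption; it is not the package's code, and the number of draws `r` and the object names are chosen for illustration only.

```r
set.seed(1)
s <- c(13, 22); n <- c(78, 92)
alpha <- 0.05
r <- 10000                          # number of posterior draws (assumed)

# Uniform Beta(1, 1) priors give Beta(1 + s, 1 + n - s) posteriors
pi1 <- rbeta(r, 1 + s[1], 1 + n[1] - s[1])
pi2 <- rbeta(r, 1 + s[2], 1 + n[2] - s[2])

# Posterior draws of the difference and its equal-tailed credible interval
d  <- pi1 - pi2
ci <- quantile(d, c(alpha / 2, 1 - alpha / 2))

print(mean(d))
print(ci)
```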
Bayesian analysis of a 2 x 2 contingency table.
Description
A 2 x 2 contingency table (in matrix form) is analyzed in a Bayesian way using uniform priors. The posterior probabilities of each of the two outcomes given the other factor levels are calculated. See MacKay (2003, p. 460).
Usage
Bft2x2(X, div = 100, plotit = TRUE)
Arguments
X |
a contingency table in the form of a 2 x 2 matrix with row and column names |
div |
optional: the number of divisions for the row and column variables for use in calculations (can be left at 100) |
plotit |
should plots be produced? (defaults to TRUE) |
Details
Note that the rows of the 2 x 2 matrix are assumed to represent the "outcomes" and the columns the "treatments"—where these expressions are applicable. Note also that to obtain properly labeled plots the matrix has to be supplied with dimnames.
Value
the matrix of div x div posterior probabilities that was plotted
Author(s)
Robert van Hulst
References
MacKay, D.J.C. 2003. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, Cambridge.
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
data(Glasses)
Bft2x2(Glasses)
a simple example of the bias–variance trade-off.
Description
A total of eight models are fitted to a data set consisting of seven predictors. The response is the exact fit with a variable amount of zero-mean noise added. This is repeated a certain number of times (by default, 100 times). Plots of Bias^2 and variance vs. the number of parameters are produced.
Usage
BiasVarTO(times = 100)
Arguments
times |
the number of repeats to average bias and variance over (default 100) |
Value
none produced, the function produces two plots
Author(s)
Robert van Hulst
References
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Simulated clutch size data for birds with different nesting locations.
Description
These made-up data respect the average clutch sizes (number of eggs laid in a single brood) and incubation periods that were observed in different European bird species with four different types of nests, as reported in Case (2000).
Usage
data(BirdsCS)
Format
A data frame with 40 observations on the following 3 variables:
Nest
kind of nest, a factor with levels hole, roofed, niche, and open
Inc.Per
average duration of the incubation period (days)
ClutchSize
the typical number of eggs in a nest
Source
Case, T.J. 2000. An Illustrated Guide to Theoretical Ecology. Oxford University Press, New York.
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
data(BirdsCS)
library(graphics)
coplot(ClutchSize ~ Inc.Per | Nest, BirdsCS, panel=panel.smooth)
Bayesian analysis of n >= 2 Normal means with standard improper reference priors.
Description
Several exploratory plots are produced, after which this function calculates and plots the posterior densities of the treatment means and their differences. Pooled or separate variances can be specified. Note that this function uses Standard Improper Reference (SIR) priors.
Usage
BnNsir(formula, data, var.equal = TRUE, alpha = 0.05, plotit = TRUE,
r = 10000)
Arguments
formula |
the usual formula interface: response ~ factor |
data |
a data.frame containing the response and the factor variables |
var.equal |
should a pooled variance be used? Specify var.equal = FALSE if you want separate variances to be fitted |
alpha |
1 - level of credibility, so that for alpha = 0.05 (the default) credible intervals will have 95% credibility |
plotit |
are plots desired? |
r |
the number of samples of the posterior that should be taken |
Value
none returned: the function is used for the plots and the printed information it produces
Author(s)
Robert van Hulst
References
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
data(PlantGrowth)
BnNsir(weight ~ group, PlantGrowth)
Bayesian regression model comparison with Bayes factors.
Description
This function compares different linear models on the basis of their Bayes factors and by graphically comparing posterior model probabilities.
Usage
Bregbf(form.list, data, l=length(form.list))
Arguments
form.list |
a list of linear models, each expressed by a model formula, that should be compared; the models must all be applicable to the same data frame and use the same response variable |
data |
a data frame to be analyzed |
l |
the number of models to be compared; defaults to all models in the form.list |
Details
Note that a list containing several appropriate models for the data frame should be prepared beforehand. See the example for how to do this.
Value
A list with model parameter probabilities is silently returned.
Author(s)
Robert van Hulst
References
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
## Not run:
data(PlantGrowth)
frmlst <- list(
model0 = formula(weight ~ 1),
model1 = formula(weight ~ group) )
Bregbf(form.list=frmlst, data=PlantGrowth)
data(fev)
frmlst.fev <- list(
formula(FEV ~ Age),
formula(FEV ~ Smoke),
formula(FEV ~ Age + Smoke),
formula(FEV ~ Age * Smoke)
)
Bregbf(frmlst.fev, fev)
## End(Not run)
Bayesian t-test using reference priors.
Description
This function performs the Bayesian “t-test” developed by Bernardo and Perez (2007), which calculates the Bayes factor against the null hypothesis of no difference.
Usage
Bt.test(formula, data, plotit = TRUE)
Arguments
formula |
the usual formula interface: response ~ factor |
data |
a data.frame with the response values and the factor values for all samples; the factor can only have two factor levels |
plotit |
is plotted output required? |
Value
none supplied: the function is used for the plotted and printed output it produces
Author(s)
Robert van Hulst
References
J. Bernardo and S. Perez. Comparing normal means: New methods for an old problem. Bayesian Analysis, 2:45–58, 2007.
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
data(bodytemp)
Bt.test(temperature ~ gender, bodytemp)
Bt.test(heart.rate ~ gender, bodytemp)
Contingency Table Analysis in different ways
Description
An n x n contingency table is analyzed in frequentist, information-theoretical, likelihood, and Bayesian ways. Note that for the Bayesian analysis package LearnBayes needs to be installed.
Usage
CTA(X, extBayes = FALSE)
Arguments
X |
a matrix with non-negative integers representing the counts for the row-column levels |
extBayes |
should a Bayesian analysis with a near-independence prior (instead of only an independence prior) be done as well? Defaults to FALSE. |
Value
none provided: the function is run for its graphical and numerical output
Author(s)
Robert van Hulst
References
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
data(Smoking)
CTA(Smoking)
Made-up data to illustrate Simpson's paradox.
Description
These made-up data illustrate the discrete form (contingency table form) of Simpson's paradox.
Usage
data(Clin)
Format
A three-dimensional array of frequencies with:
rows indicating "outcome" (either "death" or "cured"),
columns indicating "male" (either "Yes" or "No"), and
layers indicating "clinic" (either "A" or "B").
Source
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
data(Clin)
Clin[1,,]
prop.table(Clin[1,,], 2)
Human body fat and several covariates for calculating it.
Description
Data from Johnson (1996) on human body fat, as determined by underwater weighing, together with several covariates from which it can be estimated statistically.
Usage
data(Fat)
Format
A data frame with 252 observations on the following 19 variables:
Case
case number
PBF.B
percentage body fat estimated using Brozek's equation
PBF.S
percentage body fat estimated using Siri's equation
Dens
Density (gm/cm^3)
Age
Age (yrs)
Weight
Weight (lbs)
Height
Height (inches)
AI
Adiposity index = Weight/Height^2 (kg/m^2)
FFWt
Fat Free Weight using Brozek's formula (lbs)
Neck
Neck circumference (cm)
Chest
Chest circumference (cm)
Abd
Abdomen circumference (cm)
Hip
Hip circumference (cm)
Thigh
Thigh circumference (cm)
Knee
Knee circumference (cm)
Ankle
Ankle circumference (cm)
Biceps
Extended biceps circumference (cm)
FArm
Forearm circumference (cm)
Wrist
Wrist circumference (cm)
Source
Johnson, R. 1996. Fitting percentage of body fat to simple body measurements. Journal of Statistics Education 2(1), 1–6.
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
data(Fat)
qqnorm(Fat$Height)
qqline(Fat$Height)
A contingency table of 16 British youths categorized as juvenile delinquents or not, and as wearing glasses or not.
Description
Data from Heiberger and Holland (2004) categorizing a random sample of 16 British juveniles on the basis of whether they were juvenile delinquents or not, and whether they wore glasses or not.
Usage
data(Glasses)
Format
A matrix with 16 counts cross-classified on Juvenile delinquency (rows) and the wearing of glasses (columns).
Source
Heiberger, R.M. and Holland, B.(2004) Statistical Analysis and Data Display: An Intermediate Course with Examples in S-PLUS, R, and SAS. Springer, New York.
References
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
data(Glasses)
Bft2x2(Glasses)
generates the 100 * (1 - alpha)% most probable interval of a distribution of empirical values
Description
function used to produce a Bayesian credible interval of a unimodal distribution of empirical values using the Highest Posterior Density approach
Usage
HPDcrd(x, alpha = 0.05)
Arguments
x |
a vector of empirical values |
alpha |
1 - alpha is the desired level of credibility |
Value
a vector of the lower and upper limits of the 100 * (1 - alpha)% credible interval (95% for the default alpha = 0.05), calculated using a standard algorithm
Author(s)
Robert van Hulst
References
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
HPDcrd(rnorm(1000))
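One common way to obtain a highest-density interval from a vector of draws is to take the shortest interval that contains the required fraction of the sorted values. The sketch below illustrates that idea; whether HPDcrd uses exactly this algorithm is an assumption, and the function name `hpd_sketch` is hypothetical.

```r
hpd_sketch <- function(x, alpha = 0.05) {
  x <- sort(x)
  n <- length(x)
  k <- ceiling((1 - alpha) * n)          # number of points the interval must cover
  starts <- seq_len(n - k + 1)           # candidate left end points
  widths <- x[starts + k - 1] - x[starts]
  i <- which.min(widths)                 # the shortest candidate interval
  c(lower = x[i], upper = x[i + k - 1])
}

set.seed(1)
print(hpd_sketch(rnorm(1000)))
```

This approach is only appropriate for unimodal distributions, since it always returns a single interval.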
Morphology of horseshoe crabs.
Description
Data on horseshoe crab morphology collected by Brockman(1996) and used by Agresti(2012).
Usage
data(HSCrab)
Format
A data frame with 173 observations on the following 5 variables:
Col
an indicator variable for the carapace color
spineW
coded width of the spine
Width
maximal width of the carapace (cm)
Satell
number of satellite males
Weight
weight in g
Source
Brockman, H.J. (1996) Satellite male groups in horseshoe crabs, Limulus polyphemus. Ethology 102(1), 1–21.
References
Agresti, A.(2012) Categorical Data Analysis (3rd ed.) Wiley, New York.
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
data(HSCrab)
plot(Weight ~ Width, col = Col, data = HSCrab)
Likelihood analysis of the binomial parameter for one sample.
Description
When given the number of “successes” and the sample size this function plots the normed likelihood of values of the binomial parameter π and calculates the likelihood ratio for a hypothesized value and the maximum likelihood value for the sample, as well as an approximate frequentist p-value.
Usage
L1prop(x, n, p.hypoth, pLset=0.05)
Arguments
x |
the number of sampling units with the feature |
n |
the number of sampling units examined |
p.hypoth |
the hypothesized probability |
pLset |
the desired likelihood for the likelihood interval |
Value
none, the normed likelihood for different values of the binomial probability is plotted with the likelihood interval, and some information is printed
Author(s)
Robert van Hulst
References
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Pawitan, Y. 2001. In All Likelihood. Oxford University Press, Oxford.
Examples
L1prop(13, 78, 0.02)
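The normed likelihood is the binomial likelihood divided by its maximum, so that it equals 1 at the maximum likelihood estimate. A minimal base-R sketch follows. It assumes that the likelihood interval consists of all parameter values whose normed likelihood is at least the cutoff; the names `normed_lik`, `cutoff`, and `grid` are hypothetical, and this is not the package's code.

```r
x <- 13; n <- 78; p.hypoth <- 0.02
p_hat <- x / n                                # maximum likelihood estimate

# Normed (relative) likelihood: L(p) / L(p_hat), so its maximum is 1
normed_lik <- function(p) dbinom(x, n, p) / dbinom(x, n, p_hat)

# Likelihood ratio of the hypothesized value against the MLE
lr_hypoth <- normed_lik(p.hypoth)

# Likelihood interval: all p whose normed likelihood is at least the cutoff
cutoff <- 0.05
grid <- seq(0.001, 0.999, by = 0.001)
lik_interval <- range(grid[normed_lik(grid) >= cutoff])

print(lr_hypoth)
print(lik_interval)
```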
Likelihood analysis of the binomial parameters for two samples.
Description
When given the numbers of “successes” and the sample sizes for the two samples, this function plots the normed likelihoods of the two samples and calculates the likelihood ratio for two different models, one fitting two binomial parameters, and one fitting only one.
Usage
L2prop(x, n)
Arguments
x |
a vector containing the 2 numbers of sampling units with the feature |
n |
a vector containing the 2 numbers of sampling units examined |
Value
none, the inferred difference between the probabilities and its 95% credible interval are calculated and a plot is produced
Author(s)
Robert van Hulst
References
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
L2prop(c(13, 22), c(78, 92))
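The comparison of a two-parameter model with a one-parameter model can be sketched by evaluating each binomial likelihood at its maximum. The code below is an illustration in base R, not the function's implementation, and the names `logL2`, `logL1`, and `LR` are made up.

```r
x <- c(13, 22); n <- c(78, 92)

# Two-parameter model: each sample has its own probability, estimated by x / n
logL2 <- sum(dbinom(x, n, x / n, log = TRUE))

# One-parameter model: a common probability, estimated by the pooled proportion
logL1 <- sum(dbinom(x, n, sum(x) / sum(n), log = TRUE))

# Likelihood ratio of the two-parameter model against the one-parameter model
LR <- exp(logL2 - logL1)
print(LR)
```

Because the one-parameter model is nested in the two-parameter model, this ratio can never be smaller than 1; how large it must be to count as evidence for two separate probabilities is a matter of judgment.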
Computes the posterior probability of having a certain disease from prevalence, sensitivity, and specificity data.
Description
If experimental data on the sensitivity and the specificity of a diagnostic test are available, and raw data on the prevalence of the condition are available as well, then this function estimates the posterior probability of having the condition, with its 95% credible interval.
Usage
MedDiagn(x0, n0, x1, n1, x2, n2, N = 10000,
alpha = 0.05, pdf = FALSE)
Arguments
x0 |
prevalence raw data: number of people with a certain condition |
n0 |
number of people examined for that condition |
x1 |
sensitivity data: number of people with the disease for whom this test was positive |
n1 |
total number of people in the sensitivity sample |
x2 |
specificity raw data: number of people who did not have the disease who tested negative |
n2 |
total number of people in the specificity sample |
N |
number of cases to be simulated (best left at 10000 or greater) |
alpha |
1 - alpha is the required level of credibility (default alpha = 0.05, i.e. 95% credibility) |
pdf |
set this to TRUE only if you want to keep a pdf-file of the posterior probability plot |
Value
none returned: a plot and printed information are produced
Author(s)
Robert van Hulst
References
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
MedDiagn(105, 35000, 72, 80, 640, 800)
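One way to propagate the uncertainty in prevalence, sensitivity, and specificity is to draw each from a Beta posterior and apply Bayes' rule to every draw. The sketch below makes two assumptions that the text above does not confirm: that uniform Beta(1, 1) priors are used, and that the quantity of interest is the probability of having the condition given a positive test. All object names are hypothetical.

```r
set.seed(1)
N <- 10000; alpha <- 0.05

# Posterior draws under assumed uniform priors
prev <- rbeta(N, 1 + 105, 1 + 35000 - 105)   # prevalence: 105 of 35000
sens <- rbeta(N, 1 + 72,  1 + 80 - 72)       # sensitivity: 72 of 80
spec <- rbeta(N, 1 + 640, 1 + 800 - 640)     # specificity: 640 of 800

# Bayes' rule applied draw by draw: P(condition | positive test)
post <- sens * prev / (sens * prev + (1 - spec) * (1 - prev))

# Median and equal-tailed 100 * (1 - alpha)% credible interval
print(quantile(post, c(alpha / 2, 0.5, 1 - alpha / 2)))
```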
computes the Negative Predictive Value.
Description
The negative predictive value (NPV) of a diagnostic test is the probability that someone with a negative diagnostic test for a condition does not have the condition. The NPV can easily be calculated from the prevalence, the sensitivity, and the specificity, but this function automates the procedure.
Usage
NPV(sens, spec, prev)
Arguments
sens |
the sensitivity of the test |
spec |
the specificity of the test |
prev |
the prevalence of the disease |
Value
the negative predictive value
Author(s)
Robert van Hulst
References
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
NPV(0.9, 0.8, 0.003)
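The calculation itself follows from Bayes' rule and can be written out as a short sketch. The function name `npv_sketch` is hypothetical and is used here only to avoid suggesting that this is the package's own code.

```r
npv_sketch <- function(sens, spec, prev) {
  true_neg  <- spec * (1 - prev)     # no condition, negative test
  false_neg <- (1 - sens) * prev     # condition present, negative test
  true_neg / (true_neg + false_neg)  # P(no condition | negative test)
}

print(npv_sketch(0.9, 0.8, 0.003))
```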
calculates the positive predictive value (PPV) of a diagnostic test.
Description
The positive predictive value (PPV) of a diagnostic test is the probability that someone with a positive diagnostic test for a condition does have the condition. The PPV can easily be calculated from the prevalence, the sensitivity, and the specificity, but this function automates the procedure.
Usage
PPV(sens, spec, prev)
Arguments
sens |
the sensitivity of the test |
spec |
the specificity of the test |
prev |
the prevalence of the disease |
Value
the positive predictive value of the test
Author(s)
Robert van Hulst
References
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
PPV(0.9, 0.8, 0.003)
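The underlying calculation is an application of Bayes' rule, sketched below. The function name `ppv_sketch` is hypothetical; it is not claimed to match the package's implementation.

```r
ppv_sketch <- function(sens, spec, prev) {
  true_pos  <- sens * prev                 # condition present, positive test
  false_pos <- (1 - spec) * (1 - prev)     # no condition, positive test
  true_pos / (true_pos + false_pos)        # P(condition | positive test)
}

print(ppv_sketch(0.9, 0.8, 0.003))
```

With a prevalence as low as 0.003, false positives greatly outnumber true positives, so the PPV stays small even for a fairly accurate test.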
Data of the growth of tissue cultures on five different media.
Description
These data came from a designed experiment reported in Sokal and Rohlf(1995), box 9.4. The growth (in arbitrary units) of pea sections grown in tissue culture on five different sugars was replicated ten times.
Usage
data(SRb94)
Format
A data frame with 50 observations on the following 2 variables:
L
length difference in mm
Treatm
a factor with levels "Contr", "fruct.", "gluc.", "gluc&fruct.", and "sucr."
Source
Sokal, R.R., and Rohlf, F.J. 1995. Biometry. Freeman, New York.
References
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
data(SRb94)
with(SRb94, meansplot(L, Treatm))
A support function that calculates the sum of squares of a data vector.
Description
The sum of squares of the input vector is returned.
Usage
SSQ(x)
Arguments
x |
a vector of numbers without missing values |
Value
the sum of squares of x
Author(s)
Robert van Hulst
Examples
SSQ(x = rnorm(n=100))
Mortality due to heart infarct in smokers and non-smokers.
Description
The data are from a retrospective study that compared mortality due to a heart infarct in people who smoked and sex-matched controls who did not.
Usage
data(Smoking)
Format
A matrix with 781 observations cross-classified on the following 2 factors: “Infarct” (“Yes” or “Control”, rows), and “EverSmoked” (“Yes” or “No”, columns).
Source
unknown
References
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Data on the incidence of hypertension and three indicator variables.
Description
A total of 433 persons were tested for hypertension and checked for whether they were smokers, obese, or snored. The data are in Altman(1991).
Usage
data(Snoring)
Format
A data frame with 8 observations on the following 5 variables:
smoking
did the person smoke (1) or not (0)?
obese
was the person obese (1) or not (0)?
snoring
did the person snore (1) or not (0)?
n
the number of persons observed with these covariates
hypert
did the person suffer from hypertension (1) or not (0)?
Source
Altman, D.G. 1991. Practical Statistics for Medical Research. Chapman & Hall, London.
References
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
data(Snoring)
fit <- glm(cbind(hypert, n - hypert) ~ smoking + obese + snoring,
family=binomial, data=Snoring)
summary(fit)
function to plot diverse Beta distributions for use as Binomial priors
Description
This function just plots some Beta distributions with commonly used parameters
Usage
binPriorsPlot()
Value
none produced, the function just produces one (compound) plot
Author(s)
Robert van Hulst
References
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
binPriorsPlot()
Data on body temperature, heart rate, and gender of 130 human subjects.
Description
These data were collected by Mackowiak, Wasserman, and Levine(1992), and have been used, among others, by Ntzoufras(2009).
Usage
data(bodytemp)
Format
A data frame with 130 observations on the following 3 variables:
temperature
body temperature in degrees Fahrenheit
gender
a factor with levels 'female' and 'male'
heart.rate
heart rate in beats per minute
Source
Mackowiak, P.A., Wasserman, S.S., and Levine, M.M. (1992) A critical appraisal of 98.6 degrees F, the upper limit of the normal body temperature, and other legacies of Carl Reinhold August Wunderlich. JAMA 268, 1578–1580.
Ntzoufras, I.(2009) Bayesian Modeling Using Winbugs. Wiley, Hoboken, N.J.
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
data(bodytemp)
B2Nsir(temperature ~ gender, bodytemp)
Mortality data of moth larvae due to increasing doses of insecticide.
Description
Batches of twenty larvae were exposed to increasing doses of insecticide, and the number of survivors and their sexes were noted. These data were reported by Collett (1991) and used by Venables and Ripley (1994 and later editions). They resulted from an experiment to study the toxicity of different doses of a pyrethroid insecticide to the tobacco budworm Heliothis virescens.
Usage
data(budworm)
Format
A data frame with 12 observations on the following 3 variables:
ldose
the log of the dose of the insecticide
dead
the number of budworms that were dead a day later
sex
a factor with two levels: “F” and “M”
Source
Collett, D. 1991. Modelling Binary Data. Chapman and Hall, London.
References
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Venables, W.N. and Ripley, B.D. 1994. Modern Applied Statistics with S-PLUS. Springer Verlag, New York.
Examples
data(budworm)
fit <- glm(cbind(dead, 20 - dead) ~ ldose, data=budworm,
family=binomial)
summary(fit)
Made-up data that are not unlike the actual data collected by Nespolo et al.(2003).
Description
Nespolo et al.(2003) collected data on the metabolic rates (as measured by oxygen consumption) of crickets kept and acclimated at three different temperatures. Since the original data were not available and only a statistical summary was published, we simulated these data to approximately agree with the statistical summary.
Usage
data(crickets)
Format
A data frame with 292 observations on the following 3 variables:
VO2
oxygen consumption in μl/h (a measure of basal metabolic rate)
mass
weight of the cricket in mg
temp
temperature in degrees C.
Source
Nespolo et al., 2003.
References
Nespolo, R.F., Lardies, M.A., and Bozinovic, F. 2003. Intrapopulational variation in the standard metabolic rate of insects: repeatability, thermal dependence and sensitivity of (Q[10]) on oxygen consumption in a cricket. Journal of Experimental Biology 206, 4309–4315.
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
data(crickets)
crickets7 <- subset(crickets, crickets$temp==7)
with(crickets7, scatter.smooth(mass, VO2))
Charles Darwin's (1876) data on the fecundity of selfed and crossed corn plants.
Description
Charles Darwin(1876) provided data on the difference in the heights attained by selfed and crossed mother plants.
Usage
data(darwin)
Format
A data frame with 15 observations on the following variable:
difference
the difference in height in inches between the two members of each matched pair of offspring of a selfed and a crossed mother plant
Source
Darwin, C.R. 1876. The effects of cross and self fertilisation in the vegetable kingdom. John Murray, London.
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
data(darwin)
with(darwin, qqnorm(difference) )
with(darwin, qqline(difference) )
Data on lung capacity of 654 children and adolescents.
Description
These data come from Rosner (2006), and represent forced expiratory volume (FEV) in l/s and several covariates.
Usage
data(fev)
Format
A data frame with 654 observations on the following 6 variables:
Id
an identification code
Age
age in years
FEV
forced expiratory volume in l/s
Hgt
height in inches
Sex
gender: 0 for female, 1 for male
Smoke
smokes (1) or not (0)
Source
Rosner, B. 2006. Fundamentals of Biostatistics. 6th ed. Duxbury Press.
Examples
data(fev)
splom(fev[c(3, 2, 4, 5, 6)], main="fev data")
Simon Newcomb's measurements of the speed of light
Description
In the late 1800s Simon Newcomb measured the time it took light to cover a certain distance. The data are reported in Stigler (1977) and have been widely used since to illustrate statistical inference.
Usage
data(lightspeed)
Format
A vector with 66 observations of the travel time of light.
Source
Stigler, S.M. (1977) Do robust estimators work with real data? Annals of Statistics 5, 1055–1098.
References
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
data(lightspeed)
qqnorm(lightspeed)
qqline(lightspeed)
A dot plot is produced for several related models showing for each model its LOOIC-value with its credible interval.
Description
The LOOIC-value (like the non-Bayesian AIC-value) is a useful measure of a model's predictive performance.
Usage
looicplot(looiclist, modnames, perc = 90)
Arguments
looiclist |
a list of character-valued names of rstanarm model objects |
modnames |
a character-valued vector of model names for each of the models |
perc |
the percentage credibility for the credible intervals (defaults to 90%) |
Value
None provided, but a printed list of looic-values, their standard errors, and credible intervals, and a dot plot with the same information are produced.
Author(s)
Robert van Hulst
References
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
## Not run:
data(budworm)
Mbudworm1 <- stan_glm(formula = cbind(dead, 20 - dead) ~ ldose,
                      family = binomial, data = budworm,
                      prior = student_t(df = 7),
                      prior_intercept = student_t(df = 7))
Mbudworm2 <- stan_glm(formula = cbind(dead, 20 - dead) ~ ldose * sex,
                      family = binomial, data = budworm,
                      prior = student_t(df = 7),
                      prior_intercept = student_t(df = 7))
Mbudworm3 <- stan_glm(formula = cbind(dead, 20 - dead) ~ ldose + sex,
                      family = binomial, data = budworm,
                      prior = student_t(df = 7),
                      prior_intercept = student_t(df = 7))
## model names are listed in the same order as the models
looicplot(looiclist = list("Mbudworm1", "Mbudworm2", "Mbudworm3"),
          modnames = c("~ ldose", "~ ldose * sex", "~ ldose + sex"))
## End(Not run)
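The perc argument sets the width of the interval shown around each LOOIC-value. As a rough illustration only, and assuming the interval is a Normal approximation built from the LOOIC estimate and its standard error (this manual does not confirm how looicplot computes it), the bounds could be obtained as follows. The helper name looic_interval and the input numbers are hypothetical.

```r
# Hypothetical helper: Normal-approximation interval around a LOOIC estimate.
# Assumption: interval = estimate +/- z * se, with z chosen from `perc`.
looic_interval <- function(looic, se, perc = 90) {
  z <- qnorm(1 - (1 - perc / 100) / 2)  # perc = 90 uses the 95% Normal quantile
  data.frame(looic = looic, se = se,
             lwr = looic - z * se,
             upr = looic + z * se)
}

# Made-up inputs for illustration, not results from any fitted model
ci <- looic_interval(looic = c(100, 120), se = c(10, 5), perc = 90)
ci
```

A larger perc widens every interval, which makes it harder to separate models whose LOOIC-values are close together.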
Plots a simple strip chart of the observations with group means and grand mean.
Description
A strip chart of the first argument grouped by the second argument is produced. This function is useful for looking at experimental data with a numeric response and a factorial predictor.
Usage
meansplot(y, grp)
Arguments
y |
a vector of observed values |
grp |
a factor of the same length as the observation vector indicating the treatment under which each observation was obtained |
Value
none returned: the function is used for the plot it produces
Author(s)
Robert van Hulst
References
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
data(PlantGrowth)
with(PlantGrowth, meansplot(weight, group))
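For readers who want to see what such a display involves, the following is a rough base-graphics analogue. It is a sketch under the assumption that the plot shows jittered observations, a marker at each group mean, and a line at the grand mean; it is not the code used by meansplot.

```r
# Rough base-graphics analogue (an assumption about the display,
# not the package's own code)
y   <- PlantGrowth$weight
grp <- PlantGrowth$group

group_means <- tapply(y, grp, mean)  # one mean per treatment level
grand_mean  <- mean(y)               # mean of all observations

stripchart(y ~ grp, vertical = TRUE, method = "jitter", pch = 1,
           xlab = "group", ylab = "weight")
# short horizontal segment at each group mean
segments(x0 = seq_along(group_means) - 0.2, y0 = group_means,
         x1 = seq_along(group_means) + 0.2, y1 = group_means, lwd = 2)
abline(h = grand_mean, lty = 2)      # dashed line at the grand mean
```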
Produces a Normality plot for the argument, surrounded by eight other Normality plots for Normal samples having the same mean and standard deviation as the argument.
Description
Normality plots can be hard to judge if one is not experienced. This function plots a Normality plot for the data surrounded by eight other Normality plots for samples with the same mean and standard deviation that were randomly generated. The eight plots provide an idea of the variability to be expected in Normally distributed data.
Usage
nineplot(x)
Arguments
x |
a vector of observations to be examined for Normality |
Value
none produced: the function is used for the plot it produces
Author(s)
Robert van Hulst
References
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
nineplot(rt(100, 2))
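The idea described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package source: the name nine_qq_sketch is hypothetical, the data are placed in the centre panel, and the eight comparison samples are drawn from a Normal distribution whose parameters are set to the sample mean and standard deviation of x.

```r
# Hypothetical sketch: data in the centre of a 3 x 3 grid of Normal Q-Q plots
nine_qq_sketch <- function(x) {
  n <- length(x)
  # eight simulated Normal samples, one per column
  sims <- replicate(8, rnorm(n, mean = mean(x), sd = sd(x)))
  op <- par(mfrow = c(3, 3), mar = c(2, 2, 2, 1))
  on.exit(par(op))                 # restore graphics settings on exit
  k <- 0
  for (i in 1:9) {
    if (i == 5) {                  # centre panel holds the observed data
      qqnorm(x, main = "data")
      qqline(x)
    } else {
      k <- k + 1
      qqnorm(sims[, k], main = "simulated")
      qqline(sims[, k])
    }
  }
  invisible(sims)
}

set.seed(1)
sims <- nine_qq_sketch(rt(50, 2))
```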
A robust comparison of the location and the scale of the input vector.
Description
Following the recommendation in Gelman et al. (2014), a large sample that is assumed to be Normally distributed but has more than 10% of its observations further than 1.5 times the IQR from the median shows signs of overdispersion.
Usage
overdispersionCheck(x)
Arguments
x |
an input vector of reals without missing values |
Value
The function prints the approximate percentage of observations that are further from the median than would be expected in a normal distribution.
Author(s)
Robert van Hulst
References
Gelman, A., Carlin, J.B., Stern, H.S., Dunson, D.B., Vehtari, A., and Rubin, D.B. 2014. Bayesian Data Analysis. 3rd ed. CRC Press.
Examples
overdispersionCheck(rt(100, 1))
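The rule in the Description can be written out directly. The sketch below is not the code of overdispersionCheck; the helper name frac_far_from_median is hypothetical. It computes the share of observations further than 1.5 times the IQR from the median, and for comparison the share expected under an exactly Normal distribution, which is roughly 4%.

```r
# Hypothetical helper: share of observations further than 1.5 * IQR
# from the median
frac_far_from_median <- function(x) {
  mean(abs(x - median(x)) > 1.5 * IQR(x))
}

# Share expected under an exactly Normal distribution, for comparison
normal_iqr      <- qnorm(0.75) - qnorm(0.25)
expected_normal <- 2 * pnorm(-1.5 * normal_iqr)

set.seed(1)
frac_far_from_median(rt(100, 1))  # heavy-tailed sample
```

A value well above the Normal reference, and in particular above the 10% threshold quoted above, suggests that a heavier-tailed model may be worth considering.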
Conversion of a frequentist p-value to the lower bound of the Bayes factor against the null hypothesis assuming equal odds of the null and the alternative
Description
This function computes the approximate lower bound to the Bayes factor of the null hypothesis against the alternative, assuming equal prior odds of the null and the alternative.
Usage
p2BF(p)
Arguments
p |
the frequentist p-value (which has to be less than 1/e, approximately 0.37) |
Value
the approximate lower bound of the Bayes factor of the null hypothesis against the alternative
Note
The p-value should be less than 1/e (approximately 0.37).
Author(s)
Robert van Hulst
References
Sellke, T., Bayarri, M.J., and Berger, J.O. 2001. Calibration of p Values for Testing Precise Hypotheses. Am. Statistician 55(1) pp 62–71.
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
p2BF(p = 0.05)
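The calibration given by Sellke et al. (2001) bounds the Bayes factor of the null against the alternative from below by -e p log(p) for p < 1/e. The following sketch evaluates that formula directly; it is an illustration of the published bound under the hypothetical name bf_lower_bound, and it is an assumption that p2BF computes the same quantity.

```r
# Sketch of the Sellke, Bayarri and Berger (2001) calibration;
# not the package source
bf_lower_bound <- function(p) {
  stopifnot(all(p > 0), all(p < exp(-1)))  # bound only holds for p < 1/e
  -exp(1) * p * log(p)
}

bf_lower_bound(0.05)  # about 0.41
```

Read this way, p = 0.05 corresponds to odds of at most about 2.5 to 1 against the null hypothesis, which is much weaker evidence than the p-value is often taken to imply.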
Conversion of a frequentist p-value to a lower bound of the posterior probability that the null hypothesis is true assuming equal odds of the null and the alternative
Description
This function computes the approximate lower bound to the posterior probability of the null hypothesis, assuming equal prior odds of the null and the alternative. See Sellke et al. (2001) for the derivation, and note that the posterior probability of the null hypothesis is what many incorrectly assume the p-value is measuring.
Usage
p2minpp(p)
Arguments
p |
the frequentist p-value (which has to be less than 1/e, approximately 0.37) |
Value
the approximate lower bound of the posterior probability of the null hypothesis
Note
The p-value should be less than 1/e (approximately 0.37).
Author(s)
Robert van Hulst
References
Sellke, T., Bayarri, M.J., and Berger, J.O. 2001. Calibration of p Values for Testing Precise Hypotheses. Am. Statistician 55(1) pp 62–71.
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
p2minpp(p=0.05)
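With equal prior odds, a lower bound B on the Bayes factor of the null against the alternative translates into a lower bound 1 / (1 + 1/B) on the posterior probability of the null. The sketch below combines this with the Sellke et al. (2001) bound B = -e p log(p). The name min_post_prob_null is hypothetical, and it is an assumption that p2minpp performs this same calculation.

```r
# Sketch of the minimum posterior probability of H0 under equal prior odds;
# not the package source
min_post_prob_null <- function(p) {
  stopifnot(all(p > 0), all(p < exp(-1)))  # bound only holds for p < 1/e
  bf <- -exp(1) * p * log(p)  # lower bound on the Bayes factor for H0
  1 / (1 + 1 / bf)            # posterior probability with prior odds of 1
}

min_post_prob_null(0.05)  # about 0.29
```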
Carcinogenesis data on rats exposed to a carcinogen.
Description
Up to forty-eight rats were exposed to the carcinogen retinyl acetate or to a placebo in their diet, after which the number of tumors they developed was evaluated.
Usage
data(rats)
Format
A data frame with 71 observations on the following 2 variables:
y
number of rats that developed tumors
N
number of rats in group
Source
Gail, M.H., Santner, T.J., and Brown, C.C. (1980) An analysis of comparative carcinogenesis experiments based on multiple times to tumor. Biometrics 36, 255–266.
References
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Universal Fisherian significance test with confidence interval.
Description
Given a significance level p (the critical value, often written alpha), this function performs a Fisherian significance test of the null hypothesis at that level and reports the result of the test together with the lower and upper limits of the corresponding confidence interval. The idea is taken from Kadane (2016).
Usage
sigtestCI(p)
Arguments
p |
the desired significance level |
Details
Note that this function does not require any data: if a rare event occurs (rare in the sense that p is sufficiently small), H0 is deemed implausible and is rejected. If no such event occurs, we can simply try the experiment again. A Neyman-Pearson (NP) hypothesis test, in contrast, does require data as well as an alternative hypothesis. For an NP hypothesis test we can (and should) consider the power of the test (the probability of rejecting H0 when Ha is true).
Value
A message informing the user if H0 was rejected or not and the lower and upper boundaries of the corresponding confidence interval.
Author(s)
Robert van Hulst
References
Kadane, J.B. 2016. Beyond hypothesis testing. Entropy 18, 199.
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
sigtestCI(p=0.05)
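The Details describe a test that ignores the data. One way to read this, offered here as an assumption and not as a description of how sigtestCI is implemented, is a randomized rule that rejects H0 whenever an independent event of probability p occurs. The sketch below, using the hypothetical name data_free_test, shows that such a rule rejects a true H0 in a proportion p of repetitions, which is all that the significance level by itself guarantees.

```r
# Hypothetical data-free rule: reject H0 with probability p
data_free_test <- function(p) {
  runif(1) < p   # TRUE means "reject H0"
}

set.seed(42)
rejections <- replicate(1e5, data_free_test(0.05))
mean(rejections)  # long-run rejection rate, close to 0.05
```

Because the rule never looks at the data, it has no power beyond p against any alternative, which is the contrast with the Neyman-Pearson approach drawn in the Details.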
Conversion of 2 props input to 2x2 contingency table
Description
This function converts the successes and totals vectors required as input for function B2props to a 2x2 contingency table for input to CTA or Bft2x2.
Usage
sn2ft2x2(s, n)
Arguments
s |
a vector of length 2 of successes |
n |
a vector of length 2 of numbers of trials |
Value
a 2 x 2 contingency table equivalent to the two arguments
Author(s)
Robert van Hulst
References
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Examples
sn2ft2x2(c(47, 59), c(120, 125))
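The conversion amounts to pairing each count of successes with the corresponding count of failures, n - s. The following sketch shows one possible layout, with groups as rows and outcomes as columns. The helper name sn_to_table and the row and column arrangement are assumptions, and the table returned by sn2ft2x2 may be arranged differently.

```r
# Hypothetical helper; the row/column layout is an assumption
sn_to_table <- function(s, n) {
  stopifnot(length(s) == 2, length(n) == 2, all(s <= n))
  # filled column by column: successes first, then failures
  matrix(c(s, n - s), nrow = 2,
         dimnames = list(group = c("1", "2"),
                         outcome = c("success", "failure")))
}

tab <- sn_to_table(c(47, 59), c(120, 125))
tab
```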
Plotting routine for dataframes of looic values.
Description
Produces a dotchart with error bars as a summary of a dataframe with model names (‘modnames’), LOOIC-values (‘looic’), standard errors (‘se’), lower values (‘lwr’), and upper values (‘upr’).
Usage
sumchart(df, rownames, groups, perc)
Arguments
df |
data.frame name |
rownames |
model names |
groups |
row names |
perc |
the percentage of credibility desired |
Value
A plot is produced.
Author(s)
Robert van Hulst
References
van Hulst, R. 2018. Evaluating Scientific Evidence. ms.
Weight gain in rats
Description
Rats were fed diets with different quantities of protein from either animal or plant sources. The weight gained at the end of the experiment was the response variable.
Usage
data("weightgain")
Format
A data frame with 40 observations on the following 3 variables:
source
source of protein given, a factor with levels Beef and Cereal
type
amount of protein given, a factor with levels High and Low
weightgain
weight gain in grams
Source
Hand, D.J., Daly, F., Lunn, A.D., McConway, K.J. and Ostrowski, E. 1994. A Handbook of Small Datasets, Chapman and Hall, London.
Examples
data("weightgain")
with(weightgain, table(source, type))