Type: | Package |
Title: | Multiscale Graph Correlation |
Version: | 2.0.2 |
Date: | 2020-06-20 |
Maintainer: | Eric Bridgeford <ericwb95@gmail.com> |
Description: | Multiscale Graph Correlation (MGC) is a framework developed by Vogelstein et al. (2019) <doi:10.7554/eLife.41690> that extends global correlation procedures to be multiscale; consequently, MGC tests typically require far fewer samples than existing methods for a wide variety of dependence structures and dimensionalities, while maintaining computational efficiency. Moreover, MGC provides a simple and elegant multiscale characterization of the potentially complex latent geometry underlying the relationship. |
Depends: | R (≥ 3.4.0) |
Imports: | stats, MASS, abind, boot, energy, raster |
URL: | https://github.com/neurodata/r-mgc |
Suggests: | testthat (≥ 2.1.0), ggplot2, reshape2, knitr, rmarkdown |
License: | GPL-2 |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.0.2 |
VignetteBuilder: | knitr |
NeedsCompilation: | yes |
Packaged: | 2020-06-22 22:51:13 UTC; eric |
Author: | Eric Bridgeford [aut, cre], Censheng Shen [aut], Shangsi Wang [aut], Joshua Vogelstein [ths] |
Repository: | CRAN |
Date/Publication: | 2020-06-23 12:50:18 UTC |
Connected Components Labelling – Unique Patch Labelling
Description
ConnCompLabel
is a one-pass implementation of connected components
labelling. Here it is applied to identify disjoint patches within a
distribution.
The raster matrix can be a raster of class 'asc'
(adehabitat package), 'RasterLayer' (raster package) or
'SpatialGridDataFrame' (sp package).
Usage
ConnCompLabel(mat)
Arguments
mat |
is a binary matrix of data with 0 representing background and 1 representing environment of interest. NA values are acceptable. The matrix can be a raster of class 'asc' (this & adehabitat package), 'RasterLayer' (raster package) or 'SpatialGridDataFrame' (sp package) |
Value
A matrix of the same dimensions and class as mat, in which unique
components (individual patches) are numbered 1:n, with 0 remaining the
background value.
Author(s)
Jeremy VanDerWal jjvanderwal@gmail.com
References
Chang, F., C.-J. Chen, and C.-J. Lu. 2004. A linear-time component-labeling algorithm using contour tracing technique. Comput. Vis. Image Underst. 93:206-220.
Examples
#define a simple binary matrix
tmat = { matrix(c( 0,0,0,1,0,0,1,1,0,1,
0,0,1,0,1,0,0,0,0,0,
0,1,NA,1,0,1,0,0,0,1,
1,0,1,1,1,0,1,0,0,1,
0,1,0,1,0,1,0,0,0,1,
0,0,1,0,1,0,0,1,1,0,
1,0,0,1,0,0,1,0,0,1,
0,1,0,0,0,1,0,0,0,1,
0,0,1,1,1,0,0,0,0,1,
1,1,1,0,0,0,0,0,0,1),nr=10,byrow=TRUE) }
#do the connected component labelling
ccl.mat = ConnCompLabel(tmat)
ccl.mat
image(t(ccl.mat[10:1,]),col=c('grey',rainbow(length(unique(ccl.mat))-1)))
An auxiliary function that properly transforms the distance matrix X
Description
An auxiliary function that properly transforms the distance matrix X
Usage
DistCentering(X, option, optionRk)
Arguments
X |
is a symmetric distance matrix |
option |
is a string that specifies which global correlation to build upon, including 'mgc', 'dcor', 'mantel', and 'rank' |
optionRk |
is a string that specifies whether ranking within column is computed or not. |
Value
A list containing the following:
A |
is the centered distance matrix |
RX |
is the column-rank matrix of X. |
An auxiliary function that ranks the entries within each column in ascending order. For ties, the minimum ranking is used; e.g., if there are repeating distance entries, the ranks look like 1,2,3,3,4,...,n-1.
Description
An auxiliary function that ranks the entries within each column in ascending order. For ties, the minimum ranking is used; e.g., if there are repeating distance entries, the ranks look like 1,2,3,3,4,...,n-1.
Usage
DistRanks(dis)
Arguments
dis |
is a symmetric distance matrix. |
Value
disRank is the column-rank matrix of dis.
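The tie handling described above (ranks such as 1,2,3,3,4, with no rank skipped after a tie) can be illustrated with a short base-R sketch. The helper name dense_col_ranks is hypothetical and is not part of the package; this is an illustration of the ranking pattern, not the package's implementation.

```r
# Hypothetical helper (not part of mgc): rank each column so that tied
# entries share a rank and the next distinct value gets the next integer.
dense_col_ranks <- function(dis) {
  apply(dis, 2, function(v) match(v, sort(unique(v))))
}

# Column 1 contains a repeated entry (the value 2 appears twice).
dis <- cbind(c(0, 1, 2, 2, 3), c(5, 4, 3, 2, 1))
ranks <- dense_col_ranks(dis)
ranks[, 1]  # 1 2 3 3 4
ranks[, 2]  # 5 4 3 2 1
```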
An auxiliary function that computes all local correlations simultaneously in O(n^2).
Description
An auxiliary function that computes all local correlations simultaneously in O(n^2).
Usage
LocalCov(A, B, RX, RY)
Arguments
A |
is a properly transformed distance matrix |
B |
is the second distance matrix properly transformed |
RX |
is the column-ranking matrix of A |
RY |
is the column-ranking matrix of B. |
Value
covXY is all local covariances computed iteratively.
An auxiliary function that finds the smoothed maximum within the significant region R. If the area of R is too small, it returns the last local correlation; otherwise it takes the maximum within R.
Description
An auxiliary function that finds the smoothed maximum within the significant region R. If the area of R is too small, it returns the last local correlation; otherwise it takes the maximum within R.
Usage
Smoothing(localCorr, m, n, R)
Arguments
localCorr |
is all local correlations |
m |
is the number of rows of localCorr |
n |
is the number of columns of localCorr |
R |
is a binary matrix of size m by n indicating the significant region. |
Value
A list containing the following:
stat |
is the sample MGC statistic within |
optimalScale |
the estimated optimal scale as a list. |
Author(s)
C. Shen
An auxiliary function that finds a region of significance in the local correlation map by thresholding.
Description
An auxiliary function that finds a region of significance in the local correlation map by thresholding.
Usage
Thresholding(localCorr, m, n, sz)
Arguments
localCorr |
is all local correlations |
m |
is the number of rows of localCorr |
n |
is the number of columns of localCorr |
sz |
is the sample size of original data (which may not equal m or n in case of repeating data). |
Value
R is a binary matrix of size m by n, with 1's indicating the significant region.
Author(s)
Eric Bridgeford and C. Shen
Discriminability Mean Normalized Rank
Description
Discriminability Mean Normalized Rank
Usage
discr.mnr(rdf)
Arguments
rdf |
the reliability densities. |
Value
the mnr.
Reliability Density Function
Description
A function for computing the reliability density function of a dataset.
Usage
discr.rdf(X, ids)
Arguments
X |
|
ids |
|
Value
[n]
vector of the reliability per sample.
Author(s)
Eric Bridgeford
Discriminability Cross Simulation
Description
A function to simulate data with the same mean that spreads as class id increases.
Usage
discr.sims.cross(
n,
d,
K,
signal.scale = 10,
non.scale = 1,
mean.scale = 0,
rotate = FALSE,
class.equal = TRUE,
ind = FALSE
)
Arguments
n |
the number of samples. |
d |
the number of dimensions. |
K |
the number of classes in the dataset. |
signal.scale |
the scaling for the signal dimension. Defaults to 10. |
non.scale |
the scaling for the non-signal dimensions. Defaults to 1. |
mean.scale |
the magnitude of the difference in the means between the two classes.
If a mean scale is requested, |
rotate |
whether to apply a random rotation. Defaults to FALSE. |
class.equal |
whether the number of samples/class should be equal, with each
class having a prior of 1/K, or unequal, in which each class obtains a prior
of k/sum(K) for k=1:K. Defaults to TRUE. |
ind |
whether to sample x and y independently. Defaults to FALSE. |
Author(s)
Eric Bridgeford
Examples
library(mgc)
sim <- discr.sims.cross(100, 3, 2)
Discriminability Exponential Simulation
Description
A function to simulate multi-class data with an Exponential class-mean trend.
Usage
discr.sims.exp(
n,
d,
K,
signal.scale = 1,
signal.lshift = 1,
non.scale = 1,
rotate = FALSE,
class.equal = TRUE,
ind = FALSE
)
Arguments
n |
the number of samples. |
d |
the number of dimensions. The first dimension will be the signal dimension; the remainders noise. |
K |
the number of classes in the dataset. |
signal.scale |
the scaling for the signal dimension. Defaults to 1. |
signal.lshift |
the location shift for the signal dimension between the classes. Defaults to 1. |
non.scale |
the scaling for the non-signal dimensions. Defaults to 1. |
rotate |
whether to apply a random rotation. Defaults to FALSE. |
class.equal |
whether the number of samples/class should be equal, with each
class having a prior of 1/K, or unequal, in which each class obtains a prior
of k/sum(K) for k=1:K. Defaults to TRUE. |
ind |
whether to sample x and y independently. Defaults to FALSE. |
Author(s)
Eric Bridgeford
Discriminability Spread Simulation
Description
A function to simulate data with the same mean that spreads as class id increases.
Usage
discr.sims.fat_tails(
n,
d,
K,
signal.scale = 1,
rotate = FALSE,
class.equal = TRUE,
ind = FALSE
)
Arguments
n |
the number of samples. |
d |
the number of dimensions. |
K |
the number of classes in the dataset. |
signal.scale |
the scaling for the signal dimension. Defaults to 1. |
rotate |
whether to apply a random rotation. Defaults to FALSE. |
class.equal |
whether the number of samples/class should be equal, with each
class having a prior of 1/K, or unequal, in which each class obtains a prior
of k/sum(K) for k=1:K. Defaults to TRUE. |
ind |
whether to sample x and y independently. Defaults to FALSE. |
Author(s)
Eric Bridgeford
Examples
library(mgc)
sim <- discr.sims.fat_tails(100, 3, 2)
Discriminability Linear Simulation
Description
A function to simulate multi-class data with a linear class-mean trend. The signal dimension is the dimension carrying all of the between-class difference, and the non-signal dimensions are noise.
Usage
discr.sims.linear(
n,
d,
K,
signal.scale = 1,
signal.lshift = 1,
non.scale = 1,
rotate = FALSE,
class.equal = TRUE,
ind = FALSE
)
Arguments
n |
the number of samples. |
d |
the number of dimensions. The first dimension will be the signal dimension; the remainders noise. |
K |
the number of classes in the dataset. |
signal.scale |
the scaling for the signal dimension. Defaults to 1. |
signal.lshift |
the location shift for the signal dimension between the classes. Defaults to 1. |
non.scale |
the scaling for the non-signal dimensions. Defaults to 1. |
rotate |
whether to apply a random rotation. Defaults to FALSE. |
class.equal |
whether the number of samples/class should be equal, with each
class having a prior of 1/K, or unequal, in which each class obtains a prior
of k/sum(K) for k=1:K. Defaults to TRUE. |
ind |
whether to sample x and y independently. Defaults to FALSE. |
Author(s)
Eric Bridgeford
Discriminability Radial Simulation
Description
A function to simulate data with the same mean with radial symmetry as class id increases.
Usage
discr.sims.radial(
n,
d,
K,
er.scale = 0.1,
r = 1,
class.equal = TRUE,
ind = FALSE
)
Arguments
n |
the number of samples. |
d |
the number of dimensions. |
K |
the number of classes in the dataset. |
er.scale |
the scaling for the error of the samples. Defaults to 0.1. |
r |
the radial spacing between each class. Defaults to 1. |
class.equal |
whether the number of samples/class should be equal, with each
class having a prior of 1/K, or unequal, in which each class obtains a prior
of k/sum(K) for k=1:K. Defaults to TRUE. |
ind |
whether to sample x and y independently. Defaults to FALSE. |
Author(s)
Eric Bridgeford
Examples
library(mgc)
sim <- discr.sims.radial(100, 3, 2)
Discriminability Statistic
Description
A function for computing the discriminability from a distance matrix and a set of associated labels.
Usage
discr.stat(
X,
Y,
is.dist = FALSE,
dist.xfm = mgc.distance,
dist.params = list(method = "euclidean"),
dist.return = NULL,
remove.isolates = TRUE
)
Arguments
X |
is interpreted as:
|
Y |
|
is.dist |
a boolean indicating whether your |
dist.xfm |
if |
dist.params |
a list of trailing arguments to pass to the distance function specified in |
dist.return |
the return argument for the specified
|
remove.isolates |
remove isolated samples from the dataset. Isolated samples are samples with only
one instance of their class appearing in the |
Value
A list containing the following:
discr |
the discriminability statistic. |
rdf |
the rdfs for each sample. |
Details
For more details see the help vignette:
vignette("discriminability", package = "mgc")
Author(s)
Eric Bridgeford
References
Eric W. Bridgeford, et al. "Optimal Decisions for Reference Pipelines and Datasets: Applications in Connectomics." bioRxiv (2019).
Examples
sim <- discr.sims.linear(100, 10, K=2)
X <- sim$X; Y <- sim$Y
discr.stat(X, Y)$discr
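To make the idea of discriminability concrete, the following base-R sketch computes, for every pair of samples sharing a label, the fraction of differently-labelled samples that lie farther away than the same-label partner, and then averages these fractions. The function name discr_sketch, the tie weight of 0.5, and the toy data are assumptions made for illustration; the package's own computation may differ in its details.

```r
# Hypothetical sketch (not the package's code): discriminability from a
# distance matrix D and a vector of labels ids.
discr_sketch <- function(D, ids) {
  n <- nrow(D)
  scores <- c()
  for (i in seq_len(n)) {
    same <- setdiff(which(ids == ids[i]), i)   # other samples with the same label
    other <- which(ids != ids[i])              # samples with a different label
    for (j in same) {
      # fraction of different-label samples at least as close as partner j
      # (ties counted with weight 0.5; an assumption)
      closer <- sum(D[i, other] < D[i, j]) + 0.5 * sum(D[i, other] == D[i, j])
      scores <- c(scores, 1 - closer / length(other))
    }
  }
  mean(scores)
}

# Two well-separated classes on a line: every same-label partner is closer
# than every different-label sample.
X <- matrix(c(0, 0.1, 10, 10.1), ncol = 1)
ids <- c(1, 1, 2, 2)
D <- as.matrix(dist(X))
discr_sketch(D, ids)  # 1
```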
Discriminability One Sample Permutation Test
Description
A function that performs a one-sample test for whether the discriminability differs from random chance.
Usage
discr.test.one_sample(
X,
Y,
is.dist = FALSE,
dist.xfm = mgc.distance,
dist.params = list(method = "euclidean"),
dist.return = NULL,
remove.isolates = TRUE,
nperm = 500,
no_cores = 1
)
Arguments
X |
is interpreted as:
|
Y |
|
is.dist |
a boolean indicating whether your |
dist.xfm |
if |
dist.params |
a list of trailing arguments to pass to the distance function specified in |
dist.return |
the return argument for the specified
|
remove.isolates |
remove isolated samples from the dataset. Isolated samples are samples with only
one instance of their class appearing in the |
nperm |
the number of permutations to perform. Defaults to 500. |
no_cores |
the number of cores to use for permutation test. Defaults to 1. |
Value
A list containing the following:
stat |
the discriminability of the data. |
null |
the discriminability scores under the null, computed via permutation. |
p.value |
the p-value associated with the permutation test. |
Details
Performs a test of whether an observed discriminability is significantly different from chance, as described in Bridgeford et al. (2019).
With \hat D_X the sample discriminability of X:
H_0: D_X = D_0
and:
H_A: D_X > D_0
where D_0 is the discriminability that would be observed by random chance.
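For intuition about how a one-sided permutation p-value for H_A: D_X > D_0 can be formed from an observed statistic and a vector of null statistics, here is a small base-R sketch. The helper perm_p_value, the add-one correction, and the null scores are illustrative assumptions; the source does not state the exact formula the package uses.

```r
# Hypothetical helper (not part of mgc): one-sided permutation p-value.
perm_p_value <- function(stat, null) {
  # Add one to numerator and denominator so the p-value is never exactly 0
  # (a common convention; an assumption here).
  (sum(null >= stat) + 1) / (length(null) + 1)
}

# Made-up null discriminabilities, as if from 9 label permutations.
null <- c(0.48, 0.50, 0.52, 0.49, 0.51, 0.47, 0.53, 0.50, 0.49)
perm_p_value(0.90, null)  # 0.1
perm_p_value(0.50, null)  # 0.6
```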
Author(s)
Eric Bridgeford
References
Eric W. Bridgeford, et al. "Optimal Decisions for Reference Pipelines and Datasets: Applications in Connectomics." bioRxiv (2019).
Examples
## Not run:
require(mgc)
n = 100; d=5
# simulation with a large difference between the classes
# meaning they are more discriminable
sim <- discr.sims.linear(n=n, d=d, K=2, signal.lshift=10)
X <- sim$X; Y <- sim$Y
# p-value is small
discr.test.one_sample(X, Y)$p.value
## End(Not run)
Discriminability Two Sample Permutation Test
Description
A function that takes two sets of paired data and tests whether the first dataset is more, less, or non-equally discriminable than the second.
Usage
discr.test.two_sample(
X1,
X2,
Y,
dist.xfm = mgc.distance,
dist.params = list(method = "euclidian"),
dist.return = NULL,
remove.isolates = TRUE,
nperm = 500,
no_cores = 1,
alt = "greater"
)
Arguments
X1 |
is interpreted as a |
X2 |
is interpreted as a |
Y |
|
dist.xfm |
if |
dist.params |
a list of trailing arguments to pass to the distance function specified in |
dist.return |
the return argument for the specified
|
remove.isolates |
remove isolated samples from the dataset. Isolated samples are samples with only
one instance of their class appearing in the |
nperm |
the number of permutations for permutation test. Defaults to 500. |
no_cores |
the number of cores to use for the permutations. Defaults to 1. |
alt |
the alternative hypothesis. Can be that first dataset is more discriminable ( |
Value
A list containing the following:
stat |
the observed test statistic. The test statistic is the difference in discriminability of X1 vs X2. |
discr |
the discriminabilities for each of the two data sets, as a list. |
null |
the null distribution of the test statistic, computed via permutation. |
p.value |
The p-value associated with the test. |
alt |
The alternative hypothesis for the test. |
Details
A function that performs a two-sample test for whether the discriminability of one dataset differs from that of
another, as described in Bridgeford et al. (2019). With \hat D_{X_1} the sample discriminability of one approach,
and \hat D_{X_2} the sample discriminability of another approach:
H_0: D_{X_1} = D_{X_2}
and:
H_A: D_{X_1} > D_{X_2}.
Also implemented are tests of < and \neq.
Author(s)
Eric Bridgeford
References
Eric W. Bridgeford, et al. "Optimal Decisions for Reference Pipelines and Datasets: Applications in Connectomics." bioRxiv (2019).
Examples
## Not run:
require(mgc)
require(MASS)
n = 100; d=5
# generate two subjects truths; true difference btwn
# subject 1 (column 1) and subject 2 (column 2)
mus <- cbind(c(0, 0), c(1, 1))
Sigma <- diag(2) # dimensions are independent
# first dataset X1 contains less noise than X2
X1 <- do.call(rbind, lapply(1:dim(mus)[2],
function(k) {mvrnorm(n=50, mus[,k], 0.5*Sigma)}))
X2 <- do.call(rbind, lapply(1:dim(mus)[2],
function(k) {mvrnorm(n=50, mus[,k], 2*Sigma)}))
Y <- do.call(c, lapply(1:2, function(i) rep(i, 50)))
# X1 should be more discriminable, as less noise
discr.test.two_sample(X1, X2, Y, alt="greater")$p.value # p-value is small
## End(Not run)
Discriminability Utility Validator
Description
A script that validates that data inputs are correct, and returns a distance matrix and an ids vector.
Usage
discr.validator(
X,
Y,
is.dist = FALSE,
dist.xfm = mgc.distance,
dist.params = list(method = "euclidean"),
dist.return = NULL,
remove.isolates = TRUE
)
Arguments
X |
is interpreted as:
|
Y |
is interpreted as:
|
is.dist |
a boolean indicating whether your |
dist.xfm |
if |
dist.params |
a list of trailing arguments to pass to the distance function specified in |
dist.return |
the return argument for the specified
|
remove.isolates |
whether to remove isolated samples, or samples with only a single instance in the |
Value
A list containing the following:
DX |
The X distance matrix, as a |
Y |
The sample ids, with isolates removed. |
A helper function to generate a d-dimensional linear transformation matrix.
Description
A helper function to generate a d-dimensional linear transformation matrix.
Usage
gen.coefs(d)
Arguments
d |
the number of dimensions. |
Value
A [d] coefficient vector.
Author(s)
Eric Bridgeford
A helper function for simulating sample labels
Description
A helper function for simulating sample labels
Usage
gen.sample.labels(K, class.equal = TRUE)
Arguments
K |
the number of classes |
class.equal |
whether the number of samples/class should be equal, with each
class having a prior of 1/K, or unequal, in which each class obtains a prior
of k/sum(K) for k=1:K. Defaults to TRUE. |
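The two prior schemes described for class.equal can be written out in a few lines of base R. This sketch reads "k/sum(K)" as k divided by 1 + 2 + ... + K, which is an assumption, and the variable names are hypothetical; it is not the package's implementation of gen.sample.labels.

```r
K <- 4

# Equal priors: every class has probability 1/K.
equal_priors <- rep(1 / K, K)

# Unequal priors: class k has probability k / (1 + 2 + ... + K).
# Reading "k/sum(K)" this way is an assumption.
unequal_priors <- (1:K) / sum(1:K)
unequal_priors  # 0.1 0.2 0.3 0.4

# Draw 20 illustrative labels under the unequal priors.
set.seed(1)
labels <- sample(1:K, size = 20, replace = TRUE, prob = unequal_priors)
```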
A helper function to generate n samples of a d-dimensional uniform vector.
Description
A helper function to generate n samples of a d-dimensional uniform vector.
Usage
gen.x.unif(n, d, a = -1, b = 1)
Arguments
n |
the number of samples. |
d |
the number of dimensions. |
a |
the lower limit. |
b |
the upper limit. |
x |
|
Author(s)
Eric Bridgeford
Distance Matrix Validator
Description
A utility to validate a distance matrix.
Usage
mgc.dist.validator(
X,
is.dist = FALSE,
dist.xfm = mgc.distance,
dist.params = list(method = "euclidean"),
dist.return = NULL
)
Arguments
X |
is interpreted as:
|
is.dist |
a boolean indicating whether your |
dist.xfm |
if |
dist.params |
a list of trailing arguments to pass to the distance function specified in |
dist.return |
the return argument for the specified
|
Value
A distance matrix.
Author(s)
Eric Bridgeford
MGC Distance Transform
Description
Transform the distance matrices, with column-wise ranking if needed.
Usage
mgc.dist.xfm(X, Y, option = "mgc", optionRk = TRUE)
Arguments
X |
|
Y |
|
option |
is a string that specifies which global correlation to build upon. Defaults to "mgc". |
optionRk |
is a string that specifies whether ranking within column is computed or not. If |
Value
A list containing the following:
A |
|
B |
|
RX |
|
RY |
|
Author(s)
C. Shen
Examples
library(mgc)
n=200; d=2
data <- mgc.sims.linear(n, d)
Dx <- as.matrix(dist(data$X), nrow=n); Dy <- as.matrix(dist(data$Y), nrow=n)
dt <- mgc.dist.xfm(Dx, Dy)
Distance
Description
A function that returns a distance matrix given a collection of observations.
Usage
mgc.distance(X, method = "euclidean")
Arguments
X |
|
method |
the method for computing distances. Defaults to "euclidean". |
Value
a [n x n]
distance matrix indicating the pairwise distances between all samples passed in.
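As an illustration of what an [n x n] pairwise distance matrix looks like, the base-R sketch below builds one with stats::dist. Whether mgc.distance wraps stats::dist internally is an assumption; the snippet only demonstrates the shape and properties of the result.

```r
set.seed(1)
X <- matrix(rnorm(5 * 3), nrow = 5)             # 5 samples in 3 dimensions

# Pairwise Euclidean distances between rows, expanded to a full matrix.
D <- as.matrix(dist(X, method = "euclidean"))

dim(D)              # 5 5
all(diag(D) == 0)   # TRUE: each sample is at distance 0 from itself
all(D == t(D))      # TRUE: distances are symmetric
```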
Author(s)
Eric Bridgeford
MGC K Sample Testing
Description
MGC K Sample Testing provides a wrapper for MGC Sample testing under the constraint that the Ys here are categorical labels with K possible sample ids. This function uses a 0-1 loss for the Ys (one-hot encoding).
Usage
mgc.ksample(X, Y, mgc.opts = list(), ...)
Arguments
X |
is interpreted as:
|
Y |
|
mgc.opts |
Arguments to pass to MGC, as a named list. See |
... |
trailing args. |
Value
A list containing the following:
p.value |
P-value of MGC |
stat |
is the sample MGC statistic within |
pLocalCorr |
P-value of the local correlations by double matrix index |
localCorr |
the local correlations |
optimalScale |
the optimal scale identified by MGC |
Author(s)
Eric Bridgeford
References
Youjin Lee, et al. "Network Dependence Testing via Diffusion Maps and Distance-Based Correlations." ArXiv (2019).
Examples
## Not run:
library(mgc)
library(MASS)
n = 100; d = 2
# simulate 100 samples, where first 50 have mean [0,0] and second 50 have mean [1,1]
Y <- c(replicate(n/2, 0), replicate(n/2, 1))
X <- do.call(rbind, lapply(Y, function(y) {
return(rnorm(d) + y)
}))
# p value is small
mgc.ksample(X, Y, mgc.opts=list(nperm=100))$p.value
## End(Not run)
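The one-hot encoding mentioned in the description can be sketched in base R as follows. With this encoding, the Euclidean distance between two label rows is 0 when the labels match and a constant (sqrt(2)) when they differ, which behaves like a 0-1 loss up to scaling. The variable names are hypothetical, and this is not the package's internal code.

```r
Y <- c(0, 0, 1, 1, 2)
classes <- sort(unique(Y))

# n x K indicator matrix: row i has a single 1 in the column of its class.
Yh <- outer(Y, classes, "==") * 1

# Pairwise distances between one-hot rows.
DY <- as.matrix(dist(Yh))
DY[1, 2]  # 0: same label
DY[1, 3]  # sqrt(2), about 1.414: different labels
```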
MGC Local Correlations
Description
Compute all local correlation coefficients in O(n^2 log n)
Usage
mgc.localcorr(
X,
Y,
is.dist.X = FALSE,
dist.xfm.X = mgc.distance,
dist.params.X = list(method = "euclidean"),
dist.return.X = NULL,
is.dist.Y = FALSE,
dist.xfm.Y = mgc.distance,
dist.params.Y = list(method = "euclidean"),
dist.return.Y = NULL,
option = "mgc"
)
Arguments
X |
is interpreted as:
|
Y |
is interpreted as:
|
is.dist.X |
a boolean indicating whether your |
dist.xfm.X |
if |
dist.params.X |
a list of trailing arguments to pass to the distance function specified in |
dist.return.X |
the return argument for the specified
|
is.dist.Y |
a boolean indicating whether your |
dist.xfm.Y |
if |
dist.params.Y |
a list of trailing arguments to pass to the distance function specified in |
dist.return.Y |
the return argument for the specified
|
option |
is a string that specifies which global correlation to build upon. Defaults to "mgc". |
Value
A list containing the following:
corr |
consists of all local correlations within [-1,1] by double matrix index |
varX |
contains all local variances for X. |
varY |
contains all local variances for Y. |
Author(s)
C. Shen
Examples
library(mgc)
n=200; d=2
data <- mgc.sims.linear(n, d)
lcor <- mgc.localcorr(data$X, data$Y)
Driver for MGC Local Correlations
Description
Driver for MGC Local Correlations
Usage
mgc.localcorr.driver(DX, DY, option = "mgc")
Arguments
DX |
the first distance matrix. |
DY |
the second distance matrix. |
option |
is a string that specifies which global correlation to build upon. Defaults to "mgc". |
Value
A list containing the following:
corr |
consists of all local correlations within [-1,1] by double matrix index |
varX |
contains all local variances for X. |
varY |
contains all local variances for Y. |
Author(s)
C. Shen
Sample from Unit 2-Ball
Description
Sample from the 2-ball in d-dimensions.
Usage
mgc.sims.2ball(n, d, r = 1, cov.scale = 0)
Arguments
n |
the number of samples. |
d |
the number of dimensions. |
r |
the radius of the 2-ball. Defaults to 1. |
cov.scale |
if desired, sample from 2-ball with error sigma. Defaults to 0. |
Value
the points sampled from the ball, as a [n, d]
array.
Author(s)
Eric Bridgeford
Examples
library(mgc)
# sample 100 points from 3-d 2-ball with radius 2
X <- mgc.sims.2ball(100, 3, 2)
Sample from Unit 2-Sphere
Description
Sample from the 2-sphere in d-dimensions.
Usage
mgc.sims.2sphere(n, d, r, cov.scale = 0)
Arguments
n |
the number of samples. |
d |
the number of dimensions. |
r |
the radius of the 2-sphere. |
cov.scale |
if desired, sample from 2-sphere with error sigma. Defaults to 0. |
Value
the points sampled from the sphere, as a [n, d]
array.
Author(s)
Eric Bridgeford
Examples
library(mgc)
# sample 100 points from 3-d 2-sphere with radius 2
X <- mgc.sims.2sphere(100, 3, 2)
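One common way to draw points on a sphere of radius r in d dimensions is to normalise Gaussian vectors. The base-R sketch below shows that technique, reading "2-sphere" as the sphere under the Euclidean (L2) norm, which is an assumption. The helper sample_sphere_sketch is hypothetical and is not claimed to be how mgc.sims.2sphere works.

```r
# Hypothetical helper (not part of mgc): n points on the radius-r sphere in R^d.
sample_sphere_sketch <- function(n, d, r = 1) {
  Z <- matrix(rnorm(n * d), nrow = n)   # isotropic Gaussian directions
  r * Z / sqrt(rowSums(Z^2))            # rescale every row to Euclidean length r
}

set.seed(1)
X <- sample_sphere_sketch(100, 3, r = 2)
dim(X)                      # 100 3
range(sqrt(rowSums(X^2)))   # both ends equal 2, up to floating-point error
```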
Cubic Simulation
Description
A function for generating a cubic simulation.
Usage
mgc.sims.cubic(
n,
d,
eps = 80,
ind = FALSE,
a = -1,
b = 1,
c.coef = c(-12, 48, 128),
s = 1/3
)
Arguments
n |
the number of samples for the simulation. |
d |
the number of dimensions for the simulation setting. |
eps |
the noise level for the simulation. Defaults to 80. |
ind |
whether to sample x and y independently. Defaults to FALSE. |
a |
the lower limit for the range of the data matrix. Defaults to -1. |
b |
the upper limit for the range of the data matrix. Defaults to 1. |
c.coef |
the coefficients for the cubic function, where the first value is the first order coefficient, the second value the quadratic coefficient, and the third the cubic coefficient. Defaults to c(-12, 48, 128). |
s |
the scaling for the center of the cubic. Defaults to 1/3. |
Value
a list containing the following:
X |
|
Y |
|
Details
Given: w_i = \frac{1}{i} is a weight-vector that scales with the dimensionality.
Simulates n points from Cubic(X, Y) \in \mathbf{R}^d \times \mathbf{R}, where:
X \sim {U}(a, b)^d,
Y = c_3\left(w^TX - s\right)^3 + c_2\left(w^TX - s\right)^2 + c_1\left(w^TX - s\right) + \kappa \epsilon
and \kappa = 1\textrm{ if }d = 1, \textrm{ and 0 otherwise} controls the noise for higher dimensions.
Author(s)
Eric Bridgeford
Examples
library(mgc)
result <- mgc.sims.cubic(n=100, d=10) # simulate 100 samples in 10 dimensions
X <- result$X; Y <- result$Y
Exponential Simulation
Description
A function for generating an exponential simulation.
Usage
mgc.sims.exp(n, d, eps = 10, ind = FALSE, a = 0, b = 3)
Arguments
n |
the number of samples for the simulation. |
d |
the number of dimensions for the simulation setting. |
eps |
the noise level for the simulation. Defaults to 10. |
ind |
whether to sample x and y independently. Defaults to FALSE. |
a |
the lower limit for the range of the data matrix. Defaults to 0. |
b |
the upper limit for the range of the data matrix. Defaults to 3. |
Value
a list containing the following:
X |
|
Y |
|
Details
Given: w_i = \frac{1}{i} is a weight-vector that scales with the dimensionality.
Simulates n points from Exponential(X, Y) \in \mathbf{R}^d \times \mathbf{R}, where:
X \sim {U}(a, b)^d,
Y = e^{w^TX} + \kappa \epsilon
and \kappa = 1\textrm{ if }d = 1, \textrm{ and 0 otherwise} controls the noise for higher dimensions.
Author(s)
Eric Bridgeford
Examples
library(mgc)
result <- mgc.sims.exp(n=100, d=10) # simulate 100 samples in 10 dimensions
X <- result$X; Y <- result$Y
Joint Normal Simulation
Description
A function for generating a joint-normal simulation.
Usage
mgc.sims.joint(n, d, eps = 0.5)
Arguments
n |
the number of samples for the simulation. |
d |
the number of dimensions for the simulation setting. |
eps |
the noise level for the simulation. Defaults to 0.5. |
Value
a list containing the following:
X |
|
Y |
|
Details
Given: \rho = \frac{1}{2}d, I_d is the identity matrix of size d \times d, J_d is the matrix of ones of size d \times d.
Simulates n points from Joint-Normal(X, Y) \in \mathbf{R}^d \times \mathbf{R}^d, where:
(X, Y) \sim {N}(0, \Sigma),
\Sigma = \left[I_d, \rho J_d; \rho J_d , (1 + \epsilon\kappa)I_d\right]
and \kappa = 1\textrm{ if }d = 1, \textrm{ and 0 otherwise} controls the noise for higher dimensions.
For more details see the help vignette:
vignette("sims", package = "mgc")
Author(s)
Eric Bridgeford
Examples
library(mgc)
result <- mgc.sims.joint(n=100, d=10) # simulate 100 samples in 10 dimensions
X <- result$X; Y <- result$Y
Linear Simulation
Description
A function for generating a linear simulation.
Usage
mgc.sims.linear(n, d, eps = 1, ind = FALSE, a = -1, b = 1)
Arguments
n |
the number of samples for the simulation. |
d |
the number of dimensions for the simulation setting. |
eps |
the noise level for the simulation. Defaults to 1. |
ind |
whether to sample x and y independently. Defaults to FALSE. |
a |
the lower limit for the range of the data matrix. Defaults to -1. |
b |
the upper limit for the range of the data matrix. Defaults to 1. |
Value
a list containing the following:
X |
|
Y |
|
Details
Given: w_i = \frac{1}{i} is a weight-vector that scales with the dimensionality.
Simulates n points from Linear(X, Y) \in \mathbf{R}^d \times \mathbf{R}, where:
X \sim {U}(a, b)^d,
Y = w^TX + \kappa \epsilon
and \kappa = 1\textrm{ if }d = 1, \textrm{ and 0 otherwise} controls the noise for higher dimensions.
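The generative model in the Details above can be sketched directly in base R. The function sim_linear_sketch is hypothetical, and treating the noise term as eps times a standard normal draw is an assumption (the Details only write \kappa \epsilon); this is a reading of the formulas, not the package's implementation.

```r
# Hypothetical sketch of the linear model described in the Details.
sim_linear_sketch <- function(n, d, eps = 1, a = -1, b = 1) {
  X <- matrix(runif(n * d, min = a, max = b), nrow = n)  # X ~ U(a, b)^d
  w <- 1 / seq_len(d)                                    # w_i = 1/i
  kappa <- if (d == 1) 1 else 0                          # noise only when d = 1
  # Gaussian noise scaled by eps is an assumption.
  Y <- X %*% w + kappa * eps * rnorm(n)
  list(X = X, Y = Y)
}

set.seed(1)
res <- sim_linear_sketch(n = 100, d = 10)
dim(res$X)  # 100 10
dim(res$Y)  # 100 1
```

With d = 10 the noise multiplier kappa is 0, so Y is exactly the weighted sum of the columns of X.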
Author(s)
Eric Bridgeford
Examples
library(mgc)
result <- mgc.sims.linear(n=100, d=10) # simulate 100 samples in 10 dimensions
X <- result$X; Y <- result$Y
Quadratic Simulation
Description
A function for generating a quadratic simulation.
Usage
mgc.sims.quad(n, d, eps = 0.5, ind = FALSE, a = -1, b = 1)
Arguments
n |
the number of samples for the simulation. |
d |
the number of dimensions for the simulation setting. |
eps |
the noise level for the simulation. Defaults to 0.5. |
ind |
whether to sample x and y independently. Defaults to FALSE. |
a |
the lower limit for the data matrix. Defaults to -1. |
b |
the upper limit for the data matrix. Defaults to 1. |
Value
a list containing the following:
X |
|
Y |
|
Details
Given: w_i = \frac{1}{i} is a weight-vector that scales with the dimensionality.
Simulates n points from Quadratic(X, Y) \in \mathbf{R}^d \times \mathbf{R}, where:
X \sim {U}(a, b)^d,
Y = (w^TX)^2 + \kappa\epsilon N(0, 1)
and \kappa = 1\textrm{ if }d = 1, \textrm{ and 0 otherwise} controls the noise for higher dimensions.
For more details see the help vignette:
vignette("sims", package = "mgc")
Author(s)
Eric Bridgeford
Examples
library(mgc)
result <- mgc.sims.quad(n=100, d=10) # simulate 100 samples in 10 dimensions
X <- result$X; Y <- result$Y
Random Rotation
Description
A helper function for applying a random rotation to a Gaussian parameter set.
Usage
mgc.sims.random_rotate(mus, Sigmas, Q = NULL)
Arguments
mus |
means per class. |
Sigmas |
covariances per class. |
Q |
rotation to use, if any |
Author(s)
Eric Bridgeford
Sample Random Rotation
Description
A helper function for estimating a random rotation matrix.
Usage
mgc.sims.rotation(d)
Arguments
d |
dimensions to generate a rotation matrix for. |
Value
the rotation matrix
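One standard way to obtain a random d x d rotation is to take the orthogonal factor of a QR decomposition of a Gaussian matrix and flip a column if the determinant is negative. The base-R sketch below shows that technique; the helper random_rotation_sketch is hypothetical, and the source does not say which construction mgc.sims.rotation uses.

```r
# Hypothetical helper (not part of mgc): random orthogonal matrix with det = +1.
random_rotation_sketch <- function(d) {
  Q <- qr.Q(qr(matrix(rnorm(d * d), nrow = d)))  # orthonormal columns
  if (det(Q) < 0) Q[, 1] <- -Q[, 1]              # flip one column so det = +1
  Q
}

set.seed(1)
Q <- random_rotation_sketch(3)
round(crossprod(Q), 10)  # identity matrix: columns are orthonormal
det(Q)                   # 1, up to floating-point error
```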
Author(s)
Eric Bridgeford
GMM Simulate
Description
A helper function for simulating from Gaussian Mixture.
Usage
mgc.sims.sim_gmm(mus, Sigmas, n, priors)
Arguments
mus |
|
Sigmas |
|
n |
the number of examples. |
priors |
|
Value
A list with the following:
X |
|
Y |
|
priors |
|
Author(s)
Eric Bridgeford
Spiral Simulation
Description
A function for generating a spiral simulation.
Usage
mgc.sims.spiral(n, d, eps = 0.4, a = 0, b = 5)
Arguments
n |
the number of samples for the simulation. |
d |
the number of dimensions for the simulation setting. |
eps |
the noise level for the simulation. Defaults to 0.4. |
a |
the lower limit for the data matrix. Defaults to 0. |
b |
the upper limit for the data matrix. Defaults to 5. |
Value
a list containing the following:
X |
|
Y |
|
Details
Given: U \sim U(a, b) a random variable.
Simulates n points from Spiral(X, Y) \in \mathbf{R}^d \times \mathbf{R}, where:
X_i = U\, \textrm{cos}(\pi\, U)^d if i = d, and U\, \textrm{sin}(\pi U)\textrm{cos}^i(\pi U) otherwise,
Y = U\, \textrm{sin}(\pi\, U) + \epsilon p N(0, 1)
For more details see the help vignette:
vignette("sims", package = "mgc")
Author(s)
Eric Bridgeford
Examples
library(mgc)
result <- mgc.sims.spiral(n=100, d=10) # simulate 100 samples in 10 dimensions
X <- result$X; Y <- result$Y
Step Function Simulation
Description
A function for generating a step function simulation.
Usage
mgc.sims.step(n, d, eps = 1, ind = FALSE, a = -1, b = 1)
Arguments
n |
the number of samples for the simulation. |
d |
the number of dimensions for the simulation setting. |
eps |
the noise level for the simulation. Defaults to 1. |
ind |
whether to sample x and y independently. Defaults to FALSE. |
a |
the lower limit for the data matrix. Defaults to -1. |
b |
the upper limit for the data matrix. Defaults to 1. |
Value
a list containing the following:
X |
|
Y |
|
Details
Given: w_i = \frac{1}{i} is a weight-vector that scales with the dimensionality.
Simulates n points from Step(X, Y) \in \mathbf{R}^d\times \mathbf{R}, where:
X \sim {U}\left(a, b\right)^d,
Y = \mathbf{I}\left\{w^TX > 0\right\} + \kappa \epsilon N(0, 1)
and \kappa = 1\textrm{ if }d = 1, \textrm{ and 0 otherwise} controls the noise for higher dimensions.
For more details see the help vignette:
vignette("sims", package = "mgc")
Author(s)
Eric Bridgeford
Examples
library(mgc)
result <- mgc.sims.step(n=100, d=10) # simulate 100 samples in 10 dimensions
X <- result$X; Y <- result$Y
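To make the Details formulas concrete, the step model can be sketched in base R. This is a minimal illustration of the stated equations, not the package's implementation. The name `sim_step_sketch` is hypothetical, and the `ind` option is deliberately left out.

```r
# Base-R sketch of the Step(X, Y) model from the Details section.
# sim_step_sketch is a hypothetical name, not part of the mgc package.
sim_step_sketch <- function(n, d, eps = 1, a = -1, b = 1) {
  # X ~ U(a, b)^d, stored as an n x d matrix
  X <- matrix(runif(n * d, min = a, max = b), nrow = n, ncol = d)
  w <- 1 / seq_len(d)                  # w_i = 1 / i
  kappa <- if (d == 1) 1 else 0        # kappa as stated in Details
  # indicator of w^T X > 0, plus noise scaled by kappa * eps
  Y <- as.numeric(X %*% w > 0) + kappa * eps * rnorm(n)
  list(X = X, Y = Y)
}

set.seed(1)
res <- sim_step_sketch(n = 100, d = 10)
table(res$Y)   # for d > 1, kappa = 0, so Y takes only the values 0 and 1
```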
Uncorrelated Bernoulli Simulation
Description
A function for generating an uncorrelated Bernoulli simulation.
Usage
mgc.sims.ubern(n, d, eps = 0.5, p = 0.5)
Arguments
n |
the number of samples for the simulation. |
d |
the number of dimensions for the simulation setting. |
eps |
the noise level for the simulation. Defaults to 0.5. |
p |
the Bernoulli probability. Defaults to 0.5. |
Value
a list containing the following:
X |
|
Y |
|
Details
Given w_i = \frac{1}{i}, a weight vector that scales with the dimensionality, simulates n points from UBern(X, Y) \in \mathbf{R}^d \times \mathbf{R}, where:
U \sim Bern(p),
X \sim Bern\left(p\right)^d + \epsilon N(0, I_d),
Y = (2U - 1)w^TX + \epsilon N(0, 1)
For more details see the help vignette:
vignette("sims", package = "mgc")
Author(s)
Eric Bridgeford
Examples
library(mgc)
result <- mgc.sims.ubern(n=100, d=10) # simulate 100 samples in 10 dimensions
X <- result$X; Y <- result$Y
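The uncorrelated Bernoulli model in the Details section can likewise be sketched in base R. This is an illustration of the stated equations only. The name `sim_ubern_sketch` is hypothetical, and the package's own function may differ in detail.

```r
# Base-R sketch of the uncorrelated Bernoulli model from the Details section.
# sim_ubern_sketch is a hypothetical name, not part of the mgc package.
sim_ubern_sketch <- function(n, d, eps = 0.5, p = 0.5) {
  U <- rbinom(n, size = 1, prob = p)                      # U ~ Bern(p)
  # X ~ Bern(p)^d + eps * N(0, I_d), stored as an n x d matrix
  X <- matrix(rbinom(n * d, size = 1, prob = p), nrow = n, ncol = d) +
    eps * matrix(rnorm(n * d), nrow = n, ncol = d)
  w <- 1 / seq_len(d)                                     # w_i = 1 / i
  # the random sign (2U - 1) flips the direction of the linear relationship
  Y <- (2 * U - 1) * as.numeric(X %*% w) + eps * rnorm(n)
  list(X = X, Y = Y)
}

set.seed(1)
res <- sim_ubern_sketch(n = 100, d = 10)
dim(res$X)
```

Because the sign of the relationship is flipped at random, X and Y are dependent while a linear correlation measure is expected to be small, which is the point of this setting.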
W Shaped Simulation
Description
A function for generating a W-shaped simulation.
Usage
mgc.sims.wshape(n, d, eps = 0.5, ind = FALSE, a = -1, b = 1)
Arguments
n |
the number of samples for the simulation. |
d |
the number of dimensions for the simulation setting. |
eps |
the noise level for the simulation. Defaults to 0.5. |
ind |
whether to sample x and y independently. Defaults to FALSE. |
a |
the lower limit for the data matrix. Defaults to -1. |
b |
the upper limit for the data matrix. Defaults to 1. |
Value
a list containing the following:
X |
|
Y |
|
Details
Given w_i = \frac{1}{i}, a weight vector that scales with the dimensionality, simulates n points from W-shape(X, Y) \in \mathbf{R}^d \times \mathbf{R}, where:
U \sim {U}(a, b)^d,
X \sim {U}(a, b)^d,
Y = \left[\left((w^TX)^2 - \frac{1}{2}\right)^2 + \frac{w^TU}{500}\right] + \kappa \epsilon N(0, 1),
and \kappa = 1\textrm{ if }d = 1, \textrm{ and 0 otherwise}, controls the noise for higher dimensions.
For more details see the help vignette:
vignette("sims", package = "mgc")
Author(s)
Eric Bridgeford
Examples
library(mgc)
result <- mgc.sims.wshape(n=100, d=10) # simulate 100 samples in 10 dimensions
X <- result$X; Y <- result$Y
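The W-shaped model in the Details section can be written out in base R as follows. This is a sketch of the stated equations rather than the package's implementation. The name `sim_wshape_sketch` is hypothetical, and the `ind` option is omitted.

```r
# Base-R sketch of the W-shape(X, Y) model from the Details section.
# sim_wshape_sketch is a hypothetical name, not part of the mgc package.
sim_wshape_sketch <- function(n, d, eps = 0.5, a = -1, b = 1) {
  U <- matrix(runif(n * d, min = a, max = b), nrow = n, ncol = d)
  X <- matrix(runif(n * d, min = a, max = b), nrow = n, ncol = d)
  w <- 1 / seq_len(d)                   # w_i = 1 / i
  kappa <- if (d == 1) 1 else 0         # kappa as stated in Details
  wx <- as.numeric(X %*% w)             # w^T X for each sample
  wu <- as.numeric(U %*% w)             # w^T U for each sample
  Y <- ((wx^2 - 0.5)^2 + wu / 500) + kappa * eps * rnorm(n)
  list(X = X, Y = Y)
}

set.seed(1)
res <- sim_wshape_sketch(n = 100, d = 10)
dim(res$X)
```

The quartic term `(wx^2 - 0.5)^2` has minima at `wx = ±sqrt(0.5)` and a local maximum at `wx = 0`, which produces the W shape. The `wu / 500` term adds only a small perturbation.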
MGC Test
Description
The main function that computes the MGC measure between two datasets. It first computes all local correlations, then uses the maximal statistic among all local correlations based on thresholding.
Usage
mgc.stat(
X,
Y,
is.dist.X = FALSE,
dist.xfm.X = mgc.distance,
dist.params.X = list(method = "euclidean"),
dist.return.X = NULL,
is.dist.Y = FALSE,
dist.xfm.Y = mgc.distance,
dist.params.Y = list(method = "euclidean"),
dist.return.Y = NULL,
option = "mgc"
)
Arguments
X |
is interpreted as:
|
Y |
is interpreted as:
|
is.dist.X |
a boolean indicating whether your |
dist.xfm.X |
if |
dist.params.X |
a list of trailing arguments to pass to the distance function specified in |
dist.return.X |
the return argument for the specified
|
is.dist.Y |
a boolean indicating whether your |
dist.xfm.Y |
if |
dist.params.Y |
a list of trailing arguments to pass to the distance function specified in |
dist.return.Y |
the return argument for the specified
|
option |
is a string that specifies which global correlation to build upon. Defaults to "mgc". |
Value
A list containing the following:
stat |
is the sample MGC statistic within |
localCorr |
the local correlations |
optimalScale |
the optimal scale identified by MGC |
option |
specifies which global correlation was used |
Author(s)
C. Shen and Eric Bridgeford
References
Joshua T. Vogelstein, et al. "Discovering and deciphering relationships across disparate data modalities." eLife (2019).
Examples
library(mgc)
n=200; d=2
data <- mgc.sims.linear(n, d)
mgc.stat.res <- mgc.stat(data$X, data$Y)
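The `is.dist.X` and `is.dist.Y` arguments indicate that inputs may also be supplied as distance matrices. The following base-R sketch shows one way to build Euclidean distance matrices with `stats::dist`. Whether this matches the output of `mgc.distance` exactly is an assumption, and the final call is shown only as a comment because it requires the mgc package.

```r
# Sketch: precomputing pairwise Euclidean distance matrices in base R.
set.seed(1)
n <- 50; d <- 2
X <- matrix(rnorm(n * d), nrow = n, ncol = d)   # n samples in d dimensions
Y <- X %*% c(1, 0.5) + 0.1 * rnorm(n)           # n x 1 noisy linear response

# dist() returns a compact "dist" object; as.matrix() expands it to n x n
DX <- as.matrix(dist(X, method = "euclidean"))
DY <- as.matrix(dist(Y, method = "euclidean"))

dim(DX)   # n x n, symmetric, zero diagonal

# Matrices of this form could then be passed on, for example:
# mgc.stat(DX, DY, is.dist.X = TRUE, is.dist.Y = TRUE)   # not run here
```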
MGC Sample Statistic Internal Driver
Description
MGC Sample Statistic Internal Driver
Usage
mgc.stat.driver(DX, DY, option = "mgc")
Arguments
DX |
the first distance matrix. |
DY |
the second distance matrix. |
option |
is a string that specifies which global correlation to build upon. Defaults to "mgc". |
MGC Permutation Test
Description
Test of Dependence using MGC Approach.
Usage
mgc.test(
X,
Y,
is.dist.X = FALSE,
dist.xfm.X = mgc.distance,
dist.params.X = list(method = "euclidean"),
dist.return.X = NULL,
is.dist.Y = FALSE,
dist.xfm.Y = mgc.distance,
dist.params.Y = list(method = "euclidean"),
dist.return.Y = NULL,
nperm = 1000,
option = "mgc",
no_cores = 1
)
Arguments
X |
is interpreted as:
|
Y |
is interpreted as:
|
is.dist.X |
a boolean indicating whether your |
dist.xfm.X |
if |
dist.params.X |
a list of trailing arguments to pass to the distance function specified in |
dist.return.X |
the return argument for the specified
|
is.dist.Y |
a boolean indicating whether your |
dist.xfm.Y |
if |
dist.params.Y |
a list of trailing arguments to pass to the distance function specified in |
dist.return.Y |
the return argument for the specified
|
nperm |
specifies the number of replicates to use for the permutation test. Defaults to 1000. |
option |
is a string that specifies which global correlation to build upon. Defaults to "mgc". |
no_cores |
the number of cores to use for the permutations. Defaults to 1. |
Value
A list containing the following:
p.value |
P-value of MGC |
stat |
is the sample MGC statistic within |
p.localCorr |
P-value of the local correlations by double matrix index. |
localCorr |
the local correlations |
optimalScale |
the optimal scale identified by MGC |
option |
specifies which global correlation was used |
Details
A test of independence using the MGC approach, described in Vogelstein et al. (2019). For X \sim F_X and Y \sim F_Y with joint distribution (X, Y) \sim F_{XY}, the hypotheses are:
H_0: F_{XY} = F_X F_Y
and:
H_A: F_{XY} \neq F_X F_Y
Note that one should avoid reporting a positive discovery by minimizing the individual p-values of local correlations, unless these are corrected for multiple hypotheses.
For details on usage see the help vignette:
vignette("mgc", package = "mgc")
Author(s)
Eric Bridgeford and C. Shen
References
Joshua T. Vogelstein, et al. "Discovering and deciphering relationships across disparate data modalities." eLife (2019).
Examples
## Not run:
library(mgc)
n = 100; d = 2
data <- mgc.sims.linear(n, d)
# note: on real data, one would put nperm much higher (at least 100)
# nperm is set to 10 merely for demonstration purposes
result <- mgc.test(data$X, data$Y, nperm=10)
## End(Not run)
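The role of `nperm` can be illustrated with a generic permutation test in base R. This sketch is not the package's implementation: `perm_test_sketch` is a hypothetical helper, the absolute Pearson correlation stands in for the MGC statistic, and the add-one correction in the p-value is a common convention that the package may or may not use.

```r
# Generic permutation test of independence (illustrative sketch).
# perm_test_sketch is a hypothetical helper, not part of the mgc package.
# stat_fn is any function(x, y) returning a scalar dependence statistic.
perm_test_sketch <- function(x, y, stat_fn, nperm = 1000) {
  observed <- stat_fn(x, y)
  n <- length(y)
  # permuting y breaks any dependence on x, giving draws from the null
  null_stats <- vapply(seq_len(nperm), function(r) {
    stat_fn(x, y[sample.int(n)])
  }, numeric(1))
  # ASSUMPTION: add-one correction, so the p-value is never exactly zero
  p_value <- (sum(null_stats >= observed) + 1) / (nperm + 1)
  list(stat = observed, p.value = p_value)
}

set.seed(1)
x <- rnorm(100)
y <- x + 0.5 * rnorm(100)
res <- perm_test_sketch(x, y, function(u, v) abs(cor(u, v)), nperm = 200)
res$p.value
```

The smallest p-value such a test can report is `1 / (nperm + 1)`, which is why a small `nperm` such as 10 is only suitable for demonstration.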
MGC Utility Validator
Description
A script that validates that the data inputs are correct, and returns an X distance matrix and a Y distance matrix for MGC.
Usage
mgc.validator(
X,
Y,
is.dist.X = FALSE,
dist.xfm.X = mgc.distance,
dist.params.X = list(method = "euclidean"),
dist.return.X = NULL,
is.dist.Y = FALSE,
dist.xfm.Y = mgc.distance,
dist.params.Y = list(method = "euclidean"),
dist.return.Y = NULL
)
Arguments
X |
is interpreted as:
|
Y |
|
is.dist.X |
a boolean indicating whether your |
dist.xfm.X |
if |
dist.params.X |
a list of trailing arguments to pass to the distance function specified in |
dist.return.X |
the return argument for the specified
|
is.dist.Y |
a boolean indicating whether your |
dist.xfm.Y |
if |
dist.params.Y |
a list of trailing arguments to pass to the distance function specified in |
dist.return.Y |
the return argument for the specified
|
Value
A list containing the following:
D |
The distance matrix, as a |
Y |
the sample ids, as a |
Remove Isolates
Description
A function to remove isolates from a dataset, given a data matrix or a distance matrix.
Usage
remove.isolates(X, Y, is.dist = FALSE)
Arguments
X |
is interpreted as:
|
Y |
|
is.dist |
a boolean indicating whether your |
Author(s)
Eric Bridgeford