Type: | Package |
Title: | Multi-Trait and Multi-Trial Genome Wide Association Studies |
Version: | 1.0.3 |
Date: | 2025-07-23 |
Description: | Fast multi-trait and multi-trail Genome Wide Association Studies (GWAS) following the method described in Zhou and Stephens. (2014), <doi:10.1038/nmeth.2848>. One of a series of statistical genetic packages for streamlining the analysis of typical plant breeding experiments developed by Biometris. |
License: | GPL-3 |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
Depends: | R (≥ 4.0) |
Imports: | data.table, foreach, Rcpp, sommer (≥ 4.2.0), statgenGWAS (≥ 1.0.9) |
Suggests: | covr, knitr, rmarkdown, tinytest |
VignetteBuilder: | knitr |
LinkingTo: | Rcpp, RcppArmadillo |
LazyData: | true |
NeedsCompilation: | yes |
Packaged: | 2025-07-23 08:16:53 UTC; rossu027 |
Author: | Bart-Jan van Rossum
|
Maintainer: | Bart-Jan van Rossum <bart-jan.vanrossum@wur.nl> |
Repository: | CRAN |
Date/Publication: | 2025-07-23 08:50:02 UTC |
statgenQTLxT: Multi-Trait and Multi-Trial Genome Wide Association Studies
Description
Fast multi-trait and multi-trail Genome Wide Association Studies (GWAS) following the method described in Zhou and Stephens. (2014), doi:10.1038/nmeth.2848. One of a series of statistical genetic packages for streamlining the analysis of typical plant breeding experiments developed by Biometris.
Author(s)
Maintainer: Bart-Jan van Rossum bart-jan.vanrossum@wur.nl (ORCID)
Authors:
Willem Kruijer (ORCID)
Other contributors:
Fred van Eeuwijk (ORCID) [contributor]
Martin Boer (ORCID) [contributor]
Daniela Bustos-Korts (ORCID) [contributor]
Emilie J Millet (ORCID) [contributor]
Joao Paulo (ORCID) [contributor]
Maikel Verouden (ORCID) [contributor]
Ron Wehrens (ORCID) [contributor]
Choazhi Zheng (ORCID) [contributor]
Factor analytic variation of EM algoritm
Description
Implementation of the factor analytic variation of the EM algoritm as proposed by Dahl et al. (2013).
Usage
EMFA(
y,
k,
size_param_x = NULL,
cmHet = TRUE,
dmHet = TRUE,
tolerance = 1e-06,
maxIter = 300L,
size_param_cmStart = NULL,
size_param_dmStart = NULL,
mG = 1L,
mE = 1L,
maxDiag = 10000,
stopIfDecreasing = TRUE,
traits = ""
)
Arguments
y |
An n x p matrix of observed phenotypes, on p traits or environments for n individuals. No missing values are allowed. |
k |
An n x n kinship matrix. |
size_param_x |
An n x c covariate matrix, c being the number of covariates and n being the number of genotypes. c has to be at least one (typically an intercept). No missing values are allowed. If not provided a vector of 1s is used. |
cmHet |
Should an extra diagonal part be added in the model for the precision matrix Cm? |
dmHet |
Should an extra diagonal part be added in the model for the precision matrix Dm? |
tolerance |
A numerical value. The iterating process stops if the difference in conditional log-likelihood between two consecutive iterations drops below tolerance. |
maxIter |
A numerical value for the maximum number of iterations. |
size_param_cmStart |
A p x p matrix containing starting values for the precision matrix Cm. |
size_param_dmStart |
A p x p matrix containing starting values for the precision matrix Dm. |
mG |
An integer. The order of the genetic part of the model. |
mE |
An integer. The order of the environmental part of the model. |
maxDiag |
A numical value. The maximal value of the diagonal elements in the precision matrices Cm and Dm (ignoring the low-rank part W W^t) |
stopIfDecreasing |
Should the iterating process stop if after 50 iterations the log-likelihood decreases between two consecutive iterations? |
Value
A list containing the following components
Vg
The genetic variance components matrix.Ve
The environmental variance components matrix.
References
Dahl et al. (2013). Network inference in matrix-variate Gaussian models with non-independent noise. arXiv preprint arXiv:1312.1622.
Zhou, X. and Stephens, M. (2014). Efficient multivariate linear mixed model algorithms for genome-wide association studies. Nature Methods, February 2014, Vol. 11, p. 407–409
Compute tYPY as in Zhou and Stephens eqn. 50.
Description
Compute t(y) * P * y
, part of the log-likelihood functions from
equation 26 and 27 in Zhou and Stephens using equation 50. Equation 56, 57
and 58 are used to do the actual computations.
Usage
LLQuadFormDiagCPP(y, vInv, size_param_x = NULL)
Arguments
vInv |
A n x p x p cube containing for each genotype l the
p x p matrix |
size_param_x |
An optional c x n covariate matrix, c being the number of covariates and n being the number of genotypes. c has to be at least one (typically an intercept). No missing values are allowed. |
Details
It is assumed that X and Y have already been rotated by Uk, where Uk is such
that the kinship matrix K equals K = Uk * Dk * t(Uk)
.
The original X and Y are right multiplied by Uk, e.g. Y <- Y * Uk
.
See Zhou and Stephens (2014), supplement.
It is these rotated versions that are the input of this function.
Value
A numerical value for the t(y) * P * y
part of the
log-likelihood function.
References
Zhou, X. and Stephens, M. (2014). Efficient multivariate linear mixed model algorithms for genome-wide association studies. Nature Methods, February 2014, Vol. 11, p. 407–409
Compute unstructured covariance
Description
Compute unstructured covariance pairwise using covPairwise
or using a
single model using covUnstr
.
Usage
covUnstr(Y, K, X = NULL, fixDiag = FALSE, VeDiag = FALSE)
covPW(Y, K, X = NULL, fixDiag = FALSE, corMat = FALSE, parallel = FALSE)
Arguments
Y |
An n x p matrix of observed phenotypes, on p traits or trials for n individuals. No missing values are allowed. |
K |
An n x n kinship matrix. |
X |
An n x c covariate matrix, c being the number of covariates and n being the number of genotypes. |
fixDiag |
Should the diagonal of the covariate matrix be fixed during calculations? – NOT YET IMPLEMENTED |
VeDiag |
Should Ve be a diagonal matrix? |
corMat |
Should the output be a correlation matrix instead of a covariance matrix? |
parallel |
Should the computation of variance components be done in parallel? |
Value
A list of two matrices Vg
and Ve
containing genotypic
and environmental variance components respectively.
Helper function for estimating effect
Description
Function is a wrapper around estEffsCPP, useful for usage in chromosome specific calculations.
Usage
estEffTot(
markers,
X,
Y,
K,
XRed,
Vg,
Ve,
VgRed,
VeRed,
snpCov,
allFreq,
MAF,
estCom,
nCores = NULL
)
Estimates for covariates
Description
Compute the estimates and standard errors for the covariates in the input matrix W.
Usage
estEffsCPP(
y0,
w0,
x0,
vg,
ve,
k,
returnSe = TRUE,
estCom = FALSE,
nCores = NULL
)
Arguments
y0 |
An n x p matrix of observed phenotypes, on p traits or environments for n genotypes. No missing values are allowed. |
w0 |
An n x c covariate matrix, c being the number of covariates and n being the number of genotypes. c has to be at least one (typically an intercept). No missing values are allowed. |
x0 |
An n x ns matrix of marker scores. Neither missing values nor non-segregating markers are allowed. |
vg |
A p x p matrix of genetic covariances. |
ve |
A p x p matrix of environmental covariances. |
k |
An n x n genetic relatedness matrix. |
returnSe |
Should standard errors and p-values be returned? |
estCom |
Should the common SNP-effect model be fitted? |
nCores |
An integer indicating the number of cores used for parallel computation. |
Value
A list containing the estimates, optionally the standard errors of
the estimates and corresponding p-values. If estCom = TRUE
also
common SNP-effects, their standard errors and corresponding p-values and
the p-values for QtlxE are output.
References
Zhou, X. and Stephens, M. (2014). Efficient multivariate linear mixed model algorithms for genome-wide association studies. Nature Methods, February 2014, Vol. 11, p. 407–409
Select markers to be excluded from GWAS scan.
Description
Helper function for selecting markers to be excluded from GWAS scan. Markers are excluded if they are identical to any of the snpCovariates (including the snpCovariates themselves).
Usage
exclMarkers(snpCov, markers, allFreq, ref = NULL)
Arguments
snpCov |
A character vector of snpCovariates. |
markers |
A matrix with marker information. |
allFreq |
A numerical vector of allele frequencies of the markers in
|
Value
A numerical vector of markers to be exluded from the GWAS scan.
Subset of DROPS data for use in examples
Description
A gData
object based on a subset of the DROPS data
set used in the statgenGWAS package. The data is restricted to 3 traits and
10% (just over 4000) of the available markers. For a full description of the
data set see dropsData
.
Usage
gDataDropsRestr
Format
An object of class gData
of length 5.
Source
References
Millet, E. J., Pommier, C., et al. (2019). A multi-site experiment in a network of European fields for assessing the maize yield response to environmental scenarios - Data set. doi:10.15454/IASSTN
Ganal MW, et al. (2011) A Large Maize (Zea mays L.) SNP Genotyping Array: Development and Germplasm Genotyping, and Genetic Mapping to Compare with the B73 Reference Genome. PLoS ONE 6(12): e28334. doi:10.1371/journal.pone.0028334
Perform multi-trait GWAS
Description
runMultiTraitGwas
performs multi-trait or multi-environment Genome
Wide Association mapping on phenotypic and genotypic data contained in a
gData
object.
Usage
runMultiTraitGwas(
gData,
trials = NULL,
traits = NULL,
covar = NULL,
snpCov = NULL,
kin = NULL,
kinshipMethod = c("astle", "IBS", "vanRaden", "identity"),
GLSMethod = c("single", "multi"),
estCom = FALSE,
useMAF = TRUE,
MAF = 0.01,
MAC = 10,
genomicControl = FALSE,
fitVarComp = TRUE,
covModel = c("unst", "pw", "fa"),
VeDiag = TRUE,
maxIter = 2e+05,
mG = 1,
mE = 1,
Vg = NULL,
Ve = NULL,
thrType = c("bonf", "fixed", "small", "fdr"),
alpha = 0.05,
LODThr = 4,
nSnpLOD = 10,
pThr = 0.05,
rho = 0.4,
sizeInclRegion = 0,
minR2 = 0.5,
parallel = FALSE,
nCores = NULL
)
Arguments
gData |
An object of class |
trials |
A vector specifying the environment on which to run GWAS.
This can be either a numeric index or a character name of a list item in
|
traits |
A vector of traits on which to run GWAS. These can be either
numeric indices or character names of columns in |
covar |
An optional vector of covariates taken into account when
running GWAS. These can be either numeric indices or character names of
columns in |
snpCov |
An optional character vector of SNP-names to be included as
covariates. SNP-names should match those used in |
kin |
An optional kinship matrix or list of kinship matrices. These
matrices can be from the |
kinshipMethod |
An optional character indicating the method used for
calculating the kinship matrix(ces). Currently "astle" (Astle and Balding,
2009), "IBS", "vanRaden" (VanRaden, 2008), and "identity" are supported.
If a kinship matrix is supplied either in |
GLSMethod |
A character string indicating the method used to estimate
the marker effects. Either |
estCom |
Should the common SNP-effect model be fitted? If |
useMAF |
Should the minor allele frequency be used for selecting SNPs
for the analysis. If |
MAF |
The minor allele frequency (MAF) threshold used in GWAS. A
numerical value between 0 and 1. SNPs with MAF below this value are not taken
into account in the analysis, i.e. p-values and effect sizes are put to
missing ( |
MAC |
A numerical value. SNPs with minor allele count below this value
are not taken into account for the analysis, i.e. p-values and effect sizes
are set to missing ( |
genomicControl |
Should genomic control correction as in Devlin and Roeder (1999) be applied? |
fitVarComp |
Should the variance components be fitted? If |
covModel |
A character string indicating the covariance model for the
genetic background (Vg) and residual effects (Ve); see details.
Either |
VeDiag |
Should there be environmental correlations if covModel = "unst"
or "pw"? If traits are measured on the same individuals, put |
maxIter |
An integer for the maximum number of iterations. Only used
when |
mG |
An integer. The order of the genetic part of the factor analytic
model. Only used when |
mE |
An integer. The order of the environmental part of the factor
analytic model. Only used when |
Vg |
An optional matrix with genotypic variance components. |
Ve |
An optional matrix with environmental variance components.
|
thrType |
A character string indicating the type of threshold used for
the selection of candidate loci. Either |
alpha |
A numerical value used for calculating the LOD-threshold for
|
LODThr |
A numerical value used as a LOD-threshold when
|
nSnpLOD |
A numerical value indicating the number of SNPs with the
smallest p-values that are selected when |
pThr |
A numerical value just as the cut off value for p-Values for
|
rho |
A numerical value used a the minimum value for SNPs to be
considered correlated when using |
sizeInclRegion |
An integer. Should the results for SNPs close to significant SNPs be included? If so, the size of the region in centimorgan or base pairs. Otherwise 0. |
minR2 |
A numerical value between 0 and 1. Restricts the SNPs included
in the region close to significant SNPs to only those SNPs that are in
sufficient Linkage Disequilibrium (LD) with the significant snp, where LD
is measured in terms of |
parallel |
Should the computation of variance components be done in
parallel? Only used if |
nCores |
A numerical value indicating the number of cores to be used by
the parallel part of the algorithm. If |
Value
An object of class GWAS
.
Details
runMultiTraitGwas estimates the effect of a SNP in different trials or on different traits, one SNP at a time. Genetic and residual covariances are fitted only once, for a model without SNPs. Following the diagonalization scheme of Zhou and Stephens (2014), the following model is fit
Y = \left(\begin{array}{c} Y_1 \\ \vdots \\ Y_p\end{array}\right) =
\left(\begin{array}{c} X_1\gamma_1 \\ \vdots \\ X_p\gamma_p\end{array}\right)
+ \left(\begin{array}{c} x_1\beta_1 \\ \vdots \\ x_p\beta_p\end{array}\right)
+ \left(\begin{array}{c} G_1 \\ \vdots \\ G_p\end{array}\right) +
\left(\begin{array}{c} E_1 \\ \vdots \\ E_p\end{array}\right)
where Y
is a np \times 1
vector of phenotypic values for n
genotypes and p
traits or trials. x
is the n \times 1
vector of scores for the marker under consideration, and X
the
n \times q
design matrix for the other covariates. By default only a
trait (environment) specific intercept is included. The vector of genetic
background effects
(\left(\begin{array}{c}G_1 \\ \vdots \\ G_p\end{array}\right)
) is
Gaussian with zero mean and covariance V_g \otimes K
, where V_g
is a p \times p
matrix of genetic (co)variances, and K
an
n \times n
kinship matrix. Similarly, the residual errors
(\left(\begin{array}{c}E_1 \\ \vdots \\ E_p\end{array}\right)
)
have covariance V_e \otimes I_n
, for a p \times p
matrix
V_e
of residual (co)variances.
Hypotheses for the SNP-effects
For each SNP, the null-hypothesis \beta_1 = \dots = \beta_p = 0
is
tested, using the likelihood ratio test (LRT) described in Zhou and
Stephens (2014). If estCom = TRUE
, additional tests for a common
effect and for QTL x E are performed, using the parameterization
\beta_j = \alpha + \alpha_j (1 \leq j \leq p)
.
As in Korte et al (2012), we use likelihood ratio tests, but not restricted
to the bivariate case. For the common effect, we fit the reduced
model \beta_j = \alpha
, and test if \alpha = 0
.
For QTL-by-environment interaction, we test if \alpha_1 = \dots =
\alpha_p = 0
.
Models for the genetic and residual covariance
V_g
and V_e
can be provided by the user
(fitVarComp = FALSE
);
otherwise one of the following models is used, depending on covModel.
If covModel = "unst"
, an unstructured model is assumed, as in Zhou and
Stephens (2014): V_g
and V_e
can be any positive-definite matrix,
requiring a total of p(p + 1)/2
parameters per matrix.
If covModel = "fa"
, a factor-analytic model is fitted using an
EM-algorithm, as in Millet et al (2016). V_g
and V_e
are assumed
to be of the form W W^t + D
, where W
is a p \times m
matrix
of factor loadings and D
a diagonal matrix with trait or environment
specific values. m
is the order of the model, and the parameters
mG
and mE
specify the order used for respectively V_g
and V_e
. maxIter
sets the maximum number of iterations used
in the EM-algorithm.
Finally, if covModel = "pw"
, V_g
and V_e
are estimated
'pairwise', as in Furlotte and Eskin (2015). Looping over pairs of traits
or trials 1 \leq j < k \leq p
,
V_g[j,k] = V_g[k,j]
and V_e[j,k] = V_e[k,j]
are estimated assuming a bivariate mixed model. The diagonals of
V_g
and V_e
are fitted assuming univariate mixed models. If the
resulting V_g
or V_e
is not positive-definite, they are
replaced by the nearest positive-definite matrix.
In case covModel = "unst"
or "pw"
it is possible to assume
that V_e
is diagonal (VeDiag = TRUE
)
References
Dahl et al. (2013). Network inference in matrix-variate Gaussian models with non-independent noise. arXiv preprint arXiv:1312.1622.
Furlotte, N.A. and Eskin, E. (2015). Efficient multiple-trait association and estimation of genetic correlation using the matrix-variate linear mixed model. Genetics, May 2015, Vol.200-1, p. 59-68.
Korte et al. (2012). A mixed-model approach for genome-wide association studies of correlated traits in structured populations. Nature Genetics, 44(9), 1066–1071. doi:10.1038/ng.2376
Millet et al. (2016). Genome-wide analysis of yield in Europe: allelic effects as functions of drought and heat scenarios. Plant Physiology, pp.00621.2016. doi:10.1104/pp.16.00621
Thoen et al. (2016). Genetic architecture of plant stress resistance: multi-trait genome-wide association mapping. New Phytologist, 213(3), 1346–1362. doi:10.1111/nph.14220
Zhou, X. and Stephens, M. (2014). Efficient multivariate linear mixed model algorithms for genome-wide association studies. Nature Methods, February 2014, Vol. 11, p. 407–409.
Examples
## First create a gData object.
## See the vignette for a detailed description.
## Here we use the gData object included in the package
## Run multi-trait GWAS
## Use a factor analytic model to estimate variance components.
mtg0 <- runMultiTraitGwas(gDataDropsRestr,
trial = "Mur13W",
covModel = "fa")
## Plot the results.
## For details on the different plots see plot.GWAS
plot(mtg0, plotType = "qq")
plot(mtg0, plotType = "manhattan")
plot(mtg0, plotType = "qtl", yThr = 3.5)
## Run multi-trait GWAS
## Use a pairwise model to estimate variance components.
## Estimate common effects and set a fixed threshold for significant SNPs
mtg1 <- runMultiTraitGwas(gDataDropsRestr,
trial = "Mur13W",
covModel = "pw",
estCom = TRUE,
thrType = "fixed",
LODThr = 3)
## Run multi-trait GWAS
## Use an unstructured model to estimate variance components.
## Identify the 5 SNPs with smallest p-values as significant SNPs.
## Compute the kinship matrix using the vanRaden method.
mtg2 <- runMultiTraitGwas(gDataDropsRestr,
trial = "Mur13W",
kinshipMethod = "vanRaden",
covModel = "unst",
thrType = "small",
nSnpLOD = 5)
Update W and P in EMFA algorithm
Description
Update W and P used in the iteration process in the EMFA algorithm.
Usage
updateFA(
y,
wStart,
pStart,
wNew,
pNew,
m0,
hetVar = FALSE,
maxDiag = 10000,
tolerance = 1e-04,
maxIter = 100L,
printProgress = FALSE
)
Arguments
y |
An n x p matrix or data.frame. |
wStart |
A p x p matrix or data.frame containing starting values for W. |
pStart |
A p x p matrix or data.frame containing starting values for P. |
m0 |
An integer. The order of the model. |
hetVar |
Should an extra diagonal part be added in the model for the precision matrix? |
maxDiag |
A numerical value for the maximum value of the diagonal of P. |
tolerance |
A numerical value. The iterating process stops if the sum of the difference for P and W between two steps gets lower than this value. |
maxIter |
A numerical value for the maximum number of iterations. |
printProgress |
Should progress be printed during iterations? |
Update W and P in EMFA algorithm for homogeneous variance.
Description
Update W and P used in the iteration process in the EMFA algorithm in case the variance is homogeneous.
Usage
updateFAHomVar(s, wNew, pNew, m, maxDiag = 10000)
Arguments
s |
A p x p sample covariance matrix. |
m |
An integer. The order of the model. |
maxDiag |
A numerical value for the maximum value of sigma2. |
Helper function for updating precision matrix.
Description
Helper function for updating the precision matrices in the EMFA algorithm.
Usage
updatePrec(m, nc, omega, w, p, wNew, pNew, cNew, het, maxDiag)
Arguments
m |
An integer, the order of the model. |
nc |
An integer, the number of traits or genotypes. |
omega |
A computed matrix for the current step in the algoritm. |
w |
A model matrix for the current step in the algorithm. |
p |
A model matrix for the current step in the algorithm. |
wNew |
A pointer to the updated model matrix for w. |
pNew |
A pointer to the updated model matrix for p. |
cNew |
A pointer to the updated matrix c. |
het |
Should an extra diagonal part be added in the model for the precision matrix. |
maxDiag |
A numerical value for the maximum value of sigma2. |