Type: | Package |
Title: | Calculation of Covariance Between Markers for Half-Sib Families |
Version: | 0.4.2 |
Date: | 2021-04-13 |
Description: | The theoretical covariance between pairs of markers is calculated from either paternal haplotypes and maternal linkage disequilibrium (LD) or vise versa. A genetic map is required. Grouping of markers is based on the correlation matrix and a representative marker is suggested for each group. Employing the correlation matrix, optimal sample size can be derived for association studies based on a SNP-BLUP approach. The implementation relies on paternal half-sib families and biallelic markers. If maternal half-sib families are used, the roles of sire/dam are swapped. Multiple families can be considered. Wittenburg, Bonk, Doschoris, Reyer (2020) "Design of Experiments for Fine-Mapping Quantitative Trait Loci in Livestock Populations" <doi:10.1186/s12863-020-00871-1>. Carlson, Eberle, Rieder, Yi, Kruglyak, Nickerson (2004) "Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium" <doi:10.1086/381000>. |
Depends: | R (≥ 3.5.0) |
Imports: | parallel, Matrix, foreach, rlist, pwr |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.1.1 |
NeedsCompilation: | no |
Packaged: | 2021-04-13 06:30:46 UTC; wittenburg |
Author: | Dörte Wittenburg [aut, cre], Michael Doschoris [aut], Jan Klosa [ctb] |
Maintainer: | Dörte Wittenburg <wittenburg@fbn-dummerstorf.de> |
Repository: | CRAN |
Date/Publication: | 2021-04-13 07:20:06 UTC |
Correlation matrix of an autoregressive model of order 1
Description
An order-1 autoregressive correlation matrix is set up which is used for the examples on power and sample size calculation.
Usage
AR1(p, rho)
Arguments
p |
dimension |
rho |
correlation |
Value
(p x p) matrix
Examples
AR1(10, 0.2)
Calculation of covariance or correlation matrix
Description
The theoretical covariance between pairs of markers is calculated from either paternal haplotypes and maternal linkage disequilibrium (LD) or vise versa. A genetic map is required. The implementation relies on paternal half-sib families and biallelic markers such as single nucleotide polymorphisms (SNP). If parental haplotypes are incomplete (i.e., SNP alleles are missing), those parents will be discarded at the corresponding pairs of SNPs. If maternal half-sib families are used, the roles of sire/dam are swapped. Multiple families can be considered.
Usage
CovMat(linkMat, haploMat, nfam, pos_chr, map_fun = "haldane", corr = F)
Arguments
linkMat |
(p x p) matrix of maternal LD between pairs of p markers; matrix is block diagonal in case of multiple chromosomes and must not contain missing values; use zeros if LD is uncertain |
haploMat |
(2N x p) matrix of sires haplotypes for all chromosomes (2 lines per sire); coding with 0's and 1's reflecting reference and alternate alleles, respectively; missing values can be coded as NA or any integer but not 0 and 1 |
nfam |
vector (LEN N) containing number of progeny per family or scalar value in case of equal family size |
pos_chr |
list (LEN number of chromosomes) of vectors (LEN number of markers) of genetic positions in Morgan per chromosome |
map_fun |
character string of mapping function used; so far "haldane" (default) and "kosambi" are enabled |
corr |
logical; |
Value
list (LEN 2) of matrix (DIM p1
x p1
) and vector
(LEN p1
) with p1 \le p
K
covariance matrix OR
R
correlation matrix
valid.snps
vector of SNP indices considered for covariance/ correlation matrix
Note
Family size is used for weighting covariance terms in case of multiple half-sib families. It only matters if number of progeny differs.
If maternal haplotypes (H.mothers) are used instead of maternal LD (matLD), LD can be estimated from counting haplotype frequencies as:
matLD <- LDdam(inMat = H.mother, pos.chr)
If multiple chromosomes are considered, SNP positions are provided as, e.g.
pos.chr <- list(pos.snp.chr1, pos.snp.chr2, pos.snp.chr3)
References
Wittenburg, Bonk, Doschoris, Reyer (2020) Design of Experiments for Fine-Mapping Quantitative Trait Loci in Livestock Populations. BMC Genetics 21:66. doi: 10.1186/s12863-020-00871-1
Examples
### 1: INPUT DATA
data(testdata)
### 2: COVARIANCE/CORRELATION MATRIX
corrmat <- CovMat(matLD, H.sire, 100, pos.chr, corr = TRUE)
### 3: TAGSNPS FROM CORRELATION MATRIX
bin <- tagSNP(corrmat$R)
bin <- tagSNP(corrmat$R, 0.5)
Calculation of covariance matrices from maternal and paternal LD
Description
The covariance matrix is set as maternal plus paternal LD matrix where the paternal part is a weighted average of sire-specific LD matrices.
Usage
CovarMatrix(exp_freq_mat, LDDam, LDSire, Ns)
Arguments
exp_freq_mat |
[MATRIX] paternal EXPECTATION matrix |
LDDam |
[MATRIX] maternal Linkage Disequilibrium matrix |
LDSire |
[LIST] Linkage disequilibrium matrices for the sires; each element of the list corresponds to a family |
Ns |
[VECTOR] family size for each sire s |
Details
The internal suMM function works on lists!
Value
covK
(p x p) matrix of covariance between markers
Examples
data(testdata)
G <- Haplo2Geno(H.sire)
E <- ExpectMat(G)
LDfam2 <- LDsire(H.sire, pos.chr, family = 3:4)
LDfam3 <- LDsire(H.sire, pos.chr, family = 5:6)
## covariance matrix based on sires 2 and 3 only, each with 100 progeny
K <- CovarMatrix(E[2:3, ], LDDam = matLD, LDSire = list(LDfam2, LDfam3), Ns = c(100, 100))
Expected value of paternally inherited allele
Description
Expected value is +/-0.5 if sire is homozygous reference/ alternate allele or 0 if sire is heterozygous at the investigated marker
Usage
ExpectMat(inMat)
Arguments
inMat |
[MATRIX] The paternal genotype matrix |
Value
ExP.Fa
(N x p) matrix of expected values
Examples
data(testdata)
G <- Haplo2Geno(H.sire)
E <- ExpectMat(G)
testdata: sire haplotypes
Description
(2N x p) matrix of sire haplotypes for all chromosomes (2 lines per sire); unknown alleles are marked as 9
Usage
H.sire
Format
An object of class matrix
(inherits from array
) with 10 rows and 300 columns.
Conversion of haplotypes into genotypes
Description
Haplotypes are converted into into genotypes without checking for missing values.
Usage
Haplo2Geno(inpMat)
Arguments
inpMat |
[MATRIX] haplotype matrix (2 lines per individual) |
Value
outMa
(N x p) genotype matrix
Examples
data(testdata)
G <- Haplo2Geno(H.sire)
Calculation of maternal LD matrix
Description
Matrix containing linkage disequilibrium between marker pairs on maternal gametes is set up by counting haplotypes frequencies.
Usage
LDdam(inMat, pos_chr)
Arguments
inMat |
[MATRIX] The maternal HAPLOTYPE matrix. |
pos_chr |
[LIST] The marker positions in Morgan on chromosomes. |
Details
The function generates a block diagonal sparse matrix based on Matrix::bdiag. Use as.matrix() to obtain a regular one.
Value
Dd
(p x p) matrix of maternal LD
Examples
## haplotype matrix of n individuals at p SNPs
p <- 10; n <- 4
mat <- matrix(ncol = p, nrow = 2 * n, sample(c(0, 1), size = 2 * n * p, replace = TRUE))
LDdam(mat, list(1:p))
Calculation of paternal LD matrix
Description
Matrix containing linkage disequilibrium between marker pairs on paternal gametes is set up from sire haplotypes and genetic-map information for each half-sib family.
Usage
LDsire(inMat, pos_chr, family, map_fun = "haldane")
Arguments
inMat |
[MATRIX] Haplotype matrix for sires for all chromosomes. |
pos_chr |
[LIST] The marker positions in Morgan on chromosomes. |
family |
[VECTOR] Which family (sire) should be processed? Vector with consecutive entries of the form 1:2, 3:4, 5:6 and so on, linking to haplotypes (rows in inMat) of the corresponding sire |
map_fun |
["haldane" or "kosambi"] The mapping function applied. |
Details
The function generates a block diagonal sparse matrix based on Matrix::bdiag. Use as.matrix() to obtain a regular one.
Value
Ds
Ds
(p x p) matrix of paternal LD
Examples
data(testdata)
LDfam2 <- LDsire(H.sire, pos.chr, family = 3:4)
Variance of estimator
Description
Calculation of variance of estimator and residual degrees of freedom
Usage
calcvar(lambda, eigendec, n, weights = 1)
Arguments
lambda |
shrinkage parameter |
eigendec |
eigenvalue decomposition of (p x p) correlation matrix
|
n |
sample size |
weights |
vector (LEN p) of SNP-specific weights or scalar if weights are equal for all SNPs; default value 1 |
Details
The variance of estimator beta (regression coefficient of SNP-BLUP
approach) and the residual degrees of freedom are calculated based on the
eigenvalue decomposition of correlation matrix R
Value
df
residual degrees of freedom
var.beta
vector (LEN p) of variance of estimator beta up to a constant (i.e. residual variance / n)
Examples
### correlation matrix (should depend on sire haplotypes)
R <- AR1(100, rho = 0.1)
eigendec <- eigen(R)
out <- calcvar(1200, eigendec, 100)
Ratio of expected value to variance of estimator
Description
The ratio of expected value to standard deviation is calculated for the estimator of a selected regression coefficient.
Usage
coeff.beta.k(k, beta.true, lambda, eigendec, n, weights = 1)
Arguments
k |
index of selected regression coefficient |
beta.true |
(LEN p) vector of regression coefficients |
lambda |
shrinkage parameter |
eigendec |
eigenvalue decomposition of (p x p) correlation matrix
|
n |
sample size |
weights |
vector (LEN p) of SNP-specific weights or scalar if weights are equal for all SNPs; default value 1 |
Value
ratio
testdata: maternal linkage disequilibrium
Description
(p x p) matrix of maternal LD between pairs of p markers; matrix is block diagonal in case of multiple chromosomes
Usage
matLD
Format
An object of class matrix
(inherits from array
) with 300 rows and 300 columns.
testdata: genetic map positions
Description
list of vectors of genetic map positions per chromosome
Usage
pos.chr
Format
An object of class list
of length 1.
Probability under alternative hypothesis (power)
Description
Calculation of power is based on normal distribution.
At each selected QTL position, the probability of the corresponding
regression coefficient being different from zero is calculated using a
t-like test statistic which has normal distribution with mean
E(beta_k)/sqrt{Var(beta_k)}
and variance 1. Under the null hypothesis beta_k = 0
,
E(beta_k) = 0
.
Then, the mean value is returned as power.
Usage
pwr.normtest(R, n, betaSE, lambda, pos, weights = 1, alpha = 0.01)
Arguments
R |
(p x p) matrix containing theoretical correlation between SNP pairs |
n |
sample size |
betaSE |
effect size relative to residual standard deviation |
lambda |
shrinkage parameter |
pos |
vector (LEN nqtl) of SNP indices for assumed QTL positions |
weights |
weights vector (LEN p) of SNP-specific weights or scalar if weights are equal for all SNPs; default value 1 |
alpha |
type-I error level; default value 0.01 |
Value
result
mean power at selected QTL positions
h2.le
QTL heritability under linkage-equilibrium assumption
h2.ld
QTL heritability under linkage-disequilibrium assumption
References
Wittenburg, Bonk, Doschoris, Reyer (2020) Design of Experiments for Fine-Mapping Quantitative Trait Loci in Livestock Populations. BMC Genetics 21:66. doi: 10.1186/s12863-020-00871-1
Examples
### correlation matrix (should depend on sire haplotypes)
R <- AR1(100, rho = 0.1)
### positions of putative QTL signals
pos <- c(14, 75)
### power at given sample size and other parameters
pwr.normtest(R, 100, 0.35, 1200, pos)
Wrapper function for sample size calculation
Description
Given parameters specified by the experimenter, optimal sample
size is estimated by repeatedly applying search.best.n.bisection
.
Usage
pwr.snpblup(
nfathers,
nqtl,
h2,
R,
rep = 10,
nmax = 5000,
weights = 1,
typeII = 0.2,
alpha = 0.01
)
Arguments
nfathers |
number of half-sib families |
nqtl |
number of QTL assumed |
h2 |
heritability captured by QTL |
R |
(p x p) matrix containing theoretical correlation between SNP pairs |
rep |
number of repetitions; default value 10 |
nmax |
maximum value for grid search; default value 5000 |
weights |
vector (LEN p) of SNP-specific weights or scalar if weights are equal for all SNPs; default value 1 |
typeII |
type-II error level; default value 0.2 |
alpha |
type-I error level; default value 0.01 |
Details
Sample size depends on parameters specified by the experimenter (number of half-sib families, number of QTL, heritability, correlation matrix). These values are converted into parameters required for the probability density function under the alternative hypothesis (beta_k !=0, for k selected QTL positions). As power depends on the selected QTL positions, these are sampled at random and power calculations are repeated. Afterwards the mean value is a plausible estimate of optimal sample size.
Linear model for SNP-BLUP approach:
y = X beta + e
with t(beta) = (beta_1, ldots, beta_p)
Ridge approach:
hat{beta} = (Xt X + I lambda)^{-1} Xt y
The identity matrix I
can be replaced by a diagonal matrix
containing SNP-specific weights yielding a generalised ridge approach.
Value
vector of optimal sample size over all repetitions
References
Wittenburg, Bonk, Doschoris, Reyer (2020) Design of Experiments for Fine-Mapping Quantitative Trait Loci in Livestock Populations. BMC Genetics 21:66. doi: 10.1186/s12863-020-00871-1
Examples
### input parameters specified by experimenter
# number of half-sib families
nfathers <- 10
# number of assumed QTL
nqtl <- 2
# QTL heritability
h2 <- 0.2
### correlation matrix (should depend on sire haplotypes)
R <- AR1(100, rho = 0.1)
### optimal sample size in a multi-marker approach
set.seed(11)
pwr.snpblup(nfathers, nqtl, h2, R, rep = 1)
Method of bisection for estimating optimal sample size
Description
A grid [nstart, nmax]
for possible sample size is
considered. Instead of executing a time-consuming grid search, the method
of bisection is applied to this interval. For each step, the function
pwr.normtest
is called for the given set of parameters.
Usage
search.best.n.bisection(
R,
betaSE,
lambda,
pos,
nstart,
nmax,
weights = 1,
typeII = 0.2,
alpha = 0.01
)
Arguments
R |
(p x p) matrix containing theoretical correlation between SNP pairs |
betaSE |
effect size relative to residual standard deviation |
lambda |
shrinkage parameter |
pos |
vector (LEN nqtl) of SNP indices for assumed QTL positions |
nstart |
minimum value for grid search |
nmax |
maximum value for grid search |
weights |
vector (LEN p) of SNP-specific weights or scalar if weights are equal for all SNPs; default value 1 |
typeII |
type-II error level; default value 0.2 |
alpha |
type-I error level; default value 0.01 |
Value
integer of optimal sample size
Examples
### correlation matrix (should depend on sire haplotypes)
R <- AR1(100, rho = 0.1)
### positions of putative QTL signals
pos <- c(14, 75)
### optimal sample size
search.best.n.bisection(R, 0.35, 1200, pos, 10, 5000)
Calculation of effective number of independent tests
Description
Adapted simpleM method which considers theoretical correlation between SNP pairs instead of composite LD values. Principal component decomposition yields the effective number of independent tests. This value is needed for the Bonferroni correction of type-I error when testing SNP effects based on a single-marker model.
Usage
simpleM(mat, quant = 0.995)
Arguments
mat |
correlation matrix |
quant |
percentage cutoff, variation of SNP data explained by eigenvalues; default value 0.995 |
Value
effective number of independent tests
References
Gao, Starmer & Martin (2008) A multiple testing correction method for genetic association studies using correlated single nucleotide polymorphisms. Genetic Epidemiology 32:361-369.
Examples
### correlation matrix (should depend on sire haplotypes)
R <- AR1(100, rho = 0.1)
### effective number of tests
Meff <- simpleM(R)
### relative effect size given heritability and number of QTL signals
h2 <- 0.2
nqtl <- 2
betaSE <- sqrt(h2 / (nqtl - nqtl * h2))
### optimal sample size in a single-marker approach
pwr::pwr.t.test(d = betaSE, sig.level = 0.01 / Meff, power = 0.8,
alternative = "two.sided", type = "one.sample")
Start value for estimating optimal sample size
Description
Calculation of start value for estimating optimal sample size
Usage
startvalue(lambda, R, nfam, weights = 1)
Arguments
lambda |
shrinkage parameter |
R |
(p x p) matrix containing theoretical correlation between SNP pairs |
nfam |
number of half-sib families |
weights |
vector (LEN p) of SNP-specific weights or scalar if weights are equal for all SNPs; default value 1 |
Details
Minimum sample size that exceeds residual degrees of freedom; this value can be used as start value in grid search for optimal sample size
Value
start value
Examples
### correlation matrix (should depend on sire haplotypes)
R <- AR1(100, rho = 0.1)
startvalue(1200, R, 10)
tagSNP
Description
Grouping of markers depending on correlation structure
Usage
tagSNP(mat, threshold = 0.8)
Arguments
mat |
(p x p) correlation matrix |
threshold |
lower value of correlation considered for grouping |
Details
Grouping of markers is based on the correlation matrix. Apart from this, the strategy for grouping is similar to Carlson et al. (2004). A representative marker is suggested for each group.
Value
list (LEN number of groups) of lists (LEN 2); marker names correspond to column names of mat
snps
vector of marker IDs in group
tagsnp
representative marker suggested for this group
References
Carlson, Eberle, Rieder, Yi, Kruglyak & Nickerson (2004) Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. American Journal of Human Genetics 74:106-120.
Wittenburg, Doschoris, Klosa (2021) Grouping of genomic markers in populations with family structure BMC Bioinformatics 22:79. doi: 10.1186/s12859-021-04010-0
Examples
### 1: INPUT DATA
data(testdata)
### 2: COVARIANCE/CORRELATION MATRIX
corrmat <- CovMat(matLD, H.sire, 100, pos.chr, corr = TRUE)
### 3: TAGSNPS FROM CORRELATION MATRIX
bin <- tagSNP(corrmat$R)
bin <- tagSNP(corrmat$R, 0.5)
as.numeric(unlist(rlist::list.select(bin, tagsnp)))
Description of the testdata
Description
The data set contains paternal haplotypes, maternal LD and genetic map positions that are required to calculate the covariance between pairs of markers.
The raw data can be downloaded at the source given below. Then,
executing the following R code leads to the data that have been provided as
testdata.RData
.
- H.sire
(2N x p) haplotype matrix for sires for all chromosomes (2 lines per sire)
- matLD
(p x p) matrix of maternal LD between pairs of p markers; matrix is block diagonal in case of multiple chromosomes
- pos.chr
list of vectors of genetic map positions per chromosome
Source
The data are available from RADAR doi: 10.22000/280.
Examples
## Not run:
## data.frame of estimates of paternal recombination rate and maternal LD
load('Result.RData')
## list of haplotypes of sires for each chromosome
load('sire_haplotypes.RData')
## physical map
map <- read.table('map50K_ARS_reordered.txt', header = T)
## select target region
chr <- 1
window <- 301:600
## map information of target region
map.target <- map[map$Chr == chr, ][window, ]
Result.target <- Result[(Result$Chr == chr) & (Result$SNP1 %in% window) &
(Result$SNP2 %in% window), ]
## SNP position in Morgan approximated from recombination rate
part <- Result.target[Result.target$SNP1 == window[1], ]
sp <- smooth.spline(x = map.target$locus_Mb[part$SNP2 - window[1] + 1], y = part$Theta, df = 4)
pos.snp <- predict(sp, x = map.target$locus_Mb[window - window[1] + 1])$y
## list of SNPs positions
pos.chr <- list(pos.snp)
## haplotypes of sires (mating candidates) in target region
H.sire <- rlist::list.rbind(haps[[chr]])[, window]
## matrix of maternal LD (block diagonal if multiple chromosome)
matLD <- matrix(0, ncol = length(window), nrow = length(window))
## off-diagonal elements
for(l in 1:nrow(Result.target)){
id1 <- Result.target$SNP1[l] - window[1] + 1
id2 <- Result.target$SNP2[l] - window[1] + 1
matLD[id1, id2] <- matLD[id2, id1] <- Result.target$D[l]
}
## diagonal elements
for(k in unique(Result.target$SNP1)){
id <- k - window[1] + 1
p <- Result.target$fAA[Result.target$SNP1 == k] + Result.target$fAB[Result.target$SNP1 == k]
matLD[id, id] <- max(p * (1 - p))
}
## End(Not run)