Title: | Alpha and Beta Diversity Measures |
Version: | 0.2.0 |
Description: | A collection of measures for measuring ecological diversity. Ecological diversity comes in two flavors: alpha diversity measures the diversity within a single site or sample, and beta diversity measures the diversity across two sites or samples. This package overlaps considerably with other R packages such as 'vegan', 'gUniFrac', 'betapart', and 'fossil'. We also include a wide range of functions that are implemented in software outside the R ecosystem, such as 'scipy', 'Mothur', and 'scikit-bio'. The implementations here are designed to be basic and clear to the reader. |
URL: | https://github.com/kylebittinger/abdiv |
BugReports: | https://github.com/kylebittinger/abdiv/issues |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 6.1.1 |
Suggests: | testthat (≥ 2.1.0), vegan |
Imports: | ape |
NeedsCompilation: | no |
Packaged: | 2020-01-15 02:25:11 UTC; bittingerk |
Author: | Kyle Bittinger [aut, cre] |
Maintainer: | Kyle Bittinger <kylebittinger@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2020-01-20 10:50:02 UTC |
Berger-Parker dominance
Description
The Berger-Parker dominance is the proportion of the most abundant species.
Usage
berger_parker_d(x)
Arguments
x |
A numeric vector of species counts or proportions. |
Details
Equivalent to
berger_parker_d()
inskbio.diversity.alpha
.Equivalent to the
bergerparker
calculator in Mothur.
Value
The Berger-Parker dominance, 0 < D_{BP} \leq 1
. If the vector
sums to zero, the Berger-Parker dominance is undefined, and we return
NaN
.
References
Berger WH, Parker FL. Diversity of Planktonic Foraminifera in Deep-Sea Sediments. Science. 1970;168(3937):1345-1347.
Examples
x <- c(15, 6, 4, 0, 3, 0)
berger_parker_d(x) # 15 / 28
Binomial deviance and CY index of dissimilarity
Description
The binomial deviance dissimilarity and the CY (or Cao) index of dissimilarity were created to compare species counts at sites with moderate to large differences.
Usage
binomial_deviance(x, y)
cy_dissimilarity(x, y, base = 10, min_value = 0.1)
Arguments
x , y |
Numeric vectors |
base |
Base of the logarithm |
min_value |
Replacement for zero or near-zero values. Values less than
|
Details
Both of these measures were designed to be used with whole-numbered counts, and may not make sense for comparing normalized vectors or vectors of species proportions.
For two vectors x
and y
, the binomial deviance dissimilarity
is
d(x,y) = \sum_i{
\frac{1}{n_i}
\left (
x_i \log{\frac{x_i}{n_i}} +
y_i \log{\frac{y_i}{n_i}} -
(x_i + y_i) log{2}
\right )
},
where n_i = x_i + y_i
. This value is the weighted average of the
deviance for each species, under a binomial model where the expected counts
are n_i / 2
at each site. It was proposed by Anderson and Millar in
2004. Relation to other definitions:
Equivalent to vegdist() with method = "binomial".
The CY index was proposed by Cao, Williams, and Bark in 1997. For two
vectors x
and y
, the CY index is
d(x,y) = \frac{1}{N} \sum_i
\left (
\frac{
(x_i + y_i) \log_{10} ( \frac{x_i + y_i}{2} ) -
x_i \log_{10}(y_i) - y_i \log_{10}(x_i)
}{
x_i + y_i
}
\right ),
where N
is the total number of species in vectors x
and y
.
Double zeros are not considered in the measure.
When either x_i
or y_i
are zero, they need to be replaced by
another value in the CY index to avoid infinities. Cao suggested replacing
zero values with 0.1
, which is one log lower than the minimum value
for whole-numbered counts. Here, we use a min_value
argument to allow
the user set a lower limit on the values. For vectors of species counts,
this function follows the formulation of Cao by default.
Relation of the CY index to other definitions:
Equivalent to the
vegdist()
function withmethod = "cao"
, ifbase = exp(1)
.
Value
The Binomial deviance or CY index of dissimilarity. The CY index is
undefined if all elements of x
and y
are zero, in which case
we return NaN
.
References
Anderson MJ, Millar RB. Spatial variation and effects of habitat on temperate reef fish assemblages in northeastern New Zealand. Journal of Experimental Marine Biology and Ecology 2004;305:191–221.
Cao Y, Williams WP, Bark AW. Similarity measure bias in river benthic Aufwuchs community analysis. Water Environment Research 1997;69(1):95-106.
Bray-Curtis distance
Description
The Bray-Curtis distance is the Manhattan distance divided by the sum of both vectors.
Usage
bray_curtis(x, y)
Arguments
x , y |
Numeric vectors |
Details
For two vectors x
and y
, the Bray-Curtis distance is defined
as
d(x, y) = \frac{\sum_i |x_i - y_i|}{\sum_i x_i + y_i}.
The Bray-Curtis distance is connected to many other distance measures in this package; we try to list some of the more important connections here. Relation to other definitions:
Equivalent to
vegdist()
withmethod = "bray"
.Equivalent to the
braycurtis()
function inscipy.spatial.distance
for positive vectors. They take the absolute value ofx_i + y_i
in the denominator.Equivalent to the
braycurtis
andodum
calculators in Mothur.Equivalent to
D_{14} = 1 - S_{17}
in Legendre & Legendre.The Bray-Curtis distance on proportions is equal to half the Manhattan distance.
The Bray-Curtis distance on presence/absence vectors is equal to the Sorenson index of dissimilarity.
Value
The Bray-Curtis distance between x
and y
. The
Bray-Curtis distance is undefined if the sum of all elements in x
and y
is zero, in which case we return NaN
.
Examples
x <- c(15, 6, 4, 0, 3, 0)
y <- c(10, 2, 0, 1, 1, 0)
bray_curtis(x, y)
# For proportions, equal to half the Manhattan distance
bray_curtis(x / sum(x), y / sum(y))
manhattan(x / sum(x), y / sum(y)) / 2
Balanced variation and abundance gradient components for abundance data
Description
Balanced variation and abundance gradient components for abundance data
Usage
bray_curtis_balanced(x, y)
bray_curtis_gradient(x, y)
ruzicka_balanced(x, y)
ruzicka_gradient(x, y)
Arguments
x , y |
Numeric vectors |
Value
The balanced variation or abundance gradient component of distance
between x
and y
. This quantity is undefined when either
x
or y
have all elements equal to zero, in which case we
return NaN
.
References
Baselga A. Separating the two components of abundance-based dissimilarity: balanced changes in abundance vs.abundance gradients. Methods in Ecology and Evolution. 2013;4:552–557.
Baselga A. Partitioning abundance-based multiple-site dissimilarity into components: balanced variation in abundance and abundance gradients. Methods in Ecology and Evolution. 2017;8:799–808.
Canberra and related distances
Description
The Canberra distance and Clark's coefficient of divergence are measures that use the absolute difference over the sum for each element of the vectors.
Usage
canberra(x, y)
clark_coefficient_of_divergence(x, y)
Arguments
x , y |
Numeric vectors |
Details
For vectors x
and y
, the Canberra distance is defined as
d(x, y) = \sum_i \frac{|x_i - y_i|}{x_i + y_i}.
Elements where
x_i + y_i = 0
are not included in the sum. Relation of
canberra()
to other definitions:
Equivalent to R's built-in
dist()
function withmethod = "canberra"
.Equivalent to the
vegdist()
function withmethod = "canberra"
, multiplied by the number of entries wherex > 0
,y > 0
, or both.Equivalent to the
canberra()
function inscipy.spatial.distance
for positive vectors. They take the absolute value ofx_i
andy_i
in the denominator.Equivalent to the
canberra
calculator in Mothur, multiplied by the total number of species inx
andy
.Equivalent to
D_{10}
in Legendre & Legendre.
Clark's coefficient of divergence involves summing squares and taking a square root afterwards:
d(x, y) = \sqrt{
\frac{1}{n} \sum_i \left( \frac{x_i - y_i}{x_i + y_i} \right)^2
},
where n
is the number of elements where x > 0
, y > 0
, or
both. Relation of clark_coefficient_of_divergence()
to other
definitions:
Equivalent to
D_{11}
in Legendre & Legendre.
Value
The Canberra distance or Clark's coefficient of divergence. If every
element in x
and y
is zero, Clark's coefficient of
divergence is undefined, and we return NaN
.
Examples
x <- c(15, 6, 4, 0, 3, 0)
y <- c(10, 2, 0, 1, 1, 0)
canberra(x, y)
clark_coefficient_of_divergence(x, y)
Chebyshev distance
Description
The Chebyshev distance is the maximum absolute difference between the vector elements.
Usage
chebyshev(x, y)
Arguments
x , y |
Numeric vectors |
Details
For vectors x
and y
, the Chebyshev distance is defined as
d(x, y) = \max_i |x_i - y_i|.
Relation to other definitions:
Equivalent to the
chebyshev()
function inscipy.spatial.distance
.
Value
The Chebyshev distance between x
and y
.
Examples
x <- c(15, 6, 4, 0, 3, 0)
y <- c(10, 2, 0, 1, 1, 0)
chebyshev(x, y) # should be 5
Correlation and cosine distance
Description
The correlation and cosine distances, which are derived from the dot product of the two vectors.
Usage
correlation_distance(x, y)
cosine_distance(x, y)
Arguments
x , y |
Numeric vectors |
Details
For vectors x
and y
, the cosine distance is defined as the
cosine of the angle between the vectors,
d(x, y) = 1 - \frac{x \cdot y}{|x| |y|},
where |x|
is the
magnitude or L2 norm of the vector, |x| = \sqrt{\sum_i x_i^2}
.
Relation to other definitions:
Equivalent to the
cosine()
function inscipy.spatial.distance
.
The correlation distance is simply equal to one minus the Pearson
correlation between vectors. Mathematically, it is equivalent to the cosine
distance between the vectors after they are centered (x - \bar{x}
).
Relation to other definitions:
Equivalent to the
correlation()
function inscipy.spatial.distance
.Equivalent to the
1 - mempearson
calculator in Mothur.
Value
The correlation or cosine distance. These are undefined if either
x
or y
contain all zero elements, that is, if |x| = 0
or |y| = 0
. In this case, we return NaN
.
Examples
x <- c(2, 0)
y <- c(5, 5)
cosine_distance(x, y)
# The two vectors form a 45 degree angle, or pi / 4
1 - cos(pi / 4)
v <- c(3.5, 0.1, 1.4)
w <- c(3.3, 0.5, 0.9)
correlation_distance(v, w)
1 - cor(v, w)
Diversity measures implemented
Description
The diversity functions offered in abdiv
are organized based on the
function signature.
Usage
alpha_diversities
beta_diversities
phylogenetic_alpha_diversities
phylogenetic_beta_diversities
Format
Four objects of class character
.
Details
The following character vectors are provided:
alpha_diversities
-
All non-phylogenetic alpha diversity measures. These functions take a single numeric vector as an argument.
beta_diversities
-
All non-phylogenetic beta diversity measures. These functions take two numeric vectors as arguments.
phylogenetic_alpha_diversities
-
There is only one phylogenetic alpha diversity measure implemented, but we use the plural to be consistent with the other vectors. This function takes a numeric vector, a phylogenetic tree object, and optionally a character vector of species labels.
phylogenetic_beta_diversities
-
Phylogenetic measures of beta diversity. These functions take two numeric vectors, a phylogenetic tree object, and optionally a character vector of species labels.
Euclidean and related distances
Description
These distance and diversity measures are mathematically similar to the Euclidean distance between two vectors.
Usage
euclidean(x, y)
rms_distance(x, y)
chord(x, y)
hellinger(x, y)
geodesic_metric(x, y)
Arguments
x , y |
Numeric vectors |
Details
For vectors x
and y
, the Euclidean distance is defined as
d(x, y) = \sqrt{\sum_i (x_i - y_i) ^ 2}.
Relation of euclidean()
to other definitions:
Equivalent to R's built-in
dist()
function withmethod = "euclidean"
.Equivalent to
vegdist()
withmethod = "euclidean"
.Equivalent to the
euclidean()
function inscipy.spatial.distance
.Equivalent to the
structeuclidean
calculator in Mothur, tospeciesprofile
ifx
andy
are transformed to relative abundance, and tomemeuclidean
ifx
andy
are transformed to presence/absence.Equivalent to
D_1
in Legendre & Legendre.Equivalent to the distance between species profiles,
D_{18}
in Legendre & Legendre ifx
andy
are transformed to relative abundance.
The root-mean-square distance or average distance is similar
to Euclidean distance. As the name implies, it is computed as the square
root of the mean of the squared differences between elements of x
and y
:
d(x, y) = \sqrt{\frac{1}{n} \sum_i^n (x_i - y_i) ^ 2}.
Relation of rms_distance()
to other definitions:
Equivalent to
D_2
in Legendre & Legendre.
The chord distance is the Euclidean distance after scaling each
vector by its root sum of squares, \sqrt{\sum_i x_i^2}
. The chord
distance between any two vectors ranges from 0 to \sqrt{2}
.
Relation of chord()
to other definitions:
Equivalent to
D_3
in Legendre & Legendre.
The Hellinger distance is equal to the chord distance computed after
a square-root transformation. Relation of hellinger()
to other
definitions:
Equivalent to
D_{17}
in Legendre & Legendre.Equivalent to the
hellinger
calculator in Mothur.
The geodesic metric is a transformed version of the chord distance.
d(x, y) = \textrm{arccos} \left(1 - \frac{d_c^2(x, y)}{2} \right),
where d_c
is the chord distance. It gives the length of the arc on a
hypersphere between the vectors, if the vectors are normalized to unit
length. Relation of geodesic_metric()
to other definitions:
Equivalent to
D_4
in Legendre & Legendre.
Value
The distance between x
and y
. The chord distance,
Hellinger distance, and geodesic metric are not defined if all elements
of either vector are zero. We return NaN
in this case.
Examples
x <- c(15, 6, 4, 0, 3, 0)
y <- c(10, 2, 0, 1, 1, 0)
euclidean(x, y)
# The "distance between species profiles"
euclidean(x / sum(x), y / sum(y))
rms_distance(x, y)
chord(x, y)
hellinger(x, y)
# Hellinger is chord distance after square root transform
chord(sqrt(x), sqrt(y))
geodesic_metric(x, y)
# No species in common with x
v <- c(0, 0, 0, 5, 0, 5)
chord(v, x)
sqrt(2)
Faith's phylogenetic diversity
Description
Faith's phylogenetic diversity gives the total branch length on a phylogenetic tree that is spanned by a community. The abundance of each species in the community is not considered.
Usage
faith_pd(x, tree, x_labels = NULL)
Arguments
x |
A numeric vector of species counts or proportions, or a logical vector of species presence/absence. |
tree |
A phylogenetic tree object.. |
x_labels |
A character vector of species labels for |
Details
If the vector x
is named, the names will be automatically
used to match x
with the tree. Missing names are filled in with
zero counts. If x
is not named and x_labels
is provided,
these labels are used to match the elements of x
with the tree.
If x
is not named and x_labels
is not provided, it is
assumed that x
is already in the correct order, and we simply
check that its length matches the number of tips in the tree.
Value
Faith's phylogenetic diversity, PD \geq 0
.
References
Faith DP. Conservation evaluation and phylogenetic diversity. Biol. Conserv. 1992;61:1–10. doi: 10.1016/0006-3207(92)91201-3.
Examples
# Faith's phylogenetic diversity for whole tree is equal to the sum of the
# branch lengths.
sum(faith_tree$edge.length)
faith_pd(c(1, 1, 1, 1, 1), faith_tree)
# Can use named vector or additional argument to match species to tree.
faith_tree$tip.label
faith_pd(c(0, 0, 0, 10, 12), faith_tree)
faith_pd(c(d=10, e=12), faith_tree)
faith_pd(c(10, 12), faith_tree, c("d", "e"))
Example data for Faith's phylogenetic diversity
Description
This example was used to illustrate phylogenetic diversity in Faith and Richards (2012).
Usage
faith_tree
Format
faith_tree
is a phylogenetic tree with five tips, labeled
a-e. It was created with the ape
library.
Details
In the paper, they give the total branch length as 41, but they
don't assign a length to the branch leading to species "b" and "c". Looking
at the figure, we estimated that the length should be 4. For that reason,
the total branch length of faith_tree
is 45, rather than 41.
Source
Faith DP, Richards ZT. Biology (Basel). 2012;1(3):906-32. 10.3390/biology1030906
Hamming distance
Description
The Hamming distance is the number of positions where the values are different.
Usage
hamming(x, y)
Arguments
x , y |
Numeric vectors. |
Details
For vectors x
and y
, the Hamming distance is defined as
d(x, y) = \sum_i [x_i \neq y_i],
where the quantity in the brackets is 1 if the elements are not equal, and zero if the elements are equal. Relation to other definitions:
The
hamming()
function inscipy.spatial.distance
divides the result by the vector length. Our function is equivalent to the SciPy version multiplied by the vector length.Equivalent to the
hamming
calculator in Mothur for presence/absence vectors.
Value
The Hamming distance between x
and y
.
Beta diversity for presence/absence data
Description
These functions transform the input vectors to binary or presence/absence format, then compute a distance or dissimilarity.
Usage
jaccard(x, y)
sorenson(x, y)
kulczynski_first(x, y)
kulczynski_second(x, y)
rogers_tanimoto(x, y)
russel_rao(x, y)
sokal_michener(x, y)
sokal_sneath(x, y)
yule_dissimilarity(x, y)
Arguments
x , y |
Numeric vectors |
Details
Many of these indices are covered in Koleff et al. (2003), so we adopt their
notation. For two vectors x
and y
, we define three quantities:
-
a
is the number of species that are present in bothx
andy
, -
b
is the number of species that are present iny
but notx
, -
c
is the number of species that are present inx
but noty
, and -
d
is the number of species absent in both vectors.
The quantity d
is seldom used in ecology, for good reason. For
details, please see the discussion on the "double zero problem," in section
2 of chapter 7.2 in Legendre & Legendre.
The Jaccard index of dissimilarity is 1 - a / (a + b + c)
, or
one minus the proportion of shared species, counting over both samples
together. Relation of jaccard()
to other definitions:
Equivalent to R's built-in
dist()
function withmethod = "binary"
.Equivalent to
vegdist()
withmethod = "jaccard"
andbinary = TRUE
.Equivalent to the
jaccard()
function inscipy.spatial.distance
, except that we always convert vectors to presence/absence.Equivalent to
1 - S_7
in Legendre & Legendre.Equivalent to
1 - \beta_j
, as well as\beta_{cc}
, and\beta_g
in Koleff (2003).
The Sørenson or Dice index of dissimilarity is
1 - 2a / (2a + b + c)
, or one minus the average proportion of shared
species, counting over each sample individually. Relation of
sorenson()
to other definitions:
Equivalent to the
dice()
function inscipy.spatial.distance
, except that we always convert vectors to presence/absence.Equivalent to the
sorclass
calculator in Mothur, and to1 - whittaker
.Equivalent to
D_{13} = 1 - S_8
in Legendre & Legendre.Equivalent to
1 - \beta_{sor}
in Koleff (2003). Also equivalent to Whittaker's beta diversity (the second definition,\beta_w = (S / \bar{a}) - 1
), as well as\beta_{-1}
,\beta_t
,\beta_{me}
, and\beta_{hk}
.
I have not been able to track down the original reference for the first and
second Kulczynski indices, but we have good formulas from Legendre &
Legendre. The first Kulczynski index is 1 - a / (b + c)
, or
one minus the ratio of shared to unshared species.
Relation of kulczynski_first
to other definitions:
Equivalent to
1 - S_{12}
in Legendre & Legendre.Equivalent to the
kulczynski
calculator in Mothur.
Some people refer to the second Kulczynski index as the Kulczynski-Cody index. It is defined as one minus the average proportion of shared species in each vector,
d = 1 - \frac{1}{2} \left ( \frac{a}{a + b} + \frac{a}{a + c} \right ).
Relation of kulczynski_second
to other definitions:
Equivalent to
1 - S_{13}
in Legendre & Legendre.Equivalent to the
kulczynskicody
calculator in Mothur.Equivalent to one minus the Kulczynski similarity in Hayek (1994).
Equivalent to
vegdist()
withmethod = "kulczynski"
andbinary = TRUE
.
The Rogers-Tanimoto distance is defined as
(2b + 2c) / (a + 2b + 2c + d)
. Relation of rogers_tanimoto()
to other definitions:
Equivalent to the
rogerstanimoto()
function inscipy.spatial.distance
, except that we always convert vectors to presence/absence.Equivalent to
1 - S_2
in Legendre & Legendre.
The Russel-Rao distance is defined
(b + c + d) / (a + b + c + d)
, or the fraction of elements not present
in both vectors, counting double absences. Relation of russel_rao()
to
other definitions:
Equivalent to the
russelrao()
function inscipy.spatial.distance
, except that we always convert vectors to presence/absence.Equivalent to
1 - S_{11}
in Legendre & Legendre.
The Sokal-Michener distance is defined as
(2b + 2c) / (a + 2b + 2c + d)
. Relation of sokal_michener()
to
other definitions:
Equivalent to the
sokalmichener()
function inscipy.spatial.distance
, except that we always convert vectors to presence/absence.
The Sokal-Sneath distance is defined as
(2b + 2c) / (a + 2b + 2c)
. Relation of sokal_sneath()
to other
definitions:
Equivalent to the
sokalsneath()
function inscipy.spatial.distance
, except that we always convert vectors to presence/absence.Equivalent to the
anderberg
calculator in Mothur.Equivalent to
1 - S_{10}
in Legendre & Legendre.
The Yule dissimilarity is defined as 2bc / (ad + bc)
. Relation
of yule_dissimilarity()
to other definitions:
Equivalent to the
yule()
function inscipy.spatial.distance
, except that we always convert vectors to presence/absence.Equivalent to
1 - S
, whereS
is the Yule coefficient in Legendre & Legendre.
Value
The dissimilarity between x
and y
, based on
presence/absence. The Jaccard, Sorenson, Sokal-Sneath, Yule, and both
Kulczynski dissimilarities are not defined if both x
and y
have no nonzero elements. In addition, the second Kulczynski index and the
Yule index of dissimilarity are not defined if one of the vectors has no
nonzero elements. We return NaN
for undefined values.
Nestedness and turnover components for presence/absence data
Description
Nestedness and turnover components for presence/absence data
Usage
jaccard_turnover(x, y)
jaccard_nestedness(x, y)
sorenson_turnover(x, y)
sorenson_nestedness(x, y)
Arguments
x , y |
Numeric vectors |
Value
The nestedness or turnover component of distance between x
and y
. This quantity is undefined when either x
or y
have no observations, in which case we return NaN
.
References
Baselga A. Partitioning the turnover and nestedness components of beta diversity. Global Ecol. Biogeogr. 2010;19:134-143.
Baselga A. The relationship between species replacement, dissimilarity derived from nestedness, and nestedness. Global Ecol. Biogeogr. 2012;21:1223–1232.
Kempton-Taylor Q index
Description
The Kempton-Taylor Q index is designed to measure species in the middle of the abundance distribution.
Usage
kempton_taylor_q(x, lower_quantile = 0.25, upper_quantile = 0.75)
Arguments
x |
A numeric vector of species counts or proportions. |
lower_quantile , upper_quantile |
Lower and upper quantiles of the abundance distribution. Default values are the ones suggested by Kempton and Taylor. |
Details
For a vector of species counts x
, the Kempton-Taylor Q statistic is
equal to the slope of the cumulative abundance curve across a specified
quantile range. The cumulative abundance curve is the plot of the number of
species against the log-abundance.
Kempton and Taylor originally defined the index as
Q = \frac{\frac{1}{2}S}{\log{R_2} - \log{R_1}},
where S
is the
total number of species observed, R_1
is the abundance at the lower
quantile, and R_2
is the abundance at the upper quantile. However,
this definition only holds if one uses the interquartile range. Because we
allow the user to adjust the upper and lower quantiles, we have to find the
number of species at these abundance values. Here, we follow the
implementation in scikit-bio
and round inwards to find the quantile
values, taking the number of species and log-abundance values at these data
points exactly.
Equivalent to
kempton_taylor_q()
inskbio.diversity.alpha
.Similar to the
qstat
calculator in Mothur. Our implementation differs slightly, and this difference affects the result.
Value
The Kempton-Taylor Q index, Q < 0
. If the vector sums to zero,
we cannot compute the quantiles, and this index is undefined. In that
case, we return NaN
.
References
Kempton RA, Taylor LR. Models and statistics for species diversity. Nature. 1976;262:818-820.
Kullback-Leibler divergence
Description
Kullback-Leibler divergence
Usage
kullback_leibler_divergence(x, y)
Arguments
x , y |
Numeric vectors representing probabilities |
Details
Kullback-Leibler divergence is a non-symmetric measure of difference between two probability vectors. In general, KL(x, y) is not equal to KL(y, x).
Because this measure is defined for probabilities, the vectors x and y are normalized in the function so they sum to 1.
Value
The Kullback-Leibler divergence between x
and y
. We
adopt the following conventions if elements of x
or y
are
zero: 0 \log (0 / y_i) = 0
, 0 \log (0 / 0) = 0
, and
x_i \log (x_i / 0) = \infty
. As a result, if elements of x
are
zero, they do not contribute to the sum. If elements of y
are zero
where x
is nonzero, the result will be Inf
. If either
x
or y
sum to zero, we are not able to compute the
proportions, and we return NaN
.
Example data for phylogenetic nestedness and turnover components
Description
This tree was used in Figure 2 of Leprieur et al. (2005) to demonstrate the nestedness and turnover components of phylogenetic beta diversity.
Usage
leprieur_tree
Format
leprieur_tree
is a phylogenetic tree with 8 tips, labeled
a-h. It was created with the ape
library. All edges (branches) in the
tree are of length 1.
Source
Leprieur F, Albouy C, De Bortoli J, Cowman PF, Bellwood DR, Mouillot D. Quantifying phylogenetic beta diversity: distinguishing between "true" turnover of lineages and phylogenetic diversity gradients. PLoS One. 2012;7(8):e42760. 10.1371/journal.pone.0042760
Example data for UniFrac distance
Description
This example was used to illustrate unweighted UniFrac distance in Lozupone and Knight (2005).
Usage
lozupone_tree
lozupone_panel_a
lozupone_panel_b
Format
lozupone_tree
is a phylogenetic tree with 14 tips, labeled
A-N. It was created with the ape
library.
The data frames lozupone_panel_a
and lozupone_panel_b
are
transcribed from Figure 1 of the paper. They have the following columns:
- Species
The species label, matching to the phylogenetic tree.
- SampleID
The community, either "Circle" or "Square".
- Counts
-
The number of organisms counted per species, always 1 for this example.
Source
Lozupone C, Knight R. UniFrac: a new phylogenetic method for comparing microbial communities. Applied and environmental microbiology. 2005;71(12):8228–8235. 10.1128/AEM.71.12.8228-8235.2005
Manhattan and related distances
Description
The Manhattan or city block distance is the sum of absolute differences between the elements of two vectors. The mean character difference is a closely related measure.
Usage
manhattan(x, y)
mean_character_difference(x, y)
modified_mean_character_difference(x, y)
Arguments
x , y |
Numeric vectors |
Details
For vectors x
and y
, the Manhattan distance is given by
d(x, y) = \sum_i |x_i - y_i|.
Relation of manhattan()
to
other definitions:
Equivalent to R's built-in
dist()
function withmethod = "manhattan"
.Equivalent to
vegdist()
withmethod = "manhattan"
.Equivalent to the
cityblock()
function inscipy.spatial.distance
.Equivalent to the
manhattan
calculator in Mothur.Equivalent to
D_7
in Legendre & Legendre.Whittaker's index of association (
D_9
in Legendre & Legendre) is the Manhattan distance computed after transforming to proportions and dividing by 2.
The mean character difference is the Manhattan distance divided by the
length of the vectors. It was proposed by Cain and Harrison in 1958.
Relation of mean_character_difference()
to other definitions:
Equivalent to
D_8
in Legendre & Legendre.For binary data, equivalent to
1 - S_1
in Legendre & Legendre, whereS_1
is the simple matching coefficient.
The modified mean character difference is the Manhattan distance divided by
the number elements where either x
or y
(or both) are nonzero.
Relation of modified_mean_character_difference()
to other
definitions:
Equivalent to
D_{19}
in Legendre & Legendre.Equivalent to
vegdist()
withmethod = "altGower"
.For binary data, it is equivalent to the Jaccard distance.
Value
The distance between x
and y
. The modified mean
character difference is undefined if all elements in x
and y
are zero, in which case we return NaN
.
References
Cain AJ, Harrison GA. An analysis of the taxonomist's judgment of affinity. Proceedings of the Zoological Society of London 1958;131:85-98.
Examples
x <- c(15, 6, 4, 0, 3, 0)
y <- c(10, 2, 0, 1, 1, 0)
manhattan(x, y)
# Whittaker's index of association
manhattan(x / sum(x), y / sum(y)) / 2
mean_character_difference(x, y)
# Simple matching coefficient for presence/absence data
# Should be 2 / 6
mean_character_difference(x > 0, y > 0)
modified_mean_character_difference(x, y)
# Jaccard distance for presence/absence data
modified_mean_character_difference(x > 0, y > 0)
jaccard(x, y)
Margalef's richness index
Description
Margalef's richness index
Usage
margalef(x)
Arguments
x |
A numeric vector of species counts. |
Details
For a vector x
of species counts, Margalef's index is
D = \frac{S -1}{\log N},
where S
is the total number of species
observed and N
is the total number of counts.
This index is appropriate only for raw counts, not transformed counts or proportions.
Equivalent to margalef()
in skbio.diversity.alpha
.
Value
The value of Margalef's index, D \geq 0
. This index is
undefined when the total number of counts is 1 or 0, in which case we
return NaN
.
References
Margalef R. Information theory in ecology. General Systems 3. 1958;36-71.
Examples
x <- c(15, 6, 4, 0, 3, 0)
margalef(x)
Match vector of counts to phylogenetic tree
Description
Match vector of counts to phylogenetic tree
Usage
match_to_tree(x, tree, x_labels = NULL)
Arguments
x |
A vector of species counts. |
tree |
A phylogenetic tree of class |
x_labels |
A vector of species labels for |
Details
This function applies a couple of different methods to arrange the
data in x
to match a phylogenetic tree.
If
x_labels
is provided, we use this vector to match the elements ofx
with the tip labels in the tree.If
x_labels
is not provided andx
is a named vector, we use the names to match the tip labels in the tree.If
x_labels
is not provided andx
is not named, we assume thatx
is already in the correct order, check that the length ofx
matches the number of tips in the tree, and returnx
.
Value
The vector x
, re-arranged to match the tree.
McIntosh dominance index D
Description
McIntosh dominance index D
Usage
mcintosh_d(x)
Arguments
x |
A numeric vector of species counts. |
Details
For a vector x
of raw species counts, the McIntosh dominance index is
defined as
D = \frac{N - U}{N - \sqrt{N}},
where N
is the total
number of counts and U = \sqrt{\sum_i x_i^2}
.
This index is appropriate only for raw counts, not transformed counts or proportions.
Equivalent to mcintosh_d()
in skbio.diversity.alpha
.
Value
The McIntosh dominance index, 0 \leq D < 1
. The index is undefined
when the total number of counts is 1 or 0, in which case we return
NaN
.
References
McIntosh RP. An index of diversity and the relation of certain concepts to diversity. Ecology. 1967;48:1115-1126.
Examples
x <- c(15, 6, 4, 0, 3, 0)
mcintosh_d(x)
McIntosh's evenness measure E
Description
McIntosh's evenness measure E
Usage
mcintosh_e(x)
Arguments
x |
A numeric vector of species counts. |
Details
For a vector x
of raw species counts, the McIntosh evenness measure
is
E = \frac{\sqrt{\sum_i x_i^2}}{\sqrt{(N - S + 1)^2 + S - 1},}
where N
is the total number of counts and S
is the total
number of species observed.
This index is appropriate only for raw counts, not transformed counts or proportions.
Equivalent to mcintosh_e()
in skbio.diversity.alpha
.
Value
McIntosh's evenness measure, 0 < E \leq 1
. The index is
undefined when the total number of counts is 0, in which case we return
NaN
.
References
Heip C, Engels P. Comparing Species Diversity and Evenness Indices. J. Mar. Bioi. Ass. U.K. 1974;54:559-563.
Examples
x <- c(15, 6, 4, 0, 3, 0)
mcintosh_e(x)
Menhinick's richness index
Description
Menhinick's richness index
Usage
menhinick(x)
Arguments
x |
A numeric vector of species counts. |
Details
For a vector x
of raw species counts, the Menhinick's richness index
is \frac{S}{\sqrt{N}}
, where N
is the total number
of counts and S
is the total number of species observed.
This index is appropriate only for raw counts, not transformed counts or proportions.
Equivalent to menhinick()
in skbio.diversity.alpha
.
Value
Menhinick's richness index, R > 0
. The index is undefined when
the total number of counts is 0, in which case we return NaN
.
Examples
x <- c(15, 6, 4, 0, 3, 0)
menhinick(x)
Minkowski distance
Description
The Minkowski metric is a generalized form of Euclidean (p=2) and Manhattan (p=1) distance.
Usage
minkowski(x, y, p = 1)
Arguments
x , y |
Numeric vectors. |
p |
Exponent parameter, a single number greater than zero. |
Details
For vectors x
and y
, the Minkowski distance is defined as
d(x, y) = \left( \sum_i |x_i - y_i|^p \right)^{1/p}.
Relation to other definitions:
Equivalent to R's built-in
dist()
function withmethod = "minkowski"
.Equivalent to the
minkowski()
function inscipy.spatial.distance
.Equivalent to
D_6
in Legendre & Legendre.
The default value of p = 1
makes this distance equal to the Manhattan
distance.
Value
The Minkowski distance between x
and y
.
The Morisita index and Horn-Morisita index
Description
The Morisita and the Horn-Morisita indices measure the probability that individuals drawn one from each vector will belong to different species, relative to drawing from each vector separately. The Morisita index is formulated for count data only, whereas the Horn-Morisita index can be used with transformed counts or proportions.
Usage
morisita(x, y)
horn_morisita(x, y)
Arguments
x , y |
Numeric vectors |
Details
For two vectors x
and y
, the Morisita index of dissimilarity
is
d(x,y) = 1 - \frac{2 \sum_i x_i y_i}{(\lambda_x + \lambda_y) N_x N_y},
where
\lambda_x = \frac{\sum_i x_i (x_i - 1)}{N_x (N_x - 1)}
and
N_x = \sum_i x_i
The formula for \lambda_x
is the unbiased
estimate for the probability of drawing two individuals of the same species
from x
, without replacement. The correction for sampling without
replacement only makes sense for species count data.
Relation of morisita()
to other definitions:
Equivalent to
vegdist()
withmethod = "morisita"
.
Horn (1966) reformulated the index to use the equations for sampling with
replacement in \lambda_x
and \lambda_y
:
\lambda_x = \frac{\sum_i x_i^2}{N_x^2}
With this modification, the index is valid for proportions or transformed count data.
Relation of horn_morisita()
to other definitions:
Equivalent to
vegdist()
withmethod = "horn"
.Equivalent to the
morisitahorn
calculator in Mothur.
Value
The Morisita or Horn-Morisita index between x
and y
.
Both are undefined if x
or y
have no nonzero elements, in
which case we return NaN
.
References
Mosrisita M. Measuring of interspecific association and similarity between communities. Memoirs of the Faculty of Science, Kyushu Univ., Series E (Biology). 1959;3:65-80.
Horn HS. Measurement of "Overlap" in Comparative Ecological Studies. The American Naturalist, 1966;100(914):419-424.
Richness or number of observed species
Description
Richness or number of observed species
Usage
richness(x)
Arguments
x |
A numeric vector of species counts or proportions. |
Details
The richness is simply the number of nonzero elements in x
.
Relation to other definitions:
Equivalent to
observed_otus()
inskbio.diversity.alpha
.Equivalent to
specnumber
invegan
.Equivalent to the
sobs
calculator in Mothur.
Value
The number of species observed, R \geq 0
.
Examples
x <- c(15, 6, 4, 0, 3, 0)
richness(x) # 4
Ruzicka or weighted Jaccard distance
Description
Ruzicka or weighted Jaccard distance
Usage
ruzicka(x, y)
Arguments
x , y |
Numeric vectors. |
Details
For vectors x
and y
, the Ruzicka distance is defined as
d(x, y) = 1 - \frac{\sum_i \min(x, y)}{\sum_i \max(x, y)}.
Relation to other definitions:
Equivalent to vegdist() with method = "jaccard".
Related to the Bray-Curtis distance,
d_r = 2 d_{bc} / (1 + d_{bc})
.
Value
The Ruzicka distance between x
and y
. The distance is
not defined if all elements in x
and y
are zero, and we
return NaN
in this case.
Shannon diversity and related measures
Description
The Shannon index of diversity
Usage
shannon(x, base = exp(1))
brillouin_d(x)
heip_e(x)
pielou_e(x)
Arguments
x |
A numeric vector of species counts or proportions. |
base |
Base of the logarithm to use in the calculation. |
Details
The Shannon index of diversity or Shannon information entropy has deep roots in information theory. It is defined as
H = - \sum_i p_i \log{p_i},
where p_i
is the species proportion. Relation to other definitions:
Equivalent to
diversity()
invegan
withindex = "shannon"
.Equivalent to
shannon()
inskbio.diversity.alpha
.
The Brillouin index (Brillouin 1956) is similar to Shannon's index, but
accounts for sampling without replacement. For a vector of species counts
x
, the Brillouin index is
\frac{1}{N}\log{\frac{N!}{\prod_i x_i!}} =
\frac{\log{N!} - \sum_i \log{x_i!}}{N}
where N
is the total number of counts. Relation to other definitions:
Equivalent to
brillouin_d()
inskbio.diversity.alpha
.Equivalent to the
shannon
calculator in Mothur.
The Brillouin index accounts for the total number of individuals sampled, and should be used on raw count data, not proportions.
Heip's evenness measure is
\frac{e^H - 1}{S - 1},
where S
is
the total number of species observed. Relation to other definitions:
Equivalent to
heip_e()
inskbio.diversity.alpha
.
Pielou's Evenness index J = H / \log{S}
. Relation to other
definitions:
Equivalent to
peilou_e()
inskbio.diversity.alpha
.
Value
The Shannon diversity, H \geq 0
, or related quantity. The
value of H
is undefined if x
sums to zero, and we return
NaN
in this case. Heip's evenness measure and Pielou's Evenness
index are undefined if only one element of x
is nonzero, and again
we return NaN
if this is the case.
References
Brillouin L. Science and Information Theory. 1956;Academic Press, New York.
Pielou EC. The Measurement of Diversity in Different Types of Biological Collections. Journal of Theoretical Biology. 1966;13:131-144.
Examples
x <- c(15, 6, 4, 0, 3, 0)
shannon(x)
# Using a different base is the same as dividing by the log of that base
shannon(x, base = 10)
shannon(x) / log(10)
brillouin_d(x)
# Brillouin index should be almost identical to Shannon index for large N
brillouin_d(10000 * x)
shannon(10000 * x)
heip_e(x)
(exp(shannon(x)) - 1) / (richness(x) - 1)
pielou_e(x)
shannon(x) / log(richness(x))
Simpson's index and related measures
Description
These measures are based on the sum of squared species proportions. The
function dominance()
gives this quantity, simpson()
gives one
minus this quantity, invsimpson()
gives the reciprocal of the
quantity, and simpson_e
gives the reciprocal divided by the number
of species.
Usage
simpson(x)
dominance(x)
invsimpson(x)
simpson_e(x)
Arguments
x |
A numeric vector of species counts or proportions. |
Details
For a vector of species counts x
, the dominance index is defined as
D = \sum_i p_i^2,
where p_i
is the species proportion,
p_i = x_i / N
, and N
is the total number of counts. This is
equal to the probability of selecting two individuals from the same species,
with replacement. Relation to other definitions:
Equivalent to
dominance()
inskbio.diversity.alpha
.Similar to the
simpson
calculator in Mothur. They use the unbiased estimatep_i = x_i (x_i - 1) / (N (N -1))
.
Simpson's index is defined here as 1 - D
, or the probability of
selecting two individuals from different species, with replacement. Relation
to other definitions:
Equivalent to
diversity()
invegan
withindex = "simpson"
.Equivalent to
simpson()
inskbio.diversity.alpha
.
The inverse Simpson index is 1/D
. Relation to other definitions:
Equivalent to
diversity()
invegan
withindex = "invsimpson"
.Equivalent to
enspie()
inskbio.diversity.alpha
.Similar to the
invsimpson
calculator in Mothur. They use the unbiased estimatep_i = x_i (x_i - 1) / (N (N -1))
.
Simpson's evenness index is the inverse Simpson index divided by the
number of species observed, 1 / (D S)
. Relation to other definitions:
Equivalent to
simpson_e()
inskbio.diversity.alpha
.
Please be warned that the naming conventions vary between sources. For
example Wikipedia calls D
the Simpson index and 1 - D
the
Gini-Simpson index. We have followed the convention from vegan
, to
avoid confusion within the R
ecosystem.
Value
The value of the dominance (0 < D \leq 1
), Simpson index, or
inverse Simpson index. The dominance is undefined if the vector sums to
zero, in which case we return NaN
.
Examples
x <- c(15, 6, 4, 0, 3, 0)
dominance(x)
# Simpson is 1 - D
simpson(x)
1 - dominance(x)
# Inverse Simpson is 1/D
invsimpson(x)
1 / dominance(x)
# Simpson's evenness is 1 / (D * S)
simpson_e(x)
1 / (dominance(x) * richness(x))
Strong's dominance index
Description
Strong's dominance index measures the maximum departure between the observed proportions and a perfectly even community.
Usage
strong(x)
Arguments
x |
A numeric vector of species counts. |
Details
Strong's dominance index is defined as
D_W = \max_i \left [ \frac{b_i}{N} - \frac{i}{S} \right ],
where
b_i
is the abundance of the i
th species, ordered from smallest
to largest, N
is the total number of counts, and S
is the number
of species observed.
Equivalent to strong()
in skbio.diversity.alpha
.
Value
Strong's dominance index, 0 \leq D_W < 1
. The index is
undefined if x
sums to 0, and we return NaN
in this case.
References
Strong WL. Assessing species abundance uneveness within and between plant communities. Community Ecology. 2002;3:237-246.
Examples
x <- c(9, 0, 1, 2, 5, 2, 1, 1, 0, 7, 2, 1, 0, 1, 1)
strong(x)
UniFrac distance
Description
The UniFrac distance is a phylogenetically-weighted distance between two communities of organisms. The measure has been extended a number of times to include abundance-weighted and variance-adjusted versions.
Usage
unweighted_unifrac(x, y, tree, xy_labels = NULL)
weighted_unifrac(x, y, tree, xy_labels = NULL)
weighted_normalized_unifrac(x, y, tree, xy_labels = NULL)
variance_adjusted_unifrac(x, y, tree, xy_labels = NULL)
generalized_unifrac(x, y, tree, alpha = 0.5, xy_labels = NULL)
information_unifrac(x, y, tree, xy_labels = NULL)
phylosor(x, y, tree, xy_labels = NULL)
Arguments
x , y |
Numeric vectors of species counts or proportions. |
tree |
A phylogenetic tree object. |
xy_labels |
A character vector of species labels for |
alpha |
Generalized UniFrac parameter. |
Details
These functions compute different variations of the UniFrac distance between
communities described by the vectors x
and y
. If the vectors
are named, the names will be automatically used to match the vectors with
the tree. Missing names are filled in with zero counts. If the vectors are
not named and xy_labels
is provided, these labels will be used to
match the vectors with the tree. If the vectors are not named and
xy_labels
is not provided, it is assumed that the vectors are already
in the correct order, and we simply check that their length matches the
number of tips in the tree.
unweighted_unifrac
gives the original UniFrac distance from Lozupone
and Knight (2005), which is the fraction of total branch length leading to
community x
or community y
, but not both. It is based on
species presence/absence.
weighted_unifrac
gives the abundance-weighted version of UniFrac
proposed by Lozupone et al. (2007). In this measure, the branch lengths of
the tree are multiplied by the absolute difference in species abundances
below each branch.
weighted_normalized_unifrac
provides a normalized version of
weighted_unifrac
, so the distance is between 0 and 1.
variance_adjusted_unifrac
was proposed by Chang et al. (2011) to
adjust for the variation of weights in weighted UniFrac under random
sampling.
generalized_unifrac
was proposed by Chen et al. (2012) to provide a
unified mathematical framework for weighted and unweighted UniFrac distance.
It includes a parameter, \alpha
, which can be used to adjust the
abundance-weighting in the distance. A value of \alpha = 1
corresponds
to weighted UniFrac. A value of \alpha = 0
corresponds to unweighted
UniFrac if presence/absence vectors are provided. The authors suggest a
value of \alpha = 0.5
as a compromise between weighted and unweighted
distances.
information_unifrac
was proposed by Wong et al. (2016) to connect
UniFrac distance with compositional data analysis. They also proposed a
"ratio UniFrac" distance, which is not yet implemented.
phylosor
, proposed by Bryant et al. (2008), is closely related to
unweighted UniFrac distance. If unweighted UniFrac distance is the analogue
of Jaccard distance using branches on a phylogenetic tree, PhyloSor is the
analogue of Sorenson dissimilarity.
Value
The UniFrac distance between communities x
and y
. The
distance is not defined if either x
or y
have all zero
elements. We return NaN
if this is the case.
References
Lozupone C, Knight R. UniFrac: a new phylogenetic method for comparing microbial communities. Applied and environmental microbiology. 2005;71(12):8228–8235. 10.1128/AEM.71.12.8228-8235.2005
Lozupone CA, Hamady M, Kelley ST, Knight R. Quantitative and
qualitative \beta
diversity measures lead to different insights into
factors that structure microbial communities. Applied and environmental
microbiology. 2007;73(5):1576–1585. 10.1128/AEM.01996-06
Chang Q., et al. Variance adjusted weighted UniFrac: a powerful beta diversity measure for comparing communities based on phylogeny. BMC Bioinformatics. 2011;12:118. 10.1186/1471-2105-12-118
Chen J, Bittinger K, Charlson ES, Hoffmann C, Lewis J, Wu GD, et al. Associating microbiome composition with environmental covariates using generalized UniFrac distances. Bioinformatics. 2012;28(16):2106–2113. 10.1093/bioinformatics/bts342
Wong RG, Wu JR, Gloor GB. Expanding the UniFrac Toolbox. PLOS ONE. 2016;11(9):1–20. 10.1371/journal.pone.0161196
Bryant JA, Lamanna C, Morlon H, Kerkhoff AJ, Enquist BJ, Green JL. Microbes on mountainsides: contrasting elevational patterns of bacterial and plant diversity. Proc Natl Acad Sci U S A. 2008;105 Suppl 1:11505-11. 10.1073/pnas.0801920105
Examples
# From Lozupone and Knight (2005), Figure 1.
# Panel A
x1 <- c(1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1)
x2 <- c(0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0)
unweighted_unifrac(x1, x2, lozupone_tree)
# Panel B
x3 <- c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1)
x4 <- c(1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0)
unweighted_unifrac(x3, x4, lozupone_tree)
# Can use named vectors to specify species
weighted_normalized_unifrac(
c(A=1, C=1, D=1, F=1, I=1, L=1, N=1),
c(B=1, E=1, G=1, H=1, J=1, K=1, M=1),
lozupone_tree)
weighted_normalized_unifrac(x1, x2, lozupone_tree)
# Generalized UniFrac is equal to weighted normalized UniFrac when alpha = 1
generalized_unifrac(x1, x2, lozupone_tree, alpha=1)
generalized_unifrac(x1, x2, lozupone_tree, alpha=0.5)
Nestedness and turnover components of unweighted UniFrac distance
Description
Nestedness and turnover components of unweighted UniFrac distance
Usage
unweighted_unifrac_turnover(x, y, tree, xy_labels = NULL)
unweighted_unifrac_nestedness(x, y, tree, xy_labels = NULL)
phylosor_turnover(x, y, tree, xy_labels = NULL)
phylosor_nestedness(x, y, tree, xy_labels = NULL)
Arguments
x , y |
Numeric vectors of species counts or proportions. |
tree |
A phylogenetic tree object. |
xy_labels |
A character vector of species labels for |
Details
Leprieur et al. (2012) showed that measures of phylogenetic beta diversity could be partitioned into nestedness and turnover components, following the approach of Baselga (2010) for Sorenson dissimilarity.
Value
The nestedness or turnover component of the UniFrac distance
between communities x
and y
. This quantity is undefined
when either x
or y
have all elements equal to zero, in
which case we return NaN
.
References
Baselga A. Partitioning the turnover and nestedness components of beta diversity. Global Ecol. Biogeogr. 2010;19:134-143.
Leprieur F, Albouy C, De Bortoli J, Cowman PF, Bellwood DR, Mouillot D. Quantifying phylogenetic beta diversity: distinguishing between "true" turnover of lineages and phylogenetic diversity gradients. PLoS One. 2012;7(8):e42760. 10.1371/journal.pone.0042760
Examples
# Vectors x and y have turnover but no nestedness
x <- c(1, 1, 1, 0, 0, 0, 0, 0)
y <- c(0, 1, 1, 1, 0, 0, 0, 0)
unweighted_unifrac(x, y, leprieur_tree)
unweighted_unifrac_turnover(x, y, leprieur_tree)
unweighted_unifrac_nestedness(x, y, leprieur_tree)
phylosor(x, y, leprieur_tree)
phylosor_turnover(x, y, leprieur_tree)
phylosor_nestedness(x, y, leprieur_tree)
# Vectors y and z have nestedness but no turnover
z <- c(0, 1, 1, 1, 1, 1, 1, 1)
unweighted_unifrac(y, z, leprieur_tree)
unweighted_unifrac_turnover(y, z, leprieur_tree)
unweighted_unifrac_nestedness(y, z, leprieur_tree)
phylosor(y, z, leprieur_tree)
phylosor_turnover(y, z, leprieur_tree)
phylosor_nestedness(y, z, leprieur_tree)
Weighted Kulczynski distance
Description
The quantitative version of the second Kulczynski index
Usage
weighted_kulczynski_second(x, y)
Arguments
x , y |
Numeric vectors |
Details
The quantitative version of the second Kulczynski index is defined as
d(x, y) = 1 - \frac{1}{2} \left (
\frac{\sum_i \min{(x_i, y_i)}}{\sum_i x_i} +
\frac{\sum_i \min{(x_i, y_i)}}{\sum_i y_i}
\right ).
Relation of weighted_kulczynski_second()
to other definitions:
Equivalent to
vegdist()
withmethod = "kulczynski"
.Equivalent to
structkulczynski
in Mothur.Equivalent to
1 - S_{18}
in Legendre & Legendre.
Value
The weighted Kulczynski distance between x
and y
. The
distance is undefined if the sum of x
or the sum of y
is
zero, in which case we return NaN
.