Title: | Persistence Homology Utilities |
Version: | 0.0.1 |
Description: | A low-level package for hosting persistence data. It is part of the 'TDAverse' suite of packages, which is designed to provide a collection of packages for enabling machine learning and data science tasks using persistent homology. Implements a class for hosting persistence data, a number of coercers from and to already existing and used data structures from other packages and functions to compute distances between persistence diagrams. A formal definition and study of bottleneck and Wasserstein distances can be found in Bubenik, Scott and Stanley (2023) <doi:10.1007/s41468-022-00103-8>. Their implementation in 'phutil' relies on the 'C++' Hera library developed by Kerber, Morozov and Nigmetov (2017) <doi:10.1145/3064175>. |
License: | MIT + file LICENSE |
URL: | https://github.com/tdaverse/phutil, https://tdaverse.github.io/phutil/ |
BugReports: | https://github.com/tdaverse/phutil/issues |
Depends: | R (≥ 3.5) |
Imports: | cli, rlang |
Suggests: | ggplot2, knitr, microbenchmark, quarto, scales, TDA, tdaunif, tinysnapshot, tinytest |
LinkingTo: | BH |
VignetteBuilder: | quarto |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | yes |
Packaged: | 2025-05-13 10:39:56 UTC; stamm-a |
Author: | Aymeric Stamm |
Maintainer: | Aymeric Stamm <aymeric.stamm@cnrs.fr> |
Repository: | CRAN |
Date/Publication: | 2025-05-15 13:50:08 UTC |
phutil: Persistence Homology Utilities
Description
A low-level package for hosting persistence data. It is part of the 'TDAverse' suite of packages, which is designed to provide a collection of packages for enabling machine learning and data science tasks using persistent homology. Implements a class for hosting persistence data, a number of coercers from and to already existing and used data structures from other packages and functions to compute distances between persistence diagrams. A formal definition and study of bottleneck and Wasserstein distances can be found in Bubenik, Scott and Stanley (2023) doi:10.1007/s41468-022-00103-8. Their implementation in 'phutil' relies on the 'C++' Hera library developed by Kerber, Morozov and Nigmetov (2017) doi:10.1145/3064175.
Author(s)
Maintainer: Aymeric Stamm aymeric.stamm@cnrs.fr (ORCID)
Authors:
Jason Cory Brunson cornelioid@gmail.com (ORCID)
Other contributors:
Michael Kerber (HERA C++ code) [contributor]
Dmitriy Morozov (HERA C++ code) [contributor]
Arnur Nigmetov (HERA C++ code) [contributor]
See Also
Useful links:
Report bugs at https://github.com/tdaverse/phutil/issues
A sample of persistence diagrams from the arch spiral
Description
A collection of 24 samples of size 120 on the arch spiral from which a persistence diagram is computed using the TDA::ripsDiag() function with maxdimension = 2 and maxscale = 6. Each diagram has been generated using the tdaunif::sample_arch_spiral() function with the following parameters: n = 120, arms = 2 and sd = 0.05. The seed was fixed to 28415.
Usage
arch_spirals
Format
A list of length 24, where each element is an object of class 'persistence'.
Distances between two persistence diagrams
Description
This collection of functions computes the distance between two persistence diagrams of the same homology dimension. The diagrams must be represented as 2-column matrices. The first column of the matrix contains the birth times and the second column contains the death times of the points.
Usage
bottleneck_distance(
x,
y,
tol = sqrt(.Machine$double.eps),
validate = TRUE,
dimension = 0L
)
wasserstein_distance(
x,
y,
tol = sqrt(.Machine$double.eps),
p = 1,
validate = TRUE,
dimension = 0L
)
kantorovich_distance(
x,
y,
tol = sqrt(.Machine$double.eps),
p = 1,
validate = TRUE,
dimension = 0L
)
Arguments
x |
Either a matrix of shape |
y |
Either a matrix of shape |
tol |
A numeric value specifying the relative error. Defaults to sqrt(.Machine$double.eps). |
validate |
A boolean value specifying whether to validate the input persistence diagrams. Defaults to TRUE. |
dimension |
An integer value specifying the homology dimension for which to compute the distance. Defaults to 0L. |
p |
A numeric value specifying the power for the Wasserstein distance. Defaults to 1. |
Details
A matching \varphi : D_1 \to D_2 between persistence diagrams is a bijection of multisets, where both diagrams are assumed to have all points on the diagonal with infinite multiplicity. The p-Wasserstein distance between D_1 and D_2 is defined as the infimum over all matchings of the p-norm of the matched displacements:
W_p(D_1,D_2) = \inf_{\varphi: D_1 \to D_2} \left( \sum_{x \in D_1}{\lVert x - \varphi(x) \rVert^p} \right)^{\frac{1}{p}}
For a fixed matching \varphi, the expression under the infimum can be thought of as the Minkowski distance between the diagrams viewed as vectors on the shared coordinates defined by \varphi.
The norm \lVert \cdot \rVert can be arbitrary; as implemented here, it is the infinity norm \lVert (x_1,x_2) \rVert_\infty = \max(|x_1|,|x_2|). In the limit p \to \infty, the Wasserstein distance becomes the bottleneck distance:
B(D_1,D_2) = \inf_{\varphi: D_1 \to D_2} \sup_{x \in D_1}{\lVert x - \varphi(x) \rVert}.
The Wasserstein metric is also called the Kantorovich metric in recognition of the originator of the metric.
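The definitions above can be checked by hand on very small diagrams. The following base R sketch enumerates every matching by brute force, letting each point either pair with a point of the other diagram or fall onto the diagonal. It is only an illustration of the formulas and not how 'phutil' or Hera compute distances; the helper names (linf, diag_dist, perms, toy_distance) and the toy diagrams are made up for this example, and the enumeration is only feasible for a handful of points.

```r
# Infinity-norm distance between two points (birth, death)
linf <- function(a, b) max(abs(a - b))

# Infinity-norm distance from a point to its nearest point on the diagonal
diag_dist <- function(a) abs(a[2] - a[1]) / 2

# All permutations of 1..n as rows of a matrix (toy sizes only)
perms <- function(n) {
  if (n == 1L) return(matrix(1L, 1L, 1L))
  sub <- perms(n - 1L)
  do.call(rbind, lapply(seq_len(n), function(i) {
    rest <- setdiff(seq_len(n), i)
    cbind(i, matrix(rest[as.vector(sub)], nrow = nrow(sub)))
  }))
}

# Brute-force p-Wasserstein distance (p = Inf gives the bottleneck distance).
# Assumes x and y are non-empty 2-column matrices of (birth, death) pairs.
toy_distance <- function(x, y, p = 1) {
  n <- nrow(x); m <- nrow(y); N <- n + m
  # Rows: points of x, then m diagonal slots.
  # Columns: points of y, then n diagonal slots.
  cost <- matrix(0, N, N)
  for (i in seq_len(n)) {
    for (j in seq_len(m)) cost[i, j] <- linf(x[i, ], y[j, ])
    cost[i, m + seq_len(n)] <- diag_dist(x[i, ])
  }
  for (j in seq_len(m)) cost[n + seq_len(m), j] <- diag_dist(y[j, ])
  # One column of matched costs per candidate matching
  matched <- apply(perms(N), 1L, function(s) cost[cbind(seq_len(N), s)])
  if (is.infinite(p)) {
    min(apply(matched, 2L, max))
  } else {
    min(colSums(matched^p))^(1 / p)
  }
}

# Made-up diagrams: one long-lived feature each, plus one short-lived in x
x <- rbind(c(0, 2), c(1, 1.2))
y <- rbind(c(0, 3))

w1 <- toy_distance(x, y, p = 1)    # about 1.1
w2 <- toy_distance(x, y, p = 2)    # about sqrt(1.01)
b  <- toy_distance(x, y, p = Inf)  # about 1
```

In the optimal matching, (0, 2) is paired with (0, 3) at cost 1 and the short-lived point (1, 1.2) is sent to the diagonal at cost 0.1, which is why the bottleneck value only sees the larger of the two costs.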
Value
A numeric value storing either the bottleneck or the Wasserstein distance between the two persistence diagrams.
Examples
bottleneck_distance(
persistence_sample[[1]]$pairs[[1]],
persistence_sample[[2]]$pairs[[1]]
)
bottleneck_distance(
persistence_sample[[1]],
persistence_sample[[2]]
)
wasserstein_distance(
persistence_sample[[1]]$pairs[[1]],
persistence_sample[[2]]$pairs[[1]]
)
wasserstein_distance(
persistence_sample[[1]],
persistence_sample[[2]]
)
Toy Data: Noisy circle
Description
A simulated data set consisting of 100 points sampled from a circle with additive Gaussian noise using a standard deviation of 0.05.
Usage
noisy_circle_points
noisy_circle_ripserr
noisy_circle_tda_rips
Format
noisy_circle_points
A matrix with 100 rows and 2 columns listing the coordinates of the points.
noisy_circle_ripserr
An object of class 'PHom' as returned by the ripserr::vietoris_rips() function, which is a data frame with 3 variables:
- dimension: the dimension/degree of the feature,
- birth: the birth value of the feature,
- death: the death value of the feature.
noisy_circle_tda_rips
A list of length 1 containing an object of class 'diagram', as found in the diagram element of the list returned by the TDA::ripsDiag() function, which is a matrix with 3 columns:
- dimension: the dimension/degree of the feature,
- birth: the birth value of the feature,
- death: the death value of the feature.
An object of class PHom (inherits from data.frame) with 101 rows and 3 columns.
An object of class list of length 1.
Details
The point cloud stored in noisy_circle_points has been generated using the tdaunif package with the tdaunif::sample_circle() function. Specifically, the following parameters were used: n = 100, sd = 0.05 and a seed of 1234.
The persistence diagram stored in noisy_circle_ripserr has been computed using the ripserr package with the ripserr::vietoris_rips() function. Specifically, the following parameters were used: max_dim = 1L.
The persistence diagram stored in noisy_circle_tda_rips has been computed using the TDA package with the TDA::ripsDiag() function. Specifically, the following parameters were used: maxdimension = 1L and maxscale = 1.6322.
Source
https://tdaverse.github.io/tdaunif/reference/circles.html, https://tdaverse.github.io/ripserr/reference/vietoris_rips.html, https://www.rdocumentation.org/packages/TDA/versions/1.9.1/topics/ripsDiag
Pairwise distances within a set of persistence diagrams
Description
This collection of functions computes the pairwise distance matrix between all pairs in a set of persistence diagrams of the same homology dimension. The diagrams must be represented as 2-column matrices. The first column of the matrix contains the birth times and the second column contains the death times of the points.
Usage
bottleneck_pairwise_distances(
x,
tol = sqrt(.Machine$double.eps),
validate = TRUE,
dimension = 0L,
ncores = 1L
)
wasserstein_pairwise_distances(
x,
tol = sqrt(.Machine$double.eps),
p = 1,
validate = TRUE,
dimension = 0L,
ncores = 1L
)
kantorovich_pairwise_distances(
x,
tol = sqrt(.Machine$double.eps),
p = 1,
validate = TRUE,
dimension = 0L,
ncores = 1L
)
Arguments
x |
A list of either 2-column matrices or objects of class persistence specifying the set of persistence diagrams. |
tol |
A numeric value specifying the relative error. Defaults to sqrt(.Machine$double.eps). |
validate |
A boolean value specifying whether to validate the input persistence diagrams. Defaults to TRUE. |
dimension |
An integer value specifying the homology dimension for which to compute the distance. Defaults to 0L. |
ncores |
An integer value specifying the number of cores to use for parallel computation. Defaults to 1L. |
p |
A numeric value specifying the power for the Wasserstein distance. Defaults to 1. |
Value
An object of class 'dist' containing the pairwise distance matrix between the persistence diagrams.
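Because the return value is a standard 'dist' object, it can be passed to base R tools that accept one. The sketch below does not call 'phutil'; it builds a hypothetical stand-in from made-up distances between four diagrams (labelled D1 to D4 for illustration) to show how such an object is typically inspected and clustered.

```r
# Hypothetical stand-in for the output of a *_pairwise_distances() call:
# a symmetric matrix of made-up distances between four diagrams.
m <- matrix(
  c(0.0, 0.2, 0.9, 1.0,
    0.2, 0.0, 0.8, 0.9,
    0.9, 0.8, 0.0, 0.1,
    1.0, 0.9, 0.1, 0.0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("D", 1:4), paste0("D", 1:4))
)
D <- as.dist(m)

# A 'dist' object stores only the lower triangle: choose(4, 2) values
n_pairs <- length(D)

# Expand to a full symmetric matrix to look up a single pair
d13 <- as.matrix(D)["D1", "D3"]

# Feed it directly to hierarchical clustering and cut into two groups
hc <- hclust(D, method = "single")
groups <- cutree(hc, k = 2)
```

With these made-up values, D1 and D2 end up in one group and D3 and D4 in the other.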
Examples
spl <- persistence_sample[1:10]
# Extract the list of 2-column matrices for dimension 0 in the sample
x <- lapply(spl, function(ph) ph$pairs[[1]])
# Compute the pairwise bottleneck distances
Db <- bottleneck_pairwise_distances(spl)
Db <- bottleneck_pairwise_distances(x)
# Compute the pairwise Wasserstein distances
Dw <- wasserstein_pairwise_distances(spl)
Dw <- wasserstein_pairwise_distances(x)
An S3 class for storing persistence data
Description
A collection of functions to coerce persistence data into objects of class persistence (see the Value section for more details on this class). It is currently possible to coerce persistence data from the following sources:
- a matrix with at least 3 columns (dimension/degree, start/birth, end/death) as returned by ripserr::vietoris_rips() in the form of the 'PHom' class,
- a list as returned by any *Diag() function in the TDA package.
Usage
as_persistence(x, warn = TRUE, ...)
## S3 method for class 'list'
as_persistence(x, warn = TRUE, ...)
## S3 method for class 'persistence'
as_persistence(x, warn = TRUE, ...)
## S3 method for class 'data.frame'
as_persistence(x, warn = TRUE, ...)
## S3 method for class 'matrix'
as_persistence(x, warn = TRUE, ...)
## S3 method for class 'diagram'
as_persistence(x, warn = TRUE, ...)
## S3 method for class 'PHom'
as_persistence(x, ...)
## S3 method for class 'hclust'
as_persistence(x, warn = TRUE, birth = NULL, ...)
## S3 method for class 'persistence'
print(x, ...)
## S3 method for class 'persistence'
format(x, ...)
get_pairs(x, dimension, ...)
## S3 method for class 'persistence'
as.matrix(x, ...)
## S3 method for class 'persistence'
as.data.frame(x, row.names = NULL, optional = TRUE, ...)
Arguments
x |
An
|
warn |
A boolean specifying whether to issue a warning if the input persistence data contained unordered pairs. Defaults to TRUE. |
... |
Parameters passed to methods. |
birth |
A numeric value specifying the height at which to declare all leaves were born. Defaults to NULL. |
dimension |
A non-negative integer specifying the homology dimension for which to recover a matrix of persistence pairs. |
row.names |
|
optional |
logical. If |
Details
Caution. When providing an unnamed input matrix, the matrix coercer assumes that it has at least 3 columns, with the first column being the dimension/degree, the second column being the start/birth and the third column being the end/death.
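Since the unnamed-matrix coercer relies on column position alone, it can help to put the columns in the expected order explicitly before coercing. The following is a minimal sketch with made-up values; the variable names are illustrative, and the call to as_persistence() is guarded so that the snippet also runs where 'phutil' is not installed.

```r
# Made-up persistence pairs whose columns are NOT in the expected order
raw <- cbind(
  death     = c(0.5, 0.7, 1.3),
  birth     = c(0.0, 0.0, 0.4),
  dimension = c(0, 0, 1)
)

# Reorder explicitly, then drop the names to mimic an unnamed matrix
m <- unname(raw[, c("dimension", "birth", "death")])

# Sanity checks on the positional convention before coercing
stopifnot(
  ncol(m) >= 3,
  all(m[, 1] == round(m[, 1])),  # first column: integer-valued dimensions
  all(m[, 2] <= m[, 3])          # birth does not exceed death
)

# Coerce only if phutil is installed (guarded so the sketch runs without it)
if (requireNamespace("phutil", quietly = TRUE)) {
  ph <- phutil::as_persistence(m)
  print(ph)
}
```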
Value
An object of class persistence, which is a list of 2 elements:
- pairs: A list of 2-column matrices containing birth-death pairs. The i-th element of the list corresponds to the (i-1)-th homology dimension. If there are no pairs for a given dimension but there are pairs in higher dimensions, the corresponding element(s) is/are filled with a 0 \times 2 numeric matrix.
- metadata: A list of length 6 containing information about how the data was computed:
  - orderered_pairs: A boolean indicating whether the pairs in the pairs list are ordered (i.e. the first column is strictly less than the second column).
  - data: The name of the object containing the original data on which the persistence data was computed.
  - engine: The name of the package and the function of this package that computed the persistence data, in the form "package_name::package_function".
  - filtration: The filtration used in the computation in a human-readable format (i.e. full names, capitals where needed, etc.).
  - parameters: A list of parameters used in the computation.
  - call: The exact call that generated the persistence data.
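The indexing convention of the pairs element (element i holds homology dimension i - 1, with empty dimensions kept as 0-row matrices) can be mimicked in base R. The sketch below uses made-up values and is not the code 'phutil' uses internally; it only illustrates the documented layout.

```r
# Made-up 3-column input: dimension, birth, death (no degree-1 features)
m <- rbind(
  c(0, 0.0, 0.5),
  c(0, 0.0, 0.7),
  c(2, 0.9, 1.1)
)

max_dim <- max(m[, 1])
pairs <- lapply(0:max_dim, function(d) {
  # drop = FALSE keeps the matrix shape even when zero or one row matches
  m[m[, 1] == d, 2:3, drop = FALSE]
})

# pairs[[i]] holds homology dimension i - 1
dim0 <- dim(pairs[[1]])  # dimension 0: two pairs
dim1 <- dim(pairs[[2]])  # dimension 1: empty 0 x 2 matrix
dim2 <- dim(pairs[[3]])  # dimension 2: one pair
```

Keeping the empty 0 x 2 matrix for dimension 1 is what lets the position in the list stand in for the homology dimension.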
Examples
as_persistence(noisy_circle_ripserr)
x <- as_persistence(noisy_circle_tda_rips)
x
as_persistence(x)
get_pairs(x, dimension = 1)
as.data.frame(x)
# distances between cities
euroclust <- hclust(eurodist, method = "ward.D")
as_persistence(euroclust)
# `hclust()` can accommodate negative distances
d <- as.dist(rbind(c(0, 3, -4), c(3, 0, 5), c(-4, 5, 0)))
hc <- hclust(d, method = "single")
ph <- as_persistence(hc, birth = -10)
get_pairs(ph, 0)
An 'S3' class object for storing sets of persistence diagrams
Description
An 'S3' class object for storing sets of persistence diagrams
Usage
as_persistence_set(x)
## S3 method for class 'persistence_set'
format(x, ...)
## S3 method for class 'persistence_set'
print(x, ...)
Arguments
x |
A list of objects of class persistence. |
... |
Additional arguments passed to the function. |
Value
An object of class 'persistence_set' containing the set of persistence diagrams.
Examples
# Create a persistence set from a list of persistence diagrams
as_persistence_set(persistence_sample[1:10])
Toy Data: A sample of persistence diagrams
Description
A collection of 100 samples of size 100 on the sphere from which a persistence diagram is computed using the TDA::ripsDiag() function with parameters maxdimension = 1L and maxscale = 1.6322. Each diagram has been generated using the tdaunif::sample_2sphere() function with the following parameters: n = 100 and sd = 0.05. The seed was fixed to 1234.
Usage
persistence_sample
Format
A list of length 100, where each element is an object of class 'persistence'.
A sample of persistence diagrams from the trefoil
Description
A collection of 24 samples of size 120 on the trefoil from which a persistence diagram is computed using the TDA::ripsDiag() function with maxdimension = 2 and maxscale = 6. Each diagram has been generated using the tdaunif::sample_trefoil() function with the following parameters: n = 120 and sd = 0.05. The seed was fixed to 28415.
Usage
trefoils
Format
A list of length 24, where each element is an object of class 'persistence'.