Help for package caviarpd

Type:

Package

Title:

Cluster Analysis via Random Partition Distributions

Version:

0.3.20

Description:

Cluster analysis is performed using pairwise distance information and a random partition distribution. The method is implemented for two random partition distributions. It draws samples and then obtains and plots clustering estimates. A selection algorithm for the mass parameter of the partition distribution is also provided. Since pairwise distances are the principal input to this procedure, it is most comparable to the hierarchical and k-medoids clustering methods. The method is described in Dahl, Andros, and Carter (2022+) <doi:10.1002/sam.11602>.

License:

MIT + file LICENSE | Apache License 2.0

URL:

https://github.com/dbdahl/caviarpd-package

BugReports:

https://github.com/dbdahl/caviarpd-package/issues

Depends:

R (≥ 4.2.0)

Suggests:

salso (≥ 0.3.0)

SystemRequirements:

Cargo (Rust's package manager), rustc (>= 1.77.2)

Encoding:

UTF-8

RoxygenNote:

7.3.2

Config/Roxido/Version:

25.06.02

NeedsCompilation:

yes

Packaged:

2025-06-02 21:59:49 UTC; dahl

Author:

David B. Dahl [aut, cre], R. Jacob Andros [aut], J. Brandon Carter [aut], Alex Crichton [ctb] (Rust crates: cfg-if, proc-macro2), Brendan Zabarauskas [ctb] (Rust crate: approx), David B. Dahl [ctb] (Rust crates: dahl-partition, dahl-salso, epa, roxido, roxido_macro), David Tolnay [ctb] (Rust crates: proc-macro2, quote, syn, unicode-ident), Jim Turner [ctb] (Rust crate: ndarray), Jorge Aparicio [ctb] (Rust crate: libm), Josh Stone [ctb] (Rust crate: autocfg), Mikhail Vorotilov [ctb] (Rust crate: roots), R. Janis Goldschmidt [ctb] (Rust crate: matrixmultiply), Sean McArthur [ctb] (Rust crate: num_cpus), Stefan Lankes [ctb] (Rust crate: hermit-abi), The Cranelift Project Developers [ctb] (Rust crate: wasi), The CryptoCorrosion Contributors [ctb] (Rust crates: ppv-lite86, rand_chacha), The Rand Project Developers [ctb] (Rust crates: getrandom, rand, rand_chacha, rand_core, rand_distr, rand_pcg), The Rust Project Developers [ctb] (Rust crates: libc, num-complex, num-integer, num-traits, rand, rand_chacha, rand_core), Ulrik Sverdrup "bluss" [ctb] (Rust crate: ndarray), bluss [ctb] (Rust crates: matrixmultiply, rawpointer)

Maintainer:

David B. Dahl <dahl@stat.byu.edu>

Repository:

CRAN

Date/Publication:

2025-06-02 22:20:05 UTC

caviarpd: Cluster Analysis via Random Partition Distributions

Description

Author(s)

Maintainer: David B. Dahl dahl@stat.byu.edu (ORCID)

Authors:

R. Jacob Andros androsrj@gmail.com (ORCID)
J. Brandon Carter carterj4@icloud.com (ORCID)

Other contributors:

Alex Crichton alex@alexcrichton.com (Rust crates: cfg-if, proc-macro2) [contributor]
Brendan Zabarauskas bjzaba@yahoo.com.au (Rust crate: approx) [contributor]
David B. Dahl dahl@stat.byu.edu (Rust crates: dahl-partition, dahl-salso, epa, roxido, roxido_macro) [contributor]
David Tolnay dtolnay@gmail.com (Rust crates: proc-macro2, quote, syn, unicode-ident) [contributor]
Jim Turner (Rust crate: ndarray) [contributor]
Jorge Aparicio jorge@japaric.io (Rust crate: libm) [contributor]
Josh Stone cuviper@gmail.com (Rust crate: autocfg) [contributor]
Mikhail Vorotilov mikhail.vorotilov@gmail.com (Rust crate: roots) [contributor]
R. Janis Goldschmidt (Rust crate: matrixmultiply) [contributor]
Sean McArthur sean@seanmonstar.com (Rust crate: num_cpus) [contributor]
Stefan Lankes (Rust crate: hermit-abi) [contributor]
The Cranelift Project Developers (Rust crate: wasi) [contributor]
The CryptoCorrosion Contributors (Rust crates: ppv-lite86, rand_chacha) [contributor]
The Rand Project Developers (Rust crates: getrandom, rand, rand_chacha, rand_core, rand_distr, rand_pcg) [contributor]
The Rust Project Developers (Rust crates: libc, num-complex, num-integer, num-traits, rand, rand_chacha, rand_core) [contributor]
Ulrik Sverdrup "bluss" (Rust crate: ndarray) [contributor]
bluss (Rust crates: matrixmultiply, rawpointer) [contributor]

Cluster Analysis via Random Partition Distributions

Description

Returns a clustering estimate given pairwise distances using the CaviarPD method.

Usage

caviarpd(
  distance,
  nClusters,
  mass = NULL,
  nSamples = 200,
  gridLength = 5,
  loss = "binder",
  temperature = 100,
  similarity = c("exponential", "reciprocal")[1],
  maxNClusters = 0,
  nRuns = 4,
  nCores = nRuns
)

Arguments

distance

An object of class 'dist' or a pairwise distance matrix.

nClusters

A numeric vector that specifies the range for the number of clusters to consider in the search for a clustering estimate.

mass

The mass value to use for sampling. If NULL, mass values are found by inverting the numbers of clusters given in nClusters.
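One way to picture this inversion is sketched below. It assumes, without confirmation from the package source, that a target number of clusters is mapped to a mass value through the expected number of clusters under the Ewens distribution (the Chinese restaurant process), E[K] = sum over i = 1..n of mass / (mass + i - 1). The helpers expected_n_clusters and mass_for_n_clusters are hypothetical and are not part of caviarpd; the package's internal mass selection may differ.

```r
# Sketch only: assumes the Ewens (Chinese restaurant process) expected number
# of clusters; caviarpd's internal mass selection may differ.

# Hypothetical helper: expected number of clusters among n items for a given mass.
expected_n_clusters <- function(mass, n) {
  sum(mass / (mass + seq_len(n) - 1))
}

# Hypothetical helper: the mass whose expected number of clusters equals
# `target`. E[K] increases monotonically from 1 toward n as mass grows,
# so a bracketing root finder suffices for 1 < target < n.
mass_for_n_clusters <- function(target, n) {
  stopifnot(target > 1, target < n)
  uniroot(function(m) expected_n_clusters(m, n) - target,
          lower = 1e-8, upper = 1e8, tol = 1e-10)$root
}

# Masses for the endpoints of nClusters = c(2, 4) with the 150 iris observations.
sapply(c(2, 4), mass_for_n_clusters, n = 150)
```

Because E[K] is monotone in the mass, each target number of clusters corresponds to exactly one mass value, which is what makes an inversion of this kind well defined.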

nSamples

The number of samples drawn per candidate estimate.

gridLength

The number of candidate estimates to consider. The final estimate is obtained from nSamples × gridLength total samples.
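The sampling budget follows directly from the two arguments: with the defaults, 200 × 5 = 1000 samples. The snippet below also shows a hypothetical grid of gridLength target cluster counts spread evenly over the range given by nClusters; even spacing is an assumption made for illustration, not documented behavior of caviarpd.

```r
# Sampling budget under the default settings.
nSamples <- 200
gridLength <- 5
totalSamples <- nSamples * gridLength  # 1000 draws in total

# Hypothetical grid: gridLength target cluster counts spread evenly over the
# range given by nClusters (an assumption about how candidates are spaced).
nClusters <- c(2, 4)
targets <- seq(min(nClusters), max(nClusters), length.out = gridLength)
targets  # 2.0 2.5 3.0 3.5 4.0
```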

loss

The SALSO method (Dahl, Johnson, Müller, 2022) tries to minimize this expected loss when searching the partition space for an optimal estimate. This must be either "binder" or "VI".
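To make the two loss choices concrete, here is a base R sketch of each loss between a single pair of partitions written as label vectors. It assumes unit misclassification costs for Binder loss and base-2 logarithms for VI; the salso package's own implementations may scale these differently, and SALSO minimizes the average of such a loss over the sampled partitions rather than comparing one pair. The names binder_loss and vi_loss are hypothetical.

```r
# Binder loss with unit costs: the number of item pairs that are clustered
# together in one partition but apart in the other.
binder_loss <- function(a, b) {
  stopifnot(length(a) == length(b))
  disagree <- outer(a, a, "==") != outer(b, b, "==")
  sum(disagree[upper.tri(disagree)])
}

# Variation of information: H(A) + H(B) - 2 I(A, B), computed here as the
# equivalent 2 H(A, B) - H(A) - H(B), with entropies in bits.
vi_loss <- function(a, b) {
  stopifnot(length(a) == length(b))
  entropy <- function(p) { p <- p[p > 0]; -sum(p * log2(p)) }
  joint <- table(a, b) / length(a)
  2 * entropy(joint) - entropy(rowSums(joint)) - entropy(colSums(joint))
}

binder_loss(c(1, 1, 2), c(1, 2, 2))     # 2
vi_loss(c(1, 1, 2, 2), c(1, 2, 1, 2))   # 2
```

Both losses depend only on which items share a cluster, so relabeling the clusters (for example, swapping labels 1 and 2) leaves them unchanged.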

temperature

A positive number that accentuates or dampens the distances between observations.

similarity

Either "exponential" or "reciprocal" to indicate the desired similarity function.
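A sketch of how temperature and similarity might interact follows. The functional forms, exponential similarity exp(-temperature * d) and reciprocal similarity d^(-temperature), are assumptions drawn from the random-partition literature rather than read from caviarpd's source, and the helper similarity_matrix is hypothetical. Under either form, a larger temperature widens the gap between near and far neighbors.

```r
# Assumed forms (not taken from the package source):
#   exponential: exp(-temperature * d)
#   reciprocal:  d^(-temperature)
similarity_matrix <- function(distance, temperature = 100,
                              similarity = c("exponential", "reciprocal")) {
  similarity <- match.arg(similarity)
  d <- as.matrix(distance)
  s <- if (similarity == "exponential") exp(-temperature * d) else d^(-temperature)
  # Illustrative choice: zero the diagonal (an item with itself) so the
  # reciprocal form does not produce Inf for zero self-distances.
  diag(s) <- 0
  s
}

# Larger temperatures sharpen the contrast between near and far neighbors.
d <- dist(c(0, 1, 3))
s1 <- similarity_matrix(d, temperature = 1)
s5 <- similarity_matrix(d, temperature = 5)
c(s1[1, 2] / s1[1, 3], s5[1, 2] / s5[1, 3])
```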

maxNClusters

The maximum number of clusters that can be considered by the SALSO method.

nRuns

The number of runs of the SALSO algorithm.

nCores

The number of CPU cores to use. A value of zero indicates to use all cores on the system.

Details

A range for the number of clusters to be considered is supplied using the nClusters argument.

Value

An object of class salso.estimate, which provides a clustering estimate (a vector of cluster labels) that can be displayed and plotted.

References

D. B. Dahl, J. Andros, and J. B. Carter (2023), Cluster Analysis via Random Partition Distributions, Statistical Analysis and Data Mining, doi:10.1002/sam.11602.

D. B. Dahl, D. J. Johnson, and P. Müller (2022), Search Algorithms and Loss Functions for Bayesian Clustering, Journal of Computational and Graphical Statistics, 31(4), 1189-1201, doi:10.1080/10618600.2022.2069779.

Examples

# To reduce load on CRAN servers, limit the number of samples, grid length, and CPU cores.
set.seed(34)
iris.dis <- dist(iris[,-5])
est <- caviarpd(distance=iris.dis, nClusters=c(2,4), nSamples=20, nCores=1)
if ( require("salso") ) {
  summ <- summary(est, orderingMethod=2)
  plot(summ, type="heatmap")
  plot(summ, type="mds")
}
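Because pairwise distances are the principal input, the package Description calls CaviarPD most comparable to hierarchical and k-medoids clustering. As a baseline for comparison (not part of caviarpd), the same dist object can be passed to base R's hclust. Average linkage and cutting the tree at 3 groups are arbitrary illustrative choices within the nClusters range used above.

```r
# Baseline for comparison: hierarchical clustering on the same distances.
iris.dis <- dist(iris[, -5])
tree <- hclust(iris.dis, method = "average")
hc.labels <- cutree(tree, k = 3)

# Cluster sizes and agreement with the known species labels.
table(hc.labels)
table(hc.labels, iris$Species)
```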
