Title: Project Code - Nonparametric Bayes
Version: 0.0.1
Description: Basic implementation of a Gibbs sampler for a Chinese Restaurant Process, along with some visual aids to help understand how the sampling works. It was developed as part of a postgraduate school project for an Advanced Bayesian Nonparametric course. It is inspired by Tamara Broderick's presentation on Nonparametric Bayesian statistics given at the Simons Institute.
License: MIT + file LICENSE
Encoding: UTF-8
RoxygenNote: 7.1.2
Imports: mvtnorm, progress
NeedsCompilation: no
Packaged: 2021-11-27 19:19:00 UTC; erik
Author: Erik-Cristian Seulean [aut, cre]
Maintainer: Erik-Cristian Seulean <erikseulean@gmail.com>
Repository: CRAN
Date/Publication: 2021-11-29 09:50:05 UTC

Gibbs sampling for the Chinese Restaurant Process

Description

Gibbs sampling for the Chinese Restaurant Process. Implementation details can be found in the associated paper. The algorithm stops at every 1000th iteration and prints the current cluster configuration.

Usage

cluster_datapoints(
  data,
  sd = 1,
  initialisation = rep(1, nrow(data)),
  sigma0 = matrix(c(1, 0, 0, 1), nrow = 2, byrow = TRUE)
)

Arguments

data

An n x 2 matrix containing the data points.

sd

Prior standard deviation

initialisation

Initial cluster assignment for each data point. By default, every point starts in the same cluster.

sigma0

Covariance matrix for the points. Defaults to the 2 x 2 identity matrix, matrix(c(1, 0, 0, 1), nrow = 2, byrow = TRUE).

Value

Returns the cluster assignments after the last iteration.

Examples

cluster_datapoints(generate_split_data(350, 0.5)$x, sigma0=diag(3^2, 2))
cluster_datapoints(petal, sigma0=petal_sigma0)
cluster_datapoints(width, sigma0=width_sigma0)
cluster_datapoints(mixed, sigma0=mixed_sigma0)
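
As a rough illustration of what a Gibbs step for a Chinese Restaurant Process works with, the sketch below computes only the CRP prior weights for reassigning one point given the labels of all the others: an existing cluster is weighted by its current size and a new cluster by a concentration parameter. The function name crp_prior_probs and the parameter alpha are hypothetical and are not part of this package. A full sampler would also multiply these weights by a likelihood term, and the package's actual implementation is described in the associated paper.

```r
# Sketch only: CRP prior weights for reassigning point i, given all other labels.
# `crp_prior_probs` and `alpha` are hypothetical names, not part of the package.
crp_prior_probs <- function(z, i, alpha = 1) {
  others <- z[-i]                           # labels of every point except i
  counts <- table(others)                   # n_k for each occupied cluster
  weights <- c(as.numeric(counts), alpha)   # existing clusters, then a new one
  names(weights) <- c(names(counts), "new")
  weights / sum(weights)                    # normalise: n_k / (n - 1 + alpha)
}

z <- c(1, 1, 2, 1, 3)
p <- crp_prior_probs(z, i = 5, alpha = 1)
print(p)  # 0.6, 0.2, 0.2 for clusters "1", "2" and "new"
```

Larger values of alpha make it more likely that a point opens a new cluster.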


Draws from a Dirichlet distribution and shows the clusters that were generated by this draw.

Description

Draws from a Dirichlet distribution and shows the clusters that were generated by this draw. Varying alpha puts more or less mass in the first clusters compared to the later clusters (rhos).

Usage

generate_dirichlet_clusters(a, K)

Arguments

a

Parameter passed to the Gamma distribution that is used to draw from the Dirichlet distribution.

K

Number of clusters to draw

Value

No return value

Examples

generate_dirichlet_clusters(10, 10)
generate_dirichlet_clusters(0.5, 30)

Draws from a Dirichlet distribution and shows the clusters that were generated by this draw. Additionally, adds points to these clusters and shows which clusters are occupied.

Description

Points are generated one at a time; press Enter to generate the next point. Typing "x" stops the clustering and the function returns.

Usage

generate_dirichlet_clusters_with_sampled_points(n, a, K)

Arguments

n

Number of points to be drawn in the clusters

a

Parameter passed to the Gamma distribution that is used to draw from the Dirichlet distribution.

K

Number of clusters to draw

Value

No return value

Examples

generate_dirichlet_clusters_with_sampled_points(15, 0.5, 20)
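
The idea of adding points to clusters and seeing which ones are occupied can be sketched without the interactive plotting. The example below assumes a vector of cluster weights rho is already available (here a fixed, made-up vector rather than a Dirichlet draw) and assigns n points to clusters in proportion to those weights. The function name sample_occupancy is hypothetical and is not part of this package.

```r
# Sketch only: assign n points to K clusters with given weights rho and
# report which clusters end up occupied. `sample_occupancy` is a hypothetical name.
sample_occupancy <- function(n, rho) {
  K <- length(rho)
  z <- sample.int(K, size = n, replace = TRUE, prob = rho)  # cluster label per point
  counts <- tabulate(z, nbins = K)                          # points per cluster
  list(assignments = z, counts = counts, occupied = which(counts > 0))
}

set.seed(1)
rho <- c(0.5, 0.3, 0.15, 0.05)   # illustrative weights, not a real draw
res <- sample_occupancy(15, rho)
res$occupied                     # indices of clusters holding at least one point
```

With few points and many small weights, some clusters typically stay empty, which is the effect the visualisation is meant to show.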

Generates a dataset used to exemplify clustering

Description

Generates a dataset used to exemplify clustering. The cluster centres are set relatively far apart to see how well the algorithm performs in simple scenarios.

Usage

generate_split_data(n, sd)

Arguments

n

Number of datapoints to generate

sd

Standard deviation of the points around each cluster centre.

Value

Returns the datapoints and the cluster assignments. The cluster assignments can be used to calculate the performance of the clustering.
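
One possible way to score a clustering against known assignments is the Rand index, the fraction of point pairs on which two labelings agree about "same cluster" versus "different cluster". The sketch below is self-contained and works on plain label vectors; rand_index is a hypothetical helper, not part of this package, and the name of the element holding the true assignments in the returned object is not shown here, so it is not referenced.

```r
# Sketch only: Rand index between two label vectors of equal length.
# `rand_index` is a hypothetical helper, not part of the package.
rand_index <- function(truth, predicted) {
  stopifnot(length(truth) == length(predicted))
  pairs <- utils::combn(length(truth), 2)        # all index pairs (i, j), i < j
  same_truth <- truth[pairs[1, ]] == truth[pairs[2, ]]
  same_pred  <- predicted[pairs[1, ]] == predicted[pairs[2, ]]
  mean(same_truth == same_pred)                  # fraction of pairs that agree
}

# Label names differ but the partition is identical, so the index is 1.
rand_index(c(1, 1, 2, 2), c(2, 2, 1, 1))
```

The index is invariant to relabeling, which matters because a sampler has no reason to number its clusters the same way as the generating process.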


Sequentially generates draws from a Dirichlet process mixture model, showing the iterations step by step. The plot is centred at 0, with x and y from -5 to 5. The mixture draws the cluster centres from a Normal distribution with mean mu and standard deviation sigma_0. In addition to plotting the points, it also returns the sampled points.

Description

Press Enter to keep drawing, up to a maximum of n points, or type "x" to exit.

Usage

rDPM(n, alpha, mu, sigma_0, sigma)

Arguments

n

Number of observations.

alpha

Alpha corresponding to GEM(alpha) used to draw the rho vector.

mu

Mean of the Normal distribution used to draw the cluster centres.

sigma_0

Standard deviation of the Normal distribution used to draw the cluster centres.

sigma

Standard deviation of the points around each cluster centre.

Value

Returns the n observations sampled from the DPMM distribution.

Examples

rDPM(n=30, alpha=3, mu=0, sigma_0=1.5, sigma=0.7)
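
The generative process can be sketched non-interactively with a truncated stick-breaking approximation to GEM(alpha). Following the summary above, cluster centres are drawn with standard deviation sigma_0; treating sigma as the spread of points around each centre, drawing 2-D points, and truncating at K = 50 sticks are all assumptions of this sketch. The names stick_breaking and sketch_dpm_draw are hypothetical, and this is not the package's implementation of rDPM.

```r
# Sketch only: truncated stick-breaking weights, then a Gaussian mixture draw.
# All function names here are hypothetical, not part of the package.
stick_breaking <- function(alpha, K) {
  v <- rbeta(K, 1, alpha)            # stick proportions V_k ~ Beta(1, alpha)
  v * cumprod(c(1, 1 - v[-K]))       # rho_k = V_k * prod_{j < k} (1 - V_j)
}

sketch_dpm_draw <- function(n, alpha, mu, sigma_0, sigma, K = 50) {
  rho <- stick_breaking(alpha, K)
  rho <- rho / sum(rho)              # renormalise the truncated weights
  centres <- matrix(rnorm(2 * K, mu, sigma_0), ncol = 2)  # assumed 2-D centres
  z <- sample.int(K, n, replace = TRUE, prob = rho)        # cluster of each point
  noise <- matrix(rnorm(2 * n, 0, sigma), ncol = 2)        # spread around centres
  list(x = centres[z, , drop = FALSE] + noise, z = z)
}

set.seed(42)
draw <- sketch_dpm_draw(n = 30, alpha = 3, mu = 0, sigma_0 = 1.5, sigma = 0.7)
dim(draw$x)  # 30 rows, 2 columns
```

Smaller alpha concentrates the weights on the first few sticks, so fewer distinct clusters tend to appear among the n points.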

Sequentially generates draws from a Dirichlet process mixture model, showing the iterations step by step. The plot is centred at 0, with x and y from -5 to 5. The mixture draws the cluster centres from a Normal distribution with mean mu and standard deviation sigma_0.

Description

Press Enter to keep drawing, up to a maximum of n points, or type "x" to exit.

Usage

rDPM_visual(n, alpha, mu, sigma_0, sigma)

Arguments

n

Number of observations.

alpha

Alpha corresponding to GEM(alpha) used to draw the rho vector.

mu

Mean of the Normal distribution used to draw the cluster centres.

sigma_0

Standard deviation of the Normal distribution used to draw the cluster centres.

sigma

Standard deviation of the points around each cluster centre.

Value

Returns the n observations sampled from the DPMM distribution.

Examples

rDPM_visual(n=30, alpha=3, mu=0, sigma_0=1.5, sigma=0.7)

Generate a sample from a Dirichlet distribution

Description

Generates a sample from a Dirichlet distribution, using the method described at https://en.wikipedia.org/wiki/Dirichlet_distribution#Random_number_generation

Usage

rdirichlet(n, alpha)

Arguments

n

Number of observations.

alpha

A vector containing the parameters for the Dirichlet distribution.

Value

A sample of n observations from the Dirichlet distribution.

Examples

rdirichlet(n=1, alpha=c(2, 2, 2))
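
The standard Gamma-normalisation approach to Dirichlet sampling can be sketched as follows: draw independent Gamma(alpha_k, 1) variables and divide each draw by its total. The name rdirichlet_sketch is hypothetical, and returning one draw per row is an assumption; the package's own rdirichlet may be organised differently.

```r
# Sketch only: Dirichlet draws by normalising independent Gamma variables.
# `rdirichlet_sketch` is a hypothetical name, not the package's function.
rdirichlet_sketch <- function(n, alpha) {
  K <- length(alpha)
  # The matrix is filled column by column, so column k uses shape alpha[k].
  g <- matrix(rgamma(n * K, shape = rep(alpha, each = n), rate = 1),
              nrow = n, ncol = K)
  g / rowSums(g)                     # each row now sums to 1
}

set.seed(7)
draws <- rdirichlet_sketch(n = 5, alpha = c(2, 2, 2))
rowSums(draws)                       # every entry is 1, up to rounding
```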