Cluster on Missingness Patterns — cluster_on_missing

Clusters the rows of a data.frame or matrix by their pattern of missing values and returns the cluster labels together with a silhouette score.

Usage

cluster_on_missing(
  data,
  cols_ignore = NULL,
  n_clusters = NULL,
  seed = NULL,
  min_cluster_size = NULL,
  cluster_selection_epsilon = 0.25
)

Arguments

data: A data.frame or matrix (samples × features); may contain NA.
cols_ignore: Character vector of column names to exclude from clustering.
n_clusters: Integer; if provided, KMeans is run with this many clusters. If NULL, HDBSCAN is used instead.
seed: Integer; random seed for KMeans (and for reproducibility with HDBSCAN).
min_cluster_size: Integer; minimum cluster size for HDBSCAN. If NULL, defaults to nrow(data) %/% 25 (integer division).
cluster_selection_epsilon: Numeric; distance threshold used by HDBSCAN when selecting clusters. Defaults to 0.25.
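
To make "clustering on the pattern of missingness" concrete, here is a minimal base-R sketch of the n_clusters path. It is not the package's implementation. It assumes the function reduces the data to a binary NA indicator matrix and then runs stats::kmeans on it. The toy data, the id column, and the fixed seed are hypothetical.

```r
# Hypothetical toy data: 6 samples with two distinct NA patterns.
data <- data.frame(
  id = 1:6,
  a  = c(NA, NA, NA, 1, 2, 3),
  b  = c(1, 2, 3, NA, NA, NA),
  c  = c(1, 2, 3, 4, 5, 6)
)
cols_ignore <- "id"

# Binary missingness indicator: 1 where a value is NA, 0 otherwise.
# Columns listed in cols_ignore are dropped first.
keep <- setdiff(colnames(data), cols_ignore)
miss <- is.na(data[, keep, drop = FALSE]) * 1

# Assumed behaviour when n_clusters is given: k-means on the indicator
# matrix, seeded so the labels are reproducible.
set.seed(42)
fit <- kmeans(miss, centers = 2, nstart = 5)
labels <- fit$cluster
```

Rows that share the same NA pattern have identical indicator vectors, so they end up with the same label regardless of the observed values.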

Value

A list with components: