Given an R data.frame or matrix with missing values, clusters the samples on their pattern of missingness and returns the cluster labels plus a silhouette score.
Usage
cluster_on_missing(
  data,
  cols_ignore = NULL,
  n_clusters = NULL,
  seed = NULL,
  min_cluster_size = NULL,
  cluster_selection_epsilon = 0.25
)
Arguments
- data
A data.frame or matrix (samples × features); may contain NA.
- cols_ignore
Character vector of column names to ignore when clustering.
- n_clusters
Integer; if provided, will run KMeans with this many clusters. If NULL, will use HDBSCAN.
- seed
Integer; random seed for KMeans (or for reproducibility with HDBSCAN).
- min_cluster_size
Integer; minimum cluster size for HDBSCAN. If NULL, defaults to nrow(data) %/% 25.
- cluster_selection_epsilon
Numeric; epsilon parameter for HDBSCAN. Defaults to 0.25.
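
The arguments above can be illustrated with a rough base-R sketch of the underlying idea. This is not the package's implementation: the toy data frame, the 0/1 indicator encoding, and the use of stats::kmeans are all assumptions made for illustration. It covers only the n_clusters branch; the HDBSCAN branch and the silhouette score are left out because they need packages beyond base R.

```r
# Illustrative sketch only; names and encoding are assumptions,
# not taken from cluster_on_missing() itself.
set.seed(42)
toy <- data.frame(
  id = 1:100,
  a  = c(rep(NA, 50), rnorm(50)),  # missing in the first 50 rows
  b  = c(rnorm(50), rep(NA, 50)),  # missing in the last 50 rows
  c  = rnorm(100)                  # never missing
)

# Drop the ignored columns, then encode missingness as a 0/1 matrix
# (1 = value is NA, 0 = value is present).
cols_ignore <- "id"
keep <- setdiff(names(toy), cols_ignore)
miss <- is.na(toy[, keep, drop = FALSE]) * 1L

# Cluster the rows on their missingness pattern with k-means,
# analogous to passing n_clusters = 2.
fit <- kmeans(miss, centers = 2, nstart = 5)
table(fit$cluster)

# The documented default for min_cluster_size when it is NULL.
default_min_cluster_size <- nrow(toy) %/% 25  # 100 %/% 25 is 4
```

Because %/% is integer division, the default min_cluster_size is 0 for inputs with fewer than 25 rows, so small data sets may need an explicit value.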