
Clusters the rows of an R data.frame or matrix by their pattern of missing values (which entries are NA) and returns the cluster labels together with a silhouette score.
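
This page does not say how the missingness pattern is encoded. The following is a minimal sketch under stated assumptions. Each row is assumed to become a 0/1 indicator vector (1 = NA) after the cols_ignore columns are dropped, and the n_clusters path is assumed to run ordinary k-means (stats::kmeans) on those vectors. missingness_kmeans() is a hypothetical helper, not part of the package.

```r
# Hypothetical helper (not part of the package): clusters rows on their
# missingness pattern, assuming a 0/1 NA mask and stats::kmeans().
missingness_kmeans <- function(data, cols_ignore = NULL, n_clusters = 2, seed = 42) {
  data <- as.data.frame(data)
  # Drop the ignored columns before building the mask
  keep <- setdiff(colnames(data), cols_ignore)
  # 1 where a value is missing, 0 otherwise (numeric samples x features matrix)
  mask <- is.na(data[, keep, drop = FALSE]) * 1
  set.seed(seed)
  # nstart = 10 runs k-means from several random starts and keeps the best fit
  fit <- stats::kmeans(mask, centers = n_clusters, nstart = 10)
  list(mask = mask, clusters = as.integer(fit$cluster))
}

# Example: columns a and b are missing in alternating rows; id is ignored
df <- data.frame(a = c(1, NA, 3, NA, 5, NA),
                 b = c(NA, 2, NA, 4, NA, 6),
                 id = 1:6)
res <- missingness_kmeans(df, cols_ignore = "id", n_clusters = 2)
res$clusters
```

Rows 1, 3 and 5 share one pattern and rows 2, 4 and 6 share the other, so they land in two separate clusters. Which of the two gets label 1 depends on the random start.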

Usage

cluster_on_missing(
  data,
  cols_ignore = NULL,
  n_clusters = NULL,
  seed = 42,
  k_neighbors = NULL,
  leiden_resolution = 0.25,
  leiden_objective = "CPM",
  use_snn = TRUE
)

Arguments

data

A data.frame or matrix with samples in rows and features in columns; may contain NA values.

cols_ignore

Character vector of column names to ignore when clustering.

n_clusters

Integer; if provided, k-means clustering is run with this many clusters. If NULL (the default), Leiden clustering is used instead.

seed

Integer; random seed for k-means, also used to make Leiden clustering reproducible.

k_neighbors

Integer; minimum cluster size for Leiden clustering. If NULL, defaults to nrow(data) %/% 25, i.e. the number of rows integer-divided by 25.

leiden_resolution

Numeric; resolution parameter for Leiden clustering. Higher values generally produce more, smaller clusters.

leiden_objective

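Character; objective function optimized by Leiden clustering. The default, "CPM", is the Constant Potts Model.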

use_snn

Logical; whether Leiden clustering is run on a shared-nearest-neighbour (SNN) graph. Defaults to TRUE.
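
The default for k_neighbors is plain integer division. 100 rows give 100 %/% 25 = 4, and any input with fewer than 25 rows gives 0. This page does not describe the graph behind use_snn. The sketch below assumes a common construction. Each row's k nearest neighbours (here by Hamming distance on the 0/1 missingness mask) form a kNN graph, and edges are weighted by the Jaccard overlap of the two rows' neighbour sets. default_k() and snn_weights() are hypothetical names, and the package may build its graph differently.

```r
# Documented default for k_neighbors when it is NULL
default_k <- function(data) nrow(data) %/% 25

# Hypothetical sketch of SNN edge weights (not the package's code): Jaccard
# overlap between the k-nearest-neighbour sets of every pair of rows.
snn_weights <- function(mask, k) {
  n <- nrow(mask)
  stopifnot(k >= 1, k < n)             # need at least one neighbour, excluding self
  # Manhattan distance on 0/1 rows equals the Hamming distance
  d <- as.matrix(stats::dist(mask, method = "manhattan"))
  diag(d) <- Inf                       # a row is not its own neighbour here
  # Membership matrix: A[i, j] = 1 if j is among i's k nearest neighbours
  A <- matrix(0, n, n)
  for (i in seq_len(n)) {
    A[i, order(d[i, ])[seq_len(k)]] <- 1   # ties broken by row order
  }
  shared <- A %*% t(A)                 # shared[i, j] = size of N(i) intersect N(j)
  w <- shared / (2 * k - shared)       # size of the union is 2k - shared
  diag(w) <- 0                         # no self-loops
  w
}

default_k(matrix(0, nrow = 100, ncol = 3))   # returns 4

mask <- rbind(c(0, 1), c(0, 1), c(0, 1),
              c(1, 0), c(1, 0), c(1, 0))
w <- snn_weights(mask, k = 2)
round(w, 3)
```

Rows with the same missingness pattern share neighbours and get positive weights. Rows from the two patterns share none, so their weight is 0, which is what lets a community detection method such as Leiden separate them.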

Value

A list with components:

  • clusters: integer vector of cluster labels, one per row of data

  • silhouette: numeric silhouette score, or NA if it cannot be computed (for example, when all rows fall into a single cluster)
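
This sketch assumes the reported score is the mean silhouette width. That is Rousseeuw's s(i) = (b(i) - a(i)) / max(a(i), b(i)), averaged over rows and computed on Euclidean distances between rows. Under those assumptions it can be reproduced in base R as below. The distance metric and the helper mean_silhouette() are hypothetical, and the package may compute the score differently, e.g. via cluster::silhouette().

```r
# Hypothetical helper (not the package's code): mean silhouette width on
# Euclidean distances between rows; NA when silhouette is undefined.
mean_silhouette <- function(x, labels) {
  n <- length(labels)
  groups <- unique(labels)
  # Silhouette needs between 2 and n - 1 clusters
  if (length(groups) < 2 || length(groups) >= n) return(NA_real_)
  d <- as.matrix(stats::dist(x))
  s <- numeric(n)
  for (i in seq_len(n)) {
    own <- labels == labels[i]
    if (sum(own) == 1) next                 # singleton: s(i) stays 0 by convention
    a <- sum(d[i, own]) / (sum(own) - 1)    # mean distance to the rest of own cluster
    others <- setdiff(groups, labels[i])
    # b: smallest mean distance to any other cluster
    b <- min(vapply(others, function(g) mean(d[i, labels == g]), numeric(1)))
    s[i] <- if (max(a, b) > 0) (b - a) / max(a, b) else 0
  }
  mean(s)
}

x <- rbind(c(0, 0), c(0, 0), c(1, 1), c(1, 1))
mean_silhouette(x, c(1, 1, 2, 2))   # 1: tight, well-separated clusters
mean_silhouette(x, c(1, 2, 1, 2))   # -0.5: each point is closer to the other cluster
mean_silhouette(x, c(1, 1, 1, 1))   # NA: only one cluster
```

The last call shows the single-cluster case, where the silhouette is undefined and NA is returned.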