Groups samples with similar patterns of missingness across features using
either K-means clustering (when n_clusters is specified) or Leiden community detection
(when n_clusters is NULL). This is useful for detecting cohorts with
shared missing-data behavior (e.g., site/batch effects).
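To make "patterns of missingness" concrete, the following base R sketch computes, for each sample, the share of missing timepoints per feature. It is only an illustration of the kind of input this function clusters, not the package's own implementation; the toy data and the `feature_timepoint` column naming are assumptions made for this example.

```r
# Toy data (hypothetical): two features, A and B, each measured at 3 timepoints.
dat <- data.frame(
  A_1 = c(NA, 1, 2), A_2 = c(NA, 3, 4), A_3 = c(5, 6, 7),
  B_1 = c(1, 2, NA), B_2 = c(3, 4, NA), B_3 = c(5, 6, NA),
  row.names = c("s1", "s2", "s3")
)

features <- c("A", "B")

# For each feature, take its timepoint columns and compute the per-sample
# fraction of missing values. The result is a samples x features matrix in [0, 1].
prop <- sapply(features, function(f) {
  cols <- grep(paste0("^", f, "_"), names(dat))
  rowMeans(is.na(dat[, cols, drop = FALSE]))
})

prop
```

Samples whose rows in such a matrix look alike are the ones a clustering step would tend to group together.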
Usage
cluster_on_missing_prop(
  prop_matrix,
  n_clusters = NULL,
  seed = NULL,
  k_neighbors = NULL,
  leiden_resolution = 0.25,
  use_snn = TRUE,
  leiden_objective = "CPM",
  metric = "euclidean",
  scale_features = FALSE
)
Arguments
- prop_matrix
Matrix or data frame where rows are samples, columns are features, and entries are missingness proportions in [0, 1]. Can be created with create_missingness_prop_matrix().
- n_clusters
Integer; number of clusters for K-means. If NULL, Leiden is used instead (default: NULL).
- seed
Integer; random seed for K-means reproducibility (default: NULL).
- k_neighbors
Integer; Leiden minimum cluster size. If NULL, the Python default is used (default: NULL).
- leiden_resolution
Numeric; Leiden cluster selection threshold (default: 0.25).
- use_snn
Logical; whether to use shared nearest neighbors (default: TRUE).
- leiden_objective
Character; Leiden optimization objective (default: "CPM").
- metric
Character; distance metric. Options include "euclidean" and "cosine" (default: "euclidean").
- scale_features
Logical; whether to standardize feature columns before clustering samples (default: FALSE).
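As a rough illustration of what the K-means path does conceptually, the sketch below clusters a small hand-made proportion matrix with base R's `stats::kmeans()`. This is a stand-in, not the function's actual backend (which runs in Python via reticulate), and the matrix values are invented for the example. Calling `scale()` first is meant to be analogous in spirit to `scale_features = TRUE`.

```r
set.seed(42)

# Hypothetical proportion matrix: 6 samples x 3 features, values in [0, 1].
pm <- matrix(
  c(0.9, 0.8, 0.0,
    1.0, 0.7, 0.1,
    0.8, 0.9, 0.0,
    0.0, 0.1, 0.9,
    0.1, 0.0, 1.0,
    0.0, 0.2, 0.8),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("s", 1:6), c("A", "B", "C"))
)

# Standardize each feature column (mean 0, sd 1) before clustering samples.
pm_scaled <- scale(pm)

# Base R K-means as a stand-in; nstart restarts reduce sensitivity to the
# random initial centers.
fit <- stats::kmeans(pm_scaled, centers = 2, nstart = 10)

# Sample IDs grouped by assigned cluster.
split(rownames(pm), fit$cluster)
```

Standardizing matters most when some features have far more variable missingness than others, since unscaled Euclidean distances would otherwise be dominated by those columns.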
Value
A list with:
- clusters: Integer vector of cluster assignments per sample.
- silhouette_score: Numeric silhouette score, or NULL if not computable.
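Because silhouette_score may be NULL, downstream code should check for that before formatting or comparing it. The sketch below uses a hand-built stand-in list with the two documented elements; the values are made up and do not come from a real call.

```r
# Stand-in for a returned list (hypothetical values, not real output).
res <- list(
  clusters = c(0L, 0L, 1L, 1L, 1L),
  silhouette_score = NULL
)

# Cluster sizes from the integer assignment vector.
sizes <- table(res$clusters)

# Guard against a NULL score before formatting it.
sil_label <- if (is.null(res$silhouette_score)) {
  "not available"
} else {
  sprintf("%.3f", res$silhouette_score)
}

cat("Cluster sizes:", paste(names(sizes), sizes, sep = "=", collapse = ", "), "\n")
cat("Silhouette:", sil_label, "\n")
```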
Examples
set.seed(123)
dat <- data.frame(
  sample_id = paste0("s", 1:12),
  # Two features measured at 3 timepoints each -> proportions by feature
  A_1 = c(NA, rnorm(11)),
  A_2 = c(NA, rnorm(11)),
  A_3 = rnorm(12),
  B_1 = rnorm(12),
  B_2 = c(rnorm(10), NA, NA),
  B_3 = rnorm(12)
)
pm <- create_missingness_prop_matrix(
  dat,
  index_col = "sample_id",
  repeat_feature_names = c("A", "B")
)
## cluster_on_missing_prop requires a working Python environment via reticulate
## Examples are wrapped in try() to avoid failures on CRAN check systems
try({
  res <- cluster_on_missing_prop(
    pm,
    n_clusters = 2,
    metric = "cosine",
    scale_features = TRUE
  )
  # Inside a braced block only the last value is auto-printed, so print explicitly.
  print(table(res$clusters))
  print(res$silhouette_score)
})
#> Error in py_module_import(module, convert = convert) :
#> ModuleNotFoundError: No module named 'ciss_vae'
#> Run `reticulate::py_last_error()` for details.
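The ModuleNotFoundError shown above means the Python module 'ciss_vae' could not be imported in the active reticulate environment. One way to avoid hitting that error is to check for the module before calling the function. The sketch below assumes the module name from the error message and uses `reticulate::py_module_available()`; the small matrix is a made-up stand-in so the block runs on its own, and the call is skipped when the backend or the function is not available.

```r
# Small stand-in proportion matrix (hypothetical values): 4 samples x 2 features.
pm <- matrix(
  c(1/3, 0,
    1/3, 0,
    0,   2/3,
    0,   2/3),
  ncol = 2, byrow = TRUE,
  dimnames = list(paste0("s", 1:4), c("A", "B"))
)

# Check that reticulate is installed and the Python module can be imported.
backend_ok <- requireNamespace("reticulate", quietly = TRUE) &&
  reticulate::py_module_available("ciss_vae")

# Check that the R function itself is attached in this session.
fn_ok <- exists("cluster_on_missing_prop", mode = "function")

if (backend_ok && fn_ok) {
  res <- cluster_on_missing_prop(pm, n_clusters = 2)
  print(table(res$clusters))
} else {
  message("Skipping: Python module 'ciss_vae' or cluster_on_missing_prop() is not available.")
}
```

If the check fails, `reticulate::py_config()` shows which Python installation reticulate is bound to, which is usually the first thing to verify.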