Wraps the Python run_cissvae function from the ciss_vae module, handles an optional index_col, and returns imputed data and, optionally, the model and silhouette scores.
Usage
run_cissvae(
  data,
  index_col = NULL,
  val_percent = 0.1,
  replacement_value = 0,
  columns_ignore = NULL,
  print_dataset = TRUE,
  clusters = NULL,
  n_clusters = NULL,
  cluster_selection_epsilon = 0.25,
  seed = 42,
  hidden_dims = c(150, 120, 60),
  latent_dim = 15,
  layer_order_enc = c("unshared", "unshared", "unshared"),
  layer_order_dec = c("shared", "shared", "shared"),
  latent_shared = FALSE,
  output_shared = FALSE,
  batch_size = 4000,
  return_model = TRUE,
  epochs = 500,
  initial_lr = 0.01,
  decay_factor = 0.999,
  beta = 0.001,
  max_loops = 100,
  patience = 2,
  epochs_per_loop = NULL,
  initial_lr_refit = NULL,
  decay_factor_refit = NULL,
  beta_refit = NULL,
  verbose = FALSE,
  return_silhouettes = FALSE
)
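A minimal call might look like the sketch below. It builds a small toy data set with missing values and passes it to run_cissvae with a sample ID column. The package name rCISSVAE is an assumption (this page does not name the package), the toy data and the small epochs and max_loops values are only illustrative, and the call is skipped when the package is not installed. The structure of the returned object is not assumed here, so it is only inspected with str().

```r
# Toy data: 20 samples x 4 features, with 10 entries set to NA.
set.seed(42)
x <- matrix(rnorm(80), nrow = 20, dimnames = list(NULL, paste0("x", 1:4)))
x[sample(length(x), 10)] <- NA
toy <- data.frame(id = sprintf("s%02d", 1:20), x)

# ASSUMPTION: the wrapper is exported by a package named rCISSVAE and a
# working Python environment with ciss_vae is available.
if (requireNamespace("rCISSVAE", quietly = TRUE)) {
  res <- rCISSVAE::run_cissvae(
    data         = toy,
    index_col    = "id",   # removed before training and re-attached
    n_clusters   = 2,      # KMeans, since clusters is NULL
    epochs       = 5,      # kept small so the sketch runs quickly
    max_loops    = 2,
    return_model = FALSE
  )
  str(res)
}
```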
Arguments
- data
A data.frame or matrix (samples × features); may contain NA.
- index_col
Character. Column in data to treat as sample ID; removed before training and re-attached afterwards. Default NULL.
- val_percent
Numeric fraction of non-missing entries to hold out for validation. Default 0.1.
- replacement_value
Numeric fill value for masked entries. Default 0.
- columns_ignore
Character or integer vector of columns to ignore. Default NULL.
- print_dataset
Logical; if TRUE, prints a dataset summary. Default TRUE.
- clusters
Optional vector (or single-column data.frame) of precomputed cluster labels. Default NULL.
- n_clusters
Integer number of clusters for KMeans, used if clusters is NULL. Default NULL.
- cluster_selection_epsilon
Numeric epsilon for HDBSCAN. Default 0.25.
- seed
Integer random seed. Default 42.
- hidden_dims
Integer vector of hidden layer sizes. Default c(150, 120, 60).
- latent_dim
Integer latent space dimension. Default 15.
- layer_order_enc
Character vector for encoder layer sharing. Default c("unshared", "unshared", "unshared").
- layer_order_dec
Character vector for decoder layer sharing. Default c("shared", "shared", "shared").
- latent_shared
Logical; share latent weights? Default FALSE.
- output_shared
Logical; share output weights? Default FALSE.
- batch_size
Integer batch size. Default 4000.
- return_model
Logical; if TRUE, returns the Python model. Default TRUE.
- epochs
Integer number of initial training epochs. Default 500.
- initial_lr
Numeric initial learning rate. Default 0.01.
- decay_factor
Numeric learning rate decay. Default 0.999.
- beta
Numeric KL weight. Default 0.001.
- max_loops
Integer maximum number of refit loops. Default 100.
- patience
Integer early-stopping patience. Default 2.
- epochs_per_loop
Integer epochs per refit loop. Default NULL (uses epochs).
- initial_lr_refit
Numeric learning rate for refit loops. Default NULL.
- decay_factor_refit
Numeric learning rate decay for refit loops. Default NULL.
- beta_refit
Numeric KL weight for refit loops. Default NULL.
- verbose
Logical; if TRUE, prints progress. Default FALSE.
- return_silhouettes
Logical; if TRUE, returns silhouette scores. Default FALSE.
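To give a feel for the default values of initial_lr, decay_factor, val_percent and replacement_value, the base-R sketch below works through the arithmetic. It is an illustration under stated assumptions, not the package's internal code: it assumes the learning rate decays exponentially once per epoch, and that the validation hold-out is a simple random sample of the observed entries. The actual Python implementation may differ on both points.

```r
# (1) Learning-rate schedule, ASSUMING exponential per-epoch decay:
#     lr_t = initial_lr * decay_factor^t
initial_lr   <- 0.01
decay_factor <- 0.999
epochs       <- 500
lr       <- initial_lr * decay_factor^(0:epochs)
lr_final <- lr[epochs + 1]   # after 500 decay steps, about 0.0061

# (2) Validation hold-out, ASSUMING a simple random sample of observed cells.
set.seed(42)
x <- matrix(rnorm(200), nrow = 50)   # 50 samples x 4 features
x[sample(length(x), 20)] <- NA       # 20 genuinely missing entries

val_percent       <- 0.1
replacement_value <- 0

observed <- which(!is.na(x))                          # 180 observed cells
n_val    <- floor(val_percent * length(observed))     # 18 cells held out
val_idx  <- observed[sample.int(length(observed), n_val)]

x_train <- x
x_train[val_idx] <- NA               # hide held-out cells from training

x_input <- x_train
x_input[is.na(x_input)] <- replacement_value   # fill every masked cell
```

Under these assumptions the default schedule shrinks the learning rate by roughly 40% over 500 epochs, and a 0.1 hold-out on 180 observed cells masks 18 of them in addition to the 20 that were already missing.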