
Given a loaded model, an R data frame, and a vector of cluster labels, this function builds the Python ClusterDataset and DataLoader, runs inference, and returns the imputed data as an R data frame.

Usage

impute_with_cissvae(
  model,
  data,
  index_col = NULL,
  cols_ignore = NULL,
  clusters,
  imputable_matrix = NULL,
  binary_feature_mask = NULL,
  replacement_value = 0,
  batch_size = NULL,
  seed = 42
)

Arguments

model

Python model object loaded via load_cissvae_model()

data

R data.frame with missing values

index_col

String name of index column to preserve (optional)

cols_ignore

Character vector of column names to exclude from imputation scoring.

clusters

Integer vector of cluster labels, one per row of data.

imputable_matrix

Logical matrix indicating entries allowed to be imputed.

binary_feature_mask

Logical vector marking which columns are binary.

replacement_value

Numeric value used to replace missing entries before model input.

batch_size

Batch size passed to the Python DataLoader. If NULL, the batch size defaults to nrow(data), so all rows are processed in a single batch.

seed

Base random seed for reproducible results
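The optional arguments above can be prepared in base R before calling the function. The following is a minimal sketch, not the package's own method. It assumes that a column counts as binary when all of its observed values are 0 or 1, that TRUE in imputable_matrix marks an entry that may be imputed, and that both objects cover only the feature columns (the index column excluded). The toy data frame and the helper is_binary() are hypothetical; check these conventions against how the model was trained.

```r
# Toy data frame standing in for `data` (hypothetical values)
df <- data.frame(
  index  = 1:5,
  Age    = c(34, NA, 51, 29, NA),
  Smoker = c(0, 1, NA, 0, 1),
  Salary = c(52000, 61000, NA, 48000, 75000)
)

# Feature columns only; assumes the index column is not part of the masks
feature_cols <- setdiff(names(df), "index")

# Assumption: a column is binary if every observed value is 0 or 1
is_binary <- function(x) {
  obs <- x[!is.na(x)]
  length(obs) > 0 && all(obs %in% c(0, 1))
}
binary_feature_mask <- vapply(df[feature_cols], is_binary, logical(1))

# Assumption: TRUE marks an entry that is allowed to be imputed.
# Here every missing entry is allowed; set cells to FALSE to protect them.
imputable_matrix <- is.na(as.matrix(df[feature_cols]))

# Documented fallback: a NULL batch size means nrow(data)
batch_size <- NULL
effective_batch_size <- if (is.null(batch_size)) nrow(df) else as.integer(batch_size)
```

The resulting binary_feature_mask and imputable_matrix could then be passed to the arguments of the same name.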

Value

Imputed R data.frame

Tips

  • Use the same ClusterDataset parameters that were used for initial model training.

  • Cluster labels must match the labels used for model training.

  • 'binary_feature_mask' is required for correct imputation of binary columns.
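A quick pre-flight check on the cluster labels can catch mismatches before any Python code runs. This is a sketch under stated assumptions: check_clusters() is a hypothetical helper, not part of the package, and training_labels is a vector of labels you recorded yourself at training time.

```r
# Hypothetical helper: validate cluster labels before imputation
check_clusters <- function(clusters, n_rows, training_labels) {
  # One label per row, no missing labels
  stopifnot(length(clusters) == n_rows, !anyNA(clusters))

  # Labels that the model never saw during training
  unseen <- setdiff(unique(clusters), training_labels)
  if (length(unseen) > 0) {
    stop("Cluster labels not seen during training: ",
         paste(unseen, collapse = ", "))
  }

  # The clusters argument is documented as an integer vector
  as.integer(clusters)
}

# Example: four rows, labels 0 and 1, model trained on labels 0, 1, 2
clusters_checked <- check_clusters(c(0, 1, 1, 0), n_rows = 4, training_labels = 0:2)
```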

Examples

## Requires a working Python environment via reticulate
## Wrapped in try() and donttest to avoid CRAN check failures
# \donttest{
try({
  # Activate your reticulate Python environment with ciss-vae installed
  reticulate::use_virtualenv("cissvae_environment", required = TRUE)

  # Load example data and clusters (replace with your own)
  data(df_missing)
  data(clusters)

  # Load a previously saved model
  model <- try(load_cissvae_model("model.pt", python_env = "cissvae_environment"))

  # Perform imputation on new data
  imputed_df <- try(
    impute_with_cissvae(
      model = model,
      data = df_missing,
      index_col = "index",
      cols_ignore = c("Age", "Salary"),
      clusters = clusters$clusters,
      imputable_matrix = NULL,
      binary_feature_mask = NULL,
      replacement_value = 0,
      batch_size = 4000L,
      seed = 42
    )
  )
})
#> Error in py_call_impl(callable, call_args$unnamed, call_args$named) : 
#>   FileNotFoundError: [Errno 2] No such file or directory: 'model.pt'
#> Run `reticulate::py_last_error()` for details.
#> Error in impute_with_cissvae(model = model, data = df_missing, index_col = "index",  : 
#>   Loaded model has empty `layer_order_enc`/`layer_order_dec`. This usually means the R load routine reconstructed the model incorrectly (e.g., defaults) rather than restoring the original architecture/config.
# }
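In the output above, the second error is a consequence of the first: the model file was missing, yet imputation was still attempted. One way to stop early is to guard the load step. The sketch below uses only base R; safe_load() is a hypothetical wrapper, and the loader is passed in as a function so the guard itself does not depend on the package.

```r
# Hypothetical wrapper: return NULL instead of a broken model object
safe_load <- function(path, loader) {
  if (!file.exists(path)) {
    message("Model file not found: ", path)
    return(NULL)
  }
  model <- try(loader(path), silent = TRUE)
  if (inherits(model, "try-error")) {
    message("Model could not be loaded: ",
            conditionMessage(attr(model, "condition")))
    return(NULL)
  }
  model
}

# With no file on disk, the guard returns NULL and nothing else runs
model <- safe_load("model.pt", function(p) stop("loader not reached"))
if (is.null(model)) {
  message("Skipping imputation because no model is available.")
}
```

In practice the loader argument would wrap the real call, for example function(p) load_cissvae_model(p, python_env = "cissvae_environment"), and impute_with_cissvae() would be called only when the result is not NULL.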