Impute new data with a loaded Python CISS-VAE model
Source: R/save_load_models.R
Given a loaded model, an R data frame, and a vector of cluster labels, this builds the Python ClusterDataset and DataLoader, runs inference, and returns an imputed data frame in R.
Usage
impute_with_cissvae(
model,
data,
index_col = NULL,
cols_ignore = NULL,
clusters,
imputable_matrix = NULL,
binary_feature_mask = NULL,
replacement_value = 0,
batch_size = NULL,
seed = 42
)
Arguments
- model
Python model object loaded via load_cissvae_model()
- data
R data.frame with missing values
- index_col
String name of index column to preserve (optional)
- cols_ignore
Character vector of column names to exclude from imputation scoring.
- clusters
Integer vector of cluster labels for rows of data
- imputable_matrix
Logical matrix indicating entries allowed to be imputed.
- binary_feature_mask
Logical vector marking which columns are binary.
- replacement_value
Numeric value used to replace missing entries before model input.
- batch_size
Batch size passed to the Python DataLoader. If NULL, all rows are processed in a single batch (batch_size = nrow(data)).
- seed
Base random seed for reproducible results
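The two mask arguments can be built in base R before calling the function. The following is a minimal sketch, not the package's own method: the toy data frame, the `is_binary()` helper, and the choice to leave the index column out of both masks are assumptions made for illustration. Check the expected shape against the ClusterDataset settings used at training time.

```r
# Toy data frame standing in for `data` (hypothetical columns).
df <- data.frame(
  index  = 1:5,
  Age    = c(34, NA, 51, 29, NA),
  Smoker = c(0, 1, NA, 1, 0),
  Salary = c(52000, 61000, NA, 48000, 75000)
)

# Assumption: masks cover the feature columns only, not the index column.
feature_cols <- setdiff(names(df), "index")

# Hypothetical helper: a column counts as binary if every observed value is 0 or 1.
is_binary <- function(x) {
  obs <- x[!is.na(x)]
  length(obs) > 0 && all(obs %in% c(0, 1))
}

# Logical vector with one entry per feature column.
binary_feature_mask <- vapply(df[feature_cols], is_binary, logical(1))

# Logical matrix (rows x feature columns); TRUE marks entries that may be imputed.
imputable_matrix <- matrix(
  TRUE,
  nrow = nrow(df),
  ncol = length(feature_cols),
  dimnames = list(NULL, feature_cols)
)

# Example restriction: never impute Salary.
imputable_matrix[, "Salary"] <- FALSE
```

Both objects can then be passed as `binary_feature_mask` and `imputable_matrix`, provided their column order matches the columns the model was trained on.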
Tips
Use the same ClusterDataset parameters that were used for initial model training.
Cluster labels must match the labels used for model training.
binary_feature_mask is required for correct imputation of binary columns.
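The cluster requirement above can be checked before inference. This is a sketch under stated assumptions: `check_clusters()` is a hypothetical helper, not part of the package, and `training_labels` stands for whatever set of labels you recorded when the model was trained.

```r
# Hypothetical labels seen during training, and labels for the new rows.
training_labels <- c(0L, 1L, 2L)
new_clusters    <- c(0L, 2L, 2L, 1L, 0L)
n_rows          <- 5L  # stands in for nrow(data)

# Hypothetical helper: stop early if the labels cannot match the trained model.
check_clusters <- function(clusters, training_labels, n_rows) {
  if (length(clusters) != n_rows) {
    stop("`clusters` must have one label per row of `data`.")
  }
  if (anyNA(clusters)) {
    stop("`clusters` must not contain missing labels.")
  }
  unseen <- setdiff(unique(clusters), training_labels)
  if (length(unseen) > 0) {
    stop("Labels not seen in training: ", paste(unseen, collapse = ", "))
  }
  invisible(TRUE)
}

check_clusters(new_clusters, training_labels, n_rows)
```

Failing early in R gives a clearer message than an error raised from the Python side through reticulate.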
Examples
## Requires a working Python environment via reticulate
## Wrapped in try() and donttest to avoid CRAN check failures
# \donttest{
try({
  # Activate your reticulate Python environment with ciss-vae installed
  reticulate::use_virtualenv("cissvae_environment", required = TRUE)

  # Load example data and clusters (replace with your own)
  data(df_missing)
  data(clusters)

  # Load a previously saved model
  model <- try(load_cissvae_model("model.pt", python_env = "cissvae_environment"))

  # Perform imputation on new data
  imputed_df <- try(
    impute_with_cissvae(
      model = model,
      data = df_missing,
      index_col = "index",
      cols_ignore = c("Age", "Salary"),
      clusters = clusters$clusters,
      imputable_matrix = NULL,
      binary_feature_mask = NULL,
      replacement_value = 0,
      batch_size = 4000L,
      seed = 42
    )
  )
})
#> Error in py_call_impl(callable, call_args$unnamed, call_args$named) :
#> FileNotFoundError: [Errno 2] No such file or directory: 'model.pt'
#> Run `reticulate::py_last_error()` for details.
#> Error in impute_with_cissvae(model = model, data = df_missing, index_col = "index", :
#> Loaded model has empty `layer_order_enc`/`layer_order_dec`. This usually means the R load routine reconstructed the model incorrectly (e.g., defaults) rather than restoring the original architecture/config.
# }