Create Missingness Proportion Matrix
Source: R/missing_prop_matrix.R
Creates a matrix in which each entry is the proportion of missing values for one sample–feature combination across that feature's timepoints. Each sample has one proportion value per feature. Features may have repeated timepoints (columns named like feature_1, feature_2, ...). This matrix can be used with cluster_on_missing_prop() to group samples with similar missingness patterns.
Arguments
- data
Data frame or matrix containing the input data with potential missing values.
- index_col
Character scalar. Name of an index column to exclude from analysis (optional). If supplied and present, it will be removed from analysis; row names are preserved as-is.
- cols_ignore
Character vector of column names to exclude from the proportion matrix (optional).
- na_values
Vector of values to treat as missing in addition to standard missing values. Defaults to c(NA, NaN, Inf, -Inf).
- repeat_feature_names
Character vector of "base" feature names that have repeated timepoints. Repeat measurements must be in the form <feature>_<timepoint>, where <feature> is alphanumeric (and may include dots) and <timepoint> is an integer (e.g., "CRP_1").
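The <feature>_<timepoint> naming convention described above can be illustrated with a small base-R sketch. The helper name match_repeat_cols and the sample column names are hypothetical and are not part of the package; the package's actual matching logic may differ.

```r
# Hypothetical helper (not part of the package): return the columns that
# belong to one base feature under the <feature>_<timepoint> convention.
match_repeat_cols <- function(col_names, base) {
  prefix <- paste0(base, "_")
  # Keep names that start with "<base>_" ...
  has_prefix <- startsWith(col_names, prefix)
  # ... and whose remainder consists only of digits (an integer timepoint).
  suffix <- substring(col_names, nchar(prefix) + 1)
  col_names[has_prefix & grepl("^[0-9]+$", suffix)]
}

# Made-up column names; "CRP_x" has a non-integer suffix and is not matched.
cols <- c("id", "CRP_1", "CRP_2", "IL6_1", "IL6_2", "Albumin", "CRP_x")
crp_cols <- match_repeat_cols(cols, "CRP")
il6_cols <- match_repeat_cols(cols, "IL6")
```

Comparing the prefix literally with startsWith() avoids having to escape dots in base names before building a regular expression.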
Value
A numeric matrix of dimension nrow(data) by n_features, where rows are samples and columns are features (base names). Entries are per-sample missingness proportions in [0, 1].
The returned matrix has an attribute "feature_columns_map": a named list mapping each output feature to the source columns used to compute its proportion.
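As a rough illustration of the returned structure, the sketch below builds a proportion matrix by hand in base R. It is not the package's implementation: the toy data, the is_missing helper, and the hand-written column map are assumptions made for the example, and only the default na_values are considered.

```r
# Toy data with the documented layout (values are made up).
df <- data.frame(
  CRP_1   = c(1.2, NA, 2.1),
  CRP_2   = c(NA, NA, 2.0),
  Albumin = c(3.9, Inf, 4.0)
)

# Assumed mapping from each output feature to its source columns.
feature_columns_map <- list(CRP = c("CRP_1", "CRP_2"), Albumin = "Albumin")

# Treat NA, NaN, Inf and -Inf as missing, mirroring the documented default.
is_missing <- function(x) is.na(x) | is.infinite(x)

# Logical matrix (samples x source columns) flagging missing entries.
miss <- vapply(df, is_missing, logical(nrow(df)))

# For each feature, average the flags across its source columns per sample.
prop <- vapply(
  feature_columns_map,
  function(cols) rowMeans(miss[, cols, drop = FALSE]),
  numeric(nrow(df))
)

# Attach the mapping as an attribute, as described for the return value.
attr(prop, "feature_columns_map") <- feature_columns_map
```

A feature with a single source column, such as Albumin here, can only take the values 0 or 1, whereas a feature with k repeated columns takes values in steps of 1/k.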
Examples
df <- data.frame(
  id = paste0("s", 1:4),
  CRP_1 = c(1.2, NA, 2.1, NaN),
  CRP_2 = c(NA, NA, 2.0, 1.9),
  IL6_1 = c(0.5, 0.7, Inf, 0.4),
  IL6_2 = c(0.6, -Inf, 0.8, 0.5),
  Albumin = c(3.9, 4.1, 4.0, NA)
)
m <- create_missingness_prop_matrix(
  data = df,
  index_col = "id",
  cols_ignore = NULL,
  repeat_feature_names = c("CRP", "IL6")
)
dim(m) # 4 x 3 (CRP, IL6, Albumin)
#> [1] 4 3
m[ , "CRP"] # per-sample proportion missing across CRP_1 and CRP_2
#> 1 2 3 4
#> 0.5 1.0 0.0 0.5
attr(m, "feature_columns_map")
#> NULL
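To give a sense of how such a matrix can group samples by missingness pattern, here is a minimal base-R sketch using hierarchical clustering. It is not a description of what cluster_on_missing_prop() does internally. The matrix m_example is typed in by hand: the CRP column matches the output shown above, while the IL6 and Albumin columns were worked out manually from the example data under the default na_values. The sample labels are likewise made up.

```r
# Hand-built proportion matrix (assumed values, see the note above).
m_example <- matrix(
  c(0.5, 1.0, 0.0, 0.5,   # CRP
    0.0, 0.5, 0.5, 0.0,   # IL6
    0.0, 0.0, 0.0, 1.0),  # Albumin
  nrow = 4,
  dimnames = list(paste0("s", 1:4), c("CRP", "IL6", "Albumin"))
)

# Euclidean distances between samples' missingness profiles.
d <- dist(m_example)

# Average-linkage hierarchical clustering, cut into two groups.
hc <- hclust(d, method = "average")
groups <- cutree(hc, k = 2)
```

The distance measure, linkage method, and number of groups are arbitrary choices for the sketch and would normally be tuned to the data.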