Complete version of the sample dataset — df

The same simulated data as in df_missing, but with no missing values, which makes it useful as a ground truth for evaluating imputation methods.

Usage

df_complete

Format

A tibble with 8,000 rows and 31 variables (the index column plus 30 data columns), with no missing values:

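index: Numeric (double). Row identifier starting at 0, imported from data_raw/df_complete.csv.
Age, Salary: Numeric demographic columns.
ZipCode10001, ZipCode20002, ZipCode30003: 0/1 indicator columns encoding zip code.
Y11, ..., Y55: 25 numeric simulated biomarker columns (Y11 to Y15, Y21 to Y25, ..., Y51 to Y55).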

Source

Imported from data_raw/df_complete.csv; the unnamed first column, read in as ...1, was then renamed to index.

Examples

data(df_complete)
head(df_complete)
#> # A tibble: 6 × 31
#>   index   Age Salary ZipCode10001 ZipCode20002 ZipCode30003     Y11   Y12   Y13
#>   <dbl> <dbl>  <dbl>        <dbl>        <dbl>        <dbl>   <dbl> <dbl> <dbl>
#> 1     0 11.0    6.37            0            1            0  -4.05  -27.4 -19.1
#> 2     1  9.73   5.91            1            0            0   0.546 -19.6 -12.2
#> 3     2 11.4    6.64            0            1            0  -6.25  -28.3 -20.4
#> 4     3 13.6    5.90            0            0            1 -10.6   -31.8 -24.7
#> 5     4  9.54   6.13            1            0            0   0.358 -16.5 -11.3
#> 6     5  9.54   6.39            1            0            0   4.76  -19.0 -12.3
#> # ℹ 22 more variables: Y14 <dbl>, Y15 <dbl>, Y21 <dbl>, Y22 <dbl>, Y23 <dbl>,
#> #   Y24 <dbl>, Y25 <dbl>, Y31 <dbl>, Y32 <dbl>, Y33 <dbl>, Y34 <dbl>,
#> #   Y35 <dbl>, Y41 <dbl>, Y42 <dbl>, Y43 <dbl>, Y44 <dbl>, Y45 <dbl>,
#> #   Y51 <dbl>, Y52 <dbl>, Y53 <dbl>, Y54 <dbl>, Y55 <dbl>
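Because df_complete has no missing values, it can serve as the reference when scoring an imputation method applied to df_missing. The sketch below is a minimal, self-contained illustration under stated assumptions: it uses small hypothetical stand-ins (`complete`, `missing`) instead of the packaged data, a hypothetical helper `impute_mean()` (column-mean imputation) as a placeholder for the method being evaluated, and RMSE computed only over the cells that were masked. With the package loaded you would substitute df_complete and df_missing, assuming their rows line up (for example via index).

```r
# Hypothetical toy stand-ins for df_complete / df_missing (not the packaged data)
set.seed(1)
complete <- data.frame(
  index = 0:9,
  Y11 = rnorm(10),
  Y12 = rnorm(10, mean = -20)
)

# Mask a few biomarker values to mimic df_missing
missing <- complete
missing$Y11[c(2, 5)] <- NA
missing$Y12[c(3, 7, 9)] <- NA

# Hypothetical placeholder method: replace each NA with its column mean
impute_mean <- function(df, cols) {
  for (col in cols) {
    df[[col]][is.na(df[[col]])] <- mean(df[[col]], na.rm = TRUE)
  }
  df
}

bio_cols <- c("Y11", "Y12")
imputed <- impute_mean(missing, bio_cols)

# Score only the cells that were actually missing, against the ground truth
mask <- is.na(as.matrix(missing[bio_cols]))
err <- as.matrix(imputed[bio_cols])[mask] - as.matrix(complete[bio_cols])[mask]
rmse <- sqrt(mean(err^2))
rmse
```

Restricting the error to the masked cells matters: observed cells are copied through unchanged, so including them would shrink the score toward zero regardless of how well the method imputes.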